
Performance problem with a basic analysis view


fmaroni


I implemented a simple test case in which my view shows years (2004 -> 2009) on columns and customers and months on rows (months are repeated for each customer); the only measure is invoice amount ($).

 

The OLAP schema is a single cube with 3 (private) dimensions:

- customer, mapped to a customers table (almost 4000 records)

- month, mapped to a months table (12 records, with month name and month number (PK) as fields)

- year, mapped to a one-field view (the year as an integer) built from the fact table with a "SELECT DISTINCT ..." statement on a datetime field
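
As a reference, a minimal sketch of such a view, using hypothetical English names (the actual schema uses Italian identifiers):

Code:
-- Hypothetical names: a one-column view of distinct years,
-- derived from the datetime field of the fact table.
CREATE VIEW years_view AS
SELECT DISTINCT YEAR(invoice_date) AS [year]
FROM invoices;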

 

The fact table is an invoices table with about 130,000 records.

 

It of course has FK relations with the customers table and the months table, and in practice also with the years view (but the RDBMS does not seem to allow a formal FK definition between a table and a view).

 

The FK fields in the fact table are customer id, year, and month, all of them of type int; month and year are computed columns (with persisted values) derived from a datetime field in the same table.
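
For reference, a sketch of how such persisted computed columns can be declared in SQL Server 2005 (hypothetical names again; persisted computed columns can also be indexed):

Code:
-- Hypothetical names. PERSISTED stores the computed values,
-- so the columns behave like ordinary indexed int columns.
ALTER TABLE invoices
    ADD invoice_year  AS YEAR(invoice_date)  PERSISTED,
        invoice_month AS MONTH(invoice_date) PERSISTED;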

 

Expanding everything in the view, until the customers dimension is finally expanded too, takes about 2 minutes.

 

Re-collapsing and expanding again takes almost the same, or at least a similar, time (i.e. 1'30'' vs 2'00''): is there a straightforward way to at least avoid what seems to be an almost total re-computation on every user interaction? (Can the Mondrian caching system be configured to help some more in that direction?)

 

I followed the performance advice in the Mondrian technical guide at the RDBMS level, which in such a simple case means putting indexes on all PK and FK fields, so I did. I was thinking there should still be something I can do for performance before resorting to precomputed aggregate tables, since I read in that guide that it has proved possible to execute queries against cubes whose fact table has millions of rows (although, I have to say, the cardinality of the dimensions wasn't specified) while maintaining immediate response times, without aggregate tables.
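
For the record, the indexing I applied looks roughly like this (hypothetical names; the compound index is my own addition, covering the columns the generated GROUP BY tends to use together):

Code:
-- Single-column indexes on the FK fields of the fact table,
-- plus a compound index on the columns grouped together
-- by the queries this view generates.
CREATE INDEX ix_invoices_customer ON invoices (customer_id);
CREATE INDEX ix_invoices_year     ON invoices (invoice_year);
CREATE INDEX ix_invoices_month    ON invoices (invoice_month);
CREATE INDEX ix_invoices_cym      ON invoices (customer_id, invoice_year, invoice_month);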

 

I'm also wondering whether the bottleneck could lie more in the UI HTML generation (when everything in that view is expanded, a table with about 3000*12 rows is shown; I don't know whether that could be a performance problem or not) than in the SQL generation/execution. How can I check whether that is the case?
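
(One thing I thought of trying, assuming the standard log4j setup in JasperServer: enable Mondrian's SQL logging and compare the logged statement timings with the total page time; if the SQL finishes quickly but the page still takes minutes, the bottleneck would be in the presentation layer.)

Code:
# log4j fragment: log every SQL statement Mondrian issues,
# with timings, to separate DB time from HTML generation time.
log4j.logger.mondrian.sql=DEBUG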

 

Moreover, in general, from a performance point of view, in cases like this one (the year dimension) where the members of one or more dimensions come from the fact table, is it better to use degenerate dimensions with SQL expressions or, as in the schema I'll paste below, to have "utility" tables or views in the DB for each dimension?

(In this case I tried both ways, but saw no significant variation in response times; a sketch of the two mappings follows.)
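
For clarity, the two mappings in Mondrian schema XML, with hypothetical names (a degenerate dimension simply omits the <Table> element, so its level reads straight from the fact table):

Code:
<!-- Variant A: dimension mapped to a utility table/view -->
<Dimension name="Year" foreignKey="invoice_year">
  <Hierarchy hasAll="true" primaryKey="year">
    <Table name="years_view"/>
    <Level name="Year" column="year" uniqueMembers="true"/>
  </Hierarchy>
</Dimension>

<!-- Variant B: degenerate dimension, no join to another table -->
<Dimension name="Year">
  <Hierarchy hasAll="true">
    <Level name="Year" column="invoice_year" uniqueMembers="true"/>
  </Hierarchy>
</Dimension>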

 

Finally, again in general, can the fact that a dimension is mapped to a view instead of a table (the year dimension, in this case) somehow hurt performance?

 

System info:

JasperServer 3.1 on a Debian 4.0 VM (VMware) with 1 GB of RAM; physical host: Pentium D 2.8 GHz, 2 GB of RAM, Windows 2003 SP2, SQL Server 2005 Express

(the DB is in the SQL Server 2005 instance on the physical host)

JVM memory options: -Xms512m -Xmx1024m -XX:PermSize=32m -XX:MaxPermSize=128m

 

I also tried everything on a Core 2 Duo 2.53 GHz physical machine with 4 GB of RAM (only about 3.25 GB actually usable because the OS is Windows XP SP3); times are lower but still not good for interactive usage (around a minute at best).

 

I'll paste the query and the schema used in the code section, along with a translation of the Italian words used in the code.

 

Thanks in advance for any advice, or just for reading.

Code:

 



Post Edited by fmaroni at 09/30/2009 09:05


At the moment it seems to me (of course I can always be wrong, being new to JasperServer, as is surely clear) that the in-memory representation of the cube is recalculated every time the user opens a view and on almost every user interaction... Is there a way to make JasperServer build the cube at a scheduled time, or at any rate at some moment preceding the user opening and interacting with the view?


In the case I described in the first post, there seems to be an issue (still not totally sure whether at the OLAP-engine or the presentation level) related to the cardinality of the customers dimension (the table that dimension is mapped to has about 4000 records -> about 4000 customers).

 

Referring to the same setup, schema, and query I described in the first post, and to the Core 2 Duo machine I also mentioned there:

 

I noticed that if I change the customers dimension into a degenerate dimension that gets its members from the fact table (which saves the engine from performing some costly join operations, I guess), the time to get the totally expanded view goes from about 1 minute down to 15-20 seconds;

 

Maybe more importantly, I'm seeing that, while cutting the number of records in the fact table from about 120,000 to about 1000 still leaves the totally expanded view taking 50-60 seconds to be ready, the time seems to decrease almost linearly as the number of records in the customers table decreases:

 

going from about 4000 to about 400 records, I went from about 60 seconds to 6-8 seconds.

 

Are things expected to work like that?

 

Any advice?

 

I had applied from the start the RDBMS-level optimizations advised in the JasperAnalysis Ultimate Guide (mainly indexes and compound indexes, on practically every column used in that simple schema); as you can see in the schema, I have no "snowflake" patterns, no virtual cubes, and no parent-child hierarchies that could require further schema-level optimization, if I understood correctly; I also tried, leaving everything else unchanged of course, the "approxRowCount" attribute, but it had no appreciable effect.

I also used precalculated aggregate tables for invoice amounts (the measure), in particular one holding all of the leaf-level values (all of the values displayed with the view totally expanded), but the time didn't drop significantly.
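
The leaf-level aggregate was built with something like this (hypothetical names; for Mondrian to pick the table up automatically, its name has to match the default agg_..._<facttable> recognition pattern, or it has to be declared explicitly in the schema):

Code:
-- One row per (customer, year, month), with the summed measure
-- and the fact count column Mondrian's aggregate tables expect.
SELECT customer_id,
       invoice_year,
       invoice_month,
       SUM(amount) AS amount_sum,
       COUNT(*)    AS fact_count
INTO   agg_cym_invoices
FROM   invoices
GROUP BY customer_id, invoice_year, invoice_month;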

 

Should I do something about the Mondrian engine properties (<-> "Analysis Options")?
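
(These are the kinds of settings I mean; a sketch of a mondrian.properties fragment, with property names taken from the Mondrian configuration documentation, to be verified against the version shipped with JasperServer 3.1:)

Code:
# Read aggregate tables from the schema and actually use them.
mondrian.rolap.aggregates.Read=true
mondrian.rolap.aggregates.Use=true
# Evaluate NON EMPTY and large crossjoins natively in SQL
# rather than member-by-member in the Java engine.
mondrian.native.nonempty.enable=true
mondrian.native.crossjoin.enable=true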

 

I started lowering the number of records in the fact table (and then also in the big dimension table, finding that "linearity" I didn't expect) as a preliminary test for a different implementation of the OLAP schema, prompted by this consideration I found in paragraph 5.3.3 of the JasperAnalysis Ultimate Guide:

---

"One of the most important variables in determining the duration of a multidimensional query is the size of the fact table
involved. Depending on the nature of the query, the duration of the query may be more than directly proportional
(polynomial) to the size of the fact table. In this case, it would be quicker to query two fact tables half the size, then take
the union of the results.
Also, if a large fact table has many rows that are not commonly queried, it makes sense to split them into another table,
making a logical partition. For example, if most queries about product orders are only concerned with the current year,
putting order information for other years into separate fact tables would be an effective performance optimization. The
schema might have an orders_2007 and orders_historical table, and the orders_historical table could be further split into
tables for each historical year."

---

Any practical hint about how to implement a schema with multiple fact tables, while getting the same result as before from the union of the results of the queries run against each fact table, would be much appreciated, since there is no explicit example or guidance about that in the Mondrian technical guide.

An independent cube for each partial fact table? (Virtual cubes are not mentioned in the JasperAnalysis Ultimate Guide as good for performance.)

Given the times I observed as described above, could/should I use a similar procedure but split my customers table into several tables, instead of doing that with the fact table? (Is that even possible? If so, how would I build such a schema in a way that serves the stated purpose?)
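
The only concrete reading of that paragraph I can come up with (an assumption on my part, not something the guides spell out) is to physically partition the fact table by year and present the union back to Mondrian as a single fact "table" through a view, so the schema keeps one cube:

Code:
-- Hypothetical partitioning of the invoices fact table by year;
-- the view presents the UNION ALL so a single cube can keep
-- using it as its fact table. With CHECK constraints on the year
-- column, SQL Server can prune partitions at query time.
CREATE VIEW invoices_all AS
SELECT * FROM invoices_2009
UNION ALL
SELECT * FROM invoices_historical;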

 

 

Thanks in advance.


  • 2 months later...

I think the issue is in the HTML generation and rendering in the browser. If you save the rendered page, what is the size of the HTML? This is how much the server has to generate and the browser has to manage. Your customer dimension is the main driver of the result size. You can probably do some things like using NON EMPTY to reduce the displayed result.
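
For example, something along these lines in the MDX (the cube, dimension, and measure names are placeholders for yours):

Code:
-- NON EMPTY drops rows with no data, which can shrink
-- the generated HTML considerably.
SELECT {[Measures].[Invoice Amount]} ON COLUMNS,
       NON EMPTY CrossJoin([Customer].Members, [Month].Members) ON ROWS
FROM [Invoices]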

 

The engine is certainly not recalculating the underlying query results for each request, but the full screen is regenerated each time.

 

Sherman

Jaspersoft

