fmaroni Posted September 30, 2009

I implemented a simple test case in which my view shows years (2004 -> 2009) on columns, and customers and months on rows (months are repeated for each customer); the only measure is invoice amount ($). The OLAP schema is a single cube with 3 (private) dimensions:
- customer, mapped to a customers table (almost 4000 records)
- month, mapped to a months table (12 records, with month name and number (pk) as fields)
- year, mapped to a one-field view (the year as an integer) obtained from the fact table with a "select distinct ..." statement on a datetime field

The fact table is an invoices table with about 130000 records. It has, of course, an fk relation with the customers table and the months table, and in practice also with the years view (but the RDBMS does not seem to allow a formal fk definition between a table and a view). The fk fields in the fact table are customer id, year and month, all of type int; month and year are computed columns (with persisted values) derived from a datetime field in the same table.

Expanding everything in the view, until the customers dimension is fully expanded, takes about 2 minutes; re-collapsing and expanding again takes almost the same time (i.e. 1'30'' vs 2'00''). Is there a straightforward way to at least avoid what seems to be an almost total re-computation on every user interaction? (Can the Mondrian caching system be configured to help more in that direction?) I followed the performance advice in the Mondrian technical guide about the RDBMS level, which in such a simple case means putting indexes on all pk and fk fields, so I did.
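For reference, the RDBMS-level setup described above could look roughly like this on SQL Server 2005. The table and column names here are my placeholders, not the actual schema from the post:

```sql
-- Hypothetical names: "invoices" is the fact table, with persisted
-- computed columns derived from the invoice_date datetime field.
ALTER TABLE invoices ADD inv_year  AS YEAR(invoice_date)  PERSISTED;
ALTER TABLE invoices ADD inv_month AS MONTH(invoice_date) PERSISTED;

-- Indexes on the fk columns used by the joins Mondrian generates.
CREATE INDEX ix_invoices_customer ON invoices (customer_id);
CREATE INDEX ix_invoices_year     ON invoices (inv_year);
CREATE INDEX ix_invoices_month    ON invoices (inv_month);

-- The one-field "years" view the year dimension is mapped to.
CREATE VIEW v_years AS
    SELECT DISTINCT YEAR(invoice_date) AS inv_year FROM invoices;
```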
I was thinking there should be something I can still do for performance before resorting to precomputed aggregate tables, since I read in that guide that it has proved possible to execute queries against cubes whose fact table has millions of rows (though, I have to say, the cardinality of the dimensions wasn't specified) while maintaining immediate response times, without aggregate tables.

I'm also wondering whether the bottleneck could be more in the UI HTML generation (when everything in that view is expanded, a table with about 3000*12 rows is shown; I don't know whether that could be a performance problem or not) than in the SQL generation/execution. How can I check whether that is the case?

Moreover, in general, from a performance point of view, in cases like this one (the year dimension) where the members of one or more dimensions come from the fact table, is it better to use degenerate dimensions with SQL expressions or, as in the schema I'll paste below, to have "utility" tables or views in the db for each dimension? (In this case I tried both ways but saw no significant variation in response times.) Finally, again in general, can the fact that a dimension is mapped to a view instead of a table (the year dimension, in this case) somehow harm performance?

System info: JasperServer 3.1 on a Debian 4.0 VM (VMware) with 1GB of RAM; physical host: Pentium D 2.8GHz, 2GB of RAM, Win. 2003 SP2, SQL Server 2005 Express (the db is in the SQL Server 2005 instance on the physical host). JVM memory options: -Xms512m -Xmx1024m -XX:PermSize=32m -XX:MaxPermSize=128m. I also tried everything on a Core2Duo 2.53GHz physical machine with 4GB of RAM (about 3.25 actually usable because the OS is Win. XP SP3); times are lower but still not good for interactive usage (around a minute at best).

I'll paste the query and the schema used in the code section, along with a translation of the Italian words used in the code.
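In Mondrian schema terms, the two alternatives being compared ("degenerate dimension with SQL expressions" vs. a "utility" view) look roughly like this. Element names follow the Mondrian 3.x schema; the table and column names are placeholders, not the actual schema from the post:

```xml
<!-- Option 1: degenerate dimension. No <Table>/<View> in the hierarchy,
     so members are read straight from the fact table (no join),
     using the persisted computed column. -->
<Dimension name="Year">
  <Hierarchy hasAll="true">
    <Level name="Year" column="inv_year" type="Integer"
           uniqueMembers="true"/>
  </Hierarchy>
</Dimension>

<!-- Option 2: dimension backed by a view instead of a table. -->
<Dimension name="Year" foreignKey="inv_year">
  <Hierarchy hasAll="true" primaryKey="inv_year">
    <View alias="v_years">
      <SQL dialect="generic">
        select distinct year(invoice_date) as inv_year from invoices
      </SQL>
    </View>
    <Level name="Year" column="inv_year" type="Integer"
           uniqueMembers="true"/>
  </Hierarchy>
</Dimension>
```

The degenerate form saves a join per query, which matters more as fact-table size grows; the view form keeps the member list independent of the cube definition.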
Thanks in advance for any advice, or just for reading.

Code:

Post Edited by fmaroni at 09/30/2009 09:05
fmaroni Posted September 30, 2009 Author

At the moment it seems to me (of course I can always be wrong, being new to JasperServer, as is surely clear) that the in-memory representation of the cube is recalculated every time the user opens a view, and at almost every user interaction... Is there a way to make JasperServer build the cube at a scheduled, or in any case earlier, moment preceding the user's opening of and interaction with the view?
fmaroni Posted September 30, 2009 Author

In the case I described in the first post, there seems to be an issue (still not totally sure whether at the OLAP engine or presentation level) related to the cardinality of the customers dimension (the table that dimension is mapped to has about 4000 records -> about 4000 customers).

Referring to the same setup, schema and query described in the first post, and to the Core2Duo machine also mentioned there, I noticed that if I change the customers dimension to be a degenerate dimension that gets its members from the fact table (which saves the engine some costly join operations, I guess), the time to get the totally expanded view goes from about 1 min to 15-20 sec. Maybe more important: while reducing the number of records in the fact table from about 120000 to about 1000 still leaves it taking 50-60 sec to get the totally expanded view ready, the time seems to decrease almost linearly with the number of records in the customers table: going from about 4000 to about 400 records, I went from about 60 sec to 6-8 sec.

Are things expected to work like that? Any advice? I had applied from the start the RDBMS-level optimizations advised in the JasperAnalysis Ultimate Guide (indexes and compound indexes, mainly, on practically every column used in that simple schema). As you can see in the schema, I have no "snowflake" patterns, no virtual cubes, and no parent-child hierarchies that could require further schema-level optimization, if I understood correctly. I also tried, leaving everything else unchanged of course, the "approxRowCount" attribute; it had no appreciable effect. I also used precalculated aggregate tables for invoice amounts (the measure), in particular one holding all of the leaf-level values (all of the values displayed with the view totally expanded), but the time didn't drop significantly.
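For anyone trying the same experiment: the approxRowCount attribute mentioned above goes on the <Level> element of the Mondrian schema, and lets the engine skip the `select count(*)` it would otherwise issue to size the level. A sketch with placeholder names (4000 is the customer count from the post):

```xml
<Dimension name="Customer" foreignKey="customer_id">
  <Hierarchy hasAll="true" primaryKey="customer_id">
    <Table name="customers"/>
    <Level name="Customer" column="customer_id"
           nameColumn="customer_name"
           approxRowCount="4000" uniqueMembers="true"/>
  </Hierarchy>
</Dimension>
```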
Should I do something about the Mondrian engine properties (<-> "analysis options")?

I started lowering the number of records in the fact table (and then also in the big dimension table, finding that "linearity" I didn't expect) as a preliminary test for a different implementation of the OLAP schema, related to this consideration I found in paragraph 5.3.3 of the JasperAnalysis Ultimate Guide:

"One of the most important variables in determining the duration of a multidimensional query is the size of the fact table involved. Depending on the nature of the query, the duration of the query may be more than directly proportional (polynomial) to the size of the fact table. In this case, it would be quicker to query two fact tables half the size, then take the union of the results. Also, if a large fact table has many rows that are not commonly queried, it makes sense to split them into another table, making a logical partition. For example, if most queries about product orders are only concerned with the current year, putting order information for other years into separate fact tables would be an effective performance optimization. The schema might have an orders_2007 and orders_historical table, and the orders_historical table could be further split into tables for each historical year."

Any practical hint about how to implement a schema with multiple fact tables, getting the same result as before from the union of the results of the queries run against each fact table, would be very much appreciated, since there is no explicit example or guidance in the Mondrian technical guide about that. An independent cube for each partial fact table? (In the JasperAnalysis Ultimate Guide, virtual cubes are not mentioned as a good thing for performance.) Given the times I observed as described above, could/should I use a similar procedure but split my customers table into more tables, instead of doing that with the fact table? (Is that possible? If so, how should such a schema be built, in a way that serves the explained purpose?)

Thanks in advance
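One common way to get "the union of the results" without defining multiple cubes is to do it at the database level: keep the partitioned fact tables, but point the single cube at a UNION ALL view. A sketch following the guide's by-year example, with placeholder names (not from the guide itself):

```sql
-- Hypothetical partition by year: invoices_2009 holds the current
-- year, invoices_historical the rest.
CREATE VIEW invoices_all AS
    SELECT customer_id, inv_year, inv_month, amount
    FROM invoices_2009
    UNION ALL
    SELECT customer_id, inv_year, inv_month, amount
    FROM invoices_historical;
```

The cube's <Table> (or <View>) element would then reference invoices_all, so the MDX layer is unchanged. Whether the optimizer actually skips partitions that a query cannot touch depends on the RDBMS; SQL Server can do so for partitioned views when the member tables carry CHECK constraints on the partitioning column (here, the year).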
swood Posted December 1, 2009

I think the issue is in the HTML generation and rendering in the browser. If you save the rendered page, what is the size of the HTML? That is how much the server has to generate and the browser has to manage. Your customer dimension is the main driver of the result size. You can probably do some things like using NON EMPTY to reduce the displayed result. The engine is certainly not recalculating the underlying query results for each request, but the full screen is regenerated each time.

Sherman
Jaspersoft
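As an illustration of the NON EMPTY suggestion, here is a sketch in MDX; the cube, dimension, and measure names are guesses based on the schema described earlier in the thread, not the actual query:

```mdx
-- NON EMPTY drops axis positions whose cells are all empty, so
-- customer/month combinations with no invoices in the selected
-- years never reach the HTML renderer.
SELECT
  NON EMPTY [Year].Members ON COLUMNS,
  NON EMPTY CrossJoin([Customer].[Customer].Members,
                      [Month].[Month].Members) ON ROWS
FROM [Invoices]
WHERE [Measures].[Invoice Amount]
```

With sparse data this can shrink a 3000*12-row result dramatically, which reduces both the SQL work and, per Sherman's point, the size of the page the server must generate.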