Jump to content

Can DistinctCount be done in SQL using AWS edition?


dmitry.dimov

Recommended Posts

I have a table along the lines of:

timestamp, user_id, ...

with hundreds of thousands of records per day, and tens of thousands of distinct users. I can get report of Count(user_id) daily without issues. However, when I switch aggregation to DistinctCount, the report is incomplete and only shows a couple days. When I look at the generated query, I see that it's trying to bring in all distinct user_id records, instead of sending COUNT (DISTINCT user_id) to the database. Because the record limit is 200,000, the report can only get data for a few days. If it did COUNT DISTINCT, it would of course be just a single number per day.

 

Thank you,

Dmitry

Link to comment
Share on other sites

  • 2 weeks later...
  • Replies 2
  • Created
  • Last Reply

Top Posters In This Topic

Dmitry,

The AWS product is not limited in the way it does aggregation.  Anything which works on stand-alone JRS, will work on AWS.

 

There is a setting you can change to push down aggregation, you can see if this addresses your issue (might depend on DB type):

http://community.jaspersoft.com/wiki/unexpected-results-andor-performance-issues-group-summaries-ad-hoc-v50-and-later

Alternatively, you could explore a couple other options:

- Create a derived "aggregate" table for the target data in his domain, which groups the values as desired and returns count(distinct ColumnName)

- Create a custom entry point/SQL function in the semantic layer config



 

 

BTW, the 200K limit can be changed in Ad Hoc Server Settings (Manage -> Server Settings -> Ad Hoc Settings -> "Ad Hoc Dataset Row Limit".

Link to comment
Share on other sites

  • 10 months later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×
×
  • Create New...