Most scheduled reports never run


theodan

Hello.

 

I have 5 report units in my repository.  I scheduled about 1000 jobs (total) for them via web services, all set to run at the same time (12:50).  My report queries can each take up to 60 seconds to run.

 

Some of the jobs did seem to kick off at the expected time.  All 1000 jobs had their "Next run time" updated, as expected. However, only about 80 of them actually seemed to run (i.e. generated outputs) or had their "Last ran at" timestamp updated.  The other 920 never seemed to actually run and had empty "Last ran at" timestamps.

 

I'm using JS 3.5, and have not modified any of the default Quartz settings or properties.

 

No exceptions were logged by either JS or Quartz.  The only curious entries I found were the last ones:

 

12:50:07,099  WARN FileBufferedOutputStream,Finalizer:240 - Error while deleting the temporary file
12:50:07,099  WARN FileBufferedOutputStream,Finalizer:240 - Error while deleting the temporary file
12:50:16,114  WARN FileBufferedOutputStream,Finalizer:240 - Error while deleting the temporary file
12:50:57,083  WARN FileBufferedOutputStream,Finalizer:240 - Error while deleting the temporary file
13:24:31,645  WARN FileBufferedOutputStream,Finalizer:240 - Error while deleting the temporary file

 

but I'm still not convinced these are at all related to my problem.

 

I've tried this entire experiment about 5 times now, and each time roughly the same thing happens.  The exact number of jobs that actually run varies, but it's never been more than about 100.

 

Any ideas?

 

Thanks,

-Dan



Post Edited by theodan at 05/30/2009 15:20

I have found several other JS forum posts suggesting I'm not the only one with this problem.  They suggest that whenever you schedule more than 25 or 50 reports to run at the same time, many of them simply fail to run, with no exception in the logs and no clue as to what happened.

 

I turned on DEBUG logging and read up on Quartz Scheduler and got some further ideas.  Given that:

- I scheduled 1000 report jobs to run at the same time

- My report queries can take up to 1 minute each to run

- I'm using the default Quartz thread pool size of 2

it seems that a lot of the triggers in Quartz are "misfiring", and this is to be expected.  According to Quartz docs, a "misfire" is what happens when the time for a trigger passes and no Quartz thread is available to process that trigger.  Quartz has a MisfireHandler that is supposed to scan persistent triggers for misfires, and process them as soon as threads are available.

 

I innocently assumed I didn't have to worry about any of this, and that somewhere between Quartz and JS, someone would make sure that all misfires would eventually be picked up and processed correctly.  I have no problem with reports missing their scheduled time, as long as they do actually run eventually.  But I kept an eye on my database machine, and its CPU usage goes from 90% to 0% about 3 minutes after my 1000 reports were scheduled to run.  So basically JS/Quartz runs about 50 of the 1000 scheduled reports, and then seems to just give up on the other 950.  I'm not sure if the bug is in Quartz or in the way JS has configured Quartz.  Given that Quartz does nothing but scheduling and has been stress tested in enterprise apps for a long time now, I'd put my money on the latter.

 

If anyone has any ideas to focus my needle in a haystack search, they'd be very much appreciated.

 

Thanks,

-Dan



Post Edited by theodan at 05/30/2009 14:50

This thread:

jasperforge.org/plugins/espforum/view.php

seemed promising, but either I followed its advice incorrectly or it just doesn't work.

 

I tried modifying js.quartz.base.properties (not js.quartz.properties, like the thread suggested, because that didn't seem like the right file) by adding:

org.quartz.scheduler.idleWaitTime = 600000

but not only did that not improve anything, it doesn't really make sense after researching it.  The Quartz wiki docs define this setting as "the amount of time in milliseconds that the scheduler will wait before re-queries for available triggers when the scheduler is otherwise idle".  In other words, it's the scheduler's trigger polling interval.  From my understanding, that has nothing to do with how long triggers are considered valid, as suggested in that other thread.



Post Edited by theodan at 05/30/2009 15:15

Turns out it's org.quartz.jobStore.misfireThreshold that fixed my problem.  I modified js.quartz.base.properties as follows:

 

org.quartz.jobStore.misfireThreshold=36000000
 

 

Now all 1000 of my reporting jobs run to completion.

 

However, this fix doesn't seem ideal.  I shouldn't have to keep increasing this setting to keep up with however long each reporting run will be expected to take.  There should be a way for me to tell JS/Quartz that I want all jobs to run, regardless of how late they are.
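As a rough way to size that setting, here is a back-of-the-envelope sketch in Python. It assumes the figures quoted earlier in this thread (1000 jobs, a thread pool of 2, queries of up to 60 seconds) and that jobs run in full waves with no scheduler overhead. The function name last_start_delay_ms is made up for illustration and is not part of JS or Quartz.

```python
import math


def last_start_delay_ms(num_jobs, threads, max_job_seconds):
    """Worst-case delay (ms) before the last job can even start.

    Simplified model: jobs run in waves of `threads` jobs, and every
    job takes the full `max_job_seconds`.
    """
    waves = math.ceil(num_jobs / threads)
    # The last wave starts once all earlier waves have finished.
    return (waves - 1) * max_job_seconds * 1000


delay = last_start_delay_ms(num_jobs=1000, threads=2, max_job_seconds=60)
print(delay)               # 29940000 (about 8.3 hours)
print(delay < 36_000_000)  # True: the 10-hour threshold covers this case
```

Under these assumptions, a misfireThreshold of 36000000 leaves some headroom, but the estimate has to be redone whenever the job count, thread count, or query time changes.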

 

What was basically happening before I put in the fix was that org.quartz.jobStore.misfireThreshold was set to the default value of 180000 (3 minutes).  So JS/Quartz were processing as many of my reports as they could in the first 3 minutes, after which all the rest of them were being declared "misfires".  Unfortunately, the "MisfireHandler" kept saying it was "handling" the misfires, but it wouldn't actually invoke the misfired jobs or do anything else with them except log them as misfires.  I've attached a small part of the DEBUG-level log file that shows this problem.
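The cutoff effect described above can be approximated with a small Python model. This is only a sketch of the behavior seen in this thread, not Quartz's actual algorithm: it assumes every job takes the same time, that jobs run in waves on a fixed number of threads, and that any job that cannot start within the threshold is skipped. The function name jobs_started_within is hypothetical.

```python
def jobs_started_within(threshold_ms, threads, job_seconds, num_jobs):
    """Count jobs that start no later than `threshold_ms` after the
    scheduled time, in a simplified wave model."""
    job_ms = job_seconds * 1000
    # Waves start at 0, job_ms, 2 * job_ms, ... ; count those <= threshold.
    waves_in_time = threshold_ms // job_ms + 1
    return min(num_jobs, waves_in_time * threads)


# 3-minute threshold, 2 threads, worst-case 60-second queries
print(jobs_started_within(180_000, 2, 60, 1000))     # 8
# Same threshold, but queries that finish in 5 seconds
print(jobs_started_within(180_000, 2, 5, 1000))      # 74
# Raised threshold of 36000000 ms
print(jobs_started_within(36_000_000, 2, 60, 1000))  # 1000
```

In this model, only a small fraction of the 1000 jobs gets to start before the 3-minute cutoff, and how many depends on how quickly the queries actually finish. That is at least consistent with the count varying from run to run.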

 

Is the MisfireHandler a JS component or is it a Quartz component?  Does something have to be configured to tell JS/Quartz to "always run misfired jobs no matter how late they are"?



Post Edited by theodan at 05/30/2009 21:12

Hello Theodan,

We are having similar problems.  No messages from JS that jobs are not running.  The last date ran and next date run are correct until it comes time to run the report.  The report is skipped, the last date ran remains the same, and the next date run changes to the correct date for the next run.  However...it does not run and JS does not generate a message that the job did not run.  Users are emailing us asking what happened to their reports.  This is in a production environment utilizing JS commercial.  I have read some other posts, but they do not appear to have much useful info.

Anyway, just wanted to thank you for your perseverance in this matter.  We'll take a look at these settings and hopefully our problem will be corrected as well.

Take Care,

Chris


  • 2 years later...

Hi

I also want to thank you for your research into this issue. I established a production jasperserver environment with only a few dozen reports that run at various intervals (daily, weekly ...). I didn't think I had to babysit them, but just now I was astonished to find that several of my daily reports have been skipping certain days, seemingly at random. No errors in the log. All the data is present. The scheduler says last run date was the day before yesterday. Just now I had to run a few of my scheduled reports manually. I will implement your solution and keep an eye on it for a few weeks.
