Issue Description
Under certain circumstances, customers have seen scheduled jobs fail to execute. The job's "next run" time shows the current time or some time in the future, yet the job never actually runs; the "next run" time just keeps advancing.
The number of scheduler worker threads (threadCount) is configurable in WEB-INF/js.quartz.base.properties; the default is two. These two worker threads poll the scheduler queue to pick up any "triggered/fired" jobs so they can fill and export their reports. If both threads are stuck, hung, or busy with a long-running report, other firings can be skipped, depending on the "misfire" policy.
When a firing occurs and the job isn't "picked up" within three minutes (the default, which is configurable), it is deemed a misfire. The default behaviors when a misfire occurs are:
For non-recurring / one-time: it fires immediately (though if both threads are stuck, it simply remains in the queue)
For simple recurring: it skips to the next interval at which it should run
For calendar recurring: it immediately fires the most recent missed trigger and skips anything prior.
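The misfire check and the three default behaviors above can be sketched in a few lines of Java. This is only an illustration of the logic as described in this article: the class, enum, and method names are hypothetical and are not part of the Quartz or JasperReports Server APIs.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative sketch only: all names here are hypothetical, not Quartz or JRS APIs.
class MisfireSketch {
    enum TriggerKind { ONE_TIME, SIMPLE_RECURRING, CALENDAR_RECURRING }

    enum Action { FIRE_NOW, SKIP_TO_NEXT_INTERVAL, FIRE_LATEST_SKIP_EARLIER }

    // A firing counts as a misfire when it has waited longer than the threshold
    // without being picked up by a worker thread.
    static boolean isMisfire(Instant scheduledFire, Instant now, Duration threshold) {
        return Duration.between(scheduledFire, now).compareTo(threshold) > 0;
    }

    // Default behavior per trigger type, as listed in this article.
    static Action onMisfire(TriggerKind kind) {
        switch (kind) {
            case ONE_TIME:
                return Action.FIRE_NOW;
            case SIMPLE_RECURRING:
                return Action.SKIP_TO_NEXT_INTERVAL;
            default:
                return Action.FIRE_LATEST_SKIP_EARLIER;
        }
    }

    public static void main(String[] args) {
        Instant scheduled = Instant.parse("2024-01-01T08:00:00Z");
        Instant now = scheduled.plus(Duration.ofMinutes(5));
        Duration threshold = Duration.ofMinutes(3); // default described above

        if (isMisfire(scheduled, now, threshold)) {
            System.out.println(onMisfire(TriggerKind.SIMPLE_RECURRING));
        }
    }
}
```

In this example the firing has waited five minutes against a three-minute threshold, so it is treated as a misfire and the simple recurring job is moved to its next interval rather than run.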
So if the worker threads are tied up by a poorly performing report, the resulting misfires effectively re-queue the "next run" time over and over. Reports are never executed or delivered until the backlog from the poorly performing report(s) clears. If many poorly performing reports are scheduled within a tight time frame, that backlog can be large.
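To get a feel for how the "next run" time keeps advancing, the following sketch counts how many firings of a simple recurring job would be skipped while every worker thread is occupied. It is a deliberately simplified model, not the scheduler's actual algorithm: it assumes the first firing lands at the moment the workers become busy, and that each firing still waiting when the misfire threshold expires is skipped to the next interval. The class and method names are hypothetical.

```java
import java.time.Duration;

// Simplified model only: names and assumptions are illustrative, not Quartz internals.
class BacklogSketch {
    // Returns how many firings are skipped while all workers stay busy for
    // workersBusyFor, given the job's interval and the misfire threshold.
    static long skippedFirings(Duration workersBusyFor, Duration interval, Duration misfireThreshold) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        long skipped = 0;
        Duration fireAt = Duration.ZERO; // offset from the moment the workers became busy
        // A firing is skipped if the workers are still busy when its threshold expires.
        while (fireAt.plus(misfireThreshold).compareTo(workersBusyFor) < 0) {
            skipped++;
            fireAt = fireAt.plus(interval);
        }
        return skipped;
    }

    public static void main(String[] args) {
        long skipped = skippedFirings(
                Duration.ofMinutes(60),  // workers blocked for an hour
                Duration.ofMinutes(15),  // job scheduled every 15 minutes
                Duration.ofMinutes(3));  // default misfire threshold from this article
        System.out.println("Skipped firings: " + skipped);
    }
}
```

Under these assumptions, a job that runs every 15 minutes misses four firings while the workers are blocked for an hour, which matches the symptom of a "next run" time that advances without the job ever executing.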
Resolution
You can increase the number of threads, but note that this leads to more concurrency, which could increase CPU and RAM usage and cause delays for people running reports via the UI.
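As a rough guide, the relevant entries in WEB-INF/js.quartz.base.properties would look something like the fragment below. The keys shown are the standard Quartz property names for the thread pool size and misfire threshold; this assumes your installation uses them unchanged, so verify against your own file before editing. The values shown are the defaults described in this article (two threads, three minutes expressed in milliseconds).

```properties
# Number of scheduler worker threads (default described in this article: 2).
# Raising this increases report concurrency and therefore CPU/RAM load.
org.quartz.threadPool.threadCount=2

# Time in milliseconds a firing may wait before it is deemed a misfire
# (three minutes = 180000 ms).
org.quartz.jobStore.misfireThreshold=180000
```

A server restart is typically needed for changes to this file to take effect.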
If your scheduler seems stuck, our recommendation is to stop users from scheduling new jobs, then remove or disable the poorly performing scheduled jobs so they don't requeue. (It may be necessary to restart JRS to kill hung reports, and it may be necessary to clean out the Fired Triggers table; if so, contact support.) Then limit the types of reports that can be scheduled based on your tested limits (resources such as RAM and CPU, factored with threadCount and misfireThreshold). Make sure schedules are spaced out properly, and if necessary you can always add a node to the cluster that does nothing except run the scheduler.
Ref. Case 01485362
https://community.jaspersoft.com/wiki/what-are-various-quartz-misfire-policy-definitions