Batch and Scheduled Jobs
Smile CDR has several mechanisms it uses internally for executing background tasks. These are Batch Jobs, Clustered Scheduled Jobs, and Local Scheduled Jobs.
Batch Jobs process large amounts of data in a distributed way, taking advantage of all available processing power in a cluster.
Batch jobs in Smile CDR include:
Batch job processing in Smile CDR leverages an internal framework called Batch2. The Batch2 framework uses a combination of processing channels and database tables in order to distribute work across the cluster.
Batch jobs are divided into a series of steps. Each step has a distinct identifier within the Batch Job definition and performs a specific function with a defined set of inputs and outputs. The first step accepts a set of job parameters as input and produces zero or more work chunks as output. Each subsequent step accepts these work chunks as input and may produce further work chunks as output, except for the final step, which produces none. Some jobs have a special kind of final step called a reduction step, which prepares a report or aggregates data across all output from the previous step.
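The step model described above can be illustrated with a minimal, single-process sketch. This is not the Batch2 API: the function run_job, the step names, and the dictionary shapes are all hypothetical, and the real framework distributes chunks across the cluster rather than looping over them in memory.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical shape: a step is (step_id, function). The function takes one
# input (job parameters or a work chunk) and yields zero or more work chunks.
Step = Tuple[str, Callable[[dict], Iterable[dict]]]


def run_job(parameters: dict, steps: List[Step],
            reduction: Optional[Callable[[List[dict]], dict]] = None):
    """Run the steps in order; return the reduction report, or None."""
    _first_id, first_fn = steps[0]
    chunks = list(first_fn(parameters))        # first step: parameters -> chunks
    for _step_id, step_fn in steps[1:]:
        next_chunks = []
        for chunk in chunks:                   # later steps: chunk -> chunks
            next_chunks.extend(step_fn(chunk))
        chunks = next_chunks
    # Optional reduction step: aggregates all output of the previous step.
    return reduction(chunks) if reduction else None


def split_into_batches(params: dict):
    """First step: turn job parameters into work chunks."""
    ids, size = params["ids"], params["batch_size"]
    for start in range(0, len(ids), size):
        yield {"ids": ids[start:start + size]}


def count_batch(chunk: dict):
    """Middle step: process one chunk and emit one result chunk."""
    yield {"count": len(chunk["ids"])}


def summarize(chunks: List[dict]) -> dict:
    """Reduction step: build a single report from all result chunks."""
    return {"total": sum(c["count"] for c in chunks), "chunks": len(chunks)}


report = run_job(
    {"ids": list(range(10)), "batch_size": 4},
    [("split", split_into_batches), ("count", count_batch)],
    reduction=summarize,
)
```

Here ten IDs are split into three chunks, each chunk is counted independently, and the reduction step combines the counts into one report. Because each chunk is processed on its own, the middle step is the part that a distributed framework can spread across workers.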
Using the FHIR Bulk Export Batch Job as an example:
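For each work chunk emitted by a job step, two things happen: the work chunk data is stored in the database, and a notification for the work chunk is sent on a Message Broker channel.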
The Message Broker channel used to send and receive work chunk notifications is named batch2-work-notification-[nodeId]-[moduleId]. The - characters may be replaced with . characters in the name depending on the Replace Hyphens with Periods setting. Work chunk notification messages contain the UUID associated with the work chunk, but do not contain the associated data. Workers receive these notifications, load the associated data from the database, and then begin processing it.
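As a rough illustration of the naming rule and of the notify-by-ID pattern, consider the following sketch. Everything here is an assumption made for illustration: the function and variable names are hypothetical, the in-memory dictionary and deque merely stand in for the database table and the broker channel, the message field name chunkId is invented, and the sketch assumes that every hyphen in the full name is replaced when the setting is enabled.

```python
import uuid
from collections import deque


def work_channel_name(node_id: str, module_id: str,
                      replace_hyphens: bool = False) -> str:
    """Build the channel name from the documented pattern.

    Assumption: with the setting enabled, all '-' characters in the full
    name become '.', including any inside the node or module ID.
    """
    name = f"batch2-work-notification-{node_id}-{module_id}"
    return name.replace("-", ".") if replace_hyphens else name


chunk_table = {}     # stand-in for the database table holding chunk data
channel = deque()    # stand-in for the message broker channel


def emit_chunk(data: dict) -> str:
    """Store the chunk data, then notify workers with the ID only."""
    chunk_id = str(uuid.uuid4())
    chunk_table[chunk_id] = data              # data goes to the database
    channel.append({"chunkId": chunk_id})     # the message carries no data
    return chunk_id


def handle_next_notification() -> dict:
    """Worker side: read a notification and load the data by its ID."""
    message = channel.popleft()
    return chunk_table[message["chunkId"]]


emitted_id = emit_chunk({"resourceIds": ["Patient/1", "Patient/2"]})
pending = list(channel)                # snapshot before a worker consumes it
loaded = handle_next_notification()
```

Keeping the payload out of the message means the broker only ever carries small, fixed-size notifications, regardless of how large an individual work chunk is.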
Assuming that an external message broker has been configured, the use of a message channel allows the server to distribute processing across all Smile CDR processes within the same node.
The following diagram shows the individual steps in the Bulk Export Batch Job. Note that the database and the Kafka channel are each shown multiple times in order to clearly show the flow of data, but these all refer to the same database table and the same message channel respectively.
If you are designing a Smile CDR installation which will handle large amounts of data, it is important to consider the following things:
Smile CDR uses the Quartz scheduler to provide cluster-aware scheduling for recurring jobs. Clustered jobs are scheduled at a set frequency, and each scheduled occurrence executes on only a single process within the cluster.
Clustered Scheduled Jobs are typically used for maintenance. For example:
The Quartz scheduler is also used to schedule non-cluster-aware jobs (Local Scheduled Jobs). These jobs execute at a given frequency on every process within the cluster.
These jobs are typically used to expire internal memory caches and advance processing in internal maintenance jobs.
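The difference between the two kinds of scheduled job can be shown with a toy simulation. This is not how Quartz or Smile CDR implement clustering: the SharedClaims class, the fire function, and the process and job names are hypothetical, and a lock-protected in-memory set stands in for whatever shared state the cluster actually uses to coordinate.

```python
import threading


class SharedClaims:
    """Stand-in for shared cluster state (hypothetical)."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, job_name: str, occurrence: int) -> bool:
        """Return True only for the first caller per (job, occurrence)."""
        with self._lock:
            key = (job_name, occurrence)
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True


def fire(process_id, job_name, occurrence, clustered, claims, log):
    """Called on every process each time the schedule fires."""
    if clustered and not claims.try_claim(job_name, occurrence):
        return  # another process already ran this occurrence
    log.append((process_id, job_name, occurrence))


claims = SharedClaims()
log = []
for occurrence in range(2):                          # two scheduled occurrences
    for process_id in ("proc-a", "proc-b", "proc-c"):
        fire(process_id, "clustered-job", occurrence, True, claims, log)
        fire(process_id, "local-job", occurrence, False, claims, log)

clustered_runs = [entry for entry in log if entry[1] == "clustered-job"]
local_runs = [entry for entry in log if entry[1] == "local-job"]
```

With three processes and two occurrences, the clustered job runs twice in total (once per occurrence), while the local job runs six times (once per occurrence on every process).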
The Scheduler Thread Count setting is used to control the number of threads that are used to process scheduled jobs.
If this value is set to the default of 4, then 4 threads will be available on each Smile CDR process for executing Clustered Scheduled Jobs, and an additional 4 threads will be available for executing Local Scheduled Jobs.
Because most scheduled jobs are fast and relatively lightweight, it is uncommon to need to modify this setting.
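The thread arithmetic above can be written out as a small helper, which may be useful when estimating resource usage for a multi-process cluster. The function name, the dictionary keys, and the processes parameter are hypothetical and are not part of any Smile CDR configuration.

```python
def scheduler_threads(scheduler_thread_count: int = 4, processes: int = 1) -> dict:
    """Estimate scheduler threads from the Scheduler Thread Count setting.

    Each process gets one pool for Clustered Scheduled Jobs and a second,
    equally sized pool for Local Scheduled Jobs.
    """
    per_process = 2 * scheduler_thread_count
    return {
        "clustered_per_process": scheduler_thread_count,
        "local_per_process": scheduler_thread_count,
        "total_per_process": per_process,
        "total_in_cluster": per_process * processes,
    }


default_estimate = scheduler_threads()
three_process_estimate = scheduler_threads(scheduler_thread_count=4, processes=3)
```

With the default of 4, each process has 8 scheduler threads in total, so a hypothetical three-process cluster would have 24.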