About Batch Job
The heart of Mule’s batch processing lies within the batch job scope. A batch job is a scope that splits large messages into records so that the runtime can process them asynchronously.
Just as flows process messages, batch jobs process records.
A batch job contains one or more batch steps that act upon records as they move through the batch job.
The batch XML structure changed in Mule 4.0. The example below abbreviates details to highlight the batch elements.
```xml
<flow name="flowOne">
  <batch:job jobName="batchJob">
    <batch:process-records>
      <batch:step name="batchStep1">
        <!-- event processors go here, for example: -->
        <logger level="INFO" message="#[payload]"/>
      </batch:step>
      <!-- each batch step needs a unique name -->
      <batch:step name="batchStep2">
        <!-- event processors go here, for example: -->
        <logger level="INFO" message="#[payload]"/>
      </batch:step>
    </batch:process-records>
  </batch:job>
</flow>
```
A batch job executes when the flow reaches the process-records section of the batch job.
When the batch job starts executing, the runtime splits the incoming message into records, stores them in a persistent queue, and then queries and schedules those records for processing in blocks.
After all records have passed through all batch steps, the runtime ends the batch job instance and reports the batch job result indicating which records succeeded and which failed during processing.
By default, the runtime stores 100 records in each batch block. You can change this block size to tune the batch job’s performance.
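A minimal sketch of overriding the block size, assuming Mule 4 XML syntax, where the `blockSize` attribute of `batch:job` sets the number of records per block. The flow name, job name, and the value 50 are hypothetical; a suitable size depends on record size and workload, so measure before and after changing it.

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/batch http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">
  <!-- Hypothetical flow; call it from another flow with flow-ref -->
  <flow name="blockSizeExampleFlow">
    <!-- blockSize="50" replaces the default of 100 records per block -->
    <batch:job jobName="blockSizeExampleJob" blockSize="50">
      <batch:process-records>
        <batch:step name="logRecord">
          <!-- Runs once per record; payload is the current record -->
          <logger level="INFO" message="#[payload]"/>
        </batch:step>
      </batch:process-records>
    </batch:job>
  </flow>
</mule>
```

Larger blocks mean fewer scheduling round trips but more records held in memory at once; smaller blocks spread work more evenly across threads.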
Each batch job contains three different phases: Load and Dispatch, Process, and On Complete.
The first phase, Load and Dispatch, is implicit. During this phase, the runtime performs the behind-the-scenes work to create a batch job instance: it turns a serialized message payload into a collection of records to process as a batch. You don’t need to configure anything for this phase, though it is useful to understand the tasks Mule completes during it.
1. Mule sends the message payload through a collection splitter. This step triggers the creation of a new batch job instance. The batch job instance is an occurrence in a Mule application that results from executing a batch job in a Mule flow; it exists until Mule has processed every record in the batch. The runtime identifies each batch job instance by a unique string known as the batch job instance ID.
   This identifier is useful if you want, for example, to pass the local job instance ID to an external system for referencing and managing data, to improve the job’s custom logging, or to send email or SMS notifications about meaningful events for that specific batch job instance.
   Mule exposes the batch job instance ID through the batchJobInstanceId variable. This variable is available in every step and in the On Complete phase.
2. Mule creates a persistent queue and associates it with the new batch job instance.
3. For each item generated by the splitter, Mule creates a record and stores it in the queue. This is an "all or nothing" activity: Mule either successfully generates and queues a record for every item, or the whole message fails during this phase.
4. Mule presents the batch job instance, with all its queued-up records, to the first batch step for processing.
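The following sketch shows one way to read the batch job instance ID for custom logging. It assumes the variable is reached in DataWeave as `vars.batchJobInstanceId`; the flow, job, and step names are hypothetical, and you should confirm the variable access against the documentation for your Mule version.

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/batch http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">
  <flow name="instanceIdExampleFlow">
    <batch:job jobName="instanceIdExampleJob">
      <batch:process-records>
        <batch:step name="logWithInstanceId">
          <!-- Assumption: the runtime sets vars.batchJobInstanceId for this job instance -->
          <logger level="INFO"
                  message="#['Processing a record in batch job instance ' ++ vars.batchJobInstanceId]"/>
        </batch:step>
      </batch:process-records>
      <batch:on-complete>
        <!-- The same ID is available in the On Complete phase -->
        <logger level="INFO"
                message="#['Batch job instance ' ++ vars.batchJobInstanceId ++ ' finished']"/>
      </batch:on-complete>
    </batch:job>
  </flow>
</mule>
```

Including the ID in every log line lets you correlate entries from different steps that belong to the same run.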
The second phase, Process, is required. During this phase, the runtime processes the records in the batch asynchronously. Each record moves through the event processors in the first batch step, then returns to the queue, where it waits to be processed by the second batch step, and so on until every record has passed through every batch step. Only one queue exists: for each batch step, records are picked from it, processed, and then returned to it, and each record keeps track of the steps it has been processed through while it sits in the queue. The queue is persistent. Note that a batch job instance does not wait for all its queued records to finish processing in one batch step before pushing any of them to the next batch step.
Mule persists a list of all records as they succeed or fail to process through each batch step. If an event processor in a batch step fails to process a record, the runtime continues processing the batch, skipping the failed record in each subsequent batch step.
At the end of this phase, the batch job instance completes and, therefore, ceases to exist.
Beyond simple processing of records, there are several things you can do with records within a batch step.
You can apply filters by adding an acceptExpression to a batch step to prevent the step from processing certain records.
For example, you can set a filter to prevent a step from processing any records which failed processing in the preceding step.
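A sketch of both kinds of filter, assuming Mule 4 syntax. `acceptExpression` takes a DataWeave condition; the `status` field it tests is hypothetical. `acceptPolicy` selects records by their failure state: `NO_FAILURES` (the default) already skips records that failed in a preceding step, `ONLY_FAILURES` accepts only those records, and `ALL` accepts every record. The `maxFailedRecords="-1"` setting is included on the assumption that it removes the limit on failed records so the job keeps running after a failure; check this against your version's batch job reference.

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/batch http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">
  <flow name="filterExampleFlow">
    <!-- Assumption: -1 means "no limit on failed records" -->
    <batch:job jobName="filterExampleJob" maxFailedRecords="-1">
      <batch:process-records>
        <batch:step name="validateRecords">
          <logger level="INFO" message="#[payload]"/>
        </batch:step>
        <!-- Only records whose (hypothetical) status field is 'active' enter this step -->
        <batch:step name="processActiveRecords" acceptExpression="#[payload.status == 'active']">
          <logger level="INFO" message="#[payload]"/>
        </batch:step>
        <!-- Only records that failed in an earlier step enter this step -->
        <batch:step name="handleFailedRecords" acceptPolicy="ONLY_FAILURES">
          <logger level="WARN" message="#[payload]"/>
        </batch:step>
      </batch:process-records>
    </batch:job>
  </flow>
</mule>
```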
You can use a batch aggregator processor to aggregate records in groups, sending them as bulk upserts to external sources or services.
For example, rather than upserting each individual contact (that is, a record) in a batch to Google Contacts, you can configure a batch aggregator to accumulate, say, 100 records, then upsert all of them to Google Contacts in one chunk.
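A sketch of a batch aggregator that collects 100 records before acting on them, assuming Mule 4 syntax, where `batch:aggregator` sits at the end of a batch step and exposes the collected records to its processors as an array payload. The logger stands in for the bulk upsert call; no specific connector operation is shown, and the names are hypothetical.

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/batch http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">
  <flow name="aggregatorExampleFlow">
    <batch:job jobName="aggregatorExampleJob">
      <batch:process-records>
        <batch:step name="upsertContacts">
          <!-- Accumulate 100 records, then run the processors below once for the whole group -->
          <batch:aggregator size="100">
            <!-- Placeholder: a real application would call the target system's bulk upsert here -->
            <logger level="INFO" message="#['Upserting ' ++ (sizeOf(payload) as String) ++ ' records']"/>
          </batch:aggregator>
        </batch:step>
      </batch:process-records>
    </batch:job>
  </flow>
</mule>
```

Sending one request per group of records instead of one per record reduces the number of calls to the external system, which often matters for rate-limited APIs.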
During the third phase, On Complete, you can optionally configure the runtime to create a report or summary of the records it processed for the particular batch job instance. This phase gives system administrators and developers insight into which records failed, so that they can address any issues with the input data.
After Mule executes the entire batch job, the output becomes a batch job result object (BatchJobResult). Because Mule processes a batch job as an asynchronous, one-way flow, the results of batch processing do not feed back into the flow that triggered it, nor do they return as a response to a caller. Any event source that feeds data into a batch job must be one-way, not request-response.
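For example, a Scheduler source is one-way: it triggers the flow on a timer, and no caller waits for a reply. The sketch below assumes Mule 4 Scheduler syntax; the one-hour frequency, the names, and the generated payload of 250 records are hypothetical stand-ins for real input.

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/batch http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">
  <flow name="scheduledBatchFlow">
    <!-- One-way source: nothing waits for the batch job's results -->
    <scheduler>
      <scheduling-strategy>
        <fixed-frequency frequency="1" timeUnit="HOURS"/>
      </scheduling-strategy>
    </scheduler>
    <!-- Hypothetical input: an array of 250 small objects built with DataWeave -->
    <set-payload value="#[(1 to 250) map { id: $ }]"/>
    <batch:job jobName="scheduledBatchJob">
      <batch:process-records>
        <batch:step name="logRecord">
          <logger level="INFO" message="#[payload]"/>
        </batch:step>
      </batch:process-records>
    </batch:job>
  </flow>
</mule>
```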
You have two options for working with the output:
Create a report in the On Complete phase using DataWeave, including information such as the number of failed and successfully processed records, and the step in which any errors occurred.
Reference the batch job result object elsewhere in the Mule application to capture and use batch metadata, such as the number of records which failed to process in a particular batch job instance.
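A sketch of the first option, assuming that in the On Complete phase the payload is the BatchJobResult and that DataWeave reads its getters as fields (`batchJobInstanceId`, `totalRecords`, `successfulRecords`, `failedRecords`). The report is written to the log as JSON here; the names are hypothetical, and a real application might send the report by email or store it instead.

```xml
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:batch="http://www.mulesoft.org/schema/mule/batch"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/batch http://www.mulesoft.org/schema/mule/batch/current/mule-batch.xsd">
  <flow name="reportExampleFlow">
    <batch:job jobName="reportExampleJob">
      <batch:process-records>
        <batch:step name="processRecord">
          <logger level="INFO" message="#[payload]"/>
        </batch:step>
      </batch:process-records>
      <batch:on-complete>
        <!-- Assumption: payload here is the BatchJobResult for this job instance -->
        <logger level="INFO" message="#[output application/json --- {
            instanceId: payload.batchJobInstanceId,
            total: payload.totalRecords,
            successful: payload.successfulRecords,
            failed: payload.failedRecords }]"/>
      </batch:on-complete>
    </batch:job>
  </flow>
</mule>
```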
If you leave the On Complete phase empty and do not reference the batch job result object elsewhere in your application, the batch job simply completes, whether failed or successful.
As a good practice, configure some mechanism for reporting on failed and successful records so that you can take further action where required.