Handling Errors During Batch Jobs

Mule batch processing is designed to handle very large data sets and to perform near real-time data integration that recovers from crashes and resumes a job from the point of failure. However, verbose logs for issues that occur in large data sets can become enormous and severely impact performance.

To limit this impact, Mule uses INFO-level logging by default, as described in Logs of Failing Records Inside a Batch Step. For cases in which you require more verbose log messages, you can change the mode to DEBUG. This mode is helpful for debugging and is feasible for some cases that involve smaller data sets.

The following property is available for setting the logging mode:

log4j.logger.com.mulesoft.module.batch=DEBUG

Avoid using the DEBUG mode in a production environment when processing very large data sets.
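
Mule 4 applications usually configure logging through a Log4j 2 XML file, so the property above may need to be expressed as a logger element. The following is a minimal sketch, assuming the logger category from the property applies unchanged; the appender name and pattern are illustrative only. The log message shown later on this page comes from a class under com.mulesoft.mule.runtime.module.batch, so the category name may need adjusting for your runtime version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Appenders>
        <!-- Illustrative console appender; the name and pattern are assumptions -->
        <Console name="console" target="SYSTEM_OUT">
            <PatternLayout pattern="%-5p %d [%t] %c: %m%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <!-- Assumed category, copied from the property shown above -->
        <AsyncLogger name="com.mulesoft.module.batch" level="DEBUG"/>
        <!-- Everything else stays at INFO -->
        <AsyncRoot level="INFO">
            <AppenderRef ref="console"/>
        </AsyncRoot>
    </Loggers>
</Configuration>
```

Scoping DEBUG to the batch category keeps the rest of the application at INFO, which limits the log volume while you debug.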

Logs of Failing Records Inside a Batch Step

When processing a batch job instance, a processor inside a batch step can fail or raise an error, for example, because of corrupted or incomplete record data. By default, Mule uses the INFO log level to log stack traces according to the following logic when issues occur:

1. Mule gets the exception’s full stack trace.
2. Mule strips the error messages from the stack trace. Even if all records raise the same error, the messages probably contain information specific to each record. For example, if you push leads to a Salesforce account and one record fails because the lead was already uploaded, another repeated lead carries different record information, but the error is the same.
3. Mule verifies whether the stack trace was already logged in the current step. The first time the runtime encounters this error, Mule logs the error and produces a message like this one:

com.mulesoft.mule.runtime.module.batch.internal.DefaultBatchStep: Found exception processing record on step 'batchStep1' for job instance 'Batch Job Example' of job 'CreateLeadsBatch'.

This is the first record to show this exception on this step for this job instance. Subsequent records with the same failures will not be logged for performance and log readability reasons:

Mule logs on a "by step" basis. If another step also raises the same error, the runtime logs it again for that step.

When the batch job reaches the On Complete phase, Mule displays an error summary with every error type and the number of occurrences in each batch step.
The following error summary is for a batch job with two batch steps that raised the same batch exception type:

************************************************************************************************************************
*             - - + Exception Type + - -             *         - - + Step + - -        *       - - + Count + - -       *
************************************************************************************************************************
* com.mulesoft.mule.runtime.module.batch.exception.B *            batchStep1           *                10             *
* com.mulesoft.mule.runtime.module.batch.exception.B *            batchStep2           *                9              *
************************************************************************************************************************

Here, the first step failed ten times, and the second failed nine.

DataWeave Functions for Error Handling

Mule 4.x includes a set of DataWeave functions that you can use in the context of a batch step.

DataWeave Function	Description
#[Batch::isSuccessfulRecord()]	A boolean function that returns true if the current record has not thrown exceptions in any prior step.
#[Batch::isFailedRecord()]	A boolean function that returns true if the current record has thrown exceptions in any prior step.
#[Batch::failureExceptionForStep(String)]	Receives the name of a step as a String argument. If the current record threw an exception in that step, it returns the actual Exception object. Otherwise, it returns null.
#[Batch::getStepExceptions()]	Returns a Java `Map<String, Exception>` in which each key is the name of a batch step in which the current record has failed, and each value is the exception itself. If the record hasn’t failed in any step, this Map is empty but never null. The Map contains no entries for steps in which the record hasn’t failed.
#[Batch::getFirstException()]	Returns the Exception for the very first step in which the current record has failed. If the record hasn’t failed in any step, then it returns null.
#[Batch::getLastException()]	Returns the Exception for the last step in which the current record has failed. If the record hasn’t failed in any step, then it returns null.
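
As an illustration of how these functions can drive routing, the fragment below is a minimal sketch of a batch step that inspects each record's failure history. The step names (auditStep, batchStep1) and the log messages are hypothetical; the step uses acceptPolicy="ALL" so that it receives both successful and failed records.

```xml
<batch:step name="auditStep" acceptPolicy="ALL">
    <choice>
        <!-- The record failed specifically in the step named batchStep1 -->
        <when expression="#[Batch::failureExceptionForStep('batchStep1') != null]">
            <logger level="WARN" message="Record failed in batchStep1"/>
        </when>
        <!-- The record failed, but in some other prior step -->
        <when expression="#[Batch::isFailedRecord()]">
            <logger level="WARN" message="Record failed in a step other than batchStep1"/>
        </when>
        <!-- No prior step raised an error for this record -->
        <otherwise>
            <logger level="INFO" message="Record succeeded in all prior steps"/>
        </otherwise>
    </choice>
</batch:step>
```

The order of the when branches matters: the step-specific check comes first because isFailedRecord() is also true for records that failed in batchStep1.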

Example

Imagine a batch job that polls files containing contact information.
In the first step, the batch job aggregates the contact information and transforms it with the Transform Message component before pushing it to Salesforce.
In the second step, the job transforms the same contacts again to match the data structure of a third-party contacts application (for example, Google Contacts) and pushes them to that service using an HTTP request.
Now, assume that as a third step, you need to write a message to a JMS dead-letter queue for each record that has failed. To keep it simple, let’s say that the message is the exception itself. This requirement has a catch: a record can fail in both steps, in which case the same record produces two JMS messages.

Such an application has three batch steps, the last of which (the Failures step) handles the failed records.

Since the goal is to gather failures, it makes sense to configure the Failures step with an ONLY_FAILURES filter (see Refining Batch Steps Processing for more details about batch filters).
The set-payload processor in this step can be configured to use the Batch::getStepExceptions() function.

As stated above, this function returns a map of all errors found in all steps. Because the goal is to send the exceptions through JMS regardless of the step in which they occurred, you can use a foreach scope to iterate over the map’s values (the errors) and send each one through a JMS outbound endpoint.
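
The application described in this example can be sketched roughly as follows. This is an illustrative fragment under stated assumptions, not the original configuration: the job name, the first two step names, the JMS configuration name, and the queue name are all hypothetical, and loggers stand in for the Salesforce and HTTP processors.

```xml
<!-- maxFailedRecords="-1" keeps the job running so failed records reach the last step -->
<batch:job jobName="contactsBatch" maxFailedRecords="-1">
    <batch:process-records>
        <batch:step name="salesforceStep">
            <!-- Placeholder for the Transform Message and Salesforce processors -->
            <logger level="DEBUG" message="Transform contacts and push to Salesforce"/>
        </batch:step>
        <!-- acceptPolicy="ALL" lets records that failed in the first step be tried here too -->
        <batch:step name="contactsServiceStep" acceptPolicy="ALL">
            <!-- Placeholder for the second transformation and the HTTP request -->
            <logger level="DEBUG" message="Transform contacts and push over HTTP"/>
        </batch:step>
        <!-- Only records that failed in at least one prior step enter this step -->
        <batch:step name="Failures" acceptPolicy="ONLY_FAILURES">
            <!-- Map of step name to exception for the current record -->
            <set-payload value="#[Batch::getStepExceptions()]"/>
            <!-- pluck $ keeps only the map's values, one exception per failed step -->
            <foreach collection="#[payload pluck $]">
                <!-- Hypothetical JMS configuration and destination -->
                <jms:publish config-ref="JMS_Config" destination="contacts.dead.letter"/>
            </foreach>
        </batch:step>
    </batch:process-records>
</batch:job>
```

A record that failed in both of the first two steps yields a map with two entries, so the foreach scope publishes two JMS messages for it. Depending on the JMS configuration, the exception may need to be converted to text before it is published.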

Batch Processing Strategies for Error Handling

Mule has three options for handling a record-level error:

1. Finish processing: stop the execution of the current job instance. Mule finishes the records currently in flight, pulls no more records from the queues, and sets the job instance to a FAILURE state. The On Complete phase is invoked.
2. Continue processing the batch regardless of any failed records, using the acceptExpression and acceptPolicy attributes to instruct subsequent batch steps how to handle failed records.
3. Continue processing the batch regardless of any failed records (using the acceptExpression and acceptPolicy attributes to instruct subsequent batch steps how to handle failed records) until the batch job accumulates a maximum number of failed records, at which point execution halts just as in option 1.

By default, Mule batch jobs follow the first error-handling strategy, which halts execution of the batch job instance. You control this behavior through the maxFailedRecords attribute.

Failed Record Handling Option	Batch Job Attribute	Value
Stops processing when a failed record is found	`maxFailedRecords`	`0`
Continues processing indefinitely, regardless of the number of failed records	`maxFailedRecords`	`-1`
Continues processing until reaching the maximum number of failed records	`maxFailedRecords`	`integer`

For example, the following job definition stops processing as soon as a failed record is found:

<batch:job jobName="Batch1" maxFailedRecords="0">
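
For the second strategy, the sketch below shows one way the acceptPolicy and acceptExpression attributes might be combined with maxFailedRecords="-1". The step names and the payload.email field are hypothetical, and the loggers are placeholders for real processors.

```xml
<batch:job jobName="Batch1" maxFailedRecords="-1">
    <batch:process-records>
        <!-- No acceptPolicy set: the default (NO_FAILURES) applies -->
        <batch:step name="step1">
            <logger level="DEBUG" message="Process every record"/>
        </batch:step>
        <!-- Accepts only records without prior failures that also match the expression -->
        <batch:step name="step2" acceptExpression="#[payload.email != null]">
            <logger level="DEBUG" message="Process records that have an email"/>
        </batch:step>
        <!-- Accepts only records that failed in a previous step -->
        <batch:step name="step3" acceptPolicy="ONLY_FAILURES">
            <logger level="WARN" message="Handle a failed record"/>
        </batch:step>
    </batch:process-records>
</batch:job>
```

With this layout the job never halts on failures; each step decides for itself which records to process.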

Crossing the Max Failed Threshold

When a batch job accumulates enough failed records to cross the maxFailedRecords threshold, Mule aborts processing for any remaining batch steps, skipping directly to the On Complete phase.

For example, if you set the value of maxFailedRecords to "10" and a batch job accumulates ten failed records in the first of three batch steps, Mule does not attempt to process the batch through the remaining two batch steps. Instead, it aborts further processing and skips directly to On Complete to report on the batch job failure.

If a batch job does not accumulate enough failed records to cross the maxFailedRecords threshold, all records – successes and failures – continue to flow from batch step to batch step; use filters to control which records each batch step processes.
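
The threshold behavior can be sketched with a job like the one below. It is a hedged illustration: the step names and logger messages are hypothetical, and the On Complete expression assumes that the payload in that phase exposes failedRecords and totalRecords counters.

```xml
<batch:job jobName="Batch1" maxFailedRecords="10">
    <batch:process-records>
        <batch:step name="batchStep1">
            <logger level="DEBUG" message="First step"/>
        </batch:step>
        <!-- Not reached once the failed-record threshold is crossed in batchStep1 -->
        <batch:step name="batchStep2">
            <logger level="DEBUG" message="Second step"/>
        </batch:step>
        <batch:step name="batchStep3" acceptPolicy="ONLY_FAILURES">
            <logger level="WARN" message="Handle a failed record"/>
        </batch:step>
    </batch:process-records>
    <!-- Runs whether the job finished normally or was aborted at the threshold -->
    <batch:on-complete>
        <logger level="INFO" message="#['Failed records: ' ++ (payload.failedRecords as String) ++ ' of ' ++ (payload.totalRecords as String)]"/>
    </batch:on-complete>
</batch:job>
```

Logging the counters in On Complete gives a single place to see whether a run ended because of the threshold or finished all of its steps.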
