About The Streaming Batch Aggregator
You can configure a batch aggregator scope to stream its content.
This enables you to aggregate all the records in the job instance, no matter how many or how large they are.
Instead of a list of elements that you receive with a fixed-size batch aggregator, the streaming functionality ensures that you receive all the records in the job instance without running out of memory.
For example, if you need to write millions records to CSV file, you can process the records as a streaming batch aggregator.
1 2 3 4 5 6 7 8 9 10 11 12 13 <batch:job jobName="batchJob"> <batch:process-records > <batch:step name="batchStep"> <batch:aggregator streaming="true"> <file:write path="reallyLarge.csv"> <file:content><![CDATA[%dw 2.0 ... }]]></file:content> </batch:aggregator> </batch:step> </batch:process-records> </batch:job>
When using a streaming aggregator, you can replace, change, or store the payload and variable data of each record.
By adding a foreach scope you can iterate trough the entire set of streaming records, and use the Groovy and the scripting module to modify the payload and create a variable for each collected record.
You can sequentially go over each record’s data and persistently store each record’s payload and variables. This method of accessing records within the batch aggregator is called sequential access.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 <batch:job jobName="batchJob"> <batch:process-records> <batch:step name="batchStep"> <batch:aggregator doc:name="batchAggregator" streaming="true"> <foreach doc:name="For Each"> <script:execute engine="groovy"> <script:code> vars['marco'] = 'polo' vars['record'].payload = 'foo' </script:code> </script:execute> </foreach> </batch:aggregator> </batch:step> </batch:process-records> </batch:job>
The sequential access method assumes that:
The aggregator size matches the amount of aggregated records.
There is a direct correlation between the aggregated records and the items in the list.
Due to memory restrictions, random access is not supported for streaming aggregators.
The record payloads for random access are exposed as an
immutable List and since streaming aggregators implies having access to the entire set of records, without a fixed commit size the runtime can’t guaranteed that all records will fit in memory.
Streaming from SaaS providers: In general, you likely wouldn’t use batch streaming when sending data through an Anypoint Connector TO a SaaS provider, like Salesforce, because SaaS providers often have restrictions on accepting streaming input. Rather, use streaming batch processing when writing to a file such as CSV, JSON, or XML.
Batch streaming and performance: Batch processing streaming data does affect the performance of your application, slowing the pace at which it processes transactions. Though performance slows, the trade-off to be able to batch process streaming data may warrant using it in your implementation.
Batch streaming and access to items: The biggest drawback to using batch streaming is that you have limited access to the items in the output. In other words, with a fixed-size commit, you get an unmodifiable list, thus allowing you to access and iteratively process its items; with streaming commit, you get a one-read, forward-only iterator.