Streaming Data Processing with DataMapper
Especially useful when working with large datasets, Anypoint DataMapper supports streaming data input and output. For example, when reading information from a very large input file, you can use streaming you avoid having DataMapper load the whole file into memory. Instead, DataMapper works as a pipeline: it sequentially reads the file and store the data in a cache, performs data mapping, sends the output to the next transformer, empties the cache, then begins again. Using this procedure, DataMapper can parse a 500 MB CSV file using only about 75 MB of RAM, resulting in significant improvements in performance and resources utilization.
Anypoint DataMapper streaming supports CSV and fixed width input and output formats.
You can configure the size of the stream cache to optimize for performance.
This document assumes the reader is familiar with the Anypoint DataMapper Transformer. If you are not, start from the beginning: DataMapper User Guide and Reference. For a listing of all available tools in DataMapper, consult DataMapper Visual Reference.
View DataMapper Documentation Table of Contents
To set the Streaming parameter in your data mapping flow, follow these steps:
In the DataMapper view, click the Properties icon (highlighted below).
DataMapper displays the general configuration dialog, shown below. Click Streaming.
In the Pipe Size input field, enter the desired size of the cache. The default is 2048. Bear in mind that:
When working with files, the value of Pipe Size is expressed in bytes
When working with collections, the value is expressed in number of collection elements
If an exception occurs in the mapping, DataMapper stops the streaming engine as soon as possible. To avoid undesired consequences in case of failure (such as inserting only part of a row into a database) use Transactions.
This example illustrates the use of the Streaming feature in Anypoint DataMapper.
An HTTP Connector receives a CSV file, then passes it to DataMapper. DataMapper maps the input data from CSV to POJO. A Database Connector inserts the data into an external database. In this scenario, DataMapper and the Database connector work in parallel as a pipeline, further improving application performance.
The image below displays the DataMapper view as configured for this example.
Finally, the output connector receives the list of maps, then incorporates each item as a value in the SQL query for the external database.
INSERT INTO Persons (name, city, email, phone) VALUES (#[payload.name], #[payload.city], #[payload.email], #[payload.phone])