
Streaming Data Processing with DataMapper

Mule Runtime Engine versions 3.5, 3.6, and 3.7 reached End of Life on or before January 25, 2020. For more information, contact your Customer Success Manager to determine how you can migrate to the latest Mule version.

Anypoint DataMapper supports streaming data input and output, which is especially useful when working with large datasets. For example, when reading a very large input file, you can use streaming to avoid having DataMapper load the whole file into memory. Instead, DataMapper works as a pipeline: it sequentially reads the file and stores the data in a cache, performs the data mapping, sends the output to the next transformer, empties the cache, then begins again. Using this procedure, DataMapper can parse a 500 MB CSV file using only about 75 MB of RAM, resulting in significant improvements in performance and resource utilization.

  • Anypoint DataMapper streaming supports CSV and fixed-width input and output formats.

  • You can configure the size of the stream cache to optimize for performance.
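The read, cache, map, send, empty cycle described above can be illustrated outside of Mule. The following Python sketch is only an analogy, not DataMapper's implementation: the function name `stream_map`, the `cache_rows` parameter, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from itertools import islice


def stream_map(source, map_row, cache_rows=3):
    """Yield mapped rows while holding at most `cache_rows` input rows at once."""
    reader = csv.DictReader(source)
    while True:
        cache = list(islice(reader, cache_rows))  # read the next slice into the cache
        if not cache:
            break  # input exhausted
        for row in cache:
            yield map_row(row)  # map the row and hand it to the next stage
        # the cache list is replaced on the next pass, releasing the old rows


# Hypothetical sample input; a real source would be a large file opened lazily.
sample = io.StringIO("name,city\nAna,Lima\nBo,Oslo\nCy,Rome\nDi,Kiev\n")
mapped = list(stream_map(sample, lambda r: {"NAME": r["name"].upper(), "city": r["city"]}))
```

Because the generator never holds more than one cache of rows, memory use depends on the cache size rather than on the size of the input.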

Setting Streaming in DataMapper

To set the Streaming parameter in your data mapping flow, follow these steps:

  1. In the DataMapper view, click the Properties icon (highlighted below).

  2. DataMapper displays the general configuration dialog, shown below. Click Streaming.

  3. In the Pipe Size input field, enter the desired size of the cache. The default is 2048. Bear in mind that:

    • When working with files, the value of Pipe Size is expressed in bytes

    • When working with collections, the value is expressed in number of collection elements
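To make the two units concrete, the sketch below chunks a byte stream and a collection using the same numeric size. It is a plain Python illustration, not DataMapper code; `chunk_bytes`, `chunk_elements`, and the 5000-item inputs are hypothetical, and only the default of 2048 comes from the text above.

```python
import io
from itertools import islice

PIPE_SIZE = 2048  # default Pipe Size stated in the steps above


def chunk_bytes(stream, size=PIPE_SIZE):
    """File-like input: each chunk holds at most `size` bytes."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def chunk_elements(items, size=PIPE_SIZE):
    """Collection input: each chunk holds at most `size` elements."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


file_chunks = list(chunk_bytes(io.BytesIO(b"x" * 5000)))
list_chunks = list(chunk_elements(range(5000)))
```

The same setting therefore bounds very different amounts of memory: 2048 bytes of file data is small, while 2048 collection elements may be large if each element is a sizeable object.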

Handling Exceptions

If an exception occurs in the mapping, DataMapper stops the streaming engine as soon as possible. To avoid undesired consequences in case of failure (such as inserting only part of a row into a database), use Transactions.
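The effect of wrapping streamed writes in a transaction can be sketched with Python's built-in `sqlite3` module. This is a generic illustration of rollback on failure, not Mule's transaction configuration; the table layout and the deliberately invalid second row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (name TEXT NOT NULL, city TEXT NOT NULL)")
conn.commit()

# Hypothetical stream of rows; the second one violates NOT NULL and fails mid-stream.
rows = [("Ana", "Lima"), ("Bo", None), ("Cy", "Rome")]

try:
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        for row in rows:
            conn.execute("INSERT INTO Persons (name, city) VALUES (?, ?)", row)
except sqlite3.IntegrityError:
    pass  # by this point the partial insert has already been rolled back

count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
```

Without the transaction, the first row would remain in the table after the failure; with it, the table is left unchanged.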


This example illustrates the use of the Streaming feature in Anypoint DataMapper.

An HTTP Connector receives a CSV file, then passes it to DataMapper. DataMapper maps the input data from CSV to POJO. A Database Connector inserts the data into an external database. In this scenario, DataMapper and the Database Connector work in parallel as a pipeline, further improving application performance.
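Two stages working in parallel as a pipeline can be approximated with a bounded queue between a producer and a consumer. The Python sketch below is an analogy only: the thread layout, the `SENTINEL` marker, the queue bound of 2, and the in-memory `inserted` list standing in for the database are all assumptions, not a description of how Mule schedules these components.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def producer(rows, pipe):
    """Mapping stage: convert each input row and push it into the pipe."""
    for row in rows:
        pipe.put({"name": row[0], "city": row[1]})  # blocks while the pipe is full
    pipe.put(SENTINEL)


def consumer(pipe, sink):
    """Insert stage: drain the pipe as items arrive."""
    while True:
        item = pipe.get()
        if item is SENTINEL:
            break
        sink.append(item)  # stand-in for a database insert


pipe = queue.Queue(maxsize=2)  # small bound so neither stage runs far ahead
inserted = []
rows = [("Ana", "Lima"), ("Bo", "Oslo"), ("Cy", "Rome")]

worker = threading.Thread(target=consumer, args=(pipe, inserted))
worker.start()
producer(rows, pipe)
worker.join()
```

The bounded queue plays the role of the stream cache: the consumer starts inserting before the producer has finished mapping, and the producer pauses whenever the consumer falls behind.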


The image below displays the DataMapper view as configured for this example.


Finally, the output connector receives the list of maps, then incorporates each item as a value in the SQL query for the external database.

INSERT INTO Persons (name, city, email, phone) VALUES (#[], #[], #[], #[])
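The expressions inside `#[]` are not shown in the statement above, so the following sketch does not try to reproduce them. Instead, it illustrates the general idea of binding each map in a list to the columns of an insert statement, using Python's `sqlite3` module with named placeholders. The sample records and the in-memory database are hypothetical.

```python
import sqlite3

# Hypothetical list of maps, shaped like the columns of the statement above.
people = [
    {"name": "Ana", "city": "Lima", "email": "ana@example.com", "phone": "555-0100"},
    {"name": "Bo", "city": "Oslo", "email": "bo@example.com", "phone": "555-0101"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (name TEXT, city TEXT, email TEXT, phone TEXT)")

with conn:  # one transaction for the whole batch
    conn.executemany(
        "INSERT INTO Persons (name, city, email, phone) "
        "VALUES (:name, :city, :email, :phone)",
        people,  # each map supplies the values for one row
    )

stored = conn.execute("SELECT name, city FROM Persons ORDER BY name").fetchall()
```

Binding values through placeholders, rather than concatenating them into the SQL text, keeps the statement fixed while the values change from row to row.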
