CSV Format
MIME type: application/csv
ID: csv
The CSV data format is represented as a DataWeave array of objects in which each object represents a row. All simple values are represented as strings.
The DataWeave reader for CSV input supports the following parsing strategies:
- Indexed
- In-Memory
- Streaming
By default, the CSV reader stores input data from an entire file in memory if the file is 1.5 MB or less. If the file is larger than 1.5 MB, the process writes the data to disk. For very large files, you can improve the performance of the reader by setting the streaming property to true.
For additional details, see DataWeave Readers.
Examples
The following examples show uses of the CSV format.
Example: Represent CSV Data
The following example shows how DataWeave represents CSV data.
Input
The following sample data serves as input for the DataWeave source.
name,lastname,age,gender
Mariano,de Achaval,37,male
Paula,de Estrada,37,female
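One way to see this representation is to parse the sample rows and write them out in the DataWeave (dw) format. The sketch below is an illustration, not necessarily the original source script for this example. It embeds the sample rows in a hypothetical variable, csvText, and parses them with the read function so that it can run without a Mule flow.

```dataweave
%dw 2.0
// Hypothetical inline copy of the sample input (name, lastname, age, gender).
var csvText = "name,lastname,age,gender\nMariano,de Achaval,37,male\nPaula,de Estrada,37,female"
output application/dw
---
// read parses the text with the CSV reader; each row becomes one object
// keyed by the header names.
read(csvText, "application/csv")
```

The result should be an array of two objects, one per row, in which age holds the string "37" rather than a number, because the CSV reader represents all simple values as strings.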
Example: Stream CSV Data
As noted earlier, the CSV reader keeps a file of 1.5 MB or less in memory and writes a larger file to disk. For very large files, you can improve the performance of the reader by setting the streaming property to true. To demonstrate the use of this property, the next example streams a CSV file and transforms it to JSON.
Input
The structure of the CSV input looks something like the following. Note that a streamed file is typically much longer.
street,city,zip,state,beds,baths,sale_date
3526 HIGH ST,SACRAMENTO,95838,CA,2,1,Wed May 21 00:00:00 EDT 2018
51 OMAHA CT,SACRAMENTO,95823,CA,3,1,Wed May 21 00:00:00 EDT 2018
2796 BRANCH ST,SACRAMENTO,95815,CA,2,1,Wed May 21 00:00:00 EDT 2018
2805 JANETTE WAY,SACRAMENTO,95815,CA,2,1,Wed May 21 00:00:00 EDT 2018
6001 MCMAHON DR,SACRAMENTO,95824,CA,2,1,Wed May 21 00:00:00 EDT 2018
5828 PEPPERMILL CT,SACRAMENTO,95841,CA,3,1,Wed May 21 00:00:00 EDT 2018
XML Configuration
To demonstrate a use of the streaming property, the following Mule flow streams a CSV file and transforms it to JSON.
<flow name="dw-streamingFlow">
  <scheduler doc:name="Scheduler">
    <scheduling-strategy>
      <fixed-frequency frequency="1" timeUnit="MINUTES"/>
    </scheduling-strategy>
  </scheduler>
  <!-- Read the CSV file as a stream; the first line is the header. -->
  <file:read
      path="${app.home}/input.csv"
      config-ref="File_Config"
      outputMimeType="application/csv; streaming=true; header=true"/>
  <!-- Map each CSV row to a JSON object that keeps only the zip value. -->
  <ee:transform doc:name="Transform Message">
    <ee:message>
      <ee:set-payload><![CDATA[%dw 2.0
output application/json
---
payload map ((row) -> {
    zipcode: row.zip
})]]></ee:set-payload>
    </ee:message>
  </ee:transform>
  <file:write doc:name="Write"
      config-ref="File_Config1"
      path="/path/to/output/file/output.json"/>
  <logger level="INFO" doc:name="Logger" message="#[payload]"/>
</flow>
- The example configures the Read operation (<file:read/>) to stream the CSV input by setting outputMimeType="application/csv; streaming=true; header=true". The input CSV file is located in the project directory, src/main/resources, which is the location of ${app.home}.
- The DataWeave script in the Transform Message component uses the map function to iterate over each row in the CSV payload and select the value of each field in the zip column.
- The Write operation writes a file, output.json, which contains the result of the transformation.
- The Logger prints the same output payload that you see in output.json.
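To try the mapping step outside of a flow, the transformation can be sketched as a standalone script. This version is an assumption-laden illustration: it does not stream, because it parses a small hypothetical excerpt held in the variable csvText, and it trims the input to three columns for brevity.

```dataweave
%dw 2.0
// Hypothetical two-row excerpt of the input, trimmed to three columns.
var csvText = "street,city,zip\n3526 HIGH ST,SACRAMENTO,95838\n51 OMAHA CT,SACRAMENTO,95823"
output application/json
---
// Same mapping as in the flow: keep only the zip value of each row.
read(csvText, "application/csv") map ((row) -> {
    zipcode: row.zip
})
```

The output should be a JSON array with one object per row, each holding a single zipcode key whose value is a string.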
Configuration Properties
DataWeave supports the following configuration properties for this format.
Reader Properties
This format accepts properties that provide instructions for reading input data.
| Parameter | Type | Default | Description |
|---|---|---|---|
|  |  |  | Line number on which the body starts. |
|  |  |  | Character to use for escaping special characters, such as separators or quotes. |
|  |  |  | Indicates whether a CSV header is present. Valid values are true or false. |
|  |  |  | Line number on which the CSV header is located. |
|  |  |  | Indicates whether to ignore an empty line. Valid values are true or false. |
|  |  |  | Character to use for quotes. |
|  |  |  | Character that separates one field from another field. |
|  |  |  | Streams input when set to true. Valid values are true or false. |
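As a hedged illustration of passing reader properties, the sketch below supplies them as the third argument to the read function. The property names separator and header are assumptions based on the descriptions in this section and on the header=true setting used in the flow; the variable csvText and its contents are hypothetical.

```dataweave
%dw 2.0
// Hypothetical headerless input that uses semicolons as separators.
var csvText = "Mariano;de Achaval;37\nPaula;de Estrada;37"
output application/json
---
// The third argument to read carries the reader properties
// (assumed names: separator, header).
read(csvText, "application/csv", {separator: ";", header: false})
```

With header set to false, the reader has no column names to use as keys, so it is expected to generate them (for example, column_0 and column_1).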
Writer Properties
This format accepts properties that provide instructions for writing output data.
| Parameter | Type | Default | Description |
|---|---|---|---|
|  |  |  | Line number on which the body starts. |
|  |  |  | Size of the buffer writer, in bytes. The value must exceed a minimum size. |
|  |  |  | Generates the output as a data stream when set to true. Valid values are true or false. |
|  |  |  | The encoding to use for the output, such as UTF-8. |
|  |  |  | Character to use for escaping special characters, such as separators or quotes. |
|  |  |  | Indicates whether a CSV header is present. Valid values are true or false. |
|  |  |  | Line number on which the CSV header is located. |
|  |  |  | Indicates whether to ignore an empty line. Valid values are true or false. |
|  |  |  | Line separator to use when writing CSV, for example, \r\n. |
|  |  |  | Character to use for quotes. |
|  |  |  | Quotes header values when set to true. Valid values are true or false. |
|  |  |  | Quotes every value when set to true. Valid values are true or false. |
|  |  |  | Character that separates one field from another field. |
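Writer properties are typically set in the output directive of a script. The following sketch assumes the property names separator and quoteValues, inferred from the descriptions above, and uses hypothetical sample rows.

```dataweave
%dw 2.0
// Writer properties follow the output MIME type, separated by commas
// (assumed names: separator, quoteValues).
output application/csv separator="|", quoteValues=true
---
[
    { name: "Mariano", lastname: "de Achaval", age: "37" },
    { name: "Paula", lastname: "de Estrada", age: "37" }
]
```

The script should write a header line followed by one line per object, with fields separated by the pipe character and each value wrapped in quotes.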