name,lastname,age,gender
Mariano,de Achaval,37,male
Paula,de Estrada,37,female
CSV Format
MIME Type: application/csv
ID: csv
The CSV data format is represented as a DataWeave array of objects in which each object represents a row. All simple values are represented as strings.
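As a minimal sketch of this model, the following script parses an inline CSV string with the read function and coerces the age field, which arrives as a string, to a number. The inline data and the output field names are illustrative assumptions, not part of the original example.

```dataweave
%dw 2.0
output application/json
// Inline CSV used only for illustration; in a Mule flow this would be the payload.
var people = read("name,age\nMariano,37\nPaula,37", "application/csv")
---
people map ((row) -> {
    name: row.name,
    // CSV values are read as strings, so coerce before doing arithmetic.
    ageNextYear: (row.age as Number) + 1
})
```

Because every simple value is a string, an explicit coercion such as as Number is typically needed before a numeric comparison or calculation.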
The DataWeave reader for CSV input supports the following parsing strategies:
- Indexed
- In-Memory
- Streaming
By default, the CSV reader stores the input data from an entire file in memory if the file is 1.5 MB or less. If the file is larger than 1.5 MB, the process writes the data to disk. For very large files, you can improve the performance of the reader by setting the streaming property to true.
For additional details, see DataWeave Readers.
Examples
The following examples show uses of the CSV format.
Example: Represent CSV Data
The following example shows how DataWeave represents CSV data.
Input
The following sample data serves as input for the DataWeave source.
name,lastname,age,gender
Mariano,de Achaval,37,male
Paula,de Estrada,37,female
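A minimal sketch of how such input is represented, assuming the three sample rows are inlined with the read function rather than delivered as the payload. The variable name csvInput is hypothetical.

```dataweave
%dw 2.0
output application/dw
// Assumption: the sample rows are inlined here so the script is self-contained.
var csvInput = read("name,lastname,age,gender\nMariano,de Achaval,37,male\nPaula,de Estrada,37,female", "application/csv")
---
// Emitting the value in the DataWeave (dw) format shows the array of objects.
csvInput
```

Each row becomes one object keyed by the header names, and values such as age remain strings.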
Example: Stream CSV Data
By default, the CSV reader stores the input data from an entire file in memory if the file is 1.5 MB or less. If the file is larger than 1.5 MB, the process writes the data to disk. For very large files, you can improve the performance of the reader by setting the streaming property to true. To demonstrate the use of this property, the next example streams a CSV file and transforms it to JSON.
Input
The structure of the CSV input looks something like the following. Note that a streamed file is typically much longer.
street,city,zip,state,beds,baths,sale_date
3526 HIGH ST,SACRAMENTO,95838,CA,2,1,Wed May 21 00:00:00 EDT 2018
51 OMAHA CT,SACRAMENTO,95823,CA,3,1,Wed May 21 00:00:00 EDT 2018
2796 BRANCH ST,SACRAMENTO,95815,CA,2,1,Wed May 21 00:00:00 EDT 2018
2805 JANETTE WAY,SACRAMENTO,95815,CA,2,1,Wed May 21 00:00:00 EDT 2018
6001 MCMAHON DR,SACRAMENTO,95824,CA,2,1,Wed May 21 00:00:00 EDT 2018
5828 PEPPERMILL CT,SACRAMENTO,95841,CA,3,1,Wed May 21 00:00:00 EDT 2018
XML Configuration
To demonstrate a use of the streaming property, the following Mule flow streams a CSV file and transforms it to JSON.
<flow name="dw-streamingFlow" >
  <scheduler doc:name="Scheduler" >
    <scheduling-strategy >
      <fixed-frequency frequency="1" timeUnit="MINUTES"/>
    </scheduling-strategy>
  </scheduler>
  <file:read
    path="${app.home}/input.csv"
    config-ref="File_Config"
    outputMimeType="application/csv; streaming=true; header=true"/>
  <ee:transform doc:name="Transform Message" >
    <ee:message >
      <ee:set-payload ><![CDATA[%dw 2.0
output application/json
---
payload map ((row) -> {
    zipcode: row.zip
})]]></ee:set-payload>
    </ee:message>
  </ee:transform>
  <file:write doc:name="Write"
    config-ref="File_Config1"
    path="/path/to/output/file/output.json"/>
  <logger level="INFO" doc:name="Logger" message="#[payload]"/>
</flow>
- The example configures the Read operation (<file:read/>) to stream the CSV input by setting outputMimeType="application/csv; streaming=true; header=true". The input CSV file is located in the project directory, src/main/resources, which is the location of ${app.home}.
- The DataWeave script in the Transform Message component uses the map function to iterate over each row in the CSV payload and select the value of the zip field from each row.
- The Write operation returns a file, output.json, which contains the result of the transformation.
- The Logger prints the same output payload that you see in output.json.
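As a hedged variation on the script in the flow, the following sketch keeps access to the rows sequential while selecting more fields. It assumes the payload is the streamed CSV shown earlier; the three-bedroom threshold and the output field names are illustrative only.

```dataweave
%dw 2.0
output application/json
---
// One sequential pass over the rows: filter first, then map.
// Assumption: beds is read as a string, so it is coerced to a number.
payload filter ((row) -> (row.beds as Number) >= 3) map ((row) -> {
    street: row.street,
    zipcode: row.zip
})
```

The script reads the payload once and avoids index-based lookups, which is in keeping with the guidance to stream only when entries are accessed sequentially.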
Configuration Properties
DataWeave supports the following configuration properties for CSV.
Reader Properties
The CSV format accepts properties that provide instructions for reading input data.
| Parameter | Type | Default | Description |
|---|---|---|---|
|  |  |  | The line number on which the body starts. |
|  |  |  | Character used to escape invalid characters, such as separators or quotes within field values. |
|  |  |  | Indicates whether a CSV header is present. Valid values are true or false. |
|  |  |  | The line number on which the CSV header is located. |
|  |  |  | Indicates whether to ignore empty lines. Valid values are true or false. |
|  |  |  | Character to use for quotes. |
|  |  |  | Character that separates one field from another field. |
|  |  |  | Property for streaming CSV input. Use only if entries are accessed sequentially. Valid values are true or false. |
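Reader properties can also be passed as the third argument of the read function. The sketch below assumes that separator is the name of the property that sets the field delimiter. That name is not shown in this copy of the table, so treat it as an assumption. The semicolon-delimited sample is illustrative only.

```dataweave
%dw 2.0
output application/json
// Semicolon-delimited sample, inlined for illustration only.
var raw = "id;name\n1;Ana\n2;Luis"
---
// The third argument carries the reader properties.
// Assumption: "separator" is the property name for the field delimiter.
read(raw, "application/csv", { separator: ";" })
```

In a Mule flow, the same properties can be appended to the MIME type, as the streaming example does with outputMimeType.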
Writer Properties
The CSV format accepts properties that provide instructions for writing output data.
| Parameter | Type | Default | Description |
|---|---|---|---|
|  |  |  | Line number on which the body starts. |
|  |  |  | Size of the writer buffer. |
|  |  |  | When set to |
|  |  |  | Encoding for the writer to use. |
|  |  |  | Character to use for escaping an invalid character, such as occurrences of the separator or quotes within field values. |
|  |  |  | Indicates whether to write a CSV header. Valid values are true or false. |
|  |  |  | Identifies the line number on which the header is located. |
|  |  |  | Indicates whether to ignore empty lines. Valid values are true or false. |
|  |  | New Line | Line separator to use when writing the CSV. |
|  |  |  | The character to be used for quotes. |
|  |  |  | Indicates whether to quote header values. Valid values are true or false. |
|  |  |  | Indicates whether to quote every value, regardless of whether the value contains special characters. Valid values are true or false. |
|  |  |  | Character that separates one field from another field. |
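Writer properties are typically set on the output directive. The sketch below assumes that separator and quoteValues are the names of the properties for the field delimiter and for quoting every value. Those names are not shown in this copy of the table, so treat them as assumptions. The sample records are illustrative.

```dataweave
%dw 2.0
// Assumed property names: separator (field delimiter), quoteValues (quote every value).
output application/csv separator="|", quoteValues=true
---
[
    { name: "Mariano", age: 37 },
    { name: "Paula", age: 37 }
]
```

With these settings, the writer should delimit fields with a pipe character and wrap each value in the quote character.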