CSV Format

MIME type: application/csv

ID: csv

The CSV data format is represented as a DataWeave array of objects in which each object represents a row. All simple values are represented as strings.

The DataWeave reader for CSV input supports the following parsing strategies:

  • Indexed

  • In-Memory

  • Streaming

By default, the CSV reader stores input data from an entire file in-memory if the file is 1.5MB or less. If the file is larger than 1.5 MB, the process writes the data to disk. For very large files, you can improve the performance of the reader by setting a streaming property to true.

For additional details, see DataWeave Readers.

Examples

The following examples show uses of the CSV format.

Example: Represent CSV Data

The following example shows how DataWeave represents CSV data.

Input

The following sample data serves as input for the DataWeave source.

name,lastname,age,gender
Mariano,de Achaval,37,male
Paula,de Estrada,37,female

Source

The DataWeave script transforms the CSV input payload to the DataWeave (dw) format and MIME type.

%dw 2.0
output application/dw
---
payload

Output

The DataWeave script produces the following output.

[
  {
    name: "Mariano",
    lastname: "de Achaval",
    age: "37",
    gender: "male"
  },
  {
    name: "Paula",
    lastname: "de Estrada",
    age: "37",
    gender: "female"
  }
]

Example: Stream CSV Data

By default, the CSV reader stores input data from an entire file in-memory if the file is 1.5MB or less. If the file is larger than 1.5 MB, the process writes the data to disk. For very large files, you can improve the performance of the reader by setting a streaming property to true. To demonstrate the use of this property, the next example streams a CSV file and transforms it to JSON.

Input

The structure of the CSV input looks something like the following. Note that a streamed file is typically much longer.

CSV File Input for Streaming Example (truncated):
street,city,zip,state,beds,baths,sale_date
3526 HIGH ST,SACRAMENTO,95838,CA,2,1,Wed May 21 00:00:00 EDT 2018
51 OMAHA CT,SACRAMENTO,95823,CA,3,1,Wed May 21 00:00:00 EDT 2018
2796 BRANCH ST,SACRAMENTO,95815,CA,2,1,Wed May 21 00:00:00 EDT 2018
2805 JANETTE WAY,SACRAMENTO,95815,CA,2,1,Wed May 21 00:00:00 EDT 2018
6001 MCMAHON DR,SACRAMENTO,95824,CA,2,1,,Wed May 21 00:00:00 EDT 2018
5828 PEPPERMILL CT,SACRAMENTO,95841,CA,3,1,Wed May 21 00:00:00 EDT 2018

XML Configuration

To demonstrate a use of the streaming property, the following Mule flow streams a CSV file and transforms it to JSON.

<flow name="dw-streamingFlow" >
  <scheduler doc:name="Scheduler" >
    <scheduling-strategy >
      <fixed-frequency frequency="1" timeUnit="MINUTES"/>
    </scheduling-strategy>
  </scheduler>
  <file:read
     path="${app.home}/input.csv"
     config-ref="File_Config"
     outputMimeType="application/csv; streaming=true; header=true"/>
  <ee:transform doc:name="Transform Message" >
    <ee:message >
      <ee:set-payload ><![CDATA[%dw 2.0
output application/json
---
payload map ((row) -> {
zipcode: row.zip
})]]></ee:set-payload>
    </ee:message>
  </ee:transform>
  <file:write doc:name="Write"
    config-ref="File_Config1"
    path="/path/to/output/file/output.json"/>
  <logger level="INFO" doc:name="Logger" message="#[payload]"/>
</flow>
  • The example configures the Read operation (<file:read/>) to stream the CSV input by setting outputMimeType="application/csv; streaming=true". The input CSV file is located in the project directory, src/main/resources, which is the location of ${app.home}.

  • The DataWeave script in the Transform Message component uses the map function to iterate over each row in the CSV payload and select the value of each field in the zip column.

  • The Write operation returns a file, output.json, which contains the result of the transformation.

  • The Logger prints the same output payload that you see in output.json.

Output

The CSV streaming example produces the following output.

[
  {
    "zipcode": "95838"
  },
  {
    "zipcode": "95823"
  },
  {
    "zipcode": "95815"
  },
  {
    "zipcode": "95815"
  },
  {
    "zipcode": "95824"
  },
  {
    "zipcode": "95841"
  }
]

Configuration Properties

DataWeave supports the following configuration properties for this format.

Reader Properties

This format accepts properties that provide instructions for reading input data.

Parameter Type Default Description

bodyStartLineNumber

Number

0

Line number on which the body starts.

escape

String

\

Character to use for escaping special characters, such as separators or quotes.

header

Boolean

true

Indicates whether a CSV header is present.

  • If header=true, you can access the fields within the input by name, for example, payload.userName.

  • If header=false, you must access the fields by index, referencing the entry first and the field next, for example, payload[107][2].

Valid values are true or false.

headerLineNumber

Number

0

Line number on which the CSV header is located.

ignoreEmptyLine

Boolean

true

Indicates whether to ignore an empty line.

Valid values are true or false.

quote

String

"

Character to use for quotes.

separator

String

,

Character that separates one field from another field.

streaming

Boolean

false

Streams input when set to true. Use only if entries are accessed sequentially and do not require random access. The input must be a top-level array. See the streaming example, and see DataWeave Readers.

Valid values are true or false.

Writer Properties

This format accepts properties that provide instructions for writing output data.

Parameter Type Default Description

bodyStartLineNumber

Number

0

Line number on which the body starts.

bufferSize

Number

8192

Size of the buffer writer.

deferred

Boolean

false

Generates the output as a data stream when set to true, and defers the script’s execution until consumed.

Valid values are true or false.

encoding

String

null

The character set to use for the output, such as UTF-8.

escape

String

\

Character to use for escaping special characters, such as separators or quotes.

header

Boolean

true

Indicates whether a CSV header is present.

  • If header=true, you can access the fields within the input by name, for example, payload.userName.

  • If header=false, you must access the fields by index, referencing the entry first and the field next, for example, payload[107][2].

Valid values are true or false.

headerLineNumber

Number

0

Line number on which the CSV header is located.

ignoreEmptyLine

Boolean

true

Indicates whether to ignore an empty line.

Valid values are true or false.

lineSeparator

String

New Line

Line separator to use when writing CSV, for example, "\r\n". By default, DataWeave uses the system line separator.

quote

String

"

Character to use for quotes.

quoteHeader

Boolean

false

Quotes header values when set to true.

Valid values are true or false.

quoteValues

Boolean

false

Quotes every value when set to true, including values that contain special characters.

Valid values are true or false.

separator

String

,

Character that separates one field from another field.

Supported MIME Types

This format supports the following MIME types.

MIME Type

*/csv

Was this article helpful?

💙 Thanks for your feedback!

Edit on GitHub