
HDFS (Hadoop) Connector Reference - Mule 4

Anypoint Connector for Hadoop (HDFS) v6.0

HDFS Configurations

Parameters

Name Type Description Default Value Required

Name

String

The name for this configuration. Connectors reference the configuration with this name.

x

Connection

The connection types to provide to this configuration.

x

Expiration Policy

Configures the minimum amount of time that a dynamic configuration instance can remain idle before Mule considers it eligible for expiration. This does not mean that the platform expires the instance at the exact moment that it becomes eligible. Mule purges the instances as appropriate.

Connection Types

To access the data in a Hadoop (HDFS) instance, you must authenticate your application’s requests using a supported authentication method.

For Hadoop (HDFS) Connector, the following connection authentication types are supported:

Kerberos

Use the following fields in the Global Element Properties dialog window to configure the Kerberos connection type:

Name Type Description Default Value Required

Username

String

The Kerberos principal. The Username is passed to the HDFS client as the hadoop.job.ugi configuration entry. The Username can be overridden by values you specify in Configuration Resources and Configuration Entries. This parameter is called Username for backward compatibility reasons.

Keytab Path

String

Enter the path to the keytab file associated with the Username you specified. The path is used to obtain a ticket-granting ticket (TGT) from the Kerberos authentication server. If this value is not provided, Kerberos looks for a TGT associated with the specified Username in your local Kerberos cache.

Name Node Uri

String

The name of the file system to connect to. The Name Node URI is passed to the HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY (fs.defaultFS) configuration entry. The Name Node URI can be overridden by values you specify in Configuration Resources and Configuration Entries.

x

Configuration Resources

Array of String

A java.util.List of configuration resource files for the HDFS client to load. You can provide additional configuration files, for example, core-site.xml.

Configuration Entries

Object

A java.util.Map of configuration entries for the HDFS client to use. You can provide additional configuration entries as key-value pairs.

Reconnection

When the application is deployed, a connectivity test is performed on all connectors. If Fails Deployment is set to true, deployment fails if the test doesn’t pass after exhausting the associated reconnection strategy.
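The Username and Name Node URI fields end up as ordinary Hadoop configuration entries (hadoop.job.ugi and fs.defaultFS, the key behind FileSystem#FS_DEFAULT_NAME_KEY), which Configuration Resources and Configuration Entries can override. The following Python sketch models that layering with plain dictionaries. It is illustrative only: the function name and sample values are hypothetical, resource files are assumed to be already parsed into dictionaries, and the precedence order (connection fields, then resources in list order, then entries) is an assumption rather than documented connector behavior.

```python
from typing import Dict, Iterable, Optional


def effective_hdfs_config(
    username: Optional[str],
    name_node_uri: str,
    resources: Iterable[Dict[str, str]] = (),
    entries: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Layer connection fields, resource files, and entries (hypothetical order)."""
    # Name Node URI maps to fs.defaultFS (FileSystem#FS_DEFAULT_NAME_KEY).
    config: Dict[str, str] = {"fs.defaultFS": name_node_uri}
    if username:
        # Username maps to hadoop.job.ugi.
        config["hadoop.job.ugi"] = username
    # Assumption: each parsed resource file overrides what came before it.
    for resource in resources:
        config.update(resource)
    # Assumption: Configuration Entries are applied last, so they win.
    if entries:
        config.update(entries)
    return config


cfg = effective_hdfs_config(
    "etl@EXAMPLE.COM",
    "hdfs://namenode:8020",
    resources=[{"fs.defaultFS": "hdfs://ha-cluster"}],
    entries={"dfs.replication": "2"},
)
print(cfg["fs.defaultFS"])  # hdfs://ha-cluster
```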

Simple

Use the following fields in the Global Element Properties dialog window to configure the Simple connection type:

Name Type Description Default Value Required

Username

String

User identity that Hadoop uses for permissions in HDFS. When Simple authentication is used, Hadoop requires that the user is set as a system property called HADOOP_USER_NAME. If a value is not provided, Hadoop uses the username of the OS user who is currently logged in.

Name Node Uri

String

The name of the file system to connect to. The Name Node URI is passed to the HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY (fs.defaultFS) configuration entry. The Name Node URI can be overridden by values you specify in Configuration Resources and Configuration Entries.

x

Configuration Resources

Array of String

A java.util.List of configuration resource files for the HDFS client to load. You can provide additional configuration files, for example, core-site.xml.

Configuration Entries

Object

A java.util.Map of configuration entries for the HDFS client to use. You can provide additional configuration entries as key-value pairs.

Reconnection

When the application is deployed, a connectivity test is performed on all connectors. If Fails Deployment is set to true, deployment fails if the test doesn’t pass after exhausting the associated reconnection strategy.
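A minimal Python sketch of how the effective user might be resolved under Simple authentication. It assumes that a configured Username takes priority because the connector exports it as HADOOP_USER_NAME; otherwise it falls back to an existing HADOOP_USER_NAME value and then to the OS login name from getpass.getuser(). The function name is hypothetical, and the environment is passed in as a mapping so the lookup order can be tested.

```python
import getpass
import os
from typing import Mapping, Optional


def resolve_simple_user(
    username: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the identity HDFS would see under Simple authentication (sketch)."""
    if username:
        # Assumption: a configured Username is exported as HADOOP_USER_NAME.
        return username
    from_env = environ.get("HADOOP_USER_NAME")
    if from_env:
        return from_env
    # Fallback: the login name of the OS user running the process.
    return getpass.getuser()


print(resolve_simple_user(None, {"HADOOP_USER_NAME": "etl"}))  # etl
```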

Operations


Append

<hdfs:append>

Append the current payload to a file located at the designated path. Note: by default the Hadoop server has the append option disabled. To append data to an existing file, refer to the dfs.support.append configuration parameter.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the file to write to.

x

Buffer Size

Number

The buffer size to use when appending to the file.

4096

Payload

Binary

The payload to append to the file.

#[payload]

Reconnection Strategy

A retry strategy in case of connectivity errors

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN

Copy From Local File

<hdfs:copy-from-local-file>

Copy the source file on the local disk to the target path in the file system. Set Delete Source if the source file should be removed.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Delete Source

Boolean

Whether to delete the source.

false

Overwrite

Boolean

Whether to overwrite destination content.

true

Source

String

The source path on the local disk.

x

Destination

String

The target path in the file system.

x

Reconnection Strategy

A retry strategy in case of connectivity errors

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN
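Because HDFS is not available in a standalone snippet, the following Python sketch uses two local paths to illustrate how Delete Source and Overwrite interact. It is a hypothetical analog, not the connector's implementation: it assumes that an existing destination with Overwrite set to false is an error and that the source is removed only after the copy succeeds.

```python
import os
import shutil


def copy_from_local_file(source: str, destination: str,
                         delete_source: bool = False,
                         overwrite: bool = True) -> None:
    """Local-only analog of Copy From Local File's flags (hypothetical)."""
    if not overwrite and os.path.exists(destination):
        # Assumption: an existing target with Overwrite=false is an error.
        raise FileExistsError(destination)
    shutil.copyfile(source, destination)
    if delete_source:
        # The source is removed only after the copy has succeeded.
        os.remove(source)
```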

Copy To Local File

<hdfs:copy-to-local-file>

Copy the source file in the file system to the given target path on the local disk. Set Delete Source if the source file should be removed. Set Use Raw Local File System to use RawLocalFileSystem, a local file system that does not write CRC checksum files.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Delete Source

Boolean

Whether to delete the source.

false

Use Raw Local File System

Boolean

Whether to use RawLocalFileSystem as a local file system.

false

Source

String

The source path in the file system.

x

Destination

String

The target path on the local disk.

x

Reconnection Strategy

A retry strategy in case of connectivity errors

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN
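Hadoop's default local file system is checksummed: next to each file it writes, it also writes a hidden sidecar named .<name>.crc in the same directory, while RawLocalFileSystem writes only the file itself. The following hypothetical Python helper lists the local files you would expect to find after Copy To Local File, depending on Use Raw Local File System.

```python
import posixpath
from typing import List


def expected_local_files(target: str, use_raw_local_fs: bool) -> List[str]:
    """Local files expected after Copy To Local File (hypothetical helper)."""
    files = [target]
    if not use_raw_local_fs:
        # The checksummed local file system adds a hidden ".<name>.crc" sidecar.
        parent, name = posixpath.split(target)
        files.append(posixpath.join(parent, "." + name + ".crc"))
    return files


print(expected_local_files("/tmp/out/data.csv", False))
# ['/tmp/out/data.csv', '/tmp/out/.data.csv.crc']
```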

Delete Directory

<hdfs:delete-directory>

Delete the file or directory located at the designated path.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the directory to delete.

x

Reconnection Strategy

A retry strategy in case of connectivity errors

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN

Delete File

<hdfs:delete-file>

Delete the file or directory located at the designated path.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the file to delete.

x

Reconnection Strategy

A retry strategy in case of connectivity errors

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN

Get Metadata

<hdfs:get-metadata>

Get the metadata of a path.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the file or directory whose metadata to get.

x

Target Variable

String

The name of a variable to store the operation’s output.

Target Value

String

An expression to evaluate against the operation’s output and store the expression outcome in the target variable.

#[payload]

Reconnection Strategy

A retry strategy in case of connectivity errors

Output

Type

Metadata

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN

Glob Status

<hdfs:glob-status>

Return all the files that match the path pattern and are not checksum files. Results are sorted by name.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path Pattern

String

A path pattern in Hadoop glob syntax, which supports wildcards such as *, ?, [abc], and {ab,cd}.

x

Filter

String

The user-supplied path filter.

Target Variable

String

The name of a variable to store the operation’s output.

Target Value

String

An expression to evaluate against the operation’s output and store the expression outcome in the target variable.

#[payload]

Reconnection Strategy

A retry strategy in case of connectivity errors

Output

Type

Array of File Status

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:RETRY_EXHAUSTED
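The following Python sketch approximates Glob Status over an in-memory list of paths: it matches a pattern, drops Hadoop-style checksum files (.<name>.crc), applies an optional filter, and sorts by name. It uses fnmatch as a stand-in for Hadoop's glob matcher, which is an approximation: fnmatch's * also matches across / separators, and it does not support {a,b} alternation. The function names and the callable filter are hypothetical.

```python
import fnmatch
import posixpath
from typing import Callable, Iterable, List, Optional


def is_checksum_file(path: str) -> bool:
    """Hadoop-style checksum sidecars are named ".<name>.crc"."""
    name = posixpath.basename(path)
    return name.startswith(".") and name.endswith(".crc")


def glob_status(paths: Iterable[str], pattern: str,
                path_filter: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Match, drop checksum files, apply the optional filter, sort by name."""
    matches = [
        p for p in paths
        if fnmatch.fnmatchcase(p, pattern)
        and not is_checksum_file(p)
        and (path_filter is None or path_filter(p))
    ]
    return sorted(matches)


paths = ["/data/b.csv", "/data/a.csv", "/data/.a.csv.crc", "/data/notes.txt"]
print(glob_status(paths, "/data/*"))
# ['/data/a.csv', '/data/b.csv', '/data/notes.txt']
```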

List Status

<hdfs:list-status>

List the statuses of the files and directories in the given path if the path is a directory.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path to list.

x

Filter

String

The user-supplied path filter.

Target Variable

String

The name of a variable to store the operation’s output.

Target Value

String

An expression to evaluate against the operation’s output and store the expression outcome in the target variable.

#[payload]

Reconnection Strategy

A retry strategy in case of connectivity errors

Output

Type

Array of File Status

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN

Make Directories

<hdfs:make-directories>

Make the given path and all non-existent parents into directories. This operation has the semantics of the Unix 'mkdir -p' command: an existing directory hierarchy is not an error.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the directory to create. Any missing parent directories are created as well.

x

Permission

String

The file system permission to use when creating the directories, either in octal or symbolic format (umask).

Reconnection Strategy

A retry strategy in case of connectivity errors.

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN
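The 'mkdir -p' semantics described above can be illustrated on the local file system with Python's pathlib. This is a local analog only, with a hypothetical function name; it ignores the Permission parameter because local directory modes are also affected by the process umask.

```python
from pathlib import Path


def make_directories(path: str) -> bool:
    """Local analog of 'mkdir -p' (hypothetical helper)."""
    # parents=True creates any missing ancestors; exist_ok=True makes an
    # already existing hierarchy a no-op instead of an error.
    Path(path).mkdir(parents=True, exist_ok=True)
    return True
```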

Read Operation

<hdfs:read-operation>

Read the content of a file designated by its path and stream it to the rest of the flow.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the file to read.

x

Buffer Size

Number

The buffer size to use when reading the file.

4096

Streaming Strategy

Configure whether repeatable streams are used and how they behave.

Target Variable

String

The name of a variable to store the operation’s output.

Target Value

String

An expression to evaluate against the operation’s output and store the expression outcome in the target variable

#[payload]

Reconnection Strategy

A retry strategy in case of connectivity errors.

Output

Type

Binary

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN
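Buffer Size bounds how much data each read pulls from the file, so the content can be streamed instead of loaded into memory at once. The following Python sketch shows that chunking with an in-memory byte stream standing in for the HDFS input stream; the function name is hypothetical, and only the 4096 default comes from the table above.

```python
import io
from typing import BinaryIO, Iterator

DEFAULT_BUFFER_SIZE = 4096  # documented Buffer Size default


def stream_file(source: BinaryIO,
                buffer_size: int = DEFAULT_BUFFER_SIZE) -> Iterator[bytes]:
    """Yield the content in chunks of at most buffer_size bytes."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    while True:
        chunk = source.read(buffer_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


chunks = list(stream_file(io.BytesIO(b"x" * 10000)))
print([len(c) for c in chunks])  # [4096, 4096, 1808]
```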

Rename

<hdfs:rename>

Rename the Source path to the Destination path.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Source

String

The source path to be renamed.

x

Destination

String

New path after rename.

x

Reconnection Strategy

A retry strategy in case of connectivity errors.

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN

Set Owner

<hdfs:set-owner>

Set the owner of a path, that is, of a file or directory. Ownername and Groupname cannot both be null.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the file or directory whose owner to set.

x

Ownername

String

The new owner's username. If null, the original username remains unchanged.

x

Groupname

String

The new group name. If null, the original group name remains unchanged.

x

Reconnection Strategy

A retry strategy in case of connectivity errors.

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN
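The null-means-unchanged rule for Ownername and Groupname can be expressed as a small Python function. This is a hypothetical sketch of the documented rule, not connector code; None stands in for a null value.

```python
from typing import Optional, Tuple


def resolve_owner(current_owner: str, current_group: str,
                  ownername: Optional[str],
                  groupname: Optional[str]) -> Tuple[str, str]:
    """Apply Set Owner's null-means-unchanged rule (hypothetical helper)."""
    if ownername is None and groupname is None:
        raise ValueError("Ownername and Groupname cannot both be null")
    return (
        current_owner if ownername is None else ownername,
        current_group if groupname is None else groupname,
    )


print(resolve_owner("hdfs", "supergroup", "etl", None))  # ('etl', 'supergroup')
```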

Set Permission

<hdfs:set-permission>

Set the permission of a path, that is, of a file or directory.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the file or directory whose permission to set.

x

Permission

String

The file system permission to be set.

x

Reconnection Strategy

A retry strategy in case of connectivity errors.

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN
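The Permission fields in this reference accept octal values such as 700 (the Write default), and File Status reports permissions as an octal string. The following Python sketch converts between a three-digit octal value and the nine-character rwx form. It is a simplified, hypothetical helper: it handles only the basic read, write, and execute bits, not the sticky bit or the umask-style symbolic syntax (for example, u=rwx,g=rx) that the fields may also accept.

```python
def octal_to_symbolic(mode: str) -> str:
    """Convert three octal digits such as "755" to "rwxr-xr-x"."""
    if len(mode) != 3 or any(c not in "01234567" for c in mode):
        raise ValueError(f"expected three octal digits, got {mode!r}")
    out = []
    for digit in mode:  # owner, group, others
        bits = int(digit, 8)
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)


def symbolic_to_octal(perm: str) -> str:
    """Convert a nine-character string such as "rwxr-x---" to "750"."""
    if len(perm) != 9:
        raise ValueError(f"expected nine characters, got {perm!r}")
    digits = []
    for i in range(0, 9, 3):
        r, w, x = perm[i:i + 3]
        digits.append(str(4 * (r == "r") + 2 * (w == "w") + (x == "x")))
    return "".join(digits)


print(octal_to_symbolic("700"))  # rwx------
```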

Write

<hdfs:write>

Write the current payload to the designated path, either creating a new file or appending to an existing one.

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Payload

Binary

The payload to write to the file.

#[payload]

Path

String

The path of the file to write to.

x

Permission

String

The file system permission to use if a new file is created, either in octal or symbolic format (umask).

700

Overwrite

Boolean

Whether to overwrite a pre-existing file with the new content.

true

Buffer Size

Number

The buffer size to use when writing to the file.

4096

Replication

Number

Block replication for the file.

1

Block Size

Number

The block size, in bytes, to use for the file.

1048576

Owner User Name

String

The user name of the file's owner.

Owner Group Name

String

The group owner of the file.

Reconnection Strategy

A retry strategy in case of connectivity errors.

For Configurations

Throws

  • HDFS:CONNECTIVITY

  • HDFS:INVALID_REQUEST_DATA

  • HDFS:INVALID_STRUCTURE_FOR_INPUT_DATA

  • HDFS:RETRY_EXHAUSTED

  • HDFS:UNKNOWN
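Block Size and Replication together determine how a written file is laid out in HDFS. The following Python sketch computes the number of blocks a file occupies and the raw bytes stored across all replicas, using the Write defaults from the table above. It relies on the fact that HDFS does not pad the final block to the full block size; the function name is hypothetical.

```python
from typing import Tuple

DEFAULT_BLOCK_SIZE = 1048576  # Write default (1 MiB)
DEFAULT_REPLICATION = 1       # Write default


def storage_footprint(length: int,
                      block_size: int = DEFAULT_BLOCK_SIZE,
                      replication: int = DEFAULT_REPLICATION) -> Tuple[int, int]:
    """Return (number of blocks, raw bytes stored across all replicas)."""
    if block_size <= 0 or replication <= 0:
        raise ValueError("block_size and replication must be positive")
    blocks = -(-length // block_size)  # ceiling division on integers
    # The last block is not padded, so raw usage is just length * replication.
    return blocks, length * replication


print(storage_footprint(2621440, replication=3))  # (3, 7864320)
```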

Sources

Read

<hdfs:read>

Parameters

Name Type Description Default Value Required

Configuration

String

The name of the configuration to use.

x

Path

String

The path of the file to read.

x

Buffer Size

Number

The buffer size to use when reading the file.

4096

Primary Node Only

Boolean

Whether this source should be executed only on the primary node when running in a cluster.

Streaming Strategy

Configure whether repeatable streams are used and how they behave.

Redelivery Policy

Defines a policy for processing the redelivery of the same message.

Reconnection Strategy

A retry strategy in case of connectivity errors.

Output

Type

Any

Attributes Type

Any

For Configurations

Types

Reconnection

Field Type Description Default Value Required

Fails Deployment

Boolean

When the application is deployed, a connectivity test is performed on all connectors. If set to true, deployment fails if the test doesn’t pass after exhausting the associated reconnection strategy.

Reconnection Strategy

The reconnection strategy to use.

Reconnect

Field Type Description Default Value Required

Frequency

Number

How often to reconnect (in milliseconds)

Count

Number

The number of reconnection attempts to make

Blocking

Boolean

If false, the reconnection strategy runs in a separate, non-blocking thread.

true

Reconnect Forever

Field Type Description Default Value Required

Frequency

Number

How often to reconnect (in milliseconds)

Blocking

Boolean

If false, the reconnection strategy runs in a separate, non-blocking thread.

true
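Frequency and Count (or Reconnect Forever, which has no count) can be modeled with a simple retry loop. The following Python sketch blocks the calling thread, matching a Blocking value of true, and is a hypothetical illustration: it assumes connection failures surface as ConnectionError, that Count limits the reconnection attempts that follow the first failed attempt, and it raises a generic RuntimeError as a stand-in for HDFS:RETRY_EXHAUSTED. The sleep function is injectable so the timing can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retries(connect: Callable[[], T],
                         frequency_ms: int,
                         count: Optional[int],
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect(); on ConnectionError wait frequency_ms and try again.

    count=None models Reconnect Forever; otherwise at most `count`
    reconnection attempts follow the first failed attempt.
    """
    attempts = 0
    while True:
        try:
            return connect()
        except ConnectionError as err:
            if count is not None and attempts >= count:
                # Stand-in for HDFS:RETRY_EXHAUSTED.
                raise RuntimeError("reconnection attempts exhausted") from err
            attempts += 1
            sleep(frequency_ms / 1000.0)  # Frequency is in milliseconds
```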

Expiration Policy

Field Type Description Default Value Required

Max Idle Time

Number

A scalar time value for the maximum amount of time a dynamic configuration instance should be allowed to be idle before it’s considered eligible for expiration.

Time Unit

Enumeration, one of:

  • NANOSECONDS

  • MICROSECONDS

  • MILLISECONDS

  • SECONDS

  • MINUTES

  • HOURS

  • DAYS

A time unit that qualifies the maxIdleTime attribute
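The following Python sketch shows how Max Idle Time and Time Unit combine into an eligibility check. It is hypothetical: it assumes that an instance idle for exactly the maximum time is already eligible, and, as the Expiration Policy description notes, eligibility does not mean the instance is purged at that moment.

```python
# Nanoseconds per value of the Time Unit enumeration.
_NANOS_PER_UNIT = {
    "NANOSECONDS": 1,
    "MICROSECONDS": 1_000,
    "MILLISECONDS": 1_000_000,
    "SECONDS": 1_000_000_000,
    "MINUTES": 60 * 1_000_000_000,
    "HOURS": 3_600 * 1_000_000_000,
    "DAYS": 86_400 * 1_000_000_000,
}


def is_eligible_for_expiration(idle_nanos: int, max_idle_time: int,
                               time_unit: str) -> bool:
    """True once the instance has been idle for at least max_idle_time.

    Assumption: reaching the limit exactly already counts as eligible.
    """
    return idle_nanos >= max_idle_time * _NANOS_PER_UNIT[time_unit]


print(is_eligible_for_expiration(5 * 60 * 10**9, 5, "MINUTES"))  # True
```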

Repeatable In Memory Stream

Field Type Description Default Value Required

Initial Buffer Size

Number

The amount of memory allocated to consume the stream and provide random access to it. If the stream contains more data than fits in this buffer, the buffer expands according to the bufferSizeIncrement attribute, up to the maxBufferSize limit.

Buffer Size Increment

Number

How much the buffer size expands if it exceeds its initial size. A value of zero or lower means that the buffer does not expand, so a STREAM_MAXIMUM_SIZE_EXCEEDED error is raised when the buffer gets full.

Max Buffer Size

Number

The maximum amount of memory to use. If more than that is used, the STREAM_MAXIMUM_SIZE_EXCEEDED error is raised. A value lower than or equal to zero means no limit.

Buffer Unit

Enumeration, one of:

  • BYTE

  • KB

  • MB

  • GB

The unit in which all these attributes are expressed
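The growth rules for Initial Buffer Size, Buffer Size Increment, and Max Buffer Size can be simulated in a few lines of Python. All sizes share one unit, as Buffer Unit specifies. This is a hypothetical model rather than Mule's implementation: it assumes that growth is capped at Max Buffer Size instead of overshooting it, and it raises a locally defined exception in place of STREAM_MAXIMUM_SIZE_EXCEEDED.

```python
class StreamMaximumSizeExceeded(Exception):
    """Local stand-in for Mule's STREAM_MAXIMUM_SIZE_EXCEEDED error."""


def required_buffer_size(data_size: int, initial: int,
                         increment: int, max_size: int) -> int:
    """Return the buffer size reached while holding data_size units."""
    size = initial
    while size < data_size:
        at_limit = max_size > 0 and size >= max_size
        if increment <= 0 or at_limit:
            # No growth allowed, or the maximum has been reached.
            raise StreamMaximumSizeExceeded(f"{data_size} > {size}")
        size += increment
        if max_size > 0:
            # Assumption: growth stops at Max Buffer Size rather than past it.
            size = min(size, max_size)
    return size


print(required_buffer_size(800, initial=512, increment=512, max_size=1024))  # 1024
```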

Repeatable File Store Stream

Field Type Description Default Value Required

Max In Memory Size

Number

Defines the maximum memory that the stream uses to keep data in memory. If more than that is consumed, the stream buffers the content on disk.

Buffer Unit

Enumeration, one of:

  • BYTE

  • KB

  • MB

  • GB

The unit in which maxInMemorySize is expressed.
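For the file store strategy, the split between memory and disk follows directly from Max In Memory Size. A hypothetical Python sketch, with all sizes expressed in the same Buffer Unit:

```python
from typing import Tuple


def split_memory_disk(stream_size: int, max_in_memory: int) -> Tuple[int, int]:
    """Return (units kept in memory, units buffered on disk)."""
    in_memory = min(stream_size, max_in_memory)
    # Anything beyond Max In Memory Size is buffered on disk.
    return in_memory, stream_size - in_memory


print(split_memory_disk(2048, 512))  # (512, 1536)
```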

Redelivery Policy

Field Type Description Default Value Required

Max Redelivery Count

Number

The maximum number of times a message can be redelivered and processed unsuccessfully before triggering a process-failed message.

Use Secure Hash

Boolean

Whether to use a secure hash algorithm to identify a redelivered message.

Message Digest Algorithm

String

The secure hashing algorithm to use. If not set, the default is SHA-256.

Id Expression

String

Defines one or more expressions to use to determine when a message has been redelivered. This property can only be set if Use Secure Hash is false.

Object Store

Object Store

The object store where the redelivery counter for each message is stored.
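The following Python sketch models a redelivery policy that uses a secure hash: each payload is hashed with SHA-256 (the documented default), and a counter per hash tracks how many times the same message arrived. It is a hypothetical illustration: a dictionary stands in for the Object Store, and the counting rule (the first delivery is not a redelivery, and the message fails once redeliveries exceed Max Redelivery Count) is an assumption.

```python
import hashlib
from typing import Dict


class RedeliveryTracker:
    """Count deliveries per message, keyed by a secure hash (sketch)."""

    def __init__(self, max_redelivery_count: int,
                 algorithm: str = "SHA-256") -> None:
        self.max_redelivery_count = max_redelivery_count
        # hashlib expects names like "sha256", so normalize "SHA-256".
        self.algorithm = algorithm.replace("-", "").lower()
        # A plain dict stands in for the Object Store.
        self.counters: Dict[str, int] = {}

    def message_id(self, payload: bytes) -> str:
        return hashlib.new(self.algorithm, payload).hexdigest()

    def record_delivery(self, payload: bytes) -> bool:
        """Return False once the message has been redelivered too often."""
        key = self.message_id(payload)
        self.counters[key] = self.counters.get(key, 0) + 1
        redeliveries = self.counters[key] - 1  # first delivery is not one
        return redeliveries <= self.max_redelivery_count


tracker = RedeliveryTracker(max_redelivery_count=2)
print([tracker.record_delivery(b"file-1") for _ in range(4)])
# [True, True, True, False]
```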

Metadata

Field Type Description Default Value Required

Check Summary

Content Summary

File Status

Path Exists

Boolean

Check Summary

Field Type Description Default Value Required

Bytes Per CRC

Number

Crc Per Block

Number

Md5

String

Content Summary

Field Type Description Default Value Required

Directory Count

Number

File Count

Number

Length

Number

Snapshot Directory Count

Number

Snapshot File Count

Number

Snapshot Length

Number

Snapshot Space Consumed

Number

File Status

Field Type Description Default Value Required

Access Time

Number

Access time of the file, in milliseconds since January 1, 1970 UTC

Block Replication

Number

Replication factor of file

Block Size

Number

Block size of file

Directory

Boolean

Indicates whether path is a directory

Group

String

Group owner associated with file

Length

Number

Length of file in bytes

Modification Time

Number

Modification time of the file, in milliseconds since January 1, 1970 UTC

Owner

String

Owner of file

Path

String

Path name

Permission

String

Permission of file as an octal string

Symbolic Link

Boolean

Indicates whether a path is a symbolic link
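Access Time and Modification Time are epoch-millisecond values. A small Python helper, hypothetical and shown only for illustration, converts them to timezone-aware UTC datetimes:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(millis: int) -> datetime:
    """Convert an Access Time or Modification Time value to UTC."""
    # timedelta keeps millisecond precision without float rounding.
    return EPOCH + timedelta(milliseconds=millis)


print(epoch_millis_to_utc(1_700_000_000_000).isoformat())
# 2023-11-14T22:13:20+00:00
```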
