HDFS (Hadoop) Connector Reference - Mule 4
Anypoint Connector for Hadoop (HDFS) v6.0
Release Notes: Hadoop (HDFS) Connector Release Notes
HDFS Configurations
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Name | | The name for this configuration. Connectors reference the configuration with this name. | | x |
Connection | | The connection types to provide to this configuration. | | x |
Expiration Policy | | Configures the minimum amount of time that a dynamic configuration instance can remain idle before Mule considers it eligible for expiration. This does not mean that the platform expires the instance at the exact moment that it becomes eligible. Mule purges the instances as appropriate. | | |
Connection Types
To access the data in a Hadoop (HDFS) instance, you must authenticate your application’s requests using a supported authentication method.
For Hadoop (HDFS) Connector, the following connection authentication types are supported:
Kerberos
Use the following fields in the Global Element Properties dialog window to configure the Kerberos connection type:
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Username | String | The Kerberos principal. The Username is passed to the HDFS client as the | | |
Keytab Path | String | Enter the path to the keytab file associated with the Username you specified. The path is used to obtain a ticket granting ticket (TGT) from the authorization server. If this value is not provided Kerberos looks for a TGT associated with the specified Username within your local Kerberos cache. | | |
Name Node Uri | String | The name of the file system to connect to. The Name Node is passed to the HDFS client as the | | x |
Configuration Resources | Array of String | A | | |
Configuration Entries | Object | A | | |
Reconnection | | When the application is deployed, a connectivity test is performed on all connectors. If reconnection is enabled, deployment fails if the test doesn’t pass after exhausting the associated reconnection strategy. | | |
Simple
Use the following fields in the Global Element Properties dialog window to configure the Simple connection type:
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Username | String | User identity that Hadoop uses for permissions in HDFS. When Simple authentication is used, Hadoop requires that the user is set as a system property called | | |
Name Node Uri | String | The name of the file system to which to connect. The Name Node is passed to the HDFS client as the | | x |
Configuration Resources | Array of String | A | | |
Configuration Entries | Object | A | | |
Reconnection | | When the application is deployed, a connectivity test is performed on all connectors. If reconnection is enabled, deployment fails if the test doesn’t pass after exhausting the associated reconnection strategy. | | |
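The two connection types above could be declared as global elements roughly as follows. This is a minimal sketch, not a verified configuration: the element names `hdfs:hdfs-config`, `hdfs:simple-connection`, and `hdfs:kerberos-connection`, the attribute names `nameNodeUri`, `username`, and `keytabPath`, and the `hdfs` namespace URI are assumed from the parameter names in the tables and from common Mule 4 naming conventions. The configuration names, host, port, principal, and keytab path are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: HDFS element and attribute names are assumptions derived from the parameter tables. -->
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:hdfs="http://www.mulesoft.org/schema/mule/hdfs"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="
        http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
        http://www.mulesoft.org/schema/mule/hdfs http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd">

  <!-- Simple connection: Name Node Uri is required, Username is optional -->
  <hdfs:hdfs-config name="HDFS_Simple_Config">
    <hdfs:simple-connection nameNodeUri="hdfs://localhost:9000" username="hdfsuser"/>
  </hdfs:hdfs-config>

  <!-- Kerberos connection: Username is the principal, Keytab Path points to its keytab file -->
  <hdfs:hdfs-config name="HDFS_Kerberos_Config">
    <hdfs:kerberos-connection nameNodeUri="hdfs://localhost:9000"
                              username="hdfsuser@EXAMPLE.COM"
                              keytabPath="/path/to/hdfsuser.keytab"/>
  </hdfs:hdfs-config>
</mule>
```

Operations then refer to one of these configurations by its name.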
Operations
Append
<hdfs:append>
Append the current payload to a file located at the designated path. Note: By default, the Hadoop server has the append option disabled. To enable appending data to an existing file, refer to the dfs.support.append configuration parameter.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The path of the file to write to. | | x |
Buffer Size | Number | The buffer size to use when appending to the file. | 4096 | |
Payload | Binary | The payload to append to the file. | #[payload] | |
Reconnection Strategy | | A retry strategy in case of connectivity errors | | |
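As an illustration of the parameters above, an append step inside a flow might look like the following fragment. It is a sketch under assumptions: the attribute names `path` and `bufferSize` are inferred from the parameter names, `config-ref` is the usual Mule 4 way to reference a configuration, and the flow name, configuration name `HDFS_Config`, and file path are hypothetical. Namespace declarations are omitted.

```xml
<!-- Sketch only: attribute names are assumed from the Append parameter table. -->
<flow name="append-to-hdfs-file">
  <!-- The Payload parameter defaults to #[payload], so the payload is set first -->
  <set-payload value="A new log line"/>
  <hdfs:append config-ref="HDFS_Config"
               path="/logs/example.log"
               bufferSize="4096"/>
</flow>
```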
Copy From Local File
<hdfs:copy-from-local-file>
Copy the source file on the local disk to the file system at the given target path. Set Delete Source if the source file should be removed.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Delete Source | Boolean | Whether to delete the source. | false | |
Overwrite | Boolean | Whether to overwrite destination content. | true | |
Source | String | The source path on the local disk. | | x |
Destination | String | The target path in the file system. | | x |
Reconnection Strategy | | A retry strategy in case of connectivity errors | | |
Copy To Local File
<hdfs:copy-to-local-file>
Copy the source file in the file system to the local disk at the given target path. Set Delete Source if the source file should be removed. Use Raw Local File System indicates whether to use RawLocalFileSystem as the local file system, because it is a non-CRC file system.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Delete Source | Boolean | Whether to delete the source. | false | |
Use Raw Local File System | Boolean | Whether to use RawLocalFileSystem as a local file system. | false | |
Source | String | The source path on the File System. | | x |
Destination | String | The target path on the local disk. | | x |
Reconnection Strategy | | A retry strategy in case of connectivity errors | | |
Delete Directory
<hdfs:delete-directory>
Delete the file or directory located at the designated path.
Delete File
<hdfs:delete-file>
Delete the file or directory located at the designated path.
Get Metadata
<hdfs:get-metadata>
Get the metadata of a path.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The path for which to get the metadata. | | x |
Target Variable | String | The name of a variable to store the operation’s output. | | |
Target Value | String | An expression to evaluate against the operation’s output and store the expression outcome in the target variable. | #[payload] | |
Reconnection Strategy | | A retry strategy in case of connectivity errors | | |
Glob Status
<hdfs:glob-status>
Return all the files that match the file pattern and are not checksum files. Results are sorted by their names.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path Pattern | String | A regular expression specifying the path pattern. | | x |
Filter | String | The user supplied path filter | | |
Target Variable | String | The name of a variable to store the operation’s output. | | |
Target Value | String | An expression to evaluate against the operation’s output and store the expression outcome in the target variable. | #[payload] | |
Reconnection Strategy | | A retry strategy in case of connectivity errors | | |
Output
Type | Array of File Status |
List Status
<hdfs:list-status>
List the statuses of the files and directories in the given path if the path is a directory.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The given path | | x |
Filter | String | The user supplied path filter | | |
Target Variable | String | The name of a variable to store the operation’s output. | | |
Target Value | String | An expression to evaluate against the operation’s output and store the expression outcome in the target variable. | #[payload] | |
Reconnection Strategy | | A retry strategy in case of connectivity errors | | |
Output
Type | Array of File Status |
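The Target Variable and Target Value parameters can be sketched as follows: the listing is stored in a variable instead of replacing the payload. In Mule 4 these two parameters are normally written as the `target` and `targetValue` attributes; the `path` attribute name is assumed from the table, and the flow name, configuration name `HDFS_Config`, directory, and variable name `fileStatuses` are hypothetical. Namespace declarations are omitted.

```xml
<!-- Sketch only: the HDFS attribute names are assumed from the List Status parameter table. -->
<flow name="list-hdfs-directory">
  <hdfs:list-status config-ref="HDFS_Config"
                    path="/data/in"
                    target="fileStatuses"
                    targetValue="#[payload]"/>
  <!-- The output is an Array of File Status, so sizeOf gives the number of entries -->
  <logger level="INFO" message="#[sizeOf(vars.fileStatuses)]"/>
</flow>
```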
Make Directories
<hdfs:make-directories>
Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The path to create one or more directories. | | x |
Permission | String | The file system permission to use when creating the directories, either in octal or symbolic format (umask). | | |
Reconnection Strategy | | A retry strategy in case of connectivity errors. | | |
Read Operation
<hdfs:read-operation>
Read the content of a file designated by its path and stream it to the rest of the flow.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The path of the file to read. | | x |
Buffer Size | Number | The buffer size to use when reading the file. | 4096 | |
Streaming Strategy | | Configure if repeatable streams should be used and their behavior | | |
Target Variable | String | The name of a variable to store the operation’s output. | | |
Target Value | String | An expression to evaluate against the operation’s output and store the expression outcome in the target variable | #[payload] | |
Reconnection Strategy | | A retry strategy in case of connectivity errors. | | |
Rename
<hdfs:rename>
Renames path target to path destination.
Set Owner
<hdfs:set-owner>
Set owner of a path for a file or a directory. The Ownername and Groupname cannot both be null.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The path of the file or directory to set owner. | | x |
Ownername | String | If it is null, the original username remains unchanged. | | x |
Groupname | String | If it is null, the original groupname remains unchanged. | | x |
Reconnection Strategy | | A retry strategy in case of connectivity errors. | | |
Set Permission
<hdfs:set-permission>
Set permission of a path, that is, for a file or a directory.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The path of the file or directory to set permission. | | x |
Permission | String | The file system permission to be set. | | x |
Reconnection Strategy | | A retry strategy in case of connectivity errors. | | |
Write
<hdfs:write>
Write the current payload to the designated path, either creating a new file or appending to an existing one.
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Payload | Binary | The payload to write to the file. | #[payload] | |
Path | String | The path of the file to write to. | | x |
Permission | String | The file system permission to use if a new file is created, either in octal or symbolic format (umask). | 700 | |
Overwrite | Boolean | If a pre-existing file should be overwritten with the new content. | true | |
Buffer Size | Number | The buffer size to use when appending to the file. | 4096 | |
Replication | Number | Block replication for the file. | 1 | |
Block Size | Number | The block size to use when appending to the file. | 1048576 | |
Owner User Name | String | The username owner of the file. | | |
Owner Group Name | String | The group owner of the file. | | |
Reconnection Strategy | | A retry strategy in case of connectivity errors. | | |
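A write step that spells out the defaults from the table above might be sketched like this. The attribute names (`path`, `permission`, `overwrite`, `bufferSize`, `replication`, `blockSize`) are assumptions inferred from the parameter names, and the flow name, configuration name `HDFS_Config`, and target path are hypothetical. Namespace declarations are omitted.

```xml
<!-- Sketch only: attribute names are assumed from the Write parameter table. -->
<flow name="write-to-hdfs-file">
  <set-payload value="id,name"/>
  <!-- The values shown for permission, overwrite, bufferSize, replication, and blockSize are the documented defaults -->
  <hdfs:write config-ref="HDFS_Config"
              path="/data/out/report.csv"
              permission="700"
              overwrite="true"
              bufferSize="4096"
              replication="1"
              blockSize="1048576"/>
</flow>
```

Because every attribute except the configuration reference and the path has a default, the shortest form of this step would carry only those two.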
Sources
Read
<hdfs:read>
Parameters
Name | Type | Description | Default Value | Required |
---|---|---|---|---|
Configuration | String | The name of the configuration to use. | | x |
Path | String | The path of the file to read. | | x |
Buffer Size | Number | | 4096 | |
Primary Node Only | Boolean | Whether this source should be executed only on the primary node when running in a cluster. | | |
Streaming Strategy | | Configure if repeatable streams should be used and their behavior | | |
Redelivery Policy | | Defines a policy for processing the redelivery of the same message. | | |
Reconnection Strategy | | A retry strategy in case of connectivity errors. | | |
Types
Reconnection
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Fails Deployment | Boolean | When the application is deployed, a connectivity test is performed on all connectors. If set to true, deployment fails if the test doesn’t pass after exhausting the associated reconnection strategy. | | |
Reconnection Strategy | | The reconnection strategy to use. | | |
Reconnect
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Frequency | Number | How often to reconnect (in milliseconds) | | |
Count | Number | The number of reconnection attempts to make | | |
Blocking | Boolean | If false, the reconnection strategy runs in a separate, non-blocking thread. | true | |
Reconnect Forever
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Frequency | Number | How often in milliseconds to reconnect | | |
Blocking | Boolean | If false, the reconnection strategy runs in a separate, non-blocking thread. | true | |
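The Reconnection, Reconnect, and Reconnect Forever types map onto the standard Mule 4 `reconnection`, `reconnect`, and `reconnect-forever` elements. The fragment below is a sketch that nests them in a connection; the HDFS element and attribute names (`hdfs:hdfs-config`, `hdfs:simple-connection`, `nameNodeUri`, `username`) are assumed from the parameter tables, and the configuration name, host, port, and timing values are hypothetical. Namespace declarations are omitted.

```xml
<!-- Sketch only: the HDFS element and attribute names are assumptions. -->
<hdfs:hdfs-config name="HDFS_Config">
  <hdfs:simple-connection nameNodeUri="hdfs://localhost:9000" username="hdfsuser">
    <!-- Fail the deployment if the connectivity test still fails after the strategy is exhausted -->
    <reconnection failsDeployment="true">
      <!-- Retry every 5000 ms, at most 3 times, on the calling thread -->
      <reconnect frequency="5000" count="3" blocking="true"/>
      <!-- Alternative strategy with no attempt limit: <reconnect-forever frequency="5000"/> -->
    </reconnection>
  </hdfs:simple-connection>
</hdfs:hdfs-config>
```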
Expiration Policy
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Max Idle Time | Number | A scalar time value for the maximum amount of time a dynamic configuration instance should be allowed to be idle before it’s considered eligible for expiration. | | |
Time Unit | Enumeration | A time unit that qualifies the maxIdleTime attribute | | |
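An expiration policy is declared with the standard Mule 4 `expiration-policy` element inside a configuration. The sketch below assumes the HDFS element and attribute names from the parameter tables (`hdfs:hdfs-config`, `hdfs:simple-connection`, `nameNodeUri`); the configuration name, the `hdfs.nameNodeUri` variable that makes the configuration dynamic, and the five-minute idle time are hypothetical. Namespace declarations are omitted.

```xml
<!-- Sketch only: the HDFS element and attribute names are assumptions. -->
<hdfs:hdfs-config name="HDFS_Dynamic_Config">
  <!-- An expression in the connection makes this a dynamic configuration -->
  <hdfs:simple-connection nameNodeUri="#[vars.nameNodeUri]"/>
  <!-- An instance becomes eligible for expiration after 5 idle minutes -->
  <expiration-policy maxIdleTime="5" timeUnit="MINUTES"/>
</hdfs:hdfs-config>
```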
Repeatable In Memory Stream
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Initial Buffer Size | Number | The amount of memory that will be allocated to consume the stream and provide random access to it. If the stream contains more data than can be fit into this buffer, then the buffer expands according to the bufferSizeIncrement attribute, with an upper limit of maxInMemorySize. | | |
Buffer Size Increment | Number | This is by how much the buffer size expands if it exceeds its initial size. Setting a value of zero or lower means that the buffer should not expand, meaning that a STREAM_MAXIMUM_SIZE_EXCEEDED error is raised when the buffer gets full. | | |
Max Buffer Size | Number | The maximum amount of memory to use. If more than that is used, the STREAM_MAXIMUM_SIZE_EXCEEDED error is raised. A value lower than or equal to zero means no limit. | | |
Buffer Unit | Enumeration | The unit in which all these attributes are expressed | | |
Repeatable File Store Stream
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Max In Memory Size | Number | Defines the maximum memory that the stream should use to keep data in memory. If more memory is consumed, the content is buffered on disk. | | |
Buffer Unit | Enumeration | The unit in which maxInMemorySize is expressed. | | |
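A streaming strategy is written as a child element of the operation that produces the stream. The fragment below sketches both repeatable strategies on the Read Operation. The `repeatable-in-memory-stream` and `repeatable-file-store-stream` elements are standard Mule 4 elements, but the exact attribute names used here (`maxBufferSize`, `inMemorySize`) should be checked against your Mule runtime version; the HDFS attribute `path`, the configuration name `HDFS_Config`, the file paths, and the sizes are assumptions. Namespace declarations are omitted.

```xml
<!-- Sketch only: verify attribute names against your Mule runtime version. -->
<flow name="read-with-streaming-strategies">
  <!-- Keep the whole stream in memory: start at 512 KB, grow by 256 KB, fail above 2048 KB -->
  <hdfs:read-operation config-ref="HDFS_Config" path="/data/in/small-file.txt">
    <repeatable-in-memory-stream initialBufferSize="512"
                                 bufferSizeIncrement="256"
                                 maxBufferSize="2048"
                                 bufferUnit="KB"/>
  </hdfs:read-operation>

  <!-- Keep up to 512 KB in memory and buffer the rest on disk -->
  <hdfs:read-operation config-ref="HDFS_Config" path="/data/in/large-file.bin">
    <repeatable-file-store-stream inMemorySize="512" bufferUnit="KB"/>
  </hdfs:read-operation>
</flow>
```

The in-memory strategy suits small files with a known upper bound, while the file store strategy avoids the STREAM_MAXIMUM_SIZE_EXCEEDED error for larger content at the cost of disk I/O.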
Redelivery Policy
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Max Redelivery Count | Number | The maximum number of times a message can be redelivered and processed unsuccessfully before triggering a process-failed message. | | |
Use Secure Hash | Boolean | Whether to use a secure hash algorithm to identify a redelivered message. | | |
Message Digest Algorithm | String | The secure hashing algorithm to use. If not set, the default is SHA-256. | | |
Id Expression | String | Defines one or more expressions to use to determine when a message has been redelivered. This property can only be set if Use Secure Hash is false. | | |
Object Store | Object Store | The object store where the redelivery counter for each message is stored. | | |
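A redelivery policy is attached to a message source with the standard Mule 4 `redelivery-policy` element. The sketch below places one on the Read source; the source attributes `path` and `bufferSize` are assumed from its parameter table, and the flow name, configuration name `HDFS_Config`, file path, and redelivery count are hypothetical. Namespace declarations are omitted.

```xml
<!-- Sketch only: the HDFS source attribute names are assumptions. -->
<flow name="process-hdfs-file">
  <hdfs:read config-ref="HDFS_Config" path="/data/in/input.txt" bufferSize="4096">
    <!-- Give up after 3 failed redeliveries; identify messages by a SHA-256 hash -->
    <redelivery-policy maxRedeliveryCount="3"
                       useSecureHash="true"
                       messageDigestAlgorithm="SHA-256"/>
  </hdfs:read>
  <logger level="INFO" message="#[payload]"/>
</flow>
```

To identify messages with Id Expression instead, Use Secure Hash must be false, as the table notes.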
Metadata
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Check Summary | | | | |
Content Summary | | | | |
File Status | | | | |
Path Exists | Boolean | | | |
Check Summary
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Bytes Per CRC | Number | | | |
Crc Per Block | Number | | | |
Md5 | String | | | |
Content Summary
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Directory Count | Number | | | |
File Count | Number | | | |
Length | Number | | | |
Snapshot Directory Count | Number | | | |
Snapshot File Count | Number | | | |
Snapshot Length | Number | | | |
Snapshot Space Consumed | Number | | | |
File Status
Field | Type | Description | Default Value | Required |
---|---|---|---|---|
Access Time | Number | Access time of file in milliseconds | | |
Block Replication | Number | Replication factor of file | | |
Block Size | Number | Block size of file | | |
Directory | Boolean | Indicates whether path is a directory | | |
Group | String | Group owner associated with file | | |
Length | Number | Length of file in bytes | | |
Modification Time | Number | Modification time of file in milliseconds | | |
Owner | String | Owner of file | | |
Path | String | Path name | | |
Permission | String | Permission of file as an octal string | | |
Symbolic Link | Boolean | Indicates whether a path is a symbolic link | | |