
Hadoop HDFS Connector API Reference

Hadoop Distributed File System (HDFS) Connector.

Additional Info

Requires Mule Enterprise License: Yes

Requires Entitlement: No

Mule Version: 3.6.0 or higher

Configs


Kerberos Configuration

<hdfs:config-with-kerberos>

Connection Management

Kerberos authentication configuration. Use it to set the properties that Kerberos authentication requires to establish a connection with the Hadoop Distributed File System.

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| name | String | The name of this configuration, used to reference it later. | | x |
| nameNodeUri | String | The name of the file system to connect to. It is passed to the HDFS client as the {FileSystem#FS_DEFAULT_NAME_KEY} configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | x |
| keytabPath | String | Path to the keytab file associated with username. It is used to obtain a TGT from the Kerberos authentication server. If not provided, the connector looks for a TGT associated with username in the local Kerberos cache. | | |
| username | String | A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | |
| configurationResources | List<String> | A List of additional configuration resource files to be loaded by the HDFS client (e.g., core-site.xml). | | |
| configurationEntries | Map<String,String> | A Map of additional configuration entries, given as key/value pairs, to be used by the HDFS client. | | |

 
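As a minimal sketch of how the attributes above might fit together, the fragment below declares a Kerberos configuration. The configuration name, NameNode URI, keytab path, and principal are hypothetical placeholders, and the fragment assumes the hdfs namespace is already declared on the enclosing mule element.

```xml
<!-- Hypothetical values: replace the URI, keytab path, and principal with your own -->
<hdfs:config-with-kerberos name="hdfs-conf-kerberos"
    nameNodeUri="hdfs://namenode.example.com:8020"
    keytabPath="/etc/security/keytabs/mule.keytab"
    username="mule@EXAMPLE.COM"/>
```

Processors can then point at this configuration with config-ref="hdfs-conf-kerberos".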


Simple Configuration

<hdfs:config>

Connection Management

Simple authentication configuration. Use it to set the properties that simple authentication requires to establish a connection with the Hadoop Distributed File System.

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| name | String | The name of this configuration, used to reference it later. | | x |
| nameNodeUri | String | The name of the file system to connect to. It is passed to the HDFS client as the {FileSystem#FS_DEFAULT_NAME_KEY} configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | x |
| username | String | A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | |
| configurationResources | List<String> | A List of additional configuration resource files to be loaded by the HDFS client (e.g., core-site.xml). | | |
| configurationEntries | Map<String,String> | A Map of additional configuration entries, given as key/value pairs, to be used by the HDFS client. | | |

 
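A simple configuration could look like the sketch below. The name hdfs-conf matches the config-ref used in the XML samples on this page, while the NameNode URI and username are hypothetical placeholders; the hdfs namespace is assumed to be declared on the enclosing mule element.

```xml
<!-- Hypothetical values: point nameNodeUri at your own NameNode -->
<hdfs:config name="hdfs-conf"
    nameNodeUri="hdfs://localhost:9000"
    username="mule"/>
```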

Processors


Read from path

<hdfs:read-operation>

Reads the content of a file designated by its path and streams it to the rest of the flow:

XML Sample

<!-- Reading a file with an operation rather than polling with an endpoint -->
<hdfs:read-operation path="/tmp/test.dat" bufferSize="8192" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the file to read. | | x |
| bufferSize | int | the buffer size to use when reading the file. | 4096 | |

Returns

| Return Java Type | Description |
| --- | --- |
| InputStream | the result from executing the rest of the flow. |


Get path meta data

<hdfs:get-metadata>

Get the metadata of a path, as described in HDFSConnector#read(String, int, SourceCallback), and store it in flow variables.

These flow variables are:

  • hdfs.path.exists - Indicates if the path exists (true or false)
  • hdfs.content.summary - A summary of the path info
  • hdfs.file.checksum - MD5 digest of the file (if it is a file and exists)
  • hdfs.file.status - A Hadoop object that contains info about the status of the file (org.apache.hadoop.fs.FileStatus)

XML Sample

<!-- Store the meta-information of a path in flow variables -->
<hdfs:get-metadata path="/tmp/test.dat" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path whose existence must be checked. | | x |

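The flow variables listed for this processor can drive later routing. The sketch below is one possible way to branch on hdfs.path.exists using the standard Mule 3 choice router and MEL flowVars; the flow name and log messages are hypothetical, and the flow is assumed to be invoked from elsewhere (for example through a flow-ref).

```xml
<flow name="check-hdfs-path">
    <!-- Populates hdfs.path.exists, hdfs.content.summary, hdfs.file.checksum and hdfs.file.status -->
    <hdfs:get-metadata path="/tmp/test.dat" config-ref="hdfs-conf"/>
    <choice>
        <when expression="#[flowVars['hdfs.path.exists']]">
            <logger level="INFO" message="Path exists, status: #[flowVars['hdfs.file.status']]"/>
        </when>
        <otherwise>
            <logger level="INFO" message="Path /tmp/test.dat was not found"/>
        </otherwise>
    </choice>
</flow>
```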

Write to path

<hdfs:write>

Write the current payload to the designated path, either creating a new file or appending to an existing one.

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the file to write to. | | x |
| permission | String | the file system permission to use if a new file is created, either in octal or symbolic format (umask). | 700 | |
| overwrite | boolean | if a pre-existing file should be overwritten with the new content. | true | |
| bufferSize | int | the buffer size to use when appending to the file. | 4096 | |
| replication | int | block replication for the file. | 1 | |
| blockSize | long | the block size to use for the file. | 1048576 | |
| ownerUserName | String | the username owner of the file. | | |
| ownerGroupName | String | the group owner of the file. | | |
| payload | InputStream | the payload to write to the file. | #[payload] | |

 
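This processor has no XML sample on this page, so here is a minimal sketch built only from the attributes in the table above. The target path and attribute values are hypothetical; any attribute left out falls back to the default listed in the table.

```xml
<!-- Write the current payload to the given path; path and values are illustrative -->
<hdfs:write path="/tmp/output.dat" permission="700" overwrite="true" bufferSize="8192" config-ref="hdfs-conf"/>
```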


Append to file

<hdfs:append>

Append the current payload to a file located at the designated path. Note: by default, the Hadoop server has the append option disabled. To append data to an existing file, refer to the dfs.support.append configuration parameter.

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the file to write to. | | x |
| bufferSize | int | the buffer size to use when appending to the file. | 4096 | |
| payload | InputStream | the payload to append to the file. | #[payload] | |

 
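A minimal sketch of an append call, using only the attributes from the table above, might look as follows. The path and buffer size are hypothetical, and the call assumes append support has been enabled on the Hadoop side as described in the note for this processor.

```xml
<!-- Append the current payload to an existing file; assumes append is enabled on the server -->
<hdfs:append path="/tmp/test.dat" bufferSize="8192" config-ref="hdfs-conf"/>
```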


Delete file

<hdfs:delete-file>

Delete the file or directory located at the designated path.

XML Sample

<!-- Delete a file -->
<hdfs:delete-file path="/tmp/test.dat" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the file to delete. | | x |


Delete directory

<hdfs:delete-directory>

Delete the file or directory located at the designated path.

XML Sample

<!-- Delete a directory -->
<hdfs:delete-directory path="/tmp/my-dir" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the directory to delete. | | x |


Make directories

<hdfs:make-directories>

Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path to create directories for. | | x |
| permission | String | the file system permission to use when creating the directories, either in octal or symbolic format (umask). | | |

 
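Because this processor behaves like 'mkdir -p', a single call can create a nested hierarchy. The sketch below uses only the attributes from the table above; the path and the octal permission value are hypothetical.

```xml
<!-- Create /tmp/my-dir/2014/02 and any missing parents; values are illustrative -->
<hdfs:make-directories path="/tmp/my-dir/2014/02" permission="755" config-ref="hdfs-conf"/>
```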


Rename

<hdfs:rename>

Renames the source path to the target path.

XML Sample

<!-- Rename any source directory or file to the provided target path -->
<hdfs:rename source="/tmp/my-dir" target="/tmp/new-dir" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| source | String | the source path to be renamed. | | x |
| target | String | the target new path after rename. | | x |

Returns

| Return Java Type | Description |
| --- | --- |
| Boolean | true if rename is successful. |


List status

<hdfs:list-status>

List the statuses of the files and directories in the given path, if the path is a directory.

XML Sample

<!-- List the statuses of the given path -->
<hdfs:list-status path="/tmp/my-dir" filter="^.*/2014/02/$" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the given path | | x |
| filter | String | the user supplied path filter | | |

Returns

| Return Java Type | Description |
| --- | --- |
| List<FileStatus> | the statuses of the files/directories in the given path |


Glob status

<hdfs:glob-status>

Return all the files that match file pattern and are not checksum files. Results are sorted by their names.

XML Sample

<!-- Return all the files that match file pattern, sorted by their names -->
<hdfs:glob-status pathPattern="/tmp/*/*" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| pathPattern | String | a regular expression specifying the path pattern. | | x |
| filter | PathFilter | the user supplied path filter | | |

Returns

| Return Java Type | Description |
| --- | --- |
| List<FileStatus> | an array of paths that match the path pattern. |


Copy from local file

<hdfs:copy-from-local-file>

Copy the source file on the local disk to the FileSystem at the given target path, set deleteSource if the source should be removed.

XML Sample

<!-- Copy from source local disk to the target FileSystem -->
<hdfs:copy-from-local-file deleteSource="true" overwrite="false" source="/tmp/mulesoft/" target="/user/mulesoft/" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| deleteSource | boolean | whether to delete the source. | false | |
| overwrite | boolean | whether to overwrite an existing file. | true | |
| source | String | the source path on the local disk. | | x |
| target | String | the target path on the File System. | | x |


Copy to local file

<hdfs:copy-to-local-file>

Copy the source file on the FileSystem to local disk at the given target path, set deleteSource if the source should be removed. useRawLocalFileSystem indicates whether to use RawLocalFileSystem as it is a non CRC File System.

XML Sample

<!-- Copy from the source FileSystem to the target local disk -->
<hdfs:copy-to-local-file deleteSource="false" useRawLocalFileSystem="false" source="/tmp/mulesoft/" target="/user/mulesoft/" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| deleteSource | boolean | whether to delete the source. | false | |
| useRawLocalFileSystem | boolean | whether to use RawLocalFileSystem as local file system or not. | false | |
| source | String | the source path on the File System. | | x |
| target | String | the target path on the local disk. | | x |


Set permission

<hdfs:set-permission>

Set permission of a path (i.e., a file or a directory).

XML Sample

<!-- Set permission of a path to change. -->
<hdfs:set-permission path="/tmp/my-dir" permission="511" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the file or directory to set permission. | | x |
| permission | String | the file system permission to be set. | | x |


Set owner

<hdfs:set-owner>

Set owner of a path (i.e., a file or a directory). The parameters ownername and groupname cannot both be null.

XML Sample

<!-- Set owner of a path to change. -->
<hdfs:set-owner path="/tmp/my-dir" ownername="mulesoft" groupname="supergroup" config-ref="hdfs-conf"/>

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the file or directory to set owner. | | x |
| ownername | String | If it is null, the original username remains unchanged. | | |
| groupname | String | If it is null, the original groupname remains unchanged. | | |

 

Sources


Read from path

<hdfs:read>

Reads the content of a file designated by its path and streams it to the rest of the flow, while adding the path metadata to the following inbound properties:

  • HDFSConnector#HDFS_PATH_EXISTS: a boolean set to true if the path exists
  • HDFSConnector#HDFS_CONTENT_SUMMARY: an instance of ContentSummary if the path exists.
  • HDFSConnector#HDFS_FILE_STATUS: an instance of FileStatus if the path exists.
  • HDFSConnector#HDFS_FILE_CHECKSUM: an instance of FileChecksum if the path exists, is a file and has a checksum.

Attributes

| Name | Java Type | Description | Default Value | Required |
| --- | --- | --- | --- | --- |
| config-ref | String | Specify which config to use | | x |
| path | String | the path of the file to read. | | x |
| bufferSize | int | the buffer size to use when reading the file. | 4096 | |
| sourceCallback | SourceCallback | the SourceCallback used to propagate the event to the rest of the flow. | | x |

Returns

| Return Java Type | Description |
| --- | --- |
| void | |

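As a rough sketch, a source like this one would typically sit at the start of a flow and hand the file stream to the processors that follow. The flow name and logger message below are hypothetical, and the sketch assumes that sourceCallback is supplied by the Mule runtime rather than set in XML, which this page does not state explicitly.

```xml
<flow name="hdfs-read-flow">
    <!-- Message source: streams the file content into the flow -->
    <hdfs:read path="/tmp/test.dat" bufferSize="8192" config-ref="hdfs-conf"/>
    <logger level="INFO" message="Received content from HDFS path /tmp/test.dat"/>
</flow>
```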