
HDFS Connector

Community

The Anypoint Connector for the Hadoop Distributed File System (HDFS) serves as a bi-directional gateway between Mule applications and your Apache Hadoop instance.

Prerequisites

To use the HDFS connector, you must have the following:

  • A working Apache Hadoop Server.

  • Anypoint Studio Community edition.

This document assumes that you are familiar with Mule, Anypoint Connectors, and Anypoint Studio essentials. To increase your familiarity with Studio, consider completing one or more Anypoint Studio Tutorials. Further, this page assumes that you have a basic understanding of Mule flows and Mule Global Elements.

This document describes implementation examples within the context of Anypoint Studio, Mule ESB’s graphical user interface, and also includes configuration details for doing the same in the XML Editor.

Compatibility

The HDFS connector is compatible with the following:

  • Mule Runtime: 3.6 or higher

  • Apache Hadoop: 2.6.0 or higher

Installing and Configuring

Installing HDFS Connector in Anypoint Studio

You can install a connector in Anypoint Studio using the instructions in To Install a Connector from Anypoint Exchange.

Configuring a Global Element

To use the HDFS connector in your Mule application, you must configure a global HDFS element that can be used by all the HDFS connectors in the application.

  1. In Anypoint Studio, click the Global Elements tab at the base of the canvas, and click Create.

  2. In the Choose Global Type wizard, use the filter to locate and select HDFS, and click OK.

  3. Configure the parameters according to the table below.

    • Name: A name for the configuration so that it can be referenced later.

    • NameNode URI: The host name or IP address of the master node of the Hadoop cluster.

    • Username: A Hadoop FileSystem username.

    • Configuration Resources: Specify any configuration resources here to override those of the Hadoop instance.

    • Configuration Entries: Specify any configuration entries here to override those of the Hadoop instance.

  4. Access the Pooling Profile tab to configure any settings relevant to managing multiple connections with a connection pool (see the XML sketch at the end of the next section).

  5. Access the Reconnection tab to configure any reconnection strategy that Mule should execute if it loses its connection to HDFS.

  6. Click Test Connection to verify your settings; Studio displays a Connection Successful message if the connection works.

  7. Click OK to save the global connector configurations.

  8. Return to the Message Flow tab in Studio.

Configuring with the XML Editor

Ensure that you have included the HDFS namespaces in your configuration file.

<mule xmlns:mulexml="http://www.mulesoft.org/schema/mule/xml" xmlns:hdfs="http://www.mulesoft.org/schema/mule/hdfs"
      xmlns:http="http://www.mulesoft.org/schema/mule/http"
      xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
      xmlns:spring="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.mulesoft.org/schema/mule/core" xsi:schemaLocation="http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
http://www.mulesoft.org/schema/mule/hdfs http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-current.xsd
http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
http://www.mulesoft.org/schema/mule/xml http://www.mulesoft.org/schema/mule/xml/current/mule-xml.xsd">

Follow these steps to configure an HDFS connector in your application:

  1. Create a global HDFS configuration outside and above your flows, using the following global configuration code.

    <!-- Simple configuration -->
    <hdfs:config nameNodeUri="${mule.HDFS.nameNodeUri}" username="${mule.HDFS.username}"/>
    • name: A name for the configuration so that it can be referenced later by config-ref (the simple configuration above omits it).

    • nameNodeUri: The host name or IP address of the master node of the Hadoop cluster.

    • username: The Hadoop FileSystem username.
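
If you configured connection pooling or reconnection in Anypoint Studio (steps 4 and 5 of Configuring a Global Element), the equivalent settings live as child elements of hdfs:config. The snippet below is a sketch only: it assumes the DevKit-generated hdfs:connection-pooling-profile child element and Mule's core reconnect strategy element, with illustrative attribute values.

<hdfs:config name="hdfs-conf" nameNodeUri="${mule.HDFS.nameNodeUri}" username="${mule.HDFS.username}">
    <!-- Assumed pooling settings: up to 10 active connections; wait up to 60s when the pool is exhausted -->
    <hdfs:connection-pooling-profile maxActive="10" maxIdle="5" maxWait="60000" exhaustedAction="WHEN_EXHAUSTED_WAIT"/>
    <!-- Core Mule reconnection strategy: retry every 3 seconds, up to 5 times -->
    <reconnect frequency="3000" count="5"/>
</hdfs:config>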

Using the Connector

The HDFS connector is operation-based: when you add the connector to your flow, you configure the specific operation for it to perform. The connector currently supports the following operations (a minimal usage sketch follows the table):

  • Append to File (<hdfs:append>): Append the current payload to a file located at the designated path.

  • Copy from Local File (<hdfs:copy-from-local-file>): Copy the source file on the local disk to the FileSystem at the given target path. Set deleteSource if the source should be removed.

  • Copy to Local File (<hdfs:copy-to-local-file>): Copy the source file on the FileSystem to the local disk at the given target path. Set deleteSource if the source should be removed.

  • Delete Directories (<hdfs:delete-directory>): Delete the file or directory located at the designated path.

  • Delete File (<hdfs:delete-file>): Delete the file or directory located at the designated path.

  • Get Path Meta Data (<hdfs:get-metadata>): Get the metadata of a path and store it in flow variables.

  • Glob Status (<hdfs:glob-status>): Return all the files that match the file pattern and are not checksum files.

  • List Status (<hdfs:list-status>): List the statuses of the files and directories in the given path, if the path is a directory.

  • Make Directories (<hdfs:make-directories>): Make the given file and all non-existent parents into directories.

  • Read from Path (<hdfs:read>): Read the content of a file designated by its path and stream it to the rest of the flow. Also adds the HDFS_PATH_EXISTS and HDFS_CONTENT_SUMMARY inbound properties based on the path metadata.

  • Rename (<hdfs:rename>): Rename the source path to the destination path.

  • Set Owner (<hdfs:set-owner>): Set the owner of a path, which can be a path to a file or a directory.

  • Set Permission (<hdfs:set-permission>): Set the permission of a path, which can be a path to a file or a directory.

  • Write to Path (<hdfs:write>): Write the current payload to the designated path, either creating a new file or appending to an existing one.
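
For example, the following flow sketch streams the content of an HDFS file back to an HTTP caller. It assumes a global element named hdfs-conf and the HTTP_Listener_Configuration global element defined in the use cases below; only the hdfs:read element itself comes from the operations table above.

<flow name="Read_File_Flow">
    <!-- Assumes the HTTP listener global configuration from the use cases below -->
    <http:listener config-ref="HTTP_Listener_Configuration" path="/" doc:name="HTTP"/>
    <!-- Read the file designated by the ?path= query parameter and stream it to the rest of the flow -->
    <hdfs:read config-ref="hdfs-conf" path="#[message.inboundProperties['http.query.params'].path]" doc:name="Read from Path"/>
</flow>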

Adding the Connector to a Mule Flow

  1. Create a new Anypoint Studio project.

  2. Begin the flow with any Mule inbound endpoint, such as an HTTP listener.

  3. Drag the HDFS connector onto the canvas, then select it to open the Properties Editor console.

  4. Configure the parameters of the connector according to the table below.

    • Display Name: Enter a unique label for the connector in your application. Default: HDFS.

    • Connector Configuration: Connect to a global element linked to this connector. Global elements encapsulate reusable data about the connection to the target resource or service. Select the global HDFS connector element that you just created. No default.

    • Operation: Select the action this component must perform. No default.

  5. Save your configurations.

Use Cases 

The following are two common use cases for the HDFS connector:

  • Creating a file in an Apache Hadoop instance using a Mule application.

  • Deleting a file from an Apache Hadoop instance using a Mule application.

Example: Use Case 1

Create a file in a Hadoop instance using a Mule application:


  1. In Anypoint Studio, drag an HTTP connector into the canvas, and select it to open the properties editor console.

  2. Add a new HTTP Listener Configuration global element:

    1. In General Settings, click the Add button.


    2. Configure the following HTTP parameters:


      • Port: 8090

      • Path: filecreate

      • Host: localhost

      • Display Name: HTTP_Listener_Configuration

  3. Reference the HTTP Listener Configuration global element in the connector's Connector Configuration field.

  4. Add a Logger component to print the name of the file to be created in the Mule console. Configure the Logger according to the table below.

    • Display Name: Write to Path Log (or any other name you prefer)

    • Message: Creating file: #[message.inboundProperties['http.query.params'].path] with message: #[message.inboundProperties['http.query.params'].msg]

    • Level: INFO (Default)

  5. Add a Set Payload transformer to set the message input as the payload, configuring it according to the table below.

    • Display Name: Set the message input as payload (or any other name you prefer)

    • Value: #[message.inboundProperties['http.query.params'].msg]

  6. Drag the HDFS connector onto the canvas, and select it to open the properties editor console.

  7. Click the plus sign next to the Connector Configuration field to add a new global connector configuration.

  8. Configure the global element according to the table below.

    • Name: hdfs-conf

    • NameNode URI: <NameNode URI of your Hadoop instance>

    • Username: <Your Hadoop FileSystem username>

  9. Back in the properties editor of the HDFS connector in your application, configure the remaining parameters according to the table below.

    • Display Name: Write to Path (or any other name you prefer)

    • Connector Configuration: hdfs-conf (the name of the global element you just created)

    • Operation: Write to path

    • Path: #[message.inboundProperties['http.query.params'].path]

  10. Run the project as a Mule Application (right-click the project name and click Run As > Mule Application).

  11. From a browser, navigate to http://localhost:8090/filecreate?path=<target file path>&msg=<message>, replacing <target file path> and <message> with the path of the file to create and its content.

  12. Mule conducts the query, and creates the file in Hadoop with the specified message.

XML Editor


  1. Add an hdfs:config global element to your project, and configure its attributes according to the table below.

    <hdfs:config name="hdfs-conf" doc:name="HDFS" username="<username>" nameNodeUri="<NameNode URI>"/>
    • name: hdfs-conf

    • doc:name: HDFS

    • username: <Your Hadoop FileSystem username>

    • nameNodeUri: <NameNode URI of your Hadoop instance>

  2. Add an http:listener-config element as shown below.

    <http:listener-config name="HTTP_Listener_Configuration" host="localhost" port="8090" basePath="filecreate" doc:name="HTTP Listener Configuration"/>
    <http:connector name="HTTP_HTTPS" cookieSpec="netscape" validateConnections="true" sendBufferSize="0" receiveBufferSize="0" receiveBacklog="0" clientSoTimeout="10000" serverSoTimeout="10000" socketSoLinger="0" doc:name="HTTP-HTTPS"/>
    • name: HTTP_Listener_Configuration

    • host: localhost

    • port: 8090

    • basePath: filecreate

    • doc:name: HTTP Listener Configuration

  3. Begin the flow with an http:listener element.

    <http:listener config-ref="HTTP_Listener_Configuration" path="/" doc:name="HTTP"/>
    • config-ref: HTTP_Listener_Configuration

    • path: /

    • doc:name: HTTP

  4. Add a Logger component to your flow, configuring the attributes according to the table below.

    <logger message="Creating file: #[message.inboundProperties['http.query.params'].path] with message: #[message.inboundProperties['http.query.params'].msg]" level="INFO" doc:name="Write to Path Log"/>

    • message: Creating file: #[message.inboundProperties['http.query.params'].path] with message: #[message.inboundProperties['http.query.params'].msg]

    • level: INFO (Default)

    • doc:name: Write to Path Log

  5. Add a Set Payload transformer to set the message input as payload.

    <set-payload value="#[message.inboundProperties['http.query.params'].msg]" doc:name="Set the message input as payload"/>
    • value: #[message.inboundProperties['http.query.params'].msg]

    • doc:name: Set the message input as payload

  6. Add an hdfs:write element to your flow, configuring the attributes according to the table below.
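
    As in the full example code below, the element looks like this:

    <hdfs:write config-ref="hdfs-conf" path="#[message.inboundProperties['http.query.params'].path]" doc:name="Write to Path"/>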

    • config-ref: hdfs-conf

    • doc:name: Write to Path

    • path: #[message.inboundProperties['http.query.params'].path]

  7. Run the project as a Mule Application (right-click project name and click Run As > Mule Application).

  8. From a browser, navigate to http://localhost:8090/filecreate?path=<target file path>&msg=<message>.

  9. Mule conducts the query, and creates the file in Hadoop with the specified message.

Example Code

<mule xmlns:tracking="http://www.mulesoft.org/schema/mule/ee/tracking" xmlns:mulexml="http://www.mulesoft.org/schema/mule/xml" xmlns:hdfs="http://www.mulesoft.org/schema/mule/hdfs"
      xmlns:http="http://www.mulesoft.org/schema/mule/http"
      xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
      xmlns:spring="http://www.springframework.org/schema/beans"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.mulesoft.org/schema/mule/core" xsi:schemaLocation="http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
http://www.mulesoft.org/schema/mule/hdfs http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-current.xsd
http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
http://www.mulesoft.org/schema/mule/xml http://www.mulesoft.org/schema/mule/xml/current/mule-xml.xsd
http://www.mulesoft.org/schema/mule/ee/tracking http://www.mulesoft.org/schema/mule/ee/tracking/current/mule-tracking-ee.xsd">
<hdfs:config name="hdfs-conf" nameNodeUri="<Name node URI>" username="<FileSystem Username>" doc:name="HDFS"/>
<http:listener-config name="HTTP_Listener_Configuration" host="localhost" port="8090" basePath="filecreate" doc:name="HTTP Listener Configuration"/>
<http:connector name="HTTP_HTTPS" cookieSpec="netscape" validateConnections="true" sendBufferSize="0" receiveBufferSize="0" receiveBacklog="0" clientSoTimeout="10000" serverSoTimeout="10000" socketSoLinger="0" doc:name="HTTP-HTTPS"/>
<flow name="Create_File_Flow" doc:name="Create_File_Flow">
<http:listener config-ref="HTTP_Listener_Configuration" path="/" doc:name="HTTP"/>
<logger message="Creating file: #[message.inboundProperties['http.query.params'].path] with message: #[message.inboundProperties['http.query.params'].msg]" level="INFO" doc:name="Write to Path Log"/>
<set-payload value="#[message.inboundProperties['http.query.params'].msg]" doc:name="Set the message input as payload"/>
<hdfs:write config-ref="hdfs-conf" path="#[message.inboundProperties['http.query.params'].path]" doc:name="Write to Path"/>
</flow>
</mule>

Example: Use Case 2

Delete a file from a Hadoop instance using a Mule application:


  1. Drag an HTTP connector into the canvas, then select it to open the properties editor console.

  2. Add a new HTTP Listener Configuration global element:

    1. In General Settings, click the Add button.


    2. Configure the following HTTP parameters:


      • Port: 8090

      • Path: filedelete

      • Host: localhost

      • Display Name: HTTP_Listener_Configuration

  3. Reference the HTTP Listener Configuration global element in the connector's Connector Configuration field.

  4. Add a Logger component after the HTTP listener to print the name of the file to be deleted in the Mule console. Configure the Logger according to the table below.

    • Display Name: Delete file log (or any other name you prefer)

    • Message: Deleting file: #[message.inboundProperties['http.query.params'].path]

    • Level: INFO (Default)

  5. Drag an HDFS connector onto the canvas, and click it to open the properties editor console.

  6. Click the plus sign next to the Connector Configuration field to add a new global connector configuration.

  7. Configure the global element according to the table below.

    • Name: hdfs-conf

    • NameNode URI: <NameNode URI of your Hadoop instance>

    • Username: <Your Hadoop FileSystem username>

  8. Back in the properties editor of the HDFS connector in your application, configure the remaining parameters according to the table below.

    • Display Name: Delete file (or any other name you prefer)

    • Connector Configuration: hdfs-conf (the name of the global element you just created)

    • Operation: Delete file

    • Path: #[message.inboundProperties['http.query.params'].path]

  9. Run the project as a Mule Application (right-click project name, and click Run As > Mule Application).

  10. From a browser, navigate to http://localhost:8090/filedelete?path=<target file path>.

  11. Mule conducts the query, and deletes the file from Hadoop.

XML Editor


  1. Add an hdfs:config global element to your project, then configure its attributes according to the table below.

    <hdfs:config name="hdfs-conf" doc:name="HDFS" username="<username>" nameNodeUri="<NameNode URI>"/>
    • name: hdfs-conf

    • doc:name: HDFS

    • username: <Your Hadoop FileSystem username>

    • nameNodeUri: <NameNode URI of your Hadoop instance>

  2. Add an http:listener-config element as follows:

    <http:listener-config name="HTTP_Listener_Configuration" host="localhost" port="8090" basePath="filedelete" doc:name="HTTP Listener Configuration"/>
    <http:connector name="HTTP_HTTPS" cookieSpec="netscape" validateConnections="true" sendBufferSize="0" receiveBufferSize="0" receiveBacklog="0" clientSoTimeout="10000" serverSoTimeout="10000" socketSoLinger="0" doc:name="HTTP-HTTPS"/>
    • name: HTTP_Listener_Configuration

    • host: localhost

    • port: 8090

    • basePath: filedelete

    • doc:name: HTTP Listener Configuration

  3. Begin the flow with an http:listener element.

    <http:listener config-ref="HTTP_Listener_Configuration" path="/" doc:name="HTTP"/>
  4. Add a Logger component to your flow, configuring the attributes according to the table below.

    <logger message="Deleting file: #[message.inboundProperties['http.query.params'].path]" level="INFO" doc:name="Delete file log"/>
    • message: Deleting file: #[message.inboundProperties['http.query.params'].path]

    • level: INFO (Default)

    • doc:name: Delete file log

  5. Add an hdfs:delete-file element to your flow, configuring the attributes according to the table below.

    <hdfs:delete-file config-ref="hdfs-conf" doc:name="Delete file" path="#[message.inboundProperties['http.query.params'].path]"/>
    • config-ref: hdfs-conf

    • doc:name: Delete file

    • path: #[message.inboundProperties['http.query.params'].path]

  6. Run the project as a Mule Application (right-click project name, then select Run As > Mule Application).

  7. From a browser, navigate to http://localhost:8090/filedelete?path=<target file path>.

  8. Mule conducts the query, and deletes the file from Hadoop.

Example Code

<mule xmlns:tracking="http://www.mulesoft.org/schema/mule/ee/tracking"
xmlns:mulexml="http://www.mulesoft.org/schema/mule/xml"
xmlns:hdfs="http://www.mulesoft.org/schema/mule/hdfs"
xmlns:http="http://www.mulesoft.org/schema/mule/http"
xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
xmlns:spring="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.mulesoft.org/schema/mule/core"
xsi:schemaLocation="http://www.mulesoft.org/schema/mule/http
http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
http://www.mulesoft.org/schema/mule/hdfs
http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-current.xsd
http://www.mulesoft.org/schema/mule/core
http://www.mulesoft.org/schema/mule/core/current/mule.xsd
http://www.mulesoft.org/schema/mule/xml
http://www.mulesoft.org/schema/mule/xml/current/mule-xml.xsd
http://www.mulesoft.org/schema/mule/ee/tracking
http://www.mulesoft.org/schema/mule/ee/tracking/current/mule-tracking-ee.xsd
">
<hdfs:config name="hdfs-conf" nameNodeUri="<Name node URI>" username="<FileSystem Username>" doc:name="HDFS"/>
<http:listener-config name="HTTP_Listener_Configuration" host="localhost" port="8090" basePath="filedelete" doc:name="HTTP Listener Configuration"/>
<http:connector name="HTTP_HTTPS" cookieSpec="netscape" validateConnections="true" sendBufferSize="0" receiveBufferSize="0" receiveBacklog="0" clientSoTimeout="10000" serverSoTimeout="10000" socketSoLinger="0" doc:name="HTTP-HTTPS"/>
<flow name="Delete_File_Flow" doc:name="Delete_File_Flow">
<http:listener config-ref="HTTP_Listener_Configuration" path="/" doc:name="HTTP"/>
<logger message="Deleting file: #[message.inboundProperties['http.query.params'].path]" level="INFO" doc:name="Delete file log"/>
<hdfs:delete-file config-ref="hdfs-conf" doc:name="Delete file" path="#[message.inboundProperties['http.query.params'].path]"/>
</flow>
</mule>