
HDFS (Hadoop) Connector - Mule 4


HDFS Connector v6.0

Anypoint Connector for the Hadoop Distributed File System (HDFS) (HDFS Connector) is used as a bidirectional gateway between Mule applications and HDFS.

Prerequisites

To use this information, you should be familiar with HDFS, Mule, Anypoint Platform, and Anypoint Studio.

To use the HDFS connector, you need:

  • Anypoint Studio version 7.0 or later. If you do not use Anypoint Studio for development, add the connector dependency to your project’s pom.xml file as described in POM File Information.

  • A running instance of the Hadoop Distributed File System.

Compatibility

HDFS Connector is compatible with the following:

Application/Service   Version
Mule Runtime          4.0 or newer
Apache Hadoop         2.8.1 or newer

POM File Information

If you are writing your Mule application in XML manually, add the following dependency to your pom.xml file:

<dependency>
  <groupId>org.mule.connectors</groupId>
  <artifactId>mule-hdfs-connector</artifactId>
  <version>RELEASE</version>
  <classifier>mule-plugin</classifier>
</dependency>

Mule converts RELEASE to the latest version. To specify a version, view Anypoint Exchange and click Dependency Snippets.
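
To pin an explicit version for reproducible builds, replace RELEASE with the version from Exchange. A minimal sketch follows; the version number below is a placeholder, so copy the exact value from Dependency Snippets:

<dependency>
  <groupId>org.mule.connectors</groupId>
  <artifactId>mule-hdfs-connector</artifactId>
  <!-- Placeholder version; use the value shown in Exchange Dependency Snippets -->
  <version>6.0.0</version>
  <classifier>mule-plugin</classifier>
</dependency>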

Add the Connector to Your Project

Anypoint Studio provides two ways to add the connector to your Studio project: from the Exchange button in the Studio taskbar or from the Mule Palette view.

Add the Connector Using Exchange

  1. In Studio, create a Mule project.

  2. Click the Exchange icon (X) in the upper-left of the Studio task bar.

  3. In Exchange, click Login and supply your Anypoint Platform username and password.

  4. In Exchange, search for "hdfs".

  5. Select the connector and click Add to project.

  6. Follow the prompts to install the connector.

Add the Connector in Studio

  1. In Studio, create a Mule project.

  2. In the Mule Palette view, click (X) Search in Exchange.

  3. In Add Modules to Project, type "hdfs" in the search field.

  4. Click this connector’s name in Available modules.

  5. Click Add.

  6. Click Finish.

Configure the Connector Global Element

To use the HDFS connector in your Mule application, you must configure a global HDFS element that the connector can reference. The connector offers the following global configuration options, each requiring its own set of credentials.

Simple Authentication Configuration


NameNode URI

The URI of the file system to connect to. This is passed to the HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY configuration entry. This field can be overridden by Configuration Resources and Configuration Entries.

Username

User identity that Hadoop uses for permissions in HDFS. When Simple Authentication is used, Hadoop requires the Username to be set as a system property called HADOOP_USER_NAME. If this property is not set, Hadoop uses the username of the currently logged-in OS user.

Configuration Resources

A list of configuration resource files to be loaded by the HDFS client. Here you can provide additional configuration files. For example, core-site.xml.

Configuration Entries

A map of configuration entries to be used by the HDFS client. Here you can provide additional configuration entries as key-value pairs.
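
For reference, a global element configured for Simple Authentication looks roughly like the following sketch. The element and attribute names assume the connector’s 6.x schema as generated by Studio; confirm them against your own generated XML.

<!-- Sketch of a global HDFS configuration using Simple Authentication -->
<hdfs:config name="HDFS_Config">
    <!-- nameNodeUri maps to FileSystem#FS_DEFAULT_NAME_KEY; username is the HDFS user -->
    <hdfs:simple-connection nameNodeUri="hdfs://localhost:9000" username="mule"/>
</hdfs:config>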


Kerberos Authentication Configuration


NameNode URI

The URI of the file system to connect to. This is passed to the HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY configuration entry. This field can be overridden by the values in Configuration Resources and Configuration Entries.

Username

Kerberos principal. This is passed to the HDFS client as the hadoop.job.ugi configuration entry. The Username can be overridden by the values in Configuration Resources and Configuration Entries. If not provided, the connector uses the currently logged-in user’s username.

KeytabPath

Path to the keytab file associated with the Username. KeytabPath is used to obtain a ticket-granting ticket (TGT) from the authorization server. If not provided, the connector looks for a TGT associated with the Username in your local Kerberos cache.

Configuration Resources

A list of configuration resource files to be loaded by the HDFS client. You can provide additional configuration files. For example, core-site.xml.

Configuration Entries

A map of configuration entries to be used by the HDFS client. You can provide additional configuration entries as key-value pairs.
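
The Kerberos variant follows the same shape. Again, this is a sketch: the element and attribute names assume the 6.x schema and should be checked against the XML that Studio generates.

<!-- Sketch of a global HDFS configuration using Kerberos authentication -->
<hdfs:config name="HDFS_Kerberos_Config">
    <!-- If keytabPath is omitted, the connector falls back to the local Kerberos cache -->
    <hdfs:kerberos-connection nameNodeUri="hdfs://namenode.example.com:8020"
                              username="hdfs-user@EXAMPLE.COM"
                              keytabPath="hdfs-user.keytab"/>
</hdfs:config>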


Use the Connector

You can use this connector to poll the content of a file at a configurable rate or interval, or to manipulate data in an HDFS server.

Connector Namespace and Schema

When designing your application in Studio, the act of dragging the connector’s operation from the palette into the Anypoint Studio canvas automatically populates the XML code with the connector namespace and schema location.

Namespace: http://www.mulesoft.org/schema/mule/hdfs
Schema Location: http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd

If you are manually coding the Mule application in Studio’s XML editor or other text editor, define the namespace and schema location in the header of your configuration XML, inside the <mule> tag.

<mule xmlns:hdfs="http://www.mulesoft.org/schema/mule/hdfs"
      xmlns:http="http://www.mulesoft.org/schema/mule/http"
      xmlns:ee="http://www.mulesoft.org/schema/mule/ee/core"
      xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.mulesoft.org/schema/mule/http
      http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
      http://www.mulesoft.org/schema/mule/ee/core
      http://www.mulesoft.org/schema/mule/ee/core/current/mule-ee.xsd
      http://www.mulesoft.org/schema/mule/core
      http://www.mulesoft.org/schema/mule/core/current/mule.xsd
      http://www.mulesoft.org/schema/mule/hdfs
      http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd">

    <!-- Flows and global elements go here -->

</mule>

Example Use Case

The following example shows how to create a text file in HDFS using the connector:

  1. In Anypoint Studio, click File > New > Mule Project, name the project, and click OK.

  2. In the Mule Palette search field:

    1. Type "http" and drag the HTTP Listener operation to the canvas.

    2. Click the green plus sign to the right of Connector Configuration.

    3. In the configuration screen, click OK to accept the default settings. Set the listener’s Path to /createFile.

  3. In the Search bar, type "hdfs" and drag the HDFS Write operation to the canvas.

  4. Set Path to /test.txt. This is the path of the file that the connector creates in HDFS. Leave the default values for the other options.

  5. Run the application.

  6. From an HTTP client such as Postman or curl, make a POST request with the header "Content-Type: text/plain" to localhost:8081/createFile, with the content you want to write as the payload. For example: curl -X POST -H "Content-Type: text/plain" -d "payload to write to file" localhost:8081/createFile.

  7. Using the Hadoop file browser or the hdfs dfs -cat /test.txt command, check that the /test.txt file was created and contains your content.
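
Pieced together, the application’s configuration looks roughly like this sketch. The configuration names and the simple-connection element are illustrative assumptions; compare against the XML that Studio generates for you.

<!-- HTTP listener on the default Studio port -->
<http:listener-config name="HTTP_Listener_config">
    <http:listener-connection host="0.0.0.0" port="8081"/>
</http:listener-config>

<!-- Global HDFS element using Simple Authentication (see Configure the Connector Global Element) -->
<hdfs:config name="HDFS_Config">
    <hdfs:simple-connection nameNodeUri="hdfs://localhost:9000" username="mule"/>
</hdfs:config>

<flow name="createFileFlow">
    <http:listener config-ref="HTTP_Listener_config" path="/createFile"/>
    <!-- Writes the incoming request payload to /test.txt in HDFS -->
    <hdfs:write config-ref="HDFS_Config" path="/test.txt"/>
</flow>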
