Free MuleSoft CONNECT Keynote & Expo Pass Available!

Register now+
Nav

Using XPath with the XML Module

The <xml-module:xpath-extract> operation supports the evaluation of XPath expressions.

Because XPath expressions can match any number of individual elements, this operation returns a List of Strings. If no element matches the expression, the operation returns an empty list.

XPath expressions are also namespace-aware, so this operation allows namespace mappings. These mappings are merged with any namespaces defined in the referenced namespace-directory, meaning that the evaluation combines both sets of namespace mappings.

Note that though MuleSoft supports the XPath standard, DataWeave is capable of achieving the same use cases and is the recommended tool for extracting and transforming XML documents.

This example shows how to work with XML that contains the script of the play Othello by William Shakespeare. The XML file looks like this:


         
      
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
<?xml version="1.0"?>
<!--<!DOCTYPE PLAY SYSTEM "play.dtd">-->
<PLAY>
    <TITLE>The Tragedy of Othello, the Moor of Venice</TITLE>
    <FM>
        <P>Text placed in the public domain by Moby Lexical Tools, 1992.</P>
        <P>SGML markup by Jon Bosak, 1992-1994.</P>
        <P>XML version by Jon Bosak, 1996-1998.</P>
        <P>This work may be freely copied and distributed worldwide.</P>
    </FM>
    <PERSONAE>
        <TITLE>Dramatis Personae</TITLE>

        <PERSONA>DUKE OF VENICE</PERSONA>
        <PERSONA>BRABANTIO, a senator.</PERSONA>
        <PERSONA>Other Senators.</PERSONA>
        <PERSONA>GRATIANO, brother to Brabantio.</PERSONA>
        <PERSONA>LODOVICO, kinsman to Brabantio.</PERSONA>
        <PERSONA>OTHELLO, a noble Moor in the service of the Venetian state.</PERSONA>
        <PERSONA>CASSIO, his lieutenant.</PERSONA>
        <PERSONA>IAGO, his ancient.</PERSONA>
        <PERSONA>RODERIGO, a Venetian gentleman.</PERSONA>
        <PERSONA>MONTANO, Othello's predecessor in the government of Cyprus.</PERSONA>
        <PERSONA>Clown, servant to Othello. </PERSONA>
        <PERSONA>DESDEMONA, daughter to Brabantio and wife to Othello.</PERSONA>
        <PERSONA>EMILIA, wife to Iago.</PERSONA>
        <PERSONA>BIANCA, mistress to Cassio.</PERSONA>
        <PERSONA>Sailor, Messenger, Herald, Officers, Gentlemen, Musicians, and Attendants.</PERSONA>
    </PERSONAE>

    <SCNDESCR>SCENE  Venice: a Sea-port in Cyprus.</SCNDESCR>
    <PLAYSUBT>OTHELLO</PLAYSUBT>
    <ACT><TITLE>ACT I</TITLE>
    <SCENE><TITLE>SCENE I.  Venice. A street.</TITLE>
    <STAGEDIR>Enter RODERIGO and IAGO</STAGEDIR>
    <SPEECH>
        <SPEAKER>RODERIGO</SPEAKER>
        <LINE>Tush! never tell me; I take it much unkindly</LINE>
        <LINE>That thou, Iago, who hast had my purse</LINE>
        <LINE>As if the strings were thine, shouldst know of this.</LINE>
    </SPEECH>

.....
<!-- A LOT MORE CONTENT -->

Now assume that you have the text of the entire play in the format shown above. You can use this script to extract all the lines that contain the word handkerchief:

XPath Script

         
      
1
2
3
<xml-module:xpath-extract xpath="//LINE[contains(., $word)]"> (1)
    <xml-module:context-properties>#[{'word': 'handkerchief'}]</xml-module:context-properties> (2)
</xml-module:xpath-extract>
1 Provides an XPath expression that uses the $ prefix to reference context properties.
2 Uses DataWeave in the context-properties parameter to produce the context properties.

The XPath script above outputs the following List of Strings, where each LINE is a different item in the list:

Output
<LINE>For the same handkerchief?</LINE>
<LINE>What handkerchief?</LINE>
<LINE>What handkerchief?</LINE>
<LINE>Have you not sometimes seen a handkerchief</LINE>
<LINE>I know not that; but such a handkerchief--</LINE>
<LINE>Where should I lose that handkerchief, Emilia?</LINE>
<LINE>Lend me thy handkerchief.</LINE>
<LINE>That handkerchief</LINE>
<LINE>Fetch me the handkerchief: my mind misgives.</LINE>
<LINE>The handkerchief!</LINE>
<LINE>The handkerchief!</LINE>
<LINE>The handkerchief!</LINE>
<LINE>Sure, there's some wonder in this handkerchief:</LINE>
<LINE>But if I give my wife a handkerchief,--</LINE>
<LINE>But, for the handkerchief,--</LINE>
<LINE>Boding to all--he had my handkerchief.</LINE>
<LINE>--Handkerchief--confessions--handkerchief!--To</LINE>
<LINE>--Is't possible?--Confess--handkerchief!--O devil!--</LINE>
<LINE>mean by that same handkerchief you gave me even now?</LINE>
<LINE>By heaven, that should be my handkerchief!</LINE>
<LINE>And did you see the handkerchief?</LINE>
<LINE>That handkerchief which I so loved and gave thee</LINE>
<LINE>By heaven, I saw my handkerchief in's hand.</LINE>
<LINE>I saw the handkerchief.</LINE>
<LINE>It was a handkerchief, an antique token</LINE>
<LINE>O thou dull Moor! that handkerchief thou speak'st of</LINE>
<LINE>How came you, Cassio, by that handkerchief</LINE>

Note that by default, the operation attempts to transform an XML document at the message payload level. However, you can use the content parameter to supply the input document:


         
      
1
2
3
4
5
6
7
<flow name="linesFromOthello">
    <file:read path="othello.xml" target="othello" />
    <xml-module:xpath-extract xpath="//LINE[contains(., $word)]">
        <xml-module:content>#[vars.othello]</xml-module:content>
        <xml-module:context-properties>#[{'word': 'handkerchief'}]</xml-module:context-properties>
    </xml-module:xpath-extract>
</flow>

The example above gets the content from elsewhere (in this case, from a file in the filesystem) and then references it using a simple expression.

Mapping Namespaces

There are cases in which the input XML document contains elements with different namespaces. For example, consider the following document:


         
      
1
2
3
4
5
6
7
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body foo="bar">
    <ns1:echo xmlns:mule="http://simple.component.mule.org/">
      <ns1:echo>Hello!</ns1:echo>
    </ns1:echo>
  </soap:Body>
</soap:Envelope>

The next example maps each prefix used in the XPath expressions to the corresponding namespace uri.


         
      
1
2
3
4
5
6
7
8
 <flow name="xpathWithInlineNs">
    <xml-module:xpath-extract xpath="/soap:Envelope/soap:Body/mule:echo/mule:echo">
        <xml-module:namespaces>
            <xml-module:namespace prefix="soap" uri="http://schemas.xmlsoap.org/soap/envelope/"/>
            <xml-module:namespace prefix="mule" uri="http://simple.component.mule.org/"/>
        </xml-module:namespaces>
    </xml-module:xpath-extract>
</flow>

But what happens if you need to execute several XPath expressions that use the same namespaces? To avoid performing the mapping each time, you can create a namespace-directory for the mappings and then reference that directory, for example:


         
      
1
2
3
4
5
6
7
8
9
10
11
12
<xml-module:namespace-directory name="fullNs"> (1)
    <xml-module:namespaces>
        <xml-module:namespace prefix="soap" uri="http://schemas.xmlsoap.org/soap/envelope/"/>
        <xml-module:namespace prefix="mule" uri="http://simple.component.mule.org/"/>
    </xml-module:namespaces>
</xml-module:namespace-directory>

<flow name="xpathWithFullNs">
    <xml-module:xpath-extract
      xpath="/soap:Envelope/soap:Body/mule:echo/mule:echo"
      namespaceDirectory="fullNs"/> (2) (3)
</flow>
1 The namespace-directory element is used to map prefixes to the actual namespace URIs. Notice these prefixes should match those used in the input document.
2 You can then reference those prefixes in your XPath expression.
3 Finally, use the namespaceDirectory parameter to reference the mapping created in step 1.

Finally, you can combine use cases. For example, you can have a global namespaceDirectory that contains some mappings and then add additional ones at the operation level. This combination is useful if you have a lot of documents that, for example, all contain the soap namespace but only one of them contains the mule namespace:


         
      
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
<xml-module:namespace-directory name="partialNs"> (1)
    <xml-module:namespaces>
        <xml-module:namespace prefix="soap" uri="http://schemas.xmlsoap.org/soap/envelope/"/>
    </xml-module:namespaces>
</xml-module:namespace-directory>

<flow name="xpathWithMergedNs">
    <xml-module:xpath-extract
      xpath="/soap:Envelope/soap:Body/mule:echo/mule:echo"
      namespaceDirectory="partialNs">  (2) (3)
        <xml-module:namespaces>
            <xml-module:namespace prefix="mule" uri="http://simple.component.mule.org/"/> (4)
        </xml-module:namespaces>
    </xml-module:xpath-extract>
</flow>
1 Declare a namespace-directory like before, but only supply the common namespaces.
2 Provide your XPath expression.
3 Reference the partial namespace directory.
4 Provide the additional mapping.

It is important to note that the prefixes used in the mappings and XPath expressions must match the ones used in the input document.

Using XPath as a Function

The XML module provides a DataWeave function for extracting values using XPath. This is useful in cases such as a <choice> or <foreach> routers.

Note that can also use this function inside any DataWeave transformation.

Using the XPath Function with <foreach>

Returning to the Othello lines example, imagine that you want to iterate over all those lines and process them separately:


          
       
1
2
3
<foreach collection="#[XmlModule::xpath('//LINE', payload, {})]">
    <flow-ref name="processLine" />
</foreach>
  • The first argument is the XPath expression.

  • The second one is the input document, in this case, the message payload.

  • The third one is the context properties. Because this case does not require any, the function passes an empty object ({}).

Using XPath Function with <choice>

Returning to the Othello lines example, suppose you want to do something if the input document does not contain the word handkerchief.


          
       
1
2
3
4
5
<choice>
    <when expression="#[isEmpty(XmlModule::xpath('//LINE[contains(., \$word)]', vars.untrustedOthello, {'word': 'handkerchief'}))]">
        <flow-ref name="alteredOthello" />
    </when>
</choice>
  • Since the XmlModule::xpath function returns a list, the example uses the DataWeave isEmpty() function to test whether the output is empty or not.

  • The first argument of the XPath function is an expression that uses the $word context property.

  • The second argument is the input document within a variable.

  • The third argument provides the context properties values.

See Also