Nav

Querying a Database

When you configure a Database operation using the Database Connector, there are several ways to add variable values to the SQL statement you execute in the database.

Using Input Parameters to Protect Databasae Queries

The select operation is used to retrieve information from the RDBMS. The primary concept of this operation is that you will supply a SQL query and use DataWeave to supply the parameters:


         
      
1
2
3
4
5
6
7
8
<flow name="selectParameterizedQuery">
  <db:select config-ref="dbConfig">
    <db:sql>SELECT * FROM PLANET WHERE name = :name</db:sql>
    <db:input-parameters>
      #[{'name' : payload}]
    </db:input-parameters>
  </db:select>
</flow>

As you can see in the above example, input parameters are supplied as key-value pairs, which we can now create by embedding a DataWeave script. Those keys are used in conjunction with the semicolon character (:) to reference to a parameter value by name. This is the recommended approach for using parameters in your query.

The alternative is to directly write ` <db:sql>SELECT * FROM PLANET WHERE name = #[payload] </db:sql>`, but this is a very dangerous practice that is not recommended.

The advantages of using input parameters to configure the WHERE clause in a SELECT statement this way are:

  • The query becomes immune to SQL injection attacks

  • The connector can perform optimizations that are not possible otherwise, which improves the application’s overall performance

DataSense is available for the operation’s input and output. By analyzing the query, the connector will automatically calculate the structure of the query’s output by analyzing the projection section of the SQL statement. At the same time, by comparing the conditions in the WHERE clause to the table structure, it will also generate input DataSense to help you build the DataWeave script that will generate the input parameters.

Dynamic Queries

Sometimes, you not only need to parameterize the WHERE clause, but also parameterize parts of the query itself. Example use cases for this would be queries which need to hit online vs. historic tables depending on a condition, or complex queries where the project columns need to vary.

In this example, you can see how a full expression is used to produce the query by building a string in which the table depends on a variable $(vars.table). An important thing to notice is that although some of the query text is dynamic "SELECT * FROM $(vars.table), the WHERE clause is still using the best practice of defining the WHERE condition using input parameters, in this case WHERE name = :name:


         
      
1
2
3
4
5
6
7
<set-variable variableName="table" value="PLANET"/>
<db:select config-ref="dbConfig">
    <db:sql>#["SELECT * FROM $(vars.table) WHERE name = :name"]</db:sql>
    <db:input-parameters>
        #[{'name' : payload}]
    </db:input-parameters>
</db:select>
In Mule 3, the concept of select was split in parameterized and dynamic queries, and you couldn’t use both at the same time. You had to choose between having a dynamic query or having the advantages of using parameters (SQL Injection protection, PreparedStatement optimization, etc.). Furthermore, the syntax to do one or the other was different, so you had to learn two different ways of doing the same thing. But with the new Database Connector in Mule 4, you can now use both methods at the same time by using expressions in the query.

Why do I need dynamic queries at all for the example above? Can I just treat the table like another Input Parameter? Then answer is no. Input Parameters can only be applied to parameters in a WHERE clause. To modify any other part of the query, you need to use DW’s interpolation operator.

Streaming large results

Database tables tend to be big. One single query might return tens of thousands of records, especially when dealing with integration use cases. Streaming is a great solution for this. What does streaming mean? Suppose you have a query which returns 10K rows, attempting to fetch all those rows at once will result in the following:

  • Performance degradation, since that’s a big pull from the network

  • A risk of running out of memory, since all that information needs to be loaded into RAM.

Streaming means that the connector will not fetch the 10K rows at once; instead, it will fetch a smaller chunk, and once that chunk has been consumed it will go fetch the rest. That way, you can reduce pressure over the network and memory.

Streaming in Mule 4 vs Streaming in Mule 3

In Mule 3.x this was something you had to specifically enable because it was disabled by default. In Mule 4, this is transparent and always enabled, you don’t have to worry about it anymore, you can simply trust that the feature is there.

Another improvement from Mule 3 is that we can now use the new repeatable streams mechanism from Mule 4. That means that streams are now repeatable and you can have DataWeave and other components process the same stream many times, even in parallel. For more information on repeatable streams, please read the Streaming in Mule 4.0 article.

Limiting Results

Just because the Mule Runtime gives the connector to grafecully handle streaming, it doesn’t mean it’s a good idea to move large chunks of data from the Database to Mule. Even with streaming, a typical SQL query can return many rows, each one containing lots of information. This generates:

  • Large network overhead

  • Large resources footprint on the Database server

  • Large memory and requirements in the Mule Runtime

The select operation provides to parameters to help with this: fetchSize and maxRows:


          
       
1
2
3
<db:select fetchSize="200" maxRows="1000" config-ref="dbConfig">
  <db:sql>select * from some_table</db:sql>
</db:select>

In this example, we are telling the connector to not fetch more that 1000 total rows as the result of an SQL SELECT query, but to try to only stream results from the Database to the Database Connector 200 at a time. So in this example it would require 5 separate network round trips to retrieve 1000 matching rows.

The combination limits the total amount of information that will be retrieved (the maxRows value), and also guarantees that the data is returned from the Database over the network in smaller chunks (the fetchSize value).

The fetchSize is a hint to the JDBC driver and need not always be enforced. The behavior varies between JDBC driver providers. Often, the default fetchSize for a JDBC driver is 10.

Setting a Query Timeout

Query Timeout Reference

Sometimes database queries take a long time to execute. The following factors often cause delays in query execution:

  • An inefficient query, such as one having improper indexing that iterates over many rows

  • A busy RDBMS or network

  • A lock contention

Generally, it’s recommended to set a timeout on the query. To manage timeouts, configure queryTimeout and queryTimeoutUnit. The following example shows how to set a timeout for the Select operation, but all operations support setting a timeout:


         
      
1
2
3
<db:select queryTimeout="0" queryTimeoutUnit="SECONDS" config-ref="dbConfig">
   <db:sql>select * from some_table</db:sql>
</db:select>