Rate Limiting and Throttling Reference (Nov 2017 and Jul 2017)

The Rate Limiting policy limits the number of requests an API accepts within a period of time. The API rejects requests that exceed the limit.

You can configure multiple limits in units of time ranging from milliseconds to years.

The Throttling policy is similar to the Rate Limiting policy, except requests that exceed the limits enter a queue for possible processing in a subsequent period. The API eventually rejects the request if unable to process it after a certain number of attempts. You can configure a delay between attempts to retry processing rejected requests, as well as a limit on the number of retries.

The Rate Limiting and Throttling policies impose a limit on all requests. The SLA-based Rate Limiting and Throttling policies impose the limit on those requests specified by the level of access granted to the requesting application.


When you click apply on the API version details page, the policy configuration dialog appears. You set the number of requests, period of time for receiving the requests, and a time unit. For example, in the Limits section of the policy configuration dialog, you can set the following limits.

# of Reqs Time Period Time Unit







In the case of the rate limiting policy, if the API receives 123 requests within 2000 ms, the API rejects further requests, and if the API receives 100 requests within 1000 ms, the API rejects further requests. In the case of the throttling policy, requests are queued instead of rejected.

Adjusting the Rate Limit for Multiple Workers

Under the following conditions, you need to divide the rate limit by the number of workers and configure the result as the rate limit:

  • An application uses multiple workers that cannot share usage data, which is the case in a non-clustered environment (cloud or on-premises).

  • The application workload is equally distributed across the workers.

    Use a load balancer that distributes requests equally.

Rate Limiting and Throttling in a Mule cluster

When you apply rate limiting or throtting to a clustered environment, a component called cluster quota manager is introduced. This component manages the quota you set and distributes the quota among cluster members. Each node must request a quota allowance from the quota manager in order to serve requests. If that quota is exhausted, a node has to request more. If the quota manager doesn’t grant more quota, then the node cannot handle any more requests.

Due to the normal latency in network communication, if you set too short a time period in which to receive a request, the time required to request and receive the quota between a node and the quota manager can exceed the quota lifetime. The time might expire before the node can receive the additional quota. Using a sufficient period of time for receiving a request is recommended when running in a cluster. The minimal amount of time depends on a number of factors, such as workload and network speed. As a rule of thumb, use 1 minute in most cases, and not less than 20 seconds.


Use an HTTP endpoint instead of Jetty if you want to use rate-limited and throttling policies. Due to a limitation in the Jetty transport, rate-limiting and throttling policies do not work on an API that uses a Jetty inbound endpoint.