About Throttling and Rate Limiting Policies
The Rate Limiting policy limits the number of requests an API accepts within a window of time. The API rejects requests that exceed the limit. You can configure multiple limits with window sizes ranging from milliseconds to years.
The Throttling policy queues requests that exceed limits for possible processing in a subsequent window. The API eventually rejects the request if processing cannot occur after a certain number of attempts. You can configure a delay between retries, as well as limit the number of retries.
The Rate Limiting and Throttling policies impose a limit on all requests or a specific resource (in Mule 3.x and earlier, the API must be APIkit-based). The service level access (SLA)-based Rate Limiting and Throttling policies add further granularity, limiting requests by the level of access granted to the requesting application. These policies contain a persistence engine to preserve the current state of the policy in case of sudden restart (power outage).
How you configure the rate or throttling limit depends on whether the policy is SLA-based or not. If the policy is not SLA-based, you configure limits when you apply the policy.
This example sets one limit of 10 requests (quota) per every 2 seconds (window).
If the policy is SLA-based, Mule Runtime fetches all contracts, which defines the relationship between an SLA and the application, for the API.
When you configure Mule Runtime in a cluster, you can configure the policy for distributed access control.
Rate Limiting and Throttling policies are designed to limit API access, but have different intentions: Rate limiting protects an API by applying a hard limit on its access. Throttling shapes API access by smoothing spikes in traffic.