Contact Us 1-800-596-4880

Spike Control Policy

Policy Name

Spike Control


Regulates API traffic


Quality of Service

First Flex Gateway version available


Returned Status Codes

429 - The number of requests by HTTP APIs exceeded the configured limit. Request rejected after specified number of reattempts.


The Spike Control policy regulates your API request traffic by limiting the number of messages processed by an API. The policy ensures that the number of messages processed within a specified time does not exceed the limit that you configure. If the number is exceeded, the request is queued for retry based on you have configured the policy.

Configuring Policy Parameters

The Spike Control policy does not perform quota enforcement. The configuration parameters are restricted to protect the API and backend. Therefore, to ensure maximum performance, configure these parameters to the lowest possible value.

Flex Gateway Local Mode

In Local Mode, you apply the Spike Control policy to your API via declarative configuration files. Refer to the following policy definition and table of parameters:

- policyRef:
    name: spike-control-flex
    maximumRequests: <int> // REQUIRED, default: 1
    timePeriodInMilliseconds: <int> // REQUIRED, default: 1000
    delayTimeInMillis: <int> // REQUIRED, default: 1000
    delayAttempts: <int> // REQUIRED, default: 1
    queuingLimit: <int> // OPTIONAL, default: 0
    exposeHeaders: <bool> // OPTIONAL, default: false
Parameter Required or Optional Default Value Description




Number of allowed requests in the time window




Time window size




The amount of time for which each request is retained before retrying, in the event there is no quota remaining




The number of times a request is retried before it is rejected




The number of requests that can be queued at the any given time




Enable it only for internal APIs. Allows the policy to return information about the algorithm behavior in the X-RateLimit headers

Resource Configuration Example

- policyRef:
    name: spike-control-flex
    queuingLimit: 0
    exposeHeaders: true
    delayTimeInMillis: 1000
    timePeriodInMilliseconds: 1000
    delayAttempts: 1
    maximumRequests: 100000

Flex Gateway Connected Mode

When you apply the policy to your API from the UI, the following parameters are displayed:

Element Description Required

Number of Reqs

The number of requests allowed (in milliseconds) in the specified window


Time Period

The number of milliseconds, within which a request must be processed


Delay Time in Milliseconds

The amount of time for which each request is retained before retrying (in milliseconds) in case there is no quota remaining


Delay Attempts

The number of times a request is retried before it is rejected


Queuing Limit

The number of requests that can be queued at the any given time


Expose Headers

Enable it only for internal APIs, allows the policy to return information about the algorithm behavior in the X-RateLimit headers


Method & Resource conditions

The option to add configurations to only a select few or all methods and resources of the API


To enforce request quotas, see the Rate Limiting: SLA-Based Policy and Rate Limiting Policy.

How This Policy Works

The Spike Control policy uses a Sliding Window Algorithm which reduces spikes in traffic and protects the gateway instance. The window self-adjusts its display size to accommodate results over time.

If the current window has no remaining request quota, without closing the connection to the client, the Spike Control policy allows requests to be queued for processing later. For your policy to queue the requests, you must configure the policy parameters, as shown in the following example:

  • Time Period: 1 second

  • Number of Reqs: 2

  • Delay Time in Milliseconds: 499 milliseconds

  • Delay Attempts: 1

  • Queuing Limit: 5

Queuing a request requires retaining a thread and an HTTP connection. When you specify a Queuing Limit for the policy, the parameter protects the gateway from running out of resources and ensures that the API does not fail in case of an attack.

Request Timelines

The following diagram illustrates the lifespan of each request accepted by the algorithm, using the configuration defined in the previous section:

  1. The first two requests are accepted and the available quota is now 0.

    The quota increases to 1 after one second (the window size value) passes from request #1.

  2. Request #3 arrives.

    Because no quota is available, the request is queued for 499 milliseconds.

  3. Request #4 arrives.

    Because there still is no quota available, the request is queued for 499 milliseconds.

  4. Shortly before request #3 is retried, request #1 ages a second and its quota is released because it falls out of the window.

  5. Quota is now 1, and request #3 is reprocessed successfully.

    Quota returns to 0 until request #2 ages 1 second.

  6. Request #4 delay concludes and it must be reprocessed.

    Request #4 is rejected. Requests #2 and #3 have not aged a second yet and there is no quota available. Request #4 have no more delays.

  7. Request #5 arrives.

    Because request #2 has already aged more than a second, the request is accepted.

This example illustrates request delays and how queuing regulates spikes in traffic. In high-contention scenarios, the backend continues to function properly (it serves no more than X requests in the last Y milliseconds).

Users querying the API might still see their requests being served, but with higher latency. The exact configuration for the policy depends exclusively on your API and its throughput.


When does the sliding window start?

The window starts with the first request after the policy is successfully applied.

What type of window does the algorithm use?

The algorithm uses a sliding window that monitors transactions for the amount of time that you configure in the policy.

What happens when the quota is exhausted?

When the request quota is exhausted, the Spike Control policy queues the request and retries as configured. The request is rejected if no quota is remaining after several retries. Quota for other requests is available only after the first request exceeds the time limit configured in the policy; in this way, requests are processed in cycles.

Can I configure multiple windows?

No. The Spike Control policy ensures that the backend server does not serve more requests than it can handle. If you need accountability, use either Rate-Limiting SLA or Rate Limit policies instead.

What does each response header mean?

Each response header has information about the current state of the request:

  • X-Ratelimit-Remaining: The amount of quota currently available

  • X-Ratelimit-Limit: The maximum available requests per window

  • X-Ratelimit-Reset: The remaining time, in milliseconds, until the oldest request ages enough to fall outside of the sliding window.

    If quota is still available, the header is set to 0 (zero) because the algorithm can still serve new requests.

By default, the X-RateLimit headers are disabled in the response. You can enable these headers by selecting Expose Headers when you configure the policy.

When should I use Rate Limiting instead of Rate Limiting SLA or Spike Control?

Use Rate Limiting and Rate-Limiting SLA policies for accountability and to enforce a hard limit to a group (using the identifier in Rate Limiting) or to a client application (using Rate Limiting-SLA). If you want to protect your backend, use the Spike Control policy instead.