Rate Limiting Advanced - Plugin

Looking for the plugin's configuration parameters? You can find them in the Rate Limiting Advanced configuration reference doc.

Rate limit how many HTTP requests can be made in a given time frame.

The Rate Limiting Advanced plugin offers more functionality than the Kong Gateway (OSS) Rate Limiting plugin, such as:

Enhanced capabilities to tune the rate limiter, provided by the parameters limit and window_size. Learn more in Multiple Limits and Window Sizes
Support for Redis Sentinel, Redis cluster, and Redis SSL
Increased performance: Rate Limiting Advanced offers better throughput with better accuracy. You can trade performance against accuracy by configuring how often counter data is synchronized with the backend storage, using the sync_rate parameter.
More limiting algorithms to choose from: These algorithms are more accurate and they enable configuration with more specificity. Learn more about our algorithms in How to Design a Scalable Rate Limiting Algorithm.
More control over which requests contribute to incrementing the rate limiting counters via the disable_penalty parameter
Consumer groups support: Apply different rate limiting configurations to select groups of consumers. Learn more in Rate limiting for consumer groups

Headers sent to the client

When this plugin is enabled, Kong sends additional headers back to the client indicating the allowed limits, how many requests are still available, and how long it will take until the quota is restored.

For example:

RateLimit-Limit: 6
RateLimit-Remaining: 4
RateLimit-Reset: 47

The plugin also sends headers indicating the limit for each time frame and the number of requests remaining in it:

X-RateLimit-Limit-Minute: 10
X-RateLimit-Remaining-Minute: 9

You can optionally hide the limit and remaining headers with the hide_client_headers option.

If more than one limit is set, the plugin returns headers for each configured time frame:

X-RateLimit-Limit-Second: 5
X-RateLimit-Remaining-Second: 4
X-RateLimit-Limit-Minute: 10
X-RateLimit-Remaining-Minute: 9

If any of the limits configured has been reached, the plugin returns an HTTP/1.1 429 status code to the client with the following JSON body:

{ "message": "API rate limit exceeded" }

The Retry-After header will be present on 429 errors to indicate how long the service is expected to be unavailable to the client. When using window_type=sliding and RateLimit-Reset, Retry-After may increase due to the rate calculation for the sliding window.

The headers RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset are based on the Internet-Draft RateLimit Header Fields for HTTP and may change in the future to respect specification updates.
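
As an illustration of how a client might act on these headers, the following Python sketch decides how long to pause before the next request. The function name seconds_to_wait and the plain-dictionary header representation are assumptions made for this example and are not part of the plugin. It also assumes that Retry-After and RateLimit-Reset carry a number of seconds, as in the examples above.

```python
def seconds_to_wait(status, headers):
    """Return how many seconds a client should pause before its next request.

    `headers` is a plain dict of response headers (hypothetical representation).
    """
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}

    if status == 429:
        # Prefer Retry-After on a 429; fall back to RateLimit-Reset if it is absent.
        return int(h.get("retry-after", h.get("ratelimit-reset", 0)))

    # Not rate limited yet: wait only if the quota is already used up.
    remaining = int(h.get("ratelimit-remaining", 1))
    if remaining <= 0:
        return int(h.get("ratelimit-reset", 0))
    return 0
```

Because the headers can be hidden with hide_client_headers, the sketch treats every header as optional and defaults to not waiting.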

Multiple limits and window sizes

An arbitrary number of limits and window sizes can be applied per plugin instance. This allows you to create multiple rate limiting windows (for example, a rate limit per minute, per hour, and per any arbitrary window size). Because of limitations with Kong’s plugin configuration interface, the nth limit applies to the nth window size. For example:

curl -X POST http://localhost:8001/services/example-service/plugins \
  --data "name=rate-limiting-advanced" \
  --data "config.limit=10" \
  --data "config.limit=100" \
  --data "config.window_size=60" \
  --data "config.window_size=3600"

This example applies two rate limiting policies: one trips when 10 hits have been counted in 60 seconds, and the other trips when 100 hits have been counted in 3600 seconds. For more information, see the Enterprise Rate Limiting Library.

The number of configured window sizes and limits parameters must be equal (as shown above); otherwise, an error occurs:

You must provide the same number of windows and limits
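
The pairing rule can be sketched in Python as follows. This is only an illustration of the nth-limit-to-nth-window behavior described above, not the plugin's implementation. The function names, the use of ValueError, and the hits_per_window input are assumptions made for the example.

```python
def pair_limits(limits, window_sizes):
    """Pair the nth limit with the nth window size (in seconds)."""
    if len(limits) != len(window_sizes):
        # Message copied from the error quoted above; raising ValueError
        # is an assumption of this sketch.
        raise ValueError("You must provide the same number of windows and limits")
    return list(zip(limits, window_sizes))


def exceeded(policies, hits_per_window):
    """Return the (limit, window_size) pairs whose limit has been reached.

    `hits_per_window` maps a window size to the hits counted in that
    window (hypothetical input for illustration).
    """
    return [(limit, size) for limit, size in policies
            if hits_per_window.get(size, 0) >= limit]


policies = pair_limits([10, 100], [60, 3600])
# 10 hits in the last 60 seconds trips the first policy only.
print(exceeded(policies, {60: 10, 3600: 40}))
```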

Window types

The Rate Limiting Advanced plugin supports these window types:

Fixed window: Fixed windows consist of buckets that are statically assigned to a definitive time range. Each request is mapped to only one fixed window based on its timestamp and will affect only that window’s counters.
Sliding window (default): A sliding window tracks the number of hits assigned to a specific key (such as an IP address, consumer, credential) within a given time window, taking into account previous hit rates to create a dynamically calculated rate. The default (and recommended) sliding window type ensures a resource is not consumed at a higher rate than what is configured.

For example, consider this configuration:

Limit size = 10
Window size = 60 seconds

With a fixed window type, you can predict when the window is going to be reset. If the client sends a burst of traffic, for example 12 requests in one minute, 10 requests are accepted with a 200 response and two requests are rejected with a 429 response.

If you use a sliding window, the initial behavior is the same: the client sends a burst of 12 requests in one minute, 10 requests are accepted with a 200 response, and two requests are rejected with a 429 response. After that, however, it appears to the client that the window is never reset. The algorithm also counts the requests that received a 429 response, so the API stays blocked for as long as the burst continues. This happens because the traffic rate of 12 requests per minute is higher than the rate configured in the plugin, which is 10 requests per minute. Once the client reduces its request rate, it receives 200 responses again.

When the client receives a 429 response, it also receives a Retry-After:<seconds> header. This means the client has to wait that number of seconds before making a new request. If the client makes another request before this time has passed, it receives a 429 response again. Otherwise, the window is reset.

The sliding window type ensures the API is consumed at no more than the configured request rate. This is not always true for the fixed window strategy.

Consider the same example with 10 requests per minute instead of 12. Let’s say the client sends all 10 requests in the 59th second of the window:

In a fixed window, the window resets a second later, and the client can send another 10 requests in the first second of the following window. All of the requests are accepted, making the acceptance rate higher than the configured rate in that two-second time period.
In a sliding window, the window moves with time and takes the hits from the preceding 60 seconds into account, so the second burst is rejected and the configured rate is met.
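
The difference can be sketched in Python. The sliding calculation below is a common weighted approximation: the current window's count plus the previous window's count, scaled by how much of the previous window still overlaps the last 60 seconds. It is used here as an assumption, and the plugin's own calculation may differ. All function names are hypothetical, and unlike the behavior described above, this sketch does not count rejected requests.

```python
def fixed_window_allow(counters, now, limit, window):
    """Fixed window: a request counts only toward the bucket holding its timestamp."""
    bucket = int(now // window)
    if counters.get(bucket, 0) >= limit:
        return False
    counters[bucket] = counters.get(bucket, 0) + 1
    return True


def sliding_window_allow(counters, now, limit, window):
    """Sliding window (weighted approximation, an assumption of this sketch)."""
    bucket = int(now // window)
    elapsed = (now % window) / window  # fraction of the current window that has passed
    # The previous window's hits are weighted by the part that still overlaps.
    estimated = counters.get(bucket, 0) + counters.get(bucket - 1, 0) * (1 - elapsed)
    if estimated >= limit:
        return False
    counters[bucket] = counters.get(bucket, 0) + 1
    return True


def count_accepted(allow, timestamps, limit=10, window=60):
    """Replay request timestamps (in seconds) and count accepted requests."""
    counters = {}
    return sum(allow(counters, t, limit, window) for t in timestamps)


# 10 requests in the 59th second, then 10 more one second later.
burst = [59] * 10 + [60] * 10
print(count_accepted(fixed_window_allow, burst))    # 20
print(count_accepted(sliding_window_allow, burst))  # 10
```

With the fixed window, all 20 requests pass within two seconds. With the sliding approximation, the second burst is rejected because the previous window's 10 hits still carry full weight.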

Implementation considerations

Limit by IP Address

If limiting by IP address, it’s important to understand how the IP address is determined. The IP address is determined by the request header sent to Kong from downstream. In most cases, the header has a name of X-Real-IP or X-Forwarded-For.

By default, Kong uses the header name X-Real-IP. If a different header name is required, it needs to be defined using the real_ip_header Nginx property. Depending on the environmental network setup, the trusted_ips Nginx property may also need to be configured to include the load balancer IP address.

Strategies

The plugin supports three strategies.

Strategy	Pros	Cons
`local`	Minimal performance impact.	Less accurate. Unless there’s a consistent-hashing load balancer in front of Kong, it diverges when scaling the number of nodes.
`cluster`	Accurate¹, no extra components to support.	Each request forces a read and a write on the data store. Therefore, relatively, the biggest performance impact.
`redis`	Accurate¹, less performance impact than the `cluster` strategy.	Needs a Redis installation. Bigger performance impact than the `local` strategy.

¹ Only when the sync_rate option is set to 0 (synchronous behavior). See the configuration reference for more details.

Two common use cases are:

Every transaction counts. The highest level of accuracy is needed. An example is a transaction with financial consequences.
Backend protection. Accuracy is not as relevant. The requirement is only to protect backend services from overloading that’s caused either by specific users or by attacks.

Every transaction counts

In this scenario, because accuracy is important, the local strategy is not an option. Consider the support effort you might need for Redis, and then choose either cluster or redis.

You could start with the cluster strategy, and move to redis if performance drops drastically.

Do remember that you cannot port the existing usage metrics from the data store to Redis. This might not be a problem with short-lived metrics (for example, seconds or minutes) but if you use metrics with a longer time frame (for example, months), plan your switch carefully.

Backend protection

If accuracy is of lesser importance, choose the local strategy. You might need to experiment a little before you get a setting that works for your scenario. As the cluster scales to more nodes, more user requests are handled. When the cluster scales down, the probability of false negatives increases. So, adjust your limits when scaling.

For example, if a user can make 100 requests every second, and you have an equally balanced 5-node Kong cluster, setting the local limit to something like 30 requests every second should work. If you see too many false negatives, increase the limit.
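
The arithmetic behind that starting point can be sketched in Python. The even share per node is 100 / 5 = 20 requests per second. The headroom factor of 1.5 is an assumption chosen here to reproduce the figure of 30 above, not a value prescribed by the plugin, and the function name is hypothetical.

```python
import math


def per_node_limit(global_limit, nodes, headroom=1.5):
    """Starting value for a per-node limit under the local strategy.

    `headroom` (an assumption of this sketch) pads the even share to
    absorb imperfect load balancing across nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return math.ceil(global_limit / nodes * headroom)


print(per_node_limit(100, 5))                # 30
print(per_node_limit(100, 5, headroom=1.0))  # 20, the even share
```

Treat the result as a first guess to refine by observing traffic, and recompute it whenever the number of nodes changes.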

To minimize inaccuracies, consider using a consistent-hashing load balancer in front of Kong. The load balancer ensures that a user is always directed to the same Kong node, thus reducing inaccuracies and preventing scaling problems.

Fallback from Redis

When the redis strategy is used and a Kong Gateway node is disconnected from Redis, the rate-limiting-advanced plugin falls back to the local strategy. This can happen when the Redis server is down or the connection to Redis is broken. During the outage, Kong Gateway keeps local counters for rate limiting and syncs them with Redis once the connection is re-established. Kong Gateway still rate limits, but the nodes can’t sync their counters with each other. As a result, users can perform more requests than the configured limit, although each node still enforces a limit.
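
To get a feel for how far the effective limit can drift during such an outage, the following Python sketch models nodes that each enforce the limit against their own counter only. The function name and the per-node request counts are hypothetical, and the model ignores window timing.

```python
def accepted_without_sync(limit, requests_per_node):
    """Total accepted requests in one window when nodes cannot share counters.

    Each node enforces `limit` independently, so it accepts at most
    `limit` of the requests routed to it.
    """
    return sum(min(limit, received) for received in requests_per_node)


# Limit of 10 per window, three nodes, 10 requests routed to each node:
print(accepted_without_sync(10, [10, 10, 10]))  # 30
# If the same user sends only 10 requests in total, nothing extra gets through:
print(accepted_without_sync(10, [4, 3, 3]))     # 10
```

In this model, the upper bound is the limit multiplied by the number of nodes.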

Rate limiting for consumer groups

You can use the consumer groups entity to manage custom rate limiting configurations for subsets of consumers. This is enabled by default without using the /consumer_groups/:id/overrides endpoint.

You can see an example of this in the Enforcing rate limiting tiers with the Rate Limiting Advanced plugin guide.
