Rate limit how many HTTP requests a developer can make in a given period of seconds, minutes, hours, days, months, or years. If an authentication plugin has been configured on the underlying Service or Route (or deprecated API entity), the Consumer is used to identify the client; otherwise, the client IP address is used.

Note: The functionality of this plugin as bundled with versions of Kong prior to 0.13.1 and Kong Enterprise prior to 0.32 differs from what is documented herein. Refer to the CHANGELOG for details.


Terminology

  • plugin: a plugin executing actions inside Kong before or after a request has been proxied to the upstream API.
  • Service: the Kong entity representing an external upstream API or microservice.
  • Route: the Kong entity representing a way to map downstream requests to upstream services.
  • Consumer: the Kong entity representing a developer or machine using the API. When using Kong, a Consumer only communicates with Kong, which proxies every call to the upstream API.
  • Credential: a unique string associated with a Consumer, also referred to as an API key.
  • upstream service: this refers to your own API/service sitting behind Kong, to which client requests are forwarded.


Enabling the plugin on a Service

Configure this plugin on a Service by making the following request:

$ curl -X POST http://kong:8001/services/{service}/plugins \
    --data "name=rate-limiting"  \
    --data "config.second=5" \
    --data "config.hour=10000"

  • service: the id or name of the Service that this plugin configuration will target.

Enabling the plugin on a Route

Configure this plugin on a Route with:

$ curl -X POST http://kong:8001/routes/{route_id}/plugins \
    --data "name=rate-limiting"  \
    --data "config.second=5" \
    --data "config.hour=10000"

  • route_id: the id of the Route that this plugin configuration will target.

Enabling the plugin on a Consumer

You can use the http://kong:8001/plugins endpoint to enable this plugin on specific Consumers:

$ curl -X POST http://kong:8001/plugins \
    --data "name=rate-limiting" \
    --data "consumer_id={consumer_id}"  \
    --data "config.second=5" \
    --data "config.hour=10000"

Where consumer_id is the id of the Consumer to associate with this plugin.

You can combine consumer_id and service_id in the same request to further narrow the plugin's scope.

Global plugins

All plugins can be configured using the http://kong:8001/plugins/ endpoint. A plugin which is not associated with any Service, Route or Consumer (or API, if you are using an older version of Kong) is considered "global", and will be run on every request. Read the Plugin Reference and the Plugin Precedence sections for more information.


Here's a list of all the parameters which can be used in this plugin's configuration:

  • name: The name of the plugin to use, in this case rate-limiting.
  • service_id: The id of the Service which this plugin will target.
  • route_id: The id of the Route which this plugin will target.
  • enabled (default: true): Whether this plugin will be applied.
  • consumer_id: The id of the Consumer which this plugin will target.

  • config.second: The number of HTTP requests the developer can make per second. At least one limit must exist.
  • config.minute: The number of HTTP requests the developer can make per minute. At least one limit must exist.
  • config.hour: The number of HTTP requests the developer can make per hour. At least one limit must exist.
  • config.day: The number of HTTP requests the developer can make per day. At least one limit must exist.
  • config.month: The number of HTTP requests the developer can make per month. At least one limit must exist.
  • config.year: The number of HTTP requests the developer can make per year. At least one limit must exist.
  • config.limit_by: The entity that will be used when aggregating the limits: consumer, credential, or ip. If the consumer or the credential cannot be determined, the system will always fall back to ip.
  • config.policy: The rate-limiting policy to use for retrieving and incrementing the limits. Available values are local (counters are stored in-memory on the node), cluster (counters are stored in the datastore and shared across the nodes) and redis (counters are stored on a Redis server and shared across the nodes).
  • config.fault_tolerant: A boolean value that determines whether requests should be proxied even if Kong has trouble connecting to the third-party datastore. If true, requests will be proxied anyway, effectively disabling the rate-limiting function until the datastore is working again. If false, clients will see 500 errors.
  • config.hide_client_headers: Optionally hide informative response headers.
  • config.redis_host: When using the redis policy, this property specifies the address of the Redis server.
  • config.redis_port: When using the redis policy, this property specifies the port of the Redis server. Defaults to 6379.
  • config.redis_password: When using the redis policy, this property specifies the password to connect to the Redis server.
  • config.redis_timeout: When using the redis policy, this property specifies the timeout in milliseconds for any command submitted to the Redis server.
  • config.redis_database: When using the redis policy, this property specifies the Redis database to use.

Headers sent to the client

When this plugin is enabled, Kong sends some additional headers back to the client indicating how many requests remain and what the configured limits are, for example:

X-RateLimit-Limit-Minute: 10
X-RateLimit-Remaining-Minute: 9

If more than one time limit is set, the response includes a pair of headers for each:

X-RateLimit-Limit-Second: 5
X-RateLimit-Remaining-Second: 4
X-RateLimit-Limit-Minute: 10
X-RateLimit-Remaining-Minute: 9

If any of the configured limits is reached, the plugin returns an HTTP/1.1 429 status code to the client with the following JSON body:

{"message":"API rate limit exceeded"}
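A client consuming a rate-limited API can use these headers to drive back-off logic. The sketch below is a minimal, hypothetical client-side helper; the function names are illustrative and not part of Kong:

```python
# Hypothetical client-side helper for reacting to the plugin's response
# headers. The names below are illustrative, not Kong APIs.

def remaining_budget(headers):
    """Return the smallest X-RateLimit-Remaining-* value, or None if absent."""
    remaining = [
        int(v) for k, v in headers.items()
        if k.lower().startswith("x-ratelimit-remaining-")
    ]
    return min(remaining) if remaining else None

def should_back_off(status, headers):
    """True when the client should pause before sending further requests."""
    if status == 429:  # the plugin's "API rate limit exceeded" response
        return True
    return remaining_budget(headers) == 0
```

For example, a response carrying X-RateLimit-Remaining-Second: 4 and X-RateLimit-Remaining-Minute: 9 has a remaining budget of 4, and the client should back off as soon as any window reports 0.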

Implementation considerations

The plugin supports three policies, each with its own pros and cons.

  • cluster — pros: accurate, no extra components to support; cons: relatively the biggest performance impact, since each request forces a read and a write on the underlying datastore.
  • redis — pros: accurate, lesser performance impact than the cluster policy; cons: requires an extra Redis installation, bigger performance impact than the local policy.
  • local — pros: minimal performance impact; cons: less accurate, and unless a consistent-hashing load balancer is used in front of Kong, it diverges when scaling the number of nodes.
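To make the local policy's trade-off concrete, it amounts to per-node, in-memory window counting along these lines. This is an illustrative sketch, not Kong's actual implementation, and the class and method names are assumptions:

```python
# Illustrative sketch of fixed-window counting kept in process memory,
# the kind of bookkeeping the local policy performs on a single node.
# Not Kong internals; names are made up for the example.
import time
from collections import defaultdict

class LocalWindowCounter:
    def __init__(self, limit_per_minute):
        self.limit = limit_per_minute
        self.counts = defaultdict(int)  # (identifier, window) -> request count

    def allow(self, identifier, now=None):
        """Count a request; return False when the per-minute limit is hit."""
        now = time.time() if now is None else now
        window = int(now // 60)  # current one-minute window
        key = (identifier, window)
        if self.counts[key] >= self.limit:
            return False  # over the limit: Kong would answer 429 here
        self.counts[key] += 1
        return True
```

Because each node keeps its own counters, a second node would accept the same identifier's requests independently, which is exactly the divergence the table above describes.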

The two most common use cases are:

  1. Every transaction counts. These are, for example, transactions with financial consequences, where the highest level of accuracy is required.
  2. Backend protection. Here accuracy is less relevant; rate limiting is merely used to protect backend services from overload, whether by specific users or against an attack in general.


Enterprise-Only The Kong Community Edition of this Rate Limiting plugin does not include Redis Sentinel support. Kong Enterprise Subscription customers have the option of using Redis Sentinel with Kong Rate Limiting to deliver highly available master-slave deployments.

Every transaction counts

In this scenario, the local policy is not an option, so the choice is between cluster and redis: the extra performance of the redis policy weighed against its extra support effort.

The recommendation is to start with the cluster policy, with the option to move over to redis if performance drops drastically. Keep in mind that existing usage metrics cannot be ported from the datastore to redis. With short-lived metrics (per second or per minute) this is generally not an issue, but with longer-lived ones (months) it might be, so plan such a switch carefully.

Backend protection

As accuracy is of lesser importance, the local policy can be used. It might require some experimenting to find the proper setting. For example, if a user is bound to 100 requests per second, and you have an equally balanced 5-node Kong cluster, setting the local limit to something like 30 requests per second should work. If you are worried about too many false negatives, increase the value.
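The sizing reasoning above can be sketched as a small helper. The headroom factor is an assumption for illustration (it compensates for imperfect load balancing), not a Kong setting:

```python
# Back-of-the-envelope sizing for the local policy: spread a global
# budget over equally balanced nodes, with optional headroom to reduce
# false negatives. The headroom factor is an illustrative assumption.
import math

def per_node_limit(global_limit, nodes, headroom=1.0):
    """Suggested local limit for each Kong node."""
    return math.ceil(global_limit / nodes * headroom)
```

With 100 requests per second across 5 nodes this yields 20 per node, and a headroom factor of 1.5 yields the 30 suggested above.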

Keep in mind that as the cluster scales to more nodes, users will be granted more total requests; likewise, when the cluster scales down, the probability of false negatives increases. So in general, update your limits when scaling.

This inaccuracy can be mitigated by using a consistent-hashing load balancer in front of Kong, which ensures that the same user is always directed to the same Kong node. That both reduces the inaccuracy and prevents the scaling issues.

Most likely the user will be granted more requests than agreed upon when using the local policy, but it will effectively block any attack while maintaining the best performance.