Kong Gateway resource sizing guidelines


Scaling dimensions

Kong Gateway measures performance in the following dimensions:

Performance dimension	Measured in	Performance limited by…	Description
Latency	Microseconds or milliseconds	Memory-bound. Add more memory for database caching to decrease latency.	The delay between the downstream client sending a request and receiving a response. Increasing the number of Routes and plugins in a Kong Gateway cluster increases the latency added to each request.
Throughput	Requests per second or per minute	CPU-bound. Scale Kong Gateway vertically or horizontally to increase throughput.	The number of requests that Kong Gateway can process in a given time span.

When all other factors remain the same, decreasing the latency of each request increases the maximum throughput of Kong Gateway. Less CPU time is spent handling each request, so more CPU is available for processing traffic as a whole. When a configuration adds substantial latency to requests but must still meet specific throughput requirements, Kong Gateway is designed to scale horizontally: adding nodes adds overall compute power.
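As a rough illustration of the relationship between per-request cost and throughput, the following Python sketch assumes a purely CPU-bound node where every core spends a fixed amount of CPU time on each request. Real latency also includes I/O and upstream time, so treat the results as an upper bound, not a prediction. The function names and all input figures are hypothetical.

```python
import math

def max_rps_per_node(cores: int, cpu_ms_per_request: float) -> float:
    """Upper bound on requests per second for one node.

    Simplified model (assumption): throughput is CPU-bound and each core
    handles requests back to back with no idle time.
    """
    return cores * 1000.0 / cpu_ms_per_request

def nodes_needed(target_rps: float, cores: int, cpu_ms_per_request: float) -> int:
    """Horizontal scaling: identical nodes required to cover target_rps under the same model."""
    return math.ceil(target_rps / max_rps_per_node(cores, cpu_ms_per_request))

# Hypothetical figures: halving per-request CPU time doubles the ceiling.
print(max_rps_per_node(cores=4, cpu_ms_per_request=2.0))  # 2000.0
print(max_rps_per_node(cores=4, cpu_ms_per_request=1.0))  # 4000.0
# A heavier configuration can instead be met by adding nodes.
print(nodes_needed(target_rps=10_000, cores=4, cpu_ms_per_request=2.0))  # 5
```

In this model, reducing the per-request cost of Routes and plugins and adding nodes are two ways to reach the same throughput target, which mirrors the trade-off described above.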

Performance benchmarking and optimization is a complex exercise that must account for a variety of factors, including those external to Kong Gateway, such as the behavior of upstream services or the health of the underlying hardware on which Kong Gateway is running.

General resource guidelines

These recommendations are a baseline guide only. For performance-critical environments, you should conduct specific tuning or benchmarking efforts.

Hybrid mode with a large number of entities (v3.5+)

When Kong Gateway is operating in hybrid mode with a large number of entities (such as Routes and Gateway Services), it can benefit from enabling the dedicated_config_processing setting.

When enabled, certain CPU-intensive steps of the data plane reconfiguration operation are offloaded to a dedicated worker process. This reduces proxy latency during reconfigurations at the cost of a slight increase in memory usage. The benefits of this are most apparent with configurations of more than 1,000 entities.

Kong Gateway resources

Kong Gateway is designed to operate in a variety of deployment environments. It has no minimum system requirements to operate.

Resource requirements vary substantially based on configuration. The following high-level matrices offer a guideline for determining system requirements based on overall configuration and performance requirements.

The following table provides rough usage requirement estimates based on simplified examples with latency and throughput requirements on a per-node basis:

Size	Number of configured entities	Latency requirements	Throughput requirements	Use cases
Development	< 100	< 100 ms	< 500 RPS	Dev/test environments; latency-insensitive gateways
Small	< 1,000	< 20 ms	< 2,500 RPS	Production clusters; greenfield traffic deployments
Medium	< 10,000	< 10 ms	< 10,000 RPS	Mission-critical clusters; legacy and greenfield traffic; central enterprise-grade gateways
Large	< 50,000+	< 10 ms	< 10,000 RPS	Mission-critical clusters; legacy and greenfield traffic; central enterprise-grade gateways
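The table can be read as a lookup: pick the smallest size whose limits cover all three requirements. A minimal Python sketch of that reading follows. The `pick_tier` helper is hypothetical, and requiring all three columns to match at once is an interpretation of the table rather than a documented rule.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SizeTier:
    name: str
    max_entities: float  # exclusive upper bound on configured entities
    latency_ms: float    # latency target the tier is sized for
    max_rps: float       # exclusive upper bound on per-node throughput

# Thresholds copied from the table above. Treating the Large tier's
# "< 50,000+" entities as open-ended (math.inf) is an assumption.
TIERS = [
    SizeTier("Development", 100, 100, 500),
    SizeTier("Small", 1_000, 20, 2_500),
    SizeTier("Medium", 10_000, 10, 10_000),
    SizeTier("Large", math.inf, 10, 10_000),
]

def pick_tier(entities: int, latency_budget_ms: float, rps: float) -> Optional[str]:
    """Return the smallest tier whose limits cover every requirement, or None."""
    for tier in TIERS:
        if (entities < tier.max_entities
                and rps < tier.max_rps
                and latency_budget_ms >= tier.latency_ms):
            return tier.name
    # None means the requirements fall outside the table: benchmark your own setup.
    return None

print(pick_tier(entities=50, latency_budget_ms=200, rps=100))      # Development
print(pick_tier(entities=5_000, latency_budget_ms=10, rps=8_000))  # Medium
print(pick_tier(entities=500, latency_budget_ms=5, rps=100))       # None
```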

Database resources

We do not provide any specific numbers for database sizing because it depends on your particular setup. Sizing varies based on:

Traffic
Number of nodes
Enabled features (for example, rate limiting can use the database or Redis)
Number and rate of change of entities
The rate at which Kong Gateway processes are started and restarted within the cluster
The size of Kong Gateway’s in-memory cache

Kong Gateway intentionally relies on the database as little as possible. It reads configuration from the database only when a node first starts or when the configuration for a given entity changes.

Everything in the database is meant to be read infrequently and held in memory as long as possible. Therefore, database resource requirements are lower than those of compute environments running Kong Gateway.

Query patterns are typically simple and follow schema indexes. Provision sufficient database resources to handle spiky query patterns.

You can adjust datastore settings in kong.conf to keep database access minimal. If the database is down for maintenance, see the in-memory caching section to keep Kong Gateway operational. If you keep Kong Gateway operational while the database is down, Vitals data is not written to the database during this time.

Cluster resource allocations

Based on the expected size and demand of the cluster, we recommend the following resource allocations as a starting point:

Size	CPU	RAM	Typical cloud instance sizes
Development	1-2 cores	2-4 GB	AWS: t3.medium; GCP: n1-standard-1; Azure: Standard A1 v2
Small	1-2 cores	2-4 GB	AWS: t3.medium; GCP: n1-standard-1; Azure: Standard A1 v2
Medium	2-4 cores	4-8 GB	AWS: m5.large; GCP: n1-standard-4; Azure: Standard A1 v4
Large	8-16 cores	16-32 GB	AWS: c5.xlarge; GCP: n1-highcpu-16; Azure: F8s v2

We strongly discourage using throttled cloud instance types (such as the AWS t2 or t3 series of machines) in large clusters, because CPU throttling is detrimental to Kong Gateway’s performance. We also recommend testing and verifying the bandwidth availability for a given instance class. Bandwidth requirements for Kong Gateway depend on the shape and volume of traffic flowing through the cluster.

In-memory caching

We recommend defining the largest mem_cache_size possible while still providing adequate resources to the operating system and any other processes running alongside Kong Gateway. This configuration allows Kong Gateway to take maximum advantage of the in-memory cache and reduces the number of trips to the database.

Each Kong Gateway worker process maintains its own memory allocations, which must be accounted for when provisioning memory. By default, Kong Gateway runs one worker process per available CPU core. We recommend allocating about 500 MB of memory per worker process.

For example, on a machine with 4 CPU cores and 8 GB of RAM available, we recommend allocating between 4 and 6 GB to the cache using mem_cache_size, depending on what other processes are running alongside Kong Gateway.
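The arithmetic in this example can be written out as a small Python helper. It reproduces the guideline above: subtract about 500 MB per worker process (one worker per core by default) and whatever you reserve for the operating system and adjacent processes, and treat the remainder as an upper bound for mem_cache_size. The function names and the `reserved_gb` figure are illustrative assumptions; size the reservation for your own host.

```python
def mem_cache_budget_gb(total_ram_gb: float, cpu_cores: int,
                        per_worker_gb: float = 0.5,
                        reserved_gb: float = 0.0) -> float:
    """Memory left for mem_cache_size after worker and OS allowances.

    Assumes one worker process per CPU core (the default) and about
    500 MB per worker, as recommended above.
    """
    budget = total_ram_gb - cpu_cores * per_worker_gb - reserved_gb
    if budget <= 0:
        raise ValueError("not enough RAM left for the in-memory cache")
    return budget

def as_mem_cache_size(budget_gb: float) -> str:
    """Render a budget as a kong.conf value in megabytes, for example '4096m'."""
    return f"{int(budget_gb * 1024)}m"

# 4 cores and 8 GB of RAM: 4 workers x 0.5 GB = 2 GB for workers.
print(mem_cache_budget_gb(8, 4))                   # 6.0 (nothing reserved)
print(mem_cache_budget_gb(8, 4, reserved_gb=2.0))  # 4.0 (2 GB kept for the OS and other processes)
print(as_mem_cache_size(4.0))                      # 4096m
```

The two results bracket the 4 to 6 GB range given in the example.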

Plugin queues

Several Kong Gateway plugins use internal, in-memory queues to reduce the number of concurrent requests to an upstream server under high load conditions and provide buffering during temporary network and upstream outages.

These plugins include:

The queue.max_entries plugin configuration parameter determines how many entries can wait in a given plugin queue. The default value of 10,000 provides enough buffering for many installations while keeping the maximum memory usage of queues at reasonable levels. Once this limit is reached, the oldest entry is removed when a new entry is queued.
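The drop-oldest behavior described above can be modeled with a bounded double-ended queue. This Python sketch only illustrates the semantics of queue.max_entries; it is not Kong Gateway's queue implementation, and the `DropOldestQueue` class is hypothetical.

```python
from collections import deque

class DropOldestQueue:
    """Bounded FIFO that evicts the oldest entry when a new one arrives at capacity."""

    def __init__(self, max_entries: int = 10_000):
        # deque(maxlen=...) discards from the left (oldest) end on append when full.
        self._entries = deque(maxlen=max_entries)
        self.dropped = 0  # number of evicted entries, useful for spotting data loss

    def put(self, entry) -> None:
        if len(self._entries) == self._entries.maxlen:
            self.dropped += 1
        self._entries.append(entry)

    def get(self):
        """Remove and return the oldest remaining entry (raises IndexError if empty)."""
        return self._entries.popleft()

    def __len__(self) -> int:
        return len(self._entries)

q = DropOldestQueue(max_entries=3)
for i in range(5):
    q.put(i)
print(len(q), q.dropped, q.get())  # 3 2 2
```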

For larger configurations, we recommend experimentally determining the memory requirements of queues by running Kong Gateway in a test environment. You can force plugin queues to reach their configured limits by making the plugin upstream servers unavailable, and then observe Kong Gateway's memory consumption. Most plugins use one queue per plugin instance, with the exception of the HTTP Log plugin, which uses one queue per log server upstream configuration.
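Before running that experiment, a back-of-the-envelope upper bound can tell you roughly what to expect. The Python sketch below multiplies the number of queues by queue.max_entries and by an average entry size that you must measure yourself. The entry size, the endpoint URLs, representing an HTTP Log upstream configuration by its URL, and the per-worker multiplier (which assumes each worker process keeps its own queues) are all assumptions to check against your test results.

```python
def count_queues(queued_plugin_instances: int, http_log_upstreams: list[str]) -> int:
    """Most queued plugins use one queue per plugin instance; HTTP Log uses one
    queue per distinct log server upstream configuration, represented here by
    its endpoint URL (a simplification). Duplicate configurations share a queue."""
    return queued_plugin_instances + len(set(http_log_upstreams))

def worst_case_queue_bytes(queues: int, max_entries: int,
                           avg_entry_bytes: int, workers: int = 1) -> int:
    """Upper bound with every queue full. `workers` assumes per-worker queues."""
    return queues * max_entries * avg_entry_bytes * workers

# Hypothetical setup: 3 non-HTTP-Log plugin instances, plus HTTP Log instances
# pointing at two distinct log servers (one configuration is used twice).
queues = count_queues(3, ["http://logs-a.example.com/ingest",
                          "http://logs-a.example.com/ingest",
                          "http://logs-b.example.com/ingest"])
estimate = worst_case_queue_bytes(queues, max_entries=10_000,
                                  avg_entry_bytes=1_024, workers=4)
print(queues, estimate / 2**20)  # 5 195.3125
```

Compare the estimate with the memory growth you observe while the plugin upstream servers are unreachable; if the two differ widely, the average entry size is the first assumption to revisit.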

Next steps

See Kong Gateway’s performance testing benchmark results and conduct your own performance tuning tests.