Horizontally autoscale workloads using Prometheus and Prometheus adapter
Available with Kong Gateway Enterprise subscription - Contact Sales

Kong Gateway Operator can be integrated with Prometheus and prometheus-adapter to autoscale your workloads based on Kong Gateway latency metrics.

Install Prometheus

Note: You can reuse an existing Prometheus installation and skip this step. If you do, make sure it is able to scrape Kong Gateway Operator’s metrics (for example, through a ServiceMonitor) and note down the namespace it is deployed in, as it is used in later steps.

  1. Add the prometheus-community helm charts:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    
  2. Install Prometheus via kube-prometheus-stack helm chart:

    helm upgrade --install --create-namespace -n prometheus prometheus prometheus-community/kube-prometheus-stack
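
Once the chart is installed, you can confirm that the Prometheus pods are up before moving on (a quick sanity check, assuming the prometheus namespace used above):

kubectl get pods -n prometheus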
    

Create a ServiceMonitor to scrape Kong Gateway Operator

To make Prometheus scrape Kong Gateway Operator’s /metrics endpoint, we’ll need to create a ServiceMonitor:

echo '
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus
  name: gateway-operator
  namespace: kong-system
spec:
  endpoints:
  - port: https
    scheme: https
    path: /metrics
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: controller-manager ' | kubectl apply -f -
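
As a quick sanity check, you can confirm that the ServiceMonitor was created (assuming the name and namespace from the manifest above):

kubectl get servicemonitor gateway-operator -n kong-system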

After applying the manifest, you can verify that the scrape configuration has been picked up by checking one of the metrics exposed by Kong Gateway Operator.

To access the Prometheus UI, create a port-forward and visit http://localhost:9090:

kubectl port-forward service/prometheus-kube-prometheus-prometheus 9090:9090 -n prometheus

In the Prometheus UI, run a query such as:

up{service=~"kgo-gateway-operator-metrics-service"}
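
If the target is being scraped, the query should return a series with a value of 1. The exact label set depends on your installation; an illustrative result looks roughly like this:

up{endpoint="https", namespace="kong-system", service="kgo-gateway-operator-metrics-service"}  1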

Note: Prometheus metrics can take up to 2 minutes to appear.

Install prometheus-adapter

The prometheus-adapter exposes Prometheus metrics through the Kubernetes metrics APIs so that other components, such as the HorizontalPodAutoscaler, can consume them.

To deploy prometheus-adapter, you’ll need to decide which time series to expose so that Kubernetes can consume them.

Note: Kong Gateway Operator enriches specific metrics for use with prometheus-adapter. See the overview for a complete list.

Create a values.yaml file to deploy the prometheus-adapter helm chart. This configuration calculates a kong_upstream_latency_ms_60s_average metric, which exposes a 60s moving average of upstream response latency:

prometheus:
  # Update this value if Prometheus is installed in a different namespace
  url: http://prometheus-kube-prometheus-prometheus.prometheus.svc

rules:
  default: false
  custom:
  - seriesQuery: '{__name__=~"^kong_upstream_latency_ms_(sum|count)",kubernetes_namespace!="",kubernetes_name!="",kubernetes_kind!=""}'
    resources:
      overrides:
        exported_namespace:
          resource: "namespace"
        exported_service:
          resource: "service"
    name:
      as: "kong_upstream_latency_ms_60s_average"
    metricsQuery: |
      sum by (exported_service) (rate(kong_upstream_latency_ms_sum{<<.LabelMatchers>>}[60s:10s]))
        /
      sum by (exported_service) (rate(kong_upstream_latency_ms_count{<<.LabelMatchers>>}[60s:10s]))
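
The metricsQuery divides the rate of the latency sum by the rate of the request count, which yields the average upstream latency per request over the 60s window. When the Custom Metrics API is queried for a given Service, prometheus-adapter substitutes <<.LabelMatchers>> with matchers derived from the resources overrides above. For the echo Service in the default namespace used later in this guide, the executed PromQL looks roughly like this (illustrative):

sum by (exported_service) (rate(kong_upstream_latency_ms_sum{exported_namespace="default",exported_service="echo"}[60s:10s]))
  /
sum by (exported_service) (rate(kong_upstream_latency_ms_count{exported_namespace="default",exported_service="echo"}[60s:10s]))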

Install prometheus-adapter using Helm:

helm upgrade --install --create-namespace -n prometheus --values values.yaml prometheus-adapter prometheus-community/prometheus-adapter
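
To confirm that the adapter has registered the Custom Metrics API (v1beta1.custom.metrics.k8s.io is the APIService that prometheus-adapter registers and that is queried later in this guide), run:

kubectl get apiservice v1beta1.custom.metrics.k8s.io

The adapter is serving custom metrics once the AVAILABLE column reports True.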

Send traffic

To trigger autoscaling, run the following command in a new terminal window. It makes the underlying deployment sleep for 100ms on each request, which pushes the average response time up to roughly that value.

while curl -k "http://$(kubectl get gateway kong -o custom-columns='name:.status.addresses[0].value' --no-headers -n default)/echo/shell?cmd=sleep%200.1" ; do sleep 1; done

Keep this running while we move on to the next steps.

Verify metrics are exposed in Kubernetes

Once everything is configured, the metric you defined in prometheus-adapter should be exposed through the Kubernetes Custom Metrics API:

kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/echo/kong_upstream_latency_ms_60s_average' | jq

Note: The prometheus-adapter may take up to 2 minutes to populate the custom metrics.

This should result in:

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "default",
        "name": "echo",
        "apiVersion": "/v1"
      },
      "metricName": "kong_upstream_latency_ms_60s_average",
      "timestamp": "2024-03-06T13:11:12Z",
      "value": "102312m",
      "selector": null
    }
  ]
}

Note: 102312m uses Kubernetes quantity notation, in which the m suffix denotes milli-units so that fractional values can be expressed as integers. Because the metric is reported in milliseconds, this value is approximately equivalent to 102 milliseconds (ms).
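
As a worked conversion, since the metric is reported in milliseconds:

102312m = 102312 / 1000 = 102.312 ms ≈ 102 ms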

Use exposed metric in HorizontalPodAutoscaler

Once the metric configured in prometheus-adapter is available through the Kubernetes Custom Metrics API, we can use it in a HorizontalPodAutoscaler to autoscale our workload, in this case the echo Deployment.

The following manifest scales the echo Deployment between 1 and 10 replicas, aiming to keep the average latency over the last 60s at 40ms.

echo '
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 1
      policies:
      - type: Percent
        value: 100
        periodSeconds: 10
    scaleUp:
      stabilizationWindowSeconds: 1
      policies:
      - type: Percent
        value: 100
        periodSeconds: 2
      - type: Pods
        value: 4
        periodSeconds: 2
      selectPolicy: Max
  metrics:
  - type: Object
    object:
      metric:
        name: "kong_upstream_latency_ms_60s_average"
      describedObject:
        apiVersion: v1
        kind: Service
        name: echo
      target:
        type: Value
        value: "40" ' | kubectl apply -f -
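
While the traffic loop is still running, you can also watch the HorizontalPodAutoscaler react to the metric (assuming the names used above):

kubectl get hpa echo -n default -w

The TARGETS column shows the current value of kong_upstream_latency_ms_60s_average against the 40 target, and REPLICAS should increase while the latency stays above it.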

Observe Kubernetes SuccessfulRescale events

You can watch SuccessfulRescale events using the following kubectl command:

kubectl get events -n default --field-selector involvedObject.name=echo --field-selector involvedObject.kind=HorizontalPodAutoscaler -w

If everything is working as expected, you should see SuccessfulRescale events similar to the following:

12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 2; reason: Service metric kong_upstream_latency_ms_60s_average above target
12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 4; reason: Service metric kong_upstream_latency_ms_60s_average above target
12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 8; reason: Service metric kong_upstream_latency_ms_60s_average above target
12m          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 10; reason: Service metric kong_upstream_latency_ms_60s_average above target

When latency drops, for example after you stop the curl traffic loop, you should observe SuccessfulRescale events scaling your workload back down:

4s          Normal   SuccessfulRescale   horizontalpodautoscaler/echo   New size: 1; reason: All metrics below target