Monitor metrics with Prometheus and Grafana

TL;DR

Deploy a ServiceMonitor Kubernetes resource using the Kong Gateway Helm chart, then use a KongClusterPlugin to configure the prometheus plugin for all Services in the cluster.

Prerequisites

If you don’t have a Konnect account, you can get started quickly with our onboarding wizard.

  1. The following Konnect items are required to complete this tutorial:
    • Personal access token (PAT): Create a new personal access token by opening the Konnect PAT page and selecting Generate Token.
  2. Set the personal access token as an environment variable:

    export KONNECT_TOKEN='YOUR KONNECT TOKEN'
    

How it works

Kong Gateway provides a Prometheus plugin that exports Gateway Service and Route metrics automatically.

The Prometheus stack scrapes metrics from the Kubernetes Services whose labels match the selector defined in a ServiceMonitor resource. The Kong Ingress Controller Helm chart can automatically label your deployments and create a ServiceMonitor instance to enable Prometheus metrics scraping.
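
As a rough sketch of what that chart configuration might look like, the snippet below writes a Helm values file that turns on the ServiceMonitor. The gateway.serviceMonitor keys assume the kong/ingress chart layout, the release: promstack label assumes the Prometheus stack was installed under the release name promstack (as the Grafana Service name later in this guide suggests), and the file name is arbitrary. Check all of these against your own installation.

```shell
# Write a Helm values file that asks the chart to create a ServiceMonitor.
# Assumptions: kong/ingress chart layout; Prometheus stack release named "promstack".
cat > kong-servicemonitor-values.yaml <<'EOF'
gateway:
  serviceMonitor:
    enabled: true
    labels:
      release: promstack
EOF

# Show the generated values for review.
cat kong-servicemonitor-values.yaml
```

You could then apply the file with something like helm upgrade kong kong/ingress -n kong -f kong-servicemonitor-values.yaml, assuming the release is named kong in the kong namespace, as in the cleanup step.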

Enable Prometheus

Kong Gateway doesn’t expose Prometheus metrics by default. To enable the metrics, create a prometheus plugin instance:

echo "
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: prometheus
  namespace: kong
  annotations:
    kubernetes.io/ingress.class: kong
  labels:
    global: 'true'
config:
  status_code_metrics: true
  bandwidth_metrics: true
  upstream_health_metrics: true
  latency_metrics: true
  per_consumer: false
plugin: prometheus
" | kubectl apply -f -

Deploy demo Services

This how-to deploys multiple Services to your Kubernetes cluster to simulate a production environment.

Deploy the Services and create routing resources:

kubectl apply -f https://developer.konghq.com/manifests/kic/multiple-services.yaml -n kong

Generate traffic

Once the Services and Routes are deployed, generate some test traffic. In the same terminal window, run the following command:

while true;
do
  curl $PROXY_IP/billing/status/200
  curl $PROXY_IP/billing/status/501
  curl $PROXY_IP/invoice/status/201
  curl $PROXY_IP/invoice/status/404
  curl $PROXY_IP/comments/status/200
  curl $PROXY_IP/comments/status/200
  sleep 0.01
done

Access Grafana

Grafana is an observability tool that you can use to visualize Prometheus metrics over time. To access Grafana, port-forward the Prometheus and Grafana Services in a new terminal window:

kubectl -n monitoring port-forward services/prometheus-operated 9090 &
kubectl -n monitoring port-forward services/promstack-grafana 3000:80 &

You will also need to get the password for the admin user.

Execute the following to see and copy the password:

kubectl get secret --namespace monitoring promstack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Navigate to http://localhost:3000 and use the username admin and the password that you copied.

Once logged in, you will see a Kong (official) dashboard in the bottom left. Select it to open the dashboard.

Metrics collected

Grafana can show the following metrics, which Prometheus scrapes from Kong Gateway.

Request latencies of various Services

Kong Gateway records how long your Services take to respond to requests. You can use this data to alert the on-call engineer if the latency goes beyond a certain threshold. For example, let’s say you have an SLA that your APIs will respond with a latency of less than 20 milliseconds for 95% of requests. You could configure Prometheus to alert you based on the following query:

histogram_quantile(0.95, sum(rate(kong_request_latency_ms_bucket{route=~"$route"}[1m])) by (le)) > 20

This query calculates the 95th percentile of the total request latency (or duration) for the selected Routes and fires if it is more than 20 milliseconds. The kong_request_latency_ms metric tracks the latency added by Kong Gateway and the Service together. You can switch to kong_upstream_latency_ms to track latency added by only the Service. Note that $route is a Grafana dashboard variable, so replace it with a concrete matcher if you use the query in an alerting rule. Setting up alerts is covered in detail in the Prometheus documentation, so we won’t repeat it here.
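
To make the alerting idea concrete, here is a minimal sketch of a Prometheus rule file built around the same latency threshold. It reads the _bucket series, which histogram_quantile needs. The group name, alert name, for duration, severity label, and file name are all illustrative assumptions, and the route matcher is left out so the rule covers every Route. If you run the Prometheus Operator, the same groups content would typically go inside a PrometheusRule resource instead.

```shell
# Write an example Prometheus alerting rule file (names and durations are illustrative).
cat > kong-latency-alert.yaml <<'EOF'
groups:
  - name: kong-latency
    rules:
      - alert: KongRequestLatencyHigh
        # p95 of total request latency over the last minute, in milliseconds
        expr: histogram_quantile(0.95, sum(rate(kong_request_latency_ms_bucket[1m])) by (le)) > 20
        for: 5m
        labels:
          severity: page
        annotations:
          summary: p95 request latency through Kong Gateway is above 20 ms
EOF

# Validate the file if promtool happens to be installed; otherwise skip.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules kong-latency-alert.yaml
fi
```

The for: 5m clause is a design choice: it keeps a single slow minute from paging anyone, at the cost of a slower alert.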

Kong Gateway proxy latency

Kong Gateway also collects metrics about its performance. The following query is similar to the previous one, but gives us insight into latency added by Kong Gateway:

histogram_quantile(0.90, sum(rate(kong_kong_latency_ms_bucket[1m])) by (le,service)) > 2

Error rates

Another important metric to track is the rate of errors and requests your Services are serving. The time series kong_http_requests_total collects HTTP status code metrics for each Service.

This metric can help you track the rate of errors for each of your Services:

sum(rate(kong_http_requests_total{code=~"5[0-9]{2}"}[1m])) by (service)

You can also calculate the percentage of requests in any duration that are errors. Try to come up with a query to derive that result.
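
One possible answer, as a sketch: divide the rate of 5xx responses by the rate of all responses, per Service. The helper names error_ratio_query and prom_query are hypothetical, and the default URL assumes the Prometheus port-forward on port 9090 from the Grafana step is still running.

```shell
# Print a PromQL expression for the percentage of requests that returned 5xx,
# per Service, over the given window (default: 1m).
error_ratio_query() {
  window="${1:-1m}"
  printf '100 * sum(rate(kong_http_requests_total{code=~"5[0-9]{2}"}[%s])) by (service) / sum(rate(kong_http_requests_total[%s])) by (service)\n' "$window" "$window"
}

# Run an instant query against the Prometheus HTTP API.
# PROM_URL is an assumption; it defaults to the local port-forward.
prom_query() {
  curl -s --get "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Show the expression for a five-minute window.
error_ratio_query 5m
```

Running prom_query "$(error_ratio_query 5m)" would send the expression to Prometheus and return JSON. Services with no 5xx responses in the window have no numerator series, so they are absent from the result rather than reported as 0.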

All HTTP status codes are recorded, so you can use the data to learn about your typical traffic pattern and identify problems. For example, a sudden rise in 404 response codes could indicate that client code is requesting an endpoint that was removed in a recent deploy.

Request rate and bandwidth

Request rates

You can derive the total request rate for each of your Services or across your Kubernetes cluster using the kong_http_requests_total time series.
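
As a sketch, the request rate can be expressed with the same counter used in the error-rate query earlier (kong_http_requests_total). The function name request_rate_query is hypothetical, and the one-minute window is an arbitrary choice.

```shell
# Print a PromQL expression for requests per second.
# With a label argument (for example "service") the rate is grouped by that label;
# with no argument it is the total across everything Prometheus scrapes from Kong.
request_rate_query() {
  if [ -n "${1:-}" ]; then
    printf 'sum(rate(kong_http_requests_total[1m])) by (%s)\n' "$1"
  else
    printf 'sum(rate(kong_http_requests_total[1m]))\n'
  fi
}

request_rate_query service   # per-Service rate
request_rate_query           # cluster-wide rate
```

Either expression can be pasted into the Prometheus UI at http://localhost:9090 while the port-forward is running.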

Bandwidth

Another metric that Kong Gateway keeps track of is the amount of network bandwidth (kong_bandwidth_bytes) being consumed. This gives you an estimate of how request/response sizes correlate with other behaviors in your infrastructure.
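
A sketch of a bandwidth query follows. It assumes Kong Gateway 3.x, where the counter is exposed as kong_bandwidth_bytes with a direction label (ingress or egress). Older releases used different names, so check the metrics your Gateway actually exposes first. The function name bandwidth_query is hypothetical.

```shell
# Print a PromQL expression for bytes per second per Service.
# The argument selects the direction label value (default: egress).
bandwidth_query() {
  direction="${1:-egress}"
  printf 'sum(rate(kong_bandwidth_bytes{direction="%s"}[1m])) by (service)\n' "$direction"
}

bandwidth_query ingress   # bytes per second received from clients
bandwidth_query           # bytes per second sent back to clients
```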

You now have metrics for the Services running inside your Kubernetes cluster and much more visibility into your applications, without making any modifications to your Services. You can now use Alertmanager or Grafana to configure alerts based on the observed metrics and your SLOs.

Cleanup

helm uninstall kong -n kong
