Skip to content
Kong Docs are moving soon! Our docs are migrating to a new home. You'll be automatically redirected to the new site in the future. In the meantime, check out the new site now!
Kong Logo | Kong Docs Logo
  • Docs
    • Explore the API Specs
      View all API Specs View all API Specs View all API Specs arrow image
    • Documentation
      API Specs
      Kong Gateway
      Lightweight, fast, and flexible cloud-native API gateway
      Kong Konnect
      Single platform for SaaS end-to-end connectivity
      Kong AI Gateway
      Multi-LLM AI Gateway for GenAI infrastructure
      Kong Mesh
      Enterprise service mesh based on Kuma and Envoy
      decK
      Helps manage Kong’s configuration in a declarative fashion
      Kong Ingress Controller
      Works inside a Kubernetes cluster and configures Kong to proxy traffic
      Kong Gateway Operator
      Manage your Kong deployments on Kubernetes using YAML Manifests
      Insomnia
      Collaborative API development platform
  • Plugin Hub
    • Explore the Plugin Hub
      View all plugins View all plugins View all plugins arrow image
    • Functionality View all View all arrow image
      View all plugins
      AI's icon
      AI
      Govern, secure, and control AI traffic with multi-LLM AI Gateway plugins
      Authentication's icon
      Authentication
      Protect your services with an authentication layer
      Security's icon
      Security
      Protect your services with additional security layer
      Traffic Control's icon
      Traffic Control
      Manage, throttle and restrict inbound and outbound API traffic
      Serverless's icon
      Serverless
      Invoke serverless functions in combination with other plugins
      Analytics & Monitoring's icon
      Analytics & Monitoring
      Visualize, inspect and monitor APIs and microservices traffic
      Transformations's icon
      Transformations
      Transform request and responses on the fly on Kong
      Logging's icon
      Logging
      Log request and response data using the best transport for your infrastructure
  • Support
  • Community
  • Kong Academy
Get a Demo Start Free Trial
Kong Gateway
3.9.x
  • Home icon
  • Kong Gateway
  • Production Deployment
  • Monitoring
  • Expose and graph Kong AI Metrics
github-edit-pageEdit this page
report-issueReport an issue
  • Kong Gateway
  • Kong Konnect
  • Kong Mesh
  • Kong AI Gateway
  • Plugin Hub
  • decK
  • Kong Ingress Controller
  • Kong Gateway Operator
  • Insomnia
  • Kuma

  • Docs contribution guidelines
  • 3.10.x (latest)
  • 3.9.x
  • 3.8.x
  • 3.7.x
  • 3.6.x
  • 3.5.x
  • 3.4.x (LTS)
  • 3.3.x
  • 2.8.x (LTS)
  • Archive (3.0.x and pre-2.8.x)
  • Introduction
    • Overview of Kong Gateway
    • Support
      • Version Support Policy
      • Third Party Dependencies
      • Browser Support
      • Vulnerability Patching Process
      • Software Bill of Materials
    • Stability
    • Release Notes
    • Breaking Changes
      • Kong Gateway 3.9.x
      • Kong Gateway 3.8.x
      • Kong Gateway 3.7.x
      • Kong Gateway 3.6.x
      • Kong Gateway 3.5.x
      • Kong Gateway 3.4.x
      • Kong Gateway 3.3.x
      • Kong Gateway 3.2.x
      • Kong Gateway 3.1.x
      • Kong Gateway 3.0.x
      • Kong Gateway 2.8.x or earlier
    • Key Concepts
      • Services
      • Routes
      • Consumers
      • Upstreams
      • Plugins
      • Consumer Groups
    • How Kong Works
      • Routing Traffic
      • Load Balancing
      • Health Checks and Circuit Breakers
    • Glossary
  • Get Started with Kong
    • Get Kong
    • Services and Routes
    • Rate Limiting
    • Proxy Caching
    • Key Authentication
    • Load-Balancing
  • Install Kong
    • Overview
    • Kubernetes
      • Overview
      • Install Kong Gateway
      • Configure the Admin API
      • Install Kong Manager
    • Docker
      • Using docker run
      • Build your own Docker images
    • Linux
      • Amazon Linux
      • Debian
      • Red Hat
      • Ubuntu
    • Post-installation
      • Set up a data store
      • Apply Enterprise license
      • Enable Kong Manager
  • Kong in Production
    • Deployment Topologies
      • Overview
      • Kubernetes Topologies
      • Hybrid Mode
        • Overview
        • Deploy Kong Gateway in Hybrid mode
      • DB-less Deployment
      • Traditional
    • Running Kong
      • Running Kong as a non-root user
      • Securing the Admin API
      • Using systemd
    • Access Control
      • Start Kong Gateway Securely
      • Programatically Creating Admins
      • Enabling RBAC
    • Licenses
      • Overview
      • Download your License
      • Deploy Enterprise License
      • Using the License API
      • Monitor Licenses Usage
    • Networking
      • Default Ports
      • DNS Considerations
      • Network and Firewall
      • CP/DP Communication through a Forward Proxy
      • PostgreSQL TLS
        • Configure PostgreSQL TLS
        • Troubleshooting PostgreSQL TLS
    • Kong Configuration File
    • Environment Variables
    • Serving a Website and APIs from Kong
    • Monitoring
      • Overview
      • Prometheus
      • StatsD
      • Datadog
      • Health Check Probes
      • Expose and graph AI Metrics
    • Tracing
      • Overview
      • Writing a Custom Trace Exporter
      • Tracing API Reference
    • Resource Sizing Guidelines
    • Blue-Green Deployments
    • Canary Deployments
    • Clustering Reference
    • Performance
      • Performance Testing Benchmarks
      • Establish a Performance Benchmark
      • Improve performance with Brotli compression
    • Logging and Debugging
      • Log Reference
      • Dynamic log level updates
      • Customize Gateway Logs
      • Debug Requests
      • AI Gateway Analytics
    • Configure a gRPC service
    • Use the Expressions Router
    • Upgrade and Migration
      • Upgrading Kong Gateway 3.x.x
      • Backup and Restore
      • Upgrade Strategies
        • Dual-Cluster Upgrade
        • In-Place Upgrade
        • Blue-Green Upgrade
        • Rolling Upgrade
      • Upgrade from 2.8 LTS to 3.4 LTS
      • Migrate from OSS to Enterprise
      • Migration Guidelines Cassandra to PostgreSQL
      • Migrate to the new DNS client
      • Breaking Changes
  • Kong Gateway Enterprise
    • Overview
    • Secrets Management
      • Overview
      • Getting Started
      • Secrets Rotation
      • Advanced Usage
      • Backends
        • Overview
        • Environment Variables
        • AWS Secrets Manager
        • Azure Key Vaults
        • Google Cloud Secret Manager
        • HashiCorp Vault
      • How-To
        • Securing the Database with AWS Secrets Manager
      • Reference Format
    • Dynamic Plugin Ordering
      • Overview
      • Get Started with Dynamic Plugin Ordering
    • Audit Logging
    • Keyring and Data Encryption
    • Workspaces
    • Consumer Groups
    • Event Hooks
    • Configure Data Plane Resilience
    • About Control Plane Outage Management
    • FIPS 140-2
      • Overview
      • Install the FIPS Compliant Package
    • Authenticate your Kong Gateway Amazon RDS database with AWS IAM
    • Verify Signatures for Signed Kong Images
    • Verify Build Provenance for Signed Kong Images
    • Datakit
      • Overview
      • Get Started with Datakit
      • Datakit Configuration Reference
      • Datakit Examples Reference
  • Kong AI Gateway
    • Overview
    • Get started with AI Gateway
    • LLM Provider Integration Guides
      • OpenAI
      • Cohere
      • Azure
      • Anthropic
      • Mistral
      • Llama2
      • Vertex/Gemini
      • Amazon Bedrock
    • LLM Library Integration Guides
      • LangChain
    • AI Gateway Analytics
    • Expose and graph AI Metrics
    • AI Gateway Load Balancing
    • AI Gateway plugins
  • Kong Manager
    • Overview
    • Enable Kong Manager
    • Get Started with Kong Manager
      • Services and Routes
      • Rate Limiting
      • Proxy Caching
      • Authentication with Consumers
      • Load Balancing
    • Authentication and Authorization
      • Overview
      • Create a Super Admin
      • Workspaces and Teams
      • Reset Passwords and RBAC Tokens
      • Basic Auth
      • LDAP
        • Configure LDAP
        • LDAP Service Directory Mapping
      • OIDC
        • Configure OIDC
        • OIDC Authenticated Group Mapping
        • Migrate from previous configurations
      • Sessions
      • RBAC
        • Overview
        • Enable RBAC
        • Add a Role and Permissions
        • Create a User
        • Create an Admin
    • Networking Configuration
    • Workspaces
    • Create Consumer Groups
    • Sending Email
    • Troubleshoot
  • Develop Custom Plugins
    • Overview
    • Getting Started
      • Introduction
      • Set up the Plugin Project
      • Add Plugin Testing
      • Add Plugin Configuration
      • Consume External Services
      • Deploy Plugins
    • File Structure
    • Implementing Custom Logic
    • Plugin Configuration
    • Accessing the Data Store
    • Storing Custom Entities
    • Caching Custom Entities
    • Extending the Admin API
    • Writing Tests
    • Installation and Distribution
    • Proxy-Wasm Filters
      • Create a Proxy-Wasm Filter
      • Proxy-Wasm Filter Configuration
    • Plugin Development Kit
      • Overview
      • kong.client
      • kong.client.tls
      • kong.cluster
      • kong.ctx
      • kong.ip
      • kong.jwe
      • kong.log
      • kong.nginx
      • kong.node
      • kong.plugin
      • kong.request
      • kong.response
      • kong.router
      • kong.service
      • kong.service.request
      • kong.service.response
      • kong.table
      • kong.telemetry.log
      • kong.tracing
      • kong.vault
      • kong.websocket.client
      • kong.websocket.upstream
    • Plugins in Other Languages
      • Go
      • Javascript
      • Python
      • Running Plugins in Containers
      • External Plugin Performance
  • Kong Plugins
    • Overview
    • Authentication Reference
    • Allow Multiple Authentication Plugins
    • Plugin Queuing
      • Overview
      • Plugin Queuing Reference
  • Admin API
    • Overview
    • Declarative Configuration
    • Enterprise API
      • Information Routes
      • Health Routes
      • Tags
      • Debug Routes
      • Services
      • Routes
      • Consumers
      • Plugins
      • Certificates
      • CA Certificates
      • SNIs
      • Upstreams
      • Targets
      • Vaults
      • Keys
      • Filter Chains
      • Licenses
      • Workspaces
      • RBAC
      • Admins
      • Consumer Groups
      • Event Hooks
      • Keyring and Data Encryption
      • Audit Logs
      • Status API
    • Open Source API
  • Reference
    • kong.conf
    • Injecting Nginx Directives
    • CLI
    • Key Management
    • The Expressions Language
      • Overview
      • Language References
      • Performance Optimizations
    • Rate Limiting Library
    • WebAssembly
    • Reserved Entity Names
    • FAQ
enterprise-switcher-icon Switch to OSS
On this pageOn this page
  • Overview
  • Grafana dashboard
  • Available metrics
  • Accessing the metrics
You are browsing documentation for an older version. See the latest documentation here.

Expose and graph Kong AI Metrics
Available with Kong Gateway Enterprise subscription - Contact Sales

This guide walks you through collecting AI metrics and sending them to Prometheus, and setting up the Grafana dashboard.

Overview

Kong’s AI Gateway (AI Proxy) calls LLM-based services according to the settings of the AI Proxy plugin. You can aggregate the LLM provider responses to count the number of tokens used by the AI Proxy. If you have defined input and output costs in the models, you can also calculate cost aggregation. The metrics details also expose if the requests have been cached by Kong Gateway, saving the cost of contacting the LLM providers, which improves performance.

Kong AI Gateway exposes metrics related to Kong and proxied upstream services in Prometheus exposition format, which can be scraped by a Prometheus Server.

Metrics tracked by this plugin are available on both the Admin API and Status API at the http://localhost:<port>/metrics endpoint. Note that the URL to those APIs is specific to your installation. See Accessing the metrics for more information.

The Kong Prometheus plugin records and exposes metrics at the node level. Your Prometheus server will need to discover all Kong nodes via a service discovery mechanism, and consume data from each node’s configured /metrics endpoint.

Grafana dashboard

AI Metrics exported by the plugin can be graphed in Grafana using a drop-in dashboard:

AI Grafana Dashboard

Available metrics

All the following AI LLM metrics are available per provider, model, cache, database name (if cached), embeddings provider (if cached), embeddings model (if cached), and workspace.

When ai_llm_metrics is set to true:

  • AI Requests: AI request sent to LLM providers.
  • AI Cost: AI Cost charged by LLM providers.
  • AI Tokens: AI Tokens counted by LLM providers. These are also available per token type in addition to the options listed previously.
  • AI LLM Latency: Time taken to return a response by LLM providers.
  • AI Cache Fetch Latency: Time taken to return a response from the cache.
  • AI Cache Embeddings Latency: Time taken to generate embedding during the cache.

AI metrics are disabled by default as it may create high cardinality of metrics and may cause performance issues. To enable them, set ai_metrics to true in the Prometheus plugin configuration.

Here is an example of output you could expect from the /metrics endpoint:

curl -i http://localhost:8001/metrics

Response:

HTTP/1.1 200 OK
Server: openresty/1.15.8.3
Date: Tue, 7 Jun 2020 16:35:40 GMT
Content-Type: text/plain; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Access-Control-Allow-Origin: *


# HELP ai_llm_requests_total AI requests total per ai_provider in Kong
# TYPE ai_llm_requests_total counter
ai_llm_requests_total{ai_provider="provider1",ai_model="model1",cache_status="hit",vector_db="redis",embeddings_provider="openai",embeddings_model="text-embedding-3-large",workspace="workspace1"} 100
# HELP ai_llm_cost_total AI requests cost per ai_provider/cache in Kong
# TYPE ai_llm_cost_total counter
ai_llm_cost_total{ai_provider="provider1",ai_model="model1",cache_status="hit",vector_db="redis",embeddings_provider="openai",embeddings_model="text-embedding-3-large",workspace="workspace1"} 50
# HELP ai_llm_provider_latency AI latencies per ai_provider in Kong
# TYPE ai_llm_provider_latency bucket
ai_llm_provider_latency_ms_bucket{ai_provider="provider1",ai_model="model1",cache_status="",vector_db="",embeddings_provider="",embeddings_model="",workspace="workspace1",le="+Inf"} 2
# HELP ai_llm_tokens_total AI tokens total per ai_provider/cache in Kong
# TYPE ai_llm_tokens_total counter
ai_llm_tokens_total{ai_provider="provider1",ai_model="model1",cache_status="",vector_db="",embeddings_provider="",embeddings_model="",token_type="prompt_tokens",workspace="workspace1"} 1000
ai_llm_tokens_total{ai_provider="provider1",ai_model="model1",cache_status="",vector_db="",embeddings_provider="",embeddings_model="",token_type="completion_tokens",workspace="workspace1"} 2000
ai_llm_tokens_total{ai_provider="provider1",ai_model="model1",cache_status="hit",vector_db="redis",embeddings_provider="openai",embeddings_model="text-embedding-3-large",token_type="total_tokens",workspace="workspace1"} 3000
# HELP ai_cache_fetch_latency AI cache latencies per ai_provider/database in Kong
# TYPE ai_cache_fetch_latency bucket
ai_cache_fetch_latency{ai_provider="provider1",ai_model="model1",cache_status="hit",vector_db="redis",embeddings_provider="openai",embeddings_model="text-embedding-3-large",workspace="workspace1",le="+Inf"} 2
# HELP ai_cache_embeddings_latency AI cache latencies per ai_provider/database in Kong
# TYPE ai_cache_embeddings_latency bucket
ai_cache_embeddings_latency{ai_provider="provider1",ai_model="model1",cache_status="hit",vector_db="redis",embeddings_provider="openai",embeddings_model="text-embedding-3-large",workspace="workspace1",le="+Inf"} 2
# HELP ai_llm_provider_latency AI cache latencies per ai_provider/database in Kong
# TYPE ai_llm_provider_latency bucket
ai_llm_provider_latency{ai_provider="provider1",ai_model="model1",cache_status="hit",vector_db="redis",embeddings_provider="openai",embeddings_model="text-embedding-3-large",workspace="workspace1",le="+Inf"} 2

Note: If you don’t use any cache plugins, then cache_status, vector_db, embeddings_provider, and embeddings_model values will be empty.

Accessing the metrics

In most configurations, the Kong Admin API will be behind a firewall or would need to be set up to require authentication. Here are a couple of options to allow access to the /metrics endpoint to Prometheus:

  1. If the Status API is enabled, then its /metrics endpoint can be used. This is the preferred method.

  2. The /metrics endpoint is also available on the Admin API, which can be used if the Status API is not enabled. Note that this endpoint is unavailable when RBAC is enabled on the Admin API (Prometheus does not support Key-Auth to pass the token).

Thank you for your feedback.
Was this page useful?
Too much on your plate? close cta icon
More features, less infrastructure with Kong Konnect. 1M requests per month for free.
Try it for Free
  • Kong
    Powering the API world

    Increase developer productivity, security, and performance at scale with the unified platform for API management, service mesh, and ingress controller.

    • Products
      • Kong Konnect
      • Kong Gateway Enterprise
      • Kong Gateway
      • Kong Mesh
      • Kong Ingress Controller
      • Kong Insomnia
      • Product Updates
      • Get Started
    • Documentation
      • Kong Konnect Docs
      • Kong Gateway Docs
      • Kong Mesh Docs
      • Kong Insomnia Docs
      • Kong Konnect Plugin Hub
    • Open Source
      • Kong Gateway
      • Kuma
      • Insomnia
      • Kong Community
    • Company
      • About Kong
      • Customers
      • Careers
      • Press
      • Events
      • Contact
  • Terms• Privacy• Trust and Compliance
© Kong Inc. 2025