AI Audit Log Reference
Kong AI Gateway provides a standardized logging format for AI plugins, so that analytics events are emitted in a consistent shape and AI usage can be aggregated across providers.
Log Formats
Each AI plugin reports the tokens it consumed, along with cost and latency metadata, under its own key in the log entry.
All log entries include the following attributes:
"ai": {
"payload": { "request": "[$optional_payload_request_]" },
"[$plugin_name_1]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 28,
"total_tokens": 48,
"completion_token": 20,
"cost": 0.0038,
"time_per_token": 133
},
"meta": {
"request_model": "command",
"provider_name": "cohere",
"response_model": "command",
"plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
"llm_latency": 2670
}
},
"[$plugin_name_2]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 89,
"total_tokens": 145,
"completion_token": 56,
"cost": 0.0012,
"time_per_token": 87
},
"meta": {
"request_model": "gpt-35-turbo",
"provider_name": "azure",
"response_model": "gpt-35-turbo",
"plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b",
"llm_latency": 4927
}
}
}
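Because each plugin writes its own section under the shared `ai` object, computing per-request totals means walking the sibling plugin keys. The following is a minimal sketch of that aggregation, assuming the log line has already been deserialized; the function name and the sample plugin key are illustrative, not part of Kong.

```python
def summarize_ai_entry(ai: dict) -> dict:
    """Sum token usage and cost across every plugin section of one log entry.

    Assumes every key other than "payload" holds a plugin section shaped
    like the format above; both assumptions are illustrative.
    """
    totals = {"prompt_token": 0, "completion_token": 0, "total_tokens": 0, "cost": 0.0}
    for key, section in ai.items():
        if key == "payload" or not isinstance(section, dict):
            continue  # skip the request payload; everything else is a plugin section
        usage = section.get("usage", {})
        for field in ("prompt_token", "completion_token", "total_tokens"):
            totals[field] += usage.get(field, 0)
        totals["cost"] += usage.get("cost", 0.0)
    return totals

# Hypothetical entry with one plugin section, following the format above.
entry = {
    "payload": {"request": "..."},
    "my-ai-proxy": {
        "usage": {"prompt_token": 28, "completion_token": 20,
                  "total_tokens": 48, "cost": 0.0038},
    },
}
print(summarize_ai_entry(entry))  # totals across all plugin sections
```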
Log Details
Each log entry includes the following details:
| Property | Description |
|---|---|
| `ai.payload.request` | The request payload. |
| `ai.[$plugin_name].payload.response` | The response payload. |
| `ai.[$plugin_name].usage.prompt_token` | Number of tokens used for prompting. |
| `ai.[$plugin_name].usage.completion_token` | Number of tokens used for the completion. |
| `ai.[$plugin_name].usage.total_tokens` | Total number of tokens used. |
| `ai.[$plugin_name].usage.cost` | The total cost of the request (input and output cost). |
| `ai.[$plugin_name].usage.time_per_token` | The average time to generate an output token, in milliseconds. |
| `ai.[$plugin_name].meta.request_model` | Model used for the AI request. |
| `ai.[$plugin_name].meta.provider_name` | Name of the AI service provider. |
| `ai.[$plugin_name].meta.response_model` | Model used for the AI response. |
| `ai.[$plugin_name].meta.plugin_id` | Unique identifier of the plugin. |
| `ai.[$plugin_name].meta.llm_latency` | The time, in milliseconds, it took the LLM provider to generate the full response. |
| `ai.[$plugin_name].cache.cache_status` | The cache status. This can be `Hit`, `Miss`, `Bypass`, or `Refresh`. |
| `ai.[$plugin_name].cache.fetch_latency` | The time, in milliseconds, it took to return a cached response. |
| `ai.[$plugin_name].cache.embeddings_provider` | For semantic caching, the provider used to generate the embeddings. |
| `ai.[$plugin_name].cache.embeddings_model` | For semantic caching, the model used to generate the embeddings. |
| `ai.[$plugin_name].cache.embeddings_latency` | For semantic caching, the time taken to generate the embeddings. |
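The sample entries above suggest that `time_per_token` is derived from `llm_latency` divided by the completion token count (2670 ms / 20 tokens ≈ 133; 4927 ms / 56 tokens ≈ 87). Treating that as an assumption rather than a documented guarantee, a small validation helper might look like this:

```python
def approx_time_per_token(meta: dict, usage: dict) -> float | None:
    """Recompute time_per_token from its sibling fields.

    Assumes time_per_token ~= llm_latency / completion_token, which matches
    the sample entries in this reference. Returns None when llm_latency is
    absent (for example, on cached responses).
    """
    latency = meta.get("llm_latency")
    tokens = usage.get("completion_token")
    if not latency or not tokens:
        return None
    return latency / tokens

# Consistent with the sample values above (logged values appear floored).
assert int(approx_time_per_token({"llm_latency": 2670}, {"completion_token": 20})) == 133
assert int(approx_time_per_token({"llm_latency": 4927}, {"completion_token": 56})) == 87
```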
Cache logging
If you're using the AI Semantic Cache plugin, log entries include additional details about caching:
"ai": {
"payload": { "request": "[$optional_payload_request_]" },
"[$plugin_name_1]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 28,
"total_tokens": 48,
"completion_token": 20,
"cost": 0.0038,
"time_per_token": 133
},
"meta": {
"request_model": "command",
"provider_name": "cohere",
"response_model": "command",
"plugin_id": "546c3856-24b3-469a-bd6c-f6083babd2cd",
"llm_latency": 2670
},
"cache": {
"cache_status": "Hit",
"fetch_latency": 21
}
},
"[$plugin_name_2]": {
"payload": { "response": "[$optional_payload_response]" },
"usage": {
"prompt_token": 89,
"total_tokens": 145,
"completion_token": 56,
"cost": 0.0012,
},
"meta": {
"request_model": "gpt-35-turbo",
"provider_name": "azure",
"response_model": "gpt-35-turbo",
"plugin_id": "5df193be-47a3-4f1b-8c37-37e31af0568b",
},
"cache": {
"cache_status": "Hit",
"fetch_latency": 444,
"embeddings_provider": "openai",
"embeddings_model": "text-embedding-3-small",
"embeddings_latency": 424
}
}
}
Note: When returning a cache response, `time_per_token` and `llm_latency` are omitted. A cache response can be served from either the semantic cache or the exact cache. If it's served from the semantic cache, the entry also includes the embeddings provider, embeddings model, and embeddings latency.
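Because cached responses drop `time_per_token` and `llm_latency`, and only semantic-cache responses carry the `embeddings_*` fields, consumers should treat all of these as optional. Below is a minimal sketch of that handling, with an illustrative function name:

```python
def describe_serving(section: dict) -> str:
    """Classify one plugin section of a log entry by how the response was served.

    Per the note above: cached responses omit time_per_token/llm_latency,
    and only semantic-cache responses include the embeddings_* fields.
    """
    cache = section.get("cache")
    if cache is None:
        return "served by the upstream LLM"
    status = cache.get("cache_status")  # Hit, Miss, Bypass, or Refresh
    fetch = cache.get("fetch_latency")
    if "embeddings_model" in cache:
        provider = cache.get("embeddings_provider")
        return f"semantic cache ({status}, embeddings via {provider}, {fetch} ms)"
    return f"exact cache ({status}, {fetch} ms)"

# Example against the second plugin section of the sample above.
section = {"cache": {"cache_status": "Hit", "fetch_latency": 444,
                     "embeddings_provider": "openai",
                     "embeddings_model": "text-embedding-3-small",
                     "embeddings_latency": 424}}
print(describe_serving(section))  # semantic cache (Hit, embeddings via openai, 444 ms)
```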