AI Proxy Advanced Configuration - Plugin - unreleased

name or plugin

string required

The name of the plugin, in this case ai-proxy-advanced.

If using the Kong Admin API, Konnect API, declarative configuration, or decK files, the field is name.
If using the KongPlugin object in Kubernetes, the field is plugin.

instance_name

string

An optional custom name to identify an instance of the plugin, for example ai-proxy-advanced_my-service.

The instance name shows up in Kong Manager and in Konnect, so it's useful when running the same plugin in multiple contexts, for example, on multiple services. You can also use it to access a specific plugin instance via the Kong Admin API.

An instance name must be unique within the following context:

Within a workspace for Kong Gateway Enterprise
Within a control plane or control plane group for Konnect
Globally for Kong Gateway (OSS)

service.name or service.id

string

The name or ID of the service the plugin targets. Set one of these parameters if adding the plugin to a service through the top-level /plugins endpoint. Not required if using /services/{serviceName|Id}/plugins.

route.name or route.id

string

The name or ID of the route the plugin targets. Set one of these parameters if adding the plugin to a route through the top-level /plugins endpoint. Not required if using /routes/{routeName|Id}/plugins.

consumer.name or consumer.id

string

The name or ID of the consumer the plugin targets. Set one of these parameters if adding the plugin to a consumer through the top-level /plugins endpoint. Not required if using /consumers/{consumerName|Id}/plugins.

consumer_group.name or consumer_group.id

string

The name or ID of the consumer group the plugin targets. If set, the plugin activates only for requests where the specified group has been authenticated. Set one of these parameters if adding the plugin to a consumer group through the top-level /plugins endpoint. Not required if using /consumer_groups/{consumerGroupName|Id}/plugins.

enabled

boolean default: true

Whether this plugin will be applied.
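
As a rough sketch of how the top-level fields above fit together, the following Python snippet assembles a payload intended for the top-level /plugins endpoint. It assumes the dotted field names (such as service.name) map to nested JSON objects. The helper name build_plugin_payload and the service name my-service are hypothetical, and the config body is left as an empty placeholder.

```python
import json


def build_plugin_payload(config, service_name=None, instance_name=None, enabled=True):
    """Assemble a plugin entity from the top-level fields described above."""
    payload = {
        "name": "ai-proxy-advanced",  # the field is `plugin` in a KongPlugin object
        "enabled": enabled,
        "config": config,
    }
    if instance_name is not None:
        payload["instance_name"] = instance_name
    if service_name is not None:
        # Only needed when posting to the top-level /plugins endpoint.
        # Assumption: service.name is expressed as a nested object in JSON.
        payload["service"] = {"name": service_name}
    return payload


payload = build_plugin_payload(
    config={},  # placeholder; the config record is described below
    service_name="my-service",  # hypothetical service name
    instance_name="ai-proxy-advanced_my-service",
)
print(json.dumps(payload, indent=2))
```

When the plugin is created through /services/{serviceName|Id}/plugins instead, the service_name argument can simply be omitted.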

config

record required

balancer

record required
- algorithm
  
  string default: round-robin Must be one of: round-robin, lowest-latency, lowest-usage, consistent-hashing, semantic
  
  Which load balancing algorithm to use.
- tokens_count_strategy
  
  string default: total-tokens Must be one of: total-tokens, prompt-tokens, completion-tokens
  
  Which tokens to count for usage calculation. Available values are total-tokens, prompt-tokens, and completion-tokens.
- latency_strategy
  
  string default: tpot Must be one of: tpot, e2e
  
  Which metric to use for latency. Available values are tpot (time per output token) and e2e (end to end).
- hash_on_header
  
  string default: X-Kong-LLM-Request-ID
  
  The header to use for consistent-hashing.
- slots
  
  integer default: 10000 between: 10 65536
  
  The number of slots in the load balancer algorithm.
- retries
  
  integer default: 5 between: 0 32767
  
  The number of retries to execute upon failure to proxy.
- connect_timeout
  
  integer default: 60000 between: 1 2147483646
- write_timeout
  
  integer default: 60000 between: 1 2147483646
- read_timeout
  
  integer default: 60000 between: 1 2147483646
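
The balancer defaults and ranges listed above can be captured in a small validation sketch. This is only an illustration of the documented constraints, not Kong's own schema validation; the helper name build_balancer is hypothetical.

```python
# Allowed values, defaults, and ranges copied from the field list above.
ALGORITHMS = {"round-robin", "lowest-latency", "lowest-usage", "consistent-hashing", "semantic"}
TOKEN_STRATEGIES = {"total-tokens", "prompt-tokens", "completion-tokens"}
LATENCY_STRATEGIES = {"tpot", "e2e"}

RANGES = {
    "slots": (10, 65536),
    "retries": (0, 32767),
    "connect_timeout": (1, 2147483646),
    "write_timeout": (1, 2147483646),
    "read_timeout": (1, 2147483646),
}

DEFAULTS = {
    "algorithm": "round-robin",
    "tokens_count_strategy": "total-tokens",
    "latency_strategy": "tpot",
    "hash_on_header": "X-Kong-LLM-Request-ID",
    "slots": 10000,
    "retries": 5,
    "connect_timeout": 60000,
    "write_timeout": 60000,
    "read_timeout": 60000,
}


def build_balancer(**overrides):
    """Merge overrides into the documented defaults and check the documented limits."""
    balancer = {**DEFAULTS, **overrides}
    if balancer["algorithm"] not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {balancer['algorithm']}")
    if balancer["tokens_count_strategy"] not in TOKEN_STRATEGIES:
        raise ValueError(f"unknown tokens_count_strategy: {balancer['tokens_count_strategy']}")
    if balancer["latency_strategy"] not in LATENCY_STRATEGIES:
        raise ValueError(f"unknown latency_strategy: {balancer['latency_strategy']}")
    for field, (low, high) in RANGES.items():
        if not low <= balancer[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    return balancer


balancer = build_balancer(algorithm="lowest-latency", latency_strategy="e2e", retries=3)
print(balancer)
```

Calling build_balancer(slots=5) raises ValueError, because slots must be at least 10.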

embeddings

record
- auth
  
  record
  header_name
  
  string referenceable
  
  If the AI model requires authentication via an Authorization or API key header, specify its name here.
  
  header_value
  
  string referenceable encrypted
  
  Specify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
  
  param_name
  
  string referenceable
  
  If the AI model requires authentication via a query parameter, specify its name here.
  
  param_value
  
  string referenceable encrypted
  
  Specify the full parameter value for ‘param_name’.
  
  param_location
  
  string Must be one of: query, body
  
  Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.
  
  azure_use_managed_identity
  
  boolean default: false
  
  Set to true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.
  
  azure_client_id
  
  string referenceable
  
  If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
  
  azure_client_secret
  
  string referenceable encrypted
  
  If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client secret.
  
  azure_tenant_id
  
  string referenceable
  
  If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
  
  gcp_use_service_account
  
  boolean default: false
  
  Use service account auth for GCP-based providers and models.
  
  gcp_service_account_json
  
  string referenceable encrypted
  
  Set this field to the full JSON of the GCP service account to authenticate, if required. If null (and gcp_use_service_account is true), Kong will attempt to read from environment variable GCP_SERVICE_ACCOUNT.
  
  aws_access_key_id
  
  string referenceable encrypted
  
  Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_ACCESS_KEY_ID environment variable for this plugin instance.
  
  aws_secret_access_key
  
  string referenceable encrypted
  
  Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_SECRET_ACCESS_KEY environment variable for this plugin instance.
  
  allow_override
  
  boolean default: false
  
  If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.
- model
  
  record required
  provider
  
  string required Must be one of: openai, mistral
  
  AI provider format to use for the embeddings API.
  
  name
  
  string required
  
  Model name to execute.
  
  options
  
  record
  
  Key/value settings for the model.
  
  upstream_url
  
  string
  
  Upstream URL for the embeddings.
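
To illustrate how the embeddings record might be assembled, the sketch below builds the auth and model sub-records from the fields described above. The helper names are hypothetical, the API key is a placeholder, and the model name text-embedding-3-small is only an example value, not something this reference prescribes.

```python
import json

# Allowed values copied from the field list above.
PARAM_LOCATIONS = {"query", "body"}
EMBEDDING_PROVIDERS = {"openai", "mistral"}


def header_auth(name, value):
    """Auth record that sends the credential in a request header."""
    return {"header_name": name, "header_value": value}


def param_auth(name, value, location):
    """Auth record that sends the credential as a query or body parameter."""
    if location not in PARAM_LOCATIONS:
        raise ValueError(f"param_location must be one of {sorted(PARAM_LOCATIONS)}")
    return {"param_name": name, "param_value": value, "param_location": location}


def build_embeddings(provider, model_name, auth):
    """Assemble the embeddings record; provider and name are required."""
    if provider not in EMBEDDING_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(EMBEDDING_PROVIDERS)}")
    return {"auth": auth, "model": {"provider": provider, "name": model_name}}


embeddings = build_embeddings(
    "openai",
    "text-embedding-3-small",  # example model name (assumption)
    header_auth("Authorization", "Bearer <your-api-key>"),  # placeholder key
)
print(json.dumps(embeddings, indent=2))
```

Because header_value and param_value are marked referenceable, a real deployment would more likely store a vault reference there than a literal key.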

vectordb

record
- strategy
  
  string required Must be one of: redis
  
  Which vector database driver to use.
- dimensions
  
  integer required
  
  The desired dimensionality for the vectors.
- threshold
  
  number required
  
  The default similarity threshold for accepting semantic search results (float).
- distance_metric
  
  string required Must be one of: cosine, euclidean
  
  The distance metric to use for vector searches.
- redis
  
  record required
  host
  
  string default: 127.0.0.1
  
  A string representing a host name, such as example.com.
  
  port
  
  integer default: 6379 between: 0 65535
  
  An integer representing a port number between 0 and 65535, inclusive.
  
  connect_timeout
  
  integer default: 2000 between: 0 2147483646
  
  An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
  
  send_timeout
  
  integer default: 2000 between: 0 2147483646
  
  An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
  
  read_timeout
  
  integer default: 2000 between: 0 2147483646
  
  An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
  
  username
  
  string referenceable
  
  Username to use for Redis connections. If undefined, ACL authentication won’t be performed. This requires Redis v6.0.0+. To be compatible with Redis v5.x.y, you can set it to default.
  
  password
  
  string referenceable encrypted
  
  Password to use for Redis connections. If undefined, no AUTH commands are sent to Redis.
  
  sentinel_username
  
  string referenceable
  
  Sentinel username to authenticate with a Redis Sentinel instance. If undefined, ACL authentication won’t be performed. This requires Redis v6.2.0+.
  
  sentinel_password
  
  string referenceable encrypted
  
  Sentinel password to authenticate with a Redis Sentinel instance. If undefined, no AUTH commands are sent to Redis Sentinels.
  
  database
  
  integer default: 0
  
  Database to use for the Redis connection when using the redis strategy.
  
  keepalive_pool_size
  
  integer default: 256 between: 1 2147483646
  
  The size limit for every cosocket connection pool associated with every remote server, per worker process. If neither keepalive_pool_size nor keepalive_backlog is specified, no pool is created. If keepalive_pool_size isn’t specified but keepalive_backlog is specified, then the pool uses the default value. If latency is high or throughput is low, try increasing this value (for example, to 512).
  
  keepalive_backlog
  
  integer between: 0 2147483646
  
  Limits the total number of opened connections for a pool. If the connection pool is full, connection queues above the limit go into the backlog queue. If the backlog queue is full, subsequent connect operations fail and return nil. Queued operations (subject to set timeouts) resume once the number of connections in the pool is less than keepalive_pool_size. If latency is high or throughput is low, try increasing this value. Empirically, this value is larger than keepalive_pool_size.
  
  sentinel_master
  
  string
  
  Sentinel master to use for Redis connections. Defining this value implies using Redis Sentinel.
  
  sentinel_role
  
  string Must be one of: master, slave, any
  
  Sentinel role to use for Redis connections when the redis strategy is defined. Defining this value implies using Redis Sentinel.
  
  sentinel_nodes
  
  array of type record len_min: 1
  
  Sentinel node addresses to use for Redis connections when the redis strategy is defined. Defining this field implies using a Redis Sentinel. The minimum length of the array is 1 element.
  
  host
  
  string required default: 127.0.0.1
  
  A string representing a host name, such as example.com.
  
  port
  
  integer default: 6379 between: 0 65535
  
  An integer representing a port number between 0 and 65535, inclusive.
  
  cluster_nodes
  
  array of type record len_min: 1
  
  Cluster addresses to use for Redis connections when the redis strategy is defined. Defining this field implies using a Redis Cluster. The minimum length of the array is 1 element.
  
  ip
  
  string required default: 127.0.0.1
  
  A string representing a host name, such as example.com.
  
  port
  
  integer default: 6379 between: 0 65535
  
  An integer representing a port number between 0 and 65535, inclusive.
  
  ssl
  
  boolean default: false
  
  If set to true, uses SSL to connect to Redis.
  
  ssl_verify
  
  boolean default: false
  
  If set to true, verifies the validity of the server SSL certificate. If setting this parameter, also configure lua_ssl_trusted_certificate in kong.conf to specify the CA (or server) certificate used by your Redis server. You may also need to configure lua_ssl_verify_depth accordingly.
  
  server_name
  
  string
  
  A string representing an SNI (server name indication) value for TLS.
  
  cluster_max_redirections
  
  integer default: 5
  
  Maximum retry attempts for redirection.
  
  connection_is_proxied
  
  boolean default: false
  
  If the connection to Redis is proxied (for example, through Envoy), set this to true. Set the host and port to point to the proxy address.
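
The vectordb record can be sketched in the same way. The values below are illustrative assumptions: 1536 dimensions, a threshold of 0.7, and the Redis host name redis.example.internal are not taken from this reference, and build_vectordb is a hypothetical helper.

```python
import json

# Allowed values copied from the field list above.
STRATEGIES = {"redis"}
DISTANCE_METRICS = {"cosine", "euclidean"}


def build_vectordb(dimensions, threshold, distance_metric, redis=None, strategy="redis"):
    """Assemble the vectordb record with its required fields."""
    if strategy not in STRATEGIES:
        raise ValueError(f"strategy must be one of {sorted(STRATEGIES)}")
    if distance_metric not in DISTANCE_METRICS:
        raise ValueError(f"distance_metric must be one of {sorted(DISTANCE_METRICS)}")
    return {
        "strategy": strategy,
        "dimensions": dimensions,
        "threshold": threshold,
        "distance_metric": distance_metric,
        # Start from the documented host and port defaults, then apply overrides.
        "redis": {"host": "127.0.0.1", "port": 6379, **(redis or {})},
    }


vectordb = build_vectordb(
    dimensions=1536,  # assumed; should match the embedding model's output size
    threshold=0.7,  # arbitrary illustrative value
    distance_metric="cosine",
    redis={"host": "redis.example.internal", "ssl": True, "ssl_verify": True},
)
print(json.dumps(vectordb, indent=2))
```

With ssl_verify enabled, lua_ssl_trusted_certificate also needs to be configured in kong.conf, as the ssl_verify description notes.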

response_streaming

string default: allow Must be one of: allow, deny, always

Whether to ‘optionally allow’, ‘deny’, or ‘always’ (force) the streaming of answers via server-sent events.

max_request_body_size

integer default: 8192

Maximum request body size allowed to be introspected.

model_name_header

boolean default: true

Display the selected model name in the X-Kong-LLM-Model response header.

targets

array of type record required
- route_type
  
  string required Must be one of: llm/v1/chat, llm/v1/completions, preserve
  
  The model’s operation implementation, for this provider. Set to preserve to pass through without transformation.
- auth
  
  record
  header_name
  
  string referenceable
  
  If the AI model requires authentication via an Authorization or API key header, specify its name here.
  
  header_value
  
  string referenceable encrypted
  
  Specify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
  
  param_name
  
  string referenceable
  
  If the AI model requires authentication via a query parameter, specify its name here.
  
  param_value
  
  string referenceable encrypted
  
  Specify the full parameter value for ‘param_name’.
  
  param_location
  
  string Must be one of: query, body
  
  Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.
  
  azure_use_managed_identity
  
  boolean default: false
  
  Set to true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.
  
  azure_client_id
  
  string referenceable
  
  If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
  
  azure_client_secret
  
  string referenceable encrypted
  
  If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client secret.
  
  azure_tenant_id
  
  string referenceable
  
  If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
  
  gcp_use_service_account
  
  boolean default: false
  
  Use service account auth for GCP-based providers and models.
  
  gcp_service_account_json
  
  string referenceable encrypted
  
  Set this field to the full JSON of the GCP service account to authenticate, if required. If null (and gcp_use_service_account is true), Kong will attempt to read from environment variable GCP_SERVICE_ACCOUNT.
  
  aws_access_key_id
  
  string referenceable encrypted
  
  Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_ACCESS_KEY_ID environment variable for this plugin instance.
  
  aws_secret_access_key
  
  string referenceable encrypted
  
  Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_SECRET_ACCESS_KEY environment variable for this plugin instance.
  
  allow_override
  
  boolean default: false
  
  If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.
- model
  
  record required
  provider
  
  string required Must be one of: openai, azure, anthropic, cohere, mistral, llama2, gemini, bedrock, huggingface
  
  AI provider request format - Kong translates requests to and from the specified backend compatible formats.
  
  name
  
  string
  
  Model name to execute.
  
  options
  
  record
  
  Key/value settings for the model.
  
  max_tokens
  
  integer default: 256
  
  Defines the max_tokens, if using chat or completion models.
  
  input_cost
  
  number
  
  Defines the cost per 1M tokens in your prompt.
  
  output_cost
  
  number
  
  Defines the cost per 1M tokens in the output of the AI.
  
  temperature
  
  number between: 0 5
  
  Defines the matching temperature, if using chat or completion models.
  
  top_p
  
  number between: 0 1
  
  Defines the top-p probability mass, if supported.
  
  top_k
  
  integer between: 0 500
  
  Defines the top-k most likely tokens, if supported.
  
  anthropic_version
  
  string
  
  Defines the schema/API version, if using Anthropic provider.
  
  azure_instance
  
  string
  
  Instance name for Azure OpenAI hosted models.
  
  azure_api_version
  
  string default: 2023-05-15
  
  ‘api-version’ for Azure OpenAI instances.
  
  azure_deployment_id
  
  string
  
  Deployment ID for Azure OpenAI instances.
  
  llama2_format
  
  string Must be one of: raw, openai, ollama
  
  If using llama2 provider, select the upstream message format.
  
  mistral_format
  
  string Must be one of: openai, ollama
  
  If using mistral provider, select the upstream message format.
  
  upstream_url
  
  string
  
  Manually specify or override the full URL to the AI operation endpoints, when calling (self-)hosted models, or for running via a private endpoint.
  
  upstream_path
  
  string
  
  Manually specify or override the AI operation path, used when e.g. using the ‘preserve’ route_type.
  
  gemini
  
  record
  
  api_endpoint
  
  string
  
  If running Gemini on Vertex, specify the regional API endpoint (hostname only).
  
  project_id
  
  string
  
  If running Gemini on Vertex, specify the project ID.
  
  location_id
  
  string
  
  If running Gemini on Vertex, specify the location ID.
  
  bedrock
  
  record
  
  aws_region
  
  string
  
  If using AWS providers (Bedrock) you can override the AWS_REGION environment variable by setting this option.
  
  huggingface
  
  record
  
  use_cache
  
  boolean
  
  Use the cache layer on the inference API.
  
  wait_for_model
  
  boolean
  
  Wait for the model if it is not ready.
- weight
  
  integer default: 100 between: 1 65535
  
  The weight this target gets within the upstream load balancer (1-65535).
- description
  
  string
  
  The semantic description of the target, required if using semantic load balancing.
- logging
  
  record required
  log_statistics
  
  boolean required default: false
  
  If enabled and supported by the driver, will add model usage and token metrics into the Kong log plugin(s) output.
  
  log_payloads
  
  boolean required default: false
  
  If enabled, will log the request and response body into the Kong log plugin(s) output.
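
The following sketch shows one way the targets array might be put together, along with two bits of arithmetic implied by the fields above: the share of traffic each target would receive if weights are applied proportionally (an assumption about the balancer, not something this reference states), and a cost estimate based on input_cost and output_cost being prices per 1M tokens. The model names, weights, prices, and helper names are all illustrative.

```python
def make_target(provider, model_name, api_key, weight=100, input_cost=None, output_cost=None):
    """Assemble one entry of the targets array for a chat route."""
    options = {}
    if input_cost is not None:
        options["input_cost"] = input_cost  # cost per 1M prompt tokens
    if output_cost is not None:
        options["output_cost"] = output_cost  # cost per 1M output tokens
    return {
        "route_type": "llm/v1/chat",
        "auth": {"header_name": "Authorization", "header_value": f"Bearer {api_key}"},
        "model": {"provider": provider, "name": model_name, "options": options},
        "weight": weight,
        "logging": {"log_statistics": True, "log_payloads": False},
    }


def weight_shares(targets):
    """Fraction of traffic per target, assuming weights are applied proportionally."""
    total = sum(t["weight"] for t in targets)
    return [t["weight"] / total for t in targets]


def estimate_cost(target, prompt_tokens, completion_tokens):
    """Estimated cost of one request from the per-1M-token prices."""
    options = target["model"]["options"]
    return (
        prompt_tokens * options["input_cost"]
        + completion_tokens * options["output_cost"]
    ) / 1_000_000


# Illustrative values only: model names, weights, and prices are made up.
targets = [
    make_target("openai", "gpt-4o", "<key-a>", weight=300, input_cost=2.5, output_cost=10.0),
    make_target("openai", "gpt-4o-mini", "<key-b>", weight=100, input_cost=0.5, output_cost=1.5),
]

print(weight_shares(targets))  # [0.75, 0.25]
print(estimate_cost(targets[0], prompt_tokens=1000, completion_tokens=500))
```

With these made-up prices, a request with 1,000 prompt tokens and 500 completion tokens on the first target comes to (1000 × 2.5 + 500 × 10.0) / 1,000,000 = 0.0075 in whatever currency the prices are expressed in.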
