Configuration
This plugin is compatible with DB-less mode.
Compatible protocols
The AI Proxy Advanced plugin is compatible with the following protocols:
grpc, grpcs, http, https
Parameters
Here's a list of all the parameters which can be used in this plugin's configuration:
-
string required
The name of the plugin, in this case ai-proxy-advanced.
- If using the Kong Admin API, Konnect API, declarative configuration, or decK files, the field is name.
- If using the KongPlugin object in Kubernetes, the field is plugin.
-
string
An optional custom name to identify an instance of the plugin, for example ai-proxy-advanced_my-service.
The instance name shows up in Kong Manager and in Konnect, so it's useful when running the same plugin in multiple contexts, for example, on multiple services. You can also use it to access a specific plugin instance via the Kong Admin API.
An instance name must be unique within the following contexts:
- Within a workspace for Kong Gateway Enterprise
- Within a control plane or control plane group for Konnect
- Globally for Kong Gateway (OSS)
-
string
The name or ID of the service the plugin targets. Set one of these parameters if adding the plugin to a service through the top-level /plugins endpoint. Not required if using /services/{serviceName|Id}/plugins.
-
string
The name or ID of the route the plugin targets. Set one of these parameters if adding the plugin to a route through the top-level /plugins endpoint. Not required if using /routes/{routeName|Id}/plugins.
-
string
The name or ID of the consumer the plugin targets. Set one of these parameters if adding the plugin to a consumer through the top-level /plugins endpoint. Not required if using /consumers/{consumerName|Id}/plugins.
-
string
The name or ID of the consumer group the plugin targets. If set, the plugin will activate only for requests where the specified group has been authenticated. Set this parameter if adding the plugin to a consumer group through the top-level /plugins endpoint. Not required if using /consumer_groups/{consumerGroupName|Id}/plugins.
-
boolean default: true
Whether this plugin will be applied.
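As a rough illustration of how the top-level fields above fit together, the following Python sketch builds a request body for the top-level /plugins endpoint. Only the field name and the values ai-proxy-advanced and ai-proxy-advanced_my-service come from the text above. The keys instance_name, service, and enabled, the service name my-service, and the helper build_plugin_payload are assumptions made for the example, and the required configuration record described below is left out.

```python
import json


def build_plugin_payload(service_name, instance_name=None, enabled=True):
    """Build a JSON-serializable body for the top-level /plugins endpoint.

    Every key except "name" is an assumed field name used for illustration.
    """
    payload = {
        "name": "ai-proxy-advanced",         # plugin name, as stated above
        "enabled": enabled,                  # assumed key; documented default is true
        "service": {"name": service_name},   # assumed shape for the service target
    }
    if instance_name is not None:
        payload["instance_name"] = instance_name  # assumed key
    return payload


payload = build_plugin_payload("my-service", "ai-proxy-advanced_my-service")
print(json.dumps(payload, indent=2))
```

When the plugin is attached through a nested endpoint such as /services/{serviceName|Id}/plugins, the service entry would be dropped, since the text above says it is not required there.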
-
record required
-
record required
-
string default: round-robin
Must be one of: round-robin, lowest-latency, lowest-usage, consistent-hashing, semantic
Which load balancing algorithm to use.
-
string default: total-tokens
Must be one of: total-tokens, prompt-tokens, completion-tokens
Which tokens to use for usage calculation. Available values are: total-tokens, prompt-tokens, and completion-tokens.
-
string default: tpot
Must be one of: tpot, e2e
Which metric to use for latency. Available values are: tpot (time-per-output-token) and e2e (end-to-end).
-
string default: X-Kong-LLM-Request-ID
The header to use for consistent-hashing.
-
integer default: 10000
between: 10 and 65536
The number of slots in the load balancer algorithm.
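To make the relationship between the hashing header and the slot count more concrete, here is a minimal sketch of mapping a header value onto a fixed number of slots. It is not Kong's implementation: the CRC32 hash, the helper slot_for, and the sample header value are all assumptions chosen for illustration. Only the default of 10000 slots and the 10 to 65536 range come from the list above.

```python
import zlib

DEFAULT_SLOTS = 10000  # default slot count from the list above


def slot_for(header_value: str, slots: int = DEFAULT_SLOTS) -> int:
    """Map a header value to a slot index in the range [0, slots).

    CRC32 is an illustrative hash choice, not necessarily what Kong uses.
    """
    if not 10 <= slots <= 65536:
        raise ValueError("slots must be between 10 and 65536")
    return zlib.crc32(header_value.encode("utf-8")) % slots


# The same header value always maps to the same slot, which is what lets
# requests carrying that value be routed consistently.
first = slot_for("req-1234")
second = slot_for("req-1234")
print(first == second)
```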
-
integer default: 5
between: 0 and 32767
The number of retries to execute upon failure to proxy.
-
integer default: 60000
between: 1 and 2147483646
-
integer default: 60000
between: 1 and 2147483646
-
integer default: 60000
between: 1 and 2147483646
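The allowed values and ranges listed for the load balancer settings can be checked before a configuration is submitted. The sketch below is one way to do that. The allowed values, defaults, and ranges are taken from the list above, but the dictionary keys (algorithm, tokens_count_strategy, latency_strategy, slots, retries) and the helper validate_balancer are hypothetical names used only for this example.

```python
ALGORITHMS = {"round-robin", "lowest-latency", "lowest-usage",
              "consistent-hashing", "semantic"}
TOKEN_STRATEGIES = {"total-tokens", "prompt-tokens", "completion-tokens"}
LATENCY_STRATEGIES = {"tpot", "e2e"}


def validate_balancer(cfg: dict) -> list:
    """Return a list of problems; an empty list means cfg passed every check.

    Key names are assumptions; values and ranges follow the list above.
    Missing keys fall back to the documented defaults.
    """
    errors = []
    if cfg.get("algorithm", "round-robin") not in ALGORITHMS:
        errors.append("algorithm: unknown value")
    if cfg.get("tokens_count_strategy", "total-tokens") not in TOKEN_STRATEGIES:
        errors.append("tokens_count_strategy: unknown value")
    if cfg.get("latency_strategy", "tpot") not in LATENCY_STRATEGIES:
        errors.append("latency_strategy: unknown value")
    if not 10 <= cfg.get("slots", 10000) <= 65536:
        errors.append("slots: must be between 10 and 65536")
    if not 0 <= cfg.get("retries", 5) <= 32767:
        errors.append("retries: must be between 0 and 32767")
    return errors


print(validate_balancer({"algorithm": "semantic", "slots": 1000}))
print(validate_balancer({"algorithm": "random", "slots": 5}))
```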
-
-
record
-
record
-
string referenceable
If AI model requires authentication via Authorization or API key header, specify its name here.
-
string referenceable encrypted
Specify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
-
string referenceable
If AI model requires authentication via query parameter, specify its name here.
-
string referenceable encrypted
Specify the full parameter value for ‘param_name’.
-
string Must be one of: query, body
Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.
-
boolean default: false
Set true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.
-
string referenceable
If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
-
string referenceable encrypted
If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client secret.
-
string referenceable
If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
-
boolean default: false
Use service account auth for GCP-based providers and models.
-
string referenceable encrypted
Set this field to the full JSON of the GCP service account to authenticate, if required. If null (and gcp_use_service_account is true), Kong will attempt to read from the environment variable GCP_SERVICE_ACCOUNT.
-
string referenceable encrypted
Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_ACCESS_KEY_ID environment variable for this plugin instance.
-
string referenceable encrypted
Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_SECRET_ACCESS_KEY environment variable for this plugin instance.
-
boolean default: false
If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.
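The query-parameter authentication options in this record can be pictured with a small sketch that places a parameter either in the query string or in a JSON body. The names param_name and param_value are quoted in the descriptions above; param_location, the helper apply_auth_param, the URL, and the sample values are assumptions made for the example, and this is not how Kong itself builds the upstream request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_auth_param(url, body, param_name, param_value, param_location):
    """Return (url, body) with the auth parameter placed as requested.

    param_location is an assumed name for the query/body setting above.
    """
    if param_location == "query":
        parts = urlsplit(url)
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append((param_name, param_value))
        return urlunsplit(parts._replace(query=urlencode(query))), body
    if param_location == "body":
        new_body = dict(body)          # copy so the caller's body is untouched
        new_body[param_name] = param_value
        return url, new_body
    raise ValueError("param_location must be 'query' or 'body'")


# Hypothetical upstream URL and key, for illustration only.
url, body = apply_auth_param(
    "https://llm.example.com/v1/chat", {"prompt": "hi"}, "key", "secret", "query"
)
print(url)
print(body)
```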
-
-
record required
-
-
record
-
string required Must be one of: redis
Which vector database driver to use.
-
integer required
The desired dimensionality for the vectors.
-
number required
The default similarity threshold for accepting semantic search results (float).
-
string required Must be one of: cosine, euclidean
The distance metric to use for vector searches.
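For readers unfamiliar with the two distance metrics named above, the following sketch computes both for a pair of small vectors. These are the standard textbook definitions, written out as an illustration. The helper names and sample vectors are assumptions, and the sketch does not show how Kong or Redis applies the similarity threshold, which is not specified in this list.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Vectors pointing the same way have cosine distance near 0 even when their
# lengths differ, while their euclidean distance still reflects the gap.
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

The dimensionality check mirrors the requirement that stored vectors share one configured dimensionality.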
-
record required
-
string default: 127.0.0.1
A string representing a host name, such as example.com.
-
integer default: 6379
between: 0 and 65535
An integer representing a port number between 0 and 65535, inclusive.
-
integer default: 2000
between: 0 and 2147483646
An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
-
integer default: 2000
between: 0 and 2147483646
An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
-
integer default: 2000
between: 0 and 2147483646
An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.
-
string referenceable
Username to use for Redis connections. If undefined, ACL authentication won’t be performed. This requires Redis v6.0.0+. To be compatible with Redis v5.x.y, you can set it to default.
-
string referenceable encrypted
Password to use for Redis connections. If undefined, no AUTH commands are sent to Redis.
-
string referenceable
Sentinel username to authenticate with a Redis Sentinel instance. If undefined, ACL authentication won’t be performed. This requires Redis v6.2.0+.
-
string referenceable encrypted
Sentinel password to authenticate with a Redis Sentinel instance. If undefined, no AUTH commands are sent to Redis Sentinels.
-
integer default: 0
Database to use for the Redis connection when using the redis strategy.
-
integer default: 256
between: 1 and 2147483646
The size limit for every cosocket connection pool associated with every remote server, per worker process. If neither keepalive_pool_size nor keepalive_backlog is specified, no pool is created. If keepalive_pool_size isn’t specified but keepalive_backlog is specified, then the pool uses the default value. If latency is high or throughput is low, try increasing this value (for example, to 512).
-
integer between: 0 and 2147483646
Limits the total number of opened connections for a pool. If the connection pool is full, connection queues above the limit go into the backlog queue. If the backlog queue is full, subsequent connect operations fail and return nil. Queued operations (subject to set timeouts) resume once the number of connections in the pool is less than keepalive_pool_size. If latency is high or throughput is low, try increasing this value. Empirically, this value is larger than keepalive_pool_size.
-
string
Sentinel master to use for Redis connections. Defining this value implies using Redis Sentinel.
-
string Must be one of: master, slave, any
Sentinel role to use for Redis connections when the redis strategy is defined. Defining this value implies using Redis Sentinel.
-
array of type record len_min: 1
Sentinel node addresses to use for Redis connections when the redis strategy is defined. Defining this field implies using a Redis Sentinel. The minimum length of the array is 1 element.
-
array of type record len_min: 1
Cluster addresses to use for Redis connections when the redis strategy is defined. Defining this field implies using a Redis Cluster. The minimum length of the array is 1 element.
-
boolean default: false
If set to true, uses SSL to connect to Redis.
-
boolean default: false
If set to true, verifies the validity of the server SSL certificate. If setting this parameter, also configure lua_ssl_trusted_certificate in kong.conf to specify the CA (or server) certificate used by your Redis server. You may also need to configure lua_ssl_verify_depth accordingly.
-
string
A string representing an SNI (server name indication) value for TLS.
-
integer default: 5
Maximum retry attempts for redirection.
-
boolean default: false
If the connection to Redis is proxied (for example, through Envoy), set this to true. Set the host and port to point to the proxy address.
-
-
-
string default: allow
Must be one of: allow, deny, always
Whether to ‘optionally allow’, ‘deny’, or ‘always’ (force) the streaming of answers via server-sent events.
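One plausible reading of the three streaming values is sketched below as a small decision function. The function name should_stream and the client_requested flag are assumptions made for the example. The sketch only decides whether a response would be streamed, and does not model what the gateway returns to a client whose streaming request is denied, since that is not described here.

```python
def should_stream(setting: str, client_requested: bool) -> bool:
    """Decide whether an answer is streamed, given the configured setting.

    allow  -> stream only when the client asks for it
    deny   -> never stream
    always -> stream regardless of what the client asked for
    """
    if setting == "allow":
        return client_requested
    if setting == "deny":
        return False
    if setting == "always":
        return True
    raise ValueError("setting must be 'allow', 'deny', or 'always'")


for setting in ("allow", "deny", "always"):
    print(setting, should_stream(setting, True), should_stream(setting, False))
```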
-
integer default: 8192
The maximum body size allowed to be introspected.
-
boolean default: true
Display the model name selected in the X-Kong-LLM-Model response header.
-
array of type record required
-
string required Must be one of: llm/v1/chat, llm/v1/completions, preserve
The model’s operation implementation, for this provider. Set to preserve to pass through without transformation.
-
record
-
string referenceable
If AI model requires authentication via Authorization or API key header, specify its name here.
-
string referenceable encrypted
Specify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
-
string referenceable
If AI model requires authentication via query parameter, specify its name here.
-
string referenceable encrypted
Specify the full parameter value for ‘param_name’.
-
string Must be one of: query, body
Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.
-
boolean default: false
Set true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.
-
string referenceable
If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
-
string referenceable encrypted
If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client secret.
-
string referenceable
If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
-
boolean default: false
Use service account auth for GCP-based providers and models.
-
string referenceable encrypted
Set this field to the full JSON of the GCP service account to authenticate, if required. If null (and gcp_use_service_account is true), Kong will attempt to read from the environment variable GCP_SERVICE_ACCOUNT.
-
string referenceable encrypted
Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_ACCESS_KEY_ID environment variable for this plugin instance.
-
string referenceable encrypted
Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_SECRET_ACCESS_KEY environment variable for this plugin instance.
-
boolean default: false
If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.
-
-
record required
-
string required Must be one of: openai, azure, anthropic, cohere, mistral, llama2, gemini, bedrock, huggingface
AI provider request format. Kong translates requests to and from the specified backend-compatible formats.
-
string
Model name to execute.
-
record
Key/value settings for the model
-
integer default: 256
Defines the max_tokens, if using chat or completion models.
-
number
Defines the cost per 1M tokens in your prompt.
-
number
Defines the cost per 1M tokens in the output of the AI.
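Because both cost settings are expressed per 1M tokens, the cost of a single request is each token count divided by one million and multiplied by the matching rate. The sketch below works through that arithmetic. The names input_cost and output_cost, the helper estimate_cost, and the sample prices and token counts are all hypothetical.

```python
TOKENS_PER_UNIT = 1_000_000  # both costs above are quoted per 1M tokens


def estimate_cost(prompt_tokens, completion_tokens, input_cost, output_cost):
    """Estimate the cost of one request from its token counts.

    input_cost:  assumed name for the cost per 1M prompt tokens
    output_cost: assumed name for the cost per 1M output tokens
    """
    prompt_part = prompt_tokens / TOKENS_PER_UNIT * input_cost
    completion_part = completion_tokens / TOKENS_PER_UNIT * output_cost
    return prompt_part + completion_part


# Hypothetical prices: 3.0 per 1M prompt tokens, 15.0 per 1M output tokens.
cost = estimate_cost(prompt_tokens=2000, completion_tokens=500,
                     input_cost=3.0, output_cost=15.0)
print(round(cost, 6))
```

With these sample numbers the prompt contributes 0.006 and the output contributes 0.0075, in whatever currency the rates are expressed in.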
-
number between: 0 and 5
Defines the matching temperature, if using chat or completion models.
-
number between: 0 and 1
Defines the top-p probability mass, if supported.
-
integer between: 0 and 500
Defines the top-k most likely tokens, if supported.
-
string
Defines the schema/API version, if using Anthropic provider.
-
string
Instance name for Azure OpenAI hosted models.
-
string default: 2023-05-15
‘api-version’ for Azure OpenAI instances.
-
string
Deployment ID for Azure OpenAI instances.
-
string Must be one of: raw, openai, ollama
If using llama2 provider, select the upstream message format.
-
string Must be one of: openai, ollama
If using mistral provider, select the upstream message format.
-
string
Manually specify or override the full URL to the AI operation endpoints, when calling (self-)hosted models, or for running via a private endpoint.
-
string
Manually specify or override the AI operation path, used when e.g. using the ‘preserve’ route_type.
-
record
-
record
-
record
-
-
-
integer default: 100
between: 1 and 65535
The weight this target gets within the upstream load balancer (1-65535).
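Assuming the load balancer distributes requests in proportion to each target's weight, which is the usual meaning of a weight but is not spelled out above, the expected share of traffic per target can be estimated as follows. The helper traffic_shares and the target names are hypothetical.

```python
def traffic_shares(weights: dict) -> dict:
    """Return each target's expected fraction of requests.

    Assumes proportional distribution: share = weight / sum of all weights.
    """
    for name, weight in weights.items():
        if not 1 <= weight <= 65535:
            raise ValueError(f"{name}: weight must be between 1 and 65535")
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# One target at the default weight of 100, another at three times that.
shares = traffic_shares({"target-a": 100, "target-b": 300})
print(shares)
```

Under that assumption, target-a would receive a quarter of the requests and target-b the remaining three quarters.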
-
string
The semantic description of the target, required if using semantic load balancing.
-
record required
-
-