AI Proxy Advanced


Configuration

config object required


balancer object


algorithm string

Which load balancing algorithm to use.

Allowed values: consistent-hashing, lowest-latency, lowest-usage, priority, round-robin, semantic

Default: round-robin

connect_timeout integer

Default: 60000

>= 1, <= 2147483646

failover_criteria array[string]

Specifies the cases in which a request should fail over to the next target. Each option in the array is equivalent to the matching option of the Nginx proxy_next_upstream directive: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream

Allowed values: error, http_403, http_404, http_429, http_500, http_502, http_503, http_504, invalid_header, non_idempotent, timeout

Default: error, timeout

hash_on_header string

The header to use for consistent-hashing.

Default: X-Kong-LLM-Request-ID

latency_strategy string

Which metric to use for latency. Available values are tpot (time per output token) and e2e (end to end).

Allowed values: e2e, tpot

Default: tpot

read_timeout integer

Default: 60000

>= 1, <= 2147483646

retries integer

The number of retries to execute upon failure to proxy.

Default: 5

>= 0, <= 32767

slots integer

The number of slots in the load balancer algorithm.

Default: 10000

>= 10, <= 65536

tokens_count_strategy string

Which tokens to use for the usage calculation. Available values are total-tokens, prompt-tokens, completion-tokens, and cost.

Allowed values: completion-tokens, cost, prompt-tokens, total-tokens

Default: total-tokens

write_timeout integer

Default: 60000

>= 1, <= 2147483646
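
To make the constraints above concrete, here is a minimal Python sketch that checks a balancer block against the allowed values and integer ranges listed in this section. The helper name validate_balancer and the dictionary layout are illustrative assumptions, not part of Kong; the gateway performs its own schema validation.

```python
# Hypothetical helper: checks a `balancer` dict against the limits listed above.
# The limits are copied from this reference; the function itself is not a Kong API.
ALGORITHMS = {
    "consistent-hashing", "lowest-latency", "lowest-usage",
    "priority", "round-robin", "semantic",
}
FAILOVER_CRITERIA = {
    "error", "http_403", "http_404", "http_429", "http_500", "http_502",
    "http_503", "http_504", "invalid_header", "non_idempotent", "timeout",
}
INTEGER_RANGES = {
    "connect_timeout": (1, 2147483646),
    "read_timeout": (1, 2147483646),
    "write_timeout": (1, 2147483646),
    "retries": (0, 32767),
    "slots": (10, 65536),
}


def validate_balancer(balancer):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    algorithm = balancer.get("algorithm", "round-robin")
    if algorithm not in ALGORITHMS:
        problems.append(f"unknown algorithm: {algorithm}")
    for criterion in balancer.get("failover_criteria", ["error", "timeout"]):
        if criterion not in FAILOVER_CRITERIA:
            problems.append(f"unknown failover criterion: {criterion}")
    for field, (low, high) in INTEGER_RANGES.items():
        if field in balancer and not low <= balancer[field] <= high:
            problems.append(f"{field} must be between {low} and {high}")
    return problems


example = {
    "algorithm": "lowest-latency",
    "latency_strategy": "e2e",
    "failover_criteria": ["error", "timeout", "http_429"],
    "retries": 3,
}
print(validate_balancer(example))
```

A check like this can catch typos in a declarative config before it is submitted, but it only mirrors the values shown on this page and would need updating if the schema changes.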

embeddings object


auth object


allow_override boolean

If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.

Default: false

aws_access_key_id string

Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_ACCESS_KEY_ID environment variable for this plugin instance.
This field is encrypted.
This field is referenceable.

aws_secret_access_key string

Set this if you are using an AWS provider (Bedrock) and you are authenticating using static IAM User credentials. Setting this will override the AWS_SECRET_ACCESS_KEY environment variable for this plugin instance.
This field is encrypted.
This field is referenceable.

azure_client_id string

If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
This field is referenceable.

azure_client_secret string

If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client secret.
This field is encrypted.
This field is referenceable.

azure_tenant_id string

If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
This field is referenceable.

azure_use_managed_identity boolean

Set to true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.

Default: false

gcp_service_account_json string

Set this field to the full JSON of the GCP service account to authenticate, if required. If null (and gcp_use_service_account is true), Kong will attempt to read from the environment variable GCP_SERVICE_ACCOUNT.
This field is encrypted.
This field is referenceable.

gcp_use_service_account boolean

Use service account auth for GCP-based providers and models.

Default: false

header_name string

If the AI model requires authentication via an Authorization or API key header, specify its name here.
This field is referenceable.

header_value string

Specify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
This field is encrypted.
This field is referenceable.

param_location string

Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.

Allowed values: body, query

param_name string

If the AI model requires authentication via a query parameter, specify its name here.
This field is referenceable.

param_value string

Specify the full parameter value for ‘param_name’.
This field is encrypted.
This field is referenceable.
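
The header and parameter options above describe two ways of attaching a credential. The Python sketch below models where the value would end up in each case. It is a simplified illustration, not Kong's implementation: apply_auth is a hypothetical helper, "api_key" is a made-up parameter name, and "<your-api-key>" is a placeholder.

```python
# Two illustrative `auth` blocks built only from fields documented above.
header_auth = {
    "header_name": "Authorization",
    "header_value": "Bearer <your-api-key>",
}
query_auth = {
    "param_name": "api_key",          # hypothetical parameter name
    "param_value": "<your-api-key>",
    "param_location": "query",        # allowed values: body, query
}


def apply_auth(auth, request):
    """Return a copy of `request` with the configured credential attached."""
    result = {
        "headers": dict(request.get("headers", {})),
        "query": dict(request.get("query", {})),
        "body": dict(request.get("body", {})),
    }
    if auth.get("header_name"):
        result["headers"][auth["header_name"]] = auth["header_value"]
    if auth.get("param_name"):
        # param_location decides between the query string and the request body.
        result[auth["param_location"]][auth["param_name"]] = auth["param_value"]
    return result


print(apply_auth(header_auth, {}))
print(apply_auth(query_auth, {"query": {"q": "hello"}}))
```

Because header_value and param_value are marked as encrypted and referenceable, a real configuration would normally hold a secret reference rather than a literal key.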

model object required


name string required

Model name to execute.

options object

Key/value settings for the model.


provider string required

AI provider format to use for the embeddings API.

Allowed values: azure, bedrock, gemini, huggingface, mistral, openai

genai_category string

Generative AI category of the request.

Allowed values: audio/speech, audio/transcription, image/generation, realtime/generation, text/embeddings, text/generation

Default: text/generation

llm_format string

LLM input and output format and schema to use.

Allowed values: bedrock, cohere, gemini, huggingface, openai

Default: openai

max_request_body_size integer

Maximum body size allowed to be introspected. 0 means unlimited, but the size of this body will still be limited by Nginx’s client_max_body_size.

Default: 8192

>= 0

model_name_header boolean

Display the selected model name in the X-Kong-LLM-Model response header.

Default: true

response_streaming string

Whether to ‘optionally allow’, ‘deny’, or ‘always’ (force) the streaming of answers via server-sent events.

Allowed values: allow, always, deny

Default: allow

targets array[object] required


auth object


allow_override boolean

If enabled, the authorization header or parameter can be overridden in the request by the value configured in the plugin.

Default: false

aws_access_key_id string

aws_secret_access_key string

azure_client_id string

If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the client ID.
This field is referenceable.

azure_client_secret string

azure_tenant_id string

If azure_use_managed_identity is set to true, and you need to use a different user-assigned identity for this LLM instance, set the tenant ID.
This field is referenceable.

azure_use_managed_identity boolean

Set to true to use the Azure Cloud Managed Identity (or user-assigned identity) to authenticate with Azure-provider models.

Default: false

gcp_service_account_json string

gcp_use_service_account boolean

Use service account auth for GCP-based providers and models.

Default: false

header_name string

If the AI model requires authentication via an Authorization or API key header, specify its name here.
This field is referenceable.

header_value string

Specify the full auth header value for ‘header_name’, for example ‘Bearer key’ or just ‘key’.
This field is encrypted.
This field is referenceable.

param_location string

Specify whether the ‘param_name’ and ‘param_value’ options go in a query string, or the POST form/JSON body.

Allowed values: body, query

param_name string

If the AI model requires authentication via a query parameter, specify its name here.
This field is referenceable.

param_value string

Specify the full parameter value for ‘param_name’.
This field is encrypted.
This field is referenceable.

description string

The semantic description of the target. Required when using semantic load balancing. As a special case, setting this to ‘CATCHALL’ marks the target as the one to use when no other target meets the semantic threshold.
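
The CATCHALL behaviour can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not Kong's algorithm: it assumes each target already has a similarity score for the incoming prompt and that a higher score means a closer match. The function pick_target and the sample descriptions are hypothetical.

```python
# Simplified model of semantic target selection with a CATCHALL fallback.
# Assumptions (not stated in this reference): scores are precomputed and
# higher scores mean a closer match to the target's description.
def pick_target(scored_targets, threshold):
    """scored_targets is a list of (description, similarity) pairs."""
    candidates = [
        (similarity, description)
        for description, similarity in scored_targets
        if description != "CATCHALL" and similarity >= threshold
    ]
    if candidates:
        # Best-scoring target among those that meet the threshold.
        return max(candidates)[1]
    has_catchall = any(d == "CATCHALL" for d, _ in scored_targets)
    return "CATCHALL" if has_catchall else None


scored = [
    ("billing questions", 0.82),
    ("code generation", 0.40),
    ("CATCHALL", 0.0),
]
print(pick_target(scored, 0.75))
print(pick_target(scored, 0.90))
```

With the sample scores, a threshold of 0.75 selects the billing target, while a threshold of 0.90 leaves no match and falls through to the CATCHALL target.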

logging object required


log_payloads boolean

If enabled, will log the request and response body into the Kong log plugin(s) output.

Default: false

log_statistics boolean

If enabled and supported by the driver, will add model usage and token metrics into the Kong log plugin(s) output.

Default: false

model object required


name string

Model name to execute.

options object

Key/value settings for the model.


provider string required

AI provider request format. Kong translates requests to and from the specified backend-compatible formats.

Allowed values: anthropic, azure, bedrock, cohere, gemini, huggingface, llama2, mistral, openai

route_type string required

The model’s operation implementation, for this provider.

Allowed values: audio/v1/audio/speech, audio/v1/audio/transcriptions, audio/v1/audio/translations, image/v1/images/edits, image/v1/images/generations, llm/v1/assistants, llm/v1/batches, llm/v1/chat, llm/v1/completions, llm/v1/embeddings, llm/v1/files, llm/v1/responses, preserve, realtime/v1/realtime

weight integer

The weight this target gets within the upstream load balancer (1-65535).

Default: 100

>= 1, <= 65535
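
As an illustration of how the target fields fit together, the Python sketch below builds two targets and estimates how traffic might be divided between them. The model names are placeholders, and expected_share is a hypothetical helper that assumes traffic is split in proportion to weight, which this page does not state explicitly.

```python
# Two illustrative targets using only fields documented above.
targets = [
    {
        "model": {"provider": "openai", "name": "<model-a>"},
        "route_type": "llm/v1/chat",
        "logging": {"log_statistics": True, "log_payloads": False},
        "weight": 300,
    },
    {
        "model": {"provider": "anthropic", "name": "<model-b>"},
        "route_type": "llm/v1/chat",
        "logging": {"log_statistics": True, "log_payloads": False},
        # weight omitted: the documented default of 100 applies
    },
]

DEFAULT_WEIGHT = 100


def expected_share(targets):
    """Fraction of traffic per target, assuming a split proportional to weight."""
    weights = [t.get("weight", DEFAULT_WEIGHT) for t in targets]
    total = sum(weights)
    return [w / total for w in weights]


print(expected_share(targets))
```

Under that proportionality assumption, the first target would receive three quarters of the requests and the second one quarter.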

vectordb object


dimensions integer required

The desired dimensionality of the vectors.

distance_metric string required

The distance metric to use for vector searches.

Allowed values: cosine, euclidean

pgvector object required


database string

The name of the pgvector database.

Default: kong-pgvector

host string

The host of the pgvector database.

Default: 127.0.0.1

password string

The password for the pgvector database.
This field is encrypted.
This field is referenceable.

port integer

The port of the pgvector database.

Default: 5432

ssl boolean

Whether to use SSL for the pgvector database.

Default: false

ssl_cert string

The path of the SSL certificate to use for the pgvector database.

ssl_cert_key string

The path of the SSL certificate key to use for the pgvector database.

ssl_required boolean

Whether SSL is required for the pgvector database.

Default: false

ssl_verify boolean

Whether to verify SSL for the pgvector database.

Default: false

ssl_version string

The SSL version to use for the pgvector database.

Allowed values: any, tlsv1_2, tlsv1_3

Default: tlsv1_2

timeout number

The timeout for the pgvector database.

Default: 5000

user string

The user of the pgvector database.
This field is referenceable.

Default: postgres

redis object required


cluster_max_redirections integer

Maximum retry attempts for redirection.

Default: 5

cluster_nodes array[object]

Cluster addresses to use for Redis connections when the redis strategy is defined. Defining this field implies using a Redis Cluster. The minimum length of the array is 1 element.

>= 1 characters


connect_timeout integer

An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.

Default: 2000

>= 0, <= 2147483646

connection_is_proxied boolean

If the connection to Redis is proxied (e.g. by Envoy), set this to true. Set the host and port to point to the proxy address.

Default: false

database integer

Database to use for the Redis connection when using the redis strategy.

Default: 0

host string

A string representing a host name, such as example.com.

Default: 127.0.0.1

keepalive_backlog integer

Limits the total number of opened connections for a pool. If the connection pool is full, connection queues above the limit go into the backlog queue. If the backlog queue is full, subsequent connect operations fail and return nil. Queued operations (subject to set timeouts) resume once the number of connections in the pool is less than keepalive_pool_size. If latency is high or throughput is low, try increasing this value. Empirically, this value is larger than keepalive_pool_size.

>= 0, <= 2147483646

keepalive_pool_size integer

The size limit for every cosocket connection pool associated with every remote server, per worker process. If neither keepalive_pool_size nor keepalive_backlog is specified, no pool is created. If keepalive_pool_size isn’t specified but keepalive_backlog is specified, then the pool uses the default value. Try increasing this value (e.g. to 512) if latency is high or throughput is low.

Default: 256

>= 1, <= 2147483646

password string

Password to use for Redis connections. If undefined, no AUTH commands are sent to Redis.
This field is encrypted.
This field is referenceable.

port integer

An integer representing a port number between 0 and 65535, inclusive.

Default: 6379

>= 0, <= 65535

read_timeout integer

An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.

Default: 2000

>= 0, <= 2147483646

send_timeout integer

An integer representing a timeout in milliseconds. Must be between 0 and 2^31-2.

Default: 2000

>= 0, <= 2147483646

sentinel_master string

Sentinel master to use for Redis connections. Defining this value implies using Redis Sentinel.

sentinel_nodes array[object]

Sentinel node addresses to use for Redis connections when the redis strategy is defined. Defining this field implies using a Redis Sentinel. The minimum length of the array is 1 element.

>= 1 characters


sentinel_password string

Sentinel password to authenticate with a Redis Sentinel instance. If undefined, no AUTH commands are sent to Redis Sentinels.
This field is encrypted.
This field is referenceable.

sentinel_role string

Sentinel role to use for Redis connections when the redis strategy is defined. Defining this value implies using Redis Sentinel.

Allowed values: any, master, slave

sentinel_username string

Sentinel username to authenticate with a Redis Sentinel instance. If undefined, ACL authentication won’t be performed. This requires Redis v6.2.0+.
This field is referenceable.

server_name string

A string representing an SNI (server name indication) value for TLS.

ssl boolean

If set to true, uses SSL to connect to Redis.

Default: false

ssl_verify boolean

If set to true, verifies the validity of the server SSL certificate. If setting this parameter, also configure lua_ssl_trusted_certificate in kong.conf to specify the CA (or server) certificate used by your Redis server. You may also need to configure lua_ssl_verify_depth accordingly.

Default: false

username string

Username to use for Redis connections. If undefined, ACL authentication won’t be performed. This requires Redis v6.0.0+. To be compatible with Redis v5.x.y, you can set it to default.
This field is referenceable.
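
Several Redis fields above are described as implying a particular deployment mode. The Python sketch below encodes those statements as a small decision helper. The function redis_mode is hypothetical, and the order in which the checks are applied is an assumption; the node entries are left empty because their child fields are not listed in this section.

```python
# Hypothetical helper that infers the Redis mode from which fields are set,
# following the "implies using ..." statements in the field descriptions above.
def redis_mode(redis):
    if redis.get("cluster_nodes"):
        return "cluster"
    sentinel_fields = ("sentinel_master", "sentinel_nodes", "sentinel_role")
    if any(redis.get(field) for field in sentinel_fields):
        return "sentinel"
    # Otherwise a single instance at host:port (defaults 127.0.0.1 and 6379).
    return "standalone"


standalone = {"host": "127.0.0.1", "port": 6379}
# Node entries are placeholders; their child fields are not shown on this page.
cluster = {"cluster_nodes": [{}], "cluster_max_redirections": 5}
sentinel = {"sentinel_master": "<master-name>", "sentinel_role": "master"}

for config in (standalone, cluster, sentinel):
    print(redis_mode(config))
```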

strategy string required

Which vector database driver to use.

Allowed values: pgvector, redis

threshold number required

The default similarity threshold for accepting semantic search results (float).
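
For readers unfamiliar with the two distance_metric options, the Python sketch below computes both measures for small vectors using their standard mathematical definitions. It does not show how Kong or the vector database applies the threshold to either metric, which this page does not describe; the function names and sample vectors are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    """Straight-line distance between a and b: 0.0 means identical vectors."""
    return math.dist(a, b)


# Sample vectors with dimensions = 2, chosen only for easy arithmetic.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

Note that the two measures point in opposite directions: a larger cosine similarity means a closer match, while a larger Euclidean distance means a weaker one, so a threshold value is only meaningful relative to the chosen metric.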

consumer object

If set, the plugin will activate only for requests where the specified consumer has been authenticated. (Note that some plugins cannot be restricted to consumers this way.) Leave unset for the plugin to activate regardless of the authenticated Consumer.

* Additional properties are NOT allowed.


id string

consumer_group object

If set, the plugin will activate only for requests where the specified consumer group has been authenticated. (Note that some plugins cannot be restricted to consumer groups this way.) Leave unset for the plugin to activate regardless of the authenticated Consumer Groups.

* Additional properties are NOT allowed.


id string

protocols array[string]

A list of the request protocols that will trigger this plugin. The default value, as well as the possible values allowed on this field, may change depending on the plugin type. For example, plugins that only work in stream mode will only support tcp and tls.

Allowed values: grpc, grpcs, http, https, ws, wss

Default: grpc, grpcs, http, https, ws, wss

route object

If set, the plugin will only activate when receiving requests via the specified route. Leave unset for the plugin to activate regardless of the route being used.

* Additional properties are NOT allowed.


id string

service object

If set, the plugin will only activate when receiving requests via one of the routes belonging to the specified Service. Leave unset for the plugin to activate regardless of the Service being matched.

* Additional properties are NOT allowed.


id string
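
Putting the top-level fields together, the Python sketch below assembles a plugin object scoped to a single route and checks that the fields marked required for targets on this page are present. The plugin name "ai-proxy-advanced" is assumed from the page title, "<route-uuid>" and "<model-name>" are placeholders, and missing_required is a hypothetical helper rather than a Kong API.

```python
import json

# Illustrative plugin object; values in angle brackets are placeholders.
plugin = {
    "name": "ai-proxy-advanced",        # assumed from the page title
    "route": {"id": "<route-uuid>"},    # scope the plugin to one route
    "protocols": ["http", "https"],
    "config": {
        "balancer": {"algorithm": "round-robin"},
        "targets": [
            {
                "model": {"provider": "openai", "name": "<model-name>"},
                "route_type": "llm/v1/chat",
                "logging": {"log_payloads": False, "log_statistics": False},
            }
        ],
    },
}


def missing_required(plugin):
    """List target fields that this page marks as required but are absent."""
    targets = plugin.get("config", {}).get("targets")
    if not targets:
        return ["config.targets"]
    missing = []
    for i, target in enumerate(targets):
        for field in ("model", "route_type", "logging"):
            if field not in target:
                missing.append(f"config.targets[{i}].{field}")
        if "provider" not in target.get("model", {}):
            missing.append(f"config.targets[{i}].model.provider")
    return missing


print(missing_required(plugin))
print(json.dumps(plugin, indent=2))
```

Leaving out route, service, consumer, and consumer_group entirely would, per the descriptions above, let the plugin activate regardless of those entities.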
