Set up AI Proxy with LangChain
This guide walks you through setting up the AI Proxy plugin with LangChain.
Kong AI Gateway delivers a suite of AI-specific plugins on top of the API Gateway platform, enabling you to:
- Route a single consumer interface to multiple models, across many providers
- Load balance similar models based on cost, latency, and other metrics/algorithms
- Deliver a rich analytics and auditing suite for your deployments
- Enable semantic features to protect your users, your models, and your costs
- Provide no-code AI enhancements to your existing REST APIs
- Leverage Kong’s existing ecosystem of authentication, monitoring, and traffic-control plugins
Get started
Kong AI Gateway exchanges inference requests in the OpenAI formats, so you can quickly connect your existing LangChain OpenAI adaptor-based integrations directly through Kong with no code changes.
You can target hundreds of models across the supported providers, all from the same client-side codebase.
Create LLM configuration
Kong AI Gateway uses the same familiar service/route/plugin model as the API Gateway product, with a declarative setup that launches a complete gateway configured from a single YAML file.
Create your gateway YAML file using the AI Proxy plugin. This example covers:
- The OpenAI backend and the GPT-4o model
- The Gemini backend and a Google One-hosted Gemini model (see the sketch after the file below)
_format_version: "3.0"

# A service can hold plugins and features for "all models" you configure
services:
- name: ai
  url: https://localhost:32000  # this can be any hostname

  # A route can denote a single model, or can support multiple based on the request parameters
  routes:
  - name: openai-gpt4o
    paths:
    - "/gpt4o"

    plugins:
    - name: ai-proxy  # ai-proxy is the core AI Gateway enabling feature
      config:
        route_type: llm/v1/chat
        model:
          provider: openai
          name: gpt-4o
        auth:
          header_name: Authorization
          header_value: "Bearer <OPENAI_KEY_HERE>"  # replace with your OpenAI key
Save this file as kong.yaml.
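The file above covers the OpenAI route. A Gemini route follows the same shape; the following is a minimal sketch, assuming Kong's gemini provider and query-parameter key authentication (check the AI Proxy plugin reference for the exact auth fields and an available model name), to be merged under the same service's routes:

  - name: google-gemini
    paths:
    - "/gemini"

    plugins:
    - name: ai-proxy
      config:
        route_type: llm/v1/chat
        model:
          provider: gemini
          name: gemini-1.5-flash  # assumed model name; choose one available to your account
        auth:
          param_name: key         # assumed: the public Gemini API takes the key as a query parameter
          param_location: query
          param_value: "<GEMINI_KEY_HERE>"  # replace with your Gemini key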
Launch the Gateway
Launch the Kong open-source gateway, loading this configuration YAML, with one command:
docker run -it --rm --name kong-ai -p 8000:8000 \
  -v "$(pwd)/kong.yaml:/etc/kong/kong.yaml" \
  -e "KONG_DECLARATIVE_CONFIG=/etc/kong/kong.yaml" \
  -e "KONG_DATABASE=off" \
  kong:3.8
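Once the container is up, you can sanity-check that Kong booted and loaded the configuration by requesting a path with no matching route; Kong answers with its own 404 (the exact body may vary by version):

curl -i http://localhost:8000/unmatched
# HTTP/1.1 404 Not Found
# {"message":"no Route matched with those values"}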
Validate
Check that you are reaching GPT-4o on OpenAI correctly:

curl -H 'Content-Type: application/json' \
     -d '{"messages":[{"role":"user","content":"What are you?"}]}' \
     http://127.0.0.1:8000/gpt4o
Response:
{
  ...
  "content": "I am an AI language model developed by OpenAI, designed to assist with generating text-based responses and providing information on a wide range of topics. How can I assist you today?",
  ...
}
Execute Your LangChain Code
Now you can point your LangChain client code at Kong, and you should see identical results.
First, load the LangChain SDK into your Python dependencies:
# WSL2, Linux, macOS-native:
pip3 install -U langchain-openai

# or, on macOS where Python is installed via Homebrew, use a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
pip install -U langchain-openai
Then create an app.py script:
from langchain_openai import ChatOpenAI

kong_url = "http://127.0.0.1:8000"
kong_route = "gpt4o"

llm = ChatOpenAI(
    base_url=f"{kong_url}/{kong_route}",  # simply override the base URL from OpenAI to Kong
    model="gpt-4o",
    api_key="NONE",  # set to NONE as we have not added any gateway-layer security yet
)

response = llm.invoke("What are you?")
print(f"$ ChainAnswer:> {response.content}")
Run the script:
python3 ./app.py
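If everything is wired correctly, the script prints the same kind of answer the curl check returned, something like:

$ ChainAnswer:> I am an AI language model developed by OpenAI, designed to assist with generating text-based responses and providing information on a wide range of topics. How can I assist you today?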
Custom tool usage
Kong also supports custom tools, defined via any supported OpenAI-compatible SDK, including LangChain. With the same kong.yaml configuration, you can execute a simple custom tool definition:
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

kong_url = "http://127.0.0.1:8000"
kong_route = "gpt4o"

@tool
def multiply(first_int: int, second_int: int) -> int:
    """Multiply two integers together."""
    return first_int * second_int

llm = ChatOpenAI(
    base_url=f"{kong_url}/{kong_route}",
    model="gpt-4o",
    api_key="NONE",  # still no gateway-layer security; we add a consumer key in the next section
)

llm_with_tools = llm.bind_tools([multiply])

# extract the first tool call's arguments and pipe them straight into the tool
chain = llm_with_tools | (lambda x: x.tool_calls[0]["args"]) | multiply

response = chain.invoke("What's four times 23?")
print(f"$ ToolsAnswer:> {response}")
Prepare the Gateway for production
Secure your AI model
We’ve just opened up our GPT-4o subscription to anyone who can reach localhost. Now add a Kong-level API key to the kong.yaml configuration file, which secures your published AI route and allows you to track usage across multiple users, departments, paying subscribers, or any other entity:
_format_version: "3.0"

services:
- name: ai
  url: https://localhost:32000

  routes:
  - name: openai-gpt4o
    paths:
    - "/gpt4o"

    plugins:
    - name: ai-proxy
      config:
        route_type: llm/v1/chat
        model:
          provider: openai
          name: gpt-4o
        auth:
          header_name: Authorization
          header_value: "Bearer <OPENAI_KEY_HERE>"  # replace with your OpenAI key again

    # Now we add a security plugin at the "individual model" scope
    - name: key-auth
      config:
        key_names:
        - Authorization

# and finally a consumer with **its own API key**
consumers:
- username: department-1
  keyauth_credentials:
  - key: "Bearer department-1-api-key"
Adjust your Python code accordingly:
...

llm = ChatOpenAI(
    base_url=f"{kong_url}/{kong_route}",
    model="gpt-4o",
    api_key="department-1-api-key",  # this time, set the API key for the consumer created above
)

...
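Requests without a valid key are now rejected at the gateway, before they ever reach OpenAI (the exact error body may vary by Kong version):

curl -i -H 'Content-Type: application/json' \
     -d '{"messages":[{"role":"user","content":"What are you?"}]}' \
     http://127.0.0.1:8000/gpt4o
# HTTP/1.1 401 Unauthorized
# {"message":"No API key found in request"}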
Observability
There are two mechanisms for observability in Kong AI Gateway, depending on your deployment architecture:
- Self-hosted and Kong open-source users can bring their favourite JSON-log dashboard software.
- Kong Konnect users can use Konnect Advanced Analytics to automatically visualize every aspect of the AI Gateway operation.
Self-hosting AI Gateway observability
You can use one (or more) of Kong’s many logging protocol plugins, sending your AI Gateway metrics and logs (in JSON format) to your chosen dashboarding software.
You can choose to log metrics, input/output payloads, or both.
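As a sketch, the following plugin additions enable statistics and payload logging on the ai-proxy plugin and ship each log entry to an HTTP collector; the http_endpoint value is an assumption, so point it at your own Logstash (or other) HTTP input:

    plugins:
    - name: ai-proxy
      config:
        # ... model and auth configuration as before ...
        logging:
          log_statistics: true  # token counts, latencies, and model metadata
          log_payloads: true    # full request and response bodies
    - name: http-log
      config:
        http_endpoint: http://logstash:8080  # assumed address of your log collector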
Sample ELK stack
Use the sample Elasticsearch/Logstash/Kibana stack on GitHub to see the full range of observability tools available when running LangChain applications via Kong AI Gateway.
Boot it up in three steps:
1. Clone the repository:

   git clone https://github.com/KongHQ-CX/kong-ai-gateway-observability && cd kong-ai-gateway-observability/

2. Export your OpenAI API auth header (with API key) into the current shell environment:

   export OPENAI_AUTH_HEADER="Bearer sk-proj-......"

3. Start the stack:

   docker compose up
Now you can run the same LangChain code as in the previous step(s), visualizing exactly what’s happening in Kibana, at the following URL:
http://localhost:5601/app/dashboards#/view/aa8e4cb0-9566-11ef-beb2-c361d8db17a8
Example reports
You can generate analytics over every AI request executed by LangChain and Kong.
If payload logging is enabled, you can even inspect every request and response, as granular as who is executing what, and when.
This uses the HTTP Log plugin to send all AI statistics and payloads to Logstash.
Prompt tuning, audit, and cost control features
Now that your LangChain codebase calls one or many LLMs via Kong AI Gateway, you can snap in as many features as you need from Kong’s growing array of AI plugins.
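For instance, you could scope a prompt guard to the same route so that prompts matching a deny pattern never reach the model. A minimal sketch, assuming the ai-prompt-guard plugin with an illustrative pattern:

    # added alongside ai-proxy in the route's plugin list
    - name: ai-prompt-guard
      config:
        deny_patterns:
        - ".*credit card.*"  # illustrative pattern; encode your own policy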