This document covers integrating Kong Vitals with a new or existing InfluxDB time series server or cluster. Leveraging a time series database for Vitals data can improve request and Vitals performance in very-high-traffic Kong Enterprise clusters (such as environments handling tens or hundreds of thousands of requests per second), without placing additional write load on the database backing the Kong cluster.
For using Vitals with a database as the backend (e.g., PostgreSQL or Cassandra), please refer to Kong Vitals.
This guide assumes an existing InfluxDB server or cluster is already installed and is accepting write traffic. Production-ready InfluxDB installations should be deployed as a separate effort, but for proof-of-concept testing, running a local InfluxDB instance is possible via Docker:
$ docker run -p 8086:8086 \
    -v $PWD:/var/lib/influxdb \
    influxdb
Writing Vitals data to InfluxDB requires that the kong database exists. Currently, this database must be created manually. This can be done via the influx CLI:

influx> create database kong;
Alternatively, the InfluxDB HTTP API may be queried directly to create the database.
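For illustration, the same CREATE DATABASE statement can be issued against the InfluxDB 1.x HTTP /query endpoint. The sketch below builds such a request with Python's standard library; the host, port, and database name are assumptions matching the examples above:

```python
# Sketch: creating the "kong" database through the InfluxDB 1.x HTTP API
# instead of the influx CLI. Assumes InfluxDB listens on localhost:8086.
from urllib import parse, request

def create_database_request(host="localhost", port=8086, db="kong"):
    """Build the POST request that issues CREATE DATABASE via /query."""
    body = parse.urlencode({"q": f'CREATE DATABASE "{db}"'}).encode()
    return request.Request(f"http://{host}:{port}/query", data=body, method="POST")

req = create_database_request()
# request.urlopen(req)  # uncomment to actually send the query
```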
In addition to enabling Vitals, Kong must be configured to use InfluxDB as the backing strategy for Vitals. The InfluxDB host and port must also be defined:
vitals_strategy = influxdb
vitals_tsdb_address = 127.0.0.1:8086 # the IP or hostname, and port, of InfluxDB
As with other Kong configurations, changes take effect on kong reload or kong restart.
Kong Vitals records metrics in two InfluxDB measurements: kong_request, which
contains field values for request latencies and HTTP statuses, along with tags
for various Kong entities associated with the requests (e.g., the Route and
Service in question); and kong_datastore_cache, which contains points about
cache hits and misses. Measurement schemas are listed below:
> show tag keys
name: kong_request
tagKey
------
consumer
hostname
route
service
status_f
wid
workspace

name: kong_datastore_cache
tagKey
------
hostname
wid
> show field keys
name: kong_request
fieldKey         fieldType
--------         ---------
kong_latency     integer
proxy_latency    integer
request_latency  integer
status           integer

name: kong_datastore_cache
fieldKey   fieldType
--------   ---------
hits       integer
misses     integer
The wid tag records the unique worker ID per host, distinguishing points so
that metrics shipped by different workers at the same point in time are not
treated as duplicates.
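To make the role of these tags and fields concrete, here is a hypothetical kong_request point rendered in InfluxDB line protocol. The helper and its values are illustrative only, not Kong's internal code:

```python
# Hypothetical rendering of a kong_request point in InfluxDB line protocol.
# Tags (hostname, wid, route, service) identify the series; fields carry
# the measured values; the trailing timestamp is in microseconds.
def kong_request_point(hostname, wid, route, service, latency_ms, status, ts_us):
    tags = f"hostname={hostname},wid={wid},route={route},service={service}"
    fields = f"request_latency={latency_ms}i,status={status}i"
    return f"kong_request,{tags} {fields} {ts_us}"

point = kong_request_point("kong-0", 1, "route-a", "service-a", 12, 200,
                           1700000000000000)
```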
As demonstrated above, the series cardinality of the kong_request measurement
varies based on the cardinality of the Kong cluster configuration: a greater
number of Service/Route/Consumer/Workspace combinations handled by Kong results
in a greater series cardinality as written by Vitals. Please consult the
InfluxDB sizing guidelines
for reference on appropriately sizing an InfluxDB node/cluster. Note that the
query behavior when reading Vitals data falls under the “moderate” load
category as defined by the above document - several
GROUP BY statements and
functions are used to generate the Vitals API responses, which can require
significant CPU resources to execute when hundreds of thousands or millions of
data points are present.
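As a back-of-the-envelope illustration, an upper bound on kong_request series cardinality is the product of the distinct values of each tag. The counts below are made up for the example; real deployments typically see far fewer series, since not every tag combination actually occurs:

```python
# Illustrative upper bound on kong_request series cardinality: the product
# of distinct values per tag. The counts here are hypothetical.
tag_value_counts = {
    "hostname": 4,     # Kong nodes
    "wid": 8,          # worker processes per node
    "workspace": 2,
    "service": 50,
    "route": 100,
    "consumer": 200,
    "status_f": 5,
}

upper_bound = 1
for count in tag_value_counts.values():
    upper_bound *= count
# 4 * 8 * 2 * 50 * 100 * 200 * 5 = 320,000,000 potential series
```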
Kong buffers Vitals metrics and writes InfluxDB points in batches to improve throughput in InfluxDB and reduce overhead in the Kong proxy path. Each Kong worker process flushes its buffer of metrics every 5 seconds or 5000 data points, whichever comes first.
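The flush behavior described above can be sketched as a simple count-or-age buffer. This is illustrative only; the names and structure are not Kong internals:

```python
# Sketch of a metrics buffer that flushes on a size or age threshold,
# mirroring the "5 seconds or 5000 data points" behaviour described above.
import time

class VitalsBuffer:
    def __init__(self, flush_size=5000, flush_interval=5.0, sink=None):
        self.flush_size = flush_size          # max buffered points
        self.flush_interval = flush_interval  # max buffer age in seconds
        self.sink = sink or (lambda points: None)  # write callback
        self.points = []
        self.last_flush = time.monotonic()

    def add(self, point):
        self.points.append(point)
        if (len(self.points) >= self.flush_size
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.points:
            self.sink(self.points)
            self.points = []
        self.last_flush = time.monotonic()

# Demo with a small size threshold: 7 points yield two flushes of 3,
# leaving one point buffered.
batches = []
buf = VitalsBuffer(flush_size=3, sink=batches.append)
for i in range(7):
    buf.add(f"point{i}")
```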
Metrics points are written with microsecond (u) precision. To comply with
the Vitals API, measurement
values are read back grouped by second. Note that due to limitations in the
OpenResty API, writing values with microsecond precision requires an additional
syscall per request.
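For reference, the precision round-trip looks like this: points carry integer microsecond timestamps, and reading back grouped by second effectively truncates them to their containing second. The helper names below are illustrative:

```python
# Microsecond timestamps (as used with precision=u) and the per-second
# truncation that grouping by second implies on read. Helper names are
# illustrative, not Kong or InfluxDB API names.
import time

def now_us():
    """Current wall-clock time as integer microseconds."""
    return int(time.time() * 1_000_000)

def second_bucket(ts_us):
    """Truncate a microsecond timestamp to its containing second."""
    return ts_us // 1_000_000

ts = 1_700_000_000_123_456  # example microsecond timestamp
```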
Currently, Vitals InfluxDB data points are not downsampled or managed via
retention policy by Kong. InfluxDB operators are encouraged to manually manage
the retention policy of the
kong database to reduce the disk space and memory
needed to manage Vitals data points. Currently, Kong Vitals ignores data points
older than 25 hours; it is safe to create a retention policy with a 25-hour
duration for measurements written by Kong.
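As a sketch, the 25-hour retention policy suggested above could be created with an InfluxQL statement like the one built here. The policy name and replication factor are assumptions an operator would choose for their deployment:

```python
# Build the InfluxQL statement for a 25-hour retention policy on the
# "kong" database. The policy name and REPLICATION factor are examples.
def retention_policy_stmt(db="kong", name="vitals_25h", duration="25h"):
    return (f'CREATE RETENTION POLICY "{name}" ON "{db}" '
            f"DURATION {duration} REPLICATION 1 DEFAULT")

stmt = retention_policy_stmt()
# Run via the influx CLI or POST to the /query endpoint, as with
# database creation above.
```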