The observability stack was reporting clean traces, but they felt hollow. We had our services instrumented with the SkyWalking agent, requests flowing through our Kong API Gateway were being tracked, and we could see the beautiful flame graphs connecting our ASP.NET Core microservices. Yet, a critical piece of information was consistently missing from every single span: the `tenant_id`. In our multi-tenant architecture, nearly every business decision, every log filter, every metric dimension, pivots on this identifier.
The initial, brute-force solution was to pass `X-Tenant-Id` as a header. This worked, but it was an explicit dependency that polluted our service contracts. Every developer had to remember to read it, propagate it, and log it. It was fragile and violated the principle of separating business logic from cross-cutting concerns. The observability system should provide this context implicitly, not force the application code to carry it.
Our point of ingress, Kong, was the only component with the full context to resolve a `tenant_id` from an incoming request’s API key. The logical conclusion was that Kong must be responsible for injecting this tenant context into the trace. The challenge was that Kong, by default, is just a proxy; it’s unaware of the specifics of SkyWalking’s `sw8` propagation header. A generic plugin was needed, one that could enrich the trace context without breaking it.
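Our first attempt was to simply decode the `sw8` header, append our data, and re-encode it. This was a non-starter. The `sw8` header has a strict, position-based format of eight dash-separated fields: `{sample}-{traceId}-{parentSegmentId}-{parentSpanId}-{parentService}-{parentServiceInstance}-{parentEndpoint}-{targetAddress}`, for example `1-{traceId}-{parentSegmentId}-3-...`, where `1` is the sampling flag, `3` is the parent span ID, and the string fields are Base64-encoded. Tampering with it directly would corrupt the trace, making it unreadable by downstream agents. A real-world project cannot afford such a brittle solution.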
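To see why appending data is unsafe, it helps to look at how a receiver might split the header. The following is a minimal sketch in JavaScript (the language of the mock service used later), not the agent's actual parser: `parseSw8` and the sample values are hypothetical, and the field names assume the SkyWalking v3 cross-process propagation protocol.

```javascript
// Hypothetical helper: split an sw8 header into its eight positional fields.
// Field names assume the SkyWalking v3 cross-process propagation protocol.
const b64 = (s) => Buffer.from(s, 'utf8').toString('base64');
const unb64 = (s) => Buffer.from(s, 'base64').toString('utf8');

function parseSw8(header) {
  // Standard Base64 never contains '-', so splitting on it is unambiguous.
  const parts = header.split('-');
  if (parts.length !== 8) {
    throw new Error(`sw8: expected 8 fields, got ${parts.length}`);
  }
  const [sample, traceId, segmentId, spanId, service, instance, endpoint, address] = parts;
  return {
    sample: sample === '1',
    traceId: unb64(traceId),
    parentSegmentId: unb64(segmentId),
    parentSpanId: Number(spanId),
    parentService: unb64(service),
    parentServiceInstance: unb64(instance),
    parentEndpoint: unb64(endpoint),
    targetAddress: unb64(address),
  };
}

// Made-up sample values, encoded the way a sender would encode them.
const sampleHeader = [
  '1', b64('trace-0001'), b64('segment-0001'), '3',
  b64('kong-gateway'), b64('kong-1'), b64('/products'), b64('10.0.0.5:8000'),
].join('-');

console.log(parseSw8(sampleHeader).parentEndpoint); // "/products"

// Appending a ninth field, as in our first attempt, breaks positional parsing.
try {
  parseSw8(`${sampleHeader}-${b64('tenant_id=tenant-abc-123')}`);
} catch (err) {
  console.log(err.message); // "sw8: expected 8 fields, got 9"
}
```

A strict parser like this one rejects the extended header outright; a lenient one might silently misread a field, which is arguably worse.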
The answer lay in a lesser-known but official extension to the SkyWalking propagation protocol: the `sw8-correlation` header. This header is designed specifically for carrying custom, key-value data across service boundaries within a trace. The downstream SkyWalking agent automatically picks up this header, parses the data, and attaches it to the active span’s correlation context. This was the mechanism we needed. The implementation path was clear: build a custom Kong Lua plugin that performs tenant resolution and then constructs a valid `sw8-correlation` header.
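The wire format of the correlation header is simple enough to sketch. The example below is illustrative only, written in JavaScript rather than the plugin's Lua; `encodeCorrelation` and `decodeCorrelation` are hypothetical helper names. It assumes the format used by the plugin later in this post: comma-separated items, each a Base64-encoded key and a Base64-encoded value joined by a colon.

```javascript
// Hypothetical helpers for the sw8-correlation wire format:
//   base64(key1):base64(value1),base64(key2):base64(value2)
const b64 = (s) => Buffer.from(s, 'utf8').toString('base64');
const unb64 = (s) => Buffer.from(s, 'base64').toString('utf8');

function encodeCorrelation(entries) {
  return Object.entries(entries)
    .map(([key, value]) => `${b64(key)}:${b64(String(value))}`)
    .join(',');
}

function decodeCorrelation(header) {
  const result = {};
  for (const item of header.split(',')) {
    const sep = item.indexOf(':');
    if (sep <= 0) continue; // skip empty or malformed items
    result[unb64(item.slice(0, sep))] = unb64(item.slice(sep + 1));
  }
  return result;
}

const header = encodeCorrelation({ tenant_id: 'tenant-abc-123' });
console.log(header); // dGVuYW50X2lk:dGVuYW50LWFiYy0xMjM=
console.log(decodeCorrelation(header)); // { tenant_id: 'tenant-abc-123' }
```

Because neither `:` nor `,` appears in the standard Base64 alphabet, the separators can never collide with the encoded keys or values.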
The following diagram illustrates the target request flow, where the custom plugin acts as the enrichment point.
sequenceDiagram
    participant Client
    participant Kong Gateway
    participant TenantEnrichmentPlugin
    participant AspNetCoreService
    participant SkyWalkingAgent

    Client->>+Kong Gateway: GET /api/data (with API-Key)
    Kong Gateway->>+TenantEnrichmentPlugin: access phase
    TenantEnrichmentPlugin-->>TenantEnrichmentPlugin: 1. Extract API-Key
    TenantEnrichmentPlugin-->>TenantEnrichmentPlugin: 2. Resolve tenant_id
    TenantEnrichmentPlugin-->>TenantEnrichmentPlugin: 3. Create 'sw8-correlation' header
    TenantEnrichmentPlugin-->>-Kong Gateway: Set request header
    Kong Gateway->>+AspNetCoreService: Forward request with sw8 & sw8-correlation
    AspNetCoreService->>+SkyWalkingAgent: (On request start)
    SkyWalkingAgent-->>SkyWalkingAgent: 4. Read sw8 & sw8-correlation headers
    SkyWalkingAgent-->>SkyWalkingAgent: 5. Create new span, populate correlation context
    SkyWalkingAgent-->>-AspNetCoreService: Provides tracing context
    Note right of AspNetCoreService: Application code can now access 'tenant_id' from the trace context.
    AspNetCoreService-->>-Kong Gateway: Response
    Kong Gateway-->>-Client: Response
First, let’s define the structure of our custom plugin. A Kong plugin requires at least two files: `schema.lua` to define its configuration and `handler.lua` for the core logic.
Here is the `schema.lua`. It defines a single configuration field, `tenant_lookup_url`, which is the endpoint our plugin will call to resolve an API key into a `tenant_id`. In a production scenario, this would likely be a highly available, low-latency service or a local cache.
-- file: kong/plugins/tenant-trace-injector/schema.lua
local typedefs = require "kong.db.schema.typedefs"

-- A simple schema for our plugin.
-- We define a configuration property `tenant_lookup_url` to be used for resolving
-- the tenant ID from an API key. This makes the plugin configurable per-service or per-route.
return {
  name = "tenant-trace-injector",
  fields = {
    { consumer = typedefs.no_consumer },
    {
      config = {
        type = "record",
        fields = {
          {
            tenant_lookup_url = {
              type = "string",
              required = true,
              -- Example: "http://auth-service.internal:8080/v1/resolve-tenant"
              -- This URL will be called with a POST request containing the API key.
            },
          },
        },
      },
    },
  },
}
The core logic resides in `handler.lua`. This implementation hooks into Kong’s `access` phase. Authentication plugins run in the same phase, but because our plugin has a lower priority, it runs after authentication and before the request is proxied to the upstream service. This is the ideal stage to perform our context injection.
-- file: kong/plugins/tenant-trace-injector/handler.lua
local http = require "resty.http"
-- cjson.safe returns nil plus an error message on bad input instead of raising an error.
local cjson = require "cjson.safe"

-- Kong 3.x no longer ships kong.plugins.base_plugin; a handler is a plain Lua table.
local TenantTraceInjectorHandler = {
  PRIORITY = 1000, -- Lower than key-auth's priority, so this runs after authentication.
  VERSION = "1.0.0",
}

-- This function executes for every request in the 'access' phase of the Kong gateway.
function TenantTraceInjectorHandler:access(conf)
  -- Step 1: Attempt to get the authenticated consumer's API key.
  -- We assume a preceding plugin like 'key-auth' has already run and identified the consumer.
  local consumer = kong.client.get_consumer()
  if not consumer then
    kong.log.warn("tenant-trace-injector: No consumer found. Skipping tenant injection.")
    return
  end

  -- We retrieve the API key from the credential associated with the consumer.
  -- This part is highly dependent on your authentication setup. Here, we assume 'key-auth'.
  local credential = kong.client.get_credential()
  if not (credential and credential.key) then
    kong.log.warn("tenant-trace-injector: No API key found for consumer. Skipping.")
    return
  end
  local api_key = credential.key

  -- Step 2: Resolve the tenant_id by calling the external lookup service.
  -- In a production system, this HTTP call is a critical performance bottleneck.
  -- Caching (e.g., using `kong.cache` or an external Redis) is essential here.
  local httpc = http.new()
  -- The call is non-blocking, but it still delays this request, so keep the timeout short.
  httpc:set_timeout(500) -- 500ms timeout

  local res, err = httpc:request_uri(conf.tenant_lookup_url, {
    method = "POST",
    headers = { ["Content-Type"] = "application/json" },
    body = cjson.encode({ apiKey = api_key }),
  })

  if not res then
    kong.log.err("tenant-trace-injector: Failed to connect to tenant lookup service: ", err)
    -- Fail open: we don't block the request, just log the error and continue without the tag.
    return
  end

  if res.status ~= 200 then
    kong.log.err("tenant-trace-injector: Tenant lookup service returned status ", res.status, ". Body: ", res.body)
    return
  end

  local body, json_err = cjson.decode(res.body)
  if type(body) ~= "table" then
    kong.log.err("tenant-trace-injector: Failed to decode JSON response from lookup service: ", json_err)
    return
  end

  local tenant_id = body.tenantId
  if not tenant_id then
    kong.log.warn("tenant-trace-injector: Tenant ID not found in lookup service response for key.")
    return
  end

  -- Step 3: Construct the sw8-correlation header.
  -- The format is a comma-separated list of key:value pairs.
  -- SkyWalking's protocol expects both the key and the value to be Base64 encoded
  -- (standard alphabet, which is what ngx.encode_base64 produces).
  local correlation_key = "tenant_id"
  local correlation_value = tostring(tenant_id)
  local encoded_key = ngx.encode_base64(correlation_key)
  local encoded_value = ngx.encode_base64(correlation_value)
  local correlation_header_value = encoded_key .. ":" .. encoded_value

  -- Step 4: Set the header on the request to be proxied to the upstream service.
  kong.service.request.set_header("sw8-correlation", correlation_header_value)
  kong.log.debug("tenant-trace-injector: Injected sw8-correlation header with tenant_id: ", tenant_id)
end

return TenantTraceInjectorHandler
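One detail worth noting: `kong.service.request.set_header` replaces any `sw8-correlation` header already present on the request. If upstream callers legitimately send their own correlation entries, merging may be preferable, as long as the gateway-resolved `tenant_id` still overrides anything a client supplied. The sketch below illustrates that merge in JavaScript (the plugin itself would do this in Lua); `parseCorrelation` and `withTenant` are hypothetical names, not part of Kong or SkyWalking.

```javascript
// Hypothetical merge logic for an incoming sw8-correlation header.
const b64 = (s) => Buffer.from(s, 'utf8').toString('base64');
const unb64 = (s) => Buffer.from(s, 'base64').toString('utf8');

function parseCorrelation(header) {
  const entries = new Map();
  for (const item of (header || '').split(',')) {
    const sep = item.indexOf(':');
    if (sep <= 0) continue; // skip empty or malformed items
    entries.set(unb64(item.slice(0, sep)), unb64(item.slice(sep + 1)));
  }
  return entries;
}

function withTenant(existingHeader, tenantId) {
  const entries = parseCorrelation(existingHeader);
  // The value resolved at the gateway always wins over a client-supplied one.
  entries.set('tenant_id', String(tenantId));
  return [...entries].map(([k, v]) => `${b64(k)}:${b64(v)}`).join(',');
}

// A client sends a region entry plus a spoofed tenant_id.
const incoming = `${b64('region')}:${b64('eu')},${b64('tenant_id')}:${b64('spoofed')}`;
const merged = parseCorrelation(withTenant(incoming, 'tenant-abc-123'));
console.log(merged.get('region'));    // "eu"
console.log(merged.get('tenant_id')); // "tenant-abc-123"
```

Whether to merge or overwrite is a policy choice. Overwriting, as the handler does, is the simpler and stricter option at a public ingress.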
To deploy this plugin, the Lua files need to be placed in a directory accessible by the Kong process. A common approach is to use a custom Docker image.
# file: Kong.Dockerfile (the name referenced by docker-compose.yml)
# A custom Kong image with our plugin
FROM kong:3.4
# Enable the bundled plugins plus our custom one.
ENV KONG_PLUGINS=bundled,tenant-trace-injector
# Set the path where Kong will look for custom plugins.
ENV KONG_LUA_PACKAGE_PATH=/opt/?.lua;;
# Create the plugin directory and copy our custom plugin files into it.
USER root
RUN mkdir -p /opt/kong/plugins/tenant-trace-injector
COPY ./tenant-trace-injector /opt/kong/plugins/tenant-trace-injector
RUN chown -R kong:kong /opt/kong/plugins/tenant-trace-injector
USER kong
With the plugin built into our Kong image, we can now enable it on a specific route or service. Here is an example of applying it to a service via Kong’s Admin API.
# This assumes Kong's admin API is available at http://localhost:8001

# First, create a dummy upstream service.
curl -i -X POST http://localhost:8001/services \
  --data name=product-service \
  --data url='http://aspnetcore-backend:80'

# Create a route to access this service.
curl -i -X POST http://localhost:8001/services/product-service/routes \
  --data 'paths[]=/products' \
  --data name=product-route

# Enable the key-auth plugin first (our plugin depends on it).
curl -i -X POST http://localhost:8001/services/product-service/plugins \
  --data "name=key-auth"

# Create a consumer and an API key.
curl -i -X POST http://localhost:8001/consumers/ \
  --data "username=my-tenant-consumer"
curl -i -X POST http://localhost:8001/consumers/my-tenant-consumer/key-auth \
  --data "key=mysecretapikey"

# Finally, enable our custom plugin.
# The `tenant_lookup_url` points to a mock service that we need to create.
# In a real setup, this would be a robust authentication/authorization service.
curl -i -X POST http://localhost:8001/services/product-service/plugins \
  --data "name=tenant-trace-injector" \
  --data "config.tenant_lookup_url=http://mock-auth-service:3000/resolve"
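The dependency on `key-auth` comes down to execution order rather than the order of these Admin API calls. Within a phase, Kong runs plugins from highest `PRIORITY` to lowest. The sketch below models that ordering in JavaScript; the value 1250 for `key-auth` is our assumption for Kong 3.x, and `hypothetical-logger` is a made-up plugin added for contrast.

```javascript
// Illustrative model of Kong's plugin ordering within a single phase.
const plugins = [
  { name: 'tenant-trace-injector', priority: 1000 }, // from our handler.lua
  { name: 'key-auth', priority: 1250 },              // assumed value for Kong 3.x
  { name: 'hypothetical-logger', priority: 10 },     // made-up plugin
];

function executionOrder(list) {
  // Higher priority runs first; copy the array so the input is not mutated.
  return [...list].sort((a, b) => b.priority - a.priority).map((p) => p.name);
}

console.log(executionOrder(plugins));
// [ 'key-auth', 'tenant-trace-injector', 'hypothetical-logger' ]
```

If the plugin's priority were raised above that of `key-auth`, `kong.client.get_consumer()` would return nothing in the access phase and the tenant would never be injected.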
For this to work, we also need a mock service that our Kong plugin can call. A simple Node.js/Express app can simulate this behavior.
// file: mock-auth-service/index.js
const express = require('express');

const app = express();
app.use(express.json());

// This endpoint simulates resolving an API key to a tenant ID.
app.post('/resolve', (req, res) => {
  const { apiKey } = req.body;
  console.log(`Received API key for resolution: ${apiKey}`);
  if (apiKey === 'mysecretapikey') {
    // In a real system, this would come from a database lookup.
    return res.status(200).json({ tenantId: 'tenant-abc-123' });
  } else {
    return res.status(404).json({ error: 'API key not found' });
  }
});

const port = 3000;
app.listen(port, () => {
  console.log(`Mock auth service listening on port ${port}`);
});
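For reference, the contract between the plugin and this service can be exercised from JavaScript as well. The following sketch mirrors the plugin's fail-open behavior: any timeout, non-2xx status, or malformed body yields `null` instead of an error. `resolveTenant` is a hypothetical helper, the injectable `fetchImpl` parameter exists only to make the logic testable without a running server, and the built-in `fetch` and `AbortSignal.timeout` assume Node.js 18 or newer.

```javascript
// Hypothetical client for the /resolve endpoint, failing open like the Lua plugin.
async function resolveTenant(lookupUrl, apiKey, fetchImpl = fetch, timeoutMs = 500) {
  try {
    const res = await fetchImpl(lookupUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ apiKey }),
      signal: AbortSignal.timeout(timeoutMs), // mirrors the plugin's 500ms timeout
    });
    if (!res.ok) return null; // e.g. 404 for an unknown key
    const body = await res.json();
    return typeof body.tenantId === 'string' ? body.tenantId : null;
  } catch (err) {
    // Network error, timeout, or invalid JSON: continue without a tenant.
    return null;
  }
}

// Usage against the mock service (requires it to be running):
// resolveTenant('http://localhost:3000/resolve', 'mysecretapikey').then(console.log);
```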
Now for the receiving end: the ASP.NET Core service. The key is configuring the SkyWalking .NET agent to connect to a SkyWalking OAP (Observability Analysis Platform) and ensuring it’s active for our application. This is typically done via environment variables or a configuration file.
Here is a minimal `Program.cs` for a web API that will be instrumented.
// file: AspNetCoreService/Program.cs
using SkyApm.Abstractions;
using SkyApm.Tracing;

var builder = WebApplication.CreateBuilder(args);

// Add services to the container.
builder.Services.AddControllers();

// Adding ITracingContextAccessor to the dependency injection container
// allows us to access the current trace context anywhere in our application.
builder.Services.AddSingleton<ITracingContextAccessor, TracingContextAccessor>();

var app = builder.Build();

// A simple middleware to demonstrate accessing the correlation context.
app.Use(async (context, next) =>
{
    var tracingContextAccessor = context.RequestServices.GetRequiredService<ITracingContextAccessor>();
    var logger = context.RequestServices.GetRequiredService<ILogger<Program>>();

    // This is the crucial part: reading the value injected by our Kong plugin.
    // The key 'tenant_id' must match the key used in the Lua plugin.
    var tenantId = tracingContextAccessor.TracingContext.CorrelationContext.Get("tenant_id");

    if (!string.IsNullOrEmpty(tenantId))
    {
        // We can now use this tenantId for logging, metrics, or business logic.
        // By using structured logging, this becomes a queryable field in our log aggregator.
        using (logger.BeginScope(new Dictionary<string, object> { ["TenantId"] = tenantId }))
        {
            logger.LogInformation("Tenant context successfully retrieved from SkyWalking correlation context.");
            await next(context);
        }
    }
    else
    {
        logger.LogWarning("Request processed without a 'tenant_id' in the SkyWalking correlation context.");
        await next(context);
    }
});

app.MapControllers();

app.MapGet("/products/{id}", (int id, ILogger<Program> logger) =>
{
    // The TenantId from the middleware's scope will be automatically included in this log entry.
    logger.LogInformation("Fetching product with ID {ProductId}", id);
    // Some business logic would go here...
    return Results.Ok(new { ProductId = id, Name = "Super Widget" });
});

app.Run();
To activate the SkyWalking agent, we need to configure our application’s runtime environment. Using Docker Compose, this is straightforward.
# file: docker-compose.yml
version: '3.8'

services:
  # SkyWalking OAP and UI for visualization
  skywalking-oap:
    image: apache/skywalking-oap-server:9.5.0
    container_name: skywalking-oap
    ports:
      - "11800:11800" # gRPC receiver
      - "12800:12800" # HTTP receiver
    healthcheck:
      test: ["CMD", "/skywalking/bin/swctl", "ch", "-s", "127.0.0.1:12800"]
      interval: 10s
      timeout: 5s
      retries: 5

  skywalking-ui:
    image: apache/skywalking-ui:9.5.0
    container_name: skywalking-ui
    depends_on:
      - skywalking-oap
    ports:
      - "8080:8080"
    environment:
      SW_OAP_ADDRESS: http://skywalking-oap:12800

  # Our ASP.NET Core backend service
  aspnetcore-backend:
    build:
      context: ./AspNetCoreService
    container_name: aspnetcore-backend
    environment:
      # SkyWalking Agent Configuration
      # The assembly name is spelled SkyAPM, unlike the SkyApm namespaces used in code.
      - ASPNETCORE_HOSTINGSTARTUPASSEMBLIES=SkyAPM.Agent.AspNetCore
      - SKYWALKING__SERVICENAME=product-service-netcore
      # Maps to SkyWalking:Transport:gRPC:Servers in skyapm.json
      - SKYWALKING__TRANSPORT__GRPC__SERVERS=skywalking-oap:11800
      # Tells the agent to collect correlation context data
      - SKYWALKING__CORRELATION__MAXNUMBEROFKEYS=10
      - SKYWALKING__CORRELATION__TOTALVALUESLENGTHLIMIT=2048

  # Mock service for tenant resolution
  mock-auth-service:
    build:
      context: ./mock-auth-service
    container_name: mock-auth-service

  # Custom Kong instance
  kong-db:
    image: postgres:13
    environment:
      - POSTGRES_USER=kong
      - POSTGRES_PASSWORD=kong
      - POSTGRES_DB=kong
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "kong"]
      interval: 5s
      timeout: 5s
      retries: 5

  kong-migrations:
    image: kong:3.4
    depends_on:
      kong-db:
        condition: service_healthy
    environment:
      - KONG_DATABASE=postgres
      - KONG_PG_HOST=kong-db
      - KONG_PG_PASSWORD=kong
    command: "kong migrations bootstrap"

  kong:
    build:
      context: . # Assumes Dockerfile is in the root with the plugin directory
      dockerfile: Kong.Dockerfile
    container_name: kong-gateway
    depends_on:
      kong-migrations:
        condition: service_completed_successfully
      aspnetcore-backend:
        condition: service_started
      mock-auth-service:
        condition: service_started
    environment:
      - KONG_DATABASE=postgres
      - KONG_PG_HOST=kong-db
      - KONG_PG_PASSWORD=kong
      - KONG_PROXY_ACCESS_LOG=/dev/stdout
      - KONG_ADMIN_ACCESS_LOG=/dev/stdout
      - KONG_PROXY_ERROR_LOG=/dev/stderr
      - KONG_ADMIN_ERROR_LOG=/dev/stderr
      - KONG_ADMIN_LISTEN=0.0.0.0:8001
      # Must declare our custom plugin here so Kong loads it
      - KONG_PLUGINS=bundled,tenant-trace-injector
      - KONG_LOG_LEVEL=debug # For development
    ports:
      - "8000:8000" # Proxy
      - "8001:8001" # Admin
After deploying this stack and sending a request with the correct API key, the result is immediately visible in the logs of the `aspnetcore-backend` service.
{
  "@t": "2023-10-27T11:45:00.123Z",
  "@m": "Tenant context successfully retrieved from SkyWalking correlation context.",
  "TenantId": "tenant-abc-123" // <-- The injected value!
}
{
  "@t": "2023-10-27T11:45:00.125Z",
  "@m": "Fetching product with ID 42",
  "ProductId": 42,
  "TenantId": "tenant-abc-123" // <-- Context is maintained via logger scope.
}
This demonstrates a complete, end-to-end flow. The custom Kong plugin successfully intercepts the request, calls an external service to resolve business context, injects this context into a standard SkyWalking propagation header, and the downstream ASP.NET Core service transparently consumes it from the trace context. The application code is kept clean, and the observability data is enriched at the earliest possible point.
This solution, while effective, is not without its own trade-offs. The `tenant_lookup_url` call from the Kong plugin introduces an additional network hop and point of failure for every single request. In a high-throughput environment, this lookup must be backed by an extremely fast and resilient service, and aggressive caching within the Lua plugin (using `lua-resty-lrucache` or `kong.cache`) becomes non-negotiable. Furthermore, the `sw8-correlation` header is Base64-encoded but not encrypted, making it unsuitable for propagating highly sensitive data. Its purpose is for correlation context, not secure data transfer. The applicability is therefore limited to internal systems where the network path between the gateway and services is trusted. Future iterations would need to focus heavily on the performance and security hardening of this tenant resolution step.
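To make the caching point concrete, here is a minimal sketch of a time-bounded cache placed in front of the lookup. It is written in JavaScript for illustration only; inside the plugin, `kong.cache` or `lua-resty-lrucache` would play this role. `TtlCache` and `cachedResolve` are hypothetical names, the 60-second TTL is an arbitrary assumption, and a production cache would also need a size bound.

```javascript
// Hypothetical TTL cache keyed by API key. The clock is injectable for testing.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // expired: force a fresh lookup
      return undefined;
    }
    return hit.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wraps any async lookup function (apiKey -> tenantId or null).
async function cachedResolve(cache, apiKey, lookup) {
  const cached = cache.get(apiKey);
  if (cached !== undefined) return cached;
  const tenantId = await lookup(apiKey);
  if (tenantId) cache.set(apiKey, tenantId); // failed lookups are not cached
  return tenantId;
}

// Usage: the second call within the TTL never reaches the lookup service.
const cache = new TtlCache(60_000); // 60s TTL, an arbitrary choice
const lookup = async (apiKey) => (apiKey === 'mysecretapikey' ? 'tenant-abc-123' : null);
cachedResolve(cache, 'mysecretapikey', lookup)
  .then(() => cachedResolve(cache, 'mysecretapikey', lookup))
  .then((tenantId) => console.log(tenantId)); // "tenant-abc-123"
```

The TTL is the trade-off knob: a longer value removes more lookups from the request path, but delays how quickly a revoked or reassigned API key takes effect.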