Building a Metadata-Driven Query Gateway with a Custom Kong Lua Plugin Interfacing Trino and Firestore on AWS EKS


The initial problem wasn’t about data access; it was about the operational friction of data access. Our data science and business intelligence teams needed to query a federated data landscape spanning multiple databases and S3 buckets, a perfect use case for Trino. The challenge was that exposing the Trino coordinator directly led to a cascade of predictable issues: poorly written, resource-intensive queries from non-expert users, inconsistent security contexts, and a complete lack of a coherent API contract for downstream applications. We needed an abstraction layer—a gateway that could translate simple, intention-based REST API calls into safe, performant Trino SQL queries.

This immediately brought API gateways into the picture. We were already running Kong on an AWS EKS cluster for microservice ingress, making it a natural candidate. The default proxying capabilities were insufficient, however. We needed dynamic, request-dependent logic: parse an incoming request, fetch metadata from a store, construct a SQL query, and then rewrite the request to target the Trino coordinator. This is a textbook case for a custom Kong plugin. The choice of Lua was dictated by the environment; it’s the native language for Kong plugins, running directly within Nginx’s event loop, offering minimal overhead.

For the metadata store, we debated between Redis, a traditional RDBMS like Postgres, and Firestore. A relational database felt like overkill for storing what were essentially structured JSON documents defining our query templates. Redis was a possibility, but we wanted a solution that didn’t require us to manage another stateful service within Kubernetes. Firestore’s serverless model, simple document-oriented API, and strong consistency guarantees for single-document operations made it an ideal fit. It would hold the “recipes” for our queries, completely decoupled from the gateway’s deployment lifecycle. The entire stack—Kong, its custom plugin, and the Trino cluster—would run on EKS, providing a unified, scalable operational environment.

The architecture crystallized into the following flow:

sequenceDiagram
    participant C as Client
    participant K as Kong Gateway (EKS)
    participant P as Custom Lua Plugin
    participant F as Firestore
    participant T as Trino Coordinator (EKS)
    participant DS as Data Sources

    C->>+K: GET /queries/active_users?country=US&days=30
    K->>+P: access phase
    P->>+F: Fetch template for 'active_users'
    F-->>-P: Return query template JSON
    P-->>P: Validate 'country' and 'days' params
    P-->>P: Construct Trino SQL from template
    P->>-K: Rewrite request: POST /v1/statement with SQL body
    K->>+T: Forward rewritten request
    T->>+DS: Execute federated query
    DS-->>-T: Return results
    T-->>-K: Forward query results
    K-->>C: Return JSON response

The Firestore Metadata Contract

Before writing a single line of Lua, we had to define the structure of our query templates in Firestore. A poor model here would cripple the system’s flexibility. We settled on a query_templates collection where each document ID represents a public-facing query name (e.g., active_users).

A document for an active_users query might look like this. This isn’t just a string; it’s a structured set of rules the plugin will enforce.

// Firestore Document: /query_templates/active_users
{
  "description": "Calculates active users by country over a specified period.",
  "trino_catalog": "hive",
  "trino_schema": "production",
  "target_table": "user_logins",
  "allowed_select_fields": [
    {"name": "user_id", "alias": "userId"},
    {"name": "login_timestamp", "alias": "loginTimestamp"},
    {"name": "country_code", "alias": "countryCode"}
  ],
  "allowed_parameters": [
    {
      "name": "country",
      "type": "string",
      "required": true,
      "sql_template": "country_code = '{value}'"
    },
    {
      "name": "days",
      "type": "integer",
      "required": true,
      "default_value": 30,
      "sql_template": "login_timestamp > NOW() - INTERVAL '{value}' DAY"
    }
  ],
  "mandatory_where_clauses": [
    "is_active = true"
  ],
  "query_suffix": "LIMIT 1000"
}

This structure provides the plugin with everything it needs:

  1. target_table: The primary table for the query.
  2. allowed_select_fields: A strict whitelist of columns that can be queried, preventing SELECT * style abuses.
  3. allowed_parameters: Defines the API parameters, their types, and crucially, the sql_template for safely injecting their values into the WHERE clause.
  4. mandatory_where_clauses: Injects non-negotiable filters, like ensuring we only query active users, regardless of the client’s request. This is a key security and performance guardrail.
  5. query_suffix: A hard-coded LIMIT to prevent accidental data dumps that could overwhelm Trino or the client.
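The mapping from template to SQL is mechanical enough to show end to end. The sketch below is in Python purely for illustration (the production logic is the Lua plugin's build_sql_query, shown later); ACTIVE_USERS_TEMPLATE mirrors the Firestore document above.

```python
# Illustrative sketch of the template-to-SQL build logic. ACTIVE_USERS_TEMPLATE
# mirrors the /query_templates/active_users Firestore document.
ACTIVE_USERS_TEMPLATE = {
    "target_table": "user_logins",
    "allowed_select_fields": [
        {"name": "user_id", "alias": "userId"},
        {"name": "login_timestamp", "alias": "loginTimestamp"},
        {"name": "country_code", "alias": "countryCode"},
    ],
    "allowed_parameters": [
        {"name": "country", "type": "string", "required": True,
         "sql_template": "country_code = '{value}'"},
        {"name": "days", "type": "integer", "required": True,
         "sql_template": "login_timestamp > NOW() - INTERVAL '{value}' DAY"},
    ],
    "mandatory_where_clauses": ["is_active = true"],
    "query_suffix": "LIMIT 1000",
}

def build_sql(template, args):
    # Only whitelisted columns ever reach the SELECT list.
    select = ", ".join(
        f"{f['name']} AS {f['alias']}" for f in template["allowed_select_fields"])
    # Mandatory clauses are injected regardless of what the client sent.
    where = list(template.get("mandatory_where_clauses", []))
    for p in template["allowed_parameters"]:
        value = args.get(p["name"])
        if value is None:
            if p["required"]:
                raise ValueError(f"Missing required parameter: {p['name']}")
            continue
        if p["type"] == "integer" and not str(value).isdigit():
            raise ValueError(f"Parameter '{p['name']}' must be an integer.")
        if p["type"] == "string":
            value = str(value).replace("'", "''")  # basic quote escaping
        where.append(p["sql_template"].replace("{value}", str(value)))
    sql = f"SELECT {select} FROM {template['target_table']}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return f"{sql} {template['query_suffix']}" if template.get("query_suffix") else sql
```

Any query parameter not declared in allowed_parameters is silently ignored, a missing required one is rejected, and the LIMIT suffix is always appended, which is exactly the guardrail behavior the plugin enforces.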

EKS Cluster Configuration for Kong and Trino

Deploying stateful workloads like Trino and a critical ingress like Kong on EKS requires careful configuration. We use Helm charts for both.

The Kong Helm chart deployment values must be configured to mount our custom plugin code. We package our Lua files into a ConfigMap and mount it as a volume into the Kong pods.

# kong-helm-values.yaml
# --- Abridged for clarity ---
#
# Assumes a ConfigMap named 'kong-plugin-trino-gateway' exists
# with our Lua source files.

env:
  # Put the ConfigMap mount on the Lua path so that
  # require "kong.plugins.trino-gateway.handler" resolves to
  # /opt/kong/plugins/trino-gateway/handler.lua
  KONG_LUA_PACKAGE_PATH: "/opt/?.lua;;"
  # Declare the name of our custom plugin
  KONG_PLUGINS: "bundled,trino-gateway"

# Mount the plugin source code from the ConfigMap
deployment:
  kong:
    volumes:
    - name: kong-plugin-trino-gateway
      configMap:
        name: kong-plugin-trino-gateway
    volumeMounts:
    - name: kong-plugin-trino-gateway
      mountPath: /opt/kong/plugins/trino-gateway

Creating the ConfigMap is straightforward with kubectl:

# Plugin directory structure:
# ./trino-gateway/
#   ├── handler.lua
#   └── schema.lua

kubectl create configmap kong-plugin-trino-gateway --from-file=./trino-gateway

For Trino, the key consideration is ensuring it’s discoverable by Kong. We deploy it within the same cluster and expose the coordinator via a ClusterIP service.

# trino-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: trino-coordinator
  namespace: data-platform
spec:
  type: ClusterIP
  selector:
    app: trino
    component: coordinator
  ports:
  - name: http
    port: 8080
    targetPort: 8080

Kong will then be configured with a Service whose upstream URL points to trino-coordinator.data-platform.svc.cluster.local:8080.

The Custom Kong Plugin: Core Logic

The plugin is composed of two files: schema.lua and handler.lua. The schema defines the configuration parameters for the plugin itself, such as the location of our Firestore service.

-- /trino-gateway/schema.lua

local typedefs = require "kong.db.schema.typedefs"

return {
  name = "trino-gateway",
  fields = {
    -- The route this plugin is attached to must be configured with
    -- strip_path = true and a regex path containing a named capture group for
    -- the query ID, e.g. ~/queries/(?<query_id>[a-zA-Z0-9_]+)
    -- (the leading ~ marks the path as a regex in Kong 3.x).
    { protocols = typedefs.protocols_http },
    {
      config = {
        type = "record",
        fields = {
          { firestore_rest_host = {
              type = "string",
              required = true,
              description = "Hostname for the Firestore REST API, e.g., firestore.googleapis.com",
          } },
          { firestore_project_id = {
              type = "string",
              required = true,
              description = "GCP project ID for Firestore.",
          } },
          { firestore_collection = {
              type = "string",
              required = false,
              default = "query_templates",
              description = "The Firestore collection holding query templates.",
          } },
          { gcp_service_account_secret = {
              type = "string",
              required = true,
              description = "Vault reference to the GCP service account JSON key, e.g., {vault://env/gcp-sa-key}.",
          } },
          { request_timeout = {
              type = "number",
              required = false,
              default = 10000, -- 10 seconds
              description = "Timeout in milliseconds for the HTTP call to Firestore.",
          } },
        },
      },
    },
  },
}

The handler.lua contains the actual request processing logic. In a real-world project, this file becomes complex, so breaking it down into smaller, testable functions is critical.

-- /trino-gateway/handler.lua

local http = require "resty.http"
local cjson = require "cjson.safe"
local string_format = string.format

local TrinoGatewayHandler = {
  PRIORITY = 1000,
  VERSION = "0.1.0",
}

-- Helper for resolving the GCP service account key through Kong's vault
-- abstraction. `secret_ref` is a vault reference string such as
-- "{vault://env/gcp-sa-key}" (backed by an environment variable on the Kong pods).
local function get_gcp_key_json(secret_ref)
  local secret, err = kong.vault.get(secret_ref)
  if err or not secret then
    kong.log.err("Failed to resolve GCP secret reference: ", secret_ref, " error: ", tostring(err))
    return nil
  end
  return secret
end

-- Safely formats string values for SQL to prevent basic injection.
-- A more robust implementation could use Trino prepared statements instead.
local function sanitize_sql_string(value)
  return (string.gsub(value, "'", "''")) -- parentheses drop gsub's second return value
end

-- Main execution function, runs in the 'access' phase of the request.
function TrinoGatewayHandler:access(conf)
  -- 1. Get the query ID captured from the request URI.
  -- Relies on the route using a regex path with a named capture group,
  -- e.g. ~/queries/(?<query_id>[a-zA-Z0-9_]+)
  local captures = kong.request.get_uri_captures()
  local query_id = captures and captures.named and captures.named.query_id
  if not query_id then
    return kong.response.exit(400, { message = "Query ID missing in request path." })
  end

  -- 2. Fetch the GCP service account key.
  local gcp_key_json = get_gcp_key_json(conf.gcp_service_account_secret)
  if not gcp_key_json then
    return kong.response.exit(500, { message = "Internal Server Error: Could not load service credentials." })
  end
  -- Firestore's REST API requires an OAuth 2.0 access token: sign a JWT
  -- assertion with the service account key (e.g., via lua-resty-jwt), then
  -- exchange it at Google's token endpoint. That flow is omitted here, so we
  -- use a placeholder token.
  local access_token = "MOCKED_OAUTH2_ACCESS_TOKEN" -- Placeholder for real token logic

  -- 3. Construct the request to Firestore REST API.
  local firestore_url = string_format(
    "https://%s/v1/projects/%s/databases/(default)/documents/%s/%s",
    conf.firestore_rest_host,
    conf.firestore_project_id,
    conf.firestore_collection,
    query_id
  )

  local httpc = http.new()
  -- lua-resty-http sets timeouts on the client, not per request.
  httpc:set_timeout(conf.request_timeout)
  local res, err = httpc:request_uri(firestore_url, {
    method = "GET",
    headers = {
      ["Authorization"] = "Bearer " .. access_token,
      ["Content-Type"] = "application/json",
    },
    ssl_verify = true, -- Always verify SSL in production
  })

  if not res then
    kong.log.err("Failed to connect to Firestore: ", err)
    return kong.response.exit(503, { message = "Service unavailable: Cannot reach metadata store." })
  end

  if res.status ~= 200 then
    kong.log.err("Firestore returned non-200 status: ", res.status, " Body: ", res.body)
    return kong.response.exit(404, { message = "Query template not found." })
  end

  local firestore_doc, decode_err = cjson.decode(res.body)
  if not firestore_doc or not firestore_doc.fields then
    kong.log.err("Failed to decode Firestore document: ", tostring(decode_err))
    return kong.response.exit(500, { message = "Internal error: Invalid metadata response." })
  end
  -- Field values are nested under a 'fields' key in the Firestore REST API
  -- response and must be transformed before use.
  local template = self:transform_firestore_response(firestore_doc.fields)
  if not template then
    return kong.response.exit(500, { message = "Internal error: Invalid query template format." })
  end

  -- 4. Validate request parameters and build the SQL query.
  local request_args = kong.request.get_query()
  local sql, build_err = self:build_sql_query(template, request_args)

  if build_err then
    return kong.response.exit(400, { message = build_err })
  end

  -- 5. Rewrite the request to target the Trino coordinator.
  kong.service.request.set_method("POST")
  kong.service.request.set_path("/v1/statement")
  kong.service.request.set_header("Content-Type", "text/plain")
  -- Set the Trino user. Could be derived from an auth token.
  kong.service.request.set_header("X-Trino-User", "api-gateway-user")
  kong.service.request.set_raw_body(sql) -- raw string body, so set_raw_body rather than set_body
end

-- Transforms the verbose Firestore REST API JSON into a simple Lua table.
function TrinoGatewayHandler:transform_firestore_response(fields)
  -- This function would recursively parse the Firestore value types
  -- (stringValue, integerValue, mapValue, arrayValue, etc.)
  -- A production implementation of this is non-trivial and omitted for brevity.
  -- Let's assume it produces a clean Lua table matching our expected template structure.
  -- For example, it would turn { "stringValue": "user_logins" } into "user_logins".
  -- Returning a mocked, transformed template for this example:
  return {
    target_table = "user_logins",
    allowed_select_fields = {
      {name = "user_id", alias = "userId"},
      {name = "country_code", alias = "countryCode"}
    },
    allowed_parameters = {
      { name = "country", type = "string", required = true, sql_template = "country_code = '{value}'" },
      { name = "days", type = "integer", required = true, sql_template = "login_timestamp > NOW() - INTERVAL '{value}' DAY" }
    },
    mandatory_where_clauses = { "is_active = true" },
    query_suffix = "LIMIT 1000"
  }
end

-- Constructs the final SQL query from the template and request arguments.
function TrinoGatewayHandler:build_sql_query(template, request_args)
  local select_clauses = {}
  for _, field in ipairs(template.allowed_select_fields) do
    table.insert(select_clauses, string_format("%s AS %s", field.name, field.alias))
  end
  local select_statement = "SELECT " .. table.concat(select_clauses, ", ")

  local from_statement = "FROM " .. template.target_table

  local where_clauses = {}
  if template.mandatory_where_clauses then
    for _, clause in ipairs(template.mandatory_where_clauses) do
      table.insert(where_clauses, clause)
    end
  end

  for _, param_def in ipairs(template.allowed_parameters) do
    local arg_val = request_args[param_def.name]
    if not arg_val then
      if param_def.required then
        return nil, string_format("Missing required parameter: %s", param_def.name)
      else
        goto continue -- Skip to next parameter if not present and not required
      end
    end

    -- Type validation
    if param_def.type == "integer" and not tonumber(arg_val) then
      return nil, string_format("Parameter '%s' must be an integer.", param_def.name)
    end

    local sanitized_val
    if param_def.type == "string" then
      sanitized_val = sanitize_sql_string(arg_val)
    else
      sanitized_val = arg_val -- Assume numeric types are safe
    end

    -- Substitute via a function so a '%' in the value cannot be interpreted
    -- as a gsub capture reference or a format specifier.
    local clause = string.gsub(param_def.sql_template, "{value}", function() return sanitized_val end)
    table.insert(where_clauses, clause)
    ::continue::
  end

  local where_statement = ""
  if #where_clauses > 0 then
    where_statement = "WHERE " .. table.concat(where_clauses, " AND ")
  end

  local full_sql = string_format("%s %s %s %s",
    select_statement,
    from_statement,
    where_statement,
    template.query_suffix or ""
  )

  kong.log.debug("Generated Trino SQL: ", full_sql)
  return full_sql, nil
end

return TrinoGatewayHandler

A common pitfall here is underestimating the complexity of Firestore authentication and JSON parsing within Lua. The GCP REST APIs require OAuth 2.0 with JWT signing. A robust implementation would use a library like lua-resty-jwt to sign the assertion and then exchange it for an access token. The transform_firestore_response function is also critical; the raw JSON from Firestore is verbose and must be normalized into a developer-friendly Lua table.
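To give a sense of what that normalization involves, here is a sketch in Python (illustration only; the plugin does the same job in Lua). The Firestore REST wire format wraps every field in a type marker, and integerValue in particular arrives as a JSON string, not a number.

```python
# Sketch of normalizing Firestore REST API typed values into plain values,
# the same job as the Lua plugin's transform_firestore_response.
def from_firestore_value(value):
    if "stringValue" in value:
        return value["stringValue"]
    if "integerValue" in value:
        return int(value["integerValue"])  # int64 is encoded as a JSON string
    if "doubleValue" in value:
        return value["doubleValue"]
    if "booleanValue" in value:
        return value["booleanValue"]
    if "nullValue" in value:
        return None
    if "timestampValue" in value:
        return value["timestampValue"]  # left as an RFC 3339 string
    if "mapValue" in value:
        return from_firestore_fields(value["mapValue"].get("fields", {}))
    if "arrayValue" in value:
        return [from_firestore_value(v) for v in value["arrayValue"].get("values", [])]
    raise ValueError(f"Unsupported Firestore value: {value}")

def from_firestore_fields(fields):
    """Turn a document's 'fields' map into a plain dict."""
    return {name: from_firestore_value(v) for name, v in fields.items()}
```

The recursion through mapValue and arrayValue is what makes the Lua version non-trivial: our allowed_parameters field is an array of maps, so the transform must descend two levels before the template is usable.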

Putting It All Together

With the plugin code in a ConfigMap and the Kong deployment updated, the final step is to configure Kong’s entities.

  1. Create a Kong Service for Trino (Kong's Service entity, not to be confused with its Upstream entity):

    curl -i -X POST http://localhost:8001/services \
      --data name=trino-service \
      --data url=http://trino-coordinator.data-platform.svc.cluster.local:8080
  2. Create a Route to Expose the Gateway:

    curl -i -X POST http://localhost:8001/services/trino-service/routes \
      --data 'name=query-gateway-route' \
      --data 'paths[]=~/queries/(?<query_id>[a-zA-Z0-9_]+)' \
      --data 'strip_path=true'

    The leading ~ marks the path as a regex (required since Kong 3.0), and the named capture group (?<query_id>...) is essential for the plugin to extract the query template name from the path.

  3. Apply the Custom Plugin to the Route:

    # Kong's Admin API has no generic secrets endpoint; secrets are resolved
    # through vault backends instead. The simplest option is the built-in
    # 'env' vault: mount the service account key into the Kong pods as an
    # environment variable (e.g., GCP_SA_KEY, sourced from a Kubernetes
    # Secret), then reference it as {vault://env/gcp-sa-key}.

    # Apply the plugin to the route
    curl -i -X POST http://localhost:8001/routes/query-gateway-route/plugins \
      --data "name=trino-gateway" \
      --data "config.firestore_rest_host=firestore.googleapis.com" \
      --data "config.firestore_project_id=my-gcp-project" \
      --data "config.gcp_service_account_secret={vault://env/gcp-sa-key}"

Now, a client can make a simple, clean API call:

curl "http://<KONG_IP>:8000/queries/active_users?country=US&days=15"

The plugin intercepts this, fetches the active_users template from Firestore, validates the parameters, and generates a safe, bounded SQL query like this to send to Trino:

SELECT user_id AS userId, country_code AS countryCode
FROM user_logins
WHERE is_active = true AND country_code = 'US' AND login_timestamp > NOW() - INTERVAL '15' DAY
LIMIT 1000

The client receives the query results from Trino without ever seeing or constructing the SQL, achieving our primary goal of a secure, managed data access layer.

This architecture, while effective, is not without its limitations. The synchronous call to Firestore in the request path adds latency. For high-throughput scenarios, caching the Firestore templates in-memory within Kong (e.g., using lua-resty-lrucache) would be a necessary optimization to avoid hammering the metadata store. Furthermore, this design only handles synchronous, short-running queries. A robust implementation would need to handle Trino’s asynchronous query execution model for long-running analytics, where the gateway would return a query status URL for the client to poll. The current security model is also basic; a production system should integrate with an OIDC provider, parse JWTs within the plugin, and use claims to drive user-specific query permissions and row-level security.
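The client-side half of that asynchronous model is simple to sketch: the initial POST to /v1/statement returns JSON containing optional columns, data, and a nextUri, and the caller keeps fetching nextUri until Trino stops returning one. Below is an illustrative Python sketch with the HTTP fetch injected as a callable, so the paging logic stays independent of any particular HTTP library.

```python
# Sketch of draining Trino's paged statement protocol. `fetch` is any
# callable mapping a URI to a decoded JSON page (e.g., wrapping an HTTP GET).
def drain_statement(first_page, fetch):
    rows, columns = [], None
    page = first_page
    while True:
        # Column metadata appears once the query is planned; capture it once.
        if page.get("columns") and columns is None:
            columns = [c["name"] for c in page["columns"]]
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if not next_uri:
            break  # no nextUri means the query has finished (or failed)
        page = fetch(next_uri)
    if page.get("error"):
        raise RuntimeError(page["error"].get("message", "query failed"))
    return columns, rows
```

A gateway that supports long-running queries would expose this loop as a polling endpoint rather than running it inline, returning the nextUri (or an opaque handle to it) to the client instead of blocking the Kong worker.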

