Integrating a Custom Rust gRPC Middleware with Tyk for High-Throughput Request Processing and ELK-based Observability


The performance degradation was subtle at first, then catastrophic. Our Tyk API Gateway, fronting a suite of Node.js microservices, began showing erratic latency spikes under sustained load. The culprit was a piece of custom middleware written in Lua, tasked with JWT validation, claim extraction, and some minor request enrichment. In a real-world project, this kind of edge logic is common, but as traffic scaled, the interpreted nature of the middleware became an undeniable bottleneck. The CPU on the gateway nodes would pin, and the Node.js services downstream would be starved of requests, despite being largely idle. Compounding the issue, debugging was a nightmare. Logs from Tyk, the middleware, and the upstream services were disconnected streams of text, making it impossible to trace a single slow request through the stack.

Our initial concept was to offload this performance-critical logic from the gateway’s main process. Tyk’s support for custom gRPC middleware presented a viable path. This would allow us to write the logic in a compiled language, running as a separate, independently scalable service, and have Tyk call out to it for each request. The choice of language for this new middleware was critical. We needed raw performance, memory safety to prevent insidious bugs, and a concurrency model that wouldn’t crumble under pressure. Rust was the obvious candidate. Its promise of C++-level performance without the memory-related pitfalls, coupled with its excellent ecosystem via Tonic for gRPC, made it a pragmatic choice over Go or C++. The goal was not to rewrite our Node.js services—they performed their I/O-bound business logic just fine—but to build a highly optimized, surgical fix for the authentication and authorization layer at the very edge.

This offloading strategy also presented an opportunity to solve our observability problem. The new Rust service would be responsible for generating a unique correlation ID for each request, injecting it as a header before passing it upstream, and logging its own actions in a structured JSON format with that ID. The Node.js services would then be configured to pick up this header and include it in their own structured logs. With Filebeat shipping all these logs to a central ELK Stack, we could finally reconstruct the entire lifecycle of a request, from the gateway to the application, using a single query in Kibana.

The architectural flow we settled on is as follows:

sequenceDiagram
    participant Client
    participant TykGateway as Tyk Gateway
    participant RustMiddleware as Rust gRPC Middleware
    participant NodeService as Node.js Service
    participant ELK as ELK Stack

    Client->>+TykGateway: HTTP Request (with JWT)
    TykGateway->>+RustMiddleware: gRPC Dispatch (request context)
    RustMiddleware->>RustMiddleware: 1. Generate Correlation ID<br/>2. Validate JWT<br/>3. Extract Claims<br/>4. Generate Structured Log
    RustMiddleware-->>ELK: Ship JSON Log
    RustMiddleware-->>-TykGateway: gRPC Response (enriched headers)
    TykGateway->>TykGateway: Apply enriched headers to request
    TykGateway->>+NodeService: Proxied HTTP Request
    NodeService->>NodeService: 1. Process business logic<br/>2. Read Correlation ID header<br/>3. Generate Structured Log
    NodeService-->>ELK: Ship JSON Log
    NodeService-->>-TykGateway: HTTP Response
    TykGateway-->>-Client: Final HTTP Response

The first step was defining the contract between Tyk and our Rust service using Protocol Buffers. Tyk’s gRPC middleware dispatcher sends a coprocess.Object message. Only the Dispatch RPC needs real logic; the generated service trait also requires DispatchEvent, which can be stubbed out as a no-op.

middleware.proto

// Proto definition for the Tyk gRPC middleware interface.
// This is a standard definition provided by Tyk's documentation.
// We are implementing a service that adheres to this contract.
syntax = "proto3";
package coprocess;

option go_package = ".;coprocess";

// The dispatcher is the service that the Tyk Gateway will call.
service Dispatcher {
    rpc Dispatch(Object) returns (Object) {}
    rpc DispatchEvent(Event) returns (EventReply) {}
}

message Object {
    HookType hook_type = 1;
    string hook_name = 2;
    MiniRequestObject request = 3;
    SessionState session = 4;
    map<string, string> metadata = 5;
    map<string, string> spec = 6;
    Event event = 10;
}

// A simplified request object sent by Tyk.
message MiniRequestObject {
    map<string, string> headers = 1;
    string url = 2;
    string method = 3;
    map<string, string> form_params = 4;
    bytes body = 5;
    string raw_body = 6;
}

message SessionState {
    string last_check_time = 1;
    string allowance = 2;
    string rate = 3;
    string per = 4;
    int64 quota_max = 5;
    int64 quota_renews = 6;
    int64 quota_remaining = 7;
    int64 quota_renewal_rate = 8;
    map<string, string> access_rights = 9;
    string org_id = 10;
    string oauth_client_id = 11;
    repeated string oauth_keys = 12;
    map<string, string> basic_auth_data = 13;
    map<string, string> jwt_data = 14;
    map<string, string> hmac_data = 15;
    bool is_inactive = 16;
    string apply_policy_id = 17;
    int64 data_expires = 18;
    map<string, string> monitor = 19;
    bool enable_detailed_recording = 20;
    string metadata = 21;
    repeated string tags = 22;
    string alias = 23;
    string last_updated = 24;
    string idp_id = 25;
}

message Event {
    string payload = 1;
}

message EventReply {
    string payload = 1;
}

enum HookType {
    Unknown = 0;
    Pre = 1;
    Post = 2;
    PostKeyAuth = 3;
    CustomKeyCheck = 4;
    Driver = 5;
}
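
Wiring this proto into the build is handled by tonic-build at compile time. A minimal build.rs sketch, assuming the file lives at proto/middleware.proto and tonic-build is listed under [build-dependencies]:

build.rs

// Build script: generates the Rust types and the Dispatcher service trait
// that `tonic::include_proto!("coprocess")` pulls in below.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::compile_protos("proto/middleware.proto")?;
    Ok(())
}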

With the contract defined, we proceeded to implement the Rust gRPC server using the tonic and tokio crates. The core logic resides within the dispatch function. A common pitfall here is performing heavy computation or blocking I/O directly in this handler. Our goal is to keep it lean: validate the token, enrich headers, and log. Anything else is an anti-pattern for this type of middleware.

src/main.rs

use tonic::{transport::Server, Request, Response, Status};
use coprocess::{
    dispatcher_server::{Dispatcher, DispatcherServer},
    MiniRequestObject, Object,
};
use tracing::{info, error, instrument, Level};
use tracing_subscriber::{fmt, layer::SubscriberExt, util::SubscriberInitExt};
use uuid::Uuid;
use jsonwebtoken::{decode, DecodingKey, Validation, Algorithm};
use serde::{Deserialize, Serialize};

// The namespace for our generated protobuf code.
pub mod coprocess {
    tonic::include_proto!("coprocess");
}

// Define the claims we expect in our JWT.
#[derive(Debug, Serialize, Deserialize)]
struct Claims {
    sub: String,
    exp: usize,
    roles: Vec<String>,
}

// The main struct for our gRPC dispatcher service.
#[derive(Default)]
pub struct MyDispatcher {}

const CORRELATION_ID_HEADER: &str = "x-correlation-id";
const AUTH_HEADER: &str = "authorization";
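
// Note: lookups in the MiniRequestObject header map above are case-sensitive.
// Depending on how the gateway canonicalizes header names (Go's http.Header
// yields "Authorization"), these constants may need to match that casing exactly.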

// This is the core logic of the middleware.
#[tonic::async_trait]
impl Dispatcher for MyDispatcher {
    // The `instrument` macro belongs on the function, not the impl block; it
    // attaches a span (including the request_id we record below) to every log
    // emitted within `dispatch`.
    #[instrument(skip(self, request), fields(request_id = tracing::field::Empty))]
    async fn dispatch(&self, request: Request<Object>) -> Result<Response<Object>, Status> {
        let mut obj = request.into_inner();

        // 1. Generate or retrieve the correlation ID for end-to-end tracing.
        let correlation_id = obj
            .request
            .as_mut()
            .and_then(|req| {
                req.headers
                    .get(CORRELATION_ID_HEADER)
                    .cloned()
            })
            .unwrap_or_else(|| Uuid::new_v4().to_string());
        
        // Add the correlation ID to the tracing span's context.
        tracing::Span::current().record("request_id", &correlation_id.as_str());
        
        info!("Processing new request.");

        // Only handle the hook we registered in the Tyk API definition
        // ("MyAuthMiddleware", wired up as a "pre" hook); pass anything else through.
        if obj.hook_name != "MyAuthMiddleware" {
            info!("Hook name does not match, skipping.");
            return Ok(Response::new(obj));
        }

        // 2. Extract the JWT from the Authorization header.
        let auth_header = obj
            .request
            .as_ref()
            .and_then(|req| req.headers.get(AUTH_HEADER))
            .ok_or_else(|| {
                error!("Authorization header missing.");
                Status::unauthenticated("Authorization header missing")
            })?;

        if !auth_header.starts_with("Bearer ") {
            error!("Invalid Authorization header format.");
            return Err(Status::unauthenticated("Invalid token format"));
        }
        let token = &auth_header[7..];

        // 3. Decode and validate the JWT.
        // The secret must come from a secure configuration service or an
        // environment variable; hardcoding secrets is a major security risk.
        let secret = std::env::var("JWT_SECRET")
            .unwrap_or_else(|_| "local-dev-only-secret".to_string());
        let decoding_key = DecodingKey::from_secret(secret.as_ref());
        let validation = Validation::new(Algorithm::HS256);
        
        let token_data = match decode::<Claims>(token, &decoding_key, &validation) {
            Ok(data) => data,
            Err(e) => {
                error!(error = %e, "JWT validation failed.");
                // We modify the request to signal an auth failure to Tyk,
                // avoiding a panic-prone unwrap on the optional request.
                if let Some(req) = obj.request.as_mut() {
                    req.headers
                        .insert("x-auth-failed".to_string(), "true".to_string());
                }
                // Return Ok here because the gRPC call itself succeeded.
                // Tyk will see the header and block the request.
                return Ok(Response::new(obj));
            }
        };

        let claims = token_data.claims;
        info!(user_id = %claims.sub, "JWT validation successful.");

        // 4. Enrich the request with new headers based on JWT claims; the
        //    upstream Node.js service will use these.
        // 5. Remove the original Authorization header so the raw token never
        //    reaches the upstream service.
        if let Some(req) = obj.request.as_mut() {
            req.headers.insert("x-user-id".to_string(), claims.sub);
            req.headers.insert("x-user-roles".to_string(), claims.roles.join(","));
            // Also ensure the correlation ID is passed upstream.
            req.headers.insert(CORRELATION_ID_HEADER.to_string(), correlation_id);
            req.headers.remove(AUTH_HEADER);
        }

        info!("Request enrichment complete.");
        Ok(Response::new(obj))
    }

    // The generated `Dispatcher` trait also requires `dispatch_event`; Tyk uses it
    // for fire-and-forget event notifications, so a no-op reply is sufficient here.
    async fn dispatch_event(
        &self,
        _request: Request<coprocess::Event>,
    ) -> Result<Response<coprocess::EventReply>, Status> {
        Ok(Response::new(coprocess::EventReply::default()))
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Setup structured logging using tracing and tracing-subscriber.
    // This configuration outputs logs as JSON, which is perfect for Logstash/Fluentd.
    let log_level = std::env::var("RUST_LOG").unwrap_or_else(|_| "info".to_string());
    tracing_subscriber::registry()
        .with(fmt::layer().json())
        .with(tracing_subscriber::filter::LevelFilter::from_level(
            log_level.parse::<Level>().unwrap_or(Level::INFO),
        ))
        .init();

    let addr = "0.0.0.0:50051".parse()?;
    let dispatcher = MyDispatcher::default();

    info!(address = %addr, "gRPC middleware server listening");

    Server::builder()
        .add_service(DispatcherServer::new(dispatcher))
        .serve(addr)
        .await?;

    Ok(())
}
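
Before wiring the service into Tyk, a quick handler-level test helped validate the failure paths. A minimal sketch, assuming the types above are in scope:

// A handler-level sanity check: a request addressed to our hook but missing
// the Authorization header should be rejected with `unauthenticated`.
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn missing_auth_header_is_rejected() {
        let dispatcher = MyDispatcher::default();
        let obj = Object {
            hook_name: "MyAuthMiddleware".to_string(),
            request: Some(MiniRequestObject::default()),
            ..Default::default()
        };
        let status = dispatcher
            .dispatch(Request::new(obj))
            .await
            .expect_err("expected an auth error");
        assert_eq!(status.code(), tonic::Code::Unauthenticated);
    }
}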

One of the first problems we encountered was how to handle authentication failures. If the JWT was invalid, our initial instinct was to return an Err from the dispatch function. This, however, terminates the connection and returns a generic “gRPC server error” to Tyk, which isn’t ideal. The correct approach is to always return Ok(Response::new(obj)) but modify the Object to signal the failure. A common practice is to add a specific header like x-auth-failed. Then, in the Tyk API definition, we can use a transform or another piece of middleware to check for this header and return a proper 401 Unauthorized response. This keeps the control flow clean and gives more power to the API designer in Tyk.
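
A small helper makes this pattern concrete. This is a sketch of the idea rather than verbatim production code; the x-auth-failed-reason header is a hypothetical addition for richer diagnostics:

// Signal an auth failure to Tyk by mutating the Object instead of returning a
// gRPC-level error; Tyk-side configuration inspects these headers and emits the 401.
fn signal_auth_failure(mut obj: Object, reason: &str) -> Response<Object> {
    if let Some(req) = obj.request.as_mut() {
        req.headers
            .insert("x-auth-failed".to_string(), "true".to_string());
        // Hypothetical extra header carrying a short, non-sensitive failure reason.
        req.headers
            .insert("x-auth-failed-reason".to_string(), reason.to_string());
    }
    Response::new(obj)
}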

Next was configuring Tyk to use this new gRPC service. This is done directly in the API definition JSON file. It requires specifying the custom_middleware section and pointing it to our service.

tyk-api-definition.json

{
  "name": "My Node.js Service",
  "api_id": "nodeservice1",
  "org_id": "1",
  "use_keyless": false,
  "auth": {
    "auth_header_name": "Authorization"
  },
  "definition": {
    "location": "header",
    "key": "x-api-version"
  },
  "version_data": {
    "not_versioned": true,
    "versions": {
      "Default": {
        "name": "Default",
        "expires": "",
        "paths": {
          "ignored": [],
          "white_list": [],
          "black_list": []
        },
        "use_extended_paths": true,
        "extended_paths": {}
      }
    }
  },
  "proxy": {
    "listen_path": "/node-service/",
    "target_url": "http://node-app:3000/",
    "strip_listen_path": true
  },
  "custom_middleware": {
    "pre": [
      {
        "name": "MyAuthMiddleware",
        "path": "",
        "require_session": false,
        "raw_body_only": false
      }
    ],
    "driver": "grpc",
    "grpc_options": {
      "auth_header": "middleware-secret-token",
      "target": "rust-middleware:50051",
      "use_tls": false,
      "insecure_skip_verify": true
    }
  },
  "active": true
}

A key detail here is grpc_options. The target must be the resolvable address of our Rust service, and the auth_header provides a simple shared secret that protects the middleware endpoint from unauthorized callers. For production, use_tls should absolutely be true, and enabling it introduced another set of challenges. Getting the mTLS handshake right between Tyk’s Go-based gRPC client and Tonic’s Rust-based server required careful generation of certificates and keys, and ensuring both sides were configured with the correct CA, certificate, and private key. A common mistake is generating certificates with the wrong Common Name (CN) or Subject Alternative Name (SAN), which leads to opaque handshake failures.
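
On the Rust side, enabling mutual TLS in Tonic is a small change to the server bootstrap. A minimal sketch, assuming PEM-encoded files at hypothetical paths and the tls feature enabled on tonic:

use tonic::transport::{Certificate, Identity, Server, ServerTlsConfig};

// Hypothetical paths; in production these would come from a secret store.
let cert = std::fs::read_to_string("certs/middleware.crt")?;
let key = std::fs::read_to_string("certs/middleware.key")?;
let ca = std::fs::read_to_string("certs/ca.crt")?;

let tls = ServerTlsConfig::new()
    // Our server certificate and private key; the SAN must match the hostname
    // Tyk dials ("rust-middleware" in the API definition above).
    .identity(Identity::from_pem(cert, key))
    // Requiring client certificates signed by our CA is what makes this mutual TLS.
    .client_ca_root(Certificate::from_pem(ca));

Server::builder()
    .tls_config(tls)?
    .add_service(DispatcherServer::new(dispatcher))
    .serve(addr)
    .await?;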

With the gateway and middleware in place, the final code change was in our upstream Node.js service. It needed to be aware of the headers added by the Rust middleware, especially the correlation ID, and incorporate it into its own logging.

node-app/index.js

const express = require('express');
const winston = require('winston');
const { v4: uuidv4 } = require('uuid');

const app = express();
const port = 3000;

// Configure Winston for structured JSON logging, easily parseable by
// Logstash or Fluentd. Note that winston.format.json() alone emits no
// timestamp, so we add one explicitly for the Logstash date filter downstream.
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [
    new winston.transports.Console(),
  ],
});

// A small middleware to extract our correlation ID and user context
// and attach it to the request object for easy access.
const contextMiddleware = (req, res, next) => {
  // Use the correlation ID from the Rust middleware, or generate a new one if it's missing.
  const correlationId = req.header('x-correlation-id') || uuidv4();
  const userId = req.header('x-user-id') || 'anonymous';
  const userRoles = req.header('x-user-roles') || '';

  req.context = { correlationId, userId, userRoles };
  next();
};

app.use(contextMiddleware);

app.get('/', (req, res) => {
  // Create a child logger with the context for this specific request.
  // All logs produced by this logger will have these fields.
  const requestLogger = logger.child({
    correlationId: req.context.correlationId,
    userId: req.context.userId,
    path: req.path,
  });

  requestLogger.info(`Processing request for user. Roles: ${req.context.userRoles}`);
  
  // A pitfall here is to forget to log errors in the same structured format.
  if (req.query.error === 'true') {
    requestLogger.error('An intentional error was triggered.');
    return res.status(500).json({ 
        status: 'error', 
        correlationId: req.context.correlationId 
    });
  }

  requestLogger.info('Request processed successfully.');
  res.status(200).json({ 
    message: `Hello, user ${req.context.userId}!`,
    correlationId: req.context.correlationId
  });
});

app.listen(port, () => {
  logger.info(`Node.js service listening on port ${port}`);
});

The last piece of the puzzle was the ELK stack configuration. We used Filebeat to tail the log files (or container stdout) from both the Rust and Node.js services. The Logstash pipeline was configured to parse these JSON logs.

logstash/pipeline/logstash.conf

input {
  # Listen for logs from Filebeat
  beats {
    port => 5044
  }
}

filter {
  # The logs are already in JSON format, so we just need to parse them.
  json {
    source => "message"
    # Nest the parsed fields under @fields so they can't collide with
    # top-level Logstash event fields; later filters reference [@fields][...].
    target => "@fields"
  }
  
  # A common mistake is having mismatched field names. We standardize here.
  # The Rust logs might have `request_id`, while Node logs have `correlationId`.
  if [@fields][correlationId] {
      mutate {
          rename => { "[@fields][correlationId]" => "correlation_id" }
      }
  } else if [@fields][request_id] {
      mutate {
          rename => { "[@fields][request_id]" => "correlation_id" }
      }
  }

  # Parse the timestamp from the log message itself.
  date {
    match => [ "[@fields][timestamp]", "ISO8601" ]
  }
}

output {
  # Send the processed logs to Elasticsearch.
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    # We use the correlation_id to help with routing if needed, but mainly for querying.
    index => "microservice-logs-%{+YYYY.MM.dd}"
  }
}

With this setup, a simple query in Kibana like correlation_id: "some-uuid-v4-string" instantly returned all log entries for a single request, in order, from both the Rust middleware and the Node.js service. The latency bottleneck was gone. Benchmarks using k6 showed the p99 latency for the auth step dropping from ~80ms under load to less than 1ms. The gateway CPU usage became stable and predictable.

This architecture is not without its own set of trade-offs. The gRPC middleware is now a critical component; if it goes down, the entire API fails. This necessitates running multiple instances of the Rust service and ensuring proper health checks and load balancing, adding operational complexity. Furthermore, while the performance gain was substantial, the development workflow is now more involved, requiring knowledge of Rust, Protobuf, and gRPC in addition to the existing Node.js stack. The applicability of this pattern is limited to scenarios where a performance-critical, CPU-bound task at the edge is genuinely proven to be a bottleneck for an otherwise efficient system. For standard validation logic, Tyk’s built-in policies or simpler middleware are often sufficient. The future path involves moving the configuration secrets to HashiCorp Vault and implementing a more sophisticated, adaptive sampling strategy for logging in the Rust service to reduce log volume at extreme scale without losing visibility into error conditions.
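
On the health-check point: the standard gRPC health protocol can be served alongside the dispatcher so orchestrators and load balancers can probe each middleware instance. A minimal sketch using the tonic-health crate, an addition we would slot into the server bootstrap in main:

// Expose grpc.health.v1.Health alongside the dispatcher so Kubernetes or a
// load balancer can probe each instance's readiness.
let (mut health_reporter, health_service) = tonic_health::server::health_reporter();
health_reporter
    .set_serving::<DispatcherServer<MyDispatcher>>()
    .await;

Server::builder()
    .add_service(health_service)
    .add_service(DispatcherServer::new(dispatcher))
    .serve(addr)
    .await?;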

