Our production environment relies on a suite of Go-based Web APIs serving a modern Single Page Application built with a framework that leans heavily on React. Authentication is standardized on JWT. Recently, we faced a series of incidents that our existing infrastructure (a WAF, API gateway rate-limiting, and application-level logging) was ill-equipped to handle. The attacks weren't simple brute-force attempts; they were subtle, probing for weaknesses in our token validation logic, including potential replay attacks and credential stuffing against our /auth/refresh endpoint. The core problem was a lack of visibility. Application logs could tell us that a token validation failed, but they couldn't provide the high-frequency, low-latency context needed to distinguish a sophisticated attack from benign user errors. Instrumenting the Go application with more detailed logging was a non-starter: the performance overhead was unacceptable, and it would require a risky redeployment of critical services.
The initial concept was to move the observation point out of the application and into the kernel. If we could inspect network packets destined for our API servers at the kernel level, we could gain zero-instrumentation visibility into every single authentication attempt without touching the Go application’s code. This would be incredibly low-overhead and give us raw, unfiltered data. eBPF was the only technology that fit this profile. The plan crystallized: an eBPF program would capture JWT-related traffic, a user-space Go service would consume these raw events to perform real-time anomaly detection, and a WebSocket stream would push alerts to a dedicated security dashboard.
For the technology stack, the choices were driven by pragmatism. The eBPF program itself must be written in a restricted subset of C, but the user-space controller and the API it protects were already written in Go. The cilium/ebpf library provides first-class Go bindings for loading and interacting with eBPF programs, making it possible to manage the entire backend within a single language ecosystem. For the frontend dashboard, we needed something that could handle real-time data streams efficiently. Astro's islands architecture was a perfect fit. The majority of the dashboard UI could be rendered to static HTML for fast loads, with only the critical, data-driven components (the event log, alert panels) hydrated as interactive React components. This avoids the performance penalty of a monolithic client-side rendered application. Within these React islands, Recoil was chosen for state management. Its minimalistic, atom-based approach is ideal for propagating state from a WebSocket connection to various UI components without the ceremony of more complex state management libraries.
The eBPF Kernel-Level Packet Inspector
The first step is crafting the eBPF program. The goal here is not to perform full JWT validation in the kernel; that would be far too complex and insecure. Instead, the eBPF program's sole responsibility is to identify packets that likely contain a JWT, extract minimal metadata (source IP, source port), and push this data to a user-space process via a perf event buffer. We'll attach this program to the TC (Traffic Control) ingress hook, allowing it to inspect all incoming traffic on the API server's network interface.
Here is the eBPF program, jwt_monitor.c. In a real-world project, this would be compiled using Clang into an eBPF object file.
// SPDX-License-Identifier: GPL-2.0
// +build ignore
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
// vmlinux.h only carries types present in BTF, so these UAPI constants
// have to be defined by hand.
#define ETH_P_IP 0x0800 /* IPv4 ethertype */
#define TC_ACT_OK 0     /* let the packet continue up the stack */
// Define a struct to hold the event data sent to user space.
// We keep this minimal to reduce kernel-to-user-space overhead.
struct event {
__u32 saddr; // Source IPv4 address
__u16 sport; // Source TCP port
__u64 timestamp_ns;
};
// Perf event buffer map. This is the channel for sending data to user space.
struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__uint(key_size, sizeof(int));
__uint(value_size, sizeof(int));
__uint(max_entries, 128);
} events SEC(".maps");
// A simple helper to search for a substring in a buffer.
// This naive implementation has real limitations in eBPF: the verifier only
// accepts bounded loops (kernel 5.3+), and every packet-data access must be
// provably inside [data, data_end), so it may reject this search as written.
// A hardened version would copy a fixed-size window onto the stack (for
// example with bpf_skb_load_bytes) and scan that instead.
static __always_inline int mem_search(const char *haystack, int haystack_len, const char *needle, int needle_len) {
if (haystack_len < needle_len) {
return 0;
}
for (int i = 0; i <= haystack_len - needle_len; i++) {
int found = 1;
for (int j = 0; j < needle_len; j++) {
if (haystack[i + j] != needle[j]) {
found = 0;
break;
}
}
if (found) {
return 1;
}
}
return 0;
}
SEC("tc")
int jwt_http_filter(struct __sk_buff *skb) {
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
// We need to parse headers to get to the TCP payload.
struct ethhdr *eth = data;
if ((void *)eth + sizeof(*eth) > data_end) {
return TC_ACT_OK;
}
if (eth->h_proto != bpf_htons(ETH_P_IP)) {
return TC_ACT_OK;
}
struct iphdr *ip = data + sizeof(*eth);
if ((void *)ip + sizeof(*ip) > data_end) {
return TC_ACT_OK;
}
if (ip->protocol != IPPROTO_TCP) {
return TC_ACT_OK;
}
// Calculate IP header length
__u32 ip_hdr_len = ip->ihl * 4;
struct tcphdr *tcp = (void *)ip + ip_hdr_len;
if ((void *)tcp + sizeof(*tcp) > data_end) {
return TC_ACT_OK;
}
// We only care about traffic to our API port, e.g., 8080.
if (tcp->dest != bpf_htons(8080)) {
return TC_ACT_OK;
}
// Calculate TCP header length
__u32 tcp_hdr_len = tcp->doff * 4;
// Point to the beginning of the TCP payload (HTTP request)
char *payload = (char *)tcp + tcp_hdr_len;
int payload_len = data_end - (void *)payload;
// The maximum length we can inspect is limited.
// We check a reasonable portion of the packet for the auth header.
if (payload_len <= 0) {
return TC_ACT_OK;
}
// The core logic: search for "Authorization: Bearer eyJ".
// "eyJ" is the Base64Url encoding for `{"alg":...}`, the start of nearly all JWTs.
// This is a heuristic, not a guarantee, but it's highly effective and fast.
const char auth_header_prefix[] = "Authorization: Bearer eyJ";
// The verifier also caps the BPF stack at 512 bytes and requires bounded
// loops, so keep both the needle and the inspected window small.
if (mem_search(payload, payload_len, auth_header_prefix, sizeof(auth_header_prefix) - 1)) {
struct event evt = {};
evt.saddr = ip->saddr;
evt.sport = tcp->source;
evt.timestamp_ns = bpf_ktime_get_ns();
// Submit the event to the perf buffer for user space to read.
bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU, &evt, sizeof(evt));
}
return TC_ACT_OK; // Always let the packet pass through
}
char LICENSE[] SEC("license") = "GPL";
The pitfall here is the mem_search function. In a real production system, this naive search is inefficient, may be rejected by the verifier unless its loops and packet accesses are provably bounded, and can be defeated by packet fragmentation, since it only ever sees one packet at a time. A more robust solution might involve kprobes on kernel TCP receive functions to work with reassembled data, but that adds significant complexity. For this use case, where HTTP requests carrying JWTs are typically small and fit within a single packet, this TC-based approach is a pragmatic starting point.
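One practical detail before moving on: the jwt_monitor.o object that the Go loader below expects is produced with a standard Clang BPF build (the cilium/ebpf project also ships bpf2go, which wraps this step, but the loader in this article reads the object file directly). A minimal sketch, expressed as a go:generate directive so the build step lives next to the Go controller; the include path is an assumption and depends on where vmlinux.h and the libbpf headers sit in your tree:
// generate.go — illustrative only. Running `go generate` rebuilds the eBPF
// object referenced by the loader. The -I path is an assumption; point it at
// the directory holding vmlinux.h and the libbpf headers on your build machine.
//
//go:generate clang -O2 -g -target bpf -I./include -c jwt_monitor.c -o jwt_monitor.o
package main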
The Go User-Space Anomaly Detector
With the eBPF program ready, the next piece is the Go application that loads it, listens for events, and applies the detection logic. This controller runs alongside our main Web API.
// main.go
package main
import (
"bytes"
"encoding/binary"
"errors"
"log"
"net"
"os"
"os/signal"
"sync"
"syscall"
"time"
"github.com/cilium/ebpf"
"github.com/cilium/ebpf/link"
"github.com/cilium/ebpf/perf"
"golang.org/x/time/rate"
)
// The Go struct must match the C struct layout exactly.
type Event struct {
Saddr uint32
Sport uint16
_ [2]byte // padding matching the 2-byte hole the C compiler leaves before the 8-byte timestamp
TimestampNs uint64
}
// Holds state for anomaly detection logic.
// A real-world project would use a more sophisticated storage like Redis
// to share state across multiple detector instances.
type AnomalyDetector struct {
mu sync.Mutex
ipFailures map[uint32]*rate.Limiter // Rate limiters per source IP (these track all JWT-bearing requests, not confirmed failures); entries are never evicted in this example
alertChannel chan string
}
func NewAnomalyDetector(alertChannel chan string) *AnomalyDetector {
return &AnomalyDetector{
ipFailures: make(map[uint32]*rate.Limiter),
alertChannel: alertChannel,
}
}
// This is where the core detection logic resides.
// For this example, we detect high-frequency requests from a single IP.
func (ad *AnomalyDetector) ProcessEvent(event Event) {
ad.mu.Lock()
defer ad.mu.Unlock()
limiter, exists := ad.ipFailures[event.Saddr]
if !exists {
// New IP seen: Allow 10 events per second, with a burst of 20.
limiter = rate.NewLimiter(rate.Limit(10), 20)
ad.ipFailures[event.Saddr] = limiter
}
if !limiter.Allow() {
// This IP has exceeded the rate limit.
// In a real system, you would check if the token validation actually failed
// by correlating with application logs. Here, we just generate an alert.
// saddr came off the wire big-endian and was decoded little-endian above,
// so writing it back little-endian restores the original byte order.
ip := net.IP(binary.LittleEndian.AppendUint32(nil, event.Saddr)).String()
alertMsg := "High-frequency JWT usage detected from IP: " + ip
log.Println("ALERT:", alertMsg)
// Non-blocking send to the alert channel for WebSocket broadcasting.
select {
case ad.alertChannel <- alertMsg:
default:
log.Println("Alert channel is full, dropping alert.")
}
}
}
func main() {
// A channel to send alerts to the WebSocket hub.
alertChannel := make(chan string, 100)
detector := NewAnomalyDetector(alertChannel)
// Start the WebSocket server in a separate goroutine.
hub := newHub()
go hub.run()
go serveWs(hub, "/ws")
// Goroutine to forward alerts from detector to WebSocket hub.
go func() {
for alert := range alertChannel {
hub.broadcast <- []byte(alert)
}
}()
stop := make(chan os.Signal, 1)
signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)
// Load the compiled eBPF object file.
spec, err := ebpf.LoadCollectionSpec("jwt_monitor.o")
if err != nil {
log.Fatalf("could not load BPF spec: %v", err)
}
coll, err := ebpf.NewCollection(spec)
if err != nil {
log.Fatalf("could not create BPF collection: %v", err)
}
defer coll.Close()
// Get the eBPF program and map from the collection.
prog := coll.Programs["jwt_http_filter"]
if prog == nil {
log.Fatal("BPF program 'jwt_http_filter' not found")
}
eventsMap := coll.Maps["events"]
if eventsMap == nil {
log.Fatal("BPF map 'events' not found")
}
// Attach the eBPF program to the network interface.
// Replace "eth0" with the actual interface of your server.
iface, err := net.InterfaceByName("eth0")
if err != nil {
log.Fatalf("could not get interface: %v", err)
}
// Attach the program to the ingress hook via a TCX link. TCX links require
// Linux 6.6+; on older kernels the program would be attached with classic
// tc/netlink instead.
l, err := link.AttachTCX(link.TCXOptions{
Program: prog,
Attach: ebpf.AttachTCXIngress,
Interface: iface.Index,
})
if err != nil {
log.Fatalf("could not attach TC program: %v", err)
}
defer l.Close()
log.Printf("eBPF program attached to interface %s. Waiting for events...", iface.Name)
// Set up the perf event reader.
rd, err := perf.NewReader(eventsMap, os.Getpagesize()*4)
if err != nil {
log.Fatalf("could not create perf event reader: %v", err)
}
defer rd.Close()
go func() {
<-stop
rd.Close()
}()
var event Event
for {
record, err := rd.Read()
if err != nil {
if errors.Is(err, perf.ErrClosed) {
log.Println("Perf reader closed.")
return
}
log.Printf("error reading from perf reader: %v", err)
continue
}
if record.LostSamples > 0 {
log.Printf("lost %d samples", record.LostSamples)
continue
}
if err := binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &event); err != nil {
log.Printf("error parsing event: %v", err)
continue
}
detector.ProcessEvent(event)
}
}
// NOTE: The WebSocket hub and server implementation (newHub, hub.run, serveWs)
// is omitted here for brevity, but it is a standard Go WebSocket setup
// using a library like "gorilla/websocket".
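For completeness, here is a minimal sketch of what that omitted hub might look like, assuming github.com/gorilla/websocket. The names (Hub, newHub, serveWs, broadcast) follow the references in main.go; the listen address is an assumption and must match the URL the dashboard's WebSocket client connects to (and must not collide with the API port the eBPF filter watches if both run on the same host).
// websocket.go — a minimal sketch of the omitted hub, assuming
// github.com/gorilla/websocket. Buffer sizes and the listen address below
// are assumptions, not values from the article.
package main

import (
    "log"
    "net/http"

    "github.com/gorilla/websocket"
)

const wsListenAddr = ":8090" // assumed; pick a port that does not collide with the monitored API

var upgrader = websocket.Upgrader{
    // The dashboard is served by Astro from a different origin, so allow all
    // origins in this sketch; tighten this check in production.
    CheckOrigin: func(r *http.Request) bool { return true },
}

// Hub fans broadcast messages out to every connected dashboard client.
type Hub struct {
    clients    map[*websocket.Conn]bool
    broadcast  chan []byte
    register   chan *websocket.Conn
    unregister chan *websocket.Conn
}

func newHub() *Hub {
    return &Hub{
        clients:    make(map[*websocket.Conn]bool),
        broadcast:  make(chan []byte, 100),
        register:   make(chan *websocket.Conn),
        unregister: make(chan *websocket.Conn),
    }
}

func (h *Hub) run() {
    for {
        select {
        case conn := <-h.register:
            h.clients[conn] = true
        case conn := <-h.unregister:
            if _, ok := h.clients[conn]; ok {
                delete(h.clients, conn)
                conn.Close()
            }
        case msg := <-h.broadcast:
            for conn := range h.clients {
                if err := conn.WriteMessage(websocket.TextMessage, msg); err != nil {
                    delete(h.clients, conn)
                    conn.Close()
                }
            }
        }
    }
}

// serveWs registers the WebSocket endpoint and starts the HTTP listener.
func serveWs(hub *Hub, path string) {
    http.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
        conn, err := upgrader.Upgrade(w, r, nil)
        if err != nil {
            log.Printf("websocket upgrade failed: %v", err)
            return
        }
        hub.register <- conn
        // Drain client reads so we notice disconnects and unregister the socket.
        go func() {
            defer func() { hub.unregister <- conn }()
            for {
                if _, _, err := conn.ReadMessage(); err != nil {
                    return
                }
            }
        }()
    })
    log.Fatal(http.ListenAndServe(wsListenAddr, nil))
}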
The anomaly detector logic in ProcessEvent is basic, using a simple rate limiter per IP. A production system would be far more sophisticated, potentially tracking unique token identifiers (the jti claim) to detect replay attacks, or correlating source IPs with known malicious networks. The key is that this logic lives entirely in user space, where we have the flexibility of a full programming language and access to external data sources, while the kernel component remains simple, fast, and secure.
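As an illustration of the jti idea, here is a hedged sketch of replay detection for refresh tokens. It assumes the detector is fed the decoded token's jti and exp claims from somewhere that can see the JWT body (for example the API's own middleware, or a richer probe); the kernel events above deliberately carry only the source IP, port, and timestamp.
// replay.go — a sketch, not part of the article's pipeline. A refresh token's
// jti should be redeemed exactly once, so a second sighting before expiry is
// a strong signal of token theft or replay.
package main

import (
    "sync"
    "time"
)

// ReplayDetector remembers which refresh-token jti values have already been presented.
type ReplayDetector struct {
    mu   sync.Mutex
    seen map[string]time.Time // jti -> token expiry
}

func NewReplayDetector() *ReplayDetector {
    rd := &ReplayDetector{seen: make(map[string]time.Time)}
    go rd.gcLoop() // drop expired entries so the map does not grow forever
    return rd
}

// Observe records a jti and reports whether it was already seen while still valid.
func (rd *ReplayDetector) Observe(jti string, exp time.Time) (replayed bool) {
    rd.mu.Lock()
    defer rd.mu.Unlock()
    if prevExp, ok := rd.seen[jti]; ok && time.Now().Before(prevExp) {
        return true
    }
    rd.seen[jti] = exp
    return false
}

func (rd *ReplayDetector) gcLoop() {
    for range time.Tick(time.Minute) {
        now := time.Now()
        rd.mu.Lock()
        for jti, exp := range rd.seen {
            if now.After(exp) {
                delete(rd.seen, jti)
            }
        }
        rd.mu.Unlock()
    }
}
In a multi-instance deployment the seen map would live in Redis or a similar shared store, as noted earlier.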
The Astro and Recoil Real-Time Dashboard
The final piece is the frontend. The Astro project structure allows us to build a mostly static UI, with a single interactive “island” for handling the live data.
File structure:
src/
├── components/
│ ├── EventLog.tsx # The React component for the log
│ └── ThreatPanel.tsx # Another React component for aggregated stats
├── layouts/
│ └── Layout.astro
├── pages/
│ └── dashboard.astro
└── store/
└── atoms.ts # Recoil state definitions
Recoil State (src/store/atoms.ts):
import { atom } from 'recoil';
export interface SecurityEvent {
timestamp: string;
message: string;
id: number;
}
export const eventsState = atom<SecurityEvent[]>({
key: 'eventsState',
default: [],
});
// An example of derived state or another atom for aggregation.
export const ipThreatState = atom<Record<string, number>>({
key: 'ipThreatState',
default: {},
});
The Main Astro Page (src/pages/dashboard.astro):
This page sets up the static layout and includes the EventLog component as a client-side hydrated island.
---
import Layout from '../layouts/Layout.astro';
import EventLog from '../components/EventLog';
---
<Layout title="JWT Anomaly Detection Dashboard">
<main class="container mx-auto p-4">
<h1 class="text-2xl font-bold mb-4">Real-Time Security Events</h1>
<div class="grid grid-cols-1 md:grid-cols-3 gap-4">
<div class="md:col-span-2">
<!-- This component will be interactive on the client -->
<EventLog client:load />
</div>
<div>
<!-- A placeholder for another potential panel -->
<div class="bg-gray-800 p-4 rounded-lg shadow">
<h2 class="text-xl font-semibold mb-2">Threat Overview</h2>
<p>Threat details will appear here.</p>
</div>
</div>
</div>
</main>
</Layout>
The Interactive React Component (src/components/EventLog.tsx):
This is the core of the interactive UI. It establishes the WebSocket connection and uses Recoil to manage the state.
import React, { useEffect, useState } from 'react';
import { RecoilRoot, useRecoilState } from 'recoil';
import { eventsState, SecurityEvent } from '../store/atoms';
const MAX_EVENTS = 100; // Keep the log from growing indefinitely
function EventLogContent() {
const [events, setEvents] = useRecoilState(eventsState);
const [connectionStatus, setConnectionStatus] = useState('Connecting...');
useEffect(() => {
// A common mistake is not handling WebSocket reconnections; a production
// client needs a more robust approach with exponential backoff. The URL must
// point at the Go detector's WebSocket listener.
const ws = new WebSocket('ws://localhost:8080/ws');
ws.onopen = () => {
setConnectionStatus('Connected');
};
ws.onmessage = (event) => {
const message = event.data;
setEvents((oldEvents) => {
const newEvent: SecurityEvent = {
id: Date.now(),
timestamp: new Date().toISOString(),
message: message,
};
// Prepend new event and slice to maintain max length
const updatedEvents = [newEvent, ...oldEvents];
return updatedEvents.slice(0, MAX_EVENTS);
});
};
ws.onclose = () => {
setConnectionStatus('Disconnected.');
// Placeholder only: a real client would re-create the socket here,
// typically with exponential backoff, and clear any pending timer on unmount.
};
ws.onerror = () => {
setConnectionStatus('Connection Error');
};
// Cleanup function to close the socket when the component unmounts.
return () => {
ws.close();
};
}, []); // Empty dependency array ensures this runs only once on mount.
return (
<div className="bg-gray-900 text-white p-4 rounded-lg shadow h-[600px] flex flex-col">
<div className="flex justify-between items-center mb-2">
<h2 className="text-xl font-semibold">Live Event Stream</h2>
<span className="text-sm text-gray-400">{connectionStatus}</span>
</div>
<div className="font-mono text-sm overflow-y-auto flex-grow">
{events.length === 0 ? (
<p className="text-gray-500">Awaiting events...</p>
) : (
events.map((event) => (
<div key={event.id} className="border-b border-gray-700 py-1">
<span className="text-cyan-400 mr-2">{event.timestamp}</span>
<span>{event.message}</span>
</div>
))
)}
</div>
</div>
);
}
// The component exported to Astro must wrap its content in RecoilRoot.
export default function EventLog() {
return (
<RecoilRoot>
<EventLogContent />
</RecoilRoot>
);
}
The entire system flow can be visualized as follows:
sequenceDiagram
    participant C as Client
    participant K as Kernel (eBPF)
    participant U as Go User-Space Detector
    participant W as WebSocket Hub
    participant F as Astro/Recoil Frontend
    C->>+K: HTTP Request with JWT
    K->>K: TC hook fires, eBPF program runs
    K-->>+U: Sends event via perf buffer
    U->>U: ProcessEvent() runs anomaly logic
    Note right of U: limiter.Allow() fails
    U->>+W: alertChannel <- "High-frequency..."
    W->>+F: Broadcasts alert via WebSocket
    F->>F: Recoil state is updated
    F-->>F: React component re-renders with new event
    K-->>C: Packet continues to actual Web API
The solution provides a highly performant, low-touch observability layer for a critical part of our authentication flow. The overhead on the production API server is negligible, as the eBPF program is hyper-efficient and the Go detector runs as a separate process. The frontend remains lightweight and responsive, demonstrating the power of Astro’s island architecture for specialized, data-intensive UIs.
The current implementation, however, has clear boundaries. The eBPF program's packet parsing is naive and only inspects unencrypted traffic; it would be useless against a TLS-terminated service unless the observation point moved to a hook that sees decrypted data, such as uprobes on OpenSSL functions, which is a significant jump in complexity and fragility. The anomaly detection logic is stateful and local to a single detector instance, meaning it cannot scale horizontally or survive a restart without losing its state. A more robust architecture would replace the in-memory state with a streaming pipeline built on something like Kafka and Flink, allowing for durable, distributed state management and much more sophisticated detection algorithms. Finally, the system is purely for detection, not prevention. The logical next step is to integrate this pipeline with our API gateway or a firewall controller to automatically block IPs identified as malicious, closing the loop from detection to response.
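As a sketch of what that last step could look like, here is a hypothetical detection-to-response hook. The admin URL, path, and payload shape are all assumptions to be replaced with whatever your gateway or firewall controller actually exposes, and it would be called from the alert path only once an anomaly is confirmed.
// respond.go — a hypothetical sketch of closing the loop from detection to
// response. Nothing here reflects a real gateway API; adapt the URL, path,
// and JSON shape to your environment.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

type blockRequest struct {
    IP       string `json:"ip"`
    Reason   string `json:"reason"`
    Duration string `json:"duration"`
}

// blockIP asks a (hypothetical) gateway admin API to drop traffic from an IP
// for a limited time.
func blockIP(adminURL, ip, reason string) error {
    body, err := json.Marshal(blockRequest{IP: ip, Reason: reason, Duration: "15m"})
    if err != nil {
        return err
    }
    client := &http.Client{Timeout: 5 * time.Second}
    resp, err := client.Post(adminURL+"/v1/blocklist", "application/json", bytes.NewReader(body))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("gateway refused block for %s: %s", ip, resp.Status)
    }
    return nil
}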