Implementing eBPF-Based Kernel Observability in AWS Lambda Environments


The black-box nature of AWS Lambda is both its greatest strength and a significant operational challenge. We had a Python Lambda function, part of a data processing pipeline, that began exhibiting unpredictable latency spikes and occasional timeouts. Standard CloudWatch metrics showed increased duration but gave no clue as to the root cause. The function’s core responsibility was to fetch data from an external API, perform some transformations, and write the results to a NoSQL store. The code itself hadn’t changed, leading us to suspect environmental factors—network issues with the external API, DNS resolution delays, or even noisy neighbors on the underlying infrastructure. Instrumenting the Python code with more logging or custom metrics was our first thought, but that only tells you what the application thinks is happening, not what the system is actually doing. We needed to get deeper, to the level of system calls and network packets, without modifying and redeploying the application code under investigation.

Our initial concept was to inject an agent to capture this low-level telemetry. In a traditional EC2 or container environment, this would be straightforward. For Lambda, the only viable injection vector is a Lambda Layer. The agent itself needed to be lightweight and, critically, capable of kernel-level monitoring without requiring a privileged host environment. This immediately brought eBPF to mind. Inspired by how projects like Cilium use eBPF for incredibly efficient networking and observability in Kubernetes, we decided to build a prototype eBPF-based observability agent, packaged as a Lambda Layer, to capture syscalls from our Python function and exfiltrate the data to a durable, high-throughput sink.

The choice of a NoSQL database as the sink was deliberate. eBPF can generate a firehose of structured event data, and we needed a datastore that could handle high-velocity writes without requiring a predefined, rigid schema. Amazon DynamoDB was the obvious choice for its serverless nature, seamless IAM integration, and pay-per-request model, which aligns perfectly with the Lambda execution model. The plan was to have our eBPF agent capture events, batch them in memory, and flush them to a DynamoDB table at the end of each invocation.

This is the log of that build, detailing the technical hurdles, the code that made it work, and the architectural trade-offs we had to make.

The Core Problem: Getting Kernel-Level Visibility

The fundamental challenge is that a Lambda execution environment is not a general-purpose server. We don’t have SSH access, we can’t install kernel modules, and our permissions are tightly controlled. The primary question was whether the default Lambda security context even permits loading and attaching eBPF programs. Modern kernels gate this capability behind CAP_BPF (or, on kernels older than 5.8, the much broader CAP_SYS_ADMIN). After some experimentation with a custom Lambda runtime that let us poke around the environment, we found that while we couldn’t get host-level privileges, Firecracker’s microVM architecture provided enough capability within the guest kernel to attach kprobes to syscalls made by our own processes. This was the critical discovery that made the project viable. A common mistake here is to assume Lambda is just a container; it’s a microVM, and that distinction is key.
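
Our "poking around" boiled down to attempts like the following: try to load the most trivial eBPF program possible and see whether the guest kernel accepts it. This is a minimal sketch using the cilium/ebpf library, run from a custom runtime; the log wording and the choice of a socket-filter program are ours, not anything AWS documents.

package main

import (
	"log"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/asm"
)

func main() {
	// The simplest valid eBPF program: set R0 = 0 and return.
	prog, err := ebpf.NewProgram(&ebpf.ProgramSpec{
		Type: ebpf.SocketFilter,
		Instructions: asm.Instructions{
			asm.Mov.Imm(asm.R0, 0),
			asm.Return(),
		},
		License: "GPL",
	})
	if err != nil {
		// On a locked-down kernel this fails with EPERM.
		log.Fatalf("BPF load rejected (missing CAP_BPF/CAP_SYS_ADMIN?): %v", err)
	}
	defer prog.Close()
	log.Println("BPF program loaded; this environment permits eBPF")
}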

The architecture settled into three main components:

  1. The eBPF Program (C): A small C program compiled to eBPF bytecode that hooks into specific syscalls (connect, sendto, write).
  2. The User-space Loader (Go): A statically compiled Go application that loads the eBPF bytecode, attaches it to the kernel, reads events from the kernel via a perf buffer, and writes them to DynamoDB.
  3. The Lambda Layer & Wrapper: The mechanism for injecting and executing the Go loader alongside the target Python function.

graph TD
    subgraph AWS Lambda Invocation
        A[Lambda Invocation] --> B{Wrapper Script};
        B --> C[Go eBPF Loader];
        B --> D[Python Handler];
        C -- Attaches Probes --> E[Guest Kernel];
        D -- Makes Syscalls --> E;
        E -- Pushes Events --> F[eBPF Perf Buffer];
        F -- Data Read --> C;
    end
    C -- Batched Writes --> G((DynamoDB));

    style A fill:#900,stroke:#333,stroke-width:2px,color:#fff
    style G fill:#277f3d,stroke:#333,stroke-width:2px,color:#fff

The eBPF Program: The Kernel-Side Probe

The first piece of the puzzle is the eBPF program itself. It needs to be as efficient as possible, as it runs in kernel context for every targeted syscall. We focused on capturing network-related syscalls initially. The code is written in C and compiled using Clang/LLVM.

A critical aspect of production-grade eBPF code is the use of BPF_PROG_TYPE_KPROBE and a perf buffer (BPF_MAP_TYPE_PERF_EVENT_ARRAY) to send data to user space asynchronously. Logging directly from the kernel (e.g., via bpf_trace_printk) is a terrible idea in production; the perf buffer gives us high-performance, per-CPU ring buffers to stream events out instead.

Here is the source for lambda_probe.c:

// SPDX-License-Identifier: GPL-2.0
// We must use a GPL-compatible license for eBPF programs.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include <linux/sched.h>
#include <linux/socket.h>
#include <linux/in.h>

// Define a common event structure for data sent to user space.
// Field order and sizes must match the Go-side struct exactly; keeping
// fields naturally aligned avoids hidden compiler padding between them.
struct event {
    u64 ts_ns;         // Timestamp in nanoseconds
    u32 pid;           // Process ID
    u32 tid;           // Thread ID
    u32 uid;           // User ID
    int ret;           // Return value of the syscall
    char comm[TASK_COMM_LEN]; // Process name
    u16 event_type;    // Differentiates between syscalls
    u16 data_len;      // Length of additional data
    u8 data[128];      // Generic data payload (e.g., socket info)
};

// The perf buffer map. This is the channel to user space.
// The BPF_MAP_TYPE_PERF_EVENT_ARRAY is a special map type that allows sending
// data to a perf event buffer accessible from user space.
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(u32));
    __uint(max_entries, 1024);
} perf_events SEC(".maps");

// Helper function to fill common event data
static __always_inline void fill_common_data(struct event *evt) {
    evt->ts_ns = bpf_ktime_get_ns();
    u64 id = bpf_get_current_pid_tgid();
    evt->pid = id >> 32;
    evt->tid = (u32)id;
    evt->uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
    bpf_get_current_comm(&evt->comm, sizeof(evt->comm));
}


// Kprobe attached to the entry of the connect syscall
SEC("kprobe/__x64_sys_connect")
int BPF_KPROBE(trace_connect_entry, struct pt_regs *regs) {
    // On x86-64, syscall wrappers like __x64_sys_connect receive a single
    // pt_regs pointer holding the caller's registers, so the actual syscall
    // arguments must be read from there rather than taken positionally.
    struct sockaddr *uaddr = (struct sockaddr *)PT_REGS_PARM2_CORE(regs);

    // Only trace events from our target PID if specified (set via user space).
    // This is a placeholder; in a real project, the user-space agent would
    // update a BPF map with the target PID and we would check membership here.

    struct event evt = {};
    fill_common_data(&evt);
    evt.event_type = 1; // 1 = connect

    // Safely read the address family from the user-space sockaddr pointer.
    u16 family = 0;
    bpf_probe_read_user(&family, sizeof(family), &uaddr->sa_family);

    if (family == AF_INET) {
        struct sockaddr_in sa4 = {};
        bpf_probe_read_user(&sa4, sizeof(sa4), uaddr);
        // sa4 now lives on the BPF stack, so a plain copy is fine. Layout:
        // bytes 0-1 family, 2-3 port (network order), 4-7 IPv4 address —
        // the offsets the Go loader relies on when decoding.
        __builtin_memcpy(evt.data, &sa4, 8);
        evt.data_len = 8;
    } else if (family == AF_INET6) {
        // Handle IPv6 similarly if needed.
    }

    // Submit the event to the perf buffer for the user-space program to read.
    bpf_perf_event_output(ctx, &perf_events, BPF_F_CURRENT_CPU, &evt, sizeof(evt));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

A common pitfall is attempting to read large amounts of data from user-space pointers inside the eBPF program. bpf_probe_read_user is a helper for this, but it must be used judiciously. The kernel verifier will reject programs that try to read too much or have unbounded loops. Our program is simple: grab the process info and the destination IP/port from the sockaddr struct.
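
One more note before moving on to the loader: the PID-filter placeholder in the probe’s comments would, in practice, be a BPF hash map that the user-space agent populates before events start flowing. A sketch of the loader side, assuming a hypothetical BPF_MAP_TYPE_HASH named target_pids were added to lambda_probe.c and checked in the kprobe (coll is the loaded collection you’ll meet in the loader below):

// Hypothetical: lambda_probe.c declares a hash map "target_pids" and the
// kprobe drops events whose PID is not a key in it.
pidMap, ok := coll.Maps["target_pids"]
if !ok {
	log.Fatal("target_pids map not found in collection")
}
pid := uint32(os.Getpid()) // or the Python runtime's PID, discovered via /proc
var one uint8 = 1
if err := pidMap.Put(pid, one); err != nil {
	log.Fatalf("failed to set PID filter: %v", err)
}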

The User-Space Loader: The Go Agent

With the eBPF bytecode ready, we needed a user-space process to load and manage it. We chose Go for its excellent cilium/ebpf library, which abstracts away many of the complexities of interacting with the bpf syscall, and its ability to produce a single, statically linked binary, which is perfect for a Lambda Layer.

This Go application has several responsibilities:

  1. Load the eBPF object file: Parse the ELF file containing our compiled probe.
  2. Attach the probes: Find the kprobe/__x64_sys_connect program and attach it to the kernel.
  3. Listen to the Perf Buffer: Open the perf event buffer and create a reader to receive events from the kernel.
  4. Process and Batch Events: Unmarshal the binary data into Go structs, enrich it with Lambda-specific metadata, and hold it in a buffer.
  5. Exfiltrate to DynamoDB: When the function is about to exit, or the buffer is full, perform a BatchWriteItem call to DynamoDB.

Here’s a significant portion of the main.go file for the loader:

package main

import (
	"bytes"
	"context"
	"encoding/binary"
	"errors"
	"fmt"
	"log"
	"os"
	"os/signal"
	"strconv"
	"syscall"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/perf"
)

// Event mirrors the C struct in the eBPF program.
// The field alignment and size must match exactly.
type Event struct {
	TsNs      uint64
	Pid       uint32
	Tid       uint32
	Uid       uint32
	Ret       int32
	Comm      [16]byte
	EventType uint16
	DataLen   uint16
	Data      [128]byte
}

var (
	// Read from the execution environment at startup.
	awsRegion       = os.Getenv("AWS_REGION")
	dynamoTableName = os.Getenv("DYNAMO_TABLE_NAME")
	awsRequestID    = os.Getenv("AWS_LAMBDA_LOG_STREAM_NAME") // A trick to get a per-container unique ID
)

type TelemetryEvent struct {
	ExecutionID   string `json:"execution_id"`
	TimestampNano int64  `json:"timestamp_nano"`
	EventType     string `json:"event_type"`
	ProcessName   string `json:"process_name"`
	PID           uint32 `json:"pid"`
	TID           uint32 `json:"tid"`
	UID           uint32 `json:"uid"`
	Details       string `json:"details"`
}

var eventBuffer []TelemetryEvent
var ddbClient *dynamodb.Client

func main() {
	log.Println("Starting eBPF Telemetry Agent...")

	if dynamoTableName == "" {
		log.Fatal("DYNAMO_TABLE_NAME environment variable not set.")
	}

	// Load AWS config
	cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion(awsRegion))
	if err != nil {
		log.Fatalf("unable to load SDK config, %v", err)
	}
	ddbClient = dynamodb.NewFromConfig(cfg)
	eventBuffer = make([]TelemetryEvent, 0, 100)

	// Load the compiled eBPF object file.
	// The path assumes it's in the same directory inside the Lambda layer.
	spec, err := ebpf.LoadCollectionSpec("/opt/lambda_probe.o")
	if err != nil {
		log.Fatalf("failed to load eBPF spec: %v", err)
	}

	coll, err := ebpf.NewCollection(spec)
	if err != nil {
		log.Fatalf("failed to create eBPF collection: %v", err)
	}
	defer coll.Close()

	// Attach the kprobe.
	kp, err := link.Kprobe("__x64_sys_connect", coll.Programs["trace_connect_entry"], nil)
	if err != nil {
		log.Fatalf("failed to attach kprobe: %v", err)
	}
	defer kp.Close()

	log.Println("eBPF probe attached successfully.")

	// Open a perf event reader from user space.
	rd, err := perf.NewReader(coll.Maps["perf_events"], os.Getpagesize()*4)
	if err != nil {
		log.Fatalf("failed to create perf reader: %v", err)
	}
	defer rd.Close()

	// Set up a signal handler to flush data on termination.
	// This is critical for the Lambda environment.
	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		<-sigChan
		log.Println("Received termination signal, flushing remaining events...")
		flushEventsToDynamoDB()
		os.Exit(0)
	}()
	
	// Main event loop
	for {
		record, err := rd.Read()
		if err != nil {
			if errors.Is(err, perf.ErrClosed) {
				log.Println("Perf reader closed, exiting.")
				return
			}
			log.Printf("Error reading from perf buffer: %v", err)
			continue
		}

		if record.LostSamples > 0 {
			log.Printf("Lost %d samples in perf buffer", record.LostSamples)
		}

		var event Event
		if err := binary.Read(bytes.NewReader(record.RawSample), binary.LittleEndian, &event); err != nil {
			log.Printf("Error parsing event data: %v", err)
			continue
		}

		processEvent(event)
	}
}

// The agent's lifecycle is managed by the wrapper script that launches it.
// In a real implementation, you would register it via the Lambda Extensions
// API for a graceful shutdown (see the end of this post).

func processEvent(event Event) {
	te := TelemetryEvent{
		ExecutionID:   awsRequestID,
		TimestampNano: int64(event.TsNs),
		PID:           event.Pid,
		TID:           event.Tid,
		UID:           event.Uid,
		ProcessName:   string(bytes.Trim(event.Comm[:], "\x00")),
	}

	switch event.EventType {
	case 1: // connect
		te.EventType = "connect"
		// This parsing is simplified. In production, you'd handle IPv4/IPv6 properly.
		ip := fmt.Sprintf("%d.%d.%d.%d", event.Data[4], event.Data[5], event.Data[6], event.Data[7])
		port := binary.BigEndian.Uint16(event.Data[2:4])
		te.Details = fmt.Sprintf("dst=%s:%d", ip, port)
	default:
		te.EventType = "unknown"
	}

	eventBuffer = append(eventBuffer, te)
	
	// Flush when buffer is full.
	if len(eventBuffer) >= 25 { // DynamoDB BatchWriteItem has a limit of 25 items
		flushEventsToDynamoDB()
	}
}

func flushEventsToDynamoDB() {
	if len(eventBuffer) == 0 {
		return
	}

	writeRequests := make([]types.WriteRequest, 0, len(eventBuffer))
	for _, event := range eventBuffer {
		// Marshal the event into a map[string]types.AttributeValue.
		item, err := marshalEvent(event)
		if err != nil {
			log.Printf("Failed to marshal event for DynamoDB: %v", err)
			continue // skip this event; an indexed assignment would leave a hole
		}
		writeRequests = append(writeRequests, types.WriteRequest{
			PutRequest: &types.PutRequest{
				Item: item,
			},
		})
	}

	if len(writeRequests) == 0 {
		return
	}

	input := &dynamodb.BatchWriteItemInput{
		RequestItems: map[string][]types.WriteRequest{
			dynamoTableName: writeRequests,
		},
	}

	_, err := ddbClient.BatchWriteItem(context.TODO(), input)
	if err != nil {
		log.Printf("Failed to write batch to DynamoDB: %v", err)
		// In a real-world project, you'd implement retry logic for unprocessed items.
	} else {
		log.Printf("Successfully flushed %d events to DynamoDB.", len(eventBuffer))
	}
	
	// Clear the buffer
	eventBuffer = eventBuffer[:0]
}

// marshalEvent converts our struct to the DynamoDB attribute map format.
// The SDK's attributevalue package could do this generically; manual
// mapping keeps the example explicit.
func marshalEvent(event TelemetryEvent) (map[string]types.AttributeValue, error) {
	return map[string]types.AttributeValue{
		"ExecutionID":   &types.AttributeValueMemberS{Value: event.ExecutionID},
		"TimestampNano": &types.AttributeValueMemberN{Value: strconv.FormatInt(event.TimestampNano, 10)},
		"EventType":     &types.AttributeValueMemberS{Value: event.EventType},
		"ProcessName":   &types.AttributeValueMemberS{Value: event.ProcessName},
		"PID":           &types.AttributeValueMemberN{Value: fmt.Sprintf("%d", event.PID)},
		"Details":       &types.AttributeValueMemberS{Value: event.Details},
	}, nil
}

Error handling here is critical. The connection to the perf buffer can be lost, and writes to DynamoDB can fail or be throttled. A production-ready agent would need robust retry logic with exponential backoff for the BatchWriteItem call and handle unprocessed items returned by the API.
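
A minimal sketch of that retry loop, wrapping the same client and types used above (the helper name, attempt count, and backoff constants are our assumptions; only the UnprocessedItems contract comes from the DynamoDB API). It also needs the time package added to the imports.

func batchWriteWithRetry(ctx context.Context, db *dynamodb.Client,
	table string, reqs []types.WriteRequest) error {

	pending := map[string][]types.WriteRequest{table: reqs}
	backoff := 50 * time.Millisecond

	for attempt := 0; attempt < 5; attempt++ {
		out, err := db.BatchWriteItem(ctx, &dynamodb.BatchWriteItemInput{
			RequestItems: pending,
		})
		if err != nil {
			return err // SDK-level failure (auth, network, throttling beyond SDK retries)
		}
		// BatchWriteItem can partially succeed; retry only what was rejected.
		if len(out.UnprocessedItems) == 0 {
			return nil
		}
		pending = out.UnprocessedItems
		time.Sleep(backoff)
		backoff *= 2 // exponential backoff between attempts
	}
	return fmt.Errorf("giving up with unprocessed items for %d table(s)", len(pending))
}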

Packaging and Deployment

Getting the eBPF program and the Go loader into a Lambda function requires packaging them as a Lambda Layer. This involves a multi-step build process.

Problem: eBPF programs must be compiled against the kernel headers of the target machine. The Lambda execution environment runs on a specific version of Amazon Linux, but we don’t have the headers available in the build environment.
Solution: We spun up an EC2 instance using the same Amazon Linux 2 AMI that Lambda uses. On that instance, we installed kernel-devel, clang, and llvm, which provided the necessary headers to compile our lambda_probe.c into lambda_probe.o.

The Makefile for the build process looked something like this:

.PHONY: all clean layer

# Build variables
GO_BINARY_NAME := ebpf-agent
LAYER_DIR := build/layer
ZIP_FILE := build/ebpf-layer.zip

all: layer

# Compile the eBPF C code into an object file.
# __TARGET_ARCH_x86 is required by bpf_tracing.h for the PT_REGS macros.
lambda_probe.o: lambda_probe.c
	clang -O2 -g -target bpf -D__TARGET_ARCH_x86 -c $< -o $@

# Build the Go user-space loader
$(GO_BINARY_NAME): main.go
	GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o $(GO_BINARY_NAME) main.go

# Package everything into a zip file for the Lambda Layer
layer: lambda_probe.o $(GO_BINARY_NAME)
	mkdir -p $(LAYER_DIR)/bin
	cp $(GO_BINARY_NAME) $(LAYER_DIR)/bin/
	cp lambda_probe.o $(LAYER_DIR)/
	cd $(LAYER_DIR) && zip -r ../../$(ZIP_FILE) .

clean:
	rm -f lambda_probe.o $(GO_BINARY_NAME)
	rm -rf build

To use this layer, we need a wrapper script. With a custom runtime (provided.al2), Lambda executes a file named bootstrap from the function package or a layer; ours is a script that starts our agent and then hands control to the original handler. A common mistake is to run the agent as a blocking foreground process. The agent must be launched in the background.

wrapper.sh:

#!/bin/bash

# Set environment variables for the agent
# Default the table name if the function configuration didn't set it.
export DYNAMO_TABLE_NAME="${DYNAMO_TABLE_NAME:-LambdaTelemetry}"
# AWS_LAMBDA_LOG_STREAM_NAME provides a reasonably unique ID per container instance.
# Example: 2023/10/27/[$LATEST]abcdef123456

# Start our eBPF agent in the background.
# The agent's logs will go to CloudWatch along with the function's logs.
/opt/bin/ebpf-agent &

# exec replaces this shell with the runtime interface client; the agent
# keeps running as a separate process. awslambdaric must be vendored with
# the function code, and _HANDLER carries the configured handler name.
exec /usr/bin/python3 -m awslambdaric "$_HANDLER"

# When the handler exits, the Lambda environment will be frozen/terminated.
# The SIGTERM handler in our Go program is crucial for flushing final events.

Finally, the AWS SAM template.yaml ties it all together:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: eBPF Observability on Lambda Demo

Resources:
  LambdaTelemetryTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: LambdaTelemetry
      AttributeDefinitions:
        - AttributeName: ExecutionID
          AttributeType: S
        - AttributeName: TimestampNano
          AttributeType: N
      KeySchema:
        - AttributeName: ExecutionID
          KeyType: HASH
        - AttributeName: TimestampNano
          KeyType: RANGE
      BillingMode: PAY_PER_REQUEST

  EbpfObservabilityLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: ebpf-observability-layer
      Description: Layer with eBPF agent and probe
      ContentUri: build/ebpf-layer.zip
      CompatibleRuntimes:
        - provided.al2

  MonitoredFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: monitored-python-function
      Handler: app.handler # Passed through to our wrapper as _HANDLER
      Runtime: provided.al2 # Custom runtime so our bootstrap wrapper runs first
      CodeUri: ./src_py/
      Timeout: 15
      MemorySize: 256
      Layers:
        - !Ref EbpfObservabilityLayer
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref LambdaTelemetryTable
      Architectures:
        - x86_64
      Environment:
        Variables:
          DYNAMO_TABLE_NAME: !Ref LambdaTelemetryTable

The key configuration is setting the Runtime to provided.al2. With a custom runtime, the Lambda service executes the bootstrap file (our wrapper script) and passes the Handler value through to it as the _HANDLER environment variable. The wrapper starts our agent in the background and then launches the Python runtime interface client.
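
For orientation, this is how the pieces land at runtime. Layers are extracted under /opt, and the wrapper must be named bootstrap (we keep it in the function package, though a layer works too):

/opt/bin/ebpf-agent    # the Go loader, from the layer
/opt/lambda_probe.o    # compiled eBPF bytecode, from the layer
/var/task/bootstrap    # wrapper.sh, renamed, from CodeUri
/var/task/app.py       # the monitored Python handler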

The Result and Lingering Questions

After deploying this stack, invoking the MonitoredFunction resulted in records appearing in our LambdaTelemetry DynamoDB table. For a single invocation that made an HTTPS request, we could see the connect syscall with the destination IP and port, correctly attributed to the python3 process within that execution environment.

Querying DynamoDB for a specific ExecutionID gave us a complete, ordered trace of all the network connections that function made during its lifetime, something completely invisible through standard metrics. We had successfully built a proof-of-concept for zero-instrumentation, kernel-level observability for AWS Lambda.
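For reference, pulling that ordered trace back out is a single Query against the table’s key schema. A sketch using the same SDK packages as the agent (plus the aws helper package for pointer conversions); executionID is whatever ID you’re investigating:

func queryTrace(ctx context.Context, db *dynamodb.Client, executionID string) ([]map[string]types.AttributeValue, error) {
	// One partition (ExecutionID), ordered by the TimestampNano sort key.
	out, err := db.Query(ctx, &dynamodb.QueryInput{
		TableName:              aws.String("LambdaTelemetry"),
		KeyConditionExpression: aws.String("ExecutionID = :id"),
		ExpressionAttributeValues: map[string]types.AttributeValue{
			":id": &types.AttributeValueMemberS{Value: executionID},
		},
		ScanIndexForward: aws.Bool(true), // ascending = chronological
	})
	if err != nil {
		return nil, err
	}
	return out.Items, nil
}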

However, the solution is not without its limitations and trade-offs. The reliance on undocumented capabilities (CAP_BPF availability) means this could break with any future AWS Lambda runtime update. It adds non-trivial latency to cold starts, as the Go agent needs to initialize and attach the probes. Furthermore, the volume of data generated by tracing syscalls can be immense, leading to significant DynamoDB costs if not carefully filtered at the source within the eBPF program itself. The flush path is also best-effort: a sudden termination of the Lambda environment could lose the last batch of events held in the agent’s memory buffer. A more robust solution would use the Lambda Extensions API for a graceful shutdown sequence, ensuring the final batch of telemetry is always flushed.
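
To sketch what that Extensions API integration could look like: the agent registers itself, long-polls for lifecycle events, and flushes on SHUTDOWN. The endpoint paths and headers below are from the documented Extensions API; the extension name, the trimmed error handling, and the extra bytes/encoding/json/net/http imports are ours.

func runExtensionLoop() error {
	api := os.Getenv("AWS_LAMBDA_RUNTIME_API")
	base := fmt.Sprintf("http://%s/2020-01-01/extension", api)

	// Register for INVOKE and SHUTDOWN lifecycle events.
	body, _ := json.Marshal(map[string][]string{"events": {"INVOKE", "SHUTDOWN"}})
	req, _ := http.NewRequest(http.MethodPost, base+"/register", bytes.NewReader(body))
	req.Header.Set("Lambda-Extension-Name", "ebpf-agent")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	id := resp.Header.Get("Lambda-Extension-Identifier")
	resp.Body.Close()

	for {
		// Long-poll for the next lifecycle event.
		next, _ := http.NewRequest(http.MethodGet, base+"/event/next", nil)
		next.Header.Set("Lambda-Extension-Identifier", id)
		resp, err := http.DefaultClient.Do(next)
		if err != nil {
			return err
		}
		var evt struct {
			EventType string `json:"eventType"`
		}
		json.NewDecoder(resp.Body).Decode(&evt)
		resp.Body.Close()

		if evt.EventType == "SHUTDOWN" {
			flushEventsToDynamoDB() // the SHUTDOWN phase gives a short window to clean up
			return nil
		}
	}
}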

