The existing Python-based simulation service was grinding to a halt. Our core product, a Web API for running N-body particle simulations, relied heavily on NumPy and SciPy. While those libraries are excellent for prototyping and general analysis, the primary simulation endpoint, which calculates gravitational interactions in a tight loop, was becoming a severe performance bottleneck under increasing concurrent load. Profiling confirmed our suspicions: over 95% of CPU time on worker nodes was spent inside this single Python function. Initial optimization attempts (vectorization with NumPy and just-in-time compilation with Numba) yielded only marginal gains, not the order-of-magnitude improvement required. The fundamental problem was the overhead of the Python interpreter and the limitations of automatic memory management in a CPU-bound, hot-path scenario. A full rewrite of the main service was out of the question due to the sheer volume of business logic and complex scientific analysis routines deeply integrated with the SciPy ecosystem. The only viable path forward was surgical optimization: extracting the computational core into a dedicated, high-performance microservice.
Our technology selection process was brief but intense. Rust was a strong contender due to its safety guarantees and mature ecosystem. However, the learning curve for the team and the complexity of its borrow checker were concerns for a rapid, focused implementation. We settled on Zig. Its promise of simplicity, C-like directness, manual memory management for predictable performance, and, most critically, its powerful compile-time execution (comptime) capabilities presented a unique opportunity. We hypothesized that comptime could solve a secondary problem: the lack of a mature Object-Relational Mapper (ORM) in the Zig ecosystem. Instead of writing raw SQL strings, we could use comptime to generate type-safe query builders directly from our data structures, creating a minimal, purpose-built ORM layer. This would give us performance without entirely sacrificing developer ergonomics. The plan was solidified: build a new Zig microservice that handles only the simulation endpoint. It would fetch initial conditions from our shared PostgreSQL database, run the simulation, and call back to the legacy SciPy service for a final, complex analysis step that was not performance-critical.
The Foundation: A Production-Ready Zig Web Service
Before tackling the simulation logic or the database layer, the first step was to scaffold a robust HTTP server in Zig. The ecosystem, while young, has capable libraries. We chose zap, a lightweight and fast HTTP server. The initial setup requires linking the necessary libraries and defining the application entry point in build.zig.
// build.zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Executable for our microservice
    const exe = b.addExecutable(.{
        .name = "simulation-service",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
    });

    // Add zap as a dependency
    const zap_dep = b.dependency("zap", .{
        .target = target,
        .optimize = optimize,
    });
    exe.addModule("zap", zap_dep.module("zap"));

    // We need libc plus the system's libpq for PostgreSQL access
    exe.linkLibC();
    exe.linkSystemLibrary("pq");

    b.installArtifact(exe);

    const run_cmd = b.addRunArtifact(exe);
    run_cmd.step.dependOn(b.getInstallStep());
    if (b.args) |args| {
        run_cmd.addArgs(args);
    }

    const run_step = b.step("run", "Run the application");
    run_step.dependOn(&run_cmd.step);
}
This build script defines our executable, adds zap as a module, and, crucially, links against the system's libpq C library. This is the foundation for our PostgreSQL communication.
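For b.dependency("zap") to resolve, the project also needs a build.zig.zon manifest that declares the dependency. The sketch below shows the shape of that file; the URL and hash are placeholders for whichever zap release you actually pin, not real values.

// build.zig.zon (sketch; fill in a real zap release and the hash zig build reports)
.{
    .name = "simulation-service",
    .version = "0.1.0",
    .dependencies = .{
        .zap = .{
            .url = "https://github.com/zigzap/zap/archive/refs/tags/<version>.tar.gz",
            .hash = "<package hash reported by zig build>",
        },
    },
}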
The main application entry point sets up the server, defines routes, and manages resources like memory allocators and a (yet to be implemented) database connection pool. In a real-world project, configuration shouldn’t be hardcoded. Here, we read essential parameters like the database connection string from environment variables.
// src/main.zig
const std = @import("std");
const zap = @import("zap");
const api = @import("api.zig");
const db = @import("db.zig");

pub fn main() !void {
    // Use a general-purpose allocator for the application lifetime.
    // For per-request allocations, we'll use an arena allocator.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Load configuration from the environment.
    const port = std.fmt.parseInt(u16, std.os.getenv("PORT") orelse "8080", 10) catch 8080;
    const db_conn_str = std.os.getenv("DATABASE_URL") orelse
        return error.MissingDatabaseURL;

    // In a production system, you'd initialize a proper connection pool.
    // For this example, we'll establish a new connection per request for simplicity,
    // which is NOT recommended for production due to high overhead.
    var db_manager = try db.DBManager.init(allocator, db_conn_str);
    defer db_manager.deinit();

    var listener = try zap.listen(.{ .port = port });
    std.log.info("Server listening on port {d}", .{port});

    // Our main application context to pass to handlers.
    var app_context = api.AppContext{
        .allocator = allocator,
        .db_manager = &db_manager,
    };

    while (true) {
        var conn = try listener.accept();
        // A real server would use async/await and manage multiple connections.
        // Zap's default `accept` is synchronous. For high concurrency, a more
        // complex event loop or threading model is needed.
        _ = try conn.serve(api.router, &app_context, .{ .allocator = allocator });
    }
}
A Comptime-Powered Micro-ORM
The most significant technical hurdle was database interaction. Writing raw SQL strings in Zig is error-prone and lacks type safety. We wanted a way to map our Zig structs to database tables without pulling in a heavy dependency or writing excessive boilerplate. This is where comptime became our core tool.
First, we define our domain model as a plain Zig struct.
// src/models.zig
const std = @import("std");

pub const Particle = struct {
    id: i64,
    pos_x: f64,
    pos_y: f64,
    pos_z: f64,
    vel_x: f64,
    vel_y: f64,
    vel_z: f64,
    mass: f64,
};
The goal is to generate SQL queries like INSERT, SELECT, and UPDATE at compile time based on this struct definition. We created a utility file, query_builder.zig, to house this logic.
// src/query_builder.zig
const std = @import("std");

pub fn createInsertQuery(comptime T: type) []const u8 {
    const schema = @typeInfo(T).Struct;
    // These must be comptime vars so the string can be built with ++ at compile time.
    comptime var query: []const u8 = "INSERT INTO " ++ @typeName(T) ++ " (";
    comptime var values: []const u8 = ") VALUES (";
    inline for (schema.fields, 0..) |field, i| {
        if (i > 0) {
            query = query ++ ", ";
            values = values ++ ", ";
        }
        query = query ++ field.name;
        values = values ++ "$" ++ std.fmt.comptimePrint("{}", .{i + 1});
    }
    return query ++ values ++ ") RETURNING id;";
}

pub fn createSelectByIdQuery(comptime T: type) []const u8 {
    return "SELECT * FROM " ++ @typeName(T) ++ " WHERE id = $1;";
}
The magic here is comptime T: type, which makes the function operate on a type known at compile time. @typeInfo(T).Struct lets us reflect on the struct's fields, their names, and their types. The inline for loop iterates over the fields at compile time, building the SQL string piece by piece. The result is a []const u8 whose contents are fully known at compile time and embedded directly into the final binary; there is zero runtime overhead for string formatting or reflection.
// Usage example
const qb = @import("query_builder.zig");
const models = @import("models.zig");

const insert_sql = qb.createInsertQuery(models.Particle);
// At compile time, insert_sql becomes:
// "INSERT INTO models.Particle (id, pos_x, ..., mass) VALUES ($1, $2, ..., $8) RETURNING id;"
This technique provides a powerful, type-aware foundation for our data layer, forming a "micro-ORM" perfectly tailored to our needs. The actual database communication still requires the libpq C API, which we wrap in a Zig-friendly way.
// src/db.zig (simplified libpq wrapper)
const std = @import("std");
const c = @cImport({
    @cInclude("libpq-fe.h");
});
const models = @import("models.zig");
const qb = @import("query_builder.zig");

// ... (DBManager struct and init/deinit functions)

pub fn insertParticle(self: *DBManager, particle: *models.Particle) !i64 {
    const allocator = self.allocator;
    const conn = self.getConnection(); // Simplified: should come from a pool

    // This SQL string is generated at compile time
    const sql = comptime qb.createInsertQuery(models.Particle);

    // Prepare parameters for libpq
    const n_params = @typeInfo(models.Particle).Struct.fields.len;
    var param_values: [n_params][*c]const u8 = undefined;
    var param_lengths: [n_params]c_int = undefined;
    var param_formats: [n_params]c_int = undefined;

    // ... code to serialize particle fields into binary format for libpq ...
    // This part is verbose, involving byte swapping (ntoh) and memory management.
    // It's a key area where a mature library would provide value.

    const res = c.PQexecParams(conn, sql.ptr, @intCast(n_params), null, &param_values, &param_lengths, &param_formats, 1);
    defer c.PQclear(res);
    if (c.PQresultStatus(res) != c.PGRES_TUPLES_OK) {
        const error_msg = c.PQresultErrorMessage(res);
        std.log.err("DB insert failed: {s}", .{std.mem.sliceTo(error_msg, 0)});
        return error.DBInsertFailed;
    }
    const returned_id_str = c.PQgetvalue(res, 0, 0);
    return std.fmt.parseInt(i64, std.mem.sliceTo(returned_id_str, 0), 10);
}
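The DBManager plumbing is elided above. For completeness, here is a rough sketch of what its init and deinit might look like, assuming one connection opened at startup (a real pool would manage several of these); it is a thin wrapper over PQconnectdb and PQfinish.

// src/db.zig (sketch of the elided DBManager; single long-lived connection)
pub const DBManager = struct {
    allocator: std.mem.Allocator,
    conn: ?*c.PGconn,

    pub fn init(allocator: std.mem.Allocator, conn_str: []const u8) !DBManager {
        // libpq expects a null-terminated connection string.
        const conninfo = try allocator.dupeZ(u8, conn_str);
        defer allocator.free(conninfo);
        const conn = c.PQconnectdb(conninfo.ptr);
        if (c.PQstatus(conn) != c.CONNECTION_OK) {
            c.PQfinish(conn);
            return error.DBConnectionFailed;
        }
        return .{ .allocator = allocator, .conn = conn };
    }

    pub fn deinit(self: *DBManager) void {
        c.PQfinish(self.conn);
    }

    // Simplified: a pool would hand out one of several connections here.
    pub fn getConnection(self: *DBManager) ?*c.PGconn {
        return self.conn;
    }
};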
The pitfall here is the verbosity of interacting with C APIs like libpq. Managing parameter buffers, converting types, and handling pointers requires careful attention to detail. Our comptime query builder solves the SQL generation problem, but the data binding remains a manual and complex task.
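To make that elided serialization step concrete, here is a sketch of binding a single f64 field in libpq's binary format: PostgreSQL expects float8 parameters in network (big-endian) byte order. The helper name and the parameter index below are illustrative, not part of the actual db.zig.

// Sketch: binding one f64 (e.g. mass) as a binary libpq parameter.
fn bindF64(value: f64, out: *[8]u8) void {
    const bits: u64 = @bitCast(value); // reinterpret the f64 as raw bits
    out.* = std.mem.toBytes(std.mem.nativeToBig(u64, bits)); // network byte order
}

// Inside insertParticle (illustrative; index 7 assumes mass is the 8th field):
// var mass_buf: [8]u8 = undefined;
// bindF64(particle.mass, &mass_buf);
// param_values[7] = &mass_buf; // pointer to the 8 serialized bytes
// param_lengths[7] = 8;        // byte length of the value
// param_formats[7] = 1;        // 1 = binary format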
The Computational Core: From Python to Zig
With the database layer in place, we could focus on the simulation logic itself. The original Python code looked something like this:
# The original Python/NumPy bottleneck
import numpy as np

G = 6.67430e-11  # Gravitational constant

def update_velocities(particles, dt):
    # particles is an (N, 7) array: columns 0-2 position, 3-5 velocity, 6 mass
    num_particles = len(particles)
    accel = np.zeros((num_particles, 3))
    for i in range(num_particles):
        for j in range(num_particles):
            if i == j:
                continue
            # Vector from particle i to j
            r_vec = particles[j, :3] - particles[i, :3]
            dist_sq = np.sum(r_vec**2)
            # Avoid division by zero, softening factor
            dist_sq_softened = dist_sq + 1e-9
            # Force calculation
            force_mag = G * particles[i, 6] * particles[j, 6] / dist_sq_softened
            force_vec = force_mag * r_vec / np.sqrt(dist_sq_softened)
            # Update acceleration
            accel[i, :] += force_vec / particles[i, 6]
    # Update velocities based on acceleration
    particles[:, 3:6] += accel * dt
The O(n^2) complexity is inherent to the problem, but the overhead of Python’s loops and NumPy’s function calls for small operations adds up. The Zig implementation is structurally similar but operates directly on memory with no intermediary abstractions.
// src/simulation.zig
const std = @import("std");
const models = @import("models.zig");

const G = 6.67430e-11;
const SOFTENING_FACTOR = 1e-9;

// We use a Struct of Arrays (SoA) layout for better cache performance
// during the simulation loop. The Particle struct from the database is
// an Array of Structs (AoS), so we need to convert.
const ParticleSoA = struct {
    pos_x: []f64,
    pos_y: []f64,
    pos_z: []f64,
    vel_x: []f64,
    vel_y: []f64,
    vel_z: []f64,
    mass: []f64,
    acc_x: []f64,
    acc_y: []f64,
    acc_z: []f64,

    // ... init and deinit methods for memory management ...
};

pub fn runSimulationStep(allocator: std.mem.Allocator, particles_aos: []models.Particle, dt: f64) !void {
    // 1. Convert AoS from database to SoA for computation
    var particles_soa = try ParticleSoA.init(allocator, particles_aos.len);
    defer particles_soa.deinit(allocator);
    // ... code to copy data from AoS to SoA ...

    const num_particles = particles_aos.len;

    // 2. Calculate accelerations (the N^2 loop)
    for (0..num_particles) |i| {
        particles_soa.acc_x[i] = 0.0;
        particles_soa.acc_y[i] = 0.0;
        particles_soa.acc_z[i] = 0.0;
        for (0..num_particles) |j| {
            if (i == j) continue;
            const r_x = particles_soa.pos_x[j] - particles_soa.pos_x[i];
            const r_y = particles_soa.pos_y[j] - particles_soa.pos_y[i];
            const r_z = particles_soa.pos_z[j] - particles_soa.pos_z[i];
            const dist_sq = (r_x * r_x) + (r_y * r_y) + (r_z * r_z);
            const dist_sq_softened = dist_sq + SOFTENING_FACTOR;
            const inv_dist = 1.0 / @sqrt(dist_sq_softened);
            const inv_dist_cubed = inv_dist * inv_dist * inv_dist;
            const force_mag_base = G * particles_soa.mass[j] * inv_dist_cubed;
            // Acceleration is F/m_i, so mass[i] cancels out
            particles_soa.acc_x[i] += force_mag_base * r_x;
            particles_soa.acc_y[i] += force_mag_base * r_y;
            particles_soa.acc_z[i] += force_mag_base * r_z;
        }
    }

    // 3. Update velocities and positions (Euler integration)
    for (0..num_particles) |i| {
        particles_soa.vel_x[i] += particles_soa.acc_x[i] * dt;
        particles_soa.vel_y[i] += particles_soa.acc_y[i] * dt;
        particles_soa.vel_z[i] += particles_soa.acc_z[i] * dt;
        particles_soa.pos_x[i] += particles_soa.vel_x[i] * dt;
        particles_soa.pos_y[i] += particles_soa.vel_y[i] * dt;
        particles_soa.pos_z[i] += particles_soa.vel_z[i] * dt;
    }

    // 4. Convert SoA back to AoS to update the database
    // ... code to copy data from SoA back to particles_aos ...
}
A critical optimization here is the switch to a Struct of Arrays (SoA) memory layout. When calculating forces, the inner loop repeatedly accesses the positions and masses of all other particles. In an SoA layout, all pos_x values are contiguous in memory, then all pos_y values, and so on. This is extremely CPU cache-friendly, as fetching pos_x[j] will likely bring pos_x[j+1], pos_x[j+2], and so on into the cache. In contrast, the default Array of Structs (AoS) layout fetched from the database interleaves x, y, z, mass, ... for each particle, leading to frequent cache misses in the inner loop. The cost of converting between AoS and SoA at the beginning and end of the function is far outweighed by the performance gain within the O(n^2) loop.
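The AoS-to-SoA conversion elided in runSimulationStep is a plain element-wise copy. A minimal sketch, assuming ParticleSoA.init has already allocated slices of length particles_aos.len; the copy back at the end mirrors it, writing the vel_* and pos_* values into particles_aos.

// Sketch of the AoS -> SoA copy at the top of runSimulationStep.
fn copyToSoA(particles_aos: []const models.Particle, particles_soa: *ParticleSoA) void {
    for (particles_aos, 0..) |p, i| {
        particles_soa.pos_x[i] = p.pos_x;
        particles_soa.pos_y[i] = p.pos_y;
        particles_soa.pos_z[i] = p.pos_z;
        particles_soa.vel_x[i] = p.vel_x;
        particles_soa.vel_y[i] = p.vel_y;
        particles_soa.vel_z[i] = p.vel_z;
        particles_soa.mass[i] = p.mass;
    }
}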
Tying It All Together: The API Endpoint
The final piece is the API handler that orchestrates these components. It must parse the incoming request, fetch data, run the simulation, and potentially call the legacy SciPy service. We also need robust error handling.
// src/api.zig
const std = @import("std");
const zap = @import("zap");
const db = @import("db.zig");
const sim = @import("simulation.zig");
const models = @import("models.zig");

// ... AppContext struct definition ...

pub const router = zap.router(.{
    .routes = .{
        .post = &.{
            .{ .path = "/simulate", .handler = handleSimulate },
        },
    },
});

const SimulateRequest = struct {
    simulation_id: []const u8,
    time_step: f64,
    num_steps: i32,
};

fn handleSimulate(conn: *zap.Connection, ctx: *AppContext) !void {
    var arena = std.heap.ArenaAllocator.init(ctx.allocator);
    defer arena.deinit();
    const req_allocator = arena.allocator();

    // 1. Parse incoming JSON request (the arena frees everything at once)
    const body = try conn.readAll(req_allocator, 1 * 1024 * 1024); // 1MB limit
    const request_data = try std.json.parseFromSliceLeaky(SimulateRequest, req_allocator, body, .{});

    // 2. Fetch initial particle state from database
    // In a real app, this would use the simulation_id to fetch the correct set.
    var particles = try ctx.db_manager.fetchAllParticles(req_allocator);
    defer req_allocator.free(particles);

    // 3. Run the simulation loop
    var i: i32 = 0;
    while (i < request_data.num_steps) : (i += 1) {
        try sim.runSimulationStep(req_allocator, particles, request_data.time_step);
    }

    // 4. Call out to legacy Python service for post-processing
    // This is a major architectural decision. We keep complex, non-performance-critical
    // logic in the maintainable Python/SciPy service.
    const analysis_result = try callSciPyAnalysisService(req_allocator, particles);
    defer req_allocator.free(analysis_result);

    // 5. Persist final state
    try ctx.db_manager.updateAllParticles(particles);

    // 6. Send response
    try conn.writeJson(analysis_result, .{});
}

// A simple unit test for the simulation logic
test "particle attraction" {
    const allocator = std.testing.allocator;
    var particles = try allocator.alloc(models.Particle, 2);
    defer allocator.free(particles);
    // Two particles at rest on the X axis, 10 units apart.
    particles[0] = .{ .id = 1, .pos_x = 0, .pos_y = 0, .pos_z = 0, .vel_x = 0, .vel_y = 0, .vel_z = 0, .mass = 100 };
    particles[1] = .{ .id = 2, .pos_x = 10, .pos_y = 0, .pos_z = 0, .vel_x = 0, .vel_y = 0, .vel_z = 0, .mass = 100 };
    try sim.runSimulationStep(allocator, particles, 1.0);
    // Assert that particle 0 moved in the +X direction towards particle 1
    try std.testing.expect(particles[0].pos_x > 0);
    // Assert that particle 1 moved in the -X direction towards particle 0
    try std.testing.expect(particles[1].pos_x < 10);
}
The final architecture is a polyglot system where each component plays to its strengths. The client, likely a web application bundled with a tool like Rollup, interacts with a main Python API. This API orchestrates the overall workflow, but for the heavy lifting, it delegates to our specialized Zig microservice.
sequenceDiagram
    participant Client as Web Client (Bundled by Rollup)
    participant PythonAPI as Main Python/SciPy API
    participant ZigAPI as Zig Simulation Microservice
    participant PostgreSQL
    Client->>PythonAPI: POST /api/v1/simulations (params)
    PythonAPI->>PostgreSQL: Create simulation record, get ID
    PythonAPI->>ZigAPI: POST /simulate {"simulation_id": "xyz", ...}
    ZigAPI->>PostgreSQL: Fetch initial particle state for "xyz"
    PostgreSQL-->>ZigAPI: Particle Data (AoS)
    Note right of ZigAPI: Converts AoS to SoA,<br/>Runs N^2 simulation loop
    ZigAPI->>PostgreSQL: Update particle states in DB
    PostgreSQL-->>ZigAPI: OK
    ZigAPI-->>PythonAPI: {"status": "completed"}
    PythonAPI->>PythonAPI: Run complex SciPy analysis on results from DB
    PythonAPI-->>Client: {"simulation_id": "xyz", "analysis_url": "/results/xyz"}
This approach successfully isolated the performance-critical code into a component that could be optimized aggressively, without requiring a rewrite of the entire system. The benchmark results were compelling: for a simulation of 1000 particles, the Zig microservice sustained over 20 times the throughput of the original Python endpoint on the same hardware.
The current implementation, while effective, has clear limitations. The interaction with the legacy SciPy service is a synchronous HTTP call, making it a potential point of failure and a performance bottleneck. A more resilient architecture would use a message queue to decouple the services. Our comptime query builder is primitive; it lacks support for joins, transactions, or schema migrations, making it unsuitable for applications with complex data models. Finally, manual memory management in Zig, while providing ultimate control, places a significant burden on the developer to prevent memory leaks and use-after-free errors, especially in a long-running server process. An arena allocator per request helps, but a more sophisticated memory strategy would be needed for a truly hardened production service.