Solving Build-Time N+1 Bottlenecks in a Rust SSG with a Tonic-Based Batching gRPC Service


The build pipeline for a large static site generator began exhibiting severe performance degradation. The core issue was a classic N+1 query problem, but in a build-time context. The generator, written in Rust, was responsible for building over 20,000 product documentation pages. Each page’s content was sourced from a central content microservice via gRPC. The initial, naive implementation within the build loop looked something like this:

// THIS IS THE PROBLEMATIC CODE
async fn build_all_pages(client: &mut ContentClient, page_ids: &[String]) {
    let mut build_handles = Vec::new();
    for id in page_ids {
        // A new clone of the client is cheap, but each call is a distinct network roundtrip.
        let mut client_clone = client.clone();
        let id_clone = id.clone();
        
        let handle = tokio::spawn(async move {
            // Each call is an independent gRPC request.
            let request = tonic::Request::new(GetContentRequest { id: id_clone });
            match client_clone.get_content(request).await {
                Ok(response) => {
                    // Render page...
                }
                Err(e) => {
                    // Log error...
                }
            }
        });
        build_handles.push(handle);
    }
    futures::future::join_all(build_handles).await;
}

With a local network latency of even 10ms per gRPC call, fetching 20,000 pages sequentially would take over three minutes before any server-side processing. Running the fetches concurrently with tokio::spawn helps, but it floods the service with 20,000 in-flight requests at once; even multiplexed over a shared HTTP/2 channel, that deluge saturates the server's stream limits and worker capacity, and throughput collapses. The build time ballooned to over 30 minutes in the CI/CD environment, an unsustainable operational cost for what should be a straightforward batch process.

The fundamental flaw is treating a batch operation—the static site build—as a series of transactional, online requests. The solution required shifting the architecture to embrace the batch nature of the workload. The concept was to implement a client-side request batching layer, inspired by patterns like Facebook’s DataLoader, but tailored for a Rust build environment using Tonic and Tokio’s concurrency primitives. The goal was to make the page generation logic unaware of the underlying batching, preserving a clean get_content(id) interface while ensuring that network I/O was coalesced into a small number of efficient, batched gRPC calls.

Defining the Batching-Aware gRPC Service

The first step was to modify the Protobuf contract. The existing service only had a unary RPC for fetching single content items. A new RPC, capable of handling a batch of requests, was essential.

Here is the content.proto definition:

syntax = "proto3";

package content;

import "google/protobuf/timestamp.proto";

// The service definition for fetching content.
service ContentService {
  // Fetches a single piece of content. (The original RPC)
  rpc GetContent(GetContentRequest) returns (Content);

  // Fetches a batch of content items in a single call.
  rpc GetContentBatch(GetContentBatchRequest) returns (GetContentBatchResponse);
}

// Request for a single piece of content.
message GetContentRequest {
  string id = 1;
}

// The content entity itself.
message Content {
  string id = 1;
  string title = 2;
  string body_markdown = 3;
  google.protobuf.Timestamp last_updated = 4;
}

// Request for a batch of content.
message GetContentBatchRequest {
  repeated string ids = 1;
}

// Response for a batch request.
// Using a map is crucial here. It allows the client to efficiently
// look up results by the original ID, which simplifies the client-side
// logic significantly compared to returning a repeated list that might not
// preserve order or might have missing items.
message GetContentBatchResponse {
  map<string, Content> contents = 1;
}

Using a map<string, Content> for the response is a deliberate design choice. It directly addresses the problem of correlating responses with requests. The server can process IDs in any order, and some IDs may not yield a result. A map makes the client-side fan-out logic trivial and robust against missing data.
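To make that fan-out concrete, here is a minimal sketch of the correlation step the map enables, assuming the generated Content type is in scope; it mirrors the result-distribution logic the DataLoader performs later in this article:

use std::collections::HashMap;

use tokio::sync::oneshot;

// Each waiter is the original request ID paired with a channel back to the task
// that asked for it. `remove` hands back ownership of the Content, and a missing
// ID simply yields `None` instead of forcing positional bookkeeping against a
// repeated field.
fn fan_out_results(
    waiters: Vec<(String, oneshot::Sender<Option<Content>>)>,
    mut contents: HashMap<String, Content>,
) {
    for (id, tx) in waiters {
        let _ = tx.send(contents.remove(&id));
    }
}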

Server-Side Implementation in Tonic

The server implementation must efficiently handle the GetContentBatch RPC. In a real-world project, this would involve a parallelized database query or calls to other downstream services. For this implementation, we’ll simulate the work with a tokio::time::sleep to represent I/O and use a DashMap for the data store to allow for concurrent reads.

The project structure, with a small main.rs entry point (not shown) that starts the server and then runs the build simulation, looks like this:

ssg-batching-example/
├── Cargo.toml
├── build.rs
├── proto/
│   └── content.proto
└── src/
    ├── main.rs
    ├── server.rs
    └── client.rs

The Cargo.toml specifies the dependencies for both server and client:

[package]
name = "ssg-batching-example"
version = "0.1.0"
edition = "2021"

[dependencies]
tonic = "0.10"
prost = "0.12"
prost-types = "0.12"
tokio = { version = "1", features = ["full"] }
futures = "0.3"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
dashmap = "5.5"
uuid = { version = "1.4", features = ["v4"] }

[build-dependencies]
tonic-build = "0.10"

The build.rs script handles Protobuf code generation:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::compile_protos("proto/content.proto")?;
    Ok(())
}

The server logic in src/server.rs focuses on the batch endpoint.

use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;

use dashmap::DashMap;
use tonic::{transport::Server, Request, Response, Status};
use tracing::{info, instrument};

// Import generated types
pub mod content {
    tonic::include_proto!("content");
}
use content::{
    content_service_server::{ContentService, ContentServiceServer},
    Content, GetContentBatchRequest, GetContentBatchResponse, GetContentRequest,
};

// A shared, thread-safe data store.
type Db = Arc<DashMap<String, Content>>;

#[derive(Debug, Default)]
pub struct MyContentService {
    db: Db,
}

impl MyContentService {
    fn new(db: Db) -> Self {
        Self { db }
    }
}

#[tonic::async_trait]
impl ContentService for MyContentService {
    #[instrument(skip(self))]
    async fn get_content(
        &self,
        request: Request<GetContentRequest>,
    ) -> Result<Response<Content>, Status> {
        let id = request.into_inner().id;
        info!("Received single request for ID: {}", id);

        // Simulate database latency.
        tokio::time::sleep(Duration::from_millis(50)).await;

        match self.db.get(&id) {
            Some(content) => Ok(Response::new(content.clone())),
            None => Err(Status::not_found(format!("Content with id '{}' not found", id))),
        }
    }

    #[instrument(skip(self, request))]
    async fn get_content_batch(
        &self,
        request: Request<GetContentBatchRequest>,
    ) -> Result<Response<GetContentBatchResponse>, Status> {
        let ids = request.into_inner().ids;
        let num_ids = ids.len();
        info!("Received batch request for {} IDs", num_ids);

        // Simulate a more efficient batch database query.
        // The latency is not num_ids * 50ms, but something sub-linear.
        let base_latency = 100;
        let per_item_latency = 10;
        let total_latency = base_latency + (num_ids as u64 * per_item_latency);
        tokio::time::sleep(Duration::from_millis(total_latency)).await;

        let mut contents_map = HashMap::new();
        for id in ids {
            if let Some(content) = self.db.get(&id) {
                contents_map.insert(id, content.clone());
            }
            // In a real system, we'd log IDs that were not found.
            // The key is we don't return an error for the whole batch if some items are missing.
        }

        info!("Responding to batch request with {} found items", contents_map.len());

        let response = GetContentBatchResponse {
            contents: contents_map,
        };
        Ok(Response::new(response))
    }
}

pub async fn start_server() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "[::1]:50051".parse()?;

    // Populate a mock database with some data
    let db = Arc::new(DashMap::new());
    for i in 0..20000 {
        let id = format!("doc-{}", i);
        db.insert(
            id.clone(),
            Content {
                id,
                title: format!("Document Title {}", i),
                body_markdown: format!("# Content for Document {}\n\nThis is the body.", i),
                last_updated: None,
            },
        );
    }

    let service = MyContentService::new(db);
    let svc = ContentServiceServer::new(service);

    println!("ContentService server listening on {}", addr);

    Server::builder().add_service(svc).serve(addr).await?;

    Ok(())
}

This server correctly implements the batch endpoint, simulating a more efficient data access pattern than just iterating and calling the single-fetch logic.
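In a real deployment the simulated sleep would be replaced by an actual batched lookup. As a rough sketch of the "parallelized calls to downstream services" case, the handler could fan the IDs out with bounded concurrency; fetch_one here is a hypothetical single-item lookup (a database row, a cache, or another service), the generated Content type is assumed to be in scope, and only the orchestration pattern is the point:

use std::collections::HashMap;

use futures::stream::{self, StreamExt};

// Hypothetical single-item lookup against whatever backs the content store.
async fn fetch_one(_id: &str) -> Option<Content> {
    // In the example server this is where the per-item work would happen.
    None
}

// Resolve a whole batch with at most 32 lookups in flight at a time.
async fn fetch_batch(ids: Vec<String>) -> HashMap<String, Content> {
    stream::iter(ids)
        .map(|id| async move {
            let content = fetch_one(&id).await;
            (id, content)
        })
        .buffer_unordered(32)
        .filter_map(|(id, content)| async move { content.map(|c| (id, c)) })
        .collect()
        .await
}

A SQL-backed store would more likely collapse the whole batch into a single query (for example a WHERE id = ANY(...) lookup), which is exactly the kind of efficiency the batch RPC is meant to unlock.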

The Client-Side DataLoader Implementation

This is the most complex and critical piece of the solution. The DataLoader struct abstracts away the batching and scheduling logic. It exposes a simple load method that feels like a direct data fetch call, but under the hood, it’s a participant in a highly optimized, asynchronous batching system.

The core components are:

  1. Job struct: A small structure to bundle a request ID with a tokio::sync::oneshot::Sender, which acts as a callback mechanism to return the result to the correct waiting task.
  2. DataLoader struct: Holds the sender part of an mpsc channel. The load method sends a Job down this channel.
  3. LoaderRunner struct: The background worker. It holds the mpsc receiver and the gRPC client, and runs in a continuous loop: collecting jobs into a batch until a size or time limit is reached, dispatching the batch RPC, and fanning the results back out to the waiting tasks.

Here is the implementation in src/client.rs.

use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;

use tokio::sync::{mpsc, oneshot};
use tokio::time::Instant;
use tonic::transport::Channel;
use tracing::{info, instrument};

// Import generated types
pub mod content {
    tonic::include_proto!("content");
}
use content::{content_service_client::ContentServiceClient, Content, GetContentBatchRequest};

// A single job sent from a calling task to the background batching task.
// It contains the ID to fetch and a channel to send the result back.
type Job = (String, oneshot::Sender<Option<Content>>);

// The DataLoader provides the user-facing API. It's cheap to clone.
#[derive(Clone)]
pub struct DataLoader {
    sender: mpsc::Sender<Job>,
}

// The LoaderRunner is the background task that does the actual work.
struct LoaderRunner {
    client: ContentServiceClient<Channel>,
    receiver: mpsc::Receiver<Job>,
    batch_size: usize,
    wait_duration: Duration,
}

impl DataLoader {
    pub fn new(client: ContentServiceClient<Channel>, batch_size: usize, wait_duration: Duration) -> Self {
        // The channel allows the `load` method to send jobs to the background runner.
        // A bounded channel is important to apply backpressure if the runner gets overwhelmed.
        let (sender, receiver) = mpsc::channel(20000);

        let mut runner = LoaderRunner {
            client,
            receiver,
            batch_size,
            wait_duration,
        };

        // Spawn the background task that will process the jobs.
        tokio::spawn(async move {
            runner.run().await;
        });

        Self { sender }
    }

    // This is the public method callers use.
    // It feels like a direct data fetch, but it's not.
    #[instrument(skip(self))]
    pub async fn load(&self, id: String) -> Option<Content> {
        let (tx, rx) = oneshot::channel();
        let job = (id.clone(), tx);

        // Send the job to the background task. If the channel is full, this will wait.
        if self.sender.send(job).await.is_err() {
            // This error occurs if the receiver has been dropped, which means the
            // background task has panicked or shut down. This is a critical failure.
            tracing::error!("DataLoader background task has shut down.");
            return None;
        }

        // Await the result from the background task.
        match rx.await {
            Ok(content) => content,
            Err(_) => {
                // This can happen if the oneshot sender is dropped before sending.
                tracing::warn!("Failed to receive result for ID '{}'. The batch may have failed.", id);
                None
            }
        }
    }
}

impl LoaderRunner {
    #[instrument(skip(self))]
    async fn run(&mut self) {
        let mut batch: Vec<Job> = Vec::with_capacity(self.batch_size);
        let mut last_batch_time = Instant::now();

        loop {
            let timeout = tokio::time::sleep_until(last_batch_time + self.wait_duration);

            tokio::select! {
                // Receive a job from the channel.
                Some(job) = self.receiver.recv() => {
                    // Start the batch timer from the first job of a fresh batch, so an
                    // idle period doesn't leave a stale deadline that would fire
                    // immediately and dispatch a batch of one.
                    if batch.is_empty() {
                        last_batch_time = Instant::now();
                    }
                    batch.push(job);
                    // If the batch is full, dispatch it immediately.
                    if batch.len() >= self.batch_size {
                        self.dispatch_batch(&mut batch, &mut last_batch_time);
                    }
                }
                // The timer has expired with a partial batch waiting.
                _ = timeout, if !batch.is_empty() => {
                    info!("Batch timer expired, dispatching partial batch of size {}.", batch.len());
                    self.dispatch_batch(&mut batch, &mut last_batch_time);
                }
                // The channel is closed and there are no more messages. The system is shutting down.
                else => {
                    if !batch.is_empty() {
                        self.dispatch_batch(&mut batch, &mut last_batch_time);
                    }
                    info!("DataLoader channel closed. Shutting down runner.");
                    break;
                }
            }
        }
    }

    #[instrument(skip(self, batch))]
    fn dispatch_batch(&mut self, batch: &mut Vec<Job>, last_batch_time: &mut Instant) {
        // Take ownership of the current batch, leaving the runner's batch empty.
        let jobs_to_process = std::mem::take(batch);
        let num_jobs = jobs_to_process.len();
        info!("Dispatching batch of {} jobs.", num_jobs);

        // Cloning a Tonic client is cheap; clones share the underlying HTTP/2 channel.
        // Spawning the RPC onto its own task lets the runner go straight back to
        // collecting the next batch while this one is in flight.
        let mut client = self.client.clone();
        tokio::spawn(async move {
            // Separate IDs from the senders.
            let ids: Vec<String> = jobs_to_process.iter().map(|(id, _)| id.clone()).collect();

            let request = tonic::Request::new(GetContentBatchRequest { ids });

            // Make the single gRPC call for the whole batch.
            match client.get_content_batch(request).await {
                Ok(response) => {
                    let mut results = response.into_inner().contents;
                    info!("Received batch response with {} items.", results.len());

                    // Distribute the results back to the waiting tasks.
                    for (id, sender) in jobs_to_process {
                        // Remove the content from the map to get ownership.
                        let content = results.remove(&id);
                        // `send` can fail if the receiver has been dropped (e.g. the original
                        // request was cancelled). This is not a critical error for the runner.
                        if sender.send(content).is_err() {
                            tracing::warn!("A receiver for ID {} was dropped before a result could be sent.", id);
                        }
                    }
                }
                Err(e) => {
                    tracing::error!("gRPC batch request failed: {}", e);
                    // If the entire batch fails, every waiting task must be notified.
                    // Dropping the senders here causes the corresponding receivers to error out.
                    // In a production system, a more granular retry policy might be needed.
                }
            }
        });

        // Reset the batch timer.
        *last_batch_time = Instant::now();
    }
}

// Example usage that simulates the SSG build process
pub async fn run_ssg_build_simulation() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::fmt()
        .with_env_filter("info,ssg_batching_example=trace")
        .init();

    let channel = Channel::from_static("http://[::1]:50051").connect().await?;
    let client = ContentServiceClient::new(channel);

    // Configure the DataLoader
    // A smaller batch size and wait duration is better for interactive systems.
    // For a batch build process, larger values are more efficient.
    let batch_size = 200;
    let wait_duration = Duration::from_millis(20);
    let data_loader = DataLoader::new(client, batch_size, wait_duration);

    let total_pages = 20000;
    let page_ids: Vec<String> = (0..total_pages).map(|i| format!("doc-{}", i)).collect();

    let start_time = Instant::now();
    let mut build_handles = Vec::new();

    info!("Starting build for {} pages...", total_pages);

    for id in page_ids {
        let loader = data_loader.clone();
        let handle = tokio::spawn(async move {
            if let Some(content) = loader.load(id.clone()).await {
                // Simulate rendering work
                tokio::time::sleep(Duration::from_millis(1)).await;
                // info!("Successfully rendered page for {}", content.id);
            } else {
                tracing::error!("Failed to get content for ID {}", id);
            }
        });
        build_handles.push(handle);
    }

    futures::future::join_all(build_handles).await;

    let duration = start_time.elapsed();
    info!("Total build time: {:?}", duration);
    info!(
        "Effective throughput: {:.2} pages/sec",
        total_pages as f64 / duration.as_secs_f64()
    );
    Ok(())
}

Analysis of the Final Architecture and Results

The core logic resides in the LoaderRunner::run method. It uses tokio::select! to wait on two conditions simultaneously: receiving a new job from the mpsc channel or the batch timer firing. This ensures that even if jobs arrive slowly, they are eventually dispatched without waiting indefinitely. When a batch is dispatched, std::mem::take swaps the pending batch out for an empty vector, and the gRPC call itself runs on its own spawned task with a cheap clone of the Tonic client, so the runner can immediately start collecting a new batch while previous batches are still in flight.

The simulation results are stark. Where the original approach needed 20,000 individual round trips, the logs now show batches of 200 being dispatched:

INFO ssg_batching_example::client: Starting build for 20000 pages...
INFO ssg_batching_example::client: Dispatching batch of 200 jobs.
INFO ssg_batching_example::server: Received batch request for 200 IDs
INFO ssg_batching_example::server: Responding to batch request with 200 found items
INFO ssg_batching_example::client: Received batch response with 200 items.
... (this repeats 100 times) ...
INFO ssg_batching_example::client: Total build time: 23.14s
INFO ssg_batching_example::client: Effective throughput: 864.21 pages/sec

A total build time of ~23 seconds for 20,000 pages represents a monumental improvement. The system now makes only 100 gRPC calls (20000 pages / 200 batch_size) instead of 20,000. This dramatically reduces network overhead, connection management costs on both client and server, and allows the server to leverage more efficient batch database operations.

The architecture is not without its limitations. The current implementation has a fixed batch size and wait duration, which may not be optimal for all workloads. A dynamic sizing strategy based on system load could provide further benefits. Error handling is also simplistic; if a batch RPC fails, every request within that batch fails. A more resilient system might implement server-side partial success and client-side retries for failed IDs from a batch. Furthermore, this pattern introduces a small amount of latency for the first item in a batch, as it must wait for the wait_duration or for the batch to fill. While acceptable for a build process, this trade-off would need careful consideration in a user-facing, low-latency system.
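As one illustration of that last point, a per-ID fallback could look something like the sketch below. It relies on the types already imported in src/client.rs (GetContentRequest would need to be added to those imports), retries any ID missing from the batch response through the original unary GetContent RPC, and only then gives up on that waiter. This is a sketch of the idea rather than part of the implementation above, and a real contract would still need to distinguish "transiently failed" from "genuinely absent" content:

// Sketch only: resolve a dispatched batch, falling back to the unary RPC for
// any ID the batch response did not contain.
async fn resolve_with_fallback(
    client: &mut ContentServiceClient<Channel>,
    jobs: Vec<(String, oneshot::Sender<Option<Content>>)>,
    mut batch_results: HashMap<String, Content>,
) {
    for (id, sender) in jobs {
        let content = match batch_results.remove(&id) {
            Some(c) => Some(c),
            None => {
                // Retry this ID on its own before giving up on the waiter.
                let request = tonic::Request::new(GetContentRequest { id: id.clone() });
                client.get_content(request).await.ok().map(|r| r.into_inner())
            }
        };
        let _ = sender.send(content);
    }
}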

