A naive implementation of a computer vision service often follows a simple request-response model: an HTTP endpoint receives an image, processes it immediately, and returns the result. This architecture crumbles under concurrent load. The core issue is the tight coupling of I/O-bound ingestion with CPU/GPU-bound processing. Each incoming request monopolizes a worker thread while waiting for a potentially long-running OpenCV task to complete, leading to thread pool exhaustion, high latency, and inefficient resource utilization. In a real-world project, this design is a non-starter.
The immediate pain point manifests as a performance bottleneck at the processing stage. If each image analysis takes 200ms, the theoretical maximum throughput is only five requests per second per worker, regardless of how fast the network is. Furthermore, performing individual database writes for each processed image introduces significant overhead and can lead to contention on the database side.
Our initial concept to resolve this is to decouple ingestion from processing. An Axum web server will act as a high-speed ingestion endpoint. Its only job is to validate the incoming request, accept the image data, and place it into a bounded, in-memory queue. It should respond to the client immediately with a 202 Accepted
status and a unique job identifier. A separate, long-running background task, or a pool of them, will be responsible for consuming jobs from this queue. This worker will not process images one by one; instead, it will collect them into batches. Processing a batch of images at once allows for better resource amortization (e.g., loading a model into GPU memory once per batch) and significantly more efficient database operations via bulk inserts.
For the technology stack, Rust was a deliberate choice. Its performance characteristics and compile-time safety guarantees are invaluable for building a reliable, high-throughput backend service.
- Axum: Built on top of Tokio, it’s a natural fit for our asynchronous architecture. Its extractor system and state management are clean and powerful, allowing us to build a lean ingestion layer.
- Tokio: As the de-facto async runtime in Rust, it provides all the necessary primitives: green threads (tasks), message passing channels (mpsc), and synchronization tools.
- OpenCV (Rust bindings): Despite the integration complexities often associated with C++ libraries in Rust, the opencv crate provides the necessary computer vision functions. We accept the build-time overhead for the powerful, battle-tested algorithms it offers.
- MongoDB: A NoSQL document database is ideal here. The output of computer vision tasks is often semi-structured—a variable number of detected objects, each with different attributes. MongoDB’s flexible schema handles this gracefully. Its async Rust driver (mongodb) integrates perfectly with our Tokio-based runtime, and the insert_many operation is key to our batching strategy.
Here is the architectural flow we are building:
graph TD
A[Client] -- HTTP POST /process --> B(Axum Ingestion Handler);
B -- Pushes ImageJob --> C{tokio::mpsc::channel};
B -- HTTP 202 Accepted --> A;
D(Batching Worker Task) -- Pulls from --> C;
D -- Collects into batch --> E[Image Batch];
E -- Processes with OpenCV --> F[Processing Results];
F -- Bulk insert --> G[(MongoDB)];
The implementation begins with setting up the project and defining our core data structures. The Cargo.toml
must include all our chosen dependencies.
# Cargo.toml
[dependencies]
axum = { version = "0.7", features = ["multipart"] }
tokio = { version = "1.33", features = ["full"] }
opencv = "0.88"
# The bson-chrono-0_4 feature enables the chrono serde helpers used for created_at.
mongodb = { version = "2.8", features = ["bson-chrono-0_4"] }
chrono = "0.4"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
uuid = { version = "1.6", features = ["v4"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
anyhow = "1.0"
bytes = "1.5"
tokio-util = { version = "0.7", features = ["codec"] }
futures = "0.3"
Next, we define the structures that will flow through our system. An ImageJob
represents a single unit of work placed into our queue. The ProcessingResult
will be the document we store in MongoDB.
// src/main.rs
use serde::{Serialize, Deserialize};
use uuid::Uuid;
use bytes::Bytes;
// Represents a single image processing request that gets queued.
#[derive(Debug)]
pub struct ImageJob {
pub job_id: Uuid,
pub image_data: Bytes,
}
// The final data structure to be stored in MongoDB.
// It must derive Serialize/Deserialize for BSON conversion.
#[derive(Debug, Serialize, Deserialize)]
pub struct ProcessingResult {
pub job_id: String, // Storing UUID as a string for better compatibility
pub detected_objects: Vec<DetectedObject>,
pub processing_time_ms: u64, // BSON has no 128-bit integer type; u64 is ample for millisecond timings
#[serde(with = "mongodb::bson::serde_helpers::chrono_datetime_as_bson_datetime")]
pub created_at: chrono::DateTime<chrono::Utc>,
}
#[derive(Debug, Serialize, Deserialize)]
pub struct DetectedObject {
pub class_name: String,
pub confidence: f32,
pub x: i32,
pub y: i32,
pub width: i32,
pub height: i32,
}
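It can be worth verifying that these serde attributes actually produce the BSON we expect before wiring the structs into the rest of the pipeline. The following is a minimal, optional sanity check (a sketch, assuming it lives in the same module as the structs above):
// Hypothetical sanity check: `created_at` should serialize to a native BSON
// DateTime (not a string) thanks to the serde helper attribute.
#[cfg(test)]
mod bson_mapping_tests {
use super::*;
#[test]
fn processing_result_maps_to_bson() {
let result = ProcessingResult {
job_id: uuid::Uuid::new_v4().to_string(),
detected_objects: vec![],
processing_time_ms: 42,
created_at: chrono::Utc::now(),
};
let doc = mongodb::bson::to_document(&result).expect("BSON serialization failed");
assert!(matches!(doc.get("created_at"), Some(mongodb::bson::Bson::DateTime(_))));
}
}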
The heart of our system is the batch_processor
task. This is a long-running Tokio task that owns the receiving end of our MPSC channel. Its logic is critical: it must collect jobs into a batch, handle timeouts (to process incomplete batches during low traffic periods), execute the CV and database operations, and manage errors gracefully. A common mistake here is to have no timeout, which would result in jobs being stuck in the queue indefinitely if the batch size is never reached.
// src/worker.rs
use tokio::sync::mpsc::Receiver;
use tokio::time::{self, Duration};
use mongodb::{Database, Collection};
use crate::{ImageJob, ProcessingResult, DetectedObject};
use anyhow::Result;
use std::time::Instant;
use opencv::{
prelude::*,
core,
imgcodecs,
objdetect,
};
const BATCH_SIZE: usize = 16;
const BATCH_TIMEOUT: Duration = Duration::from_secs(5);
// The main function for the background worker task.
pub async fn run_batch_processor(
mut rx: Receiver<ImageJob>,
db: Database,
) {
let collection: Collection<ProcessingResult> = db.collection("image_results");
loop {
let mut batch = Vec::with_capacity(BATCH_SIZE);
// Wait for the first job with a timeout. If the receiver is closed,
// the loop will break, allowing for graceful shutdown.
match time::timeout(BATCH_TIMEOUT, rx.recv()).await {
Ok(Some(job)) => batch.push(job),
Ok(None) => {
// Channel has been closed.
tracing::info!("Channel closed, batch processor shutting down.");
break;
}
Err(_) => {
// Timeout elapsed with no jobs. Continue to the next iteration.
continue;
}
}
// Fill the rest of the batch until BATCH_SIZE is reached or the channel is empty.
// This is non-blocking, so it drains any pending jobs quickly.
while batch.len() < BATCH_SIZE {
if let Ok(job) = rx.try_recv() {
batch.push(job);
} else {
break; // Channel is empty or closed
}
}
tracing::info!("Processing a batch of {} images.", batch.len());
let processing_start = Instant::now();
// Process the entire batch.
// A real-world scenario might use rayon for parallel CPU processing
// or batch inference on a GPU. Here we process sequentially for simplicity;
// note that this synchronous call occupies a runtime thread for the whole
// batch (see the spawn_blocking sketch after this listing).
let results = process_image_batch(batch);
if !results.is_empty() {
match collection.insert_many(results, None).await {
Ok(insert_result) => {
tracing::info!(
"Successfully inserted {} documents into MongoDB. Batch processed in {}ms.",
insert_result.inserted_ids.len(),
processing_start.elapsed().as_millis()
);
},
Err(e) => {
tracing::error!("Failed to insert batch into MongoDB: {}", e);
// A pitfall here is not having a dead-letter queue.
// If the DB insert fails, these results are lost.
// For production, failed batches should be routed elsewhere for retry.
}
}
}
}
}
// This function simulates the core computer vision logic.
// It takes a batch of jobs and returns a vector of results.
fn process_image_batch(batch: Vec<ImageJob>) -> Vec<ProcessingResult> {
// The classifier is loaded once per batch here. In a real application, you would
// initialize your CV model (e.g., a CascadeClassifier) once per worker, outside
// this function, to avoid reloading it for every batch.
let mut cascade = objdetect::CascadeClassifier::new(
"./data/haarcascade_frontalface_default.xml"
).expect("Failed to load Haar cascade classifier.");
let mut batch_results = Vec::new();
for job in batch {
let processing_start_item = Instant::now();
let job_id = job.job_id;
// Attempt to decode the image from memory
let frame = match Mat::from_slice(&job.image_data) {
Ok(f) => f,
Err(e) => {
tracing::warn!(job_id = %job_id, "Failed to create Mat from slice: {}", e);
continue;
}
};
let img = match imgcodecs::imdecode(&frame, imgcodecs::IMREAD_COLOR) {
Ok(i) => i,
Err(e) => {
tracing::warn!(job_id = %job_id, "Failed to decode image: {}", e);
continue;
}
};
let mut detected_objects = Vec::new();
let mut faces = core::Vector::new();
// The actual OpenCV call. Error handling is critical.
if let Ok(()) = cascade.detect_multi_scale(
&img,
&mut faces,
1.1,
10,
objdetect::CASCADE_SCALE_IMAGE,
core::Size::new(30, 30),
core::Size::new(0, 0),
) {
for face in faces.iter() {
detected_objects.push(DetectedObject {
class_name: "face".to_string(),
confidence: 1.0, // Haar cascades don't provide confidence
x: face.x,
y: face.y,
width: face.width,
height: face.height,
});
}
} else {
tracing::error!(job_id = %job_id, "Error during face detection.");
}
batch_results.push(ProcessingResult {
job_id: job_id.to_string(),
detected_objects,
processing_time_ms: processing_start_item.elapsed().as_millis() as u64,
created_at: chrono::Utc::now(),
});
}
batch_results
}
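One caveat with the worker as written: process_image_batch is synchronous, CPU-bound code, so calling it directly inside the async task occupies one of Tokio's runtime threads for the entire batch. A minimal mitigation, sketched below as a drop-in replacement for the process_image_batch(batch) call in run_batch_processor, is to push the work onto Tokio's blocking thread pool:
// Sketch: run the CPU-bound OpenCV work on the blocking thread pool instead
// of directly on the async runtime. `spawn_blocking` returns a JoinHandle,
// so a panic inside the closure surfaces here as a JoinError.
let results = match tokio::task::spawn_blocking(move || process_image_batch(batch)).await {
Ok(results) => results,
Err(join_err) => {
tracing::error!("Batch processing task panicked: {}", join_err);
Vec::new()
}
};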
With the worker defined, we build the Axum web server. It needs shared state (AppState) to hold the database connection pool and the sender part of our MPSC channel. The handler for /process
will use Axum’s Multipart
extractor to handle file uploads efficiently.
// src/server.rs
use axum::{
routing::post,
Router,
extract::{State, Multipart},
response::{IntoResponse, Json},
http::StatusCode,
};
use tokio::sync::mpsc::Sender;
use mongodb::Database;
use serde_json::json;
use std::sync::Arc;
use uuid::Uuid;
use crate::ImageJob;
// The shared state for our Axum application.
#[derive(Clone)]
pub struct AppState {
pub tx: Sender<ImageJob>,
pub db: Database,
}
pub fn create_router(app_state: Arc<AppState>) -> Router {
Router::new()
.route("/process", post(process_image_handler))
.with_state(app_state)
}
// The handler for image ingestion.
// It is designed to be as fast as possible, offloading work immediately.
async fn process_image_handler(
State(state): State<Arc<AppState>>,
mut multipart: Multipart,
) -> impl IntoResponse {
let mut image_data: Option<bytes::Bytes> = None;
// A common pitfall is mishandling the multipart stream: both `next_field`
// and `bytes` can fail on a malformed body, so we avoid unwrap here. An
// error while reading fields simply ends the scan; the missing-image check
// below then turns that into a 400 response.
while let Ok(Some(field)) = multipart.next_field().await {
if field.name() == Some("image") {
match field.bytes().await {
Ok(data) if !data.is_empty() => {
image_data = Some(data);
break; // Found our image, no need to process other fields
}
Ok(_) => {} // Empty field; keep scanning the remaining fields
Err(e) => tracing::warn!("Failed to read multipart field: {}", e),
}
}
}
let image_data = match image_data {
Some(data) => data,
None => {
return (
StatusCode::BAD_REQUEST,
Json(json!({ "error": "Missing 'image' field in multipart form" })),
)
}
};
let job_id = Uuid::new_v4();
let job = ImageJob {
job_id,
image_data,
};
// The core of the decoupling: hand the job to the worker without waiting.
// `try_send` fails immediately if the bounded channel is full (backpressure)
// or if the worker has shut down, and we surface that as a 503.
if let Err(e) = state.tx.try_send(job) {
tracing::error!("Failed to send job to worker channel: {}", e);
return (
StatusCode::SERVICE_UNAVAILABLE,
Json(json!({ "error": "Processing queue is overloaded" })),
);
}
tracing::info!(job_id = %job_id, "Accepted new image processing job.");
// Immediately return 202 Accepted. The client knows the job is queued.
(
StatusCode::ACCEPTED,
Json(json!({ "job_id": job_id.to_string() })),
)
}
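The handler above only acknowledges that a job was queued; this article does not define how clients retrieve their results. One plausible companion, sketched under the assumption that a GET /results/:job_id route is added to create_router via axum::routing::get, looks the document up by its job_id string (which is exactly why ProcessingResult stores the UUID as a String):
// Hypothetical companion endpoint (not part of the original design):
// GET /results/:job_id returns the stored detections for a job, if present.
use axum::extract::Path;
use mongodb::bson::doc;
use crate::ProcessingResult;
async fn get_result_handler(
State(state): State<Arc<AppState>>,
Path(job_id): Path<String>,
) -> impl IntoResponse {
let collection = state.db.collection::<ProcessingResult>("image_results");
match collection.find_one(doc! { "job_id": job_id.as_str() }, None).await {
Ok(Some(result)) => (
StatusCode::OK,
Json(json!({
"job_id": result.job_id,
"detected_objects": result.detected_objects,
"processing_time_ms": result.processing_time_ms,
})),
),
Ok(None) => (
StatusCode::NOT_FOUND,
Json(json!({ "error": "Result not ready or unknown job_id" })),
),
Err(e) => {
tracing::error!("Result lookup failed: {}", e);
(
StatusCode::INTERNAL_SERVER_ERROR,
Json(json!({ "error": "Database error" })),
)
}
}
}
Registering it would be a one-line addition in create_router, e.g. .route("/results/:job_id", axum::routing::get(get_result_handler)).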
Finally, we tie everything together in main.rs. This involves initializing the logger, connecting to MongoDB, creating the MPSC channel, spawning the background worker task, setting up the Axum router, and implementing a graceful shutdown mechanism. Graceful shutdown is not optional in a production system; it ensures that in-flight requests and processing batches are not abruptly terminated.
// In src/main.rs
mod server;
mod worker;
// Re-export the server and worker entry points
pub use server::{AppState, create_router};
pub use worker::run_batch_processor;
use serde::{Serialize, Deserialize};
use uuid::Uuid;
use bytes::Bytes;
use std::sync::Arc;
use tokio::sync::mpsc;
use mongodb::{Client, options::ClientOptions};
#[derive(Debug)]
pub struct ImageJob {
pub job_id: Uuid,
pub image_data: Bytes,
}
#[derive(Debug, Serialize, Deserialize)]
pub struct ProcessingResult {
pub job_id: String,
pub detected_objects: Vec<DetectedObject>,
pub processing_time_ms: u64,
#[serde(with = "mongodb::bson::serde_helpers::chrono_datetime_as_bson_datetime")]
pub created_at: chrono::DateTime<chrono::Utc>,
}
#[derive(Debug, Serialize, Deserialize)]
pub struct DetectedObject {
pub class_name: String,
pub confidence: f32,
pub x: i32,
pub y: i32,
pub width: i32,
pub height: i32,
}
#[tokio::main]
async fn main() -> anyhow::Result<()> {
// Setup structured logging. Essential for debugging async systems.
tracing_subscriber::fmt()
.with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
.init();
// --- Database Connection ---
let mongo_uri = std::env::var("MONGO_URI").unwrap_or_else(|_| "mongodb://localhost:27017".to_string());
let mut client_options = ClientOptions::parse(&mongo_uri).await?;
client_options.app_name = Some("image-processor".to_string());
let client = Client::with_options(client_options)?;
let db = client.database("vision_db");
tracing::info!("Successfully connected to MongoDB.");
// --- Channel for Decoupling ---
// A bounded channel is crucial for applying backpressure. An unbounded channel
// can lead to unbounded memory growth if the producer is faster than the consumer.
let (tx, rx) = mpsc::channel::<ImageJob>(1024);
// --- Spawn the Batch Processing Worker ---
let worker_db = db.clone();
let worker_handle = tokio::spawn(async move {
run_batch_processor(rx, worker_db).await;
});
tracing::info!("Batch processing worker started.");
// --- Setup Axum Server ---
let app_state = Arc::new(AppState {
tx: tx.clone(),
db,
});
let app = create_router(app_state);
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
tracing::info!("Server listening on {}", listener.local_addr()?);
axum::serve(listener, app)
.with_graceful_shutdown(shutdown_signal(tx))
.await?;
// Wait for the worker to finish processing its final batch.
worker_handle.await?;
tracing::info!("Worker has shut down gracefully.");
Ok(())
}
// Handles graceful shutdown signals.
async fn shutdown_signal(tx: mpsc::Sender<ImageJob>) {
let ctrl_c = async {
tokio::signal::ctrl_c()
.await
.expect("failed to install Ctrl+C handler");
};
#[cfg(unix)]
let terminate = async {
tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
.expect("failed to install signal handler")
.recv()
.await;
};
#[cfg(not(unix))]
let terminate = std::future::pending::<()>();
tokio::select! {
_ = ctrl_c => {},
_ = terminate => {},
}
tracing::info!("Signal received, starting graceful shutdown.");
// Dropping this `tx` clone alone is not enough: `rx.recv()` only returns `None`
// once every Sender has been dropped, including the clone held in `AppState`.
// That clone goes away when `axum::serve` returns after the graceful shutdown,
// at which point the worker's loop terminates.
drop(tx);
}
To run this project, one must place a Haar cascade XML file (e.g., haarcascade_frontalface_default.xml
from the OpenCV repository) in a ./data
directory. The service can then be started, and requests can be sent via curl:
curl -X POST -F "image=@/path/to/your/image.jpg" http://localhost:3000/process
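If the upload is accepted, the response is an HTTP 202 with a small JSON body containing the generated job_id; the detection results themselves appear in the image_results collection of the vision_db database once the worker has flushed the corresponding batch.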
This architecture successfully decouples ingestion from processing, enabling the web server to handle a high rate of incoming requests while the background worker processes them efficiently in batches. The use of an asynchronous runtime, a bounded channel for backpressure, and bulk database operations are key to its performance and stability.
However, this solution is not without its limitations. The tokio::mpsc
channel is entirely in-memory. If the service instance crashes or is restarted, any jobs currently in the queue are permanently lost. For a production system requiring durability, this channel must be replaced with a persistent message broker like Redis Streams, RabbitMQ, or Kafka. Scaling this service beyond a single node also requires an external message queue so that multiple worker instances can pull from a shared pool of jobs. Furthermore, the error handling for batch processing is rudimentary; a failure to insert into MongoDB means the entire batch’s results are lost. A more robust implementation would incorporate a dead-letter queue (DLQ) to park failed jobs or individual items from a batch for later inspection and reprocessing.
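As a rough illustration of that last point, the insert step can be made partial-failure-tolerant and given a fallback destination. The sketch below is one possible shape, assuming a second collection (here called image_results_dlq) is an acceptable stand-in for a real dead-letter queue; it also sets the driver's ordered(false) option so that one bad document does not abort the rest of the batch:
// Hedged sketch of a fallback path for failed batch inserts. The DLQ
// collection name and the retry policy are assumptions, not part of the
// original design.
use mongodb::options::InsertManyOptions;
use crate::ProcessingResult;
async fn insert_with_fallback(
db: &mongodb::Database,
results: Vec<ProcessingResult>,
) -> anyhow::Result<()> {
let primary = db.collection::<ProcessingResult>("image_results");
// With ordered(false), MongoDB keeps inserting the remaining documents
// after an individual document fails instead of aborting the batch.
let options = InsertManyOptions::builder().ordered(false).build();
if let Err(e) = primary.insert_many(&results, options).await {
tracing::warn!("Primary insert failed, routing batch to the DLQ: {}", e);
let dlq = db.collection::<ProcessingResult>("image_results_dlq");
dlq.insert_many(&results, None).await?;
}
Ok(())
}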