Orchestrating a Next.js, Computer Vision, and SQL Server Stack with a Unified Pulumi and CircleCI Pipeline


The initial state of our deployment process was a liability. A developer pushing a change to our CV analysis platform would trigger a cascade of semi-automated scripts and manual interventions. The core components—a Next.js web application for user submissions, a Python-based computer vision service for processing resumes, and a SQL Server database for storing structured data—were treated as separate fiefdoms. The CV service container build alone, which baked in a multi-gigabyte model, routinely exceeded 40 minutes. Database schema changes required a DBA to manually run scripts against staging and then production, often out of sync with application deployments. This disjointed workflow resulted in integration failures, prolonged downtime, and a pervasive fear of shipping code.

Our objective was to consolidate this entire process into a single, declarative, and efficient pipeline. A git push to the main branch should deterministically build, test, and deploy the entire stack. Infrastructure would be managed as code, application artifacts would be immutable, and the stateful nature of the SQL Server database would be tamed within this automated flow. We settled on Pulumi for infrastructure management, leveraging TypeScript to align with our frontend team’s skillset, and CircleCI for its powerful caching and workflow orchestration capabilities.

This is the log of that build-out, focusing on the architectural decisions and code that solved our most significant bottlenecks: build performance, infrastructure coupling, and stateful database management in a supposedly immutable world.

The Foundational Monorepo and Pulumi Structure

To enforce a unified process, we migrated all components into a single monorepo. This structure is critical for coordinating changes across the stack.

/
├── .circleci/
│   └── config.yml
├── infrastructure/
│   ├── Pulumi.yaml
│   ├── Pulumi.dev.yaml
│   ├── index.ts
│   ├── package.json
│   └── tsconfig.json
├── services/
│   └── cv-processor/
│       ├── Dockerfile
│       ├── requirements.txt
│       └── src/
└── webapp/
    └── client-nextjs/
        ├── Dockerfile
        ├── next.config.js
        └── src/

The infrastructure directory contains our entire Pulumi project. It’s responsible for provisioning everything: the VPC, the ECS cluster for our services, the RDS instance for SQL Server, ECR repositories, and IAM roles. A core tenet was that no cloud resource would be created or modified outside of this Pulumi program.

Our initial Pulumi index.ts laid out the non-negotiable foundations: networking and the database. In a real-world project, you never want to tear down your database instance on a whim. We use Pulumi’s protect resource option to prevent accidental deletion.

// infrastructure/index.ts
import * as aws from "@pulumi/aws";
import * as awsx from "@pulumi/awsx";
import * as pulumi from "@pulumi/pulumi";

// Fetch configuration for the current stack (e.g., 'dev', 'staging').
const config = new pulumi.Config();
const dbPassword = config.requireSecret("sqlServerPassword");

// Create a dedicated VPC. This provides network isolation.
const vpc = new awsx.ec2.Vpc("cv-app-vpc", {
    numberOfAvailabilityZones: 2,
    cidrBlock: "10.0.0.0/16",
});

// Security group for the SQL Server instance.
// It only allows traffic from within the VPC on port 1433.
const dbSg = new aws.ec2.SecurityGroup("db-sg", {
    vpcId: vpc.vpcId,
    description: "Allow SQL Server traffic from within VPC",
    ingress: [{
        protocol: "tcp",
        fromPort: 1433,
        toPort: 1433,
        // This is a critical piece: we reference the VPC's CIDR block
        // to ensure only resources within our VPC can access the DB.
        cidrBlocks: [vpc.vpc.cidrBlock],
    }],
    egress: [{
        protocol: "-1",
        fromPort: 0,
        toPort: 0,
        cidrBlocks: ["0.0.0.0/0"],
    }],
});

// RDS requires an explicit subnet group; build one from the VPC's private subnets.
const dbSubnetGroup = new aws.rds.SubnetGroup("db-subnet-group", {
    subnetIds: vpc.privateSubnetIds,
});

// Provision an AWS RDS instance for SQL Server.
// In a real-world scenario, this would have multi-AZ enabled and backups configured.
const dbInstance = new aws.rds.Instance("sql-server-db", {
    engine: "sqlserver-ex", // SQL Server Express Edition for dev
    instanceClass: "db.t3.medium",
    allocatedStorage: 20,
    username: "adminuser",
    password: dbPassword,
    vpcSecurityGroupIds: [dbSg.id],
    dbSubnetGroupName: dbSubnetGroup.name,
    skipFinalSnapshot: true, // For dev purposes only.
    publiclyAccessible: false, // Security best practice.
}, { 
    // This flag prevents Pulumi from accidentally destroying the database.
    // To delete it, this must be manually set to false and `pulumi up` run again.
    protect: true 
});

// Create a security group for our application services.
const appSg = new aws.ec2.SecurityGroup("app-sg", {
    vpcId: vpc.vpcId,
    description: "Allow HTTP traffic to application services",
    ingress: [{
        protocol: "tcp",
        fromPort: 80,
        toPort: 80,
        cidrBlocks: ["0.0.0.0/0"], // From the public internet
    }],
    egress: [{
        protocol: "-1",
        fromPort: 0,
        toPort: 0,
        cidrBlocks: ["0.0.0.0/0"],
    }],
});

// An ECS cluster to run our containerized services.
const cluster = new aws.ecs.Cluster("app-cluster");

// Export key resource identifiers for use by other parts of the system,
// particularly the CI/CD pipeline.
export const vpcId = vpc.vpcId;
export const dbEndpoint = dbInstance.endpoint;
export const dbName = dbInstance.name;
export const clusterName = cluster.name;
export const appSecurityGroupId = appSg.id;
export const privateSubnetIds = vpc.privateSubnetIds;

Handling the sqlServerPassword with config.requireSecret ensures it’s encrypted in the Pulumi state file. This is fundamental. We never commit raw secrets to version control. The value is set via the CLI: pulumi config set --secret sqlServerPassword 'YOUR_COMPLEX_PASSWORD'.
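
One detail worth making explicit: requireSecret returns a pulumi.Output<string> with the secret flag set, and that flag propagates to anything derived from it. A small illustration (the connectionCheck name is ours, purely for demonstration, not part of the program above):

// infrastructure/index.ts (illustrative snippet)
// Values derived from a secret Output are themselves secret: this string is
// redacted in `pulumi preview` diffs and encrypted in the state file.
const connectionCheck = pulumi.interpolate`Server=${dbInstance.endpoint};User Id=adminuser;Password=${dbPassword};`;

// Exports behave the same way: `pulumi stack output` prints [secret] for
// this value unless you explicitly pass --show-secrets.
export const redactedConnectionString = connectionCheck;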

Tackling the 40-Minute CV Service Build

The primary source of our pipeline’s slowness was the CV processor service. Its Docker build downloaded a large pre-trained model for optical character recognition (OCR) and layout analysis on every run.

The original Dockerfile was naive:

# services/cv-processor/Dockerfile (OLD AND SLOW)
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# This step is the bottleneck. It downloads gigabytes of data every time.
RUN python -c "import layoutparser as lp; lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config')"

COPY src/ .
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "80"]

The solution involved two key strategies implemented in CircleCI: Docker layer caching and a custom model cache.

First, we restructured the Dockerfile to be more cache-friendly. The layers that change least often (OS packages, Python dependencies) are placed first.

# services/cv-processor/Dockerfile (IMPROVED)
FROM python:3.9-slim

ENV PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=on

WORKDIR /app

# Step 1: Install dependencies. This layer is only rebuilt if requirements.txt changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Step 2: Copy the model weights from a cache directory staged inside the
# build context. Docker's COPY cannot read absolute host paths like /tmp,
# so our CircleCI job copies the primed cache into the context first.
COPY model_cache/ /root/.torch/
# The model initialization command will now find the model locally instead of downloading.
RUN python -c "import layoutparser as lp; lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config')"

# Step 3: Copy source code. This layer changes frequently.
COPY src/ .

CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "80"]

Second, we configured CircleCI to cache both the Docker layers and the downloaded model files themselves between pipeline runs. The config.yml is the heart of this optimization.

# .circleci/config.yml
version: 2.1

orbs:
  pulumi: pulumi/pulumi@2.0.0 # official Pulumi orb; pin to the version you have vetted

# Define reusable executor with a larger resource class for the CV build
executors:
  default-executor:
    docker:
      - image: cimg/base:stable
  python-builder:
    docker:
      - image: cimg/python:3.9
    resource_class: medium+ # More CPU/RAM for the model handling

# Define reusable commands to keep the workflow clean
commands:
  # Command to set up Docker layer caching
  setup_dlc:
    steps:
      - setup_remote_docker:
          version: 20.10.18
          docker_layer_caching: true

  # Command to install and configure Pulumi
  setup_pulumi:
    steps:
      - pulumi/login
      - run:
          name: "Configure Pulumi for non-interactive run"
          command: |
            # Pulumi commands must run against the project in ./infrastructure.
            cd infrastructure
            pulumi stack select my-org/cv-app-stack/dev
            pulumi config set aws:region ${AWS_REGION}

jobs:
  # Job 1: Build the CV service container, leveraging caches
  build-and-push-cv-service:
    executor: python-builder
    steps:
      - checkout
      - setup_dlc

      # Restore the CV model cache
      - restore_cache:
          keys:
            - cv-model-cache-v1-{{ checksum "services/cv-processor/requirements.txt" }}
            - cv-model-cache-v1-

      # A critical step: if the cache didn't exist, we run a one-off container
      # to download the model, then copy the files out with `docker cp`.
      # (With setup_remote_docker, -v bind mounts from the job's filesystem
      # don't work, so a volume-mount approach would yield an empty directory.)
      - run:
          name: "Prime CV Model Cache if Missing"
          command: |
            if [ ! -d "/tmp/model_cache" ]; then
              echo "Model cache not found. Priming cache..."
              # Install build tooling first: detectron2 compiles from source
              # and needs git, a C++ toolchain, and torch preinstalled.
              docker run --name model-primer python:3.9-slim \
                bash -c "apt-get update && apt-get install -y git build-essential && \
                pip install torch torchvision && \
                pip install layoutparser 'git+https://github.com/facebookresearch/detectron2.git' && \
                python -c \"import layoutparser as lp; lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config')\""
              # Copy the downloaded weights out of the stopped container.
              docker cp model-primer:/root/.torch /tmp/model_cache
              docker rm model-primer
            else
              echo "Model cache found."
            fi

      # Save the cache for the next run. This happens only if it was just primed.
      - save_cache:
          key: cv-model-cache-v1-{{ checksum "services/cv-processor/requirements.txt" }}
          paths:
            - /tmp/model_cache

      - pulumi/login
      - run:
          name: "Build and Push CV Service Docker Image"
          command: |
            aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
            # Stage the primed model cache inside the build context so the
            # Dockerfile's `COPY model_cache/ /root/.torch/` step can see it.
            cp -r /tmp/model_cache services/cv-processor/model_cache
            REPO_URL=$(pulumi stack output cvProcessorRepoUrl --cwd infrastructure --stack my-org/cv-app-stack/dev)
            docker build -t ${REPO_URL}:${CIRCLE_SHA1} ./services/cv-processor
            docker push ${REPO_URL}:${CIRCLE_SHA1}

  # Job 2: Build the Next.js web application
  build-webapp:
    executor: default-executor
    steps:
      - checkout
      - setup_dlc
      - restore_cache:
          keys:
            - node-modules-{{ checksum "webapp/client-nextjs/package-lock.json" }}
      - run:
          name: "Install Webapp Dependencies"
          command: cd webapp/client-nextjs && npm install
      - save_cache:
          key: node-modules-{{ checksum "webapp/client-nextjs/package-lock.json" }}
          paths:
            - webapp/client-nextjs/node_modules
      - pulumi/login
      - run:
          name: "Build and Push Webapp Docker Image"
          command: |
            aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
            REPO_URL=$(pulumi stack output webappRepoUrl --cwd infrastructure --stack my-org/cv-app-stack/dev)
            docker build -t ${REPO_URL}:${CIRCLE_SHA1} ./webapp/client-nextjs
            docker push ${REPO_URL}:${CIRCLE_SHA1}

  # Job 3: Deploy all infrastructure and application updates
  deploy-infrastructure:
    executor: default-executor
    steps:
      - checkout
      - setup_pulumi
      # Both build jobs tag their images with the commit SHA, so the deploy
      # job can derive the tag directly from CIRCLE_SHA1.
      - run:
          name: "Deploy Infrastructure with Pulumi"
          command: |
            cd infrastructure
            # Pass the image tag to Pulumi, so it can update the ECS service definitions.
            pulumi up --yes \
              --config="cvServiceImageTag=${CIRCLE_SHA1}" \
              --config="webappImageTag=${CIRCLE_SHA1}"

workflows:
  build-and-deploy:
    jobs:
      - build-and-push-cv-service:
          context: aws-creds # Context storing AWS secrets in CircleCI
      - build-webapp:
          context: aws-creds
      # The deploy job runs only after BOTH build jobs succeed.
      - deploy-infrastructure:
          context: aws-creds
          requires:
            - build-and-push-cv-service
            - build-webapp

This configuration reduced the CV service build time on a cache hit from over 40 minutes to under 5. The key was separating the slow, infrequent task (model download) from the fast, frequent task (application code changes) and aggressively caching the result of the former.

Managing SQL Server Migrations in a Declarative Pipeline

The most significant architectural friction was handling database schema migrations. Pulumi is declarative: it describes the desired state of infrastructure. Database migrations are imperative and ordered: run script A, then script B. Forcing this imperative process into a purely declarative tool is an anti-pattern.

Our first attempt involved a custom Pulumi resource that ran a migration script during the update. This was a disaster: it was slow, hard to debug, and made Pulumi previews confusing because the resource was always marked for replacement.
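
For the record, that abandoned experiment looked roughly like the sketch below (a reconstruction with illustrative names, not our production code). The imperative side effect buried inside create is exactly what made every preview opaque:

// A dynamic-provider migration resource: the approach we abandoned.
import * as pulumi from "@pulumi/pulumi";
import { execSync } from "child_process";

const migrationProvider: pulumi.dynamic.ResourceProvider = {
    async create(inputs: { scriptsHash: string; databaseUrl: string }) {
        // Runs a CLI as a hidden side effect of a "declarative" resource.
        execSync("npm run migrate up", {
            env: { ...process.env, DATABASE_URL: inputs.databaseUrl },
        });
        // With only `create` implemented, any change to the inputs (here, a
        // hash of the migration scripts) shows up in `pulumi preview` as a
        // full delete-and-recreate of the resource.
        return { id: inputs.scriptsHash, outs: inputs };
    },
};

class DbMigration extends pulumi.dynamic.Resource {
    constructor(
        name: string,
        args: { scriptsHash: pulumi.Input<string>; databaseUrl: pulumi.Input<string> },
    ) {
        super(migrationProvider, name, args, {});
    }
}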

The pragmatic solution was to accept that database migration is an application-level concern, not an infrastructure one. We treated it as a distinct lifecycle step in our CircleCI workflow, running after the Pulumi infrastructure update but before the new application code was fully live.

  1. Tooling: We used a Node.js migration runner with SQL Server support (for example, Knex's migration API over the mssql driver); dedicated tools like Flyway (Java-based) or DbUp (.NET) work equally well. The migration scripts live in version control alongside the application code; a minimal example appears at the end of this section.

  2. Pulumi’s Role: Pulumi’s job is to provision the database and provide its connection details (endpoint, username, password secret) as stack outputs. It does not touch the schema.

  3. CircleCI’s Role: We added a new job to the workflow.

# Additions to .circleci/config.yml

jobs:
  # ... other jobs ...
  
  run-db-migrations:
    executor: default-executor
    steps:
      - checkout
      - run:
          name: "Install Migration Tool and Dependencies"
          command: |
            # Example using a Node.js-based migration tool
            cd ./database/migrations
            npm install
      - pulumi/login
      - run:
          name: "Run Database Migrations"
          command: |
            # Securely fetch database connection details from Pulumi stack outputs.
            # `pulumi config get` decrypts secret values for the selected stack.
            DB_ENDPOINT=$(pulumi stack output dbEndpoint --cwd infrastructure --stack my-org/cv-app-stack/dev)
            DB_USER="adminuser"
            DB_PASSWORD=$(pulumi config get sqlServerPassword --cwd infrastructure --stack my-org/cv-app-stack/dev)

            # The migration tool uses this env var to connect and apply migrations.
            cd ./database/migrations
            DATABASE_URL="mssql://${DB_USER}:${DB_PASSWORD}@${DB_ENDPOINT}" \
            npm run migrate up
            
workflows:
  build-and-deploy:
    jobs:
      - build-and-push-cv-service:
          context: aws-creds
      - build-webapp:
          context: aws-creds
      - deploy-infrastructure:
          context: aws-creds
          requires:
            - build-and-push-cv-service
            - build-webapp
      - run-db-migrations:
          context: aws-creds
          requires:
            - deploy-infrastructure # CRITICAL: Only run after infra is stable

The workflow now looks like this:

graph TD
    A[Git Push] --> B{CircleCI Workflow};
    B --> C[Build & Push CV Service];
    B --> D[Build & Push Webapp];
    C --> E[Deploy Infrastructure with Pulumi];
    D --> E;
    E --> F[Run DB Migrations];

This separation of concerns proved to be the right approach. Infrastructure provisioning remains declarative and predictable with Pulumi. Schema evolution, an ordered and stateful process, is handled by a dedicated, imperative step in the deployment pipeline. The potential for a race condition (new code deploying before the schema is ready) is mitigated by making the migration job a blocking dependency for the final service cutover, which can be managed with blue/green deployment patterns in ECS.
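
For completeness, here is the shape of one of those versioned migration scripts in the Knex style mentioned earlier (a simplified sketch; the table and column names are illustrative, not our actual schema):

// database/migrations/20240101_create_resume_submissions.ts (hypothetical)
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
    // Forward migration: create the table the CV processor writes into.
    await knex.schema.createTable("resume_submissions", (t) => {
        t.increments("id").primary();
        t.string("candidate_email", 320).notNullable();
        t.text("extracted_layout_json"); // raw layout-analysis output
        t.timestamps(true, true); // created_at / updated_at with defaults
    });
}

export async function down(knex: Knex): Promise<void> {
    // Rollback path: every migration ships with its inverse.
    await knex.schema.dropTableIfExists("resume_submissions");
}

Each script runs exactly once per database: the tool records applied migrations in a bookkeeping table, which is what makes the run-db-migrations job idempotent across pipeline runs.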

Tying it Together: The Application Service Definition

The final piece was updating the Pulumi program to define the containerized services and wire everything together. We used an awsx.ecs.FargateService component, which simplifies the creation of task definitions, services, and load balancers.

// infrastructure/index.ts (continued)
import * as awsx from "@pulumi/awsx";
// ... other imports ...

// Define configuration variables to receive the image tags from CircleCI.
const pulumiConfig = new pulumi.Config();
const cvServiceImageTag = pulumiConfig.require("cvServiceImageTag");
const webappImageTag = pulumiConfig.require("webappImageTag");

// ECR Repositories for our services
const cvProcessorRepo = new awsx.ecr.Repository("cv-processor-repo");
const webappRepo = new awsx.ecr.Repository("webapp-repo");

// Create a load balancer to front our services. The awsx component
// provisions a default HTTP listener and target group for us.
const alb = new awsx.lb.ApplicationLoadBalancer("app-lb", {
    subnetIds: vpc.publicSubnetIds,
    securityGroups: [appSg.id],
});

// Define the CV Processor Fargate Service.
const cvService = new awsx.ecs.FargateService("cv-processor-service", {
    cluster: cluster.arn,
    desiredCount: 1,
    networkConfiguration: {
        subnets: vpc.privateSubnetIds,
        securityGroups: [appSg.id],
        assignPublicIp: false, // The service lives in private subnets.
    },
    taskDefinitionArgs: {
        container: {
            name: "cv-processor",
            image: pulumi.interpolate`${cvProcessorRepo.url}:${cvServiceImageTag}`,
            cpu: 512,
            memory: 1024,
            essential: true,
            // Attaching the target group here registers each task with the ALB.
            portMappings: [{ containerPort: 80, targetGroup: alb.defaultTargetGroup }],
            // Pass the database connection string as an environment variable,
            // constructed from our other resources and secrets. (RDS for SQL
            // Server does not take an initial database name; the application
            // selects its database after connecting.)
            environment: [
                {
                    name: "DATABASE_URL",
                    value: pulumi.interpolate`mssql://${dbInstance.username}:${dbPassword}@${dbInstance.endpoint}`,
                },
            ],
        },
    },
});

// Routing the CV service under a dedicated path such as /api/cv/* is done
// with an explicit listener rule on the ALB rather than the default route.

// Similar service definition for the Next.js webapp...

// Export the repository URLs for the CircleCI build jobs to use.
export const cvProcessorRepoUrl = cvProcessorRepo.url;
export const webappRepoUrl = webappRepo.url;
export const loadBalancerDns = alb.loadBalancer.dnsName;

When the deploy-infrastructure job runs in CircleCI, it executes pulumi up with the new image tag. Pulumi detects that only the image property of the container definition has changed. It then registers a new ECS task definition and triggers a rolling deployment of the ECS service, draining old tasks and starting new ones with the updated container image. The entire process is automated, with minimal downtime.
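
If tighter failure handling is needed than the default rolling update, ECS offers a deployment circuit breaker. The sketch below is hypothetical and sidesteps awsx by managing the raw aws.ecs.Service directly; taskDef stands in for an aws.ecs.TaskDefinition built elsewhere:

// Hypothetical hardening sketch (not part of our awsx-based program).
const hardenedService = new aws.ecs.Service("cv-processor-hardened", {
    cluster: cluster.arn,
    taskDefinition: taskDef.arn, // assumed: a task definition built elsewhere
    desiredCount: 1,
    launchType: "FARGATE",
    deploymentCircuitBreaker: {
        enable: true,
        rollback: true, // revert to the last healthy deployment on failure
    },
    deploymentMinimumHealthyPercent: 100, // keep full capacity during rollout
    deploymentMaximumPercent: 200,        // let new tasks start alongside old
    networkConfiguration: {
        subnets: vpc.privateSubnetIds,
        securityGroups: [appSg.id],
    },
});

With rollback enabled, a deployment whose tasks repeatedly fail health checks reverts to the last stable task definition without manual intervention.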

The final pipeline provides a single source of truth for our application’s entire lifecycle. While robust, this approach is not without its own complexities. The coupling between CircleCI and Pulumi for passing variables like image tags requires careful management. The database migration step, while functional, remains a procedural island in a declarative sea; more advanced blue/green deployment strategies would require an additional orchestration layer to manage traffic shifting only after a successful migration and health checks on the new version. The cost of a perpetually running RDS instance for SQL Server is also a significant consideration, prompting future investigation into serverless database alternatives where appropriate.

