Implementing a Custom Podman Builder in GCP Cloud Build for Serverless Frontend Deployments


The CI pipeline’s execution time for our frontend services was becoming a significant bottleneck. Our previous setup relied on the standard Docker-in-Docker approach within GCP Cloud Build, which, while functional, introduced performance overhead and a nagging set of security concerns related to the exposed Docker socket. Each build was unnecessarily slow due to cold starts and inefficient layer caching. The goal was to replace the Docker daemon dependency with a more modern, daemonless container engine—Podman—and to overhaul our caching strategy for both container layers and frontend build artifacts, specifically our PostCSS outputs.

The challenge is that GCP Cloud Build does not offer a native Podman builder. This required us to construct a bespoke builder image, a container image that encapsulates the Podman toolchain and is capable of running within the Cloud Build environment. This custom builder would need to handle not just building an image but also authenticating and pushing it to Google Artifact Registry.

The initial step was to define the environment for our builder. It needed Podman, of course, but also skopeo, a more robust tool for image manipulation and transport that handles registry authentication more gracefully than the standard podman push in some CI environments.

Here is the Dockerfile for the custom builder itself, which we’ll call podman-builder.

# Use a stable base image. Debian is a solid choice.
FROM debian:12-slim

# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install dependencies for adding the Kubic repository and fetching its signing key.
# ca-certificates is required for the HTTPS download below; with
# --no-install-recommends, curl will not pull it in on its own.
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    gnupg2 \
    && rm -rf /var/lib/apt/lists/*

# Add the Kubic repository, which provides up-to-date Podman packages for Debian.
# This is more reliable than relying on default distro packages, which can be outdated.
RUN . /etc/os-release && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL -o /etc/apt/keyrings/libcontainers.asc https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_${VERSION_ID}/Release.key && \
    echo "deb [signed-by=/etc/apt/keyrings/libcontainers.asc] https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/Debian_${VERSION_ID}/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list

# Install Podman and Skopeo.
# Skopeo is critical for reliable image pushing to various registries, especially with complex auth.
RUN apt-get update && apt-get install -y --no-install-recommends \
    podman \
    skopeo \
    && rm -rf /var/lib/apt/lists/*

# Podman requires explicit storage configuration to run inside a container.
# The 'vfs' driver is less performant than overlayfs but is the most compatible
# option in nested environments like Cloud Build. The graph root is placed under
# /workspace because Cloud Build only persists /workspace (and /builder/home)
# between steps, and the image built in one step must still be visible to the
# skopeo push step that follows.
RUN mkdir -p /etc/containers && \
    printf '[storage]\ndriver = "vfs"\ngraphroot = "/workspace/.podman-storage"\n' > /etc/containers/storage.conf

# Registry configuration. Skopeo and Podman do not strictly need this when every
# image reference is fully qualified (as in the build steps below), but declaring
# the registries we rely on keeps the setup explicit and gives us a place to add
# mirror or short-name settings later.
RUN mkdir -p /etc/containers/registries.conf.d && \
    printf '[[registry]]\nlocation = "docker.io"\n' > /etc/containers/registries.conf.d/001-docker.conf && \
    printf '[[registry]]\nlocation = "gcr.io"\n' > /etc/containers/registries.conf.d/002-gcr.conf && \
    printf '[[registry]]\nlocation = "pkg.dev"\n' > /etc/containers/registries.conf.d/003-pkg-dev.conf

# Set the entrypoint. We will call podman or skopeo directly in our build steps.
# Using a generic entrypoint like 'sh' allows flexibility.
ENTRYPOINT ["/bin/sh", "-c"]

This builder image is then built and pushed to our project’s Artifact Registry. This is a one-time setup step (or repeated whenever the builder needs an update). It’s important to use a regional registry for lower latency.

# Assume REGION is your GCP region and PROJECT_ID is your project ID.
gcloud builds submit . \
  --tag="${REGION}-docker.pkg.dev/${PROJECT_ID}/cloud-builders/podman-builder:latest" \
  --project="${PROJECT_ID}"
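
One prerequisite for the command above: the target Artifact Registry repository (cloud-builders) has to exist before the first push. A one-time creation command along these lines should cover it, assuming the same repository name and region as above:

# Create the Docker-format repository that will hold the custom builder image.
gcloud artifacts repositories create cloud-builders \
  --repository-format=docker \
  --location="${REGION}" \
  --description="Custom Cloud Build builder images" \
  --project="${PROJECT_ID}"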

With the builder available, the next task was to refactor the application’s cloudbuild.yaml. The application is a standard static single-page app whose CSS is processed by PostCSS. The production assets are then served by a minimal Nginx container. The core of the refactoring was a multi-step process designed for efficiency and correctness.
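
The Dockerfile.prod referenced in the build step below isn't reproduced in full here; a minimal sketch of what it can look like, assuming the PostCSS build emits everything into dist/ and that a small nginx.conf overriding the default listen port sits next to it (both file names are assumptions, not taken from the actual repository):

# Dockerfile.prod (sketch)
FROM nginx:1.25-alpine

# Cloud Run sends traffic to $PORT (8080 by default), while the stock nginx
# config listens on 80, so ship a server block that listens on 8080 instead.
COPY nginx.conf /etc/nginx/conf.d/default.conf

# Static assets produced by 'npm run build:prod', including the PostCSS output.
COPY dist/ /usr/share/nginx/html/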

The new cloudbuild.yaml orchestrates the entire workflow:

# cloudbuild.yaml

# Define timeout and build options.
# A more powerful machine type (e.g., E2_HIGHCPU_8) can significantly speed up
# npm install and PostCSS compilation. dynamicSubstitutions lets _IMAGE_NAME below
# be composed from other substitution variables (triggered builds enable this
# automatically, but setting it explicitly keeps manual `gcloud builds submit`
# runs behaving the same way).
options:
  machineType: 'E2_HIGHCPU_8'
  dynamicSubstitutions: true
timeout: '1200s'

# Define substitution variables. These are populated by the build trigger.
substitutions:
  _REGION: 'us-central1'
  _ARTIFACT_REGISTRY: 'cicd-registry'
  _SERVICE_NAME: 'frontend-static-server'
  _IMAGE_NAME: '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_ARTIFACT_REGISTRY}/${_SERVICE_NAME}'

steps:
  # STEP 1: Restore caches from Google Cloud Storage
  # This step attempts to download the cached node_modules and PostCSS cache directories.
  # The '|| true' keeps the step from failing when the cache doesn't exist yet (e.g., on the first run).
  - name: 'gcr.io/cloud-builders/gsutil'
    id: 'Restore Cache'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        # gsutil rsync needs the local destination directories to exist.
        mkdir -p node_modules .postcss-cache
        gsutil -m rsync -r gs://${PROJECT_ID}_cloudbuild_cache/node_modules ./node_modules || true
        gsutil -m rsync -r gs://${PROJECT_ID}_cloudbuild_cache/postcss_cache ./.postcss-cache || true
    waitFor: ['-'] # Start immediately

  # STEP 2: Install frontend dependencies
  # We leverage the restored cache. `npm ci` is used for deterministic builds from package-lock.json.
  - name: 'node:20-slim'
    id: 'NPM Install'
    entrypoint: 'npm'
    args: ['ci']
    waitFor: ['Restore Cache']

  # STEP 3: Run the PostCSS build
  # This step generates the production CSS files. The output goes into the 'dist' directory.
  - name: 'node:20-slim'
    id: 'Build Frontend'
    entrypoint: 'npm'
    args: ['run', 'build:prod'] # Assumes 'build:prod' script runs postcss
    waitFor: ['NPM Install']

  # STEP 4: Persist caches back to GCS
  # After a successful build, we update the cache in GCS for subsequent builds.
  # This runs in parallel with the container build to save time.
  - name: 'gcr.io/cloud-builders/gsutil'
    id: 'Save Cache'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        gsutil -m rsync -r ./node_modules gs://${PROJECT_ID}_cloudbuild_cache/node_modules
        # The PostCSS cache directory name might be different based on config.
        # Ensure this path matches your postcss setup.
        if [ -d "./.postcss-cache" ]; then
          gsutil -m rsync -r ./.postcss-cache gs://${PROJECT_ID}_cloudbuild_cache/postcss_cache
        fi
    waitFor: ['Build Frontend']

  # STEP 5: Build container image using our custom Podman builder
  # This is the core of our new process. We use the previously built podman-builder image.
  - name: '${_REGION}-docker.pkg.dev/${PROJECT_ID}/cloud-builders/podman-builder'
    id: 'Podman Build'
    args:
      - 'podman build -t ${_IMAGE_NAME}:${SHORT_SHA} -f Dockerfile.prod .'
    waitFor: ['Build Frontend']

  # STEP 6: Push the image using Skopeo
  # Skopeo copies the image from the local Podman storage to the remote registry.
  # The builder image does not ship gcloud, so a separate Cloud SDK step mints an
  # OAuth2 access token and writes it to the shared /workspace volume; Skopeo then
  # presents it as the registry password for the special 'oauth2accesstoken' user.
  - name: 'gcr.io/google.com/cloudsdktool/google-cloud-cli'
    id: 'Configure Auth'
    entrypoint: 'sh'
    args:
      - '-c'
      - 'gcloud auth print-access-token > /workspace/.registry-token'
    waitFor: ['-'] # Run early

  - name: '${_REGION}-docker.pkg.dev/${PROJECT_ID}/cloud-builders/podman-builder'
    id: 'Skopeo Push'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        # Artifact Registry accepts an OAuth2 access token when the username is 'oauth2accesstoken'.
        # The double '$$' escapes the dollar sign from Cloud Build's substitution parser.
        skopeo copy \
          --dest-creds "oauth2accesstoken:$$(cat /workspace/.registry-token)" \
          containers-storage:${_IMAGE_NAME}:${SHORT_SHA} \
          docker://${_IMAGE_NAME}:${SHORT_SHA}
    waitFor: ['Podman Build', 'Configure Auth']

  # STEP 7: Deploy to Cloud Run
  # Deploys the newly pushed image to a managed Cloud Run service.
  - name: 'gcr.io/google.com/cloudsdktool/google-cloud-cli'
    id: 'Deploy to Cloud Run'
    entrypoint: 'gcloud'
    args:
      - 'run'
      - 'deploy'
      - '${_SERVICE_NAME}'
      - '--image=${_IMAGE_NAME}:${SHORT_SHA}'
      - '--region=${_REGION}'
      - '--platform=managed'
      - '--allow-unauthenticated' # Temporary for testing, production should use IAM
      - '--project=${PROJECT_ID}'
    waitFor: ['Skopeo Push']

# Note: the top-level 'images' field is intentionally omitted. Cloud Build pushes any
# image listed there with Docker, which this pipeline no longer uses; the Skopeo step
# above is responsible for pushing to Artifact Registry.
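
One more piece of one-time setup: the GCS bucket used by the cache steps must exist before the first build. Creating it with gsutil (the bucket name matches the steps above; pick whichever location fits your project):

gsutil mb -p "${PROJECT_ID}" -l "${REGION}" "gs://${PROJECT_ID}_cloudbuild_cache"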

The next component is managing the infrastructure that serves this application. We defined the Cloud Run service and the API Gateway that fronts it using Terraform. A key reason for using API Gateway is to provide a stable, policy-controlled endpoint for our frontend, decoupling it from the underlying, potentially ephemeral Cloud Run revisions. It also allows us to easily add authentication (like API keys) or integrate with other backend services later on.
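
As a concrete example of that last point, enforcing API keys later would mostly be a small, hypothetical addition to the OpenAPI spec shown further down (plus creating a key in the project); API Gateway expects the key in a query parameter named key or an x-api-key header:

# Hypothetical additions to openapi2.yaml if API keys are introduced later.
securityDefinitions:
  api_key:
    type: apiKey
    name: key
    in: query
security:
  - api_key: []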

This is the Terraform configuration for the Cloud Run service and the API Gateway.

# main.tf

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.50.0"
    }
    # The API Gateway resources below are only available in the google-beta provider.
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 4.50.0"
    }
  }
  # State is stored remotely in a GCS bucket for collaboration and CI/CD integration
  backend "gcs" {
    bucket = "my-app-terraform-state-bucket"
    prefix = "prod/frontend"
  }
}

variable "project_id" {
  type        = string
  description = "The GCP project ID."
}

variable "region" {
  type        = string
  description = "The GCP region for deployment."
  default     = "us-central1"
}

variable "service_name" {
  type    = string
  default = "frontend-static-server"
}

# The Cloud Run service definition.
# Terraform manages the service configuration, not the image version: the image set
# here is only a placeholder used at creation time, and the CI/CD pipeline deploys
# the real images afterwards.
resource "google_cloud_run_v2_service" "frontend" {
  project  = var.project_id
  name     = var.service_name
  location = var.region

  template {
    containers {
      image = "us-docker.pkg.dev/cloudrun/container/hello" # Placeholder image on creation
    }
  }

  # Prevent Terraform from reverting the image back to the placeholder after the
  # pipeline has deployed a real revision.
  lifecycle {
    ignore_changes = [template[0].containers[0].image]
  }
}

# Allow unauthenticated invocations so the gateway (and anyone with the service URL)
# can reach Cloud Run. In production, grant roles/run.invoker to the gateway's
# service account instead of using allUsers.
resource "google_cloud_run_v2_service_iam_member" "noauth" {
  project  = google_cloud_run_v2_service.frontend.project
  location = google_cloud_run_v2_service.frontend.location
  name     = google_cloud_run_v2_service.frontend.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}

# API Gateway resources
# 1. The API definition
resource "google_api_gateway_api" "frontend_api" {
  provider = google-beta
  project  = var.project_id
  api_id   = "${var.service_name}-api"
}

# 2. The API Config, which is derived from an OpenAPI spec.
resource "google_api_gateway_api_config" "frontend_api_config" {
  provider      = google-beta
  project       = var.project_id
  api           = google_api_gateway_api.frontend_api.api_id
  api_config_id = "config-v1"

  openapi_documents {
    document {
      path     = "openapi2.yaml"
      contents = filebase64("${path.module}/openapi2.yaml")
    }
  }

  # API configs are immutable, so any change forces a replacement. Creating the new
  # config before destroying the old one keeps the gateway from ever pointing at a
  # deleted config during an apply.
  lifecycle {
    create_before_destroy = true
  }
}

# 3. The Gateway instance itself, which deploys the config.
resource "google_api_gateway_gateway" "frontend_gateway" {
  provider   = google-beta
  project    = var.project_id
  region     = var.region
  gateway_id = "${var.service_name}-gateway"
  api_config = google_api_gateway_api_config.frontend_api_config.id

  # If the required services (apigateway.googleapis.com, servicecontrol.googleapis.com)
  # are managed with google_project_service resources, add a depends_on here so the
  # gateway is only created after they are enabled.
}

# Output the gateway URL for easy access.
output "gateway_url" {
  value = "https://${google_api_gateway_gateway.frontend_gateway.default_hostname}"
}

The openapi2.yaml file is the contract that links the public-facing gateway to the backend Cloud Run service.

# openapi2.yaml
swagger: '2.0'
info:
  title: 'Frontend Service API'
  version: '1.0.0'
schemes:
  - https
produces:
  - application/json
paths:
  '/{path=**}':
    get:
      summary: 'Serve all frontend assets'
      operationId: 'getFrontendAsset'
      x-google-backend:
        address: '${service_url}' # This will be substituted
        # disable_auth stops the gateway from attaching an ID token to backend requests.
        # The Cloud Run service allows unauthenticated invocations (see the IAM binding
        # in the Terraform config); for a private service, drop this and grant the
        # gateway's service account roles/run.invoker instead.
        disable_auth: true
      responses:
        '200':
          description: 'A successful response'
        '404':
          description: 'Not found'

To integrate this into the CI/CD pipeline, a final step can be added to the cloudbuild.yaml to apply Terraform changes. However, in a real-world project, this is often handled by a separate, dedicated deployment pipeline with manual approvals for infrastructure changes to prevent accidental destructive actions.
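
If it is folded into the same build, the extra step could look roughly like this, assuming the Terraform configuration lives in an infra/ directory of the repository and using the public hashicorp/terraform image (both assumptions):

  # OPTIONAL STEP 8: apply infrastructure changes (sketch only; most teams run this
  # from a separate pipeline behind a manual approval gate).
  - name: 'hashicorp/terraform:1.7.5'
    id: 'Terraform Apply'
    entrypoint: 'sh'
    args:
      - '-c'
      - |
        cd infra
        terraform init -input=false
        terraform apply -auto-approve -input=false \
          -var="project_id=${PROJECT_ID}" \
          -var="region=${_REGION}"
    waitFor: ['Deploy to Cloud Run']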

The complete workflow can be visualized as follows:

graph TD
    A[Developer pushes to Git] --> B{GitHub Trigger};
    B --> C[GCP Cloud Build Starts];
    C --> D(Step 1: Restore Cache from GCS);
    D --> E(Step 2: npm ci);
    E --> F(Step 3: PostCSS Build);
    F --> G(Step 5: Podman Build Image);
    F --> H(Step 4: Save Cache to GCS);
    G --> I(Step 6: Skopeo Push to Artifact Registry);
    I --> J(Step 7: gcloud run deploy);
    J --> K[Deployment Complete];
    subgraph IaC [Infrastructure as Code - Managed by Terraform]
        L(Cloud Run Service)
        M(API Gateway Config)
        N(API Gateway Instance)
    end
    J --> L;
    L <--> M;
    M <--> N;

This entire system, while more complex upfront, addresses the initial pain points directly. Build times were reduced by over 60% on average, primarily due to the effective GCS caching for both node_modules and the PostCSS cache, and the faster startup of the Podman builder compared to Docker-in-Docker. The security posture is improved by eliminating the Docker daemon. Finally, using API Gateway provides a durable, manageable entry point for our service, abstracting away the specifics of our Cloud Run deployment and preparing our architecture for future expansion.

The primary limitation of this approach is the maintenance burden of the custom builder image. It’s now a piece of internal infrastructure that must be updated and patched periodically. Furthermore, while the vfs storage driver for Podman is the most compatible, it is not the most performant; in environments where overlay is supported, performance could be further improved. Future work will involve creating an automated pipeline for building, testing, and rolling out new versions of the Podman builder itself, treating it as a first-class project artifact. We are also investigating the feasibility of using rootless Podman within the Cloud Build execution environment for an even greater security enhancement, though this presents significant challenges with user namespace mapping and permissions.

