The friction between our data science and platform engineering teams had become untenable. Data scientists, working in Python notebooks with Pandas and PyTorch, would finalize a computer vision model and essentially throw a container image over a virtual wall. The platform team would then begin a multi-day manual process: provisioning an S3 bucket for model artifacts, spinning up a GPU-enabled EKS node group, writing Kubernetes deployment manifests, and then, the most tedious part, manually configuring Kong for API exposure, rate limiting, and authentication. Every new model or version was a ticket, a series of meetings, and a potential source of human error. The lead time from a “model ready” state to a production-ready API endpoint was measured in weeks.
This process was not scalable. Our goal became to build a fully declarative, self-service platform. A data scientist should be able to define their entire stack—from cloud infrastructure to API gateway configuration—in a single YAML file and manage its lifecycle via Git. The platform’s job is to make that declaration a reality.
Our core technology choices were already in place: Kubernetes as the orchestrator, AWS as the cloud provider, and Kong as our API gateway. The missing piece was a declarative engine that could bridge the gap between a high-level application definition and the low-level resources required. We evaluated Terraform, but its stateful, CLI-driven nature didn’t fit the Kubernetes-native, control-plane model we envisioned. We needed to extend the Kubernetes API itself. This led us directly to Crossplane. Crossplane allows us to create our own custom Kubernetes APIs for infrastructure, which was the foundational concept for our new platform.
Phase 1: Defining the Platform’s API with a CompositeResourceDefinition (XRD)
The first step in building a declarative platform is to define the API contract for its users. In our case, the users are data scientists. They shouldn't need to know about AWS IAM Roles or Kong's `KongPlugin` custom resources. They should only declare their intent. We decided to create a single, powerful Custom Resource (CR) called `ModelService`.
This is what a data scientist should be able to write:
# A data scientist's request for a new CV model deployment
apiVersion: platform.acme.com/v1alpha1
kind: ModelService
metadata:
name: image-classifier-v2
namespace: data-science
spec:
# --- Model Configuration ---
image: "acme-repo.io/cv/image-classifier:v2.1.0"
modelDataPath: "models/image-classifier/resnet50-v2.1.0.pth"
# --- Infrastructure Requirements ---
compute:
instanceType: "g4dn.xlarge" # Requires GPU
replicas: 2
# --- API Gateway Configuration ---
api:
path: "/inference/image-classifier/v2"
authentication:
method: "key-auth" # Require an API key to access
To make Kubernetes understand this `ModelService` resource, we defined a `CompositeResourceDefinition` (XRD) using Crossplane. The XRD is the schema for our custom API. It defines the fields, their types, and validation rules. Note that Crossplane requires the cluster-scoped composite type to have a name distinct from the namespaced claim, so the composite is `XModelService` while data scientists work with the `ModelService` claim. A real-world project requires robust validation to prevent misconfigurations before they happen.
# xrd-modelservice.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xmodelservices.platform.acme.com
spec:
group: platform.acme.com
  names:
    # Cluster-scoped composite type; must differ from the claim names below
    kind: XModelService
    listKind: XModelServiceList
    plural: xmodelservices
    singular: xmodelservice
claimNames:
kind: ModelService
listKind: ModelServiceList
plural: modelservices
singular: modelservice
versions:
- name: v1alpha1
served: true
referenceable: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
description: "Defines the desired state of a CV Model Service."
properties:
image:
type: string
description: "The container image for the model inference server."
modelDataPath:
type: string
description: "Path within the S3 bucket to the model artifacts."
compute:
type: object
properties:
instanceType:
type: string
description: "The EC2 instance type for the Kubernetes node group."
enum: ["g4dn.xlarge", "p3.2xlarge", "c5.large"] # Enforce allowed instance types
default: "c5.large"
replicas:
type: integer
description: "Number of inference server replicas."
minimum: 1
maximum: 10
default: 1
required: ["instanceType", "replicas"]
api:
type: object
properties:
path:
type: string
description: "The public API path for the model endpoint."
pattern: "^/[a-zA-Z0-9/_-]+$"
authentication:
type: object
properties:
method:
type: string
description: "Authentication method to be enforced by Kong."
enum: ["key-auth", "none"]
default: "key-auth"
required: ["method"]
required: ["path", "authentication"]
required: ["image", "modelDataPath", "compute", "api"]
This XRD is the cornerstone of our platform. We've defined clear boundaries: data scientists specify what they need (a `g4dn.xlarge` instance), not how to provision it. The `enum` validation is critical in a production environment to control costs and ensure teams use standardized, approved instance types.
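Because the claim's schema is generated from the XRD, this enforcement happens at admission time: the API server rejects a non-conforming `ModelService` before Crossplane ever sees it. For example, this hypothetical claim would be refused outright:

```yaml
# Rejected at admission: "p2.xlarge" is not in the instanceType enum,
# so this ModelService is never persisted and nothing is provisioned.
apiVersion: platform.acme.com/v1alpha1
kind: ModelService
metadata:
  name: bad-instance-type
  namespace: data-science
spec:
  image: "acme-repo.io/cv/image-classifier:v2.1.0"
  modelDataPath: "models/image-classifier/resnet50-v2.1.0.pth"
  compute:
    instanceType: "p2.xlarge" # Not in the allowed list
    replicas: 2
  api:
    path: "/inference/image-classifier/v2"
    authentication:
      method: "key-auth"
```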
Phase 2: Implementing the Logic with a Crossplane Composition
With the API defined, we now need to implement the logic that translates a `ModelService` resource into actual infrastructure. This is done with a Crossplane `Composition`. The `Composition` is a template that maps the fields from the user's `ModelService` claim to a collection of managed resources—AWS resources, Helm charts, and raw Kubernetes objects.
A common mistake is to create a monolithic `Composition`. For maintainability, we broke ours down into logical components: storage, compute, and application deployment.
# composition-modelservice.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
name: modelservice.aws.platform.acme.com
labels:
provider: aws
spec:
compositeTypeRef:
apiVersion: platform.acme.com/v1alpha1
    kind: XModelService
# Write connection details (like the bucket name) to a secret
writeConnectionSecretsToNamespace: crossplane-system
resources:
# 1. S3 Bucket for Model Artifacts
- name: model-artifact-bucket
base:
apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
spec:
forProvider:
region: us-east-1
acl: private
versioningConfiguration:
- status: Enabled
patches:
- fromFieldPath: "metadata.name"
toFieldPath: "metadata.name"
transforms:
- type: string
string:
fmt: "model-artifacts-%s"
- fromFieldPath: "metadata.name"
toFieldPath: "spec.forProvider.tags.modelService"
# 2. IAM Role for Service Account (IRSA) for pod access to S3
- name: s3-access-role
base:
apiVersion: iam.aws.upbound.io/v1beta1
kind: Role
spec:
forProvider:
assumeRolePolicy: |
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53BF8EADF3BEA2"
},
"Action": "sts:AssumeRoleWithWebIdentity"
}
]
}
      patches:
        # Crossplane patches cannot interpolate multiple fields on their own;
        # to build the trust policy from both namespace and name we use a
        # CombineFromComposite patch with a string-format strategy.
        - type: CombineFromComposite
          combine:
            variables:
              - fromFieldPath: "metadata.namespace"
              - fromFieldPath: "metadata.name" # We tie the ServiceAccount name to the ModelService name
            strategy: string
            string:
              fmt: '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Federated":"arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53BF8EADF3BEA2"},"Action":"sts:AssumeRoleWithWebIdentity","Condition":{"StringEquals":{"oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53BF8EADF3BEA2:sub":"system:serviceaccount:%s:%s"}}}]}'
          toFieldPath: "spec.forProvider.assumeRolePolicy"
# 3. Kubernetes Objects (Deployment, Service, ServiceAccount, Kong Ingress)
- name: kubernetes-application
base:
        # The Object kind from provider-kubernetes wraps arbitrary manifests
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
spec:
forProvider:
manifest:
apiVersion: v1
kind: List
items:
# --- ServiceAccount ---
- apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
# This annotation links the K8s SA to the AWS IAM Role
eks.amazonaws.com/role-arn: "" # This will be patched
# --- Deployment ---
- apiVersion: apps/v1
kind: Deployment
spec:
replicas: 1
selector:
matchLabels: {}
template:
metadata:
labels: {}
spec:
serviceAccountName: "" # Patched
nodeSelector:
node.kubernetes.io/instance-type: "" # Patched
containers:
- name: inference-server
image: "" # Patched
ports:
- containerPort: 8000
env:
- name: S3_BUCKET_NAME
valueFrom:
secretKeyRef:
name: "" # Patched from connection secret
key: "bucketName"
- name: MODEL_PATH
value: "" # Patched
# --- Service ---
- apiVersion: v1
kind: Service
spec:
selector: {}
ports:
- protocol: TCP
port: 80
targetPort: 8000
# --- Kong Ingress ---
- apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: model-service-ingress
annotations:
konghq.com/strip-path: "true"
spec:
ingressClassName: kong
rules:
- http:
paths:
- pathType: Prefix
path: "" # Patched
backend:
service:
name: "" # Patched
port:
number: 80
patches:
# Patching all the dynamic values into the Kubernetes manifest templates...
# A few examples:
- fromFieldPath: "spec.image"
toFieldPath: "spec.forProvider.manifest.items[1].spec.template.spec.containers[0].image"
- fromFieldPath: "spec.compute.replicas"
toFieldPath: "spec.forProvider.manifest.items[1].spec.replicas"
- fromFieldPath: "spec.compute.instanceType"
toFieldPath: "spec.forProvider.manifest.items[1].spec.template.spec.nodeSelector[node.kubernetes.io/instance-type]"
- fromFieldPath: "spec.api.path"
toFieldPath: "spec.forProvider.manifest.items[3].spec.rules[0].http.paths[0].path"
# Patch to add Kong plugin if requested
- fromFieldPath: "spec.api.authentication.method"
toFieldPath: "spec.forProvider.manifest.items[3].metadata.annotations[konghq.com/plugins]"
          transforms:
            - type: map
              map:
                # Map both allowed enum values; "none" resolves to an empty
                # annotation value, so no plugin is applied to the Ingress.
                "key-auth": "key-auth-plugin" # Name of a pre-existing KongPlugin resource
                "none": ""
The pitfall here is the complexity of patching. Crossplane's patching is powerful but can become unwieldy. We spent significant time debugging path expressions and using transforms correctly. In a real-world project, keeping `Composition`s clean and well-documented is paramount for long-term maintenance. The map-based patch on the Kong plugin annotation is a key pattern: it lets us build optional features into the platform, controlled entirely by values in the user's CR.
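For completeness, the `key-auth-plugin` value above refers to a `KongPlugin` resource the platform team manages separately. A minimal sketch of what ours looks like; the resource name and the `key_names` setting are our conventions, not Kong requirements:

```yaml
# The KongPlugin that the konghq.com/plugins annotation references by name.
# It must live in the same namespace as the Ingress objects that use it.
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: key-auth-plugin
  namespace: data-science
plugin: key-auth
config:
  key_names: ["apikey"] # Header clients must send with their API key
```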
Phase 3: The Production-Grade Inference Server
The platform is useless without a robust application to run on it. We provided our data science teams with a template for a production-ready inference server using FastAPI. It’s not just a “hello world” script; it includes structured logging, error handling, and uses Pandas for input validation and preprocessing, a common pattern in data science workflows.
# main.py - Inference Server
import os
import logging
import pandas as pd
import torch
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel, ValidationError
from typing import List
import boto3
# --- Configuration & Logging ---
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
# --- Environment Variable Loading ---
# A common mistake is hardcoding values. Always pull from the environment.
S3_BUCKET = os.getenv("S3_BUCKET_NAME")
MODEL_PATH = os.getenv("MODEL_PATH")
LOCAL_MODEL_FILE = "/app/model.pth"
if not all([S3_BUCKET, MODEL_PATH]):
logger.error("Missing required environment variables: S3_BUCKET_NAME, MODEL_PATH")
raise SystemExit("Configuration error.")
# --- Model Loading ---
def download_model_from_s3(bucket: str, key: str, destination: str):
"""Downloads the model artifact from S3 if it doesn't exist locally."""
if os.path.exists(destination):
logger.info(f"Model {destination} already exists locally.")
return
try:
logger.info(f"Downloading model s3://{bucket}/{key} to {destination}...")
s3_client = boto3.client("s3")
s3_client.download_file(bucket, key, destination)
logger.info("Model download complete.")
except Exception as e:
logger.critical(f"Failed to download model from S3: {e}")
raise
# A global variable for the model is acceptable for read-only inference servers.
# It avoids reloading the model on every request.
model = None
def load_model():
"""Load the model into memory."""
global model
try:
download_model_from_s3(S3_BUCKET, MODEL_PATH, LOCAL_MODEL_FILE)
# Here you would load your specific model, e.g., a PyTorch CV model
# model = torch.load(LOCAL_MODEL_FILE)
# model.eval()
# For this example, we'll mock it
model = {"name": "mock-resnet50", "version": "2.1.0"}
logger.info(f"Successfully loaded model: {model['name']} v{model['version']}")
except Exception as e:
logger.critical(f"Could not load model: {e}")
raise SystemExit("Model loading failure.")
# --- API Definition ---
app = FastAPI()
class InferenceRequest(BaseModel):
image_url: str
request_id: str
class InferenceResponse(BaseModel):
request_id: str
class_label: str
confidence: float
@app.on_event("startup")
async def startup_event():
"""On application startup, load the model."""
load_model()
@app.post("/predict", response_model=InferenceResponse)
async def predict(request: Request):
"""Main prediction endpoint."""
try:
json_body = await request.json()
# Using Pandas for robust validation and preprocessing is a common pattern.
# It handles missing data, type conversion, etc., more gracefully than manual checks.
df = pd.json_normalize([json_body])
df.rename(columns={'image_url': 'ImageUrl', 'request_id': 'RequestId'}, inplace=True)
# Validate required columns
if not all(col in df.columns for col in ['ImageUrl', 'RequestId']):
raise ValueError("Missing 'image_url' or 'request_id' in request.")
# In a real CV model, you'd fetch the image from df['ImageUrl'].iloc[0],
# preprocess it, and run inference.
# result = model(preprocessed_image)
logger.info(f"Processing prediction for request_id: {df['RequestId'].iloc[0]}")
# Mocked response
return InferenceResponse(
request_id=df['RequestId'].iloc[0],
class_label="cat",
confidence=0.98,
)
    except ValidationError as e:
        logger.warning(f"Validation error: {e.errors()}")
        raise HTTPException(status_code=422, detail=e.errors())
    except ValueError as e:
        # Missing/malformed fields are a client error; without this handler
        # the generic catch below would misreport them as a 500.
        logger.warning(f"Bad request: {e}")
        raise HTTPException(status_code=422, detail=str(e))
except Exception as e:
logger.error(f"Internal server error: {e}", exc_info=True)
raise HTTPException(status_code=500, detail="Internal server error.")
@app.get("/healthz")
def health_check():
"""Health check endpoint for Kubernetes probes."""
if model is None:
raise HTTPException(status_code=503, detail="Model not loaded.")
return {"status": "ok"}
This server code is packaged into a container with a simple `Dockerfile`. The key is that it's configured entirely through environment variables, which are injected by Crossplane via the Kubernetes `Deployment` manifest.
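One detail worth calling out: the `/healthz` endpoint exists for the probes in the `Deployment` template. A sketch of the container-spec excerpt we patch in; the timing thresholds shown are illustrative:

```yaml
# Excerpt from the inference-server container spec in the Composition.
# The readiness probe keeps traffic away from a pod until the model has
# been downloaded from S3 and loaded into memory (/healthz returns 503
# until then), while the liveness probe restarts a wedged server.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 15
```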
Phase 4: The Complete GitOps Workflow
The final piece is tying everything together in a GitOps loop. We use ArgoCD, which monitors a Git repository for changes.
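On the ArgoCD side, a single `Application` watches the directory of `ModelService` claims. A minimal sketch, with a placeholder repository URL:

```yaml
# ArgoCD Application syncing every merged ModelService YAML into the cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: model-services
  namespace: argocd
spec:
  project: default
  source:
    repoURL: "https://git.acme.com/platform/model-services.git" # placeholder
    targetRevision: main
    path: .
  destination:
    server: https://kubernetes.default.svc
    namespace: data-science
  syncPolicy:
    automated:
      prune: true    # Deleting a claim from Git tears down the stack
      selfHeal: true # Drift in the cluster is reverted to Git state
```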
Here is the flow we achieved:
```mermaid
graph TD
    A[Data Scientist] -->|1. git push| B(Git Repo: model-services/)
    B -- "model-classifier-v2.yaml" --> C{ArgoCD}
    C -->|2. Detects Change & Syncs| D[K8s API Server]
    D -->|3. Creates ModelService CR| E[Crossplane]
    E -->|4. Reconciles Composition| F(AWS Provider)
    F -->|5. Provisions| G[AWS S3 Bucket]
    E -->|6. Reconciles Composition| H(Kubernetes Provider)
    H -->|7. Creates| I["Deployment, Service, Ingress"]
    I --> J[K8s Scheduler]
    J -->|8. Schedules Pod| K[GPU Node]
    K -- "Pod Starts" --> G
    subgraph Kong API Gateway
        L[Kong Ingress Controller]
        M[Kong Proxy]
    end
    L -->|9. Watches Ingress & Configures| M
    I --> L
    O[End User] -->|10. API Call: /inference/image-classifier/v2| M
    M -->|11. Proxies to Pod| K
```
A data scientist wanting to deploy a new model now performs these steps:
- Finalizes their model and inference server code, pushing a new container image `acme-repo.io/cv/image-classifier:v2.1.0`.
- Uploads the model artifact `resnet50-v2.1.0.pth` to a staging S3 location.
- Creates a simple `ModelService` YAML file in a designated Git repository.
- Submits a pull request.
Once the PR is approved and merged, the automation takes over. Within minutes, Crossplane provisions the dedicated S3 bucket, copies the artifact (a step handled by a small utility Job we also deploy; see the sketch below), configures the IAM permissions, and deploys the application to the cluster. Simultaneously, the Kong Ingress Controller sees the new `Ingress` object created by Crossplane and configures Kong to route traffic to the new service, applying the specified `key-auth` plugin. The entire end-to-end process is now self-service, auditable via Git history, and dramatically faster.
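The utility Job is intentionally simple: conceptually just an `aws s3 cp` from the staging location into the bucket Crossplane provisioned. A hedged sketch; the staging bucket, image tag, and resource names are illustrative:

```yaml
# One-shot Job that copies the model artifact from the staging bucket
# into the per-model bucket created by the Composition. It reuses the
# IRSA-annotated ServiceAccount so it can read/write the buckets.
apiVersion: batch/v1
kind: Job
metadata:
  name: copy-image-classifier-v2-artifact
  namespace: data-science
spec:
  backoffLimit: 3
  template:
    spec:
      serviceAccountName: image-classifier-v2 # SA created by the Composition
      restartPolicy: Never
      containers:
        - name: copy-artifact
          image: amazon/aws-cli:2.15.0 # illustrative tag
          args:
            - s3
            - cp
            - s3://acme-ml-staging/models/image-classifier/resnet50-v2.1.0.pth # staging (placeholder)
            - s3://model-artifacts-image-classifier-v2/models/image-classifier/resnet50-v2.1.0.pth
```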
The final state for the data scientist is the simple declaration they committed. They don't interact with AWS consoles or `kubectl`. They have a production-ready, secure API endpoint, and the platform team has a scalable, maintainable system.
The true power of this model is its composability. The platform is not rigid. If a new requirement emerges—for instance, a need for Redis caching—we don't need to rewrite the entire process. We can simply add a `cache` section to the `ModelService` XRD and update the `Composition` to provision an AWS ElastiCache instance, patching its connection details into the `Deployment`'s environment variables. The platform API evolves without breaking existing users.
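To make that concrete, an evolved claim might look like the following; the `cache` block is hypothetical, and its field names would be ours to define when we extend the XRD:

```yaml
# Hypothetical future ModelService with an opt-in cache section.
apiVersion: platform.acme.com/v1alpha1
kind: ModelService
metadata:
  name: image-classifier-v2
  namespace: data-science
spec:
  image: "acme-repo.io/cv/image-classifier:v2.1.0"
  modelDataPath: "models/image-classifier/resnet50-v2.1.0.pth"
  compute:
    instanceType: "g4dn.xlarge"
    replicas: 2
  api:
    path: "/inference/image-classifier/v2"
    authentication:
      method: "key-auth"
  cache:                       # new optional section (illustrative)
    engine: redis
    nodeType: "cache.t3.small" # would be enum-validated like instanceType
```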
However, this solution is not without its own complexities and trade-offs. The learning curve for Crossplane, particularly for writing and debugging complex `Composition`s with patches and transforms, is steep. The reconciliation loops, while robust, introduce a level of eventual consistency that can be confusing; a failed AWS resource provisioning might not be immediately obvious to the data scientist, who only sees their `ModelService` CR. Furthermore, while we've abstracted away the infrastructure, we've also created a powerful but potentially opaque abstraction layer. When things go wrong, debugging requires expertise in every part of the stack, from the `ModelService` CR down to the cloud provider's API calls. The next iteration of this platform must focus on providing better observability and clearer feedback loops for its users, likely by building a custom controller that updates the `status` subresource of our `ModelService` CR with rich, human-readable information about the provisioning process.
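To illustrate the goal, here is a hypothetical shape for that status; none of these fields exist yet, and we would define them when building the controller:

```yaml
# Hypothetical status a future controller could write back to the claim,
# so data scientists see provisioning progress without touching AWS.
status:
  phase: Provisioning
  conditions:
    - type: BucketReady
      status: "True"
      reason: BucketCreated
    - type: ComputeReady
      status: "False"
      reason: NodeGroupScaling
      message: "Waiting for g4dn.xlarge capacity in us-east-1"
  endpoint: "" # populated with the public Kong route once the Ingress is live
```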