The friction between our data science and platform engineering teams had become untenable. Data scientists, working in Python notebooks with Pandas and PyTorch, would finalize a computer vision model and essentially throw a container image over a virtual wall. The platform team would then begin a multi-day manual process: provisioning an S3 bucket for model artifacts, spinning up a GPU-enabled EKS node group, writing Kubernetes deployment manifests, and then, the most tedious part, manually configuring Kong for API exposure, rate limiting, and authentication. Every new model or version was a ticket, a series of meetings, and a potential source of human error. The lead time from a “model ready” state to a production-ready API endpoint was measured in weeks.
This process was not scalable. Our goal became to build a fully declarative, self-service platform. A data scientist should be able to define their entire stack—from cloud infrastructure to API gateway configuration—in a single YAML file and manage its lifecycle via Git. The platform’s job is to make that declaration a reality.
Our core technology choices were already in place: Kubernetes as the orchestrator, AWS as the cloud provider, and Kong as our API gateway. The missing piece was a declarative engine that could bridge the gap between a high-level application definition and the low-level resources required. We evaluated Terraform, but its stateful, CLI-driven nature didn’t fit the Kubernetes-native, control-plane model we envisioned. We needed to extend the Kubernetes API itself. This led us directly to Crossplane. Crossplane allows us to create our own custom Kubernetes APIs for infrastructure, which was the foundational concept for our new platform.
Phase 1: Defining the Platform’s API with a CompositeResourceDefinition (XRD)
The first step in building a declarative platform is to define the API contract for its users. In our case, the users are data scientists. They shouldn't need to know about AWS IAM Roles or Kong's `KongPlugin` custom resources. They should only declare their intent. We decided to create a single, powerful Custom Resource (CR) called `ModelService`.
This is what a data scientist should be able to write:
# A data scientist's request for a new CV model deployment
apiVersion: platform.acme.com/v1alpha1
kind: ModelService
metadata:
name: image-classifier-v2
namespace: data-science
spec:
# --- Model Configuration ---
image: "acme-repo.io/cv/image-classifier:v2.1.0"
modelDataPath: "models/image-classifier/resnet50-v2.1.0.pth"
# --- Infrastructure Requirements ---
compute:
instanceType: "g4dn.xlarge" # Requires GPU
replicas: 2
# --- API Gateway Configuration ---
api:
path: "/inference/image-classifier/v2"
authentication:
method: "key-auth" # Require an API key to access
To make Kubernetes understand this `ModelService` resource, we defined a `CompositeResourceDefinition` (XRD) using Crossplane. The XRD is the schema for our custom API. It defines the fields, their types, and validation rules. Note that Crossplane requires the cluster-scoped composite type to have a name distinct from the namespaced claim, so the composite is `XModelService` while data scientists work with the `ModelService` claim. A real-world project requires robust validation to prevent misconfigurations before they happen.
# xrd-modelservice.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xmodelservices.platform.acme.com
spec:
group: platform.acme.com
  names:
    # Cluster-scoped composite type; must differ from the claim names below
    kind: XModelService
    listKind: XModelServiceList
    plural: xmodelservices
    singular: xmodelservice
claimNames:
kind: ModelService
listKind: ModelServiceList
plural: modelservices
singular: modelservice
versions:
- name: v1alpha1
served: true
referenceable: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
description: "Defines the desired state of a CV Model Service."
properties:
image:
type: string
description: "The container image for the model inference server."
modelDataPath:
type: string
description: "Path within the S3 bucket to the model artifacts."
compute:
type: object
properties:
instanceType:
type: string
description: "The EC2 instance type for the Kubernetes node group."
enum: ["g4dn.xlarge", "p3.2xlarge", "c5.large"] # Enforce allowed instance types
default: "c5.large"
replicas:
type: integer
description: "Number of inference server replicas."
minimum: 1
maximum: 10
default: 1
required: ["instanceType", "replicas"]
api:
type: object
properties:
path:
type: string
description: "The public API path for the model endpoint."
pattern: "^/[a-zA-Z0-9/_-]+$"
authentication:
type: object
properties:
method:
type: string
description: "Authentication method to be enforced by Kong."
enum: ["key-auth", "none"]
default: "key-auth"
required: ["method"]
required: ["path", "authentication"]
required: ["image", "modelDataPath", "compute", "api"]
This XRD is the cornerstone of our platform. We've defined clear boundaries: data scientists specify what they need (a `g4dn.xlarge` instance), not how to provision it. The `enum` validation is critical in a production environment to control costs and ensure teams use standardized, approved instance types.
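Because the claim's schema is generated from the XRD, this enforcement happens at admission time: the API server rejects a non-conforming `ModelService` before Crossplane ever sees it. For example, this hypothetical claim would be refused outright:

```yaml
# Rejected at admission: "p2.xlarge" is not in the instanceType enum,
# so this ModelService is never persisted and nothing is provisioned.
apiVersion: platform.acme.com/v1alpha1
kind: ModelService
metadata:
  name: bad-instance-type
  namespace: data-science
spec:
  image: "acme-repo.io/cv/image-classifier:v2.1.0"
  modelDataPath: "models/image-classifier/resnet50-v2.1.0.pth"
  compute:
    instanceType: "p2.xlarge" # Not in the allowed list
    replicas: 2
  api:
    path: "/inference/image-classifier/v2"
    authentication:
      method: "key-auth"
```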
Phase 2: Implementing the Logic with a Crossplane Composition
With the API defined, we now need to implement the logic that translates a `ModelService` resource into actual infrastructure. This is done with a Crossplane `Composition`. The `Composition` is a template that maps the fields from the user's `ModelService` claim to a collection of managed resources—AWS resources, Helm charts, and raw Kubernetes objects.
A common mistake is to create a monolithic `Composition`. For maintainability, we broke ours down into logical components: storage, compute, and application deployment.
# composition-modelservice.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
name: modelservice.aws.platform.acme.com
labels:
provider: aws
spec:
compositeTypeRef:
apiVersion: platform.acme.com/v1alpha1
    kind: XModelService
# Write connection details (like the bucket name) to a secret
writeConnectionSecretsToNamespace: crossplane-system
resources:
# 1. S3 Bucket for Model Artifacts
- name: model-artifact-bucket
base:
apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
spec:
forProvider:
region: us-east-1
acl: private
versioningConfiguration:
- status: Enabled
patches:
- fromFieldPath: "metadata.name"
toFieldPath: "metadata.name"
transforms:
- type: string
string:
fmt: "model-artifacts-%s"
- fromFieldPath: "metadata.name"
toFieldPath: "spec.forProvider.tags.modelService"
# 2. IAM Role for Service Account (IRSA) for pod access to S3
- name: s3-access-role
base:
apiVersion: iam.aws.upbound.io/v1beta1
kind: Role
spec:
forProvider:
assumeRolePolicy: |
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53BF8EADF3BEA2"
},
"Action": "sts:AssumeRoleWithWebIdentity"
}
]
}
      patches:
        # Crossplane patches cannot interpolate multiple fields on their own;
        # to build the trust policy from both namespace and name we use a
        # CombineFromComposite patch with a string-format strategy.
        - type: CombineFromComposite
          combine:
            variables:
              - fromFieldPath: "metadata.namespace"
              - fromFieldPath: "metadata.name" # We tie the ServiceAccount name to the ModelService name
            strategy: string
            string:
              fmt: '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Federated":"arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53BF8EADF3BEA2"},"Action":"sts:AssumeRoleWithWebIdentity","Condition":{"StringEquals":{"oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53BF8EADF3BEA2:sub":"system:serviceaccount:%s:%s"}}}]}'
          toFieldPath: "spec.forProvider.assumeRolePolicy"
# 3. Kubernetes Objects (Deployment, Service, ServiceAccount, Kong Ingress)
- name: kubernetes-application
base:
        # The Object kind from provider-kubernetes wraps arbitrary manifests
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
spec:
forProvider:
manifest:
apiVersion: v1
kind: List
items:
# --- ServiceAccount ---
- apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
# This annotation links the K8s SA to the AWS IAM Role
eks.amazonaws.com/role-arn: "" # This will be patched
# --- Deployment ---
- apiVersion: apps/v1
kind: Deployment
spec:
replicas: 1
selector:
matchLabels: {}
template:
metadata:
labels: {}
spec:
serviceAccountName: "" # Patched
nodeSelector:
node.kubernetes.io/instance-type: "" # Patched
containers:
- name: inference-server
image: "" # Patched
ports:
- containerPort: 8000
env:
- name: S3_BUCKET_NAME
valueFrom:
secretKeyRef:
name: "" # Patched from connection secret
key: "bucketName"
- name: MODEL_PATH
value: "" # Patched
# --- Service ---
- apiVersion: v1
kind: Service
spec:
selector: {}
ports:
- protocol: TCP
port: 80
targetPort: 8000
# --- Kong Ingress ---
- apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: model-service-ingress
annotations:
konghq.com/strip-path: "true"
spec:
ingressClassName: kong
rules:
- http:
paths:
- pathType: Prefix
path: "" # Patched
backend:
service:
name: "" # Patched
port:
number: 80
patches:
# Patching all the dynamic values into the Kubernetes manifest templates...
# A few examples:
- fromFieldPath: "spec.image"
toFieldPath: "spec.forProvider.manifest.items[1].spec.template.spec.containers[0].image"
- fromFieldPath: "spec.compute.replicas"
toFieldPath: "spec.forProvider.manifest.items[1].spec.replicas"
- fromFieldPath: "spec.compute.instanceType"
toFieldPath: "spec.forProvider.manifest.items[1].spec.template.spec.nodeSelector[node.kubernetes.io/instance-type]"
- fromFieldPath: "spec.api.path"
toFieldPath: "spec.forProvider.manifest.items[3].spec.rules[0].http.paths[0].path"
# Patch to add Kong plugin if requested
- fromFieldPath: "spec.api.authentication.method"
toFieldPath: "spec.forProvider.manifest.items[3].metadata.annotations[konghq.com/plugins]"
          transforms:
            - type: map
              map:
                # Map both allowed enum values; "none" resolves to an empty
                # annotation value, so no plugin is applied to the Ingress.
                "key-auth": "key-auth-plugin" # Name of a pre-existing KongPlugin resource
                "none": ""
The pitfall here is the complexity of patching. Crossplane's patching is powerful but can become unwieldy. We spent significant time debugging path expressions and using transforms correctly. In a real-world project, keeping `Composition`s clean and well-documented is paramount for long-term maintenance. The map-based patch on the Kong plugin annotation is a key pattern: it lets us build optional features into the platform, controlled entirely by values in the user's CR.
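For completeness, the `key-auth-plugin` value above refers to a `KongPlugin` resource the platform team manages separately. A minimal sketch of what ours looks like; the resource name and the `key_names` setting are our conventions, not Kong requirements:

```yaml
# The KongPlugin that the konghq.com/plugins annotation references by name.
# It must live in the same namespace as the Ingress objects that use it.
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: key-auth-plugin
  namespace: data-science
plugin: key-auth
config:
  key_names: ["apikey"] # Header clients must send with their API key
```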
Phase 3: The Production-Grade Inference Server
The platform is useless without a robust application to run on it. We provided our data science teams with a template for a production-ready inference server using FastAPI. It’s not just a “hello world” script; it includes structured logging, error handling, and uses Pandas for input validation and preprocessing, a common pattern in data science workflows.
# main.py - Inference Server
import os
import logging
import pandas as pd
import torch
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel, ValidationError
from typing import List
import boto3
# --- Configuration & Logging ---
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
# --- Environment Variable Loading ---
# A common mistake is hardcoding values. Always pull from the environment.
S3_BUCKET = os.getenv("S3_BUCKET_NAME")
MODEL_PATH = os.getenv("MODEL_PATH")
LOCAL_MODEL_FILE = "/app/model.pth"
if not all([S3_BUCKET, MODEL_PATH]):
logger.error("Missing required environment variables: S3_BUCKET_NAME, MODEL_PATH")
raise SystemExit("Configuration error.")
# --- Model Loading ---
def download_model_from_s3(bucket: str, key: str, destination: str):
"""Downloads the model artifact from S3 if it doesn't exist locally."""
if os.path.exists(destination):
logger.info(f"Model {destination} already exists locally.")
return
try:
logger.info(f"Downloading model s3://{bucket}/{key} to {destination}...")
s3_client = boto3.client("s3")
s3_client.download_file(bucket, key, destination)
logger.info("Model download complete.")
except Exception as e:
logger.critical(f"Failed to download model from S3: {e}")
raise
# A global variable for the model is acceptable for read-only inference servers.
# It avoids reloading the model on every request.
model = None
def load_model():
"""Load the model into memory."""
global model
try:
download_model_from_s3(S3_BUCKET, MODEL_PATH, LOCAL_MODEL_FILE)
# Here you would load your specific model, e.g., a PyTorch CV model
# model = torch.load(LOCAL_MODEL_FILE)
# model.eval()
# For this example, we'll mock it
model = {"name": "mock-resnet50", "version": "2.1.0"}
logger.info(f"Successfully loaded model: {model['name']} v{model['version']}")
except Exception as e:
logger.critical(f"Could not load model: {e}")
raise SystemExit("Model loading failure.")
# --- API Definition ---
app = FastAPI()
class InferenceRequest(BaseModel):
image_url: str
request_id: str
class InferenceResponse(BaseModel):
request_id: str
class_label: str
confidence: float
@app.on_event("startup")
async def startup_event():
"""On application startup, load the model."""
load_model()
@app.post("/predict", response_model=InferenceResponse)
async def predict(request: Request):
"""Main prediction endpoint."""
try:
json_body = await request.json()
# Using Pandas for robust validation and preprocessing is a common pattern.
# It handles missing data, type conversion, etc., more gracefully than manual checks.
df = pd.json_normalize([json_body])
df.rename(columns={'image_url': 'ImageUrl', 'request_id': 'RequestId'}, inplace=True)
# Validate required columns
if not all(col in df.columns for col in ['ImageUrl', 'RequestId']):
raise ValueError("Missing 'image_url' or 'request_id' in request.")
# In a real CV model, you'd fetch the image from df['ImageUrl'].iloc[0],
# preprocess it, and run inference.
# result = model(preprocessed_image)
logger.info(f"Processing prediction for request_id: {df['RequestId'].iloc[0]}")
# Mocked response
return InferenceResponse(
request_id=df['RequestId'].iloc[0],
class_label="cat",
confidence=0.98,
)
    except ValidationError as e:
        logger.warning(f"Validation error: {e.errors()}")
        raise HTTPException(status_code=422, detail=e.errors())
    except ValueError as e:
        # Missing/malformed fields are a client error; without this handler
        # the generic catch below would misreport them as a 500.
        logger.warning(f"Bad request: {e}")
        raise HTTPException(status_code=422, detail=str(e))
except Exception as e:
logger.error(f"Internal server error: {e}", exc_info=True)
raise HTTPException(status_code=500, detail="Internal server error.")
@app.get("/healthz")
def health_check():
"""Health check endpoint for Kubernetes probes."""
if model is None:
raise HTTPException(status_code=503, detail="Model not loaded.")
return {"status": "ok"}
This server code is packaged into a container with a simple `Dockerfile`. The key is that it's configured entirely through environment variables, which are injected by Crossplane via the Kubernetes `Deployment` manifest.
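One detail worth calling out: the `/healthz` endpoint exists for the probes in the `Deployment` template. A sketch of the container-spec excerpt we patch in; the timing thresholds shown are illustrative:

```yaml
# Excerpt from the inference-server container spec in the Composition.
# The readiness probe keeps traffic away from a pod until the model has
# been downloaded from S3 and loaded into memory (/healthz returns 503
# until then), while the liveness probe restarts a wedged server.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 15
```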
Phase 4: The Complete GitOps Workflow
The final piece is tying everything together in a GitOps loop. We use ArgoCD, which monitors a Git repository for changes.
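On the ArgoCD side, a single `Application` watches the directory of `ModelService` claims. A minimal sketch, with a placeholder repository URL:

```yaml
# ArgoCD Application syncing every merged ModelService YAML into the cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: model-services
  namespace: argocd
spec:
  project: default
  source:
    repoURL: "https://git.acme.com/platform/model-services.git" # placeholder
    targetRevision: main
    path: .
  destination:
    server: https://kubernetes.default.svc
    namespace: data-science
  syncPolicy:
    automated:
      prune: true    # Deleting a claim from Git tears down the stack
      selfHeal: true # Drift in the cluster is reverted to Git state
```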
Here is the flow we achieved:
```mermaid
graph TD
    A[Data Scientist] -->|1. git push| B(Git Repo: model-services/)
    B -- "model-classifier-v2.yaml" --> C{ArgoCD}
    C -->|2. Detects Change & Syncs| D[K8s API Server]
    D -->|3. Creates ModelService CR| E[Crossplane]
    E -->|4. Reconciles Composition| F(AWS Provider)
    F -->|5. Provisions| G[AWS S3 Bucket]
    E -->|6. Reconciles Composition| H(Kubernetes Provider)
    H -->|7. Creates| I["Deployment, Service, Ingress"]
    I --> J[K8s Scheduler]
    J -->|8. Schedules Pod| K[GPU Node]
    K -- "Pod Starts" --> G
    subgraph Kong API Gateway
        L[Kong Ingress Controller]
        M[Kong Proxy]
    end
    L -->|9. Watches Ingress & Configures| M
    I --> L
    O[End User] -->|10. API Call: /inference/image-classifier/v2| M
    M -->|11. Proxies to Pod| K
```
A data scientist wanting to deploy a new model now performs these steps:
- Finalizes their model and inference server code, pushing a new container image `acme-repo.io/cv/image-classifier:v2.1.0`.
- Uploads the model artifact `resnet50-v2.1.0.pth` to a staging S3 location.
- Creates a simple `ModelService` YAML file in a designated Git repository.
- Submits a pull request.
Once the PR is approved and merged, the automation takes over. Within minutes, Crossplane provisions the dedicated S3 bucket, copies the artifact (a step handled by a small utility Job we also deploy; see the sketch below), configures the IAM permissions, and deploys the application to the cluster. Simultaneously, the Kong Ingress Controller sees the new `Ingress` object created by Crossplane and configures Kong to route traffic to the new service, applying the specified `key-auth` plugin. The entire end-to-end process is now self-service, auditable via Git history, and dramatically faster.
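The utility Job is intentionally simple: conceptually just an `aws s3 cp` from the staging location into the bucket Crossplane provisioned. A hedged sketch; the staging bucket, image tag, and resource names are illustrative:

```yaml
# One-shot Job that copies the model artifact from the staging bucket
# into the per-model bucket created by the Composition. It reuses the
# IRSA-annotated ServiceAccount so it can read/write the buckets.
apiVersion: batch/v1
kind: Job
metadata:
  name: copy-image-classifier-v2-artifact
  namespace: data-science
spec:
  backoffLimit: 3
  template:
    spec:
      serviceAccountName: image-classifier-v2 # SA created by the Composition
      restartPolicy: Never
      containers:
        - name: copy-artifact
          image: amazon/aws-cli:2.15.0 # illustrative tag
          args:
            - s3
            - cp
            - s3://acme-ml-staging/models/image-classifier/resnet50-v2.1.0.pth # staging (placeholder)
            - s3://model-artifacts-image-classifier-v2/models/image-classifier/resnet50-v2.1.0.pth
```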
The final state for the data scientist is the simple declaration they committed. They don't interact with AWS consoles or `kubectl`. They have a production-ready, secure API endpoint, and the platform team has a scalable, maintainable system.
The true power of this model is its composability. The platform is not rigid. If a new requirement emerges—for instance, a need for Redis caching—we don't need to rewrite the entire process. We can simply add a `cache` section to the `ModelService` XRD and update the `Composition` to provision an AWS ElastiCache instance, patching its connection details into the `Deployment`'s environment variables. The platform API evolves without breaking existing users.
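To make that concrete, an evolved claim might look like the following; the `cache` block is hypothetical, and its field names would be ours to define when we extend the XRD:

```yaml
# Hypothetical future ModelService with an opt-in cache section.
apiVersion: platform.acme.com/v1alpha1
kind: ModelService
metadata:
  name: image-classifier-v2
  namespace: data-science
spec:
  image: "acme-repo.io/cv/image-classifier:v2.1.0"
  modelDataPath: "models/image-classifier/resnet50-v2.1.0.pth"
  compute:
    instanceType: "g4dn.xlarge"
    replicas: 2
  api:
    path: "/inference/image-classifier/v2"
    authentication:
      method: "key-auth"
  cache:                       # new optional section (illustrative)
    engine: redis
    nodeType: "cache.t3.small" # would be enum-validated like instanceType
```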
However, this solution is not without its own complexities and trade-offs. The learning curve for Crossplane, particularly for writing and debugging complex `Composition`s with patches and transforms, is steep. The reconciliation loops, while robust, introduce a level of eventual consistency that can be confusing; a failed AWS resource provisioning might not be immediately obvious to the data scientist, who only sees their `ModelService` CR. Furthermore, while we've abstracted away the infrastructure, we've also created a powerful but potentially opaque abstraction layer. When things go wrong, debugging requires expertise in every part of the stack, from the `ModelService` CR down to the cloud provider's API calls. The next iteration of this platform must focus on providing better observability and clearer feedback loops for its users, likely by building a custom controller that updates the `status` subresource of our `ModelService` CR with rich, human-readable information about the provisioning process.
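To illustrate the goal, here is a hypothetical shape for that status; none of these fields exist yet, and we would define them when building the controller:

```yaml
# Hypothetical status a future controller could write back to the claim,
# so data scientists see provisioning progress without touching AWS.
status:
  phase: Provisioning
  conditions:
    - type: BucketReady
      status: "True"
      reason: BucketCreated
    - type: ComputeReady
      status: "False"
      reason: NodeGroupScaling
      message: "Waiting for g4dn.xlarge capacity in us-east-1"
  endpoint: "" # populated with the public Kong route once the Ingress is live
```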