The feedback loop for updating on-device machine learning models is fundamentally broken in most mobile development cycles. A newly trained, higher-accuracy model is ready, but deploying it requires a full application update, submission to the App Store or Play Store, and a lengthy review process. This latency is unacceptable for rapidly evolving models. We needed a system to decouple the model update cycle from the application release cycle, allowing us to push new models to our mobile clients dynamically, securely, and with full visibility into the process.
Our initial architecture concept was grounded in GitOps principles: a Git repository would serve as the single source of truth for our models. Any commit to the main branch should trigger an automated pipeline that validates the model, versions it, and deploys it to an endpoint accessible by our mobile apps. Critically, we also needed a lightweight internal dashboard to track the currently active model, its performance metrics, and its history. A full-blown microservices architecture felt like over-engineering for this specific problem. We settled on a lean stack: GitHub Actions for CI/CD, Caddy as a high-performance file server and reverse proxy, and a Next.js frontend using Incremental Static Regeneration (ISR) for the dashboard.
This combination seemed promising. Caddy could provide automatic HTTPS for our model download endpoint out-of-the-box. ISR would give us a dashboard that felt instantaneous to load—as it’s served statically—but could update itself in the background moments after a new model was deployed. The entire system would be automated, triggered by a simple git push.
The first piece of the puzzle was the CI pipeline itself. We established a repository to store our model-building scripts and the model itself. The pipeline, defined in a GitHub Actions workflow, needed to perform several key steps: check out the code, run a validation script to ensure the model meets our performance and accuracy thresholds, package the model artifact (model.tflite), and generate a corresponding metadata.json file.
Here’s the initial structure of our GitHub Actions workflow.
# .github/workflows/deploy_model.yml
name: Deploy Mobile ML Model

on:
  push:
    branches:
      - main

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
          cache: 'pip'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Validate model
        id: validation
        run: |
          # This script would perform real model validation.
          # For this example, it simulates validation and generates artifacts.
          # In a real-world project, this script is the core of your MLOps logic.
          # It would check for accuracy degradation, inference speed, model size, etc.
          python scripts/validate_and_package.py

          # Capture the generated version from the script's output
          MODEL_VERSION=$(cat build/metadata.json | jq -r .version)
          echo "model_version=${MODEL_VERSION}" >> $GITHUB_OUTPUT

      - name: Display validation results
        run: |
          echo "Validation complete. New model version: ${{ steps.validation.outputs.model_version }}"
          echo "Metadata:"
          cat build/metadata.json

      # The next step is deployment, which we will build out.
The core logic resides in scripts/validate_and_package.py. This is not just a placeholder; in a production environment, this script is where the MLOps rigor is applied. It must be robust. It loads the candidate model, runs it against a golden dataset, and checks for regressions in key business metrics. If the validation passes, it creates the final artifacts.
A common mistake is to only check for accuracy. In a mobile context, model size and inference latency are equally, if not more, important. A model that is 1% more accurate but 50% slower is often a net loss for user experience.
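One way to make that trade-off explicit is to gate promotion on a comparison against the currently deployed model, not just on absolute thresholds. The sketch below is illustrative rather than part of our pipeline: the candidate_is_acceptable helper, the metric dictionaries, and the regression limits are assumptions.

# regression_gate_sketch.py — hypothetical accuracy/latency trade-off check.
MAX_LATENCY_REGRESSION_PCT = 10.0   # allow at most a 10% slowdown
MIN_ACCURACY_GAIN = 0.002           # a slower model must buy a measurable accuracy win

def candidate_is_acceptable(current: dict, candidate: dict) -> bool:
    """current/candidate are metric dicts like {"accuracy": 0.93, "latency_ms": 42.0}."""
    acc_gain = candidate["accuracy"] - current["accuracy"]
    latency_regression_pct = (
        (candidate["latency_ms"] - current["latency_ms"]) / current["latency_ms"] * 100.0
    )
    # Never accept an accuracy drop.
    if acc_gain < 0:
        return False
    # Accept a slowdown only if it is bounded and paid for with real accuracy.
    if latency_regression_pct > 0:
        return acc_gain >= MIN_ACCURACY_GAIN and latency_regression_pct <= MAX_LATENCY_REGRESSION_PCT
    return True

A check like this slots naturally into the validation script that follows, once the previously deployed metadata is fetched (for example, from latest.json).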
Here is a pragmatic, production-oriented example of such a script. Note the inclusion of error handling and clear artifact generation.
# scripts/validate_and_package.py
import json
import os
import time
import uuid
import random  # For simulating metrics

# In a real project, you would import tensorflow, pytorch, onnxruntime, etc.
# import tensorflow as tf

# --- Configuration ---
BUILD_DIR = "build"
MODEL_FILE = "model.tflite"
METADATA_FILE = "metadata.json"
VALIDATION_DATASET_PATH = "data/validation_set.npy"

# --- Performance Thresholds ---
# These are critical for preventing regressions.
MIN_ACCURACY = 0.92
MAX_LATENCY_MS = 50.0  # Milliseconds per inference on a reference CPU
MAX_MODEL_SIZE_KB = 2048


def simulate_inference(model_path):
    """
    Simulates running inference and returns performance metrics.
    In a real implementation, this would load the model and run it.
    """
    print("Simulating model inference...")
    # Simulate latency
    latency = random.uniform(35.0, 60.0)
    # Simulate accuracy
    accuracy = random.uniform(0.90, 0.95)
    time.sleep(2)  # Simulate workload
    print(f" - Simulated Accuracy: {accuracy:.4f}")
    print(f" - Simulated Latency: {latency:.2f} ms")
    return accuracy, latency


def main():
    """
    Main validation and packaging logic.
    Exits with a non-zero status code on failure.
    """
    print("--- Starting Model Validation and Packaging ---")

    # For this example, we assume the model file `model.tflite` exists in the repo root.
    if not os.path.exists(MODEL_FILE):
        print(f"Error: Model file '{MODEL_FILE}' not found.")
        exit(1)

    # 1. Check Model Size
    model_size_bytes = os.path.getsize(MODEL_FILE)
    model_size_kb = model_size_bytes / 1024
    print(f"Model size: {model_size_kb:.2f} KB")
    if model_size_kb > MAX_MODEL_SIZE_KB:
        print(f"Error: Model size ({model_size_kb:.2f} KB) exceeds threshold ({MAX_MODEL_SIZE_KB} KB).")
        exit(1)
    print(" - Size check: PASSED")

    # 2. Run Performance Validation
    # In a real scenario, you'd load the validation dataset and the model here.
    # accuracy, latency = run_validation(MODEL_FILE, VALIDATION_DATASET_PATH)
    accuracy, latency = simulate_inference(MODEL_FILE)

    if accuracy < MIN_ACCURACY:
        print(f"Error: Model accuracy ({accuracy:.4f}) is below threshold ({MIN_ACCURACY}).")
        exit(1)
    print(" - Accuracy check: PASSED")

    if latency > MAX_LATENCY_MS:
        print(f"Error: Model latency ({latency:.2f} ms) exceeds threshold ({MAX_LATENCY_MS} ms).")
        exit(1)
    print(" - Latency check: PASSED")

    # 3. All checks passed. Generate artifacts.
    print("All validation checks passed. Generating deployment artifacts...")
    os.makedirs(BUILD_DIR, exist_ok=True)

    # Generate a unique version ID for this build
    version_id = str(uuid.uuid4())

    # Create metadata
    metadata = {
        "version": version_id,
        "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "metrics": {
            "accuracy": round(accuracy, 4),
            "latency_ms": round(latency, 2),
            "size_kb": round(model_size_kb, 2)
        },
        "filename": MODEL_FILE,
        "download_path": f"/models/{version_id}/{MODEL_FILE}"
    }

    # Write metadata file
    metadata_path = os.path.join(BUILD_DIR, METADATA_FILE)
    with open(metadata_path, 'w') as f:
        json.dump(metadata, f, indent=2)
    print(f"Generated metadata at '{metadata_path}'")

    # Move the model file into a versioned directory inside the build output
    versioned_model_dir = os.path.join(BUILD_DIR, version_id)
    os.makedirs(versioned_model_dir, exist_ok=True)
    final_model_path = os.path.join(versioned_model_dir, MODEL_FILE)
    os.rename(MODEL_FILE, final_model_path)
    print(f"Packaged model at '{final_model_path}'")

    print("--- Artifact generation complete. ---")


if __name__ == "__main__":
    main()
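For comparison, here is roughly what a non-simulated run_validation could look like using TensorFlow Lite’s Python interpreter. It is a sketch under assumptions the script above doesn’t specify: a single-input classifier, and a validation set stored as an .npz archive with "inputs" and "labels" arrays.

# scripts/run_validation_sketch.py — a sketch of a non-simulated run_validation().
import time

import numpy as np
import tensorflow as tf


def run_validation(model_path, dataset_path):
    # Assumed dataset format: np.savez(dataset_path, inputs=..., labels=...)
    data = np.load(dataset_path)
    inputs, labels = data["inputs"], data["labels"]

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_detail = interpreter.get_input_details()[0]
    output_detail = interpreter.get_output_details()[0]

    correct = 0
    latencies_ms = []
    for x, y in zip(inputs, labels):
        # Add a batch dimension and match the model's expected input dtype.
        interpreter.set_tensor(input_detail["index"], x[np.newaxis].astype(input_detail["dtype"]))
        start = time.perf_counter()
        interpreter.invoke()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        prediction = np.argmax(interpreter.get_tensor(output_detail["index"]))
        correct += int(prediction == y)

    accuracy = correct / len(labels)
    latency_ms = float(np.mean(latencies_ms))
    return accuracy, latency_ms

Keep in mind that latency measured on a CI runner is only a proxy for on-device latency; a fixed reference device or dedicated benchmark hardware gives more trustworthy numbers.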
With the artifact generation handled, the next challenge was publishing. Instead of a complex cloud storage setup, we opted for a simple, robust solution using rsync over SSH to push the build artifacts to our Caddy server. This is a common pattern for smaller, pragmatic setups and avoids cloud provider lock-in.
The GitHub Actions workflow required secrets for the SSH connection (SSH_PRIVATE_KEY, SSH_HOST, SSH_USER) and a new deployment step.
# .github/workflows/deploy_model.yml (...continued)
      - name: Deploy to Caddy Server
        uses: easingthemes/ssh-deploy@main
        with:
          SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}
          ARGS: "-rltgoDzvO --delete" # rsync arguments
          SOURCE: "build/"
          REMOTE_HOST: ${{ secrets.SSH_HOST }}
          REMOTE_USER: ${{ secrets.SSH_USER }}
          TARGET: "/var/www/models/artifacts"
This step synchronizes the contents of the local build/ directory with /var/www/models/artifacts on the server. Be aware of what --delete actually does here: it removes anything on the server that is not present in the local build/ directory, including previously deployed version directories, so the server only ever holds the most recent build. That keeps stale artifacts from failed or superseded runs from accumulating, but if you want to retain a history of models on the server, drop the flag and prune with a separate retention policy instead (see the sketch below).
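If keeping a history on the server matters more than disk hygiene, a small retention job can do the pruning instead of rsync. The following is a sketch for a server-side cron job; the script name and keep-count are arbitrary, and it assumes the artifacts directory layout used by this pipeline.

# prune_old_models.py — keep only the N most recent versioned model directories.
import os
import shutil

ARTIFACTS_DIR = "/var/www/models/artifacts"
KEEP = 5  # number of versioned directories to retain

def prune():
    # Only versioned directories are candidates; latest.json is a file and is never touched.
    versions = [entry for entry in os.scandir(ARTIFACTS_DIR) if entry.is_dir()]
    # Newest first, by directory modification time (set when rsync created it).
    # Note: assumes the currently promoted version is among the newest KEEP directories.
    versions.sort(key=lambda entry: entry.stat().st_mtime, reverse=True)
    for entry in versions[KEEP:]:
        print(f"Pruning old model version: {entry.name}")
        shutil.rmtree(entry.path)

if __name__ == "__main__":
    prune()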
Now, we needed Caddy to serve these files. Caddy’s configuration, the Caddyfile, is famously simple. We needed it to do two things:
- Serve the model artifacts from the directory our CI pipeline deploys to.
- Act as a reverse proxy for our Next.js dashboard application, which we’ll run on port 3000.
# /etc/caddy/Caddyfile

# Replace with your actual domain
model-hub.yourdomain.com {
    # Automatic HTTPS is Caddy's killer feature.
    # It handles certificate acquisition and renewal from Let's Encrypt.
    tls admin@yourdomain.com

    # Centralized logging in JSON format.
    log {
        output file /var/log/caddy/model-hub.log {
            roll_size 10mb
            roll_keep 5
        }
        format json
    }

    # API endpoint for the mobile app to get the latest model metadata.
    # We serve the `metadata.json` directly.
    # By convention, we'll ensure the CI process always updates a `latest` symlink
    # or file to point to the most recent metadata.
    handle_path /api/latest-model {
        root * /var/www/models/artifacts
        file_server {
            index latest.json # We'll create this file in CI
        }
        header Content-Type application/json
    }

    # Securely serve the versioned model files.
    # e.g., /models/uuid-goes-here/model.tflite
    handle_path /models/* {
        root * /var/www/models/artifacts
        file_server
    }

    # Reverse proxy for the Next.js ISR Dashboard
    handle {
        reverse_proxy localhost:3000
    }

    # Standard headers for security and best practices
    header {
        Strict-Transport-Security "max-age=31536000;"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
        Referrer-Policy "strict-origin-when-cross-origin"
    }
}
A problem emerged in this design: how does the mobile client know the URL of the latest model? Hardcoding is not an option. The client needs a stable endpoint to query for the latest metadata. To solve this, we modified our CI pipeline. After successfully rsync-ing the versioned artifacts, it executes a remote command over SSH to update a latest.json file. This file is a copy of the latest metadata.json.
# .github/workflows/deploy_model.yml (...promotion step)
      - name: Promote Model
        env:
          SSH_HOST: ${{ secrets.SSH_HOST }}
          SSH_USER: ${{ secrets.SSH_USER }}
          MODEL_VERSION: ${{ steps.validation.outputs.model_version }}
        uses: appleboy/ssh-action@master
        with:
          host: ${{ env.SSH_HOST }}
          username: ${{ env.SSH_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          # Pass MODEL_VERSION through to the remote shell script.
          envs: MODEL_VERSION
          script: |
            # The artifacts were already transferred by the rsync step above.
            # Atomically update the 'latest' metadata file.
            # This prevents a race condition where a client might fetch an incomplete file.
            cp "/var/www/models/artifacts/${MODEL_VERSION}/metadata.json" "/var/www/models/artifacts/latest.json.tmp"
            mv "/var/www/models/artifacts/latest.json.tmp" "/var/www/models/artifacts/latest.json"
            echo "Promotion successful. latest.json now points to version ${MODEL_VERSION}."
This copy-then-rename sequence is a crucial detail for production robustness: mv within the same filesystem is atomic, so a client fetching latest.json sees either the old metadata or the new metadata, never a partially written file.
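On the client side, the contract is now straightforward: poll the stable metadata endpoint, compare versions, and download the versioned artifact only when it changes. The mobile implementation is platform-specific (Swift/Kotlin), so here is the flow as a Python sketch; the cache paths and the requests dependency are illustrative only.

# client_update_sketch.py — the update flow a mobile client would implement natively.
import json
import os
import requests

BASE_URL = "https://model-hub.yourdomain.com"
CACHE_DIR = os.path.expanduser("~/.model_cache")
VERSION_FILE = os.path.join(CACHE_DIR, "current_version.json")

def update_model_if_needed():
    latest = requests.get(f"{BASE_URL}/api/latest-model", timeout=10).json()

    current_version = None
    if os.path.exists(VERSION_FILE):
        with open(VERSION_FILE) as f:
            current_version = json.load(f).get("version")

    if latest["version"] == current_version:
        return  # Already up to date.

    # Download the versioned artifact referenced by the metadata.
    os.makedirs(CACHE_DIR, exist_ok=True)
    model_path = os.path.join(CACHE_DIR, latest["filename"])
    with requests.get(f"{BASE_URL}{latest['download_path']}", stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(model_path + ".tmp", "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    # Atomic swap, mirroring the server-side promotion, so a half-written model is never loaded.
    os.replace(model_path + ".tmp", model_path)

    with open(VERSION_FILE, "w") as f:
        json.dump({"version": latest["version"]}, f)

if __name__ == "__main__":
    update_model_if_needed()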
With the model delivery backend in place, we turned to the ISR dashboard. We created a standard Next.js application. The main page would display the information from latest.json. The key is using getStaticProps with a revalidate value.
// pages/index.js
import Head from 'next/head';
import styles from '../styles/Home.module.css';

// This function runs on the server at build time and on-demand via ISR.
export async function getStaticProps() {
  // We fetch through Caddy's public endpoint, not directly from the filesystem.
  // This decouples Next.js from the file structure and uses the public API.
  // (A plain http://localhost request would not match the Caddy site block above.)
  const LATEST_METADATA_URL = 'https://model-hub.yourdomain.com/api/latest-model';

  try {
    const res = await fetch(LATEST_METADATA_URL);
    if (!res.ok) {
      // Handle cases where the metadata file might not exist yet
      throw new Error(`Failed to fetch latest model data, status: ${res.status}`);
    }
    const data = await res.json();

    return {
      props: {
        modelData: data,
        error: null,
      },
      // Incremental Static Regeneration:
      // Re-generate this page in the background at most once every 60 seconds
      // if a request comes in. We will also add on-demand revalidation.
      revalidate: 60,
    };
  } catch (error) {
    console.error("Error in getStaticProps:", error.message);
    return {
      props: {
        modelData: null,
        error: `Could not load model data. Is the backend running? (${error.message})`,
      },
      revalidate: 10, // Try to refetch sooner if there was an error
    };
  }
}

export default function Home({ modelData, error }) {
  return (
    <div className={styles.container}>
      <Head>
        <title>Mobile ML Model Hub</title>
        <meta name="description" content="Dashboard for the currently deployed mobile ML model" />
      </Head>

      <main className={styles.main}>
        <h1 className={styles.title}>
          Mobile ML Model Deployment Status
        </h1>

        {error && <p className={styles.error}>{error}</p>}

        {modelData && (
          <div className={styles.card}>
            <h2>Current Active Model</h2>
            <p><strong>Version:</strong> <code>{modelData.version}</code></p>
            <p><strong>Deployed At (UTC):</strong> {new Date(modelData.timestamp_utc).toLocaleString('en-GB', { timeZone: 'UTC' })}</p>
            <p><strong>Model File:</strong> {modelData.filename}</p>
            <p><strong>Download Path:</strong> <code>{modelData.download_path}</code></p>

            <h3>Validation Metrics</h3>
            <div className={styles.grid}>
              <div className={styles.metric}>
                <h4>Accuracy</h4>
                <p>{(modelData.metrics.accuracy * 100).toFixed(2)}%</p>
              </div>
              <div className={styles.metric}>
                <h4>Latency</h4>
                <p>{modelData.metrics.latency_ms.toFixed(2)} ms</p>
              </div>
              <div className={styles.metric}>
                <h4>Size</h4>
                <p>{modelData.metrics.size_kb.toFixed(2)} KB</p>
              </div>
            </div>
          </div>
        )}

        {!modelData && !error && <p>Loading model data...</p>}
      </main>
    </div>
  );
}
The periodic revalidation (revalidate: 60) is a good fallback, but we wanted instant updates. When the CI pipeline finishes, the dashboard should reflect the change immediately. This requires on-demand revalidation. We created a secure API route in Next.js for this purpose.
// pages/api/revalidate.js
export default async function handler(req, res) {
  // 1. Security Check: Use a secret token to prevent unauthorized access.
  // This token should be passed in a header, not a query param.
  const revalidation_token = process.env.REVALIDATE_TOKEN;
  const incoming_token = req.headers['x-revalidate-token'];

  if (!revalidation_token) {
    // Log an error on the server if the token is not configured.
    console.error("REVALIDATE_TOKEN is not set in environment variables.");
    return res.status(500).json({ message: 'Revalidation token not configured.' });
  }

  if (incoming_token !== revalidation_token) {
    return res.status(401).json({ message: 'Invalid token' });
  }

  // 2. Revalidation Logic
  try {
    // We want to revalidate the home page ('/').
    await res.revalidate('/');
    console.log("Successfully triggered revalidation for '/'");
    return res.json({ revalidated: true });
  } catch (err) {
    // If there was an error, Next.js will continue to serve the last
    // successfully generated page.
    console.error("Error during revalidation:", err);
    return res.status(500).send('Error revalidating');
  }
}
This required an update to our Caddyfile to route this specific API path to Next.js, and an update to our CI pipeline to call it.
# /etc/caddy/Caddyfile (addition)

# Handle the on-demand revalidation webhook explicitly.
# Note: we use `handle` rather than `handle_path` here, because `handle_path`
# strips the matched prefix and Next.js needs the full /api/revalidate path
# for routing. Caddy orders handle blocks by path specificity, so this block
# takes precedence over the catch-all reverse_proxy handle.
handle /api/revalidate {
    reverse_proxy localhost:3000
}
And the final addition to the GitHub Actions workflow:
# .github/workflows/deploy_model.yml (final step)
      - name: Trigger ISR Revalidation
        if: success()
        run: |
          # --fail makes the step error out on a non-2xx response.
          curl --fail -X POST \
            -H "Content-Type: application/json" \
            -H "x-revalidate-token: ${{ secrets.REVALIDATE_TOKEN }}" \
            https://model-hub.yourdomain.com/api/revalidate
With all the pieces in place, the complete end-to-end flow looks like this:
graph TD
    A[Developer pushes to Git] --> B{GitHub Actions CI};
    B --> C[Validate Model: Accuracy, Latency, Size];
    C -- Pass --> D[Package Artifacts: model.tflite, metadata.json];
    D --> E[Deploy to Server via rsync];
    E --> F[Atomically Update latest.json];
    F --> G[Trigger Revalidation Webhook];

    subgraph Caddy Server
        H[Caddy]
        I[Next.js App]
        J[Static Files: /models/*, latest.json]
    end

    G --> H;
    H -- Proxies /api/revalidate --> I;
    I -- Calls res.revalidate() --> I;
    I -- Regenerates Page --> J[Static HTML for /];

    subgraph Consumers
        K[Mobile App]
        L[Engineer]
    end

    K -- GET /api/latest-model --> H;
    H -- Serves latest.json --> K;
    K -- GET /models/.../model.tflite --> H;
    H -- Serves model file --> K;
    L -- Views Dashboard --> H;
    H -- Serves Static HTML --> L;
This architecture provides a robust, observable, and highly efficient pipeline for our mobile ML models. It’s built on simple, composable tools, avoiding the overhead of a massive MLOps platform where it isn’t needed.
This solution, however, is not without its limitations. The single-server approach for Caddy and Next.js is a potential single point of failure and doesn’t scale horizontally. The natural next step would be to replace the local file storage with a distributed object store like S3 or R2, and run the Next.js application in a containerized, replicated environment. The core logic of the pipeline would remain identical, but the deployment target would change. Furthermore, the current system only manages a single “latest” model track. A production-grade system would need to support multiple deployment channels (e.g., development, beta, production) to allow for canary testing of new models on a subset of users before a full rollout. This would involve a more complex metadata structure and corresponding logic on both the client and the backend.
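To give a sense of what that would involve, channel support mostly reduces to maintaining one promoted metadata file per channel (for example latest-production.json) instead of a single latest.json, plus a channel parameter on the client. The following is a hypothetical sketch of server-side promotion under that scheme; the channel names and file layout are not part of the current system.

# promote_to_channel.py — hypothetical promotion logic for multiple release channels.
import os
import shutil
import sys

ARTIFACTS_DIR = "/var/www/models/artifacts"
CHANNELS = {"development", "beta", "production"}

def promote(version_id: str, channel: str):
    if channel not in CHANNELS:
        raise ValueError(f"Unknown channel: {channel}")
    source = os.path.join(ARTIFACTS_DIR, version_id, "metadata.json")
    target = os.path.join(ARTIFACTS_DIR, f"latest-{channel}.json")
    tmp = target + ".tmp"
    shutil.copyfile(source, tmp)
    os.replace(tmp, target)  # same atomic-replace pattern as the single-channel pipeline

if __name__ == "__main__":
    promote(sys.argv[1], sys.argv[2])  # e.g. promote_to_channel.py <version-id> beta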