The operational disconnect between a data science team’s model registry and the front-end interfaces that consume its models is a common source of friction. In our case, the process was manual: a model would be promoted to “Production” in MLflow, followed by a ticket to the platform team. That team would then manually update a simple JavaScript-based dashboard that displayed the model’s key performance indicators, artifact locations, and version. This manual handoff was not just slow; it was a reliable source of human error, leading to mismatched versions and stale information being presented to stakeholders.
Our objective was to create a zero-touch pipeline. When a specific model version is designated for production, an automated workflow should handle the entire lifecycle: fetching metadata from MLflow, building a fresh static front-end bundle with this new data, and deploying it to the existing fleet of virtual machines that host our internal tools. The existing infrastructure is managed by Puppet, a constraint we must engineer around, not replace.
The Orchestration Core: GitHub Actions Workflow
The entire process is orchestrated by a single GitHub Actions workflow. We opted for a workflow_dispatch trigger initially, allowing a team member to manually trigger a deployment for a specific model. This provides a control point before we evolve to a fully event-driven system based on MLflow webhooks.
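As a preview of that evolution, the sketch below shows the shape such a bridge could take: a small web service that receives a registry webhook and calls GitHub's workflow-dispatch REST endpoint (the same endpoint that backs the manual trigger). Flask, the repository name, the GITHUB_TOKEN variable, and the webhook payload field names are all illustrative assumptions, not part of the current pipeline.

# webhook_bridge.py - hypothetical sketch of the future event-driven trigger
import os

import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

# POST /repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches is
# the standard GitHub REST endpoint behind the workflow_dispatch trigger.
DISPATCH_URL = (
    "https://api.github.com/repos/our-org/model-viewer"  # hypothetical repo
    "/actions/workflows/deploy-model-viewer.yml/dispatches"
)

@app.route("/mlflow-webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json(force=True)
    # Field names depend on the webhook provider; adjust to the real payload.
    if payload.get("to_stage") != "Production":
        return jsonify({"status": "ignored"}), 200
    resp = requests.post(
        DISPATCH_URL,
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "ref": "main",
            "inputs": {
                "model_name": payload["model_name"],
                "model_version": str(payload["version"]),
            },
        },
        timeout=10,
    )
    resp.raise_for_status()  # GitHub returns 204 No Content on success
    return jsonify({"status": "dispatched"}), 202

if __name__ == "__main__":
    app.run(port=8080)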
The workflow is broken down into distinct jobs for clarity and potential parallelization: fetch-model-metadata, build-frontend, and trigger-deployment.
# .github/workflows/deploy-model-viewer.yml
name: Deploy ML Model Viewer

on:
  workflow_dispatch:
    inputs:
      model_name:
        description: 'The name of the MLflow model to deploy'
        required: true
        default: 'prod-recommendation-engine'
      model_version:
        description: 'The version of the model to deploy (e.g., 3)'
        required: true

env:
  MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
  MLFLOW_TRACKING_USERNAME: ${{ secrets.MLFLOW_TRACKING_USERNAME }}
  MLFLOW_TRACKING_PASSWORD: ${{ secrets.MLFLOW_TRACKING_PASSWORD }}
  ARTIFACT_NAME: model-viewer-dist
  PYTHON_VERSION: '3.9'

jobs:
  fetch-model-metadata:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VERSION }}
      - name: Install Python dependencies
        run: pip install -r ./scripts/requirements.txt
      - name: Fetch and write metadata
        id: fetch
        run: |
          python ./scripts/fetch_mlflow_metadata.py \
            --model-name "${{ github.event.inputs.model_name }}" \
            --model-version "${{ github.event.inputs.model_version }}" \
            --output-file "model_metadata.json"
      - name: Upload metadata artifact
        uses: actions/upload-artifact@v3
        with:
          name: model-metadata
          path: model_metadata.json

  build-frontend:
    needs: fetch-model-metadata
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Download metadata artifact
        uses: actions/download-artifact@v3
        with:
          name: model-metadata
          path: ./public
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install NPM dependencies
        run: npm install
      - name: Build static site
        # The build script reads ./public/model_metadata.json
        run: npm run build
      - name: Upload distributable artifact
        uses: actions/upload-artifact@v3
        with:
          name: ${{ env.ARTIFACT_NAME }}
          path: ./dist

  trigger-deployment:
    needs: build-frontend
    runs-on: ubuntu-latest
    # This job requires more complex secrets for git operations and artifact storage
    steps:
      # Checkout first so ./scripts/update_hiera.sh is available to later steps
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Download distributable artifact
        uses: actions/download-artifact@v3
        with:
          name: ${{ env.ARTIFACT_NAME }}
          path: ./dist
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}
      - name: Upload artifact to S3
        id: upload_s3
        run: |
          TIMESTAMP=$(date +%s)
          VERSION="${{ github.event.inputs.model_name }}-${{ github.event.inputs.model_version }}-${TIMESTAMP}"
          aws s3 cp ./dist "s3://${{ secrets.S3_ARTIFACT_BUCKET }}/model-viewer/${VERSION}" --recursive
          echo "artifact_version=${VERSION}" >> "$GITHUB_OUTPUT"
      - name: Checkout Hiera data repository
        uses: actions/checkout@v3
        with:
          repository: 'our-org/puppet-hiera-data'
          token: ${{ secrets.HIERA_REPO_PAT }}
          path: 'hiera-data'
      - name: Update Hiera configuration
        run: |
          cd hiera-data
          # This script updates the YAML file with the new S3 path
          ../scripts/update_hiera.sh \
            "common.yaml" \
            "model_viewer::artifact_version" \
            "${{ steps.upload_s3.outputs.artifact_version }}"
      - name: Commit and push Hiera changes
        run: |
          cd hiera-data
          git config --global user.name "GitHub Actions Bot"
          git config --global user.email "actions@github.com"
          git add .
          git commit -m "Automated deployment of model-viewer ${{ steps.upload_s3.outputs.artifact_version }}" || echo "No changes to commit"
          git push
Bridging the Gap: The MLflow Client Script
A critical component is the Python script responsible for communicating with the MLflow Tracking Server. This script cannot be trivial; it must handle authentication, potential network issues, and cases where the requested model or version does not exist. It authenticates using environment variables populated by GitHub secrets. Its sole job is to query the MLflow API and serialize the required data into a structured JSON file that the front-end build process can consume without any ambiguity.
# scripts/fetch_mlflow_metadata.py
import argparse
import json
import logging
import sys

from mlflow.tracking import MlflowClient
from mlflow.exceptions import RestException

# Basic logging configuration
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


def get_model_details(client, model_name, model_version):
    """
    Fetches comprehensive details for a specific model version from MLflow.
    Handles exceptions for missing models or versions.
    """
    try:
        logging.info(f"Fetching version '{model_version}' for model '{model_name}'...")
        version_details = client.get_model_version(name=model_name, version=model_version)

        # We need more than just the version details, like metrics from the parent run.
        run_id = version_details.run_id
        run_data = client.get_run(run_id).data
        metrics = run_data.metrics
        params = run_data.params
        tags = run_data.tags

        # Construct a clean, serializable dictionary for the frontend
        metadata = {
            "model_name": version_details.name,
            "version": version_details.version,
            "stage": version_details.current_stage,
            "description": version_details.description,
            "run_id": version_details.run_id,
            "source_artifact_uri": version_details.source,
            "created_timestamp": version_details.creation_timestamp,
            "last_updated_timestamp": version_details.last_updated_timestamp,
            "metrics": metrics,
            "params": params,
            # Filter out MLflow-internal tags
            "tags": {k: v for k, v in tags.items() if not k.startswith("mlflow.")},
        }
        logging.info(f"Successfully fetched metadata for run ID: {run_id}")
        return metadata
    except RestException as e:
        logging.error(f"Failed to fetch model '{model_name}' version '{model_version}'. Error: {e}")
        # In a CI environment, a non-zero exit code is crucial
        sys.exit(1)
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")
        sys.exit(1)


def main():
    parser = argparse.ArgumentParser(description="Fetch MLflow model metadata for frontend consumption.")
    parser.add_argument("--model-name", required=True, help="The registered model name in MLflow.")
    parser.add_argument("--model-version", required=True, help="The model version to fetch.")
    parser.add_argument("--output-file", required=True, help="Path to write the output JSON file.")
    args = parser.parse_args()

    # The MLflow client automatically picks up credentials from environment
    # variables set in the GitHub Actions workflow (MLFLOW_TRACKING_URI, etc.)
    try:
        client = MlflowClient()
    except Exception as e:
        logging.error(f"Failed to initialize MLflow client. Check credentials and URI. Error: {e}")
        sys.exit(1)

    metadata = get_model_details(client, args.model_name, args.model_version)
    if metadata:
        try:
            with open(args.output_file, 'w') as f:
                json.dump(metadata, f, indent=4)
            logging.info(f"Metadata successfully written to {args.output_file}")
        except IOError as e:
            logging.error(f"Failed to write metadata to file {args.output_file}. Error: {e}")
            sys.exit(1)


if __name__ == "__main__":
    main()
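The workflow installs this script's dependencies from ./scripts/requirements.txt, which we haven't shown. At minimum it needs the MLflow client; the exact pin below is illustrative, not prescriptive.

# scripts/requirements.txt (illustrative pin; track the version you actually test against)
mlflow>=2.0,<3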
The JavaScript Front-End: Static Data Injection
The front-end is a simple, static single-page application. To maximize performance and simplify deployment, we avoid making client-side API calls to MLflow. Instead, the metadata is baked into the application at build time: the build reads the model_metadata.json file and resolves it as a static import available to the application code.
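We haven't shown the build configuration itself. Assuming Vite (one of the tools mentioned below), the defaults already cover this contract: files in public/ are copied into the bundle verbatim and JSON imports are resolved at build time. A minimal config with those defaults spelled out, so the hand-off to CI is explicit, might look like this:

// vite.config.js - a minimal sketch assuming Vite; the defaults are written
// out so the contract with the CI workflow (./public in, ./dist out) is visible.
import { defineConfig } from 'vite';

export default defineConfig({
  publicDir: 'public',   // CI downloads model_metadata.json here
  build: {
    outDir: 'dist',      // matches the path uploaded by the build-frontend job
    sourcemap: false,
  },
});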
Here is the core logic within our main application script. It assumes a build tool (like Webpack or Vite) has processed this file.
// src/app.js
// This import is handled by a build tool plugin or a custom build script.
// It effectively reads the JSON file and makes it available as a module.
import modelData from '../public/model_metadata.json';

// Simple error handling for the case where the data might be missing
function renderError(message) {
  const container = document.getElementById('app');
  container.innerHTML = `<div class="error-panel">
    <h2>Failed to Load Model Data</h2>
    <p>${message}</p>
  </div>`;
}

function renderKeyValuePairs(title, dataObject) {
  if (!dataObject || Object.keys(dataObject).length === 0) {
    return '';
  }
  const items = Object.entries(dataObject)
    .map(([key, value]) => `
      <div class="kv-pair">
        <span class="key">${key}</span>
        <span class="value">${typeof value === 'number' ? value.toFixed(4) : value}</span>
      </div>
    `)
    .join('');
  return `
    <div class="data-section">
      <h3>${title}</h3>
      <div class="kv-container">${items}</div>
    </div>
  `;
}

function render() {
  const container = document.getElementById('app');
  if (!container) {
    console.error('Root element #app not found.');
    return;
  }
  if (!modelData || !modelData.model_name) {
    renderError('The model_metadata.json file is either missing or malformed.');
    return;
  }

  // Convert the millisecond epoch timestamp to a readable format
  const updatedDate = new Date(modelData.last_updated_timestamp).toLocaleString();

  const appHTML = `
    <header>
      <h1>Model Viewer: ${modelData.model_name}</h1>
      <span class="version-tag">Version: ${modelData.version}</span>
      <span class="stage-tag stage-${modelData.stage.toLowerCase()}">${modelData.stage}</span>
    </header>
    <main>
      <div class="metadata-card">
        <h2>Details</h2>
        <p class="description">${modelData.description || 'No description provided.'}</p>
        <div class="info-grid">
          <div><strong>Run ID:</strong> <span>${modelData.run_id}</span></div>
          <div><strong>Last Updated:</strong> <span>${updatedDate}</span></div>
        </div>
        <div class="info-grid">
          <div><strong>Artifact Path:</strong> <code class="code-block">${modelData.source_artifact_uri}</code></div>
        </div>
      </div>
      ${renderKeyValuePairs('Metrics', modelData.metrics)}
      ${renderKeyValuePairs('Parameters', modelData.params)}
      ${renderKeyValuePairs('Tags', modelData.tags)}
    </main>
  `;
  container.innerHTML = appHTML;
}

// Ensure the DOM is ready before attempting to render
document.addEventListener('DOMContentLoaded', render);
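For completeness, the render function assumes a host page with a #app root element. A minimal sketch follows; the module script path is an assumption based on a Vite-style entry point.

<!-- index.html - minimal host page assumed by src/app.js -->
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>ML Model Viewer</title>
  </head>
  <body>
    <div id="app"></div>
    <script type="module" src="/src/app.js"></script>
  </body>
</html>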
Styling is handled by PostCSS, which allows us to use modern CSS features while ensuring browser compatibility. The configuration is minimal but effective for our needs: postcss-preset-env enables nesting and other future CSS features, and cssnano minifies the output for production.
// postcss.config.js
module.exports = {
  plugins: {
    'postcss-preset-env': {
      stage: 1,
      features: {
        'nesting-rules': true,
      },
    },
    cssnano: {}, // Minify for production builds
  },
};
An example of a styled component using nesting:
/* src/styles/main.css */
.metadata-card {
  background-color: #f9f9f9;
  border: 1px solid #e1e4e8;
  border-radius: 6px;
  padding: 24px;
  margin-bottom: 20px;

  h2 {
    margin-top: 0;
    border-bottom: 1px solid #ccc;
    padding-bottom: 8px;
  }

  .info-grid {
    display: grid;
    grid-template-columns: 1fr;
    gap: 12px;
    margin-top: 16px;

    @media (min-width: 768px) {
      grid-template-columns: 1fr 1fr;
    }
  }
}
The Puppet Integration: Declarative Deployment
Here lies the most significant architectural constraint. The target nodes are managed by Puppet. A common anti-pattern would be to have the GitHub Action SSH into the nodes and run deployment commands. This is brittle and violates the declarative nature of configuration management.
Instead, we adopt a GitOps-like pattern. The source of truth for our application’s deployed version is not in the CI job but in a Hiera data file stored in a separate Git repository.
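For context, the data repository follows the standard Hiera 5 layout. A minimal hiera.yaml along these lines (our real hierarchy is simplified here to a single level) is what lets Puppet resolve the key from common.yaml:

# hiera.yaml - simplified sketch of the Hiera 5 configuration
---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: "Common defaults"
    path: "common.yaml"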
The trigger-deployment job in our workflow does two things:
- It uploads the built front-end artifact (the dist folder) to an S3 bucket under a unique versioned key (e.g., model-viewer/prod-recommendation-engine-3-1677610000).
- It checks out the puppet-hiera-data repository, programmatically modifies a YAML file to update the model_viewer::artifact_version key, and pushes the change.
A helper script handles the YAML update safely.
#!/bin/bash
# scripts/update_hiera.sh
set -euo pipefail

FILE_PATH=$1
KEY=$2
NEW_VALUE=$3

# A more robust solution would use a proper YAML parser like yq, but for a
# simple top-level key-value, grep/sed is sufficient and has fewer dependencies.
if grep -q "^${KEY}:" "${FILE_PATH}"; then
  # Key exists, update it
  sed -i "s|^${KEY}:.*|${KEY}: \"${NEW_VALUE}\"|" "${FILE_PATH}"
  echo "Updated key '${KEY}' in ${FILE_PATH}"
else
  # Key does not exist, append it
  echo "${KEY}: \"${NEW_VALUE}\"" >> "${FILE_PATH}"
  echo "Added key '${KEY}' to ${FILE_PATH}"
fi
Finally, the Puppet manifest on the server uses this data to ensure the correct version is deployed. It describes the desired state, and Puppet’s agent makes it so during its next run.
# modules/model_viewer/manifests/init.pp
class model_viewer (
  Optional[String] $artifact_version = undef,
  String $install_dir                = '/var/www/html/model-viewer',
  String $artifact_bucket            = 'our-internal-artifacts',
) {
  # Ensure the web server package is installed and the service is running
  # (ensure_packages comes from puppetlabs-stdlib)
  ensure_packages(['nginx'])

  service { 'nginx':
    ensure => running,
    enable => true,
  }

  # Ensure the installation directory exists and has correct permissions
  file { $install_dir:
    ensure => directory,
    owner  => 'www-data',
    group  => 'www-data',
    mode   => '0755',
  }

  # Use the Hiera-provided version to construct the S3 source URL
  $s3_source_path = "s3://${artifact_bucket}/model-viewer/${artifact_version}/"

  # A pitfall here is managing the cleanup of old files. When we deploy a new
  # version, any files from the old version that are no longer present must be
  # removed. `aws s3 sync` with the --delete flag is idempotent and handles this.
  exec { 'sync-model-viewer-from-s3':
    command   => "/usr/local/bin/aws s3 sync '${s3_source_path}' '${install_dir}' --delete --no-progress",
    path      => ['/bin', '/usr/bin'],
    # The shell provider is needed for the command substitution in `onlyif`.
    provider  => shell,
    # This `onlyif` is crucial. It prevents the command from running on every
    # Puppet agent run if the content is already up-to-date. We compare against
    # a VERSION file written by the exec below, which keeps the check reliable
    # on subsequent runs.
    onlyif    => "test \"\$(cat ${install_dir}/VERSION 2>/dev/null)\" != \"${artifact_version}\"",
    logoutput => true,
    require   => File[$install_dir],
    notify    => Exec['write-version-file'],
  }

  # Write a VERSION file to make the deployment state auditable on the machine
  # and to support the idempotency check above.
  exec { 'write-version-file':
    command     => "echo '${artifact_version}' > ${install_dir}/VERSION",
    path        => ['/bin', '/usr/bin'],
    provider    => shell, # Output redirection requires a shell
    refreshonly => true,  # This exec only runs when notified by the sync
  }
}
The data binding happens in Hiera:
# data/common.yaml
---
model_viewer::artifact_version: "prod-recommendation-engine-2-1677510000" # This value gets updated by CI
This architecture creates a clean separation of concerns. The CI/CD system is responsible for building the artifact and signaling intent by updating a data file. The configuration management system is responsible for observing that intent and converging the state of the infrastructure to match it.
graph TD
    subgraph GitHub Actions
        A[Manual Trigger: workflow_dispatch] --> B{Job: fetch-model-metadata};
        B --> C[Python Script: fetch_mlflow_metadata.py];
        C --> D[MLflow API];
        D --> C;
        C --> E[model_metadata.json];
        E --> F{Job: build-frontend};
        F --> G[npm run build];
        G --> H[Static Site Artifact];
        H --> I{Job: trigger-deployment};
    end
    subgraph AWS
        J[S3 Artifact Bucket]
    end
    subgraph Puppet Infra
        K[Hiera Git Repository]
        L[Puppet Server]
        M[Web Server VM]
    end
    I -->|Upload artifact| J;
    I -->|Commit new version| K;
    K -->|Hiera data lookup| L;
    L -->|Agent run| M;
    M -->|Syncs artifact from S3| J;
The primary limitation of this approach is the latency introduced by Puppet’s agent run interval. The deployment is not instantaneous upon the Hiera data update; it occurs during the next scheduled check-in, which could be up to 30 minutes away. For our internal dashboard use case, this delay is acceptable. For a system requiring near-instant deployments, a push-based mechanism using Puppet Bolt or a transition to a more dynamic container orchestration platform would be necessary. Furthermore, the reliance on a shell script to update Hiera YAML is functional but fragile; replacing it with a tool like yq within the runner would improve robustness against complex YAML structures.
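For reference, the yq v4 equivalent of update_hiera.sh's core operation is a one-liner; this assumes mikefarah's yq is installed on the runner:

# The key must be quoted inside the expression because it contains '::'
NEW_VALUE="prod-recommendation-engine-3-1677610000" \
  yq -i '."model_viewer::artifact_version" = strenv(NEW_VALUE)' common.yaml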