Building a Dynamic, Pluggable ML Model Serving Architecture Using Nomad, Keras, and Zookeeper-Driven Micro-frontends


The initial state was managed chaos. Our data science team, while brilliant, operated in a siloed fashion. Models existed as a constellation of Jupyter notebooks, one-off Flask scripts, and pickled files on shared drives. Deploying a model for internal demonstration meant grabbing an available VM, manually setting up a Python environment, and running a script inside a tmux session. Versioning was non-existent, resource allocation was a guessing game, and discovering what models were even available required asking around in a chat channel. This approach was not only inefficient but also introduced significant operational risk.

Our objective became clear: build a simple, on-premise, self-service platform where a data scientist could package their Keras model and, with a single command, deploy it into a shared cluster. The model should become instantly discoverable and usable via a central web portal, isolated from other models, with its resources managed automatically. We didn’t need the full complexity of a Kubernetes-based MLOps platform; our team was small, and operational simplicity was paramount.

This led to an architectural concept we termed “Model-as-a-Micro-frontend.” Each deployable unit would not just be a model API, but a self-contained web application encapsulating the Keras model, its prediction endpoint, and a simple UI for interaction. This aligns perfectly with the micro-frontend principle of independently deployable, autonomous components. For orchestration, HashiCorp Nomad was selected over Kubernetes for its architectural simplicity and lower operational burden, a critical factor for a small platform team. The final piece of the puzzle was service discovery. In a dynamic environment where models come and go, we needed a robust way for our central portal to know which models were running and where. We opted for Apache Zookeeper, a battle-hardened coordination service that provides the exact primitives needed for service registration and discovery, especially its support for ephemeral nodes.

The Self-Registering Model Service Component

The foundation of the entire system is the individual model service. It’s a Python application using Flask to serve a Keras model. In a real-world project, you’d use a more performant server like Gunicorn, but for clarity, the built-in Flask server suffices. The critical design choice here is making the service responsible for its own registration in Zookeeper upon startup.

Let’s start with the service’s structure. We’ll use a simple pre-trained MNIST model for demonstration.

/model_service
|-- app.py
|-- Dockerfile
|-- requirements.txt
|-- models/
|   `-- mnist_model.h5
`-- templates/
    `-- index.html

The core logic resides in app.py. This isn’t just a simple API wrapper. It integrates with Zookeeper using the kazoo client. On startup, it connects to Zookeeper, creates an ephemeral, sequential ZNode representing itself, and stores its allocated host and port in that node’s data.

model_service/app.py

import os
import logging
import socket
import atexit
import json
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify, render_template
from kazoo.client import KazooClient
from kazoo.exceptions import NoNodeError, ConnectionLoss
from PIL import Image
import base64
import io

# --- Basic Flask App Setup ---
app = Flask(__name__)
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# --- Environment Variable Configuration ---
# These are crucial for decoupling the application from its environment.
# Nomad will inject these values at runtime.
ZK_HOSTS = os.getenv('ZK_HOSTS', '127.0.0.1:2181')
SERVICE_NAME = os.getenv('SERVICE_NAME', 'mnist-classifier')
# Nomad provides these networking variables automatically.
HOST_IP = os.getenv('NOMAD_IP_http', '127.0.0.1')
HOST_PORT = os.getenv('NOMAD_PORT_http', 5000)

# --- Keras Model Loading ---
# In a production system, this path would be part of the baked Docker image.
try:
    model = tf.keras.models.load_model('models/mnist_model.h5')
    logging.info("Keras model loaded successfully.")
except Exception as e:
    logging.error(f"Failed to load Keras model: {e}")
    model = None

# --- Zookeeper Service Registration ---
zk_client = None
service_path = None

def register_with_zookeeper():
    global zk_client, service_path
    try:
        zk_client = KazooClient(hosts=ZK_HOSTS, logger=logging)
        zk_client.start(timeout=15) # Add a reasonable timeout
        logging.info(f"Connected to Zookeeper at {ZK_HOSTS}")

        # Ensure the base path for all services exists.
        base_path = "/ml-services"
        zk_client.ensure_path(base_path)

        # The service path specific to this model type.
        model_base_path = f"{base_path}/{SERVICE_NAME}"
        zk_client.ensure_path(model_base_path)

        # The payload contains connection details and metadata.
        service_payload = {
            "service_name": SERVICE_NAME,
            "address": HOST_IP,
            "port": int(HOST_PORT),
            "version": "1.0.0" # Example metadata
        }
        payload_bytes = json.dumps(service_payload).encode('utf-8')

        # Create an ephemeral, sequential node.
        # Ephemeral: Automatically removed if the session is lost (i.e., the service dies).
        # Sequential: Avoids naming conflicts if multiple instances start simultaneously.
        service_path = zk_client.create(
            path=f"{model_base_path}/instance-",
            value=payload_bytes,
            ephemeral=True,
            sequence=True
        )
        logging.info(f"Service registered in Zookeeper at path: {service_path}")

    except Exception as e:
        logging.error(f"Could not register with Zookeeper: {e}")
        # If registration fails, the service is useless. We should probably exit.
        if zk_client:
            zk_client.stop()
            zk_client.close()
        raise SystemExit(f"Zookeeper registration failed. Shutting down.")

# --- Graceful Shutdown Hook ---
def unregister_service():
    if zk_client and zk_client.state != 'CLOSED' and service_path:
        try:
            if zk_client.exists(service_path):
                zk_client.delete(service_path)
                logging.info(f"Unregistered service from Zookeeper path: {service_path}")
        except NoNodeError:
            logging.warning(f"Zookeeper node {service_path} already gone.")
        except ConnectionLoss:
            logging.error("Lost connection to Zookeeper during cleanup.")
        finally:
            zk_client.stop()
            zk_client.close()
            logging.info("Zookeeper client stopped.")

atexit.register(unregister_service)

# --- API and UI Routes ---
@app.route('/')
def index():
    # This renders the simple UI for the micro-frontend.
    return render_template('index.html', service_name=SERVICE_NAME)

@app.route('/predict', methods=['POST'])
def predict():
    if not model:
        return jsonify({"error": "Model is not loaded"}), 500

    data = request.get_json(force=True)
    # The frontend sends a base64 encoded image from a canvas element.
    img_data = base64.b64decode(data['image'].split(',')[1])
    
    # Preprocess the image for the MNIST model.
    img = Image.open(io.BytesIO(img_data)).convert('L')
    img = img.resize((28, 28))
    img_array = np.array(img).reshape(1, 28, 28, 1).astype('float32') / 255.0

    prediction = model.predict(img_array)
    digit = int(np.argmax(prediction))

    return jsonify({'digit': digit})

if __name__ == '__main__':
    try:
        register_with_zookeeper()
        # In a real deployment, use a production-grade WSGI server.
        app.run(host='0.0.0.0', port=int(HOST_PORT))
    except SystemExit as e:
        logging.critical(str(e))

The key piece of logic is register_with_zookeeper. The use of an ephemeral node is a classic Zookeeper pattern for service discovery. If the container crashes or the network connection to Zookeeper is severed, the session times out, and Zookeeper automatically removes the node. This prevents the aggregator from routing traffic to dead instances. The atexit hook is a best-effort attempt at clean unregistration during a graceful shutdown.

The Dockerfile for this service is straightforward. It packages the application, dependencies, and the Keras model.

model_service/Dockerfile

# Use a slim base image with Python installed.
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
# A common mistake is not pinning dependencies, leading to unpredictable builds.
# In production, a lock file (e.g., poetry.lock, requirements.txt from pip freeze) is essential.
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Expose the port the app will run on. This is for documentation;
# Nomad will map this to a host port.
EXPOSE 5000

# Command to run the application
CMD ["python", "app.py"]

Orchestrating with a Nomad Jobspec

With the self-contained, self-registering service packaged as a Docker image, the next step is to define how Nomad should run it. This is done via a job file written in HCL (HashiCorp Configuration Language). This file is declarative, describing the desired state of our service. A pitfall here is under-specifying resource requirements, which can lead to “noisy neighbor” problems in a shared cluster.

mnist_service.nomad

job "ml-model-mnist" {
  # The datacenter and region this job should run in.
  datacenters = ["dc1"]
  type = "service"

  # Defines a group of tasks that are co-located.
  group "classifier" {
    # The number of instances of this model service to run.
    count = 1

    # Restart policy is crucial for resilience.
    # If the app crashes (e.g., ZK connection fails on start), Nomad will try to restart it.
    restart {
      attempts = 3
      interval = "1m"
      delay = "15s"
      mode = "fail" # After 3 failed attempts, mark the allocation as failed.
    }

    # Network configuration is key for dynamic service discovery.
    network {
      # Use bridge mode for port mapping.
      mode = "bridge"
      # Request a dynamic port from Nomad for the 'http' label.
      # This prevents port collisions on the host machine.
      port "http" {}
    }

    # The actual task to run, which is our Docker container.
    task "mnist-server" {
      driver = "docker"

      config {
        image = "your-docker-registry/mnist-model-service:1.0.0"
        # Map the dynamically allocated host port to the container's port 5000.
        ports = ["http"]
      }

      # Define resource constraints. This is critical for scheduling and
      # preventing one job from starving others.
      resources {
        cpu    = 200 # MHz
        memory = 512 # MB
      }
      
      # Inject environment variables needed by the Python application.
      # This is how we pass the Zookeeper address and service name.
      env {
        SERVICE_NAME = "mnist-classifier"
        # The value for ZK_HOSTS should come from a central config or Nomad variables.
        # Hardcoding is for demonstration only.
        ZK_HOSTS = "192.168.1.100:2181,192.168.1.101:2181"
        
        # Nomad automatically provides NOMAD_IP_ and NOMAD_PORT_ variables
        # based on the network stanza. This is how the app knows its own address.
      }
    }
  }
}

This jobspec tells Nomad everything it needs to know: what Docker image to pull, how many copies to run, what resources to allocate, and how to configure its networking. The most important part is the synergy between network { port "http" {} } and the env block. Nomad assigns a random high-numbered port on the host and exposes it to the container via NOMAD_PORT_http. Our Python application reads this environment variable and uses it to register the correct, reachable address in Zookeeper.

The Aggregator Portal: Discovering and Displaying Models

The final component is the central portal or “shell” application. Its sole purpose is to provide a unified view of all active ML model micro-frontends. It does this by connecting to Zookeeper and watching for changes in the service registry.

This aggregator is also a Flask application. It doesn’t perform any ML itself; it’s purely a discovery and presentation layer.

aggregator/aggregator.py

import os
import logging
import json
from threading import Lock
from flask import Flask, render_template
from kazoo.client import KazooClient, KazooState
from kazoo.recipe.watchers import ChildrenWatch

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# --- App Setup ---
app = Flask(__name__)
ZK_HOSTS = os.getenv('ZK_HOSTS', '127.0.0.1:2181')
SERVICE_BASE_PATH = "/ml-services"

# --- In-memory Service Cache ---
# This dictionary will hold the state of all discovered services.
# A lock is necessary because the Zookeeper watcher runs in a separate thread.
service_registry = {}
registry_lock = Lock()

# --- Zookeeper Client and Watcher ---
zk_client = KazooClient(hosts=ZK_HOSTS, logger=logging)

def state_listener(state):
    if state == KazooState.LOST:
        logging.warning("Zookeeper session lost. Watchers are gone.")
    elif state == KazooState.SUSPENDED:
        logging.warning("Zookeeper connection suspended.")
    else:
        logging.info(f"Zookeeper connection state is now: {state}")

zk_client.add_listener(state_listener)
zk_client.start()

# This is the core discovery logic.
# The ChildrenWatch will be triggered whenever a model type is added/removed.
@ChildrenWatch(client=zk_client, path=SERVICE_BASE_PATH)
def watch_model_types(model_types):
    logging.info(f"Detected model types: {model_types}")
    with registry_lock:
        # Prune model types that no longer exist
        current_types = set(service_registry.keys())
        removed_types = current_types - set(model_types)
        for model_type in removed_types:
            del service_registry[model_type]
            logging.info(f"Removed model type '{model_type}' from registry.")

        # Add watchers for new model types
        for model_type in model_types:
            if model_type not in service_registry:
                service_registry[model_type] = []
                # For each model type, watch its instances.
                instance_path = f"{SERVICE_BASE_PATH}/{model_type}"
                ChildrenWatch(client=zk_client, path=instance_path, func=update_model_instances(model_type))
    return True # Return True to keep the watch alive

# A factory function to create a specific watcher for each model's instances.
def update_model_instances(model_name):
    def watcher(instances):
        logging.info(f"Updating instances for '{model_name}': {instances}")
        instance_details = []
        for instance_id in instances:
            try:
                # For each instance, fetch its data (host, port, metadata).
                instance_path = f"{SERVICE_BASE_PATH}/{model_name}/{instance_id}"
                data, stat = zk_client.get(instance_path)
                if data:
                    instance_details.append(json.loads(data.decode('utf-8')))
            except Exception as e:
                logging.error(f"Failed to get data for instance {instance_id}: {e}")
        
        with registry_lock:
            service_registry[model_name] = instance_details
            logging.info(f"Updated registry for '{model_name}': {service_registry[model_name]}")
        return True # Keep watch alive
    return watcher

# --- Web Route ---
@app.route('/')
def portal():
    with registry_lock:
        # Create a deep copy to avoid race conditions during template rendering.
        current_services = json.loads(json.dumps(service_registry))
    return render_template('portal.html', services=current_services)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

The aggregator uses two levels of ChildrenWatch. The first watches /ml-services to discover new types of models (e.g., mnist-classifier, sentiment-analyzer). When a new model type appears, it dynamically sets up a second ChildrenWatch on that model’s specific path (e.g., /ml-services/mnist-classifier) to track individual instances. This is a robust pattern that scales well without polling.

The accompanying portal.html template simply iterates over this registry and renders the information. For the micro-frontend integration, we use a simple iframe. In a production scenario, one might explore more sophisticated techniques like module federation, but iframes provide excellent isolation at the cost of some performance and integration elegance.

sequenceDiagram
    participant DS as Data Scientist
    participant Nomad
    participant ModelSvc as MNIST Model Service
    participant Zookeeper
    participant Aggregator

    DS->>Nomad: nomad job run mnist_service.nomad
    Nomad->>Nomad: Schedules task on a client node
    Nomad->>ModelSvc: Starts Docker container with dynamic port
    ModelSvc->>ModelSvc: Reads NOMAD_PORT_http env var
    ModelSvc->>Zookeeper: Connects and creates ephemeral node /ml-services/mnist-classifier/instance-001 with {host, port}
    Zookeeper-->>Aggregator: Notifies watcher of child change on /ml-services/mnist-classifier
    Aggregator->>Zookeeper: Gets data for new instance-001 node
    Aggregator->>Aggregator: Updates in-memory service registry
    
    participant User
    User->>Aggregator: HTTP GET /
    Aggregator->>User: Renders HTML with link to MNIST service
    
    Note over Nomad, ModelSvc: Later, ModelSvc container crashes
    Nomad->>ModelSvc: Detects failure
    Zookeeper->>Zookeeper: Ephemeral node for instance-001 is automatically deleted on session timeout
    Zookeeper-->>Aggregator: Notifies watcher of child deletion
    Aggregator->>Aggregator: Removes instance-001 from registry

The result is a highly decoupled system. The aggregator knows nothing about Keras or specific models. The model services know nothing about the aggregator. Nomad is only concerned with running containers according to a spec. Zookeeper acts as the central nervous system, enabling these components to coordinate without direct dependencies. When a data scientist deploys a new sentiment-analysis.nomad job, it registers itself under /ml-services/sentiment-analyzer, and the aggregator’s UI automatically updates to show the new service, no code changes required.

This architecture, while effective, has its boundaries. The reliance on Zookeeper introduces another piece of stateful infrastructure to manage, which might be overkill if the environment already has a service mesh or a discovery tool like Consul. The iframe-based UI integration is simple but can lead to a disjointed user experience and makes cross-frontend communication difficult. Security is also a major consideration not addressed here; in a real deployment, network policies in Nomad, mTLS between services, and ACLs on Zookeeper znodes would be mandatory. The next logical iteration would involve building a CI/CD pipeline that automates the Docker image build and nomad job submit process, truly enabling a GitOps workflow for ML model deployment.


  TOC