Constructing a Real-Time SSR Dashboard for Self-Monitoring Telemetry Using Sanic, Caddy, and InfluxDB

The requirement was sub-second visibility into the performance of a new service stack without taking on a separate observability platform. Standard observability solutions like Prometheus and Grafana felt disproportionately heavy for an initial deployment, introducing operational overhead we weren’t ready to absorb. The goal was to create a closed-loop monitoring system where the service stack reports on its own health, presenting the data on an extremely lightweight, server-rendered dashboard. The entire feedback loop, from a request hitting the edge to its metrics appearing on the dashboard, needed to be near-instantaneous.

This led to a design where our web server, Caddy, and our application server, Sanic, would stream their own performance metrics into a time-series database, InfluxDB. The same Sanic application would then query this data and use Server-Side Rendering (SSR) to build a real-time status page. This avoids the complexity of a client-side rendering framework and ensures the dashboard itself has a minimal performance footprint.

The architecture establishes a direct data flow:

graph TD
    subgraph "User Interaction"
        A[User Request] --> B{Caddy};
        E[Dashboard View] --> B;
    end

    subgraph "Application Stack"
        B -- Reverse Proxy --> C[Sanic App];
        C -- HTTP Response --> B;
        B -- HTML Response --> E;
    end

    subgraph "Telemetry Pipeline"
        B -- Structured JSON Logs --> D[Sanic Background Worker];
        C -- Middleware Metrics --> F[InfluxDB Writer];
        D -- Parsed Metrics --> F;
        F -- Batch Writes --> G[(InfluxDB)];
    end

    subgraph "Dashboard Rendering"
        C -- Dashboard Route --> H[InfluxDB Querier];
        H -- Flux Query --> G;
        G -- Time-Series Data --> H;
        H -- Render Context --> I[Jinja2 SSR Engine];
        I -- Generated HTML --> C;
    end

    style F fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#f9f,stroke:#333,stroke-width:2px

This entire system is designed to run within a single docker-compose setup, making it portable and self-contained. The initial pain point of observability overhead is solved by integrating the monitoring capability directly into the application’s runtime.

Phase 1: Foundational Infrastructure with Docker Compose

In a real-world project, cohesive container orchestration is non-negotiable. We’ll define our three core services: caddy, app (Sanic), and influxdb. The key is ensuring they share a network and that volumes are correctly configured for Caddy’s configuration and InfluxDB’s data persistence.

docker-compose.yml:

version: '3.8'

services:
  influxdb:
    image: influxdb:2.7
    container_name: telemetry_influxdb
    volumes:
      - influxdb_data:/var/lib/influxdb2
    ports:
      - "8086:8086"
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=password123
      - DOCKER_INFLUXDB_INIT_ORG=my-org
      - DOCKER_INFLUXDB_INIT_BUCKET=telemetry
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=my-super-secret-token
    networks:
      - monitor_net

  app:
    build: .
    container_name: telemetry_app
    ports:
      - "8000:8000"
    depends_on:
      - influxdb
    environment:
      - INFLUXDB_URL=http://influxdb:8086
      - INFLUXDB_TOKEN=my-super-secret-token
      - INFLUXDB_ORG=my-org
      - INFLUXDB_BUCKET=telemetry
      - SANIC_APP_HOST=0.0.0.0
      - SANIC_APP_PORT=8000
    volumes:
      - ./app:/app
    networks:
      - monitor_net

  caddy:
    image: caddy:2.7-alpine
    container_name: telemetry_caddy
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data
      - caddy_config:/config
    depends_on:
      - app
    networks:
      - monitor_net

networks:
  monitor_net:
    driver: bridge

volumes:
  influxdb_data:
  caddy_data:
  caddy_config:

This configuration initializes InfluxDB with a default organization, bucket, and admin token, which the Sanic application will use to connect. The credentials shown are placeholders and should be replaced outside of local development. Caddy is configured via a mounted Caddyfile.
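The app container receives its InfluxDB settings purely through the environment variables defined above. A minimal sketch of reading them in one place and failing early when one is missing could look like this; `InfluxSettings` and `load_influx_settings` are hypothetical names introduced for illustration and are not part of the application code shown later.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InfluxSettings:
    """Connection settings the app container receives from docker-compose.yml."""
    url: str
    token: str
    org: str
    bucket: str


def load_influx_settings(env=None) -> InfluxSettings:
    """Read the INFLUXDB_* variables and fail loudly if any is missing."""
    env = os.environ if env is None else env
    names = ("INFLUXDB_URL", "INFLUXDB_TOKEN", "INFLUXDB_ORG", "INFLUXDB_BUCKET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return InfluxSettings(
        url=env["INFLUXDB_URL"],
        token=env["INFLUXDB_TOKEN"],
        org=env["INFLUXDB_ORG"],
        bucket=env["INFLUXDB_BUCKET"],
    )
```

Failing at startup with a list of the missing names tends to be easier to diagnose than a KeyError raised on the first write.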

Phase 2: Configuring Caddy for Structured Logging

Caddy does not write access logs unless a site enables them with the log directive, and for automated processing those logs need to be structured. Caddy’s support for JSON logging is a critical feature for this architecture. We configure it to proxy requests to our Sanic application and, more importantly, to format its logs as JSON, which will be the input for our telemetry pipeline.

Caddyfile:

# Global options block
{
    # Configure the default logger to emit JSON on stdout.
    # On its own this covers Caddy's runtime logs only; access logs
    # are switched on per site with the `log` directive below.
    log {
        output stdout
        format json {
            # Emit "ts" as an ISO 8601 string with millisecond precision
            # instead of the default Unix epoch float.
            time_format "2006-01-02T15:04:05.000Z07:00"
        }
        level INFO
    }
}

# Define the primary site.
# Using a placeholder for production domains. Caddy handles HTTPS automatically.
# For local dev, it issues a certificate from its own local CA.
localhost {
    # Enable access logging for this site. With no options of its own,
    # it writes through the default logger configured above.
    # This is the lifeblood of our Caddy-side telemetry.
    log

    # Reverse proxy all requests to the Sanic application container.
    # The 'app' hostname is resolved by Docker's internal DNS.
    reverse_proxy app:8000
}

With the logger writing JSON to stdout and access logging enabled for the site, Caddy emits one machine-readable entry per request. Docker captures the container’s standard output, which is what our Sanic background worker will consume. Each entry contains the request method, host, and URI, the response status and size, and the handling duration in seconds.
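To make the shape of these entries concrete, here is a small sketch that parses one access log line and pulls out the fields the pipeline uses. The sample line is hand-written for illustration and trimmed to a few fields (real entries also carry headers and TLS details), and `summarize_access_log` is a hypothetical helper name.

```python
import json

# Hand-written sample shaped like a Caddy access log entry; real entries
# carry more fields (headers, TLS details, and so on).
SAMPLE_LINE = (
    '{"level":"info","ts":"2023-10-27T12:00:00.123Z",'
    '"logger":"http.log.access","msg":"handled request",'
    '"request":{"method":"GET","host":"localhost","uri":"/api/ping"},'
    '"duration":0.0042,"size":18,"status":200}'
)


def summarize_access_log(line: str):
    """Return the fields the telemetry pipeline cares about, or None."""
    entry = json.loads(line)
    # Access loggers are named "http.log.access", possibly with a suffix.
    if not str(entry.get("logger", "")).startswith("http.log.access"):
        return None  # runtime logs (startup, TLS, ...) are not request metrics
    request = entry.get("request", {})
    return {
        "method": request.get("method"),
        "uri": request.get("uri"),
        "status": entry.get("status"),
        # Caddy reports duration in seconds; the dashboard uses milliseconds.
        "duration_ms": float(entry.get("duration", 0.0)) * 1000,
        "size": entry.get("size", 0),
    }


print(summarize_access_log(SAMPLE_LINE))
```

Entries from other loggers return None, which mirrors how the log processor later skips anything that is not an access log.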

Phase 3: Building the Asynchronous Sanic Application

The core of our system is the Sanic application. It serves three purposes: handling regular API requests, processing telemetry data in the background, and rendering the SSR dashboard.

First, the project structure:

.
├── Caddyfile
├── Dockerfile
├── docker-compose.yml
├── package.json
├── requirements.txt
└── app
    ├── __init__.py
    ├── core
    │   ├── __init__.py
    │   ├── influx_client.py
    │   └── log_processor.py
    ├── server.py
    ├── static
    │   └── css
    │       └── dashboard.css
    ├── styles
    │   ├── _base.scss
    │   ├── _variables.scss
    │   └── dashboard.scss
    └── templates
        └── dashboard.html

The Dockerfile for the Sanic app is a two-stage build: a Node.js stage compiles the SCSS with Sass, and the final Python stage installs the dependencies and copies in the compiled CSS.

Dockerfile:

# Stage 1: Build CSS from SCSS
FROM node:18-alpine AS builder

WORKDIR /build
COPY package.json .
RUN npm install
# Keep the app/ prefix so the paths in the build-css script resolve.
COPY app/styles/ ./app/styles/
RUN npm run build-css

# Stage 2: Build the final Python application
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code (the contents of ./app land directly in /app)
COPY ./app .

# Copy compiled CSS from the builder stage
COPY --from=builder /build/app/static/css/dashboard.css ./static/css/dashboard.css

ENV PYTHONUNBUFFERED=1
ENV SANIC_APP_HOST=0.0.0.0
ENV SANIC_APP_PORT=8000

# server.py sits at /app/server.py, so the module path is server:app.
CMD ["sanic", "server:app", "--host", "0.0.0.0", "--port", "8000", "--workers=1"]

requirements.txt:

sanic==23.6.0
sanic-ext==23.6.0
influxdb-client[async]==1.37.0
jinja2==3.1.2
aiofiles==23.1.0
python-dotenv==1.0.0

Phase 4: Robust InfluxDB Integration

Connecting to a database in an async application requires careful management of the connection lifecycle. A common mistake is creating a new client for every request. Instead, we initialize the client when the Sanic application starts and close it gracefully on shutdown.

app/core/influx_client.py:

import os
from logging import getLogger
from influxdb_client import Point
from influxdb_client.client.influxdb_client_async import InfluxDBClientAsync
from sanic import Sanic

logger = getLogger(__name__)

class InfluxDB:
    """A wrapper for managing the InfluxDB client lifecycle within a Sanic app."""
    def __init__(self):
        self._client: InfluxDBClientAsync | None = None
        self._write_api = None
        self._query_api = None

    async def connect(self, app: Sanic):
        """Initialize the async client and APIs."""
        url = os.environ["INFLUXDB_URL"]
        token = os.environ["INFLUXDB_TOKEN"]
        org = os.environ["INFLUXDB_ORG"]
        
        logger.info(f"Connecting to InfluxDB at {url}")
        self._client = InfluxDBClientAsync(url=url, token=token, org=org)
        self._write_api = self._client.write_api()
        self._query_api = self._client.query_api()
        app.ctx.influx_bucket = os.environ["INFLUXDB_BUCKET"]
        logger.info("InfluxDB connection established.")

    async def disconnect(self):
        """Gracefully close the client."""
        if self._client:
            await self._client.close()
            logger.info("InfluxDB connection closed.")

    async def write_point(self, point: Point):
        """Write a single data point. In a high-throughput system, batching is better."""
        if not self._write_api:
            logger.error("InfluxDB write API not available.")
            return
        
        try:
            await self._write_api.write(bucket=os.environ["INFLUXDB_BUCKET"], record=point)
        except Exception as e:
            # In production, this needs a retry mechanism or DLQ.
            logger.error(f"Failed to write point to InfluxDB: {e}")

    async def query(self, query: str):
        """Execute a Flux query."""
        if not self._query_api:
            logger.error("InfluxDB query API not available.")
            return None
        
        try:
            return await self._query_api.query(query=query)
        except Exception as e:
            logger.error(f"Failed to query InfluxDB: {e}")
            return None

# Singleton instance
influx_db = InfluxDB()

We integrate this into the main application file, server.py, using Sanic’s lifecycle listeners.

app/server.py:

from sanic import Sanic, Request, response
from sanic.log import logger
from influxdb_client import Point
import time

# server.py is loaded as a top-level module from /app (see `COPY ./app .`
# in the Dockerfile), so sibling packages are imported absolutely.
from core.influx_client import influx_db
from core.log_processor import start_log_processing

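# --- Application Setup ---
app = Sanic("TelemetryApp")
app.config.TEMPLATING_ENABLE_ASYNC = True
# Serve the compiled stylesheet that the dashboard template links to.
app.static("/static", "./static", name="static")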

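# --- Lifecycle Hooks ---
# These listeners run inside the worker process that serves requests.
# The main_process_* listeners run in Sanic's supervising process, so a
# client created there is not the one the request handlers use.
@app.before_server_start
async def start(app: Sanic, _):
    await influx_db.connect(app)
    # Start the background task for processing Caddy logs
    app.add_task(start_log_processing(app))

@app.after_server_stop
async def stop(app: Sanic, _):
    await influx_db.disconnect()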

# --- Middleware for Sanic Metrics ---
@app.middleware("request")
async def measure_request_time(request: Request):
    request.ctx.start_time = time.perf_counter()

@app.middleware("response")
async def record_sanic_metric(request: Request, response):
    if hasattr(request.ctx, "start_time"):
        duration_ms = (time.perf_counter() - request.ctx.start_time) * 1000
        
        point = (
            Point("request_metrics")
            .tag("source", "sanic")
            .tag("endpoint", request.name or "unknown")
            .tag("method", request.method)
            .tag("status_code", response.status)
            .field("duration_ms", duration_ms)
        )
        # Using add_task to avoid blocking the response path
        app.add_task(influx_db.write_point(point))

# --- API Routes ---
@app.get("/api/ping")
async def ping(request: Request):
    return response.json({"message": "pong"})

# --- SSR Dashboard Route ---
@app.get("/")
@app.ext.template("dashboard.html")
async def dashboard(request: Request):
    # This will be filled in Phase 6
    return {"data": "Placeholder"}

The middleware demonstrates a key pattern: it calculates the request duration and then uses app.add_task to send the metric to InfluxDB. This fires off the write operation without making the client wait for the database write to complete, which is crucial for maintaining low latency.
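The effect of scheduling the write instead of awaiting it can be shown without Sanic or InfluxDB. The sketch below uses plain asyncio, with `slow_write` as a stand-in for the database call (its 0.2 second sleep is an arbitrary simulated latency) and `handle_request` as a stand-in for the response middleware; both names are hypothetical.

```python
import asyncio
import time

background_tasks = set()  # keep references so tasks are not garbage-collected


async def slow_write(metric: dict, written: list) -> None:
    """Stand-in for the InfluxDB write; the sleep simulates network latency."""
    await asyncio.sleep(0.2)
    written.append(metric)


async def handle_request(written: list) -> float:
    """Schedule the metric write and return without awaiting it."""
    start = time.perf_counter()
    task = asyncio.create_task(slow_write({"duration_ms": 1.5}, written))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return time.perf_counter() - start


async def main():
    written = []
    elapsed = await handle_request(written)
    completed_at_return = len(written)  # the write has not finished yet
    await asyncio.gather(*background_tasks)  # drain pending writes before exit
    return elapsed, completed_at_return, len(written)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The handler returns well before the simulated write completes. The trade-off is that a write still in flight when the process stops is lost unless pending tasks are drained, as `main` does here.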

Phase 5: Processing Caddy Logs Asynchronously

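This is where the system becomes self-aware. We need a process to consume the structured JSON logs from Caddy. A simple but effective way to handle this within our containerized setup is to read Docker’s log stream for the Caddy container. We’ll use asyncio.create_subprocess_exec to run docker logs -f and process the stream line by line. Note that this only works if the app container has the docker CLI installed and can reach the Docker daemon (for example through a mounted /var/run/docker.sock), which the Dockerfile and Compose file above do not set up.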

app/core/log_processor.py:

import asyncio
import json
from logging import getLogger
from sanic import Sanic
from influxdb_client import Point
from .influx_client import influx_db

logger = getLogger(__name__)

# The container name must match what's in docker-compose.yml
CADDY_CONTAINER_NAME = "telemetry_caddy"

async def process_log_stream(stream):
    """Reads from a stream and processes each line as a JSON log entry."""
    while True:
        line = await stream.readline()
        if not line:
            # An empty read means EOF: the subprocess has exited, so return
            # and let the caller restart `docker logs`.
            break
        
        try:
            log_entry = json.loads(line.decode('utf-8'))
            
            # A pitfall here is assuming fields always exist.
            # Real-world code needs more robust validation.
            # Access loggers are named "http.log.access" and may carry a
            # suffix (such as ".log0") when a site defines a custom logger.
            if str(log_entry.get("logger", "")).startswith("http.log.access"):
                req = log_entry.get("request", {})
                point = (
                    Point("request_metrics")
                    .tag("source", "caddy")
                    .tag("host", req.get("host"))
                    .tag("method", req.get("method"))
                    .tag("status_code", log_entry.get("status"))
                    # Always a float, so the field type matches Sanic's duration_ms.
                    .field("duration_ms", float(log_entry.get("duration", 0.0)) * 1000)
                    .field("uri", req.get("uri"))
                    .field("size", log_entry.get("size", 0))
                    .time(log_entry.get("ts")) # Use Caddy's timestamp
                )
                await influx_db.write_point(point)

        except (json.JSONDecodeError, KeyError) as e:
            # Ignoring malformed lines, but should log them for debugging.
            logger.warning(f"Could not parse Caddy log line: {line.strip()}, error: {e}")
        except Exception as e:
            logger.error(f"Unexpected error processing log line: {e}")

async def start_log_processing(app: Sanic):
    """
    Starts a subprocess to tail Caddy's logs and processes them.
    This is a fragile approach for production but demonstrates the concept.
    A dedicated log shipper (Vector, Fluentd) is the production-grade solution.
    """
    logger.info("Starting Caddy log processor...")
    while True:
        try:
            proc = await asyncio.create_subprocess_exec(
                "docker", "logs", "-f", "--since", "1s", CADDY_CONTAINER_NAME,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE
            )
            logger.info(f"Connected to Docker logs stream for '{CADDY_CONTAINER_NAME}'")
            await process_log_stream(proc.stdout)
            
            # If the process exits, wait and retry
            await proc.wait()
            stderr_output = await proc.stderr.read()
            if stderr_output:
                logger.error(f"Log processor subprocess error: {stderr_output.decode().strip()}")

        except FileNotFoundError:
            logger.error("`docker` command not found. Is Docker installed and in the PATH?")
            break # Stop trying if docker command doesn't exist
        except Exception as e:
            logger.error(f"Log processor crashed: {e}. Restarting in 10 seconds.")

        await asyncio.sleep(10)

This background task restarts itself: when the docker logs process exits or fails to start, the loop waits 10 seconds and retries, and it gives up only if the docker binary is missing. In a production scenario, you would replace this with a more robust log shipping agent that writes directly to a message queue or an HTTP endpoint on the Sanic app.
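The log processor writes one point per log line, while the architecture diagram and the write_point docstring both point toward batching. A minimal sketch of such a buffer follows, assuming a caller-supplied async `flush` callable; `BatchWriter` is a hypothetical helper, not part of the InfluxDB client. In this application `flush` could wrap the async write API, which accepts a list of points as `record`.

```python
import asyncio
from typing import Awaitable, Callable, List


class BatchWriter:
    """Buffers records and flushes them in batches (hypothetical helper)."""

    def __init__(self, flush: Callable[[List], Awaitable[None]],
                 max_size: int = 100, interval: float = 1.0):
        self._flush = flush
        self._max_size = max_size
        self._interval = interval
        self._queue: asyncio.Queue = asyncio.Queue()

    async def add(self, record) -> None:
        """Enqueue a record; returns immediately because the queue is unbounded."""
        await self._queue.put(record)

    async def run(self) -> None:
        """Flush when max_size is reached or interval elapses; drain on cancel."""
        loop = asyncio.get_running_loop()
        batch: List = []
        deadline = loop.time() + self._interval
        try:
            while True:
                timeout = max(deadline - loop.time(), 0)
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), timeout))
                except asyncio.TimeoutError:
                    pass  # interval elapsed with no new record
                if len(batch) >= self._max_size or loop.time() >= deadline:
                    if batch:
                        await self._flush(batch)
                        batch = []
                    deadline = loop.time() + self._interval
        finally:
            # On cancellation, write out whatever is still buffered or queued.
            while not self._queue.empty():
                batch.append(self._queue.get_nowait())
            if batch:
                await self._flush(batch)
```

A task running `run()` would be started next to the log processor, and both the middleware and the log processor would call `add()` instead of writing directly. Batch size and interval trade dashboard freshness against the number of HTTP requests sent to InfluxDB.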

Phase 6: Server-Side Rendering the Dashboard with Jinja2 and Flux

The final piece is the dashboard itself. The Sanic route will execute several queries against InfluxDB using the Flux language. These queries aggregate the raw metrics into meaningful statistics. The results are then passed to a Jinja2 template for rendering.

Updated app/server.py dashboard route:

import asyncio  # needed for asyncio.gather below

# ... (imports and other routes) ...
@app.get("/")
@app.ext.template("dashboard.html")
async def dashboard(request: Request):
    """
    Queries InfluxDB for metrics and renders them server-side.
    """
    bucket = app.ctx.influx_bucket
    time_range = 'start: -1h' # Query data from the last hour

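    # A common mistake is to run queries serially. asyncio.gather runs them concurrently.
    # Each count filters on a single field: Caddy points carry three fields
    # (duration_ms, uri, size), so counting every field would count a request
    # more than once and mix string and numeric values in one table.
    query_tasks = {
        "request_counts": f'''
            from(bucket: "{bucket}")
              |> range({time_range})
              |> filter(fn: (r) => r._measurement == "request_metrics" and r._field == "duration_ms")
              |> group(columns: ["source"])
              |> count()
              |> group()
        ''',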
        "avg_latency": f'''
            from(bucket: "{bucket}")
              |> range({time_range})
              |> filter(fn: (r) => r._measurement == "request_metrics" and r._field == "duration_ms")
              |> group(columns: ["source"])
              |> mean()
              |> group()
        ''',
        "status_codes": f'''
            from(bucket: "{bucket}")
              |> range({time_range})
              |> filter(fn: (r) => r._measurement == "request_metrics" and r._field == "duration_ms")
              |> group(columns: ["status_code"])
              |> count()
              |> group()
        ''',
        "latency_over_time": f'''
            from(bucket: "{bucket}")
              |> range({time_range})
              |> filter(fn: (r) => r._measurement == "request_metrics" and r._field == "duration_ms")
              // Merge all series first; otherwise every tag combination yields its own row per minute.
              |> group()
              |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
              |> sort(columns: ["_time"])
              |> yield(name: "mean_latency")
        '''
    }
    
    results = await asyncio.gather(
        *(influx_db.query(q) for q in query_tasks.values())
    )
    
    # Process raw query results into a clean dictionary for the template
    data_context = {}
    raw_results = dict(zip(query_tasks.keys(), results))

    # Helper function to parse Flux results
    def parse_flux_result(tables, key_col, val_col="_value"):
        """Flatten result tables into {key column value: value column value}."""
        data = {}
        if tables:
            for table in tables:
                for record in table.records:
                    data[record.values.get(key_col)] = record.values.get(val_col)
        return data

    data_context["request_counts"] = parse_flux_result(raw_results.get("request_counts"), "source")
    data_context["avg_latency"] = parse_flux_result(raw_results.get("avg_latency"), "source")
    data_context["status_codes"] = parse_flux_result(raw_results.get("status_codes"), "status_code")
    
    # Process time-series data for the chart
    latency_data = []
    if raw_results.get("latency_over_time"):
        for table in raw_results["latency_over_time"]:
            for record in table.records:
                latency_data.append({
                    "time": record.get_time().isoformat(), 
                    "value": f"{record.get_value():.2f}"
                })
    data_context["latency_over_time"] = latency_data
    
    return {"data": data_context}

The queries are run concurrently using asyncio.gather for maximum efficiency. The raw Flux results, which are structured as a list of tables, are parsed into a simple dictionary that the template can easily consume.
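To illustrate the flattening step in isolation, the sketch below runs the same parsing logic against stand-in objects. `FakeTable` and `FakeRecord` are hypothetical classes that mimic only the two attributes the parser touches (`table.records` and `record.values`); they are not the client library’s types, and the numbers are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeRecord:
    """Stand-in exposing only the `values` mapping the parser reads."""
    values: Dict[str, Any]


@dataclass
class FakeTable:
    """Stand-in for one result table: just a list of records."""
    records: List[FakeRecord] = field(default_factory=list)


def parse_flux_result(tables, key_col, val_col="_value"):
    """Flatten a list of tables into {key column value: value column value}."""
    data = {}
    for table in tables or []:
        for record in table.records:
            data[record.values.get(key_col)] = record.values.get(val_col)
    return data


tables = [FakeTable([
    FakeRecord({"source": "caddy", "_value": 120}),
    FakeRecord({"source": "sanic", "_value": 118}),
])]
print(parse_flux_result(tables, "source"))  # {'caddy': 120, 'sanic': 118}
```

Passing None, which is what the query wrapper returns on failure, yields an empty dictionary, so the template falls back to its defaults instead of raising.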

The Jinja2 template then renders this data.

app/templates/dashboard.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="refresh" content="5">
    <title>Live Telemetry Dashboard</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='css/dashboard.css') }}">
</head>
<body>
    <div class="container">
        <header>
            <h1>Live System Telemetry (Last 1 Hour)</h1>
        </header>
        <main>
            <section class="metrics-grid">
                <div class="card">
                    <h2>Requests (Caddy)</h2>
                    <p class="metric">{{ data.request_counts.get('caddy', 0) }}</p>
                </div>
                <div class="card">
                    <h2>Requests (Sanic)</h2>
                    <p class="metric">{{ data.request_counts.get('sanic', 0) }}</p>
                </div>
                <div class="card">
                    <h2>Avg Latency (Caddy)</h2>
                    <p class="metric">{{ "%.2f"|format(data.avg_latency.get('caddy', 0)) }} ms</p>
                </div>
                <div class="card">
                    <h2>Avg Latency (Sanic)</h2>
                    <p class="metric">{{ "%.2f"|format(data.avg_latency.get('sanic', 0)) }} ms</p>
                </div>
            </section>

            <section class="details-grid">
                <div class="card">
                    <h3>Status Code Distribution</h3>
                    <table>
                        <thead>
                            <tr><th>Status</th><th>Count</th></tr>
                        </thead>
                        <tbody>
                        {% for code, count in data.status_codes.items()|sort %}
                            <tr><td>{{ code }}</td><td>{{ count }}</td></tr>
                        {% else %}
                            <tr><td colspan="2">No data</td></tr>
                        {% endfor %}
                        </tbody>
                    </table>
                </div>
                <div class="card">
                    <h3>Latency Over Time (ms)</h3>
                    <div class="table-scroll">
                        <table>
                            <thead>
                                <tr><th>Timestamp</th><th>Avg Latency</th></tr>
                            </thead>
                            <tbody>
                            {% for point in data.latency_over_time|reverse %}
                                <tr><td>{{ point.time }}</td><td>{{ point.value }}</td></tr>
                            {% else %}
                                <tr><td colspan="2">No data</td></tr>
                            {% endfor %}
                            </tbody>
                        </table>
                    </div>
                </div>
            </section>
        </main>
    </div>
</body>
</html>

A simple <meta http-equiv="refresh" content="5"> tag is used for auto-refreshing, staying true to the minimal/zero JavaScript philosophy of this design.

Phase 7: Maintainable Styling with SCSS

Plain CSS for a dashboard can quickly become a mess. Using SCSS allows for variables, nesting, and mixins, which drastically improves maintainability.

package.json:

{
  "name": "telemetry-dashboard-styles",
  "version": "1.0.0",
  "scripts": {
    "build-css": "sass app/styles/dashboard.scss app/static/css/dashboard.css --style=compressed"
  },
  "devDependencies": {
    "sass": "^1.68.0"
  }
}

app/styles/_variables.scss:

$primary-bg: #1a1a2e;
$secondary-bg: #16213e;
$card-bg: #0f3460;
$text-color: #e94560;
$text-light: #dcdcdc;
$border-color: #e94560;
$font-family: 'Consolas', 'Menlo', monospace;

app/styles/dashboard.scss:

@import 'variables';
@import 'base';

.container {
    width: 90%;
    max-width: 1200px;
    margin: 2rem auto;
}

header h1 {
    text-align: center;
    margin-bottom: 2rem;
    color: $text-color;
    font-size: 2rem;
}

.metrics-grid, .details-grid {
    display: grid;
    gap: 1.5rem;
    margin-bottom: 2rem;
}

.metrics-grid {
    grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
}

.details-grid {
    grid-template-columns: 1fr 2fr;
}

.card {
    background-color: $card-bg;
    padding: 1.5rem;
    border-radius: 8px;
    border: 1px solid $border-color;
    color: $text-light;

    h2, h3 {
        margin-top: 0;
        margin-bottom: 1rem;
        color: $text-color;
        border-bottom: 1px solid darken($border-color, 10%);
        padding-bottom: 0.5rem;
    }

    .metric {
        font-size: 2.5rem;
        font-weight: bold;
        text-align: center;
        margin: 0;
        color: white;
    }
}

table {
    width: 100%;
    border-collapse: collapse;

    th, td {
        padding: 0.75rem;
        text-align: left;
        border-bottom: 1px solid $secondary-bg;
    }

    th {
        color: $text-color;
    }
}
.table-scroll {
    max-height: 400px;
    overflow-y: auto;
}

Running npm run build-css (as done in our Dockerfile) compiles these structured SCSS files into a single, minified CSS file that is served statically by Sanic.

This implementation achieves the initial goal: a self-contained, high-performance monitoring system. Caddy and Sanic metrics are captured and stored with minimal overhead, and the SSR dashboard provides immediate insight without the weight of a traditional observability stack.

The primary limitation of this design is the log processing mechanism. Tailing Docker logs via a subprocess is functional for a demonstration but lacks the robustness for a production system. It’s susceptible to failures if the Docker daemon is unresponsive or if the log format changes unexpectedly. A production-grade architecture would replace this component with a dedicated log shipper like Vector, configured to parse the Caddy JSON logs and forward them to a dedicated /ingest endpoint on the Sanic application. Furthermore, the dashboard is read-only; adding features like dynamic time-range selection would require introducing client-side state management, deviating from the initial SSR-purity principle. The error handling for database writes is also simplistic, lacking a retry strategy or a dead-letter queue, which would be essential for guaranteeing data integrity in a less-than-perfect network environment.
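As one possible shape for the missing retry strategy, here is a sketch of an exponential backoff wrapper around an arbitrary async write callable. `write_with_retry` and its parameters are hypothetical, and the defaults are illustrative rather than tuned values.

```python
import asyncio
import random


async def write_with_retry(write, record, attempts=4, base_delay=0.5, max_delay=8.0):
    """Call `await write(record)`, retrying failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await write(record)
        except Exception:
            if attempt == attempts:
                raise  # out of retries: let the caller log or dead-letter it
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # Jitter spreads retries out so clients do not retry in lockstep.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
```

The final exception is re-raised deliberately, so the caller decides whether to drop the point, log it, or park it in a dead-letter store.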
