The collapse of a monitoring system under its own data weight is a familiar story. The initial antagonist is often an unbounded increase in label cardinality. A metric like http_requests_total is benign until it’s decorated with labels for every Kubernetes pod, container instance, customer ID, and software version. A relational database index that performs beautifully with a few thousand distinct time series quickly degrades into a multi-gigabyte, un-cacheable monstrosity, bringing query performance to a crawl. In this scenario, writes may remain fast, but the system becomes read-blind, rendering dashboards and alerts useless. This isn’t a hypothetical; it’s the default failure mode for naive observability backends.
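The scale of the problem is easy to underestimate because cardinality is multiplicative, not additive: the number of distinct series is the product of the distinct values of each label. A back-of-the-envelope sketch (the label counts below are illustrative assumptions, not measurements):

```javascript
// Total distinct time series = product of distinct values per label.
// These counts are hypothetical, chosen only to show the multiplicative blow-up.
const labelCardinality = {
  pod: 500,         // Kubernetes pods
  container: 3,     // containers per pod
  customer_id: 2000,
  version: 10,
};

const seriesCount = Object.values(labelCardinality)
  .reduce((product, n) => product * n, 1);

console.log(seriesCount); // 500 * 3 * 2000 * 10 = 30,000,000 distinct series
```

Adding one more modestly-sized label multiplies, rather than adds to, that total — which is why an index that was fine last quarter can suddenly no longer fit in cache.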
A conventional three-tier architecture, where a single application service handles both ingestion and querying against a general-purpose database, is the direct path to this failure. The read and write workloads are fundamentally different. Ingestion is a high-volume, append-only stream of simple writes. Querying involves complex, time-bound aggregations across potentially millions of distinct series. Lumping these together creates resource contention at every layer: CPU cycles in the application, I/O on the database, and connection pool exhaustion.
The alternative requires acknowledging this workload dichotomy. We can physically separate the data ingestion path from the data query path. This is the core principle of Command Query Responsibility Segregation (CQRS). This architectural decision, however, introduces complexity. A second decision is then required for the persistence layer. A standard PostgreSQL instance will still buckle under the cardinality pressure. We need a datastore engineered for time-series data, specifically one with strategies for managing high-cardinality metadata. TimescaleDB, as a PostgreSQL extension, provides this capability without forcing a complete departure from the relational ecosystem.
The third piece of the puzzle is the user interface. A monitoring dashboard is not a static report; it’s a live view into a system’s health. The state is complex, constantly updating, and requires efficient rendering. A reactive state management library is non-negotiable. MobX offers a suitable model through its transparently reactive dependency tracking, simplifying the management of streaming data from the query backend. Babel becomes the essential build-tool glue, enabling the modern JavaScript syntax and features, like decorators, that make the MobX implementation clean and maintainable.
Let’s compare these two approaches.
Architecture A: The Monolithic Approach (The Path to Failure)
A single Node.js/Express application serves two primary endpoints, /ingest and /query. It connects to a standard PostgreSQL database with a simple table.
-- postgres_schema_naive.sql
CREATE TABLE metrics (
timestamp TIMESTAMPTZ NOT NULL,
name TEXT NOT NULL,
value DOUBLE PRECISION NOT NULL,
labels JSONB
);
-- Inefficient indexing for high-cardinality labels
CREATE INDEX idx_metrics_name_timestamp ON metrics (name, timestamp DESC);
CREATE INDEX idx_metrics_labels ON metrics USING GIN (labels);
The service code might look like this:
// monolithic_service.js
const express = require('express');
const { Pool } = require('pg');
const app = express();
app.use(express.json());
const pool = new Pool({ /* connection details */ });
// INGESTION
app.post('/ingest', async (req, res) => {
// In a real-world project, batching is essential.
// This simplified example shows a single metric write.
const { timestamp, name, value, labels } = req.body;
try {
await pool.query(
'INSERT INTO metrics(timestamp, name, value, labels) VALUES($1, $2, $3, $4)',
[timestamp, name, value, labels]
);
res.status(202).send();
} catch (err) {
console.error('Ingestion failed:', err);
res.status(500).send('Internal Server Error');
}
});
// QUERYING
app.post('/query', async (req, res) => {
const { name, timeRange, labelFilters, groupBy } = req.body;
// This query construction is simplistic and vulnerable to SQL injection.
// Production code requires a robust query builder.
const filterClauses = Object.entries(labelFilters).map(([key, val]) =>
`labels @> '{"${key}": "${val}"}'`
).join(' AND ');
const queryString = `
SELECT
time_bucket('1 minute', timestamp) as bucket,
-- Complex aggregation logic would go here
avg(value) as avg_value
FROM metrics
WHERE name = $1
AND timestamp BETWEEN $2 AND $3
AND ${filterClauses} -- High potential for slow queries
GROUP BY bucket, ${groupBy ? `labels->>'${groupBy}'` : '1'}
ORDER BY bucket;
`;
try {
const { rows } = await pool.query(queryString, [name, timeRange.start, timeRange.end]);
res.json(rows);
} catch (err) {
console.error('Query failed:', err);
res.status(500).send('Internal Server Error');
}
});
app.listen(3000);
Analysis of Architecture A:
- Pros: Simple to understand and implement initially. A single codebase and deployment artifact.
- Cons:
- Write/Read Contention: A surge in ingested metrics can starve the query endpoints of resources (CPU, connections). Long-running, complex queries can block writes.
- Database Inefficiency: The GIN index on the labels JSONB column becomes a major bottleneck. As cardinality grows, the index size explodes, and queries that filter on multiple labels perform poorly because the planner struggles to find an optimal path. Worse, time_bucket is not a native PostgreSQL function; it would need to be implemented manually or with another extension, adding complexity.
- Scalability Ceiling: Scaling this service means scaling read and write capacity together, which is inefficient. You might need 10x the write capacity but only 2x the read capacity, yet you are forced to scale them in lockstep.
Architecture B: CQRS with TimescaleDB (The Scalable Approach)
This architecture separates the system into distinct services and leverages a purpose-built database.
graph TD
  subgraph Frontend
    A[Browser with MobX UI]
  end
  subgraph Backend
    LB_WRITE[Load Balancer] --> C1[Command Service 1]
    LB_WRITE --> C2[Command Service 2]
    LB_WRITE --> CN[...Command Service N]
    LB_READ[Load Balancer] --> Q1[Query Service 1]
    LB_READ --> Q2[Query Service 2]
    subgraph Persistence
      DB[(TimescaleDB)]
    end
    C1 --> DB
    C2 --> DB
    CN --> DB
    Q1 --> DB
    Q2 --> DB
  end
  A -- REST/GraphQL API Calls --> LB_READ
  IngestionSources[Metric Sources] -- High Volume Writes --> LB_WRITE
The fundamental decision is to trade initial implementation simplicity for long-term scalability and performance resilience. The cost of managing two service types and a more specialized database is accepted because the cost of Architecture A failing in production is far higher.
Core Implementation: The Command Path
The Command Service’s only job is to accept metric data, perform minimal validation, and write it to TimescaleDB as efficiently as possible. It should be lightweight and horizontally scalable.
// command_service/index.js
const express = require('express');
const { Pool } = require('pg');
// A simple in-memory logger. Production systems would use Winston, Pino, etc.
const logger = {
info: (msg) => console.log(`[INFO] ${new Date().toISOString()}: ${msg}`),
error: (msg, err) => console.error(`[ERROR] ${new Date().toISOString()}: ${msg}`, err),
};
const app = express();
// Use a body parser with a higher limit for batch ingestion.
app.use(express.json({ limit: '10mb' }));
// A robust connection pool is critical for performance.
const pool = new Pool({
max: 20, // More connections for a write-heavy workload
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
// ... other connection details from env vars
});
// A simple health check endpoint
app.get('/health', (req, res) => res.status(200).send('OK'));
// The single responsibility of this service: ingest commands.
app.post('/v1/metrics', async (req, res) => {
// Expects an array of metrics for batching.
const metrics = req.body;
if (!Array.isArray(metrics) || metrics.length === 0) {
return res.status(400).send('Request body must be a non-empty array of metrics.');
}
const client = await pool.connect();
try {
// In a real project, this would be more sophisticated, using something like
// pg's `copy-from` for maximum bulk insert performance.
// For this example, we'll use a transaction with multiple INSERTS.
await client.query('BEGIN');
for (const metric of metrics) {
// Basic validation. A real implementation would use a schema validator like Joi or Zod.
if (!metric.timestamp || !metric.name || typeof metric.value !== 'number' || typeof metric.labels !== 'object') {
// Log the bad metric but don't fail the whole batch.
logger.error('Invalid metric format in batch', metric);
continue;
}
const { timestamp, name, value, labels } = metric;
await client.query(
'INSERT INTO metrics(time, name, value, labels) VALUES($1, $2, $3, $4)',
[timestamp, name, value, labels]
);
}
await client.query('COMMIT');
res.status(202).send({ received: metrics.length });
} catch (err) {
await client.query('ROLLBACK');
logger.error('Batch ingestion transaction failed', err);
res.status(500).send('Internal Server Error');
} finally {
// Always release the client back to the pool.
client.release();
}
});
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
logger.info(`Command Service listening on port ${PORT}`);
});
The key here is that this service does no complex thinking. It’s a funnel, optimized for I/O and network throughput.
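The per-row INSERT loop above is the simplest form of batching, but it still pays one database round trip per metric. A common middle ground between that loop and a full COPY pipeline is a single multi-row INSERT built with unnest over matched arrays. A sketch of the statement builder as a pure function (the name buildBulkInsert and its usage are assumptions, not part of the service above):

```javascript
// Build one parameterized INSERT ... SELECT unnest(...) statement for a whole
// batch. The four arrays are matched by position, so one round trip inserts N rows.
function buildBulkInsert(metrics) {
  const times = [], names = [], values = [], labels = [];
  for (const m of metrics) {
    times.push(m.timestamp);
    names.push(m.name);
    values.push(m.value);
    labels.push(JSON.stringify(m.labels)); // jsonb[] elements as serialized strings
  }
  return {
    text: `INSERT INTO metrics(time, name, value, labels)
           SELECT * FROM unnest($1::timestamptz[], $2::text[],
                                $3::double precision[], $4::jsonb[])`,
    values: [times, names, values, labels],
  };
}

// Hypothetical usage inside the handler, replacing the per-metric loop:
//   const { text, values } = buildBulkInsert(validMetrics);
//   await client.query({ text, values });
```

This keeps the transaction semantics of the original code while collapsing N statements into one; for sustained high throughput, pg's COPY support (as the comment in the handler notes) is still the ceiling.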
Core Implementation: TimescaleDB Schema
This is the most critical component. A poorly designed schema will negate all the architectural benefits. We use TimescaleDB’s features to handle time-series data and mitigate cardinality issues.
-- timescale_schema.sql
-- 1. Ensure the TimescaleDB extension is available.
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- 2. Create the main table for metrics.
-- 'time' is the conventional column name for the primary timestamp in TimescaleDB.
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
name TEXT NOT NULL,
value DOUBLE PRECISION NOT NULL,
labels JSONB NOT NULL
);
-- 3. Turn the regular table into a hypertable, partitioned by time.
-- This is the core magic of TimescaleDB. It automatically creates and manages
-- time-based partitions ("chunks") under the hood. This keeps indexes small
-- and queries fast for recent data. A chunk interval of 1 day is a reasonable start.
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');
-- 4. Create indexes optimized for our query patterns.
-- The standard TimescaleDB index is on the time column, which is automatically created.
-- We need a composite index on the fields we will filter and group by.
-- The order (name, time DESC) is crucial for queries that filter by name and then
-- ask for the most recent data.
CREATE INDEX idx_metrics_name_time ON metrics (name, time DESC);
-- 5. The high-cardinality index.
-- This GIN index on the JSONB column allows for efficient queries that
-- filter on arbitrary key-value pairs within the labels. For example, finding all
-- metrics where `labels.pod = 'my-app-xyz' AND labels.dc = 'us-east-1'`.
-- While still a GIN index, its performance within TimescaleDB's partitioned
-- chunks is significantly better than on a single, massive monolithic table.
CREATE INDEX idx_metrics_labels ON metrics USING GIN (labels);
-- 6. (Optional but recommended) Set up compression policies.
-- This can dramatically reduce storage costs for older data.
-- Data older than 7 days will be compressed.
ALTER TABLE metrics SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'name'
);
SELECT add_compression_policy('metrics', INTERVAL '7 days');
The decision to use a JSONB column for labels is a trade-off. The alternative is a “normalized” approach with separate labels and metric_labels tables. While that can be more storage-efficient, it adds significant JOIN complexity to queries. For many observability use cases, the performance and simplicity of an indexed JSONB column in TimescaleDB make it the pragmatic choice.
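One practical detail when querying this schema: with the default GIN operator class, equality filters written with the containment operator `@>` can use the GIN index, while `labels->>'key' = 'value'` comparisons generally cannot. A small helper (the name buildLabelFilter is hypothetical) that collapses a filters object into a single GIN-friendly, parameterized clause:

```javascript
// Collapse { pod: 'my-app-xyz', dc: 'us-east-1' } into one containment test:
//   AND labels @> '{"pod": "my-app-xyz", "dc": "us-east-1"}'::jsonb
// One @> against the whole object lets the planner use the GIN index.
function buildLabelFilter(filters, paramIndex) {
  if (!filters || Object.keys(filters).length === 0) {
    return { clause: '', params: [] };
  }
  return {
    clause: ` AND labels @> $${paramIndex}::jsonb`,
    params: [JSON.stringify(filters)],
  };
}
```

For string-valued labels this is semantically equivalent to ANDing the individual `->>` comparisons, but it hands the planner a single indexable predicate instead of several opaque ones.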
Core Implementation: The Query Path
The Query Service is the counterpart to the Command Service. It is optimized for complex reads and aggregations. It might have a larger memory footprint to handle result sets and can be scaled independently.
// query_service/index.js
const express = require('express');
const { Pool } = require('pg');
const logger = {
info: (msg) => console.log(`[INFO] ${new Date().toISOString()}: ${msg}`),
error: (msg, err) => console.error(`[ERROR] ${new Date().toISOString()}: ${msg}`, err),
};
const app = express();
app.use(express.json());
const pool = new Pool({
max: 10, // Fewer connections may be fine for a read-only workload
// ... connection details
});
app.get('/health', (req, res) => res.status(200).send('OK'));
// The query endpoint is more complex, handling various parameters.
app.post('/v1/query', async (req, res) => {
// A production validator (Joi/Zod) is essential here.
const { name, start, end, filters, aggregation } = req.body;
if (!name || !start || !end || !aggregation) {
return res.status(400).send('Missing required query parameters.');
}
// Build the query dynamically but safely using parameterized queries.
let query = `
SELECT
time_bucket($1, time) AS bucket,
`;
const queryParams = [aggregation.interval, start, end, name];
let paramCounter = 5;
// Aggregation function (e.g., avg, max, percentile_cont)
// IMPORTANT: Sanitize this input to prevent SQL injection.
const allowedAggFuncs = ['avg', 'max', 'min', 'sum', 'count'];
if (!allowedAggFuncs.includes(aggregation.function)) {
return res.status(400).send('Invalid aggregation function.');
}
query += ` ${aggregation.function}(value) AS value`;
// Grouping by label
if (aggregation.groupBy && typeof aggregation.groupBy === 'string') {
// IMPORTANT: Sanitize groupBy to prevent injection.
// A whitelist of allowed label keys is a good practice.
const sanitizedGroupBy = aggregation.groupBy.replace(/[^a-zA-Z0-9_]/g, '');
query += `, labels->>'${sanitizedGroupBy}' as group_key`;
}
query += `
FROM metrics
WHERE time BETWEEN $2 AND $3
AND name = $4
`;
// Add label filters
if (filters && typeof filters === 'object') {
for (const [key, value] of Object.entries(filters)) {
query += ` AND labels->>$${paramCounter++} = $${paramCounter++}`;
queryParams.push(key, value);
}
}
query += ' GROUP BY bucket';
if (aggregation.groupBy) {
query += ', group_key';
}
query += ' ORDER BY bucket';
try {
logger.info(`Executing query: ${query} with params: ${JSON.stringify(queryParams)}`);
const { rows } = await pool.query(query, queryParams);
res.status(200).json(rows);
} catch (err) {
logger.error('Query execution failed', err);
res.status(500).send('Internal Server Error');
}
});
const PORT = process.env.PORT || 8081;
app.listen(PORT, () => {
logger.info(`Query Service listening on port ${PORT}`);
});
Notice the use of time_bucket(), a core TimescaleDB function that is highly optimized for time-based aggregation. The code is also more cautious about constructing SQL strings to avoid injection vulnerabilities, a critical concern in a service that builds dynamic queries.
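Even with parameterization, time_bucket's interval argument deserves validation: a malformed interval string surfaces as a runtime Postgres error (a 500) rather than a clean 400. A minimal validator sketch — the function name and the unit whitelist are assumptions about what the dashboard actually needs:

```javascript
// Accept only simple interval strings like '30 seconds', '1 minute', '2 hours'.
// Anything else (including injection attempts) is rejected before hitting the DB.
const INTERVAL_RE = /^(\d{1,4})\s+(second|minute|hour|day)s?$/;

function validateInterval(interval) {
  const match = INTERVAL_RE.exec(String(interval).trim());
  if (!match) return null;
  const amount = Number(match[1]);
  if (amount < 1) return null;
  // Return a normalized form with correct pluralization.
  return `${amount} ${match[2]}${amount === 1 ? '' : 's'}`;
}
```

Wiring this in before the query is built turns bad input into a 400 with a useful message, and guarantees the value bound to $1 is always something Postgres can parse as an interval.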
Core Implementation: The Frontend with MobX and Babel
The frontend’s responsibility is to present this data effectively. This means managing fetch states, handling real-time updates, and allowing user interaction (zooming, filtering) without re-writing the entire application state logic each time.
First, the necessary build configuration. Babel enables modern JavaScript features like class properties and decorators, which can make MobX code significantly more ergonomic. (The store below uses the non-decorator makeObservable API, which works with this configuration either way; the decorator plugins matter if you prefer @observable-style annotations.)
// .babelrc.json
{
"presets": [
"@babel/preset-env",
"@babel/preset-react"
],
"plugins": [
// The order is important: 'decorators' before 'class-properties'.
["@babel/plugin-proposal-decorators", { "legacy": true }],
["@babel/plugin-proposal-class-properties", { "loose": true }]
]
}
Now, the MobX store. This class encapsulates all the state and logic related to fetching and displaying our time-series data.
// src/stores/DashboardStore.js
import { makeObservable, observable, action, flow, computed } from 'mobx';
class DashboardStore {
// Observable state properties. MobX will track their usage.
timeSeriesData = [];
isLoading = false;
error = null;
// Query parameters that drive the data fetching
queryParams = {
name: 'http_requests_total',
start: new Date(Date.now() - 3600 * 1000).toISOString(),
end: new Date().toISOString(),
filters: {
'k8s_namespace': 'production'
},
aggregation: {
interval: '1 minute',
function: 'avg',
groupBy: 'pod'
}
};
constructor() {
// This wires up the class properties to be MobX observables, actions, etc.
makeObservable(this, {
timeSeriesData: observable,
isLoading: observable,
error: observable,
queryParams: observable,
updateQueryParam: action,
fetchData: flow, // `flow` is great for async actions
seriesCount: computed
});
}
// An action to modify state.
updateQueryParam = (key, value) => {
// This is a deep update, so we need to be careful.
// In a real app, a library like `lodash.set` or immutable helpers might be used.
if (key.includes('.')) {
const [parent, child] = key.split('.');
this.queryParams[parent][child] = value;
} else {
this.queryParams[key] = value;
}
// When this action is called, any component that depends on queryParams
// will be automatically re-rendered.
};
// A computed property. It's derived from other state and is automatically
// cached and re-evaluated when its dependencies change.
get seriesCount() {
const uniqueKeys = new Set(this.timeSeriesData.map(d => d.group_key));
return uniqueKeys.size;
}
// `flow` creates an async action that is properly handled by MobX.
// It uses a generator function.
*fetchData() {
this.isLoading = true;
this.error = null;
try {
const response = yield fetch('http://localhost:8081/v1/query', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(this.queryParams)
});
if (!response.ok) {
throw new Error(`API request failed with status ${response.status}`);
}
const data = yield response.json();
// Inside a `flow`, the code after each `yield` already runs inside an action,
// so observable state can be assigned directly.
this.timeSeriesData = data;
this.isLoading = false;
} catch (err) {
this.error = err.message;
this.isLoading = false;
}
}
}
export const dashboardStore = new DashboardStore();
A React component using this store would be wrapped in observer from mobx-react-lite. This wrapper ensures the component re-renders whenever an observable value it uses changes. It’s highly efficient because it doesn’t re-render when unrelated state changes.
// src/components/TimeSeriesChart.jsx
import React, { useEffect } from 'react';
import { observer } from 'mobx-react-lite';
import { dashboardStore } from '../stores/DashboardStore';
// This is a placeholder for a real charting library like Recharts or D3.
const ChartComponent = ({ data }) => (
<pre>{JSON.stringify(data, null, 2)}</pre>
);
const TimeSeriesChart = observer(() => {
// This component will now automatically re-render whenever
// dashboardStore.isLoading, dashboardStore.error, or
// dashboardStore.timeSeriesData changes.
useEffect(() => {
// Fetch data on initial mount.
dashboardStore.fetchData();
// Set up a polling interval for real-time updates.
// In a production system, WebSockets or Server-Sent Events would be better.
const intervalId = setInterval(() => {
dashboardStore.fetchData();
}, 30000);
return () => clearInterval(intervalId);
}, []); // Empty dependency array ensures this runs only once.
if (dashboardStore.isLoading) {
return <div>Loading chart data...</div>;
}
if (dashboardStore.error) {
return <div style={{ color: 'red' }}>Error: {dashboardStore.error}</div>;
}
return (
<div>
<h2>Metric: {dashboardStore.queryParams.name}</h2>
<p>Displaying {dashboardStore.seriesCount} series</p>
<ChartComponent data={dashboardStore.timeSeriesData} />
</div>
);
});
export default TimeSeriesChart;
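The rows returned by /v1/query arrive flat — one row per (bucket, group_key) pair — while most charting libraries want one array of points per series. Since ChartComponent above is a placeholder, the exact target shape is an assumption, but a pivot helper along these lines (hypothetical name rowsToSeries) is the usual glue:

```javascript
// Pivot flat rows [{ bucket, group_key, value }, ...] into
// { seriesName: [{ bucket, value }, ...], ... } keyed by group_key.
function rowsToSeries(rows) {
  const series = {};
  for (const row of rows) {
    // Rows from an ungrouped query have no group_key; collect them in one series.
    const key = row.group_key ?? 'default';
    if (!series[key]) series[key] = [];
    // Postgres drivers often return numerics as strings; coerce for the chart.
    series[key].push({ bucket: row.bucket, value: Number(row.value) });
  }
  return series;
}
```

This could live as a computed property on DashboardStore, so the pivot is cached and only re-evaluated when timeSeriesData actually changes.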
This CQRS architecture, while more operationally complex, directly addresses the root causes of failure in the monolithic model. It isolates disparate workloads, allowing for independent tuning and scaling. It uses a database specifically designed for the data’s shape and query patterns, avoiding the impedance mismatch of forcing a general-purpose RDBMS to behave like a TSDB. The frontend state management is similarly purpose-built for reactivity, preventing complex UI logic from becoming an unmanageable mess.
The trade-off is clear: increased architectural complexity and operational overhead. This system requires separate deployment pipelines, monitoring, and alerting for its command and query services. The potential for data inconsistency between the write and read paths exists, though for most observability use cases, a few seconds of lag is acceptable. The choice to adopt this architecture is a pragmatic engineering decision, made when the scale of the problem makes the “simpler” alternative a guaranteed dead end. It’s not a solution for every project, but for high-cardinality, time-series workloads, it’s a proven and resilient pattern.