The initial requirement seemed straightforward: build an internal dashboard for our Site Reliability Engineers to visualize and correlate high-resolution time-series metrics from our central Data Lake. The existing off-the-shelf tools were too slow, cumbersome, and lacked the domain-specific context we needed. The initial proof-of-concept, built with a popular charting library and a standard Redux setup, collapsed under the weight of the data. Attempting to render a mere dozen series, each with a hundred thousand data points over a 24-hour window, resulted in a browser tab consuming gigabytes of memory before inevitably crashing. This was our technical pain point: the sheer scale of the data made conventional frontend rendering and state management paradigms completely unviable.
Our Data Lake houses trillions of data points. A typical debugging session requires an engineer to load, pan, and zoom through millions of points in real-time. The failure of the prototype forced a fundamental rethink of the architecture. The problem wasn’t just rendering; it was a state management crisis. Every user interaction, like zooming into a 5-minute window from a 24-hour view, was triggering a cascade of state changes and re-renders that paralyzed the application. We needed an architecture that was both granular and lazy, capable of managing state for data that wasn’t even in the browser yet.
This led to our technology selection, which many would consider unconventional. We abandoned the monolithic state tree of Redux in favor of Recoil. In a real-world project with this level of data-dependency, the overhead of updating a single, massive state object for every minor interaction is a performance killer. Recoil’s atomized, graph-based approach allowed us to treat each piece of state, from the current time window to the data for a single time series, as an independent, subscribable unit. Its asynchronous selectors were the killer feature, enabling components to declaratively depend on data that would be fetched on-demand from our Data Lake API.
For styling, we faced a similar challenge. A component-heavy dashboard with complex, data-driven styles (e.g., coloring a series based on its alert status) can quickly become a mess of global CSS or suffer from the runtime overhead of CSS-in-JS. CSS Modules gave us scoped styling by default, eliminating specificity wars. We paired it with PostCSS to build a powerful, pre-processed styling pipeline with nesting and custom properties, providing the developer experience of a preprocessor without the runtime cost. The combination felt right: a highly performant, granular, and maintainable stack for a high-stakes visualization problem.
The Backend Simulation: A Data Lake API Stub
Before diving into the frontend, it’s critical to define the API contract. In our production environment, this API gateway queries a system like Presto or Trino running against our Data Lake (stored in S3). For this breakdown, a simple Node.js Express server will simulate this behavior, specifically its ability to downsample data based on the requested resolution. This is a crucial feature for managing performance.
// server/index.js
const express = require('express');
const cors = require('cors');
// const morgan = require('morgan'); // Using morgan for request logging in a real app is advisable.
const app = express();
const PORT = 4000;
app.use(cors());
// app.use(morgan('dev')); // Example of adding logging middleware
/**
* Generates mock time-series data.
* In a real system, this would query a data lake.
* @param {string} seriesId - The ID of the time series.
* @param {number} start - Start timestamp (Unix epoch in ms).
* @param {number} end - End timestamp (Unix epoch in ms).
* @param {number} resolution - The number of points to return.
* @returns {Array<{timestamp: number, value: number}>}
*/
function generateTimeSeriesData(seriesId, start, end, resolution) {
const data = [];
const step = (end - start) / resolution;
if (step <= 0) {
return [];
}
// Derive a per-series offset from the ID so each series has a distinct shape.
// (Math.random below means the data is not fully deterministic; only the phase is.)
let value = parseInt(seriesId.replace(/[^0-9]/g, '') || '1', 10) % 100;
for (let i = 0; i < resolution; i++) {
const timestamp = Math.floor(start + i * step);
// Simple sine wave + noise to simulate real metrics
const noise = (Math.random() - 0.5) * 10;
const trend = Math.sin((timestamp / 100000) + value) * 50;
value = 50 + trend + noise;
data.push({ timestamp, value: parseFloat(value.toFixed(2)) });
}
return data;
}
app.get('/api/timeseries', (req, res) => {
const { seriesId, start, end, resolution = 1000 } = req.query;
if (!seriesId || !start || !end) {
return res.status(400).json({ error: 'Missing required query parameters: seriesId, start, end.' });
}
const startTime = parseInt(start, 10);
const endTime = parseInt(end, 10);
const points = parseInt(resolution, 10);
if (isNaN(startTime) || isNaN(endTime) || isNaN(points)) {
return res.status(400).json({ error: 'Invalid timestamp or resolution values.' });
}
// Simulate query latency
setTimeout(() => {
try {
const data = generateTimeSeriesData(seriesId, startTime, endTime, points);
res.json({
seriesId,
data,
query: { start: startTime, end: endTime, resolution: points },
});
} catch (err) {
// Basic error handling
console.error(`[500] Error generating data for ${seriesId}:`, err);
res.status(500).json({ error: 'Internal server error while generating data.' });
}
}, Math.random() * 500 + 100); // Latency between 100ms and 600ms
});
app.listen(PORT, () => {
console.log(`Mock Time Series API server running on http://localhost:${PORT}`);
});
This server exposes a single endpoint, /api/timeseries, which accepts a seriesId, start and end timestamps, and a resolution. Requesting a 24-hour window with a resolution of 1000 returns a vastly different dataset than requesting a 5-minute window with the same resolution: the point count stays fixed while the time span it covers changes. This is the foundation of our performance strategy.
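The frontend consumes this endpoint through a thin wrapper, the fetchTimeSeriesData helper imported by the Recoil state module below. A minimal sketch of that service module, where the API_BASE constant and the buildTimeSeriesUrl helper are our assumptions rather than part of the original codebase:

```javascript
// src/services/api.js
// Hypothetical base URL; in production this would come from configuration.
const API_BASE = 'http://localhost:4000';

// URL construction is kept separate so it can be unit-tested without a network.
function buildTimeSeriesUrl(seriesId, start, end, resolution) {
  const params = new URLSearchParams({
    seriesId,
    start: String(start),
    end: String(end),
    resolution: String(resolution),
  });
  return `${API_BASE}/api/timeseries?${params.toString()}`;
}

async function fetchTimeSeriesData(seriesId, start, end, resolution) {
  const response = await fetch(buildTimeSeriesUrl(seriesId, start, end, resolution));
  if (!response.ok) {
    throw new Error(`Time series request failed with status ${response.status}`);
  }
  return response.json();
}

// In the real module these would be exported:
// export { buildTimeSeriesUrl, fetchTimeSeriesData };
```

Failing fast on non-2xx responses matters here: Recoil selectors surface thrown errors to components, so a rejected promise becomes a visible error state rather than a silently malformed chart.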
Recoil State Architecture: Granularity and Asynchronicity
The core of the frontend architecture lies in how we structured our Recoil state. We needed to separate UI state (like the visible time range) from remote data state (the actual time-series points).
// src/state/atoms.js
import { atom, selectorFamily } from 'recoil';
import { fetchTimeSeriesData } from '../services/api';
// Atom representing the global time range visible in the chart viewport.
// Any component can subscribe to this to react to zoom/pan events.
export const viewportTimeRangeState = atom({
key: 'viewportTimeRangeState',
default: {
start: Date.now() - 24 * 60 * 60 * 1000, // Default to last 24 hours
end: Date.now(),
},
});
// Atom representing the set of series IDs the user has chosen to display.
export const activeSeriesIdsState = atom({
key: 'activeSeriesIdsState',
default: new Set(['series-A1', 'series-B2', 'series-C3']),
});
// This is the most critical piece. It's a selector *family* that fetches
// data for a *single* series ID based on the current viewport time range.
// Recoil automatically caches the result based on the parameters (seriesId and timeRange).
// If another component requests the same seriesId and timeRange, it gets the cached data.
export const timeSeriesDataQuery = selectorFamily({
key: 'timeSeriesDataQuery',
get: (seriesId) => async ({ get }) => {
if (!seriesId) {
return null;
}
// This selector depends on another piece of Recoil state.
// If `viewportTimeRangeState` changes, Recoil will re-evaluate this selector
// for any component that is currently using it for a given seriesId.
const timeRange = get(viewportTimeRangeState);
// A simple optimization: don't fetch if the range is invalid.
if (timeRange.end <= timeRange.start) {
return { seriesId, data: [] };
}
try {
// The resolution is coupled to the viewport width. A real implementation
// would get this from a shared state or context. For now, we hardcode it.
// A wider viewport means we need more data points to avoid looking sparse.
const resolution = 1000;
const response = await fetchTimeSeriesData(seriesId, timeRange.start, timeRange.end, resolution);
return response;
} catch (error) {
console.error(`Failed to fetch data for ${seriesId}:`, error);
// Propagate the error to allow components to handle it gracefully.
throw error;
}
},
});
This design provides immense benefits:
- Decoupling: Components don’t know how to fetch data; they just declare their dependency on timeSeriesDataQuery('some-id').
- Automatic Caching: If the user pans and then pans back to the original view, Recoil serves the cached data instantly instead of firing another network request.
- Concurrency: When multiple series are displayed, Recoil initiates all asynchronous fetches concurrently.
- Granular Subscriptions: A component rendering data for series-A1 will not re-render if the data for series-B2 is updated. This was impossible with our initial Redux approach.
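One practical note on the activeSeriesIdsState atom above: Recoil state must be replaced, never mutated in place, so toggling a series means constructing a new Set. A small sketch, where the toggleSeriesId helper is a hypothetical name of ours:

```javascript
// Return a new Set with the given ID added or removed. The original Set is
// left untouched so Recoil can detect the change by reference identity.
function toggleSeriesId(currentIds, seriesId) {
  const next = new Set(currentIds);
  if (next.has(seriesId)) {
    next.delete(seriesId);
  } else {
    next.add(seriesId);
  }
  return next;
}

// In a component this would pair with useSetRecoilState:
// const setActiveIds = useSetRecoilState(activeSeriesIdsState);
// setActiveIds(ids => toggleSeriesId(ids, 'series-B2'));
```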
The Rendering Layer: Canvas over SVG
Our first implementation attempt used SVG. It was a disaster. Rendering 10,000 <circle> or <path> elements per series brought the browser’s renderer to its knees. The DOM is not designed for this kind of workload. The pivot to <canvas> was not a choice but a necessity. The trade-off is losing the declarative nature and accessibility of DOM elements, but the performance gains are orders of magnitude.
Here is the core React component responsible for rendering a single time series on a canvas.
// src/components/SeriesRenderer.jsx
import React, { Suspense, useEffect, useRef } from 'react';
import { useRecoilValueLoadable } from 'recoil';
import { timeSeriesDataQuery } from '../state/atoms';
import { viewportTimeRangeState } from '../state/atoms';
import styles from './SeriesRenderer.module.css';
// A mapping for demonstration purposes. In a real app, this would come from metadata.
const SERIES_COLORS = {
'series-A1': '#4A90E2',
'series-B2': '#50E3C2',
'series-C3': '#F5A623',
};
function drawSeries(canvas, data, timeRange, color) {
const ctx = canvas.getContext('2d');
if (!ctx) return;
const { width, height } = canvas;
ctx.clearRect(0, 0, width, height);
if (!data || data.length === 0) {
return;
}
const { start: startTime, end: endTime } = timeRange;
const timeSpan = endTime - startTime;
// Find min/max values for scaling
const values = data.map(p => p.value);
const minValue = Math.min(...values);
const maxValue = Math.max(...values);
const valueSpan = maxValue - minValue || 1; // Avoid division by zero
ctx.beginPath();
ctx.strokeStyle = color;
ctx.lineWidth = 2;
ctx.lineJoin = 'round';
data.forEach((point, index) => {
const x = ((point.timestamp - startTime) / timeSpan) * width;
const y = height - ((point.value - minValue) / valueSpan) * height;
if (index === 0) {
ctx.moveTo(x, y);
} else {
ctx.lineTo(x, y);
}
});
ctx.stroke();
}
function CanvasRenderer({ seriesId }) {
const canvasRef = useRef(null);
const seriesDataLoadable = useRecoilValueLoadable(timeSeriesDataQuery(seriesId));
const timeRange = useRecoilValueLoadable(viewportTimeRangeState);
// This effect handles the actual drawing logic whenever data or dimensions change.
useEffect(() => {
const canvas = canvasRef.current;
if (!canvas || seriesDataLoadable.state !== 'hasValue' || timeRange.state !== 'hasValue') {
return;
}
// Basic resize handling to ensure canvas resolution matches display size
const { width, height } = canvas.getBoundingClientRect();
if (canvas.width !== width || canvas.height !== height) {
canvas.width = width;
canvas.height = height;
}
drawSeries(canvas, seriesDataLoadable.contents.data, timeRange.contents, SERIES_COLORS[seriesId] || '#FFFFFF');
}, [seriesDataLoadable, timeRange, seriesId]);
switch (seriesDataLoadable.state) {
case 'hasValue':
return <canvas ref={canvasRef} className={styles.seriesCanvas} />;
case 'loading':
return <div className={styles.loadingOverlay}>Loading {seriesId}...</div>;
case 'hasError':
return <div className={styles.errorOverlay}>Error loading {seriesId}.</div>;
default:
return null;
}
}
// The parent component wraps the renderer in React's Suspense. Because
// useRecoilValueLoadable never suspends, the 'loading' branch above handles
// fetch states; the boundary is a safety net for anything that does suspend.
export function SeriesRenderer({ seriesId }) {
return (
<Suspense fallback={<div className={styles.loadingOverlay}>Initializing...</div>}>
<CanvasRenderer seriesId={seriesId} />
</Suspense>
);
}
The use of useRecoilValueLoadable is crucial for production-grade code. It allows us to inspect the state of the asynchronous selector (loading, hasValue, hasError) and render appropriate UI without crashing the application if a network request fails. Note that useRecoilValueLoadable itself never suspends, so the explicit loading branch handles fetch states; the <Suspense> boundary acts as a safety net for the initial mount and for any hooks that do suspend.
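One detail the resize handling above glosses over is device pixel ratio: on HiDPI displays, sizing the backing store to CSS pixels produces a blurry line. A sketch of the correction, where backingStoreSize is a helper name of ours and not part of the component:

```javascript
// Compute the canvas backing-store size from its CSS size and the device
// pixel ratio, so one canvas pixel maps to one physical screen pixel.
function backingStoreSize(cssWidth, cssHeight, devicePixelRatio) {
  return {
    width: Math.round(cssWidth * devicePixelRatio),
    height: Math.round(cssHeight * devicePixelRatio),
  };
}

// Inside the effect this would replace the direct width/height assignment:
// const dpr = window.devicePixelRatio || 1;
// const rect = canvas.getBoundingClientRect();
// const { width, height } = backingStoreSize(rect.width, rect.height, dpr);
// canvas.width = width; canvas.height = height;
// ctx.setTransform(dpr, 0, 0, dpr, 0, 0); // drawSeries keeps working in CSS pixels
```

Using setTransform rather than scale avoids accumulating the scale factor across repeated redraws.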
Styling Pipeline: CSS Modules and PostCSS
The styling for this component might seem simple, but in a complex dashboard, maintaining it is a challenge. CSS Modules provide local scope by default.
/* src/components/SeriesRenderer.module.css */
.seriesCanvas {
position: absolute;
top: 0;
left: 0;
width: 100%;
height: 100%;
pointer-events: none; /* Canvases for data sit below an interaction layer */
}
.loadingOverlay, .errorOverlay {
position: absolute;
inset: 0;
display: flex;
align-items: center;
justify-content: center;
font-family: sans-serif;
color: var(--text-color-secondary);
background-color: rgba(0, 0, 0, 0.2);
backdrop-filter: blur(2px);
}
.errorOverlay {
color: var(--color-error);
background-color: rgba(255, 0, 0, 0.1);
}
This is powered by a PostCSS configuration that enables modern features. In a real-world project, this is where we’d add plugins for theming, cross-browser compatibility, and other transformations.
// postcss.config.js
module.exports = {
plugins: [
'postcss-preset-env', // Provides fallbacks for modern CSS features
'postcss-nested', // Allows SASS-like nesting
// Other plugins like `autoprefixer` could be added here.
],
};
This setup hits a sweet spot: we get modern CSS syntax and guaranteed style encapsulation without the performance overhead of runtime CSS-in-JS solutions, which was a key consideration for our high-performance requirements.
System Architecture and Data Flow
The complete flow from user interaction to pixel on the screen is orchestrated by Recoil.
graph TD
  subgraph Browser
    A[User Interaction: Pan/Zoom] --> B(ChartContainer Component);
    B -- updates --> C{viewportTimeRangeState Atom};
    subgraph Series Rendering
      D1[SeriesRenderer series-A1] -- subscribes to --> E1(timeSeriesDataQuery 'series-A1');
      D2[SeriesRenderer series-B2] -- subscribes to --> E2(timeSeriesDataQuery 'series-B2');
    end
    C -- triggers re-evaluation of --> E1;
    C -- triggers re-evaluation of --> E2;
    E1 -- depends on --> C;
    E2 -- depends on --> C;
    E1 -- async GET --> F[Backend API];
    E2 -- async GET --> F;
    F -- returns data --> E1;
    F -- returns data --> E2;
    E1 -- updates --> D1;
    E2 -- updates --> D2;
    D1 -- draws on --> G[Canvas Element 1];
    D2 -- draws on --> H[Canvas Element 2];
  end
  subgraph Backend
    F -- queries --> I((Data Lake));
  end
  style B fill:#f9f,stroke:#333,stroke-width:2px
  style C fill:#ccf,stroke:#333,stroke-width:2px
  style E1 fill:#9cf,stroke:#333,stroke-width:2px
  style E2 fill:#9cf,stroke:#333,stroke-width:2px
When a user pans the chart, the ChartContainer updates the viewportTimeRangeState atom. This change is detected by Recoil, which automatically re-evaluates the timeSeriesDataQuery selector for each active series. New API calls are dispatched, and upon their return, only the specific SeriesRenderer components whose data has changed are re-rendered, triggering a redraw on their respective canvases. This entire process is non-blocking and highly efficient.
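The ChartContainer itself is not reproduced here, but its essential job reduces to two pure transforms on the time range it writes into viewportTimeRangeState. A sketch, where panRange and zoomRange are hypothetical helper names of ours:

```javascript
// Shift a {start, end} time range by a fraction of its span (positive pans
// forward in time, negative pans back).
function panRange(range, fraction) {
  const span = range.end - range.start;
  const delta = span * fraction;
  return { start: range.start + delta, end: range.end + delta };
}

// Scale a time range around its center: factor < 1 zooms in, > 1 zooms out.
function zoomRange(range, factor) {
  const center = (range.start + range.end) / 2;
  const halfSpan = ((range.end - range.start) / 2) * factor;
  return { start: center - halfSpan, end: center + halfSpan };
}

// In the component these would feed useSetRecoilState(viewportTimeRangeState):
// setTimeRange(range => zoomRange(range, event.deltaY > 0 ? 1.2 : 0.8));
```

Because the selectors depend on the atom by reference, writing a freshly constructed range object is what triggers the downstream re-evaluation.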
Lingering Issues and Future Optimizations
This architecture, while performant, is not without its limitations. The current implementation still relies on a fixed resolution parameter sent to the backend. A more sophisticated approach would dynamically calculate the required resolution based on the pixel width of the chart, ensuring we never fetch more data than can be physically displayed. This is known as a “pixel-perfect” query.
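A sketch of that pixel-perfect calculation, where the function name and the maxPoints cap are our assumptions:

```javascript
// Derive the fetch resolution from the chart's rendered width: one point per
// device pixel is the most a line chart can visually resolve. The cap keeps a
// very wide viewport from requesting an unbounded payload.
function resolutionForWidth(cssWidth, devicePixelRatio = 1, maxPoints = 5000) {
  return Math.min(Math.ceil(cssWidth * devicePixelRatio), maxPoints);
}

// This value would replace the hardcoded `resolution = 1000` inside
// timeSeriesDataQuery, read from a shared atom that tracks the chart width.
```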
Furthermore, all data transformation logic currently resides in the drawSeries function. For more complex visualizations involving aggregations or statistical analysis (e.g., drawing Bollinger Bands), this logic should be moved into another layer of Recoil selectors. This would allow us to memoize the results of expensive calculations, preventing them from being re-run on every single render.
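As an example of that layering, a simple moving average could live in a derived selectorFamily built on timeSeriesDataQuery. The pure transform below is testable in isolation; the smoothedSeriesQuery wrapper is a sketch with names of our choosing:

```javascript
// Pure, testable transform: trailing simple moving average over
// {timestamp, value} points, with a shorter window at the start of the array.
function movingAverage(points, windowSize) {
  const out = [];
  for (let i = 0; i < points.length; i++) {
    const start = Math.max(0, i - windowSize + 1);
    let sum = 0;
    for (let j = start; j <= i; j++) sum += points[j].value;
    out.push({ timestamp: points[i].timestamp, value: sum / (i - start + 1) });
  }
  return out;
}

// Wrapped in a derived selectorFamily, Recoil memoizes the result per
// (seriesId, viewport) combination — sketch only:
//
// export const smoothedSeriesQuery = selectorFamily({
//   key: 'smoothedSeriesQuery',
//   get: (seriesId) => ({ get }) => {
//     const { data } = get(timeSeriesDataQuery(seriesId));
//     return movingAverage(data, 25);
//   },
// });
```

Because the derived selector only re-runs when its upstream data actually changes, the smoothing cost is paid once per fetch rather than once per render.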
Finally, the canvas-based approach presents accessibility challenges. A production-ready implementation would require a parallel, non-visual DOM structure or the use of canvas accessibility APIs to provide screen readers with the necessary information to interpret the chart data. The current solution prioritizes visual performance for sighted users, which is a common but important trade-off to acknowledge. The next iteration would focus heavily on bridging this accessibility gap.