The transition from a monolithic API to a scaled-out, containerized service cluster introduces a class of problems that simply don’t exist in a single-instance world. One of the most immediate is rate limiting. An in-memory token bucket or a fixed-window counter is trivial to implement for a single process. When three, five, or ten ephemeral instances of that same process are running behind a load balancer, enforcing a global limit—say, 1000 requests per minute for a given API key across the entire cluster—becomes a coordination problem.
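For reference, here is a minimal single-process sketch of such a fixed-window counter (the class and method names are illustrative, not taken from the codebase discussed below):

// Minimal in-memory fixed-window limiter for a single process.
// Illustrative only; names are not from the article's codebase.
public class InMemoryFixedWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly object _sync = new();
    private int _count;
    private DateTime _windowStartUtc = DateTime.UtcNow;

    public InMemoryFixedWindowLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    public bool TryAcquire()
    {
        lock (_sync)
        {
            var now = DateTime.UtcNow;
            if (now - _windowStartUtc >= _window)
            {
                // Window expired: start a fresh one.
                _windowStartUtc = now;
                _count = 0;
            }
            if (_count >= _limit) return false;
            _count++;
            return true;
        }
    }
}

The moment a second instance starts, each process only counts the traffic it happens to receive, and the global limit is no longer enforced.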
A naive approach might involve a centralized Redis or database counter. While functional, this introduces a high-frequency write bottleneck and a potential single point of failure. More importantly, it fails to address a critical operational requirement: the ability to adjust rate limits dynamically across the entire fleet without a rolling restart. Modifying a configuration file and redeploying dozens of containers to change a single integer is not a tenable solution in a production environment.
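For contrast, a minimal sketch of that centralized-counter pattern, assuming the StackExchange.Redis client (the key naming and the helper itself are illustrative, not part of this article's implementation):

// Sketch of the naive shared Redis counter this article argues against.
// Assumes StackExchange.Redis; the helper and key naming are illustrative.
using StackExchange.Redis;

public static class NaiveRedisLimiter
{
    public static async Task<bool> TryAcquireAsync(IDatabase db, string apiKey, int limit, TimeSpan window)
    {
        var key = $"ratelimit:{apiKey}";
        var count = await db.StringIncrementAsync(key); // Atomic INCR shared by every instance
        if (count == 1)
        {
            // First hit in the window: start the expiry clock.
            await db.KeyExpireAsync(key, window);
        }
        return count <= limit;
    }
}

Note that the limit value still has to reach every instance somehow, which is exactly the configuration-propagation gap described above.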
This account details the construction of a distributed rate limiting solution where the core challenge was not just enforcement, but dynamic, cluster-wide reconfigurability. The architecture hinges on using Apache ZooKeeper not as a high-throughput counter, but as a coordination and configuration backbone for a cluster of .NET Web API services running in Docker.
The Technical Pain Point and Initial Design Choices
The core requirements were:
- Global Enforcement: The rate limit must be applied to the sum of requests across all container instances for a specific client.
- Dynamic Configuration: Rate limits must be updatable at runtime and propagate to all instances within seconds, without service interruption.
- Resilience: The system must be tolerant to individual API instances crashing and restarting.
ZooKeeper was chosen over alternatives like Redis for its specific strengths in distributed coordination. Its watch mechanism is purpose-built for propagating configuration changes, and its ephemeral nodes provide a simple, robust way to track active service instances. While Redis might offer higher raw throughput for a simple INCR operation, our primary problem was coordination and dynamic control, not raw request-counting speed.
The application stack consists of a .NET 6 Web API, containerized with Docker and orchestrated for local development using Docker Compose. The ZooKeeperNetEx .NET client provides the necessary interface for communication.
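Assuming the package is pulled from NuGet under that name, it is typically added to the project with:

dotnet add package ZooKeeperNetEx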
The architecture can be visualized as follows:
graph TD
    subgraph "Docker Environment"
        ZK[("ZooKeeper Server")]
        subgraph "API Service Cluster"
            API1[API Instance 1]
            API2[API Instance 2]
            API3[API Instance 3]
        end
    end
    User[Client] --> LB(Load Balancer)
    LB --> API1
    LB --> API2
    LB --> API3
    API1 <-->|Coordination & Config| ZK
    API2 <-->|Coordination & Config| ZK
    API3 <-->|Coordination & Config| ZK
    style ZK fill:#f9f,stroke:#333,stroke-width:2px
The Flawed First Attempt: Simple Limit Division
My initial thought process aimed for a lock-free, decentralized approach to avoid contention. The logic seemed simple:
- Each API instance registers itself as an ephemeral node in ZooKeeper under /rate-limiter/instances/.
- Each instance places a watch on the /rate-limiter/instances parent node to get a count of active siblings (N).
- A global limit L is stored in /rate-limiter/config.
- Each instance calculates its local limit as L / N (sketched below) and enforces it using a standard in-memory rate limiter.
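As a rough sketch of that abandoned calculation, built on the ZooKeeperClient wrapper shown later in this article (the helper name ComputeLocalLimitAsync is illustrative and not part of the final code):

// Illustrative sketch of the abandoned limit-division idea.
// Relies on the ZooKeeperClient wrapper introduced later; not part of the final design.
public static class NaiveLimitDivision
{
    public static async Task<int> ComputeLocalLimitAsync(ZooKeeperClient zk, int globalLimit)
    {
        // Each live instance registers an ephemeral child under /rate-limiter/instances,
        // so the child count approximates N, the number of active siblings.
        var instances = await zk.GetChildrenAsync("/rate-limiter/instances");
        var n = Math.Max(1, instances.Count());

        // Each instance would then enforce only its slice of the global limit locally.
        return globalLimit / n;
    }
}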
This design is elegant in theory but fails spectacularly in practice. The critical flaw is the assumption of uniform traffic distribution. A load balancer, especially with sticky sessions, may route a disproportionate number of requests from a high-traffic client to a single API instance. That instance would exhaust its small L / N quota and begin rejecting requests while other instances sit idle. The true global limit L would never be reached, and valid requests would be unfairly denied. The approach produces a system that is functionally correct under perfect conditions but brittle and unreliable under real-world traffic. It was abandoned.
The Second Iteration: A Lock-Based Distributed Counter
A more robust solution requires a centralized decision-making process for each request. This introduces the need for a distributed lock to prevent race conditions when multiple instances attempt to update a shared counter. This approach trades the theoretical elegance of a decentralized model for the practical correctness of a coordinated one.
The core data model in ZooKeeper is structured as follows:
- /rate-limiter: Root node for the entire feature.
- /rate-limiter/config/{apiKey}: Stores the JSON configuration for a given API key, e.g. {"Limit": 1000, "WindowSeconds": 60}. All instances watch this node.
- /rate-limiter/state/{apiKey}: Stores the current state of the fixed-window counter, e.g. {"Count": 450, "WindowStartUtc": "2023-10-27T10:00:00Z"}.
- /rate-limiter/locks/{apiKey}: An empty, persistent node used as a parent for acquiring a distributed lock.
The workflow for each request becomes (condensed in the code sketch after this list):
- Acquire a distributed lock specific to the request's {apiKey}.
- Read the counter state from /rate-limiter/state/{apiKey}.
- Check whether the current time is past WindowStartUtc + WindowSeconds.
  - If yes, reset Count to 1 and update WindowStartUtc to the current time.
  - If no, check whether Count is less than Limit.
    - If yes, increment Count.
    - If no, reject the request.
- Write the new state back to /rate-limiter/state/{apiKey}.
- Release the distributed lock.
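A minimal sketch of those steps, assuming the ZooKeeperClient, DistributedLock, RateLimitConfig, and RateLimitState types defined later in this article (the method name TryConsumeAsync is illustrative; the full middleware version appears below):

// Condensed sketch of the locked check-and-increment described above.
using System.Text.Json;

public static class LockedWindowSketch
{
    public static async Task<bool> TryConsumeAsync(ZooKeeperClient zk, string apiKey, RateLimitConfig config)
    {
        await using var zkLock = new DistributedLock(zk, apiKey);
        if (!await zkLock.AcquireAsync(TimeSpan.FromSeconds(5)))
            return false; // Treat lock timeout as a rejection rather than hanging the request

        var statePath = $"/rate-limiter/state/{apiKey}";
        var state = JsonSerializer.Deserialize<RateLimitState>(await zk.GetDataAsync(statePath));
        var now = DateTime.UtcNow;

        if (now > state.WindowStartUtc.AddSeconds(config.WindowSeconds))
        {
            state.Count = 0;              // Window expired: start a new one
            state.WindowStartUtc = now;
        }
        if (state.Count >= config.Limit)
            return false;                 // Over the limit within the current window

        state.Count++;
        await zk.SetDataAsync(statePath, JsonSerializer.Serialize(state));
        return true;                      // Lock is released when zkLock is disposed
    }
}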
This serialized access model guarantees correctness, but its performance is now bound by the latency of ZooKeeper operations and lock contention.
Production-Grade Implementation Details
1. Docker and Environment Setup
The docker-compose.yml file sets up the ZooKeeper instance and scales the API service.
# docker-compose.yml
version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.2
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  webapi:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "80" # Publish container port 80; Docker assigns host ports so the service can scale
    environment:
      - ASPNETCORE_URLS=http://+:80
      - ZookeeperConnectionString=zookeeper:2181
    depends_on:
      - zookeeper
The Dockerfile for the .NET service is standard.
# Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["MyApi.csproj", "."]
RUN dotnet restore "./MyApi.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "MyApi.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "MyApi.csproj" -c Release -o /app/publish /p:UseAppHost=false
FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "MyApi.dll"]
2. ZooKeeper Client and Connection Management
A robust ZooKeeper client wrapper is essential. It must handle connection state changes, particularly session expiry, which is a common operational issue. A lost session invalidates all ephemeral nodes and watches, requiring the application to re-establish its state.
// Services/ZooKeeperClient.cs
using org.apache.zookeeper;
using System.Text;
// A simplified but resilient ZooKeeper client wrapper for DI
public class ZooKeeperClient : IDisposable
{
private readonly ILogger<ZooKeeperClient> _logger;
private readonly string _connectionString;
private ZooKeeper _zk;
private readonly SemaphoreSlim _connectionLock = new(1, 1);
private readonly CancellationTokenSource _cts = new();
public ZooKeeperClient(IConfiguration configuration, ILogger<ZooKeeperClient> logger)
{
_logger = logger;
_connectionString = configuration["ZookeeperConnectionString"];
}
public async Task ConnectAsync()
{
await _connectionLock.WaitAsync();
try
{
if (_zk != null && _zk.getState().IsAlive())
{
return;
}
_logger.LogInformation("Connecting to ZooKeeper at {ConnectionString}", _connectionString);
var watcher = new ConnectionWatcher(_logger, HandleSessionExpired);
// Subscribe to state changes before creating the ZooKeeper instance so a fast
// SyncConnected event cannot be missed.
var connectedSignal = new TaskCompletionSource<bool>();
watcher.OnStateChanged += (state) =>
{
if (state == Watcher.Event.KeeperState.SyncConnected)
{
connectedSignal.TrySetResult(true);
}
};
_zk = new ZooKeeper(_connectionString, 5000, watcher);
// Wait until connected
await connectedSignal.Task;
_logger.LogInformation("Successfully connected to ZooKeeper. Session ID: {SessionId}", _zk.getSessionId());
}
finally
{
_connectionLock.Release();
}
}
private async void HandleSessionExpired()
{
_logger.LogWarning("ZooKeeper session expired. Reconnecting and re-establishing state...");
// In a real application, you'd have a more robust re-initialization logic
// for all watches, ephemeral nodes, etc.
await ConnectAsync();
}
// Core methods to interact with ZK
public async Task CreatePathAsync(string path)
{
if (await ExistsAsync(path)) return;
var pathParts = path.Trim('/').Split('/');
var currentPath = new StringBuilder();
foreach (var part in pathParts)
{
currentPath.Append('/').Append(part);
var subPath = currentPath.ToString();
if (!await ExistsAsync(subPath))
{
try
{
await _zk.createAsync(subPath, null, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}
catch (KeeperException.NodeExistsException)
{
// Ignore, another instance created it in the meantime.
}
}
}
}
public async Task<bool> ExistsAsync(string path, Watcher watcher = null)
{
return await _zk.existsAsync(path, watcher) != null;
}
public async Task<string> GetDataAsync(string path, Watcher watcher = null)
{
var result = await _zk.getDataAsync(path, watcher);
return result != null ? Encoding.UTF8.GetString(result.Data) : null;
}
public async Task SetDataAsync(string path, string data)
{
await _zk.setDataAsync(path, Encoding.UTF8.GetBytes(data));
}
public async Task<string> CreateEphemeralSequentialAsync(string path, byte[] data)
{
return await _zk.createAsync(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
}
public async Task<IEnumerable<string>> GetChildrenAsync(string path, Watcher watcher = null)
{
var result = await _zk.getChildrenAsync(path, watcher);
return result.Children;
}
public async Task DeleteAsync(string path)
{
await _zk.deleteAsync(path);
}
public void Dispose() => _zk?.closeAsync().Wait();
}
// Helper Watcher class to handle connection events
public class ConnectionWatcher : Watcher
{
private readonly ILogger _logger;
private readonly Action _onSessionExpired;
public event Action<Event.KeeperState> OnStateChanged;
public ConnectionWatcher(ILogger logger, Action onSessionExpired)
{
_logger = logger;
_onSessionExpired = onSessionExpired;
}
public override async Task process(WatchedEvent @event)
{
var state = @event.getState();
OnStateChanged?.Invoke(state);
_logger.LogDebug("ZooKeeper Watcher Event: Type={Type}, State={State}, Path={Path}", @event.get_Type(), state, @event.getPath());
if (state == Event.KeeperState.Expired)
{
_onSessionExpired?.Invoke();
}
await Task.CompletedTask;
}
}
This client is registered as a singleton in Program.cs.
3. The Distributed Lock Implementation
The standard ZooKeeper recipe for a distributed lock involves creating an ephemeral, sequential node under a lock path. The client that creates the node with the lowest sequence number holds the lock.
// Services/DistributedLock.cs
using org.apache.zookeeper;
public class DistributedLock : IAsyncDisposable
{
private readonly ZooKeeperClient _zkClient;
private readonly string _lockPath;
private string _lockNodePath;
public DistributedLock(ZooKeeperClient zkClient, string apiKey)
{
_zkClient = zkClient;
// Sanitize apiKey to be a valid ZNode name
var safeApiKey = new string(apiKey.Where(char.IsLetterOrDigit).ToArray());
_lockPath = $"/rate-limiter/locks/{safeApiKey}";
}
public async Task<bool> AcquireAsync(TimeSpan timeout)
{
await _zkClient.CreatePathAsync(_lockPath);
_lockNodePath = await _zkClient.CreateEphemeralSequentialAsync($"{_lockPath}/lock-", null);
var cts = new CancellationTokenSource(timeout);
while (!cts.IsCancellationRequested)
{
var children = (await _zkClient.GetChildrenAsync(_lockPath)).ToList();
children.Sort(); // ZooKeeper sequential nodes sort lexicographically
var myNodeName = _lockNodePath.Split('/').Last();
var myIndex = children.IndexOf(myNodeName);
if (myIndex == 0)
{
// We hold the lock
return true;
}
// Watch the node just before me
var predecessorPath = $"{_lockPath}/{children[myIndex - 1]}";
var tcs = new TaskCompletionSource<bool>();
var watcher = new NodeDeletedWatcher(() => tcs.TrySetResult(true));
if (await _zkClient.ExistsAsync(predecessorPath, watcher))
{
// Wait until the predecessor node is deleted, but respect the timeout so a
// stuck predecessor cannot block this request past the caller's deadline.
var timeoutTask = Task.Delay(Timeout.Infinite, cts.Token);
if (await Task.WhenAny(tcs.Task, timeoutTask) == timeoutTask)
{
return false; // Timed out while waiting; our ephemeral node is removed on dispose
}
}
// If Exists is false, predecessor was deleted between GetChildren and Exists calls, so we retry.
}
return false; // Timed out
}
public async ValueTask DisposeAsync()
{
if (_lockNodePath != null)
{
await _zkClient.DeleteAsync(_lockNodePath);
}
}
}
// A one-time watcher to signal when a node is deleted
public class NodeDeletedWatcher : Watcher
{
private readonly Action _onDeleted;
public NodeDeletedWatcher(Action onDeleted) => _onDeleted = onDeleted;
public override Task process(WatchedEvent @event)
{
if (@event.get_Type() == Event.EventType.NodeDeleted)
{
_onDeleted?.Invoke();
}
return Task.CompletedTask;
}
}
4. The Rate Limiting Middleware
This middleware orchestrates the entire process. It uses a ConcurrentDictionary to cache rate limit configurations locally, so configuration is only re-read from ZooKeeper when a watch fires; the counter state and lock still require ZooKeeper round trips on every request.
// Middleware/DistributedRateLimiterMiddleware.cs
using System.Net;
using System.Text.Json;
using System.Collections.Concurrent;
using org.apache.zookeeper;
public class DistributedRateLimiterMiddleware
{
private readonly RequestDelegate _next;
private readonly ZooKeeperClient _zkClient;
private readonly ILogger<DistributedRateLimiterMiddleware> _logger;
// Local cache for rate limit configurations, populated by ZooKeeper watches
private static readonly ConcurrentDictionary<string, RateLimitConfig> _configCache = new();
public DistributedRateLimiterMiddleware(RequestDelegate next, ZooKeeperClient zkClient, ILogger<DistributedRateLimiterMiddleware> logger)
{
_next = next;
_zkClient = zkClient;
_logger = logger;
}
public async Task InvokeAsync(HttpContext context)
{
// For simplicity, API key is read from a header
if (!context.Request.Headers.TryGetValue("X-API-Key", out var apiKey) || string.IsNullOrEmpty(apiKey))
{
context.Response.StatusCode = (int)HttpStatusCode.Unauthorized;
await context.Response.WriteAsync("X-API-Key header is missing.");
return;
}
var config = await GetConfigForApiKeyAsync(apiKey);
if (config == null)
{
// If no config exists, we fail open or closed. Failing closed is safer.
context.Response.StatusCode = (int)HttpStatusCode.Forbidden;
await context.Response.WriteAsync("API Key not configured for rate limiting.");
return;
}
await using var distributedLock = new DistributedLock(_zkClient, apiKey);
// A timeout on the lock is critical to prevent requests from hanging indefinitely
if (!await distributedLock.AcquireAsync(TimeSpan.FromSeconds(5)))
{
_logger.LogWarning("Failed to acquire distributed lock for API key {ApiKey} within timeout.", apiKey);
context.Response.StatusCode = (int)HttpStatusCode.ServiceUnavailable;
await context.Response.WriteAsync("Rate limiting service is under high contention. Please try again.");
return;
}
var statePath = $"/rate-limiter/state/{apiKey}";
var state = await GetCurrentStateAsync(statePath);
var now = DateTime.UtcNow;
var windowStart = state.WindowStartUtc;
bool allowed;
if (now > windowStart.AddSeconds(config.WindowSeconds))
{
// Window has expired, reset it
state.Count = 1;
state.WindowStartUtc = now;
allowed = true;
_logger.LogInformation("New window started for {ApiKey}. Count: 1", apiKey);
}
else if (state.Count < config.Limit)
{
// Within window and under limit, increment
state.Count++;
allowed = true;
}
else
{
// Limit exceeded
allowed = false;
}
if (allowed)
{
await _zkClient.SetDataAsync(statePath, JsonSerializer.Serialize(state));
await _next(context);
}
else
{
_logger.LogWarning("Rate limit exceeded for API key {ApiKey}. Limit: {Limit}, Current: {Count}", apiKey, config.Limit, state.Count);
context.Response.StatusCode = (int)HttpStatusCode.TooManyRequests;
context.Response.Headers["Retry-After"] = config.WindowSeconds.ToString();
await context.Response.WriteAsync("Rate limit exceeded.");
}
}
private async Task<RateLimitConfig> GetConfigForApiKeyAsync(string apiKey)
{
if (_configCache.TryGetValue(apiKey, out var config))
{
return config;
}
var configPath = $"/rate-limiter/config/{apiKey}";
var configWatcher = new ConfigWatcher(apiKey, async (key, eventType) => {
// ZooKeeper watches are one-shot: drop the cached entry and re-read the node,
// which also re-registers the watch for the next change.
_configCache.TryRemove(key, out _);
if (eventType == Watcher.Event.EventType.NodeDeleted)
{
_logger.LogInformation("Removed rate limit config for {ApiKey}", key);
return;
}
var refreshed = await GetConfigForApiKeyAsync(key);
if (refreshed != null)
{
_logger.LogInformation("Updated rate limit config for {ApiKey}: Limit={Limit}, Window={WindowSeconds}s", key, refreshed.Limit, refreshed.WindowSeconds);
}
});
if (!await _zkClient.ExistsAsync(configPath)) return null;
var configJson = await _zkClient.GetDataAsync(configPath, configWatcher);
if (string.IsNullOrEmpty(configJson)) return null;
var newConfig = JsonSerializer.Deserialize<RateLimitConfig>(configJson);
_configCache[apiKey] = newConfig;
return newConfig;
}
private async Task<RateLimitState> GetCurrentStateAsync(string statePath)
{
if (!await _zkClient.ExistsAsync(statePath))
{
// Initialize state if it doesn't exist
await _zkClient.CreatePathAsync(statePath);
var initialState = new RateLimitState { Count = 0, WindowStartUtc = DateTime.UtcNow };
await _zkClient.SetDataAsync(statePath, JsonSerializer.Serialize(initialState));
return initialState;
}
var stateJson = await _zkClient.GetDataAsync(statePath);
return JsonSerializer.Deserialize<RateLimitState>(stateJson);
}
// Watcher implementation for config changes
private class ConfigWatcher : Watcher
{
private readonly string _apiKey;
private readonly Func<string, Event.EventType, Task> _callback;
public ConfigWatcher(string apiKey, Func<string, Event.EventType, Task> callback)
{
_apiKey = apiKey;
_callback = callback;
}
public override async Task process(WatchedEvent @event)
{
var type = @event.get_Type();
if (type == Event.EventType.NodeDataChanged || type == Event.EventType.NodeCreated || type == Event.EventType.NodeDeleted)
{
// Hand the event type back to the middleware, which invalidates its cache and re-reads the node.
await _callback(_apiKey, type);
}
}
}
}
}
// Data models
public class RateLimitConfig { public int Limit { get; set; } public int WindowSeconds { get; set; } }
public class RateLimitState { public int Count { get; set; } public DateTime WindowStartUtc { get; set; } }
5. System Initialization
In Program.cs, we register the client, ensure the base paths in ZooKeeper exist, and add the middleware to the pipeline.
// Program.cs
var builder = WebApplication.CreateBuilder(args);
// ... services
builder.Services.AddSingleton<ZooKeeperClient>();
builder.Services.AddControllers();
var app = builder.Build();
// Initialize ZooKeeper paths on startup
var zkClient = app.Services.GetRequiredService<ZooKeeperClient>();
await zkClient.ConnectAsync();
await zkClient.CreatePathAsync("/rate-limiter/config");
await zkClient.CreatePathAsync("/rate-limiter/state");
await zkClient.CreatePathAsync("/rate-limiter/locks");
// Seed a sample configuration for testing (the node must exist before data can be set on it)
await zkClient.CreatePathAsync("/rate-limiter/config/test-key");
await zkClient.SetDataAsync("/rate-limiter/config/test-key",
"{\"Limit\": 10, \"WindowSeconds\": 20}");
app.UseMiddleware<DistributedRateLimiterMiddleware>();
app.MapGet("/", () => $"Hello from container {Environment.MachineName}");
app.Run();
Testing the Dynamic Distributed System
To test, scale the service to three instances: docker-compose up --build --scale webapi=3. Then, use a simple script to fire requests. This PowerShell script sends 20 requests using the test-key.
# test.ps1
$apiKey = "test-key"
$headers = @{ "X-API-Key" = $apiKey }
# Find one of the dynamically assigned ports
$containerId = (docker ps --filter "name=webapi" -q)[0]
$port = (docker port $containerId 80/tcp).Split(':')[1]
$uri = "http://localhost:$port/"
1..20 | ForEach-Object {
try {
$response = Invoke-WebRequest -Uri $uri -Headers $headers -UseBasicParsing
Write-Host "Request $_ : $($response.StatusCode) - OK"
} catch {
Write-Host "Request $_ : $($_.Exception.Response.StatusCode.value__) - $($_.Exception.Message)" -ForegroundColor Red
}
Start-Sleep -Milliseconds 200
}
Running this script will show approximately 10 successful requests and 10 rejections with status 429 Too Many Requests. The key is that this behavior is consistent regardless of which of the three containers services the request.
The most powerful demonstration is the dynamic configuration. While the script is running, connect to ZooKeeper and change the limit:
# Inside the zookeeper container
# docker exec -it zookeeper /bin/bash
./bin/zkCli.sh -server localhost:2181
# In the zkCli shell
set /rate-limiter/config/test-key '{"Limit": 15, "WindowSeconds": 20}'
Almost immediately, the logs of the API instances will show the new configuration being loaded, and the running test script will start seeing successful requests again until the new limit of 15 is reached within the window.
The primary bottleneck in this architecture is the distributed lock. Every request for a given API key is serialized through a single lock-acquire/release cycle. For moderate traffic (up to a few hundred requests per second across the cluster), this is acceptable. The latency of ZooKeeper, typically a few milliseconds, is the governing factor. For high-performance systems requiring thousands of requests per second, this design would crumble under lock contention.
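To put a rough, assumed number on that bound: each permitted request costs roughly four to five ZooKeeper round trips (create the lock node, list children, read state, write state, delete the lock node). At an optimistic 2 ms per operation, that is on the order of 10 ms of fully serialized work per API key, capping a single hot key at roughly 100 requests per second no matter how many API instances are running. These figures are illustrative estimates, not measurements from this system.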
Future optimizations could involve a hybrid model where each node handles a local quota and only synchronizes with ZooKeeper when its local allocation is exhausted. Another path is to shard the locks by a more granular client identifier, spreading the contention. For extreme loads, a different technology stack like a dedicated proxy layer (e.g., Envoy) with a high-speed backend like Redis might be more appropriate. The value of the ZooKeeper-based approach lies in its robustness, consistency, and unparalleled support for dynamic, cluster-wide coordination, making it ideal for systems where correctness and control outweigh raw throughput requirements.
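As a rough sketch of that hybrid direction (all names here are illustrative and this is not part of the implementation above): each instance leases a batch of permits under the lock, then serves them from memory, touching ZooKeeper once per batch instead of once per request.

// Illustrative sketch of the "local quota" hybrid: lease permits in batches so ZooKeeper
// is contacted once per batch rather than once per request. Names and sizes are assumptions;
// per-instance concurrency control is omitted for brevity.
public class LeasedPermitBucket
{
    private readonly int _batchSize;
    private int _remaining;

    public LeasedPermitBucket(int batchSize) => _batchSize = batchSize;

    // leaseBatchAsync would acquire the distributed lock once, advance the shared counter
    // by up to the requested number of permits (bounded by the global limit), and return
    // how many permits were actually granted.
    public async Task<bool> TryAcquireAsync(Func<int, Task<int>> leaseBatchAsync)
    {
        if (_remaining == 0)
        {
            _remaining = await leaseBatchAsync(_batchSize);
        }
        if (_remaining == 0) return false; // Global limit exhausted for this window
        _remaining--;
        return true;
    }
}

The batch size trades accuracy for throughput: larger batches mean fewer ZooKeeper round trips but a larger worst-case overshoot of the global limit.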