The initial attempt at building a real-time collaborative tool was a predictable failure. We started with a simple WebSocket architecture: a central Actix-web server held the “source of truth” state for a shared document in memory. Clients would connect, receive the full state, and then send delta updates as JSON patches. The server would apply a patch and broadcast the new state to all other connected clients. This worked flawlessly with two clients and careful, sequential edits. The moment two users edited the same part of the document concurrently, the system collapsed into a cascade of inconsistent states and corrupted data. Last-write-wins, implemented naively, is a recipe for disaster.
This led to the classic path of introducing operational transforms (OT). While powerful, OT is notoriously complex. It requires a central, ordered log of operations and intricate transformation functions to handle concurrent changes. The server-side complexity balloons, and handling disconnected clients who need to catch up becomes a significant engineering challenge. We spent a sprint prototyping an OT system and quickly realized the maintenance overhead and logical fragility were too high for our team’s goals. The core problem was our insistence on maintaining a perfectly linear history in a fundamentally non-linear, distributed environment.
We needed a different model, one that embraces concurrency rather than fighting it. This led us to Conflict-free Replicated Data Types (CRDTs). A CRDT is a data structure that can be replicated across multiple computers, updated independently and concurrently without coordination, where it is always mathematically possible to resolve inconsistencies and merge all versions into a final, consistent state. This felt like the right architectural primitive. Our new plan was to build a “collaboration kit” on this foundation: a high-performance Rust backend using Actix-web to manage CRDT document state and broadcast operations, and a hyper-responsive React frontend where Valtio’s proxy-based state management could mirror the complex, mutable CRDT document structure with surgical precision.
Technology Selection and Rationale
The choice of stack was deliberate, focusing on performance, safety, and developer ergonomics for this specific real-time problem.
Backend Framework: Actix-web (Rust)
In a system broadcasting potentially thousands of tiny updates per second, performance and concurrency are paramount. Actix-web, built on Rust, offers memory safety without a garbage collector, eliminating unpredictable latency spikes. Its actor model is a natural fit for managing stateful WebSocket connections. Each client connection can be its own isolated actor, communicating asynchronously with a central “document hub” actor. This isolates failures and makes managing the lifecycle of thousands of connections straightforward.CRDT Library:
diamond-types
(Rust & WASM)
We evaluated several CRDT libraries.automerge
is excellent but we opted fordiamond-types
due to its reported performance characteristics and its granular approach to operations. It produces small, efficient patches, which is critical for minimizing network traffic. Crucially, it has both a native Rust implementation for the server and a WebAssembly (WASM) build (diamond-types-web
) for the client. This allows us to use the exact same data model and merging logic on both ends, eliminating an entire class of client-server desynchronization bugs.Frontend State Management: Valtio (React)
A CRDT document is a deeply nested, mutable object. Traditional React state management libraries built on immutability (like Redux) would be a nightmare here. Every time a patch from the server arrived, we would have to recursively clone large parts of the state tree. This is not only slow but also incredibly verbose. Valtio, on the other hand, is built on proxies. You mutate the state object directly, just like a regular JavaScript object, and Valtio’s proxy detects the change and triggers a minimal re-render of only the components that depend on that specific piece of state. This is a perfect match for a CRDT model, where small, targeted mutations are the norm.
The “Kit” is the combination of these technologies, bound by a defined WebSocket protocol for exchanging CRDT operations.
The Backend: An Actix-web Actor System for CRDT Documents
The server architecture consists of two main actor types:
-
DocumentHub
: A singleton actor that holds the CRDT document in memory. It’s responsible for managing a list of connected client sessions, applying operations to the document, and broadcasting resulting patches to all other clients. -
WsSession
: An actor created for each WebSocket connection. It acts as a bridge, forwarding messages from the client to theDocumentHub
and pushing messages from theHub
back down to the client.
Here is the core setup for the server. Note the use of Arc<Mutex<...>>
to safely share the CRDT document state within the DocumentHub
. While a Mutex
introduces locking, the operations are extremely fast (applying a pre-validated patch), so contention in practice is negligible for this use case.
// file: main.rs
use actix::prelude::*;
use actix_web::{web, App, Error, HttpRequest, HttpResponse, HttpServer};
use actix_web_actors::ws;
use diamond_types::list::List;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Instant;
use rand::{self, Rng, thread_rng};
use serde::{Deserialize, Serialize};
// A unique identifier for each client session
type SessionId = usize;
/// Messages for our WebSocket protocol
#[derive(Message, Serialize, Deserialize, Debug)]
#[rtype(result = "()")]
#[serde(tag = "type")]
enum WsMessage {
/// A client is sending operations to be applied
ApplyOps { ops: Vec<u8> },
/// The hub is broadcasting operations to clients
BroadcastOps { ops: Vec<u8> },
/// The full document state for a new client
FullDocument { doc: Vec<u8> },
}
/// The central actor holding the document state.
#[derive(Debug)]
struct DocumentHub {
sessions: HashMap<SessionId, Recipient<WsMessage>>,
doc: Arc<Mutex<List>>,
}
impl DocumentHub {
fn new() -> Self {
Self {
sessions: HashMap::new(),
// The CRDT document. We'll use a simple List CRDT for this example.
// In a real application, this would be loaded from a database.
doc: Arc::new(Mutex::new(List::new())),
}
}
// Send a message to all connected sessions
fn broadcast(&self, message: &WsMessage) {
for addr in self.sessions.values() {
addr.do_send(message_to_send.clone());
}
}
}
impl Actor for DocumentHub {
type Context = Context<Self>;
}
/// Actor message for a client connecting to the hub
#[derive(Message)]
#[rtype(result = "()")]
struct Connect {
id: SessionId,
addr: Recipient<WsMessage>,
}
impl Handler<Connect> for DocumentHub {
type Result = ();
fn handle(&mut self, msg: Connect, _: &mut Context<Self>) {
println!("INFO: Client {} connected", msg.id);
self.sessions.insert(msg.id, msg.addr.clone());
// On connect, send the new client the full document state.
// A pitfall here is a large document blocking the actor. For huge documents,
// this should be done chunked or out-of-band.
let doc_state = self.doc.lock().unwrap().encode_v1();
msg.addr.do_send(WsMessage::FullDocument { doc: doc_state });
}
}
/// Actor message for a client disconnecting
#[derive(Message)]
#[rtype(result = "()")]
struct Disconnect {
id: SessionId,
}
impl Handler<Disconnect> for DocumentHub {
type Result = ();
fn handle(&mut self, msg: Disconnect, _: &mut Context<Self>) {
println!("INFO: Client {} disconnected", msg.id);
self.sessions.remove(&msg.id);
}
}
/// Actor message for applying operations from a client
#[derive(Message)]
#[rtype(result = "()")]
struct ApplyOpsMessage {
id: SessionId,
ops: Vec<u8>,
}
impl Handler<ApplyOpsMessage> for DocumentHub {
type Result = ();
fn handle(&mut self, msg: ApplyOpsMessage, _: &mut Context<Self>) {
let mut doc = self.doc.lock().unwrap();
// The core CRDT logic: merge the incoming operations.
// This operation is deterministic and handles concurrency gracefully.
doc.merge_raw_v1(&msg.ops).expect("Failed to merge ops");
println!("INFO: Applied ops from client {}, broadcasting to {} peers", msg.id, self.sessions.len() - 1);
// Broadcast the validated operations to all *other* clients.
for (id, addr) in &self.sessions {
if *id != msg.id {
addr.do_send(WsMessage::BroadcastOps { ops: msg.ops.clone() });
}
}
}
}
/// The WebSocket session actor
struct WsSession {
id: SessionId,
hb: Instant,
hub_addr: Addr<DocumentHub>,
}
impl WsSession {
const HEARTBEAT_INTERVAL: std::time::Duration = std::time::Duration::from_secs(5);
const CLIENT_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(10);
fn hb(&self, ctx: &mut ws::WebsocketContext<Self>) {
ctx.run_interval(Self::HEARTBEAT_INTERVAL, |act, ctx| {
if Instant::now().duration_since(act.hb) > Self::CLIENT_TIMEOUT {
println!("WARN: Client {} heartbeat failed, disconnecting.", act.id);
act.hub_addr.do_send(Disconnect { id: act.id });
ctx.stop();
return;
}
ctx.ping(b"ping");
});
}
}
impl Actor for WsSession {
type Context = ws::WebsocketContext<Self>;
fn started(&mut self, ctx: &mut Self::Context) {
self.hb(ctx);
let addr = ctx.address().recipient();
self.hub_addr.do_send(Connect { id: self.id, addr });
}
fn stopping(&mut self, _: &mut Self::Context) -> Running {
self.hub_addr.do_send(Disconnect { id: self.id });
Running::Stop
}
}
impl Handler<WsMessage> for WsSession {
type Result = ();
fn handle(&mut self, msg: WsMessage, ctx: &mut Self::Context) {
// A production system would use a more robust binary format like MessagePack or Protobuf.
// JSON is used here for clarity.
let json_string = serde_json::to_string(&msg).unwrap();
ctx.text(json_string);
}
}
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for WsSession {
fn handle(&mut self, msg: Result<ws::Message, ws::ProtocolError>, ctx: &mut Self::Context) {
match msg {
Ok(ws::Message::Ping(msg)) => {
self.hb = Instant::now();
ctx.pong(&msg);
}
Ok(ws::Message::Pong(_)) => {
self.hb = Instant::now();
}
Ok(ws::Message::Text(text)) => {
// Attempt to deserialize the message from the client
match serde_json::from_str::<WsMessage>(&text) {
Ok(WsMessage::ApplyOps { ops }) => {
self.hub_addr.do_send(ApplyOpsMessage { id: self.id, ops });
},
Ok(_) => {
eprintln!("ERROR: Received unexpected message type from client {}", self.id);
}
Err(e) => {
eprintln!("ERROR: Failed to parse message from client {}: {}", self.id, e);
}
}
}
Ok(ws::Message::Close(reason)) => {
ctx.close(reason);
ctx.stop();
}
_ => ctx.stop(),
}
}
}
async fn ws_route(
req: HttpRequest,
stream: web::Payload,
hub_addr: web::Data<Addr<DocumentHub>>,
) -> Result<HttpResponse, Error> {
ws::start(
WsSession {
id: thread_rng().gen::<usize>(),
hb: Instant::now(),
hub_addr: hub_addr.get_ref().clone(),
},
&req,
stream,
)
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
// Start the DocumentHub actor in the background
let hub = DocumentHub::new().start();
println!("INFO: Starting server at ws://127.0.0.1:8080");
HttpServer::new(move || {
App::new()
.app_data(web::Data::new(hub.clone()))
.route("/ws", web::get().to(ws_route))
})
.bind(("127.0.0.1", 8080))?
.run()
.await
}
This backend structure is robust. Each connection is stateful but lightweight. The central DocumentHub
is the only point of contention, and because CRDT merges are fast, it scales well to a high number of concurrent writers for a single document. A key architectural decision for multi-document support would be to spawn a DocumentHub
actor per active document, managed by a higher-level “supervisor” actor.
The Frontend: Valtio and a Custom Hook for Seamless Integration
On the frontend, the goal is to create a seamless developer experience. A component shouldn’t need to know about WebSockets or CRDTs; it should just interact with a shared state object. Valtio and a custom React hook make this possible.
First, we define a service that encapsulates the WebSocket connection and message parsing logic.
// file: collaboration-kit.ts
import { Doc, merge, encode, decode } from 'diamond-types-web';
import { proxy, ref, snapshot } from 'valtio';
import { proxyWithComputed } from 'valtio/utils';
// Define the shape of messages, mirroring the Rust backend
type WsMessage =
| { type: 'FullDocument'; doc: Uint8Array }
| { type: 'BroadcastOps'; ops: Uint8Array };
// The Valtio store. The `doc` is the core CRDT object.
// We mark it with `ref()` to tell Valtio not to create proxies for its internals,
// as the diamond-types library manages its own state. We will manually trigger updates.
interface CollaborationState {
doc: Doc;
connection: 'disconnected' | 'connecting' | 'connected';
}
// Custom hook to manage the collaborative state
export function useCollaboration(docId: string) {
// We use `proxyWithComputed` to create a Valtio store instance that is local
// to this hook's lifecycle. A real-world application might use a global store.
const state = proxyWithComputed<CollaborationState>(
{
doc: ref(new Doc()), // Initialize with an empty CRDT document
connection: 'disconnected',
},
// Computed properties are useful for deriving state, e.g., the text content
{
text: (snap) => (snap.doc as Doc).getText('text').toString(),
}
);
useEffect(() => {
state.connection = 'connecting';
const ws = new WebSocket(`ws://127.0.0.1:8080/ws?docId=${docId}`);
ws.onopen = () => {
console.log('INFO: WebSocket connected');
state.connection = 'connected';
};
ws.onclose = () => {
console.log('WARN: WebSocket disconnected');
state.connection = 'disconnected';
};
ws.onerror = (err) => {
console.error('ERROR: WebSocket error:', err);
state.connection = 'disconnected';
};
ws.onmessage = (event) => {
// The core logic for handling incoming server messages
const msg = JSON.parse(event.data);
// A common pitfall is not correctly handling binary data from JSON.
// We must convert the array of numbers back to a Uint8Array.
const payload = msg.ops ? new Uint8Array(Object.values(msg.ops)) : new Uint8Array(Object.values(msg.doc));
if (msg.type === 'FullDocument') {
console.log('INFO: Received full document state');
const newDoc = new Doc();
merge(newDoc, decode(payload));
// Replace the document object. Valtio `ref()` requires this.
state.doc = ref(newDoc);
} else if (msg.type === 'BroadcastOps') {
console.log('INFO: Received broadcasted operations');
// This is the beauty of CRDTs. We just merge the incoming changes.
// No complex transformation logic is needed.
merge(state.doc, decode(payload));
// We have to re-assign to trigger Valtio's update for ref-ed objects.
state.doc = ref(state.doc);
}
};
// This is the bridge from Valtio to the CRDT engine.
// We subscribe to local changes on our CRDT document.
const unsubscribe = state.doc.on('update', (update: Uint8Array) => {
if (ws.readyState === WebSocket.OPEN) {
// When the user types, the `Doc` object is mutated, which emits an 'update' event.
// We capture the resulting operations and send them to the server.
const message = {
type: 'ApplyOps',
ops: Array.from(update), // Convert Uint8Array for JSON serialization
};
ws.send(JSON.stringify(message));
}
});
return () => {
unsubscribe();
ws.close();
};
}, [docId]); // Re-connect if the document ID changes
return state;
}
Now, the React component becomes incredibly simple. It uses the useCollaboration
hook and interacts with the Valtio proxy as if it were a simple local object.
// file: CollaborativeEditor.tsx
import React from 'react';
import { useSnapshot } from 'valtio';
import { useCollaboration } from './collaboration-kit';
function CollaborativeEditor({ docId }: { docId: string }) {
const collabState = useCollaboration(docId);
// `useSnapshot` provides an immutable snapshot of the state for rendering.
// React re-renders when this snapshot changes.
const snap = useSnapshot(collabState);
const handleTextChange = (e: React.ChangeEvent<HTMLTextAreaElement>) => {
// This is where the magic happens. We are directly mutating the CRDT document.
// Valtio does not see this mutation directly because `doc` is wrapped in `ref()`.
// Instead, the `diamond-types` library handles the mutation and emits an 'update' event,
// which our hook listens to for sending data to the server.
// It also triggers our own state update to reflect the change locally.
const editorText = snap.doc.getText('text');
// A more sophisticated implementation would calculate the diff
// and apply granular CRDT operations (insert/delete).
// For simplicity, we replace the whole text. diamond-types is smart enough
// to generate a minimal diff from this.
editorText.replace(0, editorText.length, e.target.value);
};
return (
<div>
<h1>Collaborative Document ({snap.connection})</h1>
<textarea
value={snap.text}
onChange={handleTextChange}
style={{ width: '100%', height: '400px', fontFamily: 'monospace' }}
disabled={snap.connection !== 'connected'}
/>
</div>
);
}
The flow of an edit is now fully CRDT-driven:
sequenceDiagram participant ClientA as Client A (Editor) participant ValtioA as Valtio Proxy (A) participant WsA as WebSocket Hook (A) participant Server as Actix-web Hub participant WsB as WebSocket Hook (B) participant ValtioB as Valtio Proxy (B) participant ClientB as Client B (Editor) ClientA->>+ValtioA: User types 'a'. (handleTextChange) ValtioA->>+WsA: Mutates CRDT doc, which emits 'update' event with ops. WsA->>+Server: Sends { type: 'ApplyOps', ops: [...] } Server->>Server: Merges ops into master CRDT document. Server->>+WsB: Broadcasts { type: 'BroadcastOps', ops: [...] } WsB->>+ValtioB: Merges incoming ops into local CRDT doc. ValtioB-->>ClientB: Valtio triggers re-render. Text area updates. deactivate WsB deactivate ValtioB deactivate Server deactivate WsA deactivate ValtioA
The key insight is how well Valtio’s philosophy aligns with CRDTs. Both embrace controlled, direct mutation. By wrapping the CRDT document object in ref()
, we delegate state management to the diamond-types
library and simply use Valtio as a highly efficient notification layer to trigger React re-renders when the underlying data changes. The useCollaboration
hook becomes a powerful piece of reusable infrastructure—our “kit”—that cleanly separates the complexities of real-time networking from the UI components.
Limitations and Future Paths
This implementation, while robust, is still a foundational kit. In a production system, several aspects would need to be addressed. The DocumentHub
holds the entire document in memory and has no persistence. A real system would need to snapshot the CRDT state to a database (e.g., PostgreSQL, Redis) periodically and on shutdown, and load it on demand when the first user joins a document session.
The protocol currently uses JSON, which is verbose for binary operations. Switching to a binary serialization format like MessagePack or Protobuf would significantly reduce network overhead. Furthermore, authentication and authorization are entirely absent. A production implementation would integrate token-based auth into the WebSocket upgrade request and have the DocumentHub
check permissions before connecting a session or applying operations.
Finally, while this covers document state, it doesn’t cover ephemeral presence state (e.g., cursors, selections). This is typically handled with a separate, non-CRDT message channel over the same WebSocket, as presence data doesn’t require the same strict consistency guarantees and can be treated as last-write-wins. This would be a natural extension of the existing actor model.