Implementing a CRDT-Based Synchronization Layer for Real-Time Collaborative Code Review Workflows

Software Architecture

The initial proof-of-concept for our internal code review tool was plagued by predictable issues. Comments were managed via standard REST API calls, polling for updates every ten seconds. This approach created a disjointed user experience where reviewers would frequently work on stale data, leading to overwritten drafts and redundant feedback. The technical debt was palpable; the system felt fragile and was a constant source of developer friction. A complete re-architecture of the commenting and review workflow was not just a “nice-to-have” but a necessity for team productivity.

Our first iteration moved away from polling to WebSockets. This was a logical step, replacing ten-second polling with low-latency push updates. However, it quickly became apparent that real-time messaging alone doesn’t solve state consistency. The “last write wins” problem simply manifested faster: two engineers typing a comment in the same thread simultaneously would result in one’s work being clobbered. Server-side conflict resolution logic was considered, but this path leads to immense complexity, locking, and a stateful backend that is difficult to scale. This is a classic distributed systems problem manifesting in a user-facing feature.

The core challenge was identified: we needed to enable multiple users to concurrently edit shared state (review comments, suggestion blocks) in a way that converges to a consistent state without a central authority dictating the outcome of every keystroke. This requirement led us to Conflict-free Replicated Data Types (CRDTs).
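To make "converges without a central authority" concrete, here is a minimal sketch of a toy replicated text. It is not the algorithm Y.js uses; the class name ToyText, the fractional positions, and the client names are assumptions chosen for illustration. Each replica applies operations as a set union keyed by a unique id, so delivery order and duplicate deliveries do not affect the result.

```javascript
// Toy replicated text. This is NOT Y.js's algorithm; it only illustrates convergence.
class ToyText {
  constructor(clientId) {
    this.clientId = clientId; // hypothetical unique replica name
    this.clock = 0;           // per-replica counter, keeps operation ids unique
    this.items = new Map();   // op id -> { id, pos, ch }
  }

  // Characters are ordered by fractional position, ties broken by unique id,
  // so every replica holding the same set of ops derives the same order.
  sorted() {
    return [...this.items.values()].sort(
      (a, b) => a.pos - b.pos || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)
    );
  }

  // Insert locally at a visible index and return the op to send to peers.
  insert(index, ch) {
    const order = this.sorted();
    const left = index > 0 ? order[index - 1].pos : 0;
    const right = index < order.length ? order[index].pos : 1;
    const op = { id: `${this.clientId}:${this.clock++}`, pos: (left + right) / 2, ch };
    this.apply(op);
    return op;
  }

  // Applying is idempotent and order-independent: a set union keyed by id.
  apply(op) {
    if (!this.items.has(op.id)) this.items.set(op.id, op);
  }

  toString() {
    return this.sorted().map((item) => item.ch).join('');
  }
}

const alice = new ToyText('alice');
const bob = new ToyText('bob');

// Both type concurrently before seeing each other's changes.
const aliceOps = [alice.insert(0, 'H'), alice.insert(1, 'i')];
const bobOps = [bob.insert(0, 'Y'), bob.insert(1, 'o')];

// Deliver in different orders, with one duplicate delivery.
[...bobOps].reverse().forEach((op) => alice.apply(op));
[...aliceOps, aliceOps[0]].forEach((op) => bob.apply(op));

console.log(alice.toString() === bob.toString()); // true
```

This toy interleaves characters typed concurrently at the same position, and floating-point positions eventually run out of precision. Production CRDT libraries put considerable work into avoiding both problems, which is a large part of the reason to use one.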

Technology Selection Rationale

In a real-world project, technology choices are a balance of capability, ecosystem maturity, and team familiarity.

Backend: Spring Boot with STOMP over WebSocket
We chose Spring Boot for its robust ecosystem and our team’s existing expertise. The spring-boot-starter-websocket module provides a high-level messaging protocol (STOMP) that simplifies managing subscriptions and broadcasting messages. While a raw WebSocket implementation offers more control, STOMP provides a pragmatic abstraction with user destination handling and a message-broker pattern out of the box, which was sufficient for our needs. A common mistake is to over-engineer this layer early on; a single Spring Boot instance can handle thousands of concurrent connections before needing a more complex solution involving an external message broker like RabbitMQ or Redis Pub/Sub.
Frontend State Management: XState and Zustand
A code review is not just a collection of comments; it’s a process with a defined lifecycle: Draft -> In Review -> Requires Changes -> Approved -> Merged. This is a textbook use case for a finite state machine. XState was selected to model this core workflow. It enforces valid state transitions and makes the application logic predictable and resilient to edge cases.
However, not all state belongs in a global state machine. UI-specific state, such as the visibility of a file tree, the current search filter, or the expansion state of comment threads, is transient. Forcing this into XState would create unnecessary complexity. Zustand was chosen for its minimalistic API and hook-based approach to manage this local, transient component state. This separation of concerns—process state in XState, view state in Zustand—is a critical architectural pattern for maintainable frontends.
Real-Time Synchronization: CRDTs (via Y.js)
This was the cornerstone of the new architecture. Y.js is a mature, high-performance CRDT implementation. It provides data structures like Y.Text (for collaborative text) and Y.Array that can be manipulated locally and will automatically merge changes from other clients in a mathematically provable, conflict-free manner. It abstracts away the complex distributed systems theory into a practical library.
Build Tool: Vite
For the frontend, Vite was a non-negotiable choice. Its near-instant Hot Module Replacement (HMR) and efficient build process significantly shorten the development feedback loop, a crucial factor for productivity.

Architectural Overview

The final architecture operates on a simple principle: the server is a dumb, stateless message forwarder. All intelligence for state merging and conflict resolution is offloaded to the clients via CRDTs.

sequenceDiagram
    participant ClientA as Client A (React)
    participant SpringBoot as Spring Boot WebSocket
    participant ClientB as Client B (React)

    ClientA->>+SpringBoot: Establishes WebSocket Connection
    ClientB->>+SpringBoot: Establishes WebSocket Connection

    SpringBoot-->>ClientA: Connection ACK
    SpringBoot-->>ClientB: Connection ACK
    
    ClientA->>SpringBoot: Subscribes to topic /topic/review/123
    ClientB->>SpringBoot: Subscribes to topic /topic/review/123

    Note over ClientA, ClientB: User types 'Hello' in a shared comment box.

    ClientA-->>ClientA: Y.js captures local change, generates binary update
    ClientA->>SpringBoot: Sends CRDT update message to /app/review/123

    SpringBoot->>SpringBoot: Receives message
    SpringBoot-->>ClientA: Broadcasts CRDT update to /topic/review/123
    SpringBoot-->>ClientB: Broadcasts CRDT update to /topic/review/123

    ClientA-->>ClientA: Receives its own broadcast, ignores (or verifies)
    ClientB-->>ClientB: Receives update, Y.js applies it to local document
    
    Note over ClientB: UI (textarea) automatically reflects 'Hello'

This model keeps the server's per-message cost low. Since the Spring Boot application only forwards binary blobs, it doesn’t need to parse or understand the content.
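The forwarding behavior in the diagram can be sketched without any framework. The class below is an in-memory stand-in, not the Spring broker; the name TopicRelay and its methods are assumptions for illustration. It shows the two properties the design relies on: payloads are opaque bytes, and delivery is scoped to the topic.

```javascript
// In-memory stand-in for the relay: topics map to subscriber callbacks.
class TopicRelay {
  constructor() {
    this.topics = new Map(); // topic -> Set of callbacks
  }

  subscribe(topic, onMessage) {
    if (!this.topics.has(topic)) this.topics.set(topic, new Set());
    this.topics.get(topic).add(onMessage);
    return () => this.topics.get(topic).delete(onMessage); // unsubscribe handle
  }

  // Forward opaque bytes to every subscriber of the topic, sender included.
  // The relay never inspects or decodes the payload.
  publish(topic, payload) {
    const subscribers = this.topics.get(topic) ?? new Set();
    for (const deliver of subscribers) deliver(payload);
    return subscribers.size;
  }
}

const relay = new TopicRelay();
const receivedByA = [];
const receivedByB = [];
const receivedByOther = [];
relay.subscribe('/topic/review/123', (bytes) => receivedByA.push(bytes));
relay.subscribe('/topic/review/123', (bytes) => receivedByB.push(bytes));
relay.subscribe('/topic/review/456', (bytes) => receivedByOther.push(bytes));

const update = Uint8Array.from([1, 2, 3]); // placeholder for a CRDT update
const delivered = relay.publish('/topic/review/123', update);
console.log(delivered, receivedByOther.length); // 2 0
```

Because the sender is also a subscriber, it receives its own update back, which matches the "ignores (or verifies)" step in the diagram.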

Backend Implementation: The WebSocket Relay

The Spring Boot implementation is surprisingly lean. Its primary role is to configure the WebSocket endpoints and broadcast any received message to all subscribers of a given topic.

First, the WebSocket configuration:

// File: src/main/java/com/example/review/config/WebSocketConfig.java
package com.example.review.config;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    private static final Logger logger = LoggerFactory.getLogger(WebSocketConfig.class);

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // Enables a simple in-memory message broker.
        // Destinations prefixed with "/topic" will be routed to the broker.
        // Client applications subscribe to these destinations.
        config.enableSimpleBroker("/topic");

        // Designates the "/app" prefix for messages that are bound for
        // @MessageMapping-annotated methods in a controller.
        config.setApplicationDestinationPrefixes("/app");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // The endpoint clients will connect to.
        // `withSockJS()` provides a fallback for browsers that don't support WebSockets.
        // `setAllowedOriginPatterns("*")` is for development; in production, this should
        // be locked down to the specific frontend domain.
        registry.addEndpoint("/ws-sync")
                .setAllowedOriginPatterns("*")
                .withSockJS();
        logger.info("STOMP endpoint registered at /ws-sync");
    }
}

The key parts of this configuration are enableSimpleBroker which sets up the topic prefixes for broadcasting, and setApplicationDestinationPrefixes for messages directed at the application logic.

The controller is equally straightforward. It defines a single method to handle incoming CRDT updates.

// File: src/main/java/com/example/review/controller/SyncController.java
package com.example.review.controller;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.messaging.handler.annotation.DestinationVariable;
import org.springframework.messaging.handler.annotation.MessageMapping;
import org.springframework.messaging.handler.annotation.SendTo;
import org.springframework.stereotype.Controller;

@Controller
public class SyncController {

    private static final Logger logger = LoggerFactory.getLogger(SyncController.class);

    /**
     * Handles incoming CRDT update messages for a specific review session.
     * The method receives a byte array, which is the raw CRDT update payload from a client.
     * It then broadcasts this payload to every client subscribed to the same review topic,
     * including the sender.
     *
     * @param reviewId The ID of the code review session, used to form the topic destination.
     * @param message  The binary CRDT update payload.
     * @return The same binary payload, which will be sent to the destination specified in @SendTo,
     *         or null to drop the message without broadcasting.
     */
    @MessageMapping("/review/{reviewId}")
    @SendTo("/topic/review/{reviewId}")
    public byte[] syncReviewState(@DestinationVariable String reviewId, byte[] message) {
        // In a real application, you might add logging or metrics here.
        // For example, tracking message size or frequency per reviewId.
        if (message == null || message.length == 0) {
            logger.warn("Received empty message for reviewId: {}", reviewId);
            // Drop the message: a null return value produces no outbound message,
            // so subscribers never receive an empty payload they cannot decode.
            return null;
        }

        // The business logic is trivial: just relay the message.
        // The server is kept stateless and simple. All complex merging logic
        // is handled by the CRDT implementation on the clients.
        logger.trace("Relaying message of size {} bytes for reviewId: {}", message.length, reviewId);
        
        return message;
    }
}

A pitfall here is message validation. While we keep the server “dumb” to the content, basic validation such as message size limits should be enforced at the WebSocket layer to reduce exposure to denial-of-service attacks; in Spring this can be done by overriding configureWebSocketTransport and calling setMessageSizeLimit on the registration. The {reviewId} placeholder, read through @DestinationVariable, gives each code review session its own topic, so updates are only broadcast to clients subscribed to that review.

Frontend Implementation: Weaving It All Together

The frontend is where the complexity lies. We must orchestrate the WebSocket connection, the CRDT document, the state machine, and the local UI state.

1. The Core State Machine (XState)

The review process is modeled as a state machine. This code provides clarity and prevents impossible states, such as merging a review that hasn’t been approved.

// src/machines/reviewMachine.js
import { createMachine, assign } from 'xstate';

// This machine models the lifecycle of a single code review.
// It doesn't care about the *content* of comments, only the *process*.
// Written against the XState v4 API (`cond` guards, `(context, event)` signatures).
export const reviewMachine = createMachine({
  id: 'codeReview',
  initial: 'loading',
  context: {
    reviewId: null,
    error: null,
    author: null,
    changes: [],
  },
  states: {
    loading: {
      on: {
        FETCH_SUCCESS: {
          target: 'inReview',
          actions: assign((context, event) => ({
            author: event.data.author,
            changes: event.data.changes,
          })),
        },
        FETCH_ERROR: {
          target: 'failure',
          actions: assign({ error: (context, event) => event.data }),
        },
      },
    },
    inReview: {
      on: {
        // Reviewer action
        APPROVE: { target: 'approving' },
        // Reviewer action
        REQUEST_CHANGES: { target: 'submittingFeedback' },
        // Author action
        MARK_AS_READY: {
          // Guard placeholder: intended to block this self-transition when the
          // current user also performed the last action (see `guards` below).
          cond: 'isNotAuthorOfLastAction',
          target: 'inReview',
        },
      },
    },
    approving: {
      invoke: {
        src: 'approveReview', // This would be an async function making an API call
        onDone: { target: 'approved' },
        onError: { target: 'inReview', actions: assign({ error: 'Failed to approve' }) },
      },
    },
    submittingFeedback: {
      invoke: {
        src: 'submitFeedback',
        onDone: { target: 'changesRequested' },
        onError: { target: 'inReview', actions: assign({ error: 'Failed to submit feedback' }) },
      },
    },
    approved: {
      on: {
        MERGE: { target: 'merging' },
      },
    },
    changesRequested: {
      on: {
        // Author submits new changes
        PUSH_NEW_COMMIT: { target: 'loading' }, // Re-fetch the review state
      },
    },
    merging: {
      type: 'final' // This is a terminal state for this machine
    },
    failure: {
      type: 'final',
    },
  }
}, {
    guards: {
        isNotAuthorOfLastAction: (context, event) => {
            // A placeholder for real business logic.
            // In a real system, you'd check if the current user is different
            // from the person who last requested changes.
            return true;
        }
    }
});

This machine definition is pure logic. It is then instantiated within a React component using the @xstate/react hook useMachine.
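What "prevents impossible states" means in practice can be shown with a dependency-free mirror of the machine's event maps. This is a simplification rather than XState itself: DONE and ERROR are hypothetical stand-ins for the onDone and onError outcomes of the invoked services, and guards are omitted.

```javascript
// Dependency-free mirror of the machine's event maps (a simplification, not XState).
// DONE and ERROR are hypothetical stand-ins for invoke's onDone/onError.
const transitions = {
  loading: { FETCH_SUCCESS: 'inReview', FETCH_ERROR: 'failure' },
  inReview: {
    APPROVE: 'approving',
    REQUEST_CHANGES: 'submittingFeedback',
    MARK_AS_READY: 'inReview',
  },
  approving: { DONE: 'approved', ERROR: 'inReview' },
  submittingFeedback: { DONE: 'changesRequested', ERROR: 'inReview' },
  approved: { MERGE: 'merging' },
  changesRequested: { PUSH_NEW_COMMIT: 'loading' },
  merging: {},
  failure: {},
};

// Events that a state does not list leave it unchanged,
// which is how invalid actions are rejected.
function nextState(state, eventType) {
  return transitions[state]?.[eventType] ?? state;
}

const run = (events, start = 'loading') =>
  events.reduce((state, eventType) => nextState(state, eventType), start);

// MERGE is ignored because the review has not been approved yet.
console.log(run(['FETCH_SUCCESS', 'MERGE'])); // inReview

// The same MERGE succeeds once the approval path has completed.
console.log(run(['FETCH_SUCCESS', 'APPROVE', 'DONE', 'MERGE'])); // merging
```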

2. Transient UI State (Zustand)

Alongside the global process state, we need to manage local UI state. For instance, a store to handle the active file being viewed.

// src/stores/uiStore.js
import { create } from 'zustand';

// This store is for UI state that doesn't affect the core review process.
// It's ephemeral and often component-specific.
export const useUIStore = create((set) => ({
  activeFile: 'src/main/java/com/example/review/Application.java',
  isCommentBoxCollapsed: true,
  setActiveFile: (path) => set({ activeFile: path }),
  toggleCommentBox: () => set((state) => ({ isCommentBoxCollapsed: !state.isCommentBoxCollapsed })),
}));

Using this store inside a component is trivial: const { activeFile, setActiveFile } = useUIStore();. This keeps view state out of the state machine, so the boundary between the two state management paradigms stays easy to see when reading or reviewing the code.

3. The Collaborative Component with CRDTs

This is the most critical piece of the implementation. We create a React component for a comment thread that uses Y.js to synchronize its content.

First, we need a service to manage the WebSocket connection and the Y.Doc.

// src/services/crdtService.js
import * as Y from 'yjs';
import { WebsocketProvider } from 'y-websocket';

const doc = new Y.Doc();
let provider = null;

// This service abstracts the connection logic.
// A common mistake is to instantiate the provider inside a React component,
// leading to multiple connections.
export const connectToReviewSession = (reviewId) => {
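  if (provider) {
    if (provider.roomname === `review-${reviewId}`) {
      // Already connected to the correct room.
      return { doc, provider };
    }
    // Connected to a different room: tear the old provider down first.
    // Note that `doc` is shared, so content from the previous review stays in it;
    // a per-review Y.Doc would be needed to isolate sessions.
    provider.destroy();
  }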

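  const wsUrl = 'ws://localhost:8080/ws-sync';
  // Caveat: WebsocketProvider appends the room name to the URL and speaks the
  // y-websocket sync protocol over raw binary frames. It does not send STOMP
  // frames, so it cannot talk to the STOMP/SockJS endpoint configured above as-is.
  // This snippet assumes a y-websocket-compatible endpoint at wsUrl; relaying
  // through STOMP would need a custom provider that publishes Y.Doc updates to
  // /app/review/{reviewId} and applies whatever arrives on /topic/review/{reviewId}.
  provider = new WebsocketProvider(wsUrl, `review-${reviewId}`, doc);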

  provider.on('status', event => {
    console.log(`WebSocket connection status: ${event.status}`); // 'connected' or 'disconnected'
  });

  return { doc, provider };
};

export const getCommentThread = (threadId) => {
  // We use a Y.Map to store multiple comment threads within a single Y.Doc.
  // Each thread is identified by a unique ID.
  const threads = doc.getMap('commentThreads');
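  if (!threads.has(threadId)) {
    // If the thread doesn't exist, create a new Y.Text for it.
    // Caveat: if two clients create the same thread concurrently, Y.Map keeps only
    // one of the two Y.Text values, so text typed into the other one is lost.
    threads.set(threadId, new Y.Text());
  }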
  return threads.get(threadId);
};

Now, the React component that uses this service. It binds a textarea directly to a Y.Text object.

// src/components/CollaborativeCommentInput.jsx
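import React, { useEffect, useRef, useState } from 'react';
import { useMachine } from '@xstate/react';
import { reviewMachine } from '../machines/reviewMachine';
import { useUIStore } from '../stores/uiStore';
import { connectToReviewSession, getCommentThread } from '../services/crdtService';

// This component demonstrates the integration of all three state management tools.
export const CollaborativeCommentInput = ({ reviewId, threadId, currentUser }) => {
  // 1. Core process state from XState.
  // useMachine starts one machine per component instance; the app still has to
  // send FETCH_SUCCESS/FETCH_ERROR and supply the invoked services.
  const [state, send] = useMachine(reviewMachine);

  // 2. Transient UI state from Zustand
  const { isCommentBoxCollapsed, toggleCommentBox } = useUIStore();

  const editorRef = useRef(null);
  const [yText, setYText] = useState(null);

  // The textarea is only mounted while the review is open and the box is expanded.
  const isEditorVisible = state.matches('inReview') && !isCommentBoxCollapsed;

  useEffect(() => {
    // Establish (or reuse) the connection for this reviewId.
    const { doc } = connectToReviewSession(reviewId);

    // Get the specific Y.Text instance for this comment thread.
    const text = getCommentThread(threadId);
    setYText(text);

    const textarea = editorRef.current;
    if (!textarea) return undefined; // nothing to bind while the textarea is hidden

    // This is the critical binding logic. Yjs core has no built-in textarea
    // binding, so this is a minimal hand-written one (it does not preserve the
    // cursor position when remote edits arrive).
    const render = () => {
      const value = text.toString();
      if (textarea.value !== value) textarea.value = value;
    };
    const onInput = () => {
      // Reduce the change to one contiguous edit so concurrent edits still merge.
      const before = text.toString();
      const after = textarea.value;
      let start = 0;
      while (start < before.length && start < after.length && before[start] === after[start]) start++;
      let endBefore = before.length;
      let endAfter = after.length;
      while (endBefore > start && endAfter > start && before[endBefore - 1] === after[endAfter - 1]) {
        endBefore--;
        endAfter--;
      }
      doc.transact(() => {
        if (endBefore > start) text.delete(start, endBefore - start);
        if (endAfter > start) text.insert(start, after.slice(start, endAfter));
      });
    };

    render();
    text.observe(render);
    textarea.addEventListener('input', onInput);

    // A code review on this part would emphasize the need for proper cleanup.
    // Failure to remove these listeners on unmount will lead to memory leaks.
    return () => {
      text.unobserve(render);
      textarea.removeEventListener('input', onInput);
    };
  }, [reviewId, threadId, isEditorVisible]);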

  const handleSubmit = () => {
    // Interacting with the state machine.
    // The content itself is already synced via CRDTs. This action
    // transitions the overall review state. REQUEST_CHANGES is the event the
    // machine defines for submitting feedback from 'inReview'.
    send({ type: 'REQUEST_CHANGES', user: currentUser, content: yText ? yText.toString() : '' });
  };
  
  // The component only renders when the state machine is in a valid state for commenting.
  if (!state.matches('inReview')) {
      return <div>Commenting is disabled in the current review state: {state.value}</div>
  }

  return (
    <div>
      <button onClick={toggleCommentBox}>
        {isCommentBoxCollapsed ? 'Show Comment Box' : 'Hide Comment Box'}
      </button>
      {!isCommentBoxCollapsed && (
        <>
          <textarea
            ref={editorRef}
            rows={5}
            style={{ width: '100%', padding: '8px', border: '1px solid #ccc' }}
            placeholder="Type your comment here... others will see it in real-time."
          />
          <button onClick={handleSubmit} style={{ marginTop: '8px' }}>
            Submit Feedback
          </button>
        </>
      )}
    </div>
  );
};

During a code review of this component, several points were raised. The initial version had no cleanup logic for the textarea binding, causing “zombie” listeners and performance degradation as users navigated between different reviews. The fix was returning a cleanup function from the useEffect hook that tears the binding down on unmount. Another point was the handling of provider reconnection. y-websocket handles this automatically, but we added logging on the provider’s status event to make this behavior observable for debugging.
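A textarea binding also has to translate each input event into Y.Text operations. One common approach, sketched here under assumptions, is to reduce the change to a single contiguous edit by trimming the common prefix and suffix. The helper names diffText and applyDiff, and the FakeText stand-in, are hypothetical; the only thing assumed of the target is that it exposes delete(index, length) and insert(index, text), as Y.Text does.

```javascript
// Compute the single contiguous edit that turns `before` into `after`.
function diffText(before, after) {
  let start = 0;
  const maxStart = Math.min(before.length, after.length);
  while (start < maxStart && before[start] === after[start]) start++;

  let endBefore = before.length;
  let endAfter = after.length;
  while (
    endBefore > start &&
    endAfter > start &&
    before[endBefore - 1] === after[endAfter - 1]
  ) {
    endBefore--;
    endAfter--;
  }

  return {
    index: start,
    deleteCount: endBefore - start,
    insertText: after.slice(start, endAfter),
  };
}

// Apply the edit to anything exposing delete(index, length) and insert(index, text).
function applyDiff(target, diff) {
  if (diff.deleteCount > 0) target.delete(diff.index, diff.deleteCount);
  if (diff.insertText.length > 0) target.insert(diff.index, diff.insertText);
}

// Plain-string stand-in for Y.Text so the sketch runs without dependencies.
class FakeText {
  constructor(value) {
    this.value = value;
  }
  delete(index, length) {
    this.value = this.value.slice(0, index) + this.value.slice(index + length);
  }
  insert(index, text) {
    this.value = this.value.slice(0, index) + text + this.value.slice(index);
  }
  toString() {
    return this.value;
  }
}

const shared = new FakeText('Hello world');
applyDiff(shared, diffText(shared.toString(), 'Hello brave world'));
console.log(shared.toString()); // Hello brave world
```

Sending only the changed span, rather than replacing the whole text on every keystroke, is what lets concurrent edits from other clients survive the merge. The comparison works on UTF-16 code units, so a production binding would also need care around surrogate pairs and cursor restoration.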

Lingering Issues and Future Optimizations

This architecture, while robust, is not without its trade-offs and limitations. The reliance on an in-memory message broker in Spring Boot means the backend is a single point of failure and a scalability bottleneck. The natural next step is to replace enableSimpleBroker with enableStompBrokerRelay, pointing at an external STOMP-capable broker like RabbitMQ or ActiveMQ, allowing for a horizontally scalable cluster of stateless Spring Boot nodes.

Persisting the CRDT data presents another challenge. Y.js documents can be serialized into a compact binary format. We currently persist the full document snapshot on every significant change. For very large review sessions with extensive comments, this is inefficient. A more advanced strategy would involve persisting periodic snapshots along with a log of incremental updates, which reduces storage overhead and improves load times, but adds complexity to the data recovery logic.
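The snapshot-plus-log strategy can be sketched as follows. This is an in-memory illustration under assumptions, not our persistence code: the class name UpdateLog and the compactEvery threshold are hypothetical, and the merge function is injected. With Yjs, Y.mergeUpdates could play that role; byte concatenation is used below only so the sketch runs without dependencies.

```javascript
// In-memory stand-in for a persistence layer; a real one would use a database.
class UpdateLog {
  constructor({ merge, compactEvery = 100 }) {
    this.merge = merge;               // (Uint8Array[]) => Uint8Array
    this.compactEvery = compactEvery; // hypothetical compaction threshold
    this.snapshot = null;             // last compacted state, or null
    this.pending = [];                // updates appended since the snapshot
  }

  // Appending is cheap: one small write per update instead of a full snapshot.
  append(update) {
    this.pending.push(update);
    if (this.pending.length >= this.compactEvery) this.compact();
  }

  // Fold the pending updates into a new snapshot and truncate the log.
  compact() {
    const merged = this.load();
    if (merged) {
      this.snapshot = merged;
      this.pending = [];
    }
  }

  // State to hand to a joining client: the snapshot plus everything after it.
  load() {
    const parts = this.snapshot ? [this.snapshot, ...this.pending] : [...this.pending];
    return parts.length > 0 ? this.merge(parts) : null;
  }
}

// Stand-in merge: byte concatenation (an assumption made for the demo only).
const concat = (parts) => Uint8Array.from(parts.flatMap((part) => [...part]));

const log = new UpdateLog({ merge: concat, compactEvery: 3 });
[[1], [2], [3], [4]].forEach((bytes) => log.append(Uint8Array.from(bytes)));

console.log(log.pending.length, Array.from(log.load()).join(',')); // 1 1,2,3,4
```

The recovery complexity mentioned above shows up in compact: in a real store the snapshot write and the log truncation must happen atomically, or a crash between the two can lose or duplicate updates.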

Finally, while Y.js merges concurrent text edits deterministically, application-level semantic conflicts can still occur. For example, two users approving a review simultaneously is a race condition that the state machine, with proper guards and backend transactional logic, must handle. CRDTs solve data convergence, not business rule enforcement.
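One way to handle the simultaneous-approval race on the backend is an optimistic version check. The sketch below is a simplified in-memory illustration with hypothetical names (createReview, approve, expectedVersion); a real implementation would perform the check and the write inside one database transaction or a conditional update.

```javascript
// Hypothetical in-memory review record.
function createReview() {
  return { status: 'inReview', version: 0, approvedBy: null };
}

// Apply an approval only if the caller acted on the latest version
// and the review is still in a state that allows approval.
function approve(review, user, expectedVersion) {
  if (review.version !== expectedVersion) {
    return { ok: false, reason: 'stale', current: review.status };
  }
  if (review.status !== 'inReview') {
    return { ok: false, reason: 'invalid-state', current: review.status };
  }
  review.status = 'approved';
  review.approvedBy = user;
  review.version += 1;
  return { ok: true, current: review.status };
}

const review = createReview();
const versionSeenByBoth = review.version; // both reviewers loaded version 0

const first = approve(review, 'alice', versionSeenByBoth);
const second = approve(review, 'bob', versionSeenByBoth);

console.log(first.ok, second.ok, review.approvedBy); // true false alice
```

The losing client receives the current status in the response and can move its local state machine to match, instead of assuming its own approval went through.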
