Selecting a Data Persistence Layer for a High-Throughput Collaborative PWA Backend


The core technical challenge is designing a persistence layer for a real-time collaborative design application—a Progressive Web App (PWA) where multiple users simultaneously manipulate elements on a shared canvas. This problem domain presents two conflicting requirements. First, the application must support complex, deeply-nested, and evolving object structures representing canvas elements, a scenario that naturally favors a flexible document model. Second, it demands predictable, single-digit millisecond latency for atomic updates at a massive scale to provide a seamless user experience, a characteristic of key-value or wide-column stores optimized for specific access patterns.

This forces a critical architectural decision at the project’s inception. Do we select a flexible document database like MongoDB and attempt to optimize it for high-throughput, low-latency writes? Or do we choose a key-value store like DynamoDB, which guarantees performance at scale, and force our complex data model to fit its rigid structure? A misstep here introduces significant technical debt and performance ceilings that are difficult to refactor later.

Our primary access patterns can be distilled into four critical operations:

  1. LoadCanvas(canvasId): Fetch all elements associated with a specific canvas. High read volume, latency sensitive on initial load.
  2. UpdateElement(canvasId, elementId, updatePayload): Modify a single element’s properties (position, color, content). Extremely high write volume, must have predictable sub-50ms latency. This is the heart of the real-time experience.
  3. ListUserProjects(userId): Retrieve metadata for all projects accessible to a user. Medium read volume, moderately latency sensitive.
  4. GetProjectHistory(projectId): Query historical versions or audit logs of a project. Low query volume, less latency sensitive, but requires complex filtering (e.g., by user, by time range, by element type).
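
For concreteness, an UpdateElement payload carries any subset of the element fields that the service implementations below destructure:

// An illustrative UpdateElement payload; any subset of these top-level
// fields may be present on a given call.
const updatePayload = {
  position: { x: 120, y: 340 },            // move the element
  dimensions: { width: 200, height: 80 },  // resize it
  properties: { color: '#FF5733' },        // element-type-specific attributes
};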

Let’s evaluate two distinct architectural approaches to satisfy these patterns.

Approach A: The Flexible Document Model with MongoDB

MongoDB’s document-oriented nature seems like a natural fit. A canvas is a collection of elements, which can be intuitively modeled as a single Canvas document containing an array of Element sub-documents.

A simplified Mongoose schema might look like this:

// src/models/mongo/CanvasModel.js
import mongoose from 'mongoose';

const ElementSchema = new mongoose.Schema({
  elementId: { type: String, required: true },
  type: { type: String, enum: ['shape', 'text', 'image'], required: true },
  position: {
    x: { type: Number, required: true },
    y: { type: Number, required: true },
  },
  dimensions: {
    width: { type: Number, required: true },
    height: { type: Number, required: true },
  },
  properties: { type: mongoose.Schema.Types.Mixed }, // For arbitrary properties like color, content, src
  version: { type: Number, default: 1 },
}, { _id: false });

const CanvasSchema = new mongoose.Schema({
  _id: { type: String, alias: 'canvasId' }, // Using a custom ID
  projectId: { type: String, required: true, index: true },
  elements: [ElementSchema],
  lastModified: { type: Date, default: Date.now },
}, { timestamps: true });

// Index to efficiently find a specific element within the elements array
CanvasSchema.index({ 'elements.elementId': 1 });

export const CanvasModel = mongoose.model('Canvas', CanvasSchema);
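
With this schema in place, the read path is trivial. A minimal LoadCanvas sketch, assuming an established Mongoose connection:

// src/services/MongoCanvasService.js (read path, minimal sketch)
import { CanvasModel } from '../models/mongo/CanvasModel.js';

export async function loadCanvas(canvasId) {
  // .lean() returns plain objects rather than full Mongoose documents,
  // a cheaper deserialization for latency-sensitive initial loads.
  return CanvasModel.findById(canvasId).lean().exec();
}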

As the sketch above shows, the LoadCanvas operation is a straightforward findById. The UpdateElement operation, however, reveals the first point of friction: to update a single element, we must target a specific item within the elements array using the filtered positional operator ($[elem]).

// src/services/MongoCanvasService.js

import { CanvasModel } from '../models/mongo/CanvasModel.js';
import { logger } from '../utils/logger.js';

export class MongoCanvasService {
  async updateElement(canvasId, elementId, updatePayload) {
    const { position, dimensions, properties } = updatePayload;
    
    // Construct the update object to target nested fields via the filtered
    // positional operator $[elem] (requires MongoDB 3.6+).
    const updateFields = {};
    if (position) {
      updateFields['elements.$[elem].position'] = position;
    }
    if (dimensions) {
      updateFields['elements.$[elem].dimensions'] = dimensions;
    }
    if (properties) {
      // Be careful with deep merges on generic objects.
      // This simple approach overwrites; a more granular one may be needed.
      updateFields['elements.$[elem].properties'] = properties;
    }

    // Use findOneAndUpdate to apply the update and retrieve the result.
    // The arrayFilters entry binds the `elem` placeholder used in the update
    // paths above to the array entries whose elementId matches.
    try {
      const result = await CanvasModel.findOneAndUpdate(
        { _id: canvasId },
        {
          $set: updateFields,
          $inc: { 'elements.$[elem].version': 1 }, // Increment version for optimistic locking
        },
        {
          new: true,
          arrayFilters: [{ 'elem.elementId': elementId }],
        }
      ).exec();

      if (!result) {
        logger.warn(`Update failed: Canvas ${canvasId} or Element ${elementId} not found.`);
        return null;
      }

      return result;
    } catch (error) {
      logger.error({
        message: 'Error updating element in MongoDB',
        canvasId,
        elementId,
        error: error.message,
      });
      throw new Error('Database update operation failed.');
    }
  }
}

Pros of the MongoDB Approach:

  1. Modeling Simplicity: The data structure in the database directly mirrors the application’s object model. This is intuitive for developers.
  2. Powerful Queries: The MongoDB Query Language (MQL) and its aggregation framework are extremely powerful for GetProjectHistory. We could easily build complex queries to analyze element changes over time, assuming we store historical versions or an audit log (see the aggregation sketch after this list).
  3. Schema Flexibility: As we add new element types with unique properties, we can do so without rigid schema migrations. The Mixed type for properties is a testament to this flexibility.
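
For illustration, here is what such a history query could look like. This is a sketch only: ElementAuditLogModel and its fields (projectId, userId, elementId, elementType, timestamp) are hypothetical, assuming we persist an audit-log collection alongside the canvas data.

// A hypothetical GetProjectHistory aggregation. ElementAuditLogModel and all
// of its field names are assumptions for this sketch, not part of the
// CanvasModel defined above.
import { ElementAuditLogModel } from '../models/mongo/ElementAuditLogModel.js';

export async function getProjectHistory({ projectId, from, to, userId, elementType }) {
  return ElementAuditLogModel.aggregate([
    {
      $match: {
        projectId,
        timestamp: { $gte: from, $lte: to },   // time-range filter
        ...(userId && { userId }),             // optional: filter by user
        ...(elementType && { elementType }),   // optional: filter by element type
      },
    },
    { $sort: { timestamp: -1 } },
    {
      $group: {
        _id: '$elementId',                     // one bucket per element
        changeCount: { $sum: 1 },
        changes: { $push: { userId: '$userId', timestamp: '$timestamp' } },
      },
    },
  ]).exec();
}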

Cons and Production Pitfalls:

  1. The Large Document Problem: In a real-world project, a complex design could contain thousands of elements, bloating the parent Canvas document to several megabytes and pushing it toward MongoDB’s hard 16 MB document limit. Every small update to a single element requires MongoDB to read the entire document from disk, apply the change in memory, and write the entire document back. This generates significant I/O amplification and network overhead.
  2. Contention: With many users collaborating on the same canvas, they are all attempting to write to the same document. This creates write contention at the document level, which can serialize operations and increase latency, undermining the real-time experience.
  3. Cost and Operational Overhead: Achieving predictable low latency at massive scale with MongoDB often requires careful cluster management, performance tuning (index optimization, hardware selection), and sharding strategy. This translates to higher operational costs and requires specialized DevOps expertise, especially when compared to serverless alternatives. A common mistake is under-provisioning IOPS on the underlying storage, leading to performance degradation under load.

Approach B: The Access-Pattern-Driven Model with DynamoDB

DynamoDB forces a complete mental shift. Instead of modeling the data, we model the application’s access patterns. We’ll use a single-table design, a common best practice for DynamoDB, where different entity types are stored in the same table, distinguished by their key structure.

Our table will have a composite primary key: PK (Partition Key) and SK (Sort Key).

Here is the entity relationship diagram and the corresponding key design, visualized with Mermaid.js:

graph TD
    subgraph DynamoDB Single Table Design
        User["User (USER#{userId})"]
        Project["Project (PROJECT#{projectId})"]
        Canvas["Canvas (CANVAS#{canvasId})"]
        Element["Element (ELEMENT#{elementId})"]

        User -- "GSI1PK: USER#{userId}<br/>GSI1SK: PROJECT#{projectId}" --> Project
        Project -- "PK: PROJECT#{projectId}<br/>SK: METADATA" --> Project
        Project -- "PK: PROJECT#{projectId}<br/>SK: CANVAS#{canvasId}" --> Canvas
        Canvas -- "PK: CANVAS#{canvasId}<br/>SK: METADATA" --> Canvas
        Canvas -- "PK: CANVAS#{canvasId}<br/>SK: ELEMENT#{elementId}" --> Element
    end

This design directly maps to our access patterns:

  • ListUserProjects(userId): Query GSI1 where GSI1PK = USER#{userId}. This secondary index allows us to find all projects for a user (a query sketch follows this list).
  • LoadCanvas(canvasId): Query the primary key where PK = CANVAS#{canvasId}. This will efficiently retrieve all items in the partition, which includes the canvas metadata and all its element items.
  • UpdateElement(canvasId, elementId, ...): A direct UpdateItem operation on a single item where PK = CANVAS#{canvasId} and SK = ELEMENT#{elementId}. This is the most efficient operation in DynamoDB.
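
Before the full service, a quick sketch of the ListUserProjects query. It assumes a Global Secondary Index named GSI1 with GSI1PK/GSI1SK keys, matching the diagram above; the index name is a convention of this design, not something DynamoDB mandates.

// ListUserProjects via GSI1. The index and key names follow the diagram
// above; adjust them to your actual table definition.
import { DynamoDBClient, QueryCommand } from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";

const ddb = new DynamoDBClient({ region: process.env.AWS_REGION || "us-east-1" });

export async function listUserProjects(userId) {
  const { Items } = await ddb.send(new QueryCommand({
    TableName: process.env.DYNAMODB_TABLE_NAME,
    IndexName: "GSI1",
    KeyConditionExpression: "GSI1PK = :pk AND begins_with(GSI1SK, :prefix)",
    ExpressionAttributeValues: marshall({
      ":pk": `USER#${userId}`,
      ":prefix": "PROJECT#",
    }),
  }));
  return (Items ?? []).map(unmarshall);
}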

Here is the implementation of the core service using the AWS SDK v3.

// src/services/DynamoCanvasService.js
import { 
  DynamoDBClient, 
  QueryCommand, 
  UpdateItemCommand 
} from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";
import { logger } from '../utils/logger.js';

// It's critical to initialize the client once and reuse it.
// In a serverless environment like Lambda, this would be done outside the handler.
const ddbClient = new DynamoDBClient({ 
  region: process.env.AWS_REGION || 'us-east-1',
  // Further configuration for retries, timeouts, etc., is essential for production.
  maxAttempts: 3, 
});

const TABLE_NAME = process.env.DYNAMODB_TABLE_NAME;

export class DynamoCanvasService {
  /**
   * Loads all element items for a given canvas.
   * In a real application, this would also fetch canvas metadata.
   */
  async getCanvasElements(canvasId) {
    const params = {
      TableName: TABLE_NAME,
      KeyConditionExpression: "PK = :pk AND begins_with(SK, :sk_prefix)",
      ExpressionAttributeValues: marshall({
        ":pk": `CANVAS#${canvasId}`,
        ":sk_prefix": "ELEMENT#",
      }),
      // Production code should handle pagination if a canvas can have > 1MB of elements.
    };

    try {
      const command = new QueryCommand(params);
      const { Items } = await ddbClient.send(command);
      return (Items ?? []).map(item => unmarshall(item));
    } catch (error) {
      logger.error({
        message: 'Error fetching canvas elements from DynamoDB',
        canvasId,
        error: error.message,
      });
      throw new Error('Database query operation failed.');
    }
  }

  /**
   * Updates a single element. This is the hot path.
   * This implementation includes optimistic locking using a version attribute.
   */
  async updateElement(canvasId, elementId, updatePayload, currentVersion) {
    const { position, dimensions, properties } = updatePayload;

    // This is more verbose than MongoDB, but it's explicit and type-safe.
    let updateExpression = 'SET #lastModified = :ts, #version = :newVersion';
    const expressionAttributeNames = {
      '#lastModified': 'lastModified',
      '#version': 'version',
    };
    const expressionAttributeValues = {
      ':ts': new Date().toISOString(),
      ':currentVersion': currentVersion,
      ':newVersion': currentVersion + 1,
    };

    if (position) {
      updateExpression += ', #pos = :pos';
      expressionAttributeNames['#pos'] = 'position';
      expressionAttributeValues[':pos'] = position;
    }
    if (dimensions) {
      updateExpression += ', #dim = :dim';
      expressionAttributeNames['#dim'] = 'dimensions';
      expressionAttributeValues[':dim'] = dimensions;
    }
    if (properties) {
      updateExpression += ', #props = :props';
      expressionAttributeNames['#props'] = 'properties';
      expressionAttributeValues[':props'] = properties;
    }

    const params = {
      TableName: TABLE_NAME,
      // The low-level client requires the key in DynamoDB attribute format.
      Key: marshall({
        PK: `CANVAS#${canvasId}`,
        SK: `ELEMENT#${elementId}`,
      }),
      UpdateExpression: updateExpression,
      ConditionExpression: '#version = :currentVersion', // Optimistic locking
      ExpressionAttributeNames: expressionAttributeNames,
      ExpressionAttributeValues: marshall(expressionAttributeValues),
      ReturnValues: 'ALL_NEW', // Return the item after the update
    };

    try {
      const command = new UpdateItemCommand(params);
      const { Attributes } = await ddbClient.send(command);
      return unmarshall(Attributes);
    } catch (error) {
      if (error.name === 'ConditionalCheckFailedException') {
        logger.warn({
          message: 'Optimistic locking failure on element update.',
          canvasId,
          elementId,
          version: currentVersion,
        });
        // The client needs to handle this by refetching the element and retrying.
        throw new Error('Conflict: Element has been modified by another user.');
      }
      logger.error({
        message: 'Error updating element in DynamoDB',
        canvasId,
        elementId,
        error: error.message,
      });
      throw new Error('Database update operation failed.');
    }
  }
}

Pros of the DynamoDB Approach:

  1. Predictable High Performance: The UpdateElement operation targets a single item via its primary key. This is the most efficient write operation in DynamoDB and its performance remains constant regardless of data volume. We have eliminated the I/O amplification and contention issues seen with the large document model.
  2. Serverless and Scalable: DynamoDB scales seamlessly with zero administrative overhead. The pay-per-request model can be extremely cost-effective for spiky workloads typical of collaborative tools.
  3. Access Pattern Enforcement: The single-table design forces the development team to be deliberate about data access. This prevents inefficient, ad-hoc queries from being introduced accidentally, which can protect production stability.

Cons and Production Pitfalls:

  1. Modeling Complexity: The learning curve for single-table design is steep. It feels unnatural at first and can be difficult to visualize and evolve. Adding a new access pattern often requires creating a new Global Secondary Index (GSI), which duplicates storage and adds cost.
  2. Query Inflexibility: The GetProjectHistory access pattern is a significant problem. DynamoDB is not an analytical database. Running complex queries with multiple filters is inefficient and expensive, often requiring full table scans, which should be avoided at all costs in production.
  3. “Hot Partition” Risk: If one specific canvas becomes extremely popular, all its elements share the same Partition Key (CANVAS#{canvasId}). This can lead to a “hot partition,” where the throughput for a single partition exceeds DynamoDB’s limits (currently 3000 RCU and 1000 WCU per second per partition). While these are high limits, it’s a design constraint to be aware of; one common mitigation, write sharding, is sketched below.
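
Write sharding spreads a hot canvas’s element items across a fixed number of partition-key suffixes, trading a read fan-out for distributed write throughput. The shard count and key format below are illustrative assumptions, not part of the key design above.

// A write-sharding sketch for a hot canvas partition. SHARD_COUNT and the
// "#<shard>" key suffix are assumptions for this example.
const SHARD_COUNT = 8;

// Writes pick a pseudo-random shard, spreading WCU across 8 partitions.
export function shardedCanvasPK(canvasId) {
  const shard = Math.floor(Math.random() * SHARD_COUNT);
  return `CANVAS#${canvasId}#${shard}`;
}

// Reads must now fan out across every shard and merge the results.
// queryPartition is assumed to be (pk) => Promise<item[]>.
export async function loadCanvasSharded(canvasId, queryPartition) {
  const keys = Array.from({ length: SHARD_COUNT }, (_, i) => `CANVAS#${canvasId}#${i}`);
  const pages = await Promise.all(keys.map(queryPartition));
  return pages.flat();
}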

The Architect’s Decision and Rationale

In a real-world project, uptime and user experience are paramount. The most critical operation is UpdateElement, which directly impacts the perception of real-time collaboration. The potential for unpredictable latency and write contention with MongoDB’s large document model presents an unacceptable risk to this core feature.

Therefore, the decision is to use DynamoDB as the primary persistence layer for the real-time collaborative state (the “write model”). Its guarantee of predictable low-latency performance at scale for our hottest access pattern is the deciding factor. The operational simplicity of a serverless database allows a smaller team to manage a large-scale system effectively.

However, we must address the glaring weakness of DynamoDB: its inability to handle the complex queries required by GetProjectHistory. To solve this, we will implement a Command Query Responsibility Segregation (CQRS) pattern.

  1. Write Path (Commands): All state changes (UpdateElement, CreateElement, etc.) are written directly to the DynamoDB single table. This is our source of truth.
  2. Read Path (Queries): We enable DynamoDB Streams on our table. A Lambda function will be triggered by every change, transforming the event data into a more query-friendly format and pushing it to a secondary data store optimized for analytical queries. For this “read model,” MongoDB Atlas (or a self-hosted instance) is an excellent choice, leveraging its rich query capabilities and flexible schema. Amazon OpenSearch Service would be another strong contender.

This hybrid architecture leverages the strengths of both databases: DynamoDB for high-throughput, low-latency transactional writes, and MongoDB for complex, flexible reads on historical data.
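
To make the read path concrete, here is a sketch of the stream-processing Lambda. It assumes the stream is configured with the NEW_IMAGE (or NEW_AND_OLD_IMAGES) view; the read-model database and collection names, the READ_MODEL_URI variable, and the document shape are all illustrative.

// A sketch of the DynamoDB Streams -> MongoDB projection Lambda. The
// read-model names and READ_MODEL_URI are assumptions for this example.
import { unmarshall } from "@aws-sdk/util-dynamodb";
import { MongoClient } from "mongodb";

// Initialized once per container and reused across invocations.
const mongo = new MongoClient(process.env.READ_MODEL_URI);

export const handler = async (event) => {
  await mongo.connect(); // no-op if already connected
  const history = mongo.db("readmodel").collection("element_history");
  const ops = [];

  for (const record of event.Records) {
    if (!record.dynamodb?.NewImage) continue; // e.g. REMOVE events
    const item = unmarshall(record.dynamodb.NewImage);
    if (!item.SK?.startsWith("ELEMENT#")) continue; // project only element items

    ops.push({
      insertOne: {
        document: {
          canvasId: item.PK.replace("CANVAS#", ""),
          elementId: item.SK.replace("ELEMENT#", ""),
          version: item.version,
          snapshot: item, // full element state at this version
          changedAt: new Date(record.dynamodb.ApproximateCreationDateTime * 1000),
        },
      },
    });
  }

  if (ops.length > 0) {
    await history.bulkWrite(ops, { ordered: false });
  }
};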

A simplified test for our DynamoDB service might look like this, using aws-sdk-client-mock:

// tests/DynamoCanvasService.test.js
import { jest } from '@jest/globals';
import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';
import { mockClient } from 'aws-sdk-client-mock';
import { DynamoCanvasService } from '../src/services/DynamoCanvasService.js';

// Mock the logger to prevent console noise during tests
jest.mock('../src/utils/logger.js', () => ({
  logger: {
    info: jest.fn(),
    warn: jest.fn(),
    error: jest.fn(),
  },
}));

describe('DynamoCanvasService', () => {
  let ddbMock;
  let service;

  beforeEach(() => {
    ddbMock = mockClient(DynamoDBClient);
    service = new DynamoCanvasService();
  });

  afterEach(() => {
    ddbMock.reset();
    jest.clearAllMocks();
  });

  it('should successfully update an element with correct parameters', async () => {
    const updatedItem = {
      PK: { S: 'CANVAS#canvas-123' },
      SK: { S: 'ELEMENT#element-abc' },
      version: { N: '2' },
      // ... other attributes
    };
    ddbMock.on(UpdateItemCommand).resolves({ Attributes: updatedItem });

    const result = await service.updateElement(
      'canvas-123',
      'element-abc',
      { position: { x: 100, y: 150 } },
      1
    );

    expect(result.version).toBe(2);
    const sentCommand = ddbMock.calls()[0].args[0].input;
    expect(sentCommand.TableName).toBe(process.env.DYNAMODB_TABLE_NAME);
    expect(sentCommand.Key.PK.S).toBe('CANVAS#canvas-123');
    expect(sentCommand.ConditionExpression).toBe('#version = :currentVersion');
  });

  it('should throw a conflict error on ConditionalCheckFailedException', async () => {
    const error = new Error('ConditionalCheckFailedException');
    error.name = 'ConditionalCheckFailedException';
    ddbMock.on(UpdateItemCommand).rejects(error);

    await expect(
      service.updateElement('canvas-123', 'element-abc', {}, 1)
    ).rejects.toThrow('Conflict: Element has been modified by another user.');
  });
});

This hybrid architecture is not without its own complexities. The introduction of DynamoDB Streams and a Lambda function adds another moving part to the system. This creates a state of eventual consistency between the real-time write model and the historical read model, which the PWA client must be designed to handle. The latency of this replication pipeline must be monitored to ensure it meets business requirements for how quickly historical data must become queryable. Furthermore, the single-table design in DynamoDB requires strict discipline; adding new access patterns must be a deliberate architectural decision, not an afterthought, as it often necessitates creating and backfilling a new GSI, which can be a non-trivial operation on a large dataset.
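
On that monitoring point, one concrete example: an alarm on the projection Lambda’s IteratorAge metric, which measures how far the consumer lags behind the stream. The function name, alarm name, and thresholds below are illustrative assumptions.

// A sketch: alarm when the projection Lambda falls more than 30s behind the
// stream. Function name, alarm name, and thresholds are assumptions.
import { CloudWatchClient, PutMetricAlarmCommand } from "@aws-sdk/client-cloudwatch";

const cw = new CloudWatchClient({ region: process.env.AWS_REGION || "us-east-1" });

await cw.send(new PutMetricAlarmCommand({
  AlarmName: "read-model-replication-lag",
  Namespace: "AWS/Lambda",
  MetricName: "IteratorAge",               // reported in milliseconds
  Dimensions: [{ Name: "FunctionName", Value: "canvas-stream-projector" }],
  Statistic: "Maximum",
  Period: 60,
  EvaluationPeriods: 5,
  Threshold: 30000,
  ComparisonOperator: "GreaterThanThreshold",
  TreatMissingData: "notBreaching",
}));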

