Implementing a Resilient Micro-frontend Deployment Strategy with a Service Worker Orchestrator on AWS


The monolith deployment pipeline was our primary bottleneck. A team fixing a typo in the checkout module would trigger a full application rebuild, regression test suite, and a high-stakes deployment for the entire React-based, Ant Design-infused frontend. Releases were slow, risky, and batched, stifling team autonomy. The clear path forward was micro-frontends, but the standard approaches—iframes, web components, or even build-time solutions like Webpack Module Federation—didn’t solve our core requirement: true runtime deployment independence with zero-downtime updates and immediate rollback capability, all without disrupting the user’s session.

Our initial concept was to leverage a client-side proxy to dynamically route traffic to the correct micro-frontend version. This led us directly to the Service Worker. It can intercept the network requests made by every page within its scope, making it the perfect control plane for a client-side deployment strategy. The architecture we settled on uses an Ant Design-based “shell” application, which is stable and rarely changes. This shell registers a Service Worker that acts as an orchestrator. This orchestrator’s behavior is dictated by a manifest file hosted on AWS S3, which serves as the single source of truth for which version of each micro-frontend is currently “live.”

The decision against Module Federation was a conscious one. In a real-world project with many teams, forcing a coordinated build to update a shared dependency or the host container re-introduces the very coupling we aimed to eliminate. We needed runtime composition. The Service Worker provides this by decoupling the deployment of a micro-frontend from the deployment of the shell and other micro-frontends. AWS S3 combined with CloudFront was the obvious choice for hosting both the static assets of our micro-frontends and the central deployment-manifest.json. It’s cheap, scalable, and globally distributed.

Phase 1: The Manifest-Driven Architecture and the Orchestrator’s Brain

The entire system hinges on a simple JSON file on S3. This manifest defines which applications exist, their active versions, and the entry points for their assets. A typical deployment-manifest.json looks like this:

{
  "version": "2023-10-27T10:00:00Z",
  "applications": {
    "products": {
      "version": "1.2.1",
      "entry": "/products-mfe/1.2.1/index.html",
      "assets": [
        "/products-mfe/1.2.1/main.chunk.js",
        "/products-mfe/1.2.1/styles.css"
      ]
    },
    "cart": {
      "version": "2.0.0",
      "entry": "/cart-mfe/2.0.0/index.html",
      "assets": [
        "/cart-mfe/2.0.0/main.chunk.js"
      ]
    }
  },
  "shared-libs": {
    "react": "18.2.0",
    "antd": "5.9.0"
  }
}

This file is the contract. The CI/CD pipeline for the products team, upon a successful build of version 1.2.2, would be responsible for uploading the assets to a corresponding products-mfe/1.2.2/ path in our S3 bucket and then atomically updating this manifest file to point to the new version.
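The manifest update a pipeline performs can be sketched as a pure function. This is a minimal illustration, not our pipeline code: `promoteVersion` and its parameters are hypothetical names, and the path layout is assumed to follow the `/<app>-mfe/<version>/<file>` convention of the manifest above. Uploading the result to S3, and guarding against two pipelines rewriting the file at the same moment, are outside the scope of the sketch.

```javascript
// Hypothetical helper: returns a NEW manifest with one application promoted.
// The input manifest is left untouched so the caller can diff or log both.
function promoteVersion(manifest, appName, newVersion, assetFiles, now = new Date()) {
  if (!Object.prototype.hasOwnProperty.call(manifest.applications, appName)) {
    throw new Error(`Unknown application: ${appName}`);
  }
  const base = `/${appName}-mfe/${newVersion}`;
  return {
    ...manifest,
    // The top-level version changes on every promotion so clients can detect it.
    version: now.toISOString(),
    applications: {
      ...manifest.applications,
      [appName]: {
        version: newVersion,
        entry: `${base}/index.html`,
        assets: assetFiles.map((file) => `${base}/${file}`),
      },
    },
  };
}

const current = {
  version: '2023-10-27T10:00:00Z',
  applications: {
    products: {
      version: '1.2.1',
      entry: '/products-mfe/1.2.1/index.html',
      assets: ['/products-mfe/1.2.1/main.chunk.js', '/products-mfe/1.2.1/styles.css'],
    },
    cart: {
      version: '2.0.0',
      entry: '/cart-mfe/2.0.0/index.html',
      assets: ['/cart-mfe/2.0.0/main.chunk.js'],
    },
  },
};

const next = promoteVersion(current, 'products', '1.2.2', ['main.chunk.js', 'styles.css']);
console.log(JSON.stringify(next.applications.products, null, 2));
```

Only the promoted application's entry changes; every other application keeps the exact object it had before, which keeps the diff between two manifests small and easy to review.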

The Service Worker’s first job is to fetch and cache this manifest. The initial implementation for sw.js begins with installation and activation logic.

// sw.js

const MANIFEST_URL = '/deployment-manifest.json'; // This will be proxied by CloudFront to the S3 bucket
const MANIFEST_CACHE_NAME = 'mfe-manifest-cache-v1';
const ASSET_CACHE_PREFIX = 'mfe-asset-cache-';
const SHELL_CACHE_NAME = 'mfe-shell-cache-v1';

// A list of the shell application's own assets to be cached during installation.
const SHELL_ASSETS = [
  '/',
  '/index.html',
  '/static/js/bundle.js',
  '/static/css/main.css',
  '/logo.png'
];

self.addEventListener('install', (event) => {
  console.log('[Service Worker] Install event fired.');
  // Pre-cache the shell assets for offline capability and performance.
  event.waitUntil(
    caches.open(SHELL_CACHE_NAME).then((cache) => {
      console.log('[Service Worker] Caching shell application assets.');
      return cache.addAll(SHELL_ASSETS);
    })
  );
  // Force the waiting service worker to become the active service worker.
  self.skipWaiting();
});

self.addEventListener('activate', (event) => {
  console.log('[Service Worker] Activate event fired.');
  // This is the ideal place for cache management and cleanup.
  event.waitUntil(
    // We will implement cache cleanup logic later.
    // Immediately claim clients to ensure the SW controls the page on first load.
    self.clients.claim()
  );
});

// A utility function to fetch and cache the latest manifest.
// This is central to the entire orchestration logic.
async function fetchAndCacheManifest() {
  try {
    const response = await fetch(MANIFEST_URL, { cache: 'no-store' }); // Always get the latest
    if (!response.ok) {
      throw new Error(`Failed to fetch manifest: ${response.statusText}`);
    }
    const manifestCache = await caches.open(MANIFEST_CACHE_NAME);
    // Store the manifest response. We will use this as our source of truth.
    await manifestCache.put(new Request(MANIFEST_URL), response.clone());
    console.log('[Service Worker] Successfully fetched and cached new manifest.');
    return await response.json();
  } catch (error) {
    console.error('[Service Worker] Error fetching manifest:', error);
    // In case of network failure, try to serve from cache.
    const manifestCache = await caches.open(MANIFEST_CACHE_NAME);
    const cachedResponse = await manifestCache.match(MANIFEST_URL);
    if (cachedResponse) {
      console.warn('[Service Worker] Serving manifest from cache due to network error.');
      return await cachedResponse.json();
    }
    // If there's no cached manifest, we are in a bad state.
    // The application might not function correctly.
    return null;
  }
}
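fetchAndCacheManifest caches whatever parses as a successful response. Since a corrupted manifest would affect every client, a shape check before the cache write is a cheap safeguard. The following is a sketch under assumptions: `validateManifest` is a hypothetical helper, and the rule that every URL must live under `/<app>-mfe/<version>/` is inferred from the example manifest rather than from a formal schema.

```javascript
// Hypothetical validator: returns a list of problems, empty when the manifest
// matches the shape used in this article.
function validateManifest(manifest) {
  if (!manifest || typeof manifest !== 'object') {
    return ['manifest is not an object'];
  }
  const errors = [];
  if (typeof manifest.version !== 'string') {
    errors.push('missing top-level version');
  }
  const apps = manifest.applications;
  if (!apps || typeof apps !== 'object') {
    errors.push('missing applications map');
    return errors;
  }
  for (const [name, app] of Object.entries(apps)) {
    if (!app || typeof app.version !== 'string') {
      errors.push(`${name}: missing version`);
      continue;
    }
    if (!Array.isArray(app.assets)) {
      errors.push(`${name}: assets is not an array`);
    }
    // Every URL an app exposes must sit under its own versioned prefix.
    const prefix = `/${name}-mfe/${app.version}/`;
    const urls = [app.entry, ...(Array.isArray(app.assets) ? app.assets : [])];
    for (const url of urls) {
      if (typeof url !== 'string' || !url.startsWith(prefix)) {
        errors.push(`${name}: ${String(url)} is not under ${prefix}`);
      }
    }
  }
  return errors;
}

// A manifest whose entry still points at the previous version.
const inconsistent = {
  version: '2023-10-27T10:00:00Z',
  applications: {
    products: {
      version: '1.2.2',
      entry: '/products-mfe/1.2.1/index.html',
      assets: ['/products-mfe/1.2.2/main.chunk.js'],
    },
  },
};
console.log(validateManifest(inconsistent));
```

In the Service Worker, a non-empty result would be a reason to keep serving the previously cached manifest instead of overwriting it.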

A significant early pitfall was how and when to check for a new manifest. Polling it on every navigation would be inefficient. Our solution was a hybrid approach. The manifest on CloudFront is configured with a very short cache TTL (e.g., 60 seconds). The Service Worker fetches it on activation and then periodically in the background. If a new version is detected, it pre-caches the new assets and then uses postMessage to inform the active client, which can display an Ant Design Notification prompting the user to refresh.
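The periodic check needs a trigger. Browsers are free to stop an idle Service Worker, so a long-lived timer inside sw.js cannot be relied on. One option, sketched here under assumptions, is to piggyback on events that wake the worker anyway and throttle the check so it runs at most once per interval. `createThrottledCheck` is a hypothetical helper, and the 60-second interval simply mirrors the example TTL.

```javascript
// Hypothetical helper: wraps `check` so it runs at most once per interval.
// Returns a Promise when a check was started, or null when it was skipped.
// `now` is injectable so the behaviour can be exercised with a fake clock.
function createThrottledCheck(check, minIntervalMs, now = Date.now) {
  let lastRun = -Infinity; // in-memory only; a worker restart triggers an early check
  return function maybeCheck() {
    if (now() - lastRun < minIntervalMs) {
      return null;
    }
    lastRun = now();
    // The executor runs synchronously; a throwing check becomes a rejection.
    return new Promise((resolve) => resolve(check()));
  };
}

// Demonstration with a fake clock instead of real time.
let fakeTime = 0;
let runs = 0;
const maybeCheck = createThrottledCheck(() => { runs += 1; }, 60000, () => fakeTime);

maybeCheck();      // starts a check
fakeTime = 30000;
maybeCheck();      // skipped, only 30 s have passed
fakeTime = 60000;
maybeCheck();      // starts a second check
console.log(runs);

// Possible wiring inside sw.js (assumption, not shown in the article):
//   const maybeCheck = createThrottledCheck(checkForUpdates, 60000);
//   self.addEventListener('fetch', (event) => {
//     const pending = maybeCheck();
//     if (pending) event.waitUntil(pending);
//     // ... routing logic ...
//   });
```

Passing the returned promise to event.waitUntil keeps the worker alive until the check finishes, without delaying the response itself.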

Phase 2: Intercepting Requests as a Runtime Router

With the manifest logic in place, the core functionality resides in the fetch event listener. This is where the Service Worker intercepts outgoing requests and decides how to respond.

// sw.js (continued)

// Main fetch handler for request interception
self.addEventListener('fetch', (event) => {
  const { request } = event;
  const url = new URL(request.url);

  // We only care about GET requests.
  if (request.method !== 'GET') {
    return;
  }

  // Strategy: If it's a navigation request to a micro-frontend route,
  // serve the correct version of its index.html.
  if (request.mode === 'navigate') {
    event.respondWith(handleNavigationRequest(request));
    return;
  }

  // Strategy: If it's a request for a static asset from a micro-frontend,
  // find it in the manifest and serve it from cache or network.
  if (isMicroFrontendAsset(url)) {
    event.respondWith(handleAssetRequest(request));
    return;
  }

  // For shell assets or other requests, use a standard cache-first strategy.
  event.respondWith(
    caches.match(request).then(response => response || fetch(request))
  );
});

async function handleNavigationRequest(request) {
  const manifest = await getManifest();
  if (!manifest) {
    // Critical failure - cannot find manifest. Fall back to network.
    return fetch(request);
  }

  const url = new URL(request.url);
  const path = url.pathname;

  // Find which application matches the navigation path.
  // Example: /products/123 should match the 'products' app.
  const appName = Object.keys(manifest.applications).find(name => path.startsWith(`/${name}`));

  if (appName) {
    const appConfig = manifest.applications[appName];
    const entryUrl = appConfig.entry;
    const cacheName = `${ASSET_CACHE_PREFIX}${appName}-${appConfig.version}`;
    
    console.log(`[Service Worker] Navigation to ${appName}. Serving entry: ${entryUrl}`);

    const cache = await caches.open(cacheName);
    const cachedResponse = await cache.match(entryUrl);
    
    if (cachedResponse) {
      return cachedResponse;
    }

    // If not in cache, fetch, cache, and return.
    // This populates the cache for the first time.
    const response = await fetch(entryUrl);
    if (response.ok) {
      await cache.put(entryUrl, response.clone());
    }
    return response;
  }

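  // If no micro-frontend route matches, assume it's a shell route.
  // Serve the shell's main index.html, falling back to the network if the
  // shell is not cached (respondWith fails when handed an undefined response).
  const shellResponse = await caches.match('/');
  return shellResponse || fetch(request);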
}

// Dummy function for clarity. In a real app, this would be more robust.
function isMicroFrontendAsset(url) {
  return url.pathname.includes('-mfe/');
}

// Simplified function to get manifest from cache.
async function getManifest() {
  const manifestCache = await caches.open(MANIFEST_CACHE_NAME);
  const cachedResponse = await manifestCache.match(MANIFEST_URL);
  return cachedResponse ? cachedResponse.json() : fetchAndCacheManifest();
}
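One detail of handleNavigationRequest deserves care: a plain `startsWith` prefix test also matches paths such as /products-mfe/1.2.1/index.html or /productsale against the `products` application. A stricter variant, sketched here with the hypothetical name `matchAppForPath`, compares only the first path segment.

```javascript
// Hypothetical helper: returns the application whose name equals the first
// path segment, or undefined when none matches.
function matchAppForPath(pathname, appNames) {
  const firstSegment = pathname.split('/').filter(Boolean)[0];
  return appNames.find((name) => name === firstSegment);
}

const appNames = ['products', 'cart'];
console.log(matchAppForPath('/products/123', appNames));
console.log(matchAppForPath('/products-mfe/1.2.1/index.html', appNames));
console.log(matchAppForPath('/productsale', appNames));
```

The first call resolves to `products`; the other two return undefined and would fall through to the shell route.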

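The real complexity emerged when handling asset requests from within a loaded micro-frontend. A micro-frontend’s index.html might reference main.chunk.js. The browser then requests /products/main.chunk.js. The Service Worker intercepts this, but no such path exists: the actual asset lives at a versioned URL like /products-mfe/1.2.1/main.chunk.js. The version of handleAssetRequest shown here covers the validation half of the problem. It assumes the request already carries the versioned path (isMicroFrontendAsset only matches paths containing -mfe/) and checks it against the manifest; mapping an unversioned path onto a versioned one is a rewriting step that has to run before it.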
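The mapping from an unversioned request path to its versioned location can be sketched as a pure lookup against the manifest. This is one possible shape for that rewriting step, not the code we shipped: `resolveAssetPath` is a hypothetical name, and it assumes the first path segment is the application name.

```javascript
// Hypothetical helper: maps /<app>/<file> to /<app>-mfe/<version>/<file> using
// the active manifest. Returns null when the app is unknown or the file is
// not listed among that app's assets.
function resolveAssetPath(pathname, manifest) {
  const [appName, ...rest] = pathname.split('/').filter(Boolean);
  const apps = manifest.applications;
  if (!appName || rest.length === 0) return null;
  if (!Object.prototype.hasOwnProperty.call(apps, appName)) return null;
  const app = apps[appName];
  const candidate = `/${appName}-mfe/${app.version}/${rest.join('/')}`;
  return app.assets.includes(candidate) ? candidate : null;
}

const manifest = {
  version: '2023-10-27T10:00:00Z',
  applications: {
    products: {
      version: '1.2.1',
      entry: '/products-mfe/1.2.1/index.html',
      assets: ['/products-mfe/1.2.1/main.chunk.js', '/products-mfe/1.2.1/styles.css'],
    },
  },
};

console.log(resolveAssetPath('/products/main.chunk.js', manifest));
console.log(resolveAssetPath('/products/unknown.js', manifest));
```

A fetch handler could call this for requests that are not already versioned and, on a non-null result, fetch the resolved URL instead of the original one.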

// sw.js (continued)

async function handleAssetRequest(request) {
  const manifest = await getManifest();
  if (!manifest) return fetch(request);

  const url = new URL(request.url);
  const path = url.pathname; // e.g., /products-mfe/1.2.1/main.chunk.js

  // Find the app and version this asset belongs to.
  // We parse this from the URL path itself, assuming a convention.
  const pathParts = path.split('/').filter(p => p);
  if (pathParts.length < 3) {
    // Malformed asset path, fall back to network.
    return fetch(request);
  }

  const [appName, appVersion, ..._] = pathParts; // e.g., ['products-mfe', '1.2.1', 'main.chunk.js']
  const cleanAppName = appName.replace('-mfe', '');
  
  const appConfig = manifest.applications[cleanAppName];

  // A critical security and stability check:
  // Only serve assets that are explicitly listed in the active manifest.
  // This prevents loading stale or incorrect assets if the manifest has updated
  // but the browser still holds a reference to an old HTML file.
  if (appConfig && appConfig.version === appVersion && appConfig.assets.includes(path)) {
    const cacheName = `${ASSET_CACHE_PREFIX}${cleanAppName}-${appVersion}`;
    const cache = await caches.open(cacheName);
    const cachedResponse = await cache.match(request);

    // Cache first, then network.
    if (cachedResponse) return cachedResponse;

    const networkResponse = await fetch(request);
    if (networkResponse.ok) {
      await cache.put(request, networkResponse.clone());
    }
    return networkResponse;
  } else {
    // If the requested asset version is not in the manifest, it's stale or invalid.
    // Return a 404 to prevent the application from running in a broken state.
    console.warn(`[Service Worker] Denying request for stale/invalid asset: ${path}`);
    return new Response('Asset not found in current deployment manifest.', { status: 404 });
  }
}

This strict checking against the manifest is a key aspect of the system’s resilience. It prevents a scenario where a user has an old index.html in their browser cache that tries to load JavaScript chunks from a version that has already been rolled back or superseded.
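The manifest check is easier to test, and its rejections easier to diagnose, when it is pulled out of the fetch handler into a pure function that names the reason for each outcome. The sketch below does that under the same path convention; `parseAssetPath` and `classifyAssetRequest` are hypothetical names, and the reason strings are illustrative.

```javascript
const MFE_SUFFIX = '-mfe';

// Splits /<app>-mfe/<version>/<file...> into its parts, or returns null.
function parseAssetPath(pathname) {
  const parts = pathname.split('/').filter(Boolean);
  if (parts.length < 3 || !parts[0].endsWith(MFE_SUFFIX)) return null;
  const [dir, version, ...rest] = parts;
  return {
    appName: dir.slice(0, -MFE_SUFFIX.length),
    version,
    file: rest.join('/'),
  };
}

// Returns 'allowed' or the reason the request should be refused.
function classifyAssetRequest(pathname, manifest) {
  const parsed = parseAssetPath(pathname);
  if (!parsed) return 'malformed';
  const apps = manifest.applications;
  if (!Object.prototype.hasOwnProperty.call(apps, parsed.appName)) return 'unknown-app';
  const app = apps[parsed.appName];
  if (app.version !== parsed.version) return 'stale-version';
  return app.assets.includes(pathname) ? 'allowed' : 'unlisted';
}

const manifest = {
  version: '2023-10-27T10:00:00Z',
  applications: {
    products: {
      version: '1.2.1',
      entry: '/products-mfe/1.2.1/index.html',
      assets: ['/products-mfe/1.2.1/main.chunk.js'],
    },
  },
};

for (const path of [
  '/products-mfe/1.2.1/main.chunk.js',
  '/products-mfe/1.2.0/main.chunk.js',
  '/products-mfe/1.2.1/other.js',
]) {
  console.log(path, '->', classifyAssetRequest(path, manifest));
}
```

Logging the reason alongside the 404 distinguishes a user holding a superseded index.html (`stale-version`) from a build that forgot to list a chunk (`unlisted`).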

Phase 3: Communication, Updates, and the User Experience

The system is useless if it can’t inform the user of updates. We established a communication channel between the Service Worker and the Ant Design shell using postMessage.

In the Service Worker, after fetching a new manifest, we compare its version with the currently loaded one.

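// sw.js inside a function that periodically checks for updates

async function checkForUpdates() {
  // Read the version we already hold from the cache rather than from a
  // module-level variable: the browser may stop an idle Service Worker at
  // any time, which would silently reset in-memory state.
  const manifestCache = await caches.open(MANIFEST_CACHE_NAME);
  const cachedResponse = await manifestCache.match(MANIFEST_URL);
  const previousVersion = cachedResponse ? (await cachedResponse.json()).version : null;

  // Fetches the latest manifest and overwrites the cached copy.
  const newManifest = await fetchAndCacheManifest();
  if (!newManifest) return;

  if (previousVersion && newManifest.version !== previousVersion) {
    console.log('[Service Worker] New manifest detected. Informing clients.');

    // Pre-cache assets for the new version for a smoother update.
    await preCacheNewAssets(newManifest);

    // Inform all active clients about the update.
    const clients = await self.clients.matchAll();
    clients.forEach(client => {
      client.postMessage({ type: 'NEW_VERSION_AVAILABLE' });
    });
  }
}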

// Function to pre-cache assets from a newly detected manifest
async function preCacheNewAssets(manifest) {
  for (const appName in manifest.applications) {
    const appConfig = manifest.applications[appName];
    const cacheName = `${ASSET_CACHE_PREFIX}${appName}-${appConfig.version}`;
    const cache = await caches.open(cacheName);
    const assetUrls = [appConfig.entry, ...appConfig.assets];
    await cache.addAll(assetUrls);
    console.log(`[Service Worker] Pre-cached assets for ${appName} v${appConfig.version}`);
  }
}
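preCacheNewAssets walks every application in the manifest, so a release of one micro-frontend refetches the assets of all of them. A small diff between the previous and the new manifest would let the pre-cache step cover only what changed. The sketch below assumes both manifests have the shape shown earlier; `changedApps` is a hypothetical helper.

```javascript
// Hypothetical helper: names of applications that are new or whose version
// differs between two manifests. A missing previous manifest means all apps.
function changedApps(previous, next) {
  const previousApps = (previous && previous.applications) || {};
  return Object.entries(next.applications)
    .filter(([name, app]) => {
      const before = previousApps[name];
      return !before || before.version !== app.version;
    })
    .map(([name]) => name);
}

const before = {
  version: '2023-10-27T10:00:00Z',
  applications: {
    products: { version: '1.2.1', entry: '/products-mfe/1.2.1/index.html', assets: [] },
    cart: { version: '2.0.0', entry: '/cart-mfe/2.0.0/index.html', assets: [] },
  },
};
const after = {
  version: '2023-10-28T09:30:00Z',
  applications: {
    products: { version: '1.2.2', entry: '/products-mfe/1.2.2/index.html', assets: [] },
    cart: { version: '2.0.0', entry: '/cart-mfe/2.0.0/index.html', assets: [] },
  },
};

console.log(changedApps(before, after));
```

Here only `products` is reported, so the cart cache would be left alone.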

In our Ant Design shell application (a React component), we listen for this message.

// App.js in the Ant Design shell

import React, { useEffect } from 'react';
import { Button, notification } from 'antd';

function App() {
  useEffect(() => {
    if ('serviceWorker' in navigator) {
      const handleServiceWorkerMessage = (event) => {
        if (event.data && event.data.type === 'NEW_VERSION_AVAILABLE') {
          // Use Ant Design's notification component for a non-intrusive UI.
          const key = `open${Date.now()}`;
          const btn = (
            <Button type="primary" size="small" onClick={() => window.location.reload()}>
              Reload
            </Button>
          );
          notification.open({
            message: 'Update Available',
            description: 'A new version of the application is ready. Please reload to update.',
            btn,
            key,
            duration: 0, // Keep it open until the user interacts
          });
        }
      };

      navigator.serviceWorker.addEventListener('message', handleServiceWorkerMessage);

      // Clean up the event listener on component unmount.
      return () => {
        navigator.serviceWorker.removeEventListener('message', handleServiceWorkerMessage);
      };
    }
  }, []);

  // ... rest of the shell application logic
  return (
    // ... JSX for the shell layout
  );
}

We decided against a fully automatic “hot-swap” of micro-frontends without a page reload. In practice, managing application state during a hot-swap is fraught with peril. A user could be in the middle of filling out a form when the underlying component code changes. The “Reload to Update” pattern is a pragmatic compromise that guarantees a clean application state.

Phase 4: Cache Management and Instant Rollbacks

A critical production-grade requirement is cache cleanup. Without it, the user’s disk space would slowly be consumed by stale application versions. The activate event is a natural place for this, with the caveat that it fires only when a new sw.js takes control, not every time the manifest changes. The logic is to derive the list of active cache names from the manifest and delete any cache that is not on the list.

// sw.js (in activate event)

self.addEventListener('activate', (event) => {
  console.log('[Service Worker] Activate event fired.');
  event.waitUntil(
    (async () => {
      const manifest = await getManifest();
      if (!manifest) return;

      const activeCacheNames = new Set([SHELL_CACHE_NAME, MANIFEST_CACHE_NAME]);
      for (const appName in manifest.applications) {
        const appConfig = manifest.applications[appName];
        activeCacheNames.add(`${ASSET_CACHE_PREFIX}${appName}-${appConfig.version}`);
      }
      
      const cacheKeys = await caches.keys();
      const cachesToDelete = cacheKeys.filter(key => !activeCacheNames.has(key));
      
      await Promise.all(cachesToDelete.map(key => {
        console.log(`[Service Worker] Deleting old cache: ${key}`);
        return caches.delete(key);
      }));

      await self.clients.claim();
    })()
  );
});

The same manifest indirection makes rollbacks nearly instant. If version 1.2.1 of the products micro-frontend has a critical bug, the on-call engineer’s remediation is simple: update the deployment-manifest.json on S3 to point back to the stable version 1.2.0. The next time a client’s Service Worker fetches the manifest, it will see that 1.2.0 is the active version. It will start serving assets from the mfe-asset-cache-products-1.2.0 cache if that cache has not been purged yet, and otherwise refetch them from the versioned path, provided old versions are left in the bucket. During its next activation, the cleanup logic will purge the mfe-asset-cache-products-1.2.1 cache. The rollback takes effect immediately for new sessions and, given the short manifest TTL, within about a minute for existing ones.
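Cleanup and rollback pull in opposite directions: the cleanup removes every cache the current manifest does not name, which includes the version a rollback would return to. One way to reconcile them, sketched here under assumptions, is to compute the stale set in a pure function that accepts an explicit keep list. `selectStaleCaches` and `extraKeep` are hypothetical names; the cache naming follows the constants used in sw.js.

```javascript
const ASSET_CACHE_PREFIX = 'mfe-asset-cache-';

// Cache names that the given manifest actively uses.
function activeCacheNames(manifest) {
  return Object.entries(manifest.applications).map(
    ([name, app]) => `${ASSET_CACHE_PREFIX}${name}-${app.version}`
  );
}

// Hypothetical helper: every existing cache that is neither active nor pinned.
function selectStaleCaches(cacheKeys, manifest, extraKeep = []) {
  const keep = new Set([...activeCacheNames(manifest), ...extraKeep]);
  return cacheKeys.filter((key) => !keep.has(key));
}

const manifest = {
  version: '2023-10-27T10:00:00Z',
  applications: {
    products: { version: '1.2.1', entry: '/products-mfe/1.2.1/index.html', assets: [] },
    cart: { version: '2.0.0', entry: '/cart-mfe/2.0.0/index.html', assets: [] },
  },
};
const existing = [
  'mfe-shell-cache-v1',
  'mfe-manifest-cache-v1',
  'mfe-asset-cache-products-1.2.0',
  'mfe-asset-cache-products-1.2.1',
  'mfe-asset-cache-cart-2.0.0',
];
const infrastructure = ['mfe-shell-cache-v1', 'mfe-manifest-cache-v1'];

// Default behaviour: the previous products version is considered stale.
console.log(selectStaleCaches(existing, manifest, infrastructure));

// Pinning the rollback target keeps it warm on the client.
console.log(
  selectStaleCaches(existing, manifest, [...infrastructure, 'mfe-asset-cache-products-1.2.0'])
);
```

How the list of pinned versions reaches the client is left open; an extra field in the manifest would be one option, at the cost of more disk use per user.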

The entire architecture can be visualized as follows:

sequenceDiagram
    participant User
    participant Browser
    participant SW as Service Worker
    participant CDN as CloudFront
    participant S3

    User->>Browser: Navigates to /products
    Browser->>SW: Intercept fetch for /products (navigate)
    SW->>CDN: GET /deployment-manifest.json
    CDN->>S3: GET /deployment-manifest.json
    S3-->>CDN: Returns manifest.json
    CDN-->>SW: Returns manifest.json
    SW->>SW: Parse manifest, find 'products' v1.2.1
    SW->>Browser: Respond with cached /products-mfe/1.2.1/index.html
    Browser->>Browser: Parse index.html, requests main.chunk.js
    Browser->>SW: Intercept fetch for /products-mfe/1.2.1/main.chunk.js
    SW->>SW: Check manifest, verify asset is valid
    SW->>Browser: Respond with cached main.chunk.js
    
    Note over S3, CDN: Later, CI/CD updates manifest to products v1.2.2
    
    SW->>CDN: Periodic check for /deployment-manifest.json
    CDN-->>SW: Returns updated manifest.json
    SW->>SW: Detects new version, pre-caches v1.2.2 assets
    SW->>Browser: postMessage({type: 'NEW_VERSION_AVAILABLE'})
    Browser->>User: Display Ant Design notification "Update available"
    User->>Browser: Clicks "Reload" button
    Browser->>SW: New navigation to /products
    SW->>SW: Parse new manifest, find 'products' v1.2.2
    SW->>Browser: Respond with cached /products-mfe/1.2.2/index.html

The system we built provides tangible benefits: teams can deploy independently, releases are low-risk, and rollbacks are nearly instantaneous. The Service Worker, often pigeonholed as a tool for offline caching, proved to be a powerful client-side deployment orchestrator.

The current solution isn’t without its own set of challenges. The local development experience requires tooling to mock the manifest and bypass the Service Worker, adding a layer of complexity for new developers. Furthermore, while we share libraries like React and Ant Design via the shell, a more sophisticated shared dependency strategy is needed to handle version mismatches or the need for a micro-frontend to use a newer version of a shared library than the shell provides. The manifest on S3 is also a single point of failure; a more robust architecture would use a versioned DynamoDB table with a Lambda function for atomic updates, providing a clear audit trail and mitigating the risk of manifest corruption. These are the next frontiers for improving the resilience and maintainability of this architecture.

