Implementing a Secure Hybrid Architecture for Monolith Decomposition with Pulumi and Consul Connect


A significant portion of enterprise systems still run on battle-tested, monolithic Java EE applications. Our core problem centered on such a system: a stateful, JMS-centric monolith deployed on a fleet of virtual machines, heavily reliant on ActiveMQ for asynchronous processing. The business imperative was to increase development velocity and introduce new features, a classic driver for microservices. However, a “big bang” rewrite was deemed unacceptably risky due to the system’s criticality and the undocumented nature of many of its business rules. The chosen path was a gradual decomposition using the Strangler Fig pattern. This immediately presented the primary technical challenge: how to establish a secure, manageable, and dynamically configurable communication bridge between the legacy VM-based environment and the new Kubernetes-hosted microservices, all while ensuring zero downtime.

Architectural Decision Point: The Migration Strategy

Two primary high-level strategies were evaluated for this decomposition.

The first option was a “Lift-and-Shift then Refactor” approach. This involves containerizing the entire monolithic application and its dependencies, including ActiveMQ, and deploying them into a Kubernetes cluster. The primary advantage is infrastructure unification; everything runs under a single orchestration plane. This simplifies networking and discovery, as all components are within the same Kubernetes ecosystem. However, the pitfalls are substantial. In a real-world project, monoliths are rarely cloud-native. They often have slow startup times, rely on local filesystem state, and have JVM memory tuning optimized for long-running VMs, not ephemeral containers. The process of containerization itself can become a multi-month project with no immediate business value. More critically, this approach merely moves the architectural problem to a new location. It doesn’t solve the core issue of tight coupling and simply defers the difficult work of decomposition.

The second, and ultimately chosen, strategy was a “Strangler Fig with a Service Mesh Bridge.” This involves leaving the stable monolith running on its existing VMs while building new capabilities as microservices on a parallel Kubernetes platform. The key is creating a secure and transparent communication layer between these two worlds. The strangulation happens incrementally: a piece of functionality is rewritten as a microservice, and a proxy or facade layer in the monolith is updated to route calls to the new service. This approach directly tackles the decomposition, delivers value incrementally, and contains risk within the new services. The primary drawback is the initial setup complexity. It requires managing a hybrid infrastructure (VMs and Kubernetes) and introducing new components like a service mesh and a distributed configuration system. The operational overhead is higher at the outset, but the risk reduction for a critical system justifies this investment. We decided the long-term benefits of a controlled, gradual migration far outweighed the initial operational complexity.

Core Implementation: Provisioning the Hybrid Infrastructure with Pulumi

Infrastructure as Code (IaC) was non-negotiable. While Terraform is a common choice, we selected Pulumi with TypeScript. The rationale was simple: managing a hybrid environment with conditional logic, loops, and integrations with external systems is far more elegant and maintainable in a general-purpose programming language than in a Domain-Specific Language (DSL). The ability to write unit tests for our infrastructure code was also a significant factor.

Our Pulumi project was structured to manage both the new Kubernetes environment and the necessary networking constructs to connect with the legacy VMs.

// pulumi/main.ts
// A simplified representation of the Pulumi program to define the hybrid infrastructure.

import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";
import * as pulumi from "@pulumi/pulumi";

// --- 1. Network Foundation ---
// Create a VPC to house both the new EKS cluster and provide a path to the legacy environment.
// In a real project, this would likely connect to an existing on-prem network via Direct Connect or VPN.
const vpc = new aws.ec2.Vpc("app-vpc", {
    cidrBlock: "10.100.0.0/16",
    enableDnsHostnames: true,
    enableDnsSupport: true,
    tags: { "Name": "strangler-fig-vpc" },
});

const subnetA = new aws.ec2.Subnet("app-subnet-a", {
    vpcId: vpc.id,
    cidrBlock: "10.100.1.0/24",
    availabilityZone: "us-west-2a",
    tags: { "Name": "app-subnet-a" },
});

// EKS requires subnets in at least two Availability Zones, so we create a second one.
const subnetB = new aws.ec2.Subnet("app-subnet-b", {
    vpcId: vpc.id,
    cidrBlock: "10.100.2.0/24",
    availabilityZone: "us-west-2b",
    tags: { "Name": "app-subnet-b" },
});

// --- 2. Security Groups ---
// Security group for the new EKS cluster nodes.
const eksSg = new aws.ec2.SecurityGroup("eks-sg", {
    vpcId: vpc.id,
    description: "Allow all traffic within the EKS cluster",
    ingress: [{ protocol: "-1", self: true, fromPort: 0, toPort: 0 }],
    egress: [{ protocol: "-1", fromPort: 0, toPort: 0, cidrBlocks: ["0.0.0.0/0"] }],
});

// Security group for the legacy VM.
// IMPORTANT: We do NOT open the ActiveMQ port (61616) to the world or even the VPC.
// Consul Connect will handle this securely. We only need to allow Consul traffic.
const legacyVmSg = new aws.ec2.SecurityGroup("legacy-vm-sg", {
    vpcId: vpc.id,
    description: "Security group for the legacy monolith VM",
    ingress: [
        // Allow SSH access for management
        { protocol: "tcp", fromPort: 22, toPort: 22, cidrBlocks: ["YOUR_IP/32"] },
        // Allow Consul agent traffic from the EKS cluster
        { protocol: "tcp", fromPort: 8300, toPort: 8302, sourceSecurityGroupId: eksSg.id },
        { protocol: "udp", fromPort: 8300, toPort: 8302, sourceSecurityGroupId: eksSg.id },
        { protocol: "tcp", fromPort: 8500, toPort: 8502, sourceSecurityGroupId: eksSg.id },
    ],
    egress: [{ protocol: "-1", fromPort: 0, toPort: 0, cidrBlocks: ["0.0.0.0/0"] }],
});

// --- 3. EKS Cluster for Microservices ---
// Create an EKS cluster for the new microservices.
const cluster = new eks.Cluster("app-cluster", {
    vpcId: vpc.id,
    subnetIds: [subnetA.id, subnetB.id],
    instanceType: "t3.medium",
    desiredCapacity: 2,
    minSize: 1,
    maxSize: 3,
    nodeSecurityGroup: eksSg,
});

// Export the kubeconfig to access the cluster.
export const kubeconfig = cluster.kubeconfig;
const k8sProvider = cluster.provider;

// --- 4. Deploying Core Services (Consul & Nacos) into EKS via Helm ---
// A common mistake is to manage these application-layer services manually.
// Using Pulumi ensures they are part of the core infrastructure definition.

// Deploy HashiCorp Consul using its Helm chart.
const consulChart = new k8s.helm.v3.Chart("consul", {
    chart: "consul",
    version: "0.45.0",
    fetchOpts: { repo: "https://helm.releases.hashicorp.com" },
    values: {
        global: {
            name: "consul",
            datacenter: "dc1",
        },
        connectInject: {
            enabled: true, // This is crucial for automatic sidecar injection
            default: true,
        },
        controller: {
            enabled: true,
        },
        ui: {
            enabled: true,
        },
    },
}, { provider: k8sProvider });

// Deploy Nacos for dynamic configuration using its Helm chart.
const nacosChart = new k8s.helm.v3.Chart("nacos", {
    chart: "nacos",
    version: "2.0.2",
    fetchOpts: { repo: "https://nacos-group.github.io/nacos-k8s" },
    values: {
        replicaCount: 1, // For demonstration; use 3+ for production.
        // Production would use an external database.
        persistence: {
            enabled: false
        },
        nacos: {
            // JVM heap settings are critical for Nacos stability.
            // These are placeholder values.
            Xms: "512m",
            Xmx: "512m",
            Xmn: "256m"
        }
    },
}, { provider: k8sProvider, dependsOn: [consulChart] }); // Ensure Consul is up first if needed.

This Pulumi program defines the entire baseline: a VPC, security groups that strictly control traffic, an EKS cluster, and Helm deployments for Consul and Nacos. The key security aspect is in legacyVmSg: the ActiveMQ and application ports are not exposed. All communication will be forced through the secure tunnel provided by Consul Connect.
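
The testability argument for choosing Pulumi is easy to make concrete. Below is a minimal unit-test sketch, assuming Mocha as the test runner and that the networking resources above are factored into their own module (a hypothetical pulumi/network.ts that exports legacyVmSg); it illustrates the approach rather than reproducing our exact test suite.

// pulumi/tests/security.spec.ts
// Unit-test sketch using Pulumi's mock runtime; no AWS credentials or real resources are needed.
import * as pulumi from "@pulumi/pulumi";
import * as assert from "assert";

// Mocks must be registered before the infrastructure module is loaded.
pulumi.runtime.setMocks({
    newResource: (args: pulumi.runtime.MockResourceArgs) => ({
        id: `${args.name}-id`,
        state: args.inputs,
    }),
    call: (args: pulumi.runtime.MockCallArgs) => args.inputs,
});

describe("legacy VM security group", function () {
    let network: typeof import("../network");

    before(async function () {
        // Import after setMocks so resource constructors run against the mock monitor.
        network = await import("../network");
    });

    it("never exposes the ActiveMQ port 61616", function (done) {
        network.legacyVmSg.ingress.apply(rules => {
            const exposesBroker = (rules ?? []).some(
                r => r.fromPort <= 61616 && r.toPort >= 61616
            );
            assert.strictEqual(exposesBroker, false);
            done();
        });
    });
});

Run in CI (for example with mocha -r ts-node/register), a test like this catches policy regressions, such as accidentally opening the broker port, before anything reaches AWS.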

The Secure Bridge: Consul Connect in a Hybrid Environment

The crux of the architecture is establishing a secure mTLS (mutual TLS) tunnel between a new microservice in Kubernetes and the legacy ActiveMQ broker on the VM. This is where Consul Connect shines. It works by deploying lightweight sidecar proxies next to each application instance. These proxies handle service discovery, connection establishment, and encryption, making the process transparent to the application itself.

1. On the Legacy VM:

First, a Consul agent must be installed and configured on the legacy monolith’s VM. It needs to be configured to join the Consul cluster running in Kubernetes.

// /etc/consul.d/consul.hcl - Configuration for the Consul agent on the legacy VM
data_dir = "/opt/consul"
log_level = "INFO"
datacenter = "dc1"

// This IP address would be the address of the Consul server service in Kubernetes
// (e.g., exposed via a LoadBalancer or NodePort).
retry_join = ["10.100.x.x"]

// Enable the gRPC listener for the Connect proxy
ports {
  grpc = 8502
}

connect {
  enabled = true
}

Next, we register the legacy services (the monolith application and ActiveMQ) with the local Consul agent. This makes them discoverable by other services in the mesh. The empty sidecar_service stanza asks Consul to register a Connect sidecar proxy alongside each service; that proxy accepts mesh traffic and forwards it to the service's real port (61616 for ActiveMQ).

// /etc/consul.d/services.json - Service definitions for legacy components
{
  "services": [
    {
      "name": "legacy-monolith",
      "id": "monolith-app-1",
      "port": 8080,
      "connect": {
        "sidecar_service": {}
      }
    },
    {
      "name": "activemq-broker",
      "id": "activemq-1",
      "port": 61616,
      "connect": {
        "sidecar_service": {}
      }
    }
  ]
}

After reloading the Consul agent on the VM, you start the Envoy sidecar proxy for ActiveMQ (Envoy must be installed on the VM; serving its configuration is what the gRPC listener on port 8502 is for):
consul connect envoy -sidecar-for activemq-1
This command starts an Envoy instance listening on the sidecar port Consul assigned to the service. Any traffic arriving at this proxy is authenticated and authorized via mTLS and then forwarded to localhost:61616 on the VM.

2. The New Microservice (in Kubernetes):

Now, we create a new microservice that needs to send messages to ActiveMQ. This service is deployed to Kubernetes. Thanks to Consul’s Helm chart setting connectInject.enabled=true, the Consul sidecar injector will automatically add the consul-connect-envoy-sidecar container to our pod.

The microservice’s deployment manifest declares its dependency on activemq-broker via an annotation.

# k8s/new-processor-service.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: new-data-processor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: new-data-processor
  template:
    metadata:
      labels:
        app: new-data-processor
      annotations:
        # This annotation tells the Consul sidecar to establish a connection
        # to the 'activemq-broker' service and expose it on localhost:21000 inside the pod.
        "consul.hashicorp.com/connect-service-upstreams": "activemq-broker:21000"
    spec:
      containers:
      - name: processor
        image: "my-repo/new-data-processor:1.0.0"
        ports:
        - containerPort: 8080
        env:
        - name: ACTIVEMQ_BROKER_URL
          # CRITICAL: The application connects to localhost. The sidecar proxy handles the rest.
          value: "tcp://localhost:21000" 

The application code itself is now blissfully unaware of the network complexity. It connects to localhost:21000 as instructed by the environment variable. The Envoy sidecar, running in the same pod, intercepts this connection, finds the activemq-broker service via Consul, establishes an mTLS tunnel to the sidecar on the legacy VM, and forwards the traffic.

graph TD
    subgraph Kubernetes Pod
        A[New Processor Microservice] -->|tcp://localhost:21000| B(Envoy Sidecar Proxy);
    end

    subgraph Legacy VM
        D(Envoy Sidecar Proxy) -->|tcp://localhost:61616| E[ActiveMQ Broker];
    end

    B -- mTLS Tunnel over VPC --o D;

    style A fill:#cce5ff,stroke:#333,stroke-width:2px
    style E fill:#ffcccc,stroke:#333,stroke-width:2px

3. Securing Communication with Intentions:

Consul Connect authorizes every connection against “Intentions.” With the mesh running in default-deny mode (the recommended posture, which follows from a default-deny ACL policy), no service can talk to another until an Intention explicitly allows it. We therefore create one to allow the new-data-processor to talk to activemq-broker. This is a powerful, application-level firewall rule.

This can be done via the Consul UI or command line:
consul intention create -allow new-data-processor activemq-broker

This configuration ensures that even if another pod in the Kubernetes cluster is compromised, it cannot connect to the legacy ActiveMQ broker because no intention allows it. Security is based on service identity, not IP addresses, which is essential in a dynamic environment like Kubernetes.
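
Because the Helm values above enable Consul's controller, the same rule can also be expressed declaratively as a ServiceIntentions custom resource and managed from the Pulumi program, where k8sProvider and consulChart are already in scope. The sketch below assumes the consul.hashicorp.com/v1alpha1 CRDs installed by the chart.

// pulumi/main.ts (continued)
// Declarative equivalent of the CLI intention above, kept in the infrastructure code.
const processorToActiveMq = new k8s.apiextensions.CustomResource("processor-to-activemq", {
    apiVersion: "consul.hashicorp.com/v1alpha1",
    kind: "ServiceIntentions",
    metadata: { name: "activemq-broker" },
    spec: {
        // The destination is the service being protected; sources list who may call it.
        destination: { name: "activemq-broker" },
        sources: [
            { name: "new-data-processor", action: "allow" },
        ],
    },
}, { provider: k8sProvider, dependsOn: [consulChart] });

Keeping intentions in code makes the application-level firewall reviewable and reproducible rather than a one-off CLI action.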

Dynamic Control with Nacos

The next piece of the puzzle is controlling the “strangling” process. When a piece of functionality, say the order validation logic, is moved from the monolith to a new microservice, we need a way to dynamically switch traffic without a redeployment. This is a perfect use case for a distributed configuration center like Nacos.

We define a configuration in Nacos with a specific Data ID and Group. For example:

  • Data ID: routing-rules.properties
  • Group: monolith-facade
  • Content:
    # feature.order.validation.useNewService
    # Toggles whether to use the new microservice for order validation.
    # false = use legacy internal code
    # true = call new microservice via Consul Connect
    feature.order.validation.useNewService=false

The legacy monolith’s code is then refactored. The original order validation logic is wrapped in a facade or strategy pattern. This facade uses the Nacos client to fetch the configuration and listen for changes.

// Simplified Java code inside the legacy monolith
import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.config.ConfigService;
import com.alibaba.nacos.api.config.listener.Listener;
import java.util.Properties;
import java.util.concurrent.Executor;

public class OrderValidationFacade {
    private static final String DATA_ID = "routing-rules.properties";
    private static final String GROUP = "monolith-facade";

    private volatile boolean useNewService = false;
    private final LegacyOrderValidator legacyValidator = new LegacyOrderValidator();
    private final NewServiceOrderValidator newServiceValidator = new NewServiceOrderValidator();

    public OrderValidationFacade() {
        try {
            // In a real Java EE app, this would be managed in a singleton, lifecycle-aware bean.
            String serverAddr = System.getenv("NACOS_SERVER_ADDR"); // e.g., "nacos-server:8848"
            Properties properties = new Properties();
            properties.put("serverAddr", serverAddr);
            
            ConfigService configService = NacosFactory.createConfigService(properties);
            
            // Initial fetch of the config
            String content = configService.getConfig(DATA_ID, GROUP, 5000);
            updateConfig(content);

            // Add a listener for dynamic updates. This is the key feature.
            configService.addListener(DATA_ID, GROUP, new Listener() {
                @Override
                public Executor getExecutor() {
                    return null; // Use default executor
                }

                @Override
                public void receiveConfigInfo(String configInfo) {
                    System.out.println("Received new Nacos config: " + configInfo);
                    updateConfig(configInfo);
                }
            });
        } catch (Exception e) {
            // A common mistake is to not handle Nacos unavailability on startup.
            // The application should start with a safe default.
            System.err.println("Failed to connect to Nacos, using default routing (legacy). " + e.getMessage());
            this.useNewService = false;
        }
    }

    private void updateConfig(String content) {
        if (content == null || content.isEmpty()) return;
        try {
            Properties props = new Properties();
            props.load(new java.io.StringReader(content));
            this.useNewService = Boolean.parseBoolean(props.getProperty("feature.order.validation.useNewService", "false"));
        } catch (java.io.IOException e) {
            System.err.println("Failed to parse Nacos config, retaining previous settings.");
        }
    }

    public ValidationResult validateOrder(Order order) {
        if (useNewService) {
            // This validator would use an HTTP client configured to talk to the new service
            // through its local Consul Connect proxy.
            return newServiceValidator.validate(order);
        } else {
            return legacyValidator.validate(order);
        }
    }
}

With this in place, we can go to the Nacos UI, change feature.order.validation.useNewService to true, and publish. Within seconds, the listener in the monolith fires, the useNewService flag flips, and all subsequent validation requests are routed to the new microservice. This allows for canary releases and instant rollbacks of new functionality without touching the legacy deployment.
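
The same toggle can be flipped from a deployment pipeline instead of the UI by calling Nacos' Open API. The following is a minimal sketch in TypeScript, assuming Node 18+ for the built-in fetch, a reachable NACOS_ADDR, and that authentication is disabled as in the demo Helm values; a production setup would add an access token.

// scripts/toggle-order-validation.ts
// Publishes the routing flag through Nacos' Open API (POST /nacos/v1/cs/configs).
// Note: publishing replaces the entire config content, so include every property the facade reads.
const NACOS_ADDR = process.env.NACOS_ADDR ?? "http://nacos-server:8848";

async function setUseNewService(enabled: boolean): Promise<void> {
    const body = new URLSearchParams({
        dataId: "routing-rules.properties",
        group: "monolith-facade",
        content: `feature.order.validation.useNewService=${enabled}`,
    });

    const res = await fetch(`${NACOS_ADDR}/nacos/v1/cs/configs`, { method: "POST", body });
    if (!res.ok || (await res.text()) !== "true") {
        throw new Error(`Failed to publish config (HTTP ${res.status})`);
    }
}

// Route order validation to the new microservice; pass false for an instant rollback.
setUseNewService(true).catch(err => { console.error(err); process.exit(1); });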

Decomposing the Frontend with Lit

The final challenge was the tightly coupled, server-side rendered frontend (JSPs in this case). A full rewrite was out of the question. Instead, we applied the strangler pattern to the UI using micro-frontends delivered as Web Components. Lit was chosen for its small footprint, standards-based approach, and lack of a heavy framework runtime, making it ideal for embedding into an existing page.

A new, independent “Product Recommendation” micro-frontend service was created. It serves a single JavaScript file that defines a <product-recommendations> custom element.

// product-recommendations.js - A Lit Web Component
import { LitElement, html, css } from 'lit';
import { customElement, property } from 'lit/decorators.js';

@customElement('product-recommendations')
export class ProductRecommendations extends LitElement {
  @property({ type: String }) sku = '';
  @property({ type: Array }) recommendations = [];
  @property({ type: Boolean }) isLoading = true;

  static styles = css`
    :host { display: block; border: 1px solid #ccc; padding: 1em; }
    ul { list-style: none; padding: 0; }
    li { margin-bottom: 0.5em; }
  `;

  connectedCallback() {
    super.connectedCallback();
    this.fetchRecommendations();
  }

  async fetchRecommendations() {
    this.isLoading = true;
    try {
      // The API Gateway routes '/api/recommendations' to the new microservice.
      const response = await fetch(`/api/recommendations?sku=${this.sku}`);
      if (!response.ok) throw new Error('Network response was not ok');
      this.recommendations = await response.json();
    } catch (error) {
      console.error("Failed to fetch recommendations:", error);
      this.recommendations = [];
    } finally {
      this.isLoading = false;
    }
  }

  render() {
    if (this.isLoading) {
      return html`<p>Loading recommendations...</p>`;
    }
    return html`
      <h3>Recommended Products</h3>
      <ul>
        ${this.recommendations.map(rec => html`<li>${rec.name} - ${rec.price}</li>`)}
      </ul>
    `;
  }
}

The legacy JSP page is then modified slightly. The old server-side recommendation logic is removed, and in its place, we add two lines:

<%-- productDetail.jsp --%>

<%-- ... existing product detail rendering ... --%>

<!-- This is the strangulation point in the UI -->
<script type="module" src="https://my-mfe-cdn.com/product-recommendations.js"></script>
<product-recommendations sku="${product.sku}"></product-recommendations>

<%-- ... rest of the page ... --%>

This is remarkably non-invasive. The JSP simply renders a custom HTML tag. The browser, upon seeing the tag, executes the loaded JavaScript, which then takes over the responsibility for that small part of the page. The Lit component is self-contained, handling its own data fetching from a new, independent microservice API, completely decoupling it from the monolithic backend’s rendering lifecycle.
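
For completeness, the endpoint the component calls can be any small HTTP service behind the gateway. Here is a minimal sketch of the contract it expects, using Express purely for illustration; the framework and the stubbed data are assumptions, not the project's actual stack.

// recommendation-service/src/server.ts
// Minimal sketch of the /api/recommendations contract the <product-recommendations> component consumes.
import express from "express";

interface Recommendation {
    name: string;
    price: string;
}

const app = express();

app.get("/api/recommendations", (req, res) => {
    const sku = String(req.query.sku ?? "");
    // A real implementation would query a datastore or recommendation model; this is a stub.
    const recommendations: Recommendation[] = sku
        ? [{ name: `Accessory for ${sku}`, price: "9.99" }]
        : [];
    res.json(recommendations);
});

app.listen(8080, () => console.log("recommendation service listening on :8080"));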

Architectural Limitations and Future Considerations

This architecture, while effective, is not a silver bullet. The most significant lingering issue is operational complexity. For a period, the team must be proficient in managing both the legacy VM stack and the new cloud-native Kubernetes stack. The service mesh, while providing immense security and observability benefits, introduces another component that requires monitoring and maintenance. There is a performance overhead associated with the sidecar proxies, which, although minimal for most workloads, could be a factor in latency-sensitive applications.

Furthermore, this pattern does not magically solve data consistency problems. As functionality moves into microservices that manage their own databases, maintaining transactional integrity across the monolith and microservices becomes a new challenge. This architecture provides the communication channels to implement patterns like Sagas, but the implementation of those patterns is a separate, complex task.

Finally, there is a clear organizational risk. The “temporary” bridge can easily become a permanent fixture if there is no disciplined, continuous effort to migrate functionality and eventually decommission the monolith. The goal must always be to shrink the monolith’s responsibilities to zero, at which point the entire legacy stack and the bridge itself can be retired. Without that focus, the result is not a successful migration but a more complex, distributed monolith.

