Implementing Dynamic Multi-Tenant Isolation in a Weaviate and Spring Boot System Using Zookeeper for Coordination and mTLS for Security


The initial project brief was straightforward: build a vector search backend for an internal application. The prototype, built with Spring Boot and Weaviate, performed well. Then came the pivot to a multi-tenant SaaS platform. This single business decision transformed a simple data service into a complex distributed system problem, demanding stringent data isolation, dynamic tenant provisioning, and a zero-trust security posture. The original architecture was completely inadequate. What follows is the log of how we re-engineered the system to meet these new, unforgiving requirements.

Technical Pain Point: The Illusion of “Simple” Multi-Tenancy

Weaviate offers out-of-the-box multi-tenancy, which seemed like a perfect solution. You create a class with multi-tenancy enabled and then pass a tenant name with every request: as a tenant field on objects you write, and as a tenant argument on queries. On the surface, this handles data partitioning within the database.

The pitfall here is that Weaviate’s feature solves the data storage isolation problem, but not the tenant lifecycle management problem. A real-world project must answer several other questions:

  1. How do we atomically provision a new tenant across our entire stack (not just in Weaviate)?
  2. How do we ensure that service A doesn’t attempt to write data for a tenant that hasn’t been fully created in service B?
  3. How do we dynamically update the system’s knowledge of active tenants without service restarts?
  4. How do we guarantee that all communication, especially for a system handling sensitive tenant data, is mutually authenticated and encrypted, end-to-end?

Our initial Spring Boot service blindly forwarded whatever tenant identifier it was handed. This was brittle and insecure: it relied on upstream services to provide a valid tenant ID and had no coordination mechanism. A failed provisioning step could leave the system in an inconsistent state, a critical flaw for a SaaS product.

The Architectural Re-Design: A Three-Pillar Approach

The redesign was founded on three core decisions to address the lifecycle, coordination, and security challenges.

  1. Security First: Mandatory mTLS. All internal service-to-service communication must be mutually authenticated. No exceptions. This establishes a zero-trust network where services must present valid, trusted certificates to even speak to one another. This applies to the communication between our Spring Boot application and the Weaviate database.
  2. Coordination with Zookeeper. We needed a robust, distributed coordination service. While modern alternatives like etcd or Consul exist, our team had deep operational experience with Zookeeper, making it a pragmatic choice for managing distributed locks and configuration. It would act as the source of truth for tenant existence and state, ensuring consistent tenant provisioning across the system.
  3. Centralized Tenant Logic in Spring Boot. The Spring Boot application would evolve from a simple proxy to the central nervous system for tenant management. It would orchestrate the creation of tenants, manage their state in Zookeeper, and enforce access control before ever communicating with Weaviate. The Progressive Web App (PWA) client would interact with this service, authenticated via JWTs that securely identify the user and their associated tenant.

The resulting architecture:
graph TD
    subgraph "Browser"
        PWA
    end

    subgraph "API Layer"
        APIGateway[API Gateway]
    end

    subgraph "Core Services"
        SpringBootApp[Spring Boot Tenant Service]
        Zookeeper[Zookeeper Cluster]
        Weaviate[Weaviate Instance]
    end

    PWA -- HTTPS / JWT --> APIGateway
    APIGateway -- mTLS / JWT --> SpringBootApp

    subgraph "Internal mTLS Communication"
        SpringBootApp -- gRPC/REST over mTLS --> Weaviate
        SpringBootApp -- ZK Protocol --> Zookeeper
    end

    style Weaviate fill:#f9f,stroke:#333,stroke-width:2px
    style Zookeeper fill:#f9f,stroke:#333,stroke-width:2px
    style SpringBootApp fill:#ccf,stroke:#333,stroke-width:2px

Phase 1: Forging the mTLS Foundation

Before any application logic, we had to build the secure transport layer. A common mistake is to bolt on security later; it needs to be the bedrock. This involved generating certificates and configuring both Spring Boot and Weaviate to use them.

Certificate Generation

For a production environment, you would use a proper internal CA. For this build, we used openssl to generate a self-signed CA, a server certificate for Weaviate, and a client certificate for our Spring Boot app.

# 1. Create a self-signed Certificate Authority (CA)
openssl genrsa -out ca.key 4096
openssl req -new -x509 -days 3650 -key ca.key -out ca.crt -subj "/C=US/ST=CA/L=PaloAlto/O=MyOrg/OU=CA/CN=my-ca"

# 2. Create Server Key and CSR for Weaviate
openssl genrsa -out weaviate.key 4096
openssl req -new -key weaviate.key -out weaviate.csr -subj "/C=US/ST=CA/L=PaloAlto/O=MyOrg/OU=Server/CN=weaviate"

# 3. Sign the Server Certificate with our CA
# Most TLS stacks now match the hostname against the SubjectAltName rather
# than the CN, so we add a SAN for the Docker service name "weaviate".
openssl x509 -req -days 365 -in weaviate.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -extfile <(printf "subjectAltName=DNS:weaviate") -out weaviate.crt

# 4. Create Client Key and CSR for Spring Boot App
openssl genrsa -out springboot.key 4096
openssl req -new -key springboot.key -out springboot.csr -subj "/C=US/ST=CA/L=PaloAlto/O=MyOrg/OU=Client/CN=springboot-app"

# 5. Sign the Client Certificate with our CA
openssl x509 -req -days 365 -in springboot.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out springboot.crt

# 6. Create PKCS12 Keystore for Spring Boot Client
# This bundles the client's key and cert chain for Java's SSLContext
openssl pkcs12 -export -out springboot.p12 -inkey springboot.key -in springboot.crt -certfile ca.crt -password pass:changeit

# 7. Create a PKCS12 truststore containing the CA certificate.
# The Spring Boot client uses this to verify Weaviate's server certificate.
keytool -importcert -storetype PKCS12 -keystore truststore.p12 -storepass changeit -alias my-ca -file ca.crt -noprompt

Configuring Weaviate and Docker Compose

We configured our docker-compose.yml to launch Weaviate with TLS enabled and requiring client authentication.

version: '3.8'

services:
  weaviate:
    image: semitechnologies/weaviate:1.23.7
    ports:
      - "8080:8080" # HTTPS port
      - "50051:50051" # gRPC port
    restart: on-failure:0
    volumes:
      - ./weaviate-data:/var/lib/weaviate
      - ./certs/weaviate.crt:/etc/weaviate/weaviate.crt
      - ./certs/weaviate.key:/etc/weaviate/weaviate.key
      - ./certs/ca.crt:/etc/weaviate/ca.crt # For client cert validation
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'false'
      # With anonymous access off, Weaviate still needs an auth scheme; API keys
      # are the simplest. Transport-level identity comes from the client cert,
      # and the client presents this key as an 'Authorization: Bearer' header.
      AUTHENTICATION_APIKEY_ENABLED: 'true'
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: 'changeit-api-key'
      AUTHENTICATION_APIKEY_USERS: 'springboot-app'
      QUERY_DEFAULTS_LIMIT: 25
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      # TLS Configuration
      ENABLE_GRPC_TLS: 'true'
      GRPC_TLS_CERT_FILE: '/etc/weaviate/weaviate.crt'
      GRPC_TLS_KEY_FILE: '/etc/weaviate/weaviate.key'
      ENABLE_TLS: 'true'
      TLS_CERT_FILE: '/etc/weaviate/weaviate.crt'
      TLS_KEY_FILE: '/etc/weaviate/weaviate.key'
      TLS_CLIENT_AUTH_MODE: 'RequireAndVerifyClientCert' # This is the critical mTLS part
      TLS_CLIENT_CA_FILE: '/etc/weaviate/ca.crt'
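One omission worth calling out: this compose file runs only Weaviate, while the tenant service also depends on Zookeeper. A minimal single-node service for local development (production needs a replicated ensemble; the image tag and env var follow the official zookeeper Docker image) might be added alongside it:

  zookeeper:
    image: zookeeper:3.8
    ports:
      - "2181:2181"
    environment:
      ZOO_STANDALONE_ENABLED: 'true'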

Configuring Spring Boot’s HTTP Client for mTLS

The default RestTemplate or WebClient won’t work. We need to build a custom SSLContext that loads our client keystore (springboot.p12) and the truststore (containing ca.crt) to verify Weaviate’s server certificate.

package com.example.mtls.config;

import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManagerBuilder;
import org.apache.hc.client5.http.io.HttpClientConnectionManager;
import org.apache.hc.client5.http.ssl.SSLConnectionSocketFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

import javax.net.ssl.SSLContext;
import org.apache.hc.core5.ssl.SSLContextBuilder;

import java.io.InputStream;
import java.security.KeyStore;

@Configuration
public class WeaviateClientConfig {

    @Value("${weaviate.client.ssl.key-store}")
    private Resource keyStore;

    @Value("${weaviate.client.ssl.key-store-password}")
    private String keyStorePassword;

    @Value("${weaviate.client.ssl.trust-store}")
    private Resource trustStore;

    @Value("${weaviate.client.ssl.trust-store-password}")
    private String trustStorePassword;

    @Bean
    public RestTemplate weaviateRestTemplate() throws Exception {
        // Build SSLContext with our client certificate and truststore
        SSLContext sslContext = new SSLContextBuilder()
                .loadKeyMaterial(
                        keyStore.getURL(),
                        keyStorePassword.toCharArray(),
                        keyStorePassword.toCharArray()
                )
                .loadTrustMaterial(
                        trustStore.getURL(),
                        trustStorePassword.toCharArray()
                )
                .build();
        
        // Create an SSL connection factory that uses our context
        SSLConnectionSocketFactory socketFactory = new SSLConnectionSocketFactory(sslContext);

        // Build a connection manager with the custom SSL factory
        HttpClientConnectionManager connectionManager = PoolingHttpClientConnectionManagerBuilder.create()
                .setSSLSocketFactory(socketFactory)
                .build();

        // Build the final HttpClient
        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(connectionManager)
                .build();

        // Create a RestTemplate that uses this custom HttpClient
        HttpComponentsClientHttpRequestFactory requestFactory = new HttpComponentsClientHttpRequestFactory(httpClient);
        
        return new RestTemplate(requestFactory);
    }
}

And the corresponding application.properties:

weaviate.client.ssl.key-store=classpath:certs/springboot.p12
weaviate.client.ssl.key-store-password=changeit
# Truststore containing ca.crt (note: a trailing '#' in a .properties value is
# not a comment, so this annotation must live on its own line)
weaviate.client.ssl.trust-store=classpath:certs/truststore.p12
weaviate.client.ssl.trust-store-password=changeit

Now, any RestTemplate injected with @Qualifier("weaviateRestTemplate") will correctly perform an mTLS handshake with Weaviate.
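To verify the handshake end-to-end before going further, it helps to hit Weaviate's readiness endpoint (/v1/.well-known/ready) through this client at startup. A minimal sketch, assuming the config class is annotated with @Slf4j; the bean name and log message are our own:

@Bean
public CommandLineRunner mtlsSmokeTest(RestTemplate weaviateRestTemplate) {
    return args -> {
        // Fails fast at startup if the handshake, key material, or trust chain is broken.
        ResponseEntity<Void> response = weaviateRestTemplate.getForEntity(
                "https://weaviate:8080/v1/.well-known/ready", Void.class);
        log.info("Weaviate mTLS readiness check returned {}", response.getStatusCode());
    };
}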

Phase 2: Orchestrating Tenant Lifecycle with Zookeeper

With secure communication established, we addressed the coordination problem. We used Apache Curator, a high-level client for Zookeeper, to simplify interactions.
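Every service below injects a CuratorFramework. For completeness, a minimal client bean (the zookeeper.connect-string property is our own naming, and the retry values are illustrative):

package com.example.mtls.config;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ZookeeperConfig {

    @Value("${zookeeper.connect-string:zookeeper:2181}")
    private String connectString;

    @Bean(initMethod = "start", destroyMethod = "close")
    public CuratorFramework curatorClient() {
        // Retry with exponential backoff: 1s base sleep, up to 3 retries.
        return CuratorFrameworkFactory.newClient(
                connectString, new ExponentialBackoffRetry(1000, 3));
    }
}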

Zookeeper Data Model

Our Zookeeper data model is simple but effective:

  • /tenants: Parent znode for all tenant information.
  • /tenants/{tenantId}: A znode for each active tenant. The existence of this znode is the system’s source of truth. We can store metadata here, like {"status": "ACTIVE", "created_at": "..."}.
  • /locks/tenant-provisioning: A path used for acquiring a distributed lock to prevent race conditions during tenant creation.

Tenant Provisioning Service

This service orchestrates the creation of a new tenant. It’s a critical section that must be executed atomically.

package com.example.mtls.service;

import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.springframework.stereotype.Service;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@Service
@Slf4j
@RequiredArgsConstructor
public class TenantProvisioningService {

    private static final String TENANT_LOCK_PATH = "/locks/tenant-provisioning";
    private static final long LOCK_WAIT_SECONDS = 15;

    private final CuratorFramework curatorClient;
    private final WeaviateAdminService weaviateAdminService; // Service to interact with Weaviate
    private final ObjectMapper objectMapper;

    public boolean provisionTenant(String tenantId) {
        String tenantPath = "/tenants/" + tenantId;
        InterProcessMutex lock = new InterProcessMutex(curatorClient, TENANT_LOCK_PATH);

        try {
            // A common mistake is not specifying a timeout for lock acquisition.
            // In a distributed system, you must assume services can be slow or unresponsive.
            if (!lock.acquire(LOCK_WAIT_SECONDS, TimeUnit.SECONDS)) {
                log.error("Could not acquire lock for tenant provisioning: {}", tenantId);
                return false;
            }
            log.info("Acquired lock for provisioning tenant '{}'", tenantId);

            // Check if tenant already exists in Zookeeper (our source of truth)
            if (curatorClient.checkExists().forPath(tenantPath) != null) {
                log.warn("Tenant '{}' already provisioned, skipping.", tenantId);
                return true; // Idempotent behavior
            }

            // Step 1: Create the tenant in Weaviate
            boolean weaviateSuccess = weaviateAdminService.createTenantInWeaviate(tenantId);
            if (!weaviateSuccess) {
                log.error("Failed to create tenant '{}' in Weaviate. Aborting.", tenantId);
                // In a real-world project, you'd add retry logic or a dead-letter queue here.
                return false;
            }

            // Step 2: On successful creation in Weaviate, register it in Zookeeper
            TenantMetadata metadata = new TenantMetadata("ACTIVE", System.currentTimeMillis());
            byte[] metadataBytes = objectMapper.writeValueAsBytes(metadata);
            curatorClient.create().creatingParentsIfNeeded().forPath(tenantPath, metadataBytes);
            log.info("Successfully provisioned tenant '{}' and registered in Zookeeper.", tenantId);

            return true;

        } catch (Exception e) {
            log.error("Exception during tenant provisioning for '{}'. State may be inconsistent.", tenantId, e);
            // Here, we would need a reconciliation process to clean up partial creations.
            return false;
        } finally {
            try {
                if (lock.isAcquiredInThisProcess()) {
                    lock.release();
                    log.info("Released lock for tenant '{}'", tenantId);
                }
            } catch (Exception e) {
                log.error("Failed to release provisioning lock for tenant '{}'", tenantId, e);
            }
        }
    }

    // A simple record for tenant metadata stored in Zookeeper
    private record TenantMetadata(String status, long createdAt) {}
}

This implementation uses a distributed lock to ensure that two requests to provision the same tenant ID don’t collide. It first performs the operation on the external system (Weaviate) and only then updates the central registry (Zookeeper).
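The WeaviateAdminService referenced above is deliberately thin. A sketch of its core call, assuming a single multi-tenant class named Document; the endpoint is Weaviate's tenant-management API (POST /v1/schema/{class}/tenants with a JSON array of tenant objects):

@Service
@Slf4j
@RequiredArgsConstructor
public class WeaviateAdminService {

    private final RestTemplate weaviateRestTemplate; // The mTLS-configured RestTemplate

    public boolean createTenantInWeaviate(String tenantId) {
        try {
            // Weaviate expects an array of tenant objects, e.g. [{"name": "acme"}]
            List<Map<String, String>> body = List.of(Map.of("name", tenantId));
            weaviateRestTemplate.postForEntity(
                    "https://weaviate:8080/v1/schema/Document/tenants", body, Void.class);
            return true;
        } catch (RestClientException e) {
            log.error("Weaviate tenant creation failed for '{}'", tenantId, e);
            return false;
        }
    }
}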

Phase 3: Integrating Tenant Context into the Application Flow

Now we need to make the application tenant-aware for every single request.

Tenant Context Holder

We use a ThreadLocal to hold the tenant ID for the duration of a request. This is a standard pattern in Spring applications.

package com.example.mtls.security;

import org.springframework.util.Assert;

public final class TenantContextHolder {

    private static final ThreadLocal<String> tenantContext = new ThreadLocal<>();

    public static void setTenantId(String tenantId) {
        Assert.notNull(tenantId, "tenantId cannot be null");
        tenantContext.set(tenantId);
    }

    public static String getTenantId() {
        return tenantContext.get();
    }

    public static void clear() {
        tenantContext.remove();
    }
}

Dynamic Tenant Validation and Context Interceptor

A HandlerInterceptor intercepts every incoming request. It extracts the tenant ID (e.g., from a JWT claim), validates its existence against our Zookeeper-backed cache, and populates the TenantContextHolder.

To avoid hitting Zookeeper for every request, we implement a cache that is kept up-to-date using a Zookeeper PathChildrenCache watcher.

package com.example.mtls.service;

import lombok.extern.slf4j.Slf4j;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
import org.springframework.stereotype.Service;

import jakarta.annotation.PostConstruct;
import jakarta.annotation.PreDestroy;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

@Service
@Slf4j
public class DynamicTenantRegistry {

    private final CuratorFramework curatorClient;
    private final Set<String> activeTenants = ConcurrentHashMap.newKeySet();
    private PathChildrenCache tenantWatcher;

    public DynamicTenantRegistry(CuratorFramework curatorClient) {
        this.curatorClient = curatorClient;
    }

    @PostConstruct
    public void start() throws Exception {
        // The watcher will keep our in-memory set of tenants synchronized with Zookeeper.
        // This is far more efficient than querying Zookeeper on every request.
        tenantWatcher = new PathChildrenCache(curatorClient, "/tenants", true);
        tenantWatcher.getListenable().addListener(this::handleTenantChangeEvent);
        tenantWatcher.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);
        log.info("DynamicTenantRegistry started. Watching /tenants in Zookeeper.");
    }

    private void handleTenantChangeEvent(CuratorFramework client, PathChildrenCacheEvent event) {
        // Connection-state and INITIALIZED events carry no ChildData; guard against NPEs.
        if (event.getData() == null) return;
        String tenantId = extractTenantIdFromPath(event.getData().getPath());
        if (tenantId == null) return;
        
        switch (event.getType()) {
            case CHILD_ADDED -> {
                activeTenants.add(tenantId);
                log.info("Tenant added to registry: {}", tenantId);
            }
            case CHILD_REMOVED -> {
                activeTenants.remove(tenantId);
                log.info("Tenant removed from registry: {}", tenantId);
            }
            case CHILD_UPDATED -> log.info("Tenant data updated for: {}", tenantId);
            default -> { /* No-op for other events */ }
        }
    }

    public boolean isTenantActive(String tenantId) {
        return activeTenants.contains(tenantId);
    }

    public Set<String> getActiveTenants() {
        return Collections.unmodifiableSet(activeTenants);
    }

    private String extractTenantIdFromPath(String path) {
        if (path == null || !path.startsWith("/tenants/")) {
            return null;
        }
        return path.substring("/tenants/".length());
    }

    @PreDestroy
    public void stop() throws Exception {
        if (tenantWatcher != null) {
            tenantWatcher.close();
        }
    }
}
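One caveat: PathChildrenCache is deprecated in Curator 5.x in favor of CuratorCache. A sketch of the equivalent wiring inside start(), under the same /tenants layout:

// Equivalent wiring with the newer CuratorCache API (Curator 5.x).
// The cache also reports the parent /tenants node itself, so the extracted
// tenant ID must be null-checked before touching the set.
CuratorCache cache = CuratorCache.build(curatorClient, "/tenants");
cache.listenable().addListener(CuratorCacheListener.builder()
        .forCreates(node -> {
            String tenantId = extractTenantIdFromPath(node.getPath());
            if (tenantId != null) activeTenants.add(tenantId);
        })
        .forDeletes(node -> {
            String tenantId = extractTenantIdFromPath(node.getPath());
            if (tenantId != null) activeTenants.remove(tenantId);
        })
        .build());
cache.start();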

The interceptor can now consult this in-memory registry, avoiding a Zookeeper round-trip on every request.

@Component
@RequiredArgsConstructor
public class TenantContextInterceptor implements HandlerInterceptor {
    
    private final DynamicTenantRegistry tenantRegistry;
    // Assume a service that can parse a JWT and extract claims
    private final JwtTokenProvider jwtTokenProvider;

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // In a real PWA -> Backend flow, the tenant ID would be a claim in a JWT.
        String token = resolveToken(request);
        if (token != null && jwtTokenProvider.validateToken(token)) {
            String tenantId = jwtTokenProvider.getTenantIdFromToken(token);
            
            // The crucial validation step.
            if (tenantRegistry.isTenantActive(tenantId)) {
                TenantContextHolder.setTenantId(tenantId);
                return true;
            }
        }
        
        response.sendError(HttpServletResponse.SC_FORBIDDEN, "Invalid or Inactive Tenant");
        return false;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) {
        // Always clean up the ThreadLocal to prevent memory leaks in application servers.
        TenantContextHolder.clear();
    }
    
    // ... helper method to extract token from 'Authorization' header
}
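The interceptor does nothing until it is registered. The wiring below is the standard WebMvcConfigurer approach; the /api/** pattern is our assumption about where tenant-scoped endpoints live:

@Configuration
@RequiredArgsConstructor
public class WebConfig implements WebMvcConfigurer {

    private final TenantContextInterceptor tenantContextInterceptor;

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        // Only tenant-scoped endpoints pass through the tenant check.
        registry.addInterceptor(tenantContextInterceptor)
                .addPathPatterns("/api/**");
    }
}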

Finally, the Weaviate data access layer can transparently attach the tenant context to every request.

@Repository
@RequiredArgsConstructor
public class WeaviateVectorRepository {

    private final RestTemplate weaviateRestTemplate; // The mTLS-configured RestTemplate

    public void addObject(DataObject data) {
        String tenantId = TenantContextHolder.getTenantId();
        if (tenantId == null) {
            throw new IllegalStateException("Tenant context is not set. This request is outside the tenant-aware flow.");
        }

        // Weaviate scopes writes by a "tenant" field on the object payload,
        // so we stamp the tenant from the request context onto the outgoing object.
        data.setTenant(tenantId);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        HttpEntity<DataObject> requestEntity = new HttpEntity<>(data, headers);

        // This call is now secure (mTLS) and tenant-isolated (tenant field)
        weaviateRestTemplate.postForObject("https://weaviate:8080/v1/objects", requestEntity, String.class);
    }
}
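Reads need the same treatment, since Weaviate's GraphQL Get queries take a tenant argument. A sketch of a tenant-scoped query in the same repository, assuming a hypothetical Document class with a text property:

public String sampleDocuments() {
    String tenantId = TenantContextHolder.getTenantId(); // set by the interceptor
    // The tenant argument restricts the query to this tenant's shard.
    String query = """
            { Get { Document(tenant: "%s", limit: 5) { text } } }
            """.formatted(tenantId);
    return weaviateRestTemplate.postForObject(
            "https://weaviate:8080/v1/graphql", Map.of("query", query), String.class);
}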

Lingering Issues and Future Iterations

This architecture provides a robust, secure, and dynamic multi-tenant foundation, but it is not without its own complexities and areas for improvement.

The choice of Zookeeper introduces operational overhead. A properly configured Zookeeper cluster is non-trivial to manage. For a system built on Kubernetes from the ground up, a custom Kubernetes Operator that manages a Tenant Custom Resource Definition (CRD) could provide a more cloud-native approach to the same coordination problem, using the Kubernetes API server as the source of truth instead of Zookeeper.

Furthermore, mTLS certificate management is a significant challenge. The static, manually-generated certificates in this build are a proof-of-concept. A production system requires an automated certificate lifecycle management solution like HashiCorp Vault or cert-manager for Kubernetes to handle certificate issuance, renewal, and revocation seamlessly. Without this, certificate expiration becomes a ticking time bomb.

Finally, the tenant state propagation relies on Zookeeper watchers. While reliable, this model can face challenges at extreme scale (tens of thousands of tenants with high churn) and introduces a point of coupling. An alternative could involve an event-driven approach, where the tenant service emits TenantCreated or TenantDeactivated events to a message bus like Kafka, allowing other services to subscribe and update their local state independently. This would decouple the services but introduce its own set of complexities around message delivery guarantees.

