The initial request seemed straightforward: allow our data science team to run ad-hoc queries against a Trino cluster and publish the results as static HTML reports. The problem was security. The Trino cluster processes sensitive PII data, and our security posture mandates a zero-trust network model. Opening direct network paths from analyst environments (Jupyter notebooks) to the data cluster was a non-starter. Manually managing TLS certificates and firewall rules for each component—JupyterHub, the Trino coordinator, Trino workers, and the report generation service—would create an operational bottleneck and a brittle, error-prone system.
Our primary pain point was the lack of a unified, identity-based security mechanism for a heterogeneous data pipeline. Trino has its own security mechanisms, but they don’t cover the communication from the report generator or the interactive notebooks. A traditional VPN-based approach provides coarse-grained network access, not fine-grained, service-to-service authorization. This is what led us down the path of a service mesh, a tool typically associated with microservices, not data engineering stacks.
The concept was to build a pipeline where every network connection is authenticated and authorized based on service identity, not IP addresses. An analyst, working within a Jupyter environment, would execute a query. This query would be sent to the Trino coordinator, which would then distribute the work to its workers. A separate “Orchestrator” service would then fetch the results, process them, and invoke a Static Site Generator (SSG) to create a report. Every single arrow in this data flow diagram needed to be secured with mutual TLS (mTLS). We selected Consul Connect for this task due to its relative simplicity and strong integration with HashiCorp’s ecosystem. In a real-world project, the choice of service mesh is critical, but Consul provided a clear path to securing non-HTTP traffic (like Trino’s internal protocol) with its sidecar proxy model.
Initial Architecture and The Security Gap
Let’s lay out the components in a docker-compose.yml file. This represents our baseline, unsecured environment. It includes a Consul server, a Trino coordinator, a Trino worker, and our custom Python orchestrator that simulates the bridge between Jupyter and the SSG.
# docker-compose.base.yml
version: '3.8'
services:
  consul-server:
    image: hashicorp/consul:1.13.1
    container_name: consul-server
    command: "agent -server -ui -node=server-1 -bootstrap-expect=1 -client=0.0.0.0"
    ports:
      - "8500:8500"
      - "8600:8600/udp"
    networks:
      - data-plane

  trino-coordinator:
    image: trinodb/trino:399
    container_name: trino-coordinator
    volumes:
      - ./trino/coordinator:/etc/trino
    ports:
      - "8080:8080"
    networks:
      - data-plane
    depends_on:
      - consul-server

  trino-worker:
    image: trinodb/trino:399
    container_name: trino-worker
    volumes:
      - ./trino/worker:/etc/trino
    networks:
      - data-plane
    depends_on:
      - trino-coordinator

  orchestrator:
    build:
      context: ./orchestrator
    container_name: orchestrator
    environment:
      - TRINO_HOST=trino-coordinator
      - TRINO_PORT=8080
    networks:
      - data-plane
    depends_on:
      - trino-coordinator

networks:
  data-plane:
    driver: bridge
The configurations for Trino are minimal at this stage.
./trino/coordinator/config.properties:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://trino-coordinator:8080
./trino/worker/config.properties:
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://trino-coordinator:8080
This setup works, but it’s a security disaster. Any container on the data-plane network can communicate with any other. The orchestrator can hit the Trino worker’s HTTP port directly. There is no encryption in transit. Our goal is to lock this down entirely using Consul Connect.
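To see just how open this baseline is, any ad-hoc container attached to the bridge network can query the worker’s HTTP API directly; the curl image and the Compose-generated network name below are illustrative:

# No identity, no encryption – anything attached to the network can do this.
docker run --rm --network myproject_data-plane curlimages/curl \
  -s http://trino-worker:8080/v1/info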
Phase 1: Integrating Consul Connect Sidecars
The first step is to inject a Consul agent and an Envoy proxy sidecar into each service that needs to communicate securely. We’ll modify our Docker Compose setup to reflect this. The key change is that each service now gets a corresponding Consul agent running in client mode.
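One prerequisite that is easy to overlook: the Connect feature must be active on the server agent. Whether it is on by default depends on the Consul version and on whether the server runs in dev mode, so it is worth setting explicitly; a minimal way to do that is the agent’s -hcl flag, shown here against the server command from the base Compose file:

command: "agent -server -ui -node=server-1 -bootstrap-expect=1 -client=0.0.0.0 -hcl 'connect { enabled = true }'"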
The service definitions are moved from Docker Compose labels into dedicated HCL configuration files for clarity and better management in a production scenario.
./consul/configs/orchestrator.hcl:
service {
  name = "orchestrator"
  port = 8000 // A placeholder port for the service itself

  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "trino-coordinator"
          local_bind_port  = 28080 // We'll connect to this port locally
        }
      }
    }
  }

  check {
    id       = "orchestrator-alive"
    name     = "Orchestrator Process Check"
    args     = ["/usr/bin/pgrep", "-f", "main.py"]
    interval = "10s"
    timeout  = "1s"
  }
}
A pitfall here is misunderstanding the upstreams block. This directive tells the local Envoy proxy (the sidecar for orchestrator) to listen on localhost:28080. Any traffic sent to this local port will be automatically encrypted with mTLS and forwarded to a healthy instance of the trino-coordinator service. Our application code no longer needs to know the IP address of the Trino coordinator; it just talks to localhost.
./consul/configs/trino-coordinator.hcl:
service {
  name = "trino-coordinator"
  port = 8080

  connect {
    sidecar_service {}
  }

  check {
    id       = "trino-coordinator-http"
    name     = "Trino Coordinator HTTP API"
    http     = "http://localhost:8080/v1/info"
    method   = "GET"
    interval = "10s"
    timeout  = "2s"
  }
}
Now, we rewrite the docker-compose.yml to launch these services with their sidecars. This is where things get complex. Each application container needs a companion Consul agent container.
# docker-compose.secure.yml
version: '3.8'
services:
  # ... consul-server remains the same ...

  trino-coordinator:
    image: trinodb/trino:399
    container_name: trino-coordinator
    volumes:
      - ./trino/coordinator_secure:/etc/trino # Use secured config
    networks:
      - data-plane
    # No ports exposed to the host

  consul-agent-coordinator:
    image: hashicorp/consul:1.13.1
    container_name: consul-agent-coordinator
    command: "agent -node=trino-coordinator-node -join=consul-server -config-dir=/consul/config"
    volumes:
      - ./consul/configs/trino-coordinator.hcl:/consul/config/trino-coordinator.hcl
    network_mode: "service:trino-coordinator" # Critical: shares network stack
    depends_on:
      - trino-coordinator

  trino-worker:
    # ... similar setup for worker ...

  consul-agent-worker:
    # ... similar setup for worker agent ...

  orchestrator:
    build:
      context: ./orchestrator
    container_name: orchestrator
    environment:
      # CRITICAL CHANGE: We now connect to the local sidecar proxy
      - TRINO_HOST=127.0.0.1
      - TRINO_PORT=28080
      - TRINO_PROTOCOL=https # The proxy terminates TLS for us
    networks:
      - data-plane

  consul-agent-orchestrator:
    image: hashicorp/consul:1.13.1
    container_name: consul-agent-orchestrator
    command: "agent -node=orchestrator-node -join=consul-server -config-dir=/consul/config"
    volumes:
      - ./consul/configs/orchestrator.hcl:/consul/config/orchestrator.hcl
    network_mode: "service:orchestrator"
    depends_on:
      - orchestrator

networks:
  data-plane:
    driver: bridge
The network_mode: "service:..." directive is the core mechanism here. It places the Consul agent and its Envoy proxy in the same network namespace as the application container. This allows the application to communicate with its sidecar via localhost, which is essential for the transparent proxying to work.
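One detail the Compose file glosses over is the Envoy process itself. The stock hashicorp/consul image ships only the agent binary, so in practice the sidecar container needs an image (or entrypoint) that also contains Envoy and starts it once the agent is up. A rough sketch of such an entrypoint, with the admin port chosen arbitrarily:

# entrypoint.sh – sketch for the consul-agent-orchestrator container,
# assuming an image that bundles both the consul and envoy binaries
consul agent -node=orchestrator-node -join=consul-server \
  -config-dir=/consul/config &

sleep 5  # crude wait for the agent to register the service

exec consul connect envoy -sidecar-for orchestrator -admin-bind 127.0.0.1:19000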
Phase 2: Reconfiguring Trino for a Service Mesh World
This was the most challenging part of the implementation. Trino’s discovery and internal communication mechanisms were not designed with a service mesh in mind. A common mistake is to only proxy the external client traffic. However, the communication between the Trino coordinator and workers must also be secured.
We had to modify Trino’s configuration to force all communication through the local Envoy proxies.
./trino/coordinator_secure/config.properties:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
# Force internal comms over HTTPS, which the proxy will handle
internal-communication.https.required=true
internal-communication.https.keystore.path=/etc/trino/keystore.jks
internal-communication.https.keystore.key=dummykey
# This is the key: discovery URI now points to the local proxy
# which will forward to the real coordinator service via Consul.
# Trino thinks it's talking to itself, but it's talking to Envoy.
discovery.uri=https://localhost:8080
./trino/worker_secure/config.properties:
coordinator=false
http-server.http.port=8080
# Discovery points to the coordinator's service name in Consul.
# The worker's local proxy will resolve and connect to this.
discovery.uri=https://trino-coordinator.service.consul:8080
The keystore files are dummies. Because the Envoy sidecar proxy is handling TLS termination and origination for mTLS, Trino’s own HTTPS configuration is only needed to satisfy its internal requirement to speak HTTPS when internal-communication.https.required is true. The proxy intercepts this traffic before Trino’s own TLS layer becomes relevant for the external connection. This feels like a hack, but in a real-world project, this is a pragmatic way to adapt a legacy application to a service mesh without code changes.
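Generating the throwaway keystore is a one-liner with keytool; the alias and distinguished name below are arbitrary, and the only real requirement is that the password lines up with internal-communication.https.keystore.key (here both the store and key passwords are set to dummykey):

keytool -genkeypair -alias trino-dummy -keyalg RSA -keysize 2048 \
  -validity 3650 -dname "CN=trino-internal-dummy" \
  -keystore ./trino/coordinator_secure/keystore.jks \
  -storepass dummykey -keypass dummykey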
To visualize the secured traffic flow:
graph TD
    subgraph "Orchestrator Container"
        A[Orchestrator App] -->|Plain HTTP to localhost:28080| B(Envoy Sidecar)
    end
    subgraph "Trino Coordinator Container"
        D(Envoy Sidecar) -->|Plain HTTP to localhost:8080| E[Trino Coordinator Process]
    end
    subgraph "Trino Worker Container"
        G(Envoy Sidecar) -->|Plain HTTP to localhost:8080| H[Trino Worker Process]
    end
    B -- mTLS over data-plane network --> D
    D -- mTLS over data-plane network --> G
    E -- Internal Discovery via localhost --> D
    style B fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#f9f,stroke:#333,stroke-width:2px
Phase 3: The Orchestrator Logic and Secure Connection
The orchestrator service is a Python application that uses the trino-python-client. Its responsibility is to connect to Trino, execute a predefined query, fetch the results, and then (in a real system) trigger an SSG build.
./orchestrator/main.py:
import os
import logging
import time
import sys

from trino.dbapi import connect
from trino.exceptions import TrinoError

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    stream=sys.stdout
)

# Configuration from environment variables
TRINO_HOST = os.getenv("TRINO_HOST", "127.0.0.1")
TRINO_PORT = int(os.getenv("TRINO_PORT", 28080))
TRINO_USER = "data-science-user"
TRINO_CATALOG = "tpch"
TRINO_SCHEMA = "sf1"


def run_query():
    """
    Connects to Trino through the local Consul Connect proxy and runs a query.
    """
    conn = None
    try:
        logging.info(f"Attempting to connect to Trino at {TRINO_HOST}:{TRINO_PORT}")
        conn = connect(
            host=TRINO_HOST,
            port=TRINO_PORT,
            user=TRINO_USER,
            catalog=TRINO_CATALOG,
            schema=TRINO_SCHEMA,
            http_scheme='https',  # Envoy proxy provides HTTPS upstream
            verify=False  # We don't need to verify the cert, as it's localhost
        )
        cur = conn.cursor()
        logging.info("Connection successful. Executing query...")

        # A sample query that would be parameterized in a real application
        query = "SELECT nationkey, name, regionkey FROM nation ORDER BY nationkey LIMIT 10"
        cur.execute(query)
        rows = cur.fetchall()
        logging.info(f"Query successful. Fetched {len(rows)} rows.")

        # In a real implementation, this is where you'd trigger the SSG.
        # For example, write rows to a CSV/JSON and run a Hugo/Eleventy process:
        #
        # with open("/reports_data/nations.json", "w") as f:
        #     json.dump(rows, f)
        # subprocess.run(["hugo", "-s", "/path/to/site", "-d", "/output/public"])

        for row in rows:
            logging.info(f"  Data: {row}")

    except TrinoError as e:
        logging.error(f"Trino query failed: {e}")
        # Implement proper retry logic or dead-letter queue here.
        # A common mistake is to not handle transient failures gracefully.
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}", exc_info=True)
    finally:
        if conn:
            conn.close()
            logging.info("Connection to Trino closed.")


if __name__ == "__main__":
    while True:
        run_query()
        logging.info("Sleeping for 30 seconds before next run...")
        time.sleep(30)
The critical detail in the code is how it connects: host='127.0.0.1', port=28080, and http_scheme='https'. The code is completely unaware of the service mesh’s existence beyond this endpoint configuration. This is the power of the sidecar model: it decouples application logic from network security logic.
Phase 4: Enforcing Access with Consul Intentions
With everything wired up, the final step is to define the security policy. By default, Consul Connect in secure mode denies all traffic. We must explicitly create “intentions” to allow communication between services.
We can define these using the Consul CLI or via HCL files applied with consul config write.
./consul/intentions/allow-orchestrator-to-trino.hcl:
Kind = "service-intentions"
Name = "trino-coordinator"
Sources = [
{
Name = "orchestrator"
Action = "allow"
}
]
This configuration states that the service named orchestrator is allowed to initiate connections to the service named trino-coordinator.
To prove this works, let’s first run the system without this intention. The Python orchestrator will fail to connect, and the Envoy proxy logs for the orchestrator sidecar will show an RBAC: access denied error. This is the system working as intended: denying by default.
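The decision can also be checked from the Consul CLI, which queries the same authorization logic the proxies enforce; while no intention exists, it should report the connection as denied:

consul intention check orchestrator trino-coordinator
# expected to report the connection as denied while no intention exists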
After applying the intention:

consul config write ./consul/intentions/allow-orchestrator-to-trino.hcl
The orchestrator’s connection attempts will now succeed. The query runs, and data is fetched. We have successfully created an encrypted, authenticated, and authorized communication channel between our application and the Trino cluster. We would repeat this process to allow the Trino coordinator to talk to the workers.
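For completeness, the coordinator-to-worker intention is a sketch along the same lines, assuming the worker registers in Consul under the name trino-worker; since workers also announce themselves to the coordinator’s discovery endpoint, the reverse direction has to be allowed as well:

Kind = "service-intentions"
Name = "trino-worker"
Sources = [
  {
    Name   = "trino-coordinator"
    Action = "allow"
  }
]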
A more advanced L7 intention could even restrict access to specific HTTP paths, for instance, allowing the orchestrator to only access /v1/statement on the Trino API, but this level of granularity was not required for our initial implementation.
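Should we need it later, a sketch of such an L7 rule is below. It assumes the trino-coordinator service is declared with protocol = "http" (for example via a service-defaults entry), since path-based permissions only apply once Consul treats the traffic as HTTP:

Kind = "service-intentions"
Name = "trino-coordinator"
Sources = [
  {
    Name = "orchestrator"
    Permissions = [
      {
        Action = "allow"
        HTTP   = { PathPrefix = "/v1/statement" }
      },
      {
        Action = "deny"
        HTTP   = { PathPrefix = "/" }
      }
    ]
  }
]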
Lingering Issues and Future Work
This solution provides a robust security foundation, but it is not a complete production system. A pragmatic engineer must always consider the boundaries of the current implementation.
First, the performance overhead of the Envoy sidecars is non-zero. For the metadata-heavy communication between the client and coordinator, this is negligible. However, for high-volume data shuffling between Trino workers on large-scale queries, the extra network hop and CPU cycles for encryption/decryption within each sidecar could become a bottleneck. Measuring this impact with realistic workloads is a critical next step. Emerging technologies like eBPF-based service meshes aim to mitigate this, but they come with their own complexity.
Second, this architecture only handles service-to-service authentication (the who at the machine level). It does not handle user-level authentication and authorization (the who at the human level). Trino still needs to be configured with an authenticator (e.g., LDAP or password file) to verify the user data-science-user and an authorizer to check if that user has permissions on the tpch catalog. Consul Connect secures the pipe; Trino secures the data access within that pipe. They are complementary, not mutually exclusive.
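As a minimal illustration of that second layer, file-based password authentication on the coordinator would look roughly like this (paths are placeholders, and Trino normally requires TLS to be enabled before it will accept password authentication):

# added to config.properties on the coordinator
http-server.authentication.type=PASSWORD

# etc/password-authenticator.properties
password-authenticator.name=file
file.password-file=/etc/trino/password.db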
Finally, the bridge between Jupyter and this orchestrator remains conceptual here. A production implementation would involve using JupyterHub to spawn user-specific notebook servers, with each server’s container running a Consul agent sidecar. The user’s code within the notebook would then connect to localhost, just as our orchestrator script does, inheriting the secure connection context of its container. This extension introduces significant complexity around dynamic service registration and intention management for ephemeral notebook containers.
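To make that last point slightly more concrete, per-notebook registration could go through the local agent’s standard HTTP API (/v1/agent/service/register). The snippet below is only a sketch: the service naming scheme, port, and the hook points into JupyterHub’s spawner lifecycle are all assumptions.

import requests

CONSUL_AGENT = "http://127.0.0.1:8500"  # local agent sharing the notebook's network namespace

def register_notebook_service(username: str) -> None:
    """Register a per-user notebook service whose Connect sidecar gets an
    upstream pointing at the Trino coordinator."""
    payload = {
        "Name": f"notebook-{username}",  # illustrative naming convention
        "Port": 8888,
        "Connect": {
            "SidecarService": {
                "Proxy": {
                    "Upstreams": [
                        {"DestinationName": "trino-coordinator", "LocalBindPort": 28080}
                    ]
                }
            }
        },
    }
    resp = requests.put(f"{CONSUL_AGENT}/v1/agent/service/register", json=payload)
    resp.raise_for_status()

def deregister_notebook_service(username: str) -> None:
    """Clean up the registration when JupyterHub culls the idle server."""
    resp = requests.put(f"{CONSUL_AGENT}/v1/agent/service/deregister/notebook-{username}")
    resp.raise_for_status()

An intention from notebook-&lt;username&gt; (or a wildcard source) to trino-coordinator would still have to be written at spawn time, which is exactly the intention-management overhead mentioned above.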