The initial request seemed straightforward: allow our data science team to run ad-hoc queries against a Trino cluster and publish the results as static HTML reports. The problem was security. The Trino cluster processes sensitive PII data, and our security posture mandates a zero-trust network model. Opening direct network paths from analyst environments (Jupyter notebooks) to the data cluster was a non-starter. Manually managing TLS certificates and firewall rules for each component—JupyterHub, the Trino coordinator, Trino workers, and the report generation service—would create an operational bottleneck and a brittle, error-prone system.
Our primary pain point was the lack of a unified, identity-based security mechanism for a heterogeneous data pipeline. Trino has its own security mechanisms, but they don’t cover the communication from the report generator or the interactive notebooks. A traditional VPN-based approach provides coarse-grained network access, not fine-grained, service-to-service authorization. This is what led us down the path of a service mesh, a tool typically associated with microservices, not data engineering stacks.
The concept was to build a pipeline where every network connection is authenticated and authorized based on service identity, not IP addresses. An analyst, working within a Jupyter environment, would execute a query. This query would be sent to the Trino coordinator, which would then distribute the work to its workers. A separate “Orchestrator” service would then fetch the results, process them, and invoke a Static Site Generator (SSG) to create a report. Every single arrow in this data flow diagram needed to be secured with mutual TLS (mTLS). We selected Consul Connect for this task due to its relative simplicity and strong integration with HashiCorp’s ecosystem. In a real-world project, the choice of service mesh is critical, but Consul provided a clear path to securing non-HTTP traffic (like Trino’s internal protocol) with its sidecar proxy model.
Initial Architecture and The Security Gap
Let’s lay out the components in a docker-compose.yml file. This represents our baseline, unsecured environment. It includes a Consul server, a Trino coordinator, a Trino worker, and our custom Python orchestrator that simulates the bridge between Jupyter and the SSG.
# docker-compose.base.yml
version: '3.8'
services:
  consul-server:
    image: hashicorp/consul:1.13.1
    container_name: consul-server
    command: "agent -server -ui -node=server-1 -bootstrap-expect=1 -client=0.0.0.0"
    ports:
      - "8500:8500"
      - "8600:8600/udp"
    networks:
      - data-plane

  trino-coordinator:
    image: trinodb/trino:399
    container_name: trino-coordinator
    volumes:
      - ./trino/coordinator:/etc/trino
    ports:
      - "8080:8080"
    networks:
      - data-plane
    depends_on:
      - consul-server

  trino-worker:
    image: trinodb/trino:399
    container_name: trino-worker
    volumes:
      - ./trino/worker:/etc/trino
    networks:
      - data-plane
    depends_on:
      - trino-coordinator

  orchestrator:
    build:
      context: ./orchestrator
    container_name: orchestrator
    environment:
      - TRINO_HOST=trino-coordinator
      - TRINO_PORT=8080
    networks:
      - data-plane
    depends_on:
      - trino-coordinator

networks:
  data-plane:
    driver: bridge
The configurations for Trino are minimal at this stage.
./trino/coordinator/config.properties:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://trino-coordinator:8080
./trino/worker/config.properties:
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://trino-coordinator:8080
This setup works, but it’s a security disaster. Any container on the data-plane network can communicate with any other. The orchestrator can hit the Trino worker’s HTTP port directly. There is no encryption in transit. Our goal is to lock this down entirely using Consul Connect.
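To see just how open this baseline is, any ad-hoc container attached to the bridge network can query the worker’s HTTP API directly; the curl image and the Compose-generated network name below are illustrative:

# No identity, no encryption – anything attached to the network can do this.
docker run --rm --network myproject_data-plane curlimages/curl \
  -s http://trino-worker:8080/v1/info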
Phase 1: Integrating Consul Connect Sidecars
The first step is to inject a Consul agent and an Envoy proxy sidecar into each service that needs to communicate securely. We’ll modify our Docker Compose setup to reflect this. The key change is that each service now gets a corresponding Consul agent running in client mode.
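One prerequisite that is easy to overlook: the Connect feature must be active on the server agent. Whether it is on by default depends on the Consul version and on whether the server runs in dev mode, so it is worth setting explicitly; a minimal way to do that is the agent’s -hcl flag, shown here against the server command from the base Compose file:

command: "agent -server -ui -node=server-1 -bootstrap-expect=1 -client=0.0.0.0 -hcl 'connect { enabled = true }'"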
The service definitions are moved from Docker Compose labels into dedicated HCL configuration files for clarity and better management in a production scenario.
./consul/configs/orchestrator.hcl:
service {
  name = "orchestrator"
  port = 8000 // A placeholder port for the service itself

  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "trino-coordinator"
          local_bind_port  = 28080 // We'll connect to this port locally
        }
      }
    }
  }

  check {
    id       = "orchestrator-alive"
    name     = "Orchestrator Process Check"
    args     = ["/usr/bin/pgrep", "-f", "main.py"]
    interval = "10s"
    timeout  = "1s"
  }
}
A pitfall here is misunderstanding the upstreams block. This directive tells the local Envoy proxy (the sidecar for orchestrator) to listen on localhost:28080. Any traffic sent to this local port will be automatically encrypted with mTLS and forwarded to a healthy instance of the trino-coordinator service. Our application code no longer needs to know the IP address of the Trino coordinator; it just talks to localhost.
./consul/configs/trino-coordinator.hcl:
service {
  name = "trino-coordinator"
  port = 8080

  connect {
    sidecar_service {}
  }

  check {
    id       = "trino-coordinator-http"
    name     = "Trino Coordinator HTTP API"
    http     = "http://localhost:8080/v1/info"
    method   = "GET"
    interval = "10s"
    timeout  = "2s"
  }
}
Now, we rewrite the docker-compose.yml to launch these services with their sidecars. This is where things get complex. Each application container needs a companion Consul agent container.
# docker-compose.secure.yml
version: '3.8'
services:
  # ... consul-server remains the same ...

  trino-coordinator:
    image: trinodb/trino:399
    container_name: trino-coordinator
    volumes:
      - ./trino/coordinator_secure:/etc/trino # Use secured config
    networks:
      - data-plane
    # No ports exposed to the host

  consul-agent-coordinator:
    image: hashicorp/consul:1.13.1
    container_name: consul-agent-coordinator
    command: "agent -node=trino-coordinator-node -join=consul-server -config-dir=/consul/config"
    volumes:
      - ./consul/configs/trino-coordinator.hcl:/consul/config/trino-coordinator.hcl
    network_mode: "service:trino-coordinator" # Critical: shares network stack
    depends_on:
      - trino-coordinator

  trino-worker:
    # ... similar setup for worker ...

  consul-agent-worker:
    # ... similar setup for worker agent ...

  orchestrator:
    build:
      context: ./orchestrator
    container_name: orchestrator
    environment:
      # CRITICAL CHANGE: We now connect to the local sidecar proxy
      - TRINO_HOST=127.0.0.1
      - TRINO_PORT=28080
      - TRINO_PROTOCOL=https # The proxy terminates TLS for us
    networks:
      - data-plane

  consul-agent-orchestrator:
    image: hashicorp/consul:1.13.1
    container_name: consul-agent-orchestrator
    command: "agent -node=orchestrator-node -join=consul-server -config-dir=/consul/config"
    volumes:
      - ./consul/configs/orchestrator.hcl:/consul/config/orchestrator.hcl
    network_mode: "service:orchestrator"
    depends_on:
      - orchestrator

networks:
  data-plane:
    driver: bridge
The network_mode: "service:..." directive is the core mechanism here. It places the Consul agent and its Envoy proxy in the same network namespace as the application container. This allows the application to communicate with its sidecar via localhost, which is essential for the transparent proxying to work.
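One detail the Compose file glosses over is the Envoy process itself. The stock hashicorp/consul image ships only the agent binary, so in practice the sidecar container needs an image (or entrypoint) that also contains Envoy and starts it once the agent is up. A rough sketch of such an entrypoint, with the admin port chosen arbitrarily:

# entrypoint.sh – sketch for the consul-agent-orchestrator container,
# assuming an image that bundles both the consul and envoy binaries
consul agent -node=orchestrator-node -join=consul-server \
  -config-dir=/consul/config &

sleep 5  # crude wait for the agent to register the service

exec consul connect envoy -sidecar-for orchestrator -admin-bind 127.0.0.1:19000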
Phase 2: Reconfiguring Trino for a Service Mesh World
This was the most challenging part of the implementation. Trino’s discovery and internal communication mechanisms were not designed with a service mesh in mind. A common mistake is to only proxy the external client traffic. However, the communication between the Trino coordinator and workers must also be secured.
We had to modify Trino’s configuration to force all communication through the local Envoy proxies.
./trino/coordinator_secure/config.properties:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
# Force internal comms over HTTPS, which the proxy will handle
internal-communication.https.required=true
internal-communication.https.keystore.path=/etc/trino/keystore.jks
internal-communication.https.keystore.key=dummykey
# This is the key: discovery URI now points to the local proxy
# which will forward to the real coordinator service via Consul.
# Trino thinks it's talking to itself, but it's talking to Envoy.
discovery.uri=https://localhost:8080
./trino/worker_secure/config.properties:
coordinator=false
http-server.http.port=8080
# Discovery points to the coordinator's service name in Consul.
# The worker's local proxy will resolve and connect to this.
discovery.uri=https://trino-coordinator.service.consul:8080
The keystore files are dummies. Because the Envoy sidecar proxy is handling TLS termination and origination for mTLS, Trino’s own HTTPS configuration is only needed to satisfy its internal requirement to speak HTTPS when internal-communication.https.required is true. The proxy intercepts this traffic before Trino’s own TLS layer becomes relevant for the external connection. This feels like a hack, but in a real-world project, this is a pragmatic way to adapt a legacy application to a service mesh without code changes.
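Generating the throwaway keystore is a one-liner with keytool; the alias and distinguished name below are arbitrary, and the only real requirement is that the password lines up with internal-communication.https.keystore.key (here both the store and key passwords are set to dummykey):

keytool -genkeypair -alias trino-dummy -keyalg RSA -keysize 2048 \
  -validity 3650 -dname "CN=trino-internal-dummy" \
  -keystore ./trino/coordinator_secure/keystore.jks \
  -storepass dummykey -keypass dummykey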
To visualize the secured traffic flow:
graph TD
    subgraph "Orchestrator Container"
        A[Orchestrator App] -->|Plain HTTP to localhost:28080| B(Envoy Sidecar)
    end
    subgraph "Trino Coordinator Container"
        D(Envoy Sidecar) -->|Plain HTTP to localhost:8080| E[Trino Coordinator Process]
    end
    subgraph "Trino Worker Container"
        G(Envoy Sidecar) -->|Plain HTTP to localhost:8080| H[Trino Worker Process]
    end
    B -- mTLS over data-plane network --> D
    D -- mTLS over data-plane network --> G
    E -- Internal Discovery via localhost --> D
    style B fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#f9f,stroke:#333,stroke-width:2px
Phase 3: The Orchestrator Logic and Secure Connection
The orchestrator service is a Python application that uses the trino-python-client. Its responsibility is to connect to Trino, execute a predefined query, fetch the results, and then (in a real system) trigger an SSG build.
./orchestrator/main.py:
import os
import logging
import time
import sys

from trino.dbapi import connect
from trino.exceptions import TrinoError

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    stream=sys.stdout
)

# Configuration from environment variables
TRINO_HOST = os.getenv("TRINO_HOST", "127.0.0.1")
TRINO_PORT = int(os.getenv("TRINO_PORT", 28080))
TRINO_USER = "data-science-user"
TRINO_CATALOG = "tpch"
TRINO_SCHEMA = "sf1"


def run_query():
    """
    Connects to Trino through the local Consul Connect proxy and runs a query.
    """
    conn = None
    try:
        logging.info(f"Attempting to connect to Trino at {TRINO_HOST}:{TRINO_PORT}")
        conn = connect(
            host=TRINO_HOST,
            port=TRINO_PORT,
            user=TRINO_USER,
            catalog=TRINO_CATALOG,
            schema=TRINO_SCHEMA,
            http_scheme='https',  # Envoy proxy provides HTTPS upstream
            verify=False  # We don't need to verify the cert, as it's localhost
        )
        cur = conn.cursor()
        logging.info("Connection successful. Executing query...")

        # A sample query that would be parameterized in a real application
        query = "SELECT nationkey, name, regionkey FROM nation ORDER BY nationkey LIMIT 10"
        cur.execute(query)
        rows = cur.fetchall()
        logging.info(f"Query successful. Fetched {len(rows)} rows.")

        # In a real implementation, this is where you'd trigger the SSG.
        # For example, write rows to a CSV/JSON and run a Hugo/Eleventy process:
        #
        # with open("/reports_data/nations.json", "w") as f:
        #     json.dump(rows, f)
        # subprocess.run(["hugo", "-s", "/path/to/site", "-d", "/output/public"])

        for row in rows:
            logging.info(f"  Data: {row}")

    except TrinoError as e:
        logging.error(f"Trino query failed: {e}")
        # Implement proper retry logic or dead-letter queue here.
        # A common mistake is to not handle transient failures gracefully.
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}", exc_info=True)
    finally:
        if conn:
            conn.close()
            logging.info("Connection to Trino closed.")


if __name__ == "__main__":
    while True:
        run_query()
        logging.info("Sleeping for 30 seconds before next run...")
        time.sleep(30)
The critical detail in the code is how it connects: host='127.0.0.1', port=28080, and http_scheme='https'. The code is completely unaware of the service mesh’s existence beyond this endpoint configuration. This is the power of the sidecar model: it decouples application logic from network security logic.
Phase 4: Enforcing Access with Consul Intentions
With everything wired up, the final step is to define the security policy. By default, Consul Connect in secure mode denies all traffic. We must explicitly create “intentions” to allow communication between services.
We can define these using the Consul CLI or via HCL files applied with consul config write.
./consul/intentions/allow-orchestrator-to-trino.hcl:
Kind = "service-intentions"
Name = "trino-coordinator"
Sources = [
{
Name = "orchestrator"
Action = "allow"
}
]
This configuration states that the service named orchestrator is allowed to initiate connections to the service named trino-coordinator.
To prove this works, let’s first run the system without this intention. The Python orchestrator will fail to connect, and the Envoy proxy logs for the orchestrator sidecar will show an RBAC: access denied error. This is the system working as intended: denying by default.
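The decision can also be checked from the Consul CLI, which queries the same authorization logic the proxies enforce; while no intention exists, it should report the connection as denied:

consul intention check orchestrator trino-coordinator
# expected to report the connection as denied while no intention exists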
After applying the intention:

consul config write ./consul/intentions/allow-orchestrator-to-trino.hcl
The orchestrator’s connection attempts will now succeed. The query runs, and data is fetched. We have successfully created an encrypted, authenticated, and authorized communication channel between our application and the Trino cluster. We would repeat this process to allow the Trino coordinator to talk to the workers.
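For completeness, the coordinator-to-worker intention is a sketch along the same lines, assuming the worker registers in Consul under the name trino-worker; since workers also announce themselves to the coordinator’s discovery endpoint, the reverse direction has to be allowed as well:

Kind = "service-intentions"
Name = "trino-worker"
Sources = [
  {
    Name   = "trino-coordinator"
    Action = "allow"
  }
]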
A more advanced L7 intention could even restrict access to specific HTTP paths, for instance, allowing the orchestrator to only access /v1/statement on the Trino API, but this level of granularity was not required for our initial implementation.
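Should we need it later, a sketch of such an L7 rule is below. It assumes the trino-coordinator service is declared with protocol = "http" (for example via a service-defaults entry), since path-based permissions only apply once Consul treats the traffic as HTTP:

Kind = "service-intentions"
Name = "trino-coordinator"
Sources = [
  {
    Name = "orchestrator"
    Permissions = [
      {
        Action = "allow"
        HTTP   = { PathPrefix = "/v1/statement" }
      },
      {
        Action = "deny"
        HTTP   = { PathPrefix = "/" }
      }
    ]
  }
]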
Lingering Issues and Future Work
This solution provides a robust security foundation, but it is not a complete production system. A pragmatic engineer must always consider the boundaries of the current implementation.
First, the performance overhead of the Envoy sidecars is non-zero. For the metadata-heavy communication between the client and coordinator, this is negligible. However, for high-volume data shuffling between Trino workers on large-scale queries, the extra network hop and CPU cycles for encryption/decryption within each sidecar could become a bottleneck. Measuring this impact with realistic workloads is a critical next step. Emerging technologies like eBPF-based service meshes aim to mitigate this, but they come with their own complexity.
Second, this architecture only handles service-to-service authentication (the who at the machine level). It does not handle user-level authentication and authorization (the who at the human level). Trino still needs to be configured with an authenticator (e.g., LDAP or password file) to verify the user data-science-user and an authorizer to check if that user has permissions on the tpch catalog. Consul Connect secures the pipe; Trino secures the data access within that pipe. They are complementary, not mutually exclusive.
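As a minimal illustration of that second layer, file-based password authentication on the coordinator would look roughly like this (paths are placeholders, and Trino normally requires TLS to be enabled before it will accept password authentication):

# added to config.properties on the coordinator
http-server.authentication.type=PASSWORD

# etc/password-authenticator.properties
password-authenticator.name=file
file.password-file=/etc/trino/password.db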
Finally, the bridge between Jupyter and this orchestrator remains conceptual here. A production implementation would involve using JupyterHub to spawn user-specific notebook servers, with each server’s container running a Consul agent sidecar. The user’s code within the notebook would then connect to localhost, just as our orchestrator script does, inheriting the secure connection context of its container. This extension introduces significant complexity around dynamic service registration and intention management for ephemeral notebook containers.
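To make that last point slightly more concrete, per-notebook registration could go through the local agent’s standard HTTP API (/v1/agent/service/register). The snippet below is only a sketch: the service naming scheme, port, and the hook points into JupyterHub’s spawner lifecycle are all assumptions.

import requests

CONSUL_AGENT = "http://127.0.0.1:8500"  # local agent sharing the notebook's network namespace

def register_notebook_service(username: str) -> None:
    """Register a per-user notebook service whose Connect sidecar gets an
    upstream pointing at the Trino coordinator."""
    payload = {
        "Name": f"notebook-{username}",  # illustrative naming convention
        "Port": 8888,
        "Connect": {
            "SidecarService": {
                "Proxy": {
                    "Upstreams": [
                        {"DestinationName": "trino-coordinator", "LocalBindPort": 28080}
                    ]
                }
            }
        },
    }
    resp = requests.put(f"{CONSUL_AGENT}/v1/agent/service/register", json=payload)
    resp.raise_for_status()

def deregister_notebook_service(username: str) -> None:
    """Clean up the registration when JupyterHub culls the idle server."""
    resp = requests.put(f"{CONSUL_AGENT}/v1/agent/service/deregister/notebook-{username}")
    resp.raise_for_status()

An intention from notebook-&lt;username&gt; (or a wildcard source) to trino-coordinator would still have to be written at spawn time, which is exactly the intention-management overhead mentioned above.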