The initial proof-of-concept for our Retrieval-Augmented Generation (RAG) service was straightforward. A single ChromaDB collection, a batch ingestion script, and an API that converted user questions into embeddings and performed a similarity search. It worked well enough for a demo. The moment we had to architect it for multiple customers, however, we hit a wall. The core problem was that ChromaDB, in its current form, has no concept of granular, per-document access control. A simple application-level bug could cause catastrophic data leakage between tenants, a risk no production system can afford.
Our existing stack was non-negotiable: user and tenant metadata resides in Firestore for its scalability and ease of use, authentication is handled by a centralized IAM provider issuing standard JWTs, and the entire workload runs on DigitalOcean Droplets for simplicity and cost-effectiveness. The challenge was to bolt a robust, mandatory security layer onto ChromaDB without fundamentally changing how the application interacts with it.
Forking ChromaDB to add native IAM was out of the question. Running a separate ChromaDB instance or even a collection per tenant felt like a path to operational chaos, especially when projecting to thousands of tenants. The cost and management overhead would become untenable. The only viable path was to intercept and rewrite all communication with the database, enforcing tenancy at a layer just before the data store. This led to the design of a mandatory, tenancy-aware proxy service.
The principle is this: no application service talks to ChromaDB directly. All traffic is routed through a lightweight Go proxy. This proxy's sole responsibility is to validate the incoming JWT, extract a tenant_id claim, and inject a mandatory where filter into every single ChromaDB query to enforce data isolation. This architecture centralizes the security logic, making it auditable and difficult to bypass.
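For illustration (the field names, values, and tenant ID here are hypothetical), a query arriving as {"query_texts": ["refund policy"], "where": {"doc_type": "faq"}} from a caller whose token carries tenant_id "acme-corp" leaves the proxy as {"query_texts": ["refund policy"], "where": {"$and": [{"tenant_id": "acme-corp"}, {"doc_type": "faq"}]}}. The rest of this section builds up exactly that behavior.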
The Architectural Foundation
The network topology on DigitalOcean is critical for this to work. We deploy both the ChromaDB instance and our custom IAM proxy as Docker containers on the same Droplet. They communicate over a private Docker bridge network. Crucially, the ChromaDB container’s port (8000) is only exposed to this internal network. The proxy’s port is the only one exposed to the public internet (or to our internal VPC). Any attempt to bypass the proxy and hit ChromaDB directly from another service will fail at the network level.
graph TD
    subgraph "DigitalOcean Droplet"
        direction LR
        subgraph "Docker Bridge Network"
            ProxyService[IAM Proxy :8080] --> ChromaDBService[ChromaDB :8000]
        end
    end
    subgraph "External Services"
        IAMProvider[IAM Provider]
        FirestoreDB[Firestore]
    end
    Client[Client Application] -- Authenticates --> IAMProvider
    IAMProvider -- Issues JWT --> Client
    Client -- HTTPS Request with JWT --> LB[DO Load Balancer]
    LB --> ProxyService
    ProxyService -- Validates JWT Public Key --> IAMProvider
    ProxyService -- Fetches Tenant Metadata --> FirestoreDB
This setup ensures the proxy is a mandatory checkpoint, not an optional convenience.
The Proxy Implementation in Go
We chose Go for its performance, concurrency model, and strong standard library, making it ideal for a network proxy that needs to be fast and reliable.
1. Project Structure and Dependencies
A minimal Go project structure is sufficient.
/iam-chroma-proxy
|-- /cmd
| |-- /main.go # Application entry point
|-- /internal
| |-- /auth # JWT validation logic
| |-- /config # Configuration management
| |-- /database # Firestore client
| |-- /proxy # Core proxy and query rewriting logic
|-- go.mod
|-- go.sum
|-- Dockerfile
|-- config.yaml
The core dependencies are managed via go.mod:
// go.mod
module github.com/your-org/iam-chroma-proxy
go 1.21
require (
	cloud.google.com/go/firestore v1.14.0
	github.com/MicahParks/keyfunc/v2 v2.1.0
	github.com/gin-gonic/gin v1.9.1
	github.com/golang-jwt/jwt/v5 v5.0.0
	github.com/jellydator/validation v1.1.0
	github.com/stretchr/testify v1.8.4
	gopkg.in/yaml.v3 v3.0.1
)
// ... other transitive dependencies
2. Configuration and Initialization
In a real-world project, hardcoding configuration is a recipe for disaster. We manage settings through a config.yaml file loaded at startup.
# config.yaml
server:
port: "8080"
chromadb:
target_url: "http://chromadb:8000" # Internal Docker network hostname
iam:
# The JWKS endpoint of your identity provider (e.g., Auth0, Cognito, etc.)
jwks_url: "https://your-iam-provider.com/.well-known/jwks.json"
# The audience claim expected in the JWT
audience: "https://api.your-service.com"
# The issuer claim expected in the JWT
issuer: "https://your-iam-provider.com/"
firestore:
project_id: "your-gcp-project-id"
# Used for logging and operational context
service:
name: "iam-chroma-proxy"
version: "1.0.0"
The Go code to load and validate this configuration is boilerplate but necessary for production readiness.
// internal/config/config.go
package config
import (
"os"
"gopkg.in/yaml.v3"
)
type Config struct {
Server struct {
Port string `yaml:"port"`
} `yaml:"server"`
ChromaDB struct {
TargetURL string `yaml:"target_url"`
} `yaml:"chromadb"`
IAM struct {
JWKSURL string `yaml:"jwks_url"`
Audience string `yaml:"audience"`
Issuer string `yaml:"issuer"`
} `yaml:"iam"`
Firestore struct {
ProjectID string `yaml:"project_id"`
} `yaml:"firestore"`
}
func Load(path string) (*Config, error) {
var cfg Config
f, err := os.ReadFile(path)
if err != nil {
return nil, err
}
if err := yaml.Unmarshal(f, &cfg); err != nil {
return nil, err
}
// A common mistake is not validating config values.
// Add validation logic here (e.g., check for empty strings, valid URLs).
return &cfg, nil
}
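The comment above about validating values is worth taking seriously. Here is a minimal sketch of what that could look like, using only the standard library; the specific rules, such as requiring absolute URLs, are assumptions about what "valid" means for this service and are not part of the original code:

// internal/config/validate.go (sketch)
package config

import (
	"fmt"
	"net/url"
)

// Validate performs basic sanity checks on a loaded Config.
// The rules here are illustrative; tighten them for your deployment.
func (c *Config) Validate() error {
	if c.Server.Port == "" {
		return fmt.Errorf("server.port must not be empty")
	}
	for name, raw := range map[string]string{
		"chromadb.target_url": c.ChromaDB.TargetURL,
		"iam.jwks_url":        c.IAM.JWKSURL,
	} {
		if u, err := url.Parse(raw); err != nil || !u.IsAbs() {
			return fmt.Errorf("%s must be an absolute URL, got %q", name, raw)
		}
	}
	if c.IAM.Audience == "" || c.IAM.Issuer == "" {
		return fmt.Errorf("iam.audience and iam.issuer must be set")
	}
	if c.Firestore.ProjectID == "" {
		return fmt.Errorf("firestore.project_id must be set")
	}
	return nil
}

Load can then call cfg.Validate() before returning, so a misconfigured proxy fails at startup instead of at the first request.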
3. The Core: JWT Authentication Middleware
This is the first gate. Every request must present a valid bearer token. We use the golang-jwt/jwt/v5 library to parse and verify the token, with the keyfunc library fetching the provider's public keys from its JWKS endpoint for signature verification. A common pitfall is failing to cache the JWKS response, leading to excessive HTTP requests to the IAM provider; a production system must cache the keys and refresh them on a reasonable interval, which the keyfunc options below handle.
// internal/auth/validator.go
package auth
import (
	"fmt"
	"log/slog"
	"net/http"
	"strings"
	"sync"
	"time"

	"github.com/MicahParks/keyfunc/v2"
	"github.com/gin-gonic/gin"
	"github.com/golang-jwt/jwt/v5"
)
// Define custom claims structure to extract tenant_id
type CustomClaims struct {
TenantID string `json:"tenant_id"`
jwt.RegisteredClaims
}
type JWTValidator struct {
jwks *keyfunc.JWKS
once sync.Once
jwksURL string
audience string
issuer string
}
func NewJWTValidator(jwksURL, audience, issuer string) (*JWTValidator, error) {
return &JWTValidator{
jwksURL: jwksURL,
audience: audience,
issuer: issuer,
}, nil
}
// initJWKS initializes the JWKS key function with caching.
func (v *JWTValidator) initJWKS() {
var err error
options := keyfunc.Options{
RefreshInterval: time.Hour,
RefreshTimeout: 10 * time.Second,
RefreshErrorHandler: func(err error) {
slog.Error("JWKS refresh error", "error", err)
},
}
v.jwks, err = keyfunc.Get(v.jwksURL, options)
if err != nil {
// This is a fatal error on startup. The proxy cannot function without the keys.
panic(fmt.Sprintf("Failed to get JWKS: %v", err))
}
}
// AuthMiddleware is a Gin middleware for JWT validation.
func (v *JWTValidator) AuthMiddleware() gin.HandlerFunc {
	// Initialize the JWKS client exactly once, when the middleware is constructed
	// (normally at startup), so a bad JWKS URL fails fast rather than on the first request.
v.once.Do(v.initJWKS)
return func(c *gin.Context) {
authHeader := c.GetHeader("Authorization")
if authHeader == "" {
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "Authorization header required"})
return
}
parts := strings.Split(authHeader, " ")
if len(parts) != 2 || strings.ToLower(parts[0]) != "bearer" {
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "Invalid Authorization header format"})
return
}
tokenString := parts[1]
claims := &CustomClaims{}
token, err := jwt.ParseWithClaims(tokenString, claims, v.jwks.Keyfunc)
if err != nil {
slog.Warn("JWT parsing failed", "error", err)
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "Invalid token"})
return
}
if !token.Valid {
c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "Invalid token signature or claims"})
return
}
		// In a real-world project, you MUST validate issuer and audience.
		// This prevents token substitution attacks.
		if !strings.EqualFold(claims.Issuer, v.issuer) {
			c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "Invalid token issuer"})
			return
		}
		// jwt/v5 removed the VerifyAudience helper, so compare the aud claim
		// (which may contain multiple values) explicitly.
		audienceOK := false
		for _, aud := range claims.Audience {
			if aud == v.audience {
				audienceOK = true
				break
			}
		}
		if !audienceOK {
			c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "Invalid token audience"})
			return
		}
if claims.TenantID == "" {
c.AbortWithStatusJSON(http.StatusForbidden, gin.H{"error": "Token missing required tenant_id claim"})
return
}
// Store the tenant ID in the context for downstream handlers.
c.Set("tenant_id", claims.TenantID)
c.Next()
}
}
The most critical part here is extracting our custom tenant_id claim and injecting it into the request context. This securely passes the tenant's identity to the next layer: the query rewriter.
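One gap worth noting: the architecture diagram has the proxy fetching tenant metadata from Firestore, and the project layout reserves an /internal/database package for it, but that code is not shown above. Below is a minimal sketch of what that package might contain, assuming a hypothetical "tenants" collection keyed by tenant ID with an "active" boolean field; the auth middleware (or a second middleware behind it) could call IsTenantActive before letting a request proceed.

// internal/database/firestore.go (sketch; collection and field names are assumptions)
package database

import (
	"context"
	"fmt"

	"cloud.google.com/go/firestore"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

type TenantStore struct {
	client *firestore.Client
}

func NewTenantStore(ctx context.Context, projectID string) (*TenantStore, error) {
	client, err := firestore.NewClient(ctx, projectID)
	if err != nil {
		return nil, fmt.Errorf("creating firestore client: %w", err)
	}
	return &TenantStore{client: client}, nil
}

// IsTenantActive reports whether the tenant document exists and is marked active.
func (s *TenantStore) IsTenantActive(ctx context.Context, tenantID string) (bool, error) {
	snap, err := s.client.Collection("tenants").Doc(tenantID).Get(ctx)
	if status.Code(err) == codes.NotFound {
		return false, nil
	}
	if err != nil {
		return false, fmt.Errorf("fetching tenant %q: %w", tenantID, err)
	}
	active, _ := snap.Data()["active"].(bool)
	return active, nil
}

Because this adds a Firestore read to every request, the result should be cached in memory with a short TTL in any real deployment.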
4. The Query Rewriting Proxy Handler
This is where the magic happens. We define a handler that catches all requests (/api/v1/*path). It reads the incoming request body, which is a JSON payload destined for ChromaDB, and unmarshals it into a generic map[string]interface{} to avoid coupling the proxy to ChromaDB's specific request structures. Then we manipulate this map to inject our security filter before re-serializing it and forwarding it to the real ChromaDB instance.
The logic for modifying the where filter must be robust, because a naive implementation could be easily bypassed. For instance, if a user provides their own where clause like {"$or": [{"owner": "me"}, {"tenant_id": "another-tenant"}]}, simply adding our filter won't work. The correct approach is to enforce an "$and" condition.
// internal/proxy/handler.go
package proxy
import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"

	"github.com/gin-gonic/gin"
)
type ChromaProxy struct {
target *url.URL
}
func NewChromaProxy(targetURL string) (*ChromaProxy, error) {
u, err := url.Parse(targetURL)
if err != nil {
return nil, fmt.Errorf("invalid target URL: %w", err)
}
return &ChromaProxy{target: u}, nil
}
func (p *ChromaProxy) HandleProxy() gin.HandlerFunc {
proxy := httputil.NewSingleHostReverseProxy(p.target)
// We need to modify the request body, so the default director is not enough.
proxy.Director = func(req *http.Request) {
req.Host = p.target.Host
req.URL.Scheme = p.target.Scheme
req.URL.Host = p.target.Host
}
proxy.ModifyResponse = func(resp *http.Response) error {
// Log errors from ChromaDB for easier debugging.
if resp.StatusCode >= 400 {
slog.Warn("Upstream ChromaDB error",
"status_code", resp.StatusCode,
"request_uri", resp.Request.URL.RequestURI(),
)
}
return nil
}
return func(c *gin.Context) {
tenantID, exists := c.Get("tenant_id")
if !exists {
// This should theoretically never happen if the auth middleware is applied.
c.JSON(http.StatusInternalServerError, gin.H{"error": "Tenant ID not found in context"})
return
}
		// We only need to modify requests that carry a 'where' filter in the body.
		// Because this handler is registered on a wildcard route (/api/v1/*path),
		// c.FullPath() returns the wildcard pattern, so match on the actual request
		// path instead. A more robust implementation would enumerate the specific
		// ChromaDB endpoints; for now, we focus on the main query endpoint.
		if c.Request.Method != http.MethodPost || !strings.HasSuffix(c.Request.URL.Path, "/query") {
			proxy.ServeHTTP(c.Writer, c.Request)
			return
		}
bodyBytes, err := io.ReadAll(c.Request.Body)
if err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "Failed to read request body"})
return
}
c.Request.Body.Close() // Important to close the original body
var requestPayload map[string]interface{}
if err := json.Unmarshal(bodyBytes, &requestPayload); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid JSON body"})
return
}
// The core security logic.
err = injectTenantFilter(requestPayload, tenantID.(string))
if err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
modifiedBody, err := json.Marshal(requestPayload)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to marshal modified body"})
return
}
// Replace the request body with our modified version.
c.Request.Body = io.NopCloser(bytes.NewBuffer(modifiedBody))
c.Request.ContentLength = int64(len(modifiedBody))
c.Request.Header.Set("Content-Length", fmt.Sprint(len(modifiedBody)))
proxy.ServeHTTP(c.Writer, c.Request)
}
}
// injectTenantFilter modifies the ChromaDB query payload to enforce tenancy.
func injectTenantFilter(payload map[string]interface{}, tenantID string) error {
tenantFilter := map[string]interface{}{"tenant_id": tenantID}
whereClause, exists := payload["where"]
if !exists {
// Case 1: No existing 'where' clause. Simply add our tenant filter.
payload["where"] = tenantFilter
return nil
}
whereClauseMap, ok := whereClause.(map[string]interface{})
if !ok {
return fmt.Errorf("'where' clause is not a valid JSON object")
}
// Case 2: A 'where' clause already exists. We must wrap it with an '$and'.
// This prevents a malicious user from using '$or' to query other tenants' data.
newWhereClause := map[string]interface{}{
"$and": []interface{}{
tenantFilter,
whereClauseMap,
},
}
payload["where"] = newWhereClause
slog.Info("Injected tenant filter into ChromaDB query", "tenant_id", tenantID)
return nil
}
A critical piece of defensive programming is to ensure any existing where clause is wrapped in an $and operation with our mandatory tenant filter. This is non-negotiable.
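The same wrapping should also be applied beyond /query. ChromaDB's v1 API accepts a where clause on its get and delete endpoints as well (verify this against the version you deploy), so leaving those routes un-rewritten would reopen the isolation gap for reads and deletions. A sketch of how the path check in the handler could be broadened, with the endpoint suffixes treated as assumptions:

// needsTenantFilter reports whether a request body should be rewritten.
// The endpoint suffixes are assumptions about the deployed ChromaDB API version.
func needsTenantFilter(method, path string) bool {
	if method != http.MethodPost {
		return false
	}
	for _, suffix := range []string{"/query", "/get", "/delete"} {
		if strings.HasSuffix(path, suffix) {
			return true
		}
	}
	return false
}

The handler would then call needsTenantFilter(c.Request.Method, c.Request.URL.Path) in place of its inline check; injectTenantFilter works unchanged, since all three endpoints use the same where shape.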
5. Unit Testing the Security Logic
Trusting this logic without tests is professional malpractice. We must write unit tests that cover the core injectTenantFilter function, including edge cases.
// internal/proxy/handler_test.go
package proxy
import (
"encoding/json"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestInjectTenantFilter(t *testing.T) {
tenantID := "tenant-123"
testCases := []struct {
name string
inputPayload string
expectedPayload string
expectError bool
}{
{
name: "no existing where clause",
inputPayload: `{"query_texts": ["some query"]}`,
expectedPayload: `{
"query_texts": ["some query"],
"where": {"tenant_id": "tenant-123"}
}`,
},
{
name: "with existing where clause",
inputPayload: `{"query_texts": ["some query"], "where": {"status": "active"}}`,
expectedPayload: `{
"query_texts": ["some query"],
"where": {
"$and": [
{"tenant_id": "tenant-123"},
{"status": "active"}
]
}
}`,
},
{
name: "with malicious $or clause",
inputPayload: `{"query_texts": ["some query"], "where": {"$or": [{"owner": "me"}, {"tenant_id": "other-tenant"}]}}`,
expectedPayload: `{
"query_texts": ["some query"],
"where": {
"$and": [
{"tenant_id": "tenant-123"},
{"$or": [{"owner": "me"}, {"tenant_id": "other-tenant"}]}
]
}
}`,
},
{
name: "invalid where clause type",
inputPayload: `{"where": "not-an-object"}`,
expectError: true,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
var payload map[string]interface{}
err := json.Unmarshal([]byte(tc.inputPayload), &payload)
require.NoError(t, err)
err = injectTenantFilter(payload, tenantID)
if tc.expectError {
assert.Error(t, err)
} else {
assert.NoError(t, err)
var expected map[string]interface{}
err = json.Unmarshal([]byte(tc.expectedPayload), &expected)
require.NoError(t, err)
assert.Equal(t, expected, payload)
}
})
}
}
These tests prove that our rewriting logic is sound and handles the cases we care about, including preventing trivial bypasses.
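For completeness, the cmd/main.go entry point listed in the project structure was never shown. Here is a minimal sketch of how the pieces wire together, with the -config flag name chosen to match the docker-compose command below; the rest is an assumption consistent with the packages above, not the original code.

// cmd/main.go (sketch)
package main

import (
	"flag"
	"log"

	"github.com/gin-gonic/gin"

	"github.com/your-org/iam-chroma-proxy/internal/auth"
	"github.com/your-org/iam-chroma-proxy/internal/config"
	"github.com/your-org/iam-chroma-proxy/internal/proxy"
)

func main() {
	configPath := flag.String("config", "config.yaml", "path to the configuration file")
	flag.Parse()

	cfg, err := config.Load(*configPath)
	if err != nil {
		log.Fatalf("loading config: %v", err)
	}

	validator, err := auth.NewJWTValidator(cfg.IAM.JWKSURL, cfg.IAM.Audience, cfg.IAM.Issuer)
	if err != nil {
		log.Fatalf("creating JWT validator: %v", err)
	}

	chromaProxy, err := proxy.NewChromaProxy(cfg.ChromaDB.TargetURL)
	if err != nil {
		log.Fatalf("creating ChromaDB proxy: %v", err)
	}

	router := gin.Default()
	// Every ChromaDB-bound request passes through authentication, then the rewriting proxy.
	router.Any("/api/v1/*path", validator.AuthMiddleware(), chromaProxy.HandleProxy())

	if err := router.Run(":" + cfg.Server.Port); err != nil {
		log.Fatalf("server error: %v", err)
	}
}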
6. Dockerization for DigitalOcean
The final step is packaging the application for deployment. A multi-stage Dockerfile keeps the final image lean.
# Dockerfile
# ---- Build Stage ----
FROM golang:1.21-alpine AS builder
WORKDIR /app
# Copy go.mod and go.sum files to download dependencies
COPY go.mod go.sum ./
RUN go mod download
# Copy the source code
COPY . .
# Build the application
RUN CGO_ENABLED=0 GOOS=linux go build -o /iam-chroma-proxy ./cmd/main.go
# ---- Final Stage ----
FROM alpine:latest
# It's a good practice to run as a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
WORKDIR /home/appuser
# Copy the binary from the builder stage
COPY --from=builder /iam-chroma-proxy .
# Copy configuration
COPY config.yaml .
# Expose the server port
EXPOSE 8080
# Run the application
CMD ["./iam-chroma-proxy"]
For local development and production deployment, a docker-compose.yml file ties everything together.
# docker-compose.yml
version: '3.8'
services:
chromadb:
image: chromadb/chroma:latest
# The key is to NOT expose the port to the host machine,
# keeping it within the Docker network.
# ports:
# - "8000:8000" # DO NOT DO THIS IN PRODUCTION
volumes:
- chromadb_data:/chroma/.chroma/
proxy:
build: .
ports:
- "8080:8080" # This is the only publicly exposed port
environment:
# Pass GCP credentials for Firestore securely
GOOGLE_APPLICATION_CREDENTIALS: /run/secrets/gcp_creds.json
secrets:
- gcp_creds.json
depends_on:
- chromadb
command: ["./iam-chroma-proxy", "-config", "config.yaml"]
volumes:
chromadb_data:
secrets:
gcp_creds.json:
file: ./path/to/your/gcp-credentials.json
This configuration, when deployed to a DigitalOcean Droplet with Docker installed, creates the exact isolated network environment required by our architecture.
Limitations and Future Considerations
This proxy architecture solves the immediate problem of multi-tenant data isolation in ChromaDB, but it is not without its own set of trade-offs and concerns. First, it introduces a single point of failure and a potential performance bottleneck. The Go proxy is fast, but it still adds a network hop and processing overhead to every query. For a high-throughput system, this proxy would need to be horizontally scaled behind a load balancer, and its performance would require continuous monitoring.
Second, the current implementation lacks sophisticated observability. Production-grade code would require structured logging with request tracing, and Prometheus metrics for latency, request rates, and error counts per tenant. This is essential for debugging and capacity planning.
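As a concrete example of the per-tenant instrumentation meant here, a small Gin middleware using the Prometheus Go client (github.com/prometheus/client_golang, an extra dependency not in the go.mod above) might look like this; the metric names and label set are illustrative, and per-tenant labels carry their own cardinality cost:

// internal/metrics/middleware.go (sketch)
package metrics

import (
	"strconv"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name: "proxy_request_duration_seconds",
		Help: "Latency of proxied ChromaDB requests.",
	}, []string{"tenant_id", "status"})

	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "proxy_requests_total",
		Help: "Total proxied ChromaDB requests.",
	}, []string{"tenant_id", "status"})
)

// Middleware records latency and request counts, labelled by tenant and HTTP status.
func Middleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		start := time.Now()
		c.Next()

		tenant, _ := c.Get("tenant_id")
		tenantID, _ := tenant.(string)
		if tenantID == "" {
			tenantID = "unknown"
		}
		status := strconv.Itoa(c.Writer.Status())

		requestDuration.WithLabelValues(tenantID, status).Observe(time.Since(start).Seconds())
		requestsTotal.WithLabelValues(tenantID, status).Inc()
	}
}

The metrics endpoint itself (promhttp.Handler()) should be exposed only on an internal port, never through the public load balancer.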
Third, this solution is fundamentally a workaround for a missing feature in the underlying database. If a future version of ChromaDB introduces native, robust, role-based access control, this entire service becomes technical debt. Any team implementing such a pattern must keep a close eye on the database’s feature roadmap to avoid maintaining a complex component that is no longer necessary.
Finally, we have not addressed administrative access. A support engineer or system administrator might need to query data across all tenants. This requires a separate, highly secured “backdoor” path that can bypass the tenant injection logic based on a special administrative JWT role, a feature that must be designed and implemented with extreme care.