The initial proof-of-concept was deceptively simple. An ASP.NET Core API endpoint that took a text query, embedded it, and fired it off to a Qdrant instance running in Docker. It worked beautifully for public data, but the project’s real value lay in indexing sensitive internal documentation—engineering diagrams, financial reports, and legal contracts. The moment this requirement became concrete, our straightforward semantic search service turned into a significant security and operational challenge. The core pain point wasn’t just authenticating users at the edge, but ensuring a verifiable, auditable, and secure data path from the user’s request all the way to the vector shards in Qdrant, managed entirely through a declarative GitOps workflow.
Our first, naive approach was to put the Qdrant API key in our `appsettings.json` and call it a day. This was immediately rejected. A single compromised service pod would grant an attacker access to the entire vector index. This is a classic violation of Zero Trust principles, which mandate that trust is never implicit and must be continuously verified. We needed a multi-layered defense: network-level isolation, application-level context awareness, and a fully auditable deployment pipeline.
The Evolving Architecture: From API Keys to mTLS and Traced Context
The initial concept was to build a security-aware proxy service in front of Qdrant. This service would handle authentication and inject metadata filters into queries to enforce tenant isolation. While viable, it added another network hop and a new component to maintain. A more pragmatic solution was to bake this logic directly into our ASP.NET Core service but secure the transport layer itself.
This led to our technology selection refinement:
- ASP.NET Core: Remained our application framework of choice. Its robust middleware pipeline and first-class gRPC support were critical.
- Qdrant: We committed to using its gRPC interface exclusively. HTTP/REST is fine for debugging, but gRPC with mutual TLS (mTLS) provides a far superior security posture for service-to-service communication. It guarantees both encryption-in-transit and, crucially, client authentication. Qdrant can’t just be called by anything on the network; it must be called by a client presenting a valid, trusted certificate.
- Flux CD with SOPS: Our GitOps toolchain. The challenge was managing the mTLS certificates and Qdrant API keys. Storing them in plaintext in Git is a cardinal sin. We integrated Mozilla’s SOPS (Secrets OPerations) into our Flux pipeline. This allows us to encrypt secrets files with PGP or cloud KMS keys, commit the encrypted file to Git, and have Flux decrypt them on-the-fly inside the cluster. This provides a declarative, auditable, and secure secret management workflow.
- Apache SkyWalking: Initially used for performance tracing, its role evolved. We discovered its `Baggage` feature, a mechanism for propagating key-value pairs across process boundaries along with the primary trace context. This was our "aha!" moment. We could inject the user's identity and tenant claims from their JWT into the SkyWalking baggage at the API ingress. Every downstream operation, including the call to Qdrant, would carry this security context within its trace, giving us an impeccable audit trail and a way to enforce access control deep within the application logic.
Here is the final architecture we implemented.
```mermaid
graph TD
    subgraph Git Repository
        A[App Manifests]
        B[Encrypted Secrets via SOPS]
    end
    subgraph Kubernetes Cluster
        C(Flux CD Controller) -- Watches --> A & B
        C -- Deploys/Reconciles --> D{ASP.NET Core Pod}
        D -- Mounts Decrypted Secret --> E[mTLS Client Cert]
        D -- Contains --> F[SkyWalking Agent]
        subgraph "Secure gRPC Channel (mTLS)"
            D -- gRPC Request w/ Client Cert --> G{Qdrant Pod}
        end
        G -- Requires Client Cert --> D
    end
    U[User w/ JWT] -- HTTPS Request --> D
    D -- Parses JWT --> F
    F -- Injects Claims to Baggage --> D
    D -- Logs with TraceID --> L[Logging Backend]
    F -- Reports Traces --> S[SkyWalking OAP]
    style G fill:#f9f,stroke:#333,stroke-width:2px
    style D fill:#ccf,stroke:#333,stroke-width:2px
```
Phase 1: The Secure gRPC Channel between ASP.NET Core and Qdrant
Before anything else, we had to lock down the network path. This involved generating certificates, configuring Qdrant to require them, and configuring our ASP.NET Core client to present them.
First, let's generate the necessary certificates using `openssl`. In a real-world project, you would use a proper Certificate Authority (CA), perhaps managed by `cert-manager` in Kubernetes, but for a self-contained example, self-signed certificates are sufficient to prove the mechanism.
```bash
# 1. Generate a private key and a self-signed certificate for our CA
openssl genrsa -out ca.key 4096
openssl req -new -x509 -key ca.key -sha256 -subj "/CN=DemoCA" -days 365 -out ca.crt

# 2. Generate the server private key and CSR for Qdrant
openssl genrsa -out qdrant.key 4096
openssl req -new -key qdrant.key -sha256 -subj "/CN=qdrant.default.svc.cluster.local" -out qdrant.csr

# 3. Sign the server CSR with our CA
openssl x509 -req -in qdrant.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out qdrant.crt -days 365 -sha256

# 4. Generate the client private key and CSR for our ASP.NET service
openssl genrsa -out vector-service.key 4096
openssl req -new -key vector-service.key -sha256 -subj "/CN=vector-service.default.svc.cluster.local" -out vector-service.csr

# 5. Sign the client CSR with our CA
openssl x509 -req -in vector-service.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out vector-service.crt -days 365 -sha256

# 6. Combine client key and cert into a PFX for easy use in .NET
openssl pkcs12 -export -out vector-service.pfx -inkey vector-service.key -in vector-service.crt -certfile ca.crt -password pass:YourStrongPasswordHere
```
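It's worth sanity-checking the chain before wiring anything up; certificate mismatches produce opaque TLS handshake errors later. A quick check, assuming the files above sit in the current directory:

```bash
# Verify that both leaf certificates actually chain back to our CA
openssl verify -CAfile ca.crt qdrant.crt vector-service.crt

# Inspect subject, issuer, and validity window if anything looks off
openssl x509 -in vector-service.crt -noout -subject -issuer -dates
```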
Next, we configure Qdrant to use these certificates. In a production Kubernetes setup, this would be managed via a ConfigMap and a Secret. For local testing, we can use a `docker-compose.yml`.
```yaml
# docker-compose.yml
version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant:v1.7.4
    ports:
      - "6333:6333" # REST
      - "6334:6334" # gRPC
    volumes:
      - ./qdrant-data:/qdrant/storage
      - ./certs/qdrant.key:/qdrant/config/tls.key
      - ./certs/qdrant.crt:/qdrant/config/tls.crt
      - ./certs/ca.crt:/qdrant/config/ca.crt
      - ./qdrant-config.yaml:/qdrant/config/production.yaml
    command: ["./qdrant", "--config-path", "/qdrant/config/production.yaml"]
```
The Qdrant configuration file is where we enforce mTLS.
```yaml
# qdrant-config.yaml
log_level: INFO
service:
  grpc_port: 6334
  enable_tls: true
  # This is the key part for mTLS: require and verify client
  # certificates against the CA file configured below
  verify_https_client_certificate: true
tls:
  cert: /qdrant/config/tls.crt
  key: /qdrant/config/tls.key
  # The CA used to validate client certificates
  ca_cert: /qdrant/config/ca.crt
```
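A quick way to prove the handshake requirement works is `grpcurl`, assuming it's installed and the hostname resolves (e.g., via an `/etc/hosts` entry for local testing). Qdrant exposes a `qdrant.Qdrant/HealthCheck` RPC; if server reflection isn't available in your Qdrant version, you would additionally pass the `.proto` files:

```bash
# Should fail: no client certificate presented
grpcurl -cacert ca.crt \
  qdrant.default.svc.cluster.local:6334 qdrant.Qdrant/HealthCheck

# Should succeed: client certificate and key presented
grpcurl -cacert ca.crt \
  -cert vector-service.crt -key vector-service.key \
  qdrant.default.svc.cluster.local:6334 qdrant.Qdrant/HealthCheck
```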
Now for the ASP.NET Core service. The core challenge is getting the `Qdrant.Client` gRPC channel to present our client certificate. We do this by customizing the `HttpClientHandler` used by the channel. In `Program.cs`:
```csharp
// Program.cs
using System.Security.Cryptography.X509Certificates;
using Grpc.Core.Interceptors;
using Grpc.Net.Client;
using Qdrant.Client;
using Qdrant.Client.Grpc;

var builder = WebApplication.CreateBuilder(args);

// Configuration for the Qdrant client
builder.Services.Configure<QdrantOptions>(builder.Configuration.GetSection("Qdrant"));
var qdrantOptions = builder.Configuration.GetSection("Qdrant").Get<QdrantOptions>()!;

// Register the Qdrant client with a custom mTLS handler
builder.Services.AddSingleton(provider =>
{
    var logger = provider.GetRequiredService<ILogger<Program>>();
    var handler = new HttpClientHandler();

    // Load the client certificate. In K8s, this path would be from a mounted secret,
    // e.g., /etc/qdrant-certs/vector-service.pfx
    var clientCertPath = qdrantOptions.ClientCertPath;
    var clientCertPassword = qdrantOptions.ClientCertPassword;

    if (string.IsNullOrEmpty(clientCertPath))
    {
        // A common mistake is to fail silently. In a real-world project,
        // this should be a hard failure at startup.
        logger.LogCritical("Qdrant client certificate path is not configured. mTLS is required.");
        throw new InvalidOperationException("Qdrant client certificate path cannot be null or empty.");
    }

    try
    {
        var clientCertificate = new X509Certificate2(clientCertPath, clientCertPassword);
        handler.ClientCertificates.Add(clientCertificate);
    }
    catch (Exception ex)
    {
        logger.LogCritical(ex, "Failed to load Qdrant client certificate from {Path}", clientCertPath);
        throw;
    }

    // Optional: if the server cert's CN doesn't match the hostname (e.g., in Docker),
    // you might need custom server certificate validation. In Kubernetes with proper
    // service names, this is usually not required.
    // handler.ServerCertificateCustomValidationCallback = (message, cert, chain, errors) =>
    //     errors == System.Net.Security.SslPolicyErrors.None;

    var channel = GrpcChannel.ForAddress(qdrantOptions.Url, new GrpcChannelOptions
    {
        HttpHandler = handler
    });

    // If Qdrant is also configured with an API key (useful even with mTLS as a
    // second factor), it expects it in the "api-key" metadata header on every
    // call; attach it via a client interceptor.
    var callInvoker = channel.Intercept(metadata =>
    {
        if (!string.IsNullOrEmpty(qdrantOptions.ApiKey))
        {
            metadata.Add("api-key", qdrantOptions.ApiKey);
        }
        return metadata;
    });

    return new QdrantClient(new QdrantGrpcClient(callInvoker));
});

builder.Services.AddControllers();

var app = builder.Build();
app.MapControllers();
app.Run();

public class QdrantOptions
{
    public string Url { get; set; } = string.Empty;
    public string ApiKey { get; set; } = string.Empty;
    public string ClientCertPath { get; set; } = string.Empty;
    public string ClientCertPassword { get; set; } = string.Empty;
}
```
This code block establishes a gRPC channel where every request to Qdrant is encrypted and authenticated using our client certificate. If a rogue service without the correct certificate tries to connect, the TLS handshake will fail at the transport layer, long before any Qdrant logic is executed.
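For local development against the docker-compose setup, the matching `Qdrant` configuration section might look like the sketch below. The values are placeholders; never commit real secrets here (`dotnet user-secrets` is a better local home, and in the cluster these values arrive via environment variables from the SOPS-managed secret, as shown in Phase 2):

```json
{
  "Qdrant": {
    "Url": "https://qdrant.default.svc.cluster.local:6334",
    "ApiKey": "qdrant-secret-api-key",
    "ClientCertPath": "/etc/qdrant-certs/vector-service.pfx",
    "ClientCertPassword": "YourStrongPasswordHere"
  }
}
```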
Phase 2: GitOps with Flux CD and SOPS for Secret Management
Managing certificates and API keys manually is not scalable or secure. This is where our GitOps workflow comes in.
First, we encrypt our secrets. We'll create a Kubernetes `Secret` manifest and encrypt it with SOPS using a PGP key.
```bash
# Generate a PGP key for SOPS (do this once and store securely)
gpg --full-generate-key

# Get the fingerprint
GPG_FINGERPRINT=$(gpg --list-keys --with-colons | grep 'fpr' | head -1 | cut -d: -f10)

# Create a Kubernetes Secret YAML file with our certs and API key
# The actual file contents are base64 encoded
cat <<EOF > vector-service-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: vector-service-qdrant-certs
  namespace: default
type: Opaque
data:
  # echo -n 'YourStrongPasswordHere' | base64
  pfx-password: WW91clN0cm9uZ1Bhc3N3b3JkSGVyZQ==
  # cat vector-service.pfx | base64
  vector-service.pfx: MIIKWgIBAzCCCj8GCSqGSIb3DQEHAaCCCjAEggocMIIKGDCC...
  # echo -n 'qdrant-secret-api-key' | base64
  api-key: cWRyYW50LXNlY3JldC1hcGkta2V5
EOF

# Encrypt the file using SOPS
sops --encrypt --pgp $GPG_FINGERPRINT --in-place vector-service-secrets.yaml
```
The resulting `vector-service-secrets.yaml` is now safe to commit to Git. It contains encrypted data but retains its structure.
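To make encryption repeatable for everyone on the team, a `.sops.yaml` at the repository root can pin the rules (the regexes here are illustrative). The `encrypted_regex` keeps the Secret's `metadata` readable in diffs while encrypting only the sensitive fields:

```yaml
# .sops.yaml (repository root)
creation_rules:
  - path_regex: .*-secrets\.yaml$
    encrypted_regex: ^(data|stringData)$
    pgp: <your GPG fingerprint here>
```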
Next, we configure Flux CD to deploy our application. This involves a `Kustomization` that points to our application's manifests and is configured to use the SOPS decryptor.
```yaml
# ./clusters/production/flux-system/gotk-sync.yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: production-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/your-org/your-repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: vector-service
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/vector-service/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: production-repo
  # This is the crucial part for SOPS integration
  decryption:
    provider: sops
    secretRef:
      name: sops-gpg # A secret in flux-system containing the private GPG key
```
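The `sops-gpg` secret referenced above must exist in the cluster before the Kustomization can reconcile. Following the standard Flux recipe, the private key is exported and stored once, out-of-band:

```bash
# Export the private GPG key into a secret the kustomize-controller can read
gpg --export-secret-keys --armor "$GPG_FINGERPRINT" |
  kubectl create secret generic sops-gpg \
    --namespace=flux-system \
    --from-file=sops.asc=/dev/stdin
```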
Finally, our application's `Deployment` manifest mounts the decrypted secret and passes the configuration to the application via environment variables.
```yaml
# ./apps/vector-service/production/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vector-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vector-service
  template:
    metadata:
      labels:
        app: vector-service
    spec:
      containers:
        - name: vector-service-api
          image: your-registry/vector-service:1.2.3
          ports:
            - containerPort: 8080
          env:
            - name: ASPNETCORE_URLS
              value: "http://+:8080"
            - name: Qdrant__Url
              value: "https://qdrant.default.svc.cluster.local:6334"
            - name: Qdrant__ClientCertPath
              value: "/etc/qdrant-certs/vector-service.pfx"
            - name: Qdrant__ClientCertPassword
              # Value from our k8s secret
              valueFrom:
                secretKeyRef:
                  name: vector-service-qdrant-certs
                  key: pfx-password
            - name: Qdrant__ApiKey
              # Value from our k8s secret
              valueFrom:
                secretKeyRef:
                  name: vector-service-qdrant-certs
                  key: api-key
            # SkyWalking agent configuration
            - name: SKYWALKING__SERVICENAME
              value: VectorSearchService
            - name: SKYWALKING__COLLECTORBACKENDSERVICES
              value: skywalking-oap.observability.svc.cluster.local:11800
            - name: DOTNET_ADDITIONAL_DEPS
              value: /skywalking/additional-deps
            - name: DOTNET_SHARED_STORE
              value: /skywalking/shared-store
            - name: DOTNET_STARTUP_HOOKS
              value: /skywalking/SkyAPM.DotNet.Core.StartupHook.dll
          volumeMounts:
            - name: qdrant-certs-volume
              mountPath: "/etc/qdrant-certs"
              readOnly: true
            - name: skywalking-agent
              mountPath: /skywalking
      volumes:
        - name: qdrant-certs-volume
          secret:
            secretName: vector-service-qdrant-certs
        # An initContainer copies the SkyWalking agent into this volume (see below)
        - name: skywalking-agent
          emptyDir: {}
```
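The comment above glosses over how the agent files land in the `emptyDir` volume. A minimal sketch of that `initContainer`, added under the pod template's `spec` and assuming a hypothetical helper image `your-registry/skyapm-dotnet-agent` that packages the agent files under `/agent`:

```yaml
      initContainers:
        - name: copy-skywalking-agent
          # Hypothetical image bundling the SkyAPM .NET agent files under /agent
          image: your-registry/skyapm-dotnet-agent:latest
          command: ["sh", "-c", "cp -r /agent/. /skywalking/"]
          volumeMounts:
            - name: skywalking-agent
              mountPath: /skywalking
```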
This declarative approach ensures that our secrets are managed securely, with a full audit trail in Git. Any change to a certificate or API key is a Git commit, which can be reviewed and approved.
Phase 3: Propagating Security Context with SkyWalking
With the transport layer secure and secrets managed, the final piece was application-level context. We needed to ensure that a query for “tenant-A” could never see data from “tenant-B”.
We use standard JWT-based authentication at the edge. An ASP.NET Core middleware parses the token and extracts claims like `tenant_id` and `user_id`. The pitfall here is simply passing these as method parameters: it's easy to forget, and there's no audit trail of why a certain filter was applied.
This is where SkyWalking's `Baggage` comes in. We inject the security claims into the baggage, and this context then travels with every span in the distributed trace.
First, let’s define a middleware to extract claims and populate the baggage.
```csharp
// SecurityContextMiddleware.cs
using SkyApm.Tracing;
using SkyApm.Tracing.Segments;

public class SecurityContextMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IEntrySegmentContextAccessor _contextAccessor;
    private readonly ILogger<SecurityContextMiddleware> _logger;

    public SecurityContextMiddleware(RequestDelegate next, IEntrySegmentContextAccessor contextAccessor, ILogger<SecurityContextMiddleware> logger)
    {
        _next = next;
        _contextAccessor = contextAccessor;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // In a real app, you would validate the JWT here.
        // For this example, we assume a validated ClaimsPrincipal exists.
        var tenantIdClaim = context.User.FindFirst("tenant_id");
        var userIdClaim = context.User.FindFirst("user_id");

        var segmentContext = _contextAccessor.Context;
        if (segmentContext == null)
        {
            // No active trace segment (e.g., agent not attached); nothing to enrich.
            await _next(context);
            return;
        }
        var traceId = segmentContext.TraceId;

        if (tenantIdClaim != null)
        {
            // The keys for baggage items must be configured in the agent config
            // to be propagated, e.g., baggage.header.keys=tenant_id,user_id
            segmentContext.Span.AddBaggage("tenant_id", tenantIdClaim.Value);
            _logger.LogInformation(
                "TraceID: {TraceId} - Injected tenant_id '{TenantId}' into SkyWalking baggage.",
                traceId, tenantIdClaim.Value);
        }
        else
        {
            _logger.LogWarning(
                "TraceID: {TraceId} - Request received without a 'tenant_id' claim.",
                traceId);
        }

        if (userIdClaim != null)
        {
            segmentContext.Span.AddBaggage("user_id", userIdClaim.Value);
        }

        await _next(context);
    }
}
```
We register this middleware in `Program.cs` after authentication and authorization.
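Ordering matters here: the middleware reads `context.User`, so it must run after `UseAuthentication`. A minimal excerpt of how that registration might look:

```csharp
// Program.cs (excerpt)
app.UseAuthentication();
app.UseAuthorization();

// Runs after auth so that context.User is populated with validated claims
app.UseMiddleware<SecurityContextMiddleware>();

app.MapControllers();
```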
Now, in our service layer that communicates with Qdrant, we can retrieve this context and use it to build secure queries.
```csharp
// VectorSearchService.cs
using Qdrant.Client;
using Qdrant.Client.Grpc;
using SkyApm.Tracing;

public class VectorSearchService
{
    private readonly QdrantClient _qdrantClient;
    private readonly IExitSegmentContextAccessor _contextAccessor;
    private readonly ILogger<VectorSearchService> _logger;

    public VectorSearchService(QdrantClient qdrantClient, IExitSegmentContextAccessor contextAccessor, ILogger<VectorSearchService> logger)
    {
        _qdrantClient = qdrantClient;
        _contextAccessor = contextAccessor;
        _logger = logger;
    }

    public async Task<IReadOnlyList<ScoredPoint>> SearchAsync(string collectionName, float[] queryVector)
    {
        var traceId = _contextAccessor.Context?.TraceId;

        // Retrieve the tenant_id from the baggage.
        // This context has been propagated automatically by SkyWalking.
        var tenantId = _contextAccessor.Context?.GetBaggage("tenant_id");

        if (string.IsNullOrEmpty(tenantId))
        {
            // This is a critical security failure. A request that should be tenant-specific
            // has reached the data access layer without a tenant context.
            // Log it with high severity and reject the request.
            _logger.LogError(
                "TraceID: {TraceId} - Attempted to perform a search without a tenant_id in the SkyWalking baggage. Denying request.",
                traceId);
            throw new UnauthorizedAccessException("Search operation is not allowed without a valid tenant context.");
        }

        _logger.LogInformation(
            "TraceID: {TraceId} - Performing search for tenant_id '{TenantId}'.",
            traceId, tenantId);

        // Build a filter to enforce tenant isolation at the database level.
        // This ensures that even if there's a bug in our logic, we can't leak data.
        var filter = new Filter
        {
            Must =
            {
                new Condition
                {
                    Field = new FieldCondition
                    {
                        Key = "metadata.tenant_id",
                        Match = new Match { Integer = long.Parse(tenantId) } // Assuming tenant_id is numeric
                    }
                }
            }
        };

        try
        {
            // Tenant filter is applied server-side, alongside the vector query
            return await _qdrantClient.SearchAsync(
                collectionName,
                queryVector,
                filter: filter,
                limit: 10,
                payloadSelector: new WithPayloadSelector { Enable = true });
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "TraceID: {TraceId} - An error occurred while searching Qdrant for tenant {TenantId}.", traceId, tenantId);
            throw;
        }
    }
}
```
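For completeness, the service still has to be registered and exposed. A minimal, illustrative wiring (the embedding step that produces `queryVector` is out of scope here, so this hypothetical endpoint accepts a raw vector):

```csharp
// Program.cs (excerpt): register the service in DI
builder.Services.AddScoped<VectorSearchService>();
```

```csharp
// SearchController.cs (illustrative sketch)
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class SearchController : ControllerBase
{
    private readonly VectorSearchService _searchService;

    public SearchController(VectorSearchService searchService) => _searchService = searchService;

    // Accepts a raw query vector for brevity; a real endpoint would embed a text query first.
    [HttpPost("{collectionName}")]
    public async Task<IActionResult> Search(string collectionName, [FromBody] float[] queryVector)
        => Ok(await _searchService.SearchAsync(collectionName, queryVector));
}
```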
This closes the loop. We now have a system where:
- Transport is Secured: Unauthorized services cannot even connect to Qdrant.
- Secrets are Managed: All sensitive material is encrypted at rest in Git and managed declaratively.
- Context is Propagated: Every request carries an auditable security context.
- Access is Enforced: The application logic uses this context to enforce fine-grained data isolation.
When we look at a trace in the SkyWalking UI for a slow or failed request, we don't just see the service calls and database timings; we see the `tenant_id` and `user_id` associated with that exact request, which is invaluable for both debugging and security forensics.
Lingering Issues and Future Iterations
This implementation provides a robust security posture, but it's not the end of the road. A key limitation is that our tenant isolation logic resides entirely within our ASP.NET Core application. While the mTLS layer prevents unauthorized services from connecting, a severe bug within our `VectorSearchService` could still bypass the filter logic. To achieve true defense-in-depth, the next evolution would involve leveraging Qdrant Enterprise's built-in Role-Based Access Control, where we could create per-tenant API keys and have our service use the appropriate key based on the JWT context.
Additionally, our certificate management relies on manually generated certs encrypted with SOPS. A more mature implementation would integrate `cert-manager` into our Kubernetes cluster to automatically issue and rotate certificates from a CA like Let's Encrypt or an internal HashiCorp Vault instance. Flux could then be configured to watch the secrets created by `cert-manager` and trigger rolling updates on our application pods when certificates are renewed. This would eliminate manual certificate lifecycle management, which is a common source of production outages.