The developer productivity metric was in the red. A simple task, like finding the authoritative guide for setting up a legacy microservice, could consume an entire morning. Our internal documentation was scattered across Confluence, SharePoint, and thousands of Git repository READMEs. Keyword search was failing us; developers needed answers, not just a list of documents containing a specific term. This pain point, discussed repeatedly in our Scrum retrospectives, became the catalyst for a new internal tool: a semantic search engine for our entire engineering knowledge base.
Our initial concept during the first sprint was to enhance our existing Elasticsearch setup by simply improving its indexing, but it quickly became apparent this was a dead end. We needed to understand the intent behind a query like “how to handle payment transaction rollbacks” and match it to conceptually similar content, even if the keywords didn’t align perfectly. This is the domain of vector search.
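To make that concrete, here is a minimal sketch of the idea, using the same all-MiniLM-L6-v2 model our backend later adopts (the document titles here are made up): a query and a document can score as close neighbors even when they share almost no terms.
# A toy comparison of embedding similarity (illustrative titles only).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')  # same model the backend uses

query = "how to handle payment transaction rollbacks"
docs = [
    "Compensating actions for failed billing operations",  # related concept, few shared keywords
    "Setting up the VPN client on a new laptop",            # unrelated
]

# Cosine similarity between the query vector and each document vector.
scores = util.cos_sim(model.encode(query), model.encode(docs))[0].tolist()
for doc, score in zip(docs, scores):
    print(f"{score:.2f}  {doc}")
The conceptually related document should come out on top despite the lack of keyword overlap, which is exactly what keyword search could not give us.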
The technology selection process was contentious. While some advocated for a managed cloud service, our organization’s infrastructure is strictly on-premise, managed entirely by a mature, if somewhat rigid, Puppet ecosystem. This immediately ruled out solutions that didn’t offer a clean, self-hostable deployment path. We landed on Qdrant. Its performance benchmarks were impressive, but the deciding factors were its straightforward, single-binary deployment model and its robust filtering capabilities, which we anticipated needing for access control. For securing the service, JWT was the obvious, non-negotiable standard within our company. The real challenge wasn’t picking the components, but integrating these modern tools into our established, Puppet-driven infrastructure.
The Foundation: Managing Qdrant with Puppet
The first hurdle was getting a production-ready Qdrant instance provisioned declaratively. In our environment, “production-ready” means managed by Puppet, with configuration, service lifecycle, and data persistence all defined as code. A common mistake is to just exec a docker run command in Puppet. This is not idempotent and offers poor state management. A more robust approach involves using Puppet to manage the Docker container as a systemd service.
This required a custom Puppet class. The goal was to manage the Qdrant configuration file, the data directory, and the systemd service unit that would run the container.
Here is the core Puppet class, profile::qdrant_server, that we developed.
# Class: profile::qdrant_server
#
# Manages a Qdrant vector database instance running in Docker.
#
# @param ensure The state of the service (e.g., 'running', 'stopped').
# @param enable Whether the service should be enabled at boot.
# @param docker_image The full Docker image name for Qdrant.
# @param data_dir The host path for persistent Qdrant data.
# @param config_dir The host path for Qdrant configuration.
# @param service_api_key The API key for securing the Qdrant instance.
# @param log_level The logging level for the Qdrant instance.
#
class profile::qdrant_server (
String $ensure = 'running',
Boolean $enable = true,
String $docker_image = 'qdrant/qdrant:v1.6.0',
Stdlib::Absolutepath $data_dir = '/opt/qdrant/data',
Stdlib::Absolutepath $config_dir = '/etc/qdrant',
String $service_api_key = '', # This should be sourced from Hiera/Secrets management
Enum['error', 'warn', 'info', 'debug', 'trace'] $log_level = 'info',
) {
# Ensure dependencies are managed. In a real-world project, this would be more robust.
# We assume the 'docker' module is available and has set up the Docker daemon.
# We also assume a user/group for running the service exists.
require docker
# 1. Manage the configuration and data directories on the host.
# Puppet ensures these directories exist with the correct permissions before
# attempting to mount them into the container.
file { $data_dir:
ensure => directory,
owner => 'qdrant',
group => 'qdrant',
mode => '0750',
}
file { $config_dir:
ensure => directory,
owner => 'root',
group => 'root',
mode => '0755',
}
# 2. Template out the Qdrant configuration file.
# This allows us to manage Qdrant's internal settings directly from Puppet.
# The API key is critical for production security.
file { "${config_dir}/config.yaml":
ensure => file,
owner => 'root',
group => 'root',
mode => '0644',
content => template('profile/qdrant/config.yaml.erb'),
notify => Service['qdrant'], # Restart the service if the config changes.
}
# 3. Manage the systemd service unit file.
# This defines how systemd will start, stop, and manage the Qdrant Docker container.
# Using systemd gives us robust lifecycle management, automatic restarts, and logging integration.
file { '/etc/systemd/system/qdrant.service':
ensure => file,
owner => 'root',
group => 'root',
mode => '0644',
content => template('profile/qdrant/qdrant.service.erb'),
notify => Service['qdrant'],
}
# 4. Manage the service itself.
# This is the core resource that ensures the Qdrant container is running.
# It depends on the config and systemd unit file being in place.
service { 'qdrant':
ensure => $ensure,
enable => $enable,
hasstatus => true,
provider => 'systemd',
subscribe => File['/etc/systemd/system/qdrant.service'],
}
}
The accompanying ERB templates are critical. The configuration template injects the API key and other settings directly from Puppet variables, which in a real environment would be securely sourced from Hiera.
Template templates/profile/qdrant/config.yaml.erb:
# This file is managed by Puppet.
#
service:
# The API key is essential for preventing unauthorized access.
api_key: "<%= @service_api_key %>"
storage:
# Configuration for storage and indexing
storage_path: "/qdrant/storage"
snapshots_path: "/qdrant/snapshots"
on_disk_payload: true
log_level: "<%= @log_level %>"
Template templates/profile/qdrant/qdrant.service.erb:
# This file is managed by Puppet.
#
[Unit]
Description=Qdrant Vector Database
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
Restart=always
ExecStartPre=-/usr/bin/docker stop qdrant
ExecStartPre=-/usr/bin/docker rm qdrant
ExecStartPre=/usr/bin/docker pull <%= @docker_image %>
ExecStart=/usr/bin/docker run --rm \
--name qdrant \
-p 6333:6333 \
-p 6334:6334 \
-v <%= @data_dir %>:/qdrant/storage \
-v <%= @config_dir %>/config.yaml:/qdrant/config/production.yaml \
<%= @docker_image %>
[Install]
WantedBy=multi-user.target
This Puppet code gave us a reliable, repeatable way to deploy and configure Qdrant. The pitfall here is managing Docker image updates; the ExecStartPre=/usr/bin/docker pull line ensures the pinned image tag is re-pulled on every service restart, which is a simple but effective strategy for our needs. More complex canary deployments would require a more sophisticated orchestration layer.
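A final note on secrets: the service_api_key parameter is never committed in plain text. A minimal sketch of the Hiera data backing the class, assuming hiera-eyaml for encryption (the node file name and the ciphertext are placeholders):
# hieradata/nodes/search01.example.com.yaml -- illustrative only
profile::qdrant_server::docker_image: 'qdrant/qdrant:v1.6.0'
profile::qdrant_server::log_level: 'info'
profile::qdrant_server::service_api_key: 'ENC[PKCS7,...]'  # hiera-eyaml ciphertext placeholder
Puppet's automatic parameter lookup resolves these keys at catalog compile time, so nothing sensitive appears in the manifest or in the role that declares the class.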
The Brains: A JWT-Secured Backend API
With the database layer handled, the next sprint focused on the API that would sit between the frontend and Qdrant. We chose FastAPI (Python) for its performance and ease of use. This service had three core responsibilities:
- Secure endpoints using JWT.
- Accept a natural language query from the frontend.
- Convert the query to a vector embedding and search Qdrant.
Securing the API was paramount. We defined a JWT structure that included not just user identity but also their team and role, which would be used for filtering search results later.
# file: app/security.py
import os
from datetime import datetime, timedelta
from typing import Optional, Dict, Any
import jwt
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
# In a real project, these should be loaded from a secure config service.
SECRET_KEY = os.environ.get("JWT_SECRET_KEY", "a-very-secret-key")
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 60
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
to_encode = data.copy()
if expires_delta:
expire = datetime.utcnow() + expires_delta
else:
expire = datetime.utcnow() + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
to_encode.update({"exp": expire})
encoded_jwt = jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
return encoded_jwt
async def get_current_user(token: str = Depends(oauth2_scheme)) -> Dict[str, Any]:
"""
Dependency to decode JWT and extract user payload.
This would be called for any protected endpoint.
"""
credentials_exception = HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Could not validate credentials",
headers={"WWW-Authenticate": "Bearer"},
)
try:
payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
username: str = payload.get("sub")
# You can also extract custom claims like roles or teams
user_role: str = payload.get("role")
if username is None or user_role is None:
raise credentials_exception
# In a real-world project, you'd likely fetch a full user object from a DB.
# For our purposes, the payload is sufficient.
return {"username": username, "role": user_role}
except jwt.PyJWTError:
raise credentials_exception
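For completeness, this is roughly how a token gets minted on login: the sub and role claims line up with what get_current_user expects, and the team claim mentioned earlier simply rides along for later use. The user values here are hypothetical.
# Illustrative only: issuing a token with the claims our API reads.
from datetime import timedelta
from app.security import create_access_token

token = create_access_token(
    data={"sub": "jdoe", "role": "developer", "team": "payments"},  # hypothetical user
    expires_delta=timedelta(minutes=60),
)
# The frontend stores this and sends it as: Authorization: Bearer <token>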
The search endpoint integrated the sentence-transformers library for creating embeddings and the qdrant-client for interacting with our Puppet-managed instance. A common mistake is to instantiate the embedding model or the Qdrant client on every request. This is incredibly inefficient. The correct approach is to initialize them once at application startup.
# file: app/main.py
import os
import logging
from functools import lru_cache
from fastapi import FastAPI, Depends, HTTPException, status
from pydantic import BaseModel
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
from app.security import get_current_user
# --- Application Setup ---
# Configure logging for production visibility
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
app = FastAPI()
# --- Model and Client Initialization ---
# Using lru_cache is a simple way to ensure these expensive objects
# are created only once.
@lru_cache(maxsize=1)
def get_embedding_model():
logger.info("Loading sentence-transformer model...")
# Using a pre-trained model. A key future iteration is to fine-tune this on our own code/docs.
model = SentenceTransformer('all-MiniLM-L6-v2')
logger.info("Model loaded successfully.")
return model
@lru_cache(maxsize=1)
def get_qdrant_client():
qdrant_host = os.environ.get("QDRANT_HOST", "localhost")
qdrant_api_key = os.environ.get("QDRANT_API_KEY") # This MUST match the key in Puppet
if not qdrant_api_key:
logger.error("QDRANT_API_KEY environment variable not set.")
# Fail fast if security is not configured.
raise ValueError("QDRANT_API_KEY must be provided.")
logger.info(f"Connecting to Qdrant at {qdrant_host}...")
client = QdrantClient(host=qdrant_host, port=6333, api_key=qdrant_api_key)
return client
# --- API Models ---
class SearchQuery(BaseModel):
query: str
top_k: int = 5
class SearchResult(BaseModel):
id: str
score: float
document_url: str
title: str
snippet: str
# --- API Endpoints ---
@app.post("/search", response_model=list[SearchResult])
async def search_documents(
query: SearchQuery,
current_user: dict = Depends(get_current_user)
):
"""
Performs semantic search on the internal knowledge base.
"""
model = get_embedding_model()
qdrant = get_qdrant_client()
collection_name = "internal_docs"
try:
# 1. Generate query vector
query_vector = model.encode(query.query).tolist()
# 2. Build search filter based on user role from JWT
# This is where the integration of JWT and Qdrant shines.
# We can restrict search results based on document permissions.
user_role = current_user.get("role", "developer")
search_filter = models.Filter(
must=[
models.FieldCondition(
key="accessible_to",
match=models.MatchValue(value=user_role)
)
]
)
# 3. Perform the search against Qdrant
hits = qdrant.search(
collection_name=collection_name,
query_vector=query_vector,
query_filter=search_filter,
limit=query.top_k,
with_payload=True # Include the document metadata in the results
)
# 4. Format the results for the frontend
results = []
for hit in hits:
payload = hit.payload or {}
results.append(
SearchResult(
id=str(hit.id),
score=hit.score,
document_url=payload.get("url", ""),
title=payload.get("title", "No Title"),
snippet=payload.get("text", "")[:200] + "..." # Generate a snippet
)
)
return results
except Exception as e:
logger.error(f"Search failed for user {current_user['username']}: {e}", exc_info=True)
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="An error occurred during the search process."
)
# Simple health check endpoint
@app.get("/health")
def health_check():
return {"status": "ok"}
This API structure is lean but production-grade. It includes security, proper resource management, error handling, and logging. The integration of role-based access control directly into the Qdrant query filter was a key architectural decision that simplified our business logic significantly.
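One thing the API code takes for granted is the internal_docs collection itself, with an accessible_to field on every point for the role filter. Our real batch ingestion jobs handle crawling and chunking, which are out of scope here, but a stripped-down sketch of that side might look like the following; the document values and file path are illustrative, and it assumes the same qdrant-client and sentence-transformers libraries used above.
# file: ingest/upsert_docs.py -- simplified sketch of one ingestion step
import os
import uuid
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(
    host=os.environ.get("QDRANT_HOST", "localhost"),
    port=6333,
    api_key=os.environ["QDRANT_API_KEY"],
)

collection = "internal_docs"

# Create the collection once, sized to the embedding model's output.
existing = {c.name for c in client.get_collections().collections}
if collection not in existing:
    client.create_collection(
        collection_name=collection,
        vectors_config=models.VectorParams(
            size=model.get_sentence_embedding_dimension(),
            distance=models.Distance.COSINE,
        ),
    )
    # Index the payload field used by the role-based search filter.
    client.create_payload_index(
        collection_name=collection,
        field_name="accessible_to",
        field_schema=models.PayloadSchemaType.KEYWORD,
    )

doc = {  # One crawled document chunk (illustrative values).
    "url": "https://confluence.internal/display/PAY/rollbacks",
    "title": "Payment transaction rollbacks",
    "text": "To roll back a failed payment transaction...",
    "accessible_to": "developer",
}

client.upsert(
    collection_name=collection,
    points=[models.PointStruct(
        id=str(uuid.uuid4()),
        vector=model.encode(doc["text"]).tolist(),
        payload=doc,  # "url", "title", "text", "accessible_to" are read by the search API
    )],
)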
The Face: A Functional Frontend
The final piece of the puzzle, tackled in a parallel sprint, was the frontend. We used React with TypeScript. The core challenge wasn’t complex state management but rather creating a seamless and responsive user experience while handling authentication correctly.
The JWT flow is standard: on login, the token is received and stored securely (e.g., in an HttpOnly cookie managed by the auth service, or in memory). For every request to our search API, this token must be attached as an Authorization header.
Here’s a simplified view of the search component and the API service that powers it.
// file: src/services/api.ts
import axios from 'axios';
// A single, configured Axios instance is better than scattering fetch() calls.
const apiClient = axios.create({
baseURL: process.env.REACT_APP_API_URL || 'http://localhost:8000',
});
// Interceptor to attach the JWT to every outgoing request.
apiClient.interceptors.request.use(
(config) => {
// In a real app, the token would be retrieved from a state manager or secure storage.
const token = localStorage.getItem('authToken');
if (token) {
config.headers.Authorization = `Bearer ${token}`;
}
return config;
},
(error) => {
return Promise.reject(error);
}
);
// Define types for our API communication
interface SearchPayload {
query: string;
top_k?: number;
}
export interface SearchResult {
id: string;
score: number;
document_url: string;
title: string;
snippet: string;
}
// The actual API call function
export const performSearch = async (payload: SearchPayload): Promise<SearchResult[]> => {
try {
const response = await apiClient.post<SearchResult[]>('/search', payload);
return response.data;
} catch (error) {
// A robust implementation would have more nuanced error handling,
// especially for 401 Unauthorized errors (e.g., triggering a re-login).
if (axios.isAxiosError(error) && error.response?.status === 401) {
console.error("Authentication error. Please log in again.");
// Redirect to login or refresh token
} else {
console.error("An unexpected error occurred:", error);
}
throw error; // Re-throw to be caught by the component
}
};
The React component uses this service to fetch and display data. We focused on providing immediate feedback to the user (loading states) and handling potential API errors gracefully.
// file: src/components/SearchComponent.tsx
import React, { useState, FormEvent } from 'react';
import { performSearch, SearchResult } from '../services/api';
const SearchComponent: React.FC = () => {
const [query, setQuery] = useState('');
const [results, setResults] = useState<SearchResult[]>([]);
const [isLoading, setIsLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const handleSearch = async (e: FormEvent) => {
e.preventDefault();
if (!query.trim()) return;
setIsLoading(true);
setError(null);
setResults([]);
try {
const data = await performSearch({ query, top_k: 10 });
setResults(data);
} catch (err) {
setError('Failed to fetch search results. Please try again later.');
} finally {
setIsLoading(false);
}
};
return (
<div className="search-container">
<form onSubmit={handleSearch}>
<input
type="text"
value={query}
onChange={(e) => setQuery(e.target.value)}
placeholder="Ask about our architecture, code, or processes..."
disabled={isLoading}
/>
<button type="submit" disabled={isLoading}>
{isLoading ? 'Searching...' : 'Search'}
</button>
</form>
{error && <div className="error-message">{error}</div>}
<div className="results-list">
{results.map((result) => (
<div key={result.id} className="result-item">
<h3>
<a href={result.document_url} target="_blank" rel="noopener noreferrer">
{result.title}
</a>
<span className="score">(Score: {result.score.toFixed(2)})</span>
</h3>
<p>{result.snippet}</p>
</div>
))}
</div>
</div>
);
};
export default SearchComponent;
This combination of a typed API service and a functional component provides a solid, maintainable foundation. The Scrum process was invaluable here; early feedback from our pilot group during a sprint review led us to add the result scores and snippets, which weren’t in the initial design but proved essential for users to gauge result relevance quickly.
System Architecture Overview
After several sprints of development, testing, and refinement, the final architecture was clear and effective for our specific constraints.
graph TD
    A[Frontend Developer on Browser] -- HTTPS --> B{NGINX Reverse Proxy};
    B -- 1. Search Query (with JWT) --> C[FastAPI Search Service];
    C -- 2. Validate JWT --> D[Auth Service / IdP];
    C -- 3. Query to Vector --> E[SentenceTransformer Model];
    C -- 4. Vector Search (with API Key) --> F[Qdrant Instance];
    F -- Manages --> G[On-Disk Vector/Payload Data];
    subgraph Puppet Managed Host
        C;
        F;
        G;
    end
    H[Puppet Master] -- Applies Catalog --> I[Target Node];
    subgraph Target Node
        I -- Configures --> C;
        I -- Configures & Runs --> F;
    end
    style F fill:#f9f,stroke:#333,stroke-width:2px;
    style C fill:#ccf,stroke:#333,stroke-width:2px;
    style H fill:#f0c674,stroke:#333,stroke-width:2px;
This solution isn’t perfect, but it’s a significant leap forward. The data ingestion pipeline remains a series of batch jobs, which introduces latency between a document’s creation and its appearance in search results. The next major effort will be to build a real-time ingestion pipeline using CDC or webhooks. Furthermore, our Puppet-based deployment is robust but static. As usage grows, we may face scaling challenges that would be more elegantly solved in a Kubernetes environment with Horizontal Pod Autoscaling, but migrating our core infrastructure is a much larger strategic decision. The current embedding model is also generic; fine-tuning a model on our internal corpus is the logical next step for improving relevance.