A security audit on our multi-tenant platform uncovered a critical flaw. Our Scala backend was forwarding search requests to a SolrCloud cluster, and while the application layer performed tenant checks for most operations, the search endpoint was a gaping hole. A sufficiently clever user, or a malicious actor with a compromised low-privilege API key, could construct a raw Solr query that bypassed application-level checks, potentially exposing data from all tenants. The initial implementation was dangerously simple, relying on developers to “do the right thing” in their service logic.
// The vulnerable, initial implementation
import org.apache.solr.client.solrj.impl.HttpSolrClient
import org.apache.solr.client.solrj.SolrQuery
import scala.util.Try
import scala.jdk.CollectionConverters._

// A simplified service showing the direct proxy pattern
class InsecureLegacySearchService(solrClient: HttpSolrClient) {
  def search(collection: String, rawQuery: String, start: Int, rows: Int): Try[Seq[Map[String, AnyRef]]] = {
    Try {
      val solrQuery = new SolrQuery()
      solrQuery.setQuery(rawQuery) // Direct pass-through of user input!
      solrQuery.setStart(start)
      solrQuery.setRows(rows)
      val response = solrClient.query(collection, solrQuery)
      val docs = response.getResults.asScala.toList
      // Copy each document field by field. The map view returned by getFieldValuesMap
      // throws UnsupportedOperationException on entrySet(), which asScala.toMap relies on.
      docs.map(doc => doc.getFieldNames.asScala.map(name => name -> doc.getFieldValue(name)).toMap)
    }
  }
}
The vulnerability here is obvious in retrospect. A request with rawQuery="*:*" would dump everything, completely ignoring the tenancy model enforced elsewhere in the application.
Our first panicked reaction was to consider parsing the incoming query string within the Scala service. The idea was to inspect the abstract syntax tree of the Lucene query and reject or rewrite anything that seemed suspicious. This path was quickly abandoned. The pitfall here is immense: the Solr query syntax is incredibly rich, supporting nested queries, function queries, local params, and various parsers (edismax, standard, etc.). Building a parser in Scala that could safely and correctly handle every possible valid query without introducing new vulnerabilities or breaking legitimate functionality would be more complex than the search feature itself. It’s a classic example of a solution that creates more problems than it solves. In a real-world project, this approach is a maintenance nightmare waiting to happen.
The correct architectural choice was to leverage Solr’s own powerful filtering mechanism but enforce its application from a centralized, non-bypassable layer in our Scala backend. The goal became to ensure that every single query executed against Solr, regardless of its origin within our codebase, would have a mandatory security filter attached, derived from the caller’s cryptographically verified identity. We decided to use Solr’s filter query (fq) parameter. A common mistake is to append security constraints to the main query (q) parameter. This is incorrect because it complicates the user’s query logic and interferes with relevancy scoring. The fq parameter is ideal because it acts as a separate, mandatory filter that restricts the result set without affecting scores. Critically, fq results are independently cacheable by Solr via its filterCache, which is a massive performance benefit in a high-traffic system.
Our strategy crystallized: the Scala application would be responsible for authenticating the user, establishing their security context (e.g., their tenant ID, user roles, data access tiers), and then programmatically injecting this context as an inescapable fq parameter into every Solr query.
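To make the q versus fq distinction concrete, here is a minimal sketch of how the two parameters stay separate in the request. SearchParams is a hypothetical, simplified model of the request parameters (it is not a SolrJ type), and the tenant value is purely illustrative.

```scala
import java.net.URLEncoder

// Hypothetical, simplified model of the parameters sent to Solr's select handler.
final case class SearchParams(q: String, fq: List[String] = Nil) {
  // Adds a filter without touching the user's main query.
  def withFilter(filter: String): SearchParams = copy(fq = fq :+ filter)

  // Renders q once and every filter as its own fq parameter.
  def toQueryString: String = {
    val pairs = ("q" -> q) :: fq.map(f => "fq" -> f)
    pairs.map { case (k, v) => s"$k=${URLEncoder.encode(v, "UTF-8")}" }.mkString("&")
  }
}

object QVersusFqDemo {
  val userParams: SearchParams = SearchParams("laptops")
  // The security constraint travels as fq; the user's q is left exactly as submitted.
  val secured: SearchParams = userParams.withFilter("tenant_id:acme")
}
```

Because the user’s q is never rewritten, relevancy scoring for "laptops" is the same with or without the tenant constraint.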
Step 1: Schema-Level Security Anchors
Before writing any Scala code, the Solr schema itself had to be prepared to support these security constraints. Every document indexed into Solr must contain the fields our security filter will operate on. For our multi-tenant system, this meant adding a tenant_id field and a multi-valued access_groups field to our schema.xml.
<!-- In managed-schema or schema.xml -->
<!--
The tenant_id is a non-negotiable anchor for data isolation.
It should be indexed but not typically stored unless needed for debugging.
Using a string is flexible, but a more optimized type could be used
if tenant IDs follow a strict pattern.
-->
<field name="tenant_id" type="string" indexed="true" stored="false" required="true" docValues="true" />
<!--
access_groups is multiValued, allowing a document to be visible
to multiple user groups, roles, or tiers within the same tenant.
This provides fine-grained control beyond simple tenancy.
-->
<field name="access_groups" type="string" indexed="true" stored="false" required="false" multiValued="true" docValues="true" />
<!--
Using docValues="true" is a production-grade optimization. It stores the field
values in a column-oriented structure that is highly efficient for faceting,
sorting, and grouping. The term lookups performed by our security fq are served
by the inverted index, which is why both fields are also declared indexed="true".
-->
With this schema, every document we index must declare its owner (tenant_id) and visibility scope (access_groups). This moves the security definition to index time, making query-time enforcement deterministic and fast.
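One way to enforce this at index time is a small guard in front of the indexer. The following is only a sketch under our own assumptions: IndexableDoc and IndexGuard are hypothetical names, and the resulting field map would still need to be copied into a SolrInputDocument by the real indexing code.

```scala
// Hypothetical in-application representation of a document about to be indexed.
final case class IndexableDoc(
  id: String,
  tenantId: String,
  accessGroups: Set[String],
  fields: Map[String, String]
)

object IndexGuard {
  // Field names that only the guard itself is allowed to set.
  private val reserved = Set("id", "tenant_id", "access_groups")

  // Returns the fields to index, or an error if the tenant anchor is missing.
  def toSolrFields(doc: IndexableDoc): Either[String, Map[String, Any]] =
    if (doc.tenantId.trim.isEmpty)
      Left(s"Document '${doc.id}' has no tenant_id; refusing to index it")
    else {
      val anchors: Map[String, Any] = Map("id" -> doc.id, "tenant_id" -> doc.tenantId)
      // An absent access_groups field means "public within the tenant".
      val groups: Map[String, Any] =
        if (doc.accessGroups.isEmpty) Map.empty
        else Map("access_groups" -> doc.accessGroups.toList.sorted)
      // Drop any caller-supplied values for reserved fields before adding ours.
      Right((doc.fields -- reserved) ++ anchors ++ groups)
    }
}
```

Stripping the reserved names from the free-form fields means a caller cannot smuggle in a spoofed tenant_id or access_groups value alongside the trusted ones.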
Step 2: Defining a Security Context in Scala
The next step was to create an immutable representation of a user’s permissions within the Scala application. This SecurityContext would be the single source of truth for constructing the Solr filter. In our system, this context is derived from a validated JWT, but it could come from any trusted source like a session store.
import org.typelevel.log4cats.Logger
import cats.effect.Sync
import cats.syntax.all._ // provides the *> operator used below
import scala.jdk.CollectionConverters._

/**
 * Represents the security identity of the caller.
 * This is an immutable, trusted object created after successful authentication.
 * It's intentionally kept simple and focused on what's needed for data filtering.
 *
 * @param tenantId The non-negotiable identifier for the tenant.
 * @param groups A set of roles, permissions, or access tiers the user belongs to.
 */
case class SecurityContext(tenantId: String, groups: Set[String])

object SecurityContext {
  // A factory to create a context from a validated claims set (e.g., from a JWT).
  // In a real-world project, this would involve cryptographic validation.
  def fromClaims[F[_]: Sync](claims: Map[String, Any])(implicit logger: Logger[F]): F[SecurityContext] = {
    val tenantIdOpt = claims.get("tid").collect { case s: String => s }
    // Type arguments are erased at runtime, so match on List[_] and keep only String entries.
    val groupsOpt = claims.get("groups").collect { case g: java.util.List[_] =>
      g.asScala.collect { case s: String => s }.toSet
    }
    (tenantIdOpt, groupsOpt) match {
      case (Some(tid), Some(grps)) if tid.nonEmpty =>
        Sync[F].pure(SecurityContext(tid, grps))
      case (Some(tid), None) if tid.nonEmpty => // Groups are optional
        Sync[F].pure(SecurityContext(tid, Set.empty))
      case _ =>
        val errorMsg = "Failed to construct SecurityContext: missing or invalid tenantId in claims."
        logger.warn(errorMsg) *> Sync[F].raiseError[SecurityContext](new SecurityException(errorMsg))
    }
  }
}

// Custom exception for clear error handling
class SecurityException(message: String) extends RuntimeException(message)
This SecurityContext is the bridge between our application’s IAM system and Solr’s filtering capabilities. The fromClaims factory includes basic validation and logging, ensuring that we fail fast and securely if the necessary identity information isn’t present.
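The same fail-fast idea can be expressed without an effect type, which is convenient for unit tests. The sketch below is a stricter variant than the factory above and rests on our own assumptions: ParsedContext and ClaimsParser are hypothetical names, and a groups claim that is present but malformed is rejected outright rather than ignored.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for SecurityContext so the sketch is self-contained.
final case class ParsedContext(tenantId: String, groups: Set[String])

object ClaimsParser {
  def parse(claims: Map[String, Any]): Either[String, ParsedContext] = {
    // The tenant claim is mandatory and must be a non-blank string.
    val tenant: Either[String, String] = claims.get("tid") match {
      case Some(s: String) if s.trim.nonEmpty => Right(s)
      case _                                  => Left("missing or invalid 'tid' claim")
    }
    // The groups claim is optional, but if present it must be a list of strings.
    val groups: Either[String, Set[String]] = claims.get("groups") match {
      case None => Right(Set.empty)
      case Some(l: java.util.List[_]) =>
        val items = l.asScala.toList
        if (items.forall(_.isInstanceOf[String])) Right(items.map(_.toString).toSet)
        else Left("'groups' claim contains non-string entries")
      case Some(_) => Left("'groups' claim is not a list")
    }
    for {
      t <- tenant
      g <- groups
    } yield ParsedContext(t, g)
  }
}
```

Whether a malformed groups claim should be rejected or treated as "no groups" is a policy decision; rejecting it is the more conservative choice shown here.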
Step 3: The Secure Query Interceptor
This is the core of the implementation. We created a SecureSolrQueryService that wraps the raw Solr client. It exposes a secureQuery method that accepts a user-provided SolrQuery and the mandatory SecurityContext. Its sole purpose is to inject the security filter query before dispatching the request to Solr.
import org.apache.solr.client.solrj.{SolrClient, SolrQuery}
import org.apache.solr.client.solrj.response.QueryResponse
import org.apache.solr.client.solrj.util.ClientUtils
import cats.effect.Sync
import cats.syntax.all._
import org.typelevel.log4cats.Logger

class SecureSolrQueryService[F[_]: Sync](solrClient: SolrClient)(implicit logger: Logger[F]) {
  /**
   * Executes a query against a Solr collection, but only after applying
   * mandatory document-level security filters derived from the SecurityContext.
   *
   * @param collection The Solr collection to query.
   * @param userQuery The original query from the user.
   * @param securityContext The validated security context of the caller.
   * @return The Solr query response.
   */
  def secureQuery(collection: String, userQuery: SolrQuery, securityContext: SecurityContext): F[QueryResponse] = {
    for {
      _ <- logger.info(s"Building secure query for tenant '${securityContext.tenantId}'")
      secureFq = buildSecurityFilter(securityContext)
      _ <- logger.debug(s"Generated security filter: $secureFq")
      // Clone the user query to avoid side effects. A common mistake is to mutate the original object.
      finalQuery = userQuery.getCopy()
      // Add the mandatory security filter. This is the crucial step.
      _ = finalQuery.addFilterQuery(secureFq)
      // Log final query for debugging, but be careful about logging sensitive info from `q`.
      _ <- logger.debug(s"Executing final secure Solr query: ${finalQuery.toQueryString}")
      // Execute the query. We wrap the blocking SolrJ call in Sync[F].blocking for effect management.
      response <- Sync[F].blocking(solrClient.query(collection, finalQuery))
    } yield response
  }

  /**
   * Constructs the Solr filter query (fq) string from the SecurityContext.
   * This logic is centralized and critical for security.
   *
   * The resulting filter must match:
   * 1. The document's tenant_id.
   * 2. Either a document with no access_groups (public within the tenant), or one whose
   *    access_groups contain at least one of the user's groups.
   *    If the user has no groups, we only match documents with no group requirements.
   *    A more restrictive policy might be to deny access if the user has no groups
   *    and the document requires one.
   *
   * @param ctx The security context.
   * @return A valid Solr fq string.
   */
  private def buildSecurityFilter(ctx: SecurityContext): String = {
    // Escape tenant ID to prevent injection vulnerabilities within the filter itself.
    val escapedTenantId = ClientUtils.escapeQueryChars(ctx.tenantId)
    // Triple quotes avoid backslash-escaped quotes inside an interpolated string,
    // which older Scala 2 compilers do not accept.
    val tenantFilter = s"""tenant_id:"$escapedTenantId""""
    val groupFilter = if (ctx.groups.isEmpty) {
      // If the user has no groups, they can only see documents that are not restricted to any group.
      // This is represented by a document where the `access_groups` field does not exist.
      // `(*:* NOT access_groups:[* TO *])` is a common Solr idiom for this.
      "(*:* NOT access_groups:[* TO *])"
    } else {
      // If the user has groups, they can see documents that are either public
      // (no access_groups) OR have an access_group that matches one of the user's groups.
      val escapedGroups = ctx.groups.map(g => s""""${ClientUtils.escapeQueryChars(g)}"""")
      val groupsClause = escapedGroups.mkString(" OR ")
      s"""(access_groups:($groupsClause) OR (*:* NOT access_groups:[* TO *]))"""
    }
    // Combine filters with a mandatory AND clause.
    s"+($tenantFilter) +($groupFilter)"
  }
}
This service encapsulates the entire security logic. Several key decisions were made here for robustness:
- Immutability: We use userQuery.getCopy() to avoid mutating the original query object, preventing unexpected side effects in other parts of the application.
- Fail-Safe Logic: The buildSecurityFilter method is the heart of the enforcement. It ensures that the tenant_id constraint is always present and handles the logic for group-based access control. The logic for users with no groups is a critical detail; here, we decided they can only see “public” documents within their tenant.
- Input Escaping: A subtle but deadly pitfall is injection within the filter itself. If a tenantId or group contained characters like " or :, it could break the query syntax or change the filter’s meaning. We use ClientUtils.escapeQueryChars to neutralize this threat.
- Effect Management: The blocking solrClient.query call is wrapped in Sync[F].blocking, making it play nicely within a functional, asynchronous framework like Cats Effect. The call runs on the runtime’s blocking pool rather than the compute pool, which avoids starving compute threads under load.
The generated fq for a user in tenant acme-corp with roles editor and viewer would look something like this (the hyphen is backslash-escaped by escapeQueryChars):
+(tenant_id:"acme\-corp") +((access_groups:("editor" OR "viewer") OR (*:* NOT access_groups:[* TO *])))
This filter is now programmatically and mandatorily added to every query.
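The filter construction is easy to exercise without a Solr client. The following dependency-free sketch mirrors the logic of buildSecurityFilter under a few assumptions of ours: SecurityFilterSketch is a hypothetical name, its escape function is a local stand-in written to approximate ClientUtils.escapeQueryChars rather than the library call itself, and groups are sorted only to make the output deterministic.

```scala
object SecurityFilterSketch {
  // Characters treated as special by the query syntax; whitespace is escaped as well.
  private val special = "\\+-!():^[]\"{}~*?|&;/"

  // Local stand-in for SolrJ's ClientUtils.escapeQueryChars (an approximation).
  def escape(s: String): String =
    s.flatMap(c => if (special.indexOf(c) >= 0 || Character.isWhitespace(c)) "\\" + c else c.toString)

  private def quoted(s: String): String = "\"" + escape(s) + "\""

  // Matches documents that carry no access_groups field at all.
  private val noGroups = "(*:* NOT access_groups:[* TO *])"

  def build(tenantId: String, groups: Set[String]): String = {
    val tenantFilter = s"tenant_id:${quoted(tenantId)}"
    val groupFilter =
      if (groups.isEmpty) noGroups
      else {
        // Sorting keeps the generated string stable across calls.
        val anyGroup = groups.toList.sorted.map(quoted).mkString(" OR ")
        s"(access_groups:($anyGroup) OR $noGroups)"
      }
    s"+($tenantFilter) +($groupFilter)"
  }
}
```

A hostile value such as x" OR y is rendered by escape as x\"\ OR\ y, so the embedded quote and spaces stay inside the quoted term instead of closing it early.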
Step 4: Architecture Visualization and Integration
The final data flow is clean and secure. A Mermaid diagram illustrates this revised architecture:
sequenceDiagram
    participant Client
    participant ScalaService as Scala Service (e.g., Akka HTTP / http4s)
    participant Auth as Authentication Middleware
    participant SecureSolr as SecureSolrQueryService
    participant Solr
    Client->>+ScalaService: GET /search?q=laptops
    ScalaService->>+Auth: Validate JWT
    Auth-->>-ScalaService: Return SecurityContext(tenantId="acme-corp", groups={"editor"})
    ScalaService->>+SecureSolr: secureQuery(userQuery, securityContext)
    Note over SecureSolr: Original q="laptops"
    Note over SecureSolr: Builds fq="+tenant_id:\"acme-corp\" ..."
    Note over SecureSolr: Adds fq to a copy of userQuery
    SecureSolr->>+Solr: query(q="laptops", fq="+tenant_id:...", ...)
    Solr-->>-SecureSolr: QueryResponse (only acme-corp documents)
    SecureSolr-->>-ScalaService: Return QueryResponse
    ScalaService-->>-Client: 200 OK (JSON results)
This diagram shows how the SecureSolrQueryService acts as a chokepoint. No query can reach Solr without passing through this security enrichment step.
Step 5: Rigorous Integration Testing
Trusting this logic without testing is professional negligence. We used Testcontainers to spin up an ephemeral Solr instance for our integration tests, allowing us to validate the end-to-end behavior in a hermetic environment.
// A conceptual test using MUnit and Testcontainers-scala.
// Note: ForAllTestContainer is the ScalaTest mixin; with MUnit, the suite traits from the
// testcontainers-scala MUnit module may be needed instead, depending on the library version.
import com.dimafeng.testcontainers.{ForAllTestContainer, SolrContainer}
import org.apache.solr.client.solrj.{SolrClient, SolrQuery}
import org.apache.solr.client.solrj.impl.HttpSolrClient
import org.apache.solr.common.SolrInputDocument
import cats.effect.unsafe.implicits.global // runtime required by unsafeRunSync
import scala.jdk.CollectionConverters._

class SecureSolrQueryServiceSpec extends munit.FunSuite with ForAllTestContainer {
  override val container: SolrContainer = SolrContainer("solr:8.11")

  // Test setup: create collection and index documents for different tenants
  def setupTestData(client: SolrClient): Unit = {
    // Tenant A docs
    val doc1 = new SolrInputDocument()
    doc1.setField("id", "tA_doc1")
    doc1.setField("tenant_id", "tenant-A")
    doc1.setField("content_t", "This is a public document in tenant A")
    client.add("test_collection", doc1)
    // Tenant A doc with group restriction
    val doc2 = new SolrInputDocument()
    doc2.setField("id", "tA_doc2")
    doc2.setField("tenant_id", "tenant-A")
    doc2.setField("access_groups", "admin")
    doc2.setField("content_t", "This is an admin document in tenant A")
    client.add("test_collection", doc2)
    // Tenant B docs
    val doc3 = new SolrInputDocument()
    doc3.setField("id", "tB_doc1")
    doc3.setField("tenant_id", "tenant-B")
    doc3.setField("content_t", "This is a document in tenant B")
    client.add("test_collection", doc3)
    client.commit("test_collection")
  }

  test("A user from tenant A should only see documents from tenant A") {
    val solrClient = new HttpSolrClient.Builder(s"http://${container.host}:${container.solrPort}/solr").build()
    setupTestData(solrClient) // In a real suite, this would be in a beforeAll
    val service = new SecureSolrQueryService[cats.effect.IO](solrClient)(/*... logger ...*/)
    val tenantAContext = SecurityContext("tenant-A", Set.empty)
    val queryAll = new SolrQuery("*:*")
    val resultsIO = service.secureQuery("test_collection", queryAll, tenantAContext)
    val results = resultsIO.unsafeRunSync() // For testing simplicity
    // Only the public tenant-A document is visible to a user with no groups.
    assertEquals(results.getResults.getNumFound, 1L)
    assertEquals(results.getResults.get(0).getFieldValue("id"), "tA_doc1")
  }

  test("An admin user from tenant A should see both public and admin docs") {
    val solrClient = new HttpSolrClient.Builder(s"http://${container.host}:${container.solrPort}/solr").build()
    // Assuming setupTestData has been run
    val service = new SecureSolrQueryService[cats.effect.IO](solrClient)(/*... logger ...*/)
    val tenantAAdminContext = SecurityContext("tenant-A", Set("admin"))
    val queryAll = new SolrQuery("*:*")
    val results = service.secureQuery("test_collection", queryAll, tenantAAdminContext).unsafeRunSync()
    assertEquals(results.getResults.getNumFound, 2L)
    val ids = results.getResults.asScala.map(_.getFieldValue("id").toString).toSet
    assertEquals(ids, Set("tA_doc1", "tA_doc2"))
  }

  test("A query must be rejected if context is invalid (conceptual)") {
    // This test would be at the HTTP layer, ensuring the SecurityContext.fromClaims
    // failure propagates to a 401/403 response, preventing any Solr query.
    // ... test logic here ...
  }
}
These tests prove that the security filter works as designed, correctly isolating data between tenants and respecting group-based permissions. The tests are a crucial part of the development process, not an afterthought.
This architecture has proven to be robust and performant. The security logic is centralized, testable, and completely decoupled from the individual business features that need to perform searches. However, the solution is not without its limitations. The complexity of the buildSecurityFilter method will grow as more authorization dimensions are added (e.g., project IDs, data sensitivity labels). This can make the filter logic harder to reason about and maintain. Furthermore, this approach puts the onus of creating the SecurityContext correctly on the application layer. Any bug in the authentication middleware that creates an incorrect context could lead to a security breach, although the mandatory tenant isolation provides a strong baseline of protection. For future iterations, we are exploring the possibility of using an external policy engine like Open Policy Agent (OPA) to generate the Solr filter fragments, which would further decouple the authorization logic from our Scala codebase.
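One possible way to keep the filter logic manageable as dimensions are added, short of adopting an external policy engine, is to model filter fragments as a small data type and render them in one place. This is only a sketch of that idea, not something we have in production: Filter, Term, Missing, AnyOf, AllOf, and FilterRenderer are hypothetical names, project_id is an illustrative field that is not part of the schema shown earlier, and escape is again a local approximation of ClientUtils.escapeQueryChars.

```scala
// Small algebra of filter fragments; each authorization dimension contributes one value.
sealed trait Filter
final case class Term(field: String, value: String) extends Filter
final case class Missing(field: String) extends Filter
final case class AnyOf(filters: List[Filter]) extends Filter
final case class AllOf(filters: List[Filter]) extends Filter

object FilterRenderer {
  private val special = "\\+-!():^[]\"{}~*?|&;/"

  // Local approximation of SolrJ's ClientUtils.escapeQueryChars.
  private def escape(s: String): String =
    s.flatMap(c => if (special.indexOf(c) >= 0 || Character.isWhitespace(c)) "\\" + c else c.toString)

  // Escaping and quoting happen here only, so individual policies cannot forget them.
  def render(filter: Filter): String = filter match {
    case Term(field, value) => field + ":\"" + escape(value) + "\""
    case Missing(field)     => s"(*:* NOT $field:[* TO *])"
    case AnyOf(fs) =>
      require(fs.nonEmpty, "AnyOf needs at least one clause")
      fs.map(render).mkString("(", " OR ", ")")
    case AllOf(fs) =>
      require(fs.nonEmpty, "AllOf needs at least one clause")
      fs.map(f => "+" + render(f)).mkString("(", " ", ")")
  }
}

object FilterDemo {
  // Tenant isolation, group visibility, and a hypothetical project dimension.
  val policy: Filter = AllOf(List(
    Term("tenant_id", "acme-corp"),
    AnyOf(List(Term("access_groups", "editor"), Missing("access_groups"))),
    Term("project_id", "p-42")
  ))
  val rendered: String = FilterRenderer.render(policy)
}
```

Here FilterDemo.rendered is (+tenant_id:"acme\-corp" +(access_groups:"editor" OR (*:* NOT access_groups:[* TO *])) +project_id:"p\-42"). A structure like this could also serve as the target format if filter fragments were later produced by a policy engine.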