Propagating SAML Assertions to a gRPC-Go Service Mesh via a Custom Token Service on Docker Swarm


The mandate from the security team was unequivocal: all internal administrative tools must authenticate against the corporate SAML Identity Provider (IdP). This presented an immediate architectural conflict with our existing infrastructure. Our internal ecosystem consisted of dozens of high-throughput gRPC-Go microservices, orchestrated via Docker Swarm on a fleet of GCP Compute Engine instances. These services communicated directly, server-to-server, with no concept of a user session or browser.

SAML’s redirect-based web flow is fundamentally designed for user-interactive sessions. A client is bounced between a Service Provider (SP) and an IdP to establish trust, culminating in a digitally signed XML assertion posted back to the SP. This model breaks down completely for programmatic gRPC clients or scripts needing to interact with the service mesh. A direct SAML validation on every gRPC call would be computationally expensive and introduce unacceptable latency. The core problem was bridging the world of enterprise web SSO with a modern, high-performance internal RPC framework.

Our initial concept was to build a security gateway, a central point of contact that would handle the SAML flow and then pass requests inward. However, this would create a bottleneck and a single point of failure. A more distributed approach was needed. We settled on designing a dedicated, lightweight “SAML-to-Internal-Token” translation service. The flow would be as follows:

  1. A human operator authenticates against the corporate IdP via a simple, protected web UI fronting our new token service.
  2. The token service acts as a SAML SP, receives the SAML assertion, and validates it.
  3. Upon successful validation, it extracts the user’s identity (e.g., email, group memberships) from the assertion.
  4. It then mints a short-lived, internally-scoped JWT, embedding this identity information as claims.
  5. This JWT is returned to the operator, who can then use it as a bearer token in the metadata of their gRPC client calls.
  6. Every gRPC service in the mesh would be equipped with a standard interceptor to validate this internal JWT before processing any request.

This architecture decouples the complex, stateful SAML dance from the stateless, high-performance gRPC interactions. The SAML flow happens once, and the resulting lightweight token is used for a configurable period, drastically reducing the authentication overhead on the backend services.
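
To make step 5 of the flow concrete, here is a minimal client-side sketch. The token environment variable, service address, and the commented-out stub call are all hypothetical; the only load-bearing detail is the authorization metadata key, which the server-side interceptor shown later expects.

// file: cmd/sample-client/main.go (illustrative sketch)

package main

import (
	"context"
	"log"
	"os"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/metadata"
)

func main() {
	// The operator pastes the JWT returned by the token service into an
	// environment variable before running the script.
	token := os.Getenv("INTERNAL_API_TOKEN")
	if token == "" {
		log.Fatal("INTERNAL_API_TOKEN is not set")
	}

	conn, err := grpc.Dial("user-profile-service:9090",
		grpc.WithTransportCredentials(insecure.NewCredentials())) // plaintext inside the encrypted overlay network
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Step 5: send the JWT as a bearer token in gRPC metadata.
	ctx = metadata.AppendToOutgoingContext(ctx, "authorization", "Bearer "+token)

	// client := pb.NewUserProfileClient(conn)                            // generated stub
	// resp, err := client.GetProfile(ctx, &pb.GetProfileRequest{...})    // consumes ctx
	_ = ctx
}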

The SAML Service Provider and Token Minting Service

The heart of the solution is the Go service that functions as both a SAML Service Provider and a JWT issuer. We chose the crewjam/saml library for its robustness in handling the intricacies of SAML XML processing and the golang-jwt/jwt/v4 library for token creation.

A critical part of any SAML integration is the metadata. Our SP service needs to expose its metadata for the corporate IdP to consume and must itself consume the IdP’s metadata to verify incoming assertions.

Here is the core configuration and initialization for the SAML SP component. This code would live within the main package of our token service.

// file: internal/saml/provider.go

package saml

import (
	"context"
	"crypto/rsa"
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"

	"github.com/crewjam/saml/samlsp"
	"github.com/sirupsen/logrus"
)

// Config holds all necessary configuration for the SAML Service Provider.
type Config struct {
	// SpBaseURL is the base URL of our service provider (e.g., https://auth.internal.corp.com)
	SpBaseURL string
	// IdpMetadataURL is the URL to the Identity Provider's metadata XML.
	IdpMetadataURL string
	// SpCertPath is the file path to the SP's public certificate.
	SpCertPath string
	// SpKeyPath is the file path to the SP's private key.
	SpKeyPath string
}

// NewSAMLProvider configures and returns a samlsp.Middleware instance.
// This middleware handles the entire SAML flow (redirects, assertion consumption).
func NewSAMLProvider(cfg Config, logger *logrus.Logger) (*samlsp.Middleware, error) {
	// Load the SP's key pair. This is used to sign requests and decrypt assertions.
	// In production, the key material itself should come from secure storage;
	// on Docker Swarm, these files are mounted as secrets under /run/secrets.
	spKey, err := loadRSAPrivateKey(cfg.SpKeyPath)
	if err != nil {
		return nil, fmt.Errorf("failed to load SP private key: %w", err)
	}

	spCert, err := loadX509Certificate(cfg.SpCertPath)
	if err != nil {
		return nil, fmt.Errorf("failed to load SP certificate: %w", err)
	}

	// Parse the IdP metadata URL. A common pitfall here is network connectivity
	// from your service to the IdP's metadata URL. It's often better to cache
	// the fetched metadata locally and refresh it periodically.
	idpMetadataURL, err := url.Parse(cfg.IdpMetadataURL)
	if err != nil {
		return nil, fmt.Errorf("invalid IdP metadata URL: %w", err)
	}

	// We create a custom HTTP client to allow for potential self-signed certs in dev
	// or specific trust configurations in production.
	httpClient := &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: false}, // Set to true only in trusted dev environments
		},
	}

	// Fetch and parse the IdP's metadata document.
	idpMetadata, err := samlsp.FetchMetadata(context.Background(), httpClient, *idpMetadataURL)
	if err != nil {
		return nil, fmt.Errorf("failed to fetch IdP metadata: %w", err)
	}

	rootURL, err := url.Parse(cfg.SpBaseURL)
	if err != nil {
		return nil, fmt.Errorf("invalid SP base URL: %w", err)
	}

	// The core SAML middleware setup.
	opts := samlsp.Options{
		URL:               *rootURL,
		Key:               spKey,
		Certificate:       spCert,
		IDPMetadata:       idpMetadata,
		AllowIDPInitiated: true, // Allow logins initiated from the IdP dashboard.
		SignRequest:       true, // Security best practice: always sign authentication requests.
	}
	samlSP, err := samlsp.New(opts)
	if err != nil {
		return nil, fmt.Errorf("failed to create SAML SP middleware: %w", err)
	}

	// The Session provider is crucial. It dictates how the user's authenticated
	// state is maintained after the SAML dance. The cookie provider stores the
	// session as a signed JWT in a cookie, so it is stateless and scales across
	// replicas without a shared session store.
	samlSP.Session = samlsp.CookieSessionProvider{
		Name:     "auth_session_token",
		Domain:   rootURL.Hostname(),
		Secure:   true,
		HTTPOnly: true,
		MaxAge:   time.Hour * 1, // Session cookie lifetime
		Codec:    samlsp.DefaultSessionCodec(opts),
	}

	logger.Info("SAML Service Provider initialized successfully")
	return samlSP, nil
}

func loadRSAPrivateKey(path string) (*rsa.PrivateKey, error) {
	keyData, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	block, _ := pem.Decode(keyData)
	if block == nil {
		return nil, fmt.Errorf("failed to decode PEM block from %s", path)
	}
	key, err := x509.ParsePKCS1PrivateKey(block.Bytes)
	if err != nil {
		// Try parsing as PKCS8
		pk, err2 := x509.ParsePKCS8PrivateKey(block.Bytes)
		if err2 != nil {
			return nil, fmt.Errorf("failed to parse private key: %v / %v", err, err2)
		}
		rsaKey, ok := pk.(*rsa.PrivateKey)
		if !ok {
			return nil, fmt.Errorf("key is not an RSA private key")
		}
		return rsaKey, nil
	}
	return key, nil
}

func loadX509Certificate(path string) (*x509.Certificate, error) {
	certData, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	block, _ := pem.Decode(certData)
	if block == nil {
		return nil, fmt.Errorf("failed to decode PEM block from %s", path)
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return nil, err
	}
	return cert, nil
}

With the SAML provider configured, we need an HTTP handler that it can protect. This handler, the Assertion Consumer Service (ACS), is the destination for the IdP’s POST request. Once samlsp validates the assertion, it populates the request context with session data. Our handler’s job is to extract this data and mint the internal JWT.

// file: internal/token/issuer.go

package token

import (
	"fmt"
	"net/http"
	"time"

	"github.com/crewjam/saml/samlsp"
	"github.com/golang-jwt/jwt/v4"
	"github.com/sirupsen/logrus"
)

// JWTIssuer is responsible for creating and signing internal JWTs.
type JWTIssuer struct {
	signingKey    []byte
	issuer        string
	audience      string
	tokenLifetime time.Duration
	logger        *logrus.Logger
}

// NewJWTIssuer creates a new token issuer.
// signingKey should be a securely generated and managed secret.
func NewJWTIssuer(signingKey, issuer, audience string, lifetime time.Duration, logger *logrus.Logger) (*JWTIssuer, error) {
	if len(signingKey) < 32 {
		return nil, fmt.Errorf("signing key must be at least 32 bytes long")
	}
	return &JWTIssuer{
		signingKey:    []byte(signingKey),
		issuer:        issuer,
		audience:      audience,
		tokenLifetime: lifetime,
		logger:        logger,
	}, nil
}

// Claims defines the structure of our internal JWT.
type Claims struct {
	Email  string   `json:"email"`
	Groups []string `json:"groups"`
	jwt.RegisteredClaims
}

// TokenIssuingHandler is the HTTP handler that is invoked after successful SAML authentication.
func (j *JWTIssuer) TokenIssuingHandler() http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// The samlsp.Middleware populates the context with the session claims.
		session := samlsp.SessionFromContext(r.Context())
		if session == nil {
			j.logger.Error("Failed to get session from context after SAML auth")
			http.Error(w, "Forbidden", http.StatusForbidden)
			return
		}

		sessionClaims, ok := session.(samlsp.SessionWithAttributes)
		if !ok {
			j.logger.Error("Session does not contain attributes")
			http.Error(w, "Forbidden", http.StatusForbidden)
			return
		}

		// A critical step is mapping SAML attributes to our internal claims.
		// The attribute names (e.g., "email", "memberOf") are defined by the IdP.
		// A common mistake is misspelling these or not requesting them in the SP config.
		email := sessionClaims.GetAttributes().Get("email")
		if email == "" {
			j.logger.Error("SAML assertion missing 'email' attribute")
			http.Error(w, "Incomplete user profile from IdP", http.StatusBadRequest)
			return
		}
		// samlsp.Attributes is a map[string][]string, so multi-valued
		// attributes like group membership are read by indexing directly.
		groups := sessionClaims.GetAttributes()["memberOf"]

		j.logger.WithFields(logrus.Fields{
			"email":  email,
			"groups": len(groups),
		}).Info("SAML assertion processed, minting internal JWT")

		// Create the JWT claims.
		expirationTime := time.Now().Add(j.tokenLifetime)
		claims := &Claims{
			Email:  email,
			Groups: groups,
			RegisteredClaims: jwt.RegisteredClaims{
				ExpiresAt: jwt.NewNumericDate(expirationTime),
				IssuedAt:  jwt.NewNumericDate(time.Now()),
				Issuer:    j.issuer,
				Audience:  jwt.ClaimStrings{j.audience},
				Subject:   email,
			},
		}

		// Create and sign the token.
		token := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
		tokenString, err := token.SignedString(j.signingKey)
		if err != nil {
			j.logger.WithError(err).Error("Failed to sign internal JWT")
			http.Error(w, "Internal Server Error", http.StatusInternalServerError)
			return
		}

		// Return the token to the user, typically in a simple JSON response.
		// A front-end CLI or UI can then store and use this token.
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprintf(w, `{"internal_api_token":"%s", "expires_at":"%s"}`, tokenString, expirationTime.Format(time.RFC3339))
	}
}
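
For completeness, here is an illustrative sketch of how the two components might be wired together in the token service's entrypoint. The module import paths, token lifetime, and listen address are hypothetical; the important detail is that the samlsp.Middleware itself serves the SAML endpoints (metadata and ACS) under /saml/, while its RequireAccount wrapper forces the SAML flow before the token handler runs.

// file: cmd/auth-service/main.go (illustrative wiring sketch)

package main

import (
	"net/http"
	"os"
	"strings"
	"time"

	"github.com/sirupsen/logrus"

	"corp.internal/auth-service/internal/saml"  // hypothetical module path
	"corp.internal/auth-service/internal/token" // hypothetical module path
)

func main() {
	logger := logrus.New()

	samlSP, err := saml.NewSAMLProvider(saml.Config{
		SpBaseURL:      os.Getenv("SP_BASE_URL"),
		IdpMetadataURL: os.Getenv("IDP_METADATA_URL"),
		SpCertPath:     os.Getenv("SP_CERT_PATH"),
		SpKeyPath:      os.Getenv("SP_KEY_PATH"),
	}, logger)
	if err != nil {
		logger.WithError(err).Fatal("failed to initialize SAML SP")
	}

	// The signing key is mounted as a Docker secret; trim the trailing
	// newline that `docker secret create` often preserves.
	keyBytes, err := os.ReadFile(os.Getenv("JWT_SIGNING_KEY_FILE"))
	if err != nil {
		logger.WithError(err).Fatal("failed to read JWT signing key")
	}

	issuer, err := token.NewJWTIssuer(strings.TrimSpace(string(keyBytes)),
		os.Getenv("JWT_ISSUER"), os.Getenv("JWT_AUDIENCE"), 8*time.Hour, logger)
	if err != nil {
		logger.WithError(err).Fatal("failed to initialize JWT issuer")
	}

	mux := http.NewServeMux()
	// The middleware serves SP metadata and the ACS endpoint under /saml/.
	mux.Handle("/saml/", samlSP)
	// RequireAccount bounces unauthenticated users through the SAML flow
	// before the token-issuing handler runs.
	mux.Handle("/token", samlSP.RequireAccount(issuer.TokenIssuingHandler()))

	logger.Fatal(http.ListenAndServeTLS(":8443",
		os.Getenv("SP_CERT_PATH"), os.Getenv("SP_KEY_PATH"), mux))
}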

Securing gRPC Services with a JWT Interceptor

With the token service in place, the next step is to enforce authentication on every gRPC service in the mesh. Doing this manually in every RPC handler would be repetitive and error-prone. The correct approach is to use a gRPC unary interceptor.

This interceptor runs before the actual RPC handler. Its job is to extract the JWT from the request metadata, validate it, and inject the authenticated user’s identity into the request’s context for the handler to use.

// file: internal/auth/interceptor.go

package auth

import (
	"context"
	"fmt"
	"strings"

	"github.com/golang-jwt/jwt/v4"
	"github.com/sirupsen/logrus"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
)

// AuthenticatedUser represents the identity extracted from a valid token.
type AuthenticatedUser struct {
	Email  string
	Groups []string
}

// Claims mirrors the claim structure minted by the token service's JWTIssuer.
// The two definitions must be kept in sync.
type Claims struct {
	Email  string   `json:"email"`
	Groups []string `json:"groups"`
	jwt.RegisteredClaims
}

// userContextKey is a private type to prevent context key collisions.
type userContextKey string

const authenticatedUserKey userContextKey = "authenticatedUser"

// JWTValidator contains the configuration needed to validate tokens.
type JWTValidator struct {
	signingKey []byte
	audience   string
	issuer     string
	logger     *logrus.Logger
}

// NewJWTValidator creates a validator. It must use the same key, audience,
// and issuer as the JWTIssuer.
func NewJWTValidator(signingKey, audience, issuer string, logger *logrus.Logger) (*JWTValidator, error) {
	if len(signingKey) < 32 {
		return nil, fmt.Errorf("signing key must be at least 32 bytes long")
	}
	return &JWTValidator{
		signingKey: []byte(signingKey),
		audience:   audience,
		issuer:     issuer,
		logger:     logger,
	}, nil
}

// AuthInterceptor is the gRPC unary interceptor.
func (v *JWTValidator) AuthInterceptor() grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
		// The pitfall here is that some methods might not require authentication
		// (e.g., a health check). The interceptor must be able to bypass them.
		// A real implementation would have a list of public methods.
		if info.FullMethod == "/grpc.health.v1.Health/Check" {
			return handler(ctx, req)
		}

		// Extract token from metadata. The convention is 'authorization: Bearer <token>'.
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok {
			return nil, status.Error(codes.Unauthenticated, "metadata is not provided")
		}

		authHeaders := md.Get("authorization")
		if len(authHeaders) == 0 {
			return nil, status.Error(codes.Unauthenticated, "authorization header is not provided")
		}

		parts := strings.Split(authHeaders[0], " ")
		if len(parts) != 2 || strings.ToLower(parts[0]) != "bearer" {
			return nil, status.Error(codes.Unauthenticated, "invalid authorization header format")
		}
		tokenString := parts[1]

		// Parse and validate the token.
		claims := &Claims{}
		token, err := jwt.ParseWithClaims(tokenString, claims, func(token *jwt.Token) (interface{}, error) {
			// A crucial security check: ensure the signing algorithm is what you expect.
			if _, ok := token.Method.(*jwt.SigningMethodHMAC); !ok {
				return nil, fmt.Errorf("unexpected signing method: %v", token.Header["alg"])
			}
			return v.signingKey, nil
		})

		if err != nil {
			v.logger.WithError(err).Warn("JWT validation failed")
			return nil, status.Error(codes.Unauthenticated, "invalid token")
		}

		if !token.Valid {
			return nil, status.Error(codes.Unauthenticated, "token is not valid")
		}
		
		// Additional claims validation.
		if !claims.VerifyAudience(v.audience, true) {
			return nil, status.Errorf(codes.Unauthenticated, "invalid token audience")
		}
		if !claims.VerifyIssuer(v.issuer, true) {
			return nil, status.Errorf(codes.Unauthenticated, "invalid token issuer")
		}

		// Validation passed. Create the user object and inject it into the context.
		user := AuthenticatedUser{
			Email:  claims.Email,
			Groups: claims.Groups,
		}
		newCtx := context.WithValue(ctx, authenticatedUserKey, user)
		
		return handler(newCtx, req)
	}
}

// UserFromContext retrieves the authenticated user from the context.
// RPC handlers should use this function to get the caller's identity.
func UserFromContext(ctx context.Context) (AuthenticatedUser, bool) {
	user, ok := ctx.Value(authenticatedUserKey).(AuthenticatedUser)
	return user, ok
}
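
Registering the interceptor is then a single option at server construction time. The following sketch is illustrative: the module path for the auth package, the port, and the generated-code registration are hypothetical, but the env var names match the compose file in the next section.

// file: cmd/user-profile-service/main.go (illustrative sketch)

package main

import (
	"net"
	"os"
	"strings"

	"github.com/sirupsen/logrus"
	"google.golang.org/grpc"

	"corp.internal/user-profile-service/internal/auth" // hypothetical module path
)

func main() {
	logger := logrus.New()

	// The same signing key secret is mounted into every backend service
	// (see JWT_SIGNING_KEY_FILE in the compose file below).
	keyBytes, err := os.ReadFile(os.Getenv("JWT_SIGNING_KEY_FILE"))
	if err != nil {
		logger.WithError(err).Fatal("failed to read signing key")
	}

	validator, err := auth.NewJWTValidator(strings.TrimSpace(string(keyBytes)),
		os.Getenv("JWT_AUDIENCE"), os.Getenv("JWT_ISSUER"), logger)
	if err != nil {
		logger.WithError(err).Fatal("failed to create validator")
	}

	// Every unary RPC now passes through the auth interceptor; handlers call
	// auth.UserFromContext(ctx) to read the injected identity.
	server := grpc.NewServer(grpc.UnaryInterceptor(validator.AuthInterceptor()))

	// pb.RegisterUserProfileServer(server, newUserProfileServer()) // generated code, omitted

	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		logger.WithError(err).Fatal("failed to listen")
	}
	logger.Fatal(server.Serve(lis))
}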

Deployment on Docker Swarm

Docker Swarm provides a simple yet powerful way to orchestrate these services. We use a docker-compose.yml file to define the services, networks, and secrets. A key feature is Swarm’s encrypted overlay networks, which ensure secure communication between the token service and the gRPC backends without exposing ports on the host nodes.

Another powerful feature is Docker Secrets, which we use to securely provide the SAML private key and the JWT signing key to the appropriate containers. These secrets are mounted as files in-memory, avoiding the risk of writing them to disk.

# file: docker-compose.yml
version: "3.8"

services:
  # The SAML-to-JWT Token Service
  auth-service:
    image: my-registry/auth-service:1.0.0
    networks:
      - internal-net
    ports:
      # Expose 443 for the SAML ACS endpoint and user-facing login page.
      - "443:8443"
    environment:
      - LOG_LEVEL=info
      - SP_BASE_URL=https://auth.internal.corp.com
      - IDP_METADATA_URL=https://idp.corp.com/metadata.xml
      - JWT_ISSUER=internal-auth-service
      - JWT_AUDIENCE=internal-grpc-services
      # Reference secrets for sensitive data
      - JWT_SIGNING_KEY_FILE=/run/secrets/jwt_signing_key
      - SP_CERT_PATH=/run/secrets/sp_cert_pem
      - SP_KEY_PATH=/run/secrets/sp_key_pem
    secrets:
      - sp_key_pem
      - sp_cert_pem
      - jwt_signing_key
    deploy:
      replicas: 2
      placement:
        constraints: [node.role == manager]
      restart_policy:
        condition: on-failure

  # A sample backend gRPC service
  user-profile-service:
    image: my-registry/user-profile-service:1.2.0
    networks:
      - internal-net
      # No ports are exposed publicly. Communication happens over the overlay network.
    environment:
      - LOG_LEVEL=info
      - DATABASE_URL=...
      # It also needs the JWT key to validate tokens
      - JWT_SIGNING_KEY_FILE=/run/secrets/jwt_signing_key
      - JWT_AUDIENCE=internal-grpc-services
      - JWT_ISSUER=internal-auth-service
    secrets:
      - jwt_signing_key
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure

# Define the secrets. These must be created in Docker Swarm beforehand
# using `docker secret create ...`
secrets:
  sp_key_pem:
    external: true
  sp_cert_pem:
    external: true
  jwt_signing_key:
    external: true

# The secure overlay network for inter-service communication
networks:
  internal-net:
    driver: overlay
    attachable: true

The deployment process becomes a simple command: docker stack deploy -c docker-compose.yml myapp. Swarm handles scheduling, networking, and secret distribution.

The Role of Prettier in a Go-Centric Project

While gofmt is the standard for Go code, our project wasn’t just Go. It included Dockerfiles, extensive YAML for Docker Compose and GCP Cloud Build, and Markdown documentation. In a production environment, consistency is key to maintainability. Enforcing a single, opinionated formatter across all project assets was a non-negotiable principle.

We used Prettier with plugins for YAML and Dockerfiles. A pre-commit hook was set up with husky and lint-staged to run prettier --write on staged files, ensuring that no unformatted code ever reached the repository. Our CI pipeline on Google Cloud Build had a dedicated first step to validate this.

# file: cloudbuild.yaml
steps:
- name: 'node:18'
  id: 'Format Check'
  entrypoint: 'bash'
  # npm does not chain commands passed as a literal '&&' argument, so run through a shell.
  args: ['-c', 'npm ci && npm run format:check']
  # package.json would contain: "format:check": "prettier --check ."

- name: 'golang:1.19'
  id: 'Unit Test'
  entrypoint: 'go'
  args: ['test', './...']
  env: ['CGO_ENABLED=0']

- name: 'gcr.io/cloud-builders/docker'
  id: 'Build Auth Service'
  args:
    - 'build'
    - '-t'
    - 'my-registry/auth-service:$SHORT_SHA'
    - '.'
    - '-f'
    - 'cmd/auth-service/Dockerfile'

# ... other build and push steps ...

This simple check eliminated countless hours of debate over YAML indentation or Dockerfile instruction order in code reviews, allowing engineers to focus on the logic, not the syntax.

System Flow Diagram

This architecture can be visualized as follows:

sequenceDiagram
    participant User as Human Operator
    participant Client as gRPC Client/Script
    participant Browser
    participant TokenSvc as Token Service (SAML SP)
    participant IdP as Corporate IdP
    participant GrpcSvc as Backend gRPC Service

    User->>Browser: Accesses internal tool UI
    Browser->>TokenSvc: Initiates login (/login)
    TokenSvc-->>Browser: HTTP 302 Redirect to IdP
    Browser->>IdP: Follows redirect, sends AuthnRequest
    IdP-->>User: Prompts for credentials
    User->>IdP: Submits credentials
    IdP-->>Browser: HTTP 302 Redirect back to TokenSvc (ACS)
    Browser->>TokenSvc: POST SAML Assertion to ACS

    Note right of TokenSvc: 1. Validates SAML assertion<br/>2. Extracts user attributes<br/>3. Mints short-lived JWT
    TokenSvc-->>Browser: Returns JWT in JSON payload
    Browser-->>User: Displays JWT for use
    User->>Client: Configures client with the JWT
    Client->>GrpcSvc: gRPC call with 'authorization: Bearer <jwt>'
    Note right of GrpcSvc: 1. Interceptor extracts JWT<br/>2. Validates JWT signature & claims<br/>3. Injects user identity into context
    GrpcSvc-->>GrpcSvc: Processes request with user context
    GrpcSvc-->>Client: Returns gRPC response

This solution successfully bridged the gap. It provided a centralized, auditable point of entry for user authentication that aligned with enterprise security policy, while preserving the performance and decentralized nature of our internal gRPC service mesh.

However, the architecture is not without its limitations. The JWT signing key is a high-value secret that must be managed and rotated carefully. A compromised key could allow an attacker to mint valid tokens for any user. Our current implementation uses a symmetric key (HS256); an evolution would be to use asymmetric keys (RS256). The token service would sign with the private key, and backend services would only need the public key to validate, which is less sensitive. Furthermore, this model lacks a mechanism for token revocation. If a token is leaked, it remains valid until its expiration. For highly sensitive operations, a revocation list or a more complex system where each gRPC service re-validates the token against the auth service on every call might be necessary, though this would re-introduce some of the latency we sought to avoid.
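
As a sketch of that evolution, validation against an RSA public key could look like the following. The secret path is hypothetical, and the issuer side would switch to signing with jwt.SigningMethodRS256 and the corresponding private key; only the public key would be distributed to backend services.

// file: internal/auth/rsa_validator.go (illustrative RS256 sketch)

package auth

import (
	"crypto/rsa"
	"fmt"
	"os"

	"github.com/golang-jwt/jwt/v4"
)

// loadRSAPublicKey reads a PEM-encoded RSA public key. On Swarm this would be
// mounted like the other secrets, e.g. at /run/secrets/jwt_public_key (hypothetical).
func loadRSAPublicKey(path string) (*rsa.PublicKey, error) {
	pemBytes, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return jwt.ParseRSAPublicKeyFromPEM(pemBytes)
}

// validateRS256 parses and verifies a token that was signed with RS256.
func validateRS256(tokenString string, pubKey *rsa.PublicKey) (*Claims, error) {
	claims := &Claims{}
	token, err := jwt.ParseWithClaims(tokenString, claims, func(t *jwt.Token) (interface{}, error) {
		// Pin the algorithm family to RSA to prevent algorithm-confusion attacks.
		if _, ok := t.Method.(*jwt.SigningMethodRSA); !ok {
			return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
		}
		return pubKey, nil
	})
	if err != nil {
		return nil, fmt.Errorf("invalid token: %w", err)
	}
	if !token.Valid {
		return nil, fmt.Errorf("token is not valid")
	}
	return claims, nil
}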

