The transition from a monolithic Rails application, where security is neatly managed by stateful session cookies, to a decoupled architecture with a Gatsby single-page application and a GraphQL API introduced an immediate and critical security vulnerability. Our initial naive implementation used a long-lived JSON Web Token (JWT) stored in the browser's localStorage. This approach, while simple, exposed a significant attack surface. A stolen token granted indefinite access until its distant expiry, and there was no viable mechanism for server-side revocation. The core problem was attempting to replicate stateful security with a fundamentally stateless tool without embracing the necessary complexity.
Our revised objective became clear: implement a security model that provided the user experience of a persistent session while adhering to the security principle of least privilege in time. This meant short-lived access tokens, expiring in minutes, coupled with a mechanism for seamless and secure re-authentication. The choice was a refresh token flow. The access token would be an ephemeral, stateless bearer token, while the refresh token would be a long-lived, stateful credential used solely to mint new access tokens.
This decision cascaded through the stack. The Ruby on Rails API, now serving GraphQL, needed to manage the lifecycle of these refresh tokens, storing them securely and providing endpoints for issuance and rotation. The Gatsby frontend required a robust client-side interceptor to handle token expiry and refresh automatically, without disrupting the user experience. Finally, the entire system needed to be deployed on Docker Swarm with a Zero Trust mindset. Services, even on the same private network, would not implicitly trust each other. The JWT signing keys, the most critical secret in this architecture, would be managed by Docker’s native secret management, never touching a Docker image or source control.
The Ruby on Rails GraphQL Backend: Token Lifecycle Management
The foundation of the security model rests within the Rails API. It is the single source of truth for identity and the sole issuer of tokens. The first step was to move beyond the simple JWT gem and create a dedicated service for handling token logic, specifically using the RS256 algorithm for asymmetric signing. This allows the API to sign with a private key while other services could, in theory, verify with a public key without needing the signing secret.
# app/services/json_web_token.rb
# A dedicated service for encoding and decoding JWTs.
# In a real-world project, this encapsulates all JWT logic, making it easier
# to swap algorithms or libraries if needed.
class JsonWebToken
# The private key is loaded from the path specified by an environment variable.
# This path will point to the file injected by Docker Swarm secrets.
# Fallback is for local development.
PRIVATE_KEY_PATH = ENV.fetch('JWT_PRIVATE_KEY_PATH', 'config/keys/jwt_private_key.pem')
PUBLIC_KEY_PATH = ENV.fetch('JWT_PUBLIC_KEY_PATH', 'config/keys/jwt_public_key.pem')
# A common mistake is to hardcode keys or use symmetric algorithms (HS256)
# where the same key is used for signing and verification. RS256 is superior
# as it allows for public verification without exposing the signing key.
PRIVATE_KEY = OpenSSL::PKey::RSA.new(File.read(PRIVATE_KEY_PATH))
PUBLIC_KEY = OpenSSL::PKey::RSA.new(File.read(PUBLIC_KEY_PATH))
ALGORITHM = 'RS256'.freeze
ACCESS_TOKEN_LIFETIME = 15.minutes
REFRESH_TOKEN_LIFETIME = 7.days
# Encodes an access token. Note the short lifetime.
# The 'jti' (JWT ID) is a crucial claim for preventing replay attacks
# and enabling more granular revocation if we build a blacklist.
def self.encode_access_token(user)
payload = {
sub: user.id,
iat: Time.now.to_i,
exp: ACCESS_TOKEN_LIFETIME.from_now.to_i,
jti: SecureRandom.uuid,
type: 'access'
}
JWT.encode(payload, PRIVATE_KEY, ALGORITHM)
end
# Decodes a token. The jwt gem automatically verifies the signature and
# the 'exp' claim; we don't set an issuer, so no 'iss' check is configured.
def self.decode(token)
decoded = JWT.decode(token, PUBLIC_KEY, true, { algorithm: ALGORITHM })
ActiveSupport::HashWithIndifferentAccess.new(decoded[0])
rescue JWT::ExpiredSignature, JWT::VerificationError, JWT::DecodeError
# Centralizing error handling here is critical.
# The caller shouldn't need to know about JWT-specific exceptions.
nil
end
end
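The asymmetric property RS256 gives us can be demonstrated with nothing but Ruby's standard library. This stand-alone sketch (not the jwt gem, which additionally validates claims for us) signs and verifies a token by hand:

```ruby
require 'openssl'
require 'base64'
require 'json'

# Illustration only: what RS256 signing amounts to under the hood.
rsa = OpenSSL::PKey::RSA.new(2048)

segments = [
  Base64.urlsafe_encode64({ alg: 'RS256', typ: 'JWT' }.to_json, padding: false),
  Base64.urlsafe_encode64({ sub: 42, type: 'access' }.to_json, padding: false)
]
signing_input = segments.join('.')
signature = rsa.sign(OpenSSL::Digest.new('SHA256'), signing_input)
token = "#{signing_input}.#{Base64.urlsafe_encode64(signature, padding: false)}"

# Verification needs only the PUBLIC half of the key pair, which is why the
# signing key never has to leave the API service.
public_only = OpenSSL::PKey::RSA.new(rsa.public_key.to_pem)
head, body, sig = token.split('.')
valid = public_only.verify(
  OpenSSL::Digest.new('SHA256'),
  Base64.urlsafe_decode64(sig),
  "#{head}.#{body}"
)
puts valid
```

Note that flipping a single byte of the payload makes verification fail, which is the entire guarantee the access token rests on.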
To manage the stateful refresh tokens, we introduced a new model. Storing them in the database allows for revocation and tracking. A pitfall here is not indexing the token column, which would lead to slow lookups as the table grows.
# app/models/refresh_token.rb
class RefreshToken < ApplicationRecord
belongs_to :user
before_create :set_token_and_expiry
# This token is what's sent to the client in an HttpOnly cookie.
# It's indexed for fast lookups during the refresh process.
validates :token, presence: true, uniqueness: true
private
def set_token_and_expiry
self.token ||= SecureRandom.hex(32)
self.expires_at ||= JsonWebToken::REFRESH_TOKEN_LIFETIME.from_now
end
end
# db/migrate/YYYYMMDDHHMMSS_create_refresh_tokens.rb
class CreateRefreshTokens < ActiveRecord::Migration[7.0]
def change
create_table :refresh_tokens do |t|
t.references :user, null: false, foreign_key: true
t.string :token, null: false
t.datetime :expires_at, null: false
t.timestamps
end
# The index is critical for performance. Without it, refreshing a token
# would require a full table scan.
add_index :refresh_tokens, :token, unique: true
end
end
With the models and services in place, the GraphQL mutations form the public interface for authentication. The login mutation issues both tokens, but delivers them differently. The access token is in the JSON response, destined for client-side memory. The refresh token is set in a secure, HttpOnly cookie. This prevents JavaScript from accessing it, mitigating XSS attacks that could steal the long-lived token.
# app/graphql/mutations/login.rb
class Mutations::Login < Mutations::BaseMutation
argument :email, String, required: true
argument :password, String, required: true
field :access_token, String, null: true
field :user, Types::UserType, null: true
field :errors, [String], null: false
def resolve(email:, password:)
user = User.find_for_authentication(email: email)
if user&.valid_password?(password)
# Create and store the stateful refresh token
refresh_token = user.refresh_tokens.create!
# Set the refresh token in a secure cookie.
# HttpOnly: Prevents JS access.
# SameSite=Strict: Prevents sending the cookie on cross-origin requests.
# Secure: Ensures it's only sent over HTTPS in production.
context[:cookies][:refresh_token] = {
value: refresh_token.token,
httponly: true,
samesite: :strict,
secure: Rails.env.production?,
expires: refresh_token.expires_at
}
{
access_token: JsonWebToken.encode_access_token(user),
user: user,
errors: []
}
else
{ access_token: nil, user: nil, errors: ['Invalid email or password'] }
end
end
end
# app/graphql/mutations/refresh_token.rb
class Mutations::RefreshToken < Mutations::BaseMutation
field :access_token, String, null: true
field :errors, [String], null: false
def resolve
token_value = context[:cookies][:refresh_token]
return { access_token: nil, errors: ['No refresh token found'] } unless token_value
# The core of token rotation. Find the old token.
old_token = RefreshToken.find_by(token: token_value)
unless old_token && old_token.expires_at > Time.now
# Also clear the invalid/expired cookie from the client
context[:cookies].delete(:refresh_token)
return { access_token: nil, errors: ['Invalid or expired refresh token'] }
end
user = old_token.user
# IMPORTANT: Invalidate the old token to prevent reuse (a form of replay attack).
old_token.destroy!
# Issue a new refresh token and set it in the cookie. This is token rotation.
new_refresh_token = user.refresh_tokens.create!
context[:cookies][:refresh_token] = {
value: new_refresh_token.token,
httponly: true,
samesite: :strict,
secure: Rails.env.production?,
expires: new_refresh_token.expires_at
}
{
access_token: JsonWebToken.encode_access_token(user),
errors: []
}
end
end
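The mutation above destroys the old row, but it cannot distinguish an honest refresh from an attacker replaying a stolen, already-rotated token: the replay simply fails as "invalid". A hardened variant remembers every token it has ever issued and treats reuse of a rotated-out token as a theft signal. This is a minimal, in-memory sketch of that idea; the class name and Hash-backed store are illustrative stand-ins for the RefreshToken table, not code from the application:

```ruby
require 'securerandom'

# In-memory stand-in for the refresh_tokens table (illustration only).
class RotatingStore
  def initialize
    @live = {}  # currently valid token => user_id
    @seen = {}  # every token ever issued => user_id, for reuse detection
  end

  def issue(user_id)
    token = SecureRandom.hex(32)
    @live[token] = user_id
    @seen[token] = user_id
    token
  end

  # Returns [:ok, new_token] on a legitimate rotation, :reuse_detected when a
  # previously rotated token comes back, :unknown otherwise.
  def rotate(token)
    if (user_id = @live.delete(token))
      [:ok, issue(user_id)]
    elsif @seen.key?(token)
      # The token was valid once and has already been rotated: likely theft.
      # A hardened system would revoke every live token for this user here.
      :reuse_detected
    else
      :unknown
    end
  end
end
```

In the Rails version this would translate to keeping destroyed tokens (or their digests) around until expiry and revoking the whole token family on a reuse hit.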
Finally, the API must protect its resources. We do this by inspecting the Authorization header on incoming requests. This logic is centralized in the GraphqlController before the query is even executed.
# app/controllers/graphql_controller.rb
class GraphqlController < ApplicationController
# This acts as the authentication gate for the entire GraphQL endpoint.
def execute
# ... (standard graphql-ruby setup)
context = {
current_user: current_user,
cookies: cookies, # Pass cookies into the GraphQL context
}
result = MyApiSchema.execute(query, variables: variables, context: context, operation_name: operation_name)
render json: result
rescue => e
# ... error handling
end
private
def current_user
# A memoized helper to find the user from the Bearer token.
return @current_user if defined?(@current_user)
token = request.headers['Authorization']&.split(' ')&.last
return nil unless token
decoded_token = JsonWebToken.decode(token)
return nil unless decoded_token && decoded_token[:type] == 'access'
@current_user = User.find_by(id: decoded_token[:sub])
end
end
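The jti claim we stamped into every access token earlier exists for exactly this kind of gate: on logout, a token's jti can be denied for the remainder of its short lifetime, and current_user can consult that list after decoding. Here is a minimal sketch of the idea with an in-memory Hash and an injectable clock standing in for Redis (where SETEX/EXISTS with a TTL would do the same job); the class name is hypothetical:

```ruby
# Minimal TTL denylist; the Hash and clock stand in for Redis SETEX/EXISTS.
class JtiDenylist
  def initialize(clock: -> { Time.now.to_f })
    @entries = {}
    @clock = clock
  end

  # Deny this jti only until the access token would have expired anyway,
  # so the list stays bounded by the number of recently revoked tokens.
  def revoke(jti, ttl_seconds)
    @entries[jti] = @clock.call + ttl_seconds
  end

  def revoked?(jti)
    expiry = @entries[jti]
    return false unless expiry
    if @clock.call >= expiry
      @entries.delete(jti) # lazily expire, as a Redis TTL would
      false
    else
      true
    end
  end
end
```

Because access tokens live only fifteen minutes, the window this closes is small, but it turns logout into a hard revocation rather than a client-side fiction.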
The Gatsby Frontend: Seamless Token Interception
The frontend's responsibility is to manage the access token and orchestrate the refresh flow transparently. Storing the short-lived access token in localStorage is an anti-pattern; it provides no real security benefit over a long-lived token and still exposes it to XSS. A better approach is to keep it in application memory only.
We use Apollo Client for GraphQL communication. Its Link middleware system is the perfect place to inject our token management logic. We'll create a chain of links: an errorLink to catch UNAUTHENTICATED errors (GraphQL APIs typically report these in the errors array of a 200 response rather than as an HTTP 401) and trigger a refresh, and an authLink to attach the token to outgoing requests.
// src/apollo/client.js
import {
  ApolloClient,
  InMemoryCache,
  createHttpLink,
  from,
  fromPromise,
} from '@apollo/client';
import { setContext } from '@apollo/client/link/context';
import { onError } from '@apollo/client/link/error';
// This is our in-memory, non-persistent store for the access token.
// It's just a variable in a module, inaccessible to other scripts.
let accessToken = null;
export const setAccessToken = (token) => {
  accessToken = token;
};
export const getAccessToken = () => accessToken;
const httpLink = createHttpLink({
  uri: process.env.GATSBY_API_URL,
  credentials: 'include', // send the HttpOnly refresh cookie to the API
});
// This link attaches the Authorization header to every GraphQL request.
const authLink = setContext((_, { headers }) => {
  const token = getAccessToken();
  return {
    headers: {
      ...headers,
      authorization: token ? `Bearer ${token}` : '',
    },
  };
});
// Calls the refresh mutation with a raw fetch so it bypasses the link chain
// (and therefore this error link) entirely. Assumes the backend exposes the
// mutation as `refreshToken`, matching the Rails class above.
const fetchNewAccessToken = async () => {
  const response = await fetch(process.env.GATSBY_API_URL, {
    method: 'POST',
    credentials: 'include',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: 'mutation { refreshToken { accessToken errors } }',
    }),
  });
  const { data } = await response.json();
  return data?.refreshToken?.accessToken ?? null;
};
// The critical piece for handling expired access tokens. On an
// UNAUTHENTICATED error we refresh, swap in the new token, and replay the
// failed operation. Production code should also de-duplicate concurrent
// refreshes behind a single shared promise; that queuing is omitted here.
const errorLink = onError(({ graphQLErrors, networkError, operation, forward }) => {
  if (graphQLErrors) {
    for (const err of graphQLErrors) {
      if (err.extensions?.code === 'UNAUTHENTICATED') {
        return fromPromise(
          fetchNewAccessToken().catch(() => {
            // Refresh failed: drop the stale token. The UI layer (AuthContext)
            // decides whether to redirect to login.
            setAccessToken(null);
            return null;
          })
        )
          .filter((token) => Boolean(token))
          .flatMap((token) => {
            setAccessToken(token);
            operation.setContext(({ headers = {} }) => ({
              headers: { ...headers, authorization: `Bearer ${token}` },
            }));
            return forward(operation); // replay the failed request
          });
      }
    }
  }
  if (networkError) {
    console.log(`[Network error]: ${networkError}`);
  }
});
// The links are chained. The error link wraps the auth link, which wraps the HTTP link.
export const client = new ApolloClient({
  link: from([errorLink, authLink, httpLink]),
  cache: new InMemoryCache(),
});
To manage the authentication state across the application, a React Context is ideal.
// src/context/AuthContext.js
import React, { createContext, useState, useContext } from 'react';
import { useMutation } from '@apollo/client';
import { LOGIN_MUTATION, REFRESH_TOKEN_MUTATION } from '../apollo/queries';
import { setAccessToken } from '../apollo/client';
const AuthContext = createContext();
export const AuthProvider = ({ children }) => {
const [user, setUser] = useState(null);
const [loading, setLoading] = useState(true);
// Note: We need to use `refetchQueries` or update the cache manually
// after login/logout to ensure UI consistency.
const [loginMutation] = useMutation(LOGIN_MUTATION);
const [refreshTokenMutation] = useMutation(REFRESH_TOKEN_MUTATION);
const login = async (email, password) => {
const { data } = await loginMutation({ variables: { email, password } });
if (data?.login?.accessToken) {
setAccessToken(data.login.accessToken);
setUser(data.login.user);
}
// Handle errors...
};
// This function would be called by the Apollo error link.
const refresh = async () => {
try {
const { data } = await refreshTokenMutation();
if (data?.refreshToken?.accessToken) {
setAccessToken(data.refreshToken.accessToken);
return true; // Indicate success
}
return false; // Response carried no token
} catch (error) {
// Refresh failed, log the user out.
logout();
return false; // Indicate failure
}
};
const logout = () => {
// Call a logout mutation on the backend to invalidate the refresh token.
setAccessToken(null);
setUser(null);
};
// On initial app load, you might try to refresh the token to see if a
// valid session exists.
React.useEffect(() => {
refresh().finally(() => setLoading(false));
}, []);
const value = { user, loading, login, logout, refresh };
return <AuthContext.Provider value={value}>{children}</AuthContext.Provider>;
};
export const useAuth = () => useContext(AuthContext);
This setup provides a clean API (useAuth) for components to interact with the authentication system, hiding the complexity of token management.
Orchestration with Docker Swarm: Enforcing Zero Trust
Deploying this architecture requires careful management of secrets and network policies. Docker Swarm, while simpler than Kubernetes, provides the necessary primitives. The cornerstone of our deployment is docker secret. We generate an RSA key pair locally and provide it to the Swarm, which then mounts it securely into the Rails container's filesystem.
First, generate the keys:
$ ssh-keygen -t rsa -b 4096 -m PEM -f jwt_private.pem
$ openssl rsa -in jwt_private.pem -pubout -outform PEM -out jwt_public.pem
Next, create the secrets in the Docker Swarm manager:
$ docker secret create jwt_private_key jwt_private.pem
$ docker secret create jwt_public_key jwt_public.pem
The docker-stack.yml file defines the services, networks, and secrets, tying everything together. A key principle of Zero Trust is enforced here: network segmentation. The database lives on a backend-db network, accessible only by the Rails API service. The Gatsby frontend and the API communicate over a frontend-api network. There is no default path from the public-facing frontend service to the database.
version: '3.8'
services:
# The Rails GraphQL API
api:
image: my-registry/my-rails-api:latest
environment:
RAILS_ENV: production
RAILS_LOG_TO_STDOUT: "true"
      # Hardcoded for brevity; in production the DB credentials belong in a Docker secret too.
      DATABASE_URL: "postgresql://user:password@postgres:5432/mydb"
# These env vars point the application to the secrets mounted by Docker
JWT_PRIVATE_KEY_PATH: /run/secrets/jwt_private_key
JWT_PUBLIC_KEY_PATH: /run/secrets/jwt_public_key
secrets:
- jwt_private_key
- jwt_public_key
networks:
- frontend-api
- backend-db
deploy:
replicas: 2
update_config:
parallelism: 1
delay: 10s
restart_policy:
condition: on-failure
# The Gatsby application, served by Nginx
frontend:
image: my-registry/my-gatsby-app:latest
ports:
- "80:80"
networks:
- frontend-api
deploy:
replicas: 2
# The PostgreSQL Database
postgres:
image: postgres:14-alpine
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
POSTGRES_DB: mydb
POSTGRES_USER: user
POSTGRES_PASSWORD: password
networks:
# Only connected to the backend network
- backend-db
deploy:
placement:
constraints: [node.role == manager] # Pin DB to a specific node if needed
# Secrets must be declared as 'external' because we created them manually
# outside of this stack file. This is a best practice.
secrets:
jwt_private_key:
external: true
jwt_public_key:
external: true
# Encrypted overlay networks for service-to-service communication
networks:
frontend-api:
driver: overlay
attachable: true
backend-db:
driver: overlay
internal: true # 'internal' prevents any external access to this network
volumes:
postgres_data:
This stack definition creates a resilient, secure runtime. The API’s signing key is never part of its image. Network traffic between services is encrypted by default on Swarm’s overlay networks. An attacker who compromises the frontend container cannot directly access the database.
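One operational wrinkle of mounting the keys as two independent secrets is that a mismatched pair fails in a confusing way: every token is simply rejected. A cheap guard is a boot-time probe, for instance in a Rails initializer reading the same paths JsonWebToken uses. This stand-alone sketch (the method name is hypothetical) confirms two PEM files are actually one key pair:

```ruby
require 'openssl'

# Sign and verify a probe string to prove the private and public PEM files
# belong to the same RSA key pair; raise at boot on a mismatch, before any
# request is served with keys that can never validate.
def jwt_key_pair_matches?(private_pem, public_pem)
  priv  = OpenSSL::PKey::RSA.new(private_pem)
  pub   = OpenSSL::PKey::RSA.new(public_pem)
  probe = 'jwt-key-pair-probe'
  pub.verify(
    OpenSSL::Digest.new('SHA256'),
    priv.sign(OpenSSL::Digest.new('SHA256'), probe),
    probe
  )
end
```

Failing fast here converts a silent "all logins broken" incident into an explicit deploy-time error.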
graph TD
  subgraph "Internet"
    U[User Browser]
  end
  subgraph "Docker Swarm Cluster (Encrypted Overlay Networks)"
    LB[Swarm Ingress Routing Mesh]
    subgraph "Network: frontend-api"
      F1[Frontend Container 1]
      F2[Frontend Container 2]
      A1[API Container 1]
      A2[API Container 2]
    end
    subgraph "Network: backend-db (Internal)"
      DB[(PostgreSQL)]
    end
    subgraph "Swarm Secret Storage"
      S_PRIV(jwt_private_key)
      S_PUB(jwt_public_key)
    end
  end
  U -- HTTPS --> LB
  LB -- HTTP --> F1 & F2
  F1 -- GraphQL API Calls --> A1 & A2
  F2 -- GraphQL API Calls --> A1 & A2
  A1 -- SQL --> DB
  A2 -- SQL --> DB
  S_PRIV -- mounted as file --> A1
  S_PRIV -- mounted as file --> A2
  S_PUB -- mounted as file --> A1
  S_PUB -- mounted as file --> A2
The limitations of this approach primarily revolve around refresh token revocation. While our implementation removes the token from the database upon use or logout, a stolen refresh token remains valid until it’s used or expires. A truly robust system would implement a revocation list, likely in a fast-access store like Redis, checked during every refresh attempt. This adds complexity and a new dependency but hardens security against token theft. Furthermore, while Docker Swarm provides excellent basic security, it lacks the advanced network policy engines like Calico or Cilium available in Kubernetes, which can enforce Layer 7 policies (e.g., “only allow this specific GraphQL mutation from the frontend service”). This architecture is a significant step up from simple JWTs, but the path of security engineering is one of continuous, iterative improvement.