Implementing Short-Lived JWT Authentication with Refresh Tokens in a Decoupled Gatsby and Rails GraphQL Architecture on Docker Swarm


The transition from a monolithic Rails application, where security is neatly managed by stateful session cookies, to a decoupled architecture with a Gatsby single-page application and a GraphQL API introduced an immediate and critical security vulnerability. Our initial naive implementation used a long-lived JSON Web Token (JWT) stored in the browser’s localStorage. This approach, while simple, exposed a significant attack surface. A stolen token granted indefinite access until its distant expiry, and there was no viable mechanism for server-side revocation. The core problem was attempting to replicate stateful security with a fundamentally stateless tool without embracing the necessary complexity.

Our revised objective became clear: implement a security model that provided the user experience of a persistent session while adhering to the security principle of least privilege in time. This meant short-lived access tokens, expiring in minutes, coupled with a mechanism for seamless and secure re-authentication. The choice was a refresh token flow. The access token would be an ephemeral, stateless bearer token, while the refresh token would be a long-lived, stateful credential used solely to mint new access tokens.

This decision cascaded through the stack. The Ruby on Rails API, now serving GraphQL, needed to manage the lifecycle of these refresh tokens, storing them securely and providing endpoints for issuance and rotation. The Gatsby frontend required a robust client-side interceptor to handle token expiry and refresh automatically, without disrupting the user experience. Finally, the entire system needed to be deployed on Docker Swarm with a Zero Trust mindset. Services, even on the same private network, would not implicitly trust each other. The JWT signing keys, the most critical secret in this architecture, would be managed by Docker’s native secret management, never touching a Docker image or source control.

The Ruby on Rails GraphQL Backend: Token Lifecycle Management

The foundation of the security model rests within the Rails API. It is the single source of truth for identity and the sole issuer of tokens. The first step was to move beyond the simple JWT gem and create a dedicated service for handling token logic, specifically using the RS256 algorithm for asymmetric signing. This allows the API to sign with a private key while other services could, in theory, verify with a public key without needing the signing secret.

# app/services/json_web_token.rb
# A dedicated service for encoding and decoding JWTs.
# In a real-world project, this encapsulates all JWT logic, making it easier
# to swap algorithms or libraries if needed.
class JsonWebToken
  # The private key is loaded from the path specified by an environment variable.
  # This path will point to the file injected by Docker Swarm secrets.
  # Fallback is for local development.
  PRIVATE_KEY_PATH = ENV.fetch('JWT_PRIVATE_KEY_PATH', 'config/keys/jwt_private_key.pem')
  PUBLIC_KEY_PATH = ENV.fetch('JWT_PUBLIC_KEY_PATH', 'config/keys/jwt_public_key.pub')
  
  # A common mistake is to hardcode keys or use symmetric algorithms (HS256)
  # where the same key is used for signing and verification. RS256 is superior
  # as it allows for public verification without exposing the signing key.
  PRIVATE_KEY = OpenSSL::PKey::RSA.new(File.read(PRIVATE_KEY_PATH))
  PUBLIC_KEY = OpenSSL::PKey::RSA.new(File.read(PUBLIC_KEY_PATH))
  
  ALGORITHM = 'RS256'.freeze
  ACCESS_TOKEN_LIFETIME = 15.minutes
  REFRESH_TOKEN_LIFETIME = 7.days

  # Encodes an access token. Note the short lifetime.
  # The 'jti' (JWT ID) uniquely identifies each token. On its own it prevents
  # nothing, but it enables per-token revocation if we later add a denylist.
  def self.encode_access_token(user)
    payload = {
      sub: user.id,
      iat: Time.now.to_i,
      exp: ACCESS_TOKEN_LIFETIME.from_now.to_i,
      jti: SecureRandom.uuid,
      type: 'access'
    }
    JWT.encode(payload, PRIVATE_KEY, ALGORITHM)
  end

  # Decodes a token, verifying the signature and the 'exp' claim.
  # Issuer/audience checks would require passing extra options here.
  def self.decode(token)
    decoded = JWT.decode(token, PUBLIC_KEY, true, { algorithm: ALGORITHM })
    ActiveSupport::HashWithIndifferentAccess.new(decoded[0])
  rescue JWT::ExpiredSignature, JWT::VerificationError, JWT::DecodeError
    # Centralizing error handling here is critical.
    # The caller shouldn't need to know about JWT-specific exceptions.
    nil
  end
end

To manage the stateful refresh tokens, we introduced a new model. Storing them in the database allows for revocation and tracking. A pitfall here is not indexing the token column, which would lead to slow lookups as the table grows.

# app/models/refresh_token.rb
class RefreshToken < ApplicationRecord
  belongs_to :user
  
  before_create :set_token_and_expiry

  # This token is what's sent to the client in an HttpOnly cookie.
  # It's indexed for fast lookups during the refresh process.
  validates :token, presence: true, uniqueness: true
  
  private
  
  def set_token_and_expiry
    self.token ||= SecureRandom.hex(32)
    self.expires_at ||= JsonWebToken::REFRESH_TOKEN_LIFETIME.from_now
  end
end

# db/migrate/YYYYMMDDHHMMSS_create_refresh_tokens.rb
class CreateRefreshTokens < ActiveRecord::Migration[7.0]
  def change
    create_table :refresh_tokens do |t|
      t.references :user, null: false, foreign_key: true
      t.string :token, null: false
      t.datetime :expires_at, null: false

      t.timestamps
    end
    # The index is critical for performance. Without it, refreshing a token
    # would require a full table scan.
    add_index :refresh_tokens, :token, unique: true
  end
end
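One operational consequence of this table: rotated tokens are destroyed, but tokens that simply expire are never touched and accumulate forever. A periodic cleanup keeps the table small. A sketch as a rake task (the task name and output are our invention, not part of the original):

```ruby
# lib/tasks/auth.rake
namespace :auth do
  desc 'Delete refresh tokens that have expired and can never be used again'
  task prune_refresh_tokens: :environment do
    # Expired tokens are rejected at refresh time anyway, so deleting them
    # changes no behavior; it only keeps lookups and backups lean.
    deleted = RefreshToken.where('expires_at < ?', Time.now).delete_all
    puts "Pruned #{deleted} expired refresh tokens"
  end
end
```

Scheduling it daily (cron, or a Swarm cron-style service) is sufficient, since expired rows are inert.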

With the models and services in place, the GraphQL mutations form the public interface for authentication. The login mutation issues both tokens but delivers them differently. The access token travels in the JSON response, destined for client-side memory. The refresh token is set in a secure, HttpOnly cookie, which keeps it out of reach of JavaScript and mitigates XSS theft of the long-lived credential. One caveat: SameSite=Strict assumes the frontend and API share a site (for example, app.example.com and api.example.com); a truly cross-site API would require SameSite=None plus explicit CSRF defenses.

# app/graphql/mutations/login.rb
class Mutations::Login < Mutations::BaseMutation
  argument :email, String, required: true
  argument :password, String, required: true

  field :access_token, String, null: true
  field :user, Types::UserType, null: true
  field :errors, [String], null: false

  def resolve(email:, password:)
    user = User.find_for_authentication(email: email)

    if user&.valid_password?(password)
      # Create and store the stateful refresh token
      refresh_token = user.refresh_tokens.create!

      # Set the refresh token in a secure cookie.
      # HttpOnly: Prevents JS access.
      # SameSite=Strict: Prevents sending the cookie on cross-origin requests.
      # Secure: Ensures it's only sent over HTTPS in production.
      context[:cookies][:refresh_token] = {
        value: refresh_token.token,
        httponly: true,
        samesite: :strict,
        secure: Rails.env.production?,
        expires: refresh_token.expires_at
      }
      
      {
        access_token: JsonWebToken.encode_access_token(user),
        user: user,
        errors: []
      }
    else
      { access_token: nil, user: nil, errors: ['Invalid email or password'] }
    end
  end
end

# app/graphql/mutations/refresh_token.rb
class Mutations::RefreshToken < Mutations::BaseMutation
  field :access_token, String, null: true
  field :errors, [String], null: false

  def resolve
    token_value = context[:cookies][:refresh_token]
    return { access_token: nil, errors: ['No refresh token found'] } unless token_value
    
    # The core of token rotation. Find the old token.
    old_token = RefreshToken.find_by(token: token_value)

    unless old_token && old_token.expires_at > Time.now
      # Also clear the invalid/expired cookie from the client
      context[:cookies].delete(:refresh_token)
      return { access_token: nil, errors: ['Invalid or expired refresh token'] }
    end

    user = old_token.user

    # IMPORTANT: Invalidate the old token to prevent reuse (a form of replay attack).
    old_token.destroy!

    # Issue a new refresh token and set it in the cookie. This is token rotation.
    new_refresh_token = user.refresh_tokens.create!
    context[:cookies][:refresh_token] = {
      value: new_refresh_token.token,
      httponly: true,
      samesite: :strict,
      secure: Rails.env.production?,
      expires: new_refresh_token.expires_at
    }

    {
      access_token: JsonWebToken.encode_access_token(user),
      errors: []
    }
  end
end
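The client-side logout will also need a server-side counterpart to invalidate the stored refresh token; without one, "logging out" leaves a live credential in the database and the cookie jar. A sketch in the same conventions as the mutations above (Mutations::Logout and its success field are assumed names, not shown in the original):

```ruby
# app/graphql/mutations/logout.rb
class Mutations::Logout < Mutations::BaseMutation
  field :success, Boolean, null: false

  def resolve
    token_value = context[:cookies][:refresh_token]
    # Revoke the stored token so it can never mint another access token.
    RefreshToken.find_by(token: token_value)&.destroy! if token_value
    # Clear the cookie regardless of whether a matching record was found.
    context[:cookies].delete(:refresh_token)
    { success: true }
  end
end
```

The already-issued access token remains valid until its fifteen-minute expiry; that window is the price of statelessness, and shrinking it is exactly what the denylist discussed at the end would address.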

Finally, the API must protect its resources. We do this by inspecting the Authorization header on incoming requests. This logic is centralized in the GraphqlController before the query is even executed.

# app/controllers/graphql_controller.rb
class GraphqlController < ApplicationController
  # This acts as the authentication gate for the entire GraphQL endpoint.
  def execute
    # ... (standard graphql-ruby setup)
    context = {
      current_user: current_user,
      cookies: cookies, # Pass cookies into the GraphQL context
    }
    result = MyApiSchema.execute(query, variables: variables, context: context, operation_name: operation_name)
    render json: result
  rescue => e
    # ... error handling
  end

  private

  def current_user
    # A memoized helper to find the user from the Bearer token.
    return @current_user if defined?(@current_user)

    token = request.headers['Authorization']&.split(' ')&.last
    return nil unless token

    decoded_token = JsonWebToken.decode(token)
    return nil unless decoded_token && decoded_token[:type] == 'access'

    @current_user = User.find_by(id: decoded_token[:sub])
  end
end

The Gatsby Frontend: Seamless Token Interception

The frontend’s responsibility is to manage the access token and orchestrate the refresh flow transparently. Storing the short-lived access token in localStorage is an anti-pattern; it provides no real security benefit over a long-lived token and still exposes it to XSS. A better approach is to keep it in application memory only.

We use Apollo Client for GraphQL communication. Its Link middleware system is the perfect place to inject our token management logic. We’ll create a chain of links: an errorLink to catch 401 Unauthorized responses and trigger a refresh, and an authLink to attach the token to outgoing requests.

// src/apollo/client.js
import { ApolloClient, InMemoryCache, createHttpLink, from } from '@apollo/client';
import { setContext } from '@apollo/client/link/context';
import { onError } from '@apollo/client/link/error';

// This is our in-memory, non-persistent store for the access token.
// It's just a variable in a module, inaccessible to other scripts.
let accessToken = null;

export const setAccessToken = (token) => {
  accessToken = token;
};

export const getAccessToken = () => accessToken;

const httpLink = createHttpLink({
  uri: process.env.GATSBY_API_URL,
  // Without this, the browser will not attach the HttpOnly refresh cookie to
  // requests. The Rails CORS config must also allow credentials for this origin.
  credentials: 'include',
});

// This link attaches the Authorization header to every GraphQL request.
const authLink = setContext((_, { headers }) => {
  const token = getAccessToken();
  return {
    headers: {
      ...headers,
      authorization: token ? `Bearer ${token}` : '',
    },
  };
});

// This is the critical piece for handling expired access tokens.
// It inspects GraphQL errors and, on an authentication error, should trigger a refresh.
const errorLink = onError(({ graphQLErrors, networkError, operation, forward }) => {
  if (graphQLErrors) {
    for (let err of graphQLErrors) {
      // The backend is assumed to flag expired or invalid access tokens with
      // the 'UNAUTHENTICATED' extensions code; matching on a code is more
      // reliable than matching on error message strings.
      if (err.extensions?.code === 'UNAUTHENTICATED') {
        // Here we need to trigger the refresh token mutation.
        // We cannot `await` here directly, so this logic becomes complex.
        // A common pattern is to have a queue of failed requests.
        // For simplicity, we'll just redirect to login, but a production app
        // would implement a more robust retry mechanism.
        console.error("Authentication error, need to refresh token or log out.");
        // A real implementation would call a `refreshToken()` function here,
        // which performs the mutation, updates `accessToken` via `setAccessToken`,
        // and then uses `forward(operation)` to retry the request.
        // That logic is non-trivial and often involves promises and request queuing.
      }
    }
  }

  if (networkError) {
    console.log(`[Network error]: ${networkError}`);
  }
});

// The links are chained. The error link wraps the auth link, which wraps the HTTP link.
export const client = new ApolloClient({
  link: from([errorLink, authLink, httpLink]),
  cache: new InMemoryCache(),
});

To manage the authentication state across the application, a React Context is ideal.

// src/context/AuthContext.js
import React, { createContext, useState, useContext } from 'react';
import { useMutation } from '@apollo/client';
import { LOGIN_MUTATION, REFRESH_TOKEN_MUTATION } from '../apollo/queries';
import { setAccessToken } from '../apollo/client';

const AuthContext = createContext();

export const AuthProvider = ({ children }) => {
  const [user, setUser] = useState(null);
  const [loading, setLoading] = useState(true);

  // Note: We need to use `refetchQueries` or update the cache manually
  // after login/logout to ensure UI consistency.
  const [loginMutation] = useMutation(LOGIN_MUTATION);
  const [refreshTokenMutation] = useMutation(REFRESH_TOKEN_MUTATION);

  const login = async (email, password) => {
    const { data } = await loginMutation({ variables: { email, password } });
    if (data?.login?.accessToken) {
      setAccessToken(data.login.accessToken);
      setUser(data.login.user);
    }
    // Handle errors...
  };
  
  // This function would be called by the Apollo error link.
  const refresh = async () => {
    try {
      const { data } = await refreshTokenMutation();
      if (data?.refreshToken?.accessToken) {
        setAccessToken(data.refreshToken.accessToken);
        return true; // Indicate success
      }
    } catch (error) {
      // Refresh failed, log the user out.
      logout();
    }
    return false; // No new access token was issued
  };


  const logout = () => {
    // Call a logout mutation on the backend to invalidate the refresh token.
    setAccessToken(null);
    setUser(null);
  };

  // On initial app load, you might try to refresh the token to see if a
  // valid session exists.
  React.useEffect(() => {
    refresh().finally(() => setLoading(false));
  }, []);

  const value = { user, loading, login, logout, refresh };

  return <AuthContext.Provider value={value}>{children}</AuthContext.Provider>;
};

export const useAuth = () => useContext(AuthContext);

This setup provides a clean API (useAuth) for components to interact with the authentication system, hiding the complexity of token management.

Orchestration with Docker Swarm: Enforcing Zero Trust

Deploying this architecture requires careful management of secrets and network policies. Docker Swarm, while simpler than Kubernetes, provides the necessary primitives. The cornerstone of our deployment is docker secret. We generate an RSA key pair locally and provide it to the Swarm, which then mounts it securely into the Rails container’s filesystem.

First, generate the keys:

$ openssl genrsa -out jwt_private.pem 4096
$ openssl rsa -in jwt_private.pem -pubout -out jwt_public.pem

Next, create the secrets in the Docker Swarm manager:

$ docker secret create jwt_private_key jwt_private.pem
$ docker secret create jwt_public_key jwt_public.pem

The docker-stack.yml file defines the services, networks, and secrets, tying everything together. A key principle of Zero Trust is enforced here: network segmentation. The database lives on a backend-db network, accessible only by the Rails API service. The Gatsby frontend and the API communicate over a frontend-api network. There is no default path from the public-facing frontend service to the database.

version: '3.8'

services:
  # The Rails GraphQL API
  api:
    image: my-registry/my-rails-api:latest
    environment:
      RAILS_ENV: production
      RAILS_LOG_TO_STDOUT: "true"
      # Inline credentials keep the example short; in production the database
      # password belongs in a Docker secret as well.
      DATABASE_URL: "postgresql://user:password@postgres:5432/mydb"
      # These env vars point the application to the secrets mounted by Docker
      JWT_PRIVATE_KEY_PATH: /run/secrets/jwt_private_key
      JWT_PUBLIC_KEY_PATH: /run/secrets/jwt_public_key
    secrets:
      - jwt_private_key
      - jwt_public_key
    networks:
      - frontend-api
      - backend-db
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure

  # The Gatsby application, served by Nginx
  frontend:
    image: my-registry/my-gatsby-app:latest
    ports:
      - "80:80"
    networks:
      - frontend-api
    deploy:
      replicas: 2

  # The PostgreSQL Database
  postgres:
    image: postgres:14-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    networks:
      # Only connected to the backend network
      - backend-db
    deploy:
      placement:
        # Pin the DB to one node so it always finds its local volume
        constraints: [node.role == manager]

# Secrets must be declared as 'external' because we created them manually
# outside of this stack file. This is a best practice.
secrets:
  jwt_private_key:
    external: true
  jwt_public_key:
    external: true

# Encrypted overlay networks for service-to-service communication
networks:
  frontend-api:
    driver: overlay
    attachable: true
    driver_opts:
      encrypted: "true" # Data-plane encryption is opt-in on overlay networks
  backend-db:
    driver: overlay
    internal: true # 'internal' prevents any external access to this network
    driver_opts:
      encrypted: "true"

volumes:
  postgres_data:

This stack definition creates a resilient, secure runtime. The API’s signing key is never part of its image. Traffic between services can be encrypted on Swarm’s overlay networks, though data-plane encryption must be enabled explicitly when the network is created; Swarm only encrypts its control plane by default. An attacker who compromises the frontend container cannot directly access the database. The resulting topology:

graph TD
    subgraph "Internet"
        U[User Browser]
    end

    subgraph "Docker Swarm Cluster (Encrypted Overlay Networks)"
        LB[Swarm Ingress Routing Mesh]

        subgraph "Network: frontend-api"
            F1[Frontend Container 1]
            F2[Frontend Container 2]
            A1[API Container 1]
            A2[API Container 2]
        end

        subgraph "Network: backend-db (Internal)"
            DB[(PostgreSQL)]
        end

        subgraph "Swarm Secret Storage"
            S_PRIV(jwt_private_key)
            S_PUB(jwt_public_key)
        end
    end

    U -- HTTPS --> LB
    LB -- HTTP --> F1 & F2
    F1 -- GraphQL API Calls --> A1 & A2
    F2 -- GraphQL API Calls --> A1 & A2
    A1 -- SQL --> DB
    A2 -- SQL --> DB
    
    S_PRIV -- mounted as file --> A1
    S_PRIV -- mounted as file --> A2
    S_PUB -- mounted as file --> A1
    S_PUB -- mounted as file --> A2

The limitations of this approach primarily revolve around refresh token revocation. While our implementation removes the token from the database upon use or logout, a stolen refresh token remains valid until it’s used or expires. A truly robust system would implement a revocation list, likely in a fast-access store like Redis, checked during every refresh attempt. This adds complexity and a new dependency but hardens security against token theft. Furthermore, while Docker Swarm provides excellent basic security, it lacks the advanced network policy engines like Calico or Cilium available in Kubernetes, which can enforce Layer 7 policies (e.g., “only allow this specific GraphQL mutation from the frontend service”). This architecture is a significant step up from simple JWTs, but the path of security engineering is one of continuous, iterative improvement.
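The revocation check described above reduces to two operations against a fast store. A minimal sketch, with a plain in-memory Hash standing in for Redis (RevokedTokens is a hypothetical module name; the equivalent Redis commands are noted in comments):

```ruby
# Hypothetical denylist for refresh tokens. A Hash stands in for Redis here;
# in production each operation maps to a Redis command with a TTL, so entries
# expire on their own once the token itself would have expired.
module RevokedTokens
  STORE = {} # token => time until which the revocation matters

  # Called on logout or on detected token reuse. Redis: SETEX token, ttl, 1
  def self.revoke!(token, expires_at)
    STORE[token] = expires_at
  end

  # Called on every refresh attempt, before any rotation. Redis: EXISTS token
  def self.revoked?(token)
    revoked_until = STORE[token]
    !revoked_until.nil? && revoked_until > Time.now
  end
end

# In the refresh mutation, the check would sit before the database lookup:
#   return { access_token: nil, errors: ['Token revoked'] } if RevokedTokens.revoked?(token_value)
```

The TTL matters: once the token's natural expiry passes, the refresh flow rejects it anyway, so the denylist never needs to grow without bound.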

