Building a Revocable OAuth 2.0 Session Management System for Android Using a Service Worker and etcd

Architecture

The technical pain point emerged during a security audit of a hybrid Android application. Our application relies heavily on WebView components to render dynamic content alongside native UI, creating a fragmented authentication landscape. The existing implementation used long-lived refresh tokens stored in Android’s SharedPreferences, which presented a significant attack surface. A compromised device could lead to prolonged, unauthorized access. Our goal was to transition to a system of short-lived access tokens, but this introduced unacceptable user experience friction due to frequent re-authentication prompts. Furthermore, we had no mechanism for immediate, system-wide session revocation for a specific user or device, a critical requirement in our move towards a zero-trust security model.

Our initial concept was to build a session management layer that was both resilient and centrally controllable. We needed to abstract the token lifecycle management away from the core Android application logic and ensure that revocation commands could propagate across our distributed backend services in near real-time. Polling a central database for a revocation list was deemed too slow and inefficient. This led us to a rather unconventional combination of technologies. The core idea was to leverage a Service Worker, running within a hidden WebView, to act as a secure “session sidecar” for the Android app. This sidecar would handle all token storage, refresh, and injection. For the backend, we decided to use etcd not just as a configuration store, but as a distributed, watchable revocation ledger.

The choice of a Service Worker was pivotal. By loading a trusted, single-purpose web page in a background WebView, we could install a Service Worker to manage the session. This component lives in its own isolated context, can intercept network requests originating from its scope, and has access to robust storage APIs like IndexedDB. This architecture decouples token management from the native Kotlin/Java code. The native side only needs to interact with this WebView, effectively outsourcing all complex OAuth 2.0 PKCE flow continuations and refresh logic to the JavaScript layer. This provides a unified session for both the WebView content and native API calls, which we would proxy through the Service Worker’s fetch handler.

For the backend, etcd was selected over traditional databases or caches like Redis for one primary reason: its Watch API. In a real-world project, a critical security function like session revocation cannot tolerate delays. An administrator needs to be able to click a button and have a user’s access terminated globally within seconds. etcd’s consensus-based distributed nature provides high availability, and its Watch feature allows our backend services to subscribe to changes on a specific key prefix (e.g., /revocations/). When a token’s unique identifier (jti) is written to this prefix, all watching services are notified immediately. This push-based model is vastly superior to a pull-based (polling) approach in terms of latency and resource efficiency. It forms the backbone of our dynamic policy enforcement.

Here is the high-level architecture we landed on.

sequenceDiagram
    participant App as Android App (Native)
    participant WV as WebView (with Service Worker)
    participant Auth as Authorization Server (Go)
    participant RS as Resource Server (Go)
    participant ETCD as etcd Cluster

    App->>WV: Initiate Login
    WV->>Auth: Request Authorization (PKCE Flow)
    Auth-->>WV: Authorization Code
    WV->>Auth: Exchange Code for Tokens
    Auth->>ETCD: Read Client Config
    Auth-->>WV: Access & Refresh Tokens (with JTI)
    WV-->>WV: Store Tokens in IndexedDB

    Note over App, WV: Session Established

    App->>WV: Request to call /api/data
    WV->>WV: Intercept fetch event
    WV->>RS: GET /api/data (with Access Token)

    RS->>ETCD: Watch /revocations/
    ETCD-->>RS: (Live stream of revocations)

    RS->>RS: Validate Token JWT & Signature
    RS->>RS: Check JTI against local revocation cache
    RS-->>WV: 200 OK, { "data": "..." }
    WV-->>App: Return data

    Note over Auth, ETCD: Admin action: Revoke Token
    Auth->>ETCD: PUT /revocations/jti/{jti}
    ETCD-->>RS: Push Notification: Key /revocations/jti/{jti} created
    RS->>RS: Update local revocation cache

    App->>WV: Request to call /api/data again
    WV->>RS: GET /api/data (with same Access Token)
    RS->>RS: Validate Token JWT & Signature
    RS->>RS: Check JTI against cache -> FOUND
    RS-->>WV: 401 Unauthorized
    WV->>WV: Clear tokens from IndexedDB
    WV-->>App: Signal Authentication Failure

Backend Implementation with Go and etcd

The foundation of this system is the Authorization Server and the middleware used by Resource Servers. Both are written in Go and interact heavily with the etcd cluster.

First, let’s define the structure for our OAuth client configuration, which we’ll store in etcd.

// file: etcd/config.go
package etcd

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// OAuthClientConfig defines the structure for storing client info in etcd.
type OAuthClientConfig struct {
	ClientID      string   `json:"client_id"`
	ClientSecret  string   `json:"client_secret"` // Hashed in a real scenario
	RedirectURIs  []string `json:"redirect_uris"`
	AllowedScopes []string `json:"allowed_scopes"`
}

// ConfigStore manages client configurations within etcd.
type ConfigStore struct {
	client *clientv3.Client
	prefix string
}

func NewConfigStore(client *clientv3.Client) *ConfigStore {
	return &ConfigStore{client: client, prefix: "/oauth/clients/"}
}

// GetClient fetches a client configuration from etcd.
func (s *ConfigStore) GetClient(ctx context.Context, clientID string) (*OAuthClientConfig, error) {
	key := s.prefix + clientID
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	resp, err := s.client.Get(ctx, key)
	if err != nil {
		return nil, fmt.Errorf("etcd get failed for key %s: %w", key, err)
	}

	if len(resp.Kvs) == 0 {
		return nil, fmt.Errorf("client with id %s not found", clientID)
	}

	var config OAuthClientConfig
	if err := json.Unmarshal(resp.Kvs[0].Value, &config); err != nil {
		return nil, fmt.Errorf("failed to unmarshal client config: %w", err)
	}

	return &config, nil
}
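
To show how a loaded client record might be used, here is a minimal sketch of validating an incoming authorization request against it. This is an illustration under assumptions, not the article's implementation: `validateAuthRequest` and the error values are hypothetical names, `clientConfig` is a trimmed stand-in for `OAuthClientConfig`, and redirect URIs are assumed to be compared by exact string match.

```go
package main

import (
	"errors"
	"strings"
)

// clientConfig is a trimmed stand-in for OAuthClientConfig (assumption: only
// the two fields needed for request validation are reproduced here).
type clientConfig struct {
	RedirectURIs  []string
	AllowedScopes []string
}

// Hypothetical sentinel errors for the two failure modes.
var (
	errRedirectMismatch = errors.New("redirect_uri is not registered for this client")
	errScopeNotAllowed  = errors.New("requested scope is not allowed for this client")
)

// validateAuthRequest is a hypothetical helper. It checks the redirect URI by
// exact string comparison and requires every space-separated scope in the
// request to appear in the client's allowed list.
func validateAuthRequest(cfg clientConfig, redirectURI, scope string) error {
	registered := false
	for _, u := range cfg.RedirectURIs {
		if u == redirectURI {
			registered = true
			break
		}
	}
	if !registered {
		return errRedirectMismatch
	}

	allowed := make(map[string]struct{}, len(cfg.AllowedScopes))
	for _, s := range cfg.AllowedScopes {
		allowed[s] = struct{}{}
	}
	for _, s := range strings.Fields(scope) {
		if _, ok := allowed[s]; !ok {
			return errScopeNotAllowed
		}
	}
	return nil
}
```

In this sketch the authorization endpoint would call `GetClient` first and pass the relevant fields to `validateAuthRequest` before issuing a code. Exact matching is deliberately strict, so a prefix of a registered URI is rejected.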

The most critical backend component is the revocation watcher and the corresponding middleware for our Resource Servers. This is where the power of etcd’s watch mechanism is realized.

// file: middleware/revocation.go
package middleware

import (
	"context"
	"log"
	"net/http"
	"strings"
	"sync"
	"time"

	"github.com/golang-jwt/jwt/v4"
	clientv3 "go.etcd.io/etcd/client/v3"
)

// RevocationWatcher maintains a real-time cache of revoked token JTIs.
type RevocationWatcher struct {
	client       *clientv3.Client
	revokedJTIs  map[string]struct{}
	mu           sync.RWMutex
	prefix       string
	syncComplete chan struct{}
}

// NewRevocationWatcher creates and starts a new watcher.
func NewRevocationWatcher(client *clientv3.Client) *RevocationWatcher {
	watcher := &RevocationWatcher{
		client:       client,
		revokedJTIs:  make(map[string]struct{}),
		prefix:       "/revocations/jti/",
		syncComplete: make(chan struct{}),
	}
	go watcher.start()
	// Block until the initial sync is complete to avoid race conditions on startup.
	<-watcher.syncComplete
	return watcher
}

func (rw *RevocationWatcher) start() {
	log.Println("Starting revocation watcher...")

	// Initial population of the revocation list.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	resp, err := rw.client.Get(ctx, rw.prefix, clientv3.WithPrefix())
	cancel()
	if err != nil {
		log.Fatalf("Failed to perform initial sync of revocations: %v", err)
	}

	rw.mu.Lock()
	for _, kv := range resp.Kvs {
		jti := strings.TrimPrefix(string(kv.Key), rw.prefix)
		rw.revokedJTIs[jti] = struct{}{}
	}
	rw.mu.Unlock()
	log.Printf("Initial sync complete. Loaded %d revoked JTIs.", len(resp.Kvs))
	close(rw.syncComplete) // Signal that initial sync is done.

	// Start watching for future changes.
	watchChan := rw.client.Watch(context.Background(), rw.prefix, clientv3.WithPrefix(), clientv3.WithRev(resp.Header.Revision+1))

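	for watchResp := range watchChan {
		// A canceled or compacted watch surfaces as an error on the response.
		// Without this check the failure would go unnoticed.
		if err := watchResp.Err(); err != nil {
			log.Printf("Revocation watch error: %v", err)
			continue
		}
		for _, event := range watchResp.Events {
			jti := strings.TrimPrefix(string(event.Kv.Key), rw.prefix)
			switch event.Type {
			case clientv3.EventTypePut:
				rw.mu.Lock()
				rw.revokedJTIs[jti] = struct{}{}
				rw.mu.Unlock()
				log.Printf("Revocation added for JTI: %s", jti)
			case clientv3.EventTypeDelete:
				// Keys are added with a TTL, so they self-delete when the token expires.
				rw.mu.Lock()
				delete(rw.revokedJTIs, jti)
				rw.mu.Unlock()
				log.Printf("Revocation expired and removed for JTI: %s", jti)
			}
		}
	}
	// The channel only closes once the watch has ended, after which the cache
	// goes stale. A production service should re-sync and restart the watch,
	// or stop serving requests, rather than continue silently.
	log.Println("Revocation watch channel closed; revocation cache is no longer being updated")
}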

// IsRevoked checks if a given JTI is in the revocation cache.
func (rw *RevocationWatcher) IsRevoked(jti string) bool {
	rw.mu.RLock()
	defer rw.mu.RUnlock()
	_, found := rw.revokedJTIs[jti]
	return found
}

// AuthMiddleware creates an HTTP middleware to verify JWTs and check for revocation.
func AuthMiddleware(watcher *RevocationWatcher, jwtSecret []byte) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			authHeader := r.Header.Get("Authorization")
			if authHeader == "" {
				http.Error(w, "Authorization header required", http.StatusUnauthorized)
				return
			}

			tokenString := strings.TrimPrefix(authHeader, "Bearer ")
			if tokenString == authHeader {
				http.Error(w, "Bearer token required", http.StatusUnauthorized)
				return
			}

			token, err := jwt.Parse(tokenString, func(token *jwt.Token) (interface{}, error) {
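				// Reject any token that is not signed with an HMAC method.
				// http.ErrAbortHandler is a panic sentinel for HTTP handlers,
				// so a JWT validation error is returned instead.
				if _, ok := token.Method.(*jwt.SigningMethodHMAC); !ok {
					return nil, jwt.ErrSignatureInvalid
				}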
				return jwtSecret, nil
			})

			if err != nil || !token.Valid {
				http.Error(w, "Invalid token", http.StatusUnauthorized)
				return
			}

			claims, ok := token.Claims.(jwt.MapClaims)
			if !ok {
				http.Error(w, "Invalid token claims", http.StatusUnauthorized)
				return
			}

			jti, ok := claims["jti"].(string)
			if !ok {
				http.Error(w, "Token missing jti claim", http.StatusUnauthorized)
				return
			}

			if watcher.IsRevoked(jti) {
				log.Printf("Access denied for revoked JTI: %s", jti)
				http.Error(w, "Token has been revoked", http.StatusUnauthorized)
				return
			}

			// Add user info to context for downstream handlers
			ctx := context.WithValue(r.Context(), "userID", claims["sub"])
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

A common mistake here is to perform the initial sync and start watching in parallel without a synchronization primitive. This can lead to a service starting to accept requests before its revocation list is fully populated, creating a small window of vulnerability. The use of the syncComplete channel ensures the middleware doesn’t become active until the watcher is ready.

Service Worker as a Session Sidecar

The Service Worker is the client-side heart of this system. It intercepts requests, manages tokens in IndexedDB, and handles the refresh flow transparently.

// file: service-worker.js

const TOKEN_STORE_NAME = 'token-store';
const DB_NAME = 'auth-db';
let dbPromise = null;

function getDb() {
    if (!dbPromise) {
        dbPromise = new Promise((resolve, reject) => {
            const request = indexedDB.open(DB_NAME, 1);
            request.onerror = () => reject("Error opening IndexedDB.");
            request.onsuccess = (event) => resolve(event.target.result);
            request.onupgradeneeded = (event) => {
                const db = event.target.result;
                db.createObjectStore(TOKEN_STORE_NAME, { keyPath: 'id' });
            };
        });
    }
    return dbPromise;
}

async function setTokenData(tokenData) {
    const db = await getDb();
    return new Promise((resolve, reject) => {
        const transaction = db.transaction(TOKEN_STORE_NAME, 'readwrite');
        const store = transaction.objectStore(TOKEN_STORE_NAME);
        const request = store.put({ id: 'session', ...tokenData });
        request.onsuccess = () => resolve();
        request.onerror = (event) => reject(`Error storing token data: ${event.target.error}`);
    });
}

async function getTokenData() {
    const db = await getDb();
    return new Promise((resolve, reject) => {
        const transaction = db.transaction(TOKEN_STORE_NAME, 'readonly');
        const store = transaction.objectStore(TOKEN_STORE_NAME);
        const request = store.get('session');
        request.onsuccess = (event) => resolve(event.target.result);
        request.onerror = (event) => reject(`Error fetching token data: ${event.target.error}`);
    });
}

self.addEventListener('install', (event) => {
    event.waitUntil(self.skipWaiting());
});

self.addEventListener('activate', (event) => {
    event.waitUntil(self.clients.claim());
});

// A simple lock to prevent concurrent token refreshes
let isRefreshing = false;
let refreshSubscribers = [];

async function refreshToken() {
    const tokenData = await getTokenData();
    if (!tokenData || !tokenData.refresh_token) {
        throw new Error('No refresh token available.');
    }

    // In a production scenario, you would perform the token refresh call here.
    // This involves a POST request to your auth server's token endpoint
    // with grant_type=refresh_token.
    console.log("Attempting to refresh token...");
    // const response = await fetch('/auth/token', { ... });
    // const newTokens = await response.json();
    // await setTokenData(newTokens);
    // return newTokens.access_token;
    
    // For demonstration, we simulate a successful refresh
    const newAccessToken = `refreshed-access-token-${Date.now()}`;
    await setTokenData({ ...tokenData, access_token: newAccessToken, expires_at: Date.now() + 3600 * 1000 });
    return newAccessToken;
}

// Returns a copy of the request that carries the given bearer token.
// Spreading a Request object copies nothing useful (its fields are prototype
// getters), so the copy is made through the Request constructor, which
// preserves the method, body, and other settings.
function withBearer(request, accessToken) {
    const headers = new Headers(request.headers);
    headers.set('Authorization', `Bearer ${accessToken}`);
    return new Request(request, { headers });
}

self.addEventListener('fetch', (event) => {
    // Only intercept requests to our API
    if (!event.request.url.includes('/api/')) {
        return;
    }

    event.respondWith(
        (async () => {
            const tokenData = await getTokenData();

            // setTokenData(null) leaves an empty 'session' record behind,
            // so check for the access token as well as the record itself.
            if (!tokenData || !tokenData.access_token) {
                // No session, let the request fail. The app should handle the 401.
                return fetch(event.request);
            }

            // A common pitfall is not accounting for clock skew.
            // Subtract a buffer (e.g., 60 seconds) to be safe.
            const tokenIsExpired = Date.now() > (tokenData.expires_at - 60000);

            if (!tokenIsExpired) {
                return fetch(withBearer(event.request, tokenData.access_token));
            }

            if (isRefreshing) {
                // If a refresh is already in progress, queue this request.
                // A null token means the refresh failed.
                return new Promise(resolve => {
                    refreshSubscribers.push(newAccessToken => {
                        resolve(newAccessToken
                            ? fetch(withBearer(event.request, newAccessToken))
                            : new Response('Authentication failed', { status: 401 }));
                    });
                });
            }

            isRefreshing = true;
            try {
                const newAccessToken = await refreshToken();
                isRefreshing = false;
                // Fulfill queued requests
                refreshSubscribers.forEach(callback => callback(newAccessToken));
                refreshSubscribers = [];
                // Retry the original request with the new token
                return fetch(withBearer(event.request, newAccessToken));
            } catch (error) {
                isRefreshing = false;
                console.error("Token refresh failed:", error);
                // Release queued requests so they do not hang forever
                refreshSubscribers.forEach(callback => callback(null));
                refreshSubscribers = [];
                // Clear tokens and signal failure
                await setTokenData(null);
                return new Response('Authentication failed', { status: 401 });
            }
        })()
    );
});

Android Client Integration

The Android app needs to manage the WebView that hosts the Service Worker, initiate the login flow, and proxy its API calls through the JavaScript layer.

// file: AuthWebViewClient.kt
package com.example.secureapp

import android.webkit.WebResourceRequest
import android.webkit.WebView
import android.webkit.WebViewClient

// This client handles the OAuth redirect.
class AuthWebViewClient(private val onAuthCodeReceived: (String) -> Unit) : WebViewClient() {
    private val redirectUri = "com.example.secureapp:/oauth2/callback"

    override fun shouldOverrideUrlLoading(view: WebView?, request: WebResourceRequest?): Boolean {
        val url = request?.url.toString()
        if (url.startsWith(redirectUri)) {
            val code = request?.url?.getQueryParameter("code")
            if (code != null) {
                onAuthCodeReceived(code)
                return true // We've handled the redirect.
            }
        }
        return false // Let the WebView handle the URL.
    }
}

// file: MainActivity.kt
package com.example.secureapp

import android.net.Uri
import android.os.Bundle
import android.util.Log
import android.webkit.JavascriptInterface
import android.webkit.WebView
import android.widget.Button
import androidx.appcompat.app.AppCompatActivity
import androidx.browser.customtabs.CustomTabsIntent

class MainActivity : AppCompatActivity() {

    private lateinit var webView: WebView
    private val authServiceUrl = "https://your-auth-service.com"
    private val trustedOrigin = "https://your-trusted-sw-host.com"

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        setupWebView()

        // Example: Triggering a native call that gets proxied through the SW
        findViewById<Button>(R.id.call_api_button).setOnClickListener {
            makeProxiedApiCall("/api/v1/user/profile")
        }
    }

    private fun setupWebView() {
        webView = findViewById(R.id.webView)
        webView.settings.javaScriptEnabled = true
        webView.settings.domStorageEnabled = true // Required for Service Workers

        // This interface allows JavaScript to call back into native Kotlin code.
        webView.addJavascriptInterface(WebAppInterface(), "Android")

        webView.webViewClient = AuthWebViewClient { code ->
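            // The redirect was caught. Now, pass the code to the WebView's JS context
            // so the Service Worker can exchange it for a token.
            // JSONObject.quote escapes the value so it cannot break out of the JS string literal.
            webView.evaluateJavascript("window.handleAuthCode(${org.json.JSONObject.quote(code)});", null)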
        }
        
        // Load the trusted page that registers the Service Worker.
        webView.loadUrl(trustedOrigin)
    }

    private fun startLoginFlow() {
        // Use Chrome Custom Tabs for the login flow - it's more secure than a WebView
        // as it doesn't share cookies with the app's WebView.
        val pkce = generatePkce() // Implement PKCE code verifier/challenge generation
        val authUrl = Uri.parse("$authServiceUrl/authorize")
            .buildUpon()
            .appendQueryParameter("response_type", "code")
            .appendQueryParameter("client_id", "android-client-id")
            .appendQueryParameter("redirect_uri", "com.example.secureapp:/oauth2/callback")
            .appendQueryParameter("scope", "read write")
            .appendQueryParameter("code_challenge", pkce.challenge)
            .appendQueryParameter("code_challenge_method", "S256")
            .build()
        
        val builder = CustomTabsIntent.Builder()
        val customTabsIntent = builder.build()
        customTabsIntent.launchUrl(this, authUrl)
    }

    private fun makeProxiedApiCall(endpoint: String) {
        val script = """
            // This JS code runs inside the WebView and leverages the Service Worker
            fetch('$trustedOrigin$endpoint')
                .then(response => {
                    if (!response.ok) {
                        // If we get a 401, it means the session is dead.
                        if (response.status === 401) {
                            Android.onAuthFailure();
                        }
                        throw new Error(`HTTP error! status: ${'$'}{response.status}`);
                    }
                    return response.json();
                })
                .then(data => Android.onApiResponse(JSON.stringify(data)))
                .catch(error => Android.onApiError(error.toString()));
        """
        webView.evaluateJavascript(script, null)
    }

    inner class WebAppInterface {
        @JavascriptInterface
        fun onAuthFailure() {
            // The SW failed to get a valid token. Trigger the login flow.
            runOnUiThread {
                startLoginFlow()
            }
        }

        @JavascriptInterface
        fun onApiResponse(jsonData: String) {
            Log.d("API_SUCCESS", "Received data: $jsonData")
        }

        @JavascriptInterface
        fun onApiError(error: String) {
            Log.e("API_ERROR", "API call failed: $error")
        }
    }
}
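
The Kotlin code above leaves `generatePkce()` unimplemented and sends an S256 code challenge. For completeness, here is a sketch of the matching check the Go Authorization Server would need when the code is exchanged for tokens. The function names are hypothetical. The derivation follows RFC 7636: the challenge is the unpadded base64url encoding of the SHA-256 hash of the verifier.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
)

// s256Challenge derives the PKCE code challenge for the S256 method:
// BASE64URL(SHA256(verifier)) without padding, as defined in RFC 7636.
func s256Challenge(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// verifyPKCE reports whether the code_verifier sent with the token request
// matches the code_challenge that was stored alongside the authorization
// code. The comparison runs in constant time.
func verifyPKCE(storedChallenge, verifier string) bool {
	computed := s256Challenge(verifier)
	return subtle.ConstantTimeCompare([]byte(computed), []byte(storedChallenge)) == 1
}
```

The client side has to produce the same value, so whatever `generatePkce()` does in Kotlin must hash and encode the verifier the same way, including dropping the `=` padding.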

The final result is a robust system. The Android application’s session state is fully managed within an isolated Service Worker context. Token refreshes are automatic and invisible to the user. Most importantly, an administrator can revoke a specific token by writing its jti to etcd, and every resource server running the revocation middleware rejects that token as soon as the watch event reaches it, which on a healthy cluster happens in near real time. This closes a critical security gap present in traditional refresh token models.

This architecture is not without its limitations. The primary dependency is on a healthy, well-maintained etcd cluster. An etcd outage would impact the ability to revoke tokens and, depending on implementation, could even affect new token issuance if client configurations are also stored there. Furthermore, the number of watches on the etcd cluster scales linearly with the number of resource server instances. For very large-scale deployments, an intermediate caching or fan-out service might be necessary to shield etcd from excessive watch connections. Finally, the Service Worker’s lifecycle is managed by the browser engine within the WebView, and while generally persistent, it can be terminated under heavy memory pressure by the Android OS, potentially requiring a re-initialization phase when the app is next opened. Future work could explore using WorkManager on Android to periodically “ping” the WebView to keep the Service Worker alive during critical background operations.
