The mandate was seemingly straightforward: build an event-driven data enrichment pipeline. A new Google Cloud Function would trigger on a Pub/Sub message, fetch supplementary data from our core user-profile
service, and push the enriched result to a BigQuery table. The complication arose from our infrastructure’s security posture. The user-profile
service lives inside a hardened Google Kubernetes Engine (GKE) cluster, governed by an Istio service mesh with a default-deny policy. All inter-service communication requires strict mTLS, and no service is exposed directly to the public internet.
Our initial discussions revolved around two non-starters. The first, exposing the user-profile
service via a public GKE Ingress, was immediately vetoed. It would create a hole in our security perimeter, nullifying the entire purpose of the service mesh. The second, a complex VPN or Interconnect setup, was deemed operational overkill for a single serverless function’s needs. It would introduce significant latency and maintenance overhead.
The core problem remained: how to allow a managed, serverless entity running on Google’s infrastructure to securely and efficiently communicate with a private workload inside our Istio mesh, without compromising our Zero Trust principles. The solution required bridging two distinct networking and identity domains. This led us to an architecture based on three key components: a Serverless VPC Access connector to bridge the network layer, an internal-facing Istio Ingress Gateway to act as a policy enforcement point, and a JWT-based authentication mechanism to bridge the identity gap. To ensure performance and minimize cold start latency, a critical factor in event-driven systems, we would leverage esbuild
to create a minimal, optimized function bundle.
The Architectural Foundation: Network and Service Mesh Configuration
Before writing a single line of function code, the infrastructure must be correctly configured. The goal is to create a private network path from the Cloud Function environment into the GKE cluster’s VPC, terminating at a specific, controlled entry point in the service mesh.
1. The GKE Cluster and Target Service
Assume a standard GKE cluster with Istio installed and automatic sidecar injection enabled for the default
namespace. Our target is a simple user-profile
service.
# user-profile-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: user-profile
  labels:
    app: user-profile
spec:
  ports:
    - port: 80
      name: http
      targetPort: 8080
  selector:
    app: user-profile
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-profile-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: user-profile
  template:
    metadata:
      labels:
        app: user-profile
    spec:
      containers:
        - name: user-profile
          # A simple echo server for demonstration purposes.
          # In a real-world project, this would be your business logic container.
          image: hashicorp/http-echo
          args:
            - "-text={\"userId\": \"123\", \"email\": \"[email protected]\"}"
            - "-listen=:8080"
          ports:
            - containerPort: 8080
Deploying this creates the internal service, but by default, it’s unreachable from outside the mesh.
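For reference, the default-deny, strict-mTLS posture described earlier typically comes from a pair of mesh-wide policies along these lines. This is a sketch, not our exact cluster configuration; note that with a mesh-wide deny-all in place, the internal gateway introduced below also needs its own ALLOW policy.
# mesh-defaults.yaml (illustrative)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system # Root namespace: applies mesh-wide
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: istio-system # Root namespace: applies mesh-wide
spec: {} # An empty spec matches no requests, so everything is denied unless explicitly allowed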
2. Bridging the VPC with a Serverless Connector
Google Cloud Functions do not run inside a user’s VPC by default. To grant them access to internal resources like a GKE cluster’s internal IP range, we must create a Serverless VPC Access connector. This is a non-trivial resource; it reserves a dedicated subnet within the target VPC and acts as a network proxy.
# Ensure you are using the correct project and region
gcloud config set project YOUR_PROJECT_ID
gcloud config set compute/region YOUR_REGION
# Create the VPC Access Connector
# It requires a /28 IP range within your VPC that doesn't overlap with other subnets.
gcloud compute networks vpc-access connectors create gcf-to-gke-connector \
  --network YOUR_VPC_NAME \
  --region YOUR_REGION \
  --range 10.8.0.0/28
A common mistake here is under-provisioning the connector or choosing an IP range that will later conflict with GKE pod or service CIDR ranges. Careful VPC planning is essential.
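Before choosing that /28, it is worth checking what the VPC and cluster already consume. Commands roughly like the following help; substitute your own cluster and network names, and note that output formats can vary by gcloud version.
# List existing subnet ranges in the VPC
gcloud compute networks subnets list \
  --filter="network:YOUR_VPC_NAME" \
  --format="table(name, region, ipCidrRange)"
# Check the GKE cluster's pod and service CIDRs to avoid overlap
gcloud container clusters describe YOUR_CLUSTER_NAME \
  --region YOUR_REGION \
  --format="value(clusterIpv4Cidr, servicesIpv4Cidr)"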
3. Istio Ingress Gateway for Internal Traffic
The standard Istio ingress-gateway
is designed for public-facing traffic. We need a dedicated gateway for internal traffic originating from our VPC, such as our Cloud Function. We achieve this by deploying a new gateway instance and annotating its service to request an internal TCP/UDP load balancer from Google Cloud.
# internal-gateway.yaml
apiVersion: v1
kind: Service
metadata:
  name: istio-internal-gateway
  namespace: istio-system
  annotations:
    # This is the critical annotation for GKE: it provisions an internal
    # load balancer instead of a public one.
    networking.gke.io/load-balancer-type: "Internal"
  labels:
    istio: internal-gateway
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
      name: http2
  selector:
    istio: internal-gateway
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istio-internal-gateway
  namespace: istio-system
spec:
  replicas: 1 # Adjust for production needs
  selector:
    matchLabels:
      istio: internal-gateway
  template:
    metadata:
      labels:
        # Match the service selector
        istio: internal-gateway
      annotations:
        # Use Istio's gateway injection template so the proxy runs as a
        # gateway (not a sidecar) and the "auto" image is resolved.
        inject.istio.io/templates: gateway
        sidecar.istio.io/inject: "true"
    spec:
      # This ServiceAccount must exist in istio-system and must match the
      # principal referenced later in the AuthorizationPolicy.
      serviceAccountName: istio-internal-gateway-service-account
      containers:
        - name: istio-proxy
          image: auto # Istio will inject the correct proxy image
          ports:
            - containerPort: 8080
After applying this, kubectl get svc -n istio-system istio-internal-gateway
will show an EXTERNAL-IP
which is actually an internal IP address within our VPC. This IP is the target our Cloud Function will call.
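You can capture that address directly for use in later steps, for example:
# Grab the internal load balancer IP assigned to the gateway Service
INTERNAL_GATEWAY_IP=$(kubectl get svc -n istio-system istio-internal-gateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "${INTERNAL_GATEWAY_IP}"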
4. Exposing the Service via the Gateway
Now we wire the internal gateway to our user-profile
service using Istio’s Gateway
and VirtualService
resources.
# user-profile-routing.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: internal-traffic-gateway
spec:
  selector:
    istio: internal-gateway # Binds to our internal gateway deployment
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "user-profile.internal" # A virtual hostname for routing
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: user-profile-vs
spec:
  hosts:
    - "user-profile.internal"
  gateways:
    - internal-traffic-gateway
  http:
    - route:
        - destination:
            host: user-profile.default.svc.cluster.local
            port:
              number: 80
This configuration tells the istio-internal-gateway
that any request for user-profile.internal
on port 80 should be routed to the user-profile
service.
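A quick way to exercise the routing is a curl from a VM inside the VPC, using the IP captured earlier. With a default-deny mesh policy the request will be rejected until the JWT policies in the next section are applied, but the response still tells you whether routing works.
# From a VM in the same VPC; the Host header must match the VirtualService host
curl -v -H "Host: user-profile.internal" "http://${INTERNAL_GATEWAY_IP}/profile/123"
# A 403 (RBAC) at this stage means routing works but authorization is pending;
# a connection error or 404 points to a gateway or VirtualService problem instead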
Bridging Identity with JWT and Istio Policies
Network connectivity is only half the battle. The service mesh still needs to authenticate and authorize the incoming request. Since the Cloud Function cannot participate in Istio’s mTLS identity fabric (SPIFFE), we must use an alternative identity token: a JSON Web Token (JWT).
The flow is as follows:
- The Cloud Function will generate a signed JWT.
- The JWT will be included in the Authorization header of the request to the internal gateway.
- The Istio gateway will be configured with a RequestAuthentication policy to validate the JWT's signature and issuer.
- An AuthorizationPolicy will grant access only if the request contains a valid, validated JWT.
For this example, we’ll use a simple asymmetric key pair for signing and verification. In a production environment, the public key would be exposed via a JWKS (JSON Web Key Set) endpoint.
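One way to produce such a key pair and make the private half available to the function (names here are illustrative) is with openssl and Secret Manager:
# Generate an EC P-256 key pair for ES256 signing
openssl ecparam -name prime256v1 -genkey -noout -out jwt-signing-key.pem
# Convert the private key to PKCS#8 PEM, the format the function will import
openssl pkcs8 -topk8 -nocrypt -in jwt-signing-key.pem -out jwt-signing-key-pkcs8.pem
# Extract the public key; its x and y coordinates populate the JWKS below
openssl ec -in jwt-signing-key.pem -pubout -out jwt-signing-public.pem
# Store the private key in Secret Manager for the Cloud Function to fetch
gcloud secrets create jwt-signing-key --data-file=jwt-signing-key-pkcs8.pem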
# user-profile-jwt-policy.yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-for-user-profile
  namespace: istio-system # Apply in the gateway's namespace
spec:
  selector:
    matchLabels:
      istio: internal-gateway
  jwtRules:
    - issuer: "[email protected]"
      # For a real project, use a jwksUri. For this demo, we embed the key.
      # The public key must be in JWKS format; the x and y values below are
      # placeholders for the Base64Url-encoded coordinates of your public key.
      jwks: |
        {
          "keys": [
            {
              "kty": "EC",
              "crv": "P-256",
              "x": "...",
              "y": "...",
              "kid": "jwt-signing-key-v1"
            }
          ]
        }
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt-for-user-profile
  namespace: default # Apply in the target service's namespace
spec:
  selector:
    matchLabels:
      app: user-profile
  action: ALLOW
  rules:
    - from:
        - source:
            # The principal is the Istio (SPIFFE) identity of the internal gateway
            principals: ["cluster.local/ns/istio-system/sa/istio-internal-gateway-service-account"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/profile/*"]
      when:
        # This condition requires a valid JWT principal from the RequestAuthentication
        - key: request.auth.claims[iss]
          values: ["[email protected]"]
The pitfall here is policy placement. The RequestAuthentication policy must be applied to the workload that first receives the request and inspects its headers: in this case, the istio-internal-gateway in the istio-system namespace. The AuthorizationPolicy, however, must be applied in the namespace of the target service (default) to protect the user-profile workload itself.
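A related gotcha: the principal string in the AuthorizationPolicy must match the Kubernetes service account the gateway pods actually run as. It is worth verifying rather than assuming:
# Confirm the service account used by the internal gateway pods; it must match
# the principal listed in the AuthorizationPolicy above
kubectl get pods -n istio-system -l istio=internal-gateway \
  -o jsonpath='{.items[0].spec.serviceAccountName}'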
The Cloud Function: Optimized for Performance with esbuild
With the infrastructure ready, we can build the Cloud Function. Our focus is not just on functionality but also on minimizing the cold start time. A large, dependency-heavy function bundle can add hundreds of milliseconds to the initial invocation latency.
1. Project Structure and Dependencies
.
├── src/
│   └── index.ts
├── build.mjs
├── package.json
└── tsconfig.json
We’ll use TypeScript for type safety. The key dependencies are axios
for HTTP requests, jose
for robust JWT signing, and @google-cloud/secret-manager
to securely fetch our private signing key.
// package.json
{
  "name": "gke-egress-function",
  "version": "1.0.0",
  "main": "dist/index.js",
  "scripts": {
    "build": "node build.mjs"
  },
  "dependencies": {
    "@google-cloud/functions-framework": "^3.3.0",
    "@google-cloud/secret-manager": "^5.2.0",
    "axios": "^1.6.0",
    "jose": "^5.1.0"
  },
  "devDependencies": {
    "@types/node": "^20.8.10",
    "esbuild": "^0.19.5",
    "typescript": "^5.2.2"
  }
}
2. esbuild Configuration
Instead of a complex webpack.config.js
, we use a simple build script for esbuild
. Its speed is transformative for quick iteration cycles.
// build.mjs
import * as esbuild from 'esbuild';

const sharedConfig = {
  entryPoints: ['src/index.ts'],
  bundle: true,
  platform: 'node',
  target: 'node18',
  // Google Cloud Functions provides this package at runtime,
  // so we mark it as external to avoid bundling it.
  // This is a critical optimization for bundle size.
  external: ['@google-cloud/functions-framework'],
  minify: true,
  sourcemap: true,
};

await esbuild.build({
  ...sharedConfig,
  outfile: 'dist/index.js',
  format: 'cjs', // CommonJS format required by older GCF runtimes
});

console.log('Build finished successfully.');
The most important line is external: ['@google-cloud/functions-framework']
. Omitting this would bundle the entire Functions Framework, unnecessarily bloating our deployment package. Real-world projects often have several such runtime-provided dependencies that should be excluded.
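A quick check after building shows exactly what ends up in the deployment artifact:
npm run build
# Inspect the bundle and sourcemap sizes to verify nothing unexpected was included
ls -lh dist/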
3. Core Function Logic
The TypeScript code implements the JWT signing and the internal API call. It’s structured to cache the fetched private key in a global variable to avoid refetching it from Secret Manager on every “warm” invocation.
// src/index.ts
import { HttpFunction } from '@google-cloud/functions-framework';
import { SecretManagerServiceClient } from '@google-cloud/secret-manager';
import axios, { isAxiosError } from 'axios';
import * as jose from 'jose';

// Configuration from environment variables
const GKE_GATEWAY_URL = process.env.GKE_GATEWAY_URL; // e.g., http://10.128.0.5/profile/123
const JWT_ISSUER = process.env.JWT_ISSUER;
const SIGNING_KEY_SECRET_ID = process.env.SIGNING_KEY_SECRET_ID; // e.g., projects/123/secrets/jwt-key/versions/latest

// Cache for the signing key to avoid repeated fetches on warm starts
let signingKey: jose.KeyLike | null = null;
const secretManager = new SecretManagerServiceClient();

async function getSigningKey(): Promise<jose.KeyLike> {
  if (signingKey) {
    return signingKey;
  }
  try {
    const [version] = await secretManager.accessSecretVersion({
      name: SIGNING_KEY_SECRET_ID,
    });
    const keyData = version.payload?.data?.toString();
    if (!keyData) {
      throw new Error('Private key not found in Secret Manager.');
    }
    // This logic assumes the private key is stored in PKCS#8 PEM format.
    // (importSPKI handles public keys only and cannot be used for signing.)
    const importedKey = await jose.importPKCS8(keyData, 'ES256');
    signingKey = importedKey;
    return signingKey;
  } catch (error) {
    console.error('Failed to fetch or import signing key:', error);
    // In a real system, you'd have more robust error handling/alerting
    throw new Error('Internal configuration error: could not load signing key.');
  }
}

export const callInternalService: HttpFunction = async (req, res) => {
  if (!GKE_GATEWAY_URL || !JWT_ISSUER || !SIGNING_KEY_SECRET_ID) {
    console.error('Missing required environment variables.');
    res.status(500).send('Server configuration error.');
    return;
  }

  try {
    const key = await getSigningKey();

    // Generate the JWT for this request
    const jwt = await new jose.SignJWT({ 'scope': 'read:user-profile' })
      .setProtectedHeader({ alg: 'ES256' })
      .setIssuedAt()
      .setIssuer(JWT_ISSUER)
      .setAudience('user-profile-service')
      .setExpirationTime('5m')
      .sign(key);

    // Make the authenticated call to the internal gateway
    const response = await axios.get(GKE_GATEWAY_URL, {
      headers: {
        'Authorization': `Bearer ${jwt}`,
        // Pass the Host header required by the Istio VirtualService
        'Host': 'user-profile.internal',
      },
      timeout: 3000, // Important to set a timeout
    });

    res.status(200).json(response.data);
  } catch (error) {
    console.error('Error calling internal service:', error);
    if (isAxiosError(error) && error.response) {
      // Forward the error from the downstream service if available
      res.status(error.response.status).send(error.response.data);
    } else {
      res.status(500).send('An unexpected error occurred.');
    }
  }
};
A subtle but critical detail is setting the Host
header. The axios
call is to an IP address, but the Istio VirtualService
routes based on the hostname (user-profile.internal
). We must explicitly provide this header for the routing rule to match.
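For local iteration, the bundled function can be run with the Functions Framework before deploying. The downstream call will only succeed from inside the VPC, and Secret Manager access requires local Application Default Credentials, but this exercises the configuration and JWT paths. The values shown are placeholders.
# Run the built bundle locally on port 8080
GKE_GATEWAY_URL="http://INTERNAL_GATEWAY_IP/profile/123" \
JWT_ISSUER="gcp-function-issuer@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
SIGNING_KEY_SECRET_ID="projects/YOUR_PROJECT_ID/secrets/jwt-signing-key/versions/latest" \
npx @google-cloud/functions-framework --source=dist --target=callInternalService
# In another terminal:
curl http://localhost:8080/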
4. Deployment Script
Finally, a shell script automates the build and deploy process, ensuring all required flags are set correctly.
#!/bin/bash
set -e # Exit immediately if a command exits with a non-zero status.

# --- Configuration ---
PROJECT_ID="your-project-id"
REGION="your-region"
FUNCTION_NAME="gke-egress-proxy"
VPC_CONNECTOR="gcf-to-gke-connector"
GKE_GATEWAY_URL="http://INTERNAL_GATEWAY_IP/profile/123"
JWT_ISSUER="gcp-function-issuer@${PROJECT_ID}.iam.gserviceaccount.com"
SIGNING_KEY_SECRET_ID="projects/${PROJECT_ID}/secrets/jwt-signing-key/versions/latest"
# This service account needs roles/secretmanager.secretAccessor
FUNCTION_SERVICE_ACCOUNT="${FUNCTION_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# --- Build Step ---
echo "Building function with esbuild..."
npm run build

# --- Deploy Step ---
echo "Deploying function to Google Cloud..."
gcloud functions deploy "${FUNCTION_NAME}" \
  --gen2 \
  --runtime=nodejs18 \
  --region="${REGION}" \
  --source=./dist \
  --entry-point=callInternalService \
  --trigger-http \
  --allow-unauthenticated \
  --vpc-connector="${VPC_CONNECTOR}" \
  --service-account="${FUNCTION_SERVICE_ACCOUNT}" \
  --set-env-vars="GKE_GATEWAY_URL=${GKE_GATEWAY_URL},JWT_ISSUER=${JWT_ISSUER},SIGNING_KEY_SECRET_ID=${SIGNING_KEY_SECRET_ID}"

echo "Deployment complete."
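Once deployed, a simple end-to-end smoke test is to call the function's HTTPS endpoint and confirm that the profile data comes back through the mesh. Gen2 functions expose their URL via serviceConfig.uri; adjust if your setup differs.
# Fetch the function's URL and invoke it
FUNCTION_URL=$(gcloud functions describe "${FUNCTION_NAME}" \
  --gen2 --region="${REGION}" --format="value(serviceConfig.uri)")
curl -s "${FUNCTION_URL}"
# Expected: the JSON payload returned by the user-profile service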
The resulting architecture is robust and secure.
graph TD
    A[External Event e.g., Pub/Sub] --> B{Google Cloud Function};
    B -- 1. Fetch Key --> C[Secret Manager];
    B -- 2. Generate JWT --> B;
    B -- 3. HTTPS Request w/ JWT --> D[VPC Network];
    subgraph VPC Network
        D -- 4. Via VPC Connector --> E[Internal Load Balancer IP];
    end
    subgraph GKE Cluster / Istio Mesh
        E -- 5. --> F[istio-internal-gateway];
        F -- 6. Validate JWT --> F;
        F -- 7. Route based on Host Header --> G[user-profile sidecar];
        G -- 8. mTLS --> H[user-profile app];
    end
The solution successfully bridges the two environments. The Cloud Function operates in its managed environment, while the GKE cluster maintains its strict security perimeter. The Istio gateway acts as a trusted intermediary, translating the JWT-based identity from the “outside” world into an authorized request within the mesh. The use of esbuild
ensures that our bridge component remains lightweight and responsive, minimizing the performance penalty of a serverless architecture.
This pattern, however, is not without its own complexities and trade-offs. The management of JWT signing keys, including rotation and revocation, becomes a critical security function that must be managed outside this specific implementation. Furthermore, the VPC Access connector is a stateful, always-on resource, which introduces a fixed cost component to an otherwise “serverless” design. For scenarios requiring extremely low network latency, the overhead of traversing the VPC connector and gateway might be too high, potentially favoring a solution like Cloud Run on GKE, which would place the compute directly inside the cluster, albeit at the cost of the operational simplicity of Cloud Functions.