Executing arbitrary, untrusted code on behalf of multiple users is a significant security challenge. A simple container escape or a network misconfiguration can lead to a full-scale breach, allowing lateral movement across a shared infrastructure. The primary pain point is achieving robust isolation at both the compute and network layers without incurring the prohibitive cost and latency of provisioning a full virtual machine for every single task. A shared Kubernetes cluster, for instance, introduces a massive attack surface and complex RBAC policies that are often misconfigured. The goal was to build a system where tenant workloads are treated as hostile by default, enforcing isolation through ephemeral, programmatically-defined infrastructure.
Our initial concept was a “Zero-Trust Execution Environment.” A user, authenticated via a phishing-resistant mechanism, submits a container image reference to a control plane API. The system then dynamically provisions a short-lived, heavily restricted environment, executes the container, streams results to a durable data store, and tears down the entire environment, leaving no residual state or access paths. This model demands a tight integration of identity, networking, runtime, and data storage components, each chosen specifically to enforce a segment of the security boundary.
The technology selection was driven by the principle of least privilege and minimal surface area:
Network Isolation: AWS VPCs and Security Groups. Instead of a large, shared VPC, we opted for creating security groups on-the-fly for each execution job. These groups would have a default-deny policy, with egress rules programmatically added to permit access only to necessary AWS service endpoints (like DynamoDB and S3) via VPC Gateway Endpoints. This completely severs the workload’s access to the public internet and any other internal services, making lateral movement impossible at the network layer.
Runtime Isolation: containerd. We explicitly avoided the Docker daemon. containerd provides the core container lifecycle management we need without the additional networking, volume management, and API layers of Docker, thereby reducing the potential attack surface. It allows us to interact directly with namespaces, cgroups, and the OCI runtime specification, giving us fine-grained control over the container’s environment. The plan was to run containerd on minimalist EC2 instances that would serve as ephemeral execution nodes.
Authentication: WebAuthn. For the control plane API that accepts workload submissions, password-based or simple API key authentication was insufficient. Given that a compromised developer account could submit malicious code, we required a phishing-resistant multifactor authentication method. WebAuthn (FIDO2) was the only logical choice, binding authentication to a physical hardware key.
State & Data Management: Amazon DynamoDB. A NoSQL database was a natural fit for storing job metadata, status, and results. The key advantage of DynamoDB in this architecture is its deep integration with AWS IAM. Each ephemeral execution node could be assigned an IAM role with a policy granting it permission to read and write only the specific DynamoDB items related to its assigned job. This provides database-level tenant isolation, preventing a compromised workload from accessing another job’s data, even if they share the same table.
The architecture is split into three main components: the Control Plane (API server), the Orchestrator (state machine and infrastructure provisioner), and the Execution Agent (running on the ephemeral nodes).
sequenceDiagram
    participant User
    participant ControlPlaneAPI
    participant Orchestrator
    participant DynamoDB
    participant AWS_EC2_API
    participant ExecutionNode
    participant containerd
    User->>ControlPlaneAPI: Register/Login via WebAuthn
    ControlPlaneAPI-->>User: JWT Session Token
    User->>ControlPlaneAPI: POST /submit (JWT, containerImage)
    ControlPlaneAPI->>Orchestrator: Enqueue Job
    Orchestrator->>DynamoDB: Create Job Item (Status: PENDING)
    Orchestrator->>AWS_EC2_API: 1. Create Security Group (Egress-only to VPC Endpoints)
    Orchestrator->>AWS_EC2_API: 2. Create IAM Role (Job-specific DynamoDB access)
    Orchestrator->>AWS_EC2_API: 3. Run EC2 Instance (with SG, IAM Role, UserData)
    ExecutionNode-->>DynamoDB: Poll for job details (using IAM role)
    ExecutionNode->>containerd: Pull image & Create container
    ExecutionNode->>containerd: Start Task (run user code)
    containerd-->>ExecutionNode: Logs/Exit Code
    ExecutionNode->>DynamoDB: Update Job Item (Status: COMPLETED, Results)
    ExecutionNode->>Orchestrator: Signal Job Completion (e.g., via SQS)
    Orchestrator->>AWS_EC2_API: Terminate EC2 Instance
    Orchestrator->>AWS_EC2_API: Delete Security Group
    Orchestrator->>AWS_EC2_API: Delete IAM Role
    Orchestrator->>DynamoDB: Update Job Item (Status: ARCHIVED)
Control Plane: WebAuthn-Secured API
The entry point must be secure. We used Go and the go-webauthn/webauthn library to implement the FIDO2/WebAuthn flow. This involves a challenge-response ceremony for both registration and authentication. In a real-world project, user state and credential information would be stored persistently. For this implementation, we use an in-memory store for simplicity.
The core data structures for managing WebAuthn users:
// main.go (Control Plane)
package main
import (
// ... imports
"github.com/go-webauthn/webauthn/protocol"
"github.comcom/go-webauthn/webauthn/webauthn"
)
// User represents a user in our system. In a real application,
// this would be backed by a persistent database.
type User struct {
ID []byte
Name string
DisplayName string
Credentials []webauthn.Credential
}
// webauthn.User interface implementations
func (u User) WebAuthnID() []byte { return u.ID }
func (u User) WebAuthnName() string { return u.Name }
func (u User) WebAuthnDisplayName() string { return u.DisplayName }
func (u User) WebAuthnIcon() string { return "" }
func (u User) WebAuthnCredentials() []webauthn.Credential { return u.Credentials }
// In-memory store for users and sessions.
// THIS IS NOT PRODUCTION-READY. Use a database like DynamoDB or Redis.
var userDB = make(map[string]*User)
var sessionStore = make(map[string]*webauthn.SessionData)
// WebAuthn configuration
var webAuthn *webauthn.WebAuthn
func main() {
var err error
webAuthn, err = webauthn.New(&webauthn.Config{
RPID: "localhost", // Relying Party ID - MUST match your domain in production
RPDisplayName: "Secure Execution Sandbox",
RPOrigins: []string{"http://localhost:8080"}, // Allowed origins
})
if err != nil {
log.Fatalf("Failed to create WebAuthn from config: %v", err)
}
// ... setup HTTP server and routes ...
}
The registration and login handlers implement the multi-step WebAuthn ceremony. The crucial part is that upon successful login, we issue a standard JWT, which is then used to authenticate subsequent API calls like submitting a job.
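As a rough illustration, the assertion (login) half of that ceremony is sketched below. Error handling is abbreviated, and the JWT issuance at the end is an assumption about how the session token is minted: the github.com/golang-jwt/jwt/v5 package, the jwtSigningKey variable, and the username query parameter are illustrative choices, not part of the original implementation.
// main.go (Control Plane) -- a sketch of the login (assertion) handlers.
func handleBeginLogin(w http.ResponseWriter, r *http.Request) {
	user, ok := userDB[r.URL.Query().Get("username")]
	if !ok {
		http.Error(w, "unknown user", http.StatusNotFound)
		return
	}
	// Generate assertion options and a server-side challenge for this login attempt.
	options, sessionData, err := webAuthn.BeginLogin(user)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	sessionStore[user.Name] = sessionData // keyed by username for simplicity
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(options) // the browser passes this to navigator.credentials.get()
}

func handleFinishLogin(w http.ResponseWriter, r *http.Request) {
	user, ok := userDB[r.URL.Query().Get("username")]
	if !ok {
		http.Error(w, "unknown user", http.StatusNotFound)
		return
	}
	sessionData, ok := sessionStore[user.Name]
	if !ok {
		http.Error(w, "no login in progress", http.StatusBadRequest)
		return
	}
	// Verify the authenticator's signed assertion against the stored challenge.
	if _, err := webAuthn.FinishLogin(user, *sessionData, r); err != nil {
		http.Error(w, "authentication failed", http.StatusUnauthorized)
		return
	}
	// On success, issue a short-lived JWT for subsequent API calls.
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, jwt.MapClaims{
		"sub": user.Name,
		"exp": time.Now().Add(15 * time.Minute).Unix(),
	})
	signed, err := token.SignedString(jwtSigningKey) // jwtSigningKey: assumed []byte secret from config
	if err != nil {
		http.Error(w, "failed to sign token", http.StatusInternalServerError)
		return
	}
	json.NewEncoder(w).Encode(map[string]string{"token": signed})
}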
Here is a simplified handler for job submission. It expects a valid JWT in the Authorization header.
// main.go (Control Plane)
// JobRequest defines the payload for submitting a new execution job.
type JobRequest struct {
ContainerImage string `json:"containerImage"`
Cmd []string `json:"cmd"`
}
// handleSubmit is the authenticated endpoint for starting a new job.
func handleSubmit(w http.ResponseWriter, r *http.Request) {
// In a real implementation, a middleware would handle JWT validation
// and extract the user's identity.
username := "test-user" // Hardcoded for this example
var req JobRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid request body", http.StatusBadRequest)
return
}
if req.ContainerImage == "" {
http.Error(w, "containerImage is required", http.StatusBadRequest)
return
}
jobID := uuid.New().String()
log.Printf("User '%s' submitted job %s for image %s", username, jobID, req.ContainerImage)
// This is where the control plane hands off to the orchestrator.
// In this example, we'll call the orchestrator logic directly.
// In a production system, this would likely be an SQS message or gRPC call.
err := orchestrator.StartJob(r.Context(), jobID, username, req)
if err != nil {
log.Printf("Error starting job %s: %v", jobID, err)
http.Error(w, "Failed to start job", http.StatusInternalServerError)
return
}
response := map[string]string{"jobID": jobID, "status": "PENDING"}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusAccepted)
json.NewEncoder(w).Encode(response)
}
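The JWT middleware referenced in the handler comment could look roughly like the sketch below. The claim layout, the shared jwtSigningKey, and the userContextKey context key are assumptions carried over from the login sketch above; github.com/golang-jwt/jwt/v5 is again an illustrative choice.
// main.go (Control Plane) -- a sketch of JWT validation middleware.
func withJWT(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Expect "Authorization: Bearer <token>".
		const prefix = "Bearer "
		authz := r.Header.Get("Authorization")
		if !strings.HasPrefix(authz, prefix) {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		token, err := jwt.Parse(strings.TrimPrefix(authz, prefix), func(t *jwt.Token) (interface{}, error) {
			// Reject unexpected signing algorithms before handing back the key.
			if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
				return nil, fmt.Errorf("unexpected signing method: %v", t.Header["alg"])
			}
			return jwtSigningKey, nil
		})
		if err != nil || !token.Valid {
			http.Error(w, "invalid token", http.StatusUnauthorized)
			return
		}
		// Make the authenticated username available to the handler via the request context.
		if claims, ok := token.Claims.(jwt.MapClaims); ok {
			if sub, _ := claims["sub"].(string); sub != "" {
				r = r.WithContext(context.WithValue(r.Context(), userContextKey, sub))
			}
		}
		next(w, r)
	}
}
The submission route would then be registered as http.HandleFunc("/submit", withJWT(handleSubmit)), and handleSubmit would read the username from the request context rather than hardcoding it.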
Orchestrator: Dynamic Infrastructure Provisioning
This is the heart of the isolation mechanism. The orchestrator’s sole responsibility is to translate a job request into a secure, ephemeral set of AWS resources. We use the AWS SDK for Go.
First, storing the job’s initial state in DynamoDB:
// orchestrator/main.go
package orchestrator
import (
"context"
"fmt"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue"
"github.com/aws/aws-sdk-go-v2/service/dynamodb"
"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
// ... other imports
)
const jobsTableName = "SecureExecutionJobs"
type Job struct {
ID string
Owner string
ContainerImage string
Cmd []string
Status string // PENDING, RUNNING, COMPLETED, FAILED
CreatedAt int64
}
func storeInitialJobState(ctx context.Context, ddbClient *dynamodb.Client, job Job) error {
item, err := attributevalue.MarshalMap(job)
if err != nil {
return fmt.Errorf("failed to marshal job struct: %w", err)
}
_, err = ddbClient.PutItem(ctx, &dynamodb.PutItemInput{
TableName: aws.String(jobsTableName),
Item: item,
})
if err != nil {
return fmt.Errorf("failed to put item in DynamoDB: %w", err)
}
return nil
}
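For the per-job IAM condition used later (dynamodb:LeadingKeys) to have any effect, the job ID must be the table’s partition key. A minimal sketch of the table definition, assuming on-demand billing and a single string partition key named ID to match the Job struct field:
// orchestrator/main.go -- a sketch of the jobs table definition.
func createJobsTable(ctx context.Context, ddbClient *dynamodb.Client) error {
	_, err := ddbClient.CreateTable(ctx, &dynamodb.CreateTableInput{
		TableName: aws.String(jobsTableName),
		// The partition key is the job ID; the dynamodb:LeadingKeys condition in the
		// provisioner's per-job IAM policy scopes access to this key.
		AttributeDefinitions: []types.AttributeDefinition{
			{AttributeName: aws.String("ID"), AttributeType: types.ScalarAttributeTypeS},
		},
		KeySchema: []types.KeySchemaElement{
			{AttributeName: aws.String("ID"), KeyType: types.KeyTypeHash},
		},
		BillingMode: types.BillingModePayPerRequest,
	})
	if err != nil {
		return fmt.Errorf("failed to create jobs table: %w", err)
	}
	return nil
}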
The core provisioning logic is a multi-step process. A common pitfall here is failing to clean up resources if one of the intermediate steps fails. The logic must be idempotent and include rollback capabilities; a sketch of one rollback approach follows the provisioner code below.
// orchestrator/provisioner.go
// These values should be configured, not hardcoded.
const (
amiID = "ami-0c55b159cbfafe1f0" // Amazon Linux 2
instanceType = "t3.micro"
vpcID = "vpc-xxxxxxxxxxxx"
subnetID = "subnet-xxxxxxxxxxxx"
dynamoDBVPCEndpointID = "vpce-xxxxxxxxxxxx"
)
func provisionExecutionNode(ctx context.Context, jobID, owner string) (string, error) {
// AWS clients for EC2 and IAM would be initialized elsewhere.
// 1. Create a job-specific IAM Role and Policy.
// The policy grants access ONLY to the DynamoDB item with the matching jobID.
policyDoc := fmt.Sprintf(`{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["dynamodb:UpdateItem", "dynamodb:GetItem"],
"Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/%s",
"Condition": {
"ForAllValues:StringEquals": {
"dynamodb:LeadingKeys": ["%s"]
}
}
}
]
}`, jobsTableName, jobID)
roleName := fmt.Sprintf("exec-role-%s", jobID)
// ... AWS API calls to iam.CreateRole, iam.CreatePolicy, iam.AttachRolePolicy ...
// This is complex and requires careful error handling.
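// A hedged sketch of the elided IAM calls, assuming an iamClient (service/iam)
// initialized alongside ec2Client. An inline policy via iam.PutRolePolicy is used
// here instead of CreatePolicy/AttachRolePolicy to simplify cleanup, and an
// instance profile with the same name is created so the role can be attached
// to the EC2 instance below.
trustPolicy := `{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Service": "ec2.amazonaws.com"},
"Action": "sts:AssumeRole"
}]
}`
_, err := iamClient.CreateRole(ctx, &iam.CreateRoleInput{
RoleName: aws.String(roleName),
AssumeRolePolicyDocument: aws.String(trustPolicy),
})
// ... error handling ...
_, err = iamClient.PutRolePolicy(ctx, &iam.PutRolePolicyInput{
RoleName: aws.String(roleName),
PolicyName: aws.String(fmt.Sprintf("exec-policy-%s", jobID)),
PolicyDocument: aws.String(policyDoc),
})
// ... error handling ...
_, err = iamClient.CreateInstanceProfile(ctx, &iam.CreateInstanceProfileInput{
InstanceProfileName: aws.String(roleName),
})
// ... error handling ...
_, err = iamClient.AddRoleToInstanceProfile(ctx, &iam.AddRoleToInstanceProfileInput{
InstanceProfileName: aws.String(roleName),
RoleName: aws.String(roleName),
})
// ... error handling ...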
// 2. Create a job-specific Security Group.
sgName := fmt.Sprintf("exec-sg-%s", jobID)
sgResult, err := ec2Client.CreateSecurityGroup(ctx, &ec2.CreateSecurityGroupInput{
GroupName: aws.String(sgName),
Description: aws.String(fmt.Sprintf("SG for job %s", jobID)),
VpcId: aws.String(vpcID),
})
// ... error handling ...
sgID := *sgResult.GroupId
// 3. New security groups ship with a default allow-all egress rule, so revoke it
// first, then add an egress rule ONLY to the DynamoDB VPC endpoint's prefix list.
// This blocks all other network traffic, including the public internet.
_, err = ec2Client.RevokeSecurityGroupEgress(ctx, &ec2.RevokeSecurityGroupEgressInput{
GroupId: aws.String(sgID),
IpPermissions: []types.IpPermission{
{
IpProtocol: aws.String("-1"), // all protocols
IpRanges: []types.IpRange{{CidrIp: aws.String("0.0.0.0/0")}},
},
},
})
// ... error handling ...
_, err = ec2Client.AuthorizeSecurityGroupEgress(ctx, &ec2.AuthorizeSecurityGroupEgressInput{
GroupId: aws.String(sgID),
IpPermissions: []types.IpPermission{
{
IpProtocol: aws.String("tcp"),
FromPort: aws.Int32(443),
ToPort: aws.Int32(443),
PrefixListIds: []types.PrefixListId{
{
PrefixListId: aws.String(getPrefixListForVPCEndpoint(dynamoDBVPCEndpointID)),
},
},
},
},
})
// ... error handling and cleanup logic on failure ...
// 4. Launch the EC2 instance with the role and security group.
// The UserData script will install containerd and our execution agent.
userData := `#!/bin/bash
yum update -y
yum install -y containerd
systemctl enable --now containerd
# Download and run our execution agent binary from S3 or another trusted source
# /usr/local/bin/execution-agent --job-id ` + jobID + `
`
encodedUserData := base64.StdEncoding.EncodeToString([]byte(userData))
runInput := &ec2.RunInstancesInput{
ImageId: aws.String(amiID),
InstanceType: types.InstanceType(instanceType),
MinCount: aws.Int32(1),
MaxCount: aws.Int32(1),
SecurityGroupIds: []string{sgID},
SubnetId: aws.String(subnetID),
IamInstanceProfile: &types.IamInstanceProfileSpecification{
Name: aws.String(roleName), // Instance profile name (created alongside the role above)
},
UserData: aws.String(encodedUserData),
}
runResult, err := ec2Client.RunInstances(ctx, runInput)
// ... error handling and full resource cleanup ...
instanceID := *runResult.Instances[0].InstanceId
return instanceID, nil
}
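One way to address the partial-failure pitfall noted earlier is to record an undo action for each resource as it is created, and run the recorded actions in reverse if any later step fails. A minimal sketch of that pattern; the deleteSecurityGroup-style helpers it would call are placeholders for the corresponding AWS API calls:
// orchestrator/provisioner.go -- a sketch of a rollback stack for partial failures.
type cleanupStack struct {
	steps []func(context.Context) error
}

// add records an undo action for a resource that was just created successfully.
func (c *cleanupStack) add(step func(context.Context) error) {
	c.steps = append(c.steps, step)
}

// unwind runs the recorded undo actions in reverse order, logging (rather than
// aborting on) individual failures so the remaining steps still run.
func (c *cleanupStack) unwind(ctx context.Context) {
	for i := len(c.steps) - 1; i >= 0; i-- {
		if err := c.steps[i](ctx); err != nil {
			log.Printf("cleanup step %d failed: %v", i, err)
		}
	}
}
Inside provisionExecutionNode, each successful step would push its undo action, for example cleanup.add(func(ctx context.Context) error { return deleteSecurityGroup(ctx, sgID) }), and any subsequent error would call cleanup.unwind(ctx) before returning. A state machine such as AWS Step Functions provides the same guarantee across process restarts, as noted in the closing section.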
Execution Agent: containerd Interaction
The agent runs on the ephemeral EC2 node. Its job is simple but critical: fetch job details, use the containerd client to run the work, and report back.
// agent/main.go
package main
import (
"context"
"flag"
"fmt"
"log"
"strings"
"time"
"github.com/containerd/containerd"
"github.com/containerd/containerd/cio"
"github.com/containerd/containerd/namespaces"
"github.com/containerd/containerd/oci"
// ... aws sdk imports
)
func main() {
jobID := flag.String("job-id", "", "The ID of the job to execute")
flag.Parse()
if *jobID == "" {
log.Fatal("--job-id is required")
}
ctx := context.Background()
// 1. Fetch job details from DynamoDB using the instance role
jobDetails := fetchJobDetailsFromDDB(ctx, *jobID)
updateJobStatusInDDB(ctx, *jobID, "RUNNING")
// 2. Setup containerd client
client, err := containerd.New("/run/containerd/containerd.sock")
if err != nil {
log.Printf("Failed to connect to containerd: %v", err)
updateJobStatusInDDB(ctx, *jobID, "FAILED")
// Signal orchestrator to clean up
return
}
defer client.Close()
// Create a context with the 'sandbox' namespace for containerd
ctx = namespaces.WithNamespace(ctx, "sandbox")
// 3. Pull the image
image, err := client.Pull(ctx, jobDetails.ContainerImage, containerd.WithPullUnpack)
if err != nil {
log.Printf("Failed to pull image %s: %v", jobDetails.ContainerImage, err)
updateJobStatusInDDB(ctx, *jobID, "FAILED")
return
}
// 4. Create the container. This is where we can apply strict OCI specs.
// A key production improvement would be to specify a seccomp profile
// and resource limits (CPU, Memory).
container, err := client.NewContainer(
ctx,
*jobID, // Container ID is the job ID
containerd.WithImage(image),
containerd.WithNewSnapshot(fmt.Sprintf("%s-snapshot", *jobID), image),
containerd.WithNewSpec(
oci.WithImageConfig(image),
// A common mistake is not setting resource limits.
// An untrusted workload could easily exhaust host resources.
oci.WithMemoryLimit(512*1024*1024), // 512 MB memory limit
oci.WithCPUCFS(50000, 100000), // CFS quota/period: 50% of one CPU core
),
)
if err != nil {
// Handle error...
}
defer container.Delete(ctx, containerd.WithSnapshotCleanup)
// 5. Create and run the task
logBuffer := new(strings.Builder)
task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStreams(nil, logBuffer, logBuffer)))
if err != nil {
// Handle error...
}
defer task.Delete(ctx)
// Wait for the task to exit
statusC, err := task.Wait(ctx)
if err != nil {
// Handle error...
}
if err := task.Start(ctx); err != nil {
// Handle error...
}
status := <-statusC
code, _, err := status.Result()
if err != nil {
// Handle error...
}
log.Printf("Task finished with exit code: %d", code)
// 6. Report results back to DynamoDB
updateJobCompletionInDDB(ctx, *jobID, code, logBuffer.String())
// 7. Signal orchestrator to terminate this instance
// ... SQS send message or similar ...
}
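Following the note above about seccomp profiles, additional SpecOpts can be layered into the same containerd.WithNewSpec call. Below is a sketch of a tighter spec using containerd’s bundled helpers; treat the exact set of options as a starting point rather than a complete hardening profile.
// agent/main.go -- a sketch of a more restrictive OCI spec for untrusted code.
// Requires: seccomp "github.com/containerd/containerd/contrib/seccomp"
container, err := client.NewContainer(
	ctx,
	*jobID,
	containerd.WithImage(image),
	containerd.WithNewSnapshot(fmt.Sprintf("%s-snapshot", *jobID), image),
	containerd.WithNewSpec(
		oci.WithImageConfig(image),
		oci.WithMemoryLimit(512*1024*1024), // hard memory cap
		oci.WithCPUCFS(50000, 100000),      // 50% of one CPU core
		seccomp.WithDefaultProfile(),       // containerd's default seccomp allowlist
		oci.WithNoNewPrivileges,            // block privilege escalation via setuid binaries
		oci.WithRootFSReadonly(),           // mount the container rootfs read-only
	),
)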
This architecture provides a strong foundation for isolating untrusted workloads. The network boundary is defined by ephemeral, default-deny VPC security groups. The compute boundary is a dedicated, short-lived EC2 instance, with containerd providing process and filesystem isolation on top of it. The data access boundary is enforced at the database level by IAM. And the entry point is secured by phishing-resistant WebAuthn credentials.
The primary limitation of this specific implementation is performance. Provisioning an EC2 instance, even a micro one, for every job introduces a cold-start latency of 30-60 seconds, which is unacceptable for many use cases. A production-grade system would evolve this pattern by using a pool of pre-warmed instances or, more effectively, by replacing EC2 instances with Firecracker microVMs, which offer VM-level security with container-like startup speed. The cleanup logic must also be made robust against partial failures, typically by tracking provisioning and teardown with a state machine (such as AWS Step Functions), so that no orphaned resources are left behind to accumulate cost and widen the attack surface. Finally, the generated IAM policy has a fixed shape; a more advanced system might analyze the code to be run and generate an even more restrictive, tailored policy for each specific job.