Utilizing eBPF and Puppet for Kernel-Level Tracing of Jetpack Compose Performance in a Jenkins CI Environment

DevOps

Word Count: 2.9k

Read Times: 18 Min

For three weeks, our CI pipeline for the flagship Android application was a source of constant frustration. Intermittent failures plagued the ui-tests stage, where Espresso tests targeting complex Jetpack Compose screens would time out. The failures were exclusively on our Jenkins build farm; local developer machines and even staging emulators ran the tests flawlessly. The error was always a generic androidx.test.espresso.NoMatchingViewException or an AppNotRespondingException, pointing to severe UI jank that prevented Espresso from finding and interacting with components in time. This was a classic “Heisenbug”—a problem that vanished the moment you tried to observe it directly with standard tools.

Our initial hypothesis gravitated toward backend latency. The screens under test performed complex data fetching. In a real-world project, the first suspect is often the network or server-side processing. We already had a mature observability stack built on SkyWalking, so validating this was straightforward. We configured the test build variant to point to a dedicated, fully instrumented staging environment and triggered the failing Jenkins job.

The SkyWalking dashboard was unequivocal.

sequenceDiagram
    participant App as Android App (in Emulator)
    participant Backend as Backend Service
    App->>Backend: GET /api/v3/user/profile
    Note right of Backend: Processing Time: 35ms
    Backend-->>App: 200 OK (Profile Data)
    App->>Backend: GET /api/v3/user/dashboard
    Note right of Backend: P99 Latency: 62ms
    Backend-->>App: 200 OK (Dashboard Data)

Traces showed p99 latencies consistently below 100ms for all relevant endpoints. The backend was not the bottleneck. This forced us to accept a more difficult reality: the performance degradation was happening entirely within the Android emulator process running on the virtualized Jenkins agent.

The standard Android Profiler in Android Studio is an excellent tool, but it’s GUI-based and designed for interactive sessions. It’s completely unsuitable for a headless, automated CI environment. We needed a way to get deep, system-level performance data from a specific process (the QEMU emulator) running on a remote Linux machine, triggered automatically, and correlated with our test execution. This is where the investigation pivoted toward kernel-level observability. The idea was to stop trying to look inside the Android guest OS and instead to observe the emulator process from the outside, from the host kernel’s perspective. This immediately brought eBPF to the forefront.

eBPF (extended Berkeley Packet Filter) allows running sandboxed programs in the Linux kernel without changing kernel source code. For our use case, its ability to attach to kernel probes (kprobes) to trace function calls like the scheduler or syscalls was the perfect fit. We could get nanosecond-precision data about why and when our emulator process was losing CPU time, without instrumenting a single line of application code.

The final piece of the puzzle was deployment and orchestration. Our build farm consists of dozens of agents. Manually installing and configuring the eBPF toolchain on each was not an option. Puppet already managed the base configuration of these nodes, so extending it to handle our diagnostic tooling was the only maintainable path. Jenkins, as the orchestrator, would tie it all together: use Puppet to ensure the agent was ready, launch the eBPF trace, run the Android tests, and collect the results.

Automating Agent Configuration with Puppet

The first step was to create a robust Puppet manifest to prepare any Jenkins agent for this type of diagnostic work. In a production environment, you cannot assume the required tools are present. The manifest needed to be idempotent and handle the installation of the kernel headers (a hard requirement for compiling eBPF programs on the fly) and the BCC (BPF Compiler Collection), which provides a powerful Python frontend for writing eBPF tools.

Here is the Puppet class, profile::android_ci_agent, we developed.

# Class: profile::android_ci_agent
#
# Configures a Linux node to function as an Android CI build agent with
# advanced eBPF-based performance tracing capabilities.
#
class profile::android_ci_agent {
  # Ensure core build tools are present. This is a baseline for many operations,
  # including compiling eBPF programs.
  ensure_packages(['build-essential', 'git', 'clang', 'llvm'])

  # The kernel headers corresponding to the *running* kernel are essential for BCC
  # to compile the eBPF C code. Using the $facts['kernelrelease'] ensures we
  # install the correct version. A common mistake is to install a generic
  # 'linux-headers' package which might not match the running kernel, leading
  # to compilation failures.
  $kernel_headers_package = "linux-headers-${facts['kernelrelease']}"
  ensure_packages([$kernel_headers_package])

  # BCC (BPF Compiler Collection) is the primary toolchain. It includes the Python 3
  # bindings we'll use for our tracing script. The 'bpfcc-tools' package
  # contains both the libraries and some useful pre-built tools for debugging.
  # We explicitly depend on the kernel headers being installed first.
  package { 'bpfcc-tools':
    ensure  => 'present',
    require => Package[$kernel_headers_package],
  }

  # Deploy the custom eBPF tracing script from the Puppet master's fileserver.
  # This centralizes management of the script; updating it doesn't require
  # modifying the Jenkinsfile or rebuilding agent images.
  file { '/usr/local/bin/trace_qemu_scheduler.py':
    ensure  => 'file',
    owner   => 'root',
    group   => 'root',
    mode    => '0755',
    source  => 'puppet:///modules/profile/trace_qemu_scheduler.py',
    require => Package['bpfcc-tools'],
  }

  # Configure sudoers to allow the 'jenkins' user to run our specific eBPF script
  # without a password. eBPF programs require root privileges. A pitfall here is
  # giving broad sudo access. We scope it to the exact command for security.
  file { '/etc/sudoers.d/91-jenkins-bpf-tracer':
    ensure  => 'present',
    owner   => 'root',
    group   => 'root',
    mode    => '0440',
    content => 'jenkins ALL=(ALL) NOPASSWD: /usr/local/bin/trace_qemu_scheduler.py',
  }
}

This manifest makes the agent setup declarative and repeatable. It handles dependencies correctly and securely grants the jenkins user the necessary permissions, a critical step for automation.

The eBPF Scheduler and Syscall Tracer

With the agents configured, the core of the solution is the eBPF script itself. We wrote a Python script using BCC that performs two main functions:

It identifies the Process ID (PID) of the QEMU process that corresponds to our Android emulator.
It attaches kprobes to the kernel’s scheduler function finish_task_switch to monitor every time a process is scheduled off the CPU.
It attaches kprobes to syscalls associated with graphics rendering, specifically the ioctl syscall, to see if rendering commands are blocking or causing contention.

The script is designed to be run in the background during the test execution, printing structured JSON logs to standard output.

trace_qemu_scheduler.py:

#!/usr/bin/python3
#
# trace_qemu_scheduler.py - A BCC-based tool to trace scheduler events and
# ioctl syscalls for a QEMU process to diagnose UI jank in Android emulators.
#
# USAGE: ./trace_qemu_scheduler.py

from bcc import BPF
import ctypes as ct
import time
import argparse
import subprocess
import json
import sys
import logging

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# The BPF C program is the heart of the tool. It runs inside the kernel.
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/fs.h>

// Data structure to pass event information from kernel to user-space.
// This must match the Python struct definition.
struct event_t {
    u64 ts;
    u32 pid;
    u32 tgid;
    int event_type; // 1 = sched_switch, 2 = ioctl_entry, 3 = ioctl_exit
    char comm[TASK_COMM_LEN];
    // For sched_switch
    char prev_comm[TASK_COMM_LEN];
    s32 prev_pid;
    // For ioctl
    u32 fd;
    u32 request;
};

// Perf buffer to send events to user-space.
BPF_PERF_OUTPUT(events);

// A map to store the start time of an ioctl call, keyed by thread ID.
BPF_HASH(ioctl_start, u32);

// Target TGID (Thread Group ID, effectively the PID) for filtering.
// This will be replaced by the user-space script with the actual QEMU PID.
static const int TARGET_TGID = -1;

// kprobe on finish_task_switch: called when a task switch is complete.
// We capture when the target QEMU process is switched *out*.
int trace_sched_switch(struct pt_regs *ctx, struct task_struct *prev) {
    u32 tgid = prev->tgid;
    if (tgid != TARGET_TGID) {
        return 0;
    }

    struct event_t event = {};
    event.ts = bpf_ktime_get_ns();
    event.pid = prev->pid;
    event.tgid = tgid;
    event.event_type = 1; // sched_switch
    event.prev_pid = prev->pid;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    bpf_probe_read_kernel(&event.prev_comm, sizeof(event.prev_comm), prev->comm);

    events.perf_submit(ctx, &event, sizeof(event));
    return 0;
}

// kprobe on the entry of the ioctl syscall.
int trace_ioctl_entry(struct pt_regs *ctx, int fd, unsigned int request) {
    u32 tgid = bpf_get_current_pid_tgid() >> 32;
    if (tgid != TARGET_TGID) {
        return 0;
    }

    u32 pid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    ioctl_start.update(&pid, &ts);

    struct event_t event = {};
    event.ts = ts;
    event.pid = pid;
    event.tgid = tgid;
    event.event_type = 2; // ioctl_entry
    event.fd = fd;
    event.request = request;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    
    events.perf_submit(ctx, &event, sizeof(event));
    return 0;
}

// kretprobe on the exit of the ioctl syscall.
int trace_ioctl_exit(struct pt_regs *ctx) {
    u32 tgid = bpf_get_current_pid_tgid() >> 32;
    if (tgid != TARGET_TGID) {
        return 0;
    }

    // You could calculate duration here by comparing with the start time
    // from the ioctl_start map, but for simplicity, we just log the exit.
    
    struct event_t event = {};
    event.ts = bpf_ktime_get_ns();
    event.pid = bpf_get_current_pid_tgid();
    event.tgid = tgid;
    event.event_type = 3; // ioctl_exit
    bpf_get_current_comm(&event.comm, sizeof(event.comm));

    events.perf_submit(ctx, &event, sizeof(event));
    return 0;
}
"""

# Python representation of the C struct
class Event(ct.Structure):
    _fields_ = [
        ("ts", ct.c_ulonglong),
        ("pid", ct.c_uint),
        ("tgid", ct.c_uint),
        ("event_type", ct.c_int),
        ("comm", ct.c_char * 16),
        ("prev_comm", ct.c_char * 16),
        ("prev_pid", ct.c_int),
        ("fd", ct.c_uint),
        ("request", ct.c_uint),
    ]

def find_qemu_pid():
    """Finds the PID of the main QEMU system process."""
    try:
        # A common mistake is to grep for 'qemu' which can match this script itself.
        # Being more specific is key.
        output = subprocess.check_output(
            ["pgrep", "-f", "qemu-system-x86_64.*-avd"],
            universal_newlines=True
        ).strip()
        if output:
            return int(output.splitlines()[0])
    except (subprocess.CalledProcessError, IndexError, ValueError) as e:
        logging.error(f"Failed to find QEMU process: {e}")
        return None
    return None

def main():
    qemu_pid = find_qemu_pid()
    if not qemu_pid:
        logging.error("QEMU process not found. Exiting.")
        sys.exit(1)

    logging.info(f"Found QEMU process with TGID: {qemu_pid}. Attaching probes...")

    # Replace the placeholder in the BPF text with the actual PID.
    # This is a critical step for filtering events at the kernel level,
    # which is vastly more efficient than filtering in user-space.
    modified_bpf_text = bpf_text.replace('TARGET_TGID = -1', f'static const int TARGET_TGID = {qemu_pid};')

    # Load the BPF program
    b = BPF(text=modified_bpf_text)
    
    # Attach probes
    b.attach_kprobe(event="finish_task_switch", fn_name="trace_sched_switch")
    b.attach_kprobe(event="__x64_sys_ioctl", fn_name="trace_ioctl_entry")
    b.attach_kretprobe(event="__x64_sys_ioctl", fn_name="trace_ioctl_exit")

    logging.info("Probes attached. Listening for events...")

    def print_event(cpu, data, size):
        event = ct.cast(data, ct.POINTER(Event)).contents
        
        event_data = {
            "timestamp_ns": event.ts,
            "pid": event.pid,
            "process_name": event.comm.decode('utf-8', 'replace'),
        }

        if event.event_type == 1: # sched_switch
            event_data["type"] = "SCHED_SWITCH_OUT"
            event_data["switched_out_process"] = event.prev_comm.decode('utf-8', 'replace')
            event_data["switched_out_pid"] = event.prev_pid
        elif event.event_type == 2: # ioctl_entry
            event_data["type"] = "IOCTL_ENTRY"
            event_data["fd"] = event.fd
            event_data["request_code"] = hex(event.request)
        elif event.event_type == 3: # ioctl_exit
            event_data["type"] = "IOCTL_EXIT"

        # Output as a single line of JSON for easy parsing.
        print(json.dumps(event_data), flush=True)

    b["events"].open_perf_buffer(print_event)
    while True:
        try:
            b.perf_buffer_poll()
        except KeyboardInterrupt:
            logging.info("Detaching probes and exiting.")
            sys.exit(0)
        except Exception as e:
            logging.error(f"An error occurred during polling: {e}")
            break

if __name__ == "__main__":
    main()

The Problematic `Jetpack Compose` Code

The root cause was traced to a specific Composable responsible for rendering a complex, animated dashboard. Under normal conditions, it performed adequately. However, on a resource-constrained CI agent where the emulator competes for CPU with Gradle and other processes, its inefficiencies were magnified into critical failures.

The original code looked something like this:

// The problematic composable causing excessive recomposition and GPU work.
@Composable
fun RealtimeDataDashboard(viewModel: DashboardViewModel) {
    val userMetrics by viewModel.userMetrics.collectAsState()
    val systemStatus by viewModel.systemStatus.collectAsState()

    // PITFALL: Deriving state directly inside the composable body.
    // Every time userMetrics or systemStatus changes, this complex calculation
    // is re-run, even if the result would be the same.
    val criticalAlerts = userMetrics.filter { it.isCritical && systemStatus.isOnline }

    LazyColumn {
        items(criticalAlerts) { alert ->
            // Another PITFALL: A complex composable that uses Modifier.animateContentSize()
            // and contains heavy drawing logic, being re-rendered for every minor data change.
            AlertCard(alert = alert)
        }
    }
}

The key issues were twofold:

Complex filtering logic (criticalAlerts) was performed directly in the composition. Any minor, unrelated update to userMetrics or systemStatus would trigger this expensive recalculation.
The AlertCard itself was a heavy component. Rapidly adding or removing items from criticalAlerts caused a cascade of expensive animations and redraws, flooding the host with graphics-related ioctl syscalls.

Orchestration with a Jenkinsfile

The Jenkinsfile is where all the pieces come together. It defines the pipeline that prepares the environment, runs the trace and tests in parallel, and collects the artifacts.

// Jenkinsfile
pipeline {
    agent { label 'android-builder' }

    stages {
        stage('Validate Agent Configuration') {
            steps {
                script {
                    // In a production setup, we might trigger a 'puppet agent -t' run.
                    // For this example, we'll just verify the tracer script exists.
                    sh 'ls -l /usr/local/bin/trace_qemu_scheduler.py'
                }
            }
        }

        stage('Run UI Tests with eBPF Tracing') {
            steps {
                // This is crucial for cleanup. We need to ensure the tracer process
                // and the emulator are terminated regardless of test success or failure.
                // A common mistake is to let these processes leak, consuming resources.
                timeout(time: 30, unit: 'MINUTES') {
                    script {
                        // Start the emulator in the background, headless.
                        sh '$ANDROID_HOME/emulator/emulator -avd test_avd -no-window -no-audio -no-snapshot &'
                        // Wait for the emulator to fully boot.
                        sh '$ANDROID_HOME/platform-tools/adb wait-for-device shell \'while [[ -z $(getprop sys.boot_completed) ]]; do sleep 1; done;\''

                        // Use a variable to hold the PID of the tracer for later.
                        def tracerPid
                        try {
                            // Run the eBPF tracer in parallel with the Gradle tests.
                            parallel(
                                trace: {
                                    // The 'script' block is needed to get the PID.
                                    // We redirect stderr to stdout to capture all output.
                                    sh '''
                                        sudo /usr/local/bin/trace_qemu_scheduler.py > trace_output.json 2>&1 &
                                        echo $! > tracer.pid
                                    '''
                                    tracerPid = readFile('tracer.pid').trim()
                                    echo "eBPF tracer started with PID: ${tracerPid}"
                                },
                                test: {
                                    // Execute the tests. If this fails, the 'finally' block
                                    // in the outer 'try' will still execute for cleanup.
                                    sh './gradlew :app:connectedAndroidTest'
                                }
                            )
                        } catch (e) {
                            echo "UI tests failed. Captured eBPF trace might contain clues."
                            // Allow the build to be marked as UNSTABLE or FAILED but still proceed to cleanup.
                            currentBuild.result = 'FAILURE'
                            throw e
                        } finally {
                            echo "Cleaning up processes..."
                            // Stop the tracer process gracefully first.
                            if (tracerPid) {
                                sh "sudo kill -SIGINT ${tracerPid} || true"
                            }
                            // Stop the emulator.
                            sh '$ANDROID_HOME/platform-tools/adb emu kill'
                        }
                    }
                }
            }
        }
    }

    post {
        always {
            // Archive the trace output and test results for every build.
            // This is vital for comparing successful and failed runs.
            archiveArtifacts artifacts: 'trace_output.json, app/build/reports/', allowEmptyArchive: true
        }
    }
}

Analysis and Resolution

After running the pipeline, the trace_output.json from a failed build was illuminating. During the moments Espresso reported the UI was unresponsive, the log showed a massive burst of events:

{"timestamp_ns": 1669824312843098, "pid": 12346, "process_name": "qemu-system-x86", "type": "IOCTL_ENTRY", "fd": 34, "request_code": "0xc020644d"}
{"timestamp_ns": 1669824312843150, "pid": 12345, "process_name": "qemu-system-x86", "type": "SCHED_SWITCH_OUT", "switched_out_process": "qemu-system-x86", "switched_out_pid": 12345}
{"timestamp_ns": 1669824312843201, "pid": 12346, "process_name": "qemu-system-x86", "type": "IOCTL_EXIT"}
{"timestamp_ns": 1669824312843255, "pid": 12346, "process_name": "qemu-system-x86", "type": "IOCTL_ENTRY", "fd": 34, "request_code": "0xc020644d"}
... (hundreds more in a few milliseconds) ...

The pattern was clear: the QEMU process, which handles graphics via host GPU passthrough, was spamming the kernel with ioctl calls. Concurrently, the scheduler was frequently descheduling the main emulator threads, indicating severe CPU contention. The application was spending so much time in the kernel handling rendering commands that it couldn’t service the UI thread.

The fix was to refactor the Jetpack Compose code to be more efficient, using derivedStateOf to ensure the expensive filter only runs when its direct inputs change, and simplifying the AlertCard animations.

// The optimized composable
@Composable
fun RealtimeDataDashboard(viewModel: DashboardViewModel) {
    val userMetrics by viewModel.userMetrics.collectAsState()
    val systemStatus by viewModel.systemStatus.collectAsState()

    // FIX: Use derivedStateOf to memoize the calculation. It will only
    // re-run if userMetrics or systemStatus instance actually changes.
    val criticalAlerts by remember(userMetrics, systemStatus) {
        derivedStateOf {
            userMetrics.filter { it.isCritical && systemStatus.isOnline }
        }
    }

    LazyColumn {
        // FIX: Add a key to help Compose understand item identity,
        // reducing unnecessary recompositions during list changes.
        items(items = criticalAlerts, key = { it.id }) { alert ->
            AlertCard(alert = alert)
        }
    }
}

With the optimized code, the CI job passed consistently. The new trace_output.json was dramatically cleaner, showing far fewer ioctl calls and scheduler events during the test run.

This approach of using eBPF for zero-instrumentation kernel tracing within a CI pipeline is undeniably complex. The eBPF script can be brittle, as it depends on specific kernel function names and syscalls that could change. Maintaining the Puppet manifests and the Jenkinsfile requires a DevOps skillset that is distinct from mobile application development. However, for debugging otherwise invisible, environment-specific performance issues, it proved to be an invaluable technique. Future iterations could involve creating a more generic eBPF tool that automatically discovers relevant syscalls, or forwarding the JSON event stream to a time-series database for automated anomaly detection, turning this diagnostic tool into a proactive monitoring system.