Zero-Instrumentation Observability with eBPF

January 14, 2026 · 9 min read

eBPF · Observability

Instrumenting every service in a distributed system is expensive and fragile. Libraries drift out of date, different languages need different SDKs, and some services can't be modified at all. eBPF offers a compelling alternative: attach probes to kernel and userspace events, and collect telemetry without touching a single line of application code.

How eBPF works

eBPF programs are small, sandboxed programs that run in the Linux kernel. Before a program is loaded, the in-kernel verifier checks that it is safe: it must provably terminate and may never access memory out of bounds. On most architectures it is then JIT-compiled to native instructions. You attach programs to hooks: kprobes for kernel functions, uprobes for userspace functions, tracepoints for static kernel instrumentation points, and more.

snippet.c


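/* Build (x86-64): clang -O2 -g -target bpf -D__TARGET_ARCH_x86 -c snippet.c -o snippet.bpf.o
 * vmlinux.h:      bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h */
#include "vmlinux.h"            /* kernel types: struct sock, struct sockaddr_in, u16, u64 */
#include <bpf/bpf_helpers.h>    /* SEC, map definitions, bpf_map_* helpers */
#include <bpf/bpf_tracing.h>    /* BPF_KPROBE */
#include <bpf/bpf_core_read.h>  /* BPF_CORE_READ */
#include <bpf/bpf_endian.h>     /* bpf_ntohs */

/* Outbound IPv4 TCP connection attempts, keyed by destination port. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, u16);     /* destination port, host byte order */
    __type(value, u64);   /* connect() attempts seen */
} tcp_conn_count SEC(".maps");

SEC("kprobe/tcp_v4_connect")
int BPF_KPROBE(trace_tcp_connect, struct sock *sk, struct sockaddr *uaddr)
{
    /* sk->__sk_common.skc_dport is only filled in later, inside tcp_v4_connect,
     * so take the port from the destination-address argument instead.
     * Kprobe arguments can't be dereferenced directly; BPF_CORE_READ performs
     * a relocatable bpf_probe_read_kernel. */
    struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
    u16 dport = bpf_ntohs(BPF_CORE_READ(usin, sin_port));

    u64 *count = bpf_map_lookup_elem(&tcp_conn_count, &dport);
    if (!count) {
        /* BPF_NOEXIST: don't reset a counter another CPU created meanwhile. */
        u64 zero = 0;
        bpf_map_update_elem(&tcp_conn_count, &dport, &zero, BPF_NOEXIST);
        count = bpf_map_lookup_elem(&tcp_conn_count, &dport);
        if (!count)
            return 0;   /* map is full */
    }
    __sync_fetch_and_add(count, 1);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";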
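Ports in `struct sockaddr_in` (and in `skc_dport`) are stored in network byte order, which is why the program converts the port before using it as a map key. The same conversion can be checked in ordinary userspace C with the standard socket headers; `make_dest` and `dest_port_host_order` are hypothetical helpers written for this illustration only.

```c
#include <arpa/inet.h>   /* htons, ntohs, inet_pton */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <sys/socket.h>  /* AF_INET */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a destination address the way a client
 * prepares one before calling connect(). */
static struct sockaddr_in make_dest(const char *ip, uint16_t port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);            /* host -> network byte order */
    inet_pton(AF_INET, ip, &sa.sin_addr);
    return sa;
}

/* Hypothetical helper: the same network -> host conversion the BPF
 * program applies before using the port as a map key. */
static uint16_t dest_port_host_order(const struct sockaddr_in *sa)
{
    return ntohs(sa->sin_port);
}
```

On a little-endian host such as x86-64, skipping the conversion would turn port 443 (0x01BB) into the key 47873 (0xBB01), so every per-port count would land under the wrong key.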

HTTP tracing without touching the application

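For HTTP services, the most valuable signal is the request-response lifecycle: method, path, status code, latency. With eBPF, you can attach uprobes to functions in common HTTP stacks, such as Go's net/http or Node's HTTP parser, and extract the same data without an SDK. For interpreted runtimes such as Python's WSGI servers, the application functions are not native symbols a uprobe can target, so tracing the socket read and write calls and parsing the HTTP bytes is usually the more practical route.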
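Once a probe has copied the first bytes of a request or response out of a socket buffer, extracting the method, path, and status code is bounded string parsing, which a userspace collector can do instead of the kernel. A minimal sketch, assuming plaintext HTTP/1.x and a capture that starts at the request or status line; `parse_request_line`, `parse_status_code`, and the buffer limits are hypothetical names, not part of any tracer's API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limits; real tracers pick their own capture sizes. */
#define MAX_METHOD 8
#define MAX_PATH   128

struct http_request_line {
    char method[MAX_METHOD];
    char path[MAX_PATH];
};

/* Parse "METHOD SP PATH SP ..." from the start of a captured request.
 * Returns 0 on success, -1 if the line is malformed or truncated. */
static int parse_request_line(const char *buf, size_t len,
                              struct http_request_line *out)
{
    const char *sp1 = memchr(buf, ' ', len);
    if (!sp1)
        return -1;
    size_t mlen = (size_t)(sp1 - buf);
    if (mlen == 0 || mlen >= MAX_METHOD)
        return -1;

    const char *path = sp1 + 1;
    const char *sp2 = memchr(path, ' ', len - mlen - 1);
    if (!sp2)
        return -1;                     /* path cut off by the capture limit */
    size_t plen = (size_t)(sp2 - path);
    if (plen == 0 || plen >= MAX_PATH)
        return -1;

    memcpy(out->method, buf, mlen);
    out->method[mlen] = '\0';
    memcpy(out->path, path, plen);
    out->path[plen] = '\0';
    return 0;
}

/* Parse the 3-digit status from "HTTP/1.x SP DDD ..." at the start of a
 * captured response. Returns the code, or -1 if the line doesn't match. */
static int parse_status_code(const char *buf, size_t len)
{
    if (len < 12 || memcmp(buf, "HTTP/1.", 7) != 0 || buf[8] != ' ')
        return -1;
    int code = 0;
    for (int i = 9; i < 12; i++) {
        if (buf[i] < '0' || buf[i] > '9')
            return -1;
        code = code * 10 + (buf[i] - '0');
    }
    return code;
}
```

HTTP/2 frames are binary and would need a different parser; rejecting anything that does not match keeps the collector from emitting garbage spans.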

The approach is to probe the function that writes the response header. At that point, the request has been parsed and the status code is known. You read the request data from the socket buffer or from function arguments, compute latency from a timestamp saved when the request began (a timestamp taken when the connection was accepted is only correct for the first request on a keep-alive connection), and emit the span to a userspace collector through a BPF ring buffer.
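The two-probe timing logic described above can be modeled in ordinary userspace C. This is a sketch, not a BPF program: the fixed arrays stand in for a BPF hash map and a ring buffer, `conn_id` stands in for whatever key a tracer uses to identify a connection (for example, the socket pointer), and the hook names `on_request_start` and `on_write_header` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical span record; a real tracer would also carry method, path, pid. */
struct http_span {
    uint64_t conn_id;
    uint16_t status;
    uint64_t latency_ns;
};

#define MAX_INFLIGHT 64
#define MAX_SPANS    64

/* Stand-in for a BPF hash map: conn_id -> start time of the current request. */
static struct { uint64_t conn_id, start_ns; bool used; } inflight[MAX_INFLIGHT];

/* Stand-in for the ring buffer drained by the userspace collector. */
static struct http_span spans[MAX_SPANS];
static size_t n_spans;

/* Hook 1 (hypothetical): first bytes of a new request read on conn_id.
 * Overwrites any existing entry, so each keep-alive request gets its own start. */
static bool on_request_start(uint64_t conn_id, uint64_t now_ns)
{
    size_t free_slot = MAX_INFLIGHT;
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (inflight[i].used && inflight[i].conn_id == conn_id) {
            inflight[i].start_ns = now_ns;
            return true;
        }
        if (!inflight[i].used && free_slot == MAX_INFLIGHT)
            free_slot = i;
    }
    if (free_slot == MAX_INFLIGHT)
        return false;              /* table full: drop, as a full BPF map would */
    inflight[free_slot].conn_id = conn_id;
    inflight[free_slot].start_ns = now_ns;
    inflight[free_slot].used = true;
    return true;
}

/* Hook 2 (hypothetical): response header written. Emit a span, clear the entry. */
static bool on_write_header(uint64_t conn_id, uint16_t status, uint64_t now_ns)
{
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (!inflight[i].used || inflight[i].conn_id != conn_id)
            continue;
        inflight[i].used = false;
        if (n_spans == MAX_SPANS)
            return false;          /* buffer full: span dropped */
        spans[n_spans++] = (struct http_span){
            .conn_id = conn_id,
            .status = status,
            .latency_ns = now_ns - inflight[i].start_ns,
        };
        return true;
    }
    return false;                  /* no start seen, e.g. probe attached mid-request */
}
```

In an actual BPF program, `now_ns` would come from `bpf_ktime_get_ns()`, the start times would live in a `BPF_MAP_TYPE_HASH` map, and spans would be written with `bpf_ringbuf_reserve` and `bpf_ringbuf_submit`. Resetting the start time per request rather than per accept is what keeps latency correct on keep-alive connections.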

The limitations

eBPF observability isn't a complete replacement for application-level instrumentation. Internal function spans, business-logic metrics, and structured log events still need SDKs. eBPF excels at the system boundary: network calls, disk I/O, scheduling latency, memory allocation patterns. The best observability strategy layers eBPF for infrastructure signals with lightweight SDKs for application-specific telemetry.

eBPF provides observability with zero code changes
Best suited for infrastructure signals: networking, I/O, scheduling
Application-level business metrics still benefit from SDKs
Performance overhead is typically under 1% CPU, though it grows with how often probes fire, and a uprobe hit costs more than a kprobe hit
BPF CO-RE (Compile Once, Run Everywhere) handles most kernel portability issues, provided the target kernel exposes BTF type information

Getting started

The tooling ecosystem has matured significantly. libbpf and its Rust bindings (libbpf-rs) provide a high-level interface for loading, relocating, and attaching BPF programs from userspace. Projects like Cilium's Tetragon and Pixie give you production-ready eBPF observability without writing kernel code yourself. If you're responsible for a fleet of services and want better observability without touching every codebase, eBPF is worth the investment.

We deployed eBPF-based HTTP tracing across 200 services and got useful latency and error-rate data within a day — no code changes, no redeploys. That kind of velocity is hard to beat with traditional instrumentation.
