Zero-Instrumentation Observability with eBPF

Instrumenting every service in a distributed system is expensive and fragile. Libraries drift out of date, different languages need different SDKs, and some services can't be modified at all. eBPF offers a compelling alternative: attach probes to kernel and userspace events, and collect telemetry without touching a single line of application code.

How eBPF works

eBPF programs are small, sandboxed programs that run in the Linux kernel. Before a program loads, the kernel's verifier checks it for safety and rejects unbounded loops and out-of-bounds memory access. Accepted programs are then JIT-compiled to native instructions. You attach them to hooks: kprobes for kernel functions, uprobes for userspace functions, tracepoints for static kernel instrumentation points, and more.

// eBPF program that counts outbound IPv4 TCP connection attempts by destination port
#include <linux/bpf.h>
#include <linux/in.h>         // struct sockaddr_in
#include <linux/ptrace.h>     // struct pt_regs
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>  // PT_REGS_PARM2; build with e.g. -D__TARGET_ARCH_x86
#include <bpf/bpf_endian.h>   // bpf_ntohs

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u16);   // destination port, host byte order
    __type(value, __u64); // count
} tcp_conn_count SEC(".maps");

SEC("kprobe/tcp_v4_connect")
int trace_tcp_connect(struct pt_regs *ctx) {
    // At function entry the socket's skc_dport is not populated yet, so read
    // the port from the second argument, the destination sockaddr.
    struct sockaddr_in *uaddr = (struct sockaddr_in *)PT_REGS_PARM2(ctx);
    __u16 dport = 0;

    // A kprobe cannot dereference kernel pointers directly; copy the field out.
    if (bpf_probe_read_kernel(&dport, sizeof(dport), &uaddr->sin_port))
        return 0;
    dport = bpf_ntohs(dport); // network to host byte order

    __u64 *count = bpf_map_lookup_elem(&tcp_conn_count, &dport);
    if (count) {
        __sync_fetch_and_add(count, 1);
    } else {
        __u64 init = 1;
        bpf_map_update_elem(&tcp_conn_count, &dport, &init, BPF_ANY);
    }
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
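
The kernel side only fills the map; something in userspace still has to read it and turn it into a report. As a rough sketch of that last step, assume the collector has already copied the entries out of tcp_conn_count (for example by walking the map with libbpf's bpf_map_get_next_key and bpf_map_lookup_elem). The struct port_count type and the rank_ports helper below are hypothetical names used for illustration, not part of libbpf.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical userspace copy of one tcp_conn_count entry.
struct port_count {
    uint16_t port;   // destination port, host byte order
    uint64_t count;  // connection attempts seen for that port
};

// qsort comparator: highest count first, ties broken by lower port number.
static int by_count_desc(const void *a, const void *b) {
    const struct port_count *x = a;
    const struct port_count *y = b;
    if (x->count != y->count)
        return x->count < y->count ? 1 : -1;
    return (x->port > y->port) - (x->port < y->port);
}

// Sort the entries in place (busiest port first) and return the total
// number of connection attempts across all ports.
uint64_t rank_ports(struct port_count *entries, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += entries[i].count;
    qsort(entries, n, sizeof(entries[0]), by_count_desc);
    return total;
}
```

Sorting in userspace keeps the BPF program to a single map update per event, which is usually the right split: the kernel side stays cheap, and anything that needs ordering or formatting happens in the collector.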

HTTP tracing without touching the application

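For HTTP services, the most valuable signal is the request-response lifecycle: method, path, status code, latency. With eBPF, you can attach uprobes to common HTTP library functions (Go's net/http, Python's WSGI, Node's HTTP parser) and extract the same data without an SDK. Uprobes attach to symbols in compiled binaries, so Go services are the most direct fit, while interpreted runtimes are usually probed through the interpreter, its native extensions, or the underlying socket calls.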

The approach is to probe the function that writes the response header. At that point, the request is parsed and the status code is known. You read the request data from the socket buffer or from function arguments, compute latency from a timestamp saved when the connection was accepted, and emit the span to a userspace collector via a BPF ring buffer.
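
To make the collector side of this concrete, here is a minimal sketch of turning one ring-buffer event into a span. It assumes the BPF program captures the first bytes of the request, the status code, and two timestamps. The struct http_event layout, the struct http_span type, and the http_span_from_event function are all hypothetical, chosen for illustration rather than taken from any particular tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical event pushed through the BPF ring buffer.
struct http_event {
    uint64_t accept_ns;    // timestamp saved when the connection was accepted
    uint64_t response_ns;  // timestamp taken when the response header is written
    uint16_t status;       // HTTP status code
    uint32_t request_len;  // number of valid bytes in request[]
    char     request[64];  // first bytes of the request, not NUL-terminated
};

// Hypothetical span emitted by the userspace collector.
struct http_span {
    char     method[8];
    char     path[48];
    uint16_t status;
    uint64_t latency_ns;
};

// Parse "METHOD SP PATH SP ..." from the captured bytes and fill in the span.
// Returns 0 on success, -1 if the request line is malformed or was cut off
// by the capture limit before the path ended.
int http_span_from_event(const struct http_event *ev, struct http_span *out) {
    uint32_t len = ev->request_len;
    if (len > sizeof(ev->request))
        return -1;

    const char *buf = ev->request;
    const char *sp1 = memchr(buf, ' ', len);
    if (!sp1)
        return -1;
    size_t mlen = (size_t)(sp1 - buf);
    if (mlen == 0 || mlen >= sizeof(out->method))
        return -1;

    const char *path = sp1 + 1;
    size_t rest = len - mlen - 1;
    const char *sp2 = memchr(path, ' ', rest);
    if (!sp2)
        return -1;
    size_t plen = (size_t)(sp2 - path);
    if (plen == 0 || plen >= sizeof(out->path))
        return -1;

    memcpy(out->method, buf, mlen);
    out->method[mlen] = '\0';
    memcpy(out->path, path, plen);
    out->path[plen] = '\0';
    out->status = ev->status;
    // Guard against clock skew or reordered events producing a negative delta.
    out->latency_ns = ev->response_ns >= ev->accept_ns
                          ? ev->response_ns - ev->accept_ns
                          : 0;
    return 0;
}
```

Keeping the parsing in userspace is a deliberate choice in this sketch: the BPF side only copies a bounded number of bytes, which is easy for the verifier to accept, and the string handling runs where it is easier to test and change.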

The limitations

eBPF observability isn't a complete replacement for application-level instrumentation. Internal function spans, business-logic metrics, and structured log events still need SDKs. eBPF excels at the system boundary: network calls, disk I/O, scheduling latency, memory allocation patterns. The best observability strategy layers eBPF for infrastructure signals with lightweight SDKs for application-specific telemetry.

Getting started

The tooling ecosystem has matured significantly. libbpf and its Rust bindings provide a safe, high-level interface for writing BPF programs. Projects like Cilium's Tetragon and Pixie give you production-ready eBPF observability without writing kernel code yourself. If you're responsible for a fleet of services and want better observability without touching every codebase, eBPF is worth the investment.

We deployed eBPF-based HTTP tracing across 200 services and got useful latency and error-rate data within a day — no code changes, no redeploys. That kind of velocity is hard to beat with traditional instrumentation.