eBPF Complete Guide 2025: Kernel Programming Revolution for Observability, Networking, Security
TL;DR
- eBPF is a safe programming environment inside the Linux kernel. Add tracing/networking/security features without changing kernel code
- 3 major use cases: Observability (Pixie/Parca), Networking (Cilium/XDP), Security (Falco/Tetragon)
- CO-RE: Compile once, run on various kernel versions — eliminates deployment burden
- bpftrace: DTrace-like high-level language for ad-hoc tracing (bpftrace -e 'tracepoint:syscalls:sys_enter_open { @[comm] = count(); }')
- XDP: Packet processing at the network card level — Cloudflare uses it for DDoS defense (10M+ packets per second)
1. What is eBPF?
1.1 One-line definition
eBPF is a technology that allows you to run sandboxed programs inside the operating system kernel without modifying it.
Traditionally, extending kernel functionality required two approaches:
- Add code to the kernel itself — hard to contribute, and even when merged, users wait for new kernel versions
- Load kernel modules — risky if poorly written, vulnerable to kernel ABI changes
eBPF offers a third path: safely run custom code inside the kernel.
1.2 How is safety ensured?
Before loading, eBPF programs must pass BPF Verifier static analysis:
- Termination guarantee: No infinite loops (limits on backward jumps)
- Memory safety: Tracks all pointer accesses, bounds checks
- Function call restrictions: Only allowed helper functions
- Stack size: Maximum 512 bytes
- Instruction count: Less than 1 million (kernel 5.2+)
After verification, the JIT compiler converts to native machine code, executing at near-native speed.
1.3 Evolution from cBPF to eBPF
| Feature | cBPF (1992) | eBPF (2014~) |
|---|---|---|
| Purpose | Packet filtering only | General-purpose kernel programming |
| Registers | 2 (32-bit) | 11 (64-bit) |
| Instructions | 22 | 100+ |
| Helper functions | None | 200+ |
| Maps | None | 30+ types |
| JIT | Some architectures | Most architectures |
cBPF used by tcpdump was a simple packet filter, but eBPF evolved into a mini virtual machine.
2. eBPF Architecture Deep Dive
2.1 Component Overview
┌─────────────────────────────────────────────┐
│ User Space │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ bpftool │ │ bpftrace │ │ Cilium │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ └─────────────┼─────────────┘ │
│ │ bpf() syscall │
└─────────────────────┼───────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ Kernel Space │
│ ┌─────────────────────────────────────┐ │
│ │ BPF Verifier (safety verification) │ │
│ └──────────────┬──────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ JIT Compiler (native code) │ │
│ └──────────────┬──────────────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ Hooks: kprobe, tracepoint, XDP, │ │
│ │ perf_event, socket, cgroup, LSM │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
2.2 Major Hook Types
eBPF programs can attach to various kernel events:
| Hook Type | Description | Example |
|---|---|---|
| kprobe / kretprobe | Kernel function entry/exit | Trace do_sys_open calls |
| uprobe / uretprobe | User-space functions | Trace OpenSSL SSL_read |
| tracepoint | Static kernel tracepoints | sched:sched_switch |
| perf_event | Hardware/software counters | CPU cycles, cache misses |
| XDP | Network driver | DDoS filtering |
| TC (Traffic Control) | Traffic shaping | QoS, load balancing |
| socket | Socket operations | Socket filters |
| cgroup | Cgroup events | Network isolation |
| LSM (Linux Security Modules) | Security policies | File access control |
2.3 BPF Maps — Core Data Structures
Data sharing between eBPF programs and user space happens through BPF Maps:
| Map Type | Purpose |
|---|---|
| BPF_MAP_TYPE_HASH | General key-value storage |
| BPF_MAP_TYPE_ARRAY | Array (index-based) |
| BPF_MAP_TYPE_PERF_EVENT_ARRAY | Streaming events to user space |
| BPF_MAP_TYPE_RINGBUF | More efficient event streaming (kernel 5.8+) |
| BPF_MAP_TYPE_LRU_HASH | LRU cache |
| BPF_MAP_TYPE_LPM_TRIE | Longest prefix match (used in routing) |
| BPF_MAP_TYPE_PROG_ARRAY | Tail call jump table |
// BPF_MAP_TYPE_HASH example: counting syscalls per PID
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, __u32); // PID
__type(value, __u64); // count
__uint(max_entries, 10240);
} syscall_count SEC(".maps");
3. Development Tools Comparison
3.1 BCC (BPF Compiler Collection)
- Languages: C for BPF, Python/Lua for user space
- Pros: Rich tooling (100+ tools in bcc-tools package)
- Cons: Compiler dependency (LLVM at runtime), large binaries
3.2 bpftrace — DTrace's Successor
- Language: High-level awk-like DSL
- Use case: Ad-hoc tracing, debugging
- Strength: Powerful analysis in one line
# Count syscalls per process
bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[comm] = count(); }'
# Trace read() calls taking longer than 1 second
bpftrace -e '
kprobe:vfs_read { @start[tid] = nsecs; }
kretprobe:vfs_read /@start[tid]/ {
$duration = nsecs - @start[tid];
if ($duration > 1000000000) {
printf("%s pid %d took %d ms\n", comm, pid, $duration / 1000000);
}
delete(@start[tid]);
}'
# Top CPU functions (on-CPU profiling)
bpftrace -e 'profile:hz:99 { @[ustack] = count(); }'
3.3 libbpf + CO-RE — Production Standard
- CO-RE (Compile Once, Run Everywhere): Embeds kernel struct info via BTF — runs across kernel versions
- Pros: Small binaries, no compiler dependency, fast startup
- Cons: Steep learning curve
// libbpf example: tracing openat()
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
char LICENSE[] SEC("license") = "GPL";
SEC("tracepoint/syscalls/sys_enter_openat")
int handle_openat(struct trace_event_raw_sys_enter *ctx) {
char comm[16];
bpf_get_current_comm(&comm, sizeof(comm));
bpf_printk("openat by %s\n", comm);
return 0;
}
3.4 Tools Comparison
| Tool | Learning Curve | Performance | Deployment | Best For |
|---|---|---|---|---|
| BCC | Medium | Medium | Hard (needs LLVM) | Learning, one-off analysis |
| bpftrace | Easy | High | Medium | Ad-hoc tracing, debugging |
| libbpf + CO-RE | Hard | Very High | Easy | Production tool development |
| Aya (Rust) | Hard | Very High | Easy | Rust ecosystem integration |
| ebpf-go | Medium | Very High | Easy | Go application integration |
4. Three Major Use Cases of eBPF
4.1 Observability
eBPF observes system behavior deeply without changing application code.
Pixie — Kubernetes Native Observability
- Open-source project acquired by New Relic, later contributed to the CNCF
- Auto-traces HTTP/gRPC/MySQL requests in all Pods without sidecars
- BPF code attaches uprobes to OpenSSL SSL_read/SSL_write to capture data before encryption
Parca — Continuous Profiling
- 24/7 CPU profiling in production
- Less than 1% overhead
- Open source by Polar Signals
4.2 Networking
Cilium — eBPF-based K8s Networking Standard
- CNCF Graduated project
- Complete kube-proxy replacement — eBPF for service routing instead of iptables
- L7 policy support (HTTP, gRPC, Kafka)
- Hubble: Cilium-based network observability
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: allow-api-only
spec:
endpointSelector:
matchLabels:
app: backend
ingress:
- fromEndpoints:
- matchLabels:
app: frontend
toPorts:
- ports:
- port: '8080'
rules:
http:
- method: 'GET'
path: '/api/v1/.*'
XDP — Ultra-fast Packet Processing
XDP processes packets at the network driver level — bypassing the kernel network stack.
- Cloudflare: Uses XDP for DDoS defense — filters 10M+ packets per second
- Facebook Katran: L4 load balancer — 200Gbps on a single server
- kube-proxy replacement: Cilium's host routing mode uses XDP
4.3 Security
Falco — Runtime Security Monitoring
- CNCF Graduated (2024)
- Tracks system calls and K8s events to detect anomalies
- Rule-based (Sysdig filter language)
- rule: Terminal shell in container
desc: A shell was used as the entrypoint/exec point into a container
condition: >
spawned_process and container
and shell_procs and proc.tty != 0
output: >
Shell spawned in container (user=%user.name container=%container.name
shell=%proc.name pid=%proc.pid)
priority: WARNING
Tetragon — Cilium Team's Next-Generation Security
- eBPF-based security observability + runtime enforcement
- Uses LSM hooks in addition to syscalls
- Real-time blocking capability (Falco only detects)
5. Practical: Debugging with bpftrace
5.1 Scenario: Disk I/O suddenly slow
# 1. Which process is causing disk I/O?
sudo bpftrace -e '
tracepoint:block:block_rq_issue {
@[comm] = count();
}
interval:s:5 {
print(@);
clear(@);
}'
# 2. I/O latency distribution (histogram)
sudo bpftrace -e '
kprobe:blk_account_io_start { @start[arg0] = nsecs; }
kprobe:blk_account_io_done /@start[arg0]/ {
@ms = hist((nsecs - @start[arg0]) / 1000000);
delete(@start[arg0]);
}'
5.2 Scenario: Unknown files being opened
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_openat {
printf("%s opened: %s\n", comm, str(args->filename));
}'
6. CO-RE: Compile Once, Run Everywhere
6.1 Problem: Struct definitions vary between kernels
Traditional BCC compiled with kernel headers at runtime. But containers struggle to access host kernel headers, and compile times were long.
6.2 Solution: BTF + CO-RE
BTF (BPF Type Format): Kernel exposes its struct info at /sys/kernel/btf/vmlinux.
CO-RE: Compiler represents struct field accesses "symbolically", and the loader relocates them at runtime using BTF info.
#include <bpf/bpf_core_read.h>
SEC("kprobe/do_unlinkat")
int handle_unlinkat(struct pt_regs *ctx) {
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
// BPF_CORE_READ: BTF-based safe field reading
pid_t pid = BPF_CORE_READ(task, pid);
bpf_printk("unlink by pid %d\n", pid);
return 0;
}
Result: A single compiled .o file works on kernels 5.4 to 6.5.
7. Production Case Studies
7.1 Netflix — Performance Debugging with bpftrace
Netflix's Brendan Gregg is eBPF's most influential advocate. Netflix uses BCC and bpftrace across tens of thousands of EC2 instances:
- Off-CPU analysis: Why processes are blocked (locks, I/O wait)
- TCP retransmission tracing: Diagnosing network issues between microservices
- Filesystem latency: Instantly identifying slow files
7.2 Cloudflare — DDoS Defense with XDP
- Adopted in 2017, continuously expanded
- L4Drop: Blocks tens of millions of malicious packets per second at the network driver level
- 10x more efficient than iptables
- 30% infrastructure cost reduction from CPU savings
7.3 Meta — eBPF Everywhere
- Katran: L4 load balancer (XDP-based)
- DCCP profiling: Performance analysis across data centers
- Kernel networking subsystem replacement: Bypassing kernel stack in some cases
7.4 Google — GKE Dataplane V2 with Cilium
- GKE Dataplane V2 is Cilium-based
- Completely replaces iptables-based kube-proxy
- Consistent performance even as cluster size grows
8. Limitations and Challenges
8.1 Verifier Constraints
- Instruction count limit (currently 1 million)
- Loop restrictions (eased by verifier-checked bounded loops in 5.3+ and the bpf_loop() helper in 5.17+)
- 512-byte stack — hard to manipulate large structs
- "Fighting the verifier" is the steepest learning curve
8.2 Kernel Version Dependency
- Some features need recent kernels (e.g., ringbuf is 5.8+, BPF_LSM is 5.7+)
- CO-RE solves most issues, but new helpers are still kernel-dependent
8.3 Debugging Difficulty
- Ordinary printf()-style debugging is impossible — only bpf_printk() (slow) is available
- Runtime error analysis is hard
- Verifier error messages are cryptic
9. The Future of eBPF
9.1 BPF in Windows
Microsoft is developing eBPF for Windows. Same eBPF programs can run on Windows → true cross-platform kernel programming.
9.2 sched_ext — Scheduler in eBPF
Introduced in kernel 6.12+. Write the CPU scheduler itself in eBPF → workload-specific custom scheduling.
10. Getting Started — Learning Roadmap
Week 1: Basics
- bpftool prog list — check loaded BPF programs
- Install bpftrace and explore bpftrace/tools
- Read "Linux Observability with BPF" (O'Reilly) chapters 1-3
Week 2-3: BCC
- Install the bcc-tools package
- Use tools like execsnoop, opensnoop, tcptop
- Write your first BPF program with the BCC tutorial
Week 4-6: libbpf + CO-RE
- Clone libbpf-bootstrap
- Write a simple tracing tool (e.g., syscall counter)
- Master BTF and CO-RE macros
Week 7+: Real Projects
- Apply to your production environment
- Deep dive into Cilium, Falco, or Pixie
- Conferences: eBPF Summit, KubeCon (eBPF Day)
Quiz
1. Why are eBPF programs guaranteed to be safe?
Answer: Before loading into the kernel, the BPF Verifier performs static analysis. It checks for infinite loops, memory access validation, only allows approved helper functions, and limits instruction count. After verification passes, the JIT compiler converts to native machine code, running at near-native speed.
2. What problem does CO-RE solve?
Answer: Traditional BCC required compiling with kernel headers at runtime. This caused LLVM dependency, long startup times, and difficulties in container environments. CO-RE uses BTF to allow the compiler to represent struct field accesses "symbolically", with the loader performing runtime relocation. A single compiled .o file works across various kernel versions.
3. Why is XDP faster than iptables?
Answer: XDP processes packets at the network driver level. Packets are processed before they enter the kernel network stack (skb allocation, conntrack, etc.), resulting in very low overhead. Cloudflare processes 10M+ packets per second on a single server with XDP, achieving 10x more efficiency than iptables.
4. How does Cilium replace kube-proxy?
Answer: kube-proxy uses iptables or IPVS rules to route Service ClusterIPs to backend Pods. As clusters grow, iptables rules can reach thousands, degrading performance. Cilium's eBPF programs select backends directly during socket operations and packet processing, eliminating the need for iptables rules. The result is consistent performance with low CPU usage.
5. When should you use bpftrace vs libbpf?
Answer: bpftrace is suitable for ad-hoc tracing and debugging. Its DTrace-like high-level DSL allows powerful analysis in one line. libbpf + CO-RE is suitable for production tool development — small binaries, no compiler dependency, fast startup, broad kernel compatibility. Personal debugging uses bpftrace; tools you ship use libbpf.
References
- ebpf.io — Official site
- Linux Observability with BPF — David Calavera, Lorenzo Fontana
- BPF Performance Tools — Brendan Gregg
- libbpf-bootstrap — CO-RE starter template
- bcc-tools — 100+ BPF tools
- bpftrace — DTrace-like high-level tracing
- Cilium Documentation
- Falco Documentation
- Brendan Gregg's eBPF page
- Awesome eBPF
- eBPF Summit — Annual conference
- KubeCon eBPF Day — Cilium team presentations