eBPF-Based Kubernetes Observability Guide: From Cilium Hubble to Tetragon
- What is eBPF?
- Cilium + Hubble: Network Observability
- Tetragon: Security Observability
- Custom eBPF Observability Tools
- eBPF Tool Comparison
- Production Adoption Guide
- Quiz
What is eBPF?
eBPF (extended Berkeley Packet Filter) is a technology that allows running sandboxed programs inside the Linux kernel. It enables adding networking, security, and observability capabilities without modifying kernel source code or loading kernel modules.
Traditional Observability vs eBPF
| Approach | Overhead | Kernel Modification | Visibility |
|---|---|---|---|
| Logs (stdout) | Medium | Not required | App level only |
| Prometheus Metrics | Low | Not required | Only what app exposes |
| Sidecar Proxy (Envoy) | High | Not required | Up to L7 |
| eBPF | Very low | Not required | From kernel to L7 |
The key advantage of eBPF is the ability to observe at the kernel level without agents.
Cilium + Hubble: Network Observability
Installing Cilium
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz
# Install Cilium (with Hubble)
cilium install --version 1.16.5 \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"
# Check status
cilium status
cilium hubble port-forward &
Observing Network Flows with Hubble CLI
# Observe all network flows
hubble observe
# Flows from a specific namespace only
hubble observe --namespace production
# Observe DNS queries
hubble observe --protocol dns
# Observe HTTP requests (L7)
hubble observe --protocol http
# Observe dropped packets (network policy violations, etc.)
hubble observe --verdict DROPPED
# Communication between specific Pods
hubble observe --from-pod production/frontend --to-pod production/api-server
# JSON output for pipeline integration
hubble observe --namespace production -o json | jq '.flow.source.identity'
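Beyond jq one-liners, the JSON stream can feed a small script. A minimal sketch that tallies verdicts per source namespace, assuming the `flow.source.namespace` / `flow.verdict` fields shown by the jq example above (verify the exact shape against your Hubble version):

```python
import json
from collections import Counter

def tally_verdicts(lines):
    """Count (source namespace, verdict) pairs in `hubble observe -o json` output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        ns = flow.get("source", {}).get("namespace", "unknown")
        counts[(ns, flow.get("verdict", "UNKNOWN"))] += 1
    return counts
```

Pipe the stream in with `hubble observe --namespace production -o json | python3 tally.py` after adding a small `sys.stdin` driver around the function.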
Hubble UI
# Hubble UI port forwarding
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
# Access http://localhost:12000 in your browser
# View service map + real-time traffic flows
Collecting Hubble Metrics with Prometheus
# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: hubble
  endpoints:
  - port: hubble-metrics
    interval: 15s
# PromQL queries for Grafana dashboards
# HTTP request rate (by service)
sum(rate(hubble_http_requests_total[5m])) by (destination_workload, method, status)
# DNS error rate
sum(rate(hubble_dns_responses_total{rcode!="No Error"}[5m])) by (source_workload, qtypes)
# Dropped packets (network policy)
sum(rate(hubble_drop_total[5m])) by (reason, source_workload)
# TCP connection time
histogram_quantile(0.99, sum(rate(hubble_tcp_connect_duration_seconds_bucket[5m])) by (le, destination_workload))
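The same queries can be issued programmatically through the Prometheus HTTP API (`/api/v1/query`). A hedged sketch using only the standard library; the base URL is a placeholder for your in-cluster Prometheus service:

```python
import json
import urllib.parse
import urllib.request

def instant_query_url(base_url, promql):
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def run_query(base_url, promql):
    """Execute the query; requires a reachable Prometheus endpoint."""
    with urllib.request.urlopen(instant_query_url(base_url, promql)) as resp:
        return json.load(resp)["data"]["result"]

# Example (placeholder service URL):
url = instant_query_url(
    "http://prometheus.monitoring:9090",
    "sum(rate(hubble_drop_total[5m])) by (reason)",
)
```

`run_query` returns the `result` list from the standard API envelope, which a dashboard or alerting script can iterate over.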
Tetragon: Security Observability
Tetragon is an eBPF-based tool for security observability and runtime enforcement. It detects process execution, file access, network connections, and more at the kernel level.
Installing Tetragon
# Install with Helm
helm repo add cilium https://helm.cilium.io
helm install tetragon cilium/tetragon -n kube-system
# Install Tetragon CLI
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --remote-name-all \
https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz
sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz
Observing Process Events
# All process execution events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact
# Example output:
# 🚀 process default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;'
# 🚀 process default/nginx-7b4f... /usr/sbin/nginx -g daemon off;
# 💥 exit default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;' 0
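The compact output above is for humans; without `-o compact`, `tetra getevents` emits one JSON object per line, which scripts can consume. A sketch of a per-line summarizer; the event and field names (`process_exec`, `process.binary`, and so on) are my understanding of Tetragon's JSON output and should be verified against your version:

```python
import json

# Top-level keys that identify the event type (assumed names, per Tetragon's JSON output)
EVENT_KEYS = ("process_exec", "process_exit", "process_kprobe")

def summarize_event(line):
    """Reduce one Tetragon JSON event line to (event_type, binary, arguments)."""
    obj = json.loads(line)
    for key in EVENT_KEYS:
        if key in obj:
            proc = obj[key].get("process", {})
            return (key, proc.get("binary", "?"), proc.get("arguments", ""))
    return ("unknown", "?", "")

# Usage: kubectl exec ... -- tetra getevents | python3 summarize.py
```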
TracingPolicy: Custom Security Rules
# Detect sensitive file access
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sensitive-file-access
spec:
  kprobes:
  - call: 'fd_install'
    syscall: false
    args:
    - index: 0
      type: int
    - index: 1
      type: 'file'
    selectors:
    - matchArgs:
      - index: 1
        operator: 'Prefix'
        values:
        - '/etc/shadow'
        - '/etc/passwd'
        - '/root/.ssh'
        - '/var/run/secrets/kubernetes.io'
# Detect external network connections
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: external-connections
spec:
  kprobes:
  - call: 'tcp_connect'
    syscall: false
    args:
    - index: 0
      type: 'sock'
    selectors:
    - matchArgs:
      - index: 0
        operator: 'NotDAddr'
        values:
        - '10.0.0.0/8'
        - '172.16.0.0/12'
        - '192.168.0.0/16'
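The `NotDAddr` selector matches destinations that fall outside the listed CIDRs. Its effect can be illustrated with a small check against the same RFC 1918 private ranges:

```python
import ipaddress

# The CIDRs from the TracingPolicy above
PRIVATE_CIDRS = [
    ipaddress.ip_network(c)
    for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_external(addr):
    """True when a destination would match NotDAddr for the CIDRs above."""
    ip = ipaddress.ip_address(addr)
    return not any(ip in net for net in PRIVATE_CIDRS)

# is_external("10.2.3.4") -> False (private, ignored by the policy)
# is_external("8.8.8.8")  -> True  (would be reported as an external connection)
```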
# Apply TracingPolicy
kubectl apply -f sensitive-file-access.yaml
# Observe events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact --pods nginx
# Test: attempt to access /etc/shadow from nginx Pod
kubectl exec -it nginx -- cat /etc/shadow
# Tetragon event output:
# 📬 open default/nginx /etc/shadow
Custom eBPF Observability Tools
Quick Analysis with bpftrace
# System call frequency (top 10)
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }
interval:s:5 { print(@, 10); clear(@); }'
# Track TCP retransmissions
bpftrace -e 'kprobe:tcp_retransmit_skb {
    @retrans[comm, pid] = count();
}'
# Track file opens
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
    printf("%s PID:%d -> %s\n", comm, pid, str(args->filename));
}'
# High-latency disk I/O (over 10ms)
bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
kprobe:blk_account_io_done /@start[arg0]/ {
    $duration = nsecs - @start[arg0];
    if ($duration > 10000000) {
        printf("slow IO: %d ms, comm=%s\n", $duration / 1000000, comm);
    }
    delete(@start[arg0]);
}'
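bpftrace map dumps print lines like `@[comm]: count`. When a one-liner's output needs to feed a report or metric, a small parser makes the counts reusable; the line format here is an assumption based on typical bpftrace output, so adjust the pattern if your version prints differently:

```python
import re

# Matches lines such as "@[bash]: 12" or "@retrans[nginx, 1234]: 3"
MAP_LINE = re.compile(r"^@\w*\[(?P<key>[^\]]*)\]:\s*(?P<count>\d+)$")

def parse_map_dump(text, top=10):
    """Parse `@[key]: count` lines from a bpftrace map dump, highest counts first."""
    rows = []
    for line in text.splitlines():
        m = MAP_LINE.match(line.strip())
        if m:
            rows.append((m.group("key"), int(m.group("count"))))
    return sorted(rows, key=lambda kv: kv[1], reverse=True)[:top]
```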
Custom Tools with Python + bcc
#!/usr/bin/env python3
# tcp_latency.py - Track TCP connection latency
import socket
import struct

from bcc import BPF

bpf_text = """
#include <net/sock.h>
#include <net/tcp_states.h>
#include <bcc/proto.h>

struct event_t {
    u32 pid;
    u32 daddr;
    u16 dport;
    u64 delta_us;
    char comm[16];
};

BPF_HASH(start, struct sock *);
BPF_PERF_OUTPUT(events);

// Record the timestamp when a connection attempt starts
int trace_connect(struct pt_regs *ctx, struct sock *sk) {
    u64 ts = bpf_ktime_get_ns();
    start.update(&sk, &ts);
    return 0;
}

// Fires on TCP state transitions; leaving SYN_SENT marks handshake completion
int trace_tcp_rcv_state_process(struct pt_regs *ctx, struct sock *sk) {
    if (sk->__sk_common.skc_state != TCP_SYN_SENT)
        return 0;
    u64 *tsp = start.lookup(&sk);
    if (!tsp)
        return 0;
    struct event_t event = {};
    event.pid = bpf_get_current_pid_tgid() >> 32;
    event.delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    event.daddr = sk->__sk_common.skc_daddr;
    event.dport = sk->__sk_common.skc_dport;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    events.perf_submit(ctx, &event, sizeof(event));
    start.delete(&sk);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")
b.attach_kprobe(event="tcp_rcv_state_process", fn_name="trace_tcp_rcv_state_process")

def print_event(cpu, data, size):
    event = b["events"].event(data)
    print(f"PID={event.pid} COMM={event.comm.decode()} "
          f"DEST={socket.inet_ntoa(struct.pack('I', event.daddr))}:{socket.ntohs(event.dport)} "
          f"LATENCY={event.delta_us}us")

b["events"].open_perf_buffer(print_event)
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
eBPF Tool Comparison
| Tool | Use Case | Strengths | Limitations |
|---|---|---|---|
| Hubble | Network observability | Cilium integration, UI | Requires Cilium |
| Tetragon | Security observability | Runtime security | Learning curve |
| Pixie | Full observability | Auto-instrumentation, SQL queries | Resource usage |
| bpftrace | Debugging | Quick ad-hoc analysis | Not for production |
| Grafana Beyla | APM | Auto HTTP/gRPC instrumentation | L7 only |
Production Adoption Guide
Phased Adoption Strategy
Phase 1: Cilium + Hubble (Network Visibility)
├── Automatic service map generation
├── DNS error detection
└── Network policy violation monitoring
Phase 2: Tetragon (Security Observability)
├── Process execution auditing
├── Sensitive file access detection
└── External connection monitoring
Phase 3: Custom eBPF (Deep Analysis)
├── Performance profiling
├── Latency analysis
└── Custom metrics
Resource Overhead
Cilium Agent: ~200MB RAM, ~0.1 CPU per node
Hubble Relay: ~128MB RAM, ~0.05 CPU
Tetragon: ~256MB RAM, ~0.15 CPU per node
Total overhead: ~1% CPU, ~600MB RAM per node
→ 50-70% savings compared to sidecar proxies (Envoy)
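For capacity planning, the per-node figures above can be rolled up for a whole cluster. A back-of-the-envelope sketch; the Envoy sidecar numbers (~100MB RAM and ~0.1 CPU per pod) are illustrative assumptions, not measured values:

```python
def ebpf_overhead(nodes):
    """Cluster-wide eBPF stack overhead from the per-node figures above.

    Returns (RAM in MB, CPU cores): Cilium Agent + Tetragon per node,
    plus one shared Hubble Relay.
    """
    ram = nodes * (200 + 256) + 128
    cpu = nodes * (0.1 + 0.15) + 0.05
    return ram, cpu

def sidecar_overhead(pods, ram_per_sidecar=100, cpu_per_sidecar=0.1):
    """Hypothetical Envoy sidecar overhead, one proxy per pod (assumed figures)."""
    return pods * ram_per_sidecar, pods * cpu_per_sidecar

# 50-node cluster, ~10 pods per node
ram, cpu = ebpf_overhead(50)          # -> (22928, 12.55)
s_ram, s_cpu = sidecar_overhead(500)  # -> (50000, 50.0)
```

Under these assumptions the eBPF stack uses roughly half the RAM and a quarter of the CPU of per-pod sidecars, consistent with the 50-70% savings cited above.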
Review Quiz (6 Questions)
Q1. What advantage does eBPF have over traditional observability approaches?
It operates at the kernel level, enabling observation of network, process, and file access events without application modifications, with significantly lower overhead compared to sidecar proxies.
Q2. What command observes DNS errors with Hubble?
hubble observe --protocol dns or check the DNS metric hubble_dns_responses_total{rcode!="No Error"}
Q3. What types of events can Tetragon's TracingPolicy detect?
Kernel-level events such as process execution, file access (open), network connections (tcp_connect), and system calls.
Q4. What is the difference between bpftrace and bcc?
bpftrace uses a concise AWK-style syntax suitable for quick ad-hoc analysis, while bcc allows developing complex custom tools with Python/C.
Q5. What is the approximate per-node resource overhead of Cilium + Hubble?
Approximately 0.1 CPU cores and 200MB RAM (Hubble Relay is separate).
Q6. What is the recommended first step when adopting eBPF-based observability?
Establish network visibility with Cilium + Hubble. Start by monitoring service maps, DNS errors, and network policy violations.