eBPF-Based Kubernetes Observability Guide: From Cilium Hubble to Tetragon

What is eBPF?

eBPF (extended Berkeley Packet Filter) is a technology that allows running sandboxed programs inside the Linux kernel. It enables adding networking, security, and observability capabilities without modifying kernel modules.

Traditional Observability vs eBPF

Approach              | Overhead | Kernel Modification | Visibility
Logs (stdout)         | Medium   | Not required        | App level only
Prometheus Metrics    | Low      | Not required        | Only what app exposes
Sidecar Proxy (Envoy) | High     | Not required        | Up to L7
eBPF                  | Very low | Not required        | From kernel to L7

The key advantage of eBPF is the ability to observe at the kernel level without agents.

Cilium + Hubble: Network Observability

Installing Cilium

# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
  https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz

# Install Cilium (with Hubble)
cilium install --version 1.16.5 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enableOpenMetrics=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"

# Check status
cilium status
cilium hubble port-forward &

Observing Network Flows with Hubble CLI

# Observe all network flows
hubble observe

# Flows from a specific namespace only
hubble observe --namespace production

# Observe DNS queries
hubble observe --protocol dns

# Observe HTTP requests (L7)
hubble observe --protocol http

# Observe dropped packets (network policy violations, etc.)
hubble observe --verdict DROPPED

# Communication between specific Pods
hubble observe --from-pod production/frontend --to-pod production/api-server

# JSON output for pipeline integration
hubble observe --namespace production -o json | jq '.flow.source.identity'
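Besides `jq`, the JSON stream can be post-processed in any language. A minimal sketch that counts verdicts per source namespace (the sample lines and the exact field paths follow the Hubble flow schema — `flow.verdict`, `flow.source.namespace` — but verify them against your Hubble version):

```python
import json
from collections import Counter

# Hypothetical sample lines in the shape emitted by `hubble observe -o json`;
# real output wraps each flow in a top-level "flow" object.
sample = [
    '{"flow":{"verdict":"FORWARDED","source":{"namespace":"production"}}}',
    '{"flow":{"verdict":"DROPPED","source":{"namespace":"production"}}}',
    '{"flow":{"verdict":"FORWARDED","source":{"namespace":"staging"}}}',
]

def verdicts_by_namespace(lines):
    """Count (namespace, verdict) pairs across a stream of flow JSON lines."""
    counts = Counter()
    for line in lines:
        flow = json.loads(line).get("flow", {})
        ns = flow.get("source", {}).get("namespace", "unknown")
        counts[(ns, flow.get("verdict", "UNKNOWN"))] += 1
    return counts

print(verdicts_by_namespace(sample))
```

In practice you would pipe `hubble observe -o json` into the script and feed `sys.stdin` to the same function.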

Hubble UI

# Hubble UI port forwarding
kubectl port-forward -n kube-system svc/hubble-ui 12000:80

# Access http://localhost:12000 in your browser
# View service map + real-time traffic flows

Collecting Hubble Metrics with Prometheus

# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: hubble
  endpoints:
    - port: hubble-metrics
      interval: 15s

# PromQL queries for Grafana dashboards

# HTTP request rate (by service)
sum(rate(hubble_http_requests_total[5m])) by (destination_workload, method, status)

# DNS error rate
sum(rate(hubble_dns_responses_total{rcode!="No Error"}[5m])) by (source_workload, qtypes)

# Dropped packets (network policy)
sum(rate(hubble_drop_total[5m])) by (reason, source_workload)

# TCP connection time
histogram_quantile(0.99, sum(rate(hubble_tcp_connect_duration_seconds_bucket[5m])) by (le, destination_workload))
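The `histogram_quantile` call above interpolates linearly within cumulative histogram buckets. A rough sketch of that computation in Python (the bucket boundaries and counts are illustrative, not real Hubble data, and PromQL's edge-case handling is more involved):

```python
def histogram_quantile(q, buckets):
    """Approximate PromQL histogram_quantile.

    buckets: sorted list of (upper_bound, cumulative_count), like `le` buckets.
    Uses linear interpolation inside the bucket containing the q-th rank.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            # interpolate between the previous and current bucket boundary
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Illustrative tcp_connect_duration buckets: (le seconds, cumulative count)
buckets = [(0.005, 40), (0.01, 70), (0.05, 95), (0.1, 100)]
print(histogram_quantile(0.99, buckets))  # p99 lands in the 0.05-0.1s bucket
```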

Tetragon: Security Observability

Tetragon is an eBPF-based security observability + runtime security tool. It detects process execution, file access, network connections, and more at the kernel level.

Installing Tetragon

# Install with Helm
helm repo add cilium https://helm.cilium.io
helm install tetragon cilium/tetragon -n kube-system

# Install Tetragon CLI
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --remote-name-all \
  https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz
sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz

Observing Process Events

# All process execution events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
  tetra getevents -o compact

# Example output:
# 🚀 process default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;'
# 🚀 process default/nginx-7b4f... /usr/sbin/nginx -g daemon off;
# 💥 exit    default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;' 0
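Beyond the compact view, `tetra getevents` can also emit one JSON event per line, which is easier to filter programmatically. A sketch that collects executed binaries per pod (the `process_exec`/`process.binary`/`process.pod` field names follow Tetragon's JSON schema in recent releases — verify against your version):

```python
import json

# Hypothetical sample events in the shape of `tetra getevents` JSON output.
events = [
    '{"process_exec":{"process":{"binary":"/usr/sbin/nginx","pod":{"namespace":"default","name":"nginx-7b4f"}}}}',
    '{"process_exec":{"process":{"binary":"/bin/sh","pod":{"namespace":"default","name":"nginx-7b4f"}}}}',
    '{"process_exit":{"process":{"binary":"/bin/sh","pod":{"namespace":"default","name":"nginx-7b4f"}}}}',
]

def execs_per_pod(lines):
    """Collect binaries executed in each pod, ignoring non-exec events."""
    result = {}
    for line in lines:
        exec_event = json.loads(line).get("process_exec")
        if not exec_event:
            continue
        proc = exec_event["process"]
        pod = proc.get("pod", {})
        key = f'{pod.get("namespace", "?")}/{pod.get("name", "?")}'
        result.setdefault(key, []).append(proc["binary"])
    return result

print(execs_per_pod(events))
```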

TracingPolicy: Custom Security Rules

# Detect sensitive file access
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sensitive-file-access
spec:
  kprobes:
    - call: 'fd_install'
      syscall: false
      args:
        - index: 0
          type: int
        - index: 1
          type: 'file'
      selectors:
        - matchArgs:
            - index: 1
              operator: 'Prefix'
              values:
                - '/etc/shadow'
                - '/etc/passwd'
                - '/root/.ssh'
                - '/var/run/secrets/kubernetes.io'

# Detect external network connections
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: external-connections
spec:
  kprobes:
    - call: 'tcp_connect'
      syscall: false
      args:
        - index: 0
          type: 'sock'
      selectors:
        - matchArgs:
            - index: 0
              operator: 'NotDAddr'
              values:
                - '10.0.0.0/8'
                - '172.16.0.0/12'
                - '192.168.0.0/16'

# Apply TracingPolicy
kubectl apply -f sensitive-file-access.yaml

# Observe events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
  tetra getevents -o compact --pods nginx

# Test: attempt to access /etc/shadow from nginx Pod
kubectl exec -it nginx -- cat /etc/shadow
# Tetragon event output:
# 📬 open    default/nginx /etc/shadow
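The `NotDAddr` selector in the `external-connections` policy flags destinations that fall in none of the listed RFC 1918 private ranges. That matching logic can be sketched with Python's `ipaddress` module:

```python
import ipaddress

# The same private ranges listed in the TracingPolicy above.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_external(addr: str) -> bool:
    """True when a destination would match NotDAddr: in none of the ranges."""
    ip = ipaddress.ip_address(addr)
    return not any(ip in net for net in PRIVATE_RANGES)

print(is_external("10.42.0.7"))     # in-cluster address -> False
print(is_external("93.184.216.34"))  # public address -> True
```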

Custom eBPF Observability Tools

Quick Analysis with bpftrace

# System call frequency (top 10)
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }
  interval:s:5 { print(@, 10); clear(@); }'

# Track TCP retransmissions
bpftrace -e 'kprobe:tcp_retransmit_skb {
  @retrans[comm, pid] = count();
}'

# Track file opens
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
  printf("%s PID:%d -> %s\n", comm, pid, str(args->filename));
}'

# High-latency disk I/O (over 10ms)
# Start timestamps must be recorded on I/O start; note these kprobes
# may be unavailable on kernels where the functions are inlined.
bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
kprobe:blk_account_io_done /@start[arg0]/ {
  $duration = nsecs - @start[arg0];
  if ($duration > 10000000) {
    printf("slow IO: %d ms, comm=%s\n", $duration / 1000000, comm);
  }
  delete(@start[arg0]);
}'

Custom Tools with Python + bcc

#!/usr/bin/env python3
# tcp_latency.py - Track TCP connection latency
import socket
import struct

from bcc import BPF

bpf_text = """
#include <net/sock.h>
#include <bcc/proto.h>

struct event_t {
    u32 pid;
    u32 daddr;
    u16 dport;
    u64 delta_us;
    char comm[16];
};

BPF_HASH(start, struct sock *);
BPF_PERF_OUTPUT(events);

int trace_connect(struct pt_regs *ctx, struct sock *sk) {
    u64 ts = bpf_ktime_get_ns();
    start.update(&sk, &ts);
    return 0;
}

int trace_tcp_rcv_state_process(struct pt_regs *ctx, struct sock *sk) {
    if (sk->__sk_common.skc_state != TCP_SYN_SENT)
        return 0;

    u64 *tsp = start.lookup(&sk);
    if (!tsp) return 0;

    struct event_t event = {};
    event.pid = bpf_get_current_pid_tgid() >> 32;
    event.delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    event.daddr = sk->__sk_common.skc_daddr;
    event.dport = sk->__sk_common.skc_dport;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));

    events.perf_submit(ctx, &event, sizeof(event));
    start.delete(&sk);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")
b.attach_kprobe(event="tcp_rcv_state_process", fn_name="trace_tcp_rcv_state_process")

def print_event(cpu, data, size):
    event = b["events"].event(data)
    print(f"PID={event.pid} COMM={event.comm.decode()} "
          f"DEST={socket.inet_ntoa(struct.pack('I', event.daddr))}:{socket.ntohs(event.dport)} "
          f"LATENCY={event.delta_us}us")

b["events"].open_perf_buffer(print_event)
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break

eBPF Tool Comparison

Tool          | Use Case               | Strengths                         | Limitations
Hubble        | Network observability  | Cilium integration, UI            | Requires Cilium
Tetragon      | Security observability | Runtime security                  | Learning curve
Pixie         | Full observability     | Auto-instrumentation, SQL queries | Resource usage
bpftrace      | Debugging              | Quick ad-hoc analysis             | Not for production
Grafana Beyla | APM                    | Auto HTTP/gRPC instrumentation    | L7 only

Production Adoption Guide

Phased Adoption Strategy

Phase 1: Cilium + Hubble (Network Visibility)
  ├── Automatic service map generation
  ├── DNS error detection
  └── Network policy violation monitoring

Phase 2: Tetragon (Security Observability)
  ├── Process execution auditing
  ├── Sensitive file access detection
  └── External connection monitoring

Phase 3: Custom eBPF (Deep Analysis)
  ├── Performance profiling
  ├── Latency analysis
  └── Custom metrics

Resource Overhead

Cilium Agent: ~200MB RAM, ~0.1 CPU per node
Hubble Relay: ~128MB RAM, ~0.05 CPU
Tetragon: ~256MB RAM, ~0.15 CPU per node

Total overhead: ~0.3 CPU cores and ~600MB RAM per node (around 1% CPU on a large node)
Roughly 50-70% savings compared to sidecar proxies (Envoy)
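As a quick sanity check on the totals, summing the per-node components gives ~584MB RAM and 0.3 CPU cores; the 32-core node size below is an assumption introduced here to show how 0.3 cores comes out near 1%:

```python
# Per-node components listed above: name -> (RAM MB, CPU cores)
components = {
    "cilium-agent": (200, 0.10),
    "hubble-relay": (128, 0.05),
    "tetragon": (256, 0.15),
}

ram_mb = sum(ram for ram, _ in components.values())
cpu_cores = sum(cpu for _, cpu in components.values())
node_cores = 32  # assumed node size, not from the original text

print(f"RAM: ~{ram_mb} MB, CPU: {cpu_cores:.2f} cores "
      f"(~{100 * cpu_cores / node_cores:.1f}% of a {node_cores}-core node)")
```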

Review Quiz (6 Questions)

Q1. What advantage does eBPF have over traditional observability approaches?

It operates at the kernel level, enabling observation of network, process, and file access events without application modifications, with significantly lower overhead compared to sidecar proxies.

Q2. What command observes DNS errors with Hubble?

hubble observe --protocol dns or check the DNS metric hubble_dns_responses_total{rcode!="No Error"}

Q3. What types of events can Tetragon's TracingPolicy detect?

Kernel-level events such as process execution, file access (open), network connections (tcp_connect), and system calls.

Q4. What is the difference between bpftrace and bcc?

bpftrace uses a concise AWK-style syntax suitable for quick ad-hoc analysis, while bcc allows developing complex custom tools with Python/C.

Q5. What is the approximate per-node resource overhead of Cilium + Hubble?

Approximately 0.1 CPU cores and 200MB RAM (Hubble Relay is separate).

Q6. What is the recommended first step when adopting eBPF-based observability?

Establish network visibility with Cilium + Hubble. Start by monitoring service maps, DNS errors, and network policy violations.
