- What is eBPF?
- Cilium + Hubble: Network Observability
- Tetragon: Security Observability
- Custom eBPF Observability Tools
- eBPF Tool Comparison
- Production Adoption Guide
What is eBPF?
eBPF (extended Berkeley Packet Filter) is a technology that runs sandboxed programs inside the Linux kernel. It adds networking, security, and observability capabilities without changing kernel source code or loading kernel modules.
Traditional Observability vs eBPF
| Approach | Overhead | Kernel Modification | Visibility |
|---|---|---|---|
| Logs (stdout) | Medium | Not required | App level only |
| Prometheus Metrics | Low | Not required | Only what app exposes |
| Sidecar Proxy (Envoy) | High | Not required | Up to L7 |
| eBPF | Very low | Not required | From kernel to L7 |
The key advantage of eBPF is kernel-level visibility without per-application instrumentation or sidecar proxies.
Cilium + Hubble: Network Observability
Installing Cilium
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz
# Install Cilium (with Hubble)
cilium install --version 1.16.5 \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"
# Check status
cilium status
cilium hubble port-forward &
Observing Network Flows with Hubble CLI
# Observe all network flows
hubble observe
# Flows from a specific namespace only
hubble observe --namespace production
# Observe DNS queries
hubble observe --protocol dns
# Observe HTTP requests (L7)
hubble observe --protocol http
# Observe dropped packets (network policy violations, etc.)
hubble observe --verdict DROPPED
# Communication between specific Pods
hubble observe --from-pod production/frontend --to-pod production/api-server
# JSON output for pipeline integration
hubble observe --namespace production -o json | jq '.flow.source.identity'
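The JSON output lends itself to simple aggregation scripts. A minimal Python sketch that counts flows per source namespace and verdict — it assumes the top-level `flow` object that the jq example targets, and `flow_summary.py` is a hypothetical filename:

```python
#!/usr/bin/env python3
# flow_summary.py - aggregate `hubble observe -o json` output (one JSON object per line)
# Usage: hubble observe --namespace production -o json | python3 flow_summary.py
import json
import sys
from collections import Counter

def summarize(lines):
    """Count flows per (source namespace, verdict)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        ns = flow.get("source", {}).get("namespace", "unknown")
        counts[(ns, flow.get("verdict", "UNKNOWN"))] += 1
    return counts

if __name__ == "__main__" and not sys.stdin.isatty():
    for (ns, verdict), n in summarize(sys.stdin).most_common():
        print(f"{ns:<20} {verdict:<10} {n}")
```

A sorted (namespace, verdict) table like this is a quick way to spot which workloads are producing DROPPED flows before digging into individual events.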
Hubble UI
# Hubble UI port forwarding
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
# Access http://localhost:12000 in your browser
# View service map + real-time traffic flows
Collecting Hubble Metrics with Prometheus
# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: hubble
  endpoints:
    - port: hubble-metrics
      interval: 15s
# PromQL queries for Grafana dashboards
# HTTP request rate (by service)
sum(rate(hubble_http_requests_total[5m])) by (destination_workload, method, status)
# DNS error rate
sum(rate(hubble_dns_responses_total{rcode!="No Error"}[5m])) by (source_workload, qtypes)
# Dropped packets (network policy)
sum(rate(hubble_drop_total[5m])) by (reason, source_workload)
# TCP connection time
histogram_quantile(0.99, sum(rate(hubble_tcp_connect_duration_seconds_bucket[5m])) by (le, destination_workload))
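As a sanity check on what these queries report: `rate()` is just the per-second increase of a monotonic counter over the lookback window. A rough Python sketch of that idea (real PromQL additionally extrapolates to the window edges and handles counter resets, both skipped here):

```python
def rate(samples):
    """Per-second increase of a monotonic counter.

    `samples` is a list of (timestamp_s, value) pairs inside the
    lookback window, oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# A counter like hubble_http_requests_total scraped every 15s:
# 120 requests over a 60s window -> 2 req/s
samples = [(0, 1000), (15, 1030), (30, 1060), (45, 1090), (60, 1120)]
print(rate(samples))  # -> 2.0
```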
Tetragon: Security Observability
Tetragon is an eBPF-based security observability and runtime enforcement tool. It detects process execution, file access, network connections, and other kernel-level events in real time.
Installing Tetragon
# Install with Helm
helm repo add cilium https://helm.cilium.io
helm install tetragon cilium/tetragon -n kube-system
# Install Tetragon CLI
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --remote-name-all \
https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz
sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz
Observing Process Events
# All process execution events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact
# Example output:
# 🚀 process default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;'
# 🚀 process default/nginx-7b4f... /usr/sbin/nginx -g daemon off;
# 💥 exit default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;' 0
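Without `-o compact`, `tetra getevents` emits one JSON object per line, which is easy to post-process. A small Python sketch that pulls out exec events — field names like `process_exec` and `process.binary` follow Tetragon's JSON schema, but verify them against your Tetragon version:

```python
# exec_filter.py (hypothetical filename) - list executed binaries from
# `tetra getevents` JSON-lines output
import json

def exec_events(lines):
    """Yield ("namespace/pod", binary) for each process_exec event."""
    for line in lines:
        event = json.loads(line).get("process_exec")
        if not event:
            continue
        proc = event.get("process", {})
        pod = proc.get("pod", {})
        yield (f'{pod.get("namespace", "?")}/{pod.get("name", "?")}',
               proc.get("binary", "?"))

sample = ('{"process_exec": {"process": {"binary": "/usr/sbin/nginx", '
          '"pod": {"namespace": "default", "name": "nginx-7b4f"}}}}')
for where, binary in exec_events([sample]):
    print(f"exec {where} {binary}")  # -> exec default/nginx-7b4f /usr/sbin/nginx
```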
TracingPolicy: Custom Security Rules
# Detect sensitive file access
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sensitive-file-access
spec:
  kprobes:
    - call: 'fd_install'
      syscall: false
      args:
        - index: 0
          type: int
        - index: 1
          type: 'file'
      selectors:
        - matchArgs:
            - index: 1
              operator: 'Prefix'
              values:
                - '/etc/shadow'
                - '/etc/passwd'
                - '/root/.ssh'
                - '/var/run/secrets/kubernetes.io'
# Detect external network connections
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: external-connections
spec:
  kprobes:
    - call: 'tcp_connect'
      syscall: false
      args:
        - index: 0
          type: 'sock'
      selectors:
        - matchArgs:
            - index: 0
              operator: 'NotDAddr'
              values:
                - '10.0.0.0/8'
                - '172.16.0.0/12'
                - '192.168.0.0/16'
# Apply TracingPolicy
kubectl apply -f sensitive-file-access.yaml
# Observe events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact --pods nginx
# Test: attempt to access /etc/shadow from nginx Pod
kubectl exec -it nginx -- cat /etc/shadow
# Tetragon event output:
# 📬 open default/nginx /etc/shadow
Custom eBPF Observability Tools
Quick Analysis with bpftrace
# System call frequency (top 10)
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }
interval:s:5 { print(@, 10); clear(@); }'
# Track TCP retransmissions
bpftrace -e 'kprobe:tcp_retransmit_skb {
@retrans[comm, pid] = count();
}'
# Track file opens
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
printf("%s PID:%d -> %s\n", comm, pid, str(args->filename));
}'
# High-latency disk I/O (over 10ms)
bpftrace -e 'kprobe:blk_account_io_done {
$duration = nsecs - @start[arg0];
if ($duration > 10000000) {
printf("slow IO: %d ms, comm=%s\n", $duration/1000000, comm);
}
}'
Custom Tools with Python + bcc
#!/usr/bin/env python3
# tcp_latency.py - Track TCP connection latency
import socket
import struct

from bcc import BPF

bpf_text = """
#include <net/sock.h>
#include <net/tcp_states.h>
#include <bcc/proto.h>

struct event_t {
    u32 pid;
    u32 daddr;
    u16 dport;
    u64 delta_us;
    char comm[16];
};

BPF_HASH(start, struct sock *);
BPF_PERF_OUTPUT(events);

// Record timestamp when the connection attempt starts
int trace_connect(struct pt_regs *ctx, struct sock *sk) {
    u64 ts = bpf_ktime_get_ns();
    start.update(&sk, &ts);
    return 0;
}

// Fires on state changes; SYN_SENT -> established completes the handshake
int trace_tcp_rcv_state_process(struct pt_regs *ctx, struct sock *sk) {
    if (sk->__sk_common.skc_state != TCP_SYN_SENT)
        return 0;
    u64 *tsp = start.lookup(&sk);
    if (!tsp)
        return 0;
    struct event_t event = {};
    event.pid = bpf_get_current_pid_tgid() >> 32;
    event.delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    event.daddr = sk->__sk_common.skc_daddr;
    event.dport = sk->__sk_common.skc_dport;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    events.perf_submit(ctx, &event, sizeof(event));
    start.delete(&sk);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")
b.attach_kprobe(event="tcp_rcv_state_process", fn_name="trace_tcp_rcv_state_process")

def print_event(cpu, data, size):
    event = b["events"].event(data)
    print(f"PID={event.pid} COMM={event.comm.decode()} "
          f"DEST={socket.inet_ntoa(struct.pack('I', event.daddr))}:{socket.ntohs(event.dport)} "
          f"LATENCY={event.delta_us}us")

b["events"].open_perf_buffer(print_event)
while True:
    b.perf_buffer_poll()
eBPF Tool Comparison
| Tool | Use Case | Strengths | Limitations |
|---|---|---|---|
| Hubble | Network observability | Cilium integration, UI | Requires Cilium |
| Tetragon | Security observability | Runtime security | Learning curve |
| Pixie | Full observability | Auto-instrumentation, SQL queries | Resource usage |
| bpftrace | Debugging | Quick ad-hoc analysis | Not for production |
| Grafana Beyla | APM | Auto HTTP/gRPC instrumentation | L7 only |
Production Adoption Guide
Phased Adoption Strategy
Phase 1: Cilium + Hubble (Network Visibility)
├── Automatic service map generation
├── DNS error detection
└── Network policy violation monitoring
Phase 2: Tetragon (Security Observability)
├── Process execution auditing
├── Sensitive file access detection
└── External connection monitoring
Phase 3: Custom eBPF (Deep Analysis)
├── Performance profiling
├── Latency analysis
└── Custom metrics
Resource Overhead
Cilium Agent: ~200MB RAM, ~0.1 CPU per node
Hubble Relay: ~128MB RAM, ~0.05 CPU
Tetragon: ~256MB RAM, ~0.15 CPU per node
Total overhead: ~1% CPU, ~600MB RAM per node
→ 50-70% savings compared to sidecar proxies (Envoy)
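The totals above are easy to rederive. A quick sketch — the eBPF figures come from the list, while the per-sidecar Envoy footprint (~0.05 CPU / 50MB each, ~25 pods per node) is an illustrative assumption, not a measurement:

```python
# Per-node eBPF stack (from the list above): (CPU cores, RAM MB)
ebpf = {"cilium-agent": (0.10, 200), "hubble-relay": (0.05, 128),
        "tetragon": (0.15, 256)}
cpu = sum(c for c, _ in ebpf.values())
ram = sum(m for _, m in ebpf.values())
print(f"eBPF stack: {cpu:.2f} CPU, {ram} MB per node")

# Sidecar cost scales with pod count; the eBPF cost is flat per node
pods, sc_cpu, sc_ram = 25, 0.05, 50  # assumed Envoy sidecar footprint
print(f"Sidecars:   {pods * sc_cpu:.2f} CPU, {pods * sc_ram} MB per node")
print(f"RAM saved:  {1 - ram / (pods * sc_ram):.0%}")
```

The key structural point is the last comment: sidecar overhead grows linearly with pod density, so the savings improve as nodes get busier.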
Review Quiz (6 Questions)
Q1. What advantage does eBPF have over traditional observability approaches?
It operates at the kernel level, enabling observation of network, process, and file access events without application modifications, with significantly lower overhead compared to sidecar proxies.
Q2. What command observes DNS errors with Hubble?
hubble observe --protocol dns or check the DNS metric hubble_dns_responses_total{rcode!="No Error"}
Q3. What types of events can Tetragon's TracingPolicy detect?
Kernel-level events such as process execution, file access (open), network connections (tcp_connect), and system calls.
Q4. What is the difference between bpftrace and bcc?
bpftrace uses a concise AWK-style syntax suitable for quick ad-hoc analysis, while bcc allows developing complex custom tools with Python/C.
Q5. What is the approximate per-node resource overhead of Cilium + Hubble?
Approximately 0.1 CPU cores and 200MB RAM (Hubble Relay is separate).
Q6. What is the recommended first step when adopting eBPF-based observability?
Establish network visibility with Cilium + Hubble. Start by monitoring service maps, DNS errors, and network policy violations.