- What is eBPF?
- Cilium + Hubble: Network Observability
- Tetragon: Security Observability
- Custom eBPF Observability Tools
- eBPF Tool Comparison
- Production Adoption Guide
What is eBPF?
eBPF (extended Berkeley Packet Filter) is a technology that runs sandboxed programs inside the Linux kernel. It adds networking, security, and observability capabilities without changing kernel source code or loading kernel modules.
Traditional Observability vs eBPF
| Approach | Overhead | Kernel Modification | Visibility |
|---|---|---|---|
| Logs (stdout) | Medium | Not required | App level only |
| Prometheus Metrics | Low | Not required | Only what app exposes |
| Sidecar Proxy (Envoy) | High | Not required | Up to L7 |
| eBPF | Very low | Not required | From kernel to L7 |
The key advantage of eBPF is kernel-level visibility without per-application instrumentation or sidecar proxies.
Cilium + Hubble: Network Observability
Installing Cilium
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz
# Install Cilium (with Hubble)
cilium install --version 1.16.5 \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"
# Check status
cilium status
cilium hubble port-forward &
Observing Network Flows with Hubble CLI
# Observe all network flows
hubble observe
# Flows from a specific namespace only
hubble observe --namespace production
# Observe DNS queries
hubble observe --protocol dns
# Observe HTTP requests (L7)
hubble observe --protocol http
# Observe dropped packets (network policy violations, etc.)
hubble observe --verdict DROPPED
# Communication between specific Pods
hubble observe --from-pod production/frontend --to-pod production/api-server
# JSON output for pipeline integration
hubble observe --namespace production -o json | jq '.flow.source.identity'
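The JSON output lends itself to simple aggregation scripts. A minimal Python sketch that counts flows per source namespace and verdict — it assumes the top-level `flow` object that the jq example targets, and `flow_summary.py` is a hypothetical filename:

```python
#!/usr/bin/env python3
# flow_summary.py - aggregate `hubble observe -o json` output (one JSON object per line)
# Usage: hubble observe --namespace production -o json | python3 flow_summary.py
import json
import sys
from collections import Counter

def summarize(lines):
    """Count flows per (source namespace, verdict)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        ns = flow.get("source", {}).get("namespace", "unknown")
        counts[(ns, flow.get("verdict", "UNKNOWN"))] += 1
    return counts

if __name__ == "__main__" and not sys.stdin.isatty():
    for (ns, verdict), n in summarize(sys.stdin).most_common():
        print(f"{ns:<20} {verdict:<10} {n}")
```

A sorted (namespace, verdict) table like this is a quick way to spot which workloads are producing DROPPED flows before digging into individual events.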
Hubble UI
# Hubble UI port forwarding
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
# Access http://localhost:12000 in your browser
# View service map + real-time traffic flows
Collecting Hubble Metrics with Prometheus
# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hubble-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: hubble
  endpoints:
    - port: hubble-metrics
      interval: 15s
# PromQL queries for Grafana dashboards
# HTTP request rate (by service)
sum(rate(hubble_http_requests_total[5m])) by (destination_workload, method, status)
# DNS error rate
sum(rate(hubble_dns_responses_total{rcode!="No Error"}[5m])) by (source_workload, qtypes)
# Dropped packets (network policy)
sum(rate(hubble_drop_total[5m])) by (reason, source_workload)
# TCP connection time
histogram_quantile(0.99, sum(rate(hubble_tcp_connect_duration_seconds_bucket[5m])) by (le, destination_workload))
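As a sanity check on what these queries report: `rate()` is just the per-second increase of a monotonic counter over the lookback window. A rough Python sketch of that idea (real PromQL additionally extrapolates to the window edges and handles counter resets, both skipped here):

```python
def rate(samples):
    """Per-second increase of a monotonic counter.

    `samples` is a list of (timestamp_s, value) pairs inside the
    lookback window, oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# A counter like hubble_http_requests_total scraped every 15s:
# 120 requests over a 60s window -> 2 req/s
samples = [(0, 1000), (15, 1030), (30, 1060), (45, 1090), (60, 1120)]
print(rate(samples))  # -> 2.0
```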
Tetragon: Security Observability
Tetragon is an eBPF-based security observability and runtime enforcement tool. It detects process execution, file access, network connections, and other kernel-level events in real time.
Installing Tetragon
# Install with Helm
helm repo add cilium https://helm.cilium.io
helm install tetragon cilium/tetragon -n kube-system
# Install Tetragon CLI
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --remote-name-all \
https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz
sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz
Observing Process Events
# All process execution events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact
# Example output:
# 🚀 process default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;'
# 🚀 process default/nginx-7b4f... /usr/sbin/nginx -g daemon off;
# 💥 exit default/nginx-7b4f... /bin/sh -c nginx -g 'daemon off;' 0
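Without `-o compact`, `tetra getevents` emits one JSON object per line, which is easy to post-process. A small Python sketch that pulls out exec events — field names like `process_exec` and `process.binary` follow Tetragon's JSON schema, but verify them against your Tetragon version:

```python
# exec_filter.py (hypothetical filename) - list executed binaries from
# `tetra getevents` JSON-lines output
import json

def exec_events(lines):
    """Yield ("namespace/pod", binary) for each process_exec event."""
    for line in lines:
        event = json.loads(line).get("process_exec")
        if not event:
            continue
        proc = event.get("process", {})
        pod = proc.get("pod", {})
        yield (f'{pod.get("namespace", "?")}/{pod.get("name", "?")}',
               proc.get("binary", "?"))

sample = ('{"process_exec": {"process": {"binary": "/usr/sbin/nginx", '
          '"pod": {"namespace": "default", "name": "nginx-7b4f"}}}}')
for where, binary in exec_events([sample]):
    print(f"exec {where} {binary}")  # -> exec default/nginx-7b4f /usr/sbin/nginx
```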
TracingPolicy: Custom Security Rules
# Detect sensitive file access
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sensitive-file-access
spec:
  kprobes:
    - call: 'fd_install'
      syscall: false
      args:
        - index: 0
          type: int
        - index: 1
          type: 'file'
      selectors:
        - matchArgs:
            - index: 1
              operator: 'Prefix'
              values:
                - '/etc/shadow'
                - '/etc/passwd'
                - '/root/.ssh'
                - '/var/run/secrets/kubernetes.io'
# Detect external network connections
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: external-connections
spec:
  kprobes:
    - call: 'tcp_connect'
      syscall: false
      args:
        - index: 0
          type: 'sock'
      selectors:
        - matchArgs:
            - index: 0
              operator: 'NotDAddr'
              values:
                - '10.0.0.0/8'
                - '172.16.0.0/12'
                - '192.168.0.0/16'
# Apply TracingPolicy
kubectl apply -f sensitive-file-access.yaml
# Observe events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact --pods nginx
# Test: attempt to access /etc/shadow from nginx Pod
kubectl exec -it nginx -- cat /etc/shadow
# Tetragon event output:
# 📬 open default/nginx /etc/shadow
Custom eBPF Observability Tools
Quick Analysis with bpftrace
# System call frequency (top 10)
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }
interval:s:5 { print(@, 10); clear(@); }'
# Track TCP retransmissions
bpftrace -e 'kprobe:tcp_retransmit_skb {
@retrans[comm, pid] = count();
}'
# Track file opens
bpftrace -e 'tracepoint:syscalls:sys_enter_openat {
printf("%s PID:%d -> %s\n", comm, pid, str(args->filename));
}'
# High-latency disk I/O (over 10ms)
bpftrace -e 'kprobe:blk_account_io_done {
$duration = nsecs - @start[arg0];
if ($duration > 10000000) {
printf("slow IO: %d ms, comm=%s\n", $duration/1000000, comm);
}
}'
Custom Tools with Python + bcc
#!/usr/bin/env python3
# tcp_latency.py - Track TCP connection latency
import socket
import struct

from bcc import BPF

bpf_text = """
#include <net/sock.h>
#include <net/tcp_states.h>
#include <bcc/proto.h>

struct event_t {
    u32 pid;
    u32 daddr;
    u16 dport;
    u64 delta_us;
    char comm[16];
};

BPF_HASH(start, struct sock *);
BPF_PERF_OUTPUT(events);

// Record timestamp when the connection attempt starts
int trace_connect(struct pt_regs *ctx, struct sock *sk) {
    u64 ts = bpf_ktime_get_ns();
    start.update(&sk, &ts);
    return 0;
}

// Fires on state changes; SYN_SENT -> established completes the handshake
int trace_tcp_rcv_state_process(struct pt_regs *ctx, struct sock *sk) {
    if (sk->__sk_common.skc_state != TCP_SYN_SENT)
        return 0;
    u64 *tsp = start.lookup(&sk);
    if (!tsp)
        return 0;
    struct event_t event = {};
    event.pid = bpf_get_current_pid_tgid() >> 32;
    event.delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    event.daddr = sk->__sk_common.skc_daddr;
    event.dport = sk->__sk_common.skc_dport;
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    events.perf_submit(ctx, &event, sizeof(event));
    start.delete(&sk);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")
b.attach_kprobe(event="tcp_rcv_state_process", fn_name="trace_tcp_rcv_state_process")

def print_event(cpu, data, size):
    event = b["events"].event(data)
    print(f"PID={event.pid} COMM={event.comm.decode()} "
          f"DEST={socket.inet_ntoa(struct.pack('I', event.daddr))}:{socket.ntohs(event.dport)} "
          f"LATENCY={event.delta_us}us")

b["events"].open_perf_buffer(print_event)
while True:
    b.perf_buffer_poll()
eBPF Tool Comparison
| Tool | Use Case | Strengths | Limitations |
|---|---|---|---|
| Hubble | Network observability | Cilium integration, UI | Requires Cilium |
| Tetragon | Security observability | Runtime security | Learning curve |
| Pixie | Full observability | Auto-instrumentation, SQL queries | Resource usage |
| bpftrace | Debugging | Quick ad-hoc analysis | Not for production |
| Grafana Beyla | APM | Auto HTTP/gRPC instrumentation | L7 only |
Production Adoption Guide
Phased Adoption Strategy
Phase 1: Cilium + Hubble (Network Visibility)
├── Automatic service map generation
├── DNS error detection
└── Network policy violation monitoring
Phase 2: Tetragon (Security Observability)
├── Process execution auditing
├── Sensitive file access detection
└── External connection monitoring
Phase 3: Custom eBPF (Deep Analysis)
├── Performance profiling
├── Latency analysis
└── Custom metrics
Resource Overhead
Cilium Agent: ~200MB RAM, ~0.1 CPU per node
Hubble Relay: ~128MB RAM, ~0.05 CPU
Tetragon: ~256MB RAM, ~0.15 CPU per node
Total overhead: ~1% CPU, ~600MB RAM per node
→ 50-70% savings compared to sidecar proxies (Envoy)
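The totals above are easy to rederive. A quick sketch — the eBPF figures come from the list, while the per-sidecar Envoy footprint (~0.05 CPU / 50MB each, ~25 pods per node) is an illustrative assumption, not a measurement:

```python
# Per-node eBPF stack (from the list above): (CPU cores, RAM MB)
ebpf = {"cilium-agent": (0.10, 200), "hubble-relay": (0.05, 128),
        "tetragon": (0.15, 256)}
cpu = sum(c for c, _ in ebpf.values())
ram = sum(m for _, m in ebpf.values())
print(f"eBPF stack: {cpu:.2f} CPU, {ram} MB per node")

# Sidecar cost scales with pod count; the eBPF cost is flat per node
pods, sc_cpu, sc_ram = 25, 0.05, 50  # assumed Envoy sidecar footprint
print(f"Sidecars:   {pods * sc_cpu:.2f} CPU, {pods * sc_ram} MB per node")
print(f"RAM saved:  {1 - ram / (pods * sc_ram):.0%}")
```

The key structural point is the last comment: sidecar overhead grows linearly with pod density, so the savings improve as nodes get busier.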
Review Quiz (6 Questions)
Q1. What advantage does eBPF have over traditional observability approaches?
It operates at the kernel level, enabling observation of network, process, and file access events without application modifications, with significantly lower overhead compared to sidecar proxies.
Q2. What command observes DNS errors with Hubble?
hubble observe --protocol dns or check the DNS metric hubble_dns_responses_total{rcode!="No Error"}
Q3. What types of events can Tetragon's TracingPolicy detect?
Kernel-level events such as process execution, file access (open), network connections (tcp_connect), and system calls.
Q4. What is the difference between bpftrace and bcc?
bpftrace uses a concise AWK-style syntax suitable for quick ad-hoc analysis, while bcc allows developing complex custom tools with Python/C.
Q5. What is the approximate per-node resource overhead of Cilium + Hubble?
Approximately 0.1 CPU cores and 200MB RAM (Hubble Relay is separate).
Q6. What is the recommended first step when adopting eBPF-based observability?
Establish network visibility with Cilium + Hubble. Start by monitoring service maps, DNS errors, and network policy violations.