Authors

- Youngju Kim (@fjvbn20031)
Cilium Hubble Observability Platform Internal Analysis
Overview
Hubble is a network observability platform built into Cilium that collects and analyzes all network events from the eBPF datapath in real time. It provides deep network visibility without installing additional agents on the infrastructure.
1. Hubble Architecture
1.1 Component Layout
+-----------------------------------------------------------+
| Hubble Architecture |
+-----------------------------------------------------------+
| |
| Node 1 Node 2 Node 3 |
| +-----------+ +-----------+ +-----------+ |
| | Cilium | | Cilium | | Cilium | |
| | Agent | | Agent | | Agent | |
| | +------+ | | +------+ | | +------+ | |
| | |Hubble| | | |Hubble| | | |Hubble| | |
| | |Server| | | |Server| | | |Server| | |
| | +------+ | | +------+ | | +------+ | |
| +-----------+ +-----------+ +-----------+ |
| | | | |
| +--------+--------+--------+--------+ |
| | | |
| +-----v-----+ +-----v-----+ |
| | Hubble | | Hubble UI | |
| | Relay | | | |
| +-----------+ +-----------+ |
| | |
| +-----v-----+ |
| | Prometheus| |
| | Grafana | |
| +-----------+ |
+-----------------------------------------------------------+
1.2 Component Roles
| Component | Role | Deployment |
|---|---|---|
| Hubble Server | Collect flows on each node | Embedded in Cilium Agent |
| Hubble Relay | Aggregate flows cluster-wide | Deployment (1-2 replicas) |
| Hubble UI | Topology visualization | Deployment |
| Hubble CLI | Command-line flow queries | Local binary |
2. Hubble Server: Flow Collection Engine
2.1 Flow Collection Mechanism
The Hubble Server runs inside the Cilium Agent, collecting events from the eBPF datapath:
eBPF datapath events:
[Packet processing] -> [Policy verdict] -> [Conntrack events]
| | |
v v v
[Perf Event Ring Buffer]
|
v
[Hubble Server: Event parser]
|
v
[Flow ring buffer (in-memory)]
|
v
[gRPC server: Stream flows to clients]
2.2 Ring Buffer
Hubble stores flows in a fixed-size ring buffer:
# Ring buffer capacity per node (must be of the form 2^n - 1)
# Default: 4095 flows
# Agent flag: --hubble-event-buffer-capacity=16383
# Check current ring buffer status
hubble status
# Nodes:
# node-1: Connected, Flows: 4095/4095 (100.00%), ...
# node-2: Connected, Flows: 3821/4095 (93.31%), ...
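The overwrite semantics of this fixed-size buffer can be sketched in a few lines of Python (a simplified model of the behavior, not Hubble's actual Go implementation): once the buffer is full, each new flow silently evicts the oldest one, which is why `hubble observe` on a busy node only ever sees recent history.

```python
from collections import deque

# Simplified model of Hubble's fixed-size flow ring buffer:
# when capacity is reached, the oldest flow is silently evicted.
class FlowRingBuffer:
    def __init__(self, capacity=4095):
        self.buffer = deque(maxlen=capacity)

    def add(self, flow):
        self.buffer.append(flow)  # deque drops the oldest entry when full

    def last(self, n):
        # Return the n most recent flows, oldest first
        # (roughly what `hubble observe --last n` shows)
        return list(self.buffer)[-n:]

buf = FlowRingBuffer(capacity=3)
for i in range(5):
    buf.add(f"flow-{i}")

print(buf.last(3))  # flow-0 and flow-1 were overwritten
# → ['flow-2', 'flow-3', 'flow-4']
```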
2.3 Flow Data Structure
Key fields in a flow record:
- time: Event timestamp
- source:
  - identity: Source identity
  - namespace: Source namespace
  - labels: Source labels
  - pod_name: Source Pod name
- destination:
  - identity: Destination identity
  - namespace: Destination namespace
  - labels: Destination labels
  - pod_name: Destination Pod name
- IP:
  - source: Source IP
  - destination: Destination IP
- l4:
  - TCP/UDP:
    - source_port: Source port
    - destination_port: Destination port
- l7:
  - type: HTTP / DNS / Kafka
  - http:
    - method: GET/POST/...
    - url: Request URL
    - code: Response code
- verdict: FORWARDED / DROPPED / AUDIT / REDIRECTED
- drop_reason: Drop reason (only set when verdict is DROPPED)
- type: L3_L4 / L7 / TRACE / DROP
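Represented as a plain dictionary, a flow record with these fields looks roughly as follows (all values are fabricated; field names follow the list above), and verdict-based checks become trivial:

```python
# A flow record using the key fields listed above (values are made up).
flow = {
    "time": "2026-03-20T10:15:34.012Z",
    "verdict": "DROPPED",
    "drop_reason": "POLICY_DENIED",
    "source": {"namespace": "default", "pod_name": "untrusted-app"},
    "destination": {"namespace": "default", "pod_name": "backend-xyz"},
    "IP": {"source": "10.244.3.5", "destination": "10.244.2.10"},
    "l4": {"TCP": {"source_port": 45678, "destination_port": 8080}},
}

def is_policy_drop(flow):
    # drop_reason is only meaningful when the verdict is DROPPED
    return flow["verdict"] == "DROPPED" and flow.get("drop_reason") == "POLICY_DENIED"

print(is_policy_drop(flow))  # True
```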
3. Hubble Relay: Cluster-Wide Observability
3.1 Relay Operation
Hubble Relay connects to all Hubble Servers in the cluster to aggregate flows:
Relay connection management:
1. Discover all Cilium Agents (Hubble Servers) in the cluster
2. Connect to each node's Hubble gRPC service
3. Distribute flow requests to all nodes
4. Aggregate responses and deliver to clients
5. Automatic reconnection on node add/remove
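Step 4 above, merging per-node responses into one ordered stream, amounts to a timestamp merge of already-ordered sources. A simplified model (Relay actually merges gRPC streams incrementally, not whole lists):

```python
import heapq

# Simplified model of Relay's aggregation step: each node returns its
# flows in time order; Relay merges them into one time-ordered stream.
node1_flows = [(1, "node-1: flow-a"), (4, "node-1: flow-b")]
node2_flows = [(2, "node-2: flow-c"), (3, "node-2: flow-d")]

# heapq.merge lazily merges sorted iterables by their first element (timestamp)
merged = list(heapq.merge(node1_flows, node2_flows))
print([name for _, name in merged])
# → ['node-1: flow-a', 'node-2: flow-c', 'node-2: flow-d', 'node-1: flow-b']
```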
3.2 Relay Deployment
# Simplified manifest (selector, labels, and volume mounts omitted)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hubble-relay
  namespace: kube-system
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: hubble-relay
          image: quay.io/cilium/hubble-relay:v1.16.0
          ports:
            - containerPort: 4245
              name: grpc
          args:
            - serve
            - --peer-service=unix:///var/run/cilium/hubble.sock
            - --listen-address=:4245
3.3 Relay Port Forwarding
# Local port forwarding to Hubble Relay
cilium hubble port-forward &
# Then access via hubble CLI
hubble observe
hubble status
4. Hubble UI: Topology Visualization
4.1 UI Features
Hubble UI is a web-based interface providing:
Key features:
1. Service Map
- Visualize service-to-service communication as topology graph
- Real-time traffic flow display
- Color-coded normal/abnormal connections
2. Flow Table
- Real-time network flow listing
- Filtering (namespace, labels, verdict)
- Detailed information for each flow
3. Policy Visualization
- Display allow/deny status from policies
- Highlight dropped traffic
5. Flow Types: L3/L4/L7
5.1 L3/L4 Flows
Collected by default for all network packets:
# Observe L3/L4 flows (trace events)
hubble observe --type trace
# Example output:
# Mar 20 10:15:32.123 default/frontend-abc -> default/backend-xyz
# TCP SYN 10.244.1.5:34567 -> 10.244.2.10:8080
# verdict: FORWARDED
# Track TCP connections
hubble observe --protocol tcp --to-port 8080
# UDP traffic
hubble observe --protocol udp --to-port 53
5.2 L7 Flows
Collected for traffic with L7 policies applied:
# Observe HTTP flows
hubble observe --type l7 --protocol http
# Example output:
# Mar 20 10:15:32.456 default/frontend-abc -> default/api-server-xyz
# HTTP GET /api/v1/users
# Response: 200 OK (23ms)
# DNS flows
hubble observe --type l7 --protocol dns
# Kafka flows
hubble observe --type l7 --protocol kafka
5.3 Drop Flows
Packets dropped due to policies or errors:
# Observe only dropped flows
hubble observe --verdict DROPPED
# Example output:
# Mar 20 10:15:34.012 default/untrusted-app -> default/backend-xyz
# TCP 10.244.3.5:45678 -> 10.244.2.10:8080
# verdict: DROPPED (Policy denied)
# Filter by specific drop reason
hubble observe --verdict DROPPED --drop-reason-desc POLICY_DENIED
6. Hubble CLI Usage
6.1 Basic Commands
# Observe all flows (real-time streaming)
hubble observe -f
# Last 100 flows
hubble observe --last 100
# Specific time range
hubble observe --since 5m
hubble observe --since "2026-03-20T10:00:00Z" --until "2026-03-20T10:30:00Z"
6.2 Filtering
# Namespace filtering
hubble observe --namespace production
# Pod name filtering
hubble observe --from-pod production/frontend-abc
hubble observe --to-pod production/backend-xyz
# Label filtering
hubble observe --from-label "app=frontend"
hubble observe --to-label "app=backend"
# IP filtering
hubble observe --from-ip 10.244.1.5
hubble observe --to-ip 10.244.2.10
# Port filtering
hubble observe --to-port 8080
# Verdict filtering
hubble observe --verdict FORWARDED
hubble observe --verdict DROPPED
# Combined filters
hubble observe \
--namespace production \
--from-label "app=frontend" \
--to-label "app=backend" \
--to-port 8080 \
--verdict FORWARDED
6.3 Output Formats
# Default output (human-readable)
hubble observe
# JSON output
hubble observe -o json
# Compact output
hubble observe -o compact
# Dictionary output
hubble observe -o dict
# jsonpb (Protocol Buffers JSON format)
hubble observe -o jsonpb
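The JSON output emits one object per line, which makes it easy to post-process, e.g. `hubble observe -o json | python summarize.py`. A sketch that tallies verdicts from such a stream (the sample lines below are fabricated records in the shape of that output, with the flow data under a `flow` key; real records carry many more fields):

```python
import json
from collections import Counter

# Fabricated lines in the shape of `hubble observe -o json` output:
# one JSON object per line, flow data under the "flow" key.
lines = [
    '{"flow": {"verdict": "FORWARDED", "source": {"namespace": "default"}}}',
    '{"flow": {"verdict": "DROPPED", "source": {"namespace": "default"}}}',
]

# Tally verdicts across the stream, one json.loads per line
verdicts = Counter(json.loads(line)["flow"]["verdict"] for line in lines)
print(dict(verdicts))  # {'FORWARDED': 1, 'DROPPED': 1}
```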
6.4 Status Check
# Overall Hubble status
hubble status
# Example output:
# Healthcheck (via localhost:4245): Ok
# Current/Max Flows: 16383/16383 (100.00%)
# Flows/s: 245.32
# Connected Nodes: 3/3
# Per-node status
hubble list nodes
7. Hubble Prometheus Metrics
7.1 Key Metrics
| Metric Name | Description |
|---|---|
| hubble_flows_processed_total | Total flows processed |
| hubble_drop_total | Dropped packets by reason |
| hubble_tcp_flags_total | Packets by TCP flag |
| hubble_dns_queries_total | DNS query count |
| hubble_dns_responses_total | DNS response count |
| hubble_http_requests_total | HTTP requests by method/path |
| hubble_http_responses_total | HTTP responses by status code |
| hubble_http_request_duration_seconds | HTTP request latency |
| hubble_icmp_total | ICMP packet count |
7.2 Alert Rule Examples
groups:
  - name: hubble-alerts
    rules:
      - alert: HighDropRate
        expr: rate(hubble_drop_total[5m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'High packet drop rate detected'
      - alert: HTTPErrorRate
        # sum() on both sides so the label sets match across the division
        expr: |
          sum(rate(hubble_http_responses_total{status=~"5.."}[5m]))
          / sum(rate(hubble_http_responses_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'HTTP 5xx error rate exceeds 5%'
      - alert: DNSLatencyHigh
        expr: |
          histogram_quantile(0.99, rate(hubble_dns_response_time_seconds_bucket[5m]))
          > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'DNS response latency p99 exceeds 500ms'
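The HTTPErrorRate expression divides the 5xx response rate by the total response rate. The same arithmetic in plain Python, using hypothetical counter increases over a 5-minute window:

```python
# Hypothetical counter increases over a 5-minute (300 s) window.
window = 300
http_5xx_increase = 90      # hubble_http_responses_total{status=~"5.."}
http_total_increase = 1200  # hubble_http_responses_total (all statuses)

# rate() is the per-second increase; the ratio cancels the window length
rate_5xx = http_5xx_increase / window
rate_total = http_total_increase / window
error_ratio = rate_5xx / rate_total

print(round(error_ratio, 3), error_ratio > 0.05)  # 0.075 True
```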
8. Hubble gRPC API
8.1 API Overview
Hubble provides programmatic access to flow data through its gRPC API:
// Hubble Observer API (simplified)
service Observer {
  // Stream flows matching the request filters
  rpc GetFlows(GetFlowsRequest) returns (stream GetFlowsResponse);

  // Hubble server status
  rpc ServerStatus(ServerStatusRequest) returns (ServerStatusResponse);

  // List known Hubble nodes
  rpc GetNodes(GetNodesRequest) returns (GetNodesResponse);
}
8.2 API Usage Example
# Direct API call with gRPCurl
grpcurl -plaintext localhost:4245 observer.Observer/ServerStatus
# Stream flows
grpcurl -plaintext -d '{}' localhost:4245 observer.Observer/GetFlows
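The `-d` payload for GetFlows is a JSON-encoded GetFlowsRequest. A sketch that builds a filtered request body (field names follow the Hubble flow proto, but treat the exact shape as an assumption and verify it against your Hubble version):

```python
import json

# Build a GetFlowsRequest payload for `grpcurl -d "$PAYLOAD" ...`.
# "whitelist" entries are FlowFilters: a flow matches if any filter matches.
request = {
    "number": 20,  # return at most 20 flows
    "whitelist": [
        {"source_pod": ["production/frontend-abc"], "verdict": ["DROPPED"]},
    ],
}

payload = json.dumps(request)
print(payload)
```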
8.3 Custom Integration Use Cases
Hubble gRPC API use cases:
1. Custom Dashboards
- Collect specific business metrics
- Custom visualizations
2. Automated Security Analysis
- Detect abnormal traffic patterns
- Automatic policy violation alerts
3. Audit Logging
- Record network activity for compliance
- Export to external systems for long-term storage
4. Service Mesh Integration
- Monitor service-to-service latency
- Error tracking and analysis
9. Performance Impact and Tuning
9.1 Performance Overhead
Overhead from enabling Hubble:
CPU:
- Basic L3/L4 observation: Minimal (~1-2%)
- L7 observation (HTTP, etc.): Depends on Envoy proxy overhead
- High-traffic environments: Flow parsing and ring buffer management
Memory:
- Proportional to ring buffer size
- Default 4095 flows x ~500 bytes per flow = ~2MB
- 16383 flows: ~8MB
Network:
- gRPC streaming traffic to Relay
- Proportional to number of observing clients
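The memory figures above follow directly from buffer capacity times per-flow size. A quick check (the ~500-byte per-flow figure is the estimate used in this document, not a measured constant):

```python
# Rough per-node memory estimate: ring buffer capacity x average flow size.
FLOW_SIZE_BYTES = 500  # approximate figure from the text above, not measured

def ring_buffer_memory_mb(capacity, flow_size=FLOW_SIZE_BYTES):
    return capacity * flow_size / (1024 * 1024)

print(round(ring_buffer_memory_mb(4095), 1))   # default capacity → ~2.0 MB
print(round(ring_buffer_memory_mb(16383), 1))  # enlarged buffer → ~7.8 MB
```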
9.2 Tuning Parameters
# Ring buffer capacity (must be of the form 2^n - 1)
--hubble-event-buffer-capacity=16383
# Event queue size (0 = derived automatically)
--hubble-event-queue-size=0
# Enable specific Hubble metrics
--hubble-metrics=dns,drop,tcp,flow
# Restrict which monitor event types Hubble consumes
--hubble-monitor-events="drop,trace,l7"
9.3 Large-Scale Optimization
Considerations for large clusters:
1. Relay resource limits
- Set appropriate CPU/memory requests/limits
- Relay load increases with node count
2. Metric cardinality management
- Minimize label contexts to control metric count
- Disable unnecessary metrics
3. Flow retention strategy
- Balance ring buffer size with traffic volume
- Export to external systems for long-term retention
4. Network bandwidth
- Consider gRPC traffic between Relay and Agents
- Limit impact of large observation queries
10. Hubble Usage Scenarios
10.1 Troubleshooting
# Diagnose Pod-to-Pod connectivity issues
hubble observe --from-pod default/app-a --to-pod default/app-b
# Analyze dropped traffic causes
hubble observe --verdict DROPPED --from-pod default/app-a
# Check DNS resolution issues
hubble observe --type l7 --protocol dns --from-pod default/app-a
# Check traffic to specific service
hubble observe --to-label "app=database" --to-port 5432
10.2 Security Audit
# Monitor policy violation traffic
hubble observe --verdict DROPPED --drop-reason POLICY_DENIED
# Track egress traffic to external
hubble observe --to-identity reserved:world
# All ingress traffic to sensitive namespace
hubble observe --namespace sensitive-ns --traffic-direction ingress
Summary
Cilium Hubble provides network observability through these core capabilities:
- Zero Instrumentation: Automatic flow collection from eBPF without application modifications
- Multi-Layer Visibility: Observation across all layers from L3/L4 network to L7 application
- Real-Time Streaming: Real-time flow streaming via ring buffer and gRPC
- Cluster-Wide Observation: Cluster-wide data aggregation through Hubble Relay
- Visualization: Service topology maps and flow visualization through Hubble UI
- Metrics Integration: Time-series metrics and alerting via Prometheus/Grafana
- Programmatic Access: Custom tool integration through gRPC API