Cilium Hubble Observability Platform Internal Analysis

Overview

Hubble is a network observability platform built into Cilium that collects and analyzes all network events from the eBPF datapath in real time. It provides deep network visibility without installing additional agents on the infrastructure.

1. Hubble Architecture

1.1 Component Layout

+-----------------------------------------------------------+
|                   Hubble Architecture                      |
+-----------------------------------------------------------+
|                                                           |
|  Node 1             Node 2             Node 3             |
|  +-----------+     +-----------+     +-----------+        |
|  | Cilium    |     | Cilium    |     | Cilium    |        |
|  | Agent     |     | Agent     |     | Agent     |        |
|  |  +------+ |     |  +------+ |     |  +------+ |        |
|  |  |Hubble| |     |  |Hubble| |     |  |Hubble| |        |
|  |  |Server| |     |  |Server| |     |  |Server| |        |
|  |  +------+ |     |  +------+ |     |  +------+ |        |
|  +-----------+     +-----------+     +-----------+        |
|       |                 |                 |               |
|       +--------+--------+--------+--------+               |
|                |                 |                         |
|          +-----v-----+    +-----v-----+                   |
|          |  Hubble   |    | Hubble UI |                   |
|          |  Relay    |    |           |                   |
|          +-----------+    +-----------+                   |
|                |                                          |
|          +-----v-----+                                    |
|          | Prometheus|                                    |
|          | Grafana   |                                    |
|          +-----------+                                    |
+-----------------------------------------------------------+

1.2 Component Roles

| Component     | Role                         | Deployment                |
|---------------|------------------------------|---------------------------|
| Hubble Server | Collect flows on each node   | Embedded in Cilium Agent  |
| Hubble Relay  | Aggregate flows cluster-wide | Deployment (1-2 replicas) |
| Hubble UI     | Topology visualization       | Deployment                |
| Hubble CLI    | Command-line flow queries    | Local binary              |

2. Hubble Server: Flow Collection Engine

2.1 Flow Collection Mechanism

The Hubble Server runs inside the Cilium Agent, collecting events from the eBPF datapath:

eBPF datapath events:
  [Packet processing] -> [Policy verdict] -> [Conntrack events]
       |                      |                    |
       v                      v                    v
  [Perf Event Ring Buffer]
       |
       v
  [Hubble Server: Event parser]
       |
       v
  [Flow ring buffer (in-memory)]
       |
       v
  [gRPC server: Stream flows to clients]

2.2 Ring Buffer

Hubble stores flows in a fixed-size ring buffer:

# Ring buffer size configuration
# Default: 4095 flows (the capacity must be one less than a power of two)
# Config: --hubble-event-buffer-capacity=16383

# Check current ring buffer status
hubble status
# Nodes:
#   node-1: Connected, Flows: 4095/4095 (100.00%), ...
#   node-2: Connected, Flows: 3821/4095 (93.31%), ...

2.3 Flow Data Structure

Key fields in a flow record:

- time: Event timestamp
- source:
    identity: Source Identity
    namespace: Source namespace
    labels: Source labels
    pod_name: Source Pod name
- destination:
    identity: Destination Identity
    namespace: Destination namespace
    labels: Destination labels
    pod_name: Destination Pod name
- IP:
    source: Source IP
    destination: Destination IP
- l4:
    TCP/UDP:
      source_port: Source port
      destination_port: Destination port
- l7:
    type: HTTP/DNS/Kafka
    http:
      method: GET/POST/...
      url: Request URL
      code: Response code
- verdict: FORWARDED/DROPPED/AUDIT/REDIRECTED
- drop_reason: Drop reason (when verdict is DROPPED)
- Type: Flow type (L3_L4/L7/SOCK)

3. Hubble Relay: Cluster-Wide Observability

3.1 Relay Operation

Hubble Relay connects to all Hubble Servers in the cluster to aggregate flows:

Relay connection management:

1. Discover all Cilium Agents (Hubble Servers) in the cluster
2. Connect to each node's Hubble gRPC service
3. Distribute flow requests to all nodes
4. Aggregate responses and deliver to clients
5. Automatic reconnection on node add/remove

3.2 Relay Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hubble-relay
  namespace: kube-system
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: hubble-relay
          image: quay.io/cilium/hubble-relay:v1.16.0
          ports:
            - containerPort: 4245
              name: grpc
          args:
            - serve
            - --peer-service=hubble-peer.kube-system.svc.cluster.local:443
            - --listen-address=:4245

3.3 Relay Port Forwarding

# Local port forwarding to Hubble Relay
cilium hubble port-forward &

# Then access via hubble CLI
hubble observe
hubble status

4. Hubble UI: Topology Visualization

4.1 UI Features

Hubble UI is a web-based interface providing:

Key features:
1. Service Map
   - Visualize service-to-service communication as topology graph
   - Real-time traffic flow display
   - Color-coded normal/abnormal connections

2. Flow Table
   - Real-time network flow listing
   - Filtering (namespace, labels, verdict)
   - Detailed information for each flow

3. Policy Visualization
   - Display allow/deny status from policies
   - Highlight dropped traffic

5. Flow Types: L3/L4/L7

5.1 L3/L4 Flows

Collected by default for all network packets:

# Observe L3/L4 flows (trace events)
hubble observe --type trace

# Example output:
# Mar 20 10:15:32.123 default/frontend-abc -> default/backend-xyz
#   TCP SYN 10.244.1.5:34567 -> 10.244.2.10:8080
#   verdict: FORWARDED

# Track TCP connections
hubble observe --protocol tcp --to-port 8080

# UDP traffic
hubble observe --protocol udp --to-port 53

5.2 L7 Flows

Collected for traffic with L7 policies applied:

# Observe HTTP flows
hubble observe --type l7 --protocol http

# Example output:
# Mar 20 10:15:32.456 default/frontend-abc -> default/api-server-xyz
#   HTTP GET /api/v1/users
#   Response: 200 OK (23ms)

# DNS flows
hubble observe --type l7 --protocol dns

# Kafka flows
hubble observe --type l7 --protocol kafka

5.3 Drop Flows

Packets dropped due to policies or errors:

# Observe only dropped flows
hubble observe --verdict DROPPED

# Example output:
# Mar 20 10:15:34.012 default/untrusted-app -> default/backend-xyz
#   TCP 10.244.3.5:45678 -> 10.244.2.10:8080
#   verdict: DROPPED (Policy denied)

# Filter by specific drop reason
hubble observe --verdict DROPPED --drop-reason-desc POLICY_DENIED

6. Hubble CLI Usage

6.1 Basic Commands

# Observe all flows (real-time streaming)
hubble observe -f

# Last 100 flows
hubble observe --last 100

# Specific time range
hubble observe --since 5m
hubble observe --since "2026-03-20T10:00:00Z" --until "2026-03-20T10:30:00Z"

6.2 Filtering

# Namespace filtering
hubble observe --namespace production

# Pod name filtering
hubble observe --from-pod production/frontend-abc
hubble observe --to-pod production/backend-xyz

# Label filtering
hubble observe --from-label "app=frontend"
hubble observe --to-label "app=backend"

# IP filtering
hubble observe --from-ip 10.244.1.5
hubble observe --to-ip 10.244.2.10

# Port filtering
hubble observe --to-port 8080

# Verdict filtering
hubble observe --verdict FORWARDED
hubble observe --verdict DROPPED

# Combined filters
hubble observe \
  --namespace production \
  --from-label "app=frontend" \
  --to-label "app=backend" \
  --to-port 8080 \
  --verdict FORWARDED

6.3 Output Formats

# Default output (human-readable)
hubble observe

# JSON output
hubble observe -o json

# Compact output
hubble observe -o compact

# Dictionary output
hubble observe -o dict

# jsonpb (Protocol Buffers JSON format)
hubble observe -o jsonpb

6.4 Status Check

# Overall Hubble status
hubble status

# Example output:
# Healthcheck (via localhost:4245): Ok
# Current/Max Flows: 16383/16383 (100.00%)
# Flows/s: 245.32
# Connected Nodes: 3/3

# Per-node status
hubble list nodes

7. Hubble Prometheus Metrics

7.1 Key Metrics

| Metric Name                            | Description                   |
|----------------------------------------|-------------------------------|
| hubble_flows_processed_total           | Total flows processed         |
| hubble_drop_total                      | Dropped packets by reason     |
| hubble_tcp_flags_total                 | Packets by TCP flag           |
| hubble_dns_queries_total               | DNS query count               |
| hubble_dns_responses_total             | DNS response count            |
| hubble_http_requests_total             | HTTP requests by method/path  |
| hubble_http_responses_total            | HTTP responses by status code |
| hubble_http_request_duration_seconds   | HTTP request latency          |
| hubble_icmp_total                      | ICMP packet count             |

7.2 Alert Rule Examples

groups:
  - name: hubble-alerts
    rules:
      - alert: HighDropRate
        expr: rate(hubble_drop_total[5m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'High packet drop rate detected'

      - alert: HTTPErrorRate
        expr: |
          rate(hubble_http_responses_total{status=~"5.."}[5m])
          / rate(hubble_http_responses_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'HTTP 5xx error rate exceeds 5%'

      - alert: DNSLatencyHigh
        expr: |
          histogram_quantile(0.99, rate(hubble_dns_response_time_seconds_bucket[5m]))
          > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'DNS response latency p99 exceeds 500ms'

8. Hubble gRPC API

8.1 API Overview

Hubble provides programmatic access to flow data through its gRPC API:

// Hubble Observer API (simplified)
service Observer {
    // Stream flows
    rpc GetFlows(GetFlowsRequest) returns (stream GetFlowsResponse);

    // Hubble server status
    rpc ServerStatus(ServerStatusRequest) returns (ServerStatusResponse);

    // Node list
    rpc GetNodes(GetNodesRequest) returns (GetNodesResponse);
}

8.2 API Usage Example

# Direct API call with gRPCurl
grpcurl -plaintext localhost:4245 observer.Observer/ServerStatus

# Stream flows
grpcurl -plaintext -d '{}' localhost:4245 observer.Observer/GetFlows

8.3 Custom Integration Use Cases

Hubble gRPC API use cases:

1. Custom Dashboards
   - Collect specific business metrics
   - Custom visualizations

2. Automated Security Analysis
   - Detect abnormal traffic patterns
   - Automatic policy violation alerts

3. Audit Logging
   - Record network activity for compliance
   - Export to external systems for long-term storage

4. Service Mesh Integration
   - Monitor service-to-service latency
   - Error tracking and analysis

9. Performance Impact and Tuning

9.1 Performance Overhead

Overhead from enabling Hubble:

CPU:
  - Basic L3/L4 observation: Minimal (~1-2%)
  - L7 observation (HTTP, etc.): Depends on Envoy proxy overhead
  - High-traffic environments: added cost of flow parsing and ring buffer management

Memory:
  - Proportional to ring buffer size
  - Default 4095 flows x ~500 bytes per flow = ~2MB
  - 16383 flows: ~8MB

Network:
  - gRPC streaming traffic to Relay
  - Proportional to number of observing clients

9.2 Tuning Parameters

# Ring buffer size adjustment (must be one less than a power of two)
--hubble-event-buffer-capacity=16383

# Event queue size
--hubble-event-queue-size=0  # 0 = auto

# Enable/disable specific metrics
--hubble-metrics=dns,drop,tcp,flow

# Monitor specific events
--hubble-monitor-events="drop,trace,l7"

9.3 Large-Scale Optimization

Considerations for large clusters:

1. Relay resource limits
   - Set appropriate CPU/memory requests/limits
   - Relay load increases with node count

2. Metric cardinality management
   - Minimize label contexts to control metric count
   - Disable unnecessary metrics

3. Flow retention strategy
   - Balance ring buffer size with traffic volume
   - Export to external systems for long-term retention

4. Network bandwidth
   - Consider gRPC traffic between Relay and Agents
   - Limit impact of large observation queries

10. Hubble Usage Scenarios

10.1 Troubleshooting

# Diagnose Pod-to-Pod connectivity issues
hubble observe --from-pod default/app-a --to-pod default/app-b

# Analyze dropped traffic causes
hubble observe --verdict DROPPED --from-pod default/app-a

# Check DNS resolution issues
hubble observe --type l7 --protocol dns --from-pod default/app-a

# Check traffic to specific service
hubble observe --to-label "app=database" --to-port 5432

10.2 Security Audit

# Monitor policy violation traffic
hubble observe --verdict DROPPED --drop-reason-desc POLICY_DENIED

# Track egress traffic to external destinations
hubble observe --to-label reserved:world

# All ingress traffic to sensitive namespace
hubble observe --namespace sensitive-ns --traffic-direction ingress

Summary

Cilium Hubble provides network observability through these core capabilities:

  • Zero Instrumentation: Automatic flow collection from eBPF without application modifications
  • Multi-Layer Visibility: Observation across all layers from L3/L4 network to L7 application
  • Real-Time Streaming: Real-time flow streaming via ring buffer and gRPC
  • Cluster-Wide Observation: Cluster-wide data aggregation through Hubble Relay
  • Visualization: Service topology maps and flow visualization through Hubble UI
  • Metrics Integration: Time-series metrics and alerting via Prometheus/Grafana
  • Programmatic Access: Custom tool integration through gRPC API