Cilium Hubble Observability Platform Internal Analysis

Overview

Hubble is a network observability platform built into Cilium that collects and analyzes all network events from the eBPF datapath in real time. It provides deep network visibility without installing additional agents on the infrastructure.

1. Hubble Architecture

1.1 Component Layout

+-----------------------------------------------------------+
|                   Hubble Architecture                      |
+-----------------------------------------------------------+
|                                                           |
|  Node 1             Node 2             Node 3             |
|  +-----------+     +-----------+     +-----------+        |
|  | Cilium    |     | Cilium    |     | Cilium    |        |
|  | Agent     |     | Agent     |     | Agent     |        |
|  |  +------+ |     |  +------+ |     |  +------+ |        |
|  |  |Hubble| |     |  |Hubble| |     |  |Hubble| |        |
|  |  |Server| |     |  |Server| |     |  |Server| |        |
|  |  +------+ |     |  +------+ |     |  +------+ |        |
|  +-----------+     +-----------+     +-----------+        |
|       |                 |                 |               |
|       +--------+--------+--------+--------+               |
|                |                 |                        |
|          +-----v-----+     +-----v-----+                  |
|          |  Hubble   |     | Prometheus|                  |
|          |  Relay    |     | Grafana   |                  |
|          +-----------+     +-----------+                  |
|                |                                          |
|          +-----v-----+                                    |
|          | Hubble UI |                                    |
|          | Hubble CLI|                                    |
|          +-----------+                                    |
+-----------------------------------------------------------+

1.2 Component Roles

Component	Role	Deployment
Hubble Server	Collect flows on each node	Embedded in Cilium Agent
Hubble Relay	Aggregate flows cluster-wide	Deployment (1-2 replicas)
Hubble UI	Topology visualization	Deployment
Hubble CLI	Command-line flow queries	Local binary

2. Hubble Server: Flow Collection Engine

2.1 Flow Collection Mechanism

The Hubble Server runs inside the Cilium Agent, collecting events from the eBPF datapath:

eBPF datapath events:
  [Packet processing] -> [Policy verdict] -> [Conntrack events]
       |                      |                    |
       v                      v                    v
  [Perf Event Ring Buffer]
       |
       v
  [Hubble Server: Event parser]
       |
       v
  [Flow ring buffer (in-memory)]
       |
       v
  [gRPC server: Stream flows to clients]

2.2 Ring Buffer

Hubble stores flows in a fixed-size ring buffer:

# Ring buffer size configuration
# Default: 4095 flows
# Config: --hubble-event-buffer-capacity=16383

# Check current ring buffer status
hubble status
# Nodes:
#   node-1: Connected, Flows: 4095/4095 (100.00%), ...
#   node-2: Connected, Flows: 3821/4095 (93.31%), ...
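
The overwrite-when-full behavior of a fixed-size ring buffer can be illustrated with a minimal Python sketch. `FlowRingBuffer` and its methods are hypothetical names chosen for illustration, not Hubble's actual implementation; the capacity of 4095 simply mirrors the default mentioned above.

```python
from collections import deque


class FlowRingBuffer:
    """Fixed-capacity buffer: once full, each write evicts the oldest flow."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._flows = deque(maxlen=capacity)  # deque discards the oldest item itself

    def write(self, flow: dict) -> None:
        self._flows.append(flow)

    def last(self, n: int) -> list:
        """Return up to n of the most recent flows, oldest first."""
        return list(self._flows)[-n:] if n > 0 else []

    def status(self) -> str:
        used = len(self._flows)
        return f"Flows: {used}/{self.capacity} ({100 * used / self.capacity:.2f}%)"


buf = FlowRingBuffer(capacity=4095)
for seq in range(5000):  # write more flows than the buffer can hold
    buf.write({"seq": seq})

print(buf.status())           # Flows: 4095/4095 (100.00%)
print(buf.last(1)[0]["seq"])  # 4999
```

Once the buffer reports 100%, older flows are no longer retrievable, which is why a busy node may need a larger capacity or an external export.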

2.3 Flow Data Structure

Key fields in a flow record:

- time: Event timestamp
- source:
    identity: Source Identity
    namespace: Source namespace
    labels: Source labels
    pod_name: Source Pod name
- destination:
    identity: Destination Identity
    namespace: Destination namespace
    labels: Destination labels
    pod_name: Destination Pod name
- IP:
    source: Source IP
    destination: Destination IP
- l4:
    TCP/UDP:
      source_port: Source port
      destination_port: Destination port
- l7:
    type: HTTP/DNS/Kafka
    http:
      method: GET/POST/...
      url: Request URL
      code: Response code
- verdict: FORWARDED/DROPPED/AUDIT/REDIRECTED
- drop_reason: Drop reason (when verdict is DROPPED)
- Type: L3_L4/L7/SOCK
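
To make the record layout concrete, here is a small Python sketch that reads one JSON-encoded flow and prints a one-line summary. The sample record is hand-written for illustration (it is not captured output), the `flow` wrapper key is assumed to match what `hubble observe -o json` emits, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written sample for illustration, not captured from a real cluster.
SAMPLE_LINE = json.dumps({
    "flow": {
        "time": "2026-03-20T10:15:32.123Z",
        "verdict": "FORWARDED",
        "IP": {"source": "10.244.1.5", "destination": "10.244.2.10"},
        "l4": {"TCP": {"source_port": 34567, "destination_port": 8080}},
        "source": {"identity": 1001, "namespace": "default", "pod_name": "frontend-abc"},
        "destination": {"identity": 1002, "namespace": "default", "pod_name": "backend-xyz"},
        "Type": "L3_L4",
    }
})


def summarize(line: str) -> str:
    """Render one JSON flow record as a single human-readable line."""
    flow = json.loads(line).get("flow", {})
    src, dst = flow.get("source", {}), flow.get("destination", {})
    # l4 holds one protocol key (e.g. TCP or UDP) that carries the port pair.
    proto, ports = next(iter(flow.get("l4", {}).items()), ("UNKNOWN", {}))
    return (
        f"{src.get('namespace')}/{src.get('pod_name')}:{ports.get('source_port')} -> "
        f"{dst.get('namespace')}/{dst.get('pod_name')}:{ports.get('destination_port')} "
        f"{proto} {flow.get('verdict')}"
    )


print(summarize(SAMPLE_LINE))
# default/frontend-abc:34567 -> default/backend-xyz:8080 TCP FORWARDED
```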

3. Hubble Relay: Cluster-Wide Observability

3.1 Relay Operation

Hubble Relay connects to all Hubble Servers in the cluster to aggregate flows:

Relay connection management:

1. Discover all Cilium Agents (Hubble Servers) in the cluster
2. Connect to each node's Hubble gRPC service
3. Distribute flow requests to all nodes
4. Aggregate responses and deliver to clients
5. Automatic reconnection on node add/remove
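
The fan-out and aggregation steps above can be sketched in Python as a merge of per-node streams. This is only an illustration of the idea, assuming each node returns its flows already ordered by time; `merge_node_streams` and the sample data are hypothetical and do not reflect how Relay is actually implemented.

```python
import heapq


def _tag(node, flows):
    """Attach the originating node name to every flow of one stream."""
    for flow in flows:
        yield {**flow, "node_name": node}


def merge_node_streams(streams):
    """Merge per-node flow streams (each already time-ordered) into one stream."""
    tagged = [_tag(node, flows) for node, flows in streams.items()]
    return heapq.merge(*tagged, key=lambda flow: flow["time"])


# Hand-written sample data: two nodes, each with its own ordered flows.
streams = {
    "node-1": [{"time": "10:15:32.123", "verdict": "FORWARDED"},
               {"time": "10:15:34.012", "verdict": "DROPPED"}],
    "node-2": [{"time": "10:15:32.456", "verdict": "FORWARDED"},
               {"time": "10:15:33.789", "verdict": "FORWARDED"}],
}

for flow in merge_node_streams(streams):
    print(flow["time"], flow["node_name"], flow["verdict"])
```

Because `heapq.merge` is lazy, the merged view is produced one flow at a time without first loading every node's buffer into memory.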

3.2 Relay Deployment

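apiVersion: apps/v1
kind: Deployment
metadata:
  name: hubble-relay
  namespace: kube-system
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the Pod template labels
  selector:
    matchLabels:
      k8s-app: hubble-relay
  template:
    metadata:
      labels:
        k8s-app: hubble-relay
    spec:
      containers:
        - name: hubble-relay
          image: quay.io/cilium/hubble-relay:v1.16.0
          ports:
            - containerPort: 4245
              name: grpc
          args:
            - serve
            # The agent's Unix socket is only reachable if /var/run/cilium
            # is mounted from the host into this Pod
            - --peer-service=unix:///var/run/cilium/hubble.sock
            - --listen-address=:4245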

3.3 Relay Port Forwarding

# Local port forwarding to Hubble Relay
cilium hubble port-forward &

# Then access via hubble CLI
hubble observe
hubble status

4. Hubble UI: Topology Visualization

4.1 UI Features

Hubble UI is a web-based interface that provides the following key features:

1. Service Map
   - Visualize service-to-service communication as topology graph
   - Real-time traffic flow display
   - Color-coded normal/abnormal connections

2. Flow Table
   - Real-time network flow listing
   - Filtering (namespace, labels, verdict)
   - Detailed information for each flow

3. Policy Visualization
   - Display allow/deny status from policies
   - Highlight dropped traffic

4.2 UI Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hubble-ui
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: hubble-ui
  template:
    metadata:
      labels:
        k8s-app: hubble-ui
    spec:
      containers:
        - name: frontend
          image: quay.io/cilium/hubble-ui:v0.13.0
          ports:
            - containerPort: 8081
              name: http
        - name: backend
          image: quay.io/cilium/hubble-ui-backend:v0.13.0
          ports:
            - containerPort: 8090

5. Flow Types: L3/L4/L7

5.1 L3/L4 Flows

Collected by default for all network packets:

# Observe L3/L4 flows (reported as trace events)
hubble observe --type trace

# Example output:
# Mar 20 10:15:32.123 default/frontend-abc -> default/backend-xyz
#   TCP SYN 10.244.1.5:34567 -> 10.244.2.10:8080
#   verdict: FORWARDED

# Track TCP connections
hubble observe --protocol tcp --to-port 8080

# UDP traffic
hubble observe --protocol udp --to-port 53

5.2 L7 Flows

Collected for traffic with L7 policies applied:

# Observe HTTP flows
hubble observe --type l7 --protocol http

# Example output:
# Mar 20 10:15:32.456 default/frontend-abc -> default/api-server-xyz
#   HTTP GET /api/v1/users
#   Response: 200 OK (23ms)

# DNS flows
hubble observe --type l7 --protocol dns

# Example output:
# Mar 20 10:15:33.789 default/backend-abc -> kube-system/coredns-xyz
#   DNS Query: api.example.com A
#   DNS Response: 203.0.113.10 (TTL: 300)

# Kafka flows
hubble observe --type l7 --protocol kafka

5.3 Drop Flows

Packets dropped due to policies or errors:

# Observe only dropped flows
hubble observe --verdict DROPPED

# Example output:
# Mar 20 10:15:34.012 default/untrusted-app -> default/backend-xyz
#   TCP 10.244.3.5:45678 -> 10.244.2.10:8080
#   verdict: DROPPED (Policy denied)

# Filter by specific drop reason
hubble observe --verdict DROPPED --drop-reason-desc POLICY_DENIED
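
When many packets are dropped, a per-reason tally is often more useful than the raw stream. The Python sketch below counts dropped flows by reason from JSON-lines input. It assumes each line wraps the record in a `flow` key and names the reason in a `drop_reason_desc` field; `count_drops` and the sample records are hypothetical.

```python
import json
from collections import Counter


def count_drops(lines):
    """Count DROPPED flows per drop reason from JSON-lines flow output."""
    reasons = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        flow = json.loads(line).get("flow", {})
        if flow.get("verdict") == "DROPPED":
            reasons[flow.get("drop_reason_desc", "UNKNOWN")] += 1
    return reasons


# Hand-written sample records for illustration.
sample = [
    '{"flow": {"verdict": "DROPPED", "drop_reason_desc": "POLICY_DENIED"}}',
    '{"flow": {"verdict": "FORWARDED"}}',
    '{"flow": {"verdict": "DROPPED", "drop_reason_desc": "POLICY_DENIED"}}',
    '{"flow": {"verdict": "DROPPED"}}',
]

for reason, count in count_drops(sample).most_common():
    print(reason, count)
```

In practice the same function could read `sys.stdin` with `hubble observe --verdict DROPPED -o json` piped into the script.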

6. Hubble CLI Usage

6.1 Basic Commands

# Observe all flows (real-time streaming)
hubble observe -f

# Last 100 flows
hubble observe --last 100

# Specific time range
hubble observe --since 5m
hubble observe --since "2026-03-20T10:00:00Z" --until "2026-03-20T10:30:00Z"

6.2 Filtering

# Namespace filtering
hubble observe --namespace production

# Pod name filtering
hubble observe --from-pod production/frontend-abc
hubble observe --to-pod production/backend-xyz

# Label filtering
hubble observe --from-label "app=frontend"
hubble observe --to-label "app=backend"

# IP filtering
hubble observe --from-ip 10.244.1.5
hubble observe --to-ip 10.244.2.10

# Port filtering
hubble observe --to-port 8080
hubble observe --from-port 443

# Verdict filtering
hubble observe --verdict FORWARDED
hubble observe --verdict DROPPED

# Combined filters
hubble observe \
  --namespace production \
  --from-label "app=frontend" \
  --to-label "app=backend" \
  --to-port 8080 \
  --verdict FORWARDED
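
The combined command above narrows the result with every flag it adds. A minimal Python sketch of that logical-AND behavior follows, assuming that all criteria must match for a flow to be shown. The flattened flow record, the bare `app=...` label strings, and the `matches` helper are simplifications made up for this example.

```python
def matches(flow, criteria):
    """Return True only if the flow satisfies every given criterion (logical AND)."""
    src, dst = flow["source"], flow["destination"]
    checks = {
        # --namespace matches when either side is in the namespace
        "namespace": lambda v: v in (src["namespace"], dst["namespace"]),
        "from_label": lambda v: v in src["labels"],
        "to_label": lambda v: v in dst["labels"],
        "to_port": lambda v: flow["destination_port"] == v,
        "verdict": lambda v: flow["verdict"] == v,
    }
    return all(checks[name](value) for name, value in criteria.items())


criteria = {
    "namespace": "production",
    "from_label": "app=frontend",
    "to_label": "app=backend",
    "to_port": 8080,
    "verdict": "FORWARDED",
}

forwarded = {
    "source": {"namespace": "production", "labels": ["app=frontend"]},
    "destination": {"namespace": "production", "labels": ["app=backend"]},
    "destination_port": 8080,
    "verdict": "FORWARDED",
}
dropped = {**forwarded, "verdict": "DROPPED"}

print(matches(forwarded, criteria))  # True
print(matches(dropped, criteria))    # False
```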

6.3 Output Formats

# Default output (human-readable)
hubble observe

# JSON output
hubble observe -o json

# Compact output
hubble observe -o compact

# Dictionary output
hubble observe -o dict

# jsonpb (Protocol Buffers JSON format)
hubble observe -o jsonpb

6.4 Status Check

# Overall Hubble status
hubble status

# Example output:
# Healthcheck (via localhost:4245): Ok
# Current/Max Flows: 16383/16383 (100.00%)
# Flows/s: 245.32
# Connected Nodes: 3/3

# Per-node status
hubble list nodes

7. Hubble Prometheus Metrics

Hubble metrics are exported only when they are enabled in the Cilium configuration, for example at install time with Helm:

# helm install cilium cilium/cilium \
#   --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}"

7.1 Key Metrics

Metric Name	Description
hubble_flows_processed_total	Total flows processed
hubble_drop_total	Dropped packets by reason
hubble_tcp_flags_total	Packets by TCP flag
hubble_dns_queries_total	DNS query count
hubble_dns_responses_total	DNS response count
hubble_dns_response_types_total	DNS responses by response type
hubble_http_requests_total	HTTP requests by method/path
hubble_http_responses_total	HTTP responses by status code
hubble_http_request_duration_seconds	HTTP request latency
hubble_icmp_total	ICMP packet count

7.2 Alert Rule Examples

groups:
  - name: hubble-alerts
    rules:
      - alert: HighDropRate
        expr: rate(hubble_drop_total[5m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'High packet drop rate detected'

      - alert: HTTPErrorRate
        expr: |
          sum(rate(hubble_http_responses_total{status=~"5.."}[5m]))
          / sum(rate(hubble_http_responses_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: 'HTTP 5xx error rate exceeds 5%'

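      # hubble_dns_response_time_seconds is not in the metric list above;
      # confirm that your Hubble version exports it before relying on this rule.
      - alert: DNSLatencyHigh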
        expr: |
          histogram_quantile(0.99, rate(hubble_dns_response_time_seconds_bucket[5m]))
          > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'DNS response latency p99 exceeds 500ms'

7.3 Grafana Dashboards

A Hubble Grafana dashboard typically covers:

1. Network Overview
   - Total flows per second
   - Drop ratio
   - Traffic distribution by protocol

2. DNS Monitoring
   - DNS queries per second
   - DNS response latency (p50, p95, p99)
   - DNS error ratio
   - Most queried domains

3. HTTP Monitoring
   - Requests per second (by method)
   - Response status code distribution
   - Request latency histogram
   - Error rate (5xx / total)

4. Policy Monitoring
   - Dropped packets per second (by reason)
   - Policy verdict distribution
   - Drops by Identity

8. Hubble gRPC API

8.1 API Overview

Hubble provides programmatic access to flow data through its gRPC API:

// Hubble Observer API (simplified)
service Observer {
    // Stream flows
    rpc GetFlows(GetFlowsRequest) returns (stream GetFlowsResponse);

    // Hubble server status
    rpc ServerStatus(ServerStatusRequest) returns (ServerStatusResponse);

    // Node list
    rpc GetNodes(GetNodesRequest) returns (GetNodesResponse);
}

8.2 API Usage Example

# Direct API call with gRPCurl
grpcurl -plaintext localhost:4245 observer.Observer/ServerStatus

# Stream flows
grpcurl -plaintext -d '{}' localhost:4245 observer.Observer/GetFlows

# Apply a filter
grpcurl -plaintext -d '{
  "whitelist": [
    {
      "source_pod": ["default/frontend"]
    }
  ]
}' localhost:4245 observer.Observer/GetFlows

8.3 Custom Integration Use Cases

Hubble gRPC API use cases:

1. Custom Dashboards
   - Collect specific business metrics
   - Custom visualizations

2. Automated Security Analysis
   - Detect abnormal traffic patterns
   - Automatic policy violation alerts

3. Audit Logging
   - Record network activity for compliance
   - Export to external systems for long-term storage

4. Service Mesh Integration
   - Monitor service-to-service latency
   - Error tracking and analysis

9. Performance Impact and Tuning

9.1 Performance Overhead

Overhead from enabling Hubble:

CPU:
  - Basic L3/L4 observation: Minimal (~1-2%)
  - L7 observation (HTTP, etc.): Depends on Envoy proxy overhead
  - High-traffic environments: Flow parsing and ring buffer management

Memory:
  - Proportional to ring buffer size
  - Default 4095 flows x ~500 bytes per flow = ~2MB
  - 16383 flows: ~8MB

Network:
  - gRPC streaming traffic to Relay
  - Proportional to number of observing clients
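
The memory figures above are simple arithmetic, reproduced in the Python sketch below. The 500-byte per-flow size is the rough estimate quoted in the text, not a measured value, and the capacity rule (one less than a power of two, at most 65535) is an assumption about how the agent validates the buffer size. Both helper names are hypothetical.

```python
BYTES_PER_FLOW = 500  # rough per-flow figure quoted in the text above


def is_valid_capacity(capacity: int) -> bool:
    """Assumed rule: one less than a power of two and at most 65535."""
    return 1 <= capacity <= 65535 and (capacity & (capacity + 1)) == 0


def ring_buffer_megabytes(capacity: int, bytes_per_flow: int = BYTES_PER_FLOW) -> float:
    """Approximate ring buffer memory in MB (10^6 bytes)."""
    return capacity * bytes_per_flow / 1_000_000


for capacity in (4095, 16383, 65535):
    print(capacity, is_valid_capacity(capacity), f"{ring_buffer_megabytes(capacity):.1f} MB")
```

The estimate is per node, so cluster-wide memory grows with the number of agents.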

9.2 Tuning Parameters

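# Ring buffer size adjustment
# More stored flows = more memory used
--hubble-event-buffer-capacity=16383

# Event queue size
--hubble-event-queue-size=0  # 0 = auto

# Enable only the metrics you need to keep overhead low
--hubble-metrics=dns,drop,tcp,flow

# Restrict which monitor event types Hubble observes
--hubble-monitor-events=drop,trace,l7

# Limit L7 observation scope:
# L7 flows are generated only for traffic covered by an L7 policy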

9.3 Large-Scale Optimization

Considerations for large clusters:

1. Relay resource limits
   - Set appropriate CPU/memory requests/limits
   - Relay load increases with node count

2. Metric cardinality management
   - Minimize label contexts to control metric count
   - Disable unnecessary metrics

3. Flow retention strategy
   - Balance ring buffer size with traffic volume
   - Export to external systems for long-term retention

4. Network bandwidth
   - Consider gRPC traffic between Relay and Agents
   - Limit impact of large observation queries

10. Hubble Usage Scenarios

10.1 Troubleshooting

# Diagnose Pod-to-Pod connectivity issues
hubble observe --from-pod default/app-a --to-pod default/app-b

# Analyze dropped traffic causes
hubble observe --verdict DROPPED --from-pod default/app-a

# Check DNS resolution issues
hubble observe --type l7 --protocol dns --from-pod default/app-a

# Check traffic to specific service
hubble observe --to-label "app=database" --to-port 5432

10.2 Security Audit

# Monitor policy violation traffic
hubble observe --verdict DROPPED --drop-reason-desc POLICY_DENIED

# Track egress traffic to external
hubble observe --to-identity world

# All ingress traffic to sensitive namespace
hubble observe --namespace sensitive-ns --traffic-direction ingress

10.3 Performance Monitoring

# Observe HTTP response times (L7 latency is reported in nanoseconds)
hubble observe --type l7 --protocol http -o json | \
  jq '.flow.l7.latency_ns'

# DNS query latency
hubble observe --type l7 --protocol dns -o json | \
  jq '.flow.l7.latency_ns'

# Track TCP connection setup
hubble observe --protocol tcp --tcp-flags SYN

Summary

Cilium Hubble provides network observability through these core capabilities:

- Zero Instrumentation: Automatic flow collection from eBPF without application modifications
- Multi-Layer Visibility: Observation across all layers from L3/L4 network to L7 application
- Real-Time Streaming: Real-time flow streaming via ring buffer and gRPC
- Cluster-Wide Observation: Cluster-wide data aggregation through Hubble Relay
- Visualization: Service topology maps and flow visualization through Hubble UI
- Metrics Integration: Time-series metrics and alerting via Prometheus/Grafana
- Programmatic Access: Custom tool integration through gRPC API