Hubble and ClusterMesh — Cilium Observability and Multi-Cluster Operations

Introduction

The moment you start enforcing network policy, new questions pour in: why did the payment API just get cut off, is this drop caused by policy or routing, and how does a service spanning two clusters fail over? With the datapath moved into the kernel, the observation tooling must operate at the same depth.

Hubble is the observability layer that extracts flows directly from the Cilium eBPF datapath, and ClusterMesh is the multi-cluster layer that joins the datapaths of several clusters into one mesh. The key point is that both run on top of your existing Cilium installation, with no extra agents or sidecars. This article covers Hubble architecture and practical queries, metrics and alerting, long-term flow retention, then ClusterMesh setup, global services, operational issues, and the relationship with service meshes.

Hubble Architecture

+------------------- Node 1 -------------------+
|  [eBPF datapath]                              |
|       | perf ring buffer (flow events)       |
|       v                                      |
|  [Hubble server inside cilium-agent]         |
|   - keeps flows in a node-local ring buffer  |
|   - gRPC API (unix socket / 4244)            |
+------------------+---------------------------+
                   |
+---------------- Node 2 ... N ----------------+
|  (same structure)|                           |
+------------------+---------------------------+
                   |
                   v  (connects to 4244 on every node)
            [hubble-relay]  :4245
             - aggregates cluster-wide flows behind one API
                   |
        +----------+-----------+
        v                      v
   [hubble CLI]           [hubble-ui]
   (ops/debugging)        (service map visualization)

Three operational properties follow from this structure.

  1. Flows exist only briefly in node-local ring buffers. Each node keeps only a fixed number of flows in memory (4095 by default), so long-term analysis requires configuring export.
  2. The relay is an aggregator, not a store. If the relay dies, neither the datapath nor node-local observation is affected.
  3. The flow data model is a structured event carrying L3/L4/L7 metadata: source/destination identities and labels, verdict, drop reason, HTTP/DNS details. Packet payloads are not stored.
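
To get a feel for how short-lived node-local flows are, the sketch below models the ring buffer as a bounded deque and estimates the retention window. The 4095 capacity is the default mentioned above; the flow rate is an invented input, and this is only a mental model, not how Hubble implements its buffer.

```python
from collections import deque

DEFAULT_CAPACITY = 4095  # per-node default mentioned above


def retention_seconds(capacity: int, flows_per_second: float) -> float:
    """Approximate how long a flow stays in the buffer before it is overwritten."""
    if flows_per_second <= 0:
        raise ValueError("flows_per_second must be positive")
    return capacity / flows_per_second


def simulate(capacity: int, total_flows: int) -> deque:
    """Push numbered flows through a bounded buffer; the oldest ones fall out."""
    buf = deque(maxlen=capacity)
    for seq in range(total_flows):
        buf.append(seq)
    return buf


if __name__ == "__main__":
    # Hypothetical node producing 500 flows per second.
    print(f"{retention_seconds(DEFAULT_CAPACITY, 500):.1f}s of history")
    buf = simulate(DEFAULT_CAPACITY, 10_000)
    print(f"oldest retained flow: {buf[0]}, newest: {buf[-1]}")
```

Under these assumed numbers a node holds only a few seconds of history, which is why anything needed after the fact has to be exported.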

hubble CLI in Practice

The hubble CLI is a separate binary that you install the same way as cilium-cli; the common pattern is to port-forward to the relay locally.

cilium hubble enable --ui          # enable Hubble + relay + UI
cilium hubble port-forward &       # connect to the relay on localhost:4245
hubble status                      # confirm flows are being received

Tracking drops — the queries you will use most

# Watch dropped flows cluster-wide in real time
hubble observe --verdict DROPPED -f

# Drops in a specific namespace, with reasons
hubble observe --verdict DROPPED --namespace payments \
  --output json | jq -r '.flow.drop_reason_desc' | sort | uniq -c

# Only policy denials (drop reason filter)
hubble observe --verdict DROPPED --drop-reason-desc POLICY_DENIED

# Denied traffic leaving a specific pod
hubble observe --from-pod payments/pg-gateway --verdict DROPPED

# Flows from frontend to backend (swap the two flags for the reverse direction)
hubble observe --from-pod shop/frontend --to-pod shop/backend

Mapping service dependencies

# Summarize what order-api calls (egress dependencies)
hubble observe --from-pod shop/order-api --type trace:to-endpoint \
  --output json | jq -r '.flow.destination.namespace + "/" + (.flow.destination.labels | join(","))' \
  | sort | uniq -c | sort -rn

# DNS query history (which external domains are looked up)
hubble observe --from-namespace payments --protocol dns \
  --output json | jq -r '.flow.l7.dns.query' | sort | uniq -c

# L7 HTTP observation (requires L7 policy or visibility annotation)
hubble observe --namespace shop --protocol http \
  --output json | jq -r '[.flow.l7.http.method, .flow.l7.http.url, (.flow.l7.http.code|tostring)] | join(" ")'

# Time range plus label selector combined
hubble observe --since 30m --label app=checkout --verdict FORWARDED

Verdict values include FORWARDED (allowed), DROPPED (blocked), AUDIT (traffic that policy audit mode let through but enforcement would have dropped), and ERROR. During policy rollout, the AUDIT filter becomes your central tool.
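
When the jq one-liners get unwieldy, the same tally can be done in a few lines of Python. This is a minimal sketch that counts verdicts and drop reasons from JSON lines shaped like `hubble observe --output json` output; the sample records are invented and carry only the two fields used here.

```python
import json
from collections import Counter
from typing import Iterable

# Invented sample lines; real flows carry many more fields.
SAMPLE = [
    '{"flow": {"verdict": "FORWARDED"}}',
    '{"flow": {"verdict": "DROPPED", "drop_reason_desc": "POLICY_DENIED"}}',
    '{"flow": {"verdict": "DROPPED", "drop_reason_desc": "POLICY_DENIED"}}',
    '{"flow": {"verdict": "AUDIT"}}',
]


def summarize(lines: Iterable[str]) -> Counter:
    """Count (verdict, drop reason) pairs; the reason is empty unless DROPPED."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        verdict = flow.get("verdict", "UNKNOWN")
        reason = flow.get("drop_reason_desc", "") if verdict == "DROPPED" else ""
        counts[(verdict, reason)] += 1
    return counts


if __name__ == "__main__":
    for (verdict, reason), n in summarize(SAMPLE).most_common():
        print(f"{n:6d} {verdict} {reason}".rstrip())
```

During an audit-mode rollout, the AUDIT row of this summary is the number to drive toward zero before switching to enforcement.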

Hubble UI — the service map

hubble-ui draws a per-namespace service map. It shows arrows between workloads (distinguishing L4/L7), failure ratios, and DNS targets, making it useful for understanding the current communication structure before writing policy and for instantly spotting which segment broke during an incident. The UI is primarily a live view, however — post-hoc analysis should use the exported data covered below.

Metrics — Prometheus Integration

Hubble exposes flow-derived metrics in Prometheus format. You choose which metrics to generate via helm values.

hubble:
  enabled: true
  metrics:
    enableOpenMetrics: true
    enabled:
      - dns:query;labelsContext=source_namespace,destination_namespace
      - drop:labelsContext=source_namespace,destination_namespace;sourceContext=workload-name
      - tcp
      - flow
      - port-distribution
      - httpV2:exemplars=true;labelsContext=source_namespace,source_workload,destination_namespace,destination_workload

HTTP golden signals

The httpV2 metrics give you per-service request rate, error rate, and latency quantiles without touching application code.

# Request rate (RPS)
sum(rate(hubble_http_requests_total{destination_namespace="shop"}[5m]))
  by (destination_workload)

# 5xx error ratio
sum(rate(hubble_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(hubble_http_requests_total[5m]))

# p99 latency
histogram_quantile(0.99,
  sum(rate(hubble_http_request_duration_seconds_bucket[5m])) by (le, destination_workload))
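
For readers less used to PromQL, the 5xx ratio above boils down to dividing two per-second counter increases. A rough Python equivalent follows; the counter samples are invented, and the helper ignores the counter resets and extrapolation that the real `rate()` handles.

```python
def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Simplified rate(): counter increase divided by the window length."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (last - first) / window_seconds


def error_ratio(total_rate: float, error_rate: float) -> float:
    """Share of requests answered with 5xx; 0.0 when there is no traffic."""
    return error_rate / total_rate if total_rate > 0 else 0.0


if __name__ == "__main__":
    window = 300  # seconds, matching the [5m] range in the queries above
    total = per_second_rate(120_000, 129_000, window)
    errors = per_second_rate(400, 490, window)
    print(f"rps={total:.1f} 5xx_ratio={error_ratio(total, errors):.2%}")
```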

Policy drop alerting rules

groups:
  - name: cilium-policy
    rules:
      - alert: PolicyDropSpike
        expr: |
          sum(rate(hubble_drop_total{reason="POLICY_DENIED"}[5m]))
            by (source_namespace, destination_namespace) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Policy drops between namespaces exceeded threshold'
      - alert: CiliumAgentDown
        expr: up{job="cilium-agent"} == 0
        for: 5m
        labels:
          severity: critical

Right after enforcing policy, deliberately set the drop alert thresholds low to catch missing allow rules quickly, then raise them to operational levels once things stabilize — a two-phase approach worth adopting.

Long-Term Flow Retention — Export and SIEM Integration

Ring buffers rotate in minutes, so file export is mandatory for audit evidence and post-hoc analysis.

hubble:
  export:
    fileMaxSizeMb: 50
    fileMaxBackups: 10
    dynamic:
      enabled: true
      config:
        content:
          - name: security-events
            filePath: /var/run/cilium/hubble/security.log
            includeFilters:
              - verdict: ["DROPPED", "AUDIT"]
          - name: dns-all
            filePath: /var/run/cilium/hubble/dns.log
            includeFilters:
              - protocol: ["dns"]

Export files are line-delimited JSON, so they ride straight onto a standard log pipeline.

Node: hubble export files (JSON lines)
   -> log collector (Fluent Bit / Vector / OTel Collector)
   -> buffer (Kafka, optional)
   -> storage/analysis (Elasticsearch, Loki, S3+Athena, SIEM)

Useful mappings for SIEM integration: verdict and drop_reason_desc map to security event classification, the identity labels of source/destination map to asset tagging, and l7.dns.query feeds threat intel (matching against malicious domain lists). Shipping every flow explodes costs, so the common practice is to selectively export only security-relevant flows (DROPPED/AUDIT/DNS) as in the example above.
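
The mappings described above can be sketched as a small transform that a log collector plugin or a batch job might apply per exported line. The output field names, the blocklist, and the trailing-dot form of DNS names are assumptions for illustration; adapt them to your SIEM schema and to the records you actually see in the export files.

```python
import json

# Hypothetical blocklist; in practice this would come from a threat-intel feed.
MALICIOUS_DOMAINS = {"evil.example.", "c2.example."}


def to_siem_event(line: str) -> dict:
    """Map one exported flow (a JSON line) to a flat SIEM-style record."""
    record = json.loads(line)
    flow = record.get("flow", record)  # tolerate wrapped and bare flow objects
    dns_query = flow.get("l7", {}).get("dns", {}).get("query", "")
    return {
        # Prefer the drop reason when present, else fall back to the verdict.
        "classification": flow.get("drop_reason_desc") or flow.get("verdict", "UNKNOWN"),
        "source_tags": flow.get("source", {}).get("labels", []),
        "destination_tags": flow.get("destination", {}).get("labels", []),
        "dns_query": dns_query,
        "threat_match": dns_query.lower() in MALICIOUS_DOMAINS,
    }


if __name__ == "__main__":
    sample = json.dumps({"flow": {
        "verdict": "DROPPED",
        "drop_reason_desc": "POLICY_DENIED",
        "source": {"labels": ["k8s:app=checkout"]},
        "destination": {"labels": ["k8s:app=checkout-db"]},
    }})
    print(to_siem_event(sample))
```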

ClusterMesh Architecture

ClusterMesh joins multiple Cilium clusters to provide cross-cluster service discovery, load balancing, and policy.

+--------- Cluster A (id=1) -----------+      +--------- Cluster B (id=2) -----------+
|                                      |      |                                      |
|  [cilium-agent x N]                  |      |  [cilium-agent x N]                  |
|       |  (reads)                     |      |       |  (reads)                     |
|       v                              |      |       v                              |
|  [clustermesh-apiserver]  <----------+------+--- agents connect to the peer       |
|   - exposes own services/identities/ |      |    cluster apiserver and watch state |
|     endpoints in etcd form           |      |                                      |
|   - exposed via LoadBalancer/NodePort|      |  [clustermesh-apiserver]             |
|                                      |      |                                      |
|  PodCIDR: 10.1.0.0/16 (no overlap!)  |      |  PodCIDR: 10.2.0.0/16 (no overlap!)  |
+--------------------------------------+      +--------------------------------------+

Synchronized: services (global), identities, ipcache (remote pod IP -> identity)
After sync: pod-to-pod traffic is routed directly by both datapaths (no middle gateway)

The important design points:

  • Identities keep their meaning across cluster boundaries. The identity of an app=backend pod in cluster B is propagated into the ipcache of cluster A, so multi-cluster policy works on the same model.
  • No extra hop in the data plane. Only the control plane (clustermesh-apiserver) is added; packets flow directly using your existing routing mode (tunnel/native).
  • Control plane failure only stops the propagation of new changes. Even if the apiserver dies, traffic keeps flowing using the already-synchronized services and identities.

Setting Up ClusterMesh

Prerequisites

  • The PodCIDRs and node IPs of all clusters must not overlap (the most common design mistake).
  • All clusters must use certificates issued by the same CA (mutual TLS trust).
  • Nodes across clusters must be able to communicate directly (VPC peering, dedicated lines, VPN, etc.).
  • Each cluster needs a unique name and id (1 to 255).

# Cluster A helm values
cluster:
  name: cluster-a
  id: 1
clustermesh:
  useAPIServer: true
  apiserver:
    service:
      type: LoadBalancer
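
Because overlapping ranges are the most common design mistake, it can be worth checking the address plan mechanically before connecting anything. The sketch below validates the id and CIDR prerequisites with the standard `ipaddress` module; the inventory dictionary, including the node ranges, is hypothetical and would normally be generated from your own cluster records.

```python
from __future__ import annotations

import ipaddress
from itertools import combinations

# Hypothetical inventory: PodCIDR plus node range per cluster.
CLUSTERS = {
    "cluster-a": {"id": 1, "cidrs": ["10.1.0.0/16", "192.168.10.0/24"]},
    "cluster-b": {"id": 2, "cidrs": ["10.2.0.0/16", "192.168.20.0/24"]},
}


def find_problems(clusters: dict) -> list[str]:
    """Return human-readable violations of the id and CIDR prerequisites."""
    problems = []
    ids = [c["id"] for c in clusters.values()]
    if len(set(ids)) != len(ids):
        problems.append("cluster ids are not unique")
    for name, c in clusters.items():
        if not 1 <= c["id"] <= 255:
            problems.append(f"{name}: id {c['id']} outside 1..255")
    # Compare every CIDR of every cluster pair.
    for (a, ca), (b, cb) in combinations(clusters.items(), 2):
        for x in ca["cidrs"]:
            for y in cb["cidrs"]:
                if ipaddress.ip_network(x).overlaps(ipaddress.ip_network(y)):
                    problems.append(f"{a} {x} overlaps {b} {y}")
    return problems


if __name__ == "__main__":
    issues = find_problems(CLUSTERS)
    print("address plan OK" if not issues else "\n".join(issues))
```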

Connecting with cilium-cli

# 1) Copy the CA of cluster A into cluster B (shared CA)
kubectl --context cluster-a -n kube-system get secret cilium-ca -o yaml \
  | kubectl --context cluster-b apply -f -

# 2) Enable ClusterMesh on both sides
cilium clustermesh enable --context cluster-a
cilium clustermesh enable --context cluster-b

# 3) Connect them (bidirectional peering configured automatically)
cilium clustermesh connect --context cluster-a --destination-context cluster-b

# 4) Verify status and connectivity
cilium clustermesh status --context cluster-a --wait
cilium connectivity test --context cluster-a --multi-cluster cluster-b

Global Services — Affinity and Failover

Annotate services with the same name/namespace in both clusters and their backends merge into one virtual service.

# Deploy identically in both clusters
apiVersion: v1
kind: Service
metadata:
  name: checkout
  namespace: shop
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/affinity: "local"   # prefer local backends, fall back to remote
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: 8080

Annotation                                    | Value  | Behavior
----------------------------------------------+--------+-----------------------------------------------------------------
service.cilium.io/global                      | true   | Merge backends across clusters
service.cilium.io/affinity                    | local  | Prefer healthy local backends, go remote when all local are down
service.cilium.io/affinity                    | remote | Prefer remote (drain/canary scenarios)
service.cilium.io/affinity                    | none   | Spread evenly across all clusters
service.cilium.io/global-sync-endpoint-slices | true   | Sync remote endpoints as EndpointSlices

The affinity=local configuration is the de facto standard. In normal operation traffic is handled inside the cluster, saving latency, and when all local backends become unhealthy you get automatic failover to the remote cluster. Note that the unhealthy verdict is readiness-based, so failover is only as accurate as your application readinessProbe.
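
The selection behavior described above can be modeled in a few lines. This is a deliberately simplified sketch of the affinity semantics, not Cilium's datapath logic; the `Backend` type and its `ready` flag are hypothetical stand-ins for readiness-derived backend state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    address: str
    cluster: str
    ready: bool


def pick_backends(backends, local_cluster: str, affinity: str = "local"):
    """Return the candidate backends for one request under a given affinity."""
    healthy = [b for b in backends if b.ready]
    local = [b for b in healthy if b.cluster == local_cluster]
    remote = [b for b in healthy if b.cluster != local_cluster]
    if affinity == "local":
        return local or remote   # go remote only when no local backend is ready
    if affinity == "remote":
        return remote or local
    if affinity == "none":
        return healthy           # spread across every cluster
    raise ValueError(f"unknown affinity: {affinity}")


if __name__ == "__main__":
    backends = [
        Backend("10.1.0.5", "cluster-a", ready=False),
        Backend("10.2.0.7", "cluster-b", ready=True),
    ]
    # All local backends are unready, so the request fails over to cluster-b.
    print(pick_backends(backends, "cluster-a", "local"))
```

The model makes the readiness dependency visible: a local backend that reports ready while it is actually broken keeps traffic local and prevents failover.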

Multi-Cluster Policy

CNPs can use the cluster as a condition.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cross-cluster-checkout
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: checkout-db
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: checkout
            io.cilium.k8s.policy.cluster: cluster-a   # only from this cluster

Without the cluster label, the policy matches app=checkout in both clusters. When you want to enforce cluster boundaries with policy — for example, "the DB is reachable only from apps in its own cluster" — the cluster label is what you need.
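
The effect of the cluster label follows from plain matchLabels semantics, where every key in the selector must be present on the peer with the same value. A toy model, with invented pod names and a reduced label set, shows the difference:

```python
CLUSTER_LABEL = "io.cilium.k8s.policy.cluster"

# Invented peers: the same workload running in two clusters.
PODS = {
    "checkout@cluster-a": {"app": "checkout", CLUSTER_LABEL: "cluster-a"},
    "checkout@cluster-b": {"app": "checkout", CLUSTER_LABEL: "cluster-b"},
}


def selector_matches(match_labels: dict, pod_labels: dict) -> bool:
    """matchLabels semantics: every selector key must match exactly."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def allowed_sources(match_labels: dict) -> list:
    """Names of the peers that a fromEndpoints selector would admit."""
    return sorted(
        name for name, labels in PODS.items() if selector_matches(match_labels, labels)
    )


if __name__ == "__main__":
    print(allowed_sources({"app": "checkout"}))
    print(allowed_sources({"app": "checkout", CLUSTER_LABEL: "cluster-a"}))
```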

Encryption — WireGuard and IPsec

If cross-cluster traffic traverses untrusted networks, enable transport encryption. The same feature also applies between nodes inside a cluster.

Aspect             | WireGuard                            | IPsec
-------------------+--------------------------------------+---------------------------------------
Setup difficulty   | Very low (automatic key management)  | Manual key generation/rotation
Kernel requirement | 5.6 or later                         | Broadly supported
Performance        | Generally excellent, uses multiqueue | Depends on algorithms/hardware offload
Key rotation       | Automatic                            | Manual procedure (secret replacement)
Regulatory fit     | Fixed algorithms (ChaCha20)          | Preferred where FIPS is required

# Enable WireGuard (helm)
helm upgrade cilium cilium/cilium -n kube-system \
  --reuse-values --set encryption.enabled=true --set encryption.type=wireguard

# Check encryption status (on recent releases the in-pod binary may be named cilium-dbg)
kubectl -n kube-system exec ds/cilium -- cilium encrypt status

Unless you have specific regulatory requirements (FIPS and the like), WireGuard is operationally far simpler. Do not forget to redo the MTU calculation to account for the encryption overhead.
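
The MTU recalculation is simple subtraction, sketched below. The overhead constants are commonly cited figures and are assumptions here (80 bytes for WireGuard sized for an IPv6 outer header, 50 bytes for VXLAN over IPv4); whether both apply at once depends on your routing mode and Cilium version, so treat the result as a conservative estimate and verify it on a real node.

```python
# Per-packet overhead in bytes (assumed values, see the note above).
WIREGUARD_OVERHEAD = 80  # outer IPv6 (40) + UDP (8) + WireGuard header and tag (32)
VXLAN_OVERHEAD = 50      # outer Ethernet (14) + IPv4 (20) + UDP (8) + VXLAN (8)


def pod_mtu(link_mtu: int, wireguard: bool, vxlan_tunnel: bool) -> int:
    """Largest pod-facing MTU that still fits the link after encapsulation."""
    mtu = link_mtu
    if wireguard:
        mtu -= WIREGUARD_OVERHEAD
    if vxlan_tunnel:
        mtu -= VXLAN_OVERHEAD
    return mtu


if __name__ == "__main__":
    print(pod_mtu(1500, wireguard=True, vxlan_tunnel=False))
    print(pod_mtu(1500, wireguard=True, vxlan_tunnel=True))
```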

Operational Issues — Adding/Removing Clusters, Version Skew, Monitoring

Adding and removing clusters

# Add: share CA -> enable -> connect (same procedure as above)

# Remove: clean up global service dependencies first
# 1) Drain affinity/traffic toward the cluster being removed
# 2) Disconnect
cilium clustermesh disconnect --context cluster-a --destination-context cluster-c
# 3) Verify status on the remaining clusters
cilium clustermesh status --context cluster-a

Removal makes the remote backends of global services disappear, so first check whether any service was depending on them, in particular services with affinity=none or affinity=remote.

Version skew

ClusterMesh-connected clusters are officially supported only when their Cilium versions are at most one minor version apart. The upgrade principle: one cluster at a time, and do not move to the next minor until every cluster is on the same version. This is a separate axis from Kubernetes version skew, so manage both as a matrix.
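
A pre-upgrade gate for the one-minor rule can be as small as the check below. It is a sketch under the assumption that you already collect each cluster's Cilium version string somewhere; the inventory shown is invented.

```python
from __future__ import annotations


def major_minor(version: str) -> tuple[int, int]:
    """Parse '1.16.3' or 'v1.16.3' into (1, 16)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def skew_ok(versions: dict[str, str]) -> bool:
    """True when all clusters share a major and minors differ by at most one."""
    parsed = [major_minor(v) for v in versions.values()]
    if not parsed:
        return True
    majors = {major for major, _ in parsed}
    minors = [minor for _, minor in parsed]
    return len(majors) == 1 and max(minors) - min(minors) <= 1


if __name__ == "__main__":
    inventory = {"cluster-a": "1.16.3", "cluster-b": "v1.15.9"}  # hypothetical
    print("skew OK" if skew_ok(inventory) else "skew too large")
```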

Connection monitoring

# The key mesh status command
cilium clustermesh status --context cluster-a

# Remote cluster sync state from the agent point of view
# (on recent releases the in-pod binary may be named cilium-dbg)
kubectl -n kube-system exec ds/cilium -- cilium status --verbose | grep -A10 ClusterMesh

In Prometheus, alongside the clustermesh agent metrics (remote cluster readiness, sync queues), set an alert for the situation where the remote backend count of a global service hits zero — it lets you detect a silent mesh partition quickly.

Relationship with Service Meshes — Overlap and Differences with Istio

"If we have Cilium, do we still need Istio?" is the most frequent question. You have to separate the overlapping parts from the distinct ones.

Capability                             | Cilium (+Hubble/ClusterMesh)              | Istio
---------------------------------------+-------------------------------------------+----------------------------------------------
L3/L4 policy, network isolation        | Core territory                            | Secondary
L7 policy (HTTP path/method)           | Yes (selective Envoy redirect)            | Yes (all traffic proxied)
mTLS (workload-to-workload encryption) | Transport layer (WireGuard/IPsec) centric | Application layer mTLS + identity attestation
Traffic splitting (canary weights)     | Limited                                   | Core territory (VirtualService)
Retries/timeouts/circuit breaking      | Not provided                              | Core territory
Multi-cluster services                 | ClusterMesh                               | Multi-primary/remote topologies
Observability                          | Network centric (flows, verdicts)         | Request centric (tracing, app metrics)
Overhead                               | Low (mostly in-kernel)                    | Proxy traversal cost (reduced with ambient)

The practical summary: if your goal is network security, isolation, and observation, Cilium alone is often enough; if you need canary deployments, per-request retries, and fine-grained traffic control, layer a service mesh on top. In fact, running Cilium as the CNI with Istio ambient on top is a common combination — in that case explicitly divide responsibilities so the mTLS/L7 features do not overlap (for example: encryption via WireGuard, L7 routing via Istio).

Operations Checklist

  • Are Hubble relay/UI enabled, and is hubble status healthy?
  • Are alerts configured on DROPPED/AUDIT verdicts?
  • Are HTTP golden signal metrics (httpV2) wired into dashboards?
  • Is there an export and long-term retention pipeline for security events (drops/DNS)?
  • Is it documented and verified that no PodCIDR/node IP ranges overlap across clusters?
  • Is shared-CA trust configured between clusters?
  • Are cluster ids (1 to 255) and names unique across all clusters?
  • Have the global service affinity strategy (local recommended) and readinessProbes been reviewed?
  • Is cross-cluster version skew kept within one minor version?
  • Are clustermesh status and remote backend counts wired into monitoring?
  • Is the cluster removal procedure (drain traffic → disconnect) in the runbook?
  • Are the encryption choice (WireGuard/IPsec) and MTU recalculation complete?

Closing

Hubble and ClusterMesh look like separate features, but both grow from the same foundation: a datapath where identity reaches all the way into the kernel. Because flows carry identities, observation is meaningful; because identities synchronize across clusters, multi-cluster policy works on the same model as a single cluster. The operational conclusion is simple: turn on Hubble before you enforce policy, and finish your CIDR and CA design before you add clusters. Follow that order and the Cilium stack scales from a single cluster to multi-cluster with one and the same operating model.
