Authors

- Youngju Kim (@fjvbn20031)
# Cilium ClusterMesh: Multi-Cluster Networking Internal Implementation

## Overview

Cilium ClusterMesh is a multi-cluster solution that connects multiple Kubernetes clusters into a unified network. It provides cross-cluster service discovery, load balancing, and network policy enforcement while maintaining the independence of each cluster.
## 1. ClusterMesh Architecture

### 1.1 Core Components

```
          Cluster A                             Cluster B
+---------------------------+        +---------------------------+
| Cilium Agent (per node)   |        | Cilium Agent (per node)   |
|  - Local endpoint mgmt    |        |  - Local endpoint mgmt    |
|  - Remote cluster watch   |        |  - Remote cluster watch   |
+---------------------------+        +---------------------------+
              |                                    |
              v                                    v
+---------------------------+        +---------------------------+
| clustermesh-apiserver     |        | clustermesh-apiserver     |
|  - Embedded etcd          | <----> |  - Embedded etcd          |
|  - Externally accessible  |        |  - Externally accessible  |
+---------------------------+        +---------------------------+
              |                                    |
              v                                    v
+---------------------------+        +---------------------------+
| Internal etcd (k8s state) |        | Internal etcd (k8s state) |
+---------------------------+        +---------------------------+
```
### 1.2 clustermesh-apiserver

The clustermesh-apiserver is a component that runs in each cluster, exposing the cluster's state to other clusters:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clustermesh-apiserver
  namespace: kube-system
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: apiserver
          image: quay.io/cilium/clustermesh-apiserver:v1.16.0
          ports:
            - containerPort: 2379
              name: etcd
        - name: etcd
          image: quay.io/coreos/etcd:v3.5.11
          args:
            - --data-dir=/var/run/etcd
            - --listen-client-urls=https://0.0.0.0:2379
```
### 1.3 Data Synchronization Flow

```
State change in Cluster A (e.g., new Pod created)
        |
        v
Cilium Agent (Cluster A) -> Update CiliumEndpoint CRD
        |
        v
clustermesh-apiserver (Cluster A) -> Store state in embedded etcd
        |
        v
Cilium Agent (Cluster B) -> Watch Cluster A's etcd
        |
        v
Update Cluster B's ipcache and service maps
```
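As an illustrative sketch (not Cilium's actual code), the final step of this flow can be modeled as applying upsert/delete watch events from the remote cluster to a local ipcache; all names and structures here are hypothetical:

```python
# Hypothetical model of an agent applying watch events from a remote
# cluster's etcd to its local ipcache (illustrative, not Cilium code).

ipcache = {}  # pod IP -> {"identity": ..., "cluster": ...}

def apply_event(event):
    """Apply one upsert/delete watch event to the local ipcache."""
    if event["type"] == "upsert":
        ipcache[event["ip"]] = {
            "identity": event["identity"],
            "cluster": event["cluster"],
        }
    elif event["type"] == "delete":
        ipcache.pop(event["ip"], None)

# Events as Cluster B might observe them while watching Cluster A:
events = [
    {"type": "upsert", "ip": "10.244.1.5", "identity": 48291, "cluster": "cluster-a"},
    {"type": "upsert", "ip": "10.244.1.6", "identity": 48291, "cluster": "cluster-a"},
    {"type": "delete", "ip": "10.244.1.6"},
]
for e in events:
    apply_event(e)
```

After the delete event, only the surviving remote Pod remains visible to Cluster B's datapath.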
## 2. Cross-Cluster Service Discovery

### 2.1 Global Services

A global service is a service with the same name and namespace across multiple clusters:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  annotations:
    io.cilium/global-service: 'true'
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
```
### 2.2 Global Service Operation

```
Global service "api-service" (production namespace)

Cluster A backends: Pod-A1 (10.244.1.5), Pod-A2 (10.244.1.6)
Cluster B backends: Pod-B1 (10.245.1.5), Pod-B2 (10.245.1.6)

Cluster A BPF service map:
  api-service:80 -> [Pod-A1, Pod-A2, Pod-B1, Pod-B2]

Cluster B BPF service map:
  api-service:80 -> [Pod-A1, Pod-A2, Pod-B1, Pod-B2]
```

Each cluster load-balances across the same merged backend pool.
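A minimal sketch of the merge (helper names are hypothetical, not Cilium internals): backend lists for services that share a namespace and name are combined into one pool.

```python
# Sketch: merging local and remote backends of a global service into a
# single pool keyed by (namespace, service). Illustrative only.

def merge_global(local, remote):
    """Union the backend lists of matching (namespace, name) services."""
    merged = {}
    for key in set(local) | set(remote):
        merged[key] = sorted(local.get(key, []) + remote.get(key, []))
    return merged

cluster_a = {("production", "api-service"): ["10.244.1.5", "10.244.1.6"]}
cluster_b = {("production", "api-service"): ["10.245.1.5", "10.245.1.6"]}

service_map = merge_global(cluster_a, cluster_b)
# Both clusters end up load-balancing api-service:80 over all four pods.
```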
### 2.3 Service Affinity

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  annotations:
    io.cilium/global-service: 'true'
    io.cilium/service-affinity: 'local'
spec:
  selector:
    app: api
  ports:
    - port: 80
```

Service affinity options:

| Value | Behavior |
|---|---|
| (unset) | Evenly distribute across all clusters' backends |
| local | Prefer local-cluster backends, fall back to remote if none are available |
| remote | Prefer remote-cluster backends, fall back to local if none are available |
| none | No affinity (same as leaving the annotation unset) |
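The fallback behavior in the table can be sketched as a small selection function (hypothetical, not the BPF implementation):

```python
# Sketch of affinity-aware backend selection. The real logic lives in
# Cilium's BPF service maps; this just mirrors the table's semantics.

def select_backends(local, remote, affinity=None):
    """Return the candidate backend pool for a global service."""
    if affinity == "local":
        return local or remote   # fall back to remote if no local backends
    if affinity == "remote":
        return remote or local   # fall back to local if no remote backends
    return local + remote        # unset/"none": one evenly balanced pool

local, remote = ["10.244.1.5"], ["10.245.1.5"]
preferred = select_backends(local, remote, affinity="local")
```

With `affinity="local"`, remote backends are only used once the local list is empty, which is exactly the failure-isolation property discussed in section 7.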
## 3. Cross-Cluster Network Policies

### 3.1 Identity Synchronization

In ClusterMesh, identities from each cluster are synchronized:

```
Cluster A: Pod created (app=frontend)
  -> Identity assigned: 48291 (Cluster A scope)
  -> CiliumIdentity CRD created
  -> Shared via clustermesh-apiserver

Cluster B: remote identity received
  -> Add remote identity to local ipcache
  -> Remote identity can be referenced in policy maps
```
### 3.2 Cross-Cluster Policy Example

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cross-cluster
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
            io.cilium.k8s.policy.cluster: cluster-a
```
### 3.3 Cluster Identification

Cluster identification label:

```
io.cilium.k8s.policy.cluster: <cluster-name>
```

Each cluster's identity includes the cluster name:

```
k8s:app=frontend
k8s:io.kubernetes.pod.namespace=production
k8s:io.cilium.k8s.policy.cluster=cluster-a
```

This allows policies to select workloads from specific clusters.
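Conceptually, a selector that includes the cluster label matches only when every label agrees, so identical app labels in another cluster do not match. A toy matcher (not Cilium's policy engine) makes this concrete:

```python
# Toy label matcher showing why the cluster label restricts a policy to
# one cluster's workloads. Not Cilium's actual policy evaluation.

def labels_match(selector, labels):
    """True if every selector key/value is present in the label set."""
    return all(labels.get(k) == v for k, v in selector.items())

selector = {"app": "frontend", "io.cilium.k8s.policy.cluster": "cluster-a"}
frontend_a = {"app": "frontend", "io.cilium.k8s.policy.cluster": "cluster-a"}
frontend_b = {"app": "frontend", "io.cilium.k8s.policy.cluster": "cluster-b"}
```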
## 4. ClusterMesh Connection Setup

### 4.1 Prerequisites

1. Unique cluster ID (1-255)
   - Set a different cluster-id for each cluster
2. Unique cluster name
3. Non-overlapping Pod CIDRs
   - Cluster A: 10.244.0.0/16
   - Cluster B: 10.245.0.0/16
4. Network connectivity
   - Pod networks must be mutually reachable
   - Tunnel or direct routing required
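The Pod CIDR prerequisite can be checked mechanically; Python's standard `ipaddress` module makes the overlap test a one-liner:

```python
import ipaddress

def cidrs_overlap(a, b):
    """True if two Pod CIDRs share any addresses."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))

# The example CIDRs from the prerequisites above are disjoint:
assert not cidrs_overlap("10.244.0.0/16", "10.245.0.0/16")
# A /16 always overlaps any /24 carved out of it:
assert cidrs_overlap("10.244.0.0/16", "10.244.1.0/24")
```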
### 4.2 Setup Steps

```bash
# Step 1: Enable ClusterMesh on each cluster
cilium clustermesh enable --service-type LoadBalancer

# Step 2: Connect clusters
cilium clustermesh connect --destination-context ctx-cluster-b

# Step 3: Verify status
cilium clustermesh status

# Example output:
#   Cluster Connections:
#     cluster-b:
#       connected: true
#       endpoints: 24
#       identities: 42
#       services: 8
```
## 5. KVStoreMesh: Scalability Enhancement

### 5.1 KVStoreMesh Architecture

In large ClusterMesh environments, every Agent connecting directly to each remote cluster's etcd creates significant load. KVStoreMesh solves this:

```
Without KVStoreMesh:
  All Agents in Cluster A -> Cluster B etcd (direct)
  N nodes x M remote clusters = N*M connections

With KVStoreMesh:
  Cluster A KVStoreMesh -> Cluster B etcd (single connection)
  All Agents in Cluster A -> Local KVStoreMesh cache
  Dramatically fewer connections
```
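The connection-count arithmetic above is easy to make concrete; for example, with 100 nodes and 4 remote clusters:

```python
# Connection counts with and without KVStoreMesh (per the comparison above).

def direct_connections(nodes, remote_clusters):
    # Every agent dials every remote cluster's etcd.
    return nodes * remote_clusters

def kvstoremesh_connections(nodes, remote_clusters):
    # One cached connection per remote cluster; agents read locally.
    return remote_clusters

print(direct_connections(100, 4))       # 400 remote etcd connections
print(kvstoremesh_connections(100, 4))  # 4 remote etcd connections
```

The remote connection count becomes independent of cluster size, which is what makes large meshes tractable.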
### 5.2 KVStoreMesh Operation

```
Data flow from remote Cluster B:

Cluster B: clustermesh-apiserver (etcd)
        |
        v  (single connection)
Cluster A: KVStoreMesh
        |
        v  (replicate data to local cache)
Cluster A: local etcd or CRD
        |
        v
Cluster A: Cilium Agent (each node)
  - Reads from the local data source
  - No direct remote etcd connection needed
```
## 6. External Workloads

### 6.1 Overview

The external workloads feature allows installing the Cilium Agent on VMs or bare-metal servers and joining them to the Kubernetes cluster:

```
Kubernetes Cluster
+-----------------------------+
| Pod A (10.244.1.5)          |
| Pod B (10.244.1.6)          |
| Cilium Agent (per node)     |
+-----------------------------+
              |
              v  (ClusterMesh connection)
+-----------------------------+
| External VM (192.168.1.100) |
| Cilium Agent installed      |
|  - Same policies applied    |
|  - Same identity assigned   |
|  - Service access available |
+-----------------------------+
```
### 6.2 External Workload Setup

```bash
# Step 1: Register the external workload in the cluster
cilium clustermesh vm create my-vm --ipv4-alloc-cidr 10.192.1.0/24

# Step 2: Generate the install script for the VM
cilium clustermesh vm install install-external-workload.sh

# Step 3: Execute the script on the VM; it:
#   - Installs the Cilium Agent
#   - Configures the cluster connection
#   - Sets up certificates
#   - Starts the Agent
```
### 6.3 Identity for External Workloads

The same identity mechanism used for Kubernetes Pods applies to external VMs:

```
VM labels:
  app: legacy-app
  env: production

Identity assigned: 59102

Policy enforcement:
  - Control communication from cluster Pods to the VM
  - Control communication from the VM to cluster Pods
  - Same identity-based policy model
```
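Identity allocation keyed by the label set is what makes this uniform across Pods and VMs; a toy allocator (identity numbers purely illustrative):

```python
# Toy identity allocator: workloads with identical labels share an
# identity, whether they are Pods or external VMs. Numbers illustrative.

identity_cache = {}
_next_id = 59102  # arbitrary starting point for the sketch

def identity_for(labels):
    """Return the identity for a label set, allocating one if new."""
    global _next_id
    key = tuple(sorted(labels.items()))
    if key not in identity_cache:
        identity_cache[key] = _next_id
        _next_id += 1
    return identity_cache[key]

vm = identity_for({"app": "legacy-app", "env": "production"})
pod = identity_for({"app": "legacy-app", "env": "production"})
other = identity_for({"app": "api"})
```

A VM and a Pod carrying the same labels resolve to the same identity, so one policy covers both.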
## 7. Failure Handling and High Availability

### 7.1 Behavior During Cluster Failure

Scenario: Cluster B goes completely down. Cluster A behaves as follows:

1. Detects the loss of the clustermesh-apiserver connection
2. Removes Cluster B backends from its service maps
3. Routes new connections to Cluster A backends only
4. Cleans up existing connections after a timeout

With service affinity "local", a Cluster B failure has no impact on Cluster A, since only Cluster A backends were being used.
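Step 2 of this behavior can be sketched as pruning the failed cluster's backends from every service entry (data structures here are hypothetical):

```python
# Sketch: removing a disconnected cluster's backends from the service
# map, so new connections go only to surviving backends. Illustrative.

def prune_cluster(service_map, failed_cluster):
    """Drop every backend owned by the failed cluster."""
    return {
        svc: [b for b in backends if b["cluster"] != failed_cluster]
        for svc, backends in service_map.items()
    }

service_map = {
    "api-service:80": [
        {"ip": "10.244.1.5", "cluster": "cluster-a"},
        {"ip": "10.245.1.5", "cluster": "cluster-b"},
    ],
}
service_map = prune_cluster(service_map, "cluster-b")
```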
### 7.2 Network Partition Response

During a network partition:

1. The remote cluster connection times out
2. Remote backends are marked as unreachable
3. Local backends are preferred
4. On network recovery, the connection is re-established and state is resynced
## 8. Monitoring and Troubleshooting

### 8.1 Status Commands

```bash
# ClusterMesh connection status
cilium clustermesh status

# Endpoints synced from remote clusters
cilium endpoint list --selector "reserved:remote-node"

# Global services
cilium service list

# Remote identities
cilium identity list | grep "cluster-b"
```

### 8.2 Debugging Commands

```bash
# ClusterMesh-related logs
cilium-agent --debug

# Remote cluster connection state
cilium status --verbose | grep -A 10 "ClusterMesh"

# Cross-cluster service backends
cilium bpf lb list | grep "global"

# Remote ipcache entries
cilium bpf ipcache list | grep "cluster-b"
```
### 8.3 Common Issues

Issue: Clusters cannot connect

1. Verify the cluster-id is unique
2. Verify Pod CIDRs do not overlap
3. Verify the clustermesh-apiserver is externally accessible
4. Verify the TLS certificates are correct

Issue: Global service missing remote backends

1. Confirm the same service name/namespace exists on both clusters
2. Confirm the io.cilium/global-service annotation is present
3. Check the backend list with `cilium service list`

Issue: Cross-cluster policy not applied

1. Verify identities are syncing correctly
2. Verify the cluster name label is used correctly in the policy
3. Check remote identities with `cilium identity list`
## Summary

Cilium ClusterMesh provides multi-cluster networking through these core principles:

- Distributed architecture: each cluster operates independently with no central control plane
- Identity synchronization: security identities are shared across clusters for consistent policy enforcement
- Global services: service discovery and load balancing across multiple clusters
- KVStoreMesh: scalability optimization for large environments
- External workloads: VMs and bare-metal servers integrate into the Kubernetes network
- Failure isolation: failures are contained per cluster, maintaining overall system stability