Cilium ClusterMesh: Multi-Cluster Networking Internal Implementation
Overview
Cilium ClusterMesh is a multi-cluster solution that connects multiple Kubernetes clusters into a unified network. It provides cross-cluster service discovery, load balancing, and network policy enforcement while maintaining the independence of each cluster.
1. ClusterMesh Architecture
1.1 Core Components
         Cluster A                          Cluster B
+---------------------------+      +---------------------------+
| Cilium Agent (per node)   |      | Cilium Agent (per node)   |
| - Local endpoint mgmt     |      | - Local endpoint mgmt     |
| - Remote cluster watch    |      | - Remote cluster watch    |
+---------------------------+      +---------------------------+
              |                                  |
              v                                  v
+---------------------------+      +---------------------------+
| clustermesh-apiserver     |      | clustermesh-apiserver     |
| - Embedded etcd           | <--> | - Embedded etcd           |
| - Externally accessible   |      | - Externally accessible   |
+---------------------------+      +---------------------------+
              |                                  |
              v                                  v
+---------------------------+      +---------------------------+
| Internal etcd (k8s state) |      | Internal etcd (k8s state) |
+---------------------------+      +---------------------------+
1.2 clustermesh-apiserver
The clustermesh-apiserver runs in each cluster and exposes that cluster's state to the other clusters through an etcd instance. A trimmed Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clustermesh-apiserver
  namespace: kube-system
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: apiserver
          image: quay.io/cilium/clustermesh-apiserver:v1.16.0
          ports:
            - containerPort: 2379
              name: etcd
        - name: etcd
          image: quay.io/coreos/etcd:v3.5.11
          args:
            - --data-dir=/var/run/etcd
            - --listen-client-urls=https://0.0.0.0:2379
1.3 Data Synchronization Flow
State change in Cluster A (e.g., new Pod created)
|
v
Cilium Agent (Cluster A) -> Update CiliumEndpoint CRD
|
v
clustermesh-apiserver (Cluster A) -> Store state in embedded etcd
|
v
Cilium Agent (Cluster B) -> Watch Cluster A's etcd
|
v
Update Cluster B's ipcache and service maps
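The flow above can be sketched as a toy model. This is only an illustration of the watch-and-propagate pattern, not Cilium's implementation: `ToyKVStore` and `ToyAgent` are hypothetical names, and a Python dict stands in for both the embedded etcd and the BPF ipcache.

```python
from typing import Callable, Dict, List, Tuple


class ToyKVStore:
    """In-memory stand-in for the etcd embedded in clustermesh-apiserver."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: List[Callable[[str, str], None]] = []

    def watch(self, callback: Callable[[str, str], None]) -> None:
        # Replay the existing keys first, then deliver future updates.
        for key, value in self._data.items():
            callback(key, value)
        self._watchers.append(callback)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        for callback in self._watchers:
            callback(key, value)


class ToyAgent:
    """Stand-in for a Cilium Agent that watches a remote cluster."""

    def __init__(self, cluster: str) -> None:
        self.cluster = cluster
        # pod IP -> (source cluster, identity)
        self.ipcache: Dict[str, Tuple[str, str]] = {}

    def watch_remote(self, remote_cluster: str, store: ToyKVStore) -> None:
        def on_update(ip: str, identity: str) -> None:
            self.ipcache[ip] = (remote_cluster, identity)

        store.watch(on_update)


# Cluster A already has one pod when Cluster B's agent connects.
store_a = ToyKVStore()
store_a.put("10.244.1.5", "48291")

agent_b = ToyAgent("cluster-b")
agent_b.watch_remote("cluster-a", store_a)

# A pod created later in Cluster A reaches Cluster B through the watch.
store_a.put("10.244.1.6", "48291")
```

The replay on `watch` models the initial sync; subsequent `put` calls model incremental updates.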
2. Cross-Cluster Service Discovery
2.1 Global Services
A global service is a Service defined with the same name and namespace in multiple clusters and annotated as global, so that Cilium merges its backends across those clusters:
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  annotations:
    io.cilium/global-service: 'true'
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
2.2 Global Service Operation
Global service "api-service" (production namespace)
  Cluster A backends: Pod-A1 (10.244.1.5), Pod-A2 (10.244.1.6)
  Cluster B backends: Pod-B1 (10.245.1.5), Pod-B2 (10.245.1.6)
Cluster A BPF service map:
  api-service:80 -> [Pod-A1, Pod-A2, Pod-B1, Pod-B2]
Cluster B BPF service map:
  api-service:80 -> [Pod-A1, Pod-A2, Pod-B1, Pod-B2]
Every cluster load-balances across the same merged backend pool.
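As a rough illustration of the merge, the sketch below builds the shared backend pool from per-cluster backend lists. The function name `merge_global_backends` and the dict layout are assumptions made for this example; the real service map lives in BPF and is populated by the Agent.

```python
from typing import Dict, List, Tuple

ServiceKey = Tuple[str, str]  # (namespace, name)


def merge_global_backends(
    clusters: Dict[str, Dict[ServiceKey, List[str]]],
    key: ServiceKey,
) -> List[str]:
    """Concatenate the backends of one service from every cluster.

    Clusters are visited in name order so the result is deterministic.
    """
    merged: List[str] = []
    for cluster_name in sorted(clusters):
        merged.extend(clusters[cluster_name].get(key, []))
    return merged


api = ("production", "api-service")
clusters = {
    "cluster-a": {api: ["10.244.1.5", "10.244.1.6"]},
    "cluster-b": {api: ["10.245.1.5", "10.245.1.6"]},
}
pool = merge_global_backends(clusters, api)
```

Because the service key is the (namespace, name) pair, a Service that differs in either field between clusters would not be merged.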
2.3 Service Affinity
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  annotations:
    io.cilium/global-service: 'true'
    io.cilium/service-affinity: 'local'
spec:
  selector:
    app: api
  ports:
    - port: 80
Service affinity options:
| Value | Behavior |
| ------- | ------------------------------------------------- |
| default | Distribute evenly across backends in all clusters |
| local | Prefer local-cluster backends; fall back to remote backends when no local backend is available |
| remote | Prefer remote-cluster backends; fall back to local backends when no remote backend is available |
| none | No affinity (same as default) |
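The table can be read as a filter applied to the merged backend pool before load balancing. The following is a minimal sketch of that reading; `candidate_backends` is a hypothetical helper, and health checking is left out.

```python
from typing import List, Tuple

Backend = Tuple[str, str]  # (cluster name, pod IP)


def candidate_backends(
    backends: List[Backend],
    local_cluster: str,
    affinity: str = "default",
) -> List[Backend]:
    """Return the backends eligible for selection under an affinity value."""
    local = [b for b in backends if b[0] == local_cluster]
    remote = [b for b in backends if b[0] != local_cluster]

    if affinity == "local":
        # Fall back to remote backends only when no local one exists.
        return local or remote
    if affinity == "remote":
        # Fall back to local backends only when no remote one exists.
        return remote or local
    if affinity in ("default", "none"):
        return list(backends)
    raise ValueError(f"unknown affinity: {affinity!r}")


pool = [("cluster-a", "10.244.1.5"), ("cluster-b", "10.245.1.5")]
local_only = candidate_backends(pool, "cluster-a", "local")
```

With `local` affinity, remote backends stay in the pool as a fallback rather than being discarded, which is what distinguishes a global service with affinity from a purely local one.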
3. Cross-Cluster Network Policies
3.1 Identity Synchronization
In ClusterMesh, each cluster shares its security Identities with the others so that policies can reference remote workloads. The synchronization flow:
Cluster A: Pod created (app=frontend)
-> Identity assigned: 48291 (Cluster A scope)
-> CiliumIdentity CRD created
-> Shared via clustermesh-apiserver
Cluster B: Remote Identity received
-> Add remote Identity to local ipcache
-> Remote Identity can be referenced in policy maps
3.2 Cross-Cluster Policy Example
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-cross-cluster
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
            io.cilium.k8s.policy.cluster: cluster-a
3.3 Cluster Identification
Cluster identification label:
io.cilium.k8s.policy.cluster: <cluster-name>
Each cluster's Identity includes the cluster name:
k8s:app=frontend
k8s:io.kubernetes.pod.namespace=production
k8s:io.cilium.k8s.policy.cluster=cluster-a
This allows policies to select workloads from specific clusters
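To make the selection concrete, here is a small sketch of `matchLabels` semantics: a selector matches when every key/value pair it lists is present on the workload. `selector_matches` is a hypothetical helper, and the `k8s:` source prefix is omitted for brevity.

```python
from typing import Dict


def selector_matches(match_labels: Dict[str, str], workload_labels: Dict[str, str]) -> bool:
    """True when every selector label is present with the same value."""
    return all(workload_labels.get(k) == v for k, v in match_labels.items())


selector = {
    "app": "frontend",
    "io.cilium.k8s.policy.cluster": "cluster-a",
}

frontend_a = {
    "app": "frontend",
    "io.kubernetes.pod.namespace": "production",
    "io.cilium.k8s.policy.cluster": "cluster-a",
}
# Same workload labels, but running in the other cluster.
frontend_b = dict(frontend_a, **{"io.cilium.k8s.policy.cluster": "cluster-b"})

allowed_from_a = selector_matches(selector, frontend_a)
allowed_from_b = selector_matches(selector, frontend_b)
```

Dropping the cluster label from the selector would make it match `app=frontend` workloads in every connected cluster.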
4. ClusterMesh Connection Setup
4.1 Prerequisites
1. Unique cluster ID (1-255)
- Set different cluster-id for each cluster
2. Unique cluster name
3. Non-overlapping Pod CIDRs
- Cluster A: 10.244.0.0/16
- Cluster B: 10.245.0.0/16
4. Network connectivity
- Pod networks must be mutually reachable
- Tunnel or direct routing required
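The first three prerequisites can be checked mechanically before connecting clusters. The sketch below is one way to do that with the standard library; `check_prerequisites` and the input dict layout are assumptions for illustration, and network reachability is not tested.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List


def check_prerequisites(clusters: List[Dict]) -> List[str]:
    """Return a list of human-readable problems; empty means all checks pass.

    Each cluster is a dict with the keys "name", "id", and "pod_cidr".
    """
    problems: List[str] = []

    for c in clusters:
        if not 1 <= c["id"] <= 255:
            problems.append(f"{c['name']}: cluster ID {c['id']} outside 1-255")

    ids = [c["id"] for c in clusters]
    if len(set(ids)) != len(ids):
        problems.append("cluster IDs are not unique")

    names = [c["name"] for c in clusters]
    if len(set(names)) != len(names):
        problems.append("cluster names are not unique")

    for a, b in combinations(clusters, 2):
        net_a = ipaddress.ip_network(a["pod_cidr"])
        net_b = ipaddress.ip_network(b["pod_cidr"])
        if net_a.overlaps(net_b):
            problems.append(f"Pod CIDRs overlap: {a['name']} and {b['name']}")

    return problems


good = [
    {"name": "cluster-a", "id": 1, "pod_cidr": "10.244.0.0/16"},
    {"name": "cluster-b", "id": 2, "pod_cidr": "10.245.0.0/16"},
]
bad = [
    {"name": "cluster-a", "id": 1, "pod_cidr": "10.244.0.0/16"},
    {"name": "cluster-b", "id": 1, "pod_cidr": "10.244.1.0/24"},
]
```

Here `check_prerequisites(good)` returns an empty list, while `bad` is flagged for the duplicate ID and for the overlapping Pod CIDRs.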
4.2 Setup Steps
Step 1: Enable ClusterMesh on each cluster
cilium clustermesh enable --service-type LoadBalancer
Step 2: Connect clusters
cilium clustermesh connect --destination-context ctx-cluster-b
Step 3: Verify status
cilium clustermesh status
Example output:
Cluster Connections:
cluster-b:
connected: true
endpoints: 24
identities: 42
services: 8
5. KVStoreMesh: Scalability Enhancement
5.1 KVStoreMesh Architecture
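In large ClusterMesh environments, every Agent opens its own connection to the etcd of each remote cluster, so the load on those etcd instances grows with the total node count. KVStoreMesh removes this fan-out by caching remote state locally: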
Without KVStoreMesh:
All Agents in Cluster A -> Cluster B etcd (direct)
N nodes x M clusters = N*M connections
With KVStoreMesh:
Cluster A KVStoreMesh -> Cluster B etcd (single connection)
All Agents in Cluster A -> Local KVStoreMesh cache
Remote connections drop from N*M to M (one per remote cluster)
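The difference is easy to quantify. The sketch below only restates the arithmetic above, assuming a single KVStoreMesh connection per remote cluster as described; both function names are made up for this example.

```python
def remote_connections_direct(nodes: int, remote_clusters: int) -> int:
    """Every agent connects to every remote cluster's etcd."""
    return nodes * remote_clusters


def remote_connections_kvstoremesh(remote_clusters: int) -> int:
    """One KVStoreMesh connection per remote cluster; agents read locally."""
    return remote_clusters


# Example: a 100-node cluster meshed with 3 remote clusters.
direct = remote_connections_direct(nodes=100, remote_clusters=3)
meshed = remote_connections_kvstoremesh(remote_clusters=3)
```

For this example the direct model needs 300 remote connections and the KVStoreMesh model needs 3, and the second number stays constant as nodes are added.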
5.2 KVStoreMesh Operation
Data flow from remote Cluster B:
Cluster B: clustermesh-apiserver (etcd)
|
v (single connection)
Cluster A: KVStoreMesh
|
v (replicate data to local cache)
Cluster A: Local etcd (clustermesh-apiserver cache)
|
v
Cluster A: Cilium Agent (each node)
- Read from local data source
- No direct remote etcd connection needed
6. External Workloads
6.1 Overview
The external workloads feature allows the Cilium Agent to be installed on VMs or bare-metal servers so that they can join the Kubernetes cluster's network:
Kubernetes Cluster
+---------------------------+
| Pod A (10.244.1.5) |
| Pod B (10.244.1.6) |
| Cilium Agent (per node) |
+---------------------------+
|
v (ClusterMesh connection)
+---------------------------+
| External VM (192.168.1.100)|
| Cilium Agent installed |
| - Same policies applied |
| - Same Identity assigned |
| - Service access available |
+---------------------------+
6.2 External Workload Setup
Step 1: Enable external workload support on cluster
cilium clustermesh vm create my-vm --ipv4-alloc-cidr 10.192.1.0/24
Step 2: Generate install script for VM
cilium clustermesh vm install install-external-workload.sh
Step 3: Execute script on VM
The script:
- Installs Cilium Agent
- Configures cluster connection
- Sets up certificates
- Starts Agent
6.3 Identity for External Workloads
External VMs receive Identities through the same label-based mechanism as Kubernetes Pods:
VM labels:
app: legacy-app
env: production
Identity assigned: 59102
Policy enforcement:
- Control communication from cluster Pods to VM
- Control communication from VM to cluster Pods
- Same policy model based on Identity
7. Failure Handling and High Availability
7.1 Behavior During Cluster Failure
Scenario: Cluster B goes completely down
Cluster A behavior:
1. Detect clustermesh-apiserver connection loss
2. Remove Cluster B backends from service maps
3. Route new connections to Cluster A backends only
4. Existing connections cleaned up after timeout
With service affinity "local":
- Cluster B failure has no impact on Cluster A
- Only Cluster A backends were being used
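Steps 2 and 3 of the scenario amount to filtering the failed cluster out of each service's backend list. The following toy sketch illustrates that step as described here; `drop_cluster_backends` is a hypothetical helper and does not model connection tracking or timeouts.

```python
from typing import Dict, List, Tuple

Backend = Tuple[str, str]  # (cluster name, pod IP)


def drop_cluster_backends(
    service_map: Dict[str, List[Backend]],
    failed_cluster: str,
) -> Dict[str, List[Backend]]:
    """Return a copy of the service map without the failed cluster's backends."""
    return {
        service: [b for b in backends if b[0] != failed_cluster]
        for service, backends in service_map.items()
    }


service_map = {
    "api-service:80": [
        ("cluster-a", "10.244.1.5"),
        ("cluster-a", "10.244.1.6"),
        ("cluster-b", "10.245.1.5"),
        ("cluster-b", "10.245.1.6"),
    ]
}
after_failure = drop_cluster_backends(service_map, "cluster-b")
```

New connections in Cluster A are then balanced across the two remaining local backends only.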
7.2 Network Partition Response
During network partition:
1. Remote cluster connection timeout
2. Mark remote backends as "unreachable"
3. Prefer local backends
4. Automatic reconnection and state sync on network recovery
8. Monitoring and Troubleshooting
8.1 Status Commands
ClusterMesh connection status
cilium clustermesh status
Endpoints synced from remote clusters
cilium endpoint list --selector "reserved:remote-node"
Global services
cilium service list
Remote Identities
cilium identity list | grep "cluster-b"
8.2 Debugging Commands
ClusterMesh related logs
cilium-agent --debug
Remote cluster connection state
cilium status --verbose | grep -A 10 "ClusterMesh"
Cross-cluster service backends
cilium bpf lb list | grep "global"
Remote ipcache entries
cilium bpf ipcache list | grep "cluster-b"
8.3 Common Issues
Issue: Clusters cannot connect
Checks:
1. Verify cluster-id is unique
2. Verify Pod CIDRs do not overlap
3. Verify clustermesh-apiserver is externally accessible
4. Verify TLS certificates are correct
Issue: Global service missing remote backends
Checks:
1. Same service name/namespace on both clusters
2. io.cilium/global-service annotation present
3. Check backend list with cilium service list
Issue: Cross-cluster policy not applied
Checks:
1. Verify Identities are syncing correctly
2. Verify cluster name label used correctly in policy
3. Check remote Identities with cilium identity list
Summary
Cilium ClusterMesh provides multi-cluster networking through these core principles:
- **Distributed Architecture**: Each cluster operates independently with no central control plane
- **Identity Synchronization**: Shared security Identities across clusters for consistent policy enforcement
- **Global Services**: Service discovery and load balancing across multiple clusters
- **KVStoreMesh**: Scalability optimization for large environments
- **External Workloads**: Integration of VMs/bare-metal servers into the Kubernetes network
- **Failure Isolation**: Failures are isolated between clusters, maintaining overall system stability