Author: Youngju Kim (@fjvbn20031)
Cilium Architecture Internals: The Core of eBPF-Based Networking
Overview
Cilium is a CNI plugin that leverages eBPF (extended Berkeley Packet Filter) to provide high-performance networking, security, and observability in Kubernetes environments. This post provides a deep analysis of the core components that make up Cilium's internal architecture.
1. Cilium Agent: The Node-Level Core Engine
1.1 Agent Overview
The Cilium Agent is the core component that runs as a DaemonSet on each Kubernetes node. Its primary responsibilities include:
- Endpoint Management: Configuring network interfaces when Pods are created/deleted
- Policy Compilation: Translating CiliumNetworkPolicy into eBPF programs
- eBPF Program Loading: Loading compiled BPF programs into the kernel
- Identity Management: Allocating and tracking security label-based Identities
- IPAM: Node-level IP address management
- State Synchronization: Syncing state with KVStore or CRDs
1.2 Agent Internal Structure
The Cilium Agent is composed of several subsystems:
```go
// Key subsystems of the Cilium Agent (conceptual structure)
type Daemon struct {
	// Endpoint management
	endpointManager *endpoint.EndpointManager
	// Policy repository
	policy *policy.Repository
	// Identity allocator
	identityAllocator *cache.CachingIdentityAllocator
	// IPAM
	ipam *ipam.IPAM
	// Datapath management
	datapath datapath.Datapath
	// KVStore client
	kvStore kvstore.BackendOperations
	// Monitoring
	monitorAgent monitorAgent.Agent
}
```
1.3 Agent Startup Process
When the Agent starts, initialization proceeds in the following order:
- Load Configuration: Read settings from ConfigMap or command-line arguments
- KVStore Connection: Connect to etcd or CRD-based KVStore
- IPAM Initialization: Set up IP address pools
- BPF Map Initialization: Create or restore required BPF maps
- Restore Existing Endpoints: Restore existing endpoint state on restart
- Load Policies: Load and apply existing network policies
- Start Event Watching: Watch Kubernetes API server events
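The ordering above matters: each stage depends on state produced by the previous one, and a failure aborts the bootstrap. A minimal Go sketch of that sequencing (the step names and the `StartupOrder` function are illustrative, not Cilium's actual API):

```go
package main

import "fmt"

// initStep is one stage of the (simplified) agent bootstrap sequence.
type initStep struct {
	name string
	run  func() error
}

// StartupOrder runs the conceptual bootstrap stages in order and returns
// the names of the stages that completed. A failed stage aborts bootstrap.
func StartupOrder() []string {
	steps := []initStep{
		{"load-config", func() error { return nil }},
		{"kvstore-connect", func() error { return nil }},
		{"ipam-init", func() error { return nil }},
		{"bpf-map-init", func() error { return nil }},
		{"restore-endpoints", func() error { return nil }},
		{"load-policies", func() error { return nil }},
		{"watch-events", func() error { return nil }},
	}
	completed := make([]string, 0, len(steps))
	for _, s := range steps {
		if err := s.run(); err != nil {
			break // later stages never run without their prerequisites
		}
		completed = append(completed, s.name)
	}
	return completed
}

func main() {
	fmt.Println(StartupOrder())
}
```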
```bash
# Check Agent status
cilium status --verbose

# Check Agent configuration
cilium config
```
1.4 Endpoint Management Details
The Cilium Agent is called through the CNI interface when Pods are created:
```bash
# Cilium CNI call flow on Pod creation:
# 1. kubelet requests Pod creation from the container runtime via CRI
# 2. The container runtime calls the CNI plugin (Cilium)
# 3. The Cilium CNI plugin makes an API call to the Agent
# 4. The Agent creates a veth pair, allocates an IP, and loads BPF programs

# List endpoints
cilium endpoint list

# Get detailed endpoint info
cilium endpoint get 12345
```
2. Cilium Operator: Cluster-Level Management
2.1 Operator Role
The Cilium Operator is a component that runs at the cluster level, deployed as a single instance (or multiple instances with leader election for HA).
Key responsibilities:
- IPAM Management: Cluster-wide IP pool management (Cluster Scope IPAM, AWS ENI, etc.)
- CRD Management: Garbage collection for CiliumNode, CiliumIdentity, etc.
- Node Discovery: New node registration and metadata management
- Ingress/Gateway Management: Processing Ingress and Gateway API resources
2.2 Role Separation Between Operator and Agent
```
+--------------------+        +---------------------+
|  Cilium Operator   |        |    Cilium Agent     |
|  (Cluster Scope)   |        |    (Node Scope)     |
+--------------------+        +---------------------+
| - Cluster IPAM     |        | - Endpoint mgmt     |
| - CRD GC           |        | - BPF program load  |
| - Node discovery   |        | - Policy enforce    |
| - Ingress handling |        | - Conntrack mgmt    |
| - Identity GC      |        | - Identity alloc    |
+--------------------+        +---------------------+
```
2.3 Operator Configuration Example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cilium-operator
  namespace: kube-system
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    spec:
      containers:
      - name: cilium-operator
        image: quay.io/cilium/operator-generic:v1.16.0
        args:
        - --config-dir=/tmp/cilium/config-map
        - --cluster-pool-ipv4-cidr=10.0.0.0/8
        - --cluster-pool-ipv4-mask-size=24
```
3. eBPF Datapath: Kernel-Level Network Processing
3.1 eBPF Program Hook Points
Cilium attaches eBPF programs to several Linux kernel hook points.
tc (Traffic Control) Hooks
The primary hook point, attached at the ingress/egress of each network interface:
```
Packet receive path:
NIC -> driver [XDP] -> [tc ingress] -> network stack

Packet transmit path:
network stack -> [tc egress] -> driver -> NIC
```
Cilium BPF program attachment points:
- lxc* (veth): from-container, to-container
- cilium_host: to-host, from-host
- cilium_net: from-host (overlay)
- eth0 (physical): from-netdev, to-netdev
XDP (eXpress Data Path)
The fastest packet processing path, operating at the network driver level:
```bash
# Check whether an XDP program is attached
ip link show dev eth0
# eth0: ... xdp/id:42 ...

# List loaded BPF programs
bpftool prog list
```
Socket-Level Hooks
Intercept socket system calls to perform service load balancing:
```
connect() -> [BPF sock_ops] -> translate service IP to backend IP
sendmsg() -> [BPF sk_msg]   -> redirect messages between sockets
```
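The key idea is that the rewrite happens at connect time, before any packet exists, so the service VIP never appears on the wire. A minimal Go model of that translation (the map layout mirrors the cilium_lb4_services/cilium_lb4_backends split only conceptually, and the modulo backend selection is a deliberate simplification):

```go
package main

import "fmt"

// backend is one service backend (IP and port), analogous to an entry
// in a backends map like cilium_lb4_backends.
type backend struct {
	ip   string
	port uint16
}

// serviceMap models a services map: service VIP:port -> backend list.
var serviceMap = map[string][]backend{
	"10.96.0.1:443": {{"10.244.1.10", 6443}, {"10.244.2.11", 6443}},
}

// translate mimics what the socket-level BPF program does at connect():
// if the destination is a known service, pick a backend and rewrite the
// destination address. Non-service destinations pass through untouched.
func translate(dst string, hash uint32) (string, bool) {
	backends, ok := serviceMap[dst]
	if !ok || len(backends) == 0 {
		return dst, false
	}
	b := backends[int(hash)%len(backends)] // simplistic backend selection
	return fmt.Sprintf("%s:%d", b.ip, b.port), true
}

func main() {
	out, rewritten := translate("10.96.0.1:443", 7)
	fmt.Println(out, rewritten)
}
```

Real Cilium selects backends with its own algorithms (e.g. random or Maglev hashing); the point here is only the VIP-to-backend rewrite.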
3.2 BPF Program Compilation Pipeline
```
C source code (bpf/*.c)
          |
          v
LLVM/Clang (BPF backend)
          |
          v
BPF ELF object file (.o)
          |
          v
BPF loader (bpf syscall)
          |
          v
Kernel BPF verifier
          |
          v
JIT compilation -> native machine code
          |
          v
Execution in the kernel
```
3.3 Key BPF Program Files
```
bpf/
  bpf_lxc.c       # Pod (endpoint) datapath
  bpf_host.c      # Host network datapath
  bpf_overlay.c   # Overlay (VXLAN/Geneve) datapath
  bpf_network.c   # Physical network interface
  bpf_xdp.c       # XDP program
  bpf_sock.c      # Socket-level programs
  lib/
    common.h      # Common headers and macros
    maps.h        # BPF map definitions
    policy.h      # Policy-related functions
    conntrack.h   # Connection tracking
    nat.h         # NAT engine
    lb.h          # Load balancer
```
4. BPF Maps: Datapath State Storage
4.1 Key BPF Map Types
Cilium uses various BPF map types:
| Map Name | Type | Purpose |
|---|---|---|
| cilium_ipcache | LPM Trie | IP to Identity/tunnel info mapping |
| cilium_policy | Hash | Identity-based policy lookup |
| cilium_ct4_global | Hash (LRU) | IPv4 connection tracking |
| cilium_ct6_global | Hash (LRU) | IPv6 connection tracking |
| cilium_lb4_services | Hash | IPv4 service load balancer |
| cilium_lb4_backends | Hash | IPv4 backend info |
| cilium_snat_v4_external | Hash (LRU) | IPv4 SNAT mapping |
| cilium_tunnel_map | Hash | Tunnel endpoint mapping |
| cilium_lxc | Hash | Endpoint info |
| cilium_signals | Perf Event Array | Event signaling |
4.2 BPF Map Inspection Commands
```bash
# List all BPF maps
bpftool map list

# Inspect specific map contents (ipcache)
cilium bpf ipcache list

# Connection tracking table
cilium bpf ct list global

# Service load balancer table
cilium bpf lb list

# Check policy map
cilium bpf policy get 12345

# Check tunnel map
cilium bpf tunnel list
```
4.3 LPM Trie: The Core of CIDR Policies
LPM (Longest Prefix Match) Trie is used for CIDR-based policy matching:
```
IP Cache structure:
10.0.0.0/8     -> Identity: 2   (world)
10.244.0.0/16  -> Identity: 100 (cluster)
10.244.1.0/24  -> Identity: 200 (specific namespace)
10.244.1.5/32  -> Identity: 300 (specific Pod)

Lookup: 10.244.1.5 -> Identity 300 (the longest, i.e. most specific, prefix wins)
```
4.4 LRU Hash: Connection Tracking
LRU (Least Recently Used) Hash maps are used for connection tracking. When the map is full, the oldest entries are automatically evicted.
```bash
# conntrack entry example
cilium bpf ct list global
# SRC: 10.244.1.5:34567 DST: 10.96.0.1:443
# Flags: rx+tx closing
# Lifetime: 120s
# RxPackets: 42 RxBytes: 3456
# TxPackets: 38 TxBytes: 2890
```
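The important property of the LRU map is that inserting a new flow into a full table silently evicts the stalest one instead of failing. A small Go model of that behavior (the `lruCT` type is illustrative; the kernel implements this inside the BPF_MAP_TYPE_LRU_HASH map itself):

```go
package main

import "fmt"

// ctKey identifies one tracked flow (simplified to a src/dst pair).
type ctKey struct{ src, dst string }

// lruCT models the eviction behavior of an LRU hash map such as
// cilium_ct4_global: when full, the least recently used flow is dropped.
type lruCT struct {
	cap   int
	order []ctKey          // least recently used first
	flows map[ctKey]uint64 // flow -> packet count
}

func newLRUCT(capacity int) *lruCT {
	return &lruCT{cap: capacity, flows: map[ctKey]uint64{}}
}

// Touch records activity on a flow, inserting it if new and evicting
// the least recently used entry when the table is at capacity.
func (c *lruCT) Touch(k ctKey) {
	if _, ok := c.flows[k]; ok {
		// move the flow to the most-recently-used position
		for i, e := range c.order {
			if e == k {
				c.order = append(append(c.order[:i:i], c.order[i+1:]...), k)
				break
			}
		}
	} else {
		if len(c.flows) >= c.cap {
			oldest := c.order[0] // eviction: drop the stalest flow
			c.order = c.order[1:]
			delete(c.flows, oldest)
		}
		c.order = append(c.order, k)
	}
	c.flows[k]++
}

func main() {
	c := newLRUCT(2)
	c.Touch(ctKey{"10.244.1.5:34567", "10.96.0.1:443"})
	c.Touch(ctKey{"10.244.1.6:40000", "10.96.0.1:443"})
	c.Touch(ctKey{"10.244.1.5:34567", "10.96.0.1:443"}) // refresh first flow
	c.Touch(ctKey{"10.244.1.7:50000", "10.96.0.1:443"}) // evicts the second flow
	fmt.Println(len(c.flows))
}
```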
5. Identity Allocation: Label-Based Security Model
5.1 Identity Allocation Mechanism
Cilium's security model is founded on label-based Identity:
```
Pod labels:
  app: frontend
  env: production
  team: web
        |
        v  (extract security-relevant labels)
Security-relevant labels:
  k8s:app=frontend
  k8s:io.kubernetes.pod.namespace=default
        |
        v  (allocate a numeric identity for this label set)
Identity: 48291 (numeric identifier)
```
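The allocation keyed on the canonical label set is what makes identities cluster-wide: any node that asks for the same sorted labels gets the same number back. A Go sketch of that invariant (the `allocator` type is illustrative; Cilium backs this with the KVStore or CiliumIdentity CRDs, not an in-memory counter):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allocator hands out numeric identities keyed on the canonical
// (sorted) security-relevant label set: same labels -> same identity.
type allocator struct {
	next uint32
	ids  map[string]uint32
}

// key canonicalizes a label set so that label ordering does not matter.
func key(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

// Allocate returns the identity for a label set, creating one if unseen.
func (a *allocator) Allocate(labels map[string]string) uint32 {
	k := key(labels)
	if id, ok := a.ids[k]; ok {
		return id
	}
	a.next++
	a.ids[k] = a.next
	return a.next
}

func main() {
	a := &allocator{next: 1000, ids: map[string]uint32{}}
	frontend := map[string]string{
		"k8s:app":                         "frontend",
		"k8s:io.kubernetes.pod.namespace": "default",
	}
	// The same label set yields the same identity, however often asked.
	fmt.Println(a.Allocate(frontend), a.Allocate(frontend))
}
```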
5.2 Determining Security-Relevant Labels
Not all labels contribute to an Identity. By default, Cilium filters the Pod's label set down to a security-relevant subset; the commands below show which labels are included and which are filtered out:
```bash
# List Identities
cilium identity list

# Get specific Identity details
cilium identity get 48291

# Labels included in the Identity:
# - k8s:app=...
# - k8s:io.kubernetes.pod.namespace=...
# - k8s:io.cilium.k8s.policy.cluster=...

# Labels excluded from the Identity:
# - k8s:controller-revision-hash=...
# - k8s:pod-template-hash=...
# - k8s:pod-template-generation=...
```
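The rationale for the exclusions is churn: controller-managed labels like pod-template-hash change on every rollout, and keeping them would mint a fresh identity per ReplicaSet. A Go sketch of the filtering step (the prefix list is the illustrative subset shown above, not Cilium's full default filter):

```go
package main

import (
	"fmt"
	"strings"
)

// excludedPrefixes lists label keys filtered out before identity
// computation; churn-prone controller labels would otherwise create
// a new identity on every rollout. (Illustrative subset.)
var excludedPrefixes = []string{
	"k8s:controller-revision-hash",
	"k8s:pod-template-hash",
	"k8s:pod-template-generation",
}

// securityRelevant drops excluded labels, keeping only those that
// should contribute to the endpoint's identity.
func securityRelevant(labels map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range labels {
		skip := false
		for _, p := range excludedPrefixes {
			if strings.HasPrefix(k, p) {
				skip = true
				break
			}
		}
		if !skip {
			out[k] = v
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"k8s:app":               "frontend",
		"k8s:pod-template-hash": "7d9c5b",
	}
	fmt.Println(securityRelevant(labels)) // only the app label survives
}
```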
5.3 Identity Synchronization
Identities are synchronized cluster-wide through the KVStore:
```
Node A: Pod created -> request Identity based on labels
        |
        v
KVStore (CRD or etcd)
        |
        v  (Identity allocation/lookup)
CiliumIdentity CRD:
  metadata:
    name: "48291"
  security-labels:
    k8s:app: frontend
    k8s:io.kubernetes.pod.namespace: default
        |
        v
Node B: update IP-to-Identity mapping in its ipcache
```
5.4 Reserved Identities
Cilium defines special-purpose reserved Identities:
| Identity | Numeric Value | Meaning |
|---|---|---|
| unknown | 0 | Unknown source |
| host | 1 | Local host |
| world | 2 | Outside the cluster |
| unmanaged | 3 | Endpoint not managed by Cilium |
| health | 4 | Health check endpoint |
| init | 5 | Endpoint being initialized |
| remote-node | 6 | Remote node |
| kube-apiserver | 7 | Kubernetes API server |
| ingress | 8 | Ingress |
6. Endpoint Lifecycle
6.1 Endpoint Creation
```
1. Receive Pod creation event
        |
        v
2. CNI ADD call -> create veth pair
        |
        v
3. Allocate IP address (IPAM)
        |
        v
4. Allocate Identity
        |
        v
5. Compute policies (Identity-based)
        |
        v
6. Compile BPF programs
   (policy + Identity + config -> customized BPF code)
        |
        v
7. Load BPF programs (tc ingress/egress)
        |
        v
8. Update BPF maps
   (ipcache, endpoint map, policy map)
        |
        v
9. Endpoint state: Ready
```
6.2 Endpoint Regeneration
When policies change, BPF programs for affected endpoints are recompiled:
```bash
# Endpoint regeneration trigger reasons:
# - Network policy change
# - Identity change
# - Cilium configuration change
# - IP change for an FQDN referenced in policies

# Monitor regeneration status
cilium endpoint list
# ID     IDENTITY   POLICY   ENDPOINT STATUS
# 1234   48291      OK       ready
# 1235   48292      OK       regenerating   <-- being regenerated
```
6.3 Endpoint Deletion
```
1. Receive Pod deletion event
        |
        v
2. CNI DEL call
        |
        v
3. Detach and remove BPF programs
        |
        v
4. Remove related entries from BPF maps
        |
        v
5. Delete veth pair
        |
        v
6. Return IP address to IPAM
        |
        v
7. Clean up endpoint metadata
```
6.4 Endpoint Status Inspection
```bash
# Detailed endpoint info
cilium endpoint get 1234

# Endpoint health check
cilium endpoint health 1234

# BPF programs per endpoint
cilium bpf prog list

# Endpoint logs
cilium endpoint log 1234
```
7. State Management and Restoration
7.1 KVStore Backends
Cilium supports two KVStore backends:
CRD-based (default):
- Uses CRDs such as CiliumIdentity, CiliumEndpoint, and CiliumNode
- No etcd dependency
- State is managed through the Kubernetes API server

etcd-based:
- Requires a separate etcd cluster
- Performance advantage in large clusters
- Required for ClusterMesh (clustermesh-apiserver)
7.2 State Restoration on Agent Restart
```
Agent restart
        |
        v
Restore BPF maps (maps persist in the kernel)
        |
        v
Discover existing endpoints (scan veth interfaces)
        |
        v
Restore endpoint state (load from CRDs)
        |
        v
Warm the Identity cache
        |
        v
Recompute and apply policies
        |
        v
Resume normal operation
```
Even when the Agent restarts, BPF maps and programs persist in the kernel, so the datapath continues to operate without interruption. This ensures network connectivity is maintained during Agent upgrades.
8. Architecture Debugging Tools
8.1 Essential Debugging Commands
```bash
# Full status check
cilium status --verbose

# BPF program state
cilium bpf prog list

# Datapath configuration check
cilium debuginfo

# Monitoring (real-time packet events)
cilium monitor --type trace
cilium monitor --type drop
cilium monitor --type policy-verdict

# Metrics check
cilium metrics list
```
8.2 Troubleshooting Checklist
- Check Agent/Operator status with `cilium status`
- Check endpoint state with `cilium endpoint list` (ready/not-ready/regenerating)
- Verify IP-to-Identity mapping with `cilium bpf ipcache list`
- Analyze dropped packet causes with `cilium monitor --type drop`
- Verify applied policies with `cilium policy get`
- Check connection tracking state with `cilium bpf ct list global`
Summary
Cilium's architecture is built on the following core design principles:
- Kernel-Level Processing: Processing network packets directly in the kernel via eBPF for high performance
- Identity-Based Security: Applying policies based on label-based Identity instead of IP addresses, suitable for dynamic environments
- Declarative Management: Declarative state management through Kubernetes CRDs
- Zero-Downtime Updates: BPF programs and maps persist in the kernel, maintaining the datapath during Agent restarts
- Role Separation: Clear separation of responsibilities between Agent (node-level) and Operator (cluster-level)