Cilium Architecture Internals: The Core of eBPF-Based Networking

Overview

Cilium is a CNI plugin that leverages eBPF (extended Berkeley Packet Filter) to provide high-performance networking, security, and observability in Kubernetes environments. This post provides a deep analysis of the core components that make up Cilium's internal architecture.

1. Cilium Agent: The Node-Level Core Engine

1.1 Agent Overview

The Cilium Agent is the core component that runs as a DaemonSet on each Kubernetes node. Its primary responsibilities include:

  • Endpoint Management: Configuring network interfaces when Pods are created/deleted
  • Policy Compilation: Translating CiliumNetworkPolicy into eBPF programs
  • eBPF Program Loading: Loading compiled BPF programs into the kernel
  • Identity Management: Allocating and tracking security label-based Identities
  • IPAM: Node-level IP address management
  • State Synchronization: Syncing state with KVStore or CRDs

1.2 Agent Internal Structure

The Cilium Agent is composed of several subsystems:

// Key subsystems of the Cilium Agent (conceptual structure)
type Daemon struct {
    // Endpoint management
    endpointManager  *endpoint.EndpointManager
    // Policy repository
    policy           *policy.Repository
    // Identity allocator
    identityAllocator *cache.CachingIdentityAllocator
    // IPAM
    ipam             *ipam.IPAM
    // Datapath management
    datapath         datapath.Datapath
    // KVStore client
    kvStore          kvstore.BackendOperations
    // Monitoring
    monitorAgent     monitorAgent.Agent
}

1.3 Agent Startup Process

When the Agent starts, initialization proceeds in the following order:

  1. Load Configuration: Read settings from ConfigMap or command-line arguments
  2. KVStore Connection: Connect to etcd or CRD-based KVStore
  3. IPAM Initialization: Set up IP address pools
  4. BPF Map Initialization: Create or restore required BPF maps
  5. Restore Existing Endpoints: Restore existing endpoint state on restart
  6. Load Policies: Load and apply existing network policies
  7. Start Event Watching: Watch Kubernetes API server events

# Check Agent status
cilium status --verbose

# Check Agent configuration
cilium config

1.4 Endpoint Management Details

When a Pod is created, the Cilium Agent is invoked through the CNI interface:

# Cilium CNI call flow on Pod creation:
# 1. kubelet requests Pod creation from container runtime via CRI
# 2. Container runtime calls CNI plugin (Cilium)
# 3. Cilium CNI makes API call to Agent
# 4. Agent creates veth pair, allocates IP, loads BPF programs

# List endpoints
cilium endpoint list

# Get detailed endpoint info
cilium endpoint get 12345

2. Cilium Operator: Cluster-Level Management

2.1 Operator Role

The Cilium Operator is a component that runs at the cluster level, deployed as a single instance (or multiple instances with leader election for HA).

Key responsibilities:

  • IPAM Management: Cluster-wide IP pool management (Cluster Scope IPAM, AWS ENI, etc.)
  • CRD Management: Garbage collection for CiliumNode, CiliumIdentity, etc.
  • Node Discovery: New node registration and metadata management
  • Ingress/Gateway Management: Processing Ingress and Gateway API resources

2.2 Role Separation Between Operator and Agent

+------------------+      +-------------------+
|  Cilium Operator |      |   Cilium Agent    |
|  (Cluster Scope) |      |   (Node Scope)    |
+------------------+      +-------------------+
| - Cluster IPAM   |      | - Endpoint mgmt   |
| - CRD GC         |      | - BPF prog load   |
| - Node discovery |      | - Policy enforce  |
| - Ingress handle |      | - Conntrack mgmt  |
| - Identity GC    |      | - Identity alloc  |
+------------------+      +-------------------+

2.3 Operator Configuration Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cilium-operator
  namespace: kube-system
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    spec:
      containers:
        - name: cilium-operator
          image: quay.io/cilium/operator-generic:v1.16.0
          args:
            - --config-dir=/tmp/cilium/config-map
            - --cluster-pool-ipv4-cidr=10.0.0.0/8
            - --cluster-pool-ipv4-mask-size=24

3. eBPF Datapath: Kernel-Level Network Processing

3.1 eBPF Program Hook Points

Cilium attaches eBPF programs to several Linux kernel hook points.

tc (Traffic Control) Hooks

The primary hook point, attached at the ingress/egress of each network interface:

Packet path (received into the stack, then transmitted back out):
NIC -> [XDP] -> Driver -> [tc ingress] -> Network Stack -> [tc egress] -> NIC

Cilium BPF program attachment points:
- lxc* (veth): from-container, to-container
- cilium_host: to-host, from-host
- cilium_net: from-host (overlay)
- eth0 (physical): from-netdev, to-netdev

XDP (eXpress Data Path)

The fastest packet processing path, operating at the network driver level:

# Check XDP program
ip link show dev eth0
# eth0: ... xdp/id:42 ...

# List BPF programs
bpftool prog list

Socket-Level Hooks

Intercept socket system calls to perform service load balancing:

connect() -> [BPF sock_ops] -> Translate service IP to backend IP
sendmsg() -> [BPF sk_msg] -> Redirect messages between sockets

3.2 BPF Program Compilation Pipeline

C source code (bpf/*.c)
    |
    v
LLVM/Clang (BPF backend)
    |
    v
BPF ELF object file (.o)
    |
    v
BPF loader (bpf syscall)
    |
    v
Kernel BPF verifier
    |
    v
JIT compilation -> Native machine code
    |
    v
Execute in kernel

3.3 Key BPF Program Files

bpf/
  bpf_lxc.c         # Pod (endpoint) datapath
  bpf_host.c        # Host network datapath
  bpf_overlay.c     # Overlay (VXLAN/Geneve) datapath
  bpf_network.c     # Physical network interface
  bpf_xdp.c         # XDP program
  bpf_sock.c        # Socket-level programs
  lib/
    common.h        # Common headers and macros
    maps.h          # BPF map definitions
    policy.h        # Policy-related functions
    conntrack.h     # Connection tracking
    nat.h           # NAT engine
    lb.h            # Load balancer

4. BPF Maps: Datapath State Storage

4.1 Key BPF Map Types

Cilium uses various BPF map types:

Map Name                  Type              Purpose
cilium_ipcache            LPM Trie          IP to Identity/tunnel info mapping
cilium_policy             Hash              Identity-based policy lookup
cilium_ct4_global         Hash (LRU)        IPv4 connection tracking
cilium_ct6_global         Hash (LRU)        IPv6 connection tracking
cilium_lb4_services       Hash              IPv4 service load balancer
cilium_lb4_backends       Hash              IPv4 backend info
cilium_snat_v4_external   Hash (LRU)        IPv4 SNAT mapping
cilium_tunnel_map         Hash              Tunnel endpoint mapping
cilium_lxc                Hash              Endpoint info
cilium_signals            Perf Event Array  Event signaling

4.2 BPF Map Inspection Commands

# List all BPF maps
bpftool map list

# Inspect specific map contents (ipcache)
cilium bpf ipcache list

# Connection tracking table
cilium bpf ct list global

# Service load balancer table
cilium bpf lb list

# Check policy map
cilium bpf policy get 12345

# Check tunnel map
cilium bpf tunnel list

4.3 LPM Trie: The Core of CIDR Policies

LPM (Longest Prefix Match) Trie is used for CIDR-based policy matching:

IP Cache structure:
10.0.0.0/8    -> Identity: 1 (world)
10.244.0.0/16 -> Identity: 100 (cluster)
10.244.1.0/24 -> Identity: 200 (specific namespace)
10.244.1.5/32 -> Identity: 300 (specific Pod)

Lookup: 10.244.1.5 -> Identity 300 (most specific match)

4.4 LRU Hash: Connection Tracking

LRU (Least Recently Used) Hash maps are used for connection tracking. When the map is full, the oldest entries are automatically evicted.

# conntrack entry example
cilium bpf ct list global
# SRC: 10.244.1.5:34567 DST: 10.96.0.1:443
# Flags: rx+tx closing
# Lifetime: 120s
# RxPackets: 42 RxBytes: 3456
# TxPackets: 38 TxBytes: 2890
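That eviction behavior can be sketched with a toy fixed-capacity LRU map: once full, a new flow evicts the least recently touched one, and traffic on an existing flow refreshes its recency.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruMap is a toy fixed-capacity LRU map showing how cilium_ct* maps
// behave: inserting into a full map evicts the least recently used flow.
type lruMap struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // flow key -> list element
}

func newLRUMap(capacity int) *lruMap {
	return &lruMap{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (m *lruMap) upsert(key string) {
	if el, ok := m.items[key]; ok {
		m.order.MoveToFront(el) // traffic on an existing flow refreshes it
		return
	}
	if len(m.items) == m.cap { // map full: evict the coldest flow
		oldest := m.order.Back()
		m.order.Remove(oldest)
		delete(m.items, oldest.Value.(string))
	}
	m.items[key] = m.order.PushFront(key)
}

func (m *lruMap) has(key string) bool { _, ok := m.items[key]; return ok }

func main() {
	ct := newLRUMap(2)
	ct.upsert("10.244.1.5:34567->10.96.0.1:443")
	ct.upsert("10.244.1.6:41000->10.96.0.1:443")
	ct.upsert("10.244.1.5:34567->10.96.0.1:443") // refreshed: most recent again
	ct.upsert("10.244.1.7:50000->10.96.0.1:443") // full: evicts the .1.6 flow
	fmt.Println(ct.has("10.244.1.6:41000->10.96.0.1:443")) // false
	fmt.Println(ct.has("10.244.1.5:34567->10.96.0.1:443")) // true
}
```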

5. Identity Allocation: Label-Based Security Model

5.1 Identity Allocation Mechanism

Cilium's security model is founded on label-based Identity:

Pod labels:
  app: frontend
  env: production
  team: web

  |
  v (extract security-relevant labels)

Security-relevant labels:
  k8s:app=frontend
  k8s:io.kubernetes.pod.namespace=default

  |
  v (allocate a numeric identity for this label set)

Identity: 48291 (cluster-wide numeric identifier)

5.2 Determining Security-Relevant Labels

Not all labels are used for Identity. By default, the following are considered security-relevant labels:

# List Identities
cilium identity list

# Get specific Identity details
cilium identity get 48291

# Labels included in Identity:
# - k8s:app=...
# - k8s:io.kubernetes.pod.namespace=...
# - k8s:io.cilium.k8s.policy.cluster=...

# Labels excluded from Identity:
# - k8s:controller-revision-hash=...
# - k8s:pod-template-hash=...
# - k8s:pod-template-generation=...
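The filtering above can be sketched as a function from pod labels to the sorted security-label set that feeds identity allocation (the exclusion list here is an illustrative subset of Cilium's default filter, which is configurable):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// excluded holds label keys stripped before identity computation
// (illustrative subset; the real filter is configurable).
var excluded = map[string]bool{
	"controller-revision-hash": true,
	"pod-template-hash":        true,
	"pod-template-generation":  true,
}

// securityLabels returns the sorted, k8s:-prefixed labels that would
// feed into identity allocation for a pod in the given namespace.
func securityLabels(ns string, podLabels map[string]string) []string {
	out := []string{"k8s:io.kubernetes.pod.namespace=" + ns}
	for k, v := range podLabels {
		if !excluded[k] {
			out = append(out, "k8s:"+k+"="+v)
		}
	}
	sort.Strings(out) // stable order: the same label set -> same identity
	return out
}

func main() {
	labels := map[string]string{
		"app":               "frontend",
		"pod-template-hash": "5d8c7b9f4", // churns per rollout: excluded
	}
	fmt.Println(strings.Join(securityLabels("default", labels), ", "))
}
```

Excluding high-churn labels like pod-template-hash is what keeps a Deployment rollout from minting a new identity for every ReplicaSet.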

5.3 Identity Synchronization

Identities are synchronized cluster-wide through the KVStore:

Node A: Pod created -> Request Identity based on labels
    |
    v
KVStore (CRD or etcd)
    |
    v (Identity allocation/lookup)
CiliumIdentity CRD:
  metadata:
    name: "48291"
  security-labels:
    k8s:app: frontend
    k8s:io.kubernetes.pod.namespace: default
    |
    v
Node B: Update IP-to-Identity mapping in ipcache

5.4 Reserved Identities

Cilium defines special-purpose reserved Identities:

Identity        Numeric Value  Meaning
unknown         0              Unknown source
host            1              Local host
world           2              Outside the cluster
unmanaged       3              Endpoint not managed by Cilium
health          4              Health check endpoint
init            5              Endpoint being initialized
remote-node     6              Remote node
kube-apiserver  7              Kubernetes API server
ingress         8              Ingress
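The reserved range can be sketched as a simple lookup table, separating it from the dynamically allocated identities that label sets produce:

```go
package main

import "fmt"

// reserved maps Cilium's fixed reserved identity values to their names,
// as listed in the table above.
var reserved = map[uint32]string{
	0: "unknown", 1: "host", 2: "world", 3: "unmanaged",
	4: "health", 5: "init", 6: "remote-node",
	7: "kube-apiserver", 8: "ingress",
}

// isReserved reports whether id is a fixed reserved identity rather
// than one dynamically allocated from security labels.
func isReserved(id uint32) bool {
	_, ok := reserved[id]
	return ok
}

func main() {
	fmt.Println(reserved[2], isReserved(2))
	fmt.Println(isReserved(48291)) // dynamically allocated, not reserved
}
```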

6. Endpoint Lifecycle

6.1 Endpoint Creation

1. Receive Pod creation event
   |
   v
2. CNI ADD call -> Create veth pair
   |
   v
3. Allocate IP address (IPAM)
   |
   v
4. Allocate Identity
   |
   v
5. Compute policies (Identity-based)
   |
   v
6. Compile BPF programs
   (policy + Identity + config -> customized BPF code)
   |
   v
7. Load BPF programs (tc ingress/egress)
   |
   v
8. Update BPF maps
   (ipcache, endpoint map, policy map)
   |
   v
9. Endpoint state: Ready

6.2 Endpoint Regeneration

When policies change, BPF programs for affected endpoints are recompiled:

# Endpoint regeneration trigger reasons:
# - Network policy change
# - Identity change
# - Cilium configuration change
# - IP change for FQDN referenced in policies

# Monitor regeneration status
cilium endpoint list
# ID    IDENTITY   POLICY   ENDPOINT STATUS
# 1234  48291      OK       ready
# 1235  48292      OK       regenerating  <-- regenerating
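Those states can be sketched as a small transition table (state names follow `cilium endpoint list` output; the agent's real state machine has more states and guards):

```go
package main

import "fmt"

// transitions lists, per state, which states an endpoint may move to
// (illustrative subset of the agent's endpoint state machine).
var transitions = map[string][]string{
	"waiting-for-identity": {"regenerating"},
	"regenerating":         {"ready", "not-ready"},
	"ready":                {"regenerating", "disconnecting"},
	"not-ready":            {"regenerating"},
	"disconnecting":        {"disconnected"},
}

// canTransition reports whether moving between the two states is valid.
func canTransition(from, to string) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("ready", "regenerating")) // e.g. on policy change
	fmt.Println(canTransition("ready", "disconnected")) // must disconnect first
}
```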

6.3 Endpoint Deletion

1. Receive Pod deletion event
   |
   v
2. CNI DEL call
   |
   v
3. Detach and remove BPF programs
   |
   v
4. Remove related entries from BPF maps
   |
   v
5. Delete veth pair
   |
   v
6. Return IP address (IPAM)
   |
   v
7. Clean up endpoint metadata

6.4 Endpoint Status Inspection

# Detailed endpoint info
cilium endpoint get 1234

# Endpoint health check
cilium endpoint health 1234

# List loaded BPF programs (via bpftool)
bpftool prog list

# Endpoint logs
cilium endpoint log 1234

7. State Management and Restoration

7.1 KVStore Backends

Cilium supports two KVStore backends:

# CRD-based (default)
# - Uses CRDs like CiliumIdentity, CiliumEndpoint, CiliumNode
# - No etcd dependency
# - State management through Kubernetes API server

# etcd-based
# - Requires a separate etcd cluster
# - Performance advantage in large clusters
# - Required for ClusterMesh (clustermesh-apiserver)

7.2 State Restoration on Agent Restart

Agent restart
    |
    v
Restore BPF maps (maps persist in kernel)
    |
    v
Discover existing endpoints (scan veth interfaces)
    |
    v
Restore endpoint state (load from CRDs)
    |
    v
Warm Identity cache
    |
    v
Recompute and apply policies
    |
    v
Resume normal operation

Even when the Agent restarts, BPF maps and programs persist in the kernel, so the datapath continues to operate without interruption. This ensures network connectivity is maintained during Agent upgrades.

8. Architecture Debugging Tools

8.1 Essential Debugging Commands

# Full status check
cilium status --verbose

# Loaded BPF programs
bpftool prog list

# Datapath configuration check
cilium debuginfo

# Monitoring (real-time packet events)
cilium monitor --type trace
cilium monitor --type drop
cilium monitor --type policy-verdict

# Metrics check
cilium metrics list

8.2 Troubleshooting Checklist

  1. Check Agent/Operator status with cilium status
  2. Check endpoint state with cilium endpoint list (ready/not-ready/regenerating)
  3. Verify IP-to-Identity mapping with cilium bpf ipcache list
  4. Analyze dropped packet causes with cilium monitor --type drop
  5. Verify applied policies with cilium policy get
  6. Check connection tracking state with cilium bpf ct list global

Summary

Cilium's architecture is built on the following core design principles:

  • Kernel-Level Processing: Processing network packets directly in the kernel via eBPF for high performance
  • Identity-Based Security: Applying policies based on label-based Identity instead of IP addresses, suitable for dynamic environments
  • Declarative Management: Declarative state management through Kubernetes CRDs
  • Zero-Downtime Updates: BPF programs and maps persist in the kernel, maintaining the datapath during Agent restarts
  • Role Separation: Clear separation of responsibilities between Agent (node-level) and Operator (cluster-level)