Cilium Architecture Internals: The Core of eBPF-Based Networking

Overview

Cilium is a CNI plugin that leverages eBPF (extended Berkeley Packet Filter) to provide high-performance networking, security, and observability in Kubernetes environments. This post provides a deep analysis of the core components that make up Cilium's internal architecture.

1. Cilium Agent: The Node-Level Core Engine

1.1 Agent Overview

The Cilium Agent is the core component that runs as a DaemonSet on each Kubernetes node. Its primary responsibilities include:

- **Endpoint Management**: Configuring network interfaces when Pods are created/deleted

- **Policy Compilation**: Translating CiliumNetworkPolicy into Identity-based rules that the eBPF programs enforce through policy maps

- **eBPF Program Loading**: Loading compiled BPF programs into the kernel

- **Identity Management**: Allocating and tracking security label-based Identities

- **IPAM**: Node-level IP address management

- **State Synchronization**: Syncing state with KVStore or CRDs

1.2 Agent Internal Structure

The Cilium Agent is composed of several subsystems:

// Key subsystems of the Cilium Agent (conceptual structure)
type Daemon struct {
    // Endpoint management
    endpointManager *endpoint.EndpointManager

    // Policy repository
    policy *policy.Repository

    // Identity allocator
    identityAllocator *cache.CachingIdentityAllocator

    // IPAM
    ipam *ipam.IPAM

    // Datapath management
    datapath datapath.Datapath

    // KVStore client
    kvStore kvstore.BackendOperations

    // Monitoring
    monitorAgent monitorAgent.Agent
}

1.3 Agent Startup Process

When the Agent starts, initialization proceeds in the following order:

1. **Load Configuration**: Read settings from ConfigMap or command-line arguments

2. **KVStore Connection**: Connect to etcd or CRD-based KVStore

3. **IPAM Initialization**: Set up IP address pools

4. **BPF Map Initialization**: Create or restore required BPF maps

5. **Restore Existing Endpoints**: Restore existing endpoint state on restart

6. **Load Policies**: Load and apply existing network policies

7. **Start Event Watching**: Watch Kubernetes API server events

Check Agent status

cilium status --verbose

Check Agent configuration

cilium config

1.4 Endpoint Management Details

The Cilium Agent is called through the CNI interface when Pods are created:

Cilium CNI call flow on Pod creation:

1. kubelet requests Pod creation from container runtime via CRI

2. Container runtime calls CNI plugin (Cilium)

3. Cilium CNI makes API call to Agent

4. Agent creates veth pair, allocates IP, loads BPF programs

List endpoints

cilium endpoint list

Get detailed endpoint info

cilium endpoint get 12345

2. Cilium Operator: Cluster-Level Management

2.1 Operator Role

The Cilium Operator is a component that runs at the cluster level, deployed as a single instance (or multiple instances with leader election for HA).

Key responsibilities:

- **IPAM Management**: Cluster-wide IP pool management (Cluster Scope IPAM, AWS ENI, etc.)

- **CRD Management**: Garbage collection for CiliumNode, CiliumIdentity, etc.

- **Node Discovery**: New node registration and metadata management

- **Ingress/Gateway Management**: Processing Ingress and Gateway API resources

2.2 Role Separation Between Operator and Agent

+-------------------+    +--------------------+
| Cilium Operator   |    | Cilium Agent       |
| (Cluster Scope)   |    | (Node Scope)       |
+-------------------+    +--------------------+
| - Cluster IPAM    |    | - Endpoint mgmt    |
| - CRD GC          |    | - BPF program load |
| - Node discovery  |    | - Policy enforce   |
| - Ingress handle  |    | - Conntrack mgmt   |
| - Identity GC     |    | - Identity alloc   |
+-------------------+    +--------------------+

2.3 Operator Configuration Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cilium-operator
  namespace: kube-system
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    spec:
      containers:
        - name: cilium-operator
          image: quay.io/cilium/operator-generic:v1.16.0
          args:
            - --config-dir=/tmp/cilium/config-map
            - --cluster-pool-ipv4-cidr=10.0.0.0/8
            - --cluster-pool-ipv4-mask-size=24

3. eBPF Datapath: Kernel-Level Network Processing

3.1 eBPF Program Hook Points

Cilium attaches eBPF programs to several Linux kernel hook points.

tc (Traffic Control) Hooks

The primary hook point, attached at the ingress/egress of each network interface:

Packet receive flow:

NIC -> Driver [XDP] -> [tc ingress] -> Network Stack -> [tc egress] -> NIC

Cilium BPF program attachment points:

- lxc* (veth): from-container, to-container

- cilium_host: to-host, from-host

- cilium_net: from-host (overlay)

- eth0 (physical): from-netdev, to-netdev

XDP (eXpress Data Path)

The fastest packet processing path, operating at the network driver level:

Check XDP program

ip link show dev eth0

eth0: ... xdp/id:42 ...

List BPF programs

bpftool prog list

Socket-Level Hooks

Intercept socket system calls to perform service load balancing:

connect() -> [BPF cgroup/connect4] -> Translate service IP to backend IP

sendmsg() -> [BPF sk_msg] -> Redirect messages between sockets
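The idea behind connect-time translation can be pictured in user-space terms: if the destination is a known service frontend, swap it for one of its backends before the connection is established, so no per-packet NAT is needed afterwards. The sketch below is only an analogy written in Go. The `addrPort` and `serviceTable` types, the `selector` argument, and the addresses are all hypothetical, and the real logic runs as a BPF program in the kernel.

```go
package main

import "fmt"

// addrPort is a hypothetical stand-in for the address a socket connects to.
type addrPort struct {
	IP   string
	Port uint16
}

// serviceTable maps a service frontend to its backends (illustrative data).
type serviceTable map[addrPort][]addrPort

// translate returns a backend for dst when dst is a known service frontend.
// selector stands in for whatever backend selection the datapath uses.
func (t serviceTable) translate(dst addrPort, selector uint32) addrPort {
	backends := t[dst]
	if len(backends) == 0 {
		return dst // not a service: leave the destination untouched
	}
	return backends[selector%uint32(len(backends))]
}

func main() {
	services := serviceTable{
		{IP: "10.96.0.1", Port: 443}: {
			{IP: "10.244.1.5", Port: 6443},
			{IP: "10.244.2.7", Port: 6443},
		},
	}
	fmt.Println(services.translate(addrPort{"10.96.0.1", 443}, 1))
	fmt.Println(services.translate(addrPort{"192.0.2.10", 80}, 1))
}
```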

3.2 BPF Program Compilation Pipeline

C source code (bpf/*.c)

|

v

LLVM/Clang (BPF backend)

|

v

BPF ELF object file (.o)

|

v

BPF loader (bpf syscall)

|

v

Kernel BPF verifier

|

v

JIT compilation -> Native machine code

|

v

Execute in kernel

3.3 Key BPF Program Files

bpf/

bpf_lxc.c # Pod (endpoint) datapath

bpf_host.c # Host network datapath

bpf_overlay.c # Overlay (VXLAN/Geneve) datapath

bpf_network.c # Physical network interface

bpf_xdp.c # XDP program

bpf_sock.c # Socket-level programs

lib/

common.h # Common headers and macros

maps.h # BPF map definitions

policy.h # Policy-related functions

conntrack.h # Connection tracking

nat.h # NAT engine

lb.h # Load balancer

4. BPF Maps: Datapath State Storage

4.1 Key BPF Map Types

Cilium uses various BPF map types:

| Map Name                | Type             | Purpose                            |
| ----------------------- | ---------------- | ---------------------------------- |
| cilium_ipcache          | LPM Trie         | IP to Identity/tunnel info mapping |
| cilium_policy           | Hash             | Identity-based policy lookup       |
| cilium_ct4_global       | Hash (LRU)       | IPv4 connection tracking           |
| cilium_ct6_global       | Hash (LRU)       | IPv6 connection tracking           |
| cilium_lb4_services     | Hash             | IPv4 service load balancer         |
| cilium_lb4_backends     | Hash             | IPv4 backend info                  |
| cilium_snat_v4_external | Hash (LRU)       | IPv4 SNAT mapping                  |
| cilium_tunnel_map       | Hash             | Tunnel endpoint mapping            |
| cilium_lxc              | Hash             | Endpoint info                      |
| cilium_signals          | Perf Event Array | Event signaling                    |
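To make "Identity-based policy lookup" concrete, here is a minimal sketch of a hash map keyed by the peer's Identity plus destination port and protocol. The key layout, the use of a zero port and protocol as an "any port" entry, and the sample Identities are assumptions for illustration and do not reproduce the actual layout of `cilium_policy`.

```go
package main

import "fmt"

// policyKey is a hypothetical key: who is talking, and to which port/protocol.
type policyKey struct {
	Identity uint32
	DPort    uint16
	Proto    uint8 // IP protocol number, e.g. 6 for TCP
}

// policyMap holds the allowed keys; presence of a key means "allow".
type policyMap map[policyKey]struct{}

// allows checks for an exact (identity, port, protocol) entry first and
// then falls back to an identity-only entry (port 0, protocol 0 in this sketch).
func (m policyMap) allows(identity uint32, dport uint16, proto uint8) bool {
	exact := policyKey{Identity: identity, DPort: dport, Proto: proto}
	if _, ok := m[exact]; ok {
		return true
	}
	_, ok := m[policyKey{Identity: identity}]
	return ok
}

func main() {
	m := policyMap{
		{Identity: 48291, DPort: 443, Proto: 6}: {}, // allow TCP/443 from 48291
		{Identity: 6}:                           {}, // allow anything from remote-node
	}
	fmt.Println(m.allows(48291, 443, 6))
	fmt.Println(m.allows(48291, 80, 6))
	fmt.Println(m.allows(6, 9090, 6))
}
```

The point of the design is that the lookup never mentions an IP address: the source IP is resolved to an Identity first (via the ipcache), and policy is expressed against that number.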

4.2 BPF Map Inspection Commands

List all BPF maps

bpftool map list

Inspect specific map contents (ipcache)

cilium bpf ipcache list

Connection tracking table

cilium bpf ct list global

Service load balancer table

cilium bpf lb list

Check policy map

cilium bpf policy get 12345

Check tunnel map

cilium bpf tunnel list

4.3 LPM Trie: The Core of CIDR Policies

LPM (Longest Prefix Match) Trie is used for CIDR-based policy matching:

IP Cache structure:

10.0.0.0/8 -> Identity: 2 (world)

10.244.0.0/16 -> Identity: 100 (cluster)

10.244.1.0/24 -> Identity: 200 (specific namespace)

10.244.1.5/32 -> Identity: 300 (specific Pod)

Lookup: 10.244.1.5 -> Identity 300 (most specific match)

4.4 LRU Hash: Connection Tracking

LRU (Least Recently Used) Hash maps are used for connection tracking. When the map is full, the least recently used entries are automatically evicted to make room for new connections.

conntrack entry example

cilium bpf ct list global

SRC: 10.244.1.5:34567 DST: 10.96.0.1:443

Flags: rx+tx closing

Lifetime: 120s

RxPackets: 42 RxBytes: 3456

TxPackets: 38 TxBytes: 2890
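The eviction behaviour can be illustrated with a small user-space LRU table. This is a sketch under stated assumptions: the `lruTable` type is hypothetical, keys are plain strings, capacity must be at least 1, and eviction here is exact, whereas the kernel's LRU hash map is a separate implementation whose eviction may be approximate.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruEntry is one tracked connection in this sketch.
type lruEntry struct {
	key   string
	value int
}

// lruTable keeps at most capacity entries; the front of order is the
// most recently used entry and the back is the eviction candidate.
type lruTable struct {
	capacity int
	order    *list.List
	items    map[string]*list.Element
}

func newLRUTable(capacity int) *lruTable {
	return &lruTable{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// get returns the value for key and marks the entry as recently used.
func (t *lruTable) get(key string) (int, bool) {
	el, ok := t.items[key]
	if !ok {
		return 0, false
	}
	t.order.MoveToFront(el)
	return el.Value.(*lruEntry).value, true
}

// put inserts or updates key, evicting the least recently used entry when full.
func (t *lruTable) put(key string, value int) {
	if el, ok := t.items[key]; ok {
		el.Value.(*lruEntry).value = value
		t.order.MoveToFront(el)
		return
	}
	if t.order.Len() >= t.capacity {
		victim := t.order.Back()
		t.order.Remove(victim)
		delete(t.items, victim.Value.(*lruEntry).key)
	}
	t.items[key] = t.order.PushFront(&lruEntry{key: key, value: value})
}

func main() {
	ct := newLRUTable(2)
	ct.put("10.244.1.5:34567->10.96.0.1:443", 42)
	ct.put("10.244.1.5:34568->10.96.0.10:53", 3)
	ct.get("10.244.1.5:34567->10.96.0.1:443")      // touch the first flow
	ct.put("10.244.1.6:40000->10.96.0.1:443", 1)   // evicts the DNS flow
	_, stillThere := ct.get("10.244.1.5:34568->10.96.0.10:53")
	fmt.Println(stillThere)
}
```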

5. Identity Allocation: Label-Based Security Model

5.1 Identity Allocation Mechanism

Cilium's security model is founded on label-based Identity:

Pod labels:

app: frontend

env: production

team: web

|

v (extract security-relevant labels)

Security-relevant labels:

k8s:app=frontend

k8s:io.kubernetes.pod.namespace=default

|

v (allocate a numeric ID for this label set)

Identity: 48291 (numeric identifier)
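The step from a label set to a numeric Identity can be pictured as "same labels, same number". Below is a minimal sketch, assuming a simple in-memory allocator that hands out numbers from a counter and caches them per label set. The `allocator` type, the starting value 256, and the canonical key format are hypothetical choices for the example; the real allocator coordinates across nodes through the KVStore or CRDs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allocator is a hypothetical single-process Identity allocator.
type allocator struct {
	next uint32
	ids  map[string]uint32
}

func newAllocator(first uint32) *allocator {
	return &allocator{next: first, ids: make(map[string]uint32)}
}

// labelKey builds an order-independent key for a label set.
func labelKey(labels map[string]string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ";")
}

// allocate returns the existing Identity for labels or assigns a new one.
func (a *allocator) allocate(labels map[string]string) uint32 {
	key := labelKey(labels)
	if id, ok := a.ids[key]; ok {
		return id
	}
	id := a.next
	a.next++
	a.ids[key] = id
	return id
}

func main() {
	alloc := newAllocator(256) // starting value is an assumption
	frontend := map[string]string{
		"k8s:app":                         "frontend",
		"k8s:io.kubernetes.pod.namespace": "default",
	}
	backend := map[string]string{
		"k8s:app":                         "backend",
		"k8s:io.kubernetes.pod.namespace": "default",
	}
	fmt.Println(alloc.allocate(frontend), alloc.allocate(backend), alloc.allocate(frontend))
}
```

Because every Pod with the same security-relevant labels shares one Identity, scaling a Deployment from 3 to 300 replicas adds ipcache entries but no new Identities or policy entries.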

5.2 Determining Security-Relevant Labels

Not all labels are used for Identity. The commands below list allocated Identities, followed by examples of labels that are included in or excluded from an Identity by default:

List Identities

cilium identity list

Get specific Identity details

cilium identity get 48291

Labels included in Identity:

- k8s:app=...

- k8s:io.kubernetes.pod.namespace=...

- k8s:io.cilium.k8s.policy.cluster=...

Labels excluded from Identity:

- k8s:controller-revision-hash=...

- k8s:pod-template-hash=...

- k8s:pod-template-generation=...

5.3 Identity Synchronization

Identities are synchronized cluster-wide through the KVStore:

Node A: Pod created -> Request Identity based on labels

|

v

KVStore (CRD or etcd)

|

v (Identity allocation/lookup)

CiliumIdentity CRD:

metadata:

name: "48291"

security-labels:

k8s:app: frontend

k8s:io.kubernetes.pod.namespace: default

|

v

Node B: Update IP-to-Identity mapping in ipcache

5.4 Reserved Identities

Cilium defines special-purpose reserved Identities:

| Identity       | Numeric Value | Meaning                        |
| -------------- | ------------- | ------------------------------ |
| unknown        | 0             | Unknown source                 |
| host           | 1             | Local host                     |
| world          | 2             | Outside the cluster            |
| unmanaged      | 3             | Endpoint not managed by Cilium |
| health         | 4             | Health check endpoint          |
| init           | 5             | Endpoint being initialized     |
| remote-node    | 6             | Remote node                    |
| kube-apiserver | 7             | Kubernetes API server          |
| ingress        | 8             | Ingress                        |

6. Endpoint Lifecycle

6.1 Endpoint Creation

1. Receive Pod creation event

|

v

2. CNI ADD call -> Create veth pair

|

v

3. Allocate IP address (IPAM)

|

v

4. Allocate Identity

|

v

5. Compute policies (Identity-based)

|

v

6. Compile BPF programs

(policy + Identity + config -> customized BPF code)

|

v

7. Load BPF programs (tc ingress/egress)

|

v

8. Update BPF maps

(ipcache, endpoint map, policy map)

|

v

9. Endpoint state: Ready

6.2 Endpoint Regeneration

When policies change, affected endpoints are regenerated so that their BPF programs and policy maps reflect the new state:

Endpoint regeneration trigger reasons:

- Network policy change

- Identity change

- Cilium configuration change

- IP change for FQDN referenced in policies

Monitor regeneration status

cilium endpoint list

ID     IDENTITY   POLICY   ENDPOINT STATUS
1234   48291      OK       ready
1235   48292      OK       regenerating    <-- regenerating

6.3 Endpoint Deletion

1. Receive Pod deletion event

|

v

2. CNI DEL call

|

v

3. Detach and remove BPF programs

|

v

4. Remove related entries from BPF maps

|

v

5. Delete veth pair

|

v

6. Return IP address (IPAM)

|

v

7. Clean up endpoint metadata

6.4 Endpoint Status Inspection

Detailed endpoint info

cilium endpoint get 1234

Endpoint health check

cilium endpoint health 1234

BPF programs attached per network device (including endpoint lxc* interfaces)

bpftool net show

Endpoint logs

cilium endpoint log 1234

7. State Management and Restoration

7.1 KVStore Backends

Cilium supports two KVStore backends:

CRD-based (default)

- Uses CRDs like CiliumIdentity, CiliumEndpoint, CiliumNode

- No etcd dependency

- State management through Kubernetes API server

etcd-based

- Requires a separate etcd cluster

- Performance advantage in large clusters

- Used by ClusterMesh, where clustermesh-apiserver exposes cluster state to remote clusters through etcd

7.2 State Restoration on Agent Restart

Agent restart

|

v

Restore BPF maps (maps persist in kernel)

|

v

Discover existing endpoints (scan veth interfaces)

|

v

Restore endpoint state (load from CRDs)

|

v

Warm Identity cache

|

v

Recompute and apply policies

|

v

Resume normal operation

Even when the Agent restarts, BPF maps and programs persist in the kernel, so the datapath continues to operate without interruption. This ensures network connectivity is maintained during Agent upgrades.

8. Architecture Debugging Tools

8.1 Essential Debugging Commands

Full status check

cilium status --verbose

BPF programming state

bpftool prog list

Datapath configuration check

cilium debuginfo

Monitoring (real-time packet events)

cilium monitor --type trace

cilium monitor --type drop

cilium monitor --type policy-verdict

Metrics check

cilium metrics list

8.2 Troubleshooting Checklist

1. Check Agent/Operator status with `cilium status`

2. Check endpoint state with `cilium endpoint list` (ready/not-ready/regenerating)

3. Verify IP-to-Identity mapping with `cilium bpf ipcache list`

4. Analyze dropped packet causes with `cilium monitor --type drop`

5. Verify applied policies with `cilium policy get`

6. Check connection tracking state with `cilium bpf ct list global`

Summary

Cilium's architecture is built on the following core design principles:

- **Kernel-Level Processing**: Processing network packets directly in the kernel via eBPF for high performance

- **Identity-Based Security**: Applying policies based on label-based Identity instead of IP addresses, suitable for dynamic environments

- **Declarative Management**: Declarative state management through Kubernetes CRDs

- **Zero-Downtime Updates**: BPF programs and maps persist in the kernel, maintaining the datapath during Agent restarts

- **Role Separation**: Clear separation of responsibilities between Agent (node-level) and Operator (cluster-level)
