Author: Youngju Kim (@fjvbn20031)
containerd Container Lifecycle Management
This post analyzes the complete lifecycle of containers in containerd, from creation to termination. We examine the separation of container metadata and execution processes (Tasks), process management through shims, and integration of various runtime classes.
1. Separation of Container and Task
1.1 Core Concepts
containerd separates container metadata from execution state:
Container (metadata):
- ID, image reference, snapshot key
- OCI runtime spec
- Labels, extension data
- Persisted in BoltDB
Task (execution state):
- Actually running process
- PID, state (created/running/stopped)
- stdin/stdout/stderr
- Managed by shim process
1.2 Benefits of Separation
Benefits of the separated design:
1. Container metadata can exist without a Task
- Create container and start later
- Preserve metadata of stopped containers
2. Independent of containerd restarts
- Tasks are managed by shim, surviving containerd restarts
- Reconnect to existing shims after restart
3. Support for multiple runtimes
- Container object is runtime-agnostic
- Runtime selected at Task creation time
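The separation can be pictured with a small model. This is only a sketch: `Store`, `Container`, and `Task` are hypothetical stand-ins for the BoltDB-backed metadata store and the shim-owned execution state, not containerd's real types.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Container:
    """Persistent metadata only; no process is implied."""
    id: str
    image: str
    snapshot_key: str
    runtime: str
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Task:
    """Execution state, keyed by the container it belongs to."""
    container_id: str
    pid: int
    status: str = "created"


class Store:
    """Hypothetical in-memory stand-in for the metadata store plus task registry."""

    def __init__(self) -> None:
        self.containers: Dict[str, Container] = {}
        self.tasks: Dict[str, Task] = {}

    def create_container(self, container: Container) -> None:
        self.containers[container.id] = container

    def new_task(self, container_id: str, pid: int) -> Task:
        container = self.containers[container_id]  # a task needs existing metadata
        task = Task(container_id=container.id, pid=pid)
        self.tasks[container.id] = task
        return task

    def delete_task(self, container_id: str) -> None:
        # Only the execution state goes away; the metadata record stays.
        self.tasks.pop(container_id, None)

    def task_for(self, container_id: str) -> Optional[Task]:
        return self.tasks.get(container_id)
```

In this model a container can be created long before `new_task` is called, and `delete_task` leaves the `Container` record in place, which mirrors the "create now, start later" and "preserve metadata of stopped containers" points above.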
2. Container Creation
2.1 Creation Process
Container creation flow:
1. Generate OCI spec from image
|
v
2. Prepare snapshot
- Create an Active (writable) snapshot on top of the image's committed snapshot chain
- This snapshot becomes the container's writable layer
|
v
3. Store container metadata
- Create Container record in BoltDB
- Store ID, image, snapshot, runtime, spec
|
v
4. Return container object
(process not yet started)
2.2 OCI Runtime Spec
containerd generates an OCI runtime spec to define the container execution environment:
OCI runtime spec key sections:
ociVersion: "1.0.2"
process:
  terminal: false
  user: uid=0, gid=0
  args: ["/bin/sh"]
  env: ["PATH=/usr/local/sbin:..."]
  cwd: "/"
  capabilities: ...
  rlimits: ...
root:
  path: "rootfs"
  readonly: false
hostname: "container-abc"
mounts:
  - destination: "/proc"
    type: "proc"
    source: "proc"
  - destination: "/dev"
    type: "tmpfs"
    source: "tmpfs"
linux:
  namespaces:
    - type: "pid"
    - type: "network"
    - type: "ipc"
    - type: "uts"
    - type: "mount"
  resources:
    memory:
      limit: 536870912
    cpu:
      shares: 1024
      quota: 100000
      period: 100000
  cgroupsPath: "/kubelet/pod-abc/container-xyz"
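To make the resource numbers above concrete, here is a small sketch that holds the same values in a Python dict and derives the effective limits. The helper names `cpu_limit` and `memory_limit_mib` are made up for this illustration; only the field names and values come from the spec excerpt.

```python
# Subset of the spec above, expressed as the JSON-style structure a config.json would hold.
spec = {
    "ociVersion": "1.0.2",
    "process": {"terminal": False, "args": ["/bin/sh"], "cwd": "/"},
    "root": {"path": "rootfs", "readonly": False},
    "linux": {
        "resources": {
            "memory": {"limit": 536870912},
            "cpu": {"shares": 1024, "quota": 100000, "period": 100000},
        },
    },
}


def cpu_limit(spec: dict) -> float:
    """CFS bandwidth: the container may use quota/period CPUs' worth of time."""
    cpu = spec["linux"]["resources"]["cpu"]
    return cpu["quota"] / cpu["period"]


def memory_limit_mib(spec: dict) -> float:
    """Memory limit is given in bytes; convert to MiB for readability."""
    return spec["linux"]["resources"]["memory"]["limit"] / (1024 * 1024)
```

With the values shown, `cpu_limit(spec)` is 1.0 (one full CPU) and `memory_limit_mib(spec)` is 512.0.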
2.3 Spec Generators (Spec Opts)
containerd spec generation pattern:
Spec Opts are function chains that incrementally build the OCI spec:
WithImageConfig(image) -> Apply image CMD, ENV, WORKDIR
WithHostNamespace(ns) -> Share host namespace
WithMemoryLimit(limit) -> Set memory limit
WithCPUs(cpus) -> Pin the container to a set of CPUs (cpuset)
WithMounts(mounts) -> Add mount points
WithProcessArgs(args) -> Set process arguments
WithRootfsPropagation(p) -> Set rootfs mount propagation
WithSeccompProfile(p) -> Apply Seccomp profile
WithApparmorProfile(p) -> Apply AppArmor profile
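The function-chain idea can be sketched in a few lines. This is a Python analogue of the pattern, not containerd's Go API: `with_process_args`, `with_env`, `with_memory_limit`, and `generate_spec` are hypothetical names chosen to echo the options listed above.

```python
from typing import Any, Callable, Dict, List

Spec = Dict[str, Any]
SpecOpt = Callable[[Spec], None]


def with_process_args(*args: str) -> SpecOpt:
    """Return an option that sets the process arguments."""
    def apply(spec: Spec) -> None:
        spec.setdefault("process", {})["args"] = list(args)
    return apply


def with_env(env: List[str]) -> SpecOpt:
    """Return an option that appends environment variables."""
    def apply(spec: Spec) -> None:
        spec.setdefault("process", {}).setdefault("env", []).extend(env)
    return apply


def with_memory_limit(limit_bytes: int) -> SpecOpt:
    """Return an option that sets linux.resources.memory.limit."""
    def apply(spec: Spec) -> None:
        resources = spec.setdefault("linux", {}).setdefault("resources", {})
        resources.setdefault("memory", {})["limit"] = limit_bytes
    return apply


def generate_spec(*opts: SpecOpt) -> Spec:
    """Start from a minimal default spec and apply each option in order."""
    spec: Spec = {"ociVersion": "1.0.2", "process": {}, "root": {"path": "rootfs"}}
    for opt in opts:
        opt(spec)  # later options can override earlier ones
    return spec


example = generate_spec(
    with_process_args("/bin/sh"),
    with_env(["PATH=/usr/bin"]),
    with_memory_limit(536870912),
)
```

Because each option only touches its own part of the spec, callers can mix and reorder them freely; when two options write the same field, the one applied last wins.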
3. Task Execution
3.1 Task Creation
Task creation flow:
1. Check container's runtime type
(e.g., io.containerd.runc.v2)
|
v
2. Execute shim binary
(containerd-shim-runc-v2 start)
|
v
3. Shim returns ttrpc socket address
|
v
4. containerd sends Create request to shim
- Pass OCI spec
- Pass bundle path
|
v
5. Shim executes runc create
- Create namespaces
- Configure cgroups
- Mount rootfs
- Start the container's init process, which blocks before running the user command
|
v
6. Task state: Created
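Steps 1 and 2 rely on a naming convention: the last two dot-separated parts of the runtime type give the shim binary name. A minimal sketch of that mapping follows; `shim_binary_name` is an illustrative helper that mirrors the convention and is not a containerd function.

```python
def shim_binary_name(runtime_type: str) -> str:
    """Map a runtime type such as io.containerd.runc.v2 to its shim binary name."""
    parts = runtime_type.split(".")
    if len(parts) < 2 or parts[0] == "":
        raise ValueError(f"invalid runtime type: {runtime_type!r}")
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"
```

For example, `shim_binary_name("io.containerd.runc.v2")` returns `"containerd-shim-runc-v2"`, the binary named in step 2.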
3.2 Task Start
Task start:
1. containerd sends Start request to shim
|
v
2. Shim executes runc start
- Open exec.fifo to release the blocked init process
- init then execs the user command
|
v
3. Task state: Running
- The init PID (known since create) now runs the user command
- stdin/stdout/stderr connected
3.3 Task State Transitions
Task state machine:
Created
|
| Start()
v
Running
|
+-- Kill(signal) -> Send signal
|
+-- Pause() -> Paused
| |
| +-- Resume() -> Running
|
+-- Process exits -> Stopped
|
v
Stopped
|
| Delete()
v
(deleted)
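The diagram above can be written as a transition table. This is a sketch of the states and events exactly as drawn, with hypothetical event names; it omits Kill(signal), which delivers a signal but does not change the state by itself.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"
    DELETED = "deleted"


# (current state, event) -> next state, following the diagram above.
TRANSITIONS = {
    (State.CREATED, "start"): State.RUNNING,
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.RUNNING, "exit"): State.STOPPED,
    (State.STOPPED, "delete"): State.DELETED,
}


def step(state: State, event: str) -> State:
    """Apply one event, rejecting transitions the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot apply {event!r} in state {state.value!r}") from None
```

Encoding the transitions as data keeps illegal moves, such as deleting a running task, out of the happy path: `step(State.RUNNING, "delete")` raises `ValueError`.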
3.4 Exec (Additional Processes)
Exec operation:
Add a new process to an already running container:
1. Create ExecProcess
- Define new process spec (args, env, user)
- Assign execID
|
v
2. Send Exec request to shim
|
v
3. Execute runc exec
- Enter existing container namespaces
- Start new process
|
v
4. Independently manage stdin/stdout/stderr
Use cases: kubectl exec, docker exec
4. Shim Lifecycle
4.1 Shim Start
Shim start process:
1. containerd fork/execs shim binary
containerd-shim-runc-v2 -namespace k8s.io \
-id container-abc \
-address /run/containerd/containerd.sock \
start
|
v
2. Shim daemonizes itself
- The start invocation launches a long-lived shim process detached from it, then exits
- The long-lived shim runs independently of containerd
|
v
3. Create ttrpc Unix socket
/run/containerd/s/abc123...
|
v
4. Output socket address to stdout
containerd reads this address to connect
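The stdout handshake in steps 1 and 4 can be imitated with a stand-in child process. This is a sketch only: `FAKE_SHIM` is a hypothetical one-line Python program that prints a made-up address, not the real shim, and `start_shim` is an illustrative helper that reuses the flags shown in step 1.

```python
import subprocess
import sys

# Hypothetical stand-in for `containerd-shim-runc-v2 ... start`:
# it prints a fake socket address derived from the -id flag and exits.
FAKE_SHIM = (
    "import sys; "
    "cid = sys.argv[sys.argv.index('-id') + 1]; "
    "print('unix:///tmp/fake-shim-' + cid + '.sock')"
)


def start_shim(namespace: str, container_id: str, address: str) -> str:
    """Spawn the (fake) shim start command and return the address it prints."""
    cmd = [
        sys.executable, "-c", FAKE_SHIM,
        "-namespace", namespace,
        "-id", container_id,
        "-address", address,
        "start",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=30)
    return result.stdout.strip()  # the caller would now dial this socket
```

The point of the design is that the parent needs no prior knowledge of where the socket lives: whatever the child writes to stdout is the address to connect to.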
4.2 Shim Responsibilities
Shim key responsibilities:
1. Process management:
- Act as parent of container process
- Collect exit status via wait4()
- Detect and report OOM events
2. I/O management:
- Manage stdin/stdout/stderr FIFOs
- Connect to log drivers
- Copy I/O (containerProcess <-> FIFO)
3. Communication with containerd:
- Receive commands via ttrpc
- Report events (TaskExit, etc.)
- Respond to status queries
4. containerd restart resilience:
- Continue running when containerd restarts
- Restarted containerd reconnects to existing shim
- State recovery
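The process-management duty (wait for the child, then report its exit) can be sketched as follows. Assumptions are labeled in the comments: the event is a plain dict rather than a real ttrpc message, Python's `subprocess` stands in for the `wait4()` call, and the 128-plus-signal encoding for signal deaths is the usual shell-style convention rather than something the text above states.

```python
import subprocess
import sys
from typing import Dict, List, Union


def run_and_report(container_id: str, argv: List[str]) -> Dict[str, Union[str, int]]:
    """Run a child as its parent, reap it, and build a TaskExit-style event."""
    proc = subprocess.Popen(argv)
    returncode = proc.wait()  # blocks until the child exits and reaps it
    if returncode < 0:
        # Python reports death by signal N as -N; encode it as 128 + N (assumed convention).
        exit_status = 128 - returncode
    else:
        exit_status = returncode
    # Hypothetical event shape; a real shim publishes a structured event to containerd.
    return {
        "topic": "/tasks/exit",
        "container_id": container_id,
        "pid": proc.pid,
        "exit_status": exit_status,
    }


if __name__ == "__main__":
    event = run_and_report("container-abc", [sys.executable, "-c", "import sys; sys.exit(3)"])
    print(event)
```

Because the reaping happens in the shim rather than in containerd, the exit status is not lost if containerd happens to be restarting when the process dies.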
4.3 Shim Shutdown
Shim shutdown:
1. Receive Task Delete request
|
v
2. Clean up container resources
- Delete cgroups
- Clean up namespaces
- Unmount rootfs
|
v
3. Close ttrpc socket
|
v
4. Shim process exits
5. Checkpoint/Restore
5.1 Checkpoint
Checkpoint operation:
Save the state of a running container as a checkpoint image:
1. Invoke CRIU (Checkpoint/Restore in Userspace)
|
v
2. Dump process memory
- Save memory pages
- Save file descriptor state
- Save network connection state
|
v
3. Create checkpoint image
- CRIU image file set
- Stored alongside container spec
|
v
4. Optionally stop the container
Use cases:
- Live migration
- Fast start (restore from pre-warmed state)
- Debugging (capture state at specific point)
5.2 Restore
Restore operation:
1. Load checkpoint image
|
v
2. Prepare new container environment
- Create namespaces
- Mount rootfs
|
v
3. Execute CRIU restore
- Restore memory pages
- Restore process state
- Reconnect file descriptors
|
v
4. Resume process execution
6. Runtime Classes
6.1 Multiple Runtime Support
containerd supports various runtimes through the shim interface:
Runtime class comparison:
+---------+--------------+----------+-----------+---------------+
| Runtime | Isolation    | Overhead | Startup   | Compatibility |
+---------+--------------+----------+-----------+---------------+
| runc    | Namespace    | Minimal  | Fast      | Best          |
| kata    | Light VM     | Medium   | Medium    | High          |
| gVisor  | User kernel  | Low      | Fast      | Medium        |
| Wasm    | Wasm sandbox | Minimal  | Very fast | Limited       |
+---------+--------------+----------+-----------+---------------+
6.2 runc
runc:
- Default OCI runtime
- Linux namespace and cgroup-based isolation
- Uses host kernel directly
- Lowest overhead
- Suitable for all Linux container workloads
shim: containerd-shim-runc-v2
config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
6.3 Kata Containers
Kata Containers:
- Runs containers inside lightweight VMs
- Uses QEMU/Cloud-Hypervisor/Firecracker
- Strong isolation with separate guest kernel
- Suited for multi-tenant environments
- VM overhead exists
shim: containerd-shim-kata-v2
config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
6.4 gVisor
gVisor (runsc):
- User-space kernel (Sentry)
- Intercepts and reimplements system calls
- Reduces host kernel attack surface
- Runs on a platform such as systrap (the current default), KVM, or the older ptrace
- Some system calls unsupported
shim: containerd-shim-runsc-v1
config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
runtime_type = "io.containerd.runsc.v1"
6.5 WebAssembly (Wasm)
Wasm runtime:
- Runs WebAssembly binaries as containers
- Uses Wasmtime, WasmEdge, etc.
- Very fast startup (millisecond range)
- Minimal memory usage
- Portable binaries
- Limited system access (WASI)
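shim: containerd-shim-wasm-v1 (the name follows the runtime_type below; runwasi ships per-engine shims such as containerd-shim-wasmtime-v1)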
config.toml:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.wasm]
runtime_type = "io.containerd.wasm.v1"
6.6 RuntimeClass Selection
Kubernetes RuntimeClass integration:
1. Define RuntimeClass resource:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
2. Specify RuntimeClass in Pod:
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: nginx
3. containerd selects runtime matching handler:
handler "kata" -> containerd.runtimes.kata config
-> execute containerd-shim-kata-v2
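Step 3 amounts to a table lookup keyed by the handler name. The sketch below assumes the three runtimes configured in the config.toml excerpts earlier in this post; `RUNTIMES` and `resolve_handler` are hypothetical names, and falling back to a default runtime when no handler is given is an assumption about how the CRI plugin is configured.

```python
from typing import Dict

# Mirrors the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.<name>] tables above.
RUNTIMES: Dict[str, Dict[str, str]] = {
    "runc": {"runtime_type": "io.containerd.runc.v2"},
    "kata": {"runtime_type": "io.containerd.kata.v2"},
    "runsc": {"runtime_type": "io.containerd.runsc.v1"},
}


def resolve_handler(handler: str, default: str = "runc") -> str:
    """Return the runtime_type for a RuntimeClass handler, or fail if it is not configured."""
    name = handler or default  # assumed: an empty handler selects the default runtime
    try:
        return RUNTIMES[name]["runtime_type"]
    except KeyError:
        raise LookupError(f"no runtime configured for handler {name!r}") from None
```

A Pod with `runtimeClassName: kata` therefore resolves to `io.containerd.kata.v2`, while a handler with no matching table fails the lookup instead of silently falling back to runc.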
7. Summary
containerd's container lifecycle management rests on three ideas: the separation of Container (metadata) from Task (execution), process management through shims, and support for multiple runtime classes. Because each shim runs as its own daemonized process, containers survive containerd restarts. Because every runtime sits behind the same shim interface and consumes the same OCI runtime spec, runc, Kata, gVisor, and Wasm can all be selected through configuration alone.