[containerd] Networking and Storage

containerd does not implement networking and storage directly but integrates with external plugins through standard interfaces. This post analyzes network configuration via CNI, namespace management, volume mounts, device access, and security module integration.


1. CNI Integration

1.1 CNI Overview

Container Network Interface (CNI) is the standard interface for container networking. containerd calls CNI plugins to configure networks.

CNI call flow:

kubelet -> containerd (CRI RunPodSandbox)
                |
                v
        Create network namespace
                |
                v
        Call CNI plugin
        (ADD command)
                |
                v
        IP allocation, routing setup, interface creation
                |
                v
        Return result to containerd

1.2 CNI Configuration

CNI configuration file location:
  Config directory: /etc/cni/net.d/
  Binary directory: /opt/cni/bin/

containerd CNI configuration (config.toml):
  [plugins."io.containerd.grpc.v1.cri".cni]
    bin_dir = "/opt/cni/bin"
    conf_dir = "/etc/cni/net.d"
    max_conf_num = 1

1.3 CNI Plugin Chain

A CNI configuration file (e.g., 10-calico.conflist) defines the network as a plugin chain:

1. Main plugin (calico, cilium, flannel, etc.):
   - Create network interface
   - IP allocation (IPAM)
   - Routing rule setup

2. Meta plugin (bandwidth, portmap, etc.):
   - Bandwidth limiting
   - Port mapping
   - Firewall rules

Execution order:
  ADD: Main -> Meta plugins (forward)
  DEL: Meta -> Main plugins (reverse)
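The chain above can be sketched as a minimal .conflist; the plugin names and fields are illustrative, not a real Calico configuration:

```json
{
  "cniVersion": "1.0.0",
  "name": "k8s-pod-network",
  "plugins": [
    {
      "type": "calico",
      "ipam": { "type": "calico-ipam" }
    },
    {
      "type": "bandwidth",
      "capabilities": { "bandwidth": true }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
```

On ADD, containerd runs the plugins top to bottom; on DEL, bottom to top.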

1.4 CNI Call Details

CNI ADD execution detail:

1. containerd determines network namespace path
   /var/run/netns/cni-abc123

2. Set CNI environment variables:
   CNI_COMMAND=ADD
   CNI_CONTAINERID=abc123
   CNI_NETNS=/var/run/netns/cni-abc123
   CNI_IFNAME=eth0
   CNI_PATH=/opt/cni/bin

3. Execute CNI plugin binary
   Pass config JSON via stdin

4. Plugin returns result via stdout:
   - Assigned IP address
   - Gateway address
   - DNS configuration
   - Routing information

5. containerd stores the result

2. Network Namespaces

2.1 Namespace Creation

Pod network namespace:

During Pod Sandbox creation:
1. Create new network namespace with unshare(CLONE_NEWNET)
2. Persist via bind mount at /var/run/netns/
3. Execute CNI plugins in this namespace
4. All containers in the Pod share this namespace

Namespace sharing:
  Pause container holds the network namespace
  App containers join the same namespace
  -> Containers in the Pod communicate with each other via localhost

2.2 Namespace Cleanup

Namespace cleanup:

During Pod deletion:
1. CNI DEL command releases network resources
   - Return IP address
   - Delete interface
   - Remove routing rules
2. Unmount bind mount from /var/run/netns/
3. The network namespace is freed automatically once no process or mount references it

3. Volume Mounts

3.1 Mount Types

containerd manages volumes through mount configuration in the OCI spec:

Mount types:

1. bind mount:
   - Mount host file/directory into container
   - Host and container share the same data
   - Used for ConfigMap, Secret, emptyDir, etc.

2. tmpfs mount:
   - Memory-based filesystem
   - Data lost on container termination
   - Used for /dev/shm, /run, etc.

3. Special filesystems:
   - proc: /proc
   - sysfs: /sys
   - cgroup: /sys/fs/cgroup
   - devpts: /dev/pts

3.2 Mount Propagation

Mount propagation options:

1. private:
   - No mount event propagation
   - Default

2. rprivate:
   - Recursive private

3. shared:
   - Bidirectional mount event propagation
   - Mount on host -> visible in container
   - Mount in container -> visible on host

4. rshared:
   - Recursive shared

5. slave:
   - Host -> container unidirectional propagation
   - Useful for volume plugins

6. rslave:
   - Recursive slave

Kubernetes usage:
  - Controlled via MountPropagation field
  - CSI drivers typically use Bidirectional (mapped to the rshared option)

3.3 CRI Volume Processing

Volume processing via CRI:

kubelet adds mounts to OCI spec:

1. emptyDir:
   - kubelet creates directory on host
   - Passed to container as bind mount

2. hostPath:
   - Direct bind mount of host path

3. ConfigMap/Secret:
   - kubelet creates data on tmpfs
   - Passed to container as bind mount

4. PersistentVolumeClaim:
   - kubelet mounts volume via CSI driver
   - Mounted path passed as bind mount

containerd's role:
  - Reflect kubelet-prepared mount info in OCI spec
  - runc performs the actual mount

4. Device Access

4.1 Device Mapping

Device access mechanism:

OCI spec devices section:
  linux:
    devices:
      - path: "/dev/nvidia0"
        type: "c"
        major: 195
        minor: 0
        fileMode: 438        # 0666 in octal
        uid: 0
        gid: 0

Cgroup device access control:
  linux:
    resources:
      devices:
        - allow: true
          type: "c"
          major: 195
          access: "rwm"

4.2 GPU Support

GPU access (NVIDIA):

NVIDIA Container Toolkit integration:

1. nvidia-container-runtime-hook:
   - Operates as OCI runtime hook
   - Runs before container start
   - Mounts NVIDIA driver libraries into container
   - Adds GPU device nodes to container

2. CDI (Container Device Interface):
   - Device vendor-neutral standard
   - Define device specs in /etc/cdi/
   - containerd reads CDI specs and reflects in OCI spec

CDI spec example:
  cdiVersion: "0.5.0"
  kind: "nvidia.com/gpu"
  devices:
    - name: "0"
      containerEdits:
        deviceNodes:
          - path: "/dev/nvidia0"
        mounts:
          - hostPath: "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so"
            containerPath: "/usr/lib/x86_64-linux-gnu/libnvidia-ml.so"

4.3 Other Devices

Other device access:

1. FPGA:
   - Expose FPGA devices via CDI specs
   - Vendor-specific device plugins

2. InfiniBand/RDMA:
   - Map /dev/infiniband/* devices
   - Share network device namespace

3. Serial/USB:
   - Direct host device mapping
   - Privileged mode or explicit device allowlist

5. SELinux Integration

5.1 SELinux Context

SELinux container security:

SELinux settings in OCI spec:
  linux:
    mountLabel: "system_u:object_r:container_file_t:s0:c1,c2"
    processLabel: "system_u:system_r:container_t:s0:c1,c2"

Components:
  - user: system_u
  - role: system_r (process) / object_r (file)
  - type: container_t (process) / container_file_t (file)
  - level: s0:c1,c2 (MCS category)

MCS (Multi-Category Security):
  - Assigns a unique category pair to each container
  - Prevents containers from reading each other's files
  - Type enforcement (container_t) additionally restricts access to host files

5.2 SELinux Processing Flow

SELinux application:

1. kubelet determines Pod SELinux options
   - securityContext.seLinuxOptions
   - Automatic MCS label assignment

2. Passed to containerd via CRI
   - processLabel: process security context
   - mountLabel: file security context

3. containerd reflects in OCI spec

4. runc applies at execution:
   - Apply SELinux label to process
   - Apply SELinux label to rootfs
   - Apply SELinux label to mounts

6. AppArmor Integration

6.1 AppArmor Profiles

AppArmor container security:

Default profile: cri-containerd.apparmor.d

Key rules:
  - Filesystem access restrictions
    deny /proc/kcore r,
    deny /sys/firmware/** r,
  - Network access control
  - Capability restrictions
  - Mount operation restrictions

Profile application:
  OCI spec:
    process:
      apparmorProfile: "cri-containerd.apparmor.d"

6.2 Custom Profiles

Custom AppArmor profiles:

1. Install profile on host:
   Place profile file in /etc/apparmor.d/
   apparmor_parser -r /etc/apparmor.d/my-profile

2. Specify in Pod:
   annotations:
     container.apparmor.security.beta.kubernetes.io/app: localhost/my-profile

3. containerd reflects in OCI spec:
   process:
     apparmorProfile: "my-profile"

7. Seccomp Integration

7.1 Seccomp Profiles

Seccomp (Secure Computing):

Define allowed/blocked system calls:

Default action: SCMP_ACT_ERRNO (deny)

Allowed system calls example:
  - read, write, open, close
  - mmap, mprotect, munmap
  - socket, connect, accept
  - ...

Blocked system calls example:
  - mount, umount (prevent container escape)
  - reboot
  - kexec_load
  - ptrace (in some environments)
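A minimal allowlist-style profile in the JSON format runc consumes (the syscall list is heavily truncated for illustration; the real default profile allows several hundred calls):

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "close", "exit_group", "futex"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Any syscall not named falls through to defaultAction and returns an error to the caller.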

7.2 Seccomp Application

Seccomp profile application:

1. Kubernetes SecurityContext:
   securityContext:
     seccompProfile:
       type: RuntimeDefault

2. RuntimeDefault profile:
   - containerd/runc default Seccomp profile
   - Blocks dangerous system calls
   - Suitable for most workloads

3. Custom profile:
   securityContext:
     seccompProfile:
       type: Localhost
       localhostProfile: "profiles/my-seccomp.json"

8. Summary

containerd networking and storage follows a delegation model through standard interfaces. Network configuration via CNI, mount management via OCI spec, device access via CDI, and security isolation via SELinux/AppArmor/Seccomp are the key pillars. This standards-based design allows containerd to flexibly integrate with various networking solutions and security modules.