containerd Image Management: OCI Images and Snapshots

The containerd image management subsystem handles storing, distributing, and unpacking images based on the OCI image spec. This post analyzes the internals of the Content Store, the Snapshotter, the image pull flow, and garbage collection.

1. OCI Image Spec

1.1 Image Structure

An OCI image consists of three core components:

OCI image structure:

1. Image Index (Fat Manifest)

- Supports multiple platforms (linux/amd64, linux/arm64, etc.)

- Points to per-platform Manifests

2. Image Manifest

- Digest of Config object

- Layer list (ordered)

- Media type information

3. Image Config

- Environment variables, entrypoint, CMD

- Layer diff ID list

- Creation history
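
How an Image Index points to per-platform Manifests can be sketched in a few lines of Python. The JSON below is a hand-written, abbreviated index with placeholder digests, and select_manifest is a hypothetical helper for illustration, not a containerd API.

```python
import json

# Abbreviated, hand-written image index; the digests are placeholders.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:aaa", "size": 100,
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:bbb", "size": 100,
     "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}
"""


def select_manifest(index: dict, os_name: str, arch: str) -> dict:
    """Return the manifest descriptor whose platform matches os/arch."""
    for desc in index["manifests"]:
        platform = desc.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return desc
    raise LookupError(f"no manifest for {os_name}/{arch}")


index = json.loads(INDEX_JSON)
print(select_manifest(index, "linux", "arm64")["digest"])
```

The selected descriptor carries only a digest and size; the Manifest itself is a separate object that has to be fetched by that digest.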

1.2 Content Addressable Storage

Content Addressable Storage:

All objects are identified by SHA256 digest:

sha256:abc123... -> Image Index JSON

sha256:def456... -> Image Manifest JSON

sha256:789ghi... -> Image Config JSON

sha256:jkl012... -> Layer tar.gz

Benefits:

- Deduplication: identical layers stored only once

- Integrity verification: validate data via digest

- Caching: digest-based cache lookups
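
The deduplication and integrity benefits follow directly from using the digest as the key. A minimal in-memory sketch, assuming a hypothetical BlobStore class (this is not containerd's implementation):

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Format a digest the way OCI does: algorithm, colon, hex."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """Toy content-addressable store keyed by digest."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = sha256_digest(data)
        # Identical content maps to the same key, so it is stored once.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Integrity check: the stored bytes must still hash to the key.
        if sha256_digest(data) != digest:
            raise ValueError(f"content does not match {digest}")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
first = store.put(b"layer bytes")
second = store.put(b"layer bytes")  # same content, same digest
print(first == second, len(store))
```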

2. Content Store

2.1 Overview

Content Store is containerd's content-addressable storage that manages all binary data for images.

Content Store directory structure:

/var/lib/containerd/io.containerd.content.v1.content/

blobs/

sha256/

abc123... (Image Index)

def456... (Image Manifest)

789ghi... (Image Config)

jkl012... (Layer 1 tar.gz)

mno345... (Layer 2 tar.gz)

ingest/

(temporary download data)

2.2 Content Store API

Content Store key operations:

Info(digest) -> Query content metadata (size, creation time)

ReaderAt(digest) -> Read content (io.ReaderAt interface)

Writer(ref) -> Write content (atomic commit)

Delete(digest) -> Delete content

ListStatuses() -> Query in-progress writes

Abort(ref) -> Cancel in-progress write

2.3 Ingest Process

Content write (Ingest) process:

1. Create Writer (assign reference key)

|

v

2. Create temporary file in ingest/ directory

|

v

3. Stream data writes

(e.g., downloading layers from registry)

|

v

4. Digest verification

(compare expected digest with actual data hash)

|

v

5. Atomic commit

(move from ingest/ -> blobs/sha256/)

|

v

6. Clean up ingest/ on failure
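
The ingest steps above can be sketched with plain files. This is a simplified model under stated assumptions: the ingest function is hypothetical, the reference key is used directly as the temporary file name, and containerd's actual on-disk ingest layout is more involved.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def ingest(root: Path, ref: str, chunks, expected_digest: str) -> Path:
    """Stream chunks into ingest/, verify the digest, then commit atomically."""
    ingest_dir = root / "ingest"
    blob_dir = root / "blobs" / "sha256"
    ingest_dir.mkdir(parents=True, exist_ok=True)
    blob_dir.mkdir(parents=True, exist_ok=True)

    tmp_path = ingest_dir / ref  # assumes ref is a safe file name
    hasher = hashlib.sha256()
    try:
        with open(tmp_path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                hasher.update(chunk)
        actual = "sha256:" + hasher.hexdigest()
        if actual != expected_digest:
            raise ValueError(f"digest mismatch: got {actual}")
        target = blob_dir / hasher.hexdigest()
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, target)
        return target
    except Exception:
        # On any failure, remove the partial data from ingest/.
        tmp_path.unlink(missing_ok=True)
        raise


with tempfile.TemporaryDirectory() as tmp:
    data = b"example layer bytes"
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    blob = ingest(Path(tmp), "layer-0", [data], digest)
    # The committed blob file is named after its hex digest.
    print(blob.name == digest.split(":", 1)[1])
```

Writing to a temporary location and renaming at the end means readers never observe a partially written blob under blobs/sha256/.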

3. Snapshotter

3.1 Snapshotter Overview

The Snapshotter is a plugin that manages image layers as filesystem snapshots, preparing the root filesystem for containers.

Snapshotter role:

Image layers (tar.gz)

|

v

Snapshotter converts each layer into a snapshot

|

v

Stacks snapshots to form a unified filesystem

|

v

Provides mount point to container

3.2 Snapshot Types

Snapshot types:

1. Committed

- Read-only snapshot

- Corresponds to image layers

- Sharable across multiple containers

2. Active

- Read/write snapshot

- Writable layer for a container

- Assigned to a single container

3. View

- Read-only snapshot created on top of a Committed parent

- Cannot be committed; useful for inspecting a layer chain without a writable layer

3.3 overlayfs Snapshotter

The most widely used Snapshotter:

overlayfs operation:

Layer 1 (base): /snapshots/1/fs (lowerdir)

Layer 2 (app): /snapshots/2/fs (lowerdir)

Write layer: /snapshots/3/fs (upperdir)

Work directory: /snapshots/3/work (workdir)

Mount:

mount -t overlay overlay \

-o lowerdir=/snapshots/2/fs:/snapshots/1/fs,\

upperdir=/snapshots/3/fs,\

workdir=/snapshots/3/work \

/container/rootfs

Benefits:

- Copy-on-Write: copies only on modification

- Fast container startup

- Layer sharing saves disk space
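
The ordering of lowerdir is easy to get wrong: overlayfs expects the top-most lower layer first. A small sketch that builds the option string used in the mount command above (overlay_options is a hypothetical helper, and the paths are the illustrative ones from this section):

```python
def overlay_options(lower_dirs_bottom_up, upper_dir, work_dir):
    """Build the -o option string for an overlay mount.

    lower_dirs_bottom_up lists read-only layers from base to top;
    overlayfs wants them in the opposite order, top-most first.
    """
    lower = ":".join(reversed(list(lower_dirs_bottom_up)))
    return f"lowerdir={lower},upperdir={upper_dir},workdir={work_dir}"


opts = overlay_options(
    ["/snapshots/1/fs", "/snapshots/2/fs"],  # base layer, then app layer
    "/snapshots/3/fs",
    "/snapshots/3/work",
)
print(opts)
```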

3.4 native Snapshotter

native Snapshotter:

- Stores each snapshot in an independent directory

- Fully copies the parent snapshot's contents into each new snapshot

- Used in environments without overlayfs support

- Higher disk usage

- Simple and highly portable

3.5 devmapper Snapshotter

devmapper Snapshotter:

- Uses Linux device mapper thin provisioning

- Block-level Copy-on-Write

- Suited for high-performance workloads

- Used with Firecracker microVMs

- Complex setup (requires thin-pool pre-configuration)

Use cases:

- AWS Fargate (Firecracker)

- High-performance container environments

- Block storage-based infrastructure

3.6 Snapshotter API

Snapshotter key operations:

Stat(key) -> Query snapshot info

Prepare(key, parent) -> Create Active snapshot (writable)

View(key, parent) -> Read-only view of Committed snapshot

Commit(name, key) -> Convert Active snapshot to Committed

Mounts(key) -> Return mount info for snapshot

Remove(key) -> Delete snapshot
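
The lifecycle implied by these operations can be modeled in memory. The ToySnapshotter below is a hypothetical sketch that tracks only names, kinds, and parents; it mounts nothing and is not containerd's interface. It assumes that a parent must be Committed and that a snapshot with children cannot be removed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Info:
    name: str
    kind: str  # "active", "view", or "committed"
    parent: Optional[str]


class ToySnapshotter:
    def __init__(self) -> None:
        self._snaps: Dict[str, Info] = {}

    def stat(self, key: str) -> Info:
        return self._snaps[key]

    def _create(self, key: str, parent: Optional[str], kind: str) -> Info:
        if key in self._snaps:
            raise KeyError(f"{key} already exists")
        if parent is not None and self.stat(parent).kind != "committed":
            raise ValueError("parent must be a committed snapshot")
        info = Info(key, kind, parent)
        self._snaps[key] = info
        return info

    def prepare(self, key: str, parent: Optional[str] = None) -> Info:
        """Create a writable (active) snapshot."""
        return self._create(key, parent, "active")

    def view(self, key: str, parent: Optional[str] = None) -> Info:
        """Create a read-only view on top of a committed parent."""
        return self._create(key, parent, "view")

    def commit(self, name: str, key: str) -> Info:
        """Turn the active snapshot `key` into committed snapshot `name`."""
        info = self.stat(key)
        if info.kind != "active":
            raise ValueError("only active snapshots can be committed")
        if name in self._snaps:
            raise KeyError(f"{name} already exists")
        del self._snaps[key]
        committed = Info(name, "committed", info.parent)
        self._snaps[name] = committed
        return committed

    def remove(self, key: str) -> None:
        if any(s.parent == key for s in self._snaps.values()):
            raise ValueError("snapshot still has children")
        del self._snaps[key]


snaps = ToySnapshotter()
snaps.prepare("extract-1")             # writable, no parent
snaps.commit("layer-1", "extract-1")   # now read-only
snaps.prepare("container-rw", parent="layer-1")
print(snaps.stat("container-rw"))
```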

4. Image Pull Flow

4.1 Complete Flow

Image pull complete flow:

1. Resolve image reference

docker.io/library/nginx:latest

|

v

2. Download Image Index/Manifest

- Fetch manifest from registry

- Select manifest for target platform

|

v

3. Download Config

- Download image config JSON

- Store in Content Store

|

v

4. Download layers (parallel)

- Store each layer in Content Store

- Skip already existing layers

|

v

5. Unpack layers

- Read layers from Content Store

- Create snapshots via Snapshotter

|

v

6. Register image metadata

- Create image record in BoltDB

- Map tags to digests

4.2 Layer Download Details

Layer download:

1. Extract layer digest list from manifest

2. Check if already exists in Content Store

3. Download only missing layers from registry

4. Transfer Service manages downloads:

- Concurrent download limit (default 3)

- Progress tracking

- Retry logic

5. Each layer is stored in Content Store in the compressed form the registry served (typically gzip)
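
The "skip what exists, fetch the rest with bounded concurrency" logic can be sketched with a thread pool. Everything here is an assumption for illustration: pull_layers and the fetch callable are hypothetical, the store is a plain dict, and progress tracking and retries are omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def pull_layers(layer_digests, store, fetch, max_concurrent=3):
    """Download only the layers missing from `store`.

    store: dict mapping digest -> blob bytes (stand-in for Content Store)
    fetch: callable taking a digest and returning the blob bytes
    Returns the list of digests that were actually downloaded.
    """
    # dict.fromkeys removes duplicates while keeping manifest order.
    missing = [d for d in dict.fromkeys(layer_digests) if d not in store]
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map yields results in the same order as `missing`.
        for digest, blob in zip(missing, pool.map(fetch, missing)):
            store[digest] = blob
    return missing


def fake_fetch(digest):
    # Stand-in for a registry request.
    return ("blob for " + digest).encode()


local = {"sha256:a": b"already here"}
downloaded = pull_layers(["sha256:a", "sha256:b", "sha256:c"], local, fake_fetch)
print(downloaded)
```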

4.3 Layer Unpacking

Layer unpacking:

1. Read layer blob from Content Store

2. Decompress gzip

3. Extract tar archive

4. Create snapshot in Snapshotter:

a. First layer: Prepare without parent

b. Apply layer contents to snapshot

c. Commit to convert to read-only

d. Next layer: Prepare with previous snapshot as parent

5. Complete final snapshot chain

Snapshot chain:

Layer 1 (committed) <- Layer 2 (committed) <- Layer 3 (committed)
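
Steps 1 to 3 also explain why a layer has two identifiers: the layer digest in the manifest is computed over the compressed blob, while the diff ID in the image config is computed over the uncompressed tar. A sketch that builds a tiny gzip layer in memory and inspects it (make_layer and inspect_layer are hypothetical helpers, and only gzip is handled):

```python
import gzip
import hashlib
import io
import tarfile


def make_layer(files: dict) -> bytes:
    """Build a gzip-compressed tar layer in memory (demo fixture)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return gzip.compress(buf.getvalue())


def inspect_layer(blob: bytes):
    """Return (layer digest, diff ID, file names) for a gzip layer blob."""
    # Digest of the compressed bytes, as referenced by the manifest.
    layer_digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    # Digest of the uncompressed tar, as listed in the image config.
    raw_tar = gzip.decompress(blob)
    diff_id = "sha256:" + hashlib.sha256(raw_tar).hexdigest()
    with tarfile.open(fileobj=io.BytesIO(raw_tar), mode="r:") as tar:
        names = tar.getnames()
    return layer_digest, diff_id, names


blob = make_layer({"etc/hostname": b"demo\n"})
layer_digest, diff_id, names = inspect_layer(blob)
print(names, layer_digest != diff_id)
```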

5. Image Metadata

5.1 Image Record

Image metadata (BoltDB):

Image record:

- Name: "docker.io/library/nginx:latest"

- Target:

MediaType: "application/vnd.oci.image.index.v1+json"

Digest: "sha256:abc123..."

Size: 1234

- Labels:

"containerd.io/gc.ref.content.0": "sha256:def456..."

"containerd.io/gc.ref.content.1": "sha256:789ghi..."

- CreatedAt: 2026-03-20T00:00:00Z

- UpdatedAt: 2026-03-20T00:00:00Z

5.2 Querying Images

List images with ctr

ctr -n k8s.io images list

Check that image content is complete and unpacked

ctr -n k8s.io images check

Inspect image content

ctr -n k8s.io content get sha256:abc123... | jq .

6. Garbage Collection

6.1 GC Mechanism

Garbage collection operation:

1. Identify root objects:

- Image records

- Container records

- Lease records

2. Trace references (Mark):

- Image -> Manifest -> Config + Layers

- Container -> Snapshot chain

- Lease -> Protected resources

3. Delete unreferenced objects (Sweep):

- Delete unreferenced blobs from Content Store

- Delete unreferenced snapshots from Snapshotter

- Clean up orphaned metadata records
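
The mark and sweep phases amount to a graph traversal from the roots. A minimal sketch over a hand-written reference graph (the object names and the collect_garbage helper are hypothetical):

```python
def collect_garbage(roots, refs, all_objects):
    """Return the set of objects unreachable from any root.

    roots: iterable of root object names
    refs: dict mapping an object to the objects it references
    all_objects: iterable of every object currently stored
    """
    marked = set()
    stack = list(roots)
    # Mark: walk every reference reachable from the roots.
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        stack.extend(refs.get(node, ()))
    # Sweep: anything not marked is garbage.
    return set(all_objects) - marked


refs = {
    "image:nginx": ["index"],
    "index": ["manifest"],
    "manifest": ["config", "layer-1", "layer-2"],
    "container:web": ["snapshot-rw"],
    "snapshot-rw": ["snapshot-layer-2"],
    "snapshot-layer-2": ["snapshot-layer-1"],
}
everything = set(refs) | {o for targets in refs.values() for o in targets}
everything |= {"layer-old", "snapshot-old"}

print(sorted(collect_garbage(["image:nginx", "container:web"], refs, everything)))
```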

6.2 GC Labels

GC reference labels:

containerd manages GC references via labels:

Image labels:

"containerd.io/gc.ref.content.0": "sha256:..." (manifest reference)

"containerd.io/gc.ref.content.1": "sha256:..." (layer reference)

Content labels:

"containerd.io/gc.ref.content.config": "sha256:..." (config reference)

"containerd.io/gc.ref.content.l.0": "sha256:..." (layer reference)

Snapshot labels:

"containerd.io/gc.ref.snapshot.overlayfs": "sha256:..." (snapshot reference)
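
Extracting references from such labels is a matter of prefix matching. The sketch below is a simplification: gc_refs is a hypothetical helper, the digests are placeholders, and everything after the snapshot prefix is treated as the snapshotter name.

```python
CONTENT_PREFIX = "containerd.io/gc.ref.content"
SNAPSHOT_PREFIX = "containerd.io/gc.ref.snapshot."


def gc_refs(labels: dict):
    """Return (resource type, target) pairs found in GC reference labels."""
    found = []
    for key, value in sorted(labels.items()):
        if key == CONTENT_PREFIX or key.startswith(CONTENT_PREFIX + "."):
            found.append(("content", value))
        elif key.startswith(SNAPSHOT_PREFIX):
            snapshotter = key[len(SNAPSHOT_PREFIX):]
            found.append(("snapshot/" + snapshotter, value))
    return found


labels = {
    "containerd.io/gc.ref.content.config": "sha256:cfg",
    "containerd.io/gc.ref.content.l.0": "sha256:l0",
    "containerd.io/gc.ref.snapshot.overlayfs": "sha256:chain",
    "example.com/unrelated": "ignored",
}
for ref in gc_refs(labels):
    print(ref)
```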

6.3 Lease

Lease:

- Protects in-progress operation resources from GC

- Protects downloaded layers during image pull

- Protects snapshots during container creation

- TTL-based automatic expiration

- Can be explicitly deleted after operation completes

Example:

Image pull starts -> Lease created

Layer download -> Lease protects content

Image registration complete -> Lease deleted (image record holds references)
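
The protection a lease provides can be modeled as an extra set of GC roots that disappears once the TTL passes. This is a toy model under stated assumptions: the Lease class and protected_resources function are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
import time


class Lease:
    """Toy lease: a named set of resources with an expiry time."""

    def __init__(self, lease_id, ttl_seconds, now=None):
        start = time.time() if now is None else now
        self.id = lease_id
        self.expires_at = start + ttl_seconds
        self.resources = set()

    def add(self, resource):
        self.resources.add(resource)

    def expired(self, now):
        return now >= self.expires_at


def protected_resources(leases, now):
    """Resources that GC must keep because a live lease references them."""
    protected = set()
    for lease in leases:
        if not lease.expired(now):
            protected |= lease.resources
    return protected


lease = Lease("pull-nginx", ttl_seconds=60, now=1000.0)
lease.add("sha256:layer-in-flight")
print(protected_resources([lease], now=1030.0))  # lease still live
print(protected_resources([lease], now=1060.0))  # lease expired
```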

6.4 GC Scheduling

GC triggers:

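1. Scheduler-driven execution:

- Tuned in the containerd config under the io.containerd.gc.v1.scheduler plugin (pause_threshold, deletion_threshold, mutation_threshold, schedule_delay, startup_delay)

- Driven by thresholds on deletions and mutations rather than a cron expression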

2. Event-based:

- On image deletion

- On container deletion

- Explicit API call

3. Manual trigger via ctr (delete an image and run GC synchronously):

ctr -n k8s.io images rm --sync docker.io/library/nginx:latest

7. Summary

containerd image management is built on three pillars: Content Store's content-addressable storage, Snapshotter's layer management, and GC's resource cleanup. The overlayfs Snapshotter's Copy-on-Write mechanism enables fast container startup and efficient disk usage, while Lease-based GC protection ensures image operation safety.
