Docker Complete Guide 2025: From Container Basics to Multi-stage Builds, Compose, and Production Deployment

Introduction

Container technology has become the backbone of modern software development. Docker is what eliminated the legendary excuse, "But it works on my machine!" Since its introduction in 2013, Docker has completely transformed the paradigm of development, testing, and deployment. As of 2025, containers serve as the fundamental unit for Kubernetes, serverless, and even AI workloads.

This guide systematically covers container fundamentals, Dockerfile optimization, multi-stage builds, Docker Compose in practice, security, CI/CD pipelines, production operations, Docker alternatives, and AI workloads.


1. What Is a Container — Differences from VMs

Definition of a Container

A container packages an application and its dependencies together to run in an isolated environment. Unlike virtual machines (VMs), containers share the OS kernel, making them much lighter and faster.

Three Core Linux Technologies

Containers are a combination of three Linux kernel features:

1) Namespaces — Isolation

They limit the scope of system resources visible to a process.

Namespace   What It Isolates
---------------------------------------
PID         Process IDs
NET         Network interfaces, routing
MNT         Filesystem mount points
UTS         Hostname, domain name
IPC         Inter-process communication
USER        User and group IDs
CGROUP      Cgroup root directory
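Namespace membership is visible from userspace on any Linux host, no Docker required: every entry under /proc/<pid>/ns/ is a symlink naming the namespace type and an inode number, and two processes share a namespace exactly when those inodes match. A quick check:

```shell
# Show which PID and network namespaces the current shell belongs to.
# Processes in the same namespace print the same inode number.
readlink /proc/$$/ns/pid   # e.g. pid:[4026531836]
readlink /proc/$$/ns/net   # e.g. net:[4026531840]
```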

2) Cgroups — Resource Limiting

They limit resource usage including CPU, memory, disk I/O, and network bandwidth.

# Example: limiting memory to 256MB with cgroup (v1 interface)
sudo cgcreate -g memory:mycontainer
echo 268435456 | sudo tee /sys/fs/cgroup/memory/mycontainer/memory.limit_in_bytes

# On cgroup v2 (the default on modern distros) the equivalent knob is memory.max:
# echo 268435456 | sudo tee /sys/fs/cgroup/mycontainer/memory.max
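The magic number in the example above is just 256 MiB written out in bytes; shell arithmetic makes the origin explicit:

```shell
# 256 MiB in bytes — the value written to the cgroup memory limit above
echo $((256 * 1024 * 1024))   # -> 268435456
```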

3) Union Filesystem — Layered Structure

A technology that merges multiple filesystem layers into a single view. Docker's default storage driver, overlay2, is built on OverlayFS.

  • Read-Only Layers: Base image, package installations, etc.
  • Read-Write Layer: Changes made during container runtime

Container vs VM Comparison

Container Architecture
┌──────────┬──────────┬──────────┬──────────┐
│  App A   │  App B   │  App C   │  App D   │
├──────────┼──────────┼──────────┼──────────┤
│  Bins/   │  Bins/   │  Bins/   │  Bins/   │
│  Libs    │  Libs    │  Libs    │  Libs    │
├──────────┴──────────┴──────────┴──────────┤
│             Container Runtime             │
├───────────────────────────────────────────┤
│              Host OS Kernel               │
├───────────────────────────────────────────┤
│              Infrastructure               │
└───────────────────────────────────────────┘

VM Architecture
┌──────────┬──────────┬──────────┬──────────┐
│  App A   │  App B   │  App C   │  App D   │
├──────────┼──────────┼──────────┼──────────┤
│  Bins/   │  Bins/   │  Bins/   │  Bins/   │
│  Libs    │  Libs    │  Libs    │  Libs    │
├──────────┼──────────┼──────────┼──────────┤
│ Guest OS │ Guest OS │ Guest OS │ Guest OS │
├──────────┴──────────┴──────────┴──────────┤
│                Hypervisor                 │
├───────────────────────────────────────────┤
│                  Host OS                  │
├───────────────────────────────────────────┤
│              Infrastructure               │
└───────────────────────────────────────────┘

Aspect                 Container            VM
-------------------------------------------------------
Start Time             Milliseconds         Minutes
Image Size             MB range             GB range
Performance Overhead   Nearly none          10-20%
Isolation Level        Process level        Hardware level
OS Support             Shared host kernel   Independent OS

OCI Standard

The OCI (Open Container Initiative) defines open standards for container images and runtimes.

  • Runtime Specification: Standardizes how containers are run
  • Image Specification: Standardizes container image format
  • Distribution Specification: Standardizes image distribution methods

Docker, Podman, and containerd all follow OCI standards, ensuring image compatibility.


2. Docker Architecture

Client-Server Model

Docker uses a client-server architecture.

┌─────────────┐    REST API    ┌──────────────────────┐
│ Docker CLI  │ ─────────────> │    Docker Daemon     │
│             │                │      (dockerd)       │
│ docker run  │                │  ┌────────────────┐  │
│ docker build│                │  │   containerd   │  │
│ docker pull │                │  │  ┌──────────┐  │  │
└─────────────┘                │  │  │   runc   │  │  │
                               │  │  └──────────┘  │  │
                               │  └────────────────┘  │
                               └──────────────────────┘

  • Docker CLI: The client where users enter commands
  • Docker Daemon (dockerd): The server process managing container lifecycle
  • containerd: An OCI-compliant container runtime manager
  • runc: The low-level runtime that actually creates and runs containers

Four Core Concepts

1) Image

A read-only template serving as the blueprint for creating containers. It has a layered structure for efficient storage and transfer.

# Image management commands
docker images                    # List local images
docker pull nginx:1.25-alpine    # Download an image
docker image inspect nginx       # Image details
docker image prune               # Clean unused images

2) Container

A running instance of an image. A writable layer is added on top of the image.

# Running and managing containers
docker run -d --name my-nginx -p 8080:80 nginx:1.25-alpine
docker ps                        # List running containers
docker logs my-nginx             # View logs
docker exec -it my-nginx /bin/sh # Access container shell
docker stop my-nginx             # Stop
docker rm my-nginx               # Remove

3) Volume

A mechanism for persisting container data.

# Creating and using volumes
docker volume create my-data
docker run -d -v my-data:/app/data my-app
docker run -d -v /host/path:/container/path my-app  # Bind mount

4) Network

Manages communication between containers.

# Creating and managing networks
docker network create my-network
docker run -d --network my-network --name api my-api
docker run -d --network my-network --name db postgres
# The api container can access db at db:5432

3. Dockerfile Master Class

Key Instructions Explained

# Select base image
FROM node:20-alpine

# Add metadata
LABEL maintainer="dev@example.com"
LABEL version="1.0"

# Build-time variable (only available during build)
ARG NODE_ENV=production

# Environment variable (available at runtime too)
ENV NODE_ENV=$NODE_ENV
ENV PORT=3000

# Set working directory
WORKDIR /app

# Copy files (COPY is recommended)
COPY package*.json ./

# Execute command (--omit=dev replaces the deprecated --only=production flag)
RUN npm ci --omit=dev

# Copy source code
COPY . .

# Document port (does not actually open the port)
EXPOSE 3000

# Container health check
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Switch to non-root user (security)
USER node

# Command to run when container starts
ENTRYPOINT ["node"]
CMD ["server.js"]

COPY vs ADD Differences

Feature            COPY   ADD
--------------------------------------------
Copy local files   Yes    Yes
URL download       No     Yes
Auto-extract tar   No     Yes
Recommended        Yes    Only special cases

Conclusion: Use COPY in most cases. Use ADD only when automatic tar extraction is needed.
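The behavioral difference is easiest to see side by side. A sketch, assuming a local archive named app.tar.gz exists in the build context (the filename is hypothetical):

```dockerfile
# app.tar.gz is a hypothetical archive in the build context
ADD app.tar.gz /opt/app/     # auto-extracted: /opt/app/ holds the archive's contents
COPY app.tar.gz /opt/app/    # copied verbatim: /opt/app/app.tar.gz
```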

CMD vs ENTRYPOINT Differences

# CMD only - can be fully overridden at docker run
CMD ["python", "app.py"]
# docker run my-image python test.py  (CMD replaced with test.py)

# ENTRYPOINT only - always executes
ENTRYPOINT ["python"]
# docker run my-image app.py  (runs python app.py)

# ENTRYPOINT + CMD combination (most flexible)
ENTRYPOINT ["python"]
CMD ["app.py"]
# docker run my-image         -> python app.py
# docker run my-image test.py -> python test.py

ARG vs ENV

# ARG: exists only during build (not at runtime)
ARG BUILD_VERSION=1.0
RUN echo "Building version: $BUILD_VERSION"

# ENV: exists during build + runtime
ENV APP_VERSION=1.0
# APP_VERSION is available when container runs

Dockerfile Optimization Best Practices

1) Layer Order Determines Cache Behavior

# Bad: npm install re-runs on every source change
COPY . .
RUN npm install

# Good: npm install only re-runs when package.json changes
COPY package*.json ./
RUN npm ci
COPY . .

2) Combine RUN Instructions

# Bad: Creates 3 layers
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# Good: Creates 1 layer + cleans cache
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

3) Use .dockerignore

# .dockerignore
node_modules
.git
.env
*.md
Dockerfile
docker-compose*.yml
.github
coverage
dist

4) Choose Small Base Images

Image            Size     Use Case
--------------------------------------------------
ubuntu:22.04     ~77MB    When a full OS is needed
node:20-slim     ~200MB   Most Node.js apps
node:20-alpine   ~130MB   Lightweight Node.js
alpine:3.19      ~7MB     Minimal base
distroless       ~2MB     Minimal runtime only
scratch          0MB      Static binaries only
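distroless and scratch only make sense as the final stage of a multi-stage build, since they ship no shell or package manager. A sketch of a distroless final stage (it assumes an earlier stage named builder produced a static binary at /app/server):

```dockerfile
# Final stage only; "builder" is a hypothetical earlier build stage
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
USER nonroot                 # distroless ships a built-in non-root user
ENTRYPOINT ["/server"]
```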

4. Multi-stage Builds — Reduce Image Size by 90%

Multi-stage builds separate build tools from the final runtime environment, dramatically reducing image size.

Node.js Example: 1.2GB to 120MB

# Stage 1: Build environment
FROM node:20 AS builder
WORKDIR /app

COPY package*.json ./
RUN npm ci

COPY . .
RUN npm run build

# Stage 2: Production environment
FROM node:20-alpine AS production
WORKDIR /app

# Install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Copy only build artifacts
COPY --from=builder /app/dist ./dist

# Security: non-root user
USER node

EXPOSE 3000
CMD ["node", "dist/server.js"]

Result: Instead of the full node:20 image (~1.1GB), using Alpine-based (~130MB) + excluding build tools = final 120MB

Go Example: 800MB to 15MB

# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server .

# Stage 2: Minimal image
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /app/server /server

EXPOSE 8080
ENTRYPOINT ["/server"]

Result: Extract only the static binary from Go build environment (~800MB) for under 15MB

Java/Spring Boot Example: 400MB to 80MB

# Stage 1: Build
FROM eclipse-temurin:21-jdk AS builder
WORKDIR /app

COPY gradlew build.gradle settings.gradle ./
COPY gradle ./gradle
RUN ./gradlew dependencies --no-daemon

COPY src ./src
RUN ./gradlew bootJar --no-daemon

# Stage 2: Runtime
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app

COPY --from=builder /app/build/libs/*.jar app.jar

RUN addgroup -S spring && adduser -S spring -G spring
USER spring:spring

EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Python Example

# Stage 1: Dependency build
FROM python:3.12-slim AS builder
WORKDIR /app

RUN pip install --no-cache-dir poetry
COPY pyproject.toml poetry.lock ./
RUN poetry export -f requirements.txt -o requirements.txt --without-hashes
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Runtime
FROM python:3.12-slim
WORKDIR /app

COPY --from=builder /install /usr/local
COPY . .

RUN useradd --create-home appuser
USER appuser

EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app.main:app"]

Build Result Summary

Language   Single Stage   Multi-stage   Reduction
-------------------------------------------------
Node.js    1.2GB          120MB         90%
Go         800MB          15MB          98%
Java       400MB          80MB          80%
Python     900MB          150MB         83%

5. Docker Compose in Practice

docker-compose.yml Basic Structure

Docker Compose defines and manages multiple containers in a single YAML file.

# docker-compose.yml
# (the top-level "version" key is obsolete in the Compose Specification and is omitted)

services:
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    ports:
      - '3000:3000'
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://postgres:secret@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    healthcheck:
      test: ['CMD', 'wget', '--spider', 'http://localhost:3000/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
    networks:
      - app-network

  db:
    image: postgres:16-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./init-scripts:/docker-entrypoint-initdb.d
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secret
    ports:
      - '5432:5432'
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres']
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - app-network

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --maxmemory 256mb
    volumes:
      - redis-data:/data
    ports:
      - '6379:6379'
    networks:
      - app-network

  nginx:
    image: nginx:1.25-alpine
    ports:
      - '80:80'
      - '443:443'
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    depends_on:
      - api
    networks:
      - app-network

volumes:
  postgres-data:
    driver: local
  redis-data:
    driver: local

networks:
  app-network:
    driver: bridge

Environment Variables and Secrets Management

# Using .env files
services:
  api:
    env_file:
      - .env
      - .env.production

# Docker Secrets (Swarm mode)
services:
  api:
    secrets:
      - db_password
      - api_key

secrets:
  db_password:
    file: ./secrets/db_password.txt
  api_key:
    external: true
# Inside the container, each secret is mounted as a file at /run/secrets/<name>

Essential Compose Commands

# Start services (background)
docker compose up -d

# View service logs
docker compose logs -f api

# Scale services
docker compose up -d --scale api=3

# Rebuild + restart services
docker compose up -d --build

# Stop and clean up resources
docker compose down

# Delete everything including volumes
docker compose down -v

# Restart specific service
docker compose restart api

# Check running service status
docker compose ps

Separating Development and Production Environments

# docker-compose.yml (shared)
services:
  api:
    build: ./api

# docker-compose.override.yml (development - auto-loaded)
services:
  api:
    volumes:
      - ./api:/app    # Mount for hot reload
    environment:
      - DEBUG=true
    command: npm run dev

# docker-compose.prod.yml (production)
services:
  api:
    environment:
      - NODE_ENV=production
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 512M
          cpus: "0.5"
# Development (auto-loads override file)
docker compose up

# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

6. Networking Deep Dive

Docker Network Drivers

Driver    Description                                          Use Case
--------------------------------------------------------------------------------------
bridge    Default driver; intra-host container communication   Single-host development
host      Uses the host network stack directly                 Performance-critical scenarios
overlay   Multi-host communication                             Swarm, multi-host setups
macvlan   Assigns a real MAC address to containers             Legacy network integration
none      Disables networking                                  Security isolation

Container DNS Resolution

Containers on the same Docker network can discover each other by service name.

# Create custom network
docker network create my-app

# Run containers on the same network
docker run -d --name api --network my-app my-api-image
docker run -d --name db --network my-app postgres

# From inside the api container, the hostname "db" is resolved
# automatically by Docker's embedded DNS server
getent hosts db   # prints the db container's IP address

Port Mapping Strategies

# Specific port mapping
docker run -p 8080:80 nginx

# Bind to specific interface only
docker run -p 127.0.0.1:8080:80 nginx

# Random host port assignment (container port 80 -> ephemeral host port)
docker run -p 80 nginx

# Publish all EXPOSEd ports to random host ports
docker run -P nginx

# Multiple port mappings
docker run -p 80:80 -p 443:443 nginx

Service Discovery Patterns

# Automatic service discovery in Docker Compose
services:
  api-gateway:
    image: nginx
    depends_on:
      - user-service
      - order-service

  user-service:
    image: my-user-service
    # Accessible from api-gateway at http://user-service:3000

  order-service:
    image: my-order-service
    # Accessible from api-gateway at http://order-service:3000

7. Security Best Practices

1) Run as Non-root User

# Node.js - use built-in node user
FROM node:20-alpine
WORKDIR /app
COPY --chown=node:node . .
RUN npm ci --omit=dev
USER node
CMD ["node", "server.js"]

# Custom user creation
FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
WORKDIR /app
COPY --chown=appuser:appgroup . .
USER appuser
CMD ["python", "app.py"]

2) Read-only Filesystem

docker run --read-only --tmpfs /tmp --tmpfs /var/run my-app
# In Docker Compose
services:
  api:
    read_only: true
    tmpfs:
      - /tmp
      - /var/run

3) Image Security Scanning

# Scan vulnerabilities with Trivy
trivy image my-app:latest

# Filter by severity
trivy image --severity HIGH,CRITICAL my-app:latest

# Docker Scout (built into Docker Desktop)
docker scout cves my-app:latest
docker scout recommendations my-app:latest

# Snyk
snyk container test my-app:latest

4) Secrets Management (Never Put Them in Dockerfile!)

# NEVER do this
ENV API_KEY=sk-1234567890
COPY .env /app/.env

# Correct approach: mount secrets during build (BuildKit)
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
# Pass secrets during build
DOCKER_BUILDKIT=1 docker build --secret id=npmrc,src=.npmrc -t my-app .

5) Rootless Docker

# Install Docker in rootless mode
dockerd-rootless-setuptool.sh install

# Verify
docker info | grep -i rootless
# Security Options: rootless

6) Content Trust and Image Signing

# Enable Docker Content Trust
export DOCKER_CONTENT_TRUST=1

# Only signed images can be pulled
docker pull my-registry/my-app:latest  # Fails without signature

# Sign images with Cosign (Sigstore)
cosign sign my-registry/my-app:latest
cosign verify my-registry/my-app:latest

7) Security Checklist

[Security Checklist]
- Run as non-root user (USER directive)
- Use minimal base images (Alpine, distroless, scratch)
- Regularly scan images for vulnerabilities (Trivy, Snyk, Scout)
- NEVER include secrets/credentials in Dockerfile
- Use read-only filesystem
- Minimize network exposure
- Set resource limits (memory, CPU)
- Enable Docker Content Trust
- Regularly update base images
- Remove unnecessary packages/tools
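Several of these checklist items can be declared directly in Compose. A minimal sketch (the service name, image, UID, and limit values are illustrative, not prescriptive):

```yaml
services:
  api:
    image: my-app            # placeholder image name
    user: "1000:1000"        # run as a non-root UID:GID
    read_only: true          # read-only root filesystem
    tmpfs:
      - /tmp                 # writable scratch space only
    cap_drop:
      - ALL                  # drop all Linux capabilities
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "0.5"
```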

8. CI/CD Pipeline

GitHub Actions + Docker Build/Push

# .github/workflows/docker-build.yml
name: Docker Build and Push

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: my-org/my-app

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=sha,prefix=
            type=semver,pattern=v{{version}}

      - name: Build and Push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64

Multi-platform Builds (amd64 + arm64)

# Create Buildx builder
docker buildx create --name multiarch --use

# Build and push for multiple platforms
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag my-registry/my-app:latest \
  --push .

Layer Caching in CI

# GitHub Actions Cache
- name: Build with cache
  uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max

# Registry-based cache
- name: Build with registry cache
  uses: docker/build-push-action@v5
  with:
    cache-from: type=registry,ref=ghcr.io/my-org/my-app:buildcache
    cache-to: type=registry,ref=ghcr.io/my-org/my-app:buildcache,mode=max

Container Registry Comparison

Registry                           Free Tier          Features
------------------------------------------------------------------------------
Docker Hub                         1 private repo     Largest public registry
GitHub Container Registry (GHCR)   Unlimited public   GitHub Actions integration
Amazon ECR                         500MB free tier    AWS service integration
Google Artifact Registry           500MB free tier    GCP service integration
Azure Container Registry           None               Azure service integration

9. Production Operations

Logging Strategies

# Set container log driver
docker run -d \
  --log-driver=json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  my-app

# Syslog driver
docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=tcp://logserver:514 \
  my-app
# Logging in Docker Compose
services:
  api:
    logging:
      driver: json-file
      options:
        max-size: '10m'
        max-file: '3'

Logging Best Practice: Always output logs to stdout/stderr. Never write directly to files. Docker collects logs through the log driver.

Monitoring: Prometheus + cAdvisor

# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - '9090:9090'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - '8080:8080'

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - '3000:3000'
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

volumes:
  grafana-data:

Resource Limits

# Memory limit: 512MB (OOM Kill on exceed)
docker run -m 512m my-app

# CPU limit: 1.5 cores
docker run --cpus="1.5" my-app

# Combined
docker run -m 512m --cpus="1.5" --memory-swap=1g my-app
# In Docker Compose
services:
  api:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.5'
        reservations:
          memory: 256M
          cpus: '0.5'

Graceful Shutdown (SIGTERM Handling)

// Node.js example
process.on('SIGTERM', async () => {
  console.log('SIGTERM received. Graceful shutdown...')

  // Reject new requests
  server.close(async () => {
    // Close DB connections
    await db.close()
    // Close Redis connections
    await redis.quit()

    console.log('All connections closed. Exiting.')
    process.exit(0)
  })

  // Force exit after 30 seconds
  setTimeout(() => {
    console.error('Forced shutdown after timeout')
    process.exit(1)
  }, 30000)
})
# Correct ENTRYPOINT format in Dockerfile
# Use exec form (SIGTERM delivered directly to app)
ENTRYPOINT ["node", "server.js"]

# Do NOT use shell form (SIGTERM only delivered to sh)
# ENTRYPOINT node server.js  # Don't do this
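The difference is easy to reproduce without Docker: a process only shuts down cleanly if SIGTERM reaches the process that installed the handler. A minimal stand-in "app" (the /tmp file paths are arbitrary):

```shell
# A stand-in "app" that installs a SIGTERM handler, mimicking the
# graceful-shutdown logic above.
cat > /tmp/demo-app.sh <<'EOF'
trap 'echo "SIGTERM received, shutting down"; exit 0' TERM
while :; do sleep 1; done
EOF

sh /tmp/demo-app.sh > /tmp/demo-app.log &
pid=$!
sleep 0.2                 # let the trap install
kill -TERM "$pid"         # what "docker stop" sends first
wait "$pid"               # exits 0 because the trap ran
cat /tmp/demo-app.log     # -> SIGTERM received, shutting down
```

With the shell form (ENTRYPOINT node server.js), the signal would hit the wrapping sh instead, and the app would never see it.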

Health Check Patterns

# Health check in Dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1
# Health check in Docker Compose
services:
  api:
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:3000/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  db:
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres']
      interval: 10s
      timeout: 5s
      retries: 5

10. Docker Alternatives: Podman, containerd, nerdctl

Podman — Rootless, Daemonless

Podman is a Docker-compatible container engine developed by Red Hat.

# Nearly identical CLI to Docker
podman run -d -p 8080:80 nginx
podman ps
podman images

# Alias for Docker compatibility
alias docker=podman

Feature              Docker                   Podman
------------------------------------------------------------------
Architecture         Daemon-based (dockerd)   Daemonless (fork model)
Root Required        Required by default      Rootless by default
Compose Support      docker compose           podman-compose
System Integration   Own daemon service       systemd integration
Pod Support          No                       Yes (Kubernetes Pod compatible)
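Podman's systemd integration goes beyond service management: with Quadlet (Podman 4.4+), a container can be declared as a systemd unit file. A sketch (the file path and image are examples):

```ini
# /etc/containers/systemd/my-nginx.container
[Container]
Image=docker.io/library/nginx:1.25-alpine
PublishPort=8080:80

[Install]
WantedBy=multi-user.target
```

After `systemctl daemon-reload`, the container starts and stops like any other systemd service.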

containerd — Kubernetes Default Runtime

containerd is a container runtime extracted from Docker and is the default runtime for Kubernetes.

# ctr - containerd native CLI (low-level)
ctr images pull docker.io/library/nginx:latest
ctr run docker.io/library/nginx:latest my-nginx

nerdctl — Docker-compatible CLI for containerd

# Same UX as Docker
nerdctl run -d -p 8080:80 nginx
nerdctl compose up -d
nerdctl build -t my-app .

# Extra feature: image encryption
nerdctl image encrypt --recipient jwe:public-key.pem my-app encrypted-app

Tool Selection Guide

Scenario                         Recommended Tool
---------------------------------------------------------
Local Development                Docker Desktop or Podman
CI/CD Pipelines                  Docker (BuildKit)
Kubernetes Runtime               containerd
Security-focused Environments    Podman (rootless)
Docker-compatible + containerd   nerdctl

11. Docker + AI Workloads

NVIDIA Container Toolkit

For AI/ML workloads, the NVIDIA Container Toolkit is needed to pass GPUs to containers.

# Add the NVIDIA Container Toolkit GPG key and apt repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install packages
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

GPU Passthrough

# Use all GPUs
docker run --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi

# Use specific GPUs only
docker run --gpus '"device=0,1"' my-ml-app

# In Docker Compose
services:
  ml-training:
    image: my-ml-app
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]

ML Model Serving Containers

# PyTorch model serving Dockerfile
FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model/ ./model/
COPY serve.py .

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

CMD ["python", "serve.py"]
# TensorFlow Serving
services:
  tf-serving:
    image: tensorflow/serving:latest-gpu
    ports:
      - '8501:8501'
    volumes:
      - ./models:/models
    environment:
      - MODEL_NAME=my_model
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  triton-server:
    image: nvcr.io/nvidia/tritonserver:24.01-py3
    ports:
      - '8000:8000'
      - '8001:8001'
      - '8002:8002'
    volumes:
      - ./model-repository:/models
    command: tritonserver --model-repository=/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Key ML Framework Containers

Framework          Official Image                                  GPU Support
------------------------------------------------------------------------------
PyTorch            pytorch/pytorch                                 Yes
TensorFlow         tensorflow/tensorflow                           Yes
NVIDIA Triton      nvcr.io/nvidia/tritonserver                     Yes
Hugging Face TGI   ghcr.io/huggingface/text-generation-inference   Yes
vLLM               vllm/vllm-openai                                Yes

Quiz

Test your understanding of Docker and containers.

Q1: What is the fundamental reason containers are lighter than VMs?

Answer: Containers share the host OS kernel. VMs have independent guest operating systems, adding gigabytes of overhead. Containers use Linux namespaces and cgroups to isolate at the process level, resulting in MB-sized images and millisecond startup times.

Q2: Explain the difference between COPY and ADD in a Dockerfile, and why COPY is recommended.

Answer: COPY provides simple functionality for copying local files/directories to the image. ADD additionally supports URL downloads and automatic tar extraction. COPY is recommended because (1) its behavior is explicit and predictable, (2) ADD's URL download can cause cache invalidation, and (3) it prevents unintended tar extraction. Use ADD only when automatic tar extraction is needed.

Q3: Why can Go images be reduced from 800MB to 15MB with multi-stage builds?

Answer: Go can compile static binaries. Building with CGO_ENABLED=0 produces a standalone executable with no external C library dependencies. Copying this onto scratch (empty image) makes the final image size equal to the binary size alone. All build tools (compiler, package manager, etc.) are eliminated, resulting in a dramatic size reduction.

Q4: Explain with an example why layer order in a Dockerfile matters for build cache.

Answer: Docker caches each layer, and when one layer changes, all subsequent layers are rebuilt. For example, in a Node.js app, if you run COPY . . followed by RUN npm install, even a single character change in source code triggers npm install again. Instead, COPY package.json first, run npm install, then COPY the source. As long as package.json is unchanged, the dependency installation layer is reused from cache, significantly reducing build time.

Q5: Why should containers run as non-root users, and how do you implement this?

Answer: If a container runs as root and a container escape vulnerability is exploited, the attacker gains root access to the host system. Use the USER directive in the Dockerfile to specify a non-root user. Node.js images have a built-in node user, while other images require creating users with RUN groupadd/useradd and then switching with the USER directive. File ownership is set using the COPY --chown option.


References

  1. Docker Official Documentation — The most reliable reference
  2. Dockerfile Best Practices — Official optimization guide
  3. Docker Compose Spec — Compose file reference
  4. OCI Specifications — Container standard specs
  5. NVIDIA Container Toolkit — GPU container guide
  6. Trivy Security Scanner — Container vulnerability scanning
  7. Podman Official Docs — Alternative container engine
  8. containerd Official Docs — Industry-standard container runtime
  9. Docker BuildKit Guide — Advanced build features
  10. Sigstore/Cosign — Container image signing
  11. cAdvisor — Container resource monitoring
  12. Distroless Images — Minimal container images
  13. Docker Security Cheat Sheet — OWASP security checklist
  14. Multi-platform Docker Builds — Cross-platform build guide
  15. Docker + AI/ML Guide — Docker GenAI Stack
  16. Hugging Face TGI — LLM model serving
  17. vLLM Project — High-performance LLM inference engine