Docker BuildKit & Image Layers Complete Guide 2025: LLB, Cache Mount, Multi-Stage, OCI, Build Optimization Deep Dive
Author: Youngju Kim (@fjvbn20031)
Introduction: The Evolution of docker build
10 Years Ago vs Today
Docker build in 2015:
$ docker build -t myapp .
Sending build context to Docker daemon 45.2MB
Step 1/12 : FROM node:14
Step 2/12 : WORKDIR /app
Step 3/12 : COPY package.json .
...
- Sequential execution: one layer at a time.
- Build context transfer: all files sent to the daemon.
- Cache breaks: a single file change forces a full rebuild.
- 5-10 minutes every time.
The same build in 2025 (BuildKit):
$ DOCKER_BUILDKIT=1 docker build -t myapp .
[+] Building 12.3s (15/15) FINISHED
=> [internal] load build definition 0.0s
=> [internal] load metadata 0.3s
=> [build 1/4] FROM node:14 0.0s (cached)
=> [build 2/4] COPY package.json . 0.1s
=> [build 3/4] RUN --mount=type=cache... npm install 8.2s
...
- Parallel execution: independent steps at the same time.
- Cache mount: reuse npm/maven/pip.
- Fine-grained caching: only changed files are invalidated.
- Seconds.
Same Dockerfile, 10x or more faster. The key difference: BuildKit.
What This Article Covers
- Docker image structure: OCI spec.
- Layers and UnionFS: the foundation of containers.
- Legacy builder vs BuildKit: what changed.
- LLB (Low-Level Builder): BuildKit's DSL.
- Multi-stage builds: small and secure.
- Cache strategies: layer, mount, registry.
- Reproducible builds.
- Security: distroless, scanning, SBOM.
- Practical optimization techniques.
1. The OCI Image Format
Open Container Initiative
The OCI (Open Container Initiative) is a Linux Foundation project defining container standards. Two major specifications:
- Runtime Spec: how to run containers.
- Image Spec: image format.
Docker, Podman, containerd, and CRI-O all comply with OCI, so their images are mutually interoperable.
Structure of an OCI Image
An OCI image is a collection of a few files:
manifest.json # Image metadata
config.json # Container configuration
layers/ # Filesystem layers
├── blob1.tar.gz
├── blob2.tar.gz
└── blob3.tar.gz
Manifest
Image manifest: a list of all components of the image.
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"config": {
"mediaType": "application/vnd.oci.image.config.v1+json",
"digest": "sha256:abc123...",
"size": 1234
},
"layers": [
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:def456...",
"size": 54321
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:ghi789...",
"size": 123456
}
]
}
Each entry is content-addressable: identified by SHA-256 digest.
Image Config
{
"architecture": "amd64",
"os": "linux",
"config": {
"Env": ["PATH=/usr/local/sbin:..."],
"Cmd": ["node", "server.js"],
"WorkingDir": "/app",
"ExposedPorts": {
"3000/tcp": {}
}
},
"rootfs": {
"type": "layers",
"diff_ids": [
"sha256:aaa...",
"sha256:bbb..."
]
},
"history": [
{
"created": "2025-01-01T00:00:00Z",
"created_by": "/bin/sh -c #(nop) ADD file:... in /"
}
]
}
- rootfs.diff_ids: hashes of the uncompressed layer tars. The manifest digests refer to the compressed blobs, so the two differ.
- history: creation history of each layer.
Content Addressable Storage
Every blob is identified by a hash:
- Layer file: sha256:def456...
- Config: sha256:abc123...
- Manifest: itself addressed by its own hash.
Benefits:
- Deduplication: identical layers stored only once.
- Integrity: verified by hash.
- Immutability: same hash = same content.
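A quick way to see these content-addressed identifiers on a local image (nginx:latest is used here only as a familiar example; any pulled image works):
docker image inspect nginx:latest --format '{{.Id}}'                 # config digest (the image ID)
docker image inspect nginx:latest --format '{{json .RootFS.Layers}}' # uncompressed layer diff_ids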
Image Index (Multi-arch)
Images for multiple platforms:
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:amd64...",
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:arm64...",
"platform": {
"architecture": "arm64",
"os": "linux"
}
}
]
}
On docker pull, the manifest matching the host platform is selected.
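To see such an index for a published image, buildx ships an inspection helper (again, nginx:latest is just a well-known multi-arch example):
docker buildx imagetools inspect nginx:latest
# Prints the index media type plus one manifest entry per platform (amd64, arm64, ...)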
2. Layers and UnionFS
The Nature of Layers
Each layer of a Docker image is a tar archive holding filesystem changes (deltas).
Example:
FROM alpine:3.19 # Layer 1: base Alpine filesystem
COPY app.js /app/ # Layer 2: adds /app/app.js
RUN apk add nodejs # Layer 3: nodejs package + dependencies
RUN chmod +x /app/start.sh # Layer 4: file permission change
Each layer contains only the change from the previous state:
- Layer 1: all Alpine files.
- Layer 2: one file, /app/app.js.
- Layer 3: /usr/bin/node, its libraries, and hundreds of other files.
- Layer 4: just the metadata change for /app/start.sh.
Union Filesystem
A union filesystem (OverlayFS today, AUFS in older setups) merges multiple layers into one unified view:
merged view: /app/app.js, /usr/bin/node, ...
↑
OverlayFS
/ | \
Layer 1 Layer 2 Layer 3 (read-only)
\
container writable layer (read-write)
Behavior:
- Read: searches from the top layer down. Returns when found.
- Write: stores to the topmost writable layer (copy-on-write).
- Delete: marked with a whiteout file.
Copy-on-Write
When modifying a file:
- Read the file from the source layer.
- Copy to the writable layer.
- Modify the copy.
- The original is unchanged.
Result:
- Multiple containers share the same image.
- Each container keeps only its own changes.
- Hundreds of containers using the same image → stored once.
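The same mechanics can be reproduced outside Docker with a bare OverlayFS mount. A rough sketch (Linux only, requires root; directory names are arbitrary):
mkdir -p lower upper work merged
echo "from lower" > lower/a.txt
echo "to delete" > lower/b.txt
sudo mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged
echo "changed" | sudo tee merged/a.txt   # copy-on-write: the modified copy lands in upper/, lower/a.txt is untouched
sudo rm merged/b.txt                     # delete: upper/b.txt becomes a whiteout (character device)
ls -l upper/                             # shows the copied-up a.txt and the whiteout for b.txt
sudo umount merged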
Layer Sharing
An important benefit: multiple images share layers.
Image A: ubuntu:22.04 + node + myapp
Image B: ubuntu:22.04 + node + other-app
Shared layers:
- ubuntu:22.04 (stored once)
- node (stored once)
Unique layers:
- myapp (A only)
- other-app (B only)
Disk savings:
- 10 images sharing the same base → 1x base + 10x app layers.
- Tens of GB becomes a few GB.
Trade-offs in Layer Count
More layers:
- Fine-grained build cache → faster builds.
- Parallel downloads on pull possible.
- But metadata overhead per layer.
Fewer layers:
- Potentially smaller images (install and cleanup can share a layer).
- Pull may be more efficient.
- Lower cache granularity.
Practical recommendation: 10-20 layers. Not too few, not too many.
Layer Size Limits
Technically no limit. However:
- Single layer > 10 GB: problems possible.
- Total image > 5 GB: pull time issues.
- File count > 1 million: performance degradation.
In practice, a few GB or less and tens of thousands of files is appropriate.
3. Legacy Builder vs BuildKit
Legacy Docker Builder
Docker's pre-2017 build method:
Dockerfile → Docker Daemon → Sequential steps
Problems:
- Sequential only: independent steps also run sequentially.
- Client-server architecture: all files transferred to the daemon.
- Monolithic: no build graph.
- Limited caching: layer-level only.
- Hard to extend.
The Arrival of BuildKit
BuildKit started as a standalone Moby project in 2017 and shipped with Docker 18.06 (2018). It is now the default builder in Docker Engine 23.0+ and Docker Desktop, and is the engine behind docker buildx and tools such as Dagger and Earthly.
Key innovations:
- Build Graph: expresses builds as a DAG.
- Parallel execution: independent steps built simultaneously.
- Cache mount: cache directories used during builds.
- Secret mount: secrets used during builds (not left in the image).
- Multi-stage optimization: only required stages are built.
- Remote cache: registry as cache.
- Multi-platform: multiple archs at once.
Enabling BuildKit
Docker 23+: enabled by default.
Explicit enablement (older versions):
export DOCKER_BUILDKIT=1
docker build .
Daemon config file (/etc/docker/daemon.json):
{
"features": {
"buildkit": true
}
}
Performance Differences
Actual measurements of the same Dockerfile:
| Build | Legacy | BuildKit |
|---|---|---|
| Cold (no cache) | 180s | 120s |
| Warm (1 file changed) | 180s | 15s |
| Warm (independent steps) | 60s | 5s |
The difference made by BuildKit's fine-grained caching and parallelization.
4. BuildKit's LLB
Low-Level Builder
LLB (Low-Level Builder) is BuildKit's internal representation:
Dockerfile → Parser → LLB → Executor → Image
LLB is a low-level DSL expressing a build graph. Dockerfile is one of several frontends. Other frontends are possible:
- Dockerfile frontend: default.
- Buildpack frontend: Cloud Native Buildpacks.
- Custom frontend: your own DSL.
DAG-based Builds
BuildKit converts the Dockerfile into a DAG:
FROM node:18 AS base
RUN apt-get update && apt-get install -y python3
FROM base AS deps
COPY package.json .
RUN npm install
FROM base AS build
COPY . .
RUN npm run build
FROM nginx:alpine AS runtime
COPY --from=build /app/dist /usr/share/nginx/html
DAG representation:
node:18 (FROM)
│
▼
┌─── base ───┐
│ │
▼ ▼
deps build
│ │
▼ ▼
[unused] nginx:alpine (FROM)
│
▼
runtime
Note: the deps stage is not in the final image. BuildKit detects this and skips the build. The legacy builder would build it unconditionally.
Parallelism
Independent stages run in parallel:
FROM alpine AS a
RUN sleep 10
FROM alpine AS b
RUN sleep 10
FROM alpine
COPY --from=a /tmp /a
COPY --from=b /tmp /b
- Legacy: 20s (10 + 10).
- BuildKit: 10s (parallel).
Using LLB Directly
You can generate LLB with Go code:
import "github.com/moby/buildkit/client/llb"
st := llb.Image("alpine:3.19").
Run(llb.Shlex("apk add --no-cache curl")).
Root()
def, err := st.Marshal(ctx)
// Send to BuildKit
Advanced use cases: tools such as Dagger and Earthly generate LLB directly and use BuildKit as their execution backend.
5. Multi-Stage Builds
Why Multi-Stage
Problem: build tools end up in the image.
# Bad example
FROM golang:1.21
WORKDIR /app
COPY . .
RUN go build -o myapp
CMD ["./myapp"]
Resulting image: ~1 GB (Go compiler, source, and dependencies all included).
Solution: multi-stage build.
# Build stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp
# Runtime stage
FROM alpine:3.19
COPY --from=builder /app/myapp /myapp
CMD ["/myapp"]
Resulting image: ~10 MB (Alpine + binary only).
100x smaller. Build tools exist only in the builder stage, not in the final image.
COPY --from
Copy files from another stage or image:
FROM alpine AS tools
RUN apk add --no-cache curl jq
FROM scratch
COPY --from=tools /usr/bin/curl /usr/bin/curl
COPY --from=tools /usr/bin/jq /usr/bin/jq
# Only the binaries you need
FROM scratch
COPY --from=python:3.12-slim /usr/local/bin/python3 /usr/local/bin/python3
# Works from registry images too
Stage Independence
Each stage is independent:
- Different base images.
- Different tools.
- Different purposes.
Build perspective: BuildKit detects inter-stage dependencies. Builds only what's needed.
Common Patterns
1. Compiled languages (Go, Rust, C):
FROM rust:1.75 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release
FROM debian:bookworm-slim
COPY --from=builder /app/target/release/myapp /usr/local/bin/
CMD ["myapp"]
2. Node.js:
FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci
FROM node:20 AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=deps /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
3. Python:
FROM python:3.11 AS builder
COPY requirements.txt .
RUN pip install --target=/deps -r requirements.txt
FROM python:3.11-slim
COPY --from=builder /deps /usr/local/lib/python3.11/site-packages
COPY . /app
WORKDIR /app
CMD ["python", "main.py"]
4. Java:
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
FROM eclipse-temurin:21-jre
COPY --from=builder /app/target/*.jar /app.jar
CMD ["java", "-jar", "/app.jar"]
Stage Reuse
Reference the same stage from multiple places:
FROM alpine AS common
RUN apk add --no-cache ca-certificates
FROM common AS app1
# ...
FROM common AS app2
# ...
The common stage is built only once and shared.
6. Cache Strategies
Layer Cache (default)
The order in the Dockerfile affects caching:
FROM node:20
COPY . . # Copy everything
RUN npm install # Runs every time
Problem: changing a single source file invalidates the COPY . . layer, so npm install runs again.
Correct order:
FROM node:20
COPY package*.json ./ # Only dependency files
RUN npm install # Install dependencies
COPY . . # Rest (cache-independent)
Principle: things that rarely change on top, things that change often on the bottom.
.dockerignore
Files to exclude from the build context:
node_modules
.git
*.log
.env
dist
coverage
Effects:
- Smaller build context.
- COPY . . cache stability.
- Prevents leaking secrets.
Cache Mount (BuildKit essential)
A cache mount is a cache directory used only during the build. It is not included in the layer.
# syntax=docker/dockerfile:1.6
FROM node:20
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm install
COPY . .
RUN npm run build
Behavior:
- /root/.npm is a persistent cache, maintained across builds.
- Reuses packages npm has already downloaded.
- Not included in the image.
Performance difference:
- First build: 60s.
- Next build (cache used): 5s.
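Cache mounts live in BuildKit's local cache store, not in any image layer. If one gets into a bad state (for example a corrupted npm cache), it can be pruned separately, assuming a reasonably recent Docker/Buildx:
# Remove only cache-mount entries from the builder cache
docker builder prune --filter type=exec.cachemount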
Language-specific Cache Mount Examples
Node.js (npm):
RUN --mount=type=cache,target=/root/.npm \
    npm ci
Node.js (pnpm):
RUN --mount=type=cache,target=/root/.local/share/pnpm/store \
    pnpm install --frozen-lockfile
Python (pip):
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
Go:
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o /app ./...
Rust (cargo):
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    cargo build --release
Java (Maven):
RUN --mount=type=cache,target=/root/.m2 \
    mvn package
Registry Cache
Store the build cache in a registry to share with other machines:
docker buildx build \
--cache-to type=registry,ref=myregistry.com/myapp:cache \
--cache-from type=registry,ref=myregistry.com/myapp:cache \
-t myapp:latest .
Useful in CI/CD:
- Shares cache across CI machines.
- Drastically reduces build times.
Inline Cache
Embed cache metadata in the image itself:
docker buildx build \
--cache-to type=inline \
--push \
-t myregistry.com/myapp:latest .
Use a pulled image directly as cache.
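Usage on the consuming side: any machine that can pull the image can reuse its embedded cache metadata.
# Reuse the previously pushed image (with inline cache) as the cache source
docker buildx build \
  --cache-from type=registry,ref=myregistry.com/myapp:latest \
  -t myregistry.com/myapp:latest .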
Git URL as Cache
Use a remote Git repository as cache:
# syntax=docker/dockerfile:1.6
FROM alpine
RUN \
...
7. Secret Management
Problem: Secrets During Build
# Bad: token remains in the image
ARG GITHUB_TOKEN
RUN git clone https://$GITHUB_TOKEN@github.com/private/repo.git
Result:
- Token exposed in docker history.
- Stored in image layers.
- Stays there forever.
Secret Mount
BuildKit's secret mount:
# syntax=docker/dockerfile:1.6
FROM alpine
RUN --mount=type=secret,id=github_token \
TOKEN=$(cat /run/secrets/github_token) && \
git clone https://$TOKEN@github.com/private/repo.git
Build command:
echo $GITHUB_TOKEN | docker buildx build \
--secret id=github_token,src=/dev/stdin \
-t myapp .
Effect:
- Secret used only during the build.
- Not left in the image.
- Not in any layer.
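A minimal alternative, assuming the token already lives in an environment variable on the build host (supported by recent Buildx releases):
# Pass the secret from an env var instead of stdin (the id must match the Dockerfile mount)
docker buildx build --secret id=github_token,env=GITHUB_TOKEN -t myapp .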
SSH Mount
Pass the SSH agent socket to the build:
# syntax=docker/dockerfile:1.6
FROM alpine
RUN apk add --no-cache openssh-client git
RUN --mount=type=ssh \
    mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts && \
    git clone git@github.com:private/repo.git
Build command:
docker buildx build --ssh default -t myapp .
The SSH key is never included in the image.
8. Reproducible Builds
Problem: Non-Reproducibility
Same Dockerfile, same source → different image?
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl
Reasons:
- State of the apt repositories at the time of apt-get update.
- Package versions change over time.
- Local environment differences (network, cache).
Result: the image built yesterday differs from the one built today. Problematic for security audits and debugging.
Requirements for Reproducible Builds
Fully reproducible:
- Pinned base image: FROM ubuntu:22.04@sha256:... (digest pinning).
- Pinned package versions: apt-get install curl=7.81.0-1ubuntu1.15.
- Lockfiles: package-lock.json, requirements.txt (version pinning).
- Deterministic builds: independent of build time.
- Clean environment: no external dependencies.
Digest Pinning
# Bad: tags can move
FROM node:20
# Good: digest is immutable
FROM node:20@sha256:8f0c5a7a1d0c7b8f...
Check digest:
docker pull node:20
docker images --digests | grep node
Package Pinning
apt (Debian/Ubuntu):
RUN apt-get update && apt-get install -y \
curl=7.81.0-1ubuntu1.15 \
&& rm -rf /var/lib/apt/lists/*
pip:
requests==2.31.0
flask==3.0.0
npm:
{
"dependencies": {
"express": "4.18.2"
}
}
Plus commit package-lock.json.
SOURCE_DATE_EPOCH
Compilers such as Go and Rust embed the build time in the binary. Pin it with SOURCE_DATE_EPOCH:
FROM golang:1.21
ARG SOURCE_DATE_EPOCH=0
ENV SOURCE_DATE_EPOCH=$SOURCE_DATE_EPOCH
RUN go build -ldflags="-buildid=" -trimpath .
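Recent Buildx releases also honor SOURCE_DATE_EPOCH as a build argument and clamp layer and image timestamps to it. A hedged sketch, assuming a git checkout:
# Pin all image timestamps to the last commit time
SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct) \
  docker buildx build --build-arg SOURCE_DATE_EPOCH -t myapp .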
Verification
# Build multiple times
docker build -t myapp:1 .
docker build -t myapp:2 .
# Compare images
docker inspect myapp:1 | jq '.[0].RootFS.Layers'
docker inspect myapp:2 | jq '.[0].RootFS.Layers'
# If identical, it's reproducible
In practice: 100% reproducible builds are rare. Near reproducible is usually enough.
9. Image Security
Distroless Images
Google's distroless project: minimal images containing only the essential runtime.
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp
FROM gcr.io/distroless/base-debian12
COPY --from=builder /app/myapp /
CMD ["/myapp"]
Characteristics:
- No shell: no /bin/sh.
- No package manager: no apt or apk.
- Minimal size: tens of MB.
- Minimized attack surface.
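Debugging tip: distroless images publish a :debug tag variant that adds a busybox shell, handy when something only fails inside the container:
# Temporarily swap to the :debug variant and open a shell
docker run --rm -it --entrypoint sh gcr.io/distroless/base-debian12:debug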
Scratch Image
Scratch: a fully empty image.
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp
FROM scratch
COPY --from=builder /app/myapp /
CMD ["/myapp"]
Result: just the binary. A few MB.
Caveats:
- Static linking required (CGO_ENABLED=0).
- Debugging is hard (no shell).
- Copy CA certificates if you need them.
Image Scanning
Trivy (Aqua Security):
trivy image myapp:latest
- CVE scanning.
- Secret detection.
- Config file analysis.
Grype (Anchore):
grype myapp:latest
Other tools: Snyk, Clair, Docker Scout, etc.
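In CI these scanners are typically used as a gate; for example, Trivy can fail the pipeline on high-severity findings:
# Non-zero exit code when HIGH/CRITICAL CVEs are found → fails the CI job
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest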
SBOM (Software Bill of Materials)
A complete list of software in the image:
Generate:
syft myapp:latest -o spdx-json > sbom.json
Contents:
- All packages and versions.
- License info.
- Dependency tree.
Uses:
- Supply chain security.
- License compliance.
- CVE matching.
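BuildKit can also attach an SBOM attestation while building (Buildx 0.10+, as far as I recall), and grype can scan an SBOM file instead of re-analyzing the image:
# Generate and push an SBOM attestation alongside the image
docker buildx build --sbom=true -t myregistry.com/myapp:latest --push .
# Scan a previously generated SBOM file
grype sbom:./sbom.json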
Image Signing
Sign images with cosign:
# Sign
cosign sign --key cosign.key myregistry.com/myapp:latest
# Verify
cosign verify --key cosign.pub myregistry.com/myapp:latest
Keyless signing (OIDC):
cosign sign myregistry.com/myapp:latest
# Automatic signing via GitHub Actions OIDC
Part of the Sigstore project.
10. Practical Optimization
Minimize Layers
Bad:
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y jq
RUN rm -rf /var/lib/apt/lists/*
Four layers. Intermediate state is preserved in layers.
Good:
RUN apt-get update && \
apt-get install -y \
curl \
jq && \
rm -rf /var/lib/apt/lists/*
One layer. A smaller image.
Remove Temporary Files
Install + clean up in the same layer:
RUN apt-get update && \
apt-get install -y build-essential && \
make && \
apt-get purge -y build-essential && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/*
Use .dockerignore
# Not needed for build
node_modules
.git
.github
*.log
*.md
# Security risk
.env
.env.local
**/secrets/
# Large temporary files
dist
build
coverage
.next
Effects:
- Smaller build context.
- Improved cache stability.
- Improved security.
Small Base Images
# Bad: large base
FROM node:20 # ~1.1 GB
# Good: slim
FROM node:20-slim # ~240 MB
# Better: alpine
FROM node:20-alpine # ~180 MB
# Best: distroless
FROM gcr.io/distroless/nodejs20-debian12 # ~180 MB, no shell
Layer Order Optimization
Principle: immutable → frequently changing order.
FROM node:20-alpine
# 1. System dependencies (rarely change)
RUN apk add --no-cache libc6-compat
# 2. Package manager config (rarely change)
WORKDIR /app
COPY package.json package-lock.json ./
# 3. Install dependencies (only when package.json changes)
RUN --mount=type=cache,target=/root/.npm \
    npm ci --only=production
# 4. Application code (frequently changes)
COPY . .
# 5. Build (only when code changes)
RUN npm run build
CMD ["node", "dist/server.js"]
Multi-Platform Build
Build for multiple platforms at once with buildx:
docker buildx create --use
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t myregistry.com/myapp:latest \
--push .
Result: AMD64 and ARM64 images are pushed as a single manifest list.
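To confirm both architectures actually landed in the manifest list:
docker buildx imagetools inspect myregistry.com/myapp:latest
# Should list one manifest per platform: linux/amd64 and linux/arm64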
BuildKit + CI/CD
GitHub Actions example:
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v5
with:
context: .
push: true
tags: myregistry.com/myapp:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64
type=gha: uses GitHub Actions cache. Shared cache across CI runs.
11. Common Pitfalls
Pitfall 1: Cache Invalidation
FROM node:20
WORKDIR /app
COPY . . # One line changes, everything invalidated
RUN npm install
Fix: dependency files first.
Pitfall 2: Huge Build Context
$ docker build .
Sending build context: 500 MB # Suspicious
Cause: no .dockerignore, so node_modules, .git are included.
Fix: add .dockerignore.
Pitfall 3: Leaking Secrets
ARG AWS_SECRET_KEY
ENV AWS_SECRET_KEY=$AWS_SECRET_KEY
# Forever in the image
Fix: --mount=type=secret.
Pitfall 4: Wrong Stage Choice
FROM node:20 AS builder
# ... build stuff ...
FROM node:20-alpine
COPY --from=builder /app . # glibc binaries copied onto alpine (musl)
# Won't run!
Fix: keep the build and runtime stages in the same C library family (both glibc-based, or both alpine/musl).
Pitfall 5: Platform Mismatch
Build on Apple Silicon:
docker build -t myapp .
# Built as arm64
docker push myregistry.com/myapp
# On the server: "exec format error"
Fix: specify --platform linux/amd64.
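For example, from an Apple Silicon machine:
# Build (and push) an amd64 image even though the host is arm64
docker buildx build --platform linux/amd64 -t myregistry.com/myapp --push .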
Pitfall 6: Root User
FROM node:20
# Runs as root by default
CMD ["node", "server.js"]
Problem: security risk.
Fix:
RUN useradd -u 1000 appuser
USER appuser
Pitfall 7: Unnecessary Files
COPY . . # Everything
Fix: copy only what you need.
COPY src ./src
COPY package.json tsconfig.json ./
12. Debugging
Verbose Build Logs
docker buildx build --progress=plain .
# Print all logs
Inspecting Intermediate Containers
Legacy builder:
docker commit <intermediate_container_id> debug-image
docker run -it debug-image sh
BuildKit does not leave intermediate containers behind. Instead, build an image of the state just before the failing step and inspect it, for example with --target to stop at a specific stage:
docker build --target builder -t myapp:debug .
docker run -it myapp:debug sh
Image Analysis
dive: explore the filesystem by layer.
dive myapp:latest
- Layer-by-layer contents.
- Wasted space.
- Size statistics.
docker history:
docker history myapp:latest
Command and size of each layer.
skopeo: inspect images.
skopeo inspect docker://nginx:latest
Benchmark
time docker build -t myapp .
Use tools like Hyperfine for repeatable measurement.
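A hedged sketch with hyperfine, pruning the builder cache between runs so every measurement is a cold build:
hyperfine --prepare 'docker builder prune -af' 'docker build -q -t myapp .'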
Quiz Review
Q1. What are three reasons BuildKit is faster than the legacy builder?
A.
1. Parallelism
The legacy builder runs the Dockerfile sequentially:
FROM → RUN 1 → RUN 2 → RUN 3 → ...
Each step waits for the previous one. Even independent work is sequential.
BuildKit is DAG-based:
FROM ─┬─ RUN 1 (independent)
└─ RUN 2 (independent)
↓
RUN 3 (depends on 1, 2)
Independent steps run concurrently. Most obvious in multi-stage builds:
FROM alpine AS a
RUN sleep 10
FROM alpine AS b
RUN sleep 10
FROM alpine
COPY --from=a /x /
COPY --from=b /y /
- Legacy: 20s (10 + 10).
- BuildKit: 10s (parallel).
In practice, 2-5x speedup on complex builds with multiple stages.
2. Fine-grained Cache Invalidation
Legacy builder cache:
Step 1/5 : COPY . . ← One line changes,
Step 2/5 : RUN npm install ← this reruns too
Step 3/5 : RUN npm build ← and this too
Everything reruns.
BuildKit:
- Cache mount keeps the npm cache persistent.
- Per-file change tracking.
- File-level cache hashes.
# BuildKit with cache mount
RUN --mount=type=cache,target=/root/.npm \
    npm ci
Result:
- First build: 60s.
- Next build: 5s (npm hits cache).
3. Skip Unnecessary Work
In multi-stage builds, BuildKit builds only the stages actually needed:
FROM node:20 AS deps
RUN npm install
FROM node:20 AS build
COPY --from=deps /app/node_modules .
RUN npm run build
FROM node:20 AS test
COPY --from=deps /app/node_modules .
RUN npm test ← Not needed for default build
FROM node:20-slim
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
Legacy: builds the test stage too.
BuildKit: the final image only references build → test is skipped.
You can also use --target test to build only the test stage.
Performance comparison:
Complex Dockerfile with 10 stages:
| Scenario | Legacy | BuildKit |
|---|---|---|
| Cold build | 300s | 180s |
| Warm build (source change) | 300s | 20s |
| Warm build (dependency change) | 300s | 60s |
| Single stage only | Impossible | 10s |
Additional benefits:
4. Remote Cache: registry as cache:
docker buildx build \
--cache-to type=registry,ref=myapp:cache \
--cache-from type=registry,ref=myapp:cache
Cache shared across CI machines.
5. Secret Mount: secrets used only during build, not in the image.
6. Multi-platform: multiple architectures at once.
Takeaway:
BuildKit is not a simple upgrade but a fundamental redesign. The shift from viewing a Dockerfile as a "list of commands" to a "DAG" made all these optimizations possible.
Today (2025), Docker Desktop ships BuildKit by default, and even the docker build command uses BuildKit. The legacy builder is a thing of the past.
Practical recommendations:
- All CI/CD: BuildKit is mandatory.
- Local development: enabled by default (recent Docker).
- Cache management: registry cache + gha cache.
- Dockerfile optimization: use BuildKit features (# syntax=docker/dockerfile:1.6).
Migrating an old-style Dockerfile to BuildKit style alone can give 3-10x faster builds. Without any code changes.
Q2. How does a multi-stage build reduce image size by 100x?
A.
Basic idea: "separate build tools from runtime".
Problem scenario: building a Go app
Go apps need compilation. Building via Docker requires an image with the Go compiler:
Without multi-stage:
FROM golang:1.21
WORKDIR /app
COPY . .
RUN go build -o myapp
CMD ["./myapp"]
Result: ~1.1 GB image.
Why so big:
- Go compiler + standard library: ~600 MB.
- Build cache (/go/pkg): ~200 MB.
- Source + dependencies: ~100 MB.
- OS (Debian base): ~200 MB.
- Only the binary is needed at runtime (~10 MB).
99% is waste.
Multi-stage solution:
# Stage 1: build
FROM golang:1.21 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o myapp
# Stage 2: runtime
FROM alpine:3.19
COPY --from=builder /app/myapp /usr/local/bin/
CMD ["myapp"]
Result: ~17 MB image.
- Alpine base: 5 MB.
- Go binary: 10 MB.
- CA certificates, tzdata, etc.: 2 MB.
65x smaller. In some cases, up to 100x.
How it works:
The core of multi-stage builds:
- Each stage is independent: can use different base images.
- COPY --from=stage: copy selectively from an earlier stage.
- The final stage becomes the resulting image: earlier stages are not included.
Internally:
BuildKit converts the Dockerfile into a DAG:
builder (FROM golang:1.21)
│
│ RUN, COPY, ...
│
└→ /app/myapp (artifact)
│
│ COPY --from=builder
↓
runtime (FROM alpine:3.19)
│
└→ final image (deployment)
The builder stage is intermediate. Not included in the final image. Invisible in docker images after the build.
More extreme: scratch
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp
FROM scratch
COPY --from=builder /app/myapp /
CMD ["/myapp"]
Result: ~10 MB (binary only).
- scratch: a completely empty image, 0 bytes.
- Only the binary is included.
- No shell, no libraries.
Constraints:
- Static linking required (CGO_ENABLED=0).
- No dynamic dependencies.
- Hard to debug.
Real example: Rust
# Stage 1: cargo dependency cache
FROM rust:1.75 AS chef
WORKDIR /app
RUN cargo install cargo-chef
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release
# Stage 2: minimal runtime
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/myapp /usr/local/bin/
CMD ["myapp"]
Result: 80 MB vs 1.5 GB (full Rust image).
Complex Node.js example:
# Stage 1: dev dependencies + build
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci # install all deps (including dev)
COPY . .
RUN npm run build # TypeScript compile, etc.
# Stage 2: production deps only
FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
# Stage 3: minimal runtime
FROM node:20-slim
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package*.json ./
CMD ["node", "dist/server.js"]
Result:
- node:20: 1.1 GB.
- node:20 + all deps + build: 1.5 GB.
- This setup: 350 MB.
4x smaller. With further optimization (alpine), down to 150 MB.
Benefits summary:
1. Pull speed:
- 1 GB image pull: 30-60s.
- 100 MB image pull: 3-5s.
- Big impact on CI/CD and deployment speed.
2. Network cost:
- AWS egress: $0.09/GB.
- 1000 pull/day x 1 GB = $90/day.
- 100 MB image: $9/day.
- 1/10 the cost.
3. Storage:
- Registry storage.
- Node's image cache.
- Smaller means savings.
4. Security:
- Smaller attack surface.
- Unused binaries = potential CVEs.
- Scratch/distroless = minimal attack surface.
5. Startup time:
- Image pull + startup.
- Small image = fast startup.
- Important for serverless and scale-out.
Pitfalls and solutions:
Pitfall 1: missing libraries
Binary not statically linked:
./myapp: error while loading shared libraries: libcurl.so.4
Fix:
- Static linking (-static, CGO_ENABLED=0).
- Or copy the required libraries into the image.
Pitfall 2: CA certificates
On HTTPS requests:
x509: certificate signed by unknown authority
Fix:
FROM alpine:3.19
RUN apk add --no-cache ca-certificates
Or:
FROM scratch
COPY --from=builder /etc/ssl/certs /etc/ssl/certs
COPY --from=builder /app/myapp /
Pitfall 3: timezone
FROM scratch
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
ENV TZ=UTC
Pitfall 4: DNS
Some languages need /etc/nsswitch.conf:
COPY --from=builder /etc/nsswitch.conf /etc/nsswitch.conf
Pitfall 5: no shell
Scratch has no shell. Exec form is required:
CMD ["/myapp"] # exec form (OK)
# The following won't work
# CMD /myapp # shell form (NO)
Practical recommendations:
| Language | Recommended image | Size |
|---|---|---|
| Go | scratch | ~10 MB |
| Rust | debian-slim | ~80 MB |
| Java | eclipse-temurin-jre | ~250 MB |
| Node.js | node:alpine | ~150 MB |
| Python | python:slim | ~120 MB |
| C/C++ | alpine | ~5 MB |
Takeaway:
Multi-stage builds (introduced in Docker 17.05, 2017) are one of Docker's biggest innovations. A single Dockerfile expresses the entire build pipeline while keeping the final image minimal.
It elegantly removes an old Docker limitation: previously you had to maintain separate "build images" and "runtime images", which was complicated.
With multi-stage:
- One Dockerfile.
- Simplified CI/CD.
- Optimal image size.
- Reproducible.
Today, the vast majority of production Dockerfiles are multi-stage. Not using it requires a reason.
If your Dockerfile is single-stage, switching to multi-stage is the easiest optimization. Images get smaller almost always and become more secure almost always.
Closing: The Evolution of Builds
Key Points
- OCI spec: container standards. Followed by all tools.
- Layers + UnionFS: sharing and efficiency.
- BuildKit: DAG-based, parallel, powerful caching.
- Multi-stage: the key to small images.
- Cache mount: dramatic build time reduction.
- Secret mount: safe secret management.
- Distroless/Scratch: minimal attack surface.
- Reproducible builds: audit and trust.
Production Checklist
Production Dockerfile:
- Multi-stage build.
- Slim/alpine/distroless base.
- .dockerignore configured.
- Dependency files COPY'd first.
- Cache mounts (npm, pip, cargo).
- Non-root user.
- Secrets via --mount=type=secret.
- Digest pinning (@sha256:...).
- HEALTHCHECK defined.
- Label metadata.
- Multi-platform builds.
- Image scanning.
- SBOM generation.
Final Lesson
Docker popularized containers. BuildKit reinvented builds. Together, they became the foundation of modern CI/CD.
Today the following is taken for granted:
- Minute-long builds → seconds.
- GB images → MB images.
- Opaque builds → reproducible.
- Secret leakage → secret mount.
But all of this is achieved only when Dockerfiles are written properly. Badly written Dockerfiles are still slow, bloated, and insecure.
The knowledge in this post is something every engineer should know. Backend, frontend, DevOps, SRE, all of them. The images you build are pulled thousands of times and run tens of thousands of times in production. Small, fast, and secure images determine the efficiency of the entire system.
When you write your next Dockerfile, remember the principles from this post:
- Multi-stage.
- Cache mount.
- Minimal base.
- Order matters.
- Don't leak secrets.
- Think reproducibility.
These habits make 5x faster CI, 10x smaller images, and safer systems.