- 1. What is etcd?
- 2. Preparation: Checking etcd Information
- 3. Checking etcd Health
- 4. etcd Backup (Snapshot)
- 5. etcd Restore
- 6. Troubleshooting
- 7. CKA Exam Cheat Sheet
- 8. External etcd Cluster Scenario
- 9. Key Summary
- References
1. What is etcd?
etcd is a distributed Key-Value store that holds all state data for a Kubernetes cluster. Information about every resource in the cluster — Pods, Services, ConfigMaps, Secrets, RBAC policies, and more — is stored in etcd.
Put simply, everything you see with a kubectl get command ultimately comes from data stored in etcd.
┌─────────────────────────────────────────────┐
│ Kubernetes Control Plane │
│ │
│ kube-apiserver ──── etcd (all state stored) │
│ │ │
│ kube-scheduler kube-controller-manager │
└─────────────────────────────────────────────┘
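The mapping is direct: every object kubectl shows is stored as a key under /registry/ in etcd. A minimal sketch of the key naming convention (the Deployment name below is hypothetical; the live listing command requires the TLS settings shown in section 3):

```shell
# Keys follow /registry/<resource>/<namespace>/<name>.
# Hypothetical example: a Deployment "web" in namespace "default" lives at:
KEY="/registry/deployments/default/web"
echo "$KEY"

# On a live cluster (with the etcdctl env vars from section 3) you can list real keys:
# etcdctl get /registry/pods --prefix --keys-only
```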
Why is etcd Backup Important?
- If etcd is lost, the entire cluster is gone (Pods, Services, Secrets — everything)
- It is the only recovery method when a cluster is broken by a botched upgrade or misconfiguration
- It is a near-guaranteed topic on the CKA exam
2. Preparation: Checking etcd Information
2.1 Check the etcd Pod
On clusters installed with kubeadm, etcd runs as a Static Pod:
# Check etcd Pod
kubectl get pods -n kube-system | grep etcd
# Example output
# etcd-controlplane 1/1 Running 0 45m
2.2 Locate etcd Certificates (Most Important!)
etcd communicates over TLS, so you must know the exact certificate paths:
# Find certificate paths from etcd Pod arguments
kubectl describe pod etcd-controlplane -n kube-system | grep -E "cert|key|cacert|data-dir"
Typical paths:
| Item | Path |
|---|---|
| CA Certificate | /etc/kubernetes/pki/etcd/ca.crt |
| Server Certificate | /etc/kubernetes/pki/etcd/server.crt |
| Server Key | /etc/kubernetes/pki/etcd/server.key |
| Data Directory | /var/lib/etcd |
| Endpoint | https://127.0.0.1:2379 |
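These paths also appear verbatim as flags in the static-pod manifest itself. A sketch of pulling them out with grep, run here against a trimmed sample file standing in for /etc/kubernetes/manifests/etcd.yaml on a real control-plane node:

```shell
# Trimmed stand-in for /etc/kubernetes/manifests/etcd.yaml (real flag names, sample file).
cat > /tmp/etcd-sample.yaml <<'EOF'
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --data-dir=/var/lib/etcd
EOF
# On a real node, point this grep at /etc/kubernetes/manifests/etcd.yaml instead.
grep -E -- '--(cert-file|key-file|trusted-ca-file|data-dir)=' /tmp/etcd-sample.yaml
```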
2.3 Verify etcdctl Installation
# Check etcdctl version
ETCDCTL_API=3 etcdctl version
# Install if missing (Ubuntu)
apt-get install etcd-client
# Or run inside the etcd Pod
kubectl exec -it etcd-controlplane -n kube-system -- etcdctl version
CKA Exam Tip: You must set `ETCDCTL_API=3`. API v2 does not support the `snapshot` command.
3. Checking etcd Health
Before backing up, verify that the current etcd cluster is healthy:
# Set environment variables (to avoid typing them every time)
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
# Cluster health check
etcdctl endpoint health
# Output: https://127.0.0.1:2379 is healthy: took = 2.345ms
# List members
etcdctl member list --write-out=table
# Check the number of stored keys
etcdctl get / --prefix --keys-only | wc -l
4. etcd Backup (Snapshot)
4.1 Create a Snapshot
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
4.2 Verify the Snapshot
ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-backup.db --write-out=table
Example output:
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 3e9a843c | 12450 | 1287 | 5.8 MB |
+----------+----------+------------+------------+
Caution: There have been reports that the `snapshot verify` command can corrupt the DB, so use `snapshot status` for verification only.
4.3 Automated Backup CronJob
In production, configure periodic backups with a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: '0 */6 * * *'
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          containers:
            - name: backup
              image: bitnami/etcd:3.5
              command:
                - /bin/sh
                - -c
                - |
                  etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/server.crt \
                    --key=/etc/kubernetes/pki/etcd/server.key
                  find /backup -name "etcd-*.db" -mtime +7 -delete
              env:
                - name: ETCDCTL_API
                  value: '3'
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ''
          tolerations:
            - effect: NoSchedule
              operator: Exists
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /opt/etcd-backups
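The retention line at the end of the CronJob deletes snapshots older than 7 days. A quick local sketch of that rule against a scratch directory (the file names are fabricated for the demo):

```shell
# Simulate the CronJob's retention rule in a scratch directory.
BACKUP_DIR=$(mktemp -d)
touch "$BACKUP_DIR/etcd-20250101-000000.db"                    # recent backup (mtime = now)
touch -d '10 days ago' "$BACKUP_DIR/etcd-20240101-000000.db"   # stale backup
find "$BACKUP_DIR" -name "etcd-*.db" -mtime +7 -delete
ls "$BACKUP_DIR"    # only the recent file remains
```

You can also trigger a one-off run of the real CronJob to test it: kubectl create job --from=cronjob/etcd-backup etcd-backup-test -n kube-system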
5. etcd Restore
5.1 Restore Procedure (Step by Step)
Step 1: Restore the snapshot to a new data directory
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
--data-dir=/var/lib/etcd-from-backup \
--initial-cluster=controlplane=https://127.0.0.1:2380 \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://127.0.0.1:2380
Key Point: The `--data-dir` must be a new path, not the existing `/var/lib/etcd`!
Step 2: Update the etcd Pod's data directory
vi /etc/kubernetes/manifests/etcd.yaml
Section to change:
# Before
volumes:
- hostPath:
    path: /var/lib/etcd
    type: DirectoryOrCreate
  name: etcd-data
# After — change to the new path!
volumes:
- hostPath:
    path: /var/lib/etcd-from-backup
    type: DirectoryOrCreate
  name: etcd-data
Step 3: Wait for etcd Pod to restart
# Since it's a Static Pod, kubelet will restart it automatically
watch "kubectl get pods -n kube-system | grep etcd"
# Or check directly with crictl
crictl ps | grep etcd
Step 4: Verify the restore
kubectl get nodes
kubectl get pods --all-namespaces
ETCDCTL_API=3 etcdctl endpoint health \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
6. Troubleshooting
6.1 etcdctl: command not found
# Common in CKA exams!
# Solution 1: Use the full path
/usr/local/bin/etcdctl version
# Solution 2: Run inside the etcd Pod
kubectl exec -it etcd-controlplane -n kube-system -- sh
# Solution 3: SSH and sudo
sudo -i
export PATH=$PATH:/usr/local/bin
6.2 etcd Pod Won't Start After Restore
# Check logs
crictl logs $(crictl ps -a | grep etcd | awk '{print $1}')
# Common cause 1: Permission issues (mainly when etcd runs via systemd as the etcd user)
chown -R etcd:etcd /var/lib/etcd-from-backup
# Common cause 2: data-dir path mismatch
# Verify that etcd.yaml's volumes.hostPath.path matches the actual restore path
# Common cause 3: initial-cluster-token mismatch
6.3 "context deadline exceeded" Error
# Cause: etcd hasn't fully started yet
# Solution: Wait 30-60 seconds and retry
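A bounded retry loop avoids hammering a still-starting etcd. A sketch is below; HEALTH_CMD is a placeholder and on a real node would be the full etcdctl endpoint health command with its TLS flags:

```shell
# Placeholder: replace with the real health check, e.g.
#   etcdctl endpoint health --endpoints=https://127.0.0.1:2379 --cacert=... --cert=... --key=...
HEALTH_CMD=true
STATUS=unhealthy
for i in 1 2 3 4 5 6; do          # up to ~60 seconds total
  if $HEALTH_CMD; then STATUS=healthy; break; fi
  sleep 10
done
echo "etcd is $STATUS"
```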
7. CKA Exam Cheat Sheet
# 1. Backup
ETCDCTL_API=3 etcdctl snapshot save /opt/backup.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# 2. Restore
ETCDCTL_API=3 etcdctl snapshot restore /opt/backup.db \
--data-dir=/var/lib/etcd-from-backup
# 3. Change data-dir in etcd.yaml (one-liner)
sed -i 's|/var/lib/etcd|/var/lib/etcd-from-backup|g' \
/etc/kubernetes/manifests/etcd.yaml
# 4. Verify status
ETCDCTL_API=3 etcdctl snapshot status /opt/backup.db -w table
Exam Tip: Don't memorize certificate paths; just copy them from `kubectl describe pod etcd-controlplane -n kube-system`!
8. External etcd Cluster Scenario
# Find the etcd endpoint from kube-apiserver
cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep etcd-servers
# SSH to the etcd node and perform backup/restore
# Certificate paths may differ (/etc/etcd/pki/)
# After restore: systemctl restart etcd
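Sketched as a script (the /etc/etcd/pki/* file names are assumptions and differ per installation; verify them against kube-apiserver's --etcd-servers flag and the etcd node's own service unit):

```shell
# Assumed certificate paths for an external etcd node -- verify on your installation.
ETCD_CA=/etc/etcd/pki/ca.pem
ETCD_CERT=/etc/etcd/pki/etcd.pem
ETCD_KEY=/etc/etcd/pki/etcd-key.pem
SNAP="/opt/etcd-external-$(date +%Y%m%d).db"

# On the etcd node itself (after SSHing in):
# ETCDCTL_API=3 etcdctl snapshot save "$SNAP" \
#   --endpoints=https://127.0.0.1:2379 \
#   --cacert="$ETCD_CA" --cert="$ETCD_CERT" --key="$ETCD_KEY"
#
# After a restore, etcd runs under systemd here (not as a Static Pod), so:
# systemctl restart etcd
echo "snapshot target: $SNAP"
```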
9. Key Summary
| Operation | Command | Notes |
|---|---|---|
| Backup | etcdctl snapshot save | ETCDCTL_API=3 required, 3 certificates |
| Verify | etcdctl snapshot status | Use status, not verify |
| Restore | etcdctl snapshot restore | Must specify a new data-dir |
| Apply | Edit etcd.yaml | Change hostPath path |
| Confirm | etcdctl endpoint health | Wait 30-60 seconds after restart |
CKA Flow: describe -> copy certificates -> save/restore -> edit yaml. Done!