
etcd Backup & Restore Complete Guide — CKA Exam Essential Hands-on

1. What is etcd?

etcd is a distributed Key-Value store that holds all state data for a Kubernetes cluster. Information about every resource in the cluster — Pods, Services, ConfigMaps, Secrets, RBAC policies, and more — is stored in etcd.

Put simply, everything you see with a kubectl get command ultimately comes from data stored in etcd.

┌──────────────────────────────────────────────┐
│           Kubernetes Control Plane           │
│                                              │
│  kube-apiserver ──── etcd (all state stored) │
│                                              │
│  kube-scheduler   kube-controller-manager    │
└──────────────────────────────────────────────┘

Why is etcd Backup Important?

  • If etcd is lost, the entire cluster is gone (Pods, Services, Secrets — everything)
  • It is the only recovery method when a cluster is broken by a botched upgrade or misconfiguration
  • It is a near-guaranteed topic on the CKA exam

2. Preparation: Checking etcd Information

2.1 Check the etcd Pod

On clusters installed with kubeadm, etcd runs as a Static Pod:

# Check etcd Pod
kubectl get pods -n kube-system | grep etcd

# Example output
# etcd-controlplane   1/1   Running   0   45m

2.2 Locate etcd Certificates (Most Important!)

etcd communicates over TLS, so you must know the exact certificate paths:

# Find certificate paths from etcd Pod arguments
kubectl describe pod etcd-controlplane -n kube-system | grep -E "cert|key|cacert|data-dir"

Typical paths:

Item                 Path
----                 ----
CA Certificate       /etc/kubernetes/pki/etcd/ca.crt
Server Certificate   /etc/kubernetes/pki/etcd/server.crt
Server Key           /etc/kubernetes/pki/etcd/server.key
Data Directory       /var/lib/etcd
Endpoint             https://127.0.0.1:2379

2.3 Verify etcdctl Installation

# Check etcdctl version
ETCDCTL_API=3 etcdctl version

# Install if missing (Ubuntu)
apt-get install etcd-client

# Or run inside the etcd Pod
kubectl exec -it etcd-controlplane -n kube-system -- etcdctl version

CKA Exam Tip: You must set ETCDCTL_API=3: API v2 does not support the snapshot command. (etcdctl v3.4 and later default to API v3, but setting it explicitly is harmless and protects you on older binaries.)


3. Checking etcd Health

Before backing up, verify that the current etcd cluster is healthy:

# Set environment variables (to avoid typing them every time)
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379

# Cluster health check
etcdctl endpoint health
# Output: https://127.0.0.1:2379 is healthy: took = 2.345ms

# List members
etcdctl member list --write-out=table

# Check the number of stored keys
etcdctl get / --prefix --keys-only | wc -l
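The checks above can be wrapped in a small gate for scripts, e.g. refuse to take a snapshot unless every endpoint reports healthy. This is a sketch, not an etcdctl feature: `all_healthy` is a helper invented here, and the output format it parses is assumed from the example output shown above.

```shell
#!/bin/sh
# Sketch: succeed only if every endpoint in `etcdctl endpoint health`
# output reports healthy. `all_healthy` is a helper invented here; the
# "<endpoint> is healthy: took = ..." line format is an assumption
# based on the example output above.
all_healthy() {
  awk '
    /is unhealthy/ { bad++ }
    /is healthy/   { ok++ }
    END { if (ok > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Against a real cluster:
#   etcdctl endpoint health | all_healthy && echo "safe to snapshot"

# Demo on the documented sample output:
printf 'https://127.0.0.1:2379 is healthy: took = 2.345ms\n' | all_healthy \
  && echo "safe to snapshot"
```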

4. etcd Backup (Snapshot)

4.1 Create a Snapshot

ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

4.2 Verify the Snapshot

ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-backup.db --write-out=table

Example output:

+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 3e9a843c |    12450 |       1287 |     5.8 MB |
+----------+----------+------------+------------+
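When scripting backups, it can help to pull a number out of that table, for example to alert when a snapshot contains suspiciously few keys. A minimal sketch, assuming the five-line table layout shown above (`total_keys` is a helper name invented here):

```shell
#!/bin/sh
# Sketch: extract TOTAL KEYS from `snapshot status --write-out=table`
# output. Assumes the table layout shown above: line 4 is the data row,
# and field 4 (split on '|') is TOTAL KEYS.
total_keys() {
  awk -F'|' 'NR == 4 { gsub(/ /, "", $4); print $4 }'
}

sample='+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 3e9a843c |    12450 |       1287 |     5.8 MB |
+----------+----------+------------+------------+'

keys=$(printf '%s\n' "$sample" | total_keys)
echo "$keys"   # 1287
[ "$keys" -gt 0 ] || echo "WARNING: snapshot looks empty"
```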

Caution: There have been reports that the snapshot verify command can corrupt the DB, so use snapshot status for verification only.

4.3 Automated Backup CronJob

In production, configure periodic backups with a CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: '0 */6 * * *'
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          containers:
            - name: backup
              image: bitnami/etcd:3.5
              command:
                - /bin/sh
                - -c
                - |
                  etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/server.crt \
                    --key=/etc/kubernetes/pki/etcd/server.key
                  find /backup -name "etcd-*.db" -mtime +7 -delete
              env:
                - name: ETCDCTL_API
                  value: '3'
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ''
          tolerations:
            - effect: NoSchedule
              operator: Exists
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /opt/etcd-backups
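The two interesting lines of that script, the timestamped filename and the 7-day `find` cleanup, can be exercised in isolation against a throwaway directory. `backup_name` and `prune_backups` are helper names invented for this sketch:

```shell
#!/bin/sh
# Sketch of the CronJob's naming and retention steps, run against a
# temporary directory instead of a live cluster.
backup_name() {
  # Same naming scheme as the CronJob: etcd-YYYYmmdd-HHMMSS.db
  echo "etcd-$(date +%Y%m%d-%H%M%S).db"
}

prune_backups() {
  # Delete snapshots older than 7 days in the given directory,
  # mirroring the `find ... -mtime +7 -delete` line in the manifest.
  find "$1" -name 'etcd-*.db' -mtime +7 -delete
}

# Demo: one fresh snapshot, one artificially backdated one.
dir=$(mktemp -d)
touch "$dir/$(backup_name)"               # fresh: survives pruning
touch -t 202001010000 "$dir/etcd-old.db"  # backdated to 2020: pruned
prune_backups "$dir"
ls "$dir"                                 # only the fresh snapshot remains
```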

5. etcd Restore

5.1 Restore Procedure (Step by Step)

Step 1: Restore the snapshot to a new data directory

ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd-from-backup \
  --initial-cluster=controlplane=https://127.0.0.1:2380 \
  --initial-cluster-token=etcd-cluster-1 \
  --initial-advertise-peer-urls=https://127.0.0.1:2380

Key Point: The --data-dir must be a new path, not the existing /var/lib/etcd! On a single-node cluster, --data-dir alone is enough; the --initial-* flags only matter when re-forming a multi-member cluster.

Step 2: Update the etcd Pod's data directory

vi /etc/kubernetes/manifests/etcd.yaml

Section to change:

# Before
volumes:
- hostPath:
    path: /var/lib/etcd
    type: DirectoryOrCreate
  name: etcd-data

# After — change to the new path!
volumes:
- hostPath:
    path: /var/lib/etcd-from-backup
    type: DirectoryOrCreate
  name: etcd-data
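If you prefer a non-interactive edit, the same change can be made with sed. A sketch against a minimal stand-in fragment; on the control plane node you would run the identical substitution against /etc/kubernetes/manifests/etcd.yaml:

```shell
#!/bin/sh
# Sketch: the Step 2 edit done with sed instead of vi, demoed on a
# stand-in fragment. The same substitution also rewrites the
# --data-dir flag, which is what you want.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
    - --data-dir=/var/lib/etcd
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
EOF

# Rewrite every /var/lib/etcd reference to the restored directory.
sed -i 's|/var/lib/etcd|/var/lib/etcd-from-backup|g' "$manifest"
grep -n 'etcd-from-backup' "$manifest"
```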

Step 3: Wait for etcd Pod to restart

# Since it's a Static Pod, kubelet will restart it automatically
watch "kubectl get pods -n kube-system | grep etcd"

# Or check directly with crictl
crictl ps | grep etcd

Step 4: Verify the restore

kubectl get nodes
kubectl get pods --all-namespaces

ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

6. Troubleshooting

6.1 etcdctl: command not found

# Common in CKA exams!
# Solution 1: Use the full path
/usr/local/bin/etcdctl version

# Solution 2: Run inside the etcd Pod
kubectl exec -it etcd-controlplane -n kube-system -- sh

# Solution 3: SSH and sudo
sudo -i
export PATH=$PATH:/usr/local/bin
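Those fallbacks can be folded into one lookup helper. `find_etcdctl` is a name invented for this sketch; extend the path list to match your environment:

```shell
#!/bin/sh
# Sketch: try PATH first, then the usual absolute locations.
# `find_etcdctl` prints the binary's path, or fails so you know to
# fall back to `kubectl exec` into the etcd Pod.
find_etcdctl() {
  command -v etcdctl && return 0
  for p in /usr/local/bin/etcdctl /usr/bin/etcdctl; do
    [ -x "$p" ] && { echo "$p"; return 0; }
  done
  echo "etcdctl not found; try: kubectl exec -it etcd-controlplane -n kube-system -- sh" >&2
  return 1
}

# Usage:
#   ETCDCTL=$(find_etcdctl) || exit 1
#   ETCDCTL_API=3 "$ETCDCTL" version
```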

6.2 etcd Pod Won't Start After Restore

# Check logs
crictl logs $(crictl ps -a | grep etcd | awk '{print $1}')

# Common cause 1: Permission issues
chown -R etcd:etcd /var/lib/etcd-from-backup

# Common cause 2: data-dir path mismatch
# Verify that etcd.yaml's volumes.hostPath.path matches the actual restore path

# Common cause 3: initial-cluster-token mismatch

6.3 "context deadline exceeded" Error

# Cause: etcd hasn't fully started yet
# Solution: Wait 30-60 seconds and retry
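Rather than retrying by hand, you can wrap the health check in a retry loop. `retry` is a helper invented for this sketch (attempts, delay in seconds, then the command):

```shell
#!/bin/sh
# Sketch: retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or attempts are exhausted,
# sleeping between tries. Handy while etcd finishes starting up.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while :; do
    "$@" && return 0                   # command succeeded
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"                     # back off before the next try
  done
}

# Against a real cluster (up to ~60s total):
#   retry 12 5 etcdctl endpoint health
retry 3 0 true && echo "up"
```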

7. CKA Exam Cheat Sheet

# 1. Backup
ETCDCTL_API=3 etcdctl snapshot save /opt/backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# 2. Restore
ETCDCTL_API=3 etcdctl snapshot restore /opt/backup.db \
  --data-dir=/var/lib/etcd-from-backup

# 3. Change data-dir in etcd.yaml (one-liner)
sed -i 's|/var/lib/etcd|/var/lib/etcd-from-backup|g' \
  /etc/kubernetes/manifests/etcd.yaml

# 4. Verify status
ETCDCTL_API=3 etcdctl snapshot status /opt/backup.db -w table

Exam Tip: Don't memorize certificate paths — just copy them from kubectl describe pod etcd-controlplane -n kube-system!


8. External etcd Cluster Scenario

# Find the etcd endpoint from kube-apiserver
cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep etcd-servers

# SSH to the etcd node and perform backup/restore
# Certificate paths may differ (/etc/etcd/pki/)
# After restore: systemctl restart etcd
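The first step above, pulling the endpoint list out of the apiserver manifest, is easy to script. Demoed on an inline stand-in fragment (`etcd_servers` and the addresses are invented for this sketch):

```shell
#!/bin/sh
# Sketch: print the value of the --etcd-servers flag from a manifest
# file. On the control plane you would point this at
# /etc/kubernetes/manifests/kube-apiserver.yaml.
etcd_servers() {
  sed -n 's/.*--etcd-servers=//p' "$1"
}

# Demo on a stand-in fragment with made-up addresses:
m=$(mktemp)
echo '    - --etcd-servers=https://10.0.0.5:2379,https://10.0.0.6:2379' > "$m"
etcd_servers "$m"   # https://10.0.0.5:2379,https://10.0.0.6:2379
```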

9. Key Summary

Operation   Command                    Notes
---------   -------                    -----
Backup      etcdctl snapshot save      ETCDCTL_API=3 required, 3 certificates
Verify      etcdctl snapshot status    Use status, not verify
Restore     etcdctl snapshot restore   Must specify a new data-dir
Apply       Edit etcd.yaml             Change hostPath path
Confirm     etcdctl endpoint health    Wait 30-60 seconds after restart

CKA Flow: describe -> copy certificates -> save/restore -> edit yaml. Done!

