Kubernetes DB Deep-Dive Guide 2025: Operator Patterns, HA Configuration, Backup/Recovery, and Hands-On Performance Tuning


1. Should You Run a Database on K8s?

1.1 Pros and Cons as of 2025

Advantages of running DBs on K8s

  • Consistent deployment: manage the DB with the same GitOps workflow as your applications
  • Resource efficiency: share node resources with other workloads (cheaper than dedicated VMs)
  • Self-healing: the Operator detects failures and performs automatic failover (RTO under 30 seconds)
  • Portability: identical configuration anywhere, with no cloud vendor lock-in
  • Dev-environment parity: the same DB stack in dev/staging as in production

Disadvantages of running DBs on K8s

  • Operational complexity: extra management surface such as StorageClasses, PVs, and Operator upgrades
  • Storage performance: network storage (EBS/PD) has higher latency than local disks
  • Expertise required: deep understanding of both DBA work and K8s operations
  • Backup complexity: K8s-specific volume snapshots and object-storage integration required

1.2 A Decision Framework

Criterion         | K8s DB fits                       | Managed DB (RDS, etc.) fits
Team skills       | K8s + DBA experts on staff        | No DBA
Cost sensitivity  | High (infra optimization needed)  | Moderate (management fees OK)
Compliance        | Need control over data location   | Cloud regions are sufficient
Multi-cloud       | Required                          | Single cloud
Workload size     | Small to medium (hundreds of GB)  | Large (multiple TB and up)
SLA requirement   | 99.9% achievable in-house         | 99.99% vendor SLA needed

2. StatefulSet Deep Dive

2.1 Pod Identity (Ordinal Index and Stable Hostnames)

A StatefulSet assigns each Pod a stable ordinal index.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3-encrypted
        resources:
          requests:
            storage: 100Gi

Pods created by this StatefulSet:

postgres-0  →  postgres-0.postgres-headless.database.svc.cluster.local
postgres-1  →  postgres-1.postgres-headless.database.svc.cluster.local
postgres-2  →  postgres-2.postgres-headless.database.svc.cluster.local

Each Pod keeps the same name and binds to the same PVC across restarts.

2.2 PodManagementPolicy

spec:
  podManagementPolicy: OrderedReady  # default
  # OrderedReady: create Pods in order 0→1→2, delete in order 2→1→0
  # Parallel: create/delete all Pods at once (use with caution for DBs)
  • OrderedReady: the Primary (ordinal 0) starts first; Replicas start only after it is Ready
  • Parallel: use only for initial cluster bootstrap (when there is no existing data)

2.3 volumeClaimTemplates

volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        type: database-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi

This template creates a separate PVC for each Pod:

data-postgres-0  →  100Gi PV (gp3-encrypted)
data-postgres-1  →  100Gi PV (gp3-encrypted)
data-postgres-2  →  100Gi PV (gp3-encrypted)

Deleting the StatefulSet does not automatically delete the PVCs (data protection).
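If you do want PVCs cleaned up automatically, newer Kubernetes releases expose an opt-in retention policy on the StatefulSet itself (the StatefulSetAutoDeletePVC feature, beta since 1.27). A minimal sketch, keeping data when the StatefulSet is deleted but reclaiming it on scale-down:

```yaml
# Sketch: opt-in PVC retention policy (StatefulSetAutoDeletePVC,
# beta since Kubernetes 1.27; the default for both fields is Retain).
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep PVCs when the StatefulSet is deleted
    whenScaled: Delete    # delete the PVC of a Pod removed by scale-down
```

For databases, keeping `whenDeleted: Retain` is the safe choice; `whenScaled: Delete` only affects Pods removed by an intentional scale-down.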

2.4 Update Strategies

spec:
  updateStrategy:
    type: RollingUpdate       # or OnDelete
    rollingUpdate:
      partition: 0            # only ordinals >= partition are updated
      maxUnavailable: 1       # K8s 1.24+
  • RollingUpdate: updates from the highest ordinal down (2→1→0); Replicas first, Primary last
  • OnDelete: Pods update only when deleted manually; useful for canary rollouts
  • partition: setting partition: 2 updates only ordinals 2 and above (canary)

2.5 Headless Service

apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None  # Headless Service
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432

A Headless Service returns each Pod's IP directly as DNS A records:

# Look up all Pod IPs
nslookup postgres-headless.database.svc.cluster.local

# Connect directly to a specific Pod
psql -h postgres-0.postgres-headless.database.svc.cluster.local -U postgres

3. CloudNativePG (PostgreSQL) Deep Dive

3.1 Cluster CRD

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
  namespace: database
spec:
  instances: 3

  postgresql:
    parameters:
      shared_buffers: "2GB"
      effective_cache_size: "6GB"
      work_mem: "64MB"
      maintenance_work_mem: "512MB"
      max_connections: "200"
      max_wal_size: "2GB"
      min_wal_size: "1GB"
      wal_buffers: "64MB"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      max_worker_processes: "8"
      max_parallel_workers_per_gather: "4"
      max_parallel_workers: "8"

  storage:
    size: 100Gi
    storageClass: gp3-encrypted
    pvcTemplate:
      accessModes:
        - ReadWriteOnce

  walStorage:
    size: 20Gi
    storageClass: gp3-encrypted

  resources:
    requests:
      cpu: "2"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"  # limits = requests (QoS Guaranteed)

  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: custom-queries
        key: queries

  bootstrap:
    initdb:
      database: myapp
      owner: myapp
      secret:
        name: myapp-db-credentials

3.2 Automatic Failover

CloudNativePG automatically promotes a Replica when the Primary fails.

Steady state:
  my-postgres-1 (Primary, RW)
  my-postgres-2 (Replica, RO) - streaming replication
  my-postgres-3 (Replica, RO) - streaming replication

Primary failure:
  1. CloudNativePG detects the Primary failure (health check fails)
  2. The Replica with the most recent WAL position is selected
  3. pg_promote() runs → that Replica becomes the new Primary
  4. The remaining Replica follows the new Primary
  5. The old Primary rejoins as a Replica via pg_rewind

After failover:
  my-postgres-2 (Primary, RW) ← auto-promoted
  my-postgres-3 (Replica, RO) - follows the new Primary
  my-postgres-1 (Replica, RO) - rejoins after pg_rewind

Typical failover takes 10-30 seconds.
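How aggressively the operator reacts is tunable on the Cluster spec; for instance, CloudNativePG's failoverDelay field (in seconds, 0 by default, per my reading of the API) delays promotion to ride out short network blips:

```yaml
# Sketch: delay automatic failover slightly to avoid flapping on
# transient network issues (failoverDelay is in seconds).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3
  failoverDelay: 15   # wait 15s after the Primary is marked unhealthy
```

A non-zero delay trades a slower RTO for fewer spurious failovers; keep it at 0 unless you have observed flapping.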

3.3 Backup Configuration (Barman + Object Storage)

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3

  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/postgres/
      endpointURL: https://s3.ap-northeast-2.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
        immediateCheckpoint: true
    retentionPolicy: "30d"

Scheduled backups:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
  namespace: database
spec:
  schedule: "0 3 * * *"  # daily at 03:00
  backupOwnerReference: self
  cluster:
    name: my-postgres
  immediate: false
  suspend: false

On-demand backup:

apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: manual-backup-20240901
  namespace: database
spec:
  method: barmanObjectStore
  cluster:
    name: my-postgres

3.4 WAL Archiving and PITR

WAL (Write-Ahead Log) archiving provides continuous data protection.

# WAL archiving is enabled automatically by the backup.barmanObjectStore.wal settings
spec:
  backup:
    barmanObjectStore:
      wal:
        compression: gzip
        maxParallel: 4
        # WAL is continuously archived to S3

Performing a PITR (Point-in-Time Recovery):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-restored
spec:
  instances: 3

  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-09-01 14:30:00.00000+09"
        # or a specific transaction ID
        # targetXID: "12345"
        # or a specific LSN
        # targetLSN: "0/1234567"

  externalClusters:
    - name: my-postgres
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/postgres/
        endpointURL: https://s3.ap-northeast-2.amazonaws.com
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: ACCESS_SECRET_KEY

3.5 Connection Pooling (Built-in PgBouncer)

apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-rw
  namespace: database
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: rw  # rw or ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
      min_pool_size: "5"
      reserve_pool_size: "5"
      reserve_pool_timeout: "5"
      server_idle_timeout: "300"
      log_connections: "1"
      log_disconnections: "1"
---
# Read-only pooler (targets Replicas)
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-ro
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "2000"
      default_pool_size: "50"

Application connection endpoints:

# Read/write (Primary)
my-postgres-pooler-rw.database.svc.cluster.local:5432

# Read-only (Replicas)
my-postgres-pooler-ro.database.svc.cluster.local:5432

3.6 Monitoring (Prometheus + Grafana)

# CloudNativePG creates a PodMonitor automatically
spec:
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: pg-custom-queries
        key: queries

Custom query ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-custom-queries
  namespace: database
data:
  queries: |
    pg_database_size:
      query: "SELECT datname, pg_database_size(datname) as size_bytes FROM pg_database WHERE datname NOT IN ('template0', 'template1')"
      master: true
      metrics:
        - datname:
            usage: "LABEL"
            description: "Database name"
        - size_bytes:
            usage: "GAUGE"
            description: "Database size in bytes"

    pg_stat_activity_count:
      query: "SELECT state, count(*) as count FROM pg_stat_activity GROUP BY state"
      master: true
      metrics:
        - state:
            usage: "LABEL"
            description: "Connection state"
        - count:
            usage: "GAUGE"
            description: "Number of connections"

    pg_replication_lag:
      query: "SELECT CASE WHEN pg_is_in_recovery() THEN EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int ELSE 0 END as lag_seconds"
      master: false
      metrics:
        - lag_seconds:
            usage: "GAUGE"
            description: "Replication lag in seconds"

3.7 Rolling Updates (Zero Downtime)

# Update the image
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3

# CloudNativePG then automatically:
# 1. Restarts the Replicas one at a time
# 2. Confirms each Replica is Ready before moving to the next
# 3. Finally switches over the Primary (promotes a new Primary, then restarts the old one)
# → The update completes with no downtime
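The behavior of that final Primary step is configurable. A sketch using the Cluster fields primaryUpdateStrategy and primaryUpdateMethod, as I understand the CloudNativePG API:

```yaml
# Sketch: control how the Primary is rolled during an update.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3
  primaryUpdateStrategy: unsupervised  # "supervised" waits for a manual promotion
  primaryUpdateMethod: switchover      # "restart" restarts the Primary in place instead
```

"supervised" is useful when you want a human to pick the moment of the Primary switchover, e.g. during a maintenance window.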

4. Percona Operator for MySQL

4.1 XtraDB Cluster (Synchronous Replication)

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: my-mysql
  namespace: database
spec:
  crVersion: "1.14.0"
  secretsName: my-mysql-secrets

  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0.35
    resources:
      requests:
        cpu: "2"
        memory: "8Gi"
      limits:
        cpu: "4"
        memory: "8Gi"
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: gp3-encrypted
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
    configuration: |
      [mysqld]
      innodb_buffer_pool_size=4G
      innodb_log_file_size=1G
      innodb_flush_method=O_DIRECT
      max_connections=500
      wsrep_sync_wait=3
      wsrep_slave_threads=4

  haproxy:
    enabled: true
    size: 2
    image: percona/haproxy:2.8.5
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"

  proxysql:
    enabled: false

  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: my-mysql-backups
          region: ap-northeast-2
          credentialsSecret: aws-s3-secret
    schedule:
      - name: daily-full
        schedule: "0 3 * * *"
        keep: 7
        storageName: s3-backup

4.2 ProxySQL Connection Routing

spec:
  proxysql:
    enabled: true
    size: 2
    image: percona/proxysql2:2.5.5
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
    configuration: |
      datadir="/var/lib/proxysql"
      admin_variables=
      {
        admin_credentials="admin:admin"
        mysql_ifaces="0.0.0.0:6032"
      }
      mysql_variables=
      {
        threads=4
        max_connections=2048
        default_query_delay=0
        default_query_timeout=36000000
        poll_timeout=2000
        interfaces="0.0.0.0:3306;/tmp/proxysql.sock"
        default_schema="information_schema"
        stacksize=1048576
        connect_timeout_server=3000
        monitor_history=600000
        monitor_connect_interval=60000
        monitor_ping_interval=10000
      }

4.3 Percona XtraBackup

# Run an on-demand backup
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: manual-backup-20240901
  namespace: database
spec:
  pxcCluster: my-mysql
  storageName: s3-backup
# Check backup status
kubectl get pxc-backup -n database
# NAME                    CLUSTER    STORAGE     STATUS
# manual-backup-20240901  my-mysql   s3-backup   Succeeded

# Restore from a backup
kubectl apply -f - <<EOF
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore-20240901
  namespace: database
spec:
  pxcCluster: my-mysql
  backupName: manual-backup-20240901
EOF

5. MongoDB Community Operator

5.1 ReplicaSet Configuration

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: my-mongodb
  namespace: database
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.12"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: admin
      db: admin
      passwordSecretRef:
        name: mongodb-admin-password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: admin-scram
    - name: myapp
      db: myapp
      passwordSecretRef:
        name: mongodb-myapp-password
      roles:
        - name: readWrite
          db: myapp
      scramCredentialsSecretName: myapp-scram

  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: mongod
              resources:
                requests:
                  cpu: "2"
                  memory: "4Gi"
                limits:
                  cpu: "4"
                  memory: "4Gi"
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            storageClassName: gp3-encrypted
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 50Gi
        - metadata:
            name: logs-volume
          spec:
            storageClassName: gp3-encrypted
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi

  additionalMongodConfig:
    storage.wiredTiger.engineConfig.cacheSizeGB: 2
    net.maxIncomingConnections: 500

5.2 Backup Strategy

# mongodump-based logical backup (CronJob)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
  namespace: database
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: mongo:7.0
              command:
                - /bin/sh
                - -c
                - |
                  mongodump \
                    --uri="mongodb://myapp:password@my-mongodb-0.my-mongodb-svc.database.svc.cluster.local:27017/myapp?replicaSet=my-mongodb&authSource=myapp" \
                    --gzip \
                    --archive=/backup/myapp-$(date +%Y%m%d).gz
                  # Upload to S3 (assumes the aws CLI is available in the image;
                  # the stock mongo:7.0 image does not ship it)
                  aws s3 cp /backup/myapp-$(date +%Y%m%d).gz \
                    s3://my-backup-bucket/mongodb/
              volumeMounts:
                - name: backup-volume
                  mountPath: /backup
          volumes:
            - name: backup-volume
              emptyDir: {}
          restartPolicy: OnFailure

6. Storage Deep Dive

6.1 StorageClass Configuration

# AWS EBS gp3 StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "6000"            # gp3 baseline is 3000, max 16000
  throughput: "250"       # gp3 baseline is 125 MB/s, max 1000 MB/s
  fsType: ext4
reclaimPolicy: Retain     # DB data must use Retain
volumeBindingMode: WaitForFirstConsumer  # topology-aware binding
allowVolumeExpansion: true
---
# GCP PD SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-encrypted
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  disk-encryption-kms-key: projects/my-project/locations/global/keyRings/my-ring/cryptoKeys/my-key
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

6.2 Local PV vs Cloud EBS/PD Performance Comparison

Metric                | Local NVMe           | EBS gp3 (6000 IOPS) | EBS io2 (16000 IOPS) | GCP PD-SSD
Random read IOPS      | 500K+                | 6,000               | 16,000               | 30,000
Random write IOPS     | 200K+                | 6,000               | 16,000               | 30,000
Sequential read MB/s  | 3,000+               | 250                 | 1,000                | 1,200
Sequential write MB/s | 2,000+               | 250                 | 1,000                | 1,200
P99 latency           | under 0.1ms          | 1-3ms               | 0.5-1ms              | 1-2ms
Durability            | Lost on node failure | 99.999%             | 99.999%              | 99.999%
Resizing              | Not possible         | Online expansion    | Online expansion     | Online expansion

6.3 CSI Drivers

# Install the AWS EBS CSI driver
helm repo add aws-ebs-csi-driver https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set controller.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::ACCOUNT:role/ebs-csi-role"

# local-path-provisioner (for dev/test environments)
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

6.4 Storage Benchmarking

# fio benchmark Pod
kubectl run fio-bench --image=nixery.dev/fio \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "fio-bench",
        "image": "nixery.dev/fio",
        "command": ["sleep", "3600"],
        "volumeMounts": [{
          "name": "test-vol",
          "mountPath": "/data"
        }]
      }],
      "volumes": [{
        "name": "test-vol",
        "persistentVolumeClaim": {
          "claimName": "bench-pvc"
        }
      }]
    }
  }'

# Random read benchmark
kubectl exec fio-bench -- fio \
  --name=randread \
  --ioengine=libaio \
  --iodepth=32 \
  --rw=randread \
  --bs=4k \
  --direct=1 \
  --size=1G \
  --numjobs=4 \
  --runtime=60 \
  --directory=/data \
  --group_reporting

# pgbench (PostgreSQL performance test)
kubectl exec -it my-postgres-1 -- pgbench \
  -i -s 100 myapp  # initialize (scale factor 100)

kubectl exec -it my-postgres-1 -- pgbench \
  -c 50 -j 4 -T 300 -P 10 myapp  # 50 clients, 4 threads, 300 seconds

7. High Availability Patterns

7.1 Synchronous vs Asynchronous Replication

# CloudNativePG: synchronous replication configuration
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-sync
spec:
  instances: 3

  postgresql:
    parameters:
      synchronous_standby_names: "ANY 1 (*)"
      # ANY 1: guarantee synchronous commit on at least one Replica
      # FIRST 1: synchronous commit on the first listed standby

  # fine-grained control via minSyncReplicas / maxSyncReplicas
  minSyncReplicas: 1
  maxSyncReplicas: 1
Mode                | Consistency | Write latency | RPO                         | Use case
Asynchronous        | Eventual    | Low           | Seconds of data may be lost | General workloads
Synchronous (ANY 1) | Strong      | High (~2x)    | 0 (no data loss)            | Finance, payments
Semi-synchronous    | Medium      | Medium        | Very small                  | Most production systems
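Newer CloudNativePG releases (around 1.24, if I read the changelog correctly) also offer a declarative synchronous stanza that supersedes minSyncReplicas/maxSyncReplicas; a hedged sketch:

```yaml
# Sketch: declarative quorum-based synchronous replication
# (assumes a CloudNativePG release that supports .spec.postgresql.synchronous).
spec:
  postgresql:
    synchronous:
      method: any   # quorum commit, like ANY n (...)
      number: 1     # at least one standby must confirm each commit
```

Check the documentation of the operator version you run before relying on this field.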

7.2 Pod Disruption Budget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: database
spec:
  maxUnavailable: 1  # allow at most one Pod to be disrupted at a time
  selector:
    matchLabels:
      app: postgres

CloudNativePG creates a PDB automatically, with maxUnavailable: 1 by default.

7.3 Topology Spread Constraints

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3

  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          cnpg.io/cluster: my-postgres
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          cnpg.io/cluster: my-postgres

This configuration spreads the DB Pods across different availability zones (AZs) and different nodes.

7.4 Anti-Affinity Rules

spec:
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    # or topology.kubernetes.io/zone (spread across AZs)
    podAntiAffinityType: required
    # required: must be spread (Pods stay Pending if they cannot be scheduled)
    # preferred: spread when possible (same node allowed otherwise)

    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role
                operator: In
                values:
                  - database
    tolerations:
      - key: database-only
        operator: Equal
        value: "true"
        effect: NoSchedule

8. Backup and Recovery Strategies

8.1 Logical vs Physical Backups

Aspect                | Logical backup (pg_dump)   | Physical backup (pgBackRest)
Speed                 | Slow (hours for large DBs) | Fast (incremental backups)
Size                  | SQL text (compressible)    | Binary (smaller)
Restore flexibility   | Per-table restore possible | Whole-cluster restore only
Version compatibility | Works across PG versions   | Same PG version only
PITR                  | Not possible               | Possible
Use case              | Migration, partial restore | Production disaster recovery

8.2 Continuous WAL Archiving

WAL archiving flow:
PG Primary → WAL segment completed (16MB each) → archived to S3/GCS
                                                 → continuous data protection
                                                 → PITR: recovery to any point in time

# CloudNativePG WAL archiving configuration (automatic)
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/wal-archive/
      wal:
        compression: gzip
        maxParallel: 4
        # each completed WAL segment is uploaded automatically

8.3 PITR (Point-in-Time Recovery) Walkthrough

# 1. List existing backups
kubectl get backup -n database

# 2. Decide the target recovery time (e.g., just before an accidental DELETE)
# Time: 2024-09-01 14:29:00 (the DELETE ran at 14:30)

# 3. Restore into a new cluster via PITR
kubectl apply -f pitr-restore.yaml
# (see the PITR YAML in section 3.4)

# 4. Monitor restore progress
kubectl get cluster my-postgres-restored -n database -w

# 5. Validate the data after the restore completes
kubectl exec -it my-postgres-restored-1 -- psql -U myapp -d myapp \
  -c "SELECT count(*) FROM important_table;"

# 6. Switch application connections over
kubectl patch service myapp-db -n database \
  -p '{"spec":{"selector":{"cnpg.io/cluster":"my-postgres-restored"}}}'

8.4 Cluster-Level Backups with Velero

# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket my-velero-bucket \
  --backup-location-config region=ap-northeast-2 \
  --snapshot-location-config region=ap-northeast-2

# Back up the entire DB namespace (including PVs)
velero backup create db-full-backup \
  --include-namespaces database \
  --include-resources '*' \
  --snapshot-volumes=true \
  --volume-snapshot-locations default

# Scheduled backup
velero schedule create daily-db-backup \
  --schedule="0 4 * * *" \
  --include-namespaces database \
  --snapshot-volumes=true \
  --ttl 720h  # retain for 30 days

8.5 Cross-Region Disaster Recovery

# CloudNativePG cluster in the source region (ap-northeast-2)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-primary
spec:
  instances: 3
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket-primary/
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY

---
# Replica cluster in the DR region (us-west-2)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-dr
spec:
  instances: 2

  replica:
    enabled: true
    source: my-postgres-primary

  externalClusters:
    - name: my-postgres-primary
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket-primary/
        s3Credentials:
          accessKeyId:
            name: aws-creds-dr
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds-dr
            key: ACCESS_SECRET_KEY

9. Performance Tuning

9.1 Resource Requests/Limits

# DB Pods must run in the QoS Guaranteed class
resources:
  requests:
    cpu: "4"
    memory: "16Gi"
  limits:
    cpu: "4"        # requests == limits → Guaranteed
    memory: "16Gi"  # avoids surprise OOM kills

Memory allocation guide:

For 16Gi of total memory:
- shared_buffers: 4GB (25%)
- effective_cache_size: 12GB (75%)
- work_mem: 64MB (per session)
- maintenance_work_mem: 1GB
- OS and everything else: ~2GB
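The 25%/75% split above is easy to compute mechanically from the pod's memory request; a tiny helper sketch (the variable names are mine, the ratios are this section's rules of thumb):

```shell
# Derive PostgreSQL memory settings from total pod memory in GiB,
# following the 25% / 75% rules of thumb above.
TOTAL_GI=16
SHARED_BUFFERS_GB=$(( TOTAL_GI * 25 / 100 ))   # shared_buffers: 25% of RAM
EFFECTIVE_CACHE_GB=$(( TOTAL_GI * 75 / 100 ))  # effective_cache_size: 75% of RAM
echo "shared_buffers=${SHARED_BUFFERS_GB}GB"
echo "effective_cache_size=${EFFECTIVE_CACHE_GB}GB"
```

For a 16Gi pod this yields shared_buffers=4GB and effective_cache_size=12GB, matching the guide above.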

9.2 Huge Pages Configuration

# Huge Pages must be configured on the node itself
# /etc/sysctl.conf: vm.nr_hugepages=2048

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  postgresql:
    parameters:
      shared_buffers: "4GB"
      huge_pages: "on"

  resources:
    requests:
      cpu: "4"
      memory: "16Gi"
      hugepages-2Mi: "4Gi"
    limits:
      cpu: "4"
      memory: "16Gi"
      hugepages-2Mi: "4Gi"

9.3 PostgreSQL Performance Parameter Tuning

postgresql:
  parameters:
    # Memory
    shared_buffers: "4GB"
    effective_cache_size: "12GB"
    work_mem: "64MB"
    maintenance_work_mem: "1GB"
    wal_buffers: "64MB"

    # WAL
    max_wal_size: "4GB"
    min_wal_size: "1GB"
    checkpoint_completion_target: "0.9"
    wal_compression: "zstd"

    # Query planner
    random_page_cost: "1.1"          # for SSDs (use 4.0 for HDDs)
    effective_io_concurrency: "200"  # for SSDs (use 2 for HDDs)
    seq_page_cost: "1.0"

    # Parallelism
    max_worker_processes: "8"
    max_parallel_workers_per_gather: "4"
    max_parallel_workers: "8"
    max_parallel_maintenance_workers: "4"

    # Connections
    max_connections: "200"       # keep low when using PgBouncer
    idle_in_transaction_session_timeout: "30000"  # 30 seconds

    # Logging
    log_min_duration_statement: "1000"   # log queries slower than 1 second
    log_checkpoints: "on"
    log_lock_waits: "on"
    log_temp_files: "0"

    # Autovacuum
    autovacuum_max_workers: "4"
    autovacuum_naptime: "30"
    autovacuum_vacuum_cost_limit: "1000"

9.4 Connection Pooling Configuration

Recommended PgBouncer settings:

pool_mode = transaction
- Connections are assigned per transaction and returned to the pool afterwards
- Essential in K8s when hundreds of Pods connect concurrently
- Restricts PREPARE statements (available only in session mode)

default_pool_size = 25
- Backend DB connections (12.5% of max_connections)
- Too high overloads the DB; too low increases queueing

max_client_conn = 1000
- Maximum frontend connections
- Size as (number of Pods) x (connections per Pod)

reserve_pool_size = 5
- Spare connections for traffic peaks

server_idle_timeout = 300
- Idle server connections are closed after 5 minutes
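The CNPG Pooler renders this configuration for you, but for a standalone PgBouncer the same values map directly onto pgbouncer.ini; a minimal sketch (host and database names are placeholders):

```ini
; Sketch of a standalone pgbouncer.ini using the values above.
[databases]
myapp = host=my-postgres-rw.database.svc.cluster.local port=5432 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 5
server_idle_timeout = 300
```

Point applications at port 6432 of the PgBouncer Service rather than at PostgreSQL directly.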

10. Security

10.1 Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-network-policy
  namespace: database
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: my-postgres
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow access only from application Pods in the app namespace
    # (note: namespaceSelector and podSelector must sit in one 'from' entry
    # to be ANDed; as separate entries they would be ORed)
    - from:
        - namespaceSelector:
            matchLabels:
              app.kubernetes.io/part-of: myapp
          podSelector:
            matchLabels:
              app: myapp
      ports:
        - protocol: TCP
          port: 5432
    # Allow replication traffic within the same cluster
    - from:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: my-postgres
      ports:
        - protocol: TCP
          port: 5432
  egress:
    # Outbound for S3 backups (HTTPS)
    - to: []
      ports:
        - protocol: TCP
          port: 443
    # In-cluster communication
    - to:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: my-postgres
      ports:
        - protocol: TCP
          port: 5432
    # DNS
    - to: []
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

10.2 Secrets Management (External Secrets Operator)

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-credentials
  namespace: database
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: myapp-db-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: production/database/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: production/database/postgres
        property: password

10.3 TLS Configuration

# CloudNativePG enables TLS by default
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  certificates:
    serverTLSSecret: my-postgres-server-tls
    serverCASecret: my-postgres-ca
    clientCASecret: my-postgres-client-ca
    replicationTLSSecret: my-postgres-replication-tls

  postgresql:
    parameters:
      ssl: "on"
      ssl_min_protocol_version: "TLSv1.3"

11. Monitoring and Alerting

11.1 Key Metrics

# Prometheus alerting rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-alerts
  namespace: monitoring
spec:
  groups:
    - name: postgres.rules
      rules:
        - alert: PostgresReplicationLagHigh
          expr: cnpg_pg_replication_lag > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PostgreSQL replication lag exceeds 10 seconds"

        - alert: PostgresConnectionsNearLimit
          expr: >
            cnpg_pg_stat_activity_count /
            cnpg_pg_settings_setting{name="max_connections"} > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Connection count exceeds 80% of max_connections"

        - alert: PostgresCacheHitRatioLow
          expr: >
            cnpg_pg_stat_database_blks_hit /
            (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) < 0.95
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Cache hit ratio is below 95%; consider increasing shared_buffers"

        - alert: PostgresStorageNearFull
          expr: >
            kubelet_volume_stats_used_bytes{namespace="database"} /
            kubelet_volume_stats_capacity_bytes{namespace="database"} > 0.85
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "DB storage usage exceeds 85%"

        - alert: PostgresFailoverDetected
          expr: changes(cnpg_pg_replication_is_primary[5m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "A PostgreSQL failover was detected"

11.2 Key Grafana Dashboard Panels

Key monitoring panels:
1. Replication lag (seconds) - cnpg_pg_replication_lag
2. Active connections - cnpg_pg_stat_activity_count
3. Cache hit ratio - blks_hit / (blks_hit + blks_read)
4. TPS (transactions/sec) - rate(cnpg_pg_stat_database_xact_commit[5m])
5. Query latency P99
6. Disk IOPS / throughput
7. WAL generation rate - rate(cnpg_pg_stat_archiver_archived_count[5m])
8. Storage usage - kubelet_volume_stats_used_bytes
9. CPU / memory utilization
10. Vacuum / Analyze activity

12. Migration Strategies

12.1 Migrating from VMs to K8s

# Method 1: pg_dump/pg_restore (small DBs)
# On the source VM:
pg_dump -Fc -d myapp -h vm-db.internal -U postgres > myapp.dump

# Restore in the K8s Pod:
kubectl cp myapp.dump database/my-postgres-1:/tmp/myapp.dump
kubectl exec -it my-postgres-1 -n database -- \
  pg_restore -d myapp -U postgres /tmp/myapp.dump

# Method 2: logical replication (large DBs, zero downtime)
# 1. Create an empty cluster in K8s
# 2. Create a publication on the VM DB
#    CREATE PUBLICATION my_pub FOR ALL TABLES;
# 3. Create a subscription on the K8s DB
#    CREATE SUBSCRIPTION my_sub
#      CONNECTION 'host=vm-db.internal dbname=myapp user=replicator'
#      PUBLICATION my_pub;
# 4. Wait for the initial data sync to complete
# 5. Switch application connections to the K8s DB
# 6. Drop the subscription

12.2 Migrating from RDS to a K8s DB

# Using AWS DMS (Database Migration Service)
# 1. Create a DMS replication instance
# 2. Source endpoint: RDS PostgreSQL
# 3. Target endpoint: K8s PostgreSQL (reached via an NLB or port-forward)
# 4. Run in Full Load + CDC (Change Data Capture) mode
# 5. Switch the application over once synchronization completes

12.3 Zero-Downtime Migration Checklist

Before migration:
[ ] Determine the source DB size and table count
[ ] Confirm spare capacity in the K8s cluster
[ ] Size the StorageClass / PVs (2x the source for headroom)
[ ] Verify network connectivity (source DB → K8s cluster)
[ ] Create a logical replication user and grant privileges

During migration:
[ ] Confirm the initial data copy has completed
[ ] Confirm replication lag converges to 0
[ ] Verify sequence values are synchronized
[ ] Shift application read traffic to K8s (as a test)
[ ] Shift write traffic (brief downtime or dual writes)

After migration:
[ ] Validate data consistency (row counts, checksums)
[ ] Verify application performance
[ ] Stop replication from the source DB
[ ] Confirm the backup policy is applied
[ ] Confirm monitoring/alerting works

13. Production Checklist

Infrastructure and Storage

  • StorageClass reclaimPolicy is set to Retain
  • volumeBindingMode is set to WaitForFirstConsumer
  • EBS/PD IOPS sized appropriately for the workload (at least 6000 IOPS)
  • Automatic volume expansion enabled (allowVolumeExpansion: true)
  • DB-dedicated labels/taints applied to the nodes

HA and Replication

  • At least 3 instances (1 Primary + 2 Replicas)
  • Pod Anti-Affinity spreads Pods across different nodes
  • Topology Spread distributes Pods across different AZs
  • PodDisruptionBudget configured (maxUnavailable: 1)
  • Synchronous/asynchronous replication mode decided and configured

Backup and Recovery

  • Automated backup schedule configured (at least daily)
  • WAL archiving enabled (PITR possible)
  • Backup retention policy set (at least 30 days)
  • Backup restore test performed (at least quarterly)
  • Cross-region backup replication (where DR is required)

Performance

  • Resource requests == limits (QoS Guaranteed)
  • shared_buffers = 25% of total memory
  • effective_cache_size = 75% of total memory
  • Connection pooling enabled (PgBouncer transaction mode)
  • Query logging enabled (log_min_duration_statement)

Security

  • NetworkPolicy restricts DB access
  • TLS enabled (TLSv1.2 minimum)
  • Secrets managed with External Secrets Operator or Vault
  • RBAC: least privilege for the Operator service account
  • Data encrypted (at rest and in transit)

Monitoring

  • Prometheus metrics collection configured
  • Grafana dashboards in place
  • Replication lag alert (over 10 seconds)
  • Storage capacity alert (over 85%)
  • Connection count alert (over 80%)
  • Failover detection alert

운영

  • Operator 업그레이드 절차 문서화
  • DB 마이너/메이저 버전 업그레이드 절차 문서화
  • 장애 대응 런북 작성
  • 정기 복원 테스트 일정 수립

14. Quiz

Q1: Why is Pod Identity important in a StatefulSet?

A: StatefulSet Pod Identity (ordinal index + stable hostname + dedicated PVC) is essential for database operation. Primary/Replica roles bind to specific Pods, each Pod reattaches to the same storage after restarts, and other Pods can reach a specific instance directly via DNS. A plain Deployment cannot provide these guarantees.

Q2: How does CloudNativePG automatic failover work?

A: When CloudNativePG detects the Primary Pod failing health checks, it selects the Replica with the most recent WAL position and promotes it to the new Primary via pg_promote(). The remaining Replicas are reconfigured to follow the new Primary, and the former Primary rejoins as a Replica via pg_rewind. The whole process typically completes within 10-30 seconds.

Q3: Why should DB Pods use the Guaranteed QoS class?

A: With QoS Guaranteed (requests == limits), K8s reserves the Pod's resources, and under node memory pressure it is among the last to be OOM-killed. A database depends on in-memory caches (shared_buffers) and stable CPU, so Burstable or BestEffort can cause unexpected performance degradation or outages.

Q4: Why set volumeBindingMode to WaitForFirstConsumer?

A: WaitForFirstConsumer provisions the PV only after the node the Pod will be scheduled on is decided. This places the Pod and PV in the same availability zone (AZ), avoiding cross-AZ network latency. In Immediate mode the PV is created first and may land in a different AZ, causing scheduling failures or degraded performance.

Q5: What is the difference between PgBouncer's transaction mode and session mode?

A: Transaction mode returns the backend connection to the pool when a transaction ends, letting hundreds of clients share a handful of DB connections; it is optimal when many Pods connect concurrently in K8s. Session mode holds the connection for the lifetime of the client session and is used when PREPARE statements or session variables are required. Transaction mode is generally recommended in K8s environments.


15. References

  1. CloudNativePG official documentation - https://cloudnative-pg.io/documentation/
  2. Percona Operator for MySQL - https://docs.percona.com/percona-operator-for-mysql/pxc/
  3. MongoDB Community Operator - https://github.com/mongodb/mongodb-kubernetes-operator
  4. Kubernetes StatefulSet documentation - https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
  5. PgBouncer documentation - https://www.pgbouncer.org/config.html
  6. Velero official documentation - https://velero.io/docs/
  7. External Secrets Operator - https://external-secrets.io/
  8. AWS EBS CSI Driver - https://github.com/kubernetes-sigs/aws-ebs-csi-driver
  9. PostgreSQL performance tuning guide - https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
  10. Kubernetes Network Policies - https://kubernetes.io/docs/concepts/services-networking/network-policies/
  11. pgBackRest documentation - https://pgbackrest.org/
  12. Prometheus PostgreSQL Exporter - https://github.com/prometheus-community/postgres_exporter
  13. Barman (Backup and Recovery Manager) - https://pgbarman.org/

Kubernetes Database Deep Dive 2025: Operator Patterns, HA Configuration, Backup/Recovery, Performance Tuning

Table of Contents

1. Should You Run Databases on K8s?

1.1 Pros and Cons Analysis (2025 Perspective)

Advantages of K8s DB Operations

  • Consistent deployment: Manage DB with the same GitOps workflow as applications
  • Resource efficiency: Share node resources with other workloads (cost savings vs dedicated VMs)
  • Automatic recovery: Operator detects failures and performs automatic Failover (RTO under 30s)
  • Portability: Same configuration anywhere without cloud vendor lock-in
  • Dev environment parity: Identical DB stack across dev/staging/production

Disadvantages of K8s DB Operations

  • Operational complexity: Additional management points like StorageClass, PV, Operator upgrades
  • Storage performance: Network storage (EBS/PD) has higher latency than local disks
  • Expertise required: Deep understanding of both DBA and K8s operations needed
  • Backup complexity: K8s-specific volume snapshots, object storage integration required

1.2 Decision Framework

Criteria         | K8s DB Suitable                | Managed DB (RDS etc.) Suitable
Team capability  | K8s + DBA experts available    | No DBA
Cost sensitivity | High (need infra optimization) | Medium (management cost OK)
Compliance       | Data location control needed   | Cloud region sufficient
Multi-cloud      | Required                       | Single cloud
Workload size    | Small-medium (hundreds of GB)  | Large (multiple TB+)
SLA requirement  | 99.9% achievable in-house      | 99.99% vendor SLA needed

2. StatefulSet Deep Dive

2.1 Pod Identity (Ordinal Index and Stable Hostname)

StatefulSet assigns each Pod an ordinal index.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3-encrypted
        resources:
          requests:
            storage: 100Gi

Pods created by this StatefulSet:

postgres-0  ->  postgres-0.postgres-headless.database.svc.cluster.local
postgres-1  ->  postgres-1.postgres-headless.database.svc.cluster.local
postgres-2  ->  postgres-2.postgres-headless.database.svc.cluster.local

Each Pod binds to the same name and same PVC even after restarts.
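Because the hostnames follow a fixed pattern, they can be derived from the ordinal index alone. A minimal shell sketch, using the names from the manifest above:

```shell
# Derive the stable DNS name of each StatefulSet replica from its ordinal index.
sts=postgres            # StatefulSet name
svc=postgres-headless   # headless Service name
ns=database             # namespace
replicas=3

for i in $(seq 0 $((replicas - 1))); do
  echo "${sts}-${i}.${svc}.${ns}.svc.cluster.local"
done
# -> postgres-0.postgres-headless.database.svc.cluster.local
#    postgres-1.postgres-headless.database.svc.cluster.local
#    postgres-2.postgres-headless.database.svc.cluster.local
```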

2.2 PodManagementPolicy

spec:
  podManagementPolicy: OrderedReady  # default
  # OrderedReady: Create in order 0->1->2, delete in order 2->1->0
  # Parallel: All Pods created/deleted simultaneously (use with caution for DBs)

  • OrderedReady: Primary (index 0) starts first and becomes Ready before Replicas start
  • Parallel: Only use for initial cluster bootstrap (when no existing data)

2.3 volumeClaimTemplates

volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        type: database-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi

This template creates individual PVCs for each Pod:

data-postgres-0  ->  100Gi PV (gp3-encrypted)
data-postgres-1  ->  100Gi PV (gp3-encrypted)
data-postgres-2  ->  100Gi PV (gp3-encrypted)

PVCs are not automatically deleted when the StatefulSet is deleted (data protection).

2.4 Update Strategies

spec:
  updateStrategy:
    type: RollingUpdate       # or OnDelete
    rollingUpdate:
      partition: 0            # only update indices >= partition
      maxUnavailable: 1       # K8s 1.24+

  • RollingUpdate: Updates in reverse order from highest index (2 then 1 then 0). Replicas first, Primary last
  • OnDelete: Pod must be manually deleted to update. Useful for canary updates
  • partition: Setting partition: 2 only updates index 2 and above (canary)
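As a quick sanity check, the partition rule can be simulated: only ordinals >= partition are updated, walking down from the highest index. A sketch with assumed values (replicas=3, partition=2):

```shell
# Which Pods a RollingUpdate touches for a given partition value.
replicas=3
partition=2

for i in $(seq $((replicas - 1)) -1 0); do   # highest ordinal first
  if [ "$i" -ge "$partition" ]; then
    echo "update postgres-$i"
  else
    echo "keep   postgres-$i"
  fi
done
# -> update postgres-2
#    keep   postgres-1
#    keep   postgres-0
```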

2.5 Headless Service

apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None  # Headless Service
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432

Headless Service returns each Pod's IP directly as DNS A records:

# Query all Pod IPs
nslookup postgres-headless.database.svc.cluster.local

# Direct access to specific Pod
psql -h postgres-0.postgres-headless.database.svc.cluster.local -U postgres

3. CloudNativePG (PostgreSQL) Deep Dive

3.1 Cluster CRD

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
  namespace: database
spec:
  instances: 3

  postgresql:
    parameters:
      shared_buffers: "2GB"
      effective_cache_size: "6GB"
      work_mem: "64MB"
      maintenance_work_mem: "512MB"
      max_connections: "200"
      max_wal_size: "2GB"
      min_wal_size: "1GB"
      wal_buffers: "64MB"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      max_worker_processes: "8"
      max_parallel_workers_per_gather: "4"
      max_parallel_workers: "8"

  storage:
    size: 100Gi
    storageClass: gp3-encrypted
    pvcTemplate:
      accessModes:
        - ReadWriteOnce

  walStorage:
    size: 20Gi
    storageClass: gp3-encrypted

  resources:
    requests:
      cpu: "2"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"

  monitoring:
    enablePodMonitor: true

  bootstrap:
    initdb:
      database: myapp
      owner: myapp
      secret:
        name: myapp-db-credentials

3.2 Automatic Failover

CloudNativePG automatically promotes a Replica when the Primary fails.

Normal state:
  my-postgres-1 (Primary, RW)
  my-postgres-2 (Replica, RO) - streaming replication
  my-postgres-3 (Replica, RO) - streaming replication

Primary failure occurs:
  1. CloudNativePG detects Primary failure (health check failure)
  2. Selects Replica with most recent WAL position
  3. Executes pg_promote() -> promotes new Primary
  4. Remaining Replicas follow new Primary
  5. pg_rewind rejoins former Primary as Replica

After failover:
  my-postgres-2 (Primary, RW) <- auto-promoted
  my-postgres-3 (Replica, RO) - follows new Primary
  my-postgres-1 (Replica, RO) - rejoined after pg_rewind

Failover duration: typically 10-30 seconds.

3.3 Backup Configuration (Barman + Object Storage)

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3

  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/postgres/
      endpointURL: https://s3.us-east-1.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
        immediateCheckpoint: true
    retentionPolicy: "30d"

Scheduled backup:

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
  namespace: database
spec:
  schedule: "0 3 * * *"  # Daily at 3 AM
  backupOwnerReference: self
  cluster:
    name: my-postgres
  immediate: false
  suspend: false

3.4 WAL Archiving and PITR

WAL (Write-Ahead Log) archiving provides continuous data protection.

PITR (Point-in-Time Recovery):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-restored
spec:
  instances: 3

  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-09-01 14:30:00.00000+00"

  externalClusters:
    - name: my-postgres
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/postgres/
        endpointURL: https://s3.us-east-1.amazonaws.com
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: ACCESS_SECRET_KEY

3.5 Connection Pooling (Built-in PgBouncer)

apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-rw
  namespace: database
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: rw  # rw or ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
      min_pool_size: "5"
      reserve_pool_size: "5"
      reserve_pool_timeout: "5"
      server_idle_timeout: "300"
---
# Read-only pooler (targets Replicas)
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-ro
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "2000"
      default_pool_size: "50"

Application connections:

# Read/Write (Primary)
my-postgres-pooler-rw.database.svc.cluster.local:5432

# Read-Only (Replica)
my-postgres-pooler-ro.database.svc.cluster.local:5432

3.6 Monitoring (Prometheus + Grafana)

spec:
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: pg-custom-queries
        key: queries

Custom queries ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-custom-queries
  namespace: database
data:
  queries: |
    pg_database_size:
      query: "SELECT datname, pg_database_size(datname) as size_bytes FROM pg_database WHERE datname NOT IN ('template0', 'template1')"
      master: true
      metrics:
        - datname:
            usage: "LABEL"
            description: "Database name"
        - size_bytes:
            usage: "GAUGE"
            description: "Database size in bytes"

    pg_replication_lag:
      query: "SELECT CASE WHEN pg_is_in_recovery() THEN EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int ELSE 0 END as lag_seconds"
      master: false
      metrics:
        - lag_seconds:
            usage: "GAUGE"
            description: "Replication lag in seconds"

3.7 Rolling Update (Zero-Downtime)

# Image update
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3

# CloudNativePG automatically:
# 1. Restarts Replicas one by one
# 2. Confirms each Replica is Ready before proceeding
# 3. Finally switchover Primary (promote new Primary then restart old)
# -> Update completes with zero downtime

4. Percona Operator for MySQL

4.1 XtraDB Cluster (Synchronous Replication)

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: my-mysql
  namespace: database
spec:
  crVersion: "1.14.0"
  secretsName: my-mysql-secrets

  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0.35
    resources:
      requests:
        cpu: "2"
        memory: "8Gi"
      limits:
        cpu: "4"
        memory: "8Gi"
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: gp3-encrypted
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
    configuration: |
      [mysqld]
      innodb_buffer_pool_size=4G
      innodb_log_file_size=1G
      innodb_flush_method=O_DIRECT
      max_connections=500
      wsrep_sync_wait=3
      wsrep_slave_threads=4

  haproxy:
    enabled: true
    size: 2
    image: percona/haproxy:2.8.5
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"

  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: my-mysql-backups
          region: us-east-1
          credentialsSecret: aws-s3-secret
    schedule:
      - name: daily-full
        schedule: "0 3 * * *"
        keep: 7
        storageName: s3-backup

4.2 ProxySQL Connection Routing

spec:
  proxysql:
    enabled: true
    size: 2
    image: percona/proxysql2:2.5.5
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"

4.3 Percona XtraBackup

# Manual backup
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: manual-backup-20240901
  namespace: database
spec:
  pxcCluster: my-mysql
  storageName: s3-backup

# Check backup status
kubectl get pxc-backup -n database

# Restore from backup
kubectl apply -f - <<EOF
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore-20240901
  namespace: database
spec:
  pxcCluster: my-mysql
  backupName: manual-backup-20240901
EOF

5. MongoDB Community Operator

5.1 ReplicaSet Configuration

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: my-mongodb
  namespace: database
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.12"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: admin
      db: admin
      passwordSecretRef:
        name: mongodb-admin-password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: admin-scram
    - name: myapp
      db: myapp
      passwordSecretRef:
        name: mongodb-myapp-password
      roles:
        - name: readWrite
          db: myapp
      scramCredentialsSecretName: myapp-scram

  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: mongod
              resources:
                requests:
                  cpu: "2"
                  memory: "4Gi"
                limits:
                  cpu: "4"
                  memory: "4Gi"
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            storageClassName: gp3-encrypted
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 50Gi

  additionalMongodConfig:
    storage.wiredTiger.engineConfig.cacheSizeGB: 2
    net.maxIncomingConnections: 500

5.2 Backup Strategies

# mongodump-based logical backup (CronJob)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
  namespace: database
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: mongo:7.0
              command:
                - /bin/sh
                - -c
                - |
                  mongodump \
                    --uri="mongodb://myapp:password@my-mongodb-0.my-mongodb-svc:27017/myapp?replicaSet=my-mongodb" \
                    --gzip \
                    --archive=/backup/myapp-$(date +%Y%m%d).gz
                  # Note: the upload step assumes an image with the AWS CLI available
                  aws s3 cp /backup/myapp-$(date +%Y%m%d).gz \
                    s3://my-backup-bucket/mongodb/
              volumeMounts:
                - name: backup-tmp
                  mountPath: /backup
          volumes:
            - name: backup-tmp
              emptyDir: {}
          restartPolicy: OnFailure

6. Storage Deep Dive

6.1 StorageClass Configuration

# AWS EBS gp3 StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "6000"           # gp3 baseline 3000, max 16000
  throughput: "250"       # gp3 baseline 125MB/s, max 1000MB/s
  fsType: ext4
reclaimPolicy: Retain     # DB data must use Retain
volumeBindingMode: WaitForFirstConsumer  # Topology-aware binding
allowVolumeExpansion: true
---
# GCP PD SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-encrypted
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

6.2 Local PV vs Cloud EBS/PD Performance Comparison

Metric                | Local NVMe           | EBS gp3 (6000 IOPS) | EBS io2 (16000 IOPS) | GCP PD-SSD
Random Read IOPS      | 500K+                | 6,000               | 16,000               | 30,000
Random Write IOPS     | 200K+                | 6,000               | 16,000               | 30,000
Sequential Read MB/s  | 3,000+               | 250                 | 1,000                | 1,200
Sequential Write MB/s | 2,000+               | 250                 | 1,000                | 1,200
P99 Latency           | Under 0.1ms          | 1-3ms               | 0.5-1ms              | 1-2ms
Data Durability       | Lost on node failure | 99.999%             | 99.999%              | 99.999%
Resize                | Not possible         | Dynamic expansion   | Dynamic expansion    | Dynamic expansion

6.3 Storage Benchmarking

# fio benchmark Pod
kubectl run fio-bench --image=nixery.dev/fio \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "fio-bench",
        "image": "nixery.dev/fio",
        "command": ["sleep", "3600"],
        "volumeMounts": [{
          "name": "test-vol",
          "mountPath": "/data"
        }]
      }],
      "volumes": [{
        "name": "test-vol",
        "persistentVolumeClaim": {
          "claimName": "bench-pvc"
        }
      }]
    }
  }'

# Random read benchmark
kubectl exec fio-bench -- fio \
  --name=randread \
  --ioengine=libaio \
  --iodepth=32 \
  --rw=randread \
  --bs=4k \
  --direct=1 \
  --size=1G \
  --numjobs=4 \
  --runtime=60 \
  --directory=/data \
  --group_reporting

# pgbench (PostgreSQL performance test)
kubectl exec -it my-postgres-1 -- pgbench \
  -i -s 100 myapp  # Initialize (100 scale factor)

kubectl exec -it my-postgres-1 -- pgbench \
  -c 50 -j 4 -T 300 -P 10 myapp  # 50 clients, 300 seconds

7. High Availability Patterns

7.1 Synchronous vs Asynchronous Replication

# CloudNativePG: Synchronous replication
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-sync
spec:
  instances: 3

  postgresql:
    parameters:
      synchronous_standby_names: "ANY 1 (*)"
      # ANY 1: Guarantees sync replication to at least 1 Replica
      # FIRST 1: Sync replication to the first Replica

  minSyncReplicas: 1
  maxSyncReplicas: 1

Mode                | Data Consistency     | Write Latency | RPO                          | Use Case
Asynchronous        | Eventual consistency | Low           | Possible seconds of data loss | General workloads
Synchronous (ANY 1) | Strong consistency   | High (2x)     | 0 (no data loss)              | Financial, payments
Semi-synchronous    | Medium               | Medium        | Very small                    | Most production

7.2 Pod Disruption Budget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: database
spec:
  maxUnavailable: 1  # Allow max 1 Pod disrupted at a time
  selector:
    matchLabels:
      app: postgres

CloudNativePG automatically creates a PDB. Default is maxUnavailable: 1.

7.3 Topology Spread Constraints

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3

  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          cnpg.io/cluster: my-postgres
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          cnpg.io/cluster: my-postgres

This spreads DB Pods across different availability zones (AZs) and different nodes.

7.4 Anti-Affinity Rules

spec:
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
    # required: Must spread (Pod stays Pending if not possible)
    # preferred: Spread if possible (allows same node if needed)

    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role
                operator: In
                values:
                  - database
    tolerations:
      - key: database-only
        operator: Equal
        value: "true"
        effect: NoSchedule

8. Backup and Recovery Strategies

8.1 Logical vs Physical Backup

Aspect                | Logical (pg_dump)            | Physical (pgBackRest)
Speed                 | Slow (hours for large DBs)   | Fast (incremental backup supported)
Size                  | SQL text (compressible)      | Binary (smaller)
Restore flexibility   | Table-level restore possible | Full cluster restore
Version compatibility | Cross PG versions            | Same PG version only
PITR                  | Not possible                 | Possible
Use case              | Migration, partial restore   | Production disaster recovery

8.2 Continuous WAL Archiving

WAL archiving flow:
PG Primary -> WAL segments generated (16MB each) -> S3/GCS archive
                                                        |
                                                Continuous data protection
                                                        |
                                             PITR: Recovery to any point
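To get a rough feel for the exposure window: with asynchronous archiving, WAL that has not yet reached S3/GCS is lost if every local copy dies. A back-of-the-envelope sketch (the rates are assumed example numbers, not measurements from this guide):

```shell
# Worst-case unarchived WAL under asynchronous archiving.
wal_segment_mb=16        # default PostgreSQL WAL segment size
segments_per_min=4       # assumed WAL generation rate
archive_lag_min=2        # assumed worst-case archiver lag

at_risk_mb=$((wal_segment_mb * segments_per_min * archive_lag_min))
echo "worst-case unarchived WAL: ${at_risk_mb}MB"
# -> worst-case unarchived WAL: 128MB
```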

8.3 PITR (Point-in-Time Recovery) Walkthrough

# 1. Check current backup list
kubectl get backup -n database

# 2. Determine target recovery time (e.g., just before accidental DELETE)
# Time: 2024-09-01 14:29:00 (deletion executed at 14:30)

# 3. PITR restore to new cluster
kubectl apply -f pitr-restore.yaml

# 4. Monitor restore progress
kubectl get cluster my-postgres-restored -n database -w

# 5. Verify data after restore
kubectl exec -it my-postgres-restored-1 -- psql -U myapp -d myapp \
  -c "SELECT count(*) FROM important_table;"

# 6. Switch application connection
kubectl patch service myapp-db -n database \
  -p '{"spec":{"selector":{"cnpg.io/cluster":"my-postgres-restored"}}}'

8.4 Velero Cluster-Level Backup

# Install Velero
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket my-velero-bucket \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1

# Full backup of DB namespace (including PVs)
velero backup create db-full-backup \
  --include-namespaces database \
  --include-resources '*' \
  --snapshot-volumes=true \
  --volume-snapshot-locations default

# Scheduled backup
velero schedule create daily-db-backup \
  --schedule="0 4 * * *" \
  --include-namespaces database \
  --snapshot-volumes=true \
  --ttl 720h  # 30-day retention

8.5 Cross-Region Disaster Recovery

# Source region cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-primary
spec:
  instances: 3
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket-primary/
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY

---
# DR region replica cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-dr
spec:
  instances: 2

  replica:
    enabled: true
    source: my-postgres-primary

  externalClusters:
    - name: my-postgres-primary
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket-primary/
        s3Credentials:
          accessKeyId:
            name: aws-creds-dr
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds-dr
            key: ACCESS_SECRET_KEY

9. Performance Tuning

9.1 Resource Requests/Limits

# DB Pods must use QoS Guaranteed class
resources:
  requests:
    cpu: "4"
    memory: "16Gi"
  limits:
    cpu: "4"        # requests == limits -> Guaranteed
    memory: "16Gi"  # Prevents OOM Killer

Memory allocation guide:

Based on 16Gi total memory:
- shared_buffers: 4GB (25%)
- effective_cache_size: 12GB (75%)
- work_mem: 64MB (per session)
- maintenance_work_mem: 1GB
- OS/other: ~2GB
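The allocation above follows simple percentage rules, so it can be recomputed for any container size. A sketch using the 16Gi example and the 25% / 75% rules of thumb from this section:

```shell
# Derive PostgreSQL memory parameters from total container memory.
total_mib=16384   # 16Gi in MiB

shared_buffers_mib=$((total_mib * 25 / 100))         # 25% of total
effective_cache_size_mib=$((total_mib * 75 / 100))   # 75% of total

echo "shared_buffers=${shared_buffers_mib}MB"
echo "effective_cache_size=${effective_cache_size_mib}MB"
# -> shared_buffers=4096MB
#    effective_cache_size=12288MB
```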

9.2 PostgreSQL Performance Parameter Tuning

postgresql:
  parameters:
    # Memory
    shared_buffers: "4GB"
    effective_cache_size: "12GB"
    work_mem: "64MB"
    maintenance_work_mem: "1GB"
    wal_buffers: "64MB"

    # WAL
    max_wal_size: "4GB"
    min_wal_size: "1GB"
    checkpoint_completion_target: "0.9"
    wal_compression: "zstd"

    # Query planner
    random_page_cost: "1.1"       # For SSD (HDD is 4.0)
    effective_io_concurrency: "200"  # For SSD (HDD is 2)

    # Parallelism
    max_worker_processes: "8"
    max_parallel_workers_per_gather: "4"
    max_parallel_workers: "8"

    # Connections
    max_connections: "200"
    idle_in_transaction_session_timeout: "30000"  # 30 seconds

    # Logging
    log_min_duration_statement: "1000"   # Log queries over 1 second
    log_checkpoints: "on"
    log_lock_waits: "on"

    # Auto VACUUM
    autovacuum_max_workers: "4"
    autovacuum_naptime: "30"
    autovacuum_vacuum_cost_limit: "1000"

9.3 Connection Pooling Configuration

PgBouncer recommended settings:

pool_mode = transaction
- Allocates/returns connections per transaction
- Essential for hundreds of Pods connecting simultaneously in K8s
- Restricts PREPARE statement usage (only in session mode)

default_pool_size = 25
- Backend DB connection count (12.5% of max_connections)
- Too high overloads DB, too low increases wait times

max_client_conn = 1000
- Maximum frontend connections
- Calculate: number of Pods x connections per Pod

reserve_pool_size = 5
- Reserve connections for peak traffic

server_idle_timeout = 300
- Clean up idle connections after 5 minutes
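The sizing rules above are simple arithmetic. A sketch with assumed fleet numbers (40 Pods with 25 client connections each, max_connections=200):

```shell
# PgBouncer sizing arithmetic from the guidelines above.
pods=40
conns_per_pod=25
max_connections=200

max_client_conn=$((pods * conns_per_pod))    # frontend capacity needed
default_pool_size=$((max_connections / 8))   # ~12.5% of max_connections

echo "max_client_conn=${max_client_conn}"
echo "default_pool_size=${default_pool_size}"
# -> max_client_conn=1000
#    default_pool_size=25
```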

10. Security

10.1 Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-network-policy
  namespace: database
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: my-postgres
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow access only from application namespace
    - from:
        - namespaceSelector:
            matchLabels:
              app.kubernetes.io/part-of: myapp
      ports:
        - protocol: TCP
          port: 5432
    # Allow replication traffic within cluster
    - from:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: my-postgres
      ports:
        - protocol: TCP
          port: 5432
  egress:
    # Outbound for S3 backup (HTTPS)
    - to: []
      ports:
        - protocol: TCP
          port: 443
    # Intra-cluster communication
    - to:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: my-postgres
      ports:
        - protocol: TCP
          port: 5432
    # DNS
    - to: []
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

10.2 Secrets Management (External Secrets Operator)

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-credentials
  namespace: database
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: myapp-db-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: production/database/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: production/database/postgres
        property: password

10.3 TLS Configuration

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  certificates:
    serverTLSSecret: my-postgres-server-tls
    serverCASecret: my-postgres-ca

  postgresql:
    parameters:
      ssl: "on"
      ssl_min_protocol_version: "TLSv1.3"

11. Monitoring and Alerting

11.1 Key Metrics and Alert Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-alerts
  namespace: monitoring
spec:
  groups:
    - name: postgres.rules
      rules:
        - alert: PostgresReplicationLagHigh
          expr: cnpg_pg_replication_lag > 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PostgreSQL replication lag exceeds 10 seconds"

        - alert: PostgresConnectionsNearLimit
          expr: >
            cnpg_pg_stat_activity_count /
            cnpg_pg_settings_setting{name="max_connections"} > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Connection count exceeds 80% of maximum"

        - alert: PostgresCacheHitRatioLow
          expr: >
            cnpg_pg_stat_database_blks_hit /
            (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) < 0.95
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Cache hit ratio below 95%. Consider increasing shared_buffers"

        - alert: PostgresStorageNearFull
          expr: >
            kubelet_volume_stats_used_bytes{namespace="database"} /
            kubelet_volume_stats_capacity_bytes{namespace="database"} > 0.85
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "DB storage usage exceeds 85%"

        - alert: PostgresFailoverDetected
          expr: changes(cnpg_pg_replication_is_primary[5m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "PostgreSQL Failover detected"

11.2 Grafana Dashboard Key Panels

Key monitoring panels:
1. Replication lag (seconds) - cnpg_pg_replication_lag
2. Active connection count - cnpg_pg_stat_activity_count
3. Cache hit ratio - blks_hit / (blks_hit + blks_read)
4. TPS (transactions/second) - rate(cnpg_pg_stat_database_xact_commit[5m])
5. Query latency P99
6. Disk IOPS / throughput
7. WAL generation rate - rate(cnpg_pg_stat_archiver_archived_count[5m])
8. Storage usage - kubelet_volume_stats_used_bytes
9. CPU / Memory utilization
10. Vacuum / Analyze execution status
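Panel 3's cache hit ratio is worth spot-checking by hand. A sketch with assumed counter values (not real cluster data):

```shell
# Cache hit ratio = blks_hit / (blks_hit + blks_read), with assumed counters.
blks_hit=990000
blks_read=10000

awk -v h="$blks_hit" -v r="$blks_read" \
  'BEGIN { printf "cache_hit_ratio=%.2f\n", h / (h + r) }'
# -> cache_hit_ratio=0.99
```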

12. Migration Strategies

12.1 VM to K8s Migration

# Method 1: pg_dump/pg_restore (small DBs)
pg_dump -Fc -d myapp -h vm-db.internal -U postgres > myapp.dump

kubectl cp myapp.dump database/my-postgres-1:/tmp/myapp.dump
kubectl exec -it my-postgres-1 -n database -- \
  pg_restore -d myapp -U postgres /tmp/myapp.dump

# Method 2: Logical replication (large DBs, zero-downtime)
# 1. Create empty cluster in K8s
# 2. Create publication on VM DB
#    CREATE PUBLICATION my_pub FOR ALL TABLES;
# 3. Create subscription on K8s DB
#    CREATE SUBSCRIPTION my_sub
#      CONNECTION 'host=vm-db.internal dbname=myapp user=replicator'
#      PUBLICATION my_pub;
# 4. Wait for initial data sync completion
# 5. Switch application connections to K8s DB
# 6. Drop subscription
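# Note: logical replication copies table data but NOT sequence values.
# Before step 5, generate catch-up statements on the source (requires the
# pg_sequences view, PostgreSQL 10+) and run the output on the K8s DB:
#    SELECT format('SELECT setval(%L, %s, true);',
#                  schemaname || '.' || sequencename, last_value)
#    FROM pg_sequences WHERE last_value IS NOT NULL;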

12.2 Zero-Downtime Migration Checklist

Before migration:
[ ] Assess source DB size and table count
[ ] Verify K8s cluster resource availability
[ ] Size StorageClass / PV (2x headroom over source)
[ ] Verify network connectivity (source DB -> K8s cluster)
[ ] Create logical replication user with appropriate permissions

During migration:
[ ] Confirm initial data copy completed
[ ] Verify replication lag converging to 0
[ ] Check sequence value synchronization
[ ] Switch application read traffic to K8s (test)
[ ] Switch write traffic (brief downtime or dual-write)

After migration:
[ ] Verify data integrity (row counts, checksums)
[ ] Confirm application performance
[ ] Stop source DB replication
[ ] Confirm backup policy applied
[ ] Verify monitoring/alerting operational
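The row-count verification above can be sketched as follows. The hostnames and the CloudNativePG-style my-postgres-rw read-write Service are assumptions; note that pg_stat_user_tables.n_live_tup is an estimate, so use count(*) per table or checksums where exact verification is required:

```shell
# Compare per-table row-count estimates between source and target (sketch)
Q="SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;"
psql -h vm-db.internal -U postgres -d myapp -Atc "$Q" > /tmp/src_counts.txt
psql -h my-postgres-rw.database.svc -U postgres -d myapp -Atc "$Q" > /tmp/dst_counts.txt
diff /tmp/src_counts.txt /tmp/dst_counts.txt && echo "row counts match"
```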

13. Production Checklist

Infrastructure and Storage

  • StorageClass reclaimPolicy set to Retain
  • volumeBindingMode set to WaitForFirstConsumer
  • EBS/PD IOPS appropriate for workload (minimum 6000 IOPS)
  • Volume auto-expansion (allowVolumeExpansion: true) enabled
  • Nodes have DB-dedicated labels/taints
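The first four items above combine into a single StorageClass; a sketch for AWS EBS (the gp3-encrypted name matches the StatefulSet earlier in this guide; the IOPS value follows the checklist and should be sized per workload):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com            # AWS EBS CSI driver
parameters:
  type: gp3
  iops: "6000"                          # checklist minimum
  encrypted: "true"
reclaimPolicy: Retain                   # keep data even if the PVC is deleted
volumeBindingMode: WaitForFirstConsumer # bind the volume in the Pod's AZ
allowVolumeExpansion: true
```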

HA and Replication

  • Minimum 3 instances (1 Primary + 2 Replicas)
  • Pod Anti-Affinity spreads across different nodes
  • Topology Spread across different AZs
  • PodDisruptionBudget configured (maxUnavailable: 1)
  • Sync/async replication mode decided and configured
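The PodDisruptionBudget item above can be expressed as follows (the label selector matches the postgres StatefulSet shown earlier):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: database
spec:
  maxUnavailable: 1   # node drains may evict at most one DB Pod at a time
  selector:
    matchLabels:
      app: postgres
```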

Backup and Recovery

  • Automated backup schedule set (minimum daily)
  • WAL archiving enabled (PITR capable)
  • Backup retention policy set (minimum 30 days)
  • Backup restore test completed (minimum quarterly)
  • Cross-region backup replication (if DR required)

Performance

  • Resource requests == limits (QoS Guaranteed)
  • shared_buffers = 25% of total memory
  • effective_cache_size = 75% of total memory
  • Connection Pooling enabled (PgBouncer transaction mode)
  • Query logging enabled (log_min_duration_statement)
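The memory ratios above translate into postgresql.conf roughly as follows, assuming a Pod with requests == limits == 16Gi (values are illustrative and scale with the memory limit):

```ini
# Sketch for a 16Gi Pod
shared_buffers = 4GB                # 25% of memory
effective_cache_size = 12GB         # 75% of memory (planner hint, not an allocation)
log_min_duration_statement = 1000   # log statements slower than 1s (value in ms)
```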

Security

  • NetworkPolicy restricts DB access
  • TLS enabled (minimum TLSv1.2)
  • Secrets managed by External Secrets Operator or Vault
  • RBAC: Operator service account with minimal privileges
  • Data encryption (storage level + transport level)
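A minimal NetworkPolicy for the first item, assuming application Pods live in a namespace named app (the namespace name is an assumption; the kubernetes.io/metadata.name label is set automatically on namespaces in Kubernetes 1.21+):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-postgres
  namespace: database
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress          # all other ingress to the DB Pods is denied
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: app   # hypothetical app namespace
      ports:
        - protocol: TCP
          port: 5432
```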

Monitoring

  • Prometheus metrics collection configured
  • Grafana dashboard set up
  • Replication lag alert (over 10 seconds)
  • Storage capacity alert (over 85%)
  • Connection count alert (over 80% of max_connections)
  • Failover occurrence alert

Operations

  • Operator upgrade procedure documented
  • DB minor/major version upgrade procedure documented
  • Incident response runbook created
  • Regular restore test schedule established

14. Quiz

Q1: Why is Pod Identity important in StatefulSet?

A: StatefulSet Pod Identity (ordinal index + stable hostname + dedicated PVC) is essential for database operations. Primary/Replica roles are bound to specific Pods, the same storage is connected even after restarts, and DNS-based access allows other Pods to directly reach specific instances. Regular Deployments cannot provide these guarantees.

Q2: How does CloudNativePG automatic Failover work?

A: When CloudNativePG detects a Primary Pod health check failure, it selects the Replica with the most recent WAL position and promotes it to the new Primary using pg_promote(). Remaining Replicas are reconfigured to follow the new Primary, and the former Primary rejoins as a Replica via pg_rewind. The entire process typically completes within 10-30 seconds.

Q3: Why should DB Pod QoS class be set to Guaranteed?

A: With QoS Guaranteed (requests == limits), K8s guarantees the Pod's resources and it is the last to be OOM Killed during node memory pressure. Since databases require stable in-memory cache (shared_buffers) and consistent CPU, setting Burstable or BestEffort can cause unexpected performance degradation or failures.

Q4: Why set volumeBindingMode to WaitForFirstConsumer?

A: WaitForFirstConsumer provisions the PV only after the node where the Pod will be scheduled is determined. This ensures Pod and PV are in the same availability zone (AZ), preventing cross-AZ network latency. With Immediate mode, the PV can be created in a different AZ, causing Pod scheduling failures or performance degradation.

Q5: What is the difference between PgBouncer transaction mode and session mode?

A: Transaction mode returns backend connections to the pool when transactions end, allowing hundreds of clients to share a small number of DB connections. This is optimal for K8s environments with many Pods connecting simultaneously. Session mode holds connections until client sessions end, used when PREPARE statements or session variables are needed. Transaction mode is generally recommended for K8s environments.
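The difference comes down to one pgbouncer.ini setting; a sketch (the backend host and pool sizes are assumptions to tune per workload):

```ini
; pgbouncer.ini sketch -- hypothetical host and pool sizes
[databases]
myapp = host=my-postgres-rw.database.svc port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction   ; backend returns to the pool at COMMIT/ROLLBACK;
                          ; use "session" for PREPARE or session variables
max_client_conn = 1000    ; many app Pods can connect...
default_pool_size = 20    ; ...sharing a few real backend connections
```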


15. References

  1. CloudNativePG Official Documentation - https://cloudnative-pg.io/documentation/
  2. Percona Operator for MySQL - https://docs.percona.com/percona-operator-for-mysql/pxc/
  3. MongoDB Community Operator - https://github.com/mongodb/mongodb-kubernetes-operator
  4. Kubernetes StatefulSet Documentation - https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
  5. PgBouncer Documentation - https://www.pgbouncer.org/config.html
  6. Velero Official Documentation - https://velero.io/docs/
  7. External Secrets Operator - https://external-secrets.io/
  8. AWS EBS CSI Driver - https://github.com/kubernetes-sigs/aws-ebs-csi-driver
  9. PostgreSQL Performance Tuning Guide - https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
  10. Kubernetes Network Policies - https://kubernetes.io/docs/concepts/services-networking/network-policies/
  11. pgBackRest Documentation - https://pgbackrest.org/
  12. Prometheus PostgreSQL Exporter - https://github.com/prometheus-community/postgres_exporter
  13. Barman (Backup and Recovery Manager) - https://pgbarman.org/