Kubernetes Database Deep Dive 2025: Operator Patterns, HA Configuration, Backup/Recovery, Performance Tuning
1. Should You Run Databases on K8s?
1.1 Pros and Cons Analysis (2025 Perspective)
Advantages of K8s DB Operations
- Consistent deployment: Manage DB with the same GitOps workflow as applications
- Resource efficiency: Share node resources with other workloads (cost savings vs dedicated VMs)
- Automatic recovery: Operator detects failures and performs automatic Failover (RTO under 30s)
- Portability: Same configuration anywhere without cloud vendor lock-in
- Dev environment parity: Identical DB stack across dev/staging/production
Disadvantages of K8s DB Operations
- Operational complexity: Additional management points like StorageClass, PV, Operator upgrades
- Storage performance: Network storage (EBS/PD) has higher latency than local disks
- Expertise required: Deep understanding of both DBA and K8s operations needed
- Backup complexity: K8s-specific volume snapshots, object storage integration required
1.2 Decision Framework
| Criteria | K8s DB Suitable | Managed DB (RDS etc.) Suitable |
|---|---|---|
| Team capability | K8s + DBA experts available | No DBA |
| Cost sensitivity | High (need infra optimization) | Medium (management cost OK) |
| Compliance | Data location control needed | Cloud region sufficient |
| Multi-cloud | Required | Single cloud |
| Workload size | Small-medium (hundreds of GB) | Large (multiple TB+) |
| SLA requirement | 99.9% achievable in-house | 99.99% vendor SLA needed |
2. StatefulSet Deep Dive
2.1 Pod Identity (Ordinal Index and Stable Hostname)
StatefulSet assigns each Pod an ordinal index.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-encrypted
      resources:
        requests:
          storage: 100Gi
Pods created by this StatefulSet:
postgres-0 -> postgres-0.postgres-headless.database.svc.cluster.local
postgres-1 -> postgres-1.postgres-headless.database.svc.cluster.local
postgres-2 -> postgres-2.postgres-headless.database.svc.cluster.local
Each Pod binds to the same name and same PVC even after restarts.
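The ordinal-to-hostname mapping above is deterministic, which is exactly what makes it usable for replication topologies. A minimal sketch (the function name `stable_dns_names` is illustrative, not a Kubernetes API):

```python
# Sketch: how StatefulSet ordinal indices map to stable per-Pod DNS names
# of the form <sts>-<ordinal>.<headless-svc>.<namespace>.svc.cluster.local.
def stable_dns_names(sts_name: str, headless_svc: str,
                     namespace: str, replicas: int) -> list[str]:
    """Return the per-Pod FQDNs a StatefulSet guarantees across restarts."""
    return [
        f"{sts_name}-{ordinal}.{headless_svc}.{namespace}.svc.cluster.local"
        for ordinal in range(replicas)
    ]

names = stable_dns_names("postgres", "postgres-headless", "database", 3)
print(names[0])  # postgres-0.postgres-headless.database.svc.cluster.local
```

Because the name is a pure function of the ordinal, a Replica can always find `postgres-0` after a restart without any service discovery beyond cluster DNS.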
2.2 PodManagementPolicy
spec:
  podManagementPolicy: OrderedReady  # default
  # OrderedReady: Create in order 0->1->2, delete in order 2->1->0
  # Parallel: All Pods created/deleted simultaneously (use with caution for DBs)
- OrderedReady: Primary (index 0) starts first and becomes Ready before Replicas start
- Parallel: Only use for initial cluster bootstrap (when no existing data)
2.3 volumeClaimTemplates
volumeClaimTemplates:
- metadata:
    name: data
    labels:
      type: database-storage
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: gp3-encrypted
    resources:
      requests:
        storage: 100Gi
This template creates individual PVCs for each Pod:
data-postgres-0 -> 100Gi PV (gp3-encrypted)
data-postgres-1 -> 100Gi PV (gp3-encrypted)
data-postgres-2 -> 100Gi PV (gp3-encrypted)
PVCs are not automatically deleted when the StatefulSet is deleted (data protection).
2.4 Update Strategies
spec:
  updateStrategy:
    type: RollingUpdate  # or OnDelete
    rollingUpdate:
      partition: 0       # only update indices >= partition
      maxUnavailable: 1  # K8s 1.24+
- RollingUpdate: Updates in reverse order from highest index (2 then 1 then 0). Replicas first, Primary last
- OnDelete: Pod must be manually deleted to update. Useful for canary updates
- partition: Setting `partition: 2` updates only Pods with index 2 and above (canary)
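The partition rule can be sketched as a one-liner: a Pod is updated only if its ordinal is at or above the partition value, and updates proceed from the highest ordinal down. The helper name `pods_to_update` is illustrative:

```python
# Sketch of the StatefulSet RollingUpdate partition rule: only Pods with
# ordinal >= partition receive the new revision, highest ordinal first.
def pods_to_update(replicas: int, partition: int) -> list[int]:
    return [o for o in range(replicas - 1, -1, -1) if o >= partition]

print(pods_to_update(3, 2))  # [2]        -> canary: only postgres-2 updates
print(pods_to_update(3, 0))  # [2, 1, 0]  -> full rollout, reverse ordinal order
```

Lowering the partition step by step (2 → 1 → 0) converts a canary into a full rollout without changing the image again.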
2.5 Headless Service
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
  namespace: database
spec:
  type: ClusterIP
  clusterIP: None  # Headless Service
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
Headless Service returns each Pod's IP directly as DNS A records:
# Query all Pod IPs
nslookup postgres-headless.database.svc.cluster.local
# Direct access to specific Pod
psql -h postgres-0.postgres-headless.database.svc.cluster.local -U postgres
3. CloudNativePG (PostgreSQL) Deep Dive
3.1 Cluster CRD
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
  namespace: database
spec:
  instances: 3
  postgresql:
    parameters:
      shared_buffers: "2GB"
      effective_cache_size: "6GB"
      work_mem: "64MB"
      maintenance_work_mem: "512MB"
      max_connections: "200"
      max_wal_size: "2GB"
      min_wal_size: "1GB"
      wal_buffers: "64MB"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      max_worker_processes: "8"
      max_parallel_workers_per_gather: "4"
      max_parallel_workers: "8"
  storage:
    size: 100Gi
    storageClass: gp3-encrypted
    pvcTemplate:
      accessModes:
      - ReadWriteOnce
  walStorage:
    size: 20Gi
    storageClass: gp3-encrypted
  resources:
    requests:
      cpu: "2"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  monitoring:
    enablePodMonitor: true
  bootstrap:
    initdb:
      database: myapp
      owner: myapp
      secret:
        name: myapp-db-credentials
3.2 Automatic Failover
CloudNativePG automatically promotes a Replica when the Primary fails.
Normal state:
my-postgres-1 (Primary, RW)
my-postgres-2 (Replica, RO) - streaming replication
my-postgres-3 (Replica, RO) - streaming replication
Primary failure occurs:
1. CloudNativePG detects Primary failure (health check failure)
2. Selects Replica with most recent WAL position
3. Executes pg_promote() -> promotes new Primary
4. Remaining Replicas follow new Primary
5. pg_rewind rejoins former Primary as Replica
After failover:
my-postgres-2 (Primary, RW) <- auto-promoted
my-postgres-3 (Replica, RO) - follows new Primary
my-postgres-1 (Replica, RO) - rejoined after pg_rewind
Failover duration: typically 10-30 seconds.
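Step 2 of the failover sequence hinges on comparing WAL positions. PostgreSQL reports these as LSNs like `0/5000A10` (a 64-bit position written as two hex halves), and the replica with the highest replayed LSN loses the least data when promoted. A minimal sketch of that selection (function names are illustrative, not the operator's internals):

```python
# Sketch: choosing the promotion target by WAL replay position (LSN).
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '0/5000A10' to a comparable 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def pick_promotion_target(replica_lsns: dict[str, str]) -> str:
    """Pick the replica that has replayed the most WAL (highest LSN)."""
    return max(replica_lsns, key=lambda name: lsn_to_int(replica_lsns[name]))

target = pick_promotion_target({
    "my-postgres-2": "0/5000A10",
    "my-postgres-3": "0/5000990",
})
print(target)  # my-postgres-2
```

With asynchronous replication the highest LSN may still trail the failed Primary, which is why the RPO discussion in section 7.1 matters.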
3.3 Backup Configuration (Barman + Object Storage)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/postgres/
      endpointURL: https://s3.us-east-1.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
        immediateCheckpoint: true
    retentionPolicy: "30d"
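What a `"30d"` retention policy implies can be sketched as a pruning rule: base backups older than the window become deletion candidates, but the newest one must survive if nothing newer exists, or there would be no base backup to restore from. This is a simplified model, not CloudNativePG's actual pruning code:

```python
# Sketch (simplified model, not the operator's implementation): which base
# backups a "30d" retention policy may safely delete.
from datetime import datetime, timedelta

def prune_candidates(backup_times: list[datetime], now: datetime,
                     retention_days: int = 30) -> list[datetime]:
    cutoff = now - timedelta(days=retention_days)
    keep = [t for t in backup_times if t >= cutoff]
    expired = sorted(t for t in backup_times if t < cutoff)
    # Never delete the newest expired backup when no in-window backup exists:
    # PITR needs at least one base backup to start recovery from.
    if not keep and expired:
        expired.pop()
    return expired

now = datetime(2024, 9, 30)
print(prune_candidates([datetime(2024, 7, 1), datetime(2024, 9, 15)], now))
```

Note that WAL retention follows the oldest surviving base backup, so retention also bounds how far back PITR can reach.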
Scheduled backup:
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
  namespace: database
spec:
  schedule: "0 0 3 * * *"  # Daily at 3 AM (CNPG uses 6-field cron, seconds first)
  backupOwnerReference: self
  cluster:
    name: my-postgres
  immediate: false
  suspend: false
3.4 WAL Archiving and PITR
WAL (Write-Ahead Log) archiving provides continuous data protection.
PITR (Point-in-Time Recovery):
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-restored
spec:
  instances: 3
  bootstrap:
    recovery:
      source: my-postgres
      recoveryTarget:
        targetTime: "2024-09-01 14:30:00.00000+00"
  externalClusters:
  - name: my-postgres
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/postgres/
      endpointURL: https://s3.us-east-1.amazonaws.com
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
3.5 Connection Pooling (Built-in PgBouncer)
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-rw
  namespace: database
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: rw  # rw or ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"
      min_pool_size: "5"
      reserve_pool_size: "5"
      reserve_pool_timeout: "5"
      server_idle_timeout: "300"
---
# Read-only pooler (targets Replicas)
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: my-postgres-pooler-ro
spec:
  cluster:
    name: my-postgres
  instances: 2
  type: ro
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "2000"
      default_pool_size: "50"
Application connections:
# Read/Write (Primary)
my-postgres-pooler-rw.database.svc.cluster.local:5432
# Read-Only (Replica)
my-postgres-pooler-ro.database.svc.cluster.local:5432
3.6 Monitoring (Prometheus + Grafana)
spec:
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
    - name: pg-custom-queries
      key: queries
Custom queries ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-custom-queries
  namespace: database
data:
  queries: |
    pg_database_size:
      query: "SELECT datname, pg_database_size(datname) as size_bytes FROM pg_database WHERE datname NOT IN ('template0', 'template1')"
      master: true
      metrics:
      - datname:
          usage: "LABEL"
          description: "Database name"
      - size_bytes:
          usage: "GAUGE"
          description: "Database size in bytes"
    pg_replication_lag:
      query: "SELECT CASE WHEN pg_is_in_recovery() THEN EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int ELSE 0 END as lag_seconds"
      master: false
      metrics:
      - lag_seconds:
          usage: "GAUGE"
          description: "Replication lag in seconds"
3.7 Rolling Update (Zero-Downtime)
# Image update
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:16.3

# CloudNativePG automatically:
# 1. Restarts Replicas one by one
# 2. Confirms each Replica is Ready before proceeding
# 3. Finally performs a switchover: promotes a new Primary, then restarts the old one
# -> Update completes with zero downtime
4. Percona Operator for MySQL
4.1 XtraDB Cluster (Synchronous Replication)
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: my-mysql
  namespace: database
spec:
  crVersion: "1.14.0"
  secretsName: my-mysql-secrets
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0.35
    resources:
      requests:
        cpu: "2"
        memory: "8Gi"
      limits:
        cpu: "4"
        memory: "8Gi"
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: gp3-encrypted
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
    configuration: |
      [mysqld]
      innodb_buffer_pool_size=4G
      innodb_log_file_size=1G
      innodb_flush_method=O_DIRECT
      max_connections=500
      wsrep_sync_wait=3
      wsrep_slave_threads=4
  haproxy:
    enabled: true
    size: 2
    image: percona/haproxy:2.8.5
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
  backup:
    image: percona/percona-xtradb-cluster-operator:1.14.0-pxc8.0-backup-pxb8.0.35
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: my-mysql-backups
          region: us-east-1
          credentialsSecret: aws-s3-secret
    schedule:
    - name: daily-full
      schedule: "0 3 * * *"
      keep: 7
      storageName: s3-backup
4.2 ProxySQL Connection Routing
# Note: enable either haproxy or proxysql as the proxy layer, not both
spec:
  proxysql:
    enabled: true
    size: 2
    image: percona/proxysql2:2.5.5
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
4.3 Percona XtraBackup
# Manual backup
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: manual-backup-20240901
  namespace: database
spec:
  pxcCluster: my-mysql
  storageName: s3-backup

# Check backup status
kubectl get pxc-backup -n database

# Restore from backup
kubectl apply -f - <<EOF
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterRestore
metadata:
  name: restore-20240901
  namespace: database
spec:
  pxcCluster: my-mysql
  backupName: manual-backup-20240901
EOF
5. MongoDB Community Operator
5.1 ReplicaSet Configuration
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: my-mongodb
  namespace: database
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.12"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
  - name: admin
    db: admin
    passwordSecretRef:
      name: mongodb-admin-password
    roles:
    - name: clusterAdmin
      db: admin
    - name: userAdminAnyDatabase
      db: admin
    scramCredentialsSecretName: admin-scram
  - name: myapp
    db: myapp
    passwordSecretRef:
      name: mongodb-myapp-password
    roles:
    - name: readWrite
      db: myapp
    scramCredentialsSecretName: myapp-scram
  statefulSet:
    spec:
      template:
        spec:
          containers:
          - name: mongod
            resources:
              requests:
                cpu: "2"
                memory: "4Gi"
              limits:
                cpu: "4"
                memory: "4Gi"
      volumeClaimTemplates:
      - metadata:
          name: data-volume
        spec:
          storageClassName: gp3-encrypted
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.cacheSizeGB: 2
    net.maxIncomingConnections: 500
5.2 Backup Strategies
# mongodump-based logical backup (CronJob)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
  namespace: database
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            # note: the stock mongo image does not include the aws CLI;
            # use a custom image that bundles both tools
            image: mongo:7.0
            command:
            - /bin/sh
            - -c
            - |
              mongodump \
                --uri="mongodb://myapp:password@my-mongodb-0.my-mongodb-svc:27017/myapp?replicaSet=my-mongodb" \
                --gzip \
                --archive=/backup/myapp-$(date +%Y%m%d).gz
              aws s3 cp /backup/myapp-$(date +%Y%m%d).gz \
                s3://my-backup-bucket/mongodb/
          restartPolicy: OnFailure
6. Storage Deep Dive
6.1 StorageClass Configuration
# AWS EBS gp3 StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "6000"       # gp3 baseline 3000, max 16000
  throughput: "250"  # gp3 baseline 125MB/s, max 1000MB/s
  fsType: ext4
reclaimPolicy: Retain  # DB data must use Retain
volumeBindingMode: WaitForFirstConsumer  # Topology-aware binding
allowVolumeExpansion: true
---
# GCP PD SSD StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-encrypted
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
6.2 Local PV vs Cloud EBS/PD Performance Comparison
| Metric | Local NVMe | EBS gp3 (6000 IOPS) | EBS io2 (16000 IOPS) | GCP PD-SSD |
|---|---|---|---|---|
| Random Read IOPS | 500K+ | 6,000 | 16,000 | 30,000 |
| Random Write IOPS | 200K+ | 6,000 | 16,000 | 30,000 |
| Sequential Read MB/s | 3,000+ | 250 | 1,000 | 1,200 |
| Sequential Write MB/s | 2,000+ | 250 | 1,000 | 1,200 |
| P99 Latency | Under 0.1ms | 1-3ms | 0.5-1ms | 1-2ms |
| Data Durability | Lost on node failure | 99.999% | 99.999% | 99.999% |
| Resize | Not possible | Dynamic expansion | Dynamic expansion | Dynamic expansion |
6.3 Storage Benchmarking
# fio benchmark Pod (the --overrides JSON must include apiVersion)
kubectl run fio-bench --image=nixery.dev/fio --restart=Never \
  --overrides='{
    "apiVersion": "v1",
    "spec": {
      "containers": [{
        "name": "fio-bench",
        "image": "nixery.dev/fio",
        "command": ["sleep", "3600"],
        "volumeMounts": [{"name": "test-vol", "mountPath": "/data"}]
      }],
      "volumes": [{
        "name": "test-vol",
        "persistentVolumeClaim": {"claimName": "bench-pvc"}
      }]
    }
  }'
# Random read benchmark
kubectl exec fio-bench -- fio \
--name=randread \
--ioengine=libaio \
--iodepth=32 \
--rw=randread \
--bs=4k \
--direct=1 \
--size=1G \
--numjobs=4 \
--runtime=60 \
--directory=/data \
--group_reporting
# pgbench (PostgreSQL performance test)
kubectl exec -it my-postgres-1 -- pgbench \
-i -s 100 myapp # Initialize (100 scale factor)
kubectl exec -it my-postgres-1 -- pgbench \
-c 50 -j 4 -T 300 -P 10 myapp # 50 clients, 300 seconds
7. High Availability Patterns
7.1 Synchronous vs Asynchronous Replication
# CloudNativePG: Synchronous replication
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-sync
spec:
  instances: 3
  postgresql:
    parameters:
      synchronous_standby_names: "ANY 1 (*)"
      # ANY 1: Guarantees sync replication to at least 1 Replica
      # FIRST 1: Sync replication to the first Replica
  minSyncReplicas: 1
  maxSyncReplicas: 1
| Mode | Data Consistency | Write Latency | RPO | Use Case |
|---|---|---|---|---|
| Asynchronous | Eventual consistency | Low | Possible seconds of data loss | General workloads |
| Synchronous (ANY 1) | Strong consistency | High (2x) | 0 (no data loss) | Financial, payments |
| Semi-synchronous | Medium | Medium | Very small | Most production |
7.2 Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: database
spec:
  maxUnavailable: 1  # Allow at most 1 Pod disrupted at a time
  selector:
    matchLabels:
      app: postgres
CloudNativePG automatically creates a PDB. Default is maxUnavailable: 1.
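The eviction API turns a PDB into an "allowed disruptions" count: a voluntary disruption (e.g. a node drain) is admitted only while enough healthy Pods remain. A simplified sketch of that arithmetic (not the actual controller code):

```python
# Sketch (simplified): how maxUnavailable becomes the "allowed disruptions"
# budget enforced against voluntary evictions such as node drains.
def allowed_disruptions(healthy: int, desired: int, max_unavailable: int) -> int:
    # Evictions are admitted only while healthy pods stay at or above
    # desired - maxUnavailable.
    return max(0, healthy - (desired - max_unavailable))

print(allowed_disruptions(healthy=3, desired=3, max_unavailable=1))  # 1
print(allowed_disruptions(healthy=2, desired=3, max_unavailable=1))  # 0 -> drain blocks
```

This is why a PDB only protects against voluntary disruptions: an involuntary node failure consumes the budget without asking, and subsequent drains stall until the Pod is rescheduled.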
7.3 Topology Spread Constraints
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        cnpg.io/cluster: my-postgres
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        cnpg.io/cluster: my-postgres
This spreads DB Pods across different availability zones (AZs) and different nodes.
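The `maxSkew` rule can be sketched numerically: skew is the difference between the fullest and emptiest eligible topology domain, and with `whenUnsatisfiable: DoNotSchedule` a placement is rejected if it would push skew past the limit. A simplified model (not the scheduler plugin itself):

```python
# Sketch (simplified): the topology-spread check the scheduler applies
# per constraint before placing a Pod in a candidate zone.
def placement_allowed(zone_counts: dict[str, int], target_zone: str,
                      max_skew: int = 1) -> bool:
    counts = dict(zone_counts)
    counts[target_zone] = counts.get(target_zone, 0) + 1
    return max(counts.values()) - min(counts.values()) <= max_skew

# 3 replicas across zones a/b/c with maxSkew=1:
print(placement_allowed({"a": 1, "b": 1, "c": 0}, "c"))  # True  -> evens out zones
print(placement_allowed({"a": 1, "b": 1, "c": 0}, "a"))  # False -> skew would be 2
```

With 3 instances and 3 zones this forces exactly one Pod per zone, which is what keeps a single-AZ outage from taking down quorum.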
7.4 Anti-Affinity Rules
spec:
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
    podAntiAffinityType: required
    # required: Must spread (Pod stays Pending if not possible)
    # preferred: Spread if possible (allows same node if needed)
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role
            operator: In
            values:
            - database
    tolerations:
    - key: database-only
      operator: Equal
      value: "true"
      effect: NoSchedule
8. Backup and Recovery Strategies
8.1 Logical vs Physical Backup
| Aspect | Logical (pg_dump) | Physical (pgBackRest) |
|---|---|---|
| Speed | Slow (hours for large DBs) | Fast (incremental backup supported) |
| Size | SQL text (compressible) | Binary (smaller) |
| Restore flexibility | Table-level restore possible | Full cluster restore |
| Version compatibility | Cross PG versions | Same PG version only |
| PITR | Not possible | Possible |
| Use case | Migration, partial restore | Production disaster recovery |
8.2 Continuous WAL Archiving
WAL archiving flow:
PG Primary -> WAL segments generated (16MB each) -> S3/GCS archive
                                                         |
                                            Continuous data protection
                                                         |
                                         PITR: recovery to any point in time
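The flow above has a capacity-planning consequence: since WAL ships in 16 MB segments, archive volume scales directly with write throughput. A back-of-envelope sketch, where the input rate and the ~0.3 gzip ratio are illustrative assumptions:

```python
# Back-of-envelope sketch: WAL archive volume per day. The write rate and
# compression ratio below are assumptions, not measured values.
def daily_wal_estimate(wal_mb_per_minute: float,
                       compression_ratio: float = 0.3) -> dict:
    """Estimate WAL segments generated and compressed archive size per day."""
    raw_mb = wal_mb_per_minute * 60 * 24
    return {
        "segments_per_day": int(raw_mb // 16),  # default segment size is 16 MB
        "archived_gb": round(raw_mb * compression_ratio / 1024, 1),
    }

# e.g. ~20 MB of WAL per minute with gzip compression
print(daily_wal_estimate(20))  # {'segments_per_day': 1800, 'archived_gb': 8.4}
```

Multiplying the daily figure by the retention window gives a rough lower bound for the object-storage bucket size.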
8.3 PITR (Point-in-Time Recovery) Walkthrough
# 1. Check current backup list
kubectl get backup -n database
# 2. Determine target recovery time (e.g., just before accidental DELETE)
# Time: 2024-09-01 14:29:00 (deletion executed at 14:30)
# 3. PITR restore to new cluster
kubectl apply -f pitr-restore.yaml
# 4. Monitor restore progress
kubectl get cluster my-postgres-restored -n database -w
# 5. Verify data after restore
kubectl exec -it my-postgres-restored-1 -- psql -U myapp -d myapp \
-c "SELECT count(*) FROM important_table;"
# 6. Switch application connection
kubectl patch service myapp-db -n database \
-p '{"spec":{"selector":{"cnpg.io/cluster":"my-postgres-restored"}}}'
8.4 Velero Cluster-Level Backup
# Install Velero
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.9.0 \
--bucket my-velero-bucket \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1
# Full backup of DB namespace (including PVs)
velero backup create db-full-backup \
--include-namespaces database \
--include-resources '*' \
--snapshot-volumes=true \
--volume-snapshot-locations default
# Scheduled backup
velero schedule create daily-db-backup \
--schedule="0 4 * * *" \
--include-namespaces database \
--snapshot-volumes=true \
--ttl 720h # 30-day retention
8.5 Cross-Region Disaster Recovery
# Source region cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-primary
spec:
  instances: 3
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket-primary/
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
---
# DR region replica cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres-dr
spec:
  instances: 2
  replica:
    enabled: true
    source: my-postgres-primary
  bootstrap:
    recovery:  # replica clusters bootstrap from the source's object store
      source: my-postgres-primary
  externalClusters:
  - name: my-postgres-primary
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket-primary/
      s3Credentials:
        accessKeyId:
          name: aws-creds-dr
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds-dr
          key: ACCESS_SECRET_KEY
9. Performance Tuning
9.1 Resource Requests/Limits
# DB Pods must use the Guaranteed QoS class
resources:
  requests:
    cpu: "4"
    memory: "16Gi"
  limits:
    cpu: "4"       # requests == limits -> Guaranteed
    memory: "16Gi" # lowest eviction/OOM-kill priority under node memory pressure
Memory allocation guide:
Based on 16Gi total memory:
- shared_buffers: 4GB (25%)
- effective_cache_size: 12GB (75%)
- work_mem: 64MB (per session)
- maintenance_work_mem: 1GB
- OS/other: ~2GB
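The allocation guide above is simple percentage arithmetic, which makes it easy to codify. A sketch that emits `postgresql.conf`-style values from total Pod memory (the fixed `work_mem`/`maintenance_work_mem` values mirror the guide, not a universal rule):

```python
# Sketch of the memory sizing guide: 25% shared_buffers, 75%
# effective_cache_size (a planner hint, not an allocation).
def memory_settings(total_gib: int) -> dict[str, str]:
    return {
        "shared_buffers": f"{total_gib // 4}GB",            # ~25% of RAM
        "effective_cache_size": f"{total_gib * 3 // 4}GB",  # ~75%, planner hint only
        "work_mem": "64MB",             # per sort/hash node, per session
        "maintenance_work_mem": "1GB",  # VACUUM, CREATE INDEX, etc.
    }

print(memory_settings(16))
# {'shared_buffers': '4GB', 'effective_cache_size': '12GB',
#  'work_mem': '64MB', 'maintenance_work_mem': '1GB'}
```

Note that `work_mem` is multiplied by concurrent sort/hash operations, so `max_connections x work_mem` must still fit inside the remaining headroom.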
9.2 PostgreSQL Performance Parameter Tuning
postgresql:
  parameters:
    # Memory
    shared_buffers: "4GB"
    effective_cache_size: "12GB"
    work_mem: "64MB"
    maintenance_work_mem: "1GB"
    wal_buffers: "64MB"
    # WAL
    max_wal_size: "4GB"
    min_wal_size: "1GB"
    checkpoint_completion_target: "0.9"
    wal_compression: "zstd"
    # Query planner
    random_page_cost: "1.1"         # For SSD (HDD: 4.0)
    effective_io_concurrency: "200" # For SSD (HDD: 2)
    # Parallelism
    max_worker_processes: "8"
    max_parallel_workers_per_gather: "4"
    max_parallel_workers: "8"
    # Connections
    max_connections: "200"
    idle_in_transaction_session_timeout: "30000"  # 30 seconds (ms)
    # Logging
    log_min_duration_statement: "1000"  # Log queries over 1 second
    log_checkpoints: "on"
    log_lock_waits: "on"
    # Autovacuum
    autovacuum_max_workers: "4"
    autovacuum_naptime: "30"
    autovacuum_vacuum_cost_limit: "1000"
9.3 Connection Pooling Configuration
PgBouncer recommended settings:
pool_mode = transaction
- Allocates/returns connections per transaction
- Essential for hundreds of Pods connecting simultaneously in K8s
- Restricts PREPARE statement usage (only in session mode)
default_pool_size = 25
- Backend DB connection count (12.5% of max_connections)
- Too high overloads DB, too low increases wait times
max_client_conn = 1000
- Maximum frontend connections
- Calculate: number of Pods x connections per Pod
reserve_pool_size = 5
- Reserve connections for peak traffic
server_idle_timeout = 300
- Clean up idle connections after 5 minutes
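The sizing rules above reduce to two multiplications. A sketch that turns them into numbers (the 12.5% ratio and the inputs are the document's guidance; real values need load testing):

```python
# Sketch of the PgBouncer sizing arithmetic described above.
def pgbouncer_sizing(max_connections: int, app_pods: int,
                     conns_per_pod: int) -> dict[str, int]:
    return {
        # backend pool: ~12.5% of the server's max_connections
        "default_pool_size": max(1, max_connections // 8),
        # frontend capacity: number of Pods x connections per Pod
        "max_client_conn": app_pods * conns_per_pod,
        # headroom for traffic spikes
        "reserve_pool_size": 5,
    }

print(pgbouncer_sizing(max_connections=200, app_pods=100, conns_per_pod=10))
# {'default_pool_size': 25, 'max_client_conn': 1000, 'reserve_pool_size': 5}
```

The 40:1 fan-in here (1000 clients onto 25 server connections) is exactly what transaction mode makes possible.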
10. Security
10.1 Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-network-policy
  namespace: database
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: my-postgres
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow access only from the application namespace
  - from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/part-of: myapp
    ports:
    - protocol: TCP
      port: 5432
  # Allow replication traffic within the cluster
  - from:
    - podSelector:
        matchLabels:
          cnpg.io/cluster: my-postgres
    ports:
    - protocol: TCP
      port: 5432
  egress:
  # Outbound for S3 backup (HTTPS)
  - to: []
    ports:
    - protocol: TCP
      port: 443
  # Intra-cluster communication
  - to:
    - podSelector:
        matchLabels:
          cnpg.io/cluster: my-postgres
    ports:
    - protocol: TCP
      port: 5432
  # DNS
  - to: []
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
10.2 Secrets Management (External Secrets Operator)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-credentials
  namespace: database
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: myapp-db-credentials
    creationPolicy: Owner
  data:
  - secretKey: username
    remoteRef:
      key: production/database/postgres
      property: username
  - secretKey: password
    remoteRef:
      key: production/database/postgres
      property: password
10.3 TLS Configuration
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  certificates:
    serverTLSSecret: my-postgres-server-tls
    serverCASecret: my-postgres-ca
  postgresql:
    parameters:
      ssl: "on"
      ssl_min_protocol_version: "TLSv1.3"
11. Monitoring and Alerting
11.1 Key Metrics and Alert Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-alerts
  namespace: monitoring
spec:
  groups:
  - name: postgres.rules
    rules:
    - alert: PostgresReplicationLagHigh
      expr: cnpg_pg_replication_lag > 10
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "PostgreSQL replication lag exceeds 10 seconds"
    - alert: PostgresConnectionsNearLimit
      expr: >
        cnpg_pg_stat_activity_count /
        cnpg_pg_settings_setting{name="max_connections"} > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Connection count exceeds 80% of maximum"
    - alert: PostgresCacheHitRatioLow
      expr: >
        cnpg_pg_stat_database_blks_hit /
        (cnpg_pg_stat_database_blks_hit + cnpg_pg_stat_database_blks_read) < 0.95
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "Cache hit ratio below 95%. Consider increasing shared_buffers"
    - alert: PostgresStorageNearFull
      expr: >
        kubelet_volume_stats_used_bytes{namespace="database"} /
        kubelet_volume_stats_capacity_bytes{namespace="database"} > 0.85
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "DB storage usage exceeds 85%"
    - alert: PostgresFailoverDetected
      expr: changes(cnpg_pg_replication_is_primary[5m]) > 0
      labels:
        severity: critical
      annotations:
        summary: "PostgreSQL Failover detected"
11.2 Grafana Dashboard Key Panels
Key monitoring panels:
1. Replication lag (seconds) - cnpg_pg_replication_lag
2. Active connection count - cnpg_pg_stat_activity_count
3. Cache hit ratio - blks_hit / (blks_hit + blks_read)
4. TPS (transactions/second) - rate(cnpg_pg_stat_database_xact_commit[5m])
5. Query latency P99
6. Disk IOPS / throughput
7. WAL generation rate - rate(cnpg_pg_stat_archiver_archived_count[5m])
8. Storage usage - kubelet_volume_stats_used_bytes
9. CPU / Memory utilization
10. Vacuum / Analyze execution status
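Panels 3 and 4 are derived values, not raw metrics. A sketch of the same arithmetic the PromQL expressions perform, using illustrative counter values from `pg_stat_database`:

```python
# Sketch: deriving dashboard panel values from raw pg_stat_database counters
# (the same arithmetic the PromQL expressions encode).
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block reads served from shared_buffers."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0

def tps(xact_commit_delta: int, window_seconds: int) -> float:
    """Committed transactions per second over a scrape window (rate())."""
    return xact_commit_delta / window_seconds

print(round(cache_hit_ratio(990_000, 10_000), 3))  # 0.99
print(tps(150_000, 300))                           # 500.0
```

Since both metrics are cumulative counters, dashboards must apply `rate()`/deltas first; graphing the raw counters only ever shows a monotonically rising line.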
12. Migration Strategies
12.1 VM to K8s Migration
# Method 1: pg_dump/pg_restore (small DBs)
pg_dump -Fc -d myapp -h vm-db.internal -U postgres > myapp.dump
kubectl cp myapp.dump database/my-postgres-1:/tmp/myapp.dump
kubectl exec -it my-postgres-1 -n database -- \
pg_restore -d myapp -U postgres /tmp/myapp.dump
# Method 2: Logical replication (large DBs, zero-downtime)
# 1. Create empty cluster in K8s
# 2. Create publication on VM DB
# CREATE PUBLICATION my_pub FOR ALL TABLES;
# 3. Create subscription on K8s DB
# CREATE SUBSCRIPTION my_sub
# CONNECTION 'host=vm-db.internal dbname=myapp user=replicator'
# PUBLICATION my_pub;
# 4. Wait for initial data sync completion
# 5. Switch application connections to K8s DB
# 6. Drop subscription
12.2 Zero-Downtime Migration Checklist
Before migration:
[ ] Assess source DB size and table count
[ ] Verify K8s cluster resource availability
[ ] Size StorageClass / PV (2x headroom over source)
[ ] Verify network connectivity (source DB -> K8s cluster)
[ ] Create logical replication user with appropriate permissions
During migration:
[ ] Confirm initial data copy completed
[ ] Verify replication lag converging to 0
[ ] Check sequence value synchronization
[ ] Switch application read traffic to K8s (test)
[ ] Switch write traffic (brief downtime or dual-write)
After migration:
[ ] Verify data integrity (row counts, checksums)
[ ] Confirm application performance
[ ] Stop source DB replication
[ ] Confirm backup policy applied
[ ] Verify monitoring/alerting operational
13. Production Checklist
Infrastructure and Storage
- StorageClass `reclaimPolicy` set to `Retain`
- `volumeBindingMode` set to `WaitForFirstConsumer`
- EBS/PD IOPS appropriate for workload (minimum 6000 IOPS)
- Volume auto-expansion enabled (`allowVolumeExpansion: true`)
- Nodes have DB-dedicated labels/taints
HA and Replication
- Minimum 3 instances (1 Primary + 2 Replicas)
- Pod Anti-Affinity spreads across different nodes
- Topology Spread across different AZs
- PodDisruptionBudget configured (`maxUnavailable: 1`)
- Sync/async replication mode decided and configured
Backup and Recovery
- Automated backup schedule set (minimum daily)
- WAL archiving enabled (PITR capable)
- Backup retention policy set (minimum 30 days)
- Backup restore test completed (minimum quarterly)
- Cross-region backup replication (if DR required)
Performance
- Resource requests == limits (QoS Guaranteed)
- shared_buffers = 25% of total memory
- effective_cache_size = 75% of total memory
- Connection Pooling enabled (PgBouncer transaction mode)
- Query logging enabled (`log_min_duration_statement`)
Security
- NetworkPolicy restricts DB access
- TLS enabled (minimum TLSv1.2)
- Secrets managed by External Secrets Operator or Vault
- RBAC: Operator service account with minimal privileges
- Data encryption (storage level + transport level)
Monitoring
- Prometheus metrics collection configured
- Grafana dashboard set up
- Replication lag alert (over 10 seconds)
- Storage capacity alert (over 85%)
- Connection count alert (over 80%)
- Failover occurrence alert
Operations
- Operator upgrade procedure documented
- DB minor/major version upgrade procedure documented
- Incident response runbook created
- Regular restore test schedule established
14. Quiz
Q1: Why is Pod Identity important in StatefulSet?
A: StatefulSet Pod Identity (ordinal index + stable hostname + dedicated PVC) is essential for database operations. Primary/Replica roles are bound to specific Pods, the same storage is connected even after restarts, and DNS-based access allows other Pods to directly reach specific instances. Regular Deployments cannot provide these guarantees.
Q2: How does CloudNativePG automatic Failover work?
A: When CloudNativePG detects a Primary Pod health check failure, it selects the Replica with the most recent WAL position and promotes it to the new Primary using pg_promote(). Remaining Replicas are reconfigured to follow the new Primary, and the former Primary rejoins as a Replica via pg_rewind. The entire process typically completes within 10-30 seconds.
Q3: Why should DB Pod QoS class be set to Guaranteed?
A: With QoS Guaranteed (requests == limits), K8s guarantees the Pod's resources and it is the last to be OOM Killed during node memory pressure. Since databases require stable in-memory cache (shared_buffers) and consistent CPU, setting Burstable or BestEffort can cause unexpected performance degradation or failures.
Q4: Why set volumeBindingMode to WaitForFirstConsumer?
A: WaitForFirstConsumer provisions the PV only after the node where the Pod will be scheduled is determined. This ensures Pod and PV are in the same availability zone (AZ), preventing cross-AZ network latency. With Immediate mode, the PV can be created in a different AZ, causing Pod scheduling failures or performance degradation.
Q5: What is the difference between PgBouncer transaction mode and session mode?
A: Transaction mode returns backend connections to the pool when transactions end, allowing hundreds of clients to share a small number of DB connections. This is optimal for K8s environments with many Pods connecting simultaneously. Session mode holds connections until client sessions end, used when PREPARE statements or session variables are needed. Transaction mode is generally recommended for K8s environments.
15. References
- CloudNativePG Official Documentation - https://cloudnative-pg.io/documentation/
- Percona Operator for MySQL - https://docs.percona.com/percona-operator-for-mysql/pxc/
- MongoDB Community Operator - https://github.com/mongodb/mongodb-kubernetes-operator
- Kubernetes StatefulSet Documentation - https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
- PgBouncer Documentation - https://www.pgbouncer.org/config.html
- Velero Official Documentation - https://velero.io/docs/
- External Secrets Operator - https://external-secrets.io/
- AWS EBS CSI Driver - https://github.com/kubernetes-sigs/aws-ebs-csi-driver
- PostgreSQL Performance Tuning Guide - https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
- Kubernetes Network Policies - https://kubernetes.io/docs/concepts/services-networking/network-policies/
- pgBackRest Documentation - https://pgbackrest.org/
- Prometheus PostgreSQL Exporter - https://github.com/prometheus-community/postgres_exporter
- Barman (Backup and Recovery Manager) - https://pgbarman.org/