Running Databases on Kubernetes Complete Guide: StatefulSet, Operators, Backup and Recovery
- Author: Youngju Kim (@fjvbn20031)
TL;DR
- StatefulSet: K8s workload for stateful applications like databases (stable network IDs + persistent storage)
- DB Operators: Automate operations with CloudNativePG (PostgreSQL), Percona Operator (MySQL/MongoDB)
- Storage: Manage persistent data with PV/PVC/StorageClass, leverage CSI drivers
- High Availability: Primary-Replica configuration, automatic failover, PodDisruptionBudget
- Backup/Recovery: Barman object store (CloudNativePG), Velero, PITR (Point-in-Time Recovery) strategies
- Monitoring: PMM, postgres_exporter + Prometheus + Grafana dashboards
Table of Contents
- Should You Run DBs on K8s?
- StatefulSet Deep Dive
- Storage Strategy
- Headless Service and DNS
- Database Operators
- High Availability (HA)
- Backup and Recovery
- Monitoring
- Performance Tuning
- Security
- Migrating from VMs to K8s
- Production Checklist
- Practical Quiz
- References
1. Should You Run DBs on K8s?
1.1 Pros and Cons
K8s DB operations decision matrix:
When you should:
- Multi-cloud/hybrid environments
- Infrastructure consistency matters
- GitOps/IaC pipeline integration
- Dev/test environment automation
- Cost optimization is essential
- Data sovereignty regulations
When you should not:
- A managed DB service is available
- Lack of DBA resources
- Very large single DB instances
- Extremely low latency required
- Insufficient K8s experience
- A simple architecture is enough
| Criteria | K8s DB Operations | Managed Service (RDS/CloudSQL) |
|---|---|---|
| Initial setup complexity | High | Low |
| Operations automation | Possible via Operators | Built-in |
| Cost | Efficient (resource sharing) | Premium pricing |
| Multi-cloud | Easy | Vendor lock-in |
| Customization | Full freedom | Limited |
| Backup/Recovery | Manual configuration | Built-in |
| Scalability | Manual/semi-automatic | Auto-scaling |
1.2 Which DBs Are Suitable for K8s?
K8s compatibility ranking:
Highly suitable:
- PostgreSQL (excellent CloudNativePG ecosystem)
- MongoDB (ReplicaSet structure fits K8s naturally)
- Redis (Sentinel/Cluster mode)
Suitable:
- MySQL (Percona/Oracle Operator available)
- Elasticsearch (ECK Operator)
- Cassandra (K8ssandra Operator)
Use with caution:
- Oracle DB (licensing, complexity)
- SQL Server (Windows container limitations)
- Large single-instance DBs
2. StatefulSet Deep Dive
2.1 StatefulSet vs Deployment
Deployment:
- Pod names: random (abc-xyz)
- Parallel create/delete
- Shared volume or none
- Interchangeable Pods
- Suited for stateless apps
StatefulSet:
- Pod names: sequential (db-0, db-1, db-2)
- Sequential create/delete (0 -> 1 -> 2)
- Dedicated PVC per Pod (auto-created)
- Pods with unique identity
- Suited for stateful apps (DBs)
2.2 StatefulSet YAML Example
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: postgres-headless    # Headless Service name
  replicas: 3
  podManagementPolicy: OrderedReady # Sequential creation (default)
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1             # K8s 1.24+ (MaxUnavailableStatefulSet feature gate)
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      terminationGracePeriodSeconds: 120  # Enough time for a clean DB shutdown
      securityContext:
        fsGroup: 999                # postgres group
        runAsUser: 999              # postgres user
      containers:
        - name: postgres
          image: postgres:16-alpine
          ports:
            - containerPort: 5432
              name: postgresql
          env:
            - name: POSTGRES_DB
              value: myapp
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          livenessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
```
2.3 PodManagementPolicy
```yaml
# OrderedReady (default): sequential create/delete
# Pod 0 Ready -> Pod 1 created -> Pod 1 Ready -> Pod 2 created
podManagementPolicy: OrderedReady

# Parallel: all Pods created/deleted simultaneously
# Be careful with initial bootstrapping (DBs typically use OrderedReady)
podManagementPolicy: Parallel
```
2.4 UpdateStrategy
```yaml
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    # Partition: only Pods with an ordinal at or above this value are updated
    # Useful for canary deployments (e.g. update only Pod 2 first)
    partition: 2
---
# OnDelete: updates happen only when Pods are manually deleted
# Provides fine-grained control during DB upgrades
updateStrategy:
  type: OnDelete
```
3. Storage Strategy
3.1 PV / PVC / StorageClass
```yaml
# StorageClass definition (AWS EBS gp3)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"          # MB/s
  encrypted: "true"
reclaimPolicy: Retain        # MUST be Retain for DB data!
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true   # Allow online volume expansion
---
# StorageClass definition (GCP PD SSD)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd  # Regional PD (high availability)
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
3.2 Local Storage vs Cloud Volumes
Performance comparison:
Local NVMe SSD:
- Random reads: 500K+ IOPS
- Latency: under 0.1ms
- Drawback: No Pod migration, data loss risk on node failure
Cloud EBS gp3:
- Baseline: 3,000 IOPS / 125 MB/s
- Maximum: 16,000 IOPS / 1,000 MB/s
- Advantage: Pod migration possible, snapshot support
Cloud EBS io2:
- Maximum: 64,000 IOPS
- 99.999% durability
- Expensive but suitable for mission-critical DBs
3.3 Volume Expansion
```shell
# Expand PVC size (requires allowVolumeExpansion: true on the StorageClass)
kubectl patch pvc data-postgres-0 -n database \
  -p '{"spec": {"resources": {"requests": {"storage": "200Gi"}}}}'

# Check expansion status
kubectl get pvc data-postgres-0 -n database -o yaml | grep -A 5 status
```
4. Headless Service and DNS
4.1 Headless Service Definition
```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
  namespace: database
spec:
  clusterIP: None   # The key to a Headless Service
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
      name: postgresql
```
4.2 DNS Rules
Each Pod in a StatefulSet gets a predictable DNS name.
DNS pattern:
pod-name.service-name.namespace.svc.cluster.local
Examples:
postgres-0.postgres-headless.database.svc.cluster.local
postgres-1.postgres-headless.database.svc.cluster.local
postgres-2.postgres-headless.database.svc.cluster.local
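Because the naming rule is purely mechanical, it can be sketched as a tiny helper (illustrative only; `cluster.local` is the default cluster domain and may differ in your cluster):

```python
def statefulset_pod_dns(pod: str, service: str, namespace: str,
                        cluster_domain: str = "cluster.local") -> str:
    """Build the stable DNS name of a StatefulSet Pod behind a headless Service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"

# Matches the examples above
print(statefulset_pod_dns("postgres-0", "postgres-headless", "database"))
# postgres-0.postgres-headless.database.svc.cluster.local
```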
```yaml
# Additional Services for read/write splitting
apiVersion: v1
kind: Service
metadata:
  name: postgres-primary
  namespace: database
spec:
  selector:
    app: postgres
    role: primary
  ports:
    - port: 5432
      targetPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-replica
  namespace: database
spec:
  selector:
    app: postgres
    role: replica
  ports:
    - port: 5432
      targetPort: 5432
```
5. Database Operators
5.1 Why Are Operators Needed?
DB operations require complex Day-2 operations beyond simple deployment.
Tasks automated by Operators:
1. Cluster initialization (Primary + Replica setup)
2. Automatic failover (Replica promotion on Primary failure)
3. Backup/Recovery (scheduling, PITR)
4. Rolling upgrades (zero downtime)
5. Horizontal scaling (add/remove Replicas)
6. Monitoring integration
7. Certificate management (TLS)
8. Configuration changes (without restart)
5.2 PostgreSQL - CloudNativePG (CNPG)
```shell
# Install CloudNativePG
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.1.yaml
```

```yaml
# CloudNativePG Cluster definition
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
  namespace: database
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      work_mem: "16MB"
      maintenance_work_mem: "256MB"
      wal_buffers: "16MB"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      max_wal_size: "2GB"
      checkpoint_completion_target: "0.9"
  bootstrap:
    initdb:
      database: myapp
      owner: app_user
      secret:
        name: myapp-db-credentials
  storage:
    size: 100Gi
    storageClass: fast-ssd
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
  backup:
    barmanObjectStore:
      destinationPath: "s3://my-backup-bucket/cnpg/"
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
      data:
        compression: gzip
    retentionPolicy: "30d"
  monitoring:
    enablePodMonitor: true
```
5.3 MySQL - Percona XtraDB Cluster Operator
```shell
# Install the Percona Operator
kubectl apply -f https://raw.githubusercontent.com/percona/percona-xtradb-cluster-operator/v1.15.0/deploy/bundle.yaml
```

```yaml
# Percona XtraDB Cluster definition
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: myapp-mysql
  namespace: database
spec:
  crVersion: "1.15.0"
  secretsName: myapp-mysql-secrets
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0.36
    resources:
      requests:
        memory: 2G
        cpu: "1"
      limits:
        memory: 4G
        cpu: "2"
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
  haproxy:
    enabled: true
    size: 3
    image: percona/haproxy:2.8.5
    resources:
      requests:
        memory: 512M
        cpu: "500m"
  backup:
    image: percona/percona-xtradb-cluster-operator:1.15.0-pxc8.0-backup
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: my-backup-bucket
          credentialsSecret: aws-creds
          region: ap-northeast-2
    schedule:
      - name: daily-backup
        schedule: "0 3 * * *"
        keep: 7
        storageName: s3-backup
```
5.4 MongoDB - Community Operator
```shell
# Install the MongoDB Community Operator
kubectl apply -f https://raw.githubusercontent.com/mongodb/mongodb-kubernetes-operator/master/config/crd/bases/mongodbcommunity.mongodb.com_mongodbcommunity.yaml
kubectl apply -k https://github.com/mongodb/mongodb-kubernetes-operator/config/rbac/
kubectl create -f https://raw.githubusercontent.com/mongodb/mongodb-kubernetes-operator/master/config/manager/manager.yaml
```

```yaml
# MongoDB ReplicaSet definition
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: myapp-mongodb
  namespace: database
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.14"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: app-user
      db: admin
      passwordSecretRef:
        name: mongodb-password
      roles:
        - name: readWrite
          db: myapp
        - name: clusterAdmin
          db: admin
      scramCredentialsSecretName: app-user-scram
  statefulSet:
    spec:
      template:
        spec:
          containers:
            - name: mongod
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
                limits:
                  cpu: "2"
                  memory: 4Gi
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            storageClassName: fast-ssd
            resources:
              requests:
                storage: 100Gi
```
5.5 Operator Comparison
| Feature | CloudNativePG | Percona XtraDB | MongoDB Community |
|---|---|---|---|
| Database | PostgreSQL | MySQL | MongoDB |
| License | Apache 2.0 | Apache 2.0 | SSPL + Apache |
| HA Method | Streaming Replication | Galera Cluster | ReplicaSet |
| Auto Failover | Supported | Supported | Supported |
| Backup | Barman/S3 | xtrabackup/S3 | mongodump integration |
| Monitoring | PodMonitor | PMM integration | Basic metrics |
| Maturity | Very high | High | Medium |
6. High Availability (HA)
6.1 Primary-Replica Configuration
PostgreSQL HA architecture (CloudNativePG):
```
                [CNPG Operator]
                       |
                       v
[Primary Pod] --Streaming Replication--> [Replica Pod 1]
       |                                 [Replica Pod 2]
       |
[Headless Service]
       |
postgres-primary (writes) --> routes to the Primary only
postgres-replica (reads)  --> routes to Replicas only
```
6.2 Automatic Failover
Failover scenario:
1. Primary Pod failure occurs
2. Operator detects it (liveness probe failure)
3. Selects Replica with most recent LSN
4. Promotes selected Replica to Primary
5. Remaining Replicas follow the new Primary
6. Service endpoints updated
7. Failed Pod recreated and joins as new Replica
Entire process: typically under 30 seconds
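The LSN comparison in step 3 can be observed by hand with standard PostgreSQL statistics functions (shown here for context; the operator performs this automatically):

```sql
-- On each replica: the last WAL position received and replayed
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();

-- On the primary: replication state and lag of every connected replica
SELECT client_addr, state, sent_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
```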
6.3 PodDisruptionBudget
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: database
spec:
  minAvailable: 2   # Keep at least 2 Pods available
  selector:
    matchLabels:
      app: postgres
```
6.4 Node Failure Protection
```yaml
# Pod Topology Spread Constraints
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: postgres
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: postgres
```
7. Backup and Recovery
7.1 Backup Types
Logical backup (pg_dump/mysqldump):
+ High portability (restore to different versions/platforms)
+ Individual table/schema backup possible
- Slow for large DBs
- Index rebuilding needed on restore
Physical backup (pgBackRest/xtrabackup):
+ Fast for large DBs
+ Incremental backup support
+ PITR (Point-in-Time Recovery) possible
- Restore only within same major version
- Whole cluster backup only
7.2 Barman Object Store Backups (CloudNativePG)
```yaml
# CloudNativePG backup configuration
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db
spec:
  backup:
    barmanObjectStore:
      destinationPath: "s3://my-backup-bucket/cnpg/myapp-db/"
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
        immediateCheckpoint: true
    retentionPolicy: "30d"
---
# Scheduled backup
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: myapp-db-daily
spec:
  schedule: "0 0 3 * * *"   # CNPG uses six-field cron (with seconds): daily at 03:00
  cluster:
    name: myapp-db
  backupOwnerReference: self
  method: barmanObjectStore
```
7.3 PITR (Point-in-Time Recovery)
```yaml
# Recover to a specific point in time with PITR
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: myapp-db-recovered
spec:
  instances: 3
  bootstrap:
    recovery:
      source: myapp-db-backup
      recoveryTarget:
        targetTime: "2026-03-24T10:30:00Z"  # Recovery target time
  externalClusters:
    - name: myapp-db-backup
      barmanObjectStore:
        destinationPath: "s3://my-backup-bucket/cnpg/myapp-db/"
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: SECRET_ACCESS_KEY
```
7.4 Full Backup with Velero
```shell
# Install Velero
velero install \
  --provider aws \
  --bucket my-velero-bucket \
  --secret-file ./credentials-velero \
  --plugins velero/velero-plugin-for-aws:v1.10.0

# Namespace-level backup
velero backup create database-backup \
  --include-namespaces database \
  --snapshot-volumes=true \
  --volume-snapshot-locations default

# Restore
velero restore create --from-backup database-backup
```
8. Monitoring
8.1 Prometheus + Grafana
```yaml
# PostgreSQL Exporter (postgres_exporter)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-exporter
  namespace: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-exporter
  template:
    metadata:
      labels:
        app: postgres-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9187"
    spec:
      containers:
        - name: exporter
          image: prometheuscommunity/postgres-exporter:v0.15.0
          ports:
            - containerPort: 9187
          env:
            - name: DATA_SOURCE_URI
              value: "postgres-primary.database.svc:5432/myapp?sslmode=disable"
            - name: DATA_SOURCE_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: username
            - name: DATA_SOURCE_PASS
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
```
8.2 Key Monitoring Metrics
DB monitoring essential metrics:
Performance:
- Queries per second (QPS)
- Query latency (p50, p95, p99)
- Active connection count
- Cache hit ratio (Buffer Cache Hit Ratio)
Replication:
- Replication lag
- WAL receive delay
- Replica status
Storage:
- Disk usage
- IOPS / throughput
- WAL size
Resources:
- CPU utilization
- Memory usage
- Pod restart count
Operations:
- Dead tuples ratio
- Vacuum execution status
- Lock wait count
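These metrics only help if someone is paged on them. A minimal PrometheusRule sketch for two of the items above (assumes the Prometheus Operator is installed; `pg_replication_lag_seconds` is a postgres_exporter metric whose exact name depends on exporter version and configuration, while the `kubelet_volume_stats_*` metrics are standard kubelet metrics):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postgres-alerts
  namespace: database
spec:
  groups:
    - name: postgres
      rules:
        - alert: PostgresReplicationLagHigh
          expr: pg_replication_lag_seconds > 30
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Replication lag above 30s on {{ $labels.instance }}"
        - alert: PostgresVolumeFillingUp
          expr: |
            kubelet_volume_stats_available_bytes{namespace="database"}
              / kubelet_volume_stats_capacity_bytes{namespace="database"} < 0.2
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} has less than 20% free space"
```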
8.3 Percona Monitoring and Management (PMM)
```yaml
# PMM Server installation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pmm-server
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pmm-server
  template:
    metadata:
      labels:
        app: pmm-server   # Must match the selector above
    spec:
      containers:
        - name: pmm-server
          image: percona/pmm-server:2
          ports:
            - containerPort: 443
          volumeMounts:
            - name: pmm-data
              mountPath: /srv
      volumes:
        - name: pmm-data
          persistentVolumeClaim:
            claimName: pmm-data
```
9. Performance Tuning
9.1 Resource Requests/Limits
```yaml
# DB Pod resource configuration guide
resources:
  requests:
    # CPU: minimum guaranteed CPU.
    # DBs are sensitive to CPU contention, so be generous.
    cpu: "2"
    # Memory: shared_buffers + work_mem * max_connections + OS overhead
    memory: "4Gi"
  limits:
    # CPU limit: omit it, or set it generously
    # (CFS throttling causes query latency spikes)
    cpu: "4"
    # Memory limit: set higher than the request to avoid OOM kills
    memory: "8Gi"
```
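The memory sizing comment above can be sanity-checked with quick arithmetic (all figures illustrative; note that `work_mem` applies per sort/hash operation, not per connection, so the true worst case can be higher):

```python
# Rough worst-case memory footprint per the rule of thumb above
shared_buffers_mb = 1024   # shared_buffers = 1GB
work_mem_mb = 16           # work_mem = 16MB
max_connections = 200
os_overhead_mb = 512       # OS + per-connection overhead (rough allowance)

worst_case_mb = shared_buffers_mb + work_mem_mb * max_connections + os_overhead_mb
print(worst_case_mb)  # 4736 -> ~4.6Gi, so a 4Gi request with an 8Gi limit leaves headroom
```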
9.2 Affinity and Anti-Affinity
```yaml
# DB Pod scheduling optimization
spec:
  template:
    spec:
      # Schedule only on DB-dedicated nodes
      nodeSelector:
        node-type: database
      # Or use a Toleration for tainted dedicated nodes
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "database"
          effect: "NoSchedule"
      # Spread Replicas across different nodes
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - postgres
              topologyKey: "kubernetes.io/hostname"
```
9.3 Kernel Parameter Tuning
```yaml
# Set kernel parameters via a privileged initContainer
# (note: these vm.* and net.* sysctls are node-wide, not namespaced per Pod)
spec:
  template:
    spec:
      initContainers:
        - name: sysctl-tuning
          image: busybox:1.36
          securityContext:
            privileged: true
          command:
            - sh
            - -c
            - |
              sysctl -w vm.swappiness=1
              sysctl -w vm.dirty_background_ratio=5
              sysctl -w vm.dirty_ratio=10
              sysctl -w vm.overcommit_memory=2
              sysctl -w net.core.somaxconn=65535
              sysctl -w net.ipv4.tcp_max_syn_backlog=65535
```
10. Security
10.1 NetworkPolicy
```yaml
# Restrict network access to DB Pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-network-policy
  namespace: database
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: application
          podSelector:
            matchLabels:
              app: backend
        - podSelector:
            matchLabels:
              app: postgres   # Allow Pod-to-Pod replication
      ports:
        - port: 5432
          protocol: TCP
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
          protocol: TCP
    - to:                     # Allow DNS
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```
10.2 Secrets Management
```yaml
# Using the External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-secret
  namespace: database
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: postgres-secret
  data:
    - secretKey: username
      remoteRef:
        key: prod/database/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: prod/database/postgres
        property: password
```
10.3 TLS Configuration
```yaml
# Issue DB certificates with cert-manager
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgres-tls
  namespace: database
spec:
  secretName: postgres-tls-secret
  duration: 8760h    # 1 year
  renewBefore: 720h  # Renew 30 days before expiry
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
  dnsNames:
    - postgres-primary.database.svc.cluster.local
    - postgres-headless.database.svc.cluster.local
    - "*.postgres-headless.database.svc.cluster.local"
```
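Once the Certificate is issued, the resulting Secret can be mounted into the Postgres container and referenced from postgresql.conf. A sketch (the `/tls` mount path is an assumption; the key file must be readable only by the postgres user, so copy it with mode 0600 if needed):

```
# postgresql.conf fragment, assuming the postgres-tls-secret is mounted at /tls
ssl = on
ssl_cert_file = '/tls/tls.crt'
ssl_key_file = '/tls/tls.key'
```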
11. Migrating from VMs to K8s
11.1 Migration Phases
Phase 1: Preparation
1. Install DB Operator on K8s cluster
2. Configure StorageClass, NetworkPolicy
3. Create target DB cluster (empty)
Phase 2: Data migration
Method A - Logical replication (minimal downtime):
1. Set up logical replication on VM DB
2. Configure K8s DB as Subscriber
3. Initial sync + change streaming
4. Application switchover (brief downtime)
Method B - pg_dump/restore:
1. Backup VM DB (pg_dump)
2. Restore to K8s DB (pg_restore)
3. Switch during downtime window
Phase 3: Switchover
1. Update application DB connection strings
2. DNS change or Service endpoint update
3. Set old VM DB to read-only (rollback safety)
Phase 4: Cleanup
1. Delete old VM DB after verification
2. Update monitoring/alerts
11.2 Logical Replication Setup
```sql
-- VM DB (Publisher) setup
-- postgresql.conf:
--   wal_level = logical
--   max_replication_slots = 10
CREATE PUBLICATION myapp_pub FOR ALL TABLES;
```

```sql
-- K8s DB (Subscriber) setup
CREATE SUBSCRIPTION myapp_sub
  CONNECTION 'host=vm-db.example.com port=5432 dbname=myapp user=repl_user password=secret'
  PUBLICATION myapp_pub;
```
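Before cutover, replication catch-up can be verified with the built-in statistics views (standard PostgreSQL; run each query where indicated):

```sql
-- On the Subscriber (K8s DB): received vs. applied WAL positions
SELECT subname, received_lsn, latest_end_lsn
FROM pg_stat_subscription;

-- On the Publisher (VM DB): per-slot lag in bytes (should approach 0 before switching)
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
```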
12. Production Checklist
K8s DB Production Checklist:
Storage:
[ ] StorageClass has reclaimPolicy: Retain
[ ] Volume size has 20%+ free space
[ ] allowVolumeExpansion: true confirmed
[ ] IOPS/throughput matches workload
High Availability:
[ ] Minimum 3 instances (1 Primary + 2 Replica)
[ ] PodDisruptionBudget configured
[ ] Pod Anti-Affinity (spread across nodes/zones)
[ ] Automatic failover tested
Backup:
[ ] Automated backup schedule set (at least daily)
[ ] WAL archiving enabled (for PITR)
[ ] Backup restore tested
[ ] Retention policy configured (30+ days)
Security:
[ ] NetworkPolicy restricts access
[ ] Secrets managed via External Secrets Operator
[ ] TLS encryption enabled
[ ] DB user permissions minimized (least privilege)
Monitoring:
[ ] Prometheus + Grafana dashboard configured
[ ] Alert rules set (replication lag, disk usage, connections)
[ ] Log collection (Loki/EFK)
[ ] Query performance monitoring
Performance:
[ ] Resource requests/limits properly set
[ ] DB parameters tuned (shared_buffers, work_mem, etc.)
[ ] Kernel parameters optimized
[ ] Dedicated nodes used (Taint/Toleration)
13. Practical Quiz
Q1: What is the key difference between StatefulSet and Deployment?
Answer:
StatefulSet guarantees:
- Stable network identity: Each Pod gets a sequential name (e.g., db-0, db-1, db-2). The name persists even when a Pod is recreated.
- Persistent storage binding: volumeClaimTemplates automatically create a dedicated PVC for each Pod. Even if a Pod is deleted and recreated, it reconnects to the same PVC.
- Sequential creation/deletion: Pod 1 is created only after Pod 0 reaches the Ready state (OrderedReady policy).
In contrast, Deployment uses random Pod names, shared volumes, and parallel creation, making it suitable for stateless apps.
Q2: Why should DB storage have reclaimPolicy set to Retain?
Answer:
When reclaimPolicy is Delete (the default), the PV and actual storage (EBS volume, etc.) are deleted along with the PVC. This causes data loss in these scenarios:
- Accidental deletion of StatefulSet or PVC
- Namespace deletion
- Helm uninstall
With Retain, the PV and actual storage persist even when the PVC is deleted, enabling data recovery. Production databases must always use Retain.
Q3: How does automatic failover work in CloudNativePG?
Answer:
- The CNPG Operator continuously monitors all instances.
- When the Primary Pod's liveness probe fails, the Operator detects the failure.
- The Operator compares WAL LSN (Log Sequence Number) across all Replicas and selects the one with the most recent data.
- The selected Replica is promoted to Primary using pg_promote.
- Remaining Replicas are reconfigured to follow the new Primary.
- Service endpoints are automatically updated to point to the new Primary.
- The failed Pod is recreated and joins as a new Replica.
The entire process typically completes within 30 seconds.
Q4: Why is it recommended not to set CPU limits for DB Pods on K8s?
Answer:
When CPU limits are set, Kubernetes applies CFS (Completely Fair Scheduler) throttling. When a DB momentarily needs high CPU (e.g., complex queries, VACUUM), throttling causes significant query latency spikes.
Instead, these strategies are recommended:
- Set only CPU requests to guarantee a minimum CPU allocation
- Use dedicated DB nodes (Taint/Toleration) to prevent resource contention with other workloads
- Ensure sufficient CPU resources at the node level
Memory limits should still be set to prevent OOM Kill, but with generous headroom above the request value.
Q5: What is the strategy for minimal downtime when migrating a DB from VMs to K8s?
Answer:
Use logical replication:
- Set `wal_level = logical` on the VM DB and create a Publication.
- Create a Subscription on the K8s DB connecting to the VM DB.
- Initial data synchronization proceeds automatically.
- After sync completes, changes are streamed in real time.
- Briefly pause the application (seconds to minutes) and verify replication lag is zero.
- Update the application's DB connection string to point to the K8s DB.
- Restart the application.
This approach minimizes downtime to seconds or minutes.
14. References
- CloudNativePG Documentation
- Percona Operator for MySQL Documentation
- MongoDB Kubernetes Operator
- Kubernetes StatefulSet Documentation
- Kubernetes Persistent Volumes
- Velero - Backup and Restore
- External Secrets Operator
- Percona Monitoring and Management (PMM)
- PostgreSQL Kubernetes Best Practices
- Zalando Postgres Operator
- CrunchyData PGO
- K8ssandra - Cassandra on Kubernetes
- Data on Kubernetes Community
- CNCF Storage Landscape