Running Databases on K8s -- Complete Guide to CNPG, Percona, Vitess, and Helm Charts
Introduction

Running databases on Kubernetes was a controversial topic just a few years ago. Many asked, "Why bother running a DB on K8s, which is optimized for stateless workloads?" However, as of 2026, with StatefulSets and the Operator ecosystem having matured sufficiently, running databases on K8s has become a standard choice.

This article compares major database Operators and Helm charts, and explains which tools to use in which situations in practice.


1. Why Run Databases on K8s

Advantages

  • Consistent deployment pipeline: Manage applications and databases with the same GitOps workflow
  • Resource efficiency: Applications and databases share node resources, maximizing utilization through automatic scheduling
  • Auto-recovery: Automatic Pod restart on failure, automatic rescheduling on node failure
  • Environment consistency: Declaratively deploy identical DB configurations across dev/staging/production
  • Cost reduction: Potential license and infrastructure cost savings compared to managed DB services (RDS, Cloud SQL)

Disadvantages

  • Operational complexity: Many areas to manage directly, including storage, networking, and backups
  • Performance overhead: Additional latency from container networking and storage layers
  • Expertise required: Deep understanding of both K8s and databases needed
  • Data loss risk: Potential data loss from incorrect PV/PVC settings or upgrade mistakes

StatefulSet Basics

StatefulSet is a K8s controller for workloads that need to maintain state. Unlike regular Deployments, it guarantees:

  • Stable network IDs: Each Pod maintains a unique hostname (e.g., postgres-0, postgres-1)
  • Ordered deployment/scaling: Pods are created in order starting from ordinal 0 and terminated in reverse order
  • Persistent storage: Each Pod gets a unique PVC bound through volumeClaimTemplates
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: gp3
        resources:
          requests:
            storage: 50Gi

However, StatefulSet alone makes it difficult to implement HA, automatic failover, backup/recovery, and monitoring. This is where the Operator pattern comes in.


2. CloudNativePG (CNPG) -- PostgreSQL-Specific Operator

Overview

CloudNativePG (CNPG) is a PostgreSQL-specific Kubernetes Operator, initially developed by EDB (EnterpriseDB) and now maintained as a CNCF Sandbox project. As of April 2026, the latest version is 1.29, which adds PostgreSQL extension management through Image Catalogs and the extension artifacts ecosystem.

Key Features

  • Native K8s design: Uses K8s leader election mechanisms directly without external HA tools like Patroni
  • Declarative DB management: Manages PostgreSQL database lifecycle via Database CRD
  • Logical replication: Supports online migration and major version upgrades via Publication / Subscription CRDs
  • CNPG-I plugin framework: Extensible through external plugins
  • PITR support: Point-In-Time Recovery based on WAL archiving
  • Parallel reconciler: Improves cluster management efficiency through parallel processing

Installation

Helm-based installation is the easiest approach.

# Add Helm repository
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update

# Install Operator
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

Or you can install directly using manifests.

kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.29/releases/cnpg-1.29.0.yaml

HA Architecture

CNPG uses a Primary 1 + Standby N architecture. The Primary handles writes, and Standbys synchronize data through streaming replication.

  • Automatic promotion of a Standby when the Primary fails
  • Supports both Switchover (planned transition) and Failover (failure transition)
  • Data durability guarantee option through synchronous replication (dataDurability)
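Operationally, both transitions can be driven with the cnpg kubectl plugin. A sketch -- this assumes the plugin is installed and the cluster is named prod-pg as in the examples below; verify the exact commands against your CNPG version:

```shell
# Inspect cluster health, the current primary, and replication status
kubectl cnpg status prod-pg

# Planned switchover: promote a named standby instance to primary
kubectl cnpg promote prod-pg prod-pg-2
```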

Backup and Recovery

CNPG has built-in continuous backup based on Barman.

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: prod-pg
spec:
  instances: 3
  storage:
    size: 100Gi
    storageClass: gp3
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/prod-pg/
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
    retentionPolicy: '30d'

You can schedule regular backups with ScheduledBackup.

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: prod-pg-daily
spec:
  schedule: '0 2 * * *'
  cluster:
    name: prod-pg
  backupOwnerReference: self

3. Percona Operator -- Multi-DB Support

Overview

Percona provides dedicated Kubernetes Operators for three databases: MySQL, MongoDB, and PostgreSQL. All three are fully open source under the Apache 2.0 license and offer enterprise-grade features at no cost.

Features by Supported DB

Percona Operator for MySQL (PXC)

  • Multi-Primary architecture based on Percona XtraDB Cluster
  • Galera synchronous replication enables read/write on all nodes
  • Automatic routing through ProxySQL or HAProxy
  • Group Replication option added in 2026 GA release

Percona Operator for MongoDB (PSMDB)

  • ReplicaSet and Sharded Cluster support
  • PVC snapshot-based backup support (added in 2025)
  • Official MongoDB 8.0 support
  • Cloud storage authentication via IAM Role for Service Account
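A minimal PSMDB ReplicaSet definition might look like the sketch below. Field names follow the PerconaServerMongoDB CRD, but the cluster name, version strings, and sizes are illustrative -- check them against the operator release you install:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: prod-mongo
spec:
  crVersion: '1.17.0'            # illustrative; match your operator version
  image: percona/percona-server-mongodb:8.0
  replsets:
    - name: rs0
      size: 3
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 100Gi
```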

Percona Operator for PostgreSQL (PPG)

  • Patroni-based HA configuration
  • Native pg_tde (Transparent Data Encryption) support (2026)
  • Zero-downtime major version upgrade roadmap in progress

Integrated Monitoring: PMM

Percona Monitoring and Management (PMM) is a tool that monitors MySQL, MongoDB, and PostgreSQL all from a single dashboard.

apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: prod-mysql
spec:
  crVersion: '1.15.0'
  pxc:
    size: 3
    image: percona/percona-xtradb-cluster:8.0
    resources:
      requests:
        memory: 2Gi
        cpu: '1'
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
  haproxy:
    enabled: true
    size: 2
  pmm:
    enabled: true
    serverHost: monitoring-service
  backup:
    schedule:
      - name: daily-backup
        schedule: '0 3 * * *'
        keep: 7
        storageName: s3-backup
    storages:
      s3-backup:
        type: s3
        s3:
          bucket: my-backup-bucket
          credentialsSecret: aws-creds
          region: ap-northeast-2

Multi-Cluster Support

Percona Operator supports cross-region replication to synchronize data across multiple K8s clusters. This enables disaster recovery (DR) configurations.


4. Vitess -- MySQL Horizontal Sharding

Overview

Vitess was developed at YouTube to handle large-scale MySQL workloads and is now a CNCF Graduated project. It provides a MySQL-compatible interface while supporting transparent sharding, connection pooling, and online resharding.

PlanetScale is a representative DBaaS service that commercialized Vitess.

Architecture Components

Component         | Role
------------------|----------------------------------------------------
VTGate            | Query router; the endpoint applications connect to
VTTablet          | Proxy wrapping each MySQL instance
Topology Service  | Cluster metadata storage (etcd, etc.)
VTOrc             | Orchestrator; handles automatic failover
VTAdmin           | Web-based management UI

When to Choose Vitess

  • Massive write traffic that a single MySQL instance cannot handle
  • Sharding of large tables with billions of rows or more
  • Environments with frequent online schema changes (Online DDL)
  • Need for horizontal scaling while maintaining MySQL compatibility

Vitess on K8s Installation

Use the Vitess Operator provided by PlanetScale.

# Install Vitess Operator
kubectl apply -f https://github.com/planetscale/vitess-operator/releases/latest/download/operator.yaml

Manage Keyspaces (logical databases) and Shards declaratively.

apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: prod-vitess
spec:
  images:
    vtgate: vitess/lite:v19
    vttablet: vitess/lite:v19
    vtbackup: vitess/lite:v19
    vtctld: vitess/lite:v19
    vtorc: vitess/lite:v19
  cells:
    - name: zone1
      gateway:
        replicas: 2
        resources:
          requests:
            cpu: '1'
            memory: 1Gi
  keyspaces:
    - name: commerce
      turndownPolicy: Immediate
      partitionings:
        - equal:
            parts: 2
            shardTemplate:
              databaseInitScriptSecret:
                name: commerce-schema
                key: init.sql
              tabletPools:
                - cell: zone1
                  type: replica
                  replicas: 3
                  dataVolumeClaimTemplate:
                    storageClassName: gp3
                    resources:
                      requests:
                        storage: 50Gi

Caveats

Vitess is very powerful but has a steep learning curve. It may be overkill for simple CRUD applications, and the sharding strategy (VSchema) requires careful upfront design.
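A VSchema tells VTGate how tables map to shards. As a sketch, a hash-sharded keyspace could declare a table sharded by one of its columns -- the orders table and customer_id column here are illustrative, not part of the example above:

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "orders": {
      "column_vindexes": [
        { "column": "customer_id", "name": "hash" }
      ]
    }
  }
}
```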


5. Major Helm Chart Comparison

You can also deploy databases on K8s using just Helm charts without Operators. Bitnami was the most widely used Helm chart provider, but important changes have occurred since 2025.

Bitnami License Changes (2025)

Since September 2025, most Bitnami Helm chart OCI packages have moved behind a Broadcom paid subscription. Public docker.io/bitnami images have been moved to the "Bitnami Legacy" repository and no longer receive updates, fixes, or security patches.

As an alternative, Chainguard provides over 40 security-hardened Helm charts forked from Bitnami, and community-based alternatives are also growing.

Helm Chart Comparison Table

Chart                 | DB         | Default Config         | HA Support            | Built-in Backup | Notes
----------------------|------------|------------------------|-----------------------|-----------------|---------------------
bitnami/postgresql    | PostgreSQL | Primary + Read Replica | Repmgr-based          | X               | Legacy warning
bitnami/postgresql-ha | PostgreSQL | Primary + Standby      | Pgpool-II integration | X               | HA-dedicated chart
bitnami/mysql         | MySQL      | Primary + Secondary    | Semi-sync replication | X               | InnoDB Cluster option
bitnami/redis         | Redis      | Master + Replica       | Sentinel-based        | X               | Separate Cluster mode
bitnami/mongodb       | MongoDB    | ReplicaSet             | Built-in              | X               | Separate Sharded chart
bitnami/mariadb       | MariaDB    | Primary + Secondary    | Galera option         | X               | MySQL compatible

When Helm Charts Are Appropriate

  • Dev/test environments: When you want to spin up a DB quickly
  • Simple configurations: Small-scale services where HA is not mandatory
  • Learning purposes: Building foundational K8s DB operations knowledge
  • Many custom settings: When fine-tuning through values.yaml is needed

Example: PostgreSQL HA Helm Chart

helm install prod-pg bitnami/postgresql-ha \
  --set postgresql.replicaCount=3 \
  --set postgresql.resources.requests.memory=2Gi \
  --set postgresql.resources.requests.cpu=1 \
  --set persistence.size=100Gi \
  --set persistence.storageClass=gp3 \
  --set pgpool.replicaCount=2 \
  --set metrics.enabled=true

6. CNPG vs Percona vs Vitess vs Helm Charts -- Comprehensive Comparison

Feature Comparison Table

Item                   | CNPG              | Percona            | Vitess             | Helm Charts
-----------------------|-------------------|--------------------|--------------------|--------------------
Supported DBs          | PostgreSQL        | MySQL, MongoDB, PG | MySQL (sharding)   | Various
License                | Apache 2.0        | Apache 2.0         | Apache 2.0         | Varies by chart
CNCF Status            | Sandbox           | -                  | Graduated          | -
HA Auto-Failover       | O                 | O                  | O                  | Depends on chart
Auto Backup            | O (Barman)        | O (multi-storage)  | O                  | X (separate config)
PITR                   | O                 | O                  | Partial            | X
Horizontal Sharding    | X                 | MongoDB only       | O (core feature)   | X
Monitoring Integration | Prometheus        | PMM + Prometheus   | VTAdmin            | Per-chart metrics
Connection Pooling     | PgBouncer built-in| ProxySQL/HAProxy   | VTGate built-in    | Separate config
Operational Difficulty | Medium            | Medium             | High               | Low
Production Readiness   | High              | High               | High (large-scale) | Medium
CRD-based Management   | O                 | O                  | O                  | X

Selection Guide

  • PostgreSQL on K8s: CNPG is the top choice. K8s-native design with an active community
  • MySQL/MongoDB operations: Percona Operator. Integrated monitoring (PMM) is a strength
  • Large-scale MySQL sharding: Vitess. For processing billions of rows of data
  • Dev/test environments: Helm charts. Quick deployment and simple configuration
  • Multi-DB unified management: Percona. MySQL + MongoDB + PostgreSQL under one operational model

7. Operational Considerations

Storage (PV/PVC)

Storage is the most important factor in K8s DB operations.

# Recommended StorageClass example (AWS EBS gp3)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-db
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: '5000'
  throughput: '250'
  encrypted: 'true'
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

Key principles:

  • volumeBindingMode: WaitForFirstConsumer -- Creates volume in the AZ where the Pod is scheduled
  • reclaimPolicy: Retain -- Preserves data even when PVC is deleted
  • allowVolumeExpansion: true -- Allows online volume expansion
  • Separate WAL and data volumes -- isolate sequential WAL writes from random data-file access for better performance

Performance Tuning

# CPU pinning and NUMA-aware configuration example
spec:
  containers:
    - name: postgres
      resources:
        requests:
          cpu: '4'
          memory: 8Gi
        limits:
          cpu: '4'
          memory: 8Gi
      # Ensure Guaranteed QoS class
      # Set requests == limits

Additional tips:

  • Guaranteed QoS: Set requests equal to limits to prevent CPU throttling
  • Topology-aware scheduling: Use nodeAffinity to place DB Pods on high-performance nodes
  • Anti-affinity: Ensure DB Pods are distributed across different nodes
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - postgres
          topologyKey: kubernetes.io/hostname

Resource Limits

  • Always set resources.requests and limits for DB Pods
  • Exceeding memory limits triggers OOMKill -- the DB process is forcefully terminated
  • Set PostgreSQL shared_buffers to around 25% of container memory
  • Set MySQL innodb_buffer_pool_size to 50-70% of container memory
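The sizing rules above can be sketched as a quick helper. The function name is illustrative; the 25% ratio comes from the PostgreSQL guideline and 60% is a midpoint of the 50-70% InnoDB range:

```python
def buffer_sizes(container_mem_gib: float) -> dict:
    """Suggest buffer sizes (in GiB) from a container memory limit."""
    return {
        # PostgreSQL: shared_buffers ~ 25% of container memory
        "postgres_shared_buffers": round(container_mem_gib * 0.25, 2),
        # MySQL: innodb_buffer_pool_size ~ 50-70%; use the 60% midpoint
        "mysql_innodb_buffer_pool": round(container_mem_gib * 0.60, 2),
    }

print(buffer_sizes(8))
# → {'postgres_shared_buffers': 2.0, 'mysql_innodb_buffer_pool': 4.8}
```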

Monitoring

# Enable PodMonitor in CNPG
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: prod-pg
spec:
  instances: 3
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: pg-custom-queries
        key: queries
  storage:
    size: 100Gi

Essential monitoring metrics:

  • Replication Lag: How far Standby is behind the Primary
  • Connection count: Usage rate against maximum connections
  • Transaction throughput (TPS): Transactions per second
  • Disk usage: Set alerts before PV capacity is exhausted
  • WAL archiving status: Verify backup system is functioning normally
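These metrics can be wired into alerts. A sketch using the Prometheus Operator's PrometheusRule CRD -- the metric names (cnpg_pg_replication_lag, kubelet_volume_stats_*) and thresholds should be verified against your exporter versions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: db-alerts
spec:
  groups:
    - name: database
      rules:
        - alert: PgReplicationLagHigh
          expr: cnpg_pg_replication_lag > 10   # seconds
          for: 5m
          labels:
            severity: warning
        - alert: DbVolumeAlmostFull
          expr: |
            kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"prod-pg.*"}
              / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"prod-pg.*"} < 0.2
          for: 10m
          labels:
            severity: critical
```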

Backup Strategy

Apply the 3-2-1 backup rule in K8s environments.

  • 3 copies: Primary data + WAL archive + physical backup
  • 2 different media: Local PV + Object Storage (S3/GCS)
  • 1 offsite: Replicate to a bucket in another region
# CNPG recovery cluster example
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: recovery-cluster
spec:
  instances: 2
  storage:
    size: 100Gi
  bootstrap:
    recovery:
      source: prod-pg
      recoveryTarget:
        targetTime: '2026-04-10T08:00:00Z'
  externalClusters:
    - name: prod-pg
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/prod-pg/
        s3Credentials:
          accessKeyId:
            name: aws-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-creds
            key: SECRET_ACCESS_KEY

8. Practical Example: Building a PostgreSQL HA Cluster with CNPG

A complete CNPG cluster YAML ready for production use.

Step 1: Create Namespace and Secrets

kubectl create namespace database

kubectl create secret generic pg-superuser \
  --namespace database \
  --from-literal=username=postgres \
  --from-literal=password=CHANGE_ME_TO_STRONG_PASSWORD

kubectl create secret generic aws-creds \
  --namespace database \
  --from-literal=ACCESS_KEY_ID=your-access-key \
  --from-literal=SECRET_ACCESS_KEY=your-secret-key

Step 2: Define the PostgreSQL Cluster

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: prod-pg
  namespace: database
spec:
  description: 'Production PostgreSQL HA Cluster'
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  instances: 3
  startDelay: 30
  stopDelay: 30
  primaryUpdateStrategy: unsupervised
  postgresql:
    parameters:
      shared_buffers: '2GB'
      effective_cache_size: '6GB'
      work_mem: '64MB'
      maintenance_work_mem: '512MB'
      max_connections: '200'
      max_wal_size: '2GB'
      min_wal_size: '1GB'
      wal_buffers: '64MB'
      random_page_cost: '1.1'
      effective_io_concurrency: '200'
      log_statement: 'ddl'
      log_min_duration_statement: '1000'
    pg_hba:
      - host all all 10.0.0.0/8 scram-sha-256
  bootstrap:
    initdb:
      database: appdb
      owner: appuser
      secret:
        name: pg-superuser
  storage:
    size: 100Gi
    storageClass: gp3-db
  walStorage:
    size: 30Gi
    storageClass: gp3-db
  resources:
    requests:
      memory: 8Gi
      cpu: '4'
    limits:
      memory: 8Gi
      cpu: '4'
  affinity:
    enablePodAntiAffinity: true
    topologyKey: kubernetes.io/hostname
  monitoring:
    enablePodMonitor: true
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/prod-pg/
      s3Credentials:
        accessKeyId:
          name: aws-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
        maxParallel: 4
      data:
        compression: gzip
        jobs: 4
    retentionPolicy: '30d'
  nodeMaintenanceWindow:
    inProgress: false
    reusePVC: true

Step 3: Scheduled Backup

apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: prod-pg-daily
  namespace: database
spec:
  schedule: '0 2 * * *'
  backupOwnerReference: self
  cluster:
    name: prod-pg
  target: prefer-standby

Step 4: Pooler (PgBouncer) Configuration

apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: prod-pg-pooler-rw
  namespace: database
spec:
  cluster:
    name: prod-pg
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: '1000'
      default_pool_size: '50'
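CNPG exposes the pooler through a Service named after the Pooler resource, so applications connect to it instead of the cluster's rw Service. The hostname below assumes the database namespace and the appdb/appuser objects from the earlier steps:

```shell
# Connect through PgBouncer rather than directly to the primary
psql "host=prod-pg-pooler-rw.database.svc.cluster.local port=5432 dbname=appdb user=appuser"
```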

Step 5: Deploy and Verify

kubectl apply -f cluster.yaml
kubectl apply -f scheduled-backup.yaml
kubectl apply -f pooler.yaml

# Check cluster status
kubectl get cluster -n database

# Check Pod status
kubectl get pods -n database

# Cluster detail info
kubectl describe cluster prod-pg -n database

# Test connection to Primary
kubectl exec -it prod-pg-1 -n database -- psql -U postgres -d appdb

9. Anti-Patterns -- Mistakes to Avoid in K8s DB Operations

1) Deploying DB with Deployment

Deployment is for stateless workloads. Using a Deployment for a DB can cause data loss on Pod restart, or allow multiple Pods to write to the same data directory concurrently. Always use a StatefulSet or an Operator CRD.

2) Using emptyDir without PVC

emptyDir data is deleted when the Pod is removed. Build the habit of using PVC for DB data even in test environments.

3) Operating without Backups

Even if the Operator provides HA, backups must be configured separately. HA protects against infrastructure failures, while backups protect against logical errors (like a wrong DELETE statement).

4) Not Setting Resource Limits

Without limits on DB Pods, they can starve other Pods of resources or be terminated unpredictably under memory pressure (OOM). Set requests equal to limits to secure the Guaranteed QoS class.

5) Using reclaimPolicy: Delete

If the StorageClass's reclaimPolicy is Delete (the default), deleting the PVC also deletes the PV -- and with it the actual data. Always set the StorageClass used for DB volumes to Retain.

6) Placing All DB Pods in a Single AZ

Without Pod Anti-Affinity, all DB Pods may land on the same node or AZ. A failure in that node/AZ would take down the entire DB.

# Correct Anti-Affinity configuration
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: cnpg.io/cluster
              operator: In
              values:
                - prod-pg
        topologyKey: topology.kubernetes.io/zone

7) Operating without Monitoring/Alerts

Replication lag increases or disk fills up without anyone knowing. Configure at minimum these alerts:

  • Disk usage exceeding 80%
  • Replication lag over 10 seconds
  • Increasing Pod restart count
  • Backup failure detection

8) Not Testing Rolling Updates During DB Upgrades

Major version upgrades for PostgreSQL, MySQL, etc., should always be tested on a separate cluster first. Even if the Operator supports automatic upgrades, application compatibility testing must be done by humans.

9) Managing Secrets in Plaintext

Don't hardcode DB passwords in YAML. Use External Secrets Operator or Sealed Secrets to manage Secrets securely.

# External Secrets example
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: pg-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: SecretStore
  target:
    name: pg-superuser
  data:
    - secretKey: username
      remoteRef:
        key: prod/database/credentials
        property: username
    - secretKey: password
      remoteRef:
        key: prod/database/credentials
        property: password

Conclusion

Running databases on K8s is no longer an experimental choice. Mature Operators like CNPG, Percona, and Vitess automate complex operational tasks and integrate naturally with K8s's declarative management model.

Key Summary:

  • For running PostgreSQL on K8s, CloudNativePG is the best choice
  • For unified MySQL/MongoDB/PostgreSQL management, Percona Operator
  • For large-scale MySQL sharding, Vitess
  • For dev/test environments, Helm charts remain convenient
  • Regardless of the tool you choose, storage, backup, and monitoring are essential

If adopting an Operator, test thoroughly in a development environment first, simulate failure scenarios (Pod deletion, node down, AZ failure), and then apply to production.
