PostgreSQL Production Operations Complete Guide 2025: Extensions, Tuning, Monitoring, HA
TL;DR
- Extensions are PostgreSQL's true power: pg_stat_statements (query analysis), pgvector (AI), PostGIS (geo), TimescaleDB (time-series), pg_partman (partitioning)
- Three key tuning parameters: shared_buffers (25% RAM), work_mem (per sort/hash memory), effective_cache_size (50-75% RAM)
- Autovacuum is critical: misconfiguration causes table bloat → disk explosion → slow queries. Tuning autovacuum_vacuum_scale_factor is essential
- Patroni: de facto standard for PostgreSQL HA. Auto failover within ~30 seconds
- Backup: pgBackRest is 10x+ faster than pg_dump. WAL archiving + PITR is essential
1. Why PostgreSQL Remains Loved in 2025
1.1 Five Key Reasons
- Extensibility — 200+ official extensions. Vector, time-series, geo, all of them
- JSONB performance — comparable to or faster than MongoDB for JSON workloads
- MVCC — No read-write conflicts (excellent concurrency)
- Standards compliant — Most faithful ANSI SQL implementation
- Open source — Zero license costs vs Oracle/SQL Server
1.2 Why PostgreSQL Is Standard Even in NewSQL Era
CockroachDB, YugabyteDB, and Spanner have emerged, but most of them rely on PostgreSQL wire-protocol compatibility. In other words, PostgreSQL itself remains in a stable, standard-setting position.
2. Essential Extensions Top 10
2.1 Operations Must-Have: pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = 'all'
pg_stat_statements.max = 10000
Usage:
-- TOP 10 slowest queries
SELECT
substring(query, 1, 80) as query,
calls,
total_exec_time::int as total_ms,
mean_exec_time::int as avg_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This single query solves 80% of performance issues. You immediately see which queries consume the most time.
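After a round of tuning, it helps to clear the accumulated statistics so the next measurement window starts clean; the extension ships a reset function for exactly this:

```sql
-- Reset collected statistics to start a fresh measurement window
SELECT pg_stat_statements_reset();
```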
2.2 AI Era Essential: pgvector
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(768)
);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
Why pgvector instead of dedicated Vector DB?
- Leverage existing PostgreSQL ops knowledge
- ACID transactions (vector + metadata consistency)
- Cost savings (no separate infrastructure)
2.3 Time-Series: TimescaleDB
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
device_id TEXT,
temperature FLOAT
);
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');
SELECT add_retention_policy('metrics', INTERVAL '30 days');
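A sketch of how such a hypertable is typically queried: TimescaleDB's time_bucket() groups rows into fixed intervals, which pairs naturally with the metrics table defined above.

```sql
-- Average temperature per device in 5-minute buckets over the last day
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM metrics
WHERE time > now() - INTERVAL '1 day'
GROUP BY bucket, device_id
ORDER BY bucket;
```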
2.4 Other Core Extensions
| Extension | Purpose |
|---|---|
| PostGIS | Geographic info |
| pg_partman | Auto partition management |
| pg_cron | In-DB scheduler |
| pgaudit | Audit logging |
| pglogical | Logical replication |
| citus | Horizontal sharding |
| pg_repack | Lock-free table reorganization |
3. Core Performance Tuning
3.1 Memory Parameters
| Parameter | Recommended (16GB RAM) | Description |
|---|---|---|
| shared_buffers | 4GB (25% RAM) | PostgreSQL's own cache |
| effective_cache_size | 12GB (75% RAM) | OS cache estimate |
| work_mem | 64MB | Per sort/hash memory |
| maintenance_work_mem | 1GB | For VACUUM/CREATE INDEX |
| wal_buffers | 16MB | WAL buffer |
Formula: shared_buffers + (work_mem × max_connections) ≤ RAM × 0.5
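Plugging the 16GB numbers into this formula shows why work_mem must be chosen against max_connections. A small script (a sketch; max_connections = 100 is PostgreSQL's default and an assumption here) solves for the safe per-connection budget:

```python
# Memory budget rule: shared_buffers + (work_mem * max_connections) <= RAM * 0.5
# All values in MB.
ram_mb = 16 * 1024            # 16GB RAM, as in the table above
shared_buffers_mb = 4 * 1024  # 25% of RAM
max_connections = 100         # PostgreSQL's default (assumption)

# Solve for the largest work_mem that still satisfies the rule
safe_work_mem = (ram_mb * 0.5 - shared_buffers_mb) / max_connections
print(f"safe work_mem: {safe_work_mem:.1f} MB")  # safe work_mem: 41.0 MB
```

Note that the table's 64MB recommendation only fits the rule if far fewer than 100 connections sort simultaneously, which is one more reason connection pooling (section 8.3) matters.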
3.2 Checkpoint Tuning
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB
3.3 Query Planner
random_page_cost = 1.1 # SSD (HDD is 4.0)
effective_io_concurrency = 200 # SSD/NVMe
default_statistics_target = 100
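Unlike shared_buffers, these planner settings do not require a restart; a sketch of applying them from psql:

```sql
-- Takes effect on configuration reload, no restart needed
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;
SELECT pg_reload_conf();
```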
4. VACUUM and Autovacuum
4.1 Why VACUUM Is Needed
PostgreSQL's MVCC handles UPDATE as "INSERT new row + mark old row". Old rows aren't deleted immediately but remain as "dead tuples". VACUUM cleans them.
Without VACUUM:
- Table size keeps growing (disk explosion)
- Indexes point to dead tuples → slow queries
- Transaction ID wraparound — data corruption risk!
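The wraparound risk in particular can be watched directly: age(datfrozenxid) reports how far each database has advanced toward the ~2 billion transaction ID limit.

```sql
-- Distance to transaction ID wraparound, per database.
-- Autovacuum forces an anti-wraparound vacuum once a database
-- approaches autovacuum_freeze_max_age (200 million by default).
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```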
4.2 Autovacuum Monitoring
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0), 2) AS dead_ratio,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
dead_ratio above 0.2 means autovacuum can't keep up.
4.3 Autovacuum Tuning
autovacuum = on
autovacuum_max_workers = 5
autovacuum_naptime = 30s
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_cost_limit = 2000
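These global settings can also be overridden per table, which is useful for a few hot, heavily updated tables (the orders table name here is illustrative):

```sql
-- Vacuum this table when ~1% of rows are dead, instead of the global 5%
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
```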
5. Monitoring and Debugging
5.1 Real-time Activity
-- Currently running queries
SELECT pid, usename, state, query_start, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
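When a query sits in the active state for a long time, the next question is usually who is blocking whom; pg_blocking_pids() (PostgreSQL 9.6+) answers it:

```sql
-- Sessions that are currently blocked, and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       wait_event_type,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```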
5.2 EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 12345;
Key indicators:
- Seq Scan vs Index Scan — index usage
- actual rows vs estimated rows — statistics accuracy
- Buffers: shared hit/read — cache hit rate
- Execution Time — actual time
5.3 pgBadger — Log Analysis
log_min_duration_statement = 1000
pgbadger -j 4 /var/log/postgresql/postgresql-*.log -o report.html
6. High Availability — Patroni
6.1 How Patroni Works
- Each PostgreSQL node runs Patroni daemon
- Patroni renews leader lock in etcd
- Primary fails → lock expires → Patroni elects new Primary
- HAProxy/PgBouncer auto-routes to new Primary
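A minimal, hedged sketch of a patroni.yml for the setup above — node names, the etcd endpoint, paths, and credentials are all placeholders:

```yaml
scope: my-cluster            # cluster name shared by all nodes
name: node1                  # unique per node

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd3:
  hosts: etcd1:2379          # DCS that holds the leader lock

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: secret       # placeholder
```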
6.2 Patroni Alternatives
| Tool | Features |
|---|---|
| Patroni | Most popular, requires etcd |
| repmgr | EnterpriseDB, simple |
| CloudNativePG | K8s native (CNCF) |
| Stolon | Container-friendly |
| pg_auto_failover | Microsoft, simple |
7. Backup and Recovery
7.1 pg_dump vs pgBackRest
| | pg_dump | pgBackRest |
|---|---|---|
| Speed | Slow (single-thread) | Fast (parallel) |
| Incremental | No | Yes |
| Compression | gzip | lz4/zstd |
| PITR | No | Yes |
| S3 integration | No | Yes |
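A hedged sketch of a minimal /etc/pgbackrest/pgbackrest.conf matching the features in the table — paths are placeholders, and the stanza name main matches the restore example in section 7.2:

```ini
[global]
repo1-path=/var/lib/pgbackrest      ; local repository (repo1-type=s3 enables S3)
repo1-retention-full=2              ; keep two full backups
compress-type=zst                   ; zstd compression
process-max=4                       ; parallel backup/restore workers

[main]
pg1-path=/var/lib/postgresql/data   ; PGDATA of the cluster
```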
7.2 PITR (Point-in-Time Recovery)
pgbackrest --stanza=main \
--type=time \
--target="2025-04-15 14:30:00" \
restore
This enables responses like "rewind to yesterday at 3:25:30 PM".
8. Common Pitfalls
8.1 Index Bloat
-- Unused indexes
SELECT
schemaname || '.' || relname AS table,
indexrelname AS unused_index,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
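Before dropping anything this query surfaces, confirm the index does not back a constraint; the drop itself should be concurrent so it does not block writes (idx_orders_legacy is a hypothetical name):

```sql
-- CONCURRENTLY avoids an exclusive lock; it cannot run inside a transaction block.
-- Never drop indexes that enforce PRIMARY KEY or UNIQUE constraints.
DROP INDEX CONCURRENTLY idx_orders_legacy;
```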
8.2 N+1 Queries
ORM's most common problem. Use pg_stat_statements to find duplicate queries called thousands of times.
8.3 Connection Explosion
PostgreSQL connection = separate process (~10MB each). Solution: PgBouncer (transaction mode recommended).
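A hedged sketch of a minimal pgbouncer.ini in the recommended transaction mode — host addresses, the database name, and the auth file path are placeholders:

```ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction      ; multiplex many clients over few server connections
max_client_conn = 1000       ; client connections PgBouncer will accept
default_pool_size = 20       ; real PostgreSQL connections per db/user pair
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```

Transaction mode returns the server connection to the pool at each COMMIT/ROLLBACK, which is why session-level features such as prepared statements need care (see quiz answer 3).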
9. PostgreSQL 17 New Features (2024-09)
- Improved incremental sort
- Logical replication: failover slot sync
- pg_basebackup: incremental backup support
- VACUUM: memory-based dead tuple tracking (faster)
- MERGE command: RETURNING clause added
Quiz
1. Why shouldn't shared_buffers be set to 50% of RAM?
Answer: PostgreSQL uses both its own shared_buffers and OS file system cache. Setting shared_buffers too high (1) reduces OS cache space causing double-caching inefficiency, (2) page replacement algorithm becomes inefficient, (3) backups/checkpoints slow down. 25% of RAM is generally optimal.
2. Signs that autovacuum can't keep up?
Answer: When n_dead_tup exceeds 20% of n_live_tup in pg_stat_user_tables. Solutions: (1) increase autovacuum_max_workers, (2) decrease autovacuum_vacuum_scale_factor (0.2 → 0.05), (3) decrease autovacuum_naptime, (4) per-table settings via ALTER TABLE.
3. Why is PgBouncer needed?
Answer: Each PostgreSQL connection is a separate process (~10MB). 1,000 connections = 10GB+ memory. Process creation/destruction is also costly. PgBouncer maintains a small pool of real connections and multiplexes client connections. Transaction pooling mode is most efficient but has prepared statements compatibility issues.
4. How does WAL archiving enable PITR?
Answer: WAL (Write-Ahead Log) records all data changes sequentially. WAL archiving copies these logs to external storage (S3, etc.). For recovery: (1) restore nearest base backup, (2) replay WAL up to the desired time. Result: point-in-time recovery. pgBackRest's --type=time automates this.
5. Difference between pg_repack and VACUUM FULL?
Answer: VACUUM FULL requires a table lock, unusable in production. pg_repack reorganizes tables without locks — copies data to a new table and swaps names. Essential for removing bloat from large tables in production.