Split View: PostgreSQL 쿼리 최적화와 성능 튜닝 실전 가이드: EXPLAIN부터 파티셔닝까지

PostgreSQL 쿼리 최적화와 성능 튜닝 실전 가이드: EXPLAIN부터 파티셔닝까지

들어가며
EXPLAIN ANALYZE 완벽 이해
인덱스 전략: B-tree, GIN, GiST, BRIN
쿼리 리라이팅과 실행 계획 최적화
VACUUM과 Autovacuum 튜닝
테이블 파티셔닝 전략
커넥션 풀링 (PgBouncer, Pgpool-II)
인덱스 유형 비교표
- VACUUM 전략 비교표
운영 시 주의사항과 트러블슈팅
실패 사례와 복구 절차
마치며
참고자료

들어가며

PostgreSQL은 엔터프라이즈급 오픈소스 관계형 데이터베이스로, 복잡한 쿼리 처리 능력과 확장성에서 높은 평가를 받고 있다. 하지만 아무리 뛰어난 DBMS라도 쿼리 최적화 없이는 프로덕션 환경에서 기대한 성능을 얻기 어렵다. 수백만 건의 데이터가 쌓이고, 동시 접속자가 늘어나면 느린 쿼리 하나가 전체 서비스를 마비시킬 수 있다.

실제로 필자가 운영하던 서비스에서 단일 쿼리의 실행 시간이 30초를 넘기면서 커넥션 풀이 고갈되고, 연쇄적으로 API 타임아웃이 발생한 사례가 있었다. EXPLAIN ANALYZE로 실행 계획을 분석하고, 적절한 인덱스를 추가하고, 쿼리를 리라이팅한 결과 동일 쿼리가 50ms 이내로 개선되었다.

이 글에서는 PostgreSQL의 EXPLAIN 분석부터 인덱스 전략, VACUUM 튜닝, 테이블 파티셔닝, 커넥션 풀링까지 프로덕션 환경에서 바로 적용할 수 있는 성능 최적화 기법을 체계적으로 다룬다.

EXPLAIN ANALYZE 완벽 이해

기본 사용법

PostgreSQL의 쿼리 최적화는 항상 EXPLAIN ANALYZE에서 시작한다. 이 명령은 쿼리를 실제로 실행하면서 각 노드의 예상 비용과 실제 실행 시간, 반환 행 수를 보여준다.

-- 기본 EXPLAIN ANALYZE 사용
EXPLAIN ANALYZE
SELECT u.id, u.name, o.total_amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= '2026-01-01'
  AND o.status = 'completed'
ORDER BY o.total_amount DESC
LIMIT 100;

실행 결과는 다음과 같은 형태로 출력된다.

Limit  (cost=15234.56..15234.81 rows=100 width=52) (actual time=45.123..45.156 rows=100 loops=1)
  ->  Sort  (cost=15234.56..15347.89 rows=45332 width=52) (actual time=45.121..45.142 rows=100 loops=1)
        Sort Key: o.total_amount DESC
        Sort Method: top-N heapsort  Memory: 35kB
        ->  Hash Join  (cost=3456.78..13890.12 rows=45332 width=52) (actual time=12.345..38.901 rows=45332 loops=1)
              Hash Cond: (o.user_id = u.id)
              ->  Seq Scan on orders o  (cost=0.00..9876.54 rows=45332 width=20) (actual time=0.015..18.234 rows=45332 loops=1)
                    Filter: ((created_at >= '2026-01-01') AND (status = 'completed'))
                    Rows Removed by Filter: 154668
              ->  Hash  (cost=2345.00..2345.00 rows=100000 width=36) (actual time=12.123..12.123 rows=100000 loops=1)
                    Buckets: 131072  Batches: 1  Memory Usage: 6234kB
                    ->  Seq Scan on users u  (cost=0.00..2345.00 rows=100000 width=36) (actual time=0.008..5.678 rows=100000 loops=1)
Planning Time: 0.456 ms
Execution Time: 45.234 ms

핵심 읽는 법

실행 계획에서 주목해야 할 핵심 지표는 다음과 같다.

actual time: 실제 실행 시간(밀리초). 첫 번째 값은 첫 행 반환까지의 시간, 두 번째 값은 전체 완료 시간이다.
rows: 예상 행 수(cost 옆)와 실제 행 수(actual 옆)의 차이가 크면 통계 정보가 오래되었다는 신호다.
Seq Scan: 테이블 전체를 순차 스캔한다. 대용량 테이블에서 이것이 보이면 인덱스 추가를 고려한다.
Rows Removed by Filter: 필터링으로 제거된 행이 많을수록 불필요한 I/O가 발생하고 있다는 뜻이다.

BUFFERS 옵션 활용

더 깊은 분석을 위해 BUFFERS 옵션을 함께 사용하면 디스크 I/O와 캐시 히트 정보를 확인할 수 있다.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT *
FROM orders
WHERE user_id = 12345
  AND created_at BETWEEN '2026-01-01' AND '2026-03-14';

Index Scan using idx_orders_user_created on orders  (cost=0.43..8.45 rows=1 width=64) (actual time=0.023..0.025 rows=3 loops=1)
  Index Cond: ((user_id = 12345) AND (created_at >= '2026-01-01') AND (created_at <= '2026-03-14'))
  Buffers: shared hit=4
Planning Time: 0.123 ms
Execution Time: 0.045 ms

여기서 shared hit=4는 4개의 페이지를 shared buffer(캐시)에서 읽었다는 뜻이다. 만약 shared read=N이 보이면 디스크에서 읽은 것이므로, shared_buffers 크기를 늘리거나 쿼리를 최적화해야 할 수 있다.

시각화 도구

텍스트 형태의 실행 계획이 복잡하면 다음 도구를 활용하자.

explain.depesz.com - 실행 계획을 붙여넣으면 색상으로 병목 구간을 표시해준다
pgexplain.dev - 노드별 시각화와 통계 분석을 제공한다

인덱스 전략: B-tree, GIN, GiST, BRIN

B-tree 인덱스 (기본)

B-tree는 PostgreSQL의 기본 인덱스 타입으로, 등호 비교와 범위 검색에 가장 적합하다. CREATE INDEX 명령 시 별도 타입을 지정하지 않으면 B-tree가 생성된다.

-- 단일 컬럼 B-tree 인덱스
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- 복합 인덱스: 선행 컬럼 순서가 중요하다
-- WHERE user_id = ? AND created_at >= ? 형태의 쿼리에 최적
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);

-- 커버링 인덱스: Index-Only Scan을 유도
CREATE INDEX idx_orders_covering ON orders (user_id, created_at)
  INCLUDE (status, total_amount);

-- 부분 인덱스: 특정 조건의 행만 인덱싱하여 크기를 줄인다
CREATE INDEX idx_orders_active ON orders (user_id, created_at)
  WHERE status = 'active';

복합 인덱스 설계 시 가장 중요한 원칙은 선행 컬럼의 선택도이다. WHERE 절에서 등호 조건으로 사용되는 컬럼을 먼저, 범위 조건 컬럼을 뒤에 배치한다. 위 예시에서 user_id = ?는 등호, created_at >= ?는 범위 조건이므로 user_id를 선행 컬럼으로 둔 것이다.

GIN (Generalized Inverted Index)

GIN 인덱스는 하나의 행에 여러 값이 포함된 경우에 최적이다. JSONB, 배열, 전문 검색(tsvector)에 주로 사용한다.

-- JSONB 필드에 대한 GIN 인덱스
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- jsonb_path_ops: 특정 연산자에 더 효율적이고 인덱스 크기가 작다
CREATE INDEX idx_products_metadata_path ON products USING GIN (metadata jsonb_path_ops);

-- 전문 검색용 GIN 인덱스
CREATE INDEX idx_articles_search ON articles USING GIN (
  to_tsvector('korean', title || ' ' || content)
);

-- 배열 컬럼에 대한 GIN
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

GIN의 핵심 주의사항은 쓰기 성능 저하이다. GIN 인덱스는 갱신 비용이 높으므로, 쓰기가 빈번한 테이블에서는 fastupdate = on (기본값) 설정으로 pending list에 일괄 반영하는 방식을 사용하되, pending list가 너무 커지면 읽기 성능이 떨어질 수 있으므로 gin_pending_list_limit을 조정해야 한다.

GiST (Generalized Search Tree)

GiST 인덱스는 지리 데이터(PostGIS), 범위 타입, 근접 검색에 주로 사용한다.

-- PostGIS 공간 인덱스
CREATE INDEX idx_stores_location ON stores USING GiST (location);

-- 범위 타입 인덱스
CREATE INDEX idx_reservations_period ON reservations USING GiST (
  tstzrange(start_time, end_time)
);

-- 범위 겹침 검색
SELECT * FROM reservations
WHERE tstzrange(start_time, end_time) && tstzrange('2026-03-14 09:00', '2026-03-14 18:00');

BRIN (Block Range INdex)

BRIN 인덱스는 테이블의 물리적 순서와 데이터 값의 상관관계가 높을 때 극도로 효율적이다. 시계열 데이터(로그, 이벤트)에서 인덱스 크기를 B-tree의 1/100 수준으로 줄일 수 있다.

-- 시계열 로그 테이블에 BRIN 인덱스
CREATE INDEX idx_logs_created_brin ON access_logs USING BRIN (created_at)
  WITH (pages_per_range = 32);

-- BRIN이 효과적인지 확인: 상관계수가 1에 가까워야 한다
SELECT correlation
FROM pg_stats
WHERE tablename = 'access_logs' AND attname = 'created_at';

correlation 값이 0.9 이상이면 BRIN이 매우 효과적이고, 0.5 이하이면 B-tree를 사용하는 것이 낫다.

쿼리 리라이팅과 실행 계획 최적화

N+1 문제 해결

ORM을 사용할 때 흔히 발생하는 N+1 문제는 데이터베이스 성능의 가장 큰 적이다.

-- Bad: 애플리케이션에서 N+1 패턴으로 실행되는 쿼리
-- 1) SELECT * FROM users WHERE department = 'engineering';  -- 50명 반환
-- 2) SELECT * FROM orders WHERE user_id = 1;  -- 50번 반복 실행
-- 3) SELECT * FROM orders WHERE user_id = 2;
-- ...

-- Good: JOIN으로 한 번에 가져오기
SELECT u.id, u.name, o.id AS order_id, o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.department = 'engineering';

-- Good: LATERAL JOIN으로 사용자당 최근 주문 3건만 가져오기
SELECT u.id, u.name, recent_orders.*
FROM users u
CROSS JOIN LATERAL (
  SELECT o.id, o.total_amount, o.created_at
  FROM orders o
  WHERE o.user_id = u.id
  ORDER BY o.created_at DESC
  LIMIT 3
) recent_orders
WHERE u.department = 'engineering';

서브쿼리 vs JOIN 최적화

-- Bad: 상관 서브쿼리 - 외부 쿼리의 각 행마다 서브쿼리가 실행될 수 있다
SELECT u.id, u.name,
  (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u
WHERE u.status = 'active';

-- Good: GROUP BY와 JOIN 활용
SELECT u.id, u.name, COALESCE(oc.cnt, 0) AS order_count
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS cnt
  FROM orders
  GROUP BY user_id
) oc ON u.id = oc.user_id
WHERE u.status = 'active';

EXISTS vs IN 전략

-- 대용량 데이터에서는 EXISTS가 유리한 경우가 많다
-- EXISTS는 첫 번째 매칭을 찾으면 즉시 중단한다
SELECT u.id, u.name
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id
    AND o.created_at >= '2026-01-01'
);

-- IN은 서브쿼리 결과가 작을 때 유리하다
SELECT u.id, u.name
FROM users u
WHERE u.department_id IN (
  SELECT id FROM departments WHERE region = 'APAC'
);

페이지네이션 최적화

OFFSET 기반 페이지네이션은 대용량 테이블에서 심각한 성능 문제를 일으킨다. 100만 번째부터 10건을 가져오려면 100만 건을 먼저 스캔하고 버린다.

-- Bad: OFFSET이 커질수록 느려진다
SELECT * FROM orders
ORDER BY id
OFFSET 1000000 LIMIT 10;

-- Good: 키셋(Keyset) 페이지네이션 - 항상 일정한 성능
SELECT * FROM orders
WHERE id > 1000000  -- 이전 페이지의 마지막 id
ORDER BY id
LIMIT 10;

-- 복합 정렬 기준일 때의 키셋 페이지네이션
SELECT * FROM orders
WHERE (created_at, id) > ('2026-03-13 15:30:00', 98765)
ORDER BY created_at, id
LIMIT 10;

VACUUM과 Autovacuum 튜닝

VACUUM이 필요한 이유

PostgreSQL은 MVCC 아키텍처 덕분에 UPDATE나 DELETE 시 기존 행을 즉시 삭제하지 않고 Dead Tuple로 남긴다. VACUUM은 이 Dead Tuple을 정리하여 공간을 재사용 가능하게 만드는 핵심 유지보수 작업이다.

VACUUM을 소홀히 하면 다음과 같은 문제가 발생한다.

테이블 Bloat: 실제 데이터 대비 테이블 크기가 비정상적으로 커진다
인덱스 Bloat: 인덱스 크기 증가로 쿼리 성능 저하
XID Wraparound: 트랜잭션 ID 소진 시 데이터베이스가 자동으로 읽기 전용 모드로 전환되는 치명적 장애

Autovacuum 핵심 파라미터 튜닝

# postgresql.conf - Autovacuum 튜닝 권장 설정

# Autovacuum 활성화 (절대 끄지 마라)
autovacuum = on

# 동시에 실행 가능한 autovacuum worker 수
# CPU 코어 수와 테이블 수를 고려하여 설정
autovacuum_max_workers = 5

# VACUUM 트리거 임계값: dead tuple이 테이블의 10% + 50건 초과 시 실행
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.1

# 대용량 테이블에서는 scale_factor를 낮춰야 한다
# 1억 건 테이블에서 0.2(기본값)이면 2천만 건이 쌓여야 VACUUM이 실행된다

# ANALYZE 트리거: 변경이 테이블의 5% + 50건 초과 시 통계 갱신
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.05

# 비용 기반 지연: 높을수록 VACUUM이 빠르지만 I/O 부하 증가
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 1000

# XID Wraparound 방지를 위한 설정
autovacuum_freeze_max_age = 300000000

테이블별 개별 튜닝

대용량 트랜잭션 테이블은 글로벌 설정으로는 부족하다. 테이블별로 autovacuum 파라미터를 개별 설정할 수 있다.

-- 대용량 주문 테이블: 더 빈번하게 VACUUM 실행
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,      -- 1%만 변경되어도 VACUUM
  autovacuum_vacuum_threshold = 1000,
  autovacuum_analyze_scale_factor = 0.005,
  autovacuum_vacuum_cost_delay = 0             -- 최대 속도로 VACUUM
);

-- VACUUM 상태 모니터링
SELECT
  schemaname,
  relname,
  n_live_tup,
  n_dead_tup,
  ROUND(n_dead_tup::numeric / GREATEST(n_live_tup, 1) * 100, 2) AS dead_ratio_pct,
  last_vacuum,
  last_autovacuum,
  last_analyze,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

테이블 파티셔닝 전략

언제 파티셔닝을 적용하는가

파티셔닝은 논리적으로 하나인 대형 테이블을 물리적으로 여러 파티션으로 분할하는 기법이다. 다음 조건에 해당하면 파티셔닝을 고려해야 한다.

테이블 크기가 수십 GB 이상이고 계속 증가하는 경우
쿼리가 항상 특정 범위의 데이터만 조회하는 경우 (예: 최근 3개월)
오래된 데이터를 주기적으로 삭제해야 하는 경우 (DELETE 대신 DROP PARTITION)
VACUUM이 대형 테이블을 처리하는 데 오래 걸리는 경우

Range 파티셔닝 (날짜 기반)

-- 월별 Range 파티셔닝 테이블 생성
CREATE TABLE events (
  id          BIGSERIAL,
  event_type  VARCHAR(50) NOT NULL,
  payload     JSONB,
  created_at  TIMESTAMPTZ NOT NULL,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- 월별 파티션 생성
CREATE TABLE events_2026_01 PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
  FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- 각 파티션에 인덱스 생성 (부모 테이블에 생성하면 자동 상속)
CREATE INDEX idx_events_type_created ON events (event_type, created_at);

-- 파티션 프루닝 확인: 2026년 3월 데이터만 조회
EXPLAIN ANALYZE
SELECT * FROM events
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01'
  AND event_type = 'purchase';

실행 계획에서 Append 노드 하위에 2026_03 파티션만 스캔하는 것을 확인할 수 있다. 이것이 **파티션 프루닝(Partition Pruning)**이다.

자동 파티션 관리

pg_partman 확장을 사용하면 파티션 생성과 삭제를 자동화할 수 있다.

-- pg_partman 설치 후 자동 관리 설정
CREATE EXTENSION pg_partman;

SELECT partman.create_parent(
  p_parent_table => 'public.events',
  p_control => 'created_at',
  p_type => 'range',
  p_interval => '1 month',
  p_premake => 3  -- 3개월 분 미리 생성
);

-- 오래된 파티션 자동 삭제 (12개월 보관)
UPDATE partman.part_config
SET retention = '12 months',
    retention_keep_table = false
WHERE parent_table = 'public.events';

오래된 데이터 정리: DELETE vs DROP PARTITION

-- Bad: 대량 DELETE는 Dead Tuple을 생성하고 VACUUM 부하를 유발한다
DELETE FROM events WHERE created_at < '2025-01-01';

-- Good: 파티션을 통째로 분리하고 삭제 - 즉시 완료, VACUUM 불필요
ALTER TABLE events DETACH PARTITION events_2024_12;
DROP TABLE events_2024_12;

커넥션 풀링 (PgBouncer, Pgpool-II)

커넥션 풀링이 필요한 이유

PostgreSQL은 연결당 하나의 프로세스를 fork하는 구조이다. 각 연결은 약 5-10MB의 메모리를 소비하며, 동시 연결이 수백 개를 넘으면 컨텍스트 스위칭 오버헤드로 인해 성능이 급격히 저하된다. max_connections를 무작정 높이는 것은 해결책이 아니다.

PgBouncer 설정

PgBouncer는 경량 커넥션 풀러로, 단일 프로세스에서 수천 개의 클라이언트 연결을 처리할 수 있다.

;; pgbouncer.ini

[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
;; 풀링 모드: transaction이 가장 효율적
pool_mode = transaction

;; 클라이언트 최대 연결 수
max_client_conn = 1000

;; 데이터베이스당 서버 연결 수 (실제 PostgreSQL 연결)
default_pool_size = 30
min_pool_size = 10
reserve_pool_size = 5

;; 유휴 연결 타임아웃
server_idle_timeout = 300

;; 클라이언트 연결 대기 큐 타임아웃
client_login_timeout = 60

;; 로깅
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1

;; 모니터링용 포트
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

풀링 모드별 차이점은 다음과 같다.

session: 클라이언트 연결이 끊길 때까지 서버 연결을 점유한다. 가장 호환성이 높지만 효율이 낮다.
transaction: 트랜잭션이 끝나면 서버 연결을 반납한다. 대부분의 경우 최적의 선택이다.
statement: 각 SQL 문 실행 후 연결을 반납한다. 멀티 스테이트먼트 트랜잭션에서 문제가 발생하므로 주의가 필요하다.

Pgpool-II vs PgBouncer 선택 기준

Pgpool-II는 커넥션 풀링 외에도 로드 밸런싱, 페일오버, 쿼리 캐싱 기능을 제공하는 미들웨어이다. 단순히 커넥션 풀링만 필요하다면 PgBouncer가 더 가볍고 빠르지만, PostgreSQL 클러스터를 직접 관리하면서 통합 솔루션이 필요하다면 Pgpool-II를 고려할 수 있다.

# PgBouncer 상태 확인
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"

# 활성 연결 및 대기 상태 확인
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "SHOW CLIENTS;"

# PostgreSQL 쪽 실제 연결 수 확인
psql -c "SELECT count(*) FROM pg_stat_activity WHERE state = 'active';"

인덱스 유형 비교표

인덱스 타입	적합한 데이터	적합한 연산자	인덱스 크기	쓰기 비용	대표 사용 사례
B-tree	스칼라 값 (정수, 문자열, 날짜)	=, <, >, BETWEEN, IN, IS NULL	중간	낮음	PK, FK, 범위 검색, 정렬
GIN	복합 값 (JSONB, 배열, tsvector)	@>, ?\|, ?&, @@ (전문검색)	큼	높음	JSONB 검색, 전문 검색, 태그
GiST	기하/공간 데이터, 범위 타입	&&, @>, <@, <-> (거리)	중간	중간	PostGIS, IP 범위, 시간 범위
BRIN	물리적 순서와 상관관계 높은 데이터	=, <, >, BETWEEN	매우 작음	매우 낮음	시계열 로그, append-only 테이블
Hash	등호 비교만 필요한 경우	=	중간	낮음	정확한 값 매칭 (PG 10 이상에서 WAL 지원)
SP-GiST	비균형 트리 구조 데이터	전화번호, IP, 기하	중간	중간	접두사 검색, 쿼드트리

VACUUM 전략 비교표

전략	대상	언제 사용하는가	잠금 영향	소요 시간
VACUUM	Dead tuple 정리	일상적 유지보수	테이블 읽기/쓰기 가능	테이블 크기에 비례
VACUUM FULL	테이블 재작성으로 디스크 반환	Bloat이 심각할 때	ACCESS EXCLUSIVE 잠금 (서비스 중단)	매우 길다
VACUUM FREEZE	XID Wraparound 방지	오래된 트랜잭션 ID 동결	읽기/쓰기 가능	전체 테이블 스캔
Autovacuum	자동 Dead tuple 정리	항상 활성화	읽기/쓰기 가능	설정에 따라 조절
pg_repack	무중단 테이블 재구성	VACUUM FULL 대신 사용	짧은 잠금만 필요	테이블 크기에 비례
ANALYZE	통계 정보 갱신	대량 INSERT/UPDATE 후	읽기/쓰기 가능	샘플링 기반으로 빠름

운영 시 주의사항과 트러블슈팅

느린 쿼리 모니터링 설정

# postgresql.conf - 슬로우 쿼리 로깅
log_min_duration_statement = 500       # 500ms 이상 쿼리 로깅
log_statement = 'none'                  # DDL만 로깅하려면 'ddl'
log_lock_waits = on                     # 잠금 대기 로깅
deadlock_timeout = 1s                   # 데드락 감지 주기

# pg_stat_statements 확장 활성화 (필수)
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = 'all'

pg_stat_statements로 가장 느린 쿼리 TOP 10을 조회하는 방법이다.

-- pg_stat_statements 확장 설치
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- 평균 실행 시간이 가장 긴 쿼리 TOP 10
SELECT
  queryid,
  calls,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  ROUND(mean_exec_time::numeric, 2) AS avg_ms,
  ROUND((100 * total_exec_time / SUM(total_exec_time) OVER ())::numeric, 2) AS pct,
  rows,
  query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

인덱스 사용률 점검

사용되지 않는 인덱스는 쓰기 성능을 저하시키고 디스크 공간을 낭비한다.

-- 사용되지 않는 인덱스 찾기
SELECT
  schemaname || '.' || indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS scan_count,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- 중복 인덱스 찾기
SELECT
  pg_size_pretty(SUM(pg_relation_size(idx))::bigint) AS total_size,
  (array_agg(idx))[1] AS idx1,
  (array_agg(idx))[2] AS idx2,
  (array_agg(indkey))[1] AS columns1,
  (array_agg(indkey))[2] AS columns2
FROM (
  SELECT indexrelid::regclass AS idx,
         indrelid, indkey,
         COALESCE(indexprs::text, '') AS exprs,
         COALESCE(indpred::text, '') AS preds
  FROM pg_index
) sub
GROUP BY indrelid, exprs, preds
HAVING COUNT(*) > 1;

잠금(Lock) 모니터링

-- 현재 잠금 대기 상황 확인
SELECT
  blocked_locks.pid AS blocked_pid,
  blocked_activity.usename AS blocked_user,
  blocking_locks.pid AS blocking_pid,
  blocking_activity.usename AS blocking_user,
  blocked_activity.query AS blocked_query,
  blocking_activity.query AS blocking_query,
  blocked_activity.wait_event_type
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
  AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
  AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
  AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
  AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
  AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
  AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

shared_buffers 및 work_mem 튜닝

# postgresql.conf - 메모리 관련 핵심 파라미터

# shared_buffers: 전체 RAM의 25% 권장 (최대 40%)
# 64GB RAM 서버 기준
shared_buffers = 16GB

# work_mem: 정렬, 해시 조인 등에 사용하는 세션당 메모리
# 주의: 쿼리당 여러 노드에서 각각 할당되므로 너무 크게 잡지 않는다
# (동시 접속 수 * 노드 수)를 고려하여 설정
work_mem = 64MB

# maintenance_work_mem: VACUUM, CREATE INDEX 등에 사용
maintenance_work_mem = 2GB

# effective_cache_size: OS 캐시를 포함한 총 캐시 크기 (플래너 힌트용)
# 전체 RAM의 50-75%
effective_cache_size = 48GB

# WAL 관련 설정
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

실패 사례와 복구 절차

사례 1: 인덱스 없는 JOIN으로 인한 서비스 장애

상황: 개발자가 새로운 리포트 쿼리를 프로덕션에 배포했는데, 1000만 건 테이블 두 개를 인덱스 없이 JOIN하여 Nested Loop으로 실행되었다. CPU 사용률 100%, 커넥션 풀 고갈, 전체 서비스 다운.

즉시 대응:

-- 1. 문제 쿼리를 실행 중인 세션 확인
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;

-- 2. 문제 쿼리 강제 종료 (graceful)
SELECT pg_cancel_backend(12345);

-- 3. 종료되지 않으면 강제 킬
SELECT pg_terminate_backend(12345);

근본 해결: 해당 쿼리에 필요한 인덱스를 CONCURRENTLY 옵션으로 추가한다.

-- CONCURRENTLY: 테이블 잠금 없이 인덱스 생성 (프로덕션 필수)
CREATE INDEX CONCURRENTLY idx_reports_user_date
  ON reports (user_id, report_date);

-- 인덱스 생성이 실패하면 INVALID 상태로 남는다. 반드시 확인!
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE schemaname = 'public' AND relname = 'reports';

사례 2: VACUUM FULL 실행 중 서비스 장애

상황: DBA가 테이블 Bloat 해소를 위해 프로덕션 업무 시간에 VACUUM FULL을 실행했다. ACCESS EXCLUSIVE 잠금으로 해당 테이블에 대한 모든 읽기/쓰기가 차단되어 서비스 장애 발생.

교훈: VACUUM FULL은 절대 프로덕션 업무 시간에 실행하지 않는다. 대신 pg_repack을 사용한다.

# pg_repack: 무중단으로 테이블 재구성 (짧은 잠금만 필요)
pg_repack -d myapp_production -t orders --no-superuser-check

# 특정 인덱스만 재구성
pg_repack -d myapp_production --index idx_orders_user_created

사례 3: XID Wraparound 임박 경고

상황: autovacuum이 장시간 실행 중인 트랜잭션에 의해 차단되어 XID가 소진 임박. PostgreSQL이 WARNING 로그를 출력하기 시작했다.

-- XID 소진 상태 확인
SELECT
  datname,
  age(datfrozenxid) AS xid_age,
  ROUND(100 * age(datfrozenxid)::numeric / 2147483647, 2) AS pct_toward_wraparound
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- 장시간 실행 중인 트랜잭션 확인 (autovacuum 차단 원인)
SELECT
  pid,
  now() - xact_start AS xact_duration,
  now() - query_start AS query_duration,
  state,
  query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND state != 'idle'
ORDER BY xact_start ASC;

-- 문제 트랜잭션 종료 후 수동 VACUUM FREEZE 실행
VACUUM FREEZE orders;

사례 4: 커넥션 풀 고갈

상황: 애플리케이션에서 트랜잭션을 열고 외부 API 호출을 한 후 닫는 패턴으로 인해, 외부 API 응답 지연 시 커넥션을 장시간 점유하여 풀이 고갈되었다.

근본 해결: 트랜잭션 범위를 최소화하고, 외부 API 호출은 트랜잭션 밖에서 수행한다. PgBouncer의 query_wait_timeout과 server_idle_timeout을 적절히 설정하여 유휴 연결을 회수한다.

;; pgbouncer.ini - 커넥션 고갈 방지
query_wait_timeout = 30        ;; 서버 연결 대기 최대 30초
server_idle_timeout = 60       ;; 60초 유휴 시 서버 연결 반납
server_lifetime = 3600         ;; 최대 1시간 후 서버 연결 재생성
client_idle_timeout = 300      ;; 5분 유휴 클라이언트 연결 해제

마치며

PostgreSQL 성능 최적화는 단일 기법이 아닌, 여러 계층의 최적화를 체계적으로 적용해야 달성할 수 있다.

쿼리 레벨: EXPLAIN ANALYZE로 실행 계획을 분석하고, 쿼리를 리라이팅한다
인덱스 레벨: 데이터 특성에 맞는 인덱스 타입을 선택하고, 복합 인덱스의 컬럼 순서를 최적화한다
테이블 레벨: 파티셔닝으로 대용량 테이블을 분할하고, VACUUM 설정을 테이블별로 튜닝한다
서버 레벨: shared_buffers, work_mem 등 메모리 파라미터를 워크로드에 맞게 조정한다
인프라 레벨: PgBouncer로 커넥션을 관리하고, 모니터링 체계를 구축한다

성능 최적화에서 가장 중요한 것은 측정이다. pg_stat_statements, EXPLAIN ANALYZE, pg_stat_user_tables 같은 도구로 항상 현재 상태를 파악하고, 변경 전후를 비교해야 한다. 감으로 하는 최적화는 오히려 상황을 악화시킬 수 있다.

프로덕션 환경에서는 항상 다음을 기억하자.

인덱스 생성은 CONCURRENTLY 옵션을 사용한다
VACUUM FULL 대신 pg_repack을 사용한다
Autovacuum을 절대 끄지 않는다
모니터링 없는 최적화는 의미가 없다

참고자료

PostgreSQL Query Optimization and Performance Tuning Practical Guide: From EXPLAIN to Partitioning

Introduction
Understanding EXPLAIN ANALYZE
Index Strategies: B-tree, GIN, GiST, BRIN
Query Rewriting and Execution Plan Optimization
VACUUM and Autovacuum Tuning
Table Partitioning Strategy
Connection Pooling (PgBouncer, Pgpool-II)
Index Type Comparison Table
- VACUUM Strategy Comparison Table
Operational Considerations and Troubleshooting
Failure Cases and Recovery Procedures
Conclusion
References

Introduction

PostgreSQL is an enterprise-grade open-source relational database highly regarded for its complex query processing capabilities and extensibility. However, no matter how excellent the DBMS, achieving expected performance in production without query optimization is difficult. As data grows to millions of rows and concurrent connections increase, a single slow query can bring down an entire service.

In a service I operated, a single query exceeding 30 seconds of execution time led to connection pool exhaustion, which cascaded into API timeouts. After analyzing the execution plan with EXPLAIN ANALYZE, adding appropriate indexes, and rewriting the query, the same query improved to under 50ms.

This article systematically covers performance optimization techniques you can immediately apply in production, from PostgreSQL's EXPLAIN analysis to index strategies, VACUUM tuning, table partitioning, and connection pooling.

Understanding EXPLAIN ANALYZE

Basic Usage

PostgreSQL query optimization always starts with EXPLAIN ANALYZE. This command actually executes the query while showing the estimated cost, actual execution time, and returned row count for each node.

-- Basic EXPLAIN ANALYZE usage
EXPLAIN ANALYZE
SELECT u.id, u.name, o.total_amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= '2026-01-01'
  AND o.status = 'completed'
ORDER BY o.total_amount DESC
LIMIT 100;

The output looks like this:

Limit  (cost=15234.56..15234.81 rows=100 width=52) (actual time=45.123..45.156 rows=100 loops=1)
  ->  Sort  (cost=15234.56..15347.89 rows=45332 width=52) (actual time=45.121..45.142 rows=100 loops=1)
        Sort Key: o.total_amount DESC
        Sort Method: top-N heapsort  Memory: 35kB
        ->  Hash Join  (cost=3456.78..13890.12 rows=45332 width=52) (actual time=12.345..38.901 rows=45332 loops=1)
              Hash Cond: (o.user_id = u.id)
              ->  Seq Scan on orders o  (cost=0.00..9876.54 rows=45332 width=20) (actual time=0.015..18.234 rows=45332 loops=1)
                    Filter: ((created_at >= '2026-01-01') AND (status = 'completed'))
                    Rows Removed by Filter: 154668
              ->  Hash  (cost=2345.00..2345.00 rows=100000 width=36) (actual time=12.123..12.123 rows=100000 loops=1)
                    Buckets: 131072  Batches: 1  Memory Usage: 6234kB
                    ->  Seq Scan on users u  (cost=0.00..2345.00 rows=100000 width=36) (actual time=0.008..5.678 rows=100000 loops=1)
Planning Time: 0.456 ms
Execution Time: 45.234 ms

Key Reading Points

The key metrics to focus on in execution plans:

actual time: Actual execution time in milliseconds. The first value is time to first row, the second is total completion time.
rows: A large difference between estimated rows (next to cost) and actual rows (next to actual) signals stale statistics.
Seq Scan: Sequential scan of the entire table. If this appears on large tables, consider adding an index.
Rows Removed by Filter: A high number means unnecessary I/O is occurring.

Using the BUFFERS Option

For deeper analysis, using the BUFFERS option reveals disk I/O and cache hit information.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT *
FROM orders
WHERE user_id = 12345
  AND created_at BETWEEN '2026-01-01' AND '2026-03-14';

Index Scan using idx_orders_user_created on orders  (cost=0.43..8.45 rows=1 width=64) (actual time=0.023..0.025 rows=3 loops=1)
  Index Cond: ((user_id = 12345) AND (created_at >= '2026-01-01') AND (created_at <= '2026-03-14'))
  Buffers: shared hit=4
Planning Time: 0.123 ms
Execution Time: 0.045 ms

Here shared hit=4 means 4 pages were read from the shared buffer (cache). If you see shared read=N, it means reads from disk, suggesting you may need to increase shared_buffers or optimize the query.

Visualization Tools

If text-based execution plans are complex, use these tools:

explain.depesz.com - Paste execution plans and bottlenecks are highlighted with colors
pgexplain.dev - Provides per-node visualization and statistical analysis

Index Strategies: B-tree, GIN, GiST, BRIN

B-tree Index (Default)

B-tree is PostgreSQL's default index type, most suitable for equality comparisons and range searches. When no type is specified in CREATE INDEX, a B-tree is created.

-- Single column B-tree index
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- Composite index: leading column order matters
-- Optimal for queries like WHERE user_id = ? AND created_at >= ?
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);

-- Covering index: enables Index-Only Scan
CREATE INDEX idx_orders_covering ON orders (user_id, created_at)
  INCLUDE (status, total_amount);

-- Partial index: indexes only rows matching specific conditions to reduce size
CREATE INDEX idx_orders_active ON orders (user_id, created_at)
  WHERE status = 'active';

The most important principle in composite index design is leading column selectivity. Place columns used with equality conditions in WHERE first, followed by range condition columns. In the example above, user_id = ? is equality and created_at >= ? is range, so user_id is the leading column.

GIN (Generalized Inverted Index)

GIN indexes are optimal when a single row contains multiple values. Primarily used for JSONB, arrays, and full-text search (tsvector).

-- GIN index on JSONB field
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- jsonb_path_ops: more efficient for specific operators with smaller index size
CREATE INDEX idx_products_metadata_path ON products USING GIN (metadata jsonb_path_ops);

-- GIN index for full-text search
CREATE INDEX idx_articles_search ON articles USING GIN (
  to_tsvector('korean', title || ' ' || content)
);

-- GIN on array column
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

The key caveat with GIN is write performance degradation. GIN indexes have high update costs, so for write-heavy tables, the fastupdate = on (default) setting batches updates to a pending list, but if the pending list grows too large, read performance can suffer, requiring gin_pending_list_limit adjustment.

GiST (Generalized Search Tree)

GiST indexes are primarily used for geographic data (PostGIS), range types, and proximity searches.

-- PostGIS spatial index
CREATE INDEX idx_stores_location ON stores USING GiST (location);

-- Range type index
CREATE INDEX idx_reservations_period ON reservations USING GiST (
  tstzrange(start_time, end_time)
);

-- Range overlap search
SELECT * FROM reservations
WHERE tstzrange(start_time, end_time) && tstzrange('2026-03-14 09:00', '2026-03-14 18:00');

BRIN (Block Range INdex)

BRIN indexes are extremely efficient when there's high correlation between physical table order and data values. For time-series data (logs, events), index size can be reduced to 1/100th of B-tree.

-- BRIN index on time-series log table
CREATE INDEX idx_logs_created_brin ON access_logs USING BRIN (created_at)
  WITH (pages_per_range = 32);

-- Verify BRIN effectiveness: correlation should be close to 1
SELECT correlation
FROM pg_stats
WHERE tablename = 'access_logs' AND attname = 'created_at';

A correlation value of 0.9+ makes BRIN very effective; below 0.5, B-tree is better.

Query Rewriting and Execution Plan Optimization

Solving the N+1 Problem

The N+1 problem commonly encountered with ORMs is the biggest enemy of database performance.

-- Bad: Queries executed in N+1 pattern from application
-- 1) SELECT * FROM users WHERE department = 'engineering';  -- returns 50
-- 2) SELECT * FROM orders WHERE user_id = 1;  -- repeated 50 times
-- 3) SELECT * FROM orders WHERE user_id = 2;
-- ...

-- Good: Fetch in one go with JOIN
SELECT u.id, u.name, o.id AS order_id, o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.department = 'engineering';

-- Good: LATERAL JOIN to get only 3 most recent orders per user
SELECT u.id, u.name, recent_orders.*
FROM users u
CROSS JOIN LATERAL (
  SELECT o.id, o.total_amount, o.created_at
  FROM orders o
  WHERE o.user_id = u.id
  ORDER BY o.created_at DESC
  LIMIT 3
) recent_orders
WHERE u.department = 'engineering';

Subquery vs JOIN Optimization

-- Bad: Correlated subquery - subquery may execute for each outer row
SELECT u.id, u.name,
  (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count
FROM users u
WHERE u.status = 'active';

-- Good: Using GROUP BY with JOIN
SELECT u.id, u.name, COALESCE(oc.cnt, 0) AS order_count
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS cnt
  FROM orders
  GROUP BY user_id
) oc ON u.id = oc.user_id
WHERE u.status = 'active';

EXISTS vs IN Strategy

-- EXISTS is often advantageous for large datasets
-- EXISTS stops immediately upon finding the first match
SELECT u.id, u.name
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id
    AND o.created_at >= '2026-01-01'
);

-- IN is advantageous when the subquery result is small
SELECT u.id, u.name
FROM users u
WHERE u.department_id IN (
  SELECT id FROM departments WHERE region = 'APAC'
);

Pagination Optimization

OFFSET-based pagination causes serious performance issues on large tables. To fetch 10 rows starting from the millionth row, it first scans and discards a million rows.

-- Bad: Gets slower as OFFSET increases
SELECT * FROM orders
ORDER BY id
OFFSET 1000000 LIMIT 10;

-- Good: Keyset pagination - consistent performance
SELECT * FROM orders
WHERE id > 1000000  -- last id from previous page
ORDER BY id
LIMIT 10;

-- Keyset pagination with composite sort criteria
SELECT * FROM orders
WHERE (created_at, id) > ('2026-03-13 15:30:00', 98765)
ORDER BY created_at, id
LIMIT 10;

VACUUM and Autovacuum Tuning

Why VACUUM Is Needed

Thanks to PostgreSQL's MVCC architecture, UPDATE and DELETE operations don't immediately remove old rows but leave them as Dead Tuples. VACUUM is the essential maintenance operation that cleans up these Dead Tuples to make space reusable.

Neglecting VACUUM leads to:

Table Bloat: Table size becomes abnormally large relative to actual data
Index Bloat: Query performance degrades as index sizes grow
XID Wraparound: A fatal failure where the database automatically switches to read-only mode when transaction IDs are exhausted

Autovacuum Core Parameter Tuning

# postgresql.conf - Recommended Autovacuum tuning settings

# Enable Autovacuum (never disable this)
autovacuum = on

# Number of concurrent autovacuum workers
# Set considering CPU cores and table count
autovacuum_max_workers = 5

# VACUUM trigger threshold: runs when dead tuples exceed 10% of table + 50 rows
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.1

# For large tables, scale_factor should be lowered
# On a 100M row table, 0.2 (default) means 20M dead tuples before VACUUM runs

# ANALYZE trigger: updates statistics when changes exceed 5% of table + 50 rows
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.05

# Cost-based delay: higher means faster VACUUM but more I/O load
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 1000

# XID Wraparound prevention settings
autovacuum_freeze_max_age = 300000000

Per-Table Tuning

High-volume transaction tables need more than global settings. Autovacuum parameters can be set individually per table.

-- High-volume orders table: more frequent VACUUM execution
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,      -- VACUUM even at 1% changes
  autovacuum_vacuum_threshold = 1000,
  autovacuum_analyze_scale_factor = 0.005,
  autovacuum_vacuum_cost_delay = 0             -- VACUUM at maximum speed
);

-- VACUUM status monitoring
SELECT
  schemaname,
  relname,
  n_live_tup,
  n_dead_tup,
  ROUND(n_dead_tup::numeric / GREATEST(n_live_tup, 1) * 100, 2) AS dead_ratio_pct,
  last_vacuum,
  last_autovacuum,
  last_analyze,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

Table Partitioning Strategy

When to Apply Partitioning

Partitioning is a technique that physically divides a logically single large table into multiple partitions. Consider partitioning when:

Table size exceeds tens of GB and continues growing
Queries always access only specific ranges of data (e.g., last 3 months)
Old data needs periodic deletion (DROP PARTITION instead of DELETE)
VACUUM takes too long to process a large table

Range Partitioning (Date-based)

-- Create monthly range-partitioned table
CREATE TABLE events (
  id          BIGSERIAL,
  event_type  VARCHAR(50) NOT NULL,
  payload     JSONB,
  created_at  TIMESTAMPTZ NOT NULL,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2026_01 PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
  FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- Create index on each partition (auto-inherited when created on parent)
CREATE INDEX idx_events_type_created ON events (event_type, created_at);

-- Verify partition pruning: query only March 2026 data
EXPLAIN ANALYZE
SELECT * FROM events
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01'
  AND event_type = 'purchase';

In the execution plan, you can confirm that only the 2026_03 partition is scanned under the Append node. This is Partition Pruning.

Automated Partition Management

The pg_partman extension automates partition creation and deletion.

-- Automatic management setup after installing pg_partman
CREATE EXTENSION pg_partman;

SELECT partman.create_parent(
  p_parent_table => 'public.events',
  p_control => 'created_at',
  p_type => 'range',
  p_interval => '1 month',
  p_premake => 3  -- pre-create 3 months ahead
);

-- Automatic old partition deletion (12 month retention)
UPDATE partman.part_config
SET retention = '12 months',
    retention_keep_table = false
WHERE parent_table = 'public.events';

Old Data Cleanup: DELETE vs DROP PARTITION

-- Bad: Mass DELETE creates Dead Tuples and VACUUM load
DELETE FROM events WHERE created_at < '2025-01-01';

-- Good: Detach and drop entire partition - instant, no VACUUM needed
ALTER TABLE events DETACH PARTITION events_2024_12;
DROP TABLE events_2024_12;

Connection Pooling (PgBouncer, Pgpool-II)

Why Connection Pooling Is Needed

PostgreSQL forks one process per connection. Each connection consumes approximately 5-10MB of memory, and when concurrent connections exceed hundreds, performance degrades sharply due to context switching overhead. Simply increasing max_connections is not the solution.

PgBouncer Configuration

PgBouncer is a lightweight connection pooler that can handle thousands of client connections in a single process.

;; pgbouncer.ini

[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
;; Pooling mode: transaction is most efficient
pool_mode = transaction

;; Maximum client connections
max_client_conn = 1000

;; Server connections per database (actual PostgreSQL connections)
default_pool_size = 30
min_pool_size = 10
reserve_pool_size = 5

;; Idle connection timeout
server_idle_timeout = 300

;; Client connection queue timeout
client_login_timeout = 60

;; Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1

;; Monitoring port
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

The differences between pooling modes:

session: Holds server connection until client disconnects. Highest compatibility but lowest efficiency.
transaction: Returns server connection when transaction ends. The optimal choice for most cases.
statement: Returns connection after each SQL statement. Requires caution as multi-statement transactions can have issues.

Pgpool-II vs PgBouncer Selection Criteria

Pgpool-II is middleware that provides load balancing, failover, and query caching in addition to connection pooling. If you only need connection pooling, PgBouncer is lighter and faster, but if you need an integrated solution for managing PostgreSQL clusters directly, consider Pgpool-II.

# Check PgBouncer status
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"

# Check active connections and wait status
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer -c "SHOW CLIENTS;"

# Check actual connection count on PostgreSQL side
psql -c "SELECT count(*) FROM pg_stat_activity WHERE state = 'active';"

Index Type Comparison Table

Index Type	Suitable Data	Suitable Operators	Index Size	Write Cost	Primary Use Cases
B-tree	Scalar values (int, string, date)	=, <, >, BETWEEN, IN, IS NULL	Medium	Low	PK, FK, range search, sorting
GIN	Composite values (JSONB, array, tsvector)	@>, ?\|, ?&, @@ (full-text)	Large	High	JSONB search, full-text, tags
GiST	Geometric/spatial data, range types	&&, @>, <@, <-> (distance)	Medium	Medium	PostGIS, IP ranges, time ranges
BRIN	Data with high physical correlation	=, <, >, BETWEEN	Very small	Very low	Time-series logs, append-only tables
Hash	When only equality comparison needed	=	Medium	Low	Exact value matching (WAL support PG 10+)
SP-GiST	Unbalanced tree structure data	Phone numbers, IP, geometry	Medium	Medium	Prefix search, quadtree

VACUUM Strategy Comparison Table

Strategy	Target	When to Use	Lock Impact	Duration
VACUUM	Dead tuple cleanup	Routine maintenance	Table read/write allowed	Proportional to size
VACUUM FULL	Table rewrite for disk reclaim	When bloat is severe	ACCESS EXCLUSIVE lock (service down)	Very long
VACUUM FREEZE	XID Wraparound prevention	Freeze old transaction IDs	Read/write allowed	Full table scan
Autovacuum	Automatic dead tuple cleanup	Always enabled	Read/write allowed	Adjustable by settings
pg_repack	Zero-downtime table restructuring	Instead of VACUUM FULL	Only brief lock needed	Proportional to size
ANALYZE	Statistics refresh	After bulk INSERT/UPDATE	Read/write allowed	Fast (sampling-based)

Operational Considerations and Troubleshooting

Slow Query Monitoring Setup

# postgresql.conf - Slow query logging
log_min_duration_statement = 500       # Log queries over 500ms
log_statement = 'none'                  # Use 'ddl' for DDL only
log_lock_waits = on                     # Log lock waits
deadlock_timeout = 1s                   # Deadlock detection interval

# Enable pg_stat_statements extension (essential)
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = 'all'

How to query the top 10 slowest queries with pg_stat_statements:

-- Install pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by average execution time
SELECT
  queryid,
  calls,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  ROUND(mean_exec_time::numeric, 2) AS avg_ms,
  ROUND((100 * total_exec_time / SUM(total_exec_time) OVER ())::numeric, 2) AS pct,
  rows,
  query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Index Usage Audit

Unused indexes degrade write performance and waste disk space.

-- Find unused indexes
SELECT
  schemaname || '.' || indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS scan_count,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find duplicate indexes
SELECT
  pg_size_pretty(SUM(pg_relation_size(idx))::bigint) AS total_size,
  (array_agg(idx))[1] AS idx1,
  (array_agg(idx))[2] AS idx2,
  (array_agg(indkey))[1] AS columns1,
  (array_agg(indkey))[2] AS columns2
FROM (
  SELECT indexrelid::regclass AS idx,
         indrelid, indkey,
         COALESCE(indexprs::text, '') AS exprs,
         COALESCE(indpred::text, '') AS preds
  FROM pg_index
) sub
GROUP BY indrelid, exprs, preds
HAVING COUNT(*) > 1;

Lock Monitoring

-- Check current lock wait situations
SELECT
  blocked_locks.pid AS blocked_pid,
  blocked_activity.usename AS blocked_user,
  blocking_locks.pid AS blocking_pid,
  blocking_activity.usename AS blocking_user,
  blocked_activity.query AS blocked_query,
  blocking_activity.query AS blocking_query,
  blocked_activity.wait_event_type
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
  AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
  AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
  AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
  AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
  AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
  AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

shared_buffers and work_mem Tuning

# postgresql.conf - Core memory parameters

# shared_buffers: Recommended 25% of total RAM (max 40%)
# Based on 64GB RAM server
shared_buffers = 16GB

# work_mem: Per-session memory for sorting, hash joins, etc.
# Caution: allocated per node per query, so don't set too high
# Consider (concurrent connections * nodes per query)
work_mem = 64MB

# maintenance_work_mem: Used for VACUUM, CREATE INDEX, etc.
maintenance_work_mem = 2GB

# effective_cache_size: Total cache size including OS cache (planner hint)
# 50-75% of total RAM
effective_cache_size = 48GB

# WAL settings
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

Failure Cases and Recovery Procedures

Case 1: Service Outage from Unindexed JOIN

Situation: A developer deployed a new report query to production that JOINed two 10-million-row tables without indexes, executing as a Nested Loop. CPU at 100%, connection pool exhausted, entire service down.

Immediate Response:

-- 1. Identify sessions running the problematic query
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;

-- 2. Gracefully cancel the problematic query
SELECT pg_cancel_backend(12345);

-- 3. Force kill if it doesn't terminate
SELECT pg_terminate_backend(12345);

Root Fix: Add the required index using the CONCURRENTLY option.

-- CONCURRENTLY: Create index without table lock (essential for production)
CREATE INDEX CONCURRENTLY idx_reports_user_date
  ON reports (user_id, report_date);

-- Index creation can fail and remain in INVALID state. Always verify!
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_user_indexes
WHERE schemaname = 'public' AND relname = 'reports';

Case 2: Service Outage During VACUUM FULL

Situation: A DBA ran VACUUM FULL during production hours to address table bloat. The ACCESS EXCLUSIVE lock blocked all reads/writes to the table, causing a service outage.

Lesson: Never run VACUUM FULL during production hours. Use pg_repack instead.

# pg_repack: Zero-downtime table restructuring (only brief lock needed)
pg_repack -d myapp_production -t orders --no-superuser-check

# Restructure specific index only
pg_repack -d myapp_production --index idx_orders_user_created

Case 3: Imminent XID Wraparound Warning

Situation: Autovacuum was blocked by a long-running transaction, XID exhaustion imminent. PostgreSQL began outputting WARNING logs.

-- Check XID exhaustion status
SELECT
  datname,
  age(datfrozenxid) AS xid_age,
  ROUND(100 * age(datfrozenxid)::numeric / 2147483647, 2) AS pct_toward_wraparound
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- Find long-running transactions (autovacuum blocking cause)
SELECT
  pid,
  now() - xact_start AS xact_duration,
  now() - query_start AS query_duration,
  state,
  query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND state != 'idle'
ORDER BY xact_start ASC;

-- After terminating problematic transaction, run manual VACUUM FREEZE
VACUUM FREEZE orders;

Case 4: Connection Pool Exhaustion

Situation: The application pattern of opening a transaction, calling an external API, then closing led to connections being held for extended periods during external API delays, exhausting the pool.

Root Fix: Minimize transaction scope and perform external API calls outside transactions. Set appropriate query_wait_timeout and server_idle_timeout in PgBouncer to reclaim idle connections.

;; pgbouncer.ini - Connection exhaustion prevention
query_wait_timeout = 30        ;; Max 30s waiting for server connection
server_idle_timeout = 60       ;; Return server connection after 60s idle
server_lifetime = 3600         ;; Recreate server connection after max 1 hour
client_idle_timeout = 300      ;; Disconnect idle client after 5 minutes

Conclusion

PostgreSQL performance optimization cannot be achieved with a single technique — it requires systematic application of optimizations across multiple layers.

Query level: Analyze execution plans with EXPLAIN ANALYZE and rewrite queries
Index level: Choose index types matching data characteristics and optimize composite index column order
Table level: Split large tables with partitioning and tune VACUUM settings per table
Server level: Adjust memory parameters like shared_buffers and work_mem for the workload
Infrastructure level: Manage connections with PgBouncer and establish monitoring

The most important thing in performance optimization is measurement. Always understand the current state using tools like pg_stat_statements, EXPLAIN ANALYZE, and pg_stat_user_tables, and compare before and after changes. Optimization by intuition can actually make things worse.

In production, always remember:

Use the CONCURRENTLY option for index creation
Use pg_repack instead of VACUUM FULL
Never disable Autovacuum
Optimization without monitoring is meaningless