
Database Performance Tuning Complete Guide: Query Optimization, Index Strategy, and Connection Pool Management

TL;DR

  • EXPLAIN ANALYZE: the essential tool for analyzing the actual cost and row counts of a query execution plan
  • Index strategy: B-Tree (default), GIN (full-text search), BRIN (time-series), plus partial/expression indexes
  • Slow queries: identify bottleneck queries with pg_stat_statements and the slow query log
  • N+1 problem: solve with ORM eager loading, BatchSize, or JOINs
  • Connection pooling: manage connections with HikariCP (Java) and PgBouncer (PostgreSQL)
  • Partitioning: split large tables by Range/List/Hash
  • Caching: offload reads with Redis/Memcached

Table of Contents

  1. Why the DB Becomes the Bottleneck
  2. EXPLAIN ANALYZE Deep Dive
  3. Index Strategy
  4. Slow Query Analysis
  5. The N+1 Query Problem
  6. Connection Pooling
  7. Query Optimization Patterns
  8. Partitioning
  9. Read Replicas and Caching
  10. PostgreSQL-Specific Optimization
  11. MySQL-Specific Optimization
  12. Practical Quiz
  13. References

1. Why the DB Becomes the Bottleneck

1.1 Common Bottleneck Points

Most performance bottlenecks in a web application (often cited as 80% or more) originate in the database.

Where the latency appears in the request flow:
Client -> CDN -> Load Balancer -> App Server -> [Database]
                                                most of the delay occurs here

Bottleneck type       | Cause                              | Impact
Slow queries          | Missing indexes, inefficient JOINs | Longer response times
Lock contention       | Concurrent update conflicts        | Reduced throughput
Connection exhaustion | Pool too small                     | Requests queue or fail
Excessive I/O         | Full table scans                   | CPU/disk overload
Network round trips   | N+1 queries                        | Unnecessary RTT increase

1.2 Performance Tuning Priorities

Work in order of expected impact, largest first.

Performance impact pyramid:

        /\
       /  \        Query optimization (largest impact)
      /____\
     /      \      Index strategy
    /________\
   /          \    Schema design
  /____________\
 /              \  Hardware/config tuning
/________________\ Caching / read replicas

2. EXPLAIN ANALYZE Deep Dive

2.1 Basic Usage

-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2025-01-01'
  AND o.total > 100;

2.2 Reading the Execution Plan

QUERY PLAN
---------------------------------------------------------------------------
Nested Loop  (cost=0.85..892.45 rows=23 width=48)
             (actual time=0.045..2.341 rows=19 loops=1)
  ->  Index Scan using idx_orders_created_at on orders o
        (cost=0.42..445.23 rows=23 width=20)
        (actual time=0.025..1.123 rows=19 loops=1)
        Filter: (total > 100)
        Rows Removed by Filter: 5
  ->  Index Scan using users_pkey on users u
        (cost=0.42..8.44 rows=1 width=36)
        (actual time=0.008..0.008 rows=1 loops=19)
Planning Time: 0.234 ms
Execution Time: 2.456 ms

How to read the key fields:

Field                  | Meaning                       | Caveats
cost=0.85..892.45      | Estimated startup..total cost | Units are arbitrary cost units
rows=23                | Estimated row count           | A large gap vs. actual rows means statistics need refreshing
actual time            | Measured time (ms)            | Multiply by loops to get the node's total time
loops=19               | Number of iterations          | Matters for Nested Loop inner sides
Rows Removed by Filter | Rows discarded by the filter  | A high count suggests the index can be improved
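The loops caveat above is worth making concrete. A small sketch (plain Python, no database needed) of how to attribute time to a plan node:

```python
def node_total_ms(actual_time_per_loop_ms: float, loops: int) -> float:
    """Total time attributable to a plan node: per-loop actual time x loops."""
    return actual_time_per_loop_ms * loops

# Inner Index Scan from the plan above: actual time 0.008 ms, loops=19
print(round(node_total_ms(0.008, 19), 3))  # 0.152 ms in total, not 0.008 ms
```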

2.3 Understanding Scan Types

-- 1. Seq Scan - reads the entire table
-- Fine for small tables or when most rows must be read anyway
EXPLAIN ANALYZE SELECT * FROM small_table;

-- 2. Index Scan - locates rows via the index, then fetches them from the table
-- Suited to highly selective queries (few rows returned)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- 3. Index Only Scan - answers the query from the index alone (no table access)
-- Possible when a covering index exists
EXPLAIN ANALYZE SELECT id, email FROM users WHERE email = 'user@example.com';

-- 4. Bitmap Index Scan - builds a bitmap from the index, then visits the table
-- Suited to medium-selectivity queries
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending' AND total > 100;

Typical scan performance ordering:

Index Only Scan  >  Index Scan  >  Bitmap Scan  >  Seq Scan
(fastest)                                          (slowest)

2.4 EXPLAIN Options

-- PostgreSQL EXPLAIN options
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 42;

-- BUFFERS: include I/O information
-- FORMAT JSON: output as JSON
-- VERBOSE: include extra detail
-- COSTS: include cost estimates (ON by default)
-- TIMING: include timing information (ON by default)

-- MySQL EXPLAIN options
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;

-- MySQL 8.0+ supports a tree-format plan
EXPLAIN FORMAT=TREE
SELECT * FROM orders WHERE user_id = 42;

3. Index Strategy

3.1 B-Tree Indexes (the Default)

B-Tree is the most common index type and suits both equality and range searches.

-- Single-column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Composite index usage (Leftmost Prefix Rule)
-- idx_orders_user_date can serve these queries:
-- WHERE user_id = 1                          (O)
-- WHERE user_id = 1 AND created_at > '2025'  (O)
-- WHERE created_at > '2025'                  (X) leading column missing
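The leftmost-prefix rule can be sketched as a toy matcher (a deliberate simplification for illustration; a real planner also considers range conditions, sort order, and more):

```python
def usable_prefix(index_cols, predicate_cols):
    """Return the leading run of index columns that the query's predicates
    can use (toy model of the leftmost-prefix rule)."""
    usable = []
    for col in index_cols:
        if col not in predicate_cols:
            break  # a gap stops index usage for the remaining columns
        usable.append(col)
    return usable

idx = ["user_id", "created_at"]
print(usable_prefix(idx, {"user_id"}))                # ['user_id']
print(usable_prefix(idx, {"user_id", "created_at"}))  # ['user_id', 'created_at']
print(usable_prefix(idx, {"created_at"}))             # [] - leading column missing
```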

3.2 Index Type Comparison

Type      | Use case                     | PostgreSQL | MySQL
B-Tree    | Equality, range, sorting     | Default    | Default
Hash      | Equality only                | Supported  | Supported
GIN       | Full-text, arrays, JSONB     | Supported  | Not supported
GiST      | Geometric/spatial, ranges    | Supported  | Not supported
BRIN      | Time-series, sequential data | Supported  | Not supported
Full-Text | Full-text search             | Supported  | Supported

3.3 GIN Indexes (Full-Text Search / JSONB)

-- GIN index for full-text search
CREATE INDEX idx_articles_search
ON articles USING GIN(to_tsvector('english', title || ' ' || body));

-- GIN index on a JSONB column
CREATE INDEX idx_events_data ON events USING GIN(metadata);

-- GIN index on a specific JSONB path
CREATE INDEX idx_events_type ON events USING GIN((metadata -> 'type'));

-- Usage examples
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('postgresql & tuning');

SELECT * FROM events
WHERE metadata @> '{"type": "click", "page": "/home"}';

3.4 BRIN Indexes (Time-Series Data)

-- BRIN index - very efficient for time-series data
-- Can be over 100x smaller than the equivalent B-Tree
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);

-- Effective only when values correlate strongly with physical row order
-- How to check:
SELECT correlation
FROM pg_stats
WHERE tablename = 'logs' AND attname = 'created_at';
-- A correlation close to 1 means BRIN is a good fit

3.5 부분 인덱스와 표현식 인덱스

-- 부분 인덱스 (Partial Index) - 특정 조건의 행만 인덱싱
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';
-- 전체 행의 5%만 인덱싱 -> 크기 95% 절약

-- 표현식 인덱스 (Expression Index)
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
-- WHERE LOWER(email) = 'user@example.com' 쿼리에 활용

-- 커버링 인덱스 (Covering Index)
CREATE INDEX idx_orders_covering
ON orders(user_id, created_at) INCLUDE (total, status);
-- Index Only Scan 가능 -> 테이블 접근 불필요

3.6 Index Design Checklist

Index design checklist:

Must verify:
  [x] Indexes on columns used frequently in WHERE clauses
  [x] Indexes on JOIN condition columns
  [x] ORDER BY columns covered by an index
  [x] Column order in composite indexes optimized
  [x] High-selectivity columns placed first

Cautions:
  [x] Write vs. read performance trade-off
  [x] Remove unused indexes
  [x] Monitor index sizes
  [x] Manage index bloat (PostgreSQL)

4. Slow Query Analysis

4.1 PostgreSQL - pg_stat_statements

-- Enable pg_stat_statements (postgresql.conf)
-- shared_preload_libraries = 'pg_stat_statements'

-- Install the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slow queries (by total time)
SELECT
  substring(query, 1, 80) AS short_query,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with the highest average execution time
SELECT
  substring(query, 1, 100) AS short_query,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  rows / NULLIF(calls, 0) AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 10;

4.2 MySQL - Slow Query Log

# my.cnf - slow query log settings
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow-query.log
long_query_time = 1.0          # log queries taking 1s or longer
log_queries_not_using_indexes = 1  # also log queries that use no index
min_examined_row_limit = 100   # only if at least 100 rows were examined

# Analyze the slow query log with mysqldumpslow
mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log

# pt-query-digest (Percona Toolkit)
pt-query-digest /var/log/mysql/slow-query.log \
  --limit 10 \
  --order-by query_time:sum

4.3 Automated Query Monitoring

-- PostgreSQL: find currently long-running queries
SELECT
  pid,
  now() - pg_stat_activity.query_start AS duration,
  query,
  state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 seconds'
  AND state != 'idle'
ORDER BY duration DESC;

-- Find queries waiting on locks
SELECT
  blocked.pid AS blocked_pid,
  blocked.query AS blocked_query,
  blocking.pid AS blocking_pid,
  blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid
JOIN pg_locks kl ON kl.locktype = bl.locktype
  AND kl.database IS NOT DISTINCT FROM bl.database
  AND kl.relation IS NOT DISTINCT FROM bl.relation
JOIN pg_stat_activity blocking ON kl.pid = blocking.pid
WHERE NOT bl.granted AND kl.granted;

5. The N+1 Query Problem

5.1 What Is the N+1 Problem?

# N+1 example (Python/SQLAlchemy)

# BAD: triggers N+1 queries
users = session.query(User).all()  # 1 query
for user in users:
    print(user.orders)  # N extra queries (one per user)

# SQL actually executed:
# SELECT * FROM users;                     -- 1
# SELECT * FROM orders WHERE user_id = 1;  -- +1
# SELECT * FROM orders WHERE user_id = 2;  -- +1
# SELECT * FROM orders WHERE user_id = 3;  -- +1
# ... (one per user, N in total)
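The ORM fixes that follow all boil down to the same idea: collect the ids and fetch the related rows in one batch. A framework-free sketch of that idea, with in-memory lists standing in for the two tables:

```python
# Hypothetical in-memory stand-ins for the users and orders tables
USERS = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
ORDERS = [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 1}, {"id": 12, "user_id": 3}]

def fetch_orders_batched(users, orders_table):
    """One IN-style lookup instead of one query per user (2 queries total)."""
    ids = {u["id"] for u in users}
    by_user = {}
    for o in orders_table:            # stands in for: WHERE user_id IN (...)
        if o["user_id"] in ids:
            by_user.setdefault(o["user_id"], []).append(o)
    return {u["id"]: by_user.get(u["id"], []) for u in users}

grouped = fetch_orders_batched(USERS, ORDERS)
print(len(grouped[1]), len(grouped[2]), len(grouped[3]))  # 2 0 1
```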

5.2 Solutions

# Solution 1: eager loading (JOIN)
users = session.query(User).options(
    joinedload(User.orders)
).all()
# SQL: SELECT * FROM users LEFT JOIN orders ON ...   -- 1 query

# Solution 2: subquery loading
users = session.query(User).options(
    subqueryload(User.orders)
).all()
# SQL: SELECT * FROM users;                          -- 1 query
# SQL: SELECT * FROM orders WHERE user_id IN (...);  -- 1 query

# Solution 3: selectinload (recommended in SQLAlchemy)
users = session.query(User).options(
    selectinload(User.orders)
).all()

// Node.js (Prisma) - fixing N+1
// BAD: N+1
const users = await prisma.user.findMany();
for (const user of users) {
  const orders = await prisma.order.findMany({
    where: { userId: user.id }
  });
}

// GOOD: solved with include
const users = await prisma.user.findMany({
  include: { orders: true }
});

// GOOD: select only the fields you need
const users = await prisma.user.findMany({
  select: {
    id: true,
    name: true,
    orders: {
      select: { id: true, total: true },
      where: { total: { gt: 100 } }
    }
  }
});

// Java (JPA/Hibernate)
// BAD: LAZY loading causes N+1
@Entity
public class User {
    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY)
    private List<Order> orders;
}

// GOOD 1: JPQL JOIN FETCH
@Query("SELECT u FROM User u JOIN FETCH u.orders")
List<User> findAllWithOrders();

// GOOD 2: EntityGraph
@EntityGraph(attributePaths = {"orders"})
@Query("SELECT u FROM User u")
List<User> findAllWithOrders();

// GOOD 3: BatchSize (Hibernate)
@Entity
public class User {
    @OneToMany(mappedBy = "user")
    @BatchSize(size = 100)
    private List<Order> orders;
}

5.3 Detecting N+1 Queries

# Spring Boot - detecting Hibernate N+1
# application.yml
spring:
  jpa:
    properties:
      hibernate:
        generate_statistics: true
        session.events.log.LOG_QUERIES_SLOWER_THAN_MS: 25

logging:
  level:
    org.hibernate.SQL: DEBUG
    org.hibernate.stat: DEBUG

# Django - django-debug-toolbar or nplusone
# settings.py
INSTALLED_APPS = [
    'nplusone.ext.django',
]

NPLUSONE_RAISE = True  # raise an error when N+1 is detected (development only)

6. Connection Pooling

6.1 Why Do We Need a Connection Pool?

Creating a DB connection is expensive: TCP handshake + authentication + session setup can take tens of milliseconds.

Without a pool:
Request -> [create connection 30ms] -> [query 5ms] -> [close connection] = 35ms

With a pool:
Request -> [borrow existing connection 0.1ms] -> [query 5ms] -> [return to pool] = 5.1ms
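The borrow/return mechanics can be sketched with a thread-safe queue (illustrative only; real pools such as HikariCP add validation, timeouts, and connection lifecycle management):

```python
import queue

class MiniPool:
    """Toy connection pool: pre-create N connections, borrow/return via a queue."""
    def __init__(self, factory, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())   # pay the creation cost once, up front

    def acquire(self, timeout=30):
        return self._q.get(timeout=timeout)  # blocks when the pool is exhausted

    def release(self, conn):
        self._q.put(conn)

# Usage with a dummy "connection" factory
pool = MiniPool(factory=lambda: object(), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()      # reuses c1 instead of creating a new connection
print(c3 is c1)  # True
```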

6.2 HikariCP Settings (Java)

# application.yml - HikariCP tuning
spring:
  datasource:
    hikari:
      # Pool size formula: connections = (core_count * 2) + spinning_disks
      # 4-core SSD server: (4 * 2) + 0 = 8
      maximum-pool-size: 10
      minimum-idle: 5

      # Connection timeout (max wait for a connection from the pool)
      connection-timeout: 30000   # 30 s

      # Idle connection eviction time
      idle-timeout: 600000        # 10 min

      # Maximum connection lifetime
      max-lifetime: 1800000       # 30 min

      # Connection validation query
      connection-test-query: "SELECT 1"

      # Connection leak detection
      leak-detection-threshold: 60000  # 60 s

      # Pool name (for monitoring)
      pool-name: "MyApp-HikariPool"

6.3 PgBouncer Settings (PostgreSQL)

# pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp

[pgbouncer]
# Pooling mode
# session: one server connection per client session (default)
# transaction: one per transaction (recommended)
# statement: one per statement (restrictive)
pool_mode = transaction

# Pool sizes
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5

# Connection limits
max_client_conn = 200
max_db_connections = 50

# Timeouts
server_idle_timeout = 600
client_idle_timeout = 300
query_timeout = 30
client_login_timeout = 60

# Authentication
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60

# Listener / admin interface
listen_addr = 0.0.0.0
listen_port = 6432
admin_users = pgbouncer

6.4 Connection Pool Monitoring

-- Check PgBouncer status
SHOW pools;
SHOW stats;
SHOW servers;
SHOW clients;

-- Count PostgreSQL connections by state
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle') AS idle,
       count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_txn
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Check the connection limit
SHOW max_connections;

7. Query Optimization Patterns

7.1 Ten Anti-Patterns

-- Anti-pattern 1: SELECT *
-- BAD
SELECT * FROM users WHERE id = 1;
-- GOOD
SELECT id, name, email FROM users WHERE id = 1;

-- Anti-pattern 2: functions on columns defeat the index
-- BAD (index cannot be used)
SELECT * FROM users WHERE YEAR(created_at) = 2025;
-- GOOD (index can be used)
SELECT * FROM users
WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31';

-- Anti-pattern 3: LIKE with a leading wildcard
-- BAD (index cannot be used)
SELECT * FROM products WHERE name LIKE '%phone%';
-- GOOD (use a full-text index)
SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('phone');

-- Anti-pattern 4: OR where UNION ALL fits better
-- BAD (indexes used inefficiently)
SELECT * FROM orders WHERE user_id = 1 OR status = 'pending';
-- GOOD
SELECT * FROM orders WHERE user_id = 1
UNION ALL
SELECT * FROM orders WHERE status = 'pending' AND user_id != 1;

-- Anti-pattern 5: huge IN lists
-- BAD
SELECT * FROM products WHERE id IN (1, 2, 3, ..., 10000);
-- GOOD (use ANY, or join against a temporary table)
SELECT * FROM products WHERE id = ANY(ARRAY[1, 2, 3, ...]);
-- or JOIN a temporary table

-- Anti-pattern 6: unnecessary DISTINCT
-- BAD
SELECT DISTINCT u.name FROM users u JOIN orders o ON u.id = o.user_id;
-- GOOD (use EXISTS)
SELECT u.name FROM users u WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- Anti-pattern 7: COUNT(*) where EXISTS suffices
-- BAD (counts every matching row)
SELECT CASE WHEN COUNT(*) > 0 THEN true ELSE false END
FROM orders WHERE user_id = 1;
-- GOOD (stops at the first row)
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);

-- Anti-pattern 8: implicit type conversion
-- BAD (numeric literal against a string column -> index unusable)
SELECT * FROM users WHERE phone = 01012345678;
-- GOOD
SELECT * FROM users WHERE phone = '01012345678';

-- Anti-pattern 9: OFFSET pagination
-- BAD (large OFFSETs are slow)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
-- GOOD (keyset pagination; 100000 is the last id already seen)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;

-- Anti-pattern 10: deeply nested subqueries
-- BAD
SELECT * FROM users WHERE id IN (
  SELECT user_id FROM orders WHERE total > (
    SELECT AVG(total) FROM orders
  )
);
-- GOOD (CTE or JOIN)
WITH avg_total AS (SELECT AVG(total) AS avg_val FROM orders)
SELECT DISTINCT u.*
FROM users u
JOIN orders o ON u.id = o.user_id
CROSS JOIN avg_total
WHERE o.total > avg_total.avg_val;
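Anti-pattern 9 is easy to demonstrate end to end. A keyset-pagination sketch using an in-memory SQLite table (SQLite only to keep the example self-contained; the SQL shape matches the queries above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 101)])

def keyset_page(conn, last_id, page_size=20):
    """Keyset pagination: resume after the last id seen, no OFFSET scan."""
    return conn.execute(
        "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size)).fetchall()

page1 = keyset_page(conn, last_id=0)
page2 = keyset_page(conn, last_id=page1[-1][0])  # cursor = last id of page 1
print(page1[0][0], page1[-1][0], page2[0][0])  # 1 20 21
```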

7.2 JOIN Optimization

-- Optimize JOIN order (smaller table first)
-- PostgreSQL usually reorders joins automatically; when you need to verify:

-- Check table sizes
SELECT relname, reltuples::bigint AS row_count
FROM pg_class
WHERE relname IN ('users', 'orders', 'products')
ORDER BY reltuples DESC;

-- Use indexed columns in JOIN conditions
-- GOOD
SELECT u.name, o.total
FROM orders o
JOIN users u ON u.id = o.user_id  -- users.id is the PK (indexed)
WHERE o.status = 'completed';

-- Use a LATERAL join (top-N-per-group pattern)
SELECT u.name, recent_orders.*
FROM users u
CROSS JOIN LATERAL (
  SELECT id, total, created_at
  FROM orders
  WHERE user_id = u.id
  ORDER BY created_at DESC
  LIMIT 3
) recent_orders;

8. Partitioning

8.1 When Partitioning Is Needed

Consider partitioning when a table grows past tens of millions of rows, or when queries mostly touch a specific range of the data.
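Monthly range partitions are repetitive enough that teams often script the DDL; a sketch of such a generator (hypothetical helper, emitting PostgreSQL declarative-partitioning syntax):

```python
from datetime import date

def monthly_partition_ddl(table: str, year: int, month: int) -> str:
    """Build the DDL for one monthly range partition of `table`."""
    start = date(year, month, 1)
    # first day of the following month (rolls over December -> January)
    end = date(year + (month == 12), month % 12 + 1, 1)
    name = f"{table}_{start:%Y_%m}"
    return (f"CREATE TABLE {name} PARTITION OF {table} "
            f"FOR VALUES FROM ('{start}') TO ('{end}');")

print(monthly_partition_ddl("orders", 2025, 1))
# CREATE TABLE orders_2025_01 PARTITION OF orders
#   FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
```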

8.2 PostgreSQL Partitioning

-- Range partitioning (good for time-series data)
CREATE TABLE orders (
  id bigserial,
  user_id bigint NOT NULL,
  total decimal(10,2),
  status varchar(20),
  created_at timestamp NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2025_01
  PARTITION OF orders
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE orders_2025_02
  PARTITION OF orders
  FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Automatic partition creation (with pg_partman)
-- CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
  'public.orders',
  'created_at',
  'native',
  'monthly'
);

-- List partitioning (split by category)
CREATE TABLE products (
  id bigserial,
  name varchar(255),
  category varchar(50),
  price decimal(10,2)
) PARTITION BY LIST (category);

CREATE TABLE products_electronics
  PARTITION OF products
  FOR VALUES IN ('electronics', 'computers', 'phones');

CREATE TABLE products_clothing
  PARTITION OF products
  FOR VALUES IN ('clothing', 'shoes', 'accessories');

-- Hash partitioning (even distribution)
CREATE TABLE sessions (
  id uuid,
  user_id bigint,
  data jsonb,
  expires_at timestamp
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_p0
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE sessions_p1
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE sessions_p2
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE sessions_p3
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 3);

8.3 MySQL Partitioning

-- MySQL range partitioning
CREATE TABLE orders (
  id BIGINT AUTO_INCREMENT,
  user_id BIGINT NOT NULL,
  total DECIMAL(10,2),
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION p2025 VALUES LESS THAN (2026),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

9. Read Replicas and Caching

9.1 Read Replica Strategy

Write/read split architecture:

       writes             reads
          |                  |
     [Primary]    ->    [Replica 1]
          |              [Replica 2]
          |              [Replica 3]
    async replication

# Spring Boot - read/write split
spring:
  datasource:
    primary:
      url: jdbc:postgresql://primary:5432/myapp
      username: app_write
    replica:
      url: jdbc:postgresql://replica:5432/myapp
      username: app_read

# Django - read/write router
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.example.com',
        'NAME': 'myapp',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.example.com',
        'NAME': 'myapp',
    }
}

DATABASE_ROUTERS = ['myapp.routers.ReadReplicaRouter']

# routers.py
class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

9.2 Query Caching Strategies

# Redis caching patterns
import redis
import json
from functools import wraps

r = redis.Redis(host='localhost', port=6379, db=0)

# Cache-Aside (lazy loading) pattern
def cache_aside(key_prefix, ttl=3600):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            cache_key = f"{key_prefix}:{args}:{kwargs}"

# 1. Try the cache first
            cached = r.get(cache_key)
            if cached:
                return json.loads(cached)

# 2. On a miss, query the DB
            result = func(*args, **kwargs)

# 3. Store the result in the cache
            r.setex(cache_key, ttl, json.dumps(result))

            return result
        return wrapper
    return decorator

@cache_aside("user", ttl=1800)
def get_user_profile(user_id):
    return db.query("SELECT * FROM users WHERE id = %s", [user_id])

# Write-Through pattern
def update_user(user_id, data):
    # 1. Update the DB
    db.execute("UPDATE users SET name=%s WHERE id=%s", [data['name'], user_id])

    # 2. Update (or invalidate) the cache
    cache_key = f"user:{user_id}"
    r.delete(cache_key)  # cache invalidation
    # or
    r.setex(cache_key, 1800, json.dumps(data))  # cache update

Caching strategy comparison:

Cache-Aside:
  Pros: most common, flexible
  Cons: first request is always a cache miss

Write-Through:
  Pros: cache is always current
  Cons: adds write latency

Write-Behind (Write-Back):
  Pros: best write performance
  Cons: risk of data loss

Read-Through:
  Pros: caching logic kept separate
  Cons: implementation complexity

10. PostgreSQL-Specific Optimization

10.1 VACUUM and ANALYZE

PostgreSQL's MVCC model does not immediately remove old row versions on update/delete; VACUUM reclaims these dead tuples.

-- Manual VACUUM
VACUUM (VERBOSE) orders;

-- VACUUM FULL (rewrites the table - takes an exclusive lock!)
VACUUM FULL orders;

-- Refresh statistics
ANALYZE orders;

-- VACUUM and ANALYZE together
VACUUM ANALYZE orders;

# postgresql.conf - autovacuum settings
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 60

# run autovacuum after ~20% of the table changes
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50

# run autoanalyze after ~10% of the table changes
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50

# throttle autovacuum (I/O load)
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 200
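The threshold and scale-factor settings combine into a simple trigger condition; a sketch of the vacuum-side formula from the PostgreSQL documentation:

```python
def autovacuum_fires(dead_tuples: int, live_tuples: int,
                     scale_factor: float = 0.2, threshold: int = 50) -> bool:
    """PostgreSQL autovacuum trigger condition:
    dead_tuples > autovacuum_vacuum_threshold
                  + autovacuum_vacuum_scale_factor * reltuples"""
    return dead_tuples > threshold + scale_factor * live_tuples

# With the defaults above, a 1,000,000-row table is vacuumed only
# after more than 200,050 dead tuples accumulate
print(autovacuum_fires(200_000, 1_000_000))  # False
print(autovacuum_fires(200_051, 1_000_000))  # True
```

This is why large, update-heavy tables often need a lower per-table scale factor: with the default 0.2, millions of dead tuples can pile up before autovacuum runs.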

10.2 HOT Updates

HOT (Heap-Only Tuple) updates are an optimization that rewrites only the table row, skipping index updates entirely.

-- HOT update conditions:
-- 1. No updated column is covered by an index
-- 2. The new row version fits on the same page

-- Check the HOT ratio
SELECT
  relname,
  n_tup_upd,
  n_tup_hot_upd,
  round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;

-- To raise the HOT ratio:
-- 1. Keep indexes off frequently updated columns
-- 2. Lower fillfactor to leave free space on each page
ALTER TABLE orders SET (fillfactor = 80);

10.3 Using the pg_stat Views

-- Per-table I/O statistics
SELECT
  relname,
  seq_scan,
  idx_scan,
  n_tup_ins,
  n_tup_upd,
  n_tup_del,
  n_live_tup,
  n_dead_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;

-- Index usage
SELECT
  indexrelname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan;

-- Unused indexes (candidates for removal)
SELECT
  indexrelname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Cache hit ratio
SELECT
  sum(heap_blks_read) AS heap_read,
  sum(heap_blks_hit) AS heap_hit,
  round(100.0 * sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS hit_pct
FROM pg_statio_user_tables;
-- 99% or higher is ideal

11. MySQL-Specific Optimization

11.1 InnoDB Buffer Pool

# my.cnf - InnoDB tuning
[mysqld]
# Buffer pool size (70-80% of total memory)
innodb_buffer_pool_size = 8G

# Buffer pool instances (split the pool when it is 8G or larger)
innodb_buffer_pool_instances = 8

# Log file sizes
innodb_log_file_size = 1G
innodb_log_buffer_size = 64M

# I/O threads
innodb_read_io_threads = 4
innodb_write_io_threads = 4

# Concurrency
innodb_thread_concurrency = 0  # 0 = let InnoDB decide

# Flush behavior
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1  # safest (2 favors performance)

# Temporary tables
tmp_table_size = 256M
max_heap_table_size = 256M

-- Check buffer pool status
SHOW ENGINE INNODB STATUS\G

-- Buffer pool hit ratio
SELECT
  (1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
  )) * 100 AS buffer_pool_hit_ratio;
-- 99% or higher is ideal

11.2 The Query Cache Is Gone in MySQL 8.0

MySQL 8.0 removed the Query Cache. Use these alternatives instead.

Query cache alternatives in MySQL 8.0:

1. Application-level caching (Redis/Memcached)
2. ProxySQL query cache
3. MySQL Router caching
4. InnoDB buffer pool tuning

-- Using the MySQL 8.0 Performance Schema
-- Analyze slow queries
SELECT
  DIGEST_TEXT,
  COUNT_STAR AS exec_count,
  ROUND(SUM_TIMER_WAIT/1000000000, 2) AS total_ms,
  ROUND(AVG_TIMER_WAIT/1000000000, 2) AS avg_ms,
  SUM_ROWS_EXAMINED,
  SUM_ROWS_SENT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

11.3 MySQL Index Hints

-- Index hints (when the optimizer chooses poorly)
SELECT * FROM orders USE INDEX (idx_orders_user_date)
WHERE user_id = 1 AND created_at > '2025-01-01';

SELECT * FROM orders FORCE INDEX (idx_orders_status)
WHERE status = 'pending';

-- MySQL 8.0 optimizer hints
SELECT /*+ NO_INDEX(orders idx_orders_status) */
  * FROM orders WHERE user_id = 1;

SELECT /*+ JOIN_ORDER(users, orders) */
  u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id;

12. Practical Quiz

Q1: What should you do when actual rows and estimated rows differ sharply in EXPLAIN ANALYZE?

Answer:

The statistics are most likely stale. Take these steps:

  1. Run ANALYZE: refresh the statistics with ANALYZE table_name;.
  2. Increase the statistics sample size: raise default_statistics_target, or per column with ALTER TABLE ... ALTER COLUMN col SET STATISTICS 1000;.
  3. Check indexes: confirm appropriate indexes exist.
  4. Check autovacuum settings: verify autovacuum_analyze_scale_factor is appropriate.

If estimated rows is far below reality, the planner may pick a Nested Loop and performance suffers; if far above it, the planner may pick an unnecessary Seq Scan.

Q2: Why does column order matter in a composite index?

Answer:

Composite indexes follow the Leftmost Prefix Rule.

Given an index INDEX(a, b, c):

  • WHERE a = 1 - usable
  • WHERE a = 1 AND b = 2 - usable
  • WHERE a = 1 AND b = 2 AND c = 3 - usable
  • WHERE b = 2 - not usable (leading column a missing)
  • WHERE a = 1 AND c = 3 - only a uses the index; c becomes a filter

How to choose the column order:

  1. Equality-condition columns first
  2. Range-condition columns last
  3. Higher-selectivity columns first

Q3: How do you detect N+1 query problems?

Answer:

  1. Log analysis: check the SQL log for the same query pattern repeating
  2. ORM tooling: Django's nplusone, Rails' bullet gem, Spring's hibernate.generate_statistics
  3. APM tools: analyze query patterns in Datadog, New Relic, etc.
  4. pg_stat_statements: check whether a query's calls count is abnormally high
  5. In development: assert on expected query counts in tests

Fixes: eager loading (JOIN), batch loading (IN clause), or the DataLoader pattern

Q4: How do you size a connection pool?

Answer:

HikariCP formula: connections = (core_count * 2) + effective_spindle_count

  • core_count: number of CPU cores
  • effective_spindle_count: 0 for SSDs, number of disks for HDDs

Example: a 4-core SSD server = (4 * 2) + 0 = 8 connections

Additional considerations:

  • 10-20 connections are enough for most applications
  • Growing the pool without bound hurts performance (context switching, memory)
  • Account for PostgreSQL's max_connections and the number of connected applications
  • An external pooler such as PgBouncer lets you serve far more clients
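The formula as a one-liner, for reproducing the example:

```python
def hikari_pool_size(core_count: int, effective_spindles: int) -> int:
    """HikariCP starting-point formula: (cores * 2) + effective spindle count
    (SSD ~ 0). Treat the result as a baseline to load-test, not a law."""
    return core_count * 2 + effective_spindles

print(hikari_pool_size(4, 0))  # 8  (4-core SSD server)
print(hikari_pool_size(8, 2))  # 18 (8-core server with 2 HDDs)
```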

Q5: Why is OFFSET-based pagination slow, and what is the alternative?

Answer:

Why it is slow: OFFSET 100000 LIMIT 20 reads 100,020 rows and throws away the first 100,000. The larger the OFFSET, the more rows must be read and discarded.

Alternative: keyset (cursor-based) pagination

-- Before (slow)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset (fast; 100000 is the last id of the previous page)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;

Advantages of keyset pagination:

  • Constant performance regardless of page depth
  • Uses the index efficiently
  • Consistent results under concurrent inserts/deletes

Disadvantages:

  • No direct jump to an arbitrary page
  • Requires an index on the sort column

13. References

  1. PostgreSQL Official Documentation - Performance Tips
  2. MySQL Performance Tuning Guide
  3. Use The Index, Luke
  4. HikariCP Wiki - About Pool Sizing
  5. PgBouncer Documentation
  6. Postgres EXPLAIN Visualizer (Dalibo)
  7. pg_stat_statements Documentation
  8. Percona Toolkit Documentation
  9. pgMustard - EXPLAIN ANALYZE Insights
  10. MySQL Performance Blog (Percona)
  11. PostgreSQL Wiki - Performance Optimization
  12. Citus Data Blog - PostgreSQL Tips
  13. Modern SQL Blog

Database Performance Tuning Complete Guide: Query Optimization, Index Strategy, Connection Pool Management

TL;DR

  • EXPLAIN ANALYZE: The essential tool for analyzing actual costs and row counts in query execution plans
  • Index Strategy: B-Tree (default), GIN (full-text search), BRIN (time-series), partial/expression indexes
  • Slow Query: Identify bottleneck queries with pg_stat_statements and slow_query_log
  • N+1 Problem: Solve with ORM Eager Loading, BatchSize, or JOIN
  • Connection Pool: Manage connections with HikariCP (Java) and PgBouncer (PostgreSQL)
  • Partitioning: Split large tables using Range/List/Hash
  • Caching: Distribute read load with Redis/Memcached

Table of Contents

  1. Why the DB Is the Bottleneck
  2. EXPLAIN ANALYZE Deep Dive
  3. Index Strategy
  4. Slow Query Analysis
  5. N+1 Query Problem
  6. Connection Pooling
  7. Query Optimization Patterns
  8. Partitioning
  9. Read Replicas and Caching
  10. PostgreSQL-Specific Optimization
  11. MySQL-Specific Optimization
  12. Practical Quiz
  13. References

1. Why the DB Is the Bottleneck

1.1 Common Performance Bottleneck Points

Over 80% of performance bottlenecks in web applications originate from the database.

Bottleneck in the request flow:
Client -> CDN -> Load Balancer -> App Server -> [Database]
                                                  ^
                                           Most latency here
Bottleneck TypeCauseImpact
Slow queriesMissing indexes, inefficient JOINsIncreased response time
Lock contentionConcurrent update conflictsReduced throughput
Connection exhaustionInsufficient pool sizeRequest queuing/failure
Excessive I/OFull table scansCPU/disk overload
Network round-tripsN+1 queriesUnnecessary RTT increase

1.2 Performance Tuning Priority

You should approach optimization in order of greatest impact.

Performance improvement pyramid:

        /\
       /  \        Query optimization (greatest impact)
      /____\
     /      \      Index strategy
    /________\
   /          \    Schema design
  /____________\
 /              \  Hardware/config tuning
/________________\ Caching / Read replicas

2. EXPLAIN ANALYZE Deep Dive

2.1 Basic Usage

-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2025-01-01'
  AND o.total > 100;

2.2 Reading the Execution Plan

QUERY PLAN
---------------------------------------------------------------------------
Nested Loop  (cost=0.85..892.45 rows=23 width=48)
             (actual time=0.045..2.341 rows=19 loops=1)
  ->  Index Scan using idx_orders_created_at on orders o
        (cost=0.42..445.23 rows=23 width=20)
        (actual time=0.025..1.123 rows=19 loops=1)
        Filter: (total > 100)
        Rows Removed by Filter: 5
  ->  Index Scan using users_pkey on users u
        (cost=0.42..8.44 rows=1 width=36)
        (actual time=0.008..0.008 rows=1 loops=19)
Planning Time: 0.234 ms
Execution Time: 2.456 ms

Key points for interpretation:

FieldMeaningNotes
cost=0.85..892.45Start to total estimated costUnits are arbitrary cost units
rows=23Estimated row countIf very different from actual rows, update statistics
actual timeActual elapsed time (ms)Multiply by loops for real time
loops=19Number of loop iterationsImportant in Nested Loops
Rows Removed by FilterRows discarded by filterIf high, index improvement needed

2.3 Understanding Scan Types

-- 1. Seq Scan - reads the entire table
-- Appropriate for small tables or when most rows need reading
EXPLAIN ANALYZE SELECT * FROM small_table;

-- 2. Index Scan - finds row location via index, then accesses table
-- Suitable for high-selectivity (few rows returned) queries
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- 3. Index Only Scan - returns results from index alone (no table access)
-- When using a Covering Index
EXPLAIN ANALYZE SELECT id, email FROM users WHERE email = 'user@example.com';

-- 4. Bitmap Index Scan - builds bitmap from index, then accesses table
-- Suitable for medium-selectivity queries
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending' AND total > 100;
Scan type performance order (general):

Index Only Scan  >  Index Scan  >  Bitmap Scan  >  Seq Scan
(fastest)                                          (slowest)

2.4 EXPLAIN Options

-- PostgreSQL EXPLAIN options
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 42;

-- BUFFERS: Include I/O information
-- FORMAT JSON: Output in JSON format
-- VERBOSE: Include additional details
-- COSTS: Include cost estimates (ON by default)
-- TIMING: Include timing info (ON by default)
-- MySQL EXPLAIN options
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;

-- MySQL 8.0+ supports tree format
EXPLAIN FORMAT=TREE
SELECT * FROM orders WHERE user_id = 42;

3. Index Strategy

3.1 B-Tree Index (Default)

B-Tree is the most common index type, suitable for equality and range searches.

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Composite index usage rules (Leftmost Prefix Rule)
-- idx_orders_user_date works for:
-- WHERE user_id = 1                          (O)
-- WHERE user_id = 1 AND created_at > '2025'  (O)
-- WHERE created_at > '2025'                  (X) missing leading column

3.2 Index Type Comparison

TypeUse CasePostgreSQLMySQL
B-TreeEquality, range, sortingDefaultDefault
HashEquality onlySupportedSupported
GINFull-text, arrays, JSONBSupportedNot supported
GiSTGeo/geometric, rangesSupportedNot supported
BRINTime-series, sequential dataSupportedNot supported
Full-TextFull-text searchSupportedSupported

3.3 GIN Index (Full-Text / JSONB)

-- Full-text search GIN index
CREATE INDEX idx_articles_search
ON articles USING GIN(to_tsvector('english', title || ' ' || body));

-- JSONB GIN index
CREATE INDEX idx_events_data ON events USING GIN(metadata);

-- JSONB specific path index
CREATE INDEX idx_events_type ON events USING GIN((metadata -> 'type'));

-- Usage examples
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('postgresql & tuning');

SELECT * FROM events
WHERE metadata @> '{"type": "click", "page": "/home"}';

3.4 BRIN Index (Time-Series Data)

-- BRIN index - extremely efficient for time-series data
-- Can be orders of magnitude (100x+) smaller than an equivalent B-Tree
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);

-- Effective when physical ordering strongly correlates
-- How to check:
SELECT correlation
FROM pg_stats
WHERE tablename = 'logs' AND attname = 'created_at';
-- correlation close to 1 means BRIN is suitable

3.5 Partial and Expression Indexes

-- Partial Index - indexes only rows matching a condition
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';
-- If, say, only 5% of rows are pending, the index is ~95% smaller

-- Expression Index
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
-- Used by: WHERE LOWER(email) = 'user@example.com'

-- Covering Index
CREATE INDEX idx_orders_covering
ON orders(user_id, created_at) INCLUDE (total, status);
-- Enables Index Only Scan -> heap access avoided (for all-visible pages)

3.6 Index Design Checklist

Index Design Checklist:

Must verify:
  [x] Indexes on columns frequently used in WHERE clauses
  [x] Indexes on JOIN condition columns
  [x] ORDER BY columns included in indexes
  [x] Column order optimization in composite indexes
  [x] High-selectivity columns first

Cautions:
  [x] Write vs read performance trade-off
  [x] Remove unused indexes
  [x] Monitor index size
  [x] Manage index bloat (PostgreSQL)

4. Slow Query Analysis

4.1 PostgreSQL - pg_stat_statements

-- Enable pg_stat_statements (postgresql.conf)
-- shared_preload_libraries = 'pg_stat_statements'

-- Install extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slow queries (by total time)
SELECT
  substring(query, 1, 80) AS short_query,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with highest average execution time
SELECT
  substring(query, 1, 100) AS short_query,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  rows / NULLIF(calls, 0) AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 10;

4.2 MySQL - Slow Query Log

# my.cnf - Slow Query Log configuration
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow-query.log
long_query_time = 1.0             # Log queries over 1 second
log_queries_not_using_indexes = 1 # Log queries not using indexes
min_examined_row_limit = 100      # Minimum 100 rows examined
# Analyze slow queries with mysqldumpslow
mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log

# pt-query-digest (Percona Toolkit)
pt-query-digest /var/log/mysql/slow-query.log \
  --limit 10 \
  --order-by query_time:sum

4.3 Automated Query Monitoring

-- PostgreSQL: Check long-running queries in real time
SELECT
  pid,
  now() - pg_stat_activity.query_start AS duration,
  query,
  state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 seconds'
  AND state != 'idle'
ORDER BY duration DESC;

-- Check lock-waiting queries
SELECT
  blocked.pid AS blocked_pid,
  blocked.query AS blocked_query,
  blocking.pid AS blocking_pid,
  blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid
JOIN pg_locks kl ON kl.locktype = bl.locktype
  AND kl.database IS NOT DISTINCT FROM bl.database
  AND kl.relation IS NOT DISTINCT FROM bl.relation
JOIN pg_stat_activity blocking ON kl.pid = blocking.pid
WHERE NOT bl.granted AND kl.granted;

5. N+1 Query Problem

5.1 What Is the N+1 Problem?

# N+1 problem example (Python/SQLAlchemy)

# BAD: N+1 queries triggered
users = session.query(User).all()  # 1 query
for user in users:
    print(user.orders)  # N additional queries (1 per user)

# Actual SQL executed:
# SELECT * FROM users;                    -- 1
# SELECT * FROM orders WHERE user_id = 1;  -- +1
# SELECT * FROM orders WHERE user_id = 2;  -- +1
# SELECT * FROM orders WHERE user_id = 3;  -- +1
# ... (N users)

5.2 Solutions

# Solution 1: Eager Loading (JOIN)
users = session.query(User).options(
    joinedload(User.orders)
).all()
# SQL: SELECT * FROM users LEFT JOIN orders ON ...  -- 1 query

# Solution 2: Subquery Loading
users = session.query(User).options(
    subqueryload(User.orders)
).all()
# SQL: SELECT * FROM users;                         -- 1
# SQL: SELECT * FROM orders WHERE user_id IN (...);  -- 1

# Solution 3: selectinload (SQLAlchemy recommended)
users = session.query(User).options(
    selectinload(User.orders)
).all()
// Node.js (Prisma) - Solving N+1
// BAD: N+1
const users = await prisma.user.findMany();
for (const user of users) {
  const orders = await prisma.order.findMany({
    where: { userId: user.id }
  });
}

// GOOD: Solve with include
const users = await prisma.user.findMany({
  include: { orders: true }
});

// GOOD: Select only needed fields
const users = await prisma.user.findMany({
  select: {
    id: true,
    name: true,
    orders: {
      select: { id: true, total: true },
      where: { total: { gt: 100 } }
    }
  }
});
// Java (JPA/Hibernate)
// BAD: LAZY loading causes N+1
@Entity
public class User {
    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY)
    private List<Order> orders;
}

// GOOD 1: JPQL JOIN FETCH
@Query("SELECT u FROM User u JOIN FETCH u.orders")
List<User> findAllWithOrders();

// GOOD 2: EntityGraph
@EntityGraph(attributePaths = {"orders"})
@Query("SELECT u FROM User u")
List<User> findAllWithOrders();

// GOOD 3: BatchSize (Hibernate)
@Entity
public class User {
    @OneToMany(mappedBy = "user")
    @BatchSize(size = 100)
    private List<Order> orders;
}

5.3 Detecting N+1

# Spring Boot - Hibernate N+1 detection
# application.yml
spring:
  jpa:
    properties:
      hibernate:
        generate_statistics: true
        session.events.log.LOG_QUERIES_SLOWER_THAN_MS: 25

logging:
  level:
    org.hibernate.SQL: DEBUG
    org.hibernate.stat: DEBUG
# Django - django-debug-toolbar or nplusone
# settings.py
INSTALLED_APPS = [
    'nplusone.ext.django',
]

NPLUSONE_RAISE = True  # Raise error on N+1 detection in dev

6. Connection Pooling

6.1 Why Connection Pools Are Needed

Creating DB connections is expensive: TCP connection + authentication + session initialization takes tens of milliseconds.

Without connection pool:
Request -> [Create new connection 30ms] -> [Query 5ms] -> [Close connection] = 35ms

With connection pool:
Request -> [Get existing connection from pool 0.1ms] -> [Query 5ms] -> [Return to pool] = 5.1ms
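The reuse benefit above can be sketched with a minimal pool built on Python's standard `queue.Queue` (a toy illustration only, not a substitute for HikariCP or PgBouncer; `fake_connect` is a hypothetical stand-in for a real driver call such as `psycopg2.connect`):

```python
import queue

class SimplePool:
    """Toy connection pool: pay the connect cost once per slot, then reuse."""
    def __init__(self, connect, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())  # the expensive step happens only here

    def acquire(self, timeout=30):
        # Blocks until a connection is free, like HikariCP's connection-timeout
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)  # return to the pool instead of closing

# Hypothetical stand-in for a real driver connect call
made = []
def fake_connect():
    conn = object()
    made.append(conn)
    return conn

pool = SimplePool(fake_connect, size=1)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
# c2 is the very same object as c1: no second connection was ever created
```

The same checkout/return cycle is what a real pool does; the 30ms connect cost is paid once at startup instead of once per request.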

6.2 HikariCP Configuration (Java)

# application.yml - Optimal HikariCP settings
spring:
  datasource:
    hikari:
      # Pool size formula: connections = (core_count * 2) + spinning_disks
      # 4-core SSD server: (4 * 2) + 0 = 8
      maximum-pool-size: 10
      minimum-idle: 5

      # Connection timeout (max wait time from pool)
      connection-timeout: 30000   # 30 seconds

      # Idle connection removal time
      idle-timeout: 600000        # 10 minutes

      # Maximum connection lifetime
      max-lifetime: 1800000       # 30 minutes

      # Connection validation query (omit for JDBC4 drivers; HikariCP then
      # uses Connection.isValid(), which is faster)
      connection-test-query: "SELECT 1"

      # Connection leak detection
      leak-detection-threshold: 60000  # 60 seconds

      # Pool name (for monitoring)
      pool-name: "MyApp-HikariPool"

6.3 PgBouncer Configuration (PostgreSQL)

# pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp

[pgbouncer]
# Pooling mode
# session: Maintain session (default)
# transaction: Per-transaction (recommended)
# statement: Per-statement (limited)
pool_mode = transaction

# Pool size
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5

# Connection limits
max_client_conn = 200
max_db_connections = 50

# Timeouts
server_idle_timeout = 600
client_idle_timeout = 300
query_timeout = 30
client_login_timeout = 60

# Authentication
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60

# Admin interface
listen_addr = 0.0.0.0
listen_port = 6432
admin_users = pgbouncer

6.4 Connection Pool Monitoring

-- PgBouncer status check
SHOW pools;
SHOW stats;
SHOW servers;
SHOW clients;

-- PostgreSQL connection count
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle') AS idle,
       count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_txn
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Check maximum connections
SHOW max_connections;

7. Query Optimization Patterns

7.1 Top 10 Anti-Patterns

-- Anti-pattern 1: SELECT *
-- BAD
SELECT * FROM users WHERE id = 1;
-- GOOD
SELECT id, name, email FROM users WHERE id = 1;

-- Anti-pattern 2: Function usage invalidates index
-- BAD (cannot use index)
SELECT * FROM users WHERE YEAR(created_at) = 2025;
-- GOOD (can use index)
SELECT * FROM users
WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31';

-- Anti-pattern 3: LIKE with leading wildcard
-- BAD (cannot use index)
SELECT * FROM products WHERE name LIKE '%phone%';
-- GOOD (use full-text search index)
SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('phone');

-- Anti-pattern 4: OR across different indexed columns (rewrite with UNION ALL)
-- BAD (inefficient index use)
SELECT * FROM orders WHERE user_id = 1 OR status = 'pending';
-- GOOD
SELECT * FROM orders WHERE user_id = 1
UNION ALL
SELECT * FROM orders WHERE status = 'pending' AND user_id != 1;

-- Anti-pattern 5: Large IN clause
-- BAD
SELECT * FROM products WHERE id IN (1, 2, 3, ..., 10000);
-- GOOD (use temp table or ANY)
SELECT * FROM products WHERE id = ANY(ARRAY[1, 2, 3, ...]);
-- Or use temp table JOIN
-- Anti-pattern 6: Unnecessary DISTINCT
-- BAD
SELECT DISTINCT u.name FROM users u JOIN orders o ON u.id = o.user_id;
-- GOOD (use EXISTS)
SELECT u.name FROM users u WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- Anti-pattern 7: COUNT(*) vs EXISTS
-- BAD (counts all rows)
SELECT CASE WHEN COUNT(*) > 0 THEN true ELSE false END
FROM orders WHERE user_id = 1;
-- GOOD (checks only first row)
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);

-- Anti-pattern 8: Implicit type conversion
-- BAD (string column compared with number -> index invalidated)
SELECT * FROM users WHERE phone = 01012345678;
-- GOOD
SELECT * FROM users WHERE phone = '01012345678';

-- Anti-pattern 9: OFFSET pagination
-- BAD (large OFFSET is slow)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
-- GOOD (keyset pagination)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;

-- Anti-pattern 10: Excessive subqueries
-- BAD
SELECT * FROM users WHERE id IN (
  SELECT user_id FROM orders WHERE total > (
    SELECT AVG(total) FROM orders
  )
);
-- GOOD (CTE or JOIN)
WITH avg_total AS (SELECT AVG(total) AS avg_val FROM orders)
SELECT DISTINCT u.*
FROM users u
JOIN orders o ON u.id = o.user_id
CROSS JOIN avg_total
WHERE o.total > avg_total.avg_val;

7.2 JOIN Optimization

-- JOIN order optimization (smaller table first)
-- PostgreSQL optimizes automatically, but when hints are needed:

-- Check table sizes
SELECT relname, reltuples::bigint AS row_count
FROM pg_class
WHERE relname IN ('users', 'orders', 'products')
ORDER BY reltuples DESC;

-- Use indexed columns in JOIN
-- GOOD
SELECT u.name, o.total
FROM orders o
JOIN users u ON u.id = o.user_id  -- users.id is PK (indexed)
WHERE o.status = 'completed';

-- Lateral Join usage (top N pattern)
SELECT u.name, recent_orders.*
FROM users u
CROSS JOIN LATERAL (
  SELECT id, total, created_at
  FROM orders
  WHERE user_id = u.id
  ORDER BY created_at DESC
  LIMIT 3
) recent_orders;

8. Partitioning

8.1 When Partitioning Is Needed

Consider partitioning when a table exceeds tens of millions of rows or when only specific ranges of data are frequently queried.

8.2 PostgreSQL Partitioning

-- Range partitioning (ideal for time-series data)
CREATE TABLE orders (
  id bigserial,
  user_id bigint NOT NULL,
  total decimal(10,2),
  status varchar(20),
  created_at timestamp NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2025_01
  PARTITION OF orders
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE orders_2025_02
  PARTITION OF orders
  FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Automatic partition creation (using pg_partman)
-- CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
  'public.orders',
  'created_at',
  'native',
  'monthly'
);
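Without pg_partman, monthly partition DDL like the statements above can also be generated from application code. A minimal sketch (the helper name and table are illustrative, not part of any library):

```python
from datetime import date

def monthly_partition_ddl(table, year, month):
    """Build one month's PostgreSQL range-partition DDL.

    The upper bound is exclusive, matching FOR VALUES FROM ... TO ... semantics.
    """
    start = date(year, month, 1)
    # First day of the following month is the exclusive upper bound
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (
        f"CREATE TABLE {table}_{start:%Y_%m} PARTITION OF {table} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )

print(monthly_partition_ddl("orders", 2025, 1))
# CREATE TABLE orders_2025_01 PARTITION OF orders FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
```

A scheduled job can call this ahead of time so inserts never hit a missing partition.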
-- List partitioning (split by category)
CREATE TABLE products (
  id bigserial,
  name varchar(255),
  category varchar(50),
  price decimal(10,2)
) PARTITION BY LIST (category);

CREATE TABLE products_electronics
  PARTITION OF products
  FOR VALUES IN ('electronics', 'computers', 'phones');

CREATE TABLE products_clothing
  PARTITION OF products
  FOR VALUES IN ('clothing', 'shoes', 'accessories');
-- Hash partitioning (even distribution)
CREATE TABLE sessions (
  id uuid,
  user_id bigint,
  data jsonb,
  expires_at timestamp
) PARTITION BY HASH (user_id);

CREATE TABLE sessions_p0
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE sessions_p1
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE sessions_p2
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE sessions_p3
  PARTITION OF sessions
  FOR VALUES WITH (MODULUS 4, REMAINDER 3);

8.3 MySQL Partitioning

-- MySQL Range partitioning
CREATE TABLE orders (
  id BIGINT AUTO_INCREMENT,
  user_id BIGINT NOT NULL,
  total DECIMAL(10,2),
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025),
  PARTITION p2025 VALUES LESS THAN (2026),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);

9. Read Replicas and Caching

9.1 Read Replica Strategy

Write/Read split architecture:

       Write requests         Read requests
          |                       |
     [Primary]     ->      [Replica 1]
          |                 [Replica 2]
          |                 [Replica 3]
     Async replication
# Spring Boot - Read/Write split
spring:
  datasource:
    primary:
      url: jdbc:postgresql://primary:5432/myapp
      username: app_write
    replica:
      url: jdbc:postgresql://replica:5432/myapp
      username: app_read
# Django - Read/Write router
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'primary.db.example.com',
        'NAME': 'myapp',
    },
    'replica': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': 'replica.db.example.com',
        'NAME': 'myapp',
    }
}

DATABASE_ROUTERS = ['myapp.routers.ReadReplicaRouter']

# routers.py
class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        return 'replica'

    def db_for_write(self, model, **hints):
        return 'default'

9.2 Query Caching Strategy

# Redis caching pattern
import redis
import json
from functools import wraps

r = redis.Redis(host='localhost', port=6379, db=0)

# Cache-Aside (Lazy Loading) pattern
def cache_aside(key_prefix, ttl=3600):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            cache_key = f"{key_prefix}:{args}:{kwargs}"

            # 1. Look up in cache
            cached = r.get(cache_key)
            if cached:
                return json.loads(cached)

            # 2. Query from DB
            result = func(*args, **kwargs)

            # 3. Store in cache
            r.setex(cache_key, ttl, json.dumps(result))

            return result
        return wrapper
    return decorator

@cache_aside("user", ttl=1800)
def get_user_profile(user_id):
    return db.query("SELECT * FROM users WHERE id = %s", [user_id])
# Write-Through pattern
def update_user(user_id, data):
    # 1. Update DB
    db.execute("UPDATE users SET name=%s WHERE id=%s", [data['name'], user_id])

    # 2. Update cache (or delete)
    cache_key = f"user:{user_id}"
    r.delete(cache_key)  # Cache Invalidation
    # or
    r.setex(cache_key, 1800, json.dumps(data))  # Cache Update
Caching strategy comparison:

Cache-Aside:
  Pros: Most common, flexible
  Cons: First request is a cache miss

Write-Through:
  Pros: Cache is always up-to-date
  Cons: Increased write latency

Write-Behind (Write-Back):
  Pros: Best write performance
  Cons: Risk of data loss

Read-Through:
  Pros: Cache logic is separated
  Cons: Implementation complexity

10. PostgreSQL-Specific Optimization

10.1 VACUUM and ANALYZE

PostgreSQL's MVCC model does not immediately remove previous rows on update/delete. VACUUM cleans up these dead tuples.

-- Manual VACUUM
VACUUM (VERBOSE) orders;

-- VACUUM FULL (table rewrite - causes lock!)
VACUUM FULL orders;

-- Update statistics
ANALYZE orders;

-- VACUUM + ANALYZE simultaneously
VACUUM ANALYZE orders;
# postgresql.conf - autovacuum settings
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 60

# Run autovacuum when 20% of table is changed
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50

# Run autoanalyze when 10% of table is changed
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50

# autovacuum speed control (I/O load)
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 200

10.2 HOT Updates

A HOT (Heap-Only Tuple) update is an optimization that rewrites only the heap row, leaving all indexes untouched.

-- HOT Update conditions:
-- 1. Updated column has no index
-- 2. New row can be stored in the same page

-- Check HOT ratio
SELECT
  relname,
  n_tup_upd,
  n_tup_hot_upd,
  round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;

-- To increase HOT ratio:
-- 1. Minimize indexes on frequently updated columns
-- 2. Lower fillfactor to reserve space on the same page
ALTER TABLE orders SET (fillfactor = 80);

10.3 Using pg_stat Views

-- Per-table I/O statistics
SELECT
  relname,
  seq_scan,
  idx_scan,
  n_tup_ins,
  n_tup_upd,
  n_tup_del,
  n_live_tup,
  n_dead_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;

-- Index usage rates
SELECT
  indexrelname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan;

-- Unused indexes (candidates for removal)
SELECT
  indexrelname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Cache hit ratio
SELECT
  sum(heap_blks_read) AS heap_read,
  sum(heap_blks_hit) AS heap_hit,
  round(100.0 * sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS hit_pct
FROM pg_statio_user_tables;
-- 99% or above is ideal

11. MySQL-Specific Optimization

11.1 InnoDB Buffer Pool

# my.cnf - InnoDB optimization
[mysqld]
# Buffer Pool size (70-80% of total memory)
innodb_buffer_pool_size = 8G

# Buffer Pool instances (split when 8G or more)
innodb_buffer_pool_instances = 8

# Log file size
innodb_log_file_size = 1G
innodb_log_buffer_size = 64M

# I/O threads
innodb_read_io_threads = 4
innodb_write_io_threads = 4

# Concurrency
innodb_thread_concurrency = 0  # Automatic

# Flush method
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1  # Safe (2: performance priority)

# Temporary tables
tmp_table_size = 256M
max_heap_table_size = 256M
-- Buffer Pool status check
SHOW ENGINE INNODB STATUS\G

-- Buffer Pool hit ratio
SELECT
  (1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
  )) * 100 AS buffer_pool_hit_ratio;
-- 99% or above is ideal

11.2 MySQL 8.0 Query Cache Removal

MySQL 8.0 removed the Query Cache. Use these alternatives instead:

MySQL 8.0 Query Cache alternatives:

1. Application-level caching (Redis/Memcached)
2. ProxySQL query cache
3. MySQL Router caching
4. InnoDB Buffer Pool optimization
-- MySQL 8.0 Performance Schema usage
-- Slow query analysis
SELECT
  DIGEST_TEXT,
  COUNT_STAR AS exec_count,
  ROUND(SUM_TIMER_WAIT/1000000000, 2) AS total_ms,
  ROUND(AVG_TIMER_WAIT/1000000000, 2) AS avg_ms,
  SUM_ROWS_EXAMINED,
  SUM_ROWS_SENT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

11.3 MySQL Index Hints

-- Index hints (when optimizer makes wrong choices)
SELECT * FROM orders USE INDEX (idx_orders_user_date)
WHERE user_id = 1 AND created_at > '2025-01-01';

SELECT * FROM orders FORCE INDEX (idx_orders_status)
WHERE status = 'pending';

-- MySQL 8.0 optimizer hints
SELECT /*+ NO_INDEX(orders idx_orders_status) */
  * FROM orders WHERE user_id = 1;

SELECT /*+ JOIN_ORDER(users, orders) */
  u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id;

12. Practical Quiz

Q1: What should you do when actual rows and estimated rows are vastly different in EXPLAIN ANALYZE?

Answer:

Statistics are likely outdated. Take the following actions:

  1. Run ANALYZE: Refresh statistics with ANALYZE table_name;
  2. Increase statistics sample size: Raise the per-column target with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 1000, or raise default_statistics_target globally.
  3. Check indexes: Verify appropriate indexes exist.
  4. Check autovacuum settings: Confirm autovacuum_analyze_scale_factor is appropriate.

If estimated rows are far lower than actual, the planner may choose Nested Loop resulting in degraded performance. If far higher, it may choose an unnecessary Seq Scan.

Q2: Why does column order matter in composite indexes?

Answer:

Composite indexes follow the Leftmost Prefix Rule.

For example, with INDEX(a, b, c):

  • WHERE a = 1 - Can use index
  • WHERE a = 1 AND b = 2 - Can use index
  • WHERE a = 1 AND b = 2 AND c = 3 - Can use index
  • WHERE b = 2 - Cannot use index (missing leading column a)
  • WHERE a = 1 AND c = 3 - Only a uses index, c is filtered

Column order decision criteria:

  1. Equality condition columns first
  2. Range condition columns later
  3. Higher selectivity columns first
Q3: How do you detect N+1 query problems?

Answer:

  1. Log analysis: Check if the same query pattern repeats in SQL query logs
  2. ORM tools: Django nplusone, Rails bullet gem, Spring hibernate.generate_statistics
  3. APM tools: Analyze query patterns with Datadog, New Relic, etc.
  4. pg_stat_statements: Check if calls count is abnormally high for the same query
  5. Development environment: Add query count assertions in tests to verify expected query counts

Solutions: Apply Eager Loading (JOIN), Batch Loading (IN clause), or the DataLoader pattern.
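The query-count assertion idea from point 5 can be sketched with SQLite and a counting wrapper (a toy harness; real projects would use SQLAlchemy's event system or a django-assertNumQueries-style helper instead):

```python
import sqlite3

class CountingConnection:
    """Wrap a DB-API connection and count every executed statement."""
    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params)

db = CountingConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
for uid in (1, 2, 3):
    db.execute("INSERT INTO users VALUES (?)", (uid,))
    db.execute("INSERT INTO orders VALUES (?, ?)", (uid, uid))

db.query_count = 0  # start measuring here

# N+1 style: one extra query per user
users = db.execute("SELECT id FROM users").fetchall()
for (uid,) in users:
    db.execute("SELECT id FROM orders WHERE user_id = ?", (uid,))
n_plus_one = db.query_count  # 1 + 3 = 4

db.query_count = 0
# Batched style: a single IN-clause query
db.execute("SELECT id FROM orders WHERE user_id IN (1, 2, 3)").fetchall()
batched = db.query_count  # 1
```

A test suite would then assert the batched count (here 1, not 4), so any regression back to N+1 fails CI immediately.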

Q4: How do you determine connection pool size?

Answer:

HikariCP formula: connections = (core_count * 2) + effective_spindle_count

  • core_count: Number of CPU cores
  • effective_spindle_count: 0 for SSD, number of disks for HDD

Example: 4-core SSD server = (4 * 2) + 0 = 8 connections

Additional considerations:

  • 10-20 connections is sufficient for most applications
  • Increasing pool size indefinitely actually degrades performance (context switching, memory)
  • Consider PostgreSQL max_connections and the number of connected applications
  • Using an external pooler like PgBouncer supports more clients
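The formula reduces to a one-liner; with effective_spindle_count taken as 0 for SSDs, the 4-core example works out as follows:

```python
def pool_size(core_count, spindle_count=0):
    """HikariCP rule of thumb: connections = (cores * 2) + effective spindles."""
    return core_count * 2 + spindle_count

print(pool_size(4))     # 4-core SSD server -> 8
print(pool_size(8, 2))  # 8-core server with 2 HDD spindles -> 18
```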
Q5: Why is OFFSET-based pagination slow and what are the alternatives?

Answer:

Why it is slow: OFFSET 100000 LIMIT 20 reads 100,020 rows and discards the first 100,000. The larger the OFFSET, the more rows must be read and discarded.

Alternative: Keyset Pagination (Cursor-based Pagination)

-- Traditional (slow)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset (fast)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;

Benefits of keyset pagination:

  • Consistent performance regardless of OFFSET
  • Efficient index usage
  • Consistent results even with concurrent inserts/deletes

Drawbacks:

  • Cannot jump directly to a specific page
  • Requires index on the sort column
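The keyset pattern above can be verified end-to-end with SQLite (a minimal sketch; the table mirrors the `orders` example, and the cursor is simply the last id of the previous page):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 101)])

def keyset_page(conn, after_id, limit=20):
    """Fetch the next page strictly after the last seen id (the cursor)."""
    return conn.execute(
        "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()

page1 = keyset_page(conn, after_id=0)
page2 = keyset_page(conn, after_id=page1[-1][0])
# page1 covers ids 1-20, page2 covers ids 21-40
```

Each page is an index range scan starting at the cursor, so page 5,000 costs the same as page 1.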

13. References

  1. PostgreSQL Official Documentation - Performance Tips
  2. MySQL Performance Tuning Guide
  3. Use The Index, Luke
  4. HikariCP Wiki - About Pool Sizing
  5. PgBouncer Documentation
  6. Postgres EXPLAIN Visualizer (Dalibo)
  7. pg_stat_statements Documentation
  8. Percona Toolkit Documentation
  9. pgMustard - EXPLAIN ANALYZE Insights
  10. MySQL Performance Blog (Percona)
  11. PostgreSQL Wiki - Performance Optimization
  12. Citus Data Blog - PostgreSQL Tips
  13. Modern SQL Blog