Database Performance Tuning Complete Guide: Query Optimization, Index Strategy, Connection Pool Management
By Youngju Kim (@fjvbn20031)
TL;DR
- EXPLAIN ANALYZE: The essential tool for analyzing actual costs and row counts in query execution plans
- Index Strategy: B-Tree (default), GIN (full-text search), BRIN (time-series), partial/expression indexes
- Slow Query: Identify bottleneck queries with pg_stat_statements and slow_query_log
- N+1 Problem: Solve with ORM Eager Loading, BatchSize, or JOIN
- Connection Pool: Manage connections with HikariCP (Java) and PgBouncer (PostgreSQL)
- Partitioning: Split large tables using Range/List/Hash
- Caching: Distribute read load with Redis/Memcached
Table of Contents
- Why the DB Is the Bottleneck
- EXPLAIN ANALYZE Deep Dive
- Index Strategy
- Slow Query Analysis
- N+1 Query Problem
- Connection Pooling
- Query Optimization Patterns
- Partitioning
- Read Replicas and Caching
- PostgreSQL-Specific Optimization
- MySQL-Specific Optimization
- Practical Quiz
- References
1. Why the DB Is the Bottleneck
1.1 Common Performance Bottleneck Points
In most web applications, the large majority of performance bottlenecks originate from the database.
Bottleneck in the request flow:
Client -> CDN -> Load Balancer -> App Server -> [Database]  <- most latency here
| Bottleneck Type | Cause | Impact |
|---|---|---|
| Slow queries | Missing indexes, inefficient JOINs | Increased response time |
| Lock contention | Concurrent update conflicts | Reduced throughput |
| Connection exhaustion | Insufficient pool size | Request queuing/failure |
| Excessive I/O | Full table scans | CPU/disk overload |
| Network round-trips | N+1 queries | Unnecessary RTT increase |
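To make the round-trip cost concrete, here is a small sketch that models each query as one network round-trip with a fixed cost. The 2 ms RTT and the query shapes are illustrative assumptions, not measurements:

```python
# Illustrative only: models each query as one network round-trip with a
# fixed cost, to show why N+1 patterns dominate latency at scale.

RTT_MS = 2.0  # assumed round-trip time per query


def n_plus_one_latency(n_users: int) -> float:
    # 1 query for the users, plus 1 query per user for their orders
    return (1 + n_users) * RTT_MS


def batched_latency(n_users: int) -> float:
    # 1 query for the users, plus 1 IN-clause query for all orders,
    # regardless of how many users there are
    return 2 * RTT_MS


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, n_plus_one_latency(n), batched_latency(n))
```

The batched cost stays flat while the N+1 cost grows linearly with the number of parent rows, which is why the same endpoint that is fine in development can fall over in production.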
1.2 Performance Tuning Priority
You should approach optimization in order of greatest impact.
Performance improvement priority (greatest impact first):
1. Query optimization (greatest impact)
2. Index strategy
3. Schema design
4. Hardware/config tuning
5. Caching / read replicas
2. EXPLAIN ANALYZE Deep Dive
2.1 Basic Usage
-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2025-01-01'
AND o.total > 100;
2.2 Reading the Execution Plan
QUERY PLAN
---------------------------------------------------------------------------
Nested Loop (cost=0.85..892.45 rows=23 width=48)
(actual time=0.045..2.341 rows=19 loops=1)
-> Index Scan using idx_orders_created_at on orders o
(cost=0.42..445.23 rows=23 width=20)
(actual time=0.025..1.123 rows=19 loops=1)
Filter: (total > 100)
Rows Removed by Filter: 5
-> Index Scan using users_pkey on users u
(cost=0.42..8.44 rows=1 width=36)
(actual time=0.008..0.008 rows=1 loops=19)
Planning Time: 0.234 ms
Execution Time: 2.456 ms
Key points for interpretation:
| Field | Meaning | Notes |
|---|---|---|
| cost=0.85..892.45 | Start to total estimated cost | Units are arbitrary cost units |
| rows=23 | Estimated row count | If very different from actual rows, update statistics |
| actual time | Actual elapsed time (ms) | Multiply by loops for real time |
| loops=19 | Number of loop iterations | Important in Nested Loops |
| Rows Removed by Filter | Rows discarded by filter | If high, index improvement needed |
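The "estimated vs actual rows" check in the table above can be automated. This sketch walks an `EXPLAIN (ANALYZE, FORMAT JSON)` plan tree (as parsed from JSON) and flags nodes whose actual row count diverges from the estimate by more than a chosen factor; the sample plan dict is fabricated for illustration:

```python
# Sketch: flag plan nodes whose actual row count diverges badly from the
# planner's estimate - a sign that statistics need refreshing (ANALYZE).

def misestimated_nodes(node: dict, factor: float = 10.0) -> list:
    flagged = []
    est, act = node.get("Plan Rows"), node.get("Actual Rows")
    if est and act and max(est, act) / max(min(est, act), 1) > factor:
        flagged.append(node.get("Node Type", "?"))
    for child in node.get("Plans", []):  # recurse into child plan nodes
        flagged.extend(misestimated_nodes(child, factor))
    return flagged


# Fabricated example plan: the inner scan is off by roughly 1000x
sample = {
    "Node Type": "Nested Loop", "Plan Rows": 23, "Actual Rows": 19,
    "Plans": [
        {"Node Type": "Index Scan", "Plan Rows": 5, "Actual Rows": 4800},
    ],
}
print(misestimated_nodes(sample))  # ['Index Scan']
```

In practice you would feed this the output of `EXPLAIN (ANALYZE, FORMAT JSON)` after `json.loads(...)[0]["Plan"]`.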
2.3 Understanding Scan Types
-- 1. Seq Scan - reads the entire table
-- Appropriate for small tables or when most rows need reading
EXPLAIN ANALYZE SELECT * FROM small_table;
-- 2. Index Scan - finds row location via index, then accesses table
-- Suitable for high-selectivity (few rows returned) queries
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- 3. Index Only Scan - returns results from index alone (no table access)
-- When using a Covering Index
EXPLAIN ANALYZE SELECT id, email FROM users WHERE email = 'user@example.com';
-- 4. Bitmap Index Scan - builds bitmap from index, then accesses table
-- Suitable for medium-selectivity queries
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending' AND total > 100;
Scan type performance order (general, fastest to slowest):
Index Only Scan > Index Scan > Bitmap Scan > Seq Scan
2.4 EXPLAIN Options
-- PostgreSQL EXPLAIN options
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 42;
-- BUFFERS: Include I/O information
-- FORMAT JSON: Output in JSON format
-- VERBOSE: Include additional details
-- COSTS: Include cost estimates (ON by default)
-- TIMING: Include timing info (ON by default)
-- MySQL EXPLAIN options
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;
-- MySQL 8.0+ supports tree format
EXPLAIN FORMAT=TREE
SELECT * FROM orders WHERE user_id = 42;
3. Index Strategy
3.1 B-Tree Index (Default)
B-Tree is the most common index type, suitable for equality and range searches.
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Composite index usage rules (Leftmost Prefix Rule)
-- idx_orders_user_date works for:
-- WHERE user_id = 1 (O)
-- WHERE user_id = 1 AND created_at > '2025' (O)
-- WHERE created_at > '2025' (X) missing leading column
3.2 Index Type Comparison
| Type | Use Case | PostgreSQL | MySQL |
|---|---|---|---|
| B-Tree | Equality, range, sorting | Default | Default |
| Hash | Equality only | Supported | MEMORY engine only |
| GIN | Full-text, arrays, JSONB | Supported | Not supported |
| GiST | Geo/geometric, ranges | Supported | Not supported |
| BRIN | Time-series, sequential data | Supported | Not supported |
| Full-Text | Full-text search | Via GIN/GiST on tsvector | FULLTEXT (InnoDB/MyISAM) |
3.3 GIN Index (Full-Text / JSONB)
-- Full-text search GIN index
CREATE INDEX idx_articles_search
ON articles USING GIN(to_tsvector('english', title || ' ' || body));
-- JSONB GIN index
CREATE INDEX idx_events_data ON events USING GIN(metadata);
-- JSONB specific path index
CREATE INDEX idx_events_type ON events USING GIN((metadata -> 'type'));
-- Usage examples
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('postgresql & tuning');
SELECT * FROM events
WHERE metadata @> '{"type": "click", "page": "/home"}';
3.4 BRIN Index (Time-Series Data)
-- BRIN index - extremely efficient for time-series data
-- Typically orders of magnitude smaller than an equivalent B-Tree
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);
-- Effective when physical ordering strongly correlates
-- How to check:
SELECT correlation
FROM pg_stats
WHERE tablename = 'logs' AND attname = 'created_at';
-- correlation close to 1 means BRIN is suitable
3.5 Partial and Expression Indexes
-- Partial Index - indexes only rows matching a condition
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';
-- Only 5% of all rows indexed -> 95% size savings
-- Expression Index
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
-- Used by: WHERE LOWER(email) = 'user@example.com'
-- Covering Index
CREATE INDEX idx_orders_covering
ON orders(user_id, created_at) INCLUDE (total, status);
-- Enables Index Only Scan -> no table access needed
3.6 Index Design Checklist
Index Design Checklist:
Must verify:
[x] Indexes on columns frequently used in WHERE clauses
[x] Indexes on JOIN condition columns
[x] ORDER BY columns included in indexes
[x] Column order optimization in composite indexes
[x] High-selectivity columns first
Cautions:
[x] Write vs read performance trade-off
[x] Remove unused indexes
[x] Monitor index size
[x] Manage index bloat (PostgreSQL)
4. Slow Query Analysis
4.1 PostgreSQL - pg_stat_statements
-- Enable pg_stat_statements (postgresql.conf)
-- shared_preload_libraries = 'pg_stat_statements'
-- Install extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Top 10 slow queries (by total time)
SELECT
substring(query, 1, 80) AS short_query,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Queries with highest average execution time
SELECT
substring(query, 1, 100) AS short_query,
calls,
round(mean_exec_time::numeric, 2) AS mean_ms,
rows / NULLIF(calls, 0) AS avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 10;
4.2 MySQL - Slow Query Log
# my.cnf - Slow Query Log configuration
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow-query.log
long_query_time = 1.0 # Log queries over 1 second
log_queries_not_using_indexes = 1 # Log queries not using indexes
min_examined_row_limit = 100 # Minimum 100 rows examined
# Analyze slow queries with mysqldumpslow
mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log
# pt-query-digest (Percona Toolkit)
pt-query-digest /var/log/mysql/slow-query.log \
--limit 10 \
--order-by query_time:sum
4.3 Automated Query Monitoring
-- PostgreSQL: Check long-running queries in real time
SELECT
pid,
now() - pg_stat_activity.query_start AS duration,
query,
state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 seconds'
AND state != 'idle'
ORDER BY duration DESC;
-- Check lock-waiting queries
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid
JOIN pg_locks kl ON kl.locktype = bl.locktype
AND kl.database IS NOT DISTINCT FROM bl.database
AND kl.relation IS NOT DISTINCT FROM bl.relation
JOIN pg_stat_activity blocking ON kl.pid = blocking.pid
WHERE NOT bl.granted AND kl.granted;
5. N+1 Query Problem
5.1 What Is the N+1 Problem?
# N+1 problem example (Python/SQLAlchemy)
# BAD: N+1 queries triggered
users = session.query(User).all() # 1 query
for user in users:
print(user.orders) # N additional queries (1 per user)
# Actual SQL executed:
# SELECT * FROM users; -- 1
# SELECT * FROM orders WHERE user_id = 1; -- +1
# SELECT * FROM orders WHERE user_id = 2; -- +1
# SELECT * FROM orders WHERE user_id = 3; -- +1
# ... (N users)
5.2 Solutions
# Solution 1: Eager Loading (JOIN)
users = session.query(User).options(
joinedload(User.orders)
).all()
# SQL: SELECT * FROM users LEFT JOIN orders ON ... -- 1 query
# Solution 2: Subquery Loading
users = session.query(User).options(
subqueryload(User.orders)
).all()
# SQL: SELECT * FROM users; -- 1
# SQL: SELECT * FROM orders WHERE user_id IN (...); -- 1
# Solution 3: selectinload (SQLAlchemy recommended)
users = session.query(User).options(
selectinload(User.orders)
).all()
// Node.js (Prisma) - Solving N+1
// BAD: N+1
const users = await prisma.user.findMany();
for (const user of users) {
const orders = await prisma.order.findMany({
where: { userId: user.id }
});
}
// GOOD: Solve with include
const users = await prisma.user.findMany({
include: { orders: true }
});
// GOOD: Select only needed fields
const users = await prisma.user.findMany({
select: {
id: true,
name: true,
orders: {
select: { id: true, total: true },
where: { total: { gt: 100 } }
}
}
});
// Java (JPA/Hibernate)
// BAD: LAZY loading causes N+1
@Entity
public class User {
@OneToMany(mappedBy = "user", fetch = FetchType.LAZY)
private List<Order> orders;
}
// GOOD 1: JPQL JOIN FETCH
@Query("SELECT u FROM User u JOIN FETCH u.orders")
List<User> findAllWithOrders();
// GOOD 2: EntityGraph
@EntityGraph(attributePaths = {"orders"})
@Query("SELECT u FROM User u")
List<User> findAllWithOrders();
// GOOD 3: BatchSize (Hibernate)
@Entity
public class User {
@OneToMany(mappedBy = "user")
@BatchSize(size = 100)
private List<Order> orders;
}
5.3 Detecting N+1
# Spring Boot - Hibernate N+1 detection
# application.yml
spring:
jpa:
properties:
hibernate:
generate_statistics: true
session.events.log.LOG_QUERIES_SLOWER_THAN_MS: 25
logging:
level:
org.hibernate.SQL: DEBUG
org.hibernate.stat: DEBUG
# Django - django-debug-toolbar or nplusone
# settings.py
INSTALLED_APPS = [
'nplusone.ext.django',
]
NPLUSONE_RAISE = True # Raise error on N+1 detection in dev
6. Connection Pooling
6.1 Why Connection Pools Are Needed
Creating DB connections is expensive: TCP connection + authentication + session initialization takes tens of milliseconds.
Without connection pool:
Request -> [Create new connection 30ms] -> [Query 5ms] -> [Close connection] = 35ms
With connection pool:
Request -> [Get existing connection from pool 0.1ms] -> [Query 5ms] -> [Return to pool] = 5.1ms
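The mechanics above can be sketched in a few lines: connections are created once up front and checked in and out of a queue, so the creation cost is paid only at startup. `FakeConnection` and the pool size are illustrative stand-ins, not a real driver:

```python
# Minimal connection-pool sketch: 100 requests are served by only 5
# connections because each one is returned to the pool instead of closed.
import queue


class FakeConnection:
    created = 0  # class-level counter of connections ever built

    def __init__(self):
        FakeConnection.created += 1  # stands in for the ~30ms setup cost


class SimplePool:
    def __init__(self, size: int):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(FakeConnection())  # pay creation cost once, up front

    def acquire(self, timeout: float = 30.0):
        return self._q.get(timeout=timeout)  # near-free: reuse existing conn

    def release(self, conn):
        self._q.put(conn)  # return to pool instead of closing


pool = SimplePool(size=5)
for _ in range(100):           # 100 requests...
    conn = pool.acquire()
    pool.release(conn)
print(FakeConnection.created)  # ...but only 5 connections ever created
```

Real pools (HikariCP, PgBouncer) add validation, leak detection, and timeouts on top of this same check-out/check-in core.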
6.2 HikariCP Configuration (Java)
# application.yml - Optimal HikariCP settings
spring:
datasource:
hikari:
# Pool size formula: connections = (core_count * 2) + spinning_disks
# 4-core SSD server: (4 * 2) + 0 = 8
maximum-pool-size: 10
minimum-idle: 5
# Connection timeout (max wait time from pool)
connection-timeout: 30000 # 30 seconds
# Idle connection removal time
idle-timeout: 600000 # 10 minutes
# Maximum connection lifetime
max-lifetime: 1800000 # 30 minutes
# Connection validation query
connection-test-query: "SELECT 1"
# Connection leak detection
leak-detection-threshold: 60000 # 60 seconds
# Pool name (for monitoring)
pool-name: "MyApp-HikariPool"
6.3 PgBouncer Configuration (PostgreSQL)
# pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp
[pgbouncer]
# Pooling mode
# session: Maintain session (default)
# transaction: Per-transaction (recommended)
# statement: Per-statement (limited)
pool_mode = transaction
# Pool size
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
# Connection limits
max_client_conn = 200
max_db_connections = 50
# Timeouts
server_idle_timeout = 600
client_idle_timeout = 300
query_timeout = 30
client_login_timeout = 60
# Authentication
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
# Admin interface
listen_addr = 0.0.0.0
listen_port = 6432
admin_users = pgbouncer
6.4 Connection Pool Monitoring
-- PgBouncer status check
SHOW pools;
SHOW stats;
SHOW servers;
SHOW clients;
-- PostgreSQL connection count
SELECT count(*) AS total_connections,
count(*) FILTER (WHERE state = 'active') AS active,
count(*) FILTER (WHERE state = 'idle') AS idle,
count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_txn
FROM pg_stat_activity
WHERE backend_type = 'client backend';
-- Check maximum connections
SHOW max_connections;
7. Query Optimization Patterns
7.1 Top 10 Anti-Patterns
-- Anti-pattern 1: SELECT *
-- BAD
SELECT * FROM users WHERE id = 1;
-- GOOD
SELECT id, name, email FROM users WHERE id = 1;
-- Anti-pattern 2: Function usage invalidates index
-- BAD (cannot use index)
SELECT * FROM users WHERE YEAR(created_at) = 2025;
-- GOOD (can use index)
SELECT * FROM users
WHERE created_at BETWEEN '2025-01-01' AND '2025-12-31';
-- Anti-pattern 3: LIKE with leading wildcard
-- BAD (cannot use index)
SELECT * FROM products WHERE name LIKE '%phone%';
-- GOOD (use full-text search index)
SELECT * FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('phone');
-- Anti-pattern 4: OR across different columns (rewrite with UNION ALL)
-- BAD (inefficient index use)
SELECT * FROM orders WHERE user_id = 1 OR status = 'pending';
-- GOOD
SELECT * FROM orders WHERE user_id = 1
UNION ALL
SELECT * FROM orders WHERE status = 'pending' AND user_id != 1;
-- Anti-pattern 5: Large IN clause
-- BAD
SELECT * FROM products WHERE id IN (1, 2, 3, ..., 10000);
-- GOOD (use temp table or ANY)
SELECT * FROM products WHERE id = ANY(ARRAY[1, 2, 3, ...]);
-- Or use temp table JOIN
-- Anti-pattern 6: Unnecessary DISTINCT
-- BAD
SELECT DISTINCT u.name FROM users u JOIN orders o ON u.id = o.user_id;
-- GOOD (use EXISTS)
SELECT u.name FROM users u WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
-- Anti-pattern 7: COUNT(*) vs EXISTS
-- BAD (counts all rows)
SELECT CASE WHEN COUNT(*) > 0 THEN true ELSE false END
FROM orders WHERE user_id = 1;
-- GOOD (checks only first row)
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);
-- Anti-pattern 8: Implicit type conversion
-- BAD (string column compared with number -> index invalidated)
SELECT * FROM users WHERE phone = 01012345678;
-- GOOD
SELECT * FROM users WHERE phone = '01012345678';
-- Anti-pattern 9: OFFSET pagination
-- BAD (large OFFSET is slow)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
-- GOOD (keyset pagination)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;
-- Anti-pattern 10: Excessive subqueries
-- BAD
SELECT * FROM users WHERE id IN (
SELECT user_id FROM orders WHERE total > (
SELECT AVG(total) FROM orders
)
);
-- GOOD (CTE or JOIN)
WITH avg_total AS (SELECT AVG(total) AS avg_val FROM orders)
SELECT DISTINCT u.*
FROM users u
JOIN orders o ON u.id = o.user_id
CROSS JOIN avg_total
WHERE o.total > avg_total.avg_val;
7.2 JOIN Optimization
-- JOIN order optimization (smaller table first)
-- PostgreSQL optimizes automatically, but when hints are needed:
-- Check table sizes
SELECT relname, reltuples::bigint AS row_count
FROM pg_class
WHERE relname IN ('users', 'orders', 'products')
ORDER BY reltuples DESC;
-- Use indexed columns in JOIN
-- GOOD
SELECT u.name, o.total
FROM orders o
JOIN users u ON u.id = o.user_id -- users.id is PK (indexed)
WHERE o.status = 'completed';
-- Lateral Join usage (top N pattern)
SELECT u.name, recent_orders.*
FROM users u
CROSS JOIN LATERAL (
SELECT id, total, created_at
FROM orders
WHERE user_id = u.id
ORDER BY created_at DESC
LIMIT 3
) recent_orders;
8. Partitioning
8.1 When Partitioning Is Needed
Consider partitioning when a table exceeds tens of millions of rows or when only specific ranges of data are frequently queried.
8.2 PostgreSQL Partitioning
-- Range partitioning (ideal for time-series data)
CREATE TABLE orders (
id bigserial,
user_id bigint NOT NULL,
total decimal(10,2),
status varchar(20),
created_at timestamp NOT NULL
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE orders_2025_01
PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02
PARTITION OF orders
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- Automatic partition creation (using pg_partman)
-- CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
'public.orders',
'created_at',
'native',
'monthly'
);
-- List partitioning (split by category)
CREATE TABLE products (
id bigserial,
name varchar(255),
category varchar(50),
price decimal(10,2)
) PARTITION BY LIST (category);
CREATE TABLE products_electronics
PARTITION OF products
FOR VALUES IN ('electronics', 'computers', 'phones');
CREATE TABLE products_clothing
PARTITION OF products
FOR VALUES IN ('clothing', 'shoes', 'accessories');
-- Hash partitioning (even distribution)
CREATE TABLE sessions (
id uuid,
user_id bigint,
data jsonb,
expires_at timestamp
) PARTITION BY HASH (user_id);
CREATE TABLE sessions_p0
PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1
PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2
PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3
PARTITION OF sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
8.3 MySQL Partitioning
-- MySQL Range partitioning
CREATE TABLE orders (
id BIGINT AUTO_INCREMENT,
user_id BIGINT NOT NULL,
total DECIMAL(10,2),
created_at DATETIME NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (YEAR(created_at)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
9. Read Replicas and Caching
9.1 Read Replica Strategy
Write/Read split architecture:

Write requests -> [Primary]
                      | async replication
                      v
Read requests  -> [Replica 1] [Replica 2] [Replica 3]
# Spring Boot - Read/Write split
spring:
datasource:
primary:
url: jdbc:postgresql://primary:5432/myapp
username: app_write
replica:
url: jdbc:postgresql://replica:5432/myapp
username: app_read
# Django - Read/Write router
# settings.py
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'HOST': 'primary.db.example.com',
'NAME': 'myapp',
},
'replica': {
'ENGINE': 'django.db.backends.postgresql',
'HOST': 'replica.db.example.com',
'NAME': 'myapp',
}
}
DATABASE_ROUTERS = ['myapp.routers.ReadReplicaRouter']
# routers.py
class ReadReplicaRouter:
def db_for_read(self, model, **hints):
return 'replica'
def db_for_write(self, model, **hints):
return 'default'
9.2 Query Caching Strategy
# Redis caching pattern
import redis
import json
from functools import wraps
r = redis.Redis(host='localhost', port=6379, db=0)
# Cache-Aside (Lazy Loading) pattern
def cache_aside(key_prefix, ttl=3600):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
cache_key = f"{key_prefix}:{args}:{kwargs}"
# 1. Look up in cache
cached = r.get(cache_key)
if cached:
return json.loads(cached)
# 2. Query from DB
result = func(*args, **kwargs)
# 3. Store in cache
r.setex(cache_key, ttl, json.dumps(result))
return result
return wrapper
return decorator
@cache_aside("user", ttl=1800)
def get_user_profile(user_id):
return db.query("SELECT * FROM users WHERE id = %s", [user_id])
# Write-Through pattern
def update_user(user_id, data):
# 1. Update DB
db.execute("UPDATE users SET name=%s WHERE id=%s", [data['name'], user_id])
# 2. Update cache (or delete)
cache_key = f"user:{user_id}"
r.delete(cache_key) # Cache Invalidation
# or
r.setex(cache_key, 1800, json.dumps(data)) # Cache Update
Caching strategy comparison:
Cache-Aside:
Pros: Most common, flexible
Cons: First request is a cache miss
Write-Through:
Pros: Cache is always up-to-date
Cons: Increased write latency
Write-Behind (Write-Back):
Pros: Best write performance
Cons: Risk of data loss
Read-Through:
Pros: Cache logic is separated
Cons: Implementation complexity
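The difference between Cache-Aside and Write-Through in the comparison above can be shown with a toy in-memory model that counts DB reads; `db` and `cache` here are plain dicts standing in for the real stores:

```python
# Toy comparison: Cache-Aside populates the cache lazily on a miss;
# Write-Through keeps the cache warm at write time, so reads always hit.
db = {"user:1": {"name": "Kim"}}
cache = {}
db_reads = 0


def get_cache_aside(key):
    global db_reads
    if key in cache:      # hit: no DB work
        return cache[key]
    db_reads += 1         # miss: read DB, then populate cache
    cache[key] = db[key]
    return cache[key]


def put_write_through(key, value):
    db[key] = value       # write DB and cache together,
    cache[key] = value    # so subsequent reads always hit


get_cache_aside("user:1")                 # miss -> 1 DB read
get_cache_aside("user:1")                 # hit  -> still 1 DB read
put_write_through("user:2", {"name": "Lee"})
get_cache_aside("user:2")                 # hit: write-through pre-warmed it
print(db_reads)  # 1
```

The trade-off in the list above shows up directly: cache-aside takes one miss before the cache warms, while write-through never misses but does extra work on every write.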
10. PostgreSQL-Specific Optimization
10.1 VACUUM and ANALYZE
PostgreSQL's MVCC model does not immediately remove previous rows on update/delete. VACUUM cleans up these dead tuples.
-- Manual VACUUM
VACUUM (VERBOSE) orders;
-- VACUUM FULL (table rewrite - causes lock!)
VACUUM FULL orders;
-- Update statistics
ANALYZE orders;
-- VACUUM + ANALYZE simultaneously
VACUUM ANALYZE orders;
# postgresql.conf - autovacuum settings
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 60
# Run autovacuum when 20% of table is changed
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
# Run autoanalyze when 10% of table is changed
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
# autovacuum speed control (I/O load)
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 200
10.2 HOT Updates
A HOT (Heap-Only Tuple) update is an optimization that writes the new row version into the heap only, without updating any index.
-- HOT Update conditions:
-- 1. Updated column has no index
-- 2. New row can be stored in the same page
-- Check HOT ratio
SELECT
relname,
n_tup_upd,
n_tup_hot_upd,
round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;
-- To increase HOT ratio:
-- 1. Minimize indexes on frequently updated columns
-- 2. Lower fillfactor to reserve space on the same page
ALTER TABLE orders SET (fillfactor = 80);
10.3 Using pg_stat Views
-- Per-table I/O statistics
SELECT
relname,
seq_scan,
idx_scan,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup,
n_dead_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
-- Index usage rates
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan;
-- Unused indexes (candidates for removal)
SELECT
indexrelname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Cache hit ratio
SELECT
sum(heap_blks_read) AS heap_read,
sum(heap_blks_hit) AS heap_hit,
round(100.0 * sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS hit_pct
FROM pg_statio_user_tables;
-- 99% or above is ideal
11. MySQL-Specific Optimization
11.1 InnoDB Buffer Pool
# my.cnf - InnoDB optimization
[mysqld]
# Buffer Pool size (70-80% of total memory)
innodb_buffer_pool_size = 8G
# Buffer Pool instances (split when 8G or more)
innodb_buffer_pool_instances = 8
# Log file size
innodb_log_file_size = 1G
innodb_log_buffer_size = 64M
# I/O threads
innodb_read_io_threads = 4
innodb_write_io_threads = 4
# Concurrency
innodb_thread_concurrency = 0 # Automatic
# Flush method
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1 # Safe (2: performance priority)
# Temporary tables
tmp_table_size = 256M
max_heap_table_size = 256M
-- Buffer Pool status check
SHOW ENGINE INNODB STATUS\G
-- Buffer Pool hit ratio
SELECT
(1 - (
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
)) * 100 AS buffer_pool_hit_ratio;
-- 99% or above is ideal
11.2 MySQL 8.0 Query Cache Removal
MySQL 8.0 removed the Query Cache. Use these alternatives instead:
MySQL 8.0 Query Cache alternatives:
1. Application-level caching (Redis/Memcached)
2. ProxySQL query cache
3. MySQL Router caching
4. InnoDB Buffer Pool optimization
-- MySQL 8.0 Performance Schema usage
-- Slow query analysis
SELECT
DIGEST_TEXT,
COUNT_STAR AS exec_count,
ROUND(SUM_TIMER_WAIT/1000000000, 2) AS total_ms,
ROUND(AVG_TIMER_WAIT/1000000000, 2) AS avg_ms,
SUM_ROWS_EXAMINED,
SUM_ROWS_SENT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
11.3 MySQL Index Hints
-- Index hints (when optimizer makes wrong choices)
SELECT * FROM orders USE INDEX (idx_orders_user_date)
WHERE user_id = 1 AND created_at > '2025-01-01';
SELECT * FROM orders FORCE INDEX (idx_orders_status)
WHERE status = 'pending';
-- MySQL 8.0 optimizer hints
SELECT /*+ NO_INDEX(orders idx_orders_status) */
* FROM orders WHERE user_id = 1;
SELECT /*+ JOIN_ORDER(users, orders) */
u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id;
12. Practical Quiz
Q1: What should you do when actual rows and estimated rows are vastly different in EXPLAIN ANALYZE?
Answer:
Statistics are likely outdated. Take the following actions:
- Run ANALYZE: refresh statistics with ANALYZE table_name;
- Increase the statistics sample size: raise default_statistics_target, or per column with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS; distinct-value estimates can be overridden with ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)
- Check indexes: verify appropriate indexes exist
- Check autovacuum settings: confirm autovacuum_analyze_scale_factor is appropriate
If estimated rows are far lower than actual, the planner may choose Nested Loop resulting in degraded performance. If far higher, it may choose an unnecessary Seq Scan.
Q2: Why does column order matter in composite indexes?
Answer:
Composite indexes follow the Leftmost Prefix Rule.
For example, with INDEX(a, b, c):
- WHERE a = 1 - can use index
- WHERE a = 1 AND b = 2 - can use index
- WHERE a = 1 AND b = 2 AND c = 3 - can use index
- WHERE b = 2 - cannot use index (missing leading column a)
- WHERE a = 1 AND c = 3 - only a uses the index; c is filtered
Column order decision criteria:
- Equality condition columns first
- Range condition columns later
- Higher selectivity columns first
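The leftmost-prefix behavior described above can be expressed as a small predicate: given the composite index's column order and the equality columns in a WHERE clause, how many leading index columns are usable? This is an illustrative simplification (it ignores range conditions), not planner logic:

```python
# Sketch of the Leftmost Prefix Rule: walk the index columns in order and
# stop at the first one the WHERE clause does not constrain.

def usable_prefix(index_cols, where_cols):
    used = []
    for col in index_cols:
        if col not in where_cols:
            break              # a gap in the prefix stops index usage
        used.append(col)
    return used


idx = ["a", "b", "c"]
print(usable_prefix(idx, {"a", "b", "c"}))  # ['a', 'b', 'c']
print(usable_prefix(idx, {"a", "c"}))       # ['a'] - c is only filtered
print(usable_prefix(idx, {"b"}))            # [] - missing leading column a
```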
Q3: How do you detect N+1 query problems?
Answer:
- Log analysis: Check if the same query pattern repeats in SQL query logs
- ORM tools: Django nplusone, Rails bullet gem, Spring hibernate.generate_statistics
- APM tools: Analyze query patterns with Datadog, New Relic, etc.
- pg_stat_statements: Check if calls count is abnormally high for the same query
- Development environment: Add query count assertions in tests to verify expected query counts
Solutions: Apply Eager Loading (JOIN), Batch Loading (IN clause), or the DataLoader pattern.
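The "query count assertion" idea from the answer above can be sketched by wrapping the query function with a counter; `run_query` and the data access pattern are hypothetical stand-ins for an ORM:

```python
# Sketch: count queries issued by a code path so a test can assert the
# expected number and fail loudly when an N+1 pattern sneaks in.
query_count = 0


def run_query(sql):
    global query_count
    query_count += 1   # every DB hit increments the counter
    return []


def load_users_naive(user_ids):
    run_query("SELECT * FROM users")
    for uid in user_ids:  # N+1: one extra query per user
        run_query(f"SELECT * FROM orders WHERE user_id = {uid}")


load_users_naive([1, 2, 3])
print(query_count)  # 4 queries for 3 users -> N+1 detected
```

In a real test suite the counter would come from the driver or ORM instrumentation (e.g. Django's `assertNumQueries` or Hibernate statistics) rather than a hand-rolled wrapper.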
Q4: How do you determine connection pool size?
Answer:
HikariCP formula: connections = (core_count * 2) + effective_spindle_count
- core_count: Number of CPU cores
- effective_spindle_count: 0 for SSD, number of disks for HDD
Example: 4-core SSD server = (4 * 2) + 0 = 8 connections
Additional considerations:
- 10-20 connections is sufficient for most applications
- Increasing pool size indefinitely actually degrades performance (context switching, memory)
- Consider PostgreSQL max_connections and the number of connected applications
- Using an external pooler like PgBouncer supports more clients
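The HikariCP rule of thumb from the answer, written as a function for quick what-if checks:

```python
# connections = (core_count * 2) + effective_spindle_count
# effective_spindle_count: 0 for SSD, number of disks for HDD

def pool_size(core_count, effective_spindle_count):
    return core_count * 2 + effective_spindle_count


print(pool_size(4, 0))  # 4-core SSD server -> 8
print(pool_size(8, 4))  # 8-core server with a 4-disk HDD array -> 20
```

Treat the result as a starting point to load-test against, not a hard rule.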
Q5: Why is OFFSET-based pagination slow and what are the alternatives?
Answer:
Why it is slow: OFFSET 100000 LIMIT 20 reads 100,020 rows and discards the first 100,000. The larger the OFFSET, the more rows must be read and discarded.
Alternative: Keyset Pagination (Cursor-based Pagination)
-- Traditional (slow)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
-- Keyset (fast)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;
Benefits of keyset pagination:
- Consistent performance regardless of OFFSET
- Efficient index usage
- Consistent results even with concurrent inserts/deletes
Drawbacks:
- Cannot jump directly to a specific page
- Requires index on the sort column
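The read-and-discard cost described above can be modeled with plain lists: count how many rows each strategy has to touch to produce one page. A real index seek makes keyset cheaper still; the list here is an illustrative stand-in for a table sorted by id:

```python
# Toy model of pagination cost over 200k sequential ids: OFFSET must read
# and discard everything before the page, keyset seeks straight to it.
ids = list(range(1, 200_001))


def offset_page(offset, limit):
    touched = ids[:offset + limit]   # reads offset + limit rows total
    return touched[offset:], len(touched)


def keyset_page(last_id, limit):
    start = last_id                  # index seek lands here directly
    page = ids[start:start + limit]
    return page, len(page)           # only `limit` rows touched


_, offset_cost = offset_page(100_000, 20)
_, keyset_cost = keyset_page(100_000, 20)
print(offset_cost, keyset_cost)  # 100020 vs 20
```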
13. References
- PostgreSQL Official Documentation - Performance Tips
- MySQL Performance Tuning Guide
- Use The Index, Luke
- HikariCP Wiki - About Pool Sizing
- PgBouncer Documentation
- Postgres EXPLAIN Visualizer (Dalibo)
- pg_stat_statements Documentation
- Percona Toolkit Documentation
- pgMustard - EXPLAIN ANALYZE Insights
- MySQL Performance Blog (Percona)
- PostgreSQL Wiki - Performance Optimization
- Citus Data Blog - PostgreSQL Tips
- Modern SQL Blog