✍️ 필사 모드: DB Index Master Guide 2025: B-Tree, Hash, GIN, BRIN, Composite Indexes, Query Plan Analysis
EnglishTL;DR
- B-Tree covers 90%: fits most queries. Supports sort, range, exact match.
- GIN is king for search: arrays, JSON, full-text. PostgreSQL's secret weapon.
- Composite column order is decisive:
WHERE a=? AND b=?differs fromWHERE b=? AND a=?in usability. - Covering indexes skip table access: leverage the
INCLUDEclause. - Know the 10 reasons an index is not used — statistics, functions, casts, patterns, etc.
1. Why Are Indexes Fast?
1.1 Without an index
SELECT * FROM users WHERE email = 'alice@example.com';
Without an index: Sequential Scan — read and compare every row. 100M rows = 100M comparisons.
100M rows x 100 nanoseconds = 10 seconds
1.2 With B-Tree
100M rows -> log2(10^8) ~= 27 comparisons
27 x 100 ns = 2.7 microseconds
About 4 million times faster — the magic of indexes.
1.3 The cost of an index
Not free:
- Storage: indexes consume disk space
- Slower writes: INSERT/UPDATE/DELETE must also update the index
- Maintenance: VACUUM, REINDEX
Rule: weigh the read-to-write ratio. 100:1 favors an index; 1:100 demands care.
2. B-Tree Internals
2.1 Balanced tree
[50]
/ \
[25] [75]
/ \ / \
[10] [40] [60] [90]
Properties:
- All leaves at the same depth (balanced)
- Each node stores many keys (hundreds)
- Sorted order maintained
2.2 Page-based storage
PostgreSQL B-Tree page (8KB)
+----------------------------------+
| Header |
+----------------------------------+
| Item Pointer 1 -> key1 + ctid |
| Item Pointer 2 -> key2 + ctid |
+----------------------------------+
| Free Space |
+----------------------------------+
| Items (low to high) |
+----------------------------------+
ctid is PostgreSQL's row location (page number, offset).
2.3 Tree depth math
Assuming 200 keys per 8KB page:
- Depth 1: 200 keys
- Depth 2: 40,000
- Depth 3: 8,000,000
- Depth 4: 1,600,000,000
100M rows = depth 4 -> only 4 disk reads per lookup.
3. Index Types (PostgreSQL)
3.1 B-Tree (default)
CREATE INDEX idx_users_email ON users(email);
Good for: exact match, range, ORDER BY, prefix LIKE 'abc%', IS NULL.
Bad for: LIKE '%abc', LIKE '%abc%', wrapped functions like LOWER(email) (need expression index).
3.2 Hash
CREATE INDEX idx_users_email_hash ON users USING hash(email);
Exact match only. Usually B-Tree is preferred; PG 10+ made hash WAL-safe, so it is occasionally useful (slightly smaller).
3.3 GIN (Generalized Inverted Index)
-- array
CREATE INDEX idx_tags ON posts USING gin(tags);
-- JSONB
CREATE INDEX idx_metadata ON products USING gin(metadata);
-- full-text
CREATE INDEX idx_content_fts ON articles USING gin(to_tsvector('english', content));
Inverted index — each key maps to a list of rows.
"redis" -> [row1, row5, row100, ...]
"kafka" -> [row3, row7, row50, ...]
"postgres" -> [row1, row2, row100, ...]
Good for: array containment (tags @> ARRAY['redis']), JSONB key/value, full-text, multi-value columns.
SELECT * FROM posts WHERE tags @> ARRAY['redis'];
SELECT * FROM posts WHERE tags @> ARRAY['redis', 'database'];
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
Downside: slow build. Use fastupdate=on to defer UPDATE cost.
3.4 GiST (Generalized Search Tree)
CREATE INDEX idx_locations ON places USING gist(location);
CREATE INDEX idx_periods ON events USING gist(period);
Geography, range types, nearest-neighbor search, overlap checks.
3.5 BRIN (Block Range Index)
CREATE INDEX idx_logs_created ON logs USING brin(created_at);
Stores min/max per page range. Tiny index. Great for naturally ordered huge tables (time-series, logs). Example: 100M-row log table — BRIN is a few MB vs several GB for B-Tree.
3.6 SP-GiST
CREATE INDEX idx_ips ON connections USING spgist(ip_addr);
Non-balanced tree; IP ranges, some PostGIS types.
3.7 Comparison
| Type | Exact | Range | Pattern | Array/JSON | Geo | Size | Build |
|---|---|---|---|---|---|---|---|
| B-Tree | yes | yes | prefix only | no | no | medium | fast |
| Hash | yes | no | no | no | no | small | fast |
| GIN | yes | no | full-text | yes | no | large | slow |
| GiST | yes | yes | no | no | yes | medium | medium |
| BRIN | no | yes* | no | no | no | tiny | fast |
| SP-GiST | yes | yes | no | no | yes | medium | medium |
*BRIN works only when data is physically ordered.
4. Composite Indexes
4.1 Column order is decisive
CREATE INDEX idx_users_country_age ON users(country, age);
Usable:
WHERE country = 'KR'
WHERE country = 'KR' AND age = 30
WHERE country = 'KR' AND age > 25
Not usable:
WHERE age = 30 -- country is leading
WHERE age > 25
Rule: leftmost prefix.
4.2 Choosing column order
Priorities:
- Most frequently used columns first
- Higher selectivity first (filters more)
- Equality before range
Example:
WHERE country = 'KR' AND age > 25 AND created_at > '2024-01-01'
Selectivity: country=30%, created_at=50%, age=70%. Optimal order puts the most selective column first:
CREATE INDEX idx_users_search ON users(country, created_at, age);
4.3 Equality + range
WHERE country = 'KR' AND age BETWEEN 25 AND 35
Best: (country, age) — equality column first.
5. Specialized Index Patterns
5.1 Partial Index
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
Smaller index, faster build and maintenance, better cache efficiency. Great for frequently filtered subsets.
5.2 Expression Index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = LOWER('Alice@example.com');
Common: LOWER, UPPER, EXTRACT(year FROM ...), JSON extraction (metadata->>'category').
5.3 Covering Index (INCLUDE)
-- PostgreSQL 11+
CREATE INDEX idx_users_country_email ON users(country) INCLUDE (email, name);
Returns results without touching the table (Index Only Scan).
SELECT email, name FROM users WHERE country = 'KR';
5.4 Unique Index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
Prevents duplicates and enables optimizer shortcuts.
6. Mastering EXPLAIN ANALYZE
6.1 Usage
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users WHERE country = 'KR' AND age > 25;
6.2 Reading output
Index Scan using idx_users_country_age on users
(cost=0.43..2547.39 rows=10000 width=120)
(actual time=0.025..15.234 rows=9847 loops=1)
Index Cond: ((country)::text = 'KR'::text)
Filter: (age > 25)
Rows Removed by Filter: 153
Buffers: shared hit=1234 read=56
Planning Time: 0.123 ms
Execution Time: 15.567 ms
cost: optimizer estimaterowsvsactual rows: estimate vs realityBuffers: cache hit/read
6.3 Estimated vs actual
Large divergence (e.g. 1000x) indicates stale stats. Run ANALYZE and consider raising default_statistics_target.
6.4 Node types
| Node | Meaning | Good? |
|---|---|---|
Seq Scan | full scan | only for small tables |
Index Scan | index + heap lookup | yes |
Index Only Scan | index only | best |
Bitmap Index Scan | bitmap combine | yes |
Nested Loop | small joins | small only |
Hash Join | large joins | yes |
Merge Join | sorted joins | yes |
Sort | in-memory or on-disk | watch for disk |
6.5 Red flags
Rows Removed by Filter: 100000+Disk: external mergeestimated/actual > 100Seq Scanon a large table
7. 10 Reasons an Index Is Not Used
7.1 Function/expression wrap
-- bad
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- good
CREATE INDEX idx_email_lower ON users(LOWER(email));
7.2 Type casting
-- bad: varchar vs int
SELECT * FROM users WHERE phone = 12345678;
-- good
SELECT * FROM users WHERE phone = '12345678';
7.3 Leading wildcard
-- bad
SELECT * FROM users WHERE name LIKE '%alice%';
-- good
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING gin(name gin_trgm_ops);
7.4 NULL comparison
CREATE INDEX idx_active ON users(id) WHERE deleted_at IS NULL;
7.5 OR conditions
Prefer IN or UNION over OR across columns.
7.6 Stale statistics
ANALYZE users;
7.7 Too many rows returned
Above ~10% of rows the planner may choose a seq scan — use partial indexes or different query strategies.
7.8 No optimizer hints
PostgreSQL lacks hints by design; MySQL supports USE INDEX. Use pg_hint_plan if needed.
7.9 Missing leading column
CREATE INDEX idx_users_country_age ON users(country, age);
-- bad: country missing
SELECT * FROM users WHERE age > 25;
7.10 Implicit conversion
-- bad on some DBs
SELECT * FROM orders WHERE order_id = 12345;
-- good
SELECT * FROM orders WHERE order_id = 12345::bigint;
8. Real-World Tuning Cases
8.1 Slow search
SELECT * FROM products
WHERE category = 'electronics'
AND price BETWEEN 100 AND 500
ORDER BY created_at DESC
LIMIT 20;
Before: Seq Scan + Sort (10s). Fix:
CREATE INDEX idx_products_search ON products(category, price, created_at DESC);
After: Index Scan, 5ms — 2000x faster.
8.2 Slow JOIN
SELECT u.*, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.country = 'KR'
GROUP BY u.id;
Fix:
CREATE INDEX idx_orders_user_id ON orders(user_id);
8.3 Slow JSON query
CREATE INDEX idx_events_metadata ON events USING gin(metadata);
8.4 Composite vs single
Given WHERE a=? AND b=? AND c=? — one composite index on (a, b, c) almost always beats three single-column indexes combined with BitmapAnd.
9. Index Maintenance
9.1 Find unused indexes
SELECT
schemaname || '.' || relname AS table,
indexrelname AS unused_index,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan AS index_scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
9.2 Index bloat
REINDEX INDEX CONCURRENTLY idx_users_email;
9.3 Redundant indexes
If idx_ab(a, b) exists, idx_a(a) is redundant (leftmost rule).
9.4 Monitoring
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
10. MySQL vs PostgreSQL
| Feature | MySQL (InnoDB) | PostgreSQL |
|---|---|---|
| Default index | B+Tree | B-Tree |
| Clustered index | yes (PRIMARY KEY) | no |
| Hash index | Memory engine only | yes |
| Full-text | FULLTEXT | tsvector + GIN |
| JSON indexing | Generated columns | GIN on JSONB |
| Partial index | limited (8.0+) | yes |
| Expression index | 8.0+ | yes |
| Covering index | yes | yes (INCLUDE) |
| BRIN | no | yes |
| GiST | no | yes |
Quiz
1. Which queries can use a composite index on (a, b, c)?
Answer: leftmost prefix rule — usable: WHERE a=?, WHERE a=? AND b=?, WHERE a=? AND b=? AND c=?, partially WHERE a=? AND c=?. Not usable: WHERE b=?, WHERE c=?, WHERE b=? AND c=?.
2. When is a GIN index appropriate?
Answer: multi-valued columns — arrays tags @> ARRAY['redis'], JSONB metadata @> '{"key": "value"}', full-text tsvector @@ tsquery, and trigram acceleration via pg_trgm for LIKE-with-leading-wildcard.
3. Estimated and actual rows differ 1000x — what now?
Answer: statistics are stale. Run ANALYZE, raise default_statistics_target (100 -> 1000), or tune per column with SET STATISTICS. Also consider lowering autovacuum's analyze_scale_factor.
4. Benefits of a Partial Index?
Answer: smaller index, faster build and maintenance, better cache efficiency. Example: CREATE INDEX idx_active ON users(email) WHERE status='active' indexes only active users.
5. Why is Index Only Scan faster than Index Scan?
Answer: a regular Index Scan fetches ctid then reads the heap row. Index Only Scan has all needed columns in the index (use INCLUDE), avoiding table access. Requires an up-to-date visibility map (VACUUM).
References
현재 단락 (1/236)
- **B-Tree covers 90%**: fits most queries. Supports sort, range, exact match.