Database Query Optimizer Complete Guide 2025: Volcano, Cascades, Cost-Based, Cardinality Estimation — PostgreSQL/MySQL In-Depth Analysis
Introduction: The Magic of One SQL Line
Same Query, 1000x Difference
Consider the following query:
SELECT u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.country = 'KR' AND o.created_at > '2025-01-01';
This one-line SQL can be executed in at least dozens of ways:
- Scan users first → filter → nested loop join with orders
- Scan orders first → filter → hash join with users
- Use indexes on both sides → merge join
- Combine country index + date index...
The performance difference between these methods can be hundreds to thousands of times. Same query, same data, completely different execution times.
The one making that choice is the query optimizer. When you submit SQL, the optimizer:
- Converts the query to relational algebra
- Generates possible execution plans
- Estimates the cost of each plan
- Picks the cheapest one
This article dissects the entire process. From Volcano to Cascades, from the principles of cost-based optimization to real-world implementations in modern databases.
Why Should You Know This?
- Debugging slow queries: To read EXPLAIN output accurately, you need to understand the optimizer.
- Index design: Knowing how the optimizer picks helps you understand why an index "isn't being used."
- Statistics management: Why ANALYZE matters, and how statistics govern performance.
- Cross-DB differences: The optimizers in PostgreSQL, MySQL, and Oracle differ significantly. Important when porting.
- Modern NewSQL: How are the optimizers in TiDB and CockroachDB different?
1. The Three Stages of an Optimizer
Stage 1: Parsing
Converts SQL text into an AST (Abstract Syntax Tree).
SELECT * FROM t WHERE id = 5
↓
Select
├── Projection: *
├── From: t
└── Where: Equal(id, 5)
The parsing stage mainly checks syntax. It also performs catalog validation afterwards (does the table/column exist).
Stage 2: Rewriting / Logical Optimization
Converts the AST into relational algebra and applies logical transformations:
- Predicate pushdown: Push WHERE conditions as far inside as possible.
- Projection pushdown: Read only the necessary columns.
- Constant folding: Turn 1 + 2 into 3.
- Subquery unnesting: Convert subqueries into joins.
- View expansion: Expand views back into their original queries.
These transformations do not change semantics.
Stage 3: Physical Optimization / Cost-Based
Generates possible physical execution plans, estimates their cost, and picks the cheapest. This stage is the real heart of the optimizer.
- Which access method to use? (full scan, index scan, index only scan)
- Which join algorithm? (nested loop, hash join, merge join)
- Which join order?
- Where to sort?
2. Relational Algebra and Logical Optimization
Relational Algebra Basics
Relational algebra is the mathematical model of query languages based on set theory:
- σ (selection): WHERE (row filtering).
- π (projection): SELECT columns (column selection).
- ⋈ (join): Table combination.
- ∪ (union), ∩ (intersection), - (difference).
- γ (group): GROUP BY.
SQL:
SELECT name FROM users WHERE age > 30
Relational algebra:
π_name(σ_age>30(users))
Algebraic Equivalence Transformations
Relational algebra has transformation rules. For example:
- Selection distribution: σ_{p∧q}(R) = σ_p(σ_q(R))
- Selection push through join: σ_p(R ⋈ S) = σ_p(R) ⋈ S (when p only uses R's columns)
- Projection push through selection: π_A(σ_p(R)) = π_A(σ_p(π_{A ∪ required}(R)))
- Join commutativity: R ⋈ S = S ⋈ R
- Join associativity: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
Applying these rules produces a different query with the same result.
Predicate Pushdown Example
SELECT u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.country = 'KR';
Initial plan:
π_{name,total}(σ_{country='KR'}(users ⋈ orders))
Join users and orders first, then filter by country → inefficient.
With predicate pushdown:
π_{name,total}(σ_{country='KR'}(users) ⋈ orders)
Filter users for KR first, then join with orders → much more efficient.
Modern optimizers apply hundreds of such transformations automatically.
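As a toy illustration of one such rewrite rule, here is a minimal sketch (hypothetical node classes, not any real optimizer's API) that pushes a filter below a join when the predicate only touches one side:

```python
# Hypothetical plan nodes; real optimizers use far richer representations.
from dataclasses import dataclass

@dataclass
class Scan:
    table: str
    columns: frozenset

@dataclass
class Join:
    left: object
    right: object

@dataclass
class Filter:
    child: object
    predicate_columns: frozenset  # columns the predicate references
    predicate: str

def columns_of(node):
    if isinstance(node, Scan):
        return node.columns
    if isinstance(node, Filter):
        return columns_of(node.child)
    return columns_of(node.left) | columns_of(node.right)

def push_down(node):
    """Move a Filter below a Join when its predicate uses only one side's columns."""
    if isinstance(node, Filter) and isinstance(node.child, Join):
        join = node.child
        if node.predicate_columns <= columns_of(join.left):
            return Join(Filter(join.left, node.predicate_columns, node.predicate),
                        join.right)
        if node.predicate_columns <= columns_of(join.right):
            return Join(join.left,
                        Filter(join.right, node.predicate_columns, node.predicate))
    return node

plan = Filter(Join(Scan("users", frozenset({"users.id", "users.country"})),
                   Scan("orders", frozenset({"orders.user_id", "orders.total"}))),
              frozenset({"users.country"}), "country = 'KR'")
optimized = push_down(plan)
# The filter now sits directly on the users scan, before the join.
```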
Projection Pushdown
SELECT name FROM users JOIN orders ON users.id = orders.user_id;
Initial plan: fetch all columns of users and orders, join, then extract only name.
Optimized: before the join, fetch only (id, name) from users and user_id from orders. Dramatic savings in network and memory.
Subquery Unnesting
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE country = 'KR');
This query can be executed two ways:
- Correlated: Run the subquery for each orders row → N users scans → very slow.
- Unnested: Convert to a join.
-- The optimizer rewrites it like this
SELECT o.* FROM orders o JOIN users u ON o.user_id = u.id
WHERE u.country = 'KR';
A good optimizer performs this transformation automatically. A poor (or older) optimizer executes correlated execution as-is, producing extremely slow results.
3. Physical Operators: The Basic Units of Execution
Access Method: Reading Tables
Sequential Scan (Full Table Scan):
[Page 1][Page 2][Page 3]...
↓ ↓ ↓
Read all rows sequentially
- Large table, few or no filters: fast (sequential I/O).
- Small result with selective conditions: slow (reads every row, discards most).
Index Scan:
B-Tree
↓ (find keys matching the condition)
Heap → read the corresponding rows
- Small result: very fast.
- Large result (a high fraction of rows matches): slow due to heavy random I/O.
- As a rule of thumb, if more than ~10% of the table is returned, a full scan wins.
Index Only Scan:
B-Tree
↓ (answer from index alone)
No heap access
If all needed columns are in the index, the heap is never touched. The fastest access method.
Bitmap Index Scan:
Index 1 → bitmap 1
Index 2 → bitmap 2
↓
AND/OR combination
↓
Heap scan (in sorted order)
Combine results from multiple indexes. Converts to sequential access pattern to reduce random I/O.
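A rough sketch of the idea, using Python integers as bitmaps over row ids (the row ids and page size below are invented for illustration):

```python
# Illustrative sketch, not PostgreSQL internals: AND two indexes' row-id
# bitmaps, then visit heap pages in sorted order to avoid random I/O.

def to_bitmap(row_ids):
    bm = 0
    for rid in row_ids:
        bm |= 1 << rid
    return bm

def bitmap_rows(bm):
    rid = 0
    while bm:
        if bm & 1:
            yield rid
        bm >>= 1
        rid += 1

idx_country = to_bitmap([3, 7, 12, 40, 41])  # rows matching country = 'KR'
idx_date = to_bitmap([7, 12, 13, 41, 90])    # rows matching created_at > ...
combined = idx_country & idx_date            # AND of both conditions

ROWS_PER_PAGE = 8
pages = sorted({rid // ROWS_PER_PAGE for rid in bitmap_rows(combined)})
print(list(bitmap_rows(combined)))  # [7, 12, 41]
print(pages)                        # heap pages visited in order: [0, 1, 5]
```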
Join Algorithms
Nested Loop Join (NLJ):
for r in R:
for s in S:
if r.key == s.key:
output(r, s)
- Complexity: O(|R| × |S|).
- With an index: O(|R| × log|S|).
- Best for small outer + indexed inner.
Hash Join:
# Build phase
hash_table = {}
for r in R:
hash_table[r.key] = r
# Probe phase
for s in S:
if s.key in hash_table:
output(hash_table[s.key], s)
- Complexity: O(|R| + |S|).
- Requires memory.
- Best for large tables + no index.
- Parallelism-friendly.
Sort-Merge Join:
sort(R, by=key)
sort(S, by=key)
i, j = 0, 0
while i < len(R) and j < len(S):
if R[i].key == S[j].key:
output(R[i], S[j])
# Handle duplicates
elif R[i].key < S[j].key:
i += 1
else:
j += 1
- Complexity: O(|R| log|R| + |S| log|S|).
- If already sorted: O(|R| + |S|).
- Best when ORDER BY or an index already sorts the input.
- Supports range joins.
Selection Criteria
Which join to pick? The optimizer compares via a cost function:
- NLJ cost: |R| × cost_per_probe
- Hash join cost: |R| × hash_cost + |S| × probe_cost + build_memory
- Merge join cost: sort(R) + sort(S) + scan(R) + scan(S)
The key point here is that the optimizer does not know |R| and |S|. It must estimate them.
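A hedged sketch of such a comparison; the cost constants below are invented for illustration, not any database's real parameters:

```python
import math

# Toy cost functions over *estimated* cardinalities.
def nlj_cost(outer_rows, probe_cost=4.0, match_cost=0.01, matches_per_row=1):
    return outer_rows * (probe_cost + match_cost * matches_per_row)

def hash_join_cost(build_rows, probe_rows, tuple_cost=0.01, probe_op_cost=0.02):
    return build_rows * tuple_cost + probe_rows * probe_op_cost

def merge_join_cost(r_rows, s_rows, sort_factor=0.02, scan_cost=0.01):
    sort = lambda n: sort_factor * n * math.log2(max(n, 2))
    return sort(r_rows) + sort(s_rows) + scan_cost * (r_rows + s_rows)

est_r, est_s = 10_000, 100_000  # estimates, not true sizes
costs = {
    "nested_loop": nlj_cost(est_r, matches_per_row=10),
    "hash": hash_join_cost(est_s, est_r),
    "merge": merge_join_cost(est_r, est_s),
}
print(min(costs, key=costs.get))  # the planner picks the cheapest alternative
```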
4. Cost-Based Optimization: The Mathematics of Cost
Cost Functions
Each physical operator has its own cost formula:
Sequential Scan:
cost = seq_page_cost × num_pages + cpu_tuple_cost × num_tuples
Index Scan:
cost = random_page_cost × (index_pages + heap_pages)
+ cpu_index_tuple_cost × num_matched
+ cpu_tuple_cost × num_matched
Parameter Meanings
PostgreSQL defaults:
seq_page_cost = 1.0 # Sequential disk page I/O
random_page_cost = 4.0 # Random disk page I/O
cpu_tuple_cost = 0.01 # Row processing CPU cost
cpu_index_tuple_cost = 0.005 # Index entry processing
cpu_operator_cost = 0.0025 # Operator execution (=, <, ...)
These values are relative. seq_page_cost = 1.0 is just a reference.
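The two formulas above can be sketched directly with these defaults (a simplification: the real planner also models caching, correlation, and startup cost):

```python
# PostgreSQL's default cost parameters, applied to the formulas above.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005

def seq_scan_cost(num_pages, num_tuples):
    return SEQ_PAGE_COST * num_pages + CPU_TUPLE_COST * num_tuples

def index_scan_cost(index_pages, heap_pages, num_matched):
    return (RANDOM_PAGE_COST * (index_pages + heap_pages)
            + CPU_INDEX_TUPLE_COST * num_matched
            + CPU_TUPLE_COST * num_matched)

# A 10,000-page table with 1M rows: full scan vs. an index returning 1,000 rows
full = seq_scan_cost(10_000, 1_000_000)              # 10,000 + 10,000 = 20,000
idx = index_scan_cost(10, 1_000, num_matched=1_000)  # 4,040 + 5 + 10 = 4,055
print(full, idx)  # the index scan wins for this selective condition
```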
Adjustments for the SSD Era
In the HDD era, random I/O was much slower than sequential I/O (about 100x). That's why random_page_cost = 4.0 was set. But on SSDs the gap is small. On NVMe it's nearly identical.
# In SSD environments
random_page_cost = 1.1
This single change alone leads the optimizer to use indexes more often.
effective_cache_size
effective_cache_size = 4GB
This parameter tells the optimizer how much memory is available for caching data across the OS page cache and PostgreSQL's shared buffers. The optimizer uses it to adjust the expected I/O cost of index scans: larger → prefers indexes, smaller → prefers full scans.
Join Cost Example
SELECT * FROM A JOIN B ON A.id = B.a_id
A: 10,000 rows, B: 100,000 rows, B has an a_id index.
NLJ (A outer):
10,000 × (index_lookup_cost + matched_rows_cost)
≈ 10,000 × (4 + 0.01 × 10) (10 matches per A row on average)
≈ 40,000 + 1000 = 41,000
Hash Join (B as build):
build: 100,000 × cpu_tuple_cost = 1,000
probe: 10,000 × (hash_cost + matched_cost) ≈ 10,000 × 0.02 = 200
total: 1,200
Hash is much cheaper. The optimizer picks hash join.
Note: Cost is for relative comparison, not absolute time. "cost=1,000" is not "1 second." It's used to rank alternatives within the same query.
5. Cardinality Estimation: The Lifeblood of the Optimizer
"How Many Rows Will This Produce?"
All cost calculations depend on the result row count (cardinality) of each operator. If the estimate is wrong, the entire plan is wrong.
SELECT * FROM users WHERE country = 'KR' AND age > 30;
Questions:
- Total users: 1 million
- Selectivity of country='KR': 20% → 200k
- Selectivity of age > 30: 40% → 400k
- Both satisfied: ?
Independence assumption for selectivity: 20% × 40% = 8% → estimated 80k rows.
If this estimate is close to reality (say 50k), the plan is good. If it's completely off (say 500k), disaster.
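The arithmetic, spelled out (numbers from the example above):

```python
# Independence assumption in one line: combined selectivity is the product
# of the individual selectivities — often wrong for correlated columns.
total_rows = 1_000_000
sel_country = 0.20   # country = 'KR'
sel_age = 0.40       # age > 30

estimated = total_rows * sel_country * sel_age
print(round(estimated))  # 80000 — the 8% estimate from the text
```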
Statistics: The Optimizer's Sense Organs
Accurate statistics lead to accurate estimates. Information stored in PostgreSQL's pg_statistic:
- n_distinct: number of distinct values in a column.
- null_frac: NULL ratio.
- most_common_vals (MCV): the most frequently occurring values.
- most_common_freqs: frequencies of the MCVs.
- histogram_bounds: value distribution histogram (equal-frequency buckets).
- correlation: correlation between physical and logical order.
Histograms
Histogram for column age (20 equal-frequency buckets)
[0-5][5-12][12-20][20-25][25-28]...[70-120]
 5%   5%    5%     5%     5%        5%
Each bucket holds an equal fraction of the total (here, 5%). Narrower bucket ranges indicate denser value regions.
Query WHERE age > 30:
- Count the buckets whose range lies at or above 30 (plus a fraction of the bucket containing 30).
- Each bucket holds the same fraction of the rows (5% here).
- Estimate: bucket_count × bucket_fraction × total_rows.
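A minimal sketch of that estimate, with invented bucket bounds (10 buckets of 10% each) and linear interpolation inside the bucket containing the constant:

```python
# Equal-frequency histogram estimate for "age > value". Bounds are invented.
bounds = [0, 18, 25, 30, 34, 40, 48, 60, 75, 95, 120]  # 10 buckets
total_rows = 1_000_000
bucket_frac = 1.0 / (len(bounds) - 1)  # 10% of the rows per bucket

def estimate_gt(value):
    sel = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        if value <= lo:
            sel += bucket_frac                             # bucket fully above
        elif lo < value < hi:
            sel += bucket_frac * (hi - value) / (hi - lo)  # partial bucket
    return sel * total_rows

print(round(estimate_gt(30)))  # 7 full buckets above 30 -> 700000
```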
MCV (Most Common Values)
When values are heavily skewed (e.g., 90% of status is 'active'):
- MCV: ['active', 'inactive', 'pending']
- MCV freqs: [0.9, 0.08, 0.02]
WHERE status = 'active' → estimate: 0.9 × total_rows.
Values not in MCV are estimated via the histogram.
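A sketch of the MCV lookup with a fallback for non-MCV values (the values, frequencies, and n_distinct below are invented for illustration; PostgreSQL's fallback actually consults the histogram):

```python
# MCV-based equality selectivity: exact frequency for listed values,
# leftover frequency spread evenly over the remaining distinct values.
mcv_vals = ['active', 'inactive']
mcv_freqs = [0.9, 0.08]
n_distinct = 5          # total distinct values in the column
total_rows = 1_000_000

def estimate_eq(value):
    if value in mcv_vals:
        return mcv_freqs[mcv_vals.index(value)] * total_rows
    leftover = 1.0 - sum(mcv_freqs)          # frequency not covered by MCVs
    return leftover / (n_distinct - len(mcv_vals)) * total_rows

print(round(estimate_eq('active')))   # 900000
print(round(estimate_eq('pending')))  # ~6667 (0.02 shared across 3 values)
```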
Pitfalls in Selectivity Computation
Pitfall 1: The Independence Assumption Fallacy
WHERE city = 'Seoul' AND country = 'Korea'
Optimizer: P(city='Seoul') × P(country='Korea') = 0.1 × 0.2 = 0.02
Reality: P(city='Seoul' AND country='Korea') = 0.1 (Seoul only exists in Korea)
5x off. Leads to a bad plan.
Solution: Extended Statistics. In PostgreSQL 10+:
CREATE STATISTICS stats_city_country ON city, country FROM addresses;
ANALYZE addresses;
Now the optimizer knows the correlation between the two columns.
Pitfall 2: LIKE Patterns
WHERE name LIKE 'A%'
Selectivity is hard to estimate. The optimizer tries with MCVs and histograms, but often misses.
Pitfall 3: Estimating Join Results
SELECT * FROM A JOIN B ON A.key = B.key
Result row count estimate:
|A ⋈ B| ≈ |A| × |B| / max(n_distinct(A.key), n_distinct(B.key))
This formula is based on an "even distribution" assumption. In reality, skew causes large errors.
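The formula as code, under that even-distribution assumption:

```python
# Join-size estimate assuming values are spread evenly over distinct keys.
def join_cardinality(rows_a, rows_b, ndistinct_a, ndistinct_b):
    return rows_a * rows_b / max(ndistinct_a, ndistinct_b)

# 10k orders rows joining 1k users rows on user_id (illustrative numbers):
est = join_cardinality(10_000, 1_000, ndistinct_a=1_000, ndistinct_b=1_000)
print(est)  # 10000.0 — each order matches one user on average
```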
The Importance of ANALYZE
Statistics are updated with the ANALYZE command:
ANALYZE users; -- Specific table
ANALYZE; -- Everything
PostgreSQL's autovacuum runs ANALYZE periodically. However:
- After bulk INSERT/UPDATE, a manual ANALYZE is recommended.
- Stale statistics cause the optimizer to pick bad plans.
- Querying new tables at scale without ANALYZE may yield horrible plans.
6. Volcano Optimizer
Goetz Graefe's Innovation
The Volcano Optimizer, published by Goetz Graefe in 1993, is the starting point of modern query optimizers. The core idea:
"Generate every possible plan using a set of transformation rules, and pick the cheapest."
Operator Tree
A query is represented as a tree of logical operators:
Join
├── Scan(A)
└── Scan(B)
The optimizer's goal is to convert this into a tree of physical operators while minimizing cost:
HashJoin
├── SeqScan(A)
└── IndexScan(B)
Transformation Rules
Volcano explores with sets of transformation rules:
- Logical → Logical: Join commutativity, associativity.
- Logical → Physical: Join → HashJoin, Join → NestedLoopJoin, etc.
- Enforcer: Add a Sort if ordering is required.
Top-down Search
Volcano performs a top-down search:
- Start from the root operator.
- Enumerate possible physical implementations for each operator.
- Recursively optimize children.
- After computing costs of all subtrees, pick the optimum.
Memoization
Re-computing the same subproblem many times is expensive. Volcano uses memoization (caching computed results). The same subexpression is optimized only once.
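A toy sketch of the memoized top-down search (invented table sizes and a deliberately crude cost model; real Volcano memoizes whole expression groups, not just join orders):

```python
# Top-down best-plan search with memoization: each relation subset is
# optimized exactly once, Volcano-style.
from functools import lru_cache

SIZES = {"A": 1_000, "B": 10_000, "C": 100}

def join_cost(left_rows, right_rows):
    return left_rows + right_rows          # toy hash-join-like cost

@lru_cache(maxsize=None)
def best(rels: frozenset):
    if len(rels) == 1:
        (r,) = rels
        return SIZES[r], (r,)              # (cost, plan)
    best_cost, best_plan = float("inf"), None
    for r in rels:                         # try splitting off each relation
        rest = rels - {r}
        sub_cost, sub_plan = best(rest)    # memoized recursive call
        rows_rest = sum(SIZES[x] for x in rest)  # toy cardinality estimate
        cost = sub_cost + join_cost(rows_rest, SIZES[r])
        if cost < best_cost:
            best_cost, best_plan = cost, (sub_plan, r)
    return best_cost, best_plan

cost, plan = best(frozenset({"A", "B", "C"}))
print(cost, plan)  # joins the small relations first under this toy model
```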
Drawbacks
Volcano is excellent but had several limitations:
- Exhaustive search: Tries to generate every plan, exploding with many joins.
- Top-down inefficiency: Decisions at the top propagate down and cause repeated revisits.
The improvement over this was Cascades.
7. Cascades Optimizer
Graefe's Next Work
Cascades (1995) is also Goetz Graefe's work. The optimizers of SQL Server, CockroachDB, Apache Calcite, and Greenplum are all Cascades-based.
Key Improvements
- Unified logical/physical transformations: Rules expressed in a unified way.
- Better memoization structure: The "memo" data structure prevents redundant optimization.
- Property propagation: Propagates properties like ordering and distribution.
- Branch-and-bound pruning: Obviously bad plans are abandoned early.
The Memo Structure
The memo is a set of groups of equivalent expressions:
Group 1: σ_p(R)
Group 2: σ_p(R ⋈ S), σ_p(R) ⋈ S (predicate pushed vs. not)
Group 3: R ⋈ S, S ⋈ R (commutativity)
Each group holds multiple expressions that yield the same result. The optimizer compares costs at the group level.
Properties
Physical Properties are characteristics of a plan:
- Ordering: Is the result sorted?
- Distribution: In a distributed DB, how is the data partitioned?
- Uniqueness: Are there no duplicates in the result?
When a parent operator requires a certain property (e.g., Merge Join requires sorted input), the child either provides it or an enforcer is added (Sort).
Pruning
To prevent the search from exploding, pruning:
- Skip subproblems whose cost already exceeds the best found.
- Branch-and-bound algorithm.
Cascades-Based Optimizers
- SQL Server: Its own Cascades-based implementation.
- PostgreSQL: Not Cascades (dynamic-programming-based).
- CockroachDB: Cascades-based.
- Apache Calcite: Cascades-based, used in Hive/Flink/Beam.
- Presto/Trino: Calcite-inspired + custom implementation.
8. PostgreSQL's Optimizer
Structure
PostgreSQL takes a slightly different approach from Volcano/Cascades:
- Rewriting → Logical plan
- Plan generation: Dynamic programming + Genetic algorithm.
- Cost estimation: Discrete cost functions.
- Plan selection: Cheapest wins.
Dynamic Programming Join Ordering
Join ordering uses Dynamic Programming (DP). It computes the optimal join of each subset and combines them.
- Complexity: around O(3^n).
- n ≤ 12: DP gives the exact optimum.
- n > 12: exceeds geqo_threshold → switch to the Genetic Algorithm.
Genetic Algorithm (GEQO)
For joins across many tables (e.g., 15), DP is too slow, so:
- Generate random join orders.
- Evaluate "fitness" (estimated cost).
- Cross over and mutate good ones.
- Evolve over generations.
- Pick the best.
Not exactly optimal, but finds a reasonable solution quickly. geqo = on (default).
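A toy sketch of the genetic-algorithm idea (the cost model, population size, and mutation rate are all invented for illustration, not GEQO's actual internals):

```python
# GEQO's core idea: a join order is a genome; score it by estimated cost,
# then evolve the population via selection, crossover, and mutation.
import random

random.seed(42)
TABLES = list("ABCDEFGH")
ROWS = {t: random.randint(100, 100_000) for t in TABLES}

def plan_cost(order):
    # Left-deep pipeline: cost grows with intermediate sizes (toy model).
    inter, cost = ROWS[order[0]], 0
    for t in order[1:]:
        cost += inter + ROWS[t]
        inter = max(1, (inter * ROWS[t]) // 100_000)  # toy join cardinality
    return cost

def crossover(a, b):
    # Order crossover: keep a prefix of `a`, fill the rest in `b`'s order.
    cut = random.randrange(1, len(a))
    head = a[:cut]
    return head + [t for t in b if t not in head]

def mutate(order):
    i, j = random.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]

pop = [random.sample(TABLES, len(TABLES)) for _ in range(30)]
for _ in range(50):                       # generations
    pop.sort(key=plan_cost)
    survivors = pop[:10]                  # selection: keep the fittest
    children = [crossover(random.choice(survivors), random.choice(survivors))
                for _ in range(20)]
    for c in children:
        if random.random() < 0.2:
            mutate(c)
    pop = survivors + children

best_order = min(pop, key=plan_cost)
print(best_order, plan_cost(best_order))
```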
Generic vs Custom Plan
PostgreSQL behaves specially for prepared statements:
- Custom plan: Optimized per execution based on parameters.
- Generic plan: Optimized once, then reused.
PostgreSQL uses custom plans for the first few executions, then switches to generic if generic is cheaper. This can cause the phenomenon where "it's fast at first, then suddenly slow."
EXPLAIN
To see the optimizer's decisions, use EXPLAIN:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.country = 'KR';
Hash Join (cost=10.00..250.00 rows=500 width=64) (actual time=0.1..5.0 rows=487 loops=1)
Hash Cond: (o.user_id = u.id)
Buffers: shared hit=234
-> Seq Scan on orders o (cost=0.00..150.00 rows=10000 width=16)
-> Hash (cost=8.00..8.00 rows=100 width=56)
-> Index Scan using idx_users_country on users u
Index Cond: (country = 'KR')
Interpretation:
- cost: estimated cost (startup_cost..total_cost).
- rows: estimated row count.
- actual: measurement during actual execution.
- loops: how many times this node ran.
A big gap between estimated rows and actual rows indicates a statistics problem. ANALYZE needed.
9. MySQL's Optimizer
History
MySQL's optimizer was heuristics-based for a long time:
- Always joins tables in fixed order (left-deep tree).
- Decides index usage via rules.
Version 5.6 introduced a cost-based optimizer, and it has continued to improve since.
Histograms
Histograms are supported from MySQL 8.0:
ANALYZE TABLE users UPDATE HISTOGRAM ON country WITH 100 BUCKETS;
Added later than PostgreSQL but effective.
Optimizer Hints
MySQL allows giving hints to the optimizer:
SELECT /*+ INDEX(users idx_country) */ *
FROM users WHERE country = 'KR';
This tells it "use the country index." PostgreSQL doesn't officially support hints (philosophical difference).
ICP (Index Condition Pushdown)
A useful MySQL feature. Evaluates some WHERE conditions at the index level:
SELECT * FROM users WHERE country = 'KR' AND age > 30;
With an index (country, age):
- Normal: match country='KR' in the index → fetch rows → check age.
- ICP: match country='KR' in the index → check age condition at the index level → fetch only those that pass.
Greatly reduces random I/O.
Join Buffer (Block Nested Loop)
When no index exists, MySQL's default join is Block Nested Loop:
Gather several outer rows into memory up to join_buffer_size
↓
Scan the inner table once, matching each outer block
↓
Reduced I/O
Hash join was added in MySQL 8.0.18. Before that, block NLJ was the only alternative.
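A sketch of the block nested loop idea: one inner-table pass per buffered block of outer rows rather than per outer row (block size and data are illustrative):

```python
# Block nested loop: buffer `block_size` outer rows, then scan the inner
# table once per block instead of once per row.
def block_nlj(outer, inner, key_outer, key_inner, block_size=1000):
    results, inner_scans = [], 0
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]
        keys = {key_outer(r) for r in block}   # the in-memory join buffer
        inner_scans += 1                       # one inner pass per block
        for s in inner:
            if key_inner(s) in keys:
                results.extend((r, s) for r in block
                               if key_outer(r) == key_inner(s))
    return results, inner_scans

outer = [{"id": i} for i in range(5000)]
inner = [{"a_id": i * 2} for i in range(5000)]
rows, scans = block_nlj(outer, inner, lambda r: r["id"], lambda s: s["a_id"])
print(len(rows), scans)  # 2500 matches with 5 inner scans instead of 5000
```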
10. Optimizer Limitations and Real-World Techniques
Limitation 1: Missing or Stale Statistics
Symptom: The rows estimate in EXPLAIN is badly off from reality.
Solution:
ANALYZE my_table; -- PostgreSQL
ANALYZE TABLE my_table; -- MySQL
Verify automatic ANALYZE works. After big updates, run it manually.
Limitation 2: Join Explosion
Joining 20 tables exhausts the optimizer. Time grows exponentially.
Solutions:
- Decompose via views: Frequently used combinations as views.
- Split via subqueries: Compute parts first.
- PostgreSQL: Tune join_collapse_limit, from_collapse_limit.
- MySQL: Tune optimizer_search_depth.
Limitation 3: Correlated Columns
WHERE city = 'Seoul' AND country = 'Korea'
Independence assumption fails, causing estimation errors. Use extended statistics (PostgreSQL) or histograms with correlation information.
Limitation 4: Skewed Data
When a particular value is extremely frequent:
status column: 'active' 99%, 'deleted' 1%
Problematic in "all-or-nothing" queries:
SELECT * FROM t WHERE status = 'deleted';
- Without stats: estimate "50% of rows" → full scan.
- With MCV: estimate "1%" → use index.
MCV needs to be captured accurately.
Limitation 5: Parameter-Dependent Queries
PREPARE p AS SELECT * FROM t WHERE x = $1 AND y > $2;
The optimal plan may differ for $1 = 'rare_value' vs $1 = 'common_value'. A generic plan cannot be optimal for both cases.
Solutions:
- Force custom plans: PostgreSQL's plan_cache_mode = force_custom_plan.
- Dynamic SQL: Give up parameterization.
Practical Technique: Query Rewriting
For problems the optimizer can't solve, rewrite SQL:
Original (slow):
SELECT * FROM orders
WHERE EXISTS (
SELECT 1 FROM users
WHERE users.id = orders.user_id AND users.country = 'KR'
);
Rewritten (fast):
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.country = 'KR';
Modern optimizers do most such rewrites automatically, but not always.
Practical Technique: Materialized Views
Precompute and store expensive, frequently run queries:
CREATE MATERIALIZED VIEW user_order_summary AS
SELECT u.id, u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent
FROM users u LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
REFRESH MATERIALIZED VIEW user_order_summary;
The optimizer (by default in PostgreSQL) doesn't exploit these automatically, but the application can use them directly.
Practical Technique: Hints or Plan Pinning
The last resort: force a specific plan.
Oracle:
SELECT /*+ INDEX(t idx_t_name) */ * FROM t WHERE name = 'foo';
PostgreSQL (pg_hint_plan extension):
/*+ IndexScan(t idx_t_name) */
SELECT * FROM t WHERE name = 'foo';
MySQL:
SELECT * FROM t USE INDEX (idx_t_name) WHERE name = 'foo';
Hints are tempting but risky. If data distribution changes, they may no longer be optimal. Use only as a last resort.
11. Optimizers in Distributed Databases
Additional Considerations
Optimizers in distributed DBs (CockroachDB, TiDB, Spanner) have additional concerns:
- Data locality: Which node has which data.
- Network cost: Where to perform the join?
- Parallelism: Execute concurrently on multiple nodes.
- Exchange operator: Data movement between nodes.
Distributed Join Strategies
Broadcast Join: Replicate a small table to all nodes.
Small table → broadcast → local join at each node
Shuffle Join (Hash Redistribution): Redistribute to nodes by hashing the join key.
Both tables → hash on join key → same keys to the same node
Collocated Join: If both tables are already partitioned on the same key, join locally.
The optimizer compares their costs and chooses.
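A hedged sketch of that comparison (the network-cost model and byte counts are invented for illustration):

```python
# Broadcast vs. shuffle: compare estimated bytes moved over the network.
def broadcast_cost(small_bytes, num_nodes):
    # Ship the small table to every other node.
    return small_bytes * (num_nodes - 1)

def shuffle_cost(left_bytes, right_bytes, num_nodes):
    # On average (n-1)/n of each table lands on a different node.
    frac = (num_nodes - 1) / num_nodes
    return (left_bytes + right_bytes) * frac

nodes = 8
users_bytes = 50 * 2**20    # 50 MiB dimension table
orders_bytes = 20 * 2**30   # 20 GiB fact table

bc = broadcast_cost(users_bytes, nodes)
sh = shuffle_cost(users_bytes, orders_bytes, nodes)
print("broadcast" if bc < sh else "shuffle")  # broadcast the tiny table
```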
TiDB's Optimizer
TiDB is a Cost-Based + Rule-Based hybrid:
- Rule-based: always-applied transforms (predicate pushdown, etc.).
- Cost-based: cardinality + cost model for plan selection.
- Statistics: histograms + Count-Min Sketch for skew handling.
Significant research continues to overcome the extra complexity of distributed environments.
Quiz for Review
Q1. What is the most important input for an optimizer to pick a "good plan," and why?
A. Statistics, particularly accurate cardinality estimation. All cost calculations in the optimizer are based on the expected row count from each stage. If estimates are wrong:
- The chosen access method becomes inappropriate (index scan on a big result, full scan on a small one).
- Wrong join algorithm (nested loop is the worst case).
- Wrong join order → intermediate result explosion.
If statistics are missing or stale, these will happen without fail. That's why the first step of database tuning is always to check "Did you run ANALYZE?". After bulk INSERT/UPDATE or adding a new index, a manual ANALYZE is a must.
Q2. What happens in PostgreSQL when you lower random_page_cost from 4.0 to 1.1?
A. The optimizer becomes more likely to prefer index scans. Why:
- Index Scan causes random I/O against the heap.
- The default of 4.0 assumes random I/O is 4x slower than sequential — a legacy of the HDD era.
- On modern SSDs/NVMe, the gap is almost nonexistent (1.1 is more realistic).
After this change:
- More queries switch to index scans.
- Conditions with low selectivity (few rows) avoid full scan.
- Overall performance typically improves, especially where random I/O is cheap.
Caveat: This parameter only changes the optimizer's estimates, not actual performance directly. If it doesn't match the environment, it may lead to bad plans. Use 1.1 to 1.5 on SSDs, keep 4.0 on HDDs, somewhere in between for network storage.
Q3. What is the problem with the "independent selectivity assumption" and PostgreSQL's solution?
A. Problem: The optimizer assumes P(A and B) = P(A) × P(B). This holds when conditions are independent, but in practice, they are often correlated:
WHERE city = 'Seoul' AND country = 'Korea'
- Estimate: 10% × 20% = 2%
- Reality: 10% (Seoul only in Korea)
- 5x underestimate → wrong plan.
PostgreSQL solution (10+): Extended Statistics
CREATE STATISTICS stat_addr (dependencies) ON city, country FROM addresses;
CREATE STATISTICS stat_addr_ndist (ndistinct) ON city, country FROM addresses;
ANALYZE addresses;
Three types:
- dependencies: Detect functional dependencies (city → country).
- ndistinct: Number of distinct combinations across columns.
- MCV (multivariate): Most common value combinations across columns.
With this info, the optimizer can account for correlation. Very effective, but it has additional statistics-maintenance cost, so apply selectively to problematic queries.
Q4. What is the difference between the Volcano and Cascades optimizers?
A.
Volcano (1993):
- Top-down search.
- Exhaustive plan generation.
- Has memoization.
- Based on transformation rules.
- Limits: Search space explosion for many joins, limited property handling.
Cascades (1995):
- Improvement over Volcano (same author).
- Memo data structure groups equivalent expressions.
- Physical property propagation (ordering, distribution).
- Branch-and-bound pruning.
- Enforcers: Add Sort, Exchange to provide required properties.
Analogy: If Volcano is "generate all plans and pick one," Cascades is "compare at the group level, demand only required properties, and discard obviously bad ones early."
Usage: SQL Server, CockroachDB, Apache Calcite (Cascades-based). PostgreSQL is neither — it's its own DP-based design. Volcano mostly appears in research and lectures, while Cascades has become the industry standard.
Q5. What is the diagnostic order when "estimated rows" and "actual rows" in EXPLAIN ANALYZE differ significantly?
A. This is a sign that the optimizer made a bad estimate. Diagnostic order:
1. Check whether ANALYZE ran
SELECT last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE relname='mytable';
If too stale, run it manually: ANALYZE mytable;
2. Check statistics quality
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats WHERE tablename = 'mytable';
- If n_distinct looks wrong, the sample may be too small → ALTER TABLE ... SET STATISTICS 1000.
- If the MCV list is empty, no value was frequent enough to record (the data may be fairly evenly distributed).
3. Suspect correlated columns If you have multiple WHERE conditions that always appear together, use extended statistics:
CREATE STATISTICS s ON col1, col2 FROM mytable;
4. Subqueries / function calls
WHERE col = get_default() and similar function calls make selectivity unestimable.
5. Bad join-result estimates
If an intermediate join result looks wrong, the n_distinct of join-participating columns may be inaccurate.
6. Hints / rewrite If all the above fail, force a specific plan with the pg_hint_plan extension, or rewrite the query (e.g., use a CTE, inline a subquery).
The key is to approach it in the order "statistics → data characteristics → query structure." Most problems are resolved with a single ANALYZE.
Closing: The Optimizer Is a Partner
Key Points
- The optimizer depends on statistics. Don't skimp on ANALYZE.
- It's cost-based. Tune cost parameters to your environment.
- Cardinality estimation is life. For correlated columns, use extended statistics.
- Read EXPLAIN. The difference between estimated and actual is the start of diagnosis.
- Join order is the optimizer's hard problem. With many tables, consider hints or rewrites.
- Cascades is the modern standard, but PostgreSQL has its own DP.
- Hints are the last resort. Fix statistics and structure first.
Cooperating with the Optimizer
Think of the optimizer not as an enemy but as a partner:
- Give it statistics: ANALYZE, extended statistics.
- Write clear queries: SQL the optimizer can understand easily.
- Provide indexes: Secure the necessary access paths.
- Measure: Check reality with EXPLAIN ANALYZE.
- Trust, but verify: Most of the time, the optimizer is right.
Final Lesson
The optimizer is the product of decades of research and work by tens of thousands of engineers. Don't try to beat it. Instead, help the optimizer make the right decisions. Accurate statistics, appropriate indexes, clear queries. When those three come together, SQL runs magically fast.
Next time you meet a slow query, first ask:
- "Are the statistics fresh?"
- "Does the cardinality the optimizer sees match reality?"
- "Is the structure amenable to optimization with the indexes and statistics I provided?"
If the answers are "yes," the optimizer will almost always give you the right answer.
References
- The Volcano Optimizer Generator (Graefe, 1993)
- The Cascades Framework for Query Optimization (Graefe, 1995)
- Access Path Selection in a Relational Database Management System (Selinger et al., 1979) - The System R classic
- PostgreSQL Query Planner
- How PostgreSQL's Query Planner Works (Interdb)
- MySQL Query Optimization
- Apache Calcite Documentation
- CockroachDB SQL Optimizer (Cascades-based)
- Database Internals (Alex Petrov), Chapter 12
- Use the Index, Luke! - A complete guide to index usage