Database Query Optimizer Complete Guide 2025: Volcano, Cascades, Cost-Based, Cardinality Estimation — PostgreSQL/MySQL In-Depth Analysis

Introduction: The Magic of One SQL Line

Same Query, 1000x Difference

Consider the following query:

SELECT u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.country = 'KR' AND o.created_at > '2025-01-01';

This one-line SQL can be executed in at least dozens of ways:

  1. Scan users first → filter → nested loop join with orders
  2. Scan orders first → filter → hash join with users
  3. Use indexes on both sides → merge join
  4. Combine country index + date index...

The performance difference between these methods can be hundreds to thousands of times. Same query, same data, completely different execution times.

The one making that choice is the query optimizer. When you submit SQL, the optimizer:

  1. Converts the query to relational algebra
  2. Generates possible execution plans
  3. Estimates the cost of each plan
  4. Picks the cheapest one

This article dissects the entire process. From Volcano to Cascades, from the principles of cost-based optimization to real-world implementations in modern databases.

Why Should You Know This?

  • Debugging slow queries: To read EXPLAIN output accurately, you need to understand the optimizer.
  • Index design: Knowing how the optimizer picks helps you understand why an index "isn't being used."
  • Statistics management: Why ANALYZE matters, how statistics govern performance.
  • Cross-DB differences: The optimizers in PostgreSQL, MySQL, and Oracle differ significantly. Important when porting.
  • Modern NewSQL: How are the optimizers in TiDB and CockroachDB different?

1. The Three Stages of an Optimizer

Stage 1: Parsing

Converts SQL text into an AST (Abstract Syntax Tree).

SELECT * FROM t WHERE id = 5
Select
├── Projection: *
├── From: t
└── Where: Equal(id, 5)

The parsing stage mainly checks syntax. Catalog validation (do the referenced tables and columns exist?) happens in the subsequent semantic-analysis step.

Stage 2: Rewriting / Logical Optimization

Converts the AST into relational algebra and applies logical transformations:

  • Predicate pushdown: Push WHERE conditions as far inside as possible.
  • Projection pushdown: Read only the necessary columns.
  • Constant folding: Turn 1 + 2 into 3.
  • Subquery unnesting: Convert subqueries into joins.
  • View expansion: Expand views back into their original queries.

These transformations do not change semantics.

Stage 3: Physical Optimization / Cost-Based

Generates possible physical execution plans, estimates their cost, and picks the cheapest. This stage is the real heart of the optimizer.

  • Which access method to use? (full scan, index scan, index only scan)
  • Which join algorithm? (nested loop, hash join, merge join)
  • Which join order?
  • Where to sort?

2. Relational Algebra and Logical Optimization

Relational Algebra Basics

Relational algebra is the mathematical model of query languages based on set theory:

  • σ (selection): WHERE (row filtering).
  • π (projection): SELECT columns (column selection).
  • ⋈ (join): Table combination.
  • ∪ (union), ∩ (intersection), - (difference).
  • γ (group): GROUP BY.

SQL:

SELECT name FROM users WHERE age > 30

Relational algebra:

π_name(σ_age>30(users))

Algebraic Equivalence Transformations

Relational algebra has transformation rules. For example:

  • Selection distribution: σ_p∧q(R) = σ_p(σ_q(R))
  • Selection push through join: σ_p(R ⋈ S) = σ_p(R) ⋈ S (when p only uses R's columns)
  • Projection push through selection: π_A(σ_p(R)) = π_A(σ_p(π_{A ∪ cols(p)}(R))), where cols(p) are the columns referenced by p
  • Join commutativity: R ⋈ S = S ⋈ R
  • Join associativity: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

Applying these rules produces a different query with the same result.

Predicate Pushdown Example

SELECT u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.country = 'KR';

Initial plan:

π_{name,total}(σ_{country='KR'}(users ⋈ orders))

Join users and orders first, then filter by country → inefficient.

With predicate pushdown:

π_{name,total}(σ_{country='KR'}(users) ⋈ orders)

Filter users for KR first, then join with orders → much more efficient.

Modern optimizers apply hundreds of such transformations automatically.
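To make the mechanics concrete, here is a toy rewrite rule in Python. This is a sketch of the idea only: the plan-node classes and the single-table predicate check are hypothetical simplifications, not any real optimizer's code.

from dataclasses import dataclass

# Hypothetical minimal logical-plan nodes (illustration only)
@dataclass
class Scan:
    table: str

@dataclass
class Join:
    left: object
    right: object

@dataclass
class Select:
    pred_table: str   # the only table the predicate references (simplification)
    pred: str
    child: object

def push_down_selection(plan):
    """Rewrite sigma_p(R join S) into sigma_p(R) join S when p touches only R."""
    if isinstance(plan, Select) and isinstance(plan.child, Join):
        join = plan.child
        if isinstance(join.left, Scan) and join.left.table == plan.pred_table:
            return Join(Select(plan.pred_table, plan.pred, join.left), join.right)
        if isinstance(join.right, Scan) and join.right.table == plan.pred_table:
            return Join(join.left, Select(plan.pred_table, plan.pred, join.right))
    return plan   # rule does not apply; leave the plan unchanged

plan = Select("users", "country = 'KR'", Join(Scan("users"), Scan("orders")))
print(push_down_selection(plan))
# Join(left=Select(..., child=Scan(table='users')), right=Scan(table='orders'))

Both trees return the same rows; the rewritten one simply filters before joining.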

Projection Pushdown

SELECT name FROM users JOIN orders ON users.id = orders.user_id;

Initial plan: fetch all columns of users and orders, join, then extract only name.

Optimized: before the join, fetch only (id, name) from users and user_id from orders. Dramatic savings in network and memory.

Subquery Unnesting

SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE country = 'KR');

This query can be executed two ways:

  1. Correlated: Run the subquery for each orders row → N users scans → very slow.
  2. Unnested: Convert to a join.
-- The optimizer rewrites it like this
SELECT o.* FROM orders o JOIN users u ON o.user_id = u.id
WHERE u.country = 'KR';

A good optimizer performs this transformation automatically. A poor (or older) optimizer runs the correlated form as-is, with extremely slow results.


3. Physical Operators: The Basic Units of Execution

Access Method: Reading Tables

Sequential Scan (Full Table Scan):

[Page 1][Page 2][Page 3]...
   ↓       ↓       ↓
Read all rows sequentially
  • Large table, no selective condition: fast (pure sequential I/O).
  • Small result behind selective conditions: wasteful (every row is read, most are discarded).

Index Scan:

B-Tree
   ↓ (find keys matching the condition)
Heap → read the corresponding rows
  • Small result: very fast.
  • Large result (a big fraction of rows matches): slow due to heavy random I/O.
  • Rule of thumb: if more than roughly 10% of the table is returned, a full scan usually wins.

Index Only Scan:

B-Tree
   ↓ (answer from index alone)
No heap access

If all needed columns are in the index, the heap is never touched. The fastest access method.

Bitmap Index Scan:

Index 1 → bitmap 1
Index 2 → bitmap 2
AND/OR combination
Heap scan (in sorted order)

Combine results from multiple indexes. Converts to sequential access pattern to reduce random I/O.
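The combination step can be sketched with plain integers as bitsets (toy row numbers; real bitmaps track heap pages and may become lossy under memory pressure):

# Each bit marks a matching heap row (made-up data)
bm_country = 0b10110010   # rows where country = 'KR', from index 1
bm_age     = 0b11010110   # rows where age > 30, from index 2
combined   = bm_country & bm_age          # AND the two conditions

# Visit matching rows in ascending order -> near-sequential heap access
matches = [i for i in range(8) if (combined >> i) & 1]
print(matches)   # [1, 4, 7]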

Join Algorithms

Nested Loop Join (NLJ):

for r in R:
    for s in S:
        if r.key == s.key:
            output(r, s)
  • Complexity: O(|R| × |S|).
  • With an index: O(|R| × log|S|).
  • Best for small outer + indexed inner.

Hash Join:

from collections import defaultdict

# Build phase: hash the (preferably smaller) relation R
hash_table = defaultdict(list)   # key -> all R rows with that key
for r in R:
    hash_table[r.key].append(r)  # a plain dict would drop duplicate join keys

# Probe phase: stream S and look up matches
for s in S:
    for r in hash_table.get(s.key, []):
        output(r, s)
  • Complexity: O(|R| + |S|).
  • Requires memory.
  • Best for large tables + no index.
  • Parallelism-friendly.

Sort-Merge Join:

R.sort(key=lambda r: r.key)
S.sort(key=lambda s: s.key)

i, j = 0, 0
while i < len(R) and j < len(S):
    if R[i].key == S[j].key:
        # Emit R[i] against the whole run of equal keys in S;
        # the run is re-scanned for each duplicate key in R.
        k = j
        while k < len(S) and S[k].key == R[i].key:
            output(R[i], S[k])
            k += 1
        i += 1
    elif R[i].key < S[j].key:
        i += 1
    else:
        j += 1
  • Complexity: O(|R| log|R| + |S| log|S|).
  • If already sorted: O(|R| + |S|).
  • Best when ORDER BY or an index already sorts the input.
  • Supports range joins.

Selection Criteria

Which join to pick? The optimizer compares via a cost function:

  • NLJ cost: |R| × cost_per_probe
  • Hash join cost: |R| × hash_cost + |S| × probe_cost + build_memory
  • Merge join cost: sort(R) + sort(S) + scan(R) + scan(S)

The key point here is that the optimizer does not know |R| and |S| exactly, especially for intermediate results. It must estimate them.


4. Cost-Based Optimization: The Mathematics of Cost

Cost Functions

Each physical operator has its own cost formula:

Sequential Scan:

cost = seq_page_cost × num_pages + cpu_tuple_cost × num_tuples

Index Scan:

cost = random_page_cost × (index_pages + heap_pages) 
     + cpu_index_tuple_cost × num_matched 
     + cpu_tuple_cost × num_matched

Parameter Meanings

PostgreSQL defaults:

seq_page_cost = 1.0          # Sequential disk page I/O
random_page_cost = 4.0       # Random disk page I/O
cpu_tuple_cost = 0.01        # Row processing CPU cost
cpu_index_tuple_cost = 0.005 # Index entry processing
cpu_operator_cost = 0.0025   # Operator execution (=, <, ...)

These values are relative. seq_page_cost = 1.0 is just a reference.
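As a back-of-the-envelope sketch, the two formulas above can be compared directly. This is far simpler than PostgreSQL's real costsize.c, and it assumes, pessimistically, one random heap page per matched row:

# PostgreSQL default cost parameters from above
SEQ_PAGE_COST, RANDOM_PAGE_COST = 1.0, 4.0
CPU_TUPLE_COST, CPU_INDEX_TUPLE_COST = 0.01, 0.005

def seq_scan_cost(num_pages, num_tuples):
    return SEQ_PAGE_COST * num_pages + CPU_TUPLE_COST * num_tuples

def index_scan_cost(index_pages, heap_pages, matched):
    return (RANDOM_PAGE_COST * (index_pages + heap_pages)
            + (CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST) * matched)

# Toy table: 1M rows on 10,000 pages; vary the fraction of rows matched
for frac in (0.001, 0.01, 0.1, 0.5):
    matched = int(1_000_000 * frac)
    seq = seq_scan_cost(10_000, 1_000_000)
    idx = index_scan_cost(50, matched, matched)   # assumes 50 index pages read
    print(f"{frac:6.1%}  seq={seq:>9.0f}  index={idx:>9.0f}")

Under these deliberately harsh assumptions the index scan wins at 0.1% selectivity and loses well before 10%; real crossover points shift with caching and physical correlation.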

Adjustments for the SSD Era

In the HDD era, random I/O was much slower than sequential I/O (about 100x). That's why random_page_cost = 4.0 was set. But on SSDs the gap is small. On NVMe it's nearly identical.

# In SSD environments
random_page_cost = 1.1

This single change alone leads the optimizer to use indexes more often.

effective_cache_size

effective_cache_size = 4GB

This parameter tells the optimizer how much memory is available for caching data (OS page cache plus PostgreSQL shared buffers combined). The optimizer uses it to adjust the expected I/O cost of index scans: larger → prefers indexes, smaller → prefers full scans.

Join Cost Example

SELECT * FROM A JOIN B ON A.id = B.a_id

A: 10,000 rows, B: 100,000 rows, B has an a_id index.

NLJ (A outer):

10,000 × (index_lookup_cost + matched_rows_cost)
≈ 10,000 × (4 + 0.01 × 10)  (10 matches per A row on average)
≈ 40,000 + 1000 = 41,000

Hash Join (B as build):

build: 100,000 × cpu_tuple_cost = 1,000
probe: 10,000 × (hash_cost + matched_cost) ≈ 10,000 × 0.02 = 200
total: 1,200

Hash is much cheaper. The optimizer picks hash join.

Note: Cost is for relative comparison, not absolute time. "cost=1,000" is not "1 second." It's used to rank alternatives within the same query.


5. Cardinality Estimation: The Lifeblood of the Optimizer

"How Many Rows Will This Produce?"

All cost calculations depend on the result row count (cardinality) of each operator. If the estimate is wrong, the entire plan is wrong.

SELECT * FROM users WHERE country = 'KR' AND age > 30;

Questions:

  • Total users: 1 million
  • Selectivity of country='KR': 20% → 200k
  • Selectivity of age > 30: 40% → 400k
  • Both satisfied: ?

Independence assumption for selectivity: 20% × 40% = 8% → estimated 80k rows.

If this estimate is close to the true count (say the real number is 50k), the plan will be reasonable. If it is far off (say the real number is 500k), disaster.

Statistics: The Optimizer's Sense Organs

Accurate statistics lead to accurate estimates. Information stored in PostgreSQL's pg_statistic:

  1. n_distinct: number of distinct values in a column.
  2. null_frac: NULL ratio.
  3. most_common_vals (MCV): the most frequently occurring values.
  4. most_common_freqs: frequencies of the MCVs.
  5. histogram_bounds: value distribution histogram (equal-frequency buckets).
  6. correlation: correlation between physical and logical order.

Histograms

Histogram for column age (100 equal-frequency buckets)
[0-5][5-12][12-20][20-25][25-28]...[70-120]
 1%    1%     1%     1%      1%       1%

Each bucket holds an equal fraction (here 1%) of the rows. Narrower bucket ranges indicate denser value regions.

Query WHERE age > 30:

  • How many buckets are at or above 30?
  • Each bucket is 1% of the total.
  • Estimate: bucket_count × 1% × total_rows.
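That interpolation can be sketched as follows (toy bucket boundaries; PostgreSQL's real logic lives in selfuncs.c and handles edge cases this sketch ignores):

import bisect

# 10 equal-frequency buckets -> 11 boundaries; each bucket holds 10% of rows
bounds = [0, 18, 24, 28, 31, 35, 41, 50, 62, 75, 120]
total_rows = 1_000_000

def estimate_greater_than(value):     # assumes bounds[0] < value < bounds[-1]
    i = bisect.bisect_right(bounds, value) - 1    # bucket containing the value
    buckets_above = len(bounds) - 2 - i           # buckets entirely above it
    # Linear interpolation inside the bucket that straddles the value
    frac_in_bucket = (bounds[i + 1] - value) / (bounds[i + 1] - bounds[i])
    return (buckets_above + frac_in_bucket) / (len(bounds) - 1)

print(estimate_greater_than(30) * total_rows)   # ~633,333 estimated rows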

MCV (Most Common Values)

When values are heavily skewed (e.g., 90% of status is 'active'):

  • MCV: ['active', 'inactive', 'pending']
  • MCV freqs: [0.9, 0.08, 0.02]

WHERE status = 'active' → estimate: 0.9 × total_rows.

Values not in MCV are estimated via the histogram.
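In code, the split between the MCV list and the leftover probability mass might look like this (made-up statistics; PostgreSQL's actual logic is in selfuncs.c):

# Made-up statistics for a skewed status column
mcv_vals   = ['active', 'inactive', 'pending']
mcv_freqs  = [0.90, 0.07, 0.02]
n_distinct = 5                        # total distinct values in the column
total_rows = 1_000_000

def eq_selectivity(value):
    if value in mcv_vals:             # tracked value: use its exact frequency
        return mcv_freqs[mcv_vals.index(value)]
    # Untracked value: spread the remaining mass over the remaining distincts
    remaining_frac = 1.0 - sum(mcv_freqs)
    remaining_ndv  = n_distinct - len(mcv_vals)
    return remaining_frac / remaining_ndv

print(eq_selectivity('active')   * total_rows)   # 900,000
print(eq_selectivity('archived') * total_rows)   # 5,000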

Pitfalls in Selectivity Computation

Pitfall 1: The Independence Assumption Fallacy

WHERE city = 'Seoul' AND country = 'Korea'

Optimizer: P(city='Seoul') × P(country='Korea') = 0.1 × 0.2 = 0.02

Reality: P(city='Seoul' AND country='Korea') = 0.1 (Seoul only exists in Korea)

5x off. Leads to a bad plan.

Solution: Extended Statistics. In PostgreSQL 10+:

CREATE STATISTICS stat_city_country ON city, country FROM addresses;
ANALYZE addresses;

Now the optimizer knows the correlation between the two columns.

Pitfall 2: LIKE Patterns

WHERE name LIKE 'A%'

Selectivity is hard to estimate. The optimizer tries with MCVs and histograms, but often misses.

Pitfall 3: Estimating Join Results

SELECT * FROM A JOIN B ON A.key = B.key

Result row count estimate:

|A ⋈ B| ≈ |A| × |B| / max(ndv(A.key), ndv(B.key))

For example, with |A| = 10,000, |B| = 100,000, and ndv(A.key) = 10,000, the estimate is 10,000 × 100,000 / 10,000 = 100,000 rows. The formula assumes values are evenly distributed; in reality, skew causes large errors.

The Importance of ANALYZE

Statistics are updated with the ANALYZE command:

ANALYZE users;  -- Specific table
ANALYZE;        -- Everything

PostgreSQL's autovacuum runs ANALYZE periodically. However:

  • After bulk INSERT/UPDATE, a manual ANALYZE is recommended.
  • Stale statistics cause the optimizer to pick bad plans.
  • Querying new tables at scale without ANALYZE may yield horrible plans.

6. Volcano Optimizer

Goetz Graefe's Innovation

The Volcano Optimizer, published by Goetz Graefe in 1993, is the starting point of modern query optimizers. The core idea:

"Generate every possible plan using a set of transformation rules, and pick the cheapest."

Operator Tree

A query is represented as a tree of logical operators:

Join
├── Scan(A)
└── Scan(B)

The optimizer's goal is to convert this into a tree of physical operators while minimizing cost:

HashJoin
├── SeqScan(A)
└── IndexScan(B)

Transformation Rules

Volcano explores with sets of transformation rules:

  • Logical → Logical: Join commutativity, associativity.
  • Logical → Physical: Join → HashJoin, Join → NestedLoopJoin, etc.
  • Enforcer: Add a Sort if ordering is required.

Volcano performs a top-down search:

  1. Start from the root operator.
  2. Enumerate possible physical implementations for each operator.
  3. Recursively optimize children.
  4. After computing costs of all subtrees, pick the optimum.

Memoization

Re-computing the same subproblem many times is expensive. Volcano uses memoization (caching computed results). The same subexpression is optimized only once.
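A compressed sketch of that idea, with toy sizes and a made-up cost model (real Volcano memoizes whole logical/physical expression groups, not just costs):

from functools import lru_cache
from itertools import combinations

SIZES = {'A': 1_000, 'B': 10_000, 'C': 100}   # made-up base-table row counts
JOIN_SELECTIVITY = 0.001                       # assumed for every join predicate

def out_size(rels):
    size = 1.0
    for r in rels:
        size *= SIZES[r]
    return size * JOIN_SELECTIVITY ** (len(rels) - 1)

@lru_cache(maxsize=None)                       # the memoization
def best_cost(rels: frozenset) -> float:
    """Cheapest cost to compute the join of rels; each subset is solved once."""
    if len(rels) == 1:
        return SIZES[next(iter(rels))]         # base-table scan
    best = float('inf')
    items = sorted(rels)
    for k in range(1, len(items)):
        for left in combinations(items, k):    # try every split into two subtrees
            right = rels - frozenset(left)
            cost = best_cost(frozenset(left)) + best_cost(right) + out_size(rels)
            best = min(best, cost)
    return best

print(best_cost(frozenset('ABC')))

Without lru_cache, the same subsets would be re-optimized an exponential number of times; with it, each subset is solved exactly once.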

Drawbacks

Volcano is excellent but had several limitations:

  1. Exhaustive search: It tries to enumerate essentially every plan, which explodes as the number of joins grows.
  2. Top-down inefficiency: Requirements imposed at the top propagate downward, and the same subproblems get revisited under slightly different contexts.

The improvement over this was Cascades.


7. Cascades Optimizer

Graefe's Next Work

Cascades (1995) is also Goetz Graefe's work. The optimizers of SQL Server, CockroachDB, Apache Calcite, and Greenplum are all Cascades-based.

Key Improvements

  1. Unified logical/physical transformations: Rules expressed in a unified way.
  2. Better memoization structure: The "memo" data structure prevents redundant optimization.
  3. Property propagation: Propagates properties like ordering and distribution.
  4. Branch-and-bound pruning: Obviously bad plans are abandoned early.

The Memo Structure

The memo is a set of groups of equivalent expressions:

Group 1: { σ_p(R) }
Group 2: { σ_p(R ⋈ S), σ_p(R) ⋈ S }   (predicate pushdown)
Group 3: { A ⋈ B, B ⋈ A }             (join commutativity)

Each group holds multiple expressions that yield the same result. The optimizer compares costs at the group level.
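A minimal data-structure sketch (the field names are hypothetical; real memos also track exploration state and rule bindings):

class Group:
    """One memo group: all equivalent expressions plus the best plan found."""
    def __init__(self):
        self.logical_exprs  = []   # e.g. [Join(g1, g2), Join(g2, g1)]
        self.physical_exprs = []   # e.g. [HashJoin(g1, g2), MergeJoin(g1, g2)]
        # required physical property (e.g. "sorted on key") -> (plan, cost)
        self.best_plan = {}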

Properties

Physical Properties are characteristics of a plan:

  • Ordering: Is the result sorted?
  • Distribution: In a distributed DB, how is the data partitioned?
  • Uniqueness: Are there no duplicates in the result?

When a parent operator requires a certain property (e.g., Merge Join requires sorted input), the child either provides it or an enforcer is added (Sort).

Pruning

To prevent the search from exploding, pruning:

  • Skip subproblems whose cost already exceeds the best found.
  • Branch-and-bound algorithm.

Cascades-Based Optimizers

  • SQL Server: Its own Cascades-based implementation.
  • PostgreSQL: Not Cascades (dynamic-programming-based).
  • CockroachDB: Cascades-based.
  • Apache Calcite: Cascades-based, used in Hive/Flink/Beam.
  • Presto/Trino: Calcite-inspired + custom implementation.

8. PostgreSQL's Optimizer

Structure

PostgreSQL takes a slightly different approach from Volcano/Cascades:

  • Rewriting: Rule-based rewriting into a logical plan.
  • Plan generation: Dynamic programming + genetic algorithm.
  • Cost estimation: Per-operator cost functions.
  • Plan selection: Cheapest wins.

Dynamic Programming Join Ordering

Join ordering uses Dynamic Programming (DP). It computes the optimal join of each subset and combines them.

  • Complexity: around O(3^n) in the number of relations.
  • Fewer FROM items than geqo_threshold (default 12): DP gives the exact optimum.
  • At or above geqo_threshold: switch to the Genetic Algorithm.

Genetic Algorithm (GEQO)

For joins across many tables (e.g., 15), DP is too slow, so:

  1. Generate random join orders.
  2. Evaluate "fitness" (estimated cost).
  3. Cross over and mutate good ones.
  4. Evolve over generations.
  5. Pick the best.

Not exactly optimal, but finds a reasonable solution quickly. geqo = on (default).
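The flavor of the search can be sketched as follows. Real GEQO uses edge-recombination crossover and a steady-state population; this toy version only demonstrates the loop:

import random

def geqo_search(rels, cost_of_order, pop_size=50, generations=30):
    # 1. Random initial population of join orders
    population = [random.sample(rels, len(rels)) for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Fitness = estimated plan cost (lower is better)
        population.sort(key=cost_of_order)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # 3. Crossover: take a prefix from one parent, fill from the other
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(rels))
            child = a[:cut] + [r for r in b if r not in a[:cut]]
            # 4. Mutation: occasionally swap two relations
            if random.random() < 0.1:
                i, j = random.sample(range(len(rels)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children          # 5. Next generation
    return min(population, key=cost_of_order)      # best found, not provably optimal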

Generic vs Custom Plan

PostgreSQL behaves specially for prepared statements:

  • Custom plan: Optimized per execution based on parameters.
  • Generic plan: Optimized once, then reused.

PostgreSQL uses custom plans for the first five executions, then switches to the generic plan if it is estimated to be no more expensive. This can cause the classic phenomenon of a prepared statement being "fast at first, then suddenly slow."

EXPLAIN

To see the optimizer's decisions, use EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE) 
SELECT u.name, o.total 
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.country = 'KR';
Hash Join  (cost=10.00..250.00 rows=500 width=64) (actual time=0.1..5.0 rows=487 loops=1)
  Hash Cond: (o.user_id = u.id)
  Buffers: shared hit=234
  ->  Seq Scan on orders o  (cost=0.00..150.00 rows=10000 width=16)
  ->  Hash  (cost=8.00..8.00 rows=100 width=56)
        ->  Index Scan using idx_users_country on users u
              Index Cond: (country = 'KR')

Interpretation:

  • cost: estimated cost (startup_cost..total_cost).
  • rows: estimated row count.
  • actual: measurement during actual execution.
  • loops: how many times this node ran.

A big gap between estimated rows and actual rows indicates a statistics problem. ANALYZE needed.


9. MySQL's Optimizer

History

MySQL's optimizer was heuristics-based for a long time:

  • Always joins tables in fixed order (left-deep tree).
  • Decides index usage via rules.

Starting with 5.6, it introduced a cost-based optimizer and has continued to improve since.

Histograms

Histograms are supported from MySQL 8.0:

ANALYZE TABLE users UPDATE HISTOGRAM ON country WITH 100 BUCKETS;

Added later than PostgreSQL but effective.

Optimizer Hints

MySQL allows giving hints to the optimizer:

SELECT /*+ INDEX(users idx_country) */ *
FROM users WHERE country = 'KR';

This tells it "use the country index." PostgreSQL doesn't officially support hints (philosophical difference).

ICP (Index Condition Pushdown)

A useful MySQL feature. Evaluates some WHERE conditions at the index level:

SELECT * FROM users WHERE country = 'KR' AND age > 30;

With an index (country, age):

  1. Normal: match country='KR' in the index → fetch rows → check age.
  2. ICP: match country='KR' in the index → check age condition at the index level → fetch only those that pass.

Greatly reduces random I/O.
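The saving is easy to see in a toy model that counts heap fetches explicitly (made-up composite-index tuples of (country, age, row_id)):

# Made-up composite index on (country, age) plus a heap keyed by row id
index = [('KR', 25, 1), ('KR', 35, 2), ('KR', 40, 3), ('US', 50, 4)]
heap  = {1: {'name': 'a'}, 2: {'name': 'b'}, 3: {'name': 'c'}, 4: {'name': 'd'}}

def lookup_without_icp():
    heap_reads, out = 0, []
    for country, age, rid in index:
        if country == 'KR':           # only the first key part checked in the index
            row = heap[rid]           # heap fetch happens before the age check
            heap_reads += 1
            if age > 30:
                out.append(row)
    return out, heap_reads

def lookup_with_icp():
    heap_reads, out = 0, []
    for country, age, rid in index:
        if country == 'KR' and age > 30:   # age filtered inside the index
            out.append(heap[rid])
            heap_reads += 1
    return out, heap_reads

print(lookup_without_icp()[1])   # 3 heap reads
print(lookup_with_icp()[1])      # 2 heap reads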

Join Buffer (Block Nested Loop)

When no index exists, MySQL's default join is Block Nested Loop:

  1. Gather outer rows into memory, up to join_buffer_size.
  2. Scan the inner table once per buffered block, matching every buffered outer row.
  3. Result: far fewer inner-table scans than a plain nested loop → reduced I/O.

Hash join was added in MySQL 8.0.18. Before that, block NLJ was the only alternative.


10. Optimizer Limitations and Real-World Techniques

Limitation 1: Missing or Stale Statistics

Symptom: The rows estimate in EXPLAIN is badly off from reality.

Solution:

ANALYZE my_table;  -- PostgreSQL
ANALYZE TABLE my_table;  -- MySQL

Verify automatic ANALYZE works. After big updates, run it manually.

Limitation 2: Join Explosion

Joining 20 tables exhausts the optimizer. Time grows exponentially.

Solutions:

  • Decompose via views: Frequently used combinations as views.
  • Split via subqueries: Compute parts first.
  • PostgreSQL: Tune join_collapse_limit, from_collapse_limit.
  • MySQL: Tune optimizer_search_depth.

Limitation 3: Correlated Columns

WHERE city = 'Seoul' AND country = 'Korea'

Independence assumption fails, causing estimation errors. Use extended statistics (PostgreSQL) or histograms with correlation information.

Limitation 4: Skewed Data

When a particular value is extremely frequent:

status column: 'active' 99%, 'deleted' 1%

Problematic in "all-or-nothing" queries:

SELECT * FROM t WHERE status = 'deleted';
  • Without statistics: the optimizer falls back to a crude default selectivity → may pick a full scan.
  • With an accurate MCV list: estimate "1%" → use the index.

MCV needs to be captured accurately.

Limitation 5: Parameter-Dependent Queries

PREPARE p AS SELECT * FROM t WHERE x = $1 AND y > $2;

The optimal plan may differ for $1 = 'rare_value' vs $1 = 'common_value'. A generic plan cannot be optimal for both cases.

Solutions:

  • Force custom plan: PostgreSQL's plan_cache_mode = force_custom_plan.
  • Dynamic SQL: Give up parameterization.

Practical Technique: Query Rewriting

For problems the optimizer can't solve, rewrite SQL:

Original (slow):

SELECT * FROM orders
WHERE EXISTS (
  SELECT 1 FROM users
  WHERE users.id = orders.user_id AND users.country = 'KR'
);

Rewritten (fast):

SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.country = 'KR';

Modern optimizers do most such rewrites automatically, but not always.

Practical Technique: Materialized Views

Precompute and store expensive, frequently run queries:

CREATE MATERIALIZED VIEW user_order_summary AS
SELECT u.id, u.name, COUNT(o.id) as order_count, SUM(o.total) as total_spent
FROM users u LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

REFRESH MATERIALIZED VIEW user_order_summary;

The optimizer (by default in PostgreSQL) doesn't exploit these automatically, but the application can use them directly.

Practical Technique: Hints or Plan Pinning

The last resort: force a specific plan.

Oracle:

SELECT /*+ INDEX(t idx_t_name) */ * FROM t WHERE name = 'foo';

PostgreSQL (pg_hint_plan extension):

/*+ IndexScan(t idx_t_name) */
SELECT * FROM t WHERE name = 'foo';

MySQL:

SELECT * FROM t USE INDEX (idx_t_name) WHERE name = 'foo';

Hints are tempting but risky. If data distribution changes, they may no longer be optimal. Use only as a last resort.


11. Optimizers in Distributed Databases

Additional Considerations

Optimizers in distributed DBs (CockroachDB, TiDB, Spanner) have additional concerns:

  • Data locality: Which node has which data.
  • Network cost: Where to perform the join?
  • Parallelism: Execute concurrently on multiple nodes.
  • Exchange operator: Data movement between nodes.

Distributed Join Strategies

Broadcast Join: Replicate a small table to all nodes.

Small table → broadcast → local join at each node

Shuffle Join (Hash Redistribution): Redistribute to nodes by hashing the join key.

Both tables → hash on join key → same keys to the same node

Collocated Join: If both tables are already partitioned on the same key, join locally.

The optimizer compares their costs and chooses.
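A shuffle (hash-redistribution) step can be sketched in a few lines. These are toy in-process "nodes"; a real system streams rows over the network through exchange operators:

# Route rows of both tables to nodes by hashing the join key
def shuffle(rows, key, num_nodes):
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)   # same key -> same node
    return nodes

users  = [{'id': i, 'name': f'u{i}'} for i in range(6)]
orders = [{'user_id': i % 3, 'total': 10 * i} for i in range(6)]

u_parts = shuffle(users,  'id',      num_nodes=3)
o_parts = shuffle(orders, 'user_id', num_nodes=3)

# Every node can now hash-join its two local partitions independently,
# because matching keys from both tables landed on the same node.
for node, (u, o) in enumerate(zip(u_parts, o_parts)):
    print(node, [x['id'] for x in u], [x['user_id'] for x in o])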

TiDB's Optimizer

TiDB is a Cost-Based + Rule-Based hybrid:

  • Rule-based: always-applied transforms (predicate pushdown, etc.).
  • Cost-based: cardinality + cost model for plan selection.
  • Statistics: histograms + Count-Min Sketch for skew handling.

Significant research continues to overcome the extra complexity of distributed environments.


Quiz for Review

Q1. What is the most important input for an optimizer to pick a "good plan," and why?

A. Statistics, particularly accurate cardinality estimation. All cost calculations in the optimizer are based on the expected row count from each stage. If estimates are wrong:

  1. The chosen access method becomes inappropriate (index scan on a big result, full scan on a small one).
  2. Wrong join algorithm (nested loop is the worst case).
  3. Wrong join order → intermediate result explosion.

If statistics are missing or stale, these failures are almost guaranteed. That's why the first step of database tuning is always to ask, "Did you run ANALYZE?" After a bulk INSERT/UPDATE or after adding a new index, a manual ANALYZE is a must.

Q2. What happens in PostgreSQL when you lower random_page_cost from 4.0 to 1.1?

A. The optimizer becomes more likely to prefer index scans. Why:

  • Index Scan causes random I/O against the heap.
  • The default of 4.0 assumes random I/O is 4x slower than sequential — a legacy of the HDD era.
  • On modern SSDs/NVMe, the gap is almost nonexistent (1.1 is more realistic).

After this change:

  1. More queries switch to index scans.
  2. Conditions with low selectivity (few rows) avoid full scan.
  3. Overall performance typically improves, especially where random I/O is cheap.

Caveat: This parameter only changes the optimizer's estimates, not actual performance directly. If it doesn't match the environment, it may lead to bad plans. Use 1.1 to 1.5 on SSDs, keep 4.0 on HDDs, somewhere in between for network storage.

Q3. What is the problem with the "independent selectivity assumption" and PostgreSQL's solution?

A. Problem: The optimizer assumes P(A and B) = P(A) × P(B). This holds when conditions are independent, but in practice, they are often correlated:

WHERE city = 'Seoul' AND country = 'Korea'
  • Estimate: 10% × 20% = 2%
  • Reality: 10% (Seoul only in Korea)
  • 5x underestimate → wrong plan.

PostgreSQL solution (10+): Extended Statistics

CREATE STATISTICS stat_addr (dependencies) ON city, country FROM addresses;
CREATE STATISTICS stat_addr_ndist (ndistinct) ON city, country FROM addresses;
ANALYZE addresses;

Three types:

  1. dependencies: Detect functional dependencies (city → country).
  2. ndistinct: Number of distinct combinations across columns.
  3. MCV (multivariate): Most common value combinations across columns.

With this info, the optimizer can account for correlation. Very effective, but it has additional statistics-maintenance cost, so apply selectively to problematic queries.

Q4. What is the difference between the Volcano and Cascades optimizers?

A.

Volcano (1993):

  • Top-down search.
  • Exhaustive plan generation.
  • Has memoization.
  • Based on transformation rules.
  • Limits: Search space explosion for many joins, limited property handling.

Cascades (1995):

  • Improvement over Volcano (same author).
  • Memo data structure groups equivalent expressions.
  • Physical property propagation (ordering, distribution).
  • Branch-and-bound pruning.
  • Enforcers: Add Sort, Exchange to provide required properties.

Analogy: If Volcano is "generate all plans and pick one," Cascades is "compare at the group level, demand only required properties, and discard obviously bad ones early."

Usage: SQL Server, CockroachDB, Apache Calcite (Cascades-based). PostgreSQL is neither — it's its own DP-based design. Volcano mostly appears in research and lectures, while Cascades has become the industry standard.

Q5. What is the diagnostic order when "estimated rows" and "actual rows" in EXPLAIN ANALYZE differ significantly?

A. This is a sign that the optimizer made a bad estimate. Diagnostic order:

1. Check whether ANALYZE ran

SELECT last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE relname='mytable';

If too stale, run it manually: ANALYZE mytable;

2. Check statistics quality

SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats WHERE tablename = 'mytable';
  • If n_distinct looks wrong, the sample may be too small → ALTER TABLE ... SET STATISTICS 1000.
  • If the MCV list is empty, the values may be nearly uniform (nothing frequent enough to record), or the sample missed the common ones.

3. Suspect correlated columns If you have multiple WHERE conditions that always appear together, use extended statistics:

CREATE STATISTICS s ON col1, col2 FROM mytable;

4. Subqueries / function calls: Predicates like WHERE col = get_default() hide the comparison value behind a function call, so selectivity is hard to estimate.

5. Bad join-result estimates If an intermediate join result looks wrong, the n_distinct of join-participating columns may be inaccurate.

6. Hints / rewrite If all the above fail, force a specific plan with the pg_hint_plan extension, or rewrite the query (e.g., use a CTE, inline a subquery).

The key is to approach it in the order "statistics → data characteristics → query structure." Most problems are resolved with a single ANALYZE.


Closing: The Optimizer Is a Partner

Key Points

  1. The optimizer depends on statistics. Don't skimp on ANALYZE.
  2. It's cost-based. Tune cost parameters to your environment.
  3. Cardinality estimation is life. For correlated columns, use extended statistics.
  4. Read EXPLAIN. The difference between estimated and actual is the start of diagnosis.
  5. Join order is the optimizer's hard problem. With many tables, consider hints or rewrites.
  6. Cascades is the modern standard, but PostgreSQL has its own DP.
  7. Hints are the last resort. Fix statistics and structure first.

Cooperating with the Optimizer

Think of the optimizer not as an enemy but as a partner:

  • Give it statistics: ANALYZE, extended statistics.
  • Write clear queries: SQL the optimizer can understand easily.
  • Provide indexes: Secure the necessary access paths.
  • Measure: Check reality with EXPLAIN ANALYZE.
  • Trust, but verify: Most of the time, the optimizer is right.

Final Lesson

The optimizer is the product of decades of research and work by tens of thousands of engineers. Don't try to beat it. Instead, help the optimizer make the right decisions. Accurate statistics, appropriate indexes, clear queries. When those three come together, SQL runs magically fast.

Next time you meet a slow query, first ask:

  • "Are the statistics fresh?"
  • "Does the cardinality the optimizer sees match reality?"
  • "Is the structure amenable to optimization with the indexes and statistics I provided?"

If the answers are "yes," the optimizer will almost always give you the right answer.

