SQL Mastery 2025: Developer Interview SQL — From JOINs to Window Functions and Query Optimization

Introduction: Why SQL Is Still King in 2025
1. JOIN Mastery
2. Aggregation
3. Window Functions
4. Subquery vs CTE vs Derived Table
5. Recursive CTE
- 5.1 Org Chart Tree Traversal
- 5.2 Sequence Generation
6. Indexing
7. EXPLAIN ANALYZE
8. Query Optimization
9. Transaction Isolation Levels
10. PostgreSQL-Specific Features
11. Practice Problems (30)
12. Interview Questions (15)
13. Quiz (5 Questions)
14. References

Introduction: Why SQL Is Still King in 2025

In the Stack Overflow 2024 Developer Survey, SQL ranked among the top 3 most-used languages, and it remains one of the most sought-after skills. PostgreSQL was named DB-Engines DBMS of the Year for two consecutive years (2023-2024), and even in the AI/LLM era, the fundamental skill for working with data is SQL.

Why SQL matters in interviews:

Over 80% of backend position interviews include SQL questions
Essential for data engineers, full-stack, and DevOps roles
When ORM-only code hits performance walls, SQL is the solution
Window Functions and CTEs are core topics in modern interviews

┌──────────────────────────────────────────────────┐
│          Why SQL Is Still King (2025)             │
├──────────────────────────────────────────────────┤
│  Stack Overflow  → Top 3 most used language      │
│  PostgreSQL      → DBMS of the Year 2 years      │
│  AI/LLM          → Text-to-SQL core technology   │
│  Cloud            → Aurora, AlloyDB, Neon         │
│  Interview freq   → 80%+ of backend interviews   │
└──────────────────────────────────────────────────┘

This article covers everything you need for SQL interviews: all JOIN types, Window Functions, indexing, query optimization, and transactions.

1. JOIN Mastery

1.1 Sample Tables

-- Employees table
CREATE TABLE employees (
    id    INT PRIMARY KEY,
    name  VARCHAR(50),
    dept_id INT
);

INSERT INTO employees VALUES
(1, 'Alice',   10),
(2, 'Bob',     20),
(3, 'Charlie', 10),
(4, 'Diana',   30),
(5, 'Eve',     NULL);

-- Departments table
CREATE TABLE departments (
    id    INT PRIMARY KEY,
    name  VARCHAR(50)
);

INSERT INTO departments VALUES
(10, 'Engineering'),
(20, 'Marketing'),
(40, 'HR');

1.2 INNER JOIN

Returns only rows that match in both tables.

SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
└──────────┴─────────────┘
-- Diana (dept_id=30) and Eve (dept_id=NULL) excluded - no match
-- HR (id=40) excluded - no matching employees

┌─────────┐   ┌─────────┐
│Employees│   │  Depts  │
│    ┌────┼───┼────┐    │
│    │████│███│████│    │
│    │████│███│████│    │
│    └────┼───┼────┘    │
└─────────┘   └─────────┘
      INNER JOIN = Intersection

1.3 LEFT JOIN (LEFT OUTER JOIN)

All rows from the left table + matching right rows (NULL if no match).

SELECT e.name AS employee, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ Diana    │ NULL        │
│ Eve      │ NULL        │
└──────────┴─────────────┘

1.4 RIGHT JOIN (RIGHT OUTER JOIN)

All rows from the right table + matching left rows (NULL if no match).

SELECT e.name AS employee, d.name AS department
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ NULL     │ HR          │
└──────────┴─────────────┘

1.5 FULL OUTER JOIN

Returns all rows from both tables. NULL where no match exists.

SELECT e.name AS employee, d.name AS department
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ Diana    │ NULL        │
│ Eve      │ NULL        │
│ NULL     │ HR          │
└──────────┴─────────────┘

1.6 CROSS JOIN

Returns all combinations (Cartesian product) of two tables.

SELECT e.name, d.name
FROM employees e
CROSS JOIN departments d;
-- 5 employees x 3 departments = 15 rows

1.7 SELF JOIN

Joining a table with itself.

-- Manager-employee relationship
CREATE TABLE staff (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    manager_id INT
);

INSERT INTO staff VALUES
(1, 'CEO',     NULL),
(2, 'CTO',     1),
(3, 'Alice',   2),
(4, 'Bob',     2);

SELECT
    s.name AS employee,
    m.name AS manager
FROM staff s
LEFT JOIN staff m ON s.manager_id = m.id;

┌──────────┬─────────┐
│ employee │ manager │
├──────────┼─────────┤
│ CEO      │ NULL    │
│ CTO      │ CEO     │
│ Alice    │ CTO     │
│ Bob      │ CTO     │
└──────────┴─────────┘

2. Aggregation

2.1 GROUP BY and HAVING

-- Employee count and average salary by department
SELECT
    dept_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary,
    MAX(salary) AS max_salary,
    MIN(salary) AS min_salary
FROM employees
GROUP BY dept_id
HAVING COUNT(*) >= 2
ORDER BY avg_salary DESC;

Key rule: Any non-aggregate column in SELECT must appear in GROUP BY.

2.2 ROLLUP

-- Department totals with grand total
SELECT
    COALESCE(dept_name, 'Grand Total') AS department,
    COALESCE(job_title, 'Subtotal') AS job,
    COUNT(*) AS emp_count,
    SUM(salary) AS total_salary
FROM employees e
JOIN departments d ON e.dept_id = d.id
GROUP BY ROLLUP(dept_name, job_title);

2.3 CUBE and GROUPING SETS

-- CUBE: subtotals for all combinations
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY CUBE(dept_name, job_title);

-- GROUPING SETS: only specific combinations
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY GROUPING SETS (
    (dept_name, job_title),  -- by dept + job
    (dept_name),              -- by dept
    ()                        -- grand total
);

3. Window Functions

3.1 What Are Window Functions

Functions that aggregate while preserving individual rows. Unlike GROUP BY, they do not reduce rows.

-- GROUP BY: reduces rows (1 row per dept)
SELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id;

-- Window Function: preserves rows (all employees + dept average)
SELECT
    name,
    dept_id,
    salary,
    AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;

3.2 ROW_NUMBER, RANK, DENSE_RANK

SELECT
    name,
    dept_id,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK()       OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

┌─────────┬─────────┬────────┬─────────┬──────┬────────────┐
│  name   │ dept_id │ salary │ row_num │ rank │ dense_rank │
├─────────┼─────────┼────────┼─────────┼──────┼────────────┤
│ Alice   │   10    │ 120000 │    1    │  1   │     1      │
│ Charlie │   10    │ 120000 │    2    │  1   │     1      │
│ Bob     │   20    │ 100000 │    3    │  3   │     2      │
│ Diana   │   30    │  80000 │    4    │  4   │     3      │
│ Eve     │   20    │  80000 │    5    │  4   │     3      │
└─────────┴─────────┴────────┴─────────┴──────┴────────────┘

ROW_NUMBER: Always unique numbers (different even for ties)
RANK:       Same number for ties, gaps after (1, 1, 3)
DENSE_RANK: Same number for ties, no gaps   (1, 1, 2)

3.3 NTILE

Divides data into N equal groups.

-- Divide into salary quartiles
SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

3.4 LAG and LEAD

Reference values from previous/next rows.

-- Month-over-month revenue changes
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_month
FROM monthly_sales;

┌───────┬─────────┬────────────┬────────┬────────────┐
│ month │ revenue │ prev_month │ change │ next_month │
├───────┼─────────┼────────────┼────────┼────────────┤
│ 01    │  50000  │   NULL     │  NULL  │   60000    │
│ 02    │  60000  │   50000    │  10000 │   45000    │
│ 03    │  45000  │   60000    │ -15000 │   70000    │
│ 04    │  70000  │   45000    │  25000 │   NULL     │
└───────┴─────────┴────────────┴────────┴────────────┘

3.5 Running Total

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_sum_3
FROM orders;

3.6 Top N Per Group

-- Top 3 salaries per department
WITH ranked AS (
    SELECT
        name,
        dept_id,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY dept_id
            ORDER BY salary DESC
        ) AS rn
    FROM employees
)
SELECT name, dept_id, salary
FROM ranked
WHERE rn <= 3;

4. Subquery vs CTE vs Derived Table

4.1 Subqueries

-- Correlated subquery: employees earning above their dept average
SELECT name, salary, dept_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.dept_id = e1.dept_id
);

-- Non-correlated subquery: above overall average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

4.2 CTE (Common Table Expression)

WITH dept_avg AS (
    SELECT dept_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept_id
),
high_earners AS (
    SELECT e.name, e.salary, e.dept_id, d.avg_salary
    FROM employees e
    JOIN dept_avg d ON e.dept_id = d.dept_id
    WHERE e.salary > d.avg_salary
)
SELECT * FROM high_earners ORDER BY salary DESC;

4.3 Comparison

┌────────────────┬───────────────┬───────────────┬────────────────┐
│    Aspect       │  Subquery     │     CTE       │  Derived Table │
├────────────────┼───────────────┼───────────────┼────────────────┤
│ Readability    │ Low (nested)  │ High          │ Medium         │
│ Reuse          │ Not possible  │ Within query  │ Not possible   │
│ Recursion      │ Not possible  │ Yes (WITH REC)│ Not possible   │
│ Performance    │ DB-dependent  │ DB-dependent  │ Inline view    │
│ Debugging      │ Hard          │ Easy          │ Medium         │
└────────────────┴───────────────┴───────────────┴────────────────┘

5. Recursive CTE

5.1 Org Chart Tree Traversal

CREATE TABLE org (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    manager_id INT REFERENCES org(id)
);

INSERT INTO org VALUES
(1, 'CEO',      NULL),
(2, 'VP Eng',   1),
(3, 'VP Sales', 1),
(4, 'Dir Eng',  2),
(5, 'Lead Dev', 4),
(6, 'Dev 1',    5),
(7, 'Dev 2',    5);

-- Recursive CTE: traverse full org chart from CEO
WITH RECURSIVE org_tree AS (
    -- Base case: CEO (no manager)
    SELECT id, name, manager_id, 0 AS depth, name::TEXT AS path
    FROM org
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: find subordinates
    SELECT o.id, o.name, o.manager_id,
           t.depth + 1,
           t.path || ' > ' || o.name
    FROM org o
    JOIN org_tree t ON o.manager_id = t.id
)
SELECT
    REPEAT('  ', depth) || name AS hierarchy,
    depth,
    path
FROM org_tree
ORDER BY path;

┌────────────────────┬───────┬──────────────────────────────┐
│ hierarchy          │ depth │ path                         │
├────────────────────┼───────┼──────────────────────────────┤
│ CEO                │   0   │ CEO                          │
│   VP Eng           │   1   │ CEO > VP Eng                 │
│     Dir Eng        │   2   │ CEO > VP Eng > Dir Eng       │
│       Lead Dev     │   3   │ CEO > VP Eng > Dir Eng > ... │
│         Dev 1      │   4   │ CEO > VP Eng > Dir Eng > ... │
│         Dev 2      │   4   │ CEO > VP Eng > Dir Eng > ... │
│   VP Sales         │   1   │ CEO > VP Sales               │
└────────────────────┴───────┴──────────────────────────────┘

5.2 Sequence Generation

-- Generate numbers 1 to 10
WITH RECURSIVE nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 10
)
SELECT n FROM nums;

-- Date series generation
WITH RECURSIVE dates AS (
    SELECT DATE '2025-01-01' AS dt
    UNION ALL
    SELECT dt + INTERVAL '1 day' FROM dates WHERE dt < '2025-01-31'
)
SELECT dt FROM dates;

6. Indexing

6.1 B-Tree Index

The most common index type. Effective for equality (=), range comparisons, sorting, and LIKE 'prefix%'.

CREATE INDEX idx_employees_name ON employees(name);

-- Effective queries
SELECT * FROM employees WHERE name = 'Alice';            -- equality
SELECT * FROM employees WHERE name BETWEEN 'A' AND 'M';  -- range
SELECT * FROM employees ORDER BY name;                    -- sort
SELECT * FROM employees WHERE name LIKE 'Al%';           -- prefix

B-Tree structure:
              [M]
             /   \
          [D,H]  [P,T]
         /  |  \  / | \
       [A-C][E-G][I-L][N-O][Q-S][U-Z]
       (Leaf nodes: actual data pointers)

6.2 Hash Index

Only effective for equality (=) comparisons. Cannot handle range searches or sorting.

CREATE INDEX idx_employees_email_hash
ON employees USING HASH (email);

-- Effective
SELECT * FROM employees WHERE email = 'alice@example.com';

6.3 GIN (Generalized Inverted Index)

Ideal for arrays, JSONB, and full-text search.

-- JSONB field indexing
CREATE INDEX idx_products_tags ON products USING GIN (tags);

SELECT * FROM products WHERE tags @> '["electronics"]';

-- Full-text search
CREATE INDEX idx_articles_search ON articles
USING GIN (to_tsvector('english', title || ' ' || body));

6.4 GiST (Generalized Search Tree)

Suitable for geographic data, range types, and nearest-neighbor searches.

CREATE INDEX idx_locations_geom ON locations USING GiST (geom);

SELECT * FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(127.0, 37.5)::geography, 1000);

6.5 BRIN (Block Range Index)

Ideal for physically sorted large tables. Very small size for great effectiveness.

-- Optimal for time-series data
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);

6.6 Composite Index

-- Column order matters!
CREATE INDEX idx_emp_dept_salary ON employees(dept_id, salary);

-- Effective: dept_id only, or dept_id + salary together
SELECT * FROM employees WHERE dept_id = 10;
SELECT * FROM employees WHERE dept_id = 10 AND salary > 50000;

-- Ineffective: salary only (skipping first column)
-- SELECT * FROM employees WHERE salary > 50000;

Composite Index ordering rules (Leftmost Prefix):

Index: (A, B, C)

WHERE A = ?                        -> Used
WHERE A = ? AND B = ?              -> Used
WHERE A = ? AND B = ? AND C = ?    -> Used
WHERE B = ?                        -> NOT Used (A missing)
WHERE B = ? AND C = ?              -> NOT Used (A missing)
WHERE A = ? AND C = ?              -> Only A used (B skipped)

7. EXPLAIN ANALYZE

7.1 Reading Execution Plans

EXPLAIN ANALYZE
SELECT e.name, d.name AS dept
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.salary > 50000;

Hash Join  (cost=1.09..2.24 rows=3 width=36) (actual time=0.05..0.06 rows=3 loops=1)
  Hash Cond: (e.dept_id = d.id)
  ->  Seq Scan on employees e  (cost=0.00..1.12 rows=3 width=22)
        Filter: (salary > 50000)
        Rows Removed by Filter: 2
  ->  Hash  (cost=1.03..1.03 rows=3 width=18)
        ->  Seq Scan on departments d  (cost=0.00..1.03 rows=3 width=18)
Planning Time: 0.15 ms
Execution Time: 0.09 ms

7.2 Key Scan Types

┌──────────────────┬─────────────────────────────────────┐
│ Scan Type         │ Description                         │
├──────────────────┼─────────────────────────────────────┤
│ Seq Scan         │ Full table sequential scan           │
│ Index Scan       │ Index lookup + table access          │
│ Index Only Scan  │ Results from index only (no table)   │
│ Bitmap Index Scan│ Build bitmap from index, then table  │
│ Bitmap Heap Scan │ Table access based on bitmap         │
└──────────────────┴─────────────────────────────────────┘

7.3 Reading Costs

cost=startup_cost..total_cost  rows=estimated_rows  width=row_size(bytes)

- startup_cost: cost until first row returned
- total_cost: total cost to return all rows
- actual time: real execution time (ms)
- rows: actual rows returned
- loops: number of times this node executed

7.4 Slow Query Diagnosis Flow

┌──────────────────────────────────────────────────┐
│         Query Optimization Flow                   │
├──────────────────────────────────────────────────┤
│ 1. Run EXPLAIN ANALYZE                            │
│    |                                              │
│ 2. Seq Scan on large table?                       │
│    -> Consider adding index                       │
│    |                                              │
│ 3. Large gap between estimated and actual rows?   │
│    -> Run ANALYZE to update statistics            │
│    |                                              │
│ 4. Nested Loop on large datasets?                 │
│    -> Hash Join or Merge Join may be better       │
│    |                                              │
│ 5. Sort using disk?                               │
│    -> Increase work_mem or add index              │
└──────────────────────────────────────────────────┘

8. Query Optimization

8.1 N+1 Problem in SQL

-- Bad: separate query per order (from application)
-- 1. SELECT * FROM orders WHERE user_id = 1;
-- 2. SELECT * FROM order_items WHERE order_id = 101;
-- 3. SELECT * FROM order_items WHERE order_id = 102;
-- ... (N orders = N+1 queries)

-- Good: single JOIN query
SELECT o.id, o.total, oi.product_name, oi.quantity
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.user_id = 1;

8.2 Covering Index

Completes the query using only the index, avoiding table access.

CREATE INDEX idx_emp_dept_name_salary
ON employees(dept_id, name, salary);

-- Index Only Scan possible (no table access needed)
SELECT name, salary FROM employees WHERE dept_id = 10;

8.3 Partial Index

Indexes only rows matching a condition.

-- Index only active users (most queries target active users)
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';

SELECT * FROM users WHERE status = 'active' AND email = 'alice@example.com';

8.4 Expression Index

-- Index for case-insensitive search
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));

SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- JSON field indexing
CREATE INDEX idx_data_category
ON products((data->>'category'));

8.5 Optimization Tips

-- 1. EXISTS vs IN: EXISTS is usually faster for large datasets
-- Slower
SELECT * FROM orders WHERE customer_id IN (
    SELECT id FROM customers WHERE country = 'US'
);
-- Faster
SELECT * FROM orders o WHERE EXISTS (
    SELECT 1 FROM customers c
    WHERE c.id = o.customer_id AND c.country = 'US'
);

-- 2. LIMIT prevents unnecessary scanning
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;

-- 3. UNION ALL vs UNION: use ALL when dedup not needed
SELECT name FROM employees_2024
UNION ALL
SELECT name FROM employees_2025;

-- 4. Remove unnecessary ORDER BY inside subqueries
-- 5. SELECT specific columns instead of SELECT *

9. Transaction Isolation Levels

9.1 Four Isolation Levels

┌──────────────────┬───────────┬──────────────┬──────────────┬─────────────┐
│ Isolation Level   │Dirty Read │Non-Repeatable│Phantom Read  │ Performance │
│                  │           │    Read      │              │             │
├──────────────────┼───────────┼──────────────┼──────────────┼─────────────┤
│ Read Uncommitted │ Possible  │ Possible     │ Possible     │ Fastest     │
│ Read Committed   │ Prevented │ Possible     │ Possible     │ Fast        │
│ Repeatable Read  │ Prevented │ Prevented    │ Possible     │ Medium      │
│ Serializable     │ Prevented │ Prevented    │ Prevented    │ Slowest     │
└──────────────────┴───────────┴──────────────┴──────────────┴─────────────┘

PostgreSQL default: Read Committed
MySQL InnoDB default: Repeatable Read

9.2 Anomaly Explanations

Dirty Read: Reading uncommitted data

-- Transaction A
UPDATE accounts SET balance = 0 WHERE id = 1;
-- (not yet committed)

-- Transaction B (Read Uncommitted)
SELECT balance FROM accounts WHERE id = 1;
-- balance = 0 (uncommitted value!)

-- Transaction A
ROLLBACK;
-- balance reverts to original, but Transaction B already used wrong data

Non-Repeatable Read: Same query returns different results

-- Transaction A
SELECT balance FROM accounts WHERE id = 1;  -- 1000

-- Transaction B
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;

-- Transaction A (within same transaction)
SELECT balance FROM accounts WHERE id = 1;  -- 500 (different!)

Phantom Read: Row count changes for the same condition

-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- 5

-- Transaction B
INSERT INTO orders (status) VALUES ('pending');
COMMIT;

-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- 6 (phantom row!)

9.3 MVCC (Multi-Version Concurrency Control)

PostgreSQL uses MVCC so reads and writes do not block each other.

┌──────────────────────────────────────────────────┐
│                  MVCC Principle                   │
├──────────────────────────────────────────────────┤
│                                                  │
│  Transaction 100 (Read):                         │
│  "I only see versions with xmin <= 100"          │
│                                                  │
│  Transaction 101 (Write):                        │
│  "Create new version with xmin=101,              │
│   mark old row with xmax=101"                    │
│                                                  │
│  -> Transaction 100 still sees the old version   │
│  -> Reads and writes do not block each other     │
│                                                  │
└──────────────────────────────────────────────────┘

10. PostgreSQL-Specific Features

10.1 JSONB Queries

CREATE TABLE products (
    id   SERIAL PRIMARY KEY,
    name TEXT,
    data JSONB
);

INSERT INTO products (name, data) VALUES
('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "cpu": "i7"}, "tags": ["electronics", "computer"]}'),
('Phone',  '{"brand": "Apple", "specs": {"ram": 8, "cpu": "A17"}, "tags": ["electronics", "mobile"]}');

-- JSONB queries
SELECT name, data->>'brand' AS brand
FROM products
WHERE data->'specs'->>'ram' = '16';

-- Array containment
SELECT * FROM products WHERE data->'tags' @> '"mobile"';

-- JSONB path query (PostgreSQL 12+)
SELECT * FROM products
WHERE data @? '$.specs ? (@.ram > 10)';

10.2 Array Operators

CREATE TABLE articles (
    id   SERIAL PRIMARY KEY,
    title TEXT,
    tags  TEXT[]
);

INSERT INTO articles (title, tags) VALUES
('SQL Guide', ARRAY['sql', 'database', 'backend']),
('React Tips', ARRAY['react', 'frontend', 'javascript']);

-- Array containment search
SELECT * FROM articles WHERE tags @> ARRAY['sql'];
SELECT * FROM articles WHERE 'react' = ANY(tags);
SELECT * FROM articles WHERE tags && ARRAY['sql', 'react'];  -- overlap check

10.3 LATERAL JOIN

-- Get 3 most recent orders per department
SELECT d.name, recent.*
FROM departments d
CROSS JOIN LATERAL (
    SELECT o.id, o.amount, o.created_at
    FROM orders o
    WHERE o.dept_id = d.id
    ORDER BY o.created_at DESC
    LIMIT 3
) recent;

10.4 Materialized View

-- Pre-compute complex aggregation results
CREATE MATERIALIZED VIEW monthly_sales_summary AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    dept_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
FROM orders
GROUP BY DATE_TRUNC('month', order_date), dept_id;

-- Add index
CREATE INDEX idx_monthly_sales ON monthly_sales_summary(month, dept_id);

-- Refresh data
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_summary;

11. Practice Problems (30)

Easy (1-10)

Problem 1. List all employees with their department names. Include employees without departments.

Show Answer

SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;

Problem 2. Count employees per department.

Show Answer

SELECT d.name, COUNT(e.id) AS emp_count
FROM departments d
LEFT JOIN employees e ON d.id = e.dept_id
GROUP BY d.name
ORDER BY emp_count DESC;

Problem 3. Find employees earning above the overall average.

Show Answer

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Problem 4. Find duplicate email addresses.

Show Answer

SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

Problem 5. Find customers with no orders.

Show Answer

SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;

-- OR using NOT EXISTS
SELECT c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

Problem 6. Get the 5 most recent orders.

Show Answer

SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 5;

Problem 7. Average salary by department, only where average is 60000 or more.

Show Answer

SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) >= 60000;

Problem 8. Find the second highest salary.

Show Answer

SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- OR
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

Problem 9. Replace NULL values with defaults.

Show Answer

SELECT name, COALESCE(phone, 'N/A') AS phone
FROM employees;

Problem 10. Find employees whose names start with 'A'.

Show Answer

SELECT * FROM employees WHERE name LIKE 'A%';
-- Case-insensitive:
SELECT * FROM employees WHERE name ILIKE 'a%';

Medium (11-25)

Problem 11. Find the highest-paid employee in each department.

Show Answer

WITH ranked AS (
    SELECT name, dept_id, salary,
           RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rk
    FROM employees
)
SELECT name, dept_id, salary FROM ranked WHERE rk = 1;

Problem 12. Calculate monthly revenue with month-over-month growth rate.

Show Answer

SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))::NUMERIC
        / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
        2
    ) AS growth_pct
FROM monthly_sales;

Problem 13. Find users who logged in for 3 or more consecutive days.

Show Answer

WITH login_groups AS (
    SELECT
        user_id,
        login_date,
        login_date - ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY login_date
        )::INT * INTERVAL '1 day' AS grp
    FROM logins
)
SELECT user_id, COUNT(*) AS consecutive_days, MIN(login_date), MAX(login_date)
FROM login_groups
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;

Problem 14. Calculate a running total.

Show Answer

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

Problem 15. Pivot table: convert department/quarterly sales from rows to columns.

Show Answer

SELECT
    dept_name,
    SUM(CASE WHEN quarter = 1 THEN revenue ELSE 0 END) AS q1,
    SUM(CASE WHEN quarter = 2 THEN revenue ELSE 0 END) AS q2,
    SUM(CASE WHEN quarter = 3 THEN revenue ELSE 0 END) AS q3,
    SUM(CASE WHEN quarter = 4 THEN revenue ELSE 0 END) AS q4
FROM quarterly_sales
GROUP BY dept_name;

Problem 16. Generate Fibonacci sequence using recursive CTE.

Show Answer

WITH RECURSIVE fib AS (
    SELECT 1 AS n, 0::BIGINT AS a, 1::BIGINT AS b
    UNION ALL
    SELECT n + 1, b, a + b FROM fib WHERE n < 20
)
SELECT n, a AS fibonacci FROM fib;

Problem 17. Rank employees by salary within their department.

Show Answer

SELECT
    name,
    dept_id,
    salary,
    DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;

Problem 18. Top 5 best-selling products with their categories.

Show Answer

SELECT p.name, p.category, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.name, p.category
ORDER BY total_sold DESC
LIMIT 5;

Problem 19. Use LATERAL JOIN to get the 3 latest products per category.

Show Answer

SELECT c.name AS category, p.*
FROM categories c
CROSS JOIN LATERAL (
    SELECT id, name, price, created_at
    FROM products
    WHERE category_id = c.id
    ORDER BY created_at DESC
    LIMIT 3
) p;

Problem 20. Calculate a 3-month moving average.

Show Answer

SELECT
    month,
    revenue,
    AVG(revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3m
FROM monthly_sales;

Problem 21. Filter on JSONB fields with specific conditions.

Show Answer

SELECT id, name, data->>'brand' AS brand
FROM products
WHERE (data->'specs'->>'ram')::INT >= 16
  AND data->'tags' @> '"electronics"';

Problem 22. Gap analysis: find missing numbers in sequential IDs.

Show Answer

WITH id_range AS (
    SELECT generate_series(MIN(id), MAX(id)) AS expected_id
    FROM orders
)
SELECT expected_id AS missing_id
FROM id_range
WHERE expected_id NOT IN (SELECT id FROM orders);

Problem 23. Remove duplicate rows (keep only the latest).

Show Answer

DELETE FROM users
WHERE id NOT IN (
    SELECT DISTINCT ON (email) id
    FROM users
    ORDER BY email, created_at DESC
);

Problem 24. Calculate percentages without GROUP BY.

Show Answer

SELECT
    status,
    COUNT(*) AS cnt,
    ROUND(
        COUNT(*)::NUMERIC / SUM(COUNT(*)) OVER () * 100, 2
    ) AS percentage
FROM orders
GROUP BY status;

Problem 25. Cumulative unique user count.

Show Answer

WITH daily_new AS (
    SELECT
        user_id,
        MIN(login_date) AS first_date
    FROM logins
    GROUP BY user_id
)
SELECT
    first_date AS login_date,
    COUNT(*) AS new_users,
    SUM(COUNT(*)) OVER (ORDER BY first_date) AS cumulative_users
FROM daily_new
GROUP BY first_date
ORDER BY first_date;

Hard (26-30)

Problem 26. Find the longest consecutive revenue increase period.

Show Answer

WITH changes AS (
    SELECT
        month,
        revenue,
        CASE
            WHEN revenue > LAG(revenue) OVER (ORDER BY month)
            THEN 0 ELSE 1
        END AS is_break
    FROM monthly_sales
),
groups AS (
    SELECT *,
        SUM(is_break) OVER (ORDER BY month) AS grp
    FROM changes
)
SELECT
    grp,
    MIN(month) AS start_month,
    MAX(month) AS end_month,
    COUNT(*) AS streak_length
FROM groups
GROUP BY grp
ORDER BY streak_length DESC
LIMIT 1;

Problem 27. BOM (Bill of Materials) explosion using recursive CTE.

Show Answer

WITH RECURSIVE bom AS (
    SELECT
        parent_id, child_id, quantity, 1 AS level,
        quantity AS total_qty
    FROM bill_of_materials
    WHERE parent_id = 'PRODUCT_A'

    UNION ALL

    SELECT
        b.parent_id, b.child_id, b.quantity,
        bom.level + 1,
        bom.total_qty * b.quantity
    FROM bill_of_materials b
    JOIN bom ON b.parent_id = bom.child_id
    WHERE bom.level < 10
)
SELECT child_id AS component, SUM(total_qty) AS total_needed
FROM bom
GROUP BY child_id
ORDER BY total_needed DESC;

Problem 28. Detect time interval overlaps (scheduling conflicts).

Show Answer

SELECT
    a.id AS meeting_a,
    b.id AS meeting_b,
    a.start_time,
    a.end_time,
    b.start_time AS conflict_start,
    b.end_time AS conflict_end
FROM meetings a
JOIN meetings b ON a.id < b.id
    AND a.room_id = b.room_id
    AND a.start_time < b.end_time
    AND a.end_time > b.start_time;

Problem 29. Calculate median (without aggregate functions).

Show Answer

-- Pure SQL approach:
WITH ordered AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary) AS rn,
           COUNT(*) OVER () AS total
    FROM employees
)
SELECT AVG(salary) AS median
FROM ordered
WHERE rn IN (
    FLOOR((total + 1) / 2.0),
    CEIL((total + 1) / 2.0)
);

-- PostgreSQL built-in:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;

Problem 30. Graph shortest path using recursive CTE.

Show Answer

CREATE TABLE edges (
    src  VARCHAR(10),
    dst  VARCHAR(10),
    cost INT
);

WITH RECURSIVE paths AS (
    SELECT src, dst, cost, ARRAY[src, dst] AS path, 1 AS hops
    FROM edges
    WHERE src = 'A'

    UNION ALL

    SELECT p.src, e.dst, p.cost + e.cost,
           p.path || e.dst,
           p.hops + 1
    FROM paths p
    JOIN edges e ON p.dst = e.src
    WHERE NOT e.dst = ANY(p.path)  -- cycle prevention
      AND p.hops < 10             -- max depth limit
)
SELECT dst, MIN(cost) AS min_cost, path
FROM paths
WHERE dst = 'Z'
GROUP BY dst, path
ORDER BY min_cost
LIMIT 1;

12. Interview Questions (15)

Basics (1-5)

Q1. What is the difference between INNER JOIN and LEFT JOIN?

Show Answer

INNER JOIN returns only rows that match in both tables. LEFT JOIN returns all rows from the left table and fills NULL for non-matching right table rows. Use LEFT JOIN when you need to include rows without matches (e.g., employees without departments).

Q2. What is the difference between GROUP BY and HAVING?

Show Answer

GROUP BY groups rows together. HAVING applies conditions to grouped results. WHERE filters individual rows before grouping, while HAVING filters groups after grouping. Example: HAVING COUNT(*) > 5 returns only groups with more than 5 rows.

Q3. When does an index NOT get used in a WHERE clause?

Show Answer

(1) Functions applied to columns: WHERE YEAR(created_at) = 2025. (2) Implicit type conversions. (3) Leading wildcard: LIKE '%keyword%'. (4) OR conditions mismatching the index. (5) NOT IN, negation conditions. (6) Optimizer determines full scan is more efficient.

Q4. What is the result of comparison operations with NULL?

Show Answer

All comparison operations with NULL return NULL (UNKNOWN). NULL = NULL is NOT TRUE but NULL. Use IS NULL or IS NOT NULL to check for nulls. Aggregate functions ignore NULLs (except COUNT(*)).

Q5. What are the differences between DELETE, TRUNCATE, and DROP?

Show Answer

DELETE: Row-by-row deletion, supports WHERE, rollback possible, triggers fire. TRUNCATE: Instant full deletion, no WHERE, DDL so usually not rollback-able (PostgreSQL allows it), very fast. DROP: Deletes the table itself (structure + data + indexes).

Intermediate (6-10)

Q6. How do Window Functions differ from GROUP BY?

Show Answer

GROUP BY reduces to one row per group. Window Functions aggregate while preserving individual rows. OVER(PARTITION BY ...) defines groups, and adds group-level calculations to each row. For example, display each employee's row with their department average salary alongside.

Q7. When do you choose CTE over subqueries?

Show Answer

CTE offers better readability, is reusable within the same query, and supports recursive queries. Subqueries work for simple cases. Performance varies by DB, but in PostgreSQL 12+, CTEs are inlined by default, so performance difference is minimal.

Q8. Why does column order matter in composite indexes?

Show Answer

B-Tree composite indexes follow the Leftmost Prefix rule. An index on (A, B, C) can be used for A only, A+B, or A+B+C conditions, but NOT for B only or C only. Generally place higher cardinality columns first, and equality conditions before range conditions.

Q9. What indicators should you watch in EXPLAIN ANALYZE?

Show Answer

(1) actual rows vs rows: large gap indicates stale statistics. (2) Seq Scan on large tables: suggests index needed. (3) Sort Method: disk means insufficient work_mem. (4) Nested Loop on large data: consider Hash Join. (5) Execution Time: total execution time.

Q10. How do you choose a transaction isolation level?

Show Answer

Read Committed suits most OLTP (PostgreSQL default). Serializable for financial/payment systems requiring exactness. Repeatable Read for reports/analytics needing snapshot consistency. Higher isolation means lower concurrency, so choose based on requirements.

Advanced (11-15)

Q11. What is a Covering Index and why does it matter?

Show Answer

A Covering Index includes all columns needed by a query, enabling Index Only Scan. This avoids table access entirely, greatly reducing I/O. In PostgreSQL, use the INCLUDE clause to add extra columns to an index.

Q12. Explain how MVCC works.

Show Answer

MVCC maintains multiple versions of each row. When a transaction starts, it records a snapshot point and cannot see later changes. Writes create new versions, reads access versions matching their snapshot. This way reads and writes never block each other. Old versions are cleaned up by VACUUM.

Q13. Give use cases for Partial Index and Expression Index.

Show Answer

Partial Index indexes only frequently searched subsets. Example: active users only with WHERE status = 'active'. Smaller index size means better performance. Expression Index indexes function results. Example: LOWER(email) index for case-insensitive search optimization.

Q14. Why is JSONB better than JSON in PostgreSQL?

Show Answer

JSONB stores data in binary format eliminating parsing overhead, supports GIN indexes, and enables containment (@>), existence (?), and path (@?) operators. JSON stores as raw text with faster insertion but requires parsing on every access and has limited indexing.

Q15. What strategies exist for large table migrations?

Show Answer

(1) Online DDL: ALTER TABLE ... ADD COLUMN completes instantly in PostgreSQL when defaults exist. (2) Batch updates instead of full table UPDATE. (3) Table partitioning to reduce scope. (4) pg_repack for table restructuring. (5) Test on replicas first. Zero-downtime is key.

13. Quiz (5 Questions)

Quiz 1

How many rows does this query return?

SELECT * FROM A CROSS JOIN B;
-- A: 5 rows, B: 3 rows

A) 3 B) 5 C) 8 D) 15

Show Answer

D) 15. CROSS JOIN produces a Cartesian product: 5 x 3 = 15 rows.

Quiz 2

Among ROW_NUMBER, RANK, and DENSE_RANK, which one skips the next rank when there are ties?

A) ROW_NUMBER B) RANK C) DENSE_RANK D) All skip

Show Answer

B) RANK. RANK assigns the same rank for ties and skips the next (1, 1, 3). DENSE_RANK does not skip (1, 1, 2). ROW_NUMBER always assigns unique numbers.

Quiz 3

With composite index (dept_id, salary, name), which query CANNOT use this index?

A) WHERE dept_id = 10 B) WHERE dept_id = 10 AND salary > 50000 C) WHERE salary > 50000 D) WHERE dept_id = 10 AND salary > 50000 AND name = 'Alice'

Show Answer

C) By the Leftmost Prefix rule, the index cannot be used with just the second column (salary) without the first column (dept_id).

Quiz 4

What is the minimum isolation level that prevents Phantom Reads?

A) Read Uncommitted B) Read Committed C) Repeatable Read D) Serializable

Show Answer

D) Serializable. Per SQL standard, only Serializable prevents Phantom Reads. However, PostgreSQL's Repeatable Read also prevents phantoms internally using MVCC snapshots.

Quiz 5

What is the dept_avg value in this result?

SELECT name, salary,
       AVG(salary) OVER () AS dept_avg
FROM employees;
-- employees: Alice 100, Bob 200, Charlie 300

A) Different for each row B) 200 for all rows C) NULL D) Error

Show Answer

B) OVER () aggregates over the entire dataset without partitioning. (100 + 200 + 300) / 3 = 200 for all rows.

14. References

Official Documentation

Books

SQL Performance Explained - Markus Winand
PostgreSQL 14 Internals - Egor Rogov
The Art of SQL - Stephane Faroult
High Performance MySQL - Baron Schwartz et al.
SQL Antipatterns - Bill Karwin

Online Resources

Use The Index, Luke - Complete indexing guide
pgexercises.com - PostgreSQL practice
SQLZoo - SQL basics practice
LeetCode Database Problems - SQL interview problems
Mode Analytics SQL Tutorial

Tools

explain.dalibo.com - PostgreSQL EXPLAIN visualizer
pgMustard - Execution plan analysis
DBeaver - Universal DB client
pgAdmin - PostgreSQL administration tool

Introduction: Why SQL Is Still King in 2025

1. JOIN Mastery

1.1 Sample Tables

1.2 INNER JOIN

1.3 LEFT JOIN (LEFT OUTER JOIN)

1.4 RIGHT JOIN (RIGHT OUTER JOIN)

1.5 FULL OUTER JOIN

1.6 CROSS JOIN

1.7 SELF JOIN

2. Aggregation

2.1 GROUP BY and HAVING

2.2 ROLLUP

2.3 CUBE and GROUPING SETS

3. Window Functions

3.1 What Are Window Functions

3.2 ROW_NUMBER, RANK, DENSE_RANK

3.3 NTILE

3.4 LAG and LEAD

3.5 Running Total

3.6 Top N Per Group

4. Subquery vs CTE vs Derived Table

4.1 Subqueries

4.2 CTE (Common Table Expression)

4.3 Comparison

5. Recursive CTE

5.1 Org Chart Tree Traversal

5.2 Sequence Generation

6. Indexing

6.1 B-Tree Index

6.2 Hash Index

6.3 GIN (Generalized Inverted Index)

6.4 GiST (Generalized Search Tree)

6.5 BRIN (Block Range Index)

6.6 Composite Index

7. EXPLAIN ANALYZE

7.1 Reading Execution Plans

7.2 Key Scan Types

7.3 Reading Costs

7.4 Slow Query Diagnosis Flow

8. Query Optimization

8.1 N+1 Problem in SQL

8.2 Covering Index

8.3 Partial Index

8.4 Expression Index

8.5 Optimization Tips

9. Transaction Isolation Levels

9.1 Four Isolation Levels

9.2 Anomaly Explanations

9.3 MVCC (Multi-Version Concurrency Control)

10. PostgreSQL-Specific Features

10.1 JSONB Queries

10.2 Array Operators

10.3 LATERAL JOIN

10.4 Materialized View

11. Practice Problems (30)

Easy (1-10)

Medium (11-25)

Hard (26-30)

12. Interview Questions (15)

Basics (1-5)

Intermediate (6-10)

Advanced (11-15)

13. Quiz (5 Questions)

Quiz 1

Quiz 2

Quiz 3

Quiz 4

Quiz 5

14. References

Official Documentation

Books

Online Resources

Tools

Blogs