- Published on
SQL Mastery 2025: Developer Interview SQL — From JOINs to Window Functions and Query Optimization
- Authors

- Name
- Youngju Kim
- @fjvbn20031
- Introduction: Why SQL Is Still King in 2025
- 1. JOIN Mastery
- 2. Aggregation
- 3. Window Functions
- 4. Subquery vs CTE vs Derived Table
- 5. Recursive CTE
- 6. Indexing
- 7. EXPLAIN ANALYZE
- 8. Query Optimization
- 9. Transaction Isolation Levels
- 10. PostgreSQL-Specific Features
- 11. Practice Problems (30)
- 12. Interview Questions (15)
- 13. Quiz (5 Questions)
- 14. References
Introduction: Why SQL Is Still King in 2025
In the Stack Overflow 2024 Developer Survey, SQL ranked among the top 3 most-used languages, and it remains one of the most sought-after skills. PostgreSQL was named DB-Engines DBMS of the Year for two consecutive years (2023-2024), and even in the AI/LLM era, the fundamental skill for working with data is SQL.
Why SQL matters in interviews:
- Over 80% of backend position interviews include SQL questions
- Essential for data engineers, full-stack, and DevOps roles
- When ORM-only code hits performance walls, SQL is the solution
- Window Functions and CTEs are core topics in modern interviews
┌──────────────────────────────────────────────────┐
│ Why SQL Is Still King (2025) │
├──────────────────────────────────────────────────┤
│ Stack Overflow → Top 3 most used language │
│ PostgreSQL → DBMS of the Year 2 years │
│ AI/LLM → Text-to-SQL core technology │
│ Cloud → Aurora, AlloyDB, Neon │
│ Interview freq → 80%+ of backend interviews │
└──────────────────────────────────────────────────┘
This article covers everything you need for SQL interviews: all JOIN types, Window Functions, indexing, query optimization, and transactions.
1. JOIN Mastery
1.1 Sample Tables
-- Employees table
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
dept_id INT
);
INSERT INTO employees VALUES
(1, 'Alice', 10),
(2, 'Bob', 20),
(3, 'Charlie', 10),
(4, 'Diana', 30),
(5, 'Eve', NULL);
-- Departments table
CREATE TABLE departments (
id INT PRIMARY KEY,
name VARCHAR(50)
);
INSERT INTO departments VALUES
(10, 'Engineering'),
(20, 'Marketing'),
(40, 'HR');
1.2 INNER JOIN
Returns only rows that match in both tables.
SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;
┌──────────┬─────────────┐
│ employee │ department │
├──────────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Marketing │
│ Charlie │ Engineering │
└──────────┴─────────────┘
-- Diana (dept_id=30) and Eve (dept_id=NULL) excluded - no match
-- HR (id=40) excluded - no matching employees
┌─────────┐ ┌─────────┐
│Employees│ │ Depts │
│ ┌────┼───┼────┐ │
│ │████│███│████│ │
│ │████│███│████│ │
│ └────┼───┼────┘ │
└─────────┘ └─────────┘
INNER JOIN = Intersection
1.3 LEFT JOIN (LEFT OUTER JOIN)
All rows from the left table + matching right rows (NULL if no match).
SELECT e.name AS employee, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
┌──────────┬─────────────┐
│ employee │ department │
├──────────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Marketing │
│ Charlie │ Engineering │
│ Diana │ NULL │
│ Eve │ NULL │
└──────────┴─────────────┘
1.4 RIGHT JOIN (RIGHT OUTER JOIN)
All rows from the right table + matching left rows (NULL if no match).
SELECT e.name AS employee, d.name AS department
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;
┌──────────┬─────────────┐
│ employee │ department │
├──────────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Marketing │
│ Charlie │ Engineering │
│ NULL │ HR │
└──────────┴─────────────┘
1.5 FULL OUTER JOIN
Returns all rows from both tables. NULL where no match exists.
SELECT e.name AS employee, d.name AS department
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.id;
┌──────────┬─────────────┐
│ employee │ department │
├──────────┼─────────────┤
│ Alice │ Engineering │
│ Bob │ Marketing │
│ Charlie │ Engineering │
│ Diana │ NULL │
│ Eve │ NULL │
│ NULL │ HR │
└──────────┴─────────────┘
1.6 CROSS JOIN
Returns all combinations (Cartesian product) of two tables.
SELECT e.name, d.name
FROM employees e
CROSS JOIN departments d;
-- 5 employees x 3 departments = 15 rows
1.7 SELF JOIN
Joining a table with itself.
-- Manager-employee relationship
CREATE TABLE staff (
id INT PRIMARY KEY,
name VARCHAR(50),
manager_id INT
);
INSERT INTO staff VALUES
(1, 'CEO', NULL),
(2, 'CTO', 1),
(3, 'Alice', 2),
(4, 'Bob', 2);
SELECT
s.name AS employee,
m.name AS manager
FROM staff s
LEFT JOIN staff m ON s.manager_id = m.id;
┌──────────┬─────────┐
│ employee │ manager │
├──────────┼─────────┤
│ CEO │ NULL │
│ CTO │ CEO │
│ Alice │ CTO │
│ Bob │ CTO │
└──────────┴─────────┘
2. Aggregation
2.1 GROUP BY and HAVING
-- Employee count and average salary by department
SELECT
dept_id,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary
FROM employees
GROUP BY dept_id
HAVING COUNT(*) >= 2
ORDER BY avg_salary DESC;
Key rule: Any non-aggregate column in SELECT must appear in GROUP BY.
2.2 ROLLUP
-- Department totals with grand total
SELECT
COALESCE(dept_name, 'Grand Total') AS department,
COALESCE(job_title, 'Subtotal') AS job,
COUNT(*) AS emp_count,
SUM(salary) AS total_salary
FROM employees e
JOIN departments d ON e.dept_id = d.id
GROUP BY ROLLUP(dept_name, job_title);
2.3 CUBE and GROUPING SETS
-- CUBE: subtotals for all combinations
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY CUBE(dept_name, job_title);
-- GROUPING SETS: only specific combinations
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY GROUPING SETS (
(dept_name, job_title), -- by dept + job
(dept_name), -- by dept
() -- grand total
);
3. Window Functions
3.1 What Are Window Functions
Functions that aggregate while preserving individual rows. Unlike GROUP BY, they do not reduce rows.
-- GROUP BY: reduces rows (1 row per dept)
SELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id;
-- Window Function: preserves rows (all employees + dept average)
SELECT
name,
dept_id,
salary,
AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;
3.2 ROW_NUMBER, RANK, DENSE_RANK
SELECT
name,
dept_id,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
┌─────────┬─────────┬────────┬─────────┬──────┬────────────┐
│ name │ dept_id │ salary │ row_num │ rank │ dense_rank │
├─────────┼─────────┼────────┼─────────┼──────┼────────────┤
│ Alice │ 10 │ 120000 │ 1 │ 1 │ 1 │
│ Charlie │ 10 │ 120000 │ 2 │ 1 │ 1 │
│ Bob │ 20 │ 100000 │ 3 │ 3 │ 2 │
│ Diana │ 30 │ 80000 │ 4 │ 4 │ 3 │
│ Eve │ 20 │ 80000 │ 5 │ 4 │ 3 │
└─────────┴─────────┴────────┴─────────┴──────┴────────────┘
ROW_NUMBER: Always unique numbers (different even for ties)
RANK: Same number for ties, gaps after (1, 1, 3)
DENSE_RANK: Same number for ties, no gaps (1, 1, 2)
3.3 NTILE
Divides data into N equal groups.
-- Divide into salary quartiles
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;
3.4 LAG and LEAD
Reference values from previous/next rows.
-- Month-over-month revenue changes
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change,
LEAD(revenue, 1) OVER (ORDER BY month) AS next_month
FROM monthly_sales;
┌───────┬─────────┬────────────┬────────┬────────────┐
│ month │ revenue │ prev_month │ change │ next_month │
├───────┼─────────┼────────────┼────────┼────────────┤
│ 01 │ 50000 │ NULL │ NULL │ 60000 │
│ 02 │ 60000 │ 50000 │ 10000 │ 45000 │
│ 03 │ 45000 │ 60000 │ -15000 │ 70000 │
│ 04 │ 70000 │ 45000 │ 25000 │ NULL │
└───────┴─────────┴────────────┴────────┴────────────┘
3.5 Running Total
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_sum_3
FROM orders;
3.6 Top N Per Group
-- Top 3 salaries per department
WITH ranked AS (
SELECT
name,
dept_id,
salary,
ROW_NUMBER() OVER (
PARTITION BY dept_id
ORDER BY salary DESC
) AS rn
FROM employees
)
SELECT name, dept_id, salary
FROM ranked
WHERE rn <= 3;
4. Subquery vs CTE vs Derived Table
4.1 Subqueries
-- Correlated subquery: employees earning above their dept average
SELECT name, salary, dept_id
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.dept_id = e1.dept_id
);
-- Non-correlated subquery: above overall average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
4.2 CTE (Common Table Expression)
WITH dept_avg AS (
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
),
high_earners AS (
SELECT e.name, e.salary, e.dept_id, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.dept_id = d.dept_id
WHERE e.salary > d.avg_salary
)
SELECT * FROM high_earners ORDER BY salary DESC;
4.3 Comparison
┌────────────────┬───────────────┬───────────────┬────────────────┐
│ Aspect │ Subquery │ CTE │ Derived Table │
├────────────────┼───────────────┼───────────────┼────────────────┤
│ Readability │ Low (nested) │ High │ Medium │
│ Reuse │ Not possible │ Within query │ Not possible │
│ Recursion │ Not possible │ Yes (WITH REC)│ Not possible │
│ Performance │ DB-dependent │ DB-dependent │ Inline view │
│ Debugging │ Hard │ Easy │ Medium │
└────────────────┴───────────────┴───────────────┴────────────────┘
5. Recursive CTE
5.1 Org Chart Tree Traversal
CREATE TABLE org (
id INT PRIMARY KEY,
name VARCHAR(50),
manager_id INT REFERENCES org(id)
);
INSERT INTO org VALUES
(1, 'CEO', NULL),
(2, 'VP Eng', 1),
(3, 'VP Sales', 1),
(4, 'Dir Eng', 2),
(5, 'Lead Dev', 4),
(6, 'Dev 1', 5),
(7, 'Dev 2', 5);
-- Recursive CTE: traverse full org chart from CEO
WITH RECURSIVE org_tree AS (
-- Base case: CEO (no manager)
SELECT id, name, manager_id, 0 AS depth, name::TEXT AS path
FROM org
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: find subordinates
SELECT o.id, o.name, o.manager_id,
t.depth + 1,
t.path || ' > ' || o.name
FROM org o
JOIN org_tree t ON o.manager_id = t.id
)
SELECT
REPEAT(' ', depth) || name AS hierarchy,
depth,
path
FROM org_tree
ORDER BY path;
┌────────────────────┬───────┬──────────────────────────────┐
│ hierarchy │ depth │ path │
├────────────────────┼───────┼──────────────────────────────┤
│ CEO │ 0 │ CEO │
│ VP Eng │ 1 │ CEO > VP Eng │
│ Dir Eng │ 2 │ CEO > VP Eng > Dir Eng │
│ Lead Dev │ 3 │ CEO > VP Eng > Dir Eng > ... │
│ Dev 1 │ 4 │ CEO > VP Eng > Dir Eng > ... │
│ Dev 2 │ 4 │ CEO > VP Eng > Dir Eng > ... │
│ VP Sales │ 1 │ CEO > VP Sales │
└────────────────────┴───────┴──────────────────────────────┘
5.2 Sequence Generation
-- Generate numbers 1 to 10
WITH RECURSIVE nums AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM nums WHERE n < 10
)
SELECT n FROM nums;
-- Date series generation
WITH RECURSIVE dates AS (
SELECT DATE '2025-01-01' AS dt
UNION ALL
SELECT dt + INTERVAL '1 day' FROM dates WHERE dt < '2025-01-31'
)
SELECT dt FROM dates;
6. Indexing
6.1 B-Tree Index
The most common index type. Effective for equality (=), range comparisons, sorting, and LIKE 'prefix%'.
CREATE INDEX idx_employees_name ON employees(name);
-- Effective queries
SELECT * FROM employees WHERE name = 'Alice'; -- equality
SELECT * FROM employees WHERE name BETWEEN 'A' AND 'M'; -- range
SELECT * FROM employees ORDER BY name; -- sort
SELECT * FROM employees WHERE name LIKE 'Al%'; -- prefix
B-Tree structure:
[M]
/ \
[D,H] [P,T]
/ | \ / | \
[A-C][E-G][I-L][N-O][Q-S][U-Z]
(Leaf nodes: actual data pointers)
6.2 Hash Index
Only effective for equality (=) comparisons. Cannot handle range searches or sorting.
CREATE INDEX idx_employees_email_hash
ON employees USING HASH (email);
-- Effective
SELECT * FROM employees WHERE email = 'alice@example.com';
6.3 GIN (Generalized Inverted Index)
Ideal for arrays, JSONB, and full-text search.
-- JSONB field indexing
CREATE INDEX idx_products_tags ON products USING GIN (tags);
SELECT * FROM products WHERE tags @> '["electronics"]';
-- Full-text search
CREATE INDEX idx_articles_search ON articles
USING GIN (to_tsvector('english', title || ' ' || body));
6.4 GiST (Generalized Search Tree)
Suitable for geographic data, range types, and nearest-neighbor searches.
CREATE INDEX idx_locations_geom ON locations USING GiST (geom);
SELECT * FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(127.0, 37.5)::geography, 1000);
6.5 BRIN (Block Range Index)
Ideal for physically sorted large tables. Very small size for great effectiveness.
-- Optimal for time-series data
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);
6.6 Composite Index
-- Column order matters!
CREATE INDEX idx_emp_dept_salary ON employees(dept_id, salary);
-- Effective: dept_id only, or dept_id + salary together
SELECT * FROM employees WHERE dept_id = 10;
SELECT * FROM employees WHERE dept_id = 10 AND salary > 50000;
-- Ineffective: salary only (skipping first column)
-- SELECT * FROM employees WHERE salary > 50000;
Composite Index ordering rules (Leftmost Prefix):
Index: (A, B, C)
WHERE A = ? -> Used
WHERE A = ? AND B = ? -> Used
WHERE A = ? AND B = ? AND C = ? -> Used
WHERE B = ? -> NOT Used (A missing)
WHERE B = ? AND C = ? -> NOT Used (A missing)
WHERE A = ? AND C = ? -> Only A used (B skipped)
7. EXPLAIN ANALYZE
7.1 Reading Execution Plans
EXPLAIN ANALYZE
SELECT e.name, d.name AS dept
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.salary > 50000;
Hash Join (cost=1.09..2.24 rows=3 width=36) (actual time=0.05..0.06 rows=3 loops=1)
Hash Cond: (e.dept_id = d.id)
-> Seq Scan on employees e (cost=0.00..1.12 rows=3 width=22)
Filter: (salary > 50000)
Rows Removed by Filter: 2
-> Hash (cost=1.03..1.03 rows=3 width=18)
-> Seq Scan on departments d (cost=0.00..1.03 rows=3 width=18)
Planning Time: 0.15 ms
Execution Time: 0.09 ms
7.2 Key Scan Types
┌──────────────────┬─────────────────────────────────────┐
│ Scan Type │ Description │
├──────────────────┼─────────────────────────────────────┤
│ Seq Scan │ Full table sequential scan │
│ Index Scan │ Index lookup + table access │
│ Index Only Scan │ Results from index only (no table) │
│ Bitmap Index Scan│ Build bitmap from index, then table │
│ Bitmap Heap Scan │ Table access based on bitmap │
└──────────────────┴─────────────────────────────────────┘
7.3 Reading Costs
cost=startup_cost..total_cost rows=estimated_rows width=row_size(bytes)
- startup_cost: cost until first row returned
- total_cost: total cost to return all rows
- actual time: real execution time (ms)
- rows: actual rows returned
- loops: number of times this node executed
7.4 Slow Query Diagnosis Flow
┌──────────────────────────────────────────────────┐
│ Query Optimization Flow │
├──────────────────────────────────────────────────┤
│ 1. Run EXPLAIN ANALYZE │
│ | │
│ 2. Seq Scan on large table? │
│ -> Consider adding index │
│ | │
│ 3. Large gap between estimated and actual rows? │
│ -> Run ANALYZE to update statistics │
│ | │
│ 4. Nested Loop on large datasets? │
│ -> Hash Join or Merge Join may be better │
│ | │
│ 5. Sort using disk? │
│ -> Increase work_mem or add index │
└──────────────────────────────────────────────────┘
8. Query Optimization
8.1 N+1 Problem in SQL
-- Bad: separate query per order (from application)
-- 1. SELECT * FROM orders WHERE user_id = 1;
-- 2. SELECT * FROM order_items WHERE order_id = 101;
-- 3. SELECT * FROM order_items WHERE order_id = 102;
-- ... (N orders = N+1 queries)
-- Good: single JOIN query
SELECT o.id, o.total, oi.product_name, oi.quantity
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.user_id = 1;
8.2 Covering Index
Completes the query using only the index, avoiding table access.
CREATE INDEX idx_emp_dept_name_salary
ON employees(dept_id, name, salary);
-- Index Only Scan possible (no table access needed)
SELECT name, salary FROM employees WHERE dept_id = 10;
8.3 Partial Index
Indexes only rows matching a condition.
-- Index only active users (most queries target active users)
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';
SELECT * FROM users WHERE status = 'active' AND email = 'alice@example.com';
8.4 Expression Index
-- Index for case-insensitive search
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- JSON field indexing
CREATE INDEX idx_data_category
ON products((data->>'category'));
8.5 Optimization Tips
-- 1. EXISTS vs IN: EXISTS is usually faster for large datasets
-- Slower
SELECT * FROM orders WHERE customer_id IN (
SELECT id FROM customers WHERE country = 'US'
);
-- Faster
SELECT * FROM orders o WHERE EXISTS (
SELECT 1 FROM customers c
WHERE c.id = o.customer_id AND c.country = 'US'
);
-- 2. LIMIT prevents unnecessary scanning
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;
-- 3. UNION ALL vs UNION: use ALL when dedup not needed
SELECT name FROM employees_2024
UNION ALL
SELECT name FROM employees_2025;
-- 4. Remove unnecessary ORDER BY inside subqueries
-- 5. SELECT specific columns instead of SELECT *
9. Transaction Isolation Levels
9.1 Four Isolation Levels
┌──────────────────┬───────────┬──────────────┬──────────────┬─────────────┐
│ Isolation Level │Dirty Read │Non-Repeatable│Phantom Read │ Performance │
│ │ │ Read │ │ │
├──────────────────┼───────────┼──────────────┼──────────────┼─────────────┤
│ Read Uncommitted │ Possible │ Possible │ Possible │ Fastest │
│ Read Committed │ Prevented │ Possible │ Possible │ Fast │
│ Repeatable Read │ Prevented │ Prevented │ Possible │ Medium │
│ Serializable │ Prevented │ Prevented │ Prevented │ Slowest │
└──────────────────┴───────────┴──────────────┴──────────────┴─────────────┘
PostgreSQL default: Read Committed
MySQL InnoDB default: Repeatable Read
9.2 Anomaly Explanations
Dirty Read: Reading uncommitted data
-- Transaction A
UPDATE accounts SET balance = 0 WHERE id = 1;
-- (not yet committed)
-- Transaction B (Read Uncommitted)
SELECT balance FROM accounts WHERE id = 1;
-- balance = 0 (uncommitted value!)
-- Transaction A
ROLLBACK;
-- balance reverts to original, but Transaction B already used wrong data
Non-Repeatable Read: Same query returns different results
-- Transaction A
SELECT balance FROM accounts WHERE id = 1; -- 1000
-- Transaction B
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;
-- Transaction A (within same transaction)
SELECT balance FROM accounts WHERE id = 1; -- 500 (different!)
Phantom Read: Row count changes for the same condition
-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending'; -- 5
-- Transaction B
INSERT INTO orders (status) VALUES ('pending');
COMMIT;
-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending'; -- 6 (phantom row!)
9.3 MVCC (Multi-Version Concurrency Control)
PostgreSQL uses MVCC so reads and writes do not block each other.
┌──────────────────────────────────────────────────┐
│ MVCC Principle │
├──────────────────────────────────────────────────┤
│ │
│ Transaction 100 (Read): │
│ "I only see versions with xmin <= 100" │
│ │
│ Transaction 101 (Write): │
│ "Create new version with xmin=101, │
│ mark old row with xmax=101" │
│ │
│ -> Transaction 100 still sees the old version │
│ -> Reads and writes do not block each other │
│ │
└──────────────────────────────────────────────────┘
10. PostgreSQL-Specific Features
10.1 JSONB Queries
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
data JSONB
);
INSERT INTO products (name, data) VALUES
('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "cpu": "i7"}, "tags": ["electronics", "computer"]}'),
('Phone', '{"brand": "Apple", "specs": {"ram": 8, "cpu": "A17"}, "tags": ["electronics", "mobile"]}');
-- JSONB queries
SELECT name, data->>'brand' AS brand
FROM products
WHERE data->'specs'->>'ram' = '16';
-- Array containment
SELECT * FROM products WHERE data->'tags' @> '"mobile"';
-- JSONB path query (PostgreSQL 12+)
SELECT * FROM products
WHERE data @? '$.specs ? (@.ram > 10)';
10.2 Array Operators
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT,
tags TEXT[]
);
INSERT INTO articles (title, tags) VALUES
('SQL Guide', ARRAY['sql', 'database', 'backend']),
('React Tips', ARRAY['react', 'frontend', 'javascript']);
-- Array containment search
SELECT * FROM articles WHERE tags @> ARRAY['sql'];
SELECT * FROM articles WHERE 'react' = ANY(tags);
SELECT * FROM articles WHERE tags && ARRAY['sql', 'react']; -- overlap check
10.3 LATERAL JOIN
-- Get 3 most recent orders per department
SELECT d.name, recent.*
FROM departments d
CROSS JOIN LATERAL (
SELECT o.id, o.amount, o.created_at
FROM orders o
WHERE o.dept_id = d.id
ORDER BY o.created_at DESC
LIMIT 3
) recent;
10.4 Materialized View
-- Pre-compute complex aggregation results
CREATE MATERIALIZED VIEW monthly_sales_summary AS
SELECT
DATE_TRUNC('month', order_date) AS month,
dept_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount
FROM orders
GROUP BY DATE_TRUNC('month', order_date), dept_id;
-- Add index
CREATE INDEX idx_monthly_sales ON monthly_sales_summary(month, dept_id);
-- Refresh data
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_summary;
11. Practice Problems (30)
Easy (1-10)
Problem 1. List all employees with their department names. Include employees without departments.
Show Answer
SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
Problem 2. Count employees per department.
Show Answer
SELECT d.name, COUNT(e.id) AS emp_count
FROM departments d
LEFT JOIN employees e ON d.id = e.dept_id
GROUP BY d.name
ORDER BY emp_count DESC;
Problem 3. Find employees earning above the overall average.
Show Answer
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
Problem 4. Find duplicate email addresses.
Show Answer
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
Problem 5. Find customers with no orders.
Show Answer
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;
-- OR using NOT EXISTS
SELECT c.name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
Problem 6. Get the 5 most recent orders.
Show Answer
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 5;
Problem 7. Average salary by department, only where average is 60000 or more.
Show Answer
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) >= 60000;
Problem 8. Find the second highest salary.
Show Answer
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- OR
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
Problem 9. Replace NULL values with defaults.
Show Answer
SELECT name, COALESCE(phone, 'N/A') AS phone
FROM employees;
Problem 10. Find employees whose names start with 'A'.
Show Answer
SELECT * FROM employees WHERE name LIKE 'A%';
-- Case-insensitive:
SELECT * FROM employees WHERE name ILIKE 'a%';
Medium (11-25)
Problem 11. Find the highest-paid employee in each department.
Show Answer
WITH ranked AS (
SELECT name, dept_id, salary,
RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rk
FROM employees
)
SELECT name, dept_id, salary FROM ranked WHERE rk = 1;
Problem 12. Calculate monthly revenue with month-over-month growth rate.
Show Answer
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))::NUMERIC
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
2
) AS growth_pct
FROM monthly_sales;
Problem 13. Find users who logged in for 3 or more consecutive days.
Show Answer
WITH login_groups AS (
SELECT
user_id,
login_date,
login_date - ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
)::INT * INTERVAL '1 day' AS grp
FROM logins
)
SELECT user_id, COUNT(*) AS consecutive_days, MIN(login_date), MAX(login_date)
FROM login_groups
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;
Problem 14. Calculate a running total.
Show Answer
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
Problem 15. Pivot table: convert department/quarterly sales from rows to columns.
Show Answer
SELECT
dept_name,
SUM(CASE WHEN quarter = 1 THEN revenue ELSE 0 END) AS q1,
SUM(CASE WHEN quarter = 2 THEN revenue ELSE 0 END) AS q2,
SUM(CASE WHEN quarter = 3 THEN revenue ELSE 0 END) AS q3,
SUM(CASE WHEN quarter = 4 THEN revenue ELSE 0 END) AS q4
FROM quarterly_sales
GROUP BY dept_name;
Problem 16. Generate Fibonacci sequence using recursive CTE.
Show Answer
WITH RECURSIVE fib AS (
SELECT 1 AS n, 0::BIGINT AS a, 1::BIGINT AS b
UNION ALL
SELECT n + 1, b, a + b FROM fib WHERE n < 20
)
SELECT n, a AS fibonacci FROM fib;
Problem 17. Rank employees by salary within their department.
Show Answer
SELECT
name,
dept_id,
salary,
DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;
Problem 18. Top 5 best-selling products with their categories.
Show Answer
SELECT p.name, p.category, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.name, p.category
ORDER BY total_sold DESC
LIMIT 5;
Problem 19. Use LATERAL JOIN to get the 3 latest products per category.
Show Answer
SELECT c.name AS category, p.*
FROM categories c
CROSS JOIN LATERAL (
SELECT id, name, price, created_at
FROM products
WHERE category_id = c.id
ORDER BY created_at DESC
LIMIT 3
) p;
Problem 20. Calculate a 3-month moving average.
Show Answer
SELECT
month,
revenue,
AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3m
FROM monthly_sales;
Problem 21. Filter on JSONB fields with specific conditions.
Show Answer
SELECT id, name, data->>'brand' AS brand
FROM products
WHERE (data->'specs'->>'ram')::INT >= 16
AND data->'tags' @> '"electronics"';
Problem 22. Gap analysis: find missing numbers in sequential IDs.
Show Answer
WITH id_range AS (
SELECT generate_series(MIN(id), MAX(id)) AS expected_id
FROM orders
)
SELECT expected_id AS missing_id
FROM id_range
WHERE expected_id NOT IN (SELECT id FROM orders);
Problem 23. Remove duplicate rows (keep only the latest).
Show Answer
DELETE FROM users
WHERE id NOT IN (
SELECT DISTINCT ON (email) id
FROM users
ORDER BY email, created_at DESC
);
Problem 24. Calculate percentages without GROUP BY.
Show Answer
SELECT
status,
COUNT(*) AS cnt,
ROUND(
COUNT(*)::NUMERIC / SUM(COUNT(*)) OVER () * 100, 2
) AS percentage
FROM orders
GROUP BY status;
Problem 25. Cumulative unique user count.
Show Answer
WITH daily_new AS (
SELECT
user_id,
MIN(login_date) AS first_date
FROM logins
GROUP BY user_id
)
SELECT
first_date AS login_date,
COUNT(*) AS new_users,
SUM(COUNT(*)) OVER (ORDER BY first_date) AS cumulative_users
FROM daily_new
GROUP BY first_date
ORDER BY first_date;
Hard (26-30)
Problem 26. Find the longest consecutive revenue increase period.
Show Answer
WITH changes AS (
SELECT
month,
revenue,
CASE
WHEN revenue > LAG(revenue) OVER (ORDER BY month)
THEN 0 ELSE 1
END AS is_break
FROM monthly_sales
),
groups AS (
SELECT *,
SUM(is_break) OVER (ORDER BY month) AS grp
FROM changes
)
SELECT
grp,
MIN(month) AS start_month,
MAX(month) AS end_month,
COUNT(*) AS streak_length
FROM groups
GROUP BY grp
ORDER BY streak_length DESC
LIMIT 1;
Problem 27. BOM (Bill of Materials) explosion using recursive CTE.
Show Answer
WITH RECURSIVE bom AS (
SELECT
parent_id, child_id, quantity, 1 AS level,
quantity AS total_qty
FROM bill_of_materials
WHERE parent_id = 'PRODUCT_A'
UNION ALL
SELECT
b.parent_id, b.child_id, b.quantity,
bom.level + 1,
bom.total_qty * b.quantity
FROM bill_of_materials b
JOIN bom ON b.parent_id = bom.child_id
WHERE bom.level < 10
)
SELECT child_id AS component, SUM(total_qty) AS total_needed
FROM bom
GROUP BY child_id
ORDER BY total_needed DESC;
Problem 28. Detect time interval overlaps (scheduling conflicts).
Show Answer
SELECT
a.id AS meeting_a,
b.id AS meeting_b,
a.start_time,
a.end_time,
b.start_time AS conflict_start,
b.end_time AS conflict_end
FROM meetings a
JOIN meetings b ON a.id < b.id
AND a.room_id = b.room_id
AND a.start_time < b.end_time
AND a.end_time > b.start_time;
Problem 29. Calculate median (without aggregate functions).
Show Answer
-- Pure SQL approach:
WITH ordered AS (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary) AS rn,
COUNT(*) OVER () AS total
FROM employees
)
SELECT AVG(salary) AS median
FROM ordered
WHERE rn IN (
FLOOR((total + 1) / 2.0),
CEIL((total + 1) / 2.0)
);
-- PostgreSQL built-in:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;
Problem 30. Graph shortest path using recursive CTE.
Show Answer
CREATE TABLE edges (
src VARCHAR(10),
dst VARCHAR(10),
cost INT
);
WITH RECURSIVE paths AS (
SELECT src, dst, cost, ARRAY[src, dst] AS path, 1 AS hops
FROM edges
WHERE src = 'A'
UNION ALL
SELECT p.src, e.dst, p.cost + e.cost,
p.path || e.dst,
p.hops + 1
FROM paths p
JOIN edges e ON p.dst = e.src
WHERE NOT e.dst = ANY(p.path) -- cycle prevention
AND p.hops < 10 -- max depth limit
)
SELECT dst, MIN(cost) AS min_cost, path
FROM paths
WHERE dst = 'Z'
GROUP BY dst, path
ORDER BY min_cost
LIMIT 1;
12. Interview Questions (15)
Basics (1-5)
Q1. What is the difference between INNER JOIN and LEFT JOIN?
Show Answer
INNER JOIN returns only rows that match in both tables. LEFT JOIN returns all rows from the left table and fills NULL for non-matching right table rows. Use LEFT JOIN when you need to include rows without matches (e.g., employees without departments).
Q2. What is the difference between GROUP BY and HAVING?
Show Answer
GROUP BY groups rows together. HAVING applies conditions to grouped results. WHERE filters individual rows before grouping, while HAVING filters groups after grouping. Example: HAVING COUNT(*) > 5 returns only groups with more than 5 rows.
Q3. When does an index NOT get used in a WHERE clause?
Show Answer
(1) Functions applied to columns: WHERE YEAR(created_at) = 2025. (2) Implicit type conversions. (3) Leading wildcard: LIKE '%keyword%'. (4) OR conditions mismatching the index. (5) NOT IN, negation conditions. (6) Optimizer determines full scan is more efficient.
Q4. What is the result of comparison operations with NULL?
Show Answer
All comparison operations with NULL return NULL (UNKNOWN). NULL = NULL is NOT TRUE but NULL. Use IS NULL or IS NOT NULL to check for nulls. Aggregate functions ignore NULLs (except COUNT(*)).
Q5. What are the differences between DELETE, TRUNCATE, and DROP?
Show Answer
DELETE: Row-by-row deletion, supports WHERE, rollback possible, triggers fire. TRUNCATE: Instant full deletion, no WHERE, DDL so usually not rollback-able (PostgreSQL allows it), very fast. DROP: Deletes the table itself (structure + data + indexes).
Intermediate (6-10)
Q6. How do Window Functions differ from GROUP BY?
Show Answer
GROUP BY reduces to one row per group. Window Functions aggregate while preserving individual rows. OVER(PARTITION BY ...) defines groups, and adds group-level calculations to each row. For example, display each employee's row with their department average salary alongside.
Q7. When do you choose CTE over subqueries?
Show Answer
CTE offers better readability, is reusable within the same query, and supports recursive queries. Subqueries work for simple cases. Performance varies by DB, but in PostgreSQL 12+, CTEs are inlined by default, so performance difference is minimal.
Q8. Why does column order matter in composite indexes?
Show Answer
B-Tree composite indexes follow the Leftmost Prefix rule. An index on (A, B, C) can be used for A only, A+B, or A+B+C conditions, but NOT for B only or C only. Generally place higher cardinality columns first, and equality conditions before range conditions.
Q9. What indicators should you watch in EXPLAIN ANALYZE?
Show Answer
(1) actual rows vs rows: large gap indicates stale statistics. (2) Seq Scan on large tables: suggests index needed. (3) Sort Method: disk means insufficient work_mem. (4) Nested Loop on large data: consider Hash Join. (5) Execution Time: total execution time.
Q10. How do you choose a transaction isolation level?
Show Answer
Read Committed suits most OLTP (PostgreSQL default). Serializable for financial/payment systems requiring exactness. Repeatable Read for reports/analytics needing snapshot consistency. Higher isolation means lower concurrency, so choose based on requirements.
Advanced (11-15)
Q11. What is a Covering Index and why does it matter?
Show Answer
A Covering Index includes all columns needed by a query, enabling Index Only Scan. This avoids table access entirely, greatly reducing I/O. In PostgreSQL, use the INCLUDE clause to add extra columns to an index.
Q12. Explain how MVCC works.
Show Answer
MVCC maintains multiple versions of each row. When a transaction starts, it records a snapshot point and cannot see later changes. Writes create new versions, reads access versions matching their snapshot. This way reads and writes never block each other. Old versions are cleaned up by VACUUM.
Q13. Give use cases for Partial Index and Expression Index.
Show Answer
Partial Index indexes only frequently searched subsets. Example: active users only with WHERE status = 'active'. Smaller index size means better performance. Expression Index indexes function results. Example: LOWER(email) index for case-insensitive search optimization.
Q14. Why is JSONB better than JSON in PostgreSQL?
Show Answer
JSONB stores data in binary format eliminating parsing overhead, supports GIN indexes, and enables containment (@>), existence (?), and path (@?) operators. JSON stores as raw text with faster insertion but requires parsing on every access and has limited indexing.
Q15. What strategies exist for large table migrations?
Show Answer
(1) Online DDL: ALTER TABLE ... ADD COLUMN completes instantly in PostgreSQL when defaults exist. (2) Batch updates instead of full table UPDATE. (3) Table partitioning to reduce scope. (4) pg_repack for table restructuring. (5) Test on replicas first. Zero-downtime is key.
13. Quiz (5 Questions)
Quiz 1
How many rows does this query return?
SELECT * FROM A CROSS JOIN B;
-- A: 5 rows, B: 3 rows
A) 3 B) 5 C) 8 D) 15
Show Answer
D) 15. CROSS JOIN produces a Cartesian product: 5 x 3 = 15 rows.
Quiz 2
Among ROW_NUMBER, RANK, and DENSE_RANK, which one skips the next rank when there are ties?
A) ROW_NUMBER B) RANK C) DENSE_RANK D) All skip
Show Answer
B) RANK. RANK assigns the same rank for ties and skips the next (1, 1, 3). DENSE_RANK does not skip (1, 1, 2). ROW_NUMBER always assigns unique numbers.
Quiz 3
With composite index (dept_id, salary, name), which query CANNOT use this index?
A) WHERE dept_id = 10
B) WHERE dept_id = 10 AND salary > 50000
C) WHERE salary > 50000
D) WHERE dept_id = 10 AND salary > 50000 AND name = 'Alice'
Show Answer
C) By the Leftmost Prefix rule, the index cannot be used with just the second column (salary) without the first column (dept_id).
Quiz 4
What is the minimum isolation level that prevents Phantom Reads?
A) Read Uncommitted B) Read Committed C) Repeatable Read D) Serializable
Show Answer
D) Serializable. Per SQL standard, only Serializable prevents Phantom Reads. However, PostgreSQL's Repeatable Read also prevents phantoms internally using MVCC snapshots.
Quiz 5
What is the dept_avg value in this result?
SELECT name, salary,
AVG(salary) OVER () AS dept_avg
FROM employees;
-- employees: Alice 100, Bob 200, Charlie 300
A) Different for each row B) 200 for all rows C) NULL D) Error
Show Answer
B) OVER () aggregates over the entire dataset without partitioning. (100 + 200 + 300) / 3 = 200 for all rows.
14. References
Official Documentation
Books
- SQL Performance Explained - Markus Winand
- PostgreSQL 14 Internals - Egor Rogov
- The Art of SQL - Stephane Faroult
- High Performance MySQL - Baron Schwartz et al.
- SQL Antipatterns - Bill Karwin
Online Resources
- Use The Index, Luke - Complete indexing guide
- pgexercises.com - PostgreSQL practice
- SQLZoo - SQL basics practice
- LeetCode Database Problems - SQL interview problems
- Mode Analytics SQL Tutorial
Tools
- explain.dalibo.com - PostgreSQL EXPLAIN visualizer
- pgMustard - Execution plan analysis
- DBeaver - Universal DB client
- pgAdmin - PostgreSQL administration tool