Split View: SQL 완전 정복 2025: 개발자 면접에 나오는 SQL — JOIN부터 Window Function, 쿼리 최적화까지

SQL 완전 정복 2025: 개발자 면접에 나오는 SQL — JOIN부터 Window Function, 쿼리 최적화까지

도입: 2025년에도 SQL이 최강인 이유
1. JOIN 완전 정복
2. 집계와 그룹화 (Aggregation)
3. Window Functions
4. 서브쿼리 vs CTE vs 파생 테이블
5. 재귀 CTE (Recursive CTE)
- 5.1 조직도 트리 탐색
- 5.2 수열 생성
6. 인덱싱 (Indexing)
7. EXPLAIN ANALYZE
8. 쿼리 최적화
9. 트랜잭션 격리 수준
10. PostgreSQL 고급 기능
11. 실전 문제 30선
12. 면접 질문 15선
13. 퀴즈 5선
14. 참고 자료

도입: 2025년에도 SQL이 최강인 이유

Stack Overflow 2024 Developer Survey에서 SQL은 가장 많이 사용되는 언어 3위를 기록했고, 개발자가 가장 배우고 싶은 기술 중 상위에 올랐습니다. PostgreSQL은 2023-2024 연속 DB-Engines 올해의 DBMS로 선정되었으며, AI/LLM 시대에도 데이터를 다루는 근본 기술은 SQL입니다.

왜 SQL이 면접에서 중요한가:

백엔드 포지션 면접의 80% 이상에서 SQL 문제 출제
데이터 엔지니어, 풀스택, DevOps까지 SQL 필수
실무에서 ORM만 쓰다가 성능 문제를 만나면 결국 SQL로 해결
Window Function과 CTE는 최근 면접 트렌드의 핵심

┌──────────────────────────────────────────────┐
│          SQL이 여전히 최강인 이유 (2025)       │
├──────────────────────────────────────────────┤
│  Stack Overflow  → 사용 언어 TOP 3           │
│  PostgreSQL      → 2년 연속 올해의 DBMS      │
│  AI/LLM          → Text-to-SQL 핵심 기술     │
│  클라우드         → Aurora, AlloyDB, Neon     │
│  면접 빈도        → 백엔드 면접 80%+ 출제     │
└──────────────────────────────────────────────┘

이 글에서는 JOIN의 모든 종류부터 Window Function, 인덱싱, 쿼리 최적화, 트랜잭션까지 면접에 필요한 SQL의 모든 것을 다룹니다.

1. JOIN 완전 정복

1.1 샘플 테이블

-- 직원 테이블
CREATE TABLE employees (
    id    INT PRIMARY KEY,
    name  VARCHAR(50),
    dept_id INT
);

INSERT INTO employees VALUES
(1, 'Alice',   10),
(2, 'Bob',     20),
(3, 'Charlie', 10),
(4, 'Diana',   30),
(5, 'Eve',     NULL);

-- 부서 테이블
CREATE TABLE departments (
    id    INT PRIMARY KEY,
    name  VARCHAR(50)
);

INSERT INTO departments VALUES
(10, 'Engineering'),
(20, 'Marketing'),
(40, 'HR');

1.2 INNER JOIN

양쪽 테이블에 모두 매칭되는 행만 반환합니다.

SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
└──────────┴─────────────┘
-- Diana(dept_id=30)와 Eve(dept_id=NULL)는 매칭 없어 제외
-- HR(id=40)도 매칭 직원 없어 제외

┌─────────┐   ┌─────────┐
│Employees│   │  Depts  │
│    ┌────┼───┼────┐    │
│    │████│███│████│    │
│    │████│███│████│    │
│    └────┼───┼────┘    │
└─────────┘   └─────────┘
      INNER JOIN = 교집합

1.3 LEFT JOIN (LEFT OUTER JOIN)

왼쪽 테이블의 모든 행 + 매칭되는 오른쪽 행 (없으면 NULL).

SELECT e.name AS employee, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ Diana    │ NULL        │
│ Eve      │ NULL        │
└──────────┴─────────────┘

1.4 RIGHT JOIN (RIGHT OUTER JOIN)

오른쪽 테이블의 모든 행 + 매칭되는 왼쪽 행 (없으면 NULL).

SELECT e.name AS employee, d.name AS department
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ NULL     │ HR          │
└──────────┴─────────────┘

1.5 FULL OUTER JOIN

양쪽 테이블의 모든 행을 반환합니다. 매칭이 없으면 NULL.

SELECT e.name AS employee, d.name AS department
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ Diana    │ NULL        │
│ Eve      │ NULL        │
│ NULL     │ HR          │
└──────────┴─────────────┘

1.6 CROSS JOIN

두 테이블의 모든 조합(카테시안 곱)을 반환합니다.

SELECT e.name, d.name
FROM employees e
CROSS JOIN departments d;
-- 5명 x 3부서 = 15행

1.7 SELF JOIN

같은 테이블을 자기 자신과 조인합니다.

-- 관리자-직원 관계 조회
CREATE TABLE staff (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    manager_id INT
);

INSERT INTO staff VALUES
(1, 'CEO',     NULL),
(2, 'CTO',     1),
(3, 'Alice',   2),
(4, 'Bob',     2);

SELECT
    s.name AS employee,
    m.name AS manager
FROM staff s
LEFT JOIN staff m ON s.manager_id = m.id;

┌──────────┬─────────┐
│ employee │ manager │
├──────────┼─────────┤
│ CEO      │ NULL    │
│ CTO      │ CEO     │
│ Alice    │ CTO     │
│ Bob      │ CTO     │
└──────────┴─────────┘

1.8 JOIN 성능 최적화 팁

┌──────────────────────────────────────────────────┐
│              JOIN 최적화 체크리스트                │
├──────────────────────────────────────────────────┤
│ 1. JOIN 컬럼에 인덱스가 있는가?                   │
│ 2. 불필요한 컬럼을 SELECT하고 있지 않은가?        │
│ 3. WHERE 조건을 JOIN 전에 적용할 수 있는가?       │
│ 4. EXPLAIN으로 실행 계획을 확인했는가?             │
│ 5. 데이터 크기에 따라 JOIN 순서를 조정했는가?      │
└──────────────────────────────────────────────────┘

2. 집계와 그룹화 (Aggregation)

2.1 GROUP BY와 HAVING

-- 부서별 직원 수와 평균 급여
SELECT
    dept_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary,
    MAX(salary) AS max_salary,
    MIN(salary) AS min_salary
FROM employees
GROUP BY dept_id
HAVING COUNT(*) >= 2
ORDER BY avg_salary DESC;

핵심 규칙: SELECT에 집계 함수가 아닌 컬럼이 있으면 반드시 GROUP BY에 포함해야 합니다.

2.2 ROLLUP

-- 부서별, 전체 합계 포함
SELECT
    COALESCE(dept_name, '전체') AS department,
    COALESCE(job_title, '소계') AS job,
    COUNT(*) AS emp_count,
    SUM(salary) AS total_salary
FROM employees e
JOIN departments d ON e.dept_id = d.id
GROUP BY ROLLUP(dept_name, job_title);

┌──────────────┬──────────┬───────────┬──────────────┐
│ department   │ job      │ emp_count │ total_salary │
├──────────────┼──────────┼───────────┼──────────────┤
│ Engineering  │ Senior   │    3      │   300000     │
│ Engineering  │ Junior   │    2      │   140000     │
│ Engineering  │ 소계     │    5      │   440000     │
│ Marketing    │ Senior   │    1      │    90000     │
│ Marketing    │ Junior   │    2      │   120000     │
│ Marketing    │ 소계     │    3      │   210000     │
│ 전체         │ 소계     │    8      │   650000     │
└──────────────┴──────────┴───────────┴──────────────┘

2.3 CUBE와 GROUPING SETS

-- CUBE: 모든 조합의 소계
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY CUBE(dept_name, job_title);

-- GROUPING SETS: 원하는 조합만 선택
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY GROUPING SETS (
    (dept_name, job_title),  -- 부서+직급별
    (dept_name),              -- 부서별
    ()                        -- 전체
);

3. Window Functions

3.1 Window Function이란

집계하면서도 개별 행을 유지하는 함수입니다. GROUP BY와 달리 행을 줄이지 않습니다.

-- GROUP BY: 행 축소 (부서별 1행)
SELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id;

-- Window Function: 행 유지 (모든 직원 + 부서 평균)
SELECT
    name,
    dept_id,
    salary,
    AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;

3.2 ROW_NUMBER, RANK, DENSE_RANK

SELECT
    name,
    dept_id,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK()       OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

┌─────────┬─────────┬────────┬─────────┬──────┬────────────┐
│  name   │ dept_id │ salary │ row_num │ rank │ dense_rank │
├─────────┼─────────┼────────┼─────────┼──────┼────────────┤
│ Alice   │   10    │ 120000 │    1    │  1   │     1      │
│ Charlie │   10    │ 120000 │    2    │  1   │     1      │
│ Bob     │   20    │ 100000 │    3    │  3   │     2      │
│ Diana   │   30    │  80000 │    4    │  4   │     3      │
│ Eve     │   20    │  80000 │    5    │  4   │     3      │
└─────────┴─────────┴────────┴─────────┴──────┴────────────┘

ROW_NUMBER: 항상 고유한 번호 (동점에도 다른 번호)
RANK:       동점은 같은 번호, 다음은 건너뜀 (1, 1, 3)
DENSE_RANK: 동점은 같은 번호, 다음은 연속  (1, 1, 2)

3.3 NTILE

데이터를 N등분합니다.

-- 급여 기준 4분위로 나누기
SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

3.4 LAG와 LEAD

이전/다음 행의 값을 참조합니다.

-- 전월 대비 매출 변화
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_month
FROM monthly_sales;

┌───────┬─────────┬────────────┬────────┬────────────┐
│ month │ revenue │ prev_month │ change │ next_month │
├───────┼─────────┼────────────┼────────┼────────────┤
│ 01    │  50000  │   NULL     │  NULL  │   60000    │
│ 02    │  60000  │   50000    │  10000 │   45000    │
│ 03    │  45000  │   60000    │ -15000 │   70000    │
│ 04    │  70000  │   45000    │  25000 │   NULL     │
└───────┴─────────┴────────────┴────────┴────────────┘

3.5 Running Total (누적 합계)

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM orders;

3.6 부서별 TOP N 조회

-- 각 부서에서 급여 TOP 3
WITH ranked AS (
    SELECT
        name,
        dept_id,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY dept_id
            ORDER BY salary DESC
        ) AS rn
    FROM employees
)
SELECT name, dept_id, salary
FROM ranked
WHERE rn <= 3;

4. 서브쿼리 vs CTE vs 파생 테이블

4.1 서브쿼리 (Subquery)

-- 상관 서브쿼리: 자기 부서 평균보다 급여가 높은 직원
SELECT name, salary, dept_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.dept_id = e1.dept_id
);

-- 비상관 서브쿼리: 전체 평균보다 급여가 높은 직원
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

4.2 CTE (Common Table Expression)

-- CTE: 가독성 향상
WITH dept_avg AS (
    SELECT dept_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept_id
),
high_earners AS (
    SELECT e.name, e.salary, e.dept_id, d.avg_salary
    FROM employees e
    JOIN dept_avg d ON e.dept_id = d.dept_id
    WHERE e.salary > d.avg_salary
)
SELECT * FROM high_earners ORDER BY salary DESC;

4.3 비교

┌──────────────┬───────────────┬───────────────┬────────────────┐
│    항목       │  서브쿼리      │     CTE       │  파생 테이블    │
├──────────────┼───────────────┼───────────────┼────────────────┤
│ 가독성       │ 낮음 (중첩)    │ 높음          │ 보통           │
│ 재사용       │ 불가          │ 같은 쿼리 내   │ 불가           │
│ 재귀 지원    │ 불가          │ 가능 (WITH REC)│ 불가           │
│ 성능         │ DB마다 다름    │ DB마다 다름    │ 인라인 뷰      │
│ 디버깅       │ 어려움        │ 쉬움          │ 보통           │
└──────────────┴───────────────┴───────────────┴────────────────┘

5. 재귀 CTE (Recursive CTE)

5.1 조직도 트리 탐색

-- 조직도 테이블
CREATE TABLE org (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    manager_id INT REFERENCES org(id)
);

INSERT INTO org VALUES
(1, 'CEO',      NULL),
(2, 'VP Eng',   1),
(3, 'VP Sales', 1),
(4, 'Dir Eng',  2),
(5, 'Lead Dev', 4),
(6, 'Dev 1',    5),
(7, 'Dev 2',    5);

-- 재귀 CTE로 CEO부터 전체 조직도 탐색
WITH RECURSIVE org_tree AS (
    -- 기저 조건: CEO (manager 없음)
    SELECT id, name, manager_id, 0 AS depth, name::TEXT AS path
    FROM org
    WHERE manager_id IS NULL

    UNION ALL

    -- 재귀 조건: 부하 직원 탐색
    SELECT o.id, o.name, o.manager_id,
           t.depth + 1,
           t.path || ' > ' || o.name
    FROM org o
    JOIN org_tree t ON o.manager_id = t.id
)
SELECT
    REPEAT('  ', depth) || name AS hierarchy,
    depth,
    path
FROM org_tree
ORDER BY path;

┌────────────────────┬───────┬──────────────────────────────┐
│ hierarchy          │ depth │ path                         │
├────────────────────┼───────┼──────────────────────────────┤
│ CEO                │   0   │ CEO                          │
│   VP Eng           │   1   │ CEO > VP Eng                 │
│     Dir Eng        │   2   │ CEO > VP Eng > Dir Eng       │
│       Lead Dev     │   3   │ CEO > VP Eng > Dir Eng > ... │
│         Dev 1      │   4   │ CEO > VP Eng > Dir Eng > ... │
│         Dev 2      │   4   │ CEO > VP Eng > Dir Eng > ... │
│   VP Sales         │   1   │ CEO > VP Sales               │
└────────────────────┴───────┴──────────────────────────────┘

5.2 수열 생성

-- 1부터 10까지 수열
WITH RECURSIVE nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 10
)
SELECT n FROM nums;

-- 날짜 시리즈 생성
WITH RECURSIVE dates AS (
    SELECT DATE '2025-01-01' AS dt
    UNION ALL
    SELECT dt + INTERVAL '1 day' FROM dates WHERE dt < '2025-01-31'
)
SELECT dt FROM dates;

6. 인덱싱 (Indexing)

6.1 B-Tree 인덱스

가장 일반적인 인덱스. 등호(=), 범위 비교, 정렬, LIKE 'prefix%'에 효과적입니다.

CREATE INDEX idx_employees_name ON employees(name);

-- 효과적인 쿼리
SELECT * FROM employees WHERE name = 'Alice';          -- 등호
SELECT * FROM employees WHERE name BETWEEN 'A' AND 'M'; -- 범위
SELECT * FROM employees ORDER BY name;                  -- 정렬
SELECT * FROM employees WHERE name LIKE 'Al%';         -- 접두사

B-Tree 구조:
              [M]
             /   \
          [D,H]  [P,T]
         /  |  \  / | \
       [A-C][E-G][I-L][N-O][Q-S][U-Z]
       (리프 노드: 실제 데이터 포인터)

6.2 Hash 인덱스

등호(=) 비교에만 효과적. 범위 검색, 정렬은 불가합니다.

CREATE INDEX idx_employees_email_hash
ON employees USING HASH (email);

-- 효과적
SELECT * FROM employees WHERE email = 'alice@example.com';

-- 비효과적 (Hash로는 범위 검색 불가)
-- SELECT * FROM employees WHERE email > 'a@' AND email < 'b@';

6.3 GIN (Generalized Inverted Index)

배열, JSONB, 전문 검색에 적합합니다.

-- JSONB 필드 인덱싱
CREATE INDEX idx_products_tags ON products USING GIN (tags);

SELECT * FROM products WHERE tags @> '["electronics"]';

-- 전문 검색
CREATE INDEX idx_articles_search ON articles
USING GIN (to_tsvector('english', title || ' ' || body));

6.4 GiST (Generalized Search Tree)

지리 데이터, 범위 타입, 근접 검색에 적합합니다.

-- 지리 데이터 인덱싱
CREATE INDEX idx_locations_geom ON locations USING GiST (geom);

-- 반경 내 검색
SELECT * FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(127.0, 37.5)::geography, 1000);

6.5 BRIN (Block Range Index)

물리적으로 정렬된 대용량 테이블에 적합. 매우 작은 크기로 큰 효과를 냅니다.

-- 시계열 데이터에 최적
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);

6.6 복합 인덱스 (Composite Index)

-- 컬럼 순서가 중요!
CREATE INDEX idx_emp_dept_salary ON employees(dept_id, salary);

-- 효과적: dept_id만 사용, 또는 dept_id + salary 둘 다 사용
SELECT * FROM employees WHERE dept_id = 10;
SELECT * FROM employees WHERE dept_id = 10 AND salary > 50000;

-- 비효과적: salary만 사용 (첫 번째 컬럼 건너뜀)
-- SELECT * FROM employees WHERE salary > 50000;

복합 인덱스의 순서 규칙 (Leftmost Prefix):

인덱스: (A, B, C)

WHERE A = ?                → 사용 O
WHERE A = ? AND B = ?      → 사용 O
WHERE A = ? AND B = ? AND C = ? → 사용 O
WHERE B = ?                → 사용 X (A 없음)
WHERE B = ? AND C = ?      → 사용 X (A 없음)
WHERE A = ? AND C = ?      → A만 사용 (B 건너뜀)

7. EXPLAIN ANALYZE

7.1 실행 계획 읽기

EXPLAIN ANALYZE
SELECT e.name, d.name AS dept
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.salary > 50000;

Hash Join  (cost=1.09..2.24 rows=3 width=36) (actual time=0.05..0.06 rows=3 loops=1)
  Hash Cond: (e.dept_id = d.id)
  ->  Seq Scan on employees e  (cost=0.00..1.12 rows=3 width=22) (actual time=0.01..0.02 rows=3 loops=1)
        Filter: (salary > 50000)
        Rows Removed by Filter: 2
  ->  Hash  (cost=1.03..1.03 rows=3 width=18) (actual time=0.01..0.01 rows=3 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on departments d  (cost=0.00..1.03 rows=3 width=18) (actual time=0.00..0.00 rows=3 loops=1)
Planning Time: 0.15 ms
Execution Time: 0.09 ms

7.2 주요 스캔 타입

┌──────────────────┬──────────────────────────────────────┐
│ 스캔 타입         │ 설명                                  │
├──────────────────┼──────────────────────────────────────┤
│ Seq Scan         │ 전체 테이블 순차 스캔 (풀 스캔)        │
│ Index Scan       │ 인덱스로 조건 행 찾고 테이블 접근       │
│ Index Only Scan  │ 인덱스만으로 결과 반환 (테이블 안 봄)   │
│ Bitmap Index Scan│ 인덱스로 비트맵 만들고 테이블 접근      │
│ Bitmap Heap Scan │ 비트맵 기반 테이블 접근                │
└──────────────────┴──────────────────────────────────────┘

7.3 비용 읽기

cost=시작비용..총비용  rows=예상행수  width=행크기(bytes)

- 시작 비용: 첫 번째 행을 반환하기까지의 비용
- 총 비용: 모든 행을 반환하는 데 드는 총 비용
- actual time: 실제 실행 시간 (ms)
- rows: 실제 반환 행 수
- loops: 해당 노드 반복 실행 횟수

7.4 느린 쿼리 진단 체크리스트

┌──────────────────────────────────────────────────┐
│          쿼리 최적화 진단 플로우                    │
├──────────────────────────────────────────────────┤
│ 1. EXPLAIN ANALYZE 실행                           │
│    ↓                                              │
│ 2. Seq Scan이 대형 테이블에 있는가?               │
│    → 인덱스 추가 고려                              │
│    ↓                                              │
│ 3. 예상 rows와 실제 rows 차이가 큰가?             │
│    → ANALYZE 실행하여 통계 갱신                     │
│    ↓                                              │
│ 4. Nested Loop이 대량 데이터에 사용되는가?         │
│    → Hash Join이나 Merge Join이 더 적합할 수 있음  │
│    ↓                                              │
│ 5. Sort가 디스크에서 실행되는가?                   │
│    → work_mem 증가 또는 인덱스 추가                 │
└──────────────────────────────────────────────────┘

8. 쿼리 최적화

8.1 N+1 문제 in SQL

-- 나쁜 예: 각 주문마다 별도 쿼리 (애플리케이션에서)
-- 1. SELECT * FROM orders WHERE user_id = 1;
-- 2. SELECT * FROM order_items WHERE order_id = 101;
-- 3. SELECT * FROM order_items WHERE order_id = 102;
-- ... (N개 주문이면 N+1번 쿼리)

-- 좋은 예: JOIN으로 한 번에
SELECT o.id, o.total, oi.product_name, oi.quantity
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.user_id = 1;

8.2 Covering Index (커버링 인덱스)

인덱스만으로 쿼리를 완료하여 테이블 접근을 피합니다.

-- 인덱스 생성
CREATE INDEX idx_emp_dept_name_salary
ON employees(dept_id, name, salary);

-- Index Only Scan 가능 (테이블 접근 불필요)
SELECT name, salary FROM employees WHERE dept_id = 10;

8.3 Partial Index (부분 인덱스)

조건을 만족하는 행만 인덱싱합니다.

-- 활성 사용자만 인덱싱 (대부분의 쿼리가 활성 사용자 대상)
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';

-- 효과적으로 사용됨
SELECT * FROM users WHERE status = 'active' AND email = 'alice@example.com';

8.4 Expression Index (표현식 인덱스)

-- 대소문자 무시 검색을 위한 인덱스
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));

SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- JSON 필드 인덱싱
CREATE INDEX idx_data_category
ON products((data->>'category'));

8.5 최적화 팁 모음

-- 1. EXISTS vs IN: 대량 데이터에서는 EXISTS가 보통 빠름
-- 느린
SELECT * FROM orders WHERE customer_id IN (
    SELECT id FROM customers WHERE country = 'KR'
);
-- 빠른
SELECT * FROM orders o WHERE EXISTS (
    SELECT 1 FROM customers c
    WHERE c.id = o.customer_id AND c.country = 'KR'
);

-- 2. LIMIT과 함께 사용하면 불필요한 스캔 방지
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;

-- 3. UNION ALL vs UNION: 중복 제거 불필요 시 ALL 사용
SELECT name FROM employees_2024
UNION ALL
SELECT name FROM employees_2025;

-- 4. 불필요한 ORDER BY 제거 (서브쿼리 내에서)
-- 5. SELECT *보다 필요한 컬럼만 명시

9. 트랜잭션 격리 수준

9.1 4가지 격리 수준

┌──────────────────┬───────────┬──────────────┬──────────────┬─────────────┐
│ 격리 수준         │Dirty Read │Non-Repeatable│Phantom Read  │ 성능        │
│                  │           │    Read      │              │             │
├──────────────────┼───────────┼──────────────┼──────────────┼─────────────┤
│ Read Uncommitted │ 가능      │ 가능         │ 가능          │ 가장 빠름   │
│ Read Committed   │ 방지      │ 가능         │ 가능          │ 빠름        │
│ Repeatable Read  │ 방지      │ 방지         │ 가능          │ 보통        │
│ Serializable     │ 방지      │ 방지         │ 방지          │ 가장 느림   │
└──────────────────┴───────────┴──────────────┴──────────────┴─────────────┘

PostgreSQL 기본값: Read Committed
MySQL InnoDB 기본값: Repeatable Read

9.2 이상 현상 설명

Dirty Read: 커밋되지 않은 데이터를 읽음

-- Transaction A
UPDATE accounts SET balance = 0 WHERE id = 1;
-- (아직 COMMIT 안 함)

-- Transaction B (Read Uncommitted)
SELECT balance FROM accounts WHERE id = 1;
-- balance = 0 (커밋 안 된 값!)

-- Transaction A
ROLLBACK;
-- 실제로는 balance가 원래 값으로 복원됨
-- Transaction B는 잘못된 데이터를 기반으로 결정을 내릴 수 있음

Non-Repeatable Read: 같은 쿼리를 두 번 실행했을 때 결과가 다름

-- Transaction A
SELECT balance FROM accounts WHERE id = 1;  -- 1000

-- Transaction B
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;

-- Transaction A (같은 트랜잭션 내에서)
SELECT balance FROM accounts WHERE id = 1;  -- 500 (결과가 다름!)

Phantom Read: 같은 조건 쿼리에서 행의 수가 달라짐

-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- 5

-- Transaction B
INSERT INTO orders (status) VALUES ('pending');
COMMIT;

-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- 6 (유령 행!)

9.3 MVCC (Multi-Version Concurrency Control)

PostgreSQL은 MVCC를 사용하여 읽기와 쓰기가 서로 블로킹하지 않습니다.

┌──────────────────────────────────────────────────┐
│                    MVCC 원리                      │
├──────────────────────────────────────────────────┤
│                                                  │
│  Transaction 100 (Read):                         │
│  "나는 xmin 100 이하의 버전만 본다"               │
│                                                  │
│  Transaction 101 (Write):                        │
│  "새 버전을 xmin=101로 생성, 기존 행에 xmax=101" │
│                                                  │
│  → Transaction 100은 여전히 이전 버전을 봄        │
│  → 읽기와 쓰기가 서로 블로킹하지 않음             │
│                                                  │
└──────────────────────────────────────────────────┘

10. PostgreSQL 고급 기능

10.1 JSONB 쿼리

-- JSONB 데이터 저장
CREATE TABLE products (
    id   SERIAL PRIMARY KEY,
    name TEXT,
    data JSONB
);

INSERT INTO products (name, data) VALUES
('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "cpu": "i7"}, "tags": ["electronics", "computer"]}'),
('Phone',  '{"brand": "Apple", "specs": {"ram": 8, "cpu": "A17"}, "tags": ["electronics", "mobile"]}');

-- JSONB 쿼리
SELECT name, data->>'brand' AS brand
FROM products
WHERE data->'specs'->>'ram' = '16';

-- 배열 포함 검색
SELECT * FROM products WHERE data->'tags' @> '"mobile"';

-- JSONB 경로 쿼리 (PostgreSQL 12+)
SELECT * FROM products
WHERE data @? '$.specs ? (@.ram > 10)';

10.2 배열 연산자

-- 배열 타입 사용
CREATE TABLE articles (
    id   SERIAL PRIMARY KEY,
    title TEXT,
    tags  TEXT[]
);

INSERT INTO articles (title, tags) VALUES
('SQL Guide', ARRAY['sql', 'database', 'backend']),
('React Tips', ARRAY['react', 'frontend', 'javascript']);

-- 배열 포함 검색
SELECT * FROM articles WHERE tags @> ARRAY['sql'];
SELECT * FROM articles WHERE 'react' = ANY(tags);
SELECT * FROM articles WHERE tags && ARRAY['sql', 'react'];  -- 겹침 확인

10.3 LATERAL JOIN

-- 각 부서의 최근 주문 3개씩 조회
SELECT d.name, recent.*
FROM departments d
CROSS JOIN LATERAL (
    SELECT o.id, o.amount, o.created_at
    FROM orders o
    WHERE o.dept_id = d.id
    ORDER BY o.created_at DESC
    LIMIT 3
) recent;

10.4 Materialized View

-- 복잡한 집계 결과를 미리 계산해 저장
CREATE MATERIALIZED VIEW monthly_sales_summary AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    dept_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
FROM orders
GROUP BY DATE_TRUNC('month', order_date), dept_id;

-- 인덱스 추가
CREATE INDEX idx_monthly_sales ON monthly_sales_summary(month, dept_id);

-- 데이터 갱신
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_summary;

11. 실전 문제 30선

Easy (1-10)

문제 1. 모든 직원의 이름과 소속 부서명을 조회하세요. 부서가 없는 직원도 포함합니다.

정답 보기

SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;

문제 2. 각 부서별 직원 수를 구하세요.

정답 보기

SELECT d.name, COUNT(e.id) AS emp_count
FROM departments d
LEFT JOIN employees e ON d.id = e.dept_id
GROUP BY d.name
ORDER BY emp_count DESC;

문제 3. 급여가 전체 평균보다 높은 직원을 조회하세요.

정답 보기

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

문제 4. 중복된 이메일 주소를 가진 사용자를 찾으세요.

정답 보기

SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

문제 5. 주문이 한 건도 없는 고객을 찾으세요.

정답 보기

-- LEFT JOIN 방법
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;

-- NOT EXISTS 방법
SELECT c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

문제 6. 가장 최근 주문 5건을 조회하세요.

정답 보기

SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 5;

문제 7. 부서별 평균 급여를 구하되, 평균 급여가 60000 이상인 부서만 표시하세요.

정답 보기

SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) >= 60000;

문제 8. 두 번째로 높은 급여를 구하세요.

정답 보기

SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- 또는
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

문제 9. NULL 값을 기본값으로 대체하여 조회하세요.

정답 보기

SELECT name, COALESCE(phone, 'N/A') AS phone
FROM employees;

문제 10. 문자열 패턴으로 검색: 이름이 'A'로 시작하는 직원을 찾으세요.

정답 보기

SELECT * FROM employees WHERE name LIKE 'A%';
-- 또는 (대소문자 무시)
SELECT * FROM employees WHERE name ILIKE 'a%';

Medium (11-25)

문제 11. 각 부서에서 급여가 가장 높은 직원을 구하세요.

정답 보기

WITH ranked AS (
    SELECT name, dept_id, salary,
           RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rk
    FROM employees
)
SELECT name, dept_id, salary FROM ranked WHERE rk = 1;

문제 12. 월별 매출과 전월 대비 증감율을 구하세요.

정답 보기

SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))::NUMERIC
        / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
        2
    ) AS growth_pct
FROM monthly_sales;

문제 13. 연속으로 3일 이상 로그인한 사용자를 찾으세요.

정답 보기

WITH login_groups AS (
    SELECT
        user_id,
        login_date,
        login_date - ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY login_date
        )::INT * INTERVAL '1 day' AS grp
    FROM logins
)
SELECT user_id, COUNT(*) AS consecutive_days, MIN(login_date), MAX(login_date)
FROM login_groups
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;

문제 14. 누적 합계 (Running Total)을 구하세요.

정답 보기

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

문제 15. 피벗 테이블: 부서별/분기별 매출을 행에서 열로 변환하세요.

정답 보기

SELECT
    dept_name,
    SUM(CASE WHEN quarter = 1 THEN revenue ELSE 0 END) AS q1,
    SUM(CASE WHEN quarter = 2 THEN revenue ELSE 0 END) AS q2,
    SUM(CASE WHEN quarter = 3 THEN revenue ELSE 0 END) AS q3,
    SUM(CASE WHEN quarter = 4 THEN revenue ELSE 0 END) AS q4
FROM quarterly_sales
GROUP BY dept_name;

문제 16. 재귀 CTE로 1부터 100까지의 피보나치 수열을 생성하세요.

정답 보기

WITH RECURSIVE fib AS (
    SELECT 1 AS n, 0::BIGINT AS a, 1::BIGINT AS b
    UNION ALL
    SELECT n + 1, b, a + b FROM fib WHERE n < 20
)
SELECT n, a AS fibonacci FROM fib;

문제 17. 자기 부서 내에서 급여 순위를 매기세요.

정답 보기

SELECT
    name,
    dept_id,
    salary,
    DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;

문제 18. 가장 많이 팔린 상품 TOP 5와 해당 카테고리를 구하세요.

정답 보기

SELECT p.name, p.category, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.name, p.category
ORDER BY total_sold DESC
LIMIT 5;

문제 19. LATERAL JOIN으로 각 카테고리의 최신 상품 3개를 구하세요.

정답 보기

SELECT c.name AS category, p.*
FROM categories c
CROSS JOIN LATERAL (
    SELECT id, name, price, created_at
    FROM products
    WHERE category_id = c.id
    ORDER BY created_at DESC
    LIMIT 3
) p;

문제 20. 이동 평균 (3개월 이동 평균)을 구하세요.

정답 보기

SELECT
    month,
    revenue,
    AVG(revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3m
FROM monthly_sales;

문제 21. JSONB 필드에서 특정 조건으로 필터링하세요.

정답 보기

SELECT id, name, data->>'brand' AS brand
FROM products
WHERE (data->'specs'->>'ram')::INT >= 16
  AND data->'tags' @> '"electronics"';

문제 22. 갭 분석: 연속된 ID에서 빠진 번호를 찾으세요.

정답 보기

WITH id_range AS (
    SELECT generate_series(MIN(id), MAX(id)) AS expected_id
    FROM orders
)
SELECT expected_id AS missing_id
FROM id_range
WHERE expected_id NOT IN (SELECT id FROM orders);

문제 23. 중복 행 제거 (가장 최신만 유지).

정답 보기

DELETE FROM users
WHERE id NOT IN (
    SELECT DISTINCT ON (email) id
    FROM users
    ORDER BY email, created_at DESC
);

문제 24. GROUP BY없이 비율 계산.

정답 보기

SELECT
    status,
    COUNT(*) AS cnt,
    ROUND(
        COUNT(*)::NUMERIC / SUM(COUNT(*)) OVER () * 100, 2
    ) AS percentage
FROM orders
GROUP BY status;

문제 25. 누적 고유 사용자 수 구하기.

정답 보기

WITH daily_new AS (
    SELECT
        user_id,
        MIN(login_date) AS first_date
    FROM logins
    GROUP BY user_id
)
SELECT
    login_date,
    COUNT(*) AS new_users,
    SUM(COUNT(*)) OVER (ORDER BY login_date) AS cumulative_users
FROM daily_new
GROUP BY login_date
ORDER BY login_date;

Hard (26-30)

문제 26. 최장 연속 증가 매출 기간을 구하세요.

정답 보기

WITH changes AS (
    SELECT
        month,
        revenue,
        CASE
            WHEN revenue > LAG(revenue) OVER (ORDER BY month)
            THEN 0 ELSE 1
        END AS is_break
    FROM monthly_sales
),
groups AS (
    SELECT *,
        SUM(is_break) OVER (ORDER BY month) AS grp
    FROM changes
)
SELECT
    grp,
    MIN(month) AS start_month,
    MAX(month) AS end_month,
    COUNT(*) AS streak_length
FROM groups
GROUP BY grp
ORDER BY streak_length DESC
LIMIT 1;

문제 27. 재귀 CTE로 BOM(Bill of Materials) 전개하기.

정답 보기

WITH RECURSIVE bom AS (
    -- 최상위 제품
    SELECT
        parent_id, child_id, quantity, 1 AS level,
        quantity AS total_qty
    FROM bill_of_materials
    WHERE parent_id = 'PRODUCT_A'

    UNION ALL

    -- 하위 부품 전개
    SELECT
        b.parent_id, b.child_id, b.quantity,
        bom.level + 1,
        bom.total_qty * b.quantity
    FROM bill_of_materials b
    JOIN bom ON b.parent_id = bom.child_id
    WHERE bom.level < 10  -- 무한 재귀 방지
)
SELECT child_id AS component, SUM(total_qty) AS total_needed
FROM bom
GROUP BY child_id
ORDER BY total_needed DESC;

문제 28. 시간 간격 겹침 감지 (스케줄링 충돌).

정답 보기

SELECT
    a.id AS meeting_a,
    b.id AS meeting_b,
    a.start_time,
    a.end_time,
    b.start_time AS conflict_start,
    b.end_time AS conflict_end
FROM meetings a
JOIN meetings b ON a.id < b.id
    AND a.room_id = b.room_id
    AND a.start_time < b.end_time
    AND a.end_time > b.start_time;

문제 29. 메디안 (중앙값) 구하기 (집계 함수 없이).

정답 보기

-- PostgreSQL에는 percentile_cont가 있지만, 순수 SQL로:
WITH ordered AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary) AS rn,
           COUNT(*) OVER () AS total
    FROM employees
)
SELECT AVG(salary) AS median
FROM ordered
WHERE rn IN (
    FLOOR((total + 1) / 2.0),
    CEIL((total + 1) / 2.0)
);

-- PostgreSQL 전용:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;

문제 30. 그래프 최단 경로 (재귀 CTE).

정답 보기

CREATE TABLE edges (
    src  VARCHAR(10),
    dst  VARCHAR(10),
    cost INT
);

WITH RECURSIVE paths AS (
    SELECT src, dst, cost, ARRAY[src, dst] AS path, 1 AS hops
    FROM edges
    WHERE src = 'A'

    UNION ALL

    SELECT p.src, e.dst, p.cost + e.cost,
           p.path || e.dst,
           p.hops + 1
    FROM paths p
    JOIN edges e ON p.dst = e.src
    WHERE NOT e.dst = ANY(p.path)  -- 사이클 방지
      AND p.hops < 10             -- 최대 깊이 제한
)
SELECT dst, MIN(cost) AS min_cost, path
FROM paths
WHERE dst = 'Z'
GROUP BY dst, path
ORDER BY min_cost
LIMIT 1;

12. 면접 질문 15선

기초 (1-5)

Q1. INNER JOIN과 LEFT JOIN의 차이점은?

정답 보기

INNER JOIN은 양쪽 테이블에 모두 매칭되는 행만 반환합니다. LEFT JOIN은 왼쪽 테이블의 모든 행을 반환하고, 오른쪽 테이블에 매칭이 없으면 NULL로 채웁니다. 부서가 없는 직원도 조회하려면 LEFT JOIN이 필요합니다.

Q2. GROUP BY와 HAVING의 차이는?

정답 보기

GROUP BY는 행을 그룹으로 묶습니다. HAVING은 그룹화된 결과에 조건을 적용합니다. WHERE는 그룹화 전에 개별 행을 필터링하고, HAVING은 그룹화 후에 그룹을 필터링합니다. 예: HAVING COUNT(*) > 5는 5건 초과 그룹만 반환합니다.

Q3. WHERE절에서 인덱스가 사용되지 않는 경우는?

정답 보기

(1) 컬럼에 함수를 적용한 경우: WHERE YEAR(created_at) = 2025. (2) 암시적 타입 변환. (3) LIKE '%keyword%'처럼 앞에 와일드카드. (4) OR 조건이 인덱스와 맞지 않는 경우. (5) NOT IN, 부정 조건. (6) 옵티마이저가 풀 스캔이 더 효율적이라 판단한 경우.

Q4. NULL에 대한 비교 연산의 결과는?

정답 보기

NULL과의 모든 비교 연산 결과는 NULL(UNKNOWN)입니다. NULL = NULL은 TRUE가 아닌 NULL입니다. NULL 확인에는 IS NULL 또는 IS NOT NULL을 사용해야 합니다. 집계 함수에서 NULL은 무시됩니다(COUNT(*)만 예외).

Q5. DELETE, TRUNCATE, DROP의 차이는?

정답 보기

DELETE: 행 단위 삭제, WHERE 조건 가능, 롤백 가능, 트리거 실행. TRUNCATE: 전체 행 즉시 삭제, 조건 불가, DDL이라 보통 롤백 불가(PostgreSQL은 가능), 매우 빠름. DROP: 테이블 자체를 삭제(구조+데이터+인덱스 모두).

중급 (6-10)

Q6. Window Function이 GROUP BY와 다른 점은?

정답 보기

GROUP BY는 그룹당 하나의 행으로 축소합니다. Window Function은 집계하면서도 개별 행을 유지합니다. OVER(PARTITION BY ...)로 그룹을 정의하고, 각 행에 그룹 수준의 계산 결과를 추가합니다. 예를 들어 각 직원 행에 부서 평균 급여를 함께 표시할 수 있습니다.

Q7. CTE와 서브쿼리 중 어떤 것을 선택하나요?

정답 보기

CTE는 가독성이 좋고, 같은 쿼리 내에서 재사용 가능하며, 재귀 쿼리를 지원합니다. 서브쿼리는 단순한 경우에 사용합니다. 성능은 DB에 따라 다르지만, PostgreSQL 12+에서 CTE는 기본적으로 인라인화되어 성능 차이가 거의 없습니다.

Q8. 복합 인덱스의 컬럼 순서가 중요한 이유는?

정답 보기

B-Tree 복합 인덱스는 Leftmost Prefix 규칙을 따릅니다. (A, B, C) 인덱스는 A만, A+B, A+B+C 조건에서 사용 가능하지만, B만이나 C만으로는 사용할 수 없습니다. 일반적으로 선택도(cardinality)가 높은 컬럼을 먼저, 등호 조건을 범위 조건보다 앞에 배치합니다.

Q9. EXPLAIN ANALYZE에서 주의해야 할 지표는?

정답 보기

(1) actual rows vs rows: 예상치와 실제치의 큰 차이는 통계 부정확. (2) Seq Scan on 대형 테이블: 인덱스 필요 시사. (3) Sort Method: disk = work_mem 부족. (4) Nested Loop with 대량 데이터: Hash Join 고려. (5) Execution Time: 전체 실행 시간 확인.

Q10. 트랜잭션 격리 수준을 선택하는 기준은?

정답 보기

대부분의 OLTP에서 Read Committed가 적합합니다(PostgreSQL 기본값). 재무/결제 등 정확성이 중요한 경우 Serializable. 보고서/분석처럼 스냅샷 일관성이 필요하면 Repeatable Read. 격리 수준이 높을수록 동시성이 낮아지므로 요구사항에 맞게 선택합니다.

심화 (11-15)

Q11. Covering Index란 무엇이며 왜 중요한가요?

정답 보기

Covering Index는 쿼리에 필요한 모든 컬럼이 인덱스에 포함된 경우로, Index Only Scan이 가능합니다. 테이블 본체에 접근하지 않아 I/O가 크게 감소합니다. PostgreSQL에서는 INCLUDE 절로 인덱스에 추가 컬럼을 포함할 수 있습니다.

Q12. MVCC의 동작 원리를 설명하세요.

정답 보기

MVCC는 각 행의 여러 버전을 유지합니다. 트랜잭션이 시작되면 스냅샷 시점을 기록하고, 그 이후의 변경은 보이지 않습니다. 쓰기는 새 버전을 생성하고, 읽기는 자신의 스냅샷에 맞는 버전을 읽습니다. 이로써 읽기와 쓰기가 서로 블로킹하지 않습니다. 오래된 버전은 VACUUM으로 정리합니다.

Q13. Partial Index와 Expression Index의 활용 사례는?

정답 보기

Partial Index는 자주 검색하는 조건의 부분집합만 인덱싱합니다. 예: 활성 사용자만 WHERE status = 'active'. 인덱스 크기가 작아 성능이 좋습니다. Expression Index는 함수 결과를 인덱싱합니다. 예: LOWER(email) 인덱스로 대소문자 무시 검색 최적화.

Q14. PostgreSQL에서 JSONB가 JSON보다 유리한 이유는?

정답 보기

JSONB는 바이너리 형태로 저장되어 파싱 오버헤드가 없고, GIN 인덱스를 지원하며, 포함(@>), 존재(?), 경로(@?) 연산자 사용이 가능합니다. JSON은 텍스트 그대로 저장하여 삽입이 빠르지만, 매번 파싱이 필요하고 인덱싱이 제한됩니다.

Q15. 대규모 테이블의 마이그레이션 전략은?

정답 보기

(1) 온라인 DDL: ALTER TABLE ... ADD COLUMN은 PostgreSQL에서 기본값이 있으면 즉시 완료. (2) 배치 업데이트: 전체 UPDATE 대신 배치로 나누어 실행. (3) 테이블 파티셔닝으로 작업 범위 축소. (4) pg_repack으로 테이블 재구성. (5) 레플리카에서 먼저 테스트. 무중단 운영이 핵심입니다.

13. 퀴즈 5선

퀴즈 1

다음 쿼리의 결과 행 수는?

SELECT * FROM A CROSS JOIN B;
-- A: 5행, B: 3행

A) 3 B) 5 C) 8 D) 15

정답 보기

D) 15. CROSS JOIN은 카테시안 곱으로, 5 x 3 = 15행을 반환합니다.

퀴즈 2

ROW_NUMBER, RANK, DENSE_RANK에서 동점(동일한 salary)이 있을 때, 다음 순위가 건너뛰어지는 것은?

A) ROW_NUMBER B) RANK C) DENSE_RANK D) 모두 건너뜀

정답 보기

B) RANK. RANK는 동점에 같은 순위를 부여하고, 다음 순위를 건너뜁니다(1, 1, 3). DENSE_RANK는 건너뛰지 않습니다(1, 1, 2). ROW_NUMBER는 항상 고유한 번호를 부여합니다.

퀴즈 3

복합 인덱스 (dept_id, salary, name)이 있을 때, 다음 중 이 인덱스를 사용할 수 없는 쿼리는?

A) WHERE dept_id = 10 B) WHERE dept_id = 10 AND salary > 50000 C) WHERE salary > 50000 D) WHERE dept_id = 10 AND salary > 50000 AND name = 'Alice'

정답 보기

C) Leftmost Prefix 규칙에 의해, 첫 번째 컬럼(dept_id) 없이 두 번째 컬럼(salary)만으로는 인덱스를 사용할 수 없습니다.

퀴즈 4

트랜잭션 격리 수준에서 Phantom Read를 방지하는 최소 수준은?

A) Read Uncommitted B) Read Committed C) Repeatable Read D) Serializable

정답 보기

D) Serializable. 표준 SQL에서는 Serializable만 Phantom Read를 방지합니다. 단, PostgreSQL의 Repeatable Read는 내부적으로 MVCC 스냅샷을 사용하여 Phantom Read도 방지합니다.

퀴즈 5

다음 결과에서 dept_avg 값은?

SELECT name, salary,
       AVG(salary) OVER () AS dept_avg
FROM employees;
-- employees: Alice 100, Bob 200, Charlie 300

A) 각 행마다 다른 값 B) 모든 행에서 200 C) NULL D) 에러

정답 보기

B) OVER ()는 파티션 없이 전체 데이터에 대한 집계입니다. (100 + 200 + 300) / 3 = 200이 모든 행에 표시됩니다.

14. 참고 자료

공식 문서

서적

SQL Performance Explained - Markus Winand
PostgreSQL 14 Internals - Egor Rogov
The Art of SQL - Stephane Faroult
High Performance MySQL - Baron Schwartz et al.
SQL Antipatterns - Bill Karwin

온라인 자료

Use The Index, Luke - 인덱스 완벽 가이드
pgexercises.com - PostgreSQL 실습
SQLZoo - SQL 기초 실습
LeetCode Database Problems - SQL 면접 문제
Mode Analytics SQL Tutorial

도구

explain.dalibo.com - PostgreSQL EXPLAIN 시각화
pgMustard - 실행 계획 분석 도구
DBeaver - 범용 DB 클라이언트
pgAdmin - PostgreSQL 전용 관리 도구

블로그

SQL Mastery 2025: Developer Interview SQL — From JOINs to Window Functions and Query Optimization

Introduction: Why SQL Is Still King in 2025
1. JOIN Mastery
2. Aggregation
3. Window Functions
4. Subquery vs CTE vs Derived Table
5. Recursive CTE
- 5.1 Org Chart Tree Traversal
- 5.2 Sequence Generation
6. Indexing
7. EXPLAIN ANALYZE
8. Query Optimization
9. Transaction Isolation Levels
10. PostgreSQL-Specific Features
11. Practice Problems (30)
12. Interview Questions (15)
13. Quiz (5 Questions)
14. References

Introduction: Why SQL Is Still King in 2025

In the Stack Overflow 2024 Developer Survey, SQL ranked among the top 3 most-used languages, and it remains one of the most sought-after skills. PostgreSQL was named DB-Engines DBMS of the Year for two consecutive years (2023-2024), and even in the AI/LLM era, the fundamental skill for working with data is SQL.

Why SQL matters in interviews:

Over 80% of backend position interviews include SQL questions
Essential for data engineers, full-stack, and DevOps roles
When ORM-only code hits performance walls, SQL is the solution
Window Functions and CTEs are core topics in modern interviews

┌──────────────────────────────────────────────────┐
│          Why SQL Is Still King (2025)             │
├──────────────────────────────────────────────────┤
│  Stack Overflow  → Top 3 most used language      │
│  PostgreSQL      → DBMS of the Year 2 years      │
│  AI/LLM          → Text-to-SQL core technology   │
│  Cloud            → Aurora, AlloyDB, Neon         │
│  Interview freq   → 80%+ of backend interviews   │
└──────────────────────────────────────────────────┘

This article covers everything you need for SQL interviews: all JOIN types, Window Functions, indexing, query optimization, and transactions.

1. JOIN Mastery

1.1 Sample Tables

-- Employees table
CREATE TABLE employees (
    id    INT PRIMARY KEY,
    name  VARCHAR(50),
    dept_id INT
);

INSERT INTO employees VALUES
(1, 'Alice',   10),
(2, 'Bob',     20),
(3, 'Charlie', 10),
(4, 'Diana',   30),
(5, 'Eve',     NULL);

-- Departments table
CREATE TABLE departments (
    id    INT PRIMARY KEY,
    name  VARCHAR(50)
);

INSERT INTO departments VALUES
(10, 'Engineering'),
(20, 'Marketing'),
(40, 'HR');

1.2 INNER JOIN

Returns only rows that match in both tables.

SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
└──────────┴─────────────┘
-- Diana (dept_id=30) and Eve (dept_id=NULL) excluded - no match
-- HR (id=40) excluded - no matching employees

┌─────────┐   ┌─────────┐
│Employees│   │  Depts  │
│    ┌────┼───┼────┐    │
│    │████│███│████│    │
│    │████│███│████│    │
│    └────┼───┼────┘    │
└─────────┘   └─────────┘
      INNER JOIN = Intersection

1.3 LEFT JOIN (LEFT OUTER JOIN)

All rows from the left table + matching right rows (NULL if no match).

SELECT e.name AS employee, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ Diana    │ NULL        │
│ Eve      │ NULL        │
└──────────┴─────────────┘

1.4 RIGHT JOIN (RIGHT OUTER JOIN)

All rows from the right table + matching left rows (NULL if no match).

SELECT e.name AS employee, d.name AS department
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ NULL     │ HR          │
└──────────┴─────────────┘

1.5 FULL OUTER JOIN

Returns all rows from both tables. NULL where no match exists.

SELECT e.name AS employee, d.name AS department
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.id;

┌──────────┬─────────────┐
│ employee │ department  │
├──────────┼─────────────┤
│ Alice    │ Engineering │
│ Bob      │ Marketing   │
│ Charlie  │ Engineering │
│ Diana    │ NULL        │
│ Eve      │ NULL        │
│ NULL     │ HR          │
└──────────┴─────────────┘

1.6 CROSS JOIN

Returns all combinations (Cartesian product) of two tables.

SELECT e.name, d.name
FROM employees e
CROSS JOIN departments d;
-- 5 employees x 3 departments = 15 rows

1.7 SELF JOIN

Joining a table with itself.

-- Manager-employee relationship
CREATE TABLE staff (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    manager_id INT
);

INSERT INTO staff VALUES
(1, 'CEO',     NULL),
(2, 'CTO',     1),
(3, 'Alice',   2),
(4, 'Bob',     2);

SELECT
    s.name AS employee,
    m.name AS manager
FROM staff s
LEFT JOIN staff m ON s.manager_id = m.id;

┌──────────┬─────────┐
│ employee │ manager │
├──────────┼─────────┤
│ CEO      │ NULL    │
│ CTO      │ CEO     │
│ Alice    │ CTO     │
│ Bob      │ CTO     │
└──────────┴─────────┘

2. Aggregation

2.1 GROUP BY and HAVING

-- Employee count and average salary by department
SELECT
    dept_id,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary,
    MAX(salary) AS max_salary,
    MIN(salary) AS min_salary
FROM employees
GROUP BY dept_id
HAVING COUNT(*) >= 2
ORDER BY avg_salary DESC;

Key rule: Any non-aggregate column in SELECT must appear in GROUP BY.

2.2 ROLLUP

-- Department totals with grand total
SELECT
    COALESCE(dept_name, 'Grand Total') AS department,
    COALESCE(job_title, 'Subtotal') AS job,
    COUNT(*) AS emp_count,
    SUM(salary) AS total_salary
FROM employees e
JOIN departments d ON e.dept_id = d.id
GROUP BY ROLLUP(dept_name, job_title);

2.3 CUBE and GROUPING SETS

-- CUBE: subtotals for all combinations
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY CUBE(dept_name, job_title);

-- GROUPING SETS: only specific combinations
SELECT dept_name, job_title, SUM(salary)
FROM employees e JOIN departments d ON e.dept_id = d.id
GROUP BY GROUPING SETS (
    (dept_name, job_title),  -- by dept + job
    (dept_name),              -- by dept
    ()                        -- grand total
);

3. Window Functions

3.1 What Are Window Functions

Functions that aggregate while preserving individual rows. Unlike GROUP BY, they do not reduce rows.

-- GROUP BY: reduces rows (1 row per dept)
SELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id;

-- Window Function: preserves rows (all employees + dept average)
SELECT
    name,
    dept_id,
    salary,
    AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
FROM employees;

3.2 ROW_NUMBER, RANK, DENSE_RANK

SELECT
    name,
    dept_id,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK()       OVER (ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

┌─────────┬─────────┬────────┬─────────┬──────┬────────────┐
│  name   │ dept_id │ salary │ row_num │ rank │ dense_rank │
├─────────┼─────────┼────────┼─────────┼──────┼────────────┤
│ Alice   │   10    │ 120000 │    1    │  1   │     1      │
│ Charlie │   10    │ 120000 │    2    │  1   │     1      │
│ Bob     │   20    │ 100000 │    3    │  3   │     2      │
│ Diana   │   30    │  80000 │    4    │  4   │     3      │
│ Eve     │   20    │  80000 │    5    │  4   │     3      │
└─────────┴─────────┴────────┴─────────┴──────┴────────────┘

ROW_NUMBER: Always unique numbers (different even for ties)
RANK:       Same number for ties, gaps after (1, 1, 3)
DENSE_RANK: Same number for ties, no gaps   (1, 1, 2)

3.3 NTILE

Divides data into N equal groups.

-- Divide into salary quartiles
SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

3.4 LAG and LEAD

Reference values from previous/next rows.

-- Month-over-month revenue changes
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_month
FROM monthly_sales;

┌───────┬─────────┬────────────┬────────┬────────────┐
│ month │ revenue │ prev_month │ change │ next_month │
├───────┼─────────┼────────────┼────────┼────────────┤
│ 01    │  50000  │   NULL     │  NULL  │   60000    │
│ 02    │  60000  │   50000    │  10000 │   45000    │
│ 03    │  45000  │   60000    │ -15000 │   70000    │
│ 04    │  70000  │   45000    │  25000 │   NULL     │
└───────┴─────────┴────────────┴────────┴────────────┘

3.5 Running Total

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_sum_3
FROM orders;

3.6 Top N Per Group

-- Top 3 salaries per department
WITH ranked AS (
    SELECT
        name,
        dept_id,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY dept_id
            ORDER BY salary DESC
        ) AS rn
    FROM employees
)
SELECT name, dept_id, salary
FROM ranked
WHERE rn <= 3;

4. Subquery vs CTE vs Derived Table

4.1 Subqueries

-- Correlated subquery: employees earning above their dept average
SELECT name, salary, dept_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.dept_id = e1.dept_id
);

-- Non-correlated subquery: above overall average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

4.2 CTE (Common Table Expression)

WITH dept_avg AS (
    SELECT dept_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept_id
),
high_earners AS (
    SELECT e.name, e.salary, e.dept_id, d.avg_salary
    FROM employees e
    JOIN dept_avg d ON e.dept_id = d.dept_id
    WHERE e.salary > d.avg_salary
)
SELECT * FROM high_earners ORDER BY salary DESC;

4.3 Comparison

┌────────────────┬───────────────┬───────────────┬────────────────┐
│    Aspect       │  Subquery     │     CTE       │  Derived Table │
├────────────────┼───────────────┼───────────────┼────────────────┤
│ Readability    │ Low (nested)  │ High          │ Medium         │
│ Reuse          │ Not possible  │ Within query  │ Not possible   │
│ Recursion      │ Not possible  │ Yes (WITH REC)│ Not possible   │
│ Performance    │ DB-dependent  │ DB-dependent  │ Inline view    │
│ Debugging      │ Hard          │ Easy          │ Medium         │
└────────────────┴───────────────┴───────────────┴────────────────┘

5. Recursive CTE

5.1 Org Chart Tree Traversal

CREATE TABLE org (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    manager_id INT REFERENCES org(id)
);

INSERT INTO org VALUES
(1, 'CEO',      NULL),
(2, 'VP Eng',   1),
(3, 'VP Sales', 1),
(4, 'Dir Eng',  2),
(5, 'Lead Dev', 4),
(6, 'Dev 1',    5),
(7, 'Dev 2',    5);

-- Recursive CTE: traverse full org chart from CEO
WITH RECURSIVE org_tree AS (
    -- Base case: CEO (no manager)
    SELECT id, name, manager_id, 0 AS depth, name::TEXT AS path
    FROM org
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: find subordinates
    SELECT o.id, o.name, o.manager_id,
           t.depth + 1,
           t.path || ' > ' || o.name
    FROM org o
    JOIN org_tree t ON o.manager_id = t.id
)
SELECT
    REPEAT('  ', depth) || name AS hierarchy,
    depth,
    path
FROM org_tree
ORDER BY path;

┌────────────────────┬───────┬──────────────────────────────┐
│ hierarchy          │ depth │ path                         │
├────────────────────┼───────┼──────────────────────────────┤
│ CEO                │   0   │ CEO                          │
│   VP Eng           │   1   │ CEO > VP Eng                 │
│     Dir Eng        │   2   │ CEO > VP Eng > Dir Eng       │
│       Lead Dev     │   3   │ CEO > VP Eng > Dir Eng > ... │
│         Dev 1      │   4   │ CEO > VP Eng > Dir Eng > ... │
│         Dev 2      │   4   │ CEO > VP Eng > Dir Eng > ... │
│   VP Sales         │   1   │ CEO > VP Sales               │
└────────────────────┴───────┴──────────────────────────────┘

5.2 Sequence Generation

-- Generate numbers 1 to 10
WITH RECURSIVE nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 10
)
SELECT n FROM nums;

-- Date series generation
WITH RECURSIVE dates AS (
    SELECT DATE '2025-01-01' AS dt
    UNION ALL
    SELECT dt + INTERVAL '1 day' FROM dates WHERE dt < '2025-01-31'
)
SELECT dt FROM dates;

6. Indexing

6.1 B-Tree Index

The most common index type. Effective for equality (=), range comparisons, sorting, and LIKE 'prefix%'.

CREATE INDEX idx_employees_name ON employees(name);

-- Effective queries
SELECT * FROM employees WHERE name = 'Alice';            -- equality
SELECT * FROM employees WHERE name BETWEEN 'A' AND 'M';  -- range
SELECT * FROM employees ORDER BY name;                    -- sort
SELECT * FROM employees WHERE name LIKE 'Al%';           -- prefix

B-Tree structure:
              [M]
             /   \
          [D,H]  [P,T]
         /  |  \  / | \
       [A-C][E-G][I-L][N-O][Q-S][U-Z]
       (Leaf nodes: actual data pointers)

6.2 Hash Index

Only effective for equality (=) comparisons. Cannot handle range searches or sorting.

CREATE INDEX idx_employees_email_hash
ON employees USING HASH (email);

-- Effective
SELECT * FROM employees WHERE email = 'alice@example.com';

6.3 GIN (Generalized Inverted Index)

Ideal for arrays, JSONB, and full-text search.

-- JSONB field indexing
CREATE INDEX idx_products_tags ON products USING GIN (tags);

SELECT * FROM products WHERE tags @> '["electronics"]';

-- Full-text search
CREATE INDEX idx_articles_search ON articles
USING GIN (to_tsvector('english', title || ' ' || body));

6.4 GiST (Generalized Search Tree)

Suitable for geographic data, range types, and nearest-neighbor searches.

CREATE INDEX idx_locations_geom ON locations USING GiST (geom);

SELECT * FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(127.0, 37.5)::geography, 1000);

6.5 BRIN (Block Range Index)

Ideal for physically sorted large tables. Very small size for great effectiveness.

-- Optimal for time-series data
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);

6.6 Composite Index

-- Column order matters!
CREATE INDEX idx_emp_dept_salary ON employees(dept_id, salary);

-- Effective: dept_id only, or dept_id + salary together
SELECT * FROM employees WHERE dept_id = 10;
SELECT * FROM employees WHERE dept_id = 10 AND salary > 50000;

-- Ineffective: salary only (skipping first column)
-- SELECT * FROM employees WHERE salary > 50000;

Composite Index ordering rules (Leftmost Prefix):

Index: (A, B, C)

WHERE A = ?                        -> Used
WHERE A = ? AND B = ?              -> Used
WHERE A = ? AND B = ? AND C = ?    -> Used
WHERE B = ?                        -> NOT Used (A missing)
WHERE B = ? AND C = ?              -> NOT Used (A missing)
WHERE A = ? AND C = ?              -> Only A used (B skipped)

7. EXPLAIN ANALYZE

7.1 Reading Execution Plans

EXPLAIN ANALYZE
SELECT e.name, d.name AS dept
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.salary > 50000;

Hash Join  (cost=1.09..2.24 rows=3 width=36) (actual time=0.05..0.06 rows=3 loops=1)
  Hash Cond: (e.dept_id = d.id)
  ->  Seq Scan on employees e  (cost=0.00..1.12 rows=3 width=22)
        Filter: (salary > 50000)
        Rows Removed by Filter: 2
  ->  Hash  (cost=1.03..1.03 rows=3 width=18)
        ->  Seq Scan on departments d  (cost=0.00..1.03 rows=3 width=18)
Planning Time: 0.15 ms
Execution Time: 0.09 ms

7.2 Key Scan Types

┌──────────────────┬─────────────────────────────────────┐
│ Scan Type         │ Description                         │
├──────────────────┼─────────────────────────────────────┤
│ Seq Scan         │ Full table sequential scan           │
│ Index Scan       │ Index lookup + table access          │
│ Index Only Scan  │ Results from index only (no table)   │
│ Bitmap Index Scan│ Build bitmap from index, then table  │
│ Bitmap Heap Scan │ Table access based on bitmap         │
└──────────────────┴─────────────────────────────────────┘

7.3 Reading Costs

cost=startup_cost..total_cost  rows=estimated_rows  width=row_size(bytes)

- startup_cost: cost until first row returned
- total_cost: total cost to return all rows
- actual time: real execution time (ms)
- rows: actual rows returned
- loops: number of times this node executed

7.4 Slow Query Diagnosis Flow

┌──────────────────────────────────────────────────┐
│         Query Optimization Flow                   │
├──────────────────────────────────────────────────┤
│ 1. Run EXPLAIN ANALYZE                            │
│    |                                              │
│ 2. Seq Scan on large table?                       │
│    -> Consider adding index                       │
│    |                                              │
│ 3. Large gap between estimated and actual rows?   │
│    -> Run ANALYZE to update statistics            │
│    |                                              │
│ 4. Nested Loop on large datasets?                 │
│    -> Hash Join or Merge Join may be better       │
│    |                                              │
│ 5. Sort using disk?                               │
│    -> Increase work_mem or add index              │
└──────────────────────────────────────────────────┘

8. Query Optimization

8.1 N+1 Problem in SQL

-- Bad: separate query per order (from application)
-- 1. SELECT * FROM orders WHERE user_id = 1;
-- 2. SELECT * FROM order_items WHERE order_id = 101;
-- 3. SELECT * FROM order_items WHERE order_id = 102;
-- ... (N orders = N+1 queries)

-- Good: single JOIN query
SELECT o.id, o.total, oi.product_name, oi.quantity
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.user_id = 1;

8.2 Covering Index

Completes the query using only the index, avoiding table access.

CREATE INDEX idx_emp_dept_name_salary
ON employees(dept_id, name, salary);

-- Index Only Scan possible (no table access needed)
SELECT name, salary FROM employees WHERE dept_id = 10;

8.3 Partial Index

Indexes only rows matching a condition.

-- Index only active users (most queries target active users)
CREATE INDEX idx_active_users
ON users(email)
WHERE status = 'active';

SELECT * FROM users WHERE status = 'active' AND email = 'alice@example.com';

8.4 Expression Index

-- Index for case-insensitive search
CREATE INDEX idx_users_lower_email
ON users(LOWER(email));

SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- JSON field indexing
CREATE INDEX idx_data_category
ON products((data->>'category'));

8.5 Optimization Tips

-- 1. EXISTS vs IN: EXISTS is usually faster for large datasets
-- Slower
SELECT * FROM orders WHERE customer_id IN (
    SELECT id FROM customers WHERE country = 'US'
);
-- Faster
SELECT * FROM orders o WHERE EXISTS (
    SELECT 1 FROM customers c
    WHERE c.id = o.customer_id AND c.country = 'US'
);

-- 2. LIMIT prevents unnecessary scanning
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;

-- 3. UNION ALL vs UNION: use ALL when dedup not needed
SELECT name FROM employees_2024
UNION ALL
SELECT name FROM employees_2025;

-- 4. Remove unnecessary ORDER BY inside subqueries
-- 5. SELECT specific columns instead of SELECT *

9. Transaction Isolation Levels

9.1 Four Isolation Levels

┌──────────────────┬───────────┬──────────────┬──────────────┬─────────────┐
│ Isolation Level   │Dirty Read │Non-Repeatable│Phantom Read  │ Performance │
│                  │           │    Read      │              │             │
├──────────────────┼───────────┼──────────────┼──────────────┼─────────────┤
│ Read Uncommitted │ Possible  │ Possible     │ Possible     │ Fastest     │
│ Read Committed   │ Prevented │ Possible     │ Possible     │ Fast        │
│ Repeatable Read  │ Prevented │ Prevented    │ Possible     │ Medium      │
│ Serializable     │ Prevented │ Prevented    │ Prevented    │ Slowest     │
└──────────────────┴───────────┴──────────────┴──────────────┴─────────────┘

PostgreSQL default: Read Committed
MySQL InnoDB default: Repeatable Read

9.2 Anomaly Explanations

Dirty Read: Reading uncommitted data

-- Transaction A
UPDATE accounts SET balance = 0 WHERE id = 1;
-- (not yet committed)

-- Transaction B (Read Uncommitted)
SELECT balance FROM accounts WHERE id = 1;
-- balance = 0 (uncommitted value!)

-- Transaction A
ROLLBACK;
-- balance reverts to original, but Transaction B already used wrong data

Non-Repeatable Read: Same query returns different results

-- Transaction A
SELECT balance FROM accounts WHERE id = 1;  -- 1000

-- Transaction B
UPDATE accounts SET balance = 500 WHERE id = 1;
COMMIT;

-- Transaction A (within same transaction)
SELECT balance FROM accounts WHERE id = 1;  -- 500 (different!)

Phantom Read: Row count changes for the same condition

-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- 5

-- Transaction B
INSERT INTO orders (status) VALUES ('pending');
COMMIT;

-- Transaction A
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- 6 (phantom row!)

9.3 MVCC (Multi-Version Concurrency Control)

PostgreSQL uses MVCC so reads and writes do not block each other.

┌──────────────────────────────────────────────────┐
│                  MVCC Principle                   │
├──────────────────────────────────────────────────┤
│                                                  │
│  Transaction 100 (Read):                         │
│  "I only see versions with xmin <= 100"          │
│                                                  │
│  Transaction 101 (Write):                        │
│  "Create new version with xmin=101,              │
│   mark old row with xmax=101"                    │
│                                                  │
│  -> Transaction 100 still sees the old version   │
│  -> Reads and writes do not block each other     │
│                                                  │
└──────────────────────────────────────────────────┘

10. PostgreSQL-Specific Features

10.1 JSONB Queries

CREATE TABLE products (
    id   SERIAL PRIMARY KEY,
    name TEXT,
    data JSONB
);

INSERT INTO products (name, data) VALUES
('Laptop', '{"brand": "Dell", "specs": {"ram": 16, "cpu": "i7"}, "tags": ["electronics", "computer"]}'),
('Phone',  '{"brand": "Apple", "specs": {"ram": 8, "cpu": "A17"}, "tags": ["electronics", "mobile"]}');

-- JSONB queries
SELECT name, data->>'brand' AS brand
FROM products
WHERE data->'specs'->>'ram' = '16';

-- Array containment
SELECT * FROM products WHERE data->'tags' @> '"mobile"';

-- JSONB path query (PostgreSQL 12+)
SELECT * FROM products
WHERE data @? '$.specs ? (@.ram > 10)';

10.2 Array Operators

CREATE TABLE articles (
    id   SERIAL PRIMARY KEY,
    title TEXT,
    tags  TEXT[]
);

INSERT INTO articles (title, tags) VALUES
('SQL Guide', ARRAY['sql', 'database', 'backend']),
('React Tips', ARRAY['react', 'frontend', 'javascript']);

-- Array containment search
SELECT * FROM articles WHERE tags @> ARRAY['sql'];
SELECT * FROM articles WHERE 'react' = ANY(tags);
SELECT * FROM articles WHERE tags && ARRAY['sql', 'react'];  -- overlap check

10.3 LATERAL JOIN

-- Get 3 most recent orders per department
SELECT d.name, recent.*
FROM departments d
CROSS JOIN LATERAL (
    SELECT o.id, o.amount, o.created_at
    FROM orders o
    WHERE o.dept_id = d.id
    ORDER BY o.created_at DESC
    LIMIT 3
) recent;

10.4 Materialized View

-- Pre-compute complex aggregation results
CREATE MATERIALIZED VIEW monthly_sales_summary AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    dept_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
FROM orders
GROUP BY DATE_TRUNC('month', order_date), dept_id;

-- Add index
CREATE INDEX idx_monthly_sales ON monthly_sales_summary(month, dept_id);

-- Refresh data
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_summary;

11. Practice Problems (30)

Easy (1-10)

Problem 1. List all employees with their department names. Include employees without departments.

Show Answer

SELECT e.name, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;

Problem 2. Count employees per department.

Show Answer

SELECT d.name, COUNT(e.id) AS emp_count
FROM departments d
LEFT JOIN employees e ON d.id = e.dept_id
GROUP BY d.name
ORDER BY emp_count DESC;

Problem 3. Find employees earning above the overall average.

Show Answer

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Problem 4. Find duplicate email addresses.

Show Answer

SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

Problem 5. Find customers with no orders.

Show Answer

SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;

-- OR using NOT EXISTS
SELECT c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

Problem 6. Get the 5 most recent orders.

Show Answer

SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 5;

Problem 7. Average salary by department, only where average is 60000 or more.

Show Answer

SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) >= 60000;

Problem 8. Find the second highest salary.

Show Answer

SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- OR
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

Problem 9. Replace NULL values with defaults.

Show Answer

SELECT name, COALESCE(phone, 'N/A') AS phone
FROM employees;

Problem 10. Find employees whose names start with 'A'.

Show Answer

SELECT * FROM employees WHERE name LIKE 'A%';
-- Case-insensitive:
SELECT * FROM employees WHERE name ILIKE 'a%';

Medium (11-25)

Problem 11. Find the highest-paid employee in each department.

Show Answer

WITH ranked AS (
    SELECT name, dept_id, salary,
           RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rk
    FROM employees
)
SELECT name, dept_id, salary FROM ranked WHERE rk = 1;

Problem 12. Calculate monthly revenue with month-over-month growth rate.

Show Answer

SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))::NUMERIC
        / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
        2
    ) AS growth_pct
FROM monthly_sales;

Problem 13. Find users who logged in for 3 or more consecutive days.

Show Answer

WITH login_groups AS (
    SELECT
        user_id,
        login_date,
        login_date - ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY login_date
        )::INT * INTERVAL '1 day' AS grp
    FROM logins
)
SELECT user_id, COUNT(*) AS consecutive_days, MIN(login_date), MAX(login_date)
FROM login_groups
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;

Problem 14. Calculate a running total.

Show Answer

SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

Problem 15. Pivot table: convert department/quarterly sales from rows to columns.

Show Answer

SELECT
    dept_name,
    SUM(CASE WHEN quarter = 1 THEN revenue ELSE 0 END) AS q1,
    SUM(CASE WHEN quarter = 2 THEN revenue ELSE 0 END) AS q2,
    SUM(CASE WHEN quarter = 3 THEN revenue ELSE 0 END) AS q3,
    SUM(CASE WHEN quarter = 4 THEN revenue ELSE 0 END) AS q4
FROM quarterly_sales
GROUP BY dept_name;

Problem 16. Generate Fibonacci sequence using recursive CTE.

Show Answer

WITH RECURSIVE fib AS (
    SELECT 1 AS n, 0::BIGINT AS a, 1::BIGINT AS b
    UNION ALL
    SELECT n + 1, b, a + b FROM fib WHERE n < 20
)
SELECT n, a AS fibonacci FROM fib;

Problem 17. Rank employees by salary within their department.

Show Answer

SELECT
    name,
    dept_id,
    salary,
    DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS dept_rank
FROM employees;

Problem 18. Top 5 best-selling products with their categories.

Show Answer

SELECT p.name, p.category, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.name, p.category
ORDER BY total_sold DESC
LIMIT 5;

Problem 19. Use LATERAL JOIN to get the 3 latest products per category.

Show Answer

SELECT c.name AS category, p.*
FROM categories c
CROSS JOIN LATERAL (
    SELECT id, name, price, created_at
    FROM products
    WHERE category_id = c.id
    ORDER BY created_at DESC
    LIMIT 3
) p;

Problem 20. Calculate a 3-month moving average.

Show Answer

SELECT
    month,
    revenue,
    AVG(revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3m
FROM monthly_sales;

Problem 21. Filter on JSONB fields with specific conditions.

Show Answer

SELECT id, name, data->>'brand' AS brand
FROM products
WHERE (data->'specs'->>'ram')::INT >= 16
  AND data->'tags' @> '"electronics"';

Problem 22. Gap analysis: find missing numbers in sequential IDs.

Show Answer

WITH id_range AS (
    SELECT generate_series(MIN(id), MAX(id)) AS expected_id
    FROM orders
)
SELECT expected_id AS missing_id
FROM id_range
WHERE expected_id NOT IN (SELECT id FROM orders);

Problem 23. Remove duplicate rows (keep only the latest).

Show Answer

DELETE FROM users
WHERE id NOT IN (
    SELECT DISTINCT ON (email) id
    FROM users
    ORDER BY email, created_at DESC
);

Problem 24. Calculate percentages without GROUP BY.

Show Answer

SELECT
    status,
    COUNT(*) AS cnt,
    ROUND(
        COUNT(*)::NUMERIC / SUM(COUNT(*)) OVER () * 100, 2
    ) AS percentage
FROM orders
GROUP BY status;

Problem 25. Cumulative unique user count.

Show Answer

WITH daily_new AS (
    SELECT
        user_id,
        MIN(login_date) AS first_date
    FROM logins
    GROUP BY user_id
)
SELECT
    first_date AS login_date,
    COUNT(*) AS new_users,
    SUM(COUNT(*)) OVER (ORDER BY first_date) AS cumulative_users
FROM daily_new
GROUP BY first_date
ORDER BY first_date;

Hard (26-30)

Problem 26. Find the longest consecutive revenue increase period.

Show Answer

WITH changes AS (
    SELECT
        month,
        revenue,
        CASE
            WHEN revenue > LAG(revenue) OVER (ORDER BY month)
            THEN 0 ELSE 1
        END AS is_break
    FROM monthly_sales
),
groups AS (
    SELECT *,
        SUM(is_break) OVER (ORDER BY month) AS grp
    FROM changes
)
SELECT
    grp,
    MIN(month) AS start_month,
    MAX(month) AS end_month,
    COUNT(*) AS streak_length
FROM groups
GROUP BY grp
ORDER BY streak_length DESC
LIMIT 1;

Problem 27. BOM (Bill of Materials) explosion using recursive CTE.

Show Answer

WITH RECURSIVE bom AS (
    SELECT
        parent_id, child_id, quantity, 1 AS level,
        quantity AS total_qty
    FROM bill_of_materials
    WHERE parent_id = 'PRODUCT_A'

    UNION ALL

    SELECT
        b.parent_id, b.child_id, b.quantity,
        bom.level + 1,
        bom.total_qty * b.quantity
    FROM bill_of_materials b
    JOIN bom ON b.parent_id = bom.child_id
    WHERE bom.level < 10
)
SELECT child_id AS component, SUM(total_qty) AS total_needed
FROM bom
GROUP BY child_id
ORDER BY total_needed DESC;

Problem 28. Detect time interval overlaps (scheduling conflicts).

Show Answer

SELECT
    a.id AS meeting_a,
    b.id AS meeting_b,
    a.start_time,
    a.end_time,
    b.start_time AS conflict_start,
    b.end_time AS conflict_end
FROM meetings a
JOIN meetings b ON a.id < b.id
    AND a.room_id = b.room_id
    AND a.start_time < b.end_time
    AND a.end_time > b.start_time;

Problem 29. Calculate median (without aggregate functions).

Show Answer

-- Pure SQL approach:
WITH ordered AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary) AS rn,
           COUNT(*) OVER () AS total
    FROM employees
)
SELECT AVG(salary) AS median
FROM ordered
WHERE rn IN (
    FLOOR((total + 1) / 2.0),
    CEIL((total + 1) / 2.0)
);

-- PostgreSQL built-in:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median
FROM employees;

Problem 30. Graph shortest path using recursive CTE.

Show Answer

CREATE TABLE edges (
    src  VARCHAR(10),
    dst  VARCHAR(10),
    cost INT
);

WITH RECURSIVE paths AS (
    SELECT src, dst, cost, ARRAY[src, dst] AS path, 1 AS hops
    FROM edges
    WHERE src = 'A'

    UNION ALL

    SELECT p.src, e.dst, p.cost + e.cost,
           p.path || e.dst,
           p.hops + 1
    FROM paths p
    JOIN edges e ON p.dst = e.src
    WHERE NOT e.dst = ANY(p.path)  -- cycle prevention
      AND p.hops < 10             -- max depth limit
)
SELECT dst, MIN(cost) AS min_cost, path
FROM paths
WHERE dst = 'Z'
GROUP BY dst, path
ORDER BY min_cost
LIMIT 1;

12. Interview Questions (15)

Basics (1-5)

Q1. What is the difference between INNER JOIN and LEFT JOIN?

Show Answer

INNER JOIN returns only rows that match in both tables. LEFT JOIN returns all rows from the left table and fills NULL for non-matching right table rows. Use LEFT JOIN when you need to include rows without matches (e.g., employees without departments).

Q2. What is the difference between GROUP BY and HAVING?

Show Answer

GROUP BY groups rows together. HAVING applies conditions to grouped results. WHERE filters individual rows before grouping, while HAVING filters groups after grouping. Example: HAVING COUNT(*) > 5 returns only groups with more than 5 rows.

Q3. When does an index NOT get used in a WHERE clause?

Show Answer

(1) Functions applied to columns: WHERE YEAR(created_at) = 2025. (2) Implicit type conversions. (3) Leading wildcard: LIKE '%keyword%'. (4) OR conditions mismatching the index. (5) NOT IN, negation conditions. (6) Optimizer determines full scan is more efficient.

Q4. What is the result of comparison operations with NULL?

Show Answer

All comparison operations with NULL return NULL (UNKNOWN). NULL = NULL is NOT TRUE but NULL. Use IS NULL or IS NOT NULL to check for nulls. Aggregate functions ignore NULLs (except COUNT(*)).

Q5. What are the differences between DELETE, TRUNCATE, and DROP?

Show Answer

DELETE: Row-by-row deletion, supports WHERE, rollback possible, triggers fire. TRUNCATE: Instant full deletion, no WHERE, DDL so usually not rollback-able (PostgreSQL allows it), very fast. DROP: Deletes the table itself (structure + data + indexes).

Intermediate (6-10)

Q6. How do Window Functions differ from GROUP BY?

Show Answer

GROUP BY reduces to one row per group. Window Functions aggregate while preserving individual rows. OVER(PARTITION BY ...) defines groups, and adds group-level calculations to each row. For example, display each employee's row with their department average salary alongside.

Q7. When do you choose CTE over subqueries?

Show Answer

CTE offers better readability, is reusable within the same query, and supports recursive queries. Subqueries work for simple cases. Performance varies by DB, but in PostgreSQL 12+, CTEs are inlined by default, so performance difference is minimal.

Q8. Why does column order matter in composite indexes?

Show Answer

B-Tree composite indexes follow the Leftmost Prefix rule. An index on (A, B, C) can be used for A only, A+B, or A+B+C conditions, but NOT for B only or C only. Generally place higher cardinality columns first, and equality conditions before range conditions.

Q9. What indicators should you watch in EXPLAIN ANALYZE?

Show Answer

(1) actual rows vs rows: large gap indicates stale statistics. (2) Seq Scan on large tables: suggests index needed. (3) Sort Method: disk means insufficient work_mem. (4) Nested Loop on large data: consider Hash Join. (5) Execution Time: total execution time.

Q10. How do you choose a transaction isolation level?

Show Answer

Read Committed suits most OLTP (PostgreSQL default). Serializable for financial/payment systems requiring exactness. Repeatable Read for reports/analytics needing snapshot consistency. Higher isolation means lower concurrency, so choose based on requirements.

Advanced (11-15)

Q11. What is a Covering Index and why does it matter?

Show Answer

A Covering Index includes all columns needed by a query, enabling Index Only Scan. This avoids table access entirely, greatly reducing I/O. In PostgreSQL, use the INCLUDE clause to add extra columns to an index.

Q12. Explain how MVCC works.

Show Answer

MVCC maintains multiple versions of each row. When a transaction starts, it records a snapshot point and cannot see later changes. Writes create new versions, reads access versions matching their snapshot. This way reads and writes never block each other. Old versions are cleaned up by VACUUM.

Q13. Give use cases for Partial Index and Expression Index.

Show Answer

Partial Index indexes only frequently searched subsets. Example: active users only with WHERE status = 'active'. Smaller index size means better performance. Expression Index indexes function results. Example: LOWER(email) index for case-insensitive search optimization.

Q14. Why is JSONB better than JSON in PostgreSQL?

Show Answer

JSONB stores data in binary format eliminating parsing overhead, supports GIN indexes, and enables containment (@>), existence (?), and path (@?) operators. JSON stores as raw text with faster insertion but requires parsing on every access and has limited indexing.

Q15. What strategies exist for large table migrations?

Show Answer

(1) Online DDL: ALTER TABLE ... ADD COLUMN completes instantly in PostgreSQL when defaults exist. (2) Batch updates instead of full table UPDATE. (3) Table partitioning to reduce scope. (4) pg_repack for table restructuring. (5) Test on replicas first. Zero-downtime is key.

13. Quiz (5 Questions)

Quiz 1

How many rows does this query return?

SELECT * FROM A CROSS JOIN B;
-- A: 5 rows, B: 3 rows

A) 3 B) 5 C) 8 D) 15

Show Answer

D) 15. CROSS JOIN produces a Cartesian product: 5 x 3 = 15 rows.

Quiz 2

Among ROW_NUMBER, RANK, and DENSE_RANK, which one skips the next rank when there are ties?

A) ROW_NUMBER B) RANK C) DENSE_RANK D) All skip

Show Answer

B) RANK. RANK assigns the same rank for ties and skips the next (1, 1, 3). DENSE_RANK does not skip (1, 1, 2). ROW_NUMBER always assigns unique numbers.

Quiz 3

With composite index (dept_id, salary, name), which query CANNOT use this index?

A) WHERE dept_id = 10 B) WHERE dept_id = 10 AND salary > 50000 C) WHERE salary > 50000 D) WHERE dept_id = 10 AND salary > 50000 AND name = 'Alice'

Show Answer

C) By the Leftmost Prefix rule, the index cannot be used with just the second column (salary) without the first column (dept_id).

Quiz 4

What is the minimum isolation level that prevents Phantom Reads?

A) Read Uncommitted B) Read Committed C) Repeatable Read D) Serializable

Show Answer

D) Serializable. Per SQL standard, only Serializable prevents Phantom Reads. However, PostgreSQL's Repeatable Read also prevents phantoms internally using MVCC snapshots.

Quiz 5

What is the dept_avg value in this result?

SELECT name, salary,
       AVG(salary) OVER () AS dept_avg
FROM employees;
-- employees: Alice 100, Bob 200, Charlie 300

A) Different for each row B) 200 for all rows C) NULL D) Error

Show Answer

B) OVER () aggregates over the entire dataset without partitioning. (100 + 200 + 300) / 3 = 200 for all rows.

14. References

Official Documentation

Books

SQL Performance Explained - Markus Winand
PostgreSQL 14 Internals - Egor Rogov
The Art of SQL - Stephane Faroult
High Performance MySQL - Baron Schwartz et al.
SQL Antipatterns - Bill Karwin

Online Resources

Use The Index, Luke - Complete indexing guide
pgexercises.com - PostgreSQL practice
SQLZoo - SQL basics practice
LeetCode Database Problems - SQL interview problems
Mode Analytics SQL Tutorial

Tools

explain.dalibo.com - PostgreSQL EXPLAIN visualizer
pgMustard - Execution plan analysis
DBeaver - Universal DB client
pgAdmin - PostgreSQL administration tool