FAANG System Design Interview Complete Guide

The system design interview is one of the most critical stages for senior engineering positions. This guide systematically covers the most frequently asked problems at FAANG (Facebook/Meta, Apple, Amazon, Netflix, Google) interviews.

1. System Design Interview Framework: RESHADED

The RESHADED framework helps you structure answers systematically within a 45-minute interview.

Step	Content	Time
R - Requirements	Clarify functional/non-functional requirements	5 min
E - Estimation	Scale estimation (DAU, QPS, Storage)	5 min
S - Storage	Data model and DB selection	5 min
H - High-level design	Draft overall architecture	10 min
A - APIs	Design API endpoints	5 min
D - Detailed design	Deep dive into key components	10 min
E - Evaluation	Analyze trade-offs and bottlenecks	3 min
D - Distinguishing features	Propose differentiating enhancements	2 min

45-Minute Time Allocation Strategy

[0-5 min]   Gather requirements and define scope
[5-10 min]  Capacity estimation (Back-of-envelope calculation)
[10-20 min] High-level design (architecture diagram)
[20-35 min] Detailed design of key components
[35-43 min] Discuss trade-offs and improvements
[43-45 min] Handle interviewer questions

Good vs Bad Interviewee Behavior

Good interviewee:

Clarifies requirements first and explicitly states assumptions
Voluntarily discusses trade-offs while designing
Justifies design decisions with numbers
Explains clearly: "This approach has issue X, but I chose it because of Y"
Treats the interviewer as a partner in a collaborative dialogue

Bad interviewee:

Starts designing without clarifying requirements
Presents only one solution without considering alternatives
Gives abstract answers without concrete technology choices
Falls silent for long stretches or thinks without speaking aloud
Ignores hints from the interviewer

2. Key Concept Quick Review

2.1 Horizontal vs Vertical Scaling

Vertical Scaling (Scale-up):

Upgrade CPU, RAM, or disk on a single server
Pros: Simple, easy to maintain data consistency
Cons: Hardware limits exist, SPOF (Single Point of Failure) risk

Horizontal Scaling (Scale-out):

Add server instances to distribute load
Pros: Near-unlimited scaling in principle, easier fault recovery
Cons: State sharing complexity, network overhead

Vertical:   [Server 4GB RAM] → [Server 32GB RAM]
Horizontal: [Server] + [Server] + [Server] → placed behind Load Balancer

2.2 Load Balancer Algorithms

Algorithm	Description	Use Case
Round Robin	Distribute requests sequentially	When server specs are identical
Least Connections	Route to server with fewest active connections	When request handling time varies
IP Hash	Fixed distribution based on client IP	When session persistence is required
Weighted Round Robin	Weight servers by capacity	When server specs differ
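
The four algorithms in the table can be sketched in a few lines. This is a minimal illustration, not a production balancer: the `Server` class and its fields are hypothetical, and the weighted variant simply repeats each server `weight` times per rotation.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int = 1              # used by weighted round robin
    active_connections: int = 0  # used by least connections


def round_robin(servers):
    """Yield servers in a fixed rotating order."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """Yield each server `weight` times per rotation (simple, non-smooth variant)."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connections(servers):
    """Pick the server with the fewest active connections right now."""
    return min(servers, key=lambda s: s.active_connections)


def ip_hash(servers, client_ip):
    """Map a client IP to a fixed server so repeat requests land on the same one."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Note that `ip_hash` uses a plain modulo, so changing the number of servers remaps most clients; section 2.7 covers how consistent hashing avoids that.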

2.3 CDN (Content Delivery Network)

A CDN caches static content (images, JS, CSS, video) on edge servers worldwide, serving users from geographically closer locations.

User → [Nearest CDN Edge] → (Cache Hit) → Return content
                          → (Cache Miss) → [Origin Server] → Cache in CDN → Return content

Push CDN vs Pull CDN:

Push CDN: Pre-upload content to the CDN (suitable for large static files that change rarely)
Pull CDN: Fetch from Origin on first request, then cache (suitable for large or frequently changing catalogs where pre-uploading everything is impractical)

2.4 Caching Strategies

Cache-aside (Lazy Loading):

1. App queries cache for data
2. Cache miss → query DB
3. Store DB result in cache
4. Subsequent requests hit cache

Write-through:

1. App writes to cache
2. Cache synchronously writes to DB
→ Guarantees data consistency, incurs write latency

Write-back (Write-behind):

1. App writes only to cache
2. Cache asynchronously writes to DB
→ Excellent write performance, risk of data loss on cache failure

TTL (Time-To-Live): Sets cache expiration. Too short reduces cache efficiency; too long risks stale data.
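
The cache-aside flow and TTL expiry described above can be sketched as follows. The `TTLCache` class is a hypothetical in-memory stand-in for a cache such as Redis, and `load_from_db` is an assumed callback that represents the database query.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for a real cache)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def get_with_cache_aside(key, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the DB, then populate the cache."""
    value = cache.get(key)
    if value is not None:
        return value            # step 4: cache hit
    value = load_from_db(key)   # step 2: cache miss, query the DB
    if value is not None:
        cache.set(key, value)   # step 3: store the result for later requests
    return value
```

The injectable `clock` parameter is only there to make expiry easy to test; a shorter TTL makes the DB callback run more often, a longer one serves older values.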

2.5 Database: SQL vs NoSQL Selection Criteria

Choose SQL (Relational DB) when:

ACID transactions are required (payments, inventory)
Data structure is clear and changes infrequently
Complex JOIN queries are needed
Examples: PostgreSQL, MySQL

Choose NoSQL when:

Schema is flexible or changes frequently
Horizontal scaling is essential for large-scale services
Low-latency, high-throughput access by key matters more than ad hoc queries
Examples: MongoDB (document), Cassandra (wide-column), Redis (key-value), Neo4j (graph)

2.6 CAP Theorem in Practice

CAP Theorem: A distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at once; when a network partition occurs, it must give up either consistency or availability.

In practice, network Partition (P) is unavoidable, so:
  CP systems: Banking, stock trading (consistency first)
  AP systems: Social feeds, shopping carts (availability first)

System	Type	Reason
Zookeeper	CP	Distributed locks, config management
Cassandra	AP	Always writable, eventual consistency
HBase	CP	Strong consistency
DynamoDB	AP (default)	Eventually consistent reads by default; strongly consistent reads can be requested per read

2.7 Consistent Hashing

Problem with regular hashing: Adding/removing servers causes most keys to be remapped.

Consistent Hashing: Servers and keys are placed on a ring structure. Adding/removing a server only remaps a fraction of keys.

Ring structure (0 to 2^32 - 1):

        0
       /|\
      / | \
Server A  Server B
    \   |   /
     Server C

Keys are assigned to the nearest server clockwise
When a server is removed, only its keys move to the next server

Virtual Nodes: Place each server multiple times on the ring to ensure even distribution.
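
A minimal sketch of the ring with virtual nodes, assuming MD5 truncated to 32 bits as the hash function (any uniform hash would do). The class and method names are illustrative only.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string onto the ring 0 .. 2^32 - 1."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class ConsistentHashRing:
    def __init__(self, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> server name

    def add_server(self, server):
        # Place the server on the ring once per virtual node.
        for i in range(self.virtual_nodes):
            pos = _hash(f"{server}#{i}")
            if pos in self._owner:
                continue  # rare position collision: skip this virtual node
            bisect.insort(self._positions, pos)
            self._owner[pos] = server

    def remove_server(self, server):
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def get_server(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]
```

Removing a server only reassigns the keys that server owned; every other key keeps its owner because its nearest clockwise position is unchanged.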

2.8 Message Queue

Message Queues enable asynchronous communication between services and achieve loose coupling.

Kafka:

High throughput, persistence (log retention), Consumer Group support
Use: Event streaming, log aggregation, real-time analytics

RabbitMQ:

Complex routing, various messaging patterns
Use: Task queues, notification systems

Producer → [Message Queue] → Consumer

Benefits:
- Asynchronous processing reduces response time
- Traffic buffering (absorbs traffic spikes)
- Reduced coupling between services

3. URL Shortener Design (TinyURL / bit.ly)

3.1 Requirements Clarification

Functional Requirements:

Accept a long URL and generate a short URL
Redirect short URL to original URL
(Optional) Custom short URL alias
(Optional) URL expiration date

Non-Functional Requirements:

DAU: 100M (100 million daily active users)
Read:Write ratio = 100:1 (reads vastly outnumber writes)
Availability: 99.9% SLA
Redirect latency: under 100ms

3.2 Capacity Estimation

Write QPS:
  - New URLs per day: 100M / 100 = 1M per day (assuming roughly 1 in 100 daily users creates a URL)
  - Write QPS: 1,000,000 / 86,400 ≈ 12 QPS

Read QPS:
  - Read = Write × 100 = 1,200 QPS
  - Peak: 1,200 × 5 = 6,000 QPS

Storage:
  - 1 URL record ≈ 500 bytes
  - 10-year data: 1M × 365 × 10 × 500 bytes ≈ 1.8 TB

Cache:
  - 80/20 rule: top 20% URLs generate 80% of traffic
  - Cache size: 1,200 QPS × 86,400 × 0.2 × 500 bytes ≈ 10 GB (hot set for one day of traffic)
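
The same back-of-envelope arithmetic, written out so each assumption is a named parameter. The function name and dictionary keys are illustrative; the input numbers are the ones stated in the requirements above.

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(
    dau=100_000_000,
    users_per_new_url=100,   # assumption: about 1 in 100 daily users creates a URL
    read_write_ratio=100,
    peak_factor=5,
    record_bytes=500,
    retention_years=10,
    hot_fraction=0.2,        # 80/20 rule: cache the hottest 20% of a day's reads
):
    new_urls_per_day = dau // users_per_new_url
    write_qps = new_urls_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "peak_read_qps": read_qps * peak_factor,
        "storage_bytes": new_urls_per_day * 365 * retention_years * record_bytes,
        "cache_bytes": read_qps * SECONDS_PER_DAY * hot_fraction * record_bytes,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:,.1f}")
```

Without intermediate rounding, the write rate comes out near 11.6 QPS and the read rate near 1,157 QPS; the text rounds these to 12 and 1,200, which is fine for an interview estimate.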

3.3 API Design

POST /api/v1/urls
  Request:  { "long_url": "https://example.com/...", "expire_date": "2027-01-01" }
  Response: { "short_url": "https://tinyurl.com/abc123" }

GET /{short_code}
  Response: 301 Redirect (permanent) or 302 Redirect (temporary)
  → 301: Browser caches redirect, reduces server load (no analytics)
  → 302: Always goes through server (enables click analytics)

3.4 Base62 Encoding

Character set: [0-9, a-z, A-Z] = 62 characters
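6-char Base62 = 62^6 ≈ 56.8 billion combinations
7-char Base62 = 62^7 ≈ 3.5 trillion combinations

long_url → MD5/SHA-256 → Base62 encode the digest → keep the first 7 characters → short code
(on a collision, re-hash with a salt or counter and retry)

Example:
  "https://www.example.com/long-path"
  → MD5: "1a2b3c4d..."
  → Base62 of the digest, truncated to 7 characters → "aB3xY9z"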
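
A minimal Base62 sketch using the character set above. `base62_encode` works on any non-negative integer (for example a unique numeric ID); `short_code_for` is a hypothetical helper that hashes the URL with MD5 and keeps the first 7 Base62 characters, which is one possible reading of the pipeline and an assumption on my part. Collision handling is left out.

```python
import hashlib
import string

# 0-9, a-z, A-Z: 62 characters
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def short_code_for(long_url: str, length: int = 7) -> str:
    """Hash the URL, Base62-encode the digest, and keep the first `length` characters."""
    digest = hashlib.md5(long_url.encode("utf-8")).digest()
    code = base62_encode(int.from_bytes(digest, "big"))
    # Pad on the left in the unlikely case the encoded digest is shorter than `length`.
    return code[:length].rjust(length, ALPHABET[0])
```

Because the code is truncated, two URLs can map to the same short code, so a real service would check the DB for an existing entry before accepting it.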

3.5 DB Schema Design

CREATE TABLE url_mappings (
  id          BIGINT PRIMARY KEY,        -- Snowflake ID
  short_code  VARCHAR(8) UNIQUE NOT NULL,
  long_url    TEXT NOT NULL,
  user_id     BIGINT,
  created_at  TIMESTAMP DEFAULT NOW(),
  expire_at   TIMESTAMP,
  click_count BIGINT DEFAULT 0
);

-- No separate index on short_code is needed: the UNIQUE constraint above already creates one.

DB Choice: The access pattern is a simple key-value lookup with no multi-row transactions, so MySQL is sufficient; a NoSQL store such as Cassandra is an alternative when horizontal scaling is the priority.

3.6 Overall Architecture Diagram

[Client]
    |
    v
[DNS] → [CDN] (static assets)
    |
    v
[Load Balancer]
    |
    +-----------+-----------+
    v           v           v
[API Server] [API Server] [API Server]
    |               |
    v               v
[Redis Cache]  [URL DB (Master)]
(short URL cache)   |
                [URL DB (Replica)]
                (read-only)

Write flow:
  Client → LB → API Server → Save to DB → Update Redis cache → Return short URL

Read flow:
  Client → LB → API Server → Check Redis cache
    → Cache hit: immediate redirect
    → Cache miss: query DB → store in Redis → redirect

3.7 High Availability

Redis Cluster: Cache short URLs in Redis so that most redirects never reach the DB (with the 80/20 estimate above, roughly 80% of reads)
DB Replication: Master-Slave for read/write separation
Rate Limiting: Block excessive short URL creation from the same IP
Uniqueness: Use UUID or Snowflake ID to prevent duplicate short codes
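
The per-IP rate limiting mentioned in the list could be implemented in several ways; the sketch below assumes a token bucket, which the guide does not specify. The class name and parameters are hypothetical, and state is kept in process memory rather than in a shared store.

```python
import time


class PerIpRateLimiter:
    """Token bucket per client IP: `capacity` burst, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._buckets = {}  # client_ip -> (tokens, last_seen_time)

    def allow(self, client_ip):
        now = self.clock()
        tokens, last = self._buckets.get(client_ip, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens < 1:
            self._buckets[client_ip] = (tokens, now)
            return False  # over the limit: reject (e.g. respond with HTTP 429)
        self._buckets[client_ip] = (tokens - 1, now)
        return True
```

With several API servers behind the load balancer, the bucket state would need to live in a shared store so that all servers see the same counts.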

4. Twitter/X Feed System Design

4.1 Requirements

DAU: 300M
Tweet creation: 5M per day
Timeline loading: latest 20 tweets
Follower count: average 300, max 100M (celebrities)

4.2 Fan-out Strategies

Fan-out on write (Push model):

On tweet creation:
  1. Save tweet to DB
  2. Retrieve all follower IDs
  3. Insert tweet ID into each follower's timeline cache (Redis)

Pros: Reads are very fast (direct lookup from Redis)
Cons: 100M followers = 100M writes → high write latency

Fan-out on read (Pull model):

On timeline fetch:
  1. Retrieve list of followed accounts
  2. Fetch latest tweets from each account
  3. Merge and sort by time

Pros: Simple writes
Cons: Read queries scale with follow count → high read latency

Hybrid Strategy (reportedly the approach Twitter adopted; the 10,000 threshold is illustrative):

Regular users (followers < 10,000):  Fan-out on write
Celebrity users (followers >= 10,000): Fan-out on read

Timeline generation:
  1. Retrieve tweet IDs from Redis for followed regular users (fan-out on write results)
  2. Fetch latest tweets from followed celebrities (fan-out on read)
  3. Merge and sort both result sets
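
The merge step of the hybrid strategy can be sketched as below. It assumes each input is already a newest-first list of `(timestamp, tweet_id)` pairs: one precomputed list from the timeline cache and one list per followed celebrity. The function names and the threshold constant are illustrative.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # follower count at which fan-out switches to read time


def should_fan_out_on_write(follower_count):
    """Regular accounts push to follower timelines; celebrity accounts do not."""
    return follower_count < CELEBRITY_THRESHOLD


def build_timeline(precomputed, celebrity_feeds, limit=20):
    """Merge newest-first (timestamp, tweet_id) lists and return the newest tweet IDs."""
    merged = heapq.merge(
        precomputed, *celebrity_feeds, key=lambda t: t[0], reverse=True
    )
    # heapq.merge is lazy, so only the first `limit` items are ever produced.
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

For example, merging a cached list with two celebrity feeds yields a single newest-first list without sorting the full union of tweets.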

4.3 Timeline Caching (Redis Sorted Set)

Key:   timeline:{user_id}
Value: Sorted Set (score = tweet timestamp, member = tweet_id)

Example:
  ZADD timeline:123 1700000001 tweet:456
  ZADD timeline:123 1700000002 tweet:789
  ZREVRANGE timeline:123 0 19  # Get latest 20 tweets

Keep a max of 1,000 tweet IDs per user (memory management), e.g. trim after each insert with ZREMRANGEBYRANK timeline:123 0 -1001

4.4 Tweet ID Generation (Snowflake ID)

Twitter's 64-bit distributed ID generation scheme:

64-bit layout:
  [1 bit: sign] [41 bits: timestamp] [10 bits: machine ID] [12 bits: sequence]

Benefits:
  - Time-sortable (timestamp embedded)
  - No collisions in distributed environments
  - Generates up to 4,096 IDs per millisecond per machine (12-bit sequence), across up to 1,024 machines (10-bit machine ID)
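
A minimal generator following the 64-bit layout above. The epoch constant is the value commonly cited for Twitter's Snowflake and should be treated as an assumption; any fixed instant in the past works. Clock rollback is handled here by simply raising an error.

```python
import threading
import time

EPOCH_MS = 1288834974657  # assumed custom epoch (commonly cited Twitter value)
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class SnowflakeGenerator:
    def __init__(self, machine_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self.clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4,096 IDs already issued in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self.clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )
```

IDs from one generator are strictly increasing, and IDs from different machines differ in the machine ID bits, so no coordination is needed at generation time.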

4.5 Overall Architecture

[Mobile/Web Clients]
         |
    [API Gateway]
         |
   +-----------+
   |           |
[Tweet Service] [Timeline Service]
   |                |
[Tweet DB]   [Redis Timeline Cache]
(Cassandra)          |
                [Fan-out Service]
                (Consumes Kafka,
                 updates follower timelines)
                     |
              [Follower Graph DB]
              (stores follow relationships)

Media handling:
[Image/Video Upload] → [Object Storage (S3)] → [CDN]

5. YouTube / Netflix Video Streaming Design

5.1 Requirements

DAU: 2B (YouTube scale)
Video uploads: 500 hours/minute
Concurrent streams: hundreds of millions
Supported resolutions: 360p to 4K

5.2 Video Upload Pipeline

[User]
   |
   v
[Upload Service] → Store raw in S3
   |
   v
[Message Queue (Kafka)]
   |
   v
[Transcoding Workers] (parallel processing)
   ├── 360p encode
   ├── 720p encode
   ├── 1080p encode
   └── 4K encode
   |
   v
[Distribute to CDN]
   |
   v
[Update Metadata DB] → Notify user of completion

Transcoding Optimization:

DAG (Directed Acyclic Graph)-based task splitting
Split video into GOP (Group of Pictures) units for parallel processing
Watermarking and thumbnail generation also included in pipeline

5.3 Adaptive Bitrate Streaming (ABR)

Automatically switches quality based on user network conditions:

[Good network]    → Request 1080p/4K segments
[Moderate network] → Request 720p segments
[Poor network]    → Request 360p segments

HLS (HTTP Live Streaming):
  - Video split into 2-10 second segments (.ts files)
  - M3U8 playlist file manages segment list
  - Client monitors buffer state and selects quality for next segment
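
A simplified version of the client-side quality decision is sketched below. The bitrate ladder, the safety margin, and the low-buffer cutoff are all hypothetical values chosen for illustration; real players use more elaborate heuristics.

```python
# Hypothetical bitrate ladder: (label, required throughput in kbit/s), lowest first.
LADDER = [("360p", 800), ("720p", 3000), ("1080p", 6000), ("4K", 16000)]


def choose_rendition(throughput_kbps, buffer_seconds, safety=0.8, low_buffer=5.0):
    """Pick the quality for the next segment from measured throughput and buffer level."""
    if buffer_seconds < low_buffer:
        return LADDER[0][0]  # buffer nearly empty: drop to the lowest quality
    budget = throughput_kbps * safety  # leave headroom for throughput variance
    best = LADDER[0][0]
    for label, required_kbps in LADDER:
        if required_kbps <= budget:
            best = label  # highest rendition that still fits the budget
    return best
```

The player would call this once per segment, feeding in the download speed of recent segments and the seconds of video currently buffered.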

5.4 CDN Strategy

[Origin Server] → [Regional CDN] → [Edge CDN] → [User]

Popular videos: Pre-cached on multiple edge nodes (Push CDN)
Less popular:   Fetched from Origin on first request (Pull CDN)

Netflix approach: Partners with ISPs to place cache servers inside ISP networks
                  (OCA: Open Connect Appliance)

5.5 Overall Architecture

Upload path:
[Creator] → [Upload API] → [S3] → [Kafka] → [Transcoding Cluster]
                                                  |
                                             [CDN Distribution]

Viewing path:
[User] → [API Gateway] → [Video Service]
                               |
                 +-------------+-------------+
                 |             |             |
            [Metadata]    [CDN Stream]  [Recommendation]
            (MySQL)       (HLS/DASH)   (ML Model)

6. Chat System Design (WhatsApp / Slack)

6.1 Requirements

DAU: 500M
1:1 chat, group chat (up to 500 members)
Message delivery latency: under 100ms
Read receipts (WhatsApp blue ticks)
Online presence indicator

6.2 Real-Time Communication Protocol Comparison

Approach	Behavior	Pros	Cons
Long Polling	Hold connection until response	Simple to implement	Wastes server resources
SSE (Server-Sent Events)	Server-to-client one-way	Good for one-way streaming	Cannot push from client
WebSocket	Full-duplex bidirectional	Low latency, bidirectional	Complex connection management

Chat system choice: WebSocket

Client ←→ WebSocket connection ←→ Chat Server
  (persistent bidirectional communication)

6.3 Message Storage Strategy

Message characteristics:

Very high write frequency
Reads focus on recent messages
Rare deletions, no modifications

Why Cassandra / HBase:

Cassandra schema:
  Partition Key: channel_id
  Clustering Key: message_id (Snowflake, time-ordered)

  CREATE TABLE messages (
    channel_id  UUID,
    message_id  BIGINT,      -- Snowflake ID (time-embedded)
    sender_id   UUID,
    content     TEXT,
    created_at  TIMESTAMP,
    PRIMARY KEY (channel_id, message_id)
  ) WITH CLUSTERING ORDER BY (message_id DESC);

→ Very efficient for fetching latest messages per channel
→ Easy horizontal scaling

6.4 Read Receipts and Online Presence

Read Receipts:

Message states: SENT → DELIVERED → READ

1. Message sent → save to DB with state SENT
2. Reaches recipient device → update to DELIVERED → notify sender
3. Recipient views message → update to READ → notify sender
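
The three message states form a one-way progression, which can be sketched as a small state helper. Ignoring late or duplicate acknowledgements (for example a DELIVERED ack arriving after READ) is an assumption made here, not something the steps above specify.

```python
from enum import IntEnum


class MessageState(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


def advance(current: MessageState, incoming: MessageState) -> MessageState:
    """Apply an acknowledgement; the state only ever moves forward."""
    return incoming if incoming > current else current
```

With this rule, applying the same acknowledgement twice is harmless, which simplifies retries between the chat server and the message store.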

Online Presence:

Method 1: Heartbeat (ping server every 30 seconds)
  - Store user:{id}:last_seen = timestamp in Redis
  - TTL of 60 seconds → offline if no ping for 60+ seconds

Method 2: WebSocket connection tracking
  - Connect = online event, disconnect = offline event
  - Propagate status changes to friends via Pub/Sub
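
The heartbeat approach (Method 1) can be sketched with an in-memory tracker. In the design above this state would live in Redis with a 60-second TTL; the class below is a hypothetical stand-in that mimics the expiry behavior.

```python
import time


class PresenceTracker:
    """Tracks online status from heartbeats; a user expires after `ttl_seconds` of silence."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires_at = {}  # user_id -> time at which the user counts as offline

    def heartbeat(self, user_id):
        # Each ping pushes the expiry forward, like re-setting a key with a TTL.
        self._expires_at[user_id] = self.clock() + self.ttl

    def is_online(self, user_id):
        expires = self._expires_at.get(user_id)
        return expires is not None and self.clock() < expires
```

With a 30-second ping interval and a 60-second TTL, a single dropped heartbeat does not flip the user to offline.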

6.5 Group Chat Message Delivery

Small groups (N < 100):
  [Sender] → [Chat Server] → [Direct WebSocket delivery to each member]

Large groups (N >= 100):
  [Sender] → [Chat Server] → [Kafka Topic: group:{id}]
                                    |
                              [Consumer Cluster]
                                    |
                         [Push notification to each member device]

6.6 End-to-End Encryption Overview

WhatsApp's Signal Protocol-based approach:
  1. Each device generates a public/private key pair
  2. Only the public key is registered with the server
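  3. Sender fetches the recipient's public keys and derives a shared session key (key agreement); messages are encrypted with symmetric keys derived from it
  4. Server relays encrypted message (cannot decrypt content)
  5. Recipient derives the same session key using their private key and decrypts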

7. Company-Specific System Design Question Trends

Google

Problem	Key Points
Search Engine	Crawler, indexing, PageRank, autocomplete
Web Crawler	URL frontier, deduplication, robots.txt, politeness
Google Maps	Map tiles, pathfinding (Dijkstra), ETA prediction
Google Drive	File upload/download, real-time co-editing, versioning

Meta (Facebook/Instagram)

Problem	Key Points
News Feed	Fan-out strategy, EdgeRank algorithm
Instagram	Photo upload, follow graph, timeline
Facebook Messaging	Real-time chat, message sync
Friend Recommendations	Graph DB, mutual friend calculation

Amazon

Problem	Key Points
E-commerce Cart	Session storage, inventory management, payment processing
Recommendation System	Collaborative Filtering, real-time vs batch
Amazon S3	Object storage, eleven 9s (99.999999999%) durability, multipart upload
Order Processing	Distributed transactions, Saga pattern

Netflix

Problem	Key Points
Video Streaming	ABR, CDN, transcoding
Recommendation System	A/B testing, personalized ML
API Gateway	Rate limiting, circuit breaker
Notifications	Real-time notification system

8. Key Design Patterns Summary

Database Patterns

Read scaling:    Master-Slave replication + read-only Replicas
Write scaling:   Sharding (Consistent Hashing)
Caching:         Redis (in-memory) → reduces DB load
Search:          Elasticsearch → full-text search
Time-series:     InfluxDB / TimescaleDB
Graph data:      Neo4j / Amazon Neptune

Async Processing Patterns

Task queue:       Async processing of heavy tasks (transcoding, email)
Event streaming:  Kafka → propagate events between services
CQRS:             Separate read/write models for performance
Saga pattern:     Handle distributed transactions

Quiz

Quiz 1: What is the difference between 301 and 302 redirects in a URL Shortener, and when should each be used?

Answer: 301 is a permanent redirect (Moved Permanently); 302 is a temporary redirect (Found).

Explanation: With a 301, browsers may cache the redirect, so repeat visits can skip the server entirely. That reduces server load but leaves click analytics incomplete. With a 302, every request goes through the server, enabling click tracking, A/B testing, and easy URL changes. If a URL shortener service collects advertising or analytics data, use 302. If minimizing server load is the priority, use 301.

Quiz 2: What problem occurs with Fan-out on write when a celebrity account with 100 million followers posts a tweet?

Answer: A "write storm" (also called the "Celebrity Problem" or "hotspot problem") occurs.

Explanation: Delivering a tweet ID via fan-out on write to 100 million followers requires 100 million write operations to Redis. This can delay delivery by minutes and consumes enormous resources. Twitter has reportedly handled this with a follower threshold (the roughly 10,000 figure used here is illustrative): celebrity accounts use fan-out on read instead. When a user loads their timeline, the system separately fetches the latest tweets from followed celebrities and merges them in.

Quiz 3: Name three reasons why Cassandra is well-suited for chat message storage.

Answer: High write throughput, time-ordered sorting support, and horizontal scalability.

Explanation: (1) Cassandra's LSM-Tree storage turns writes into sequential appends (commit log plus memtable flushes) rather than random disk I/O, which lets a cluster sustain very high insert rates. (2) Using a Snowflake ID as the Clustering Key keeps messages sorted by time within each partition, making latest-message queries highly efficient. (3) Adding nodes scales throughput close to linearly, and data is rebalanced across the cluster automatically. In contrast, a single MySQL instance runs into index-maintenance overhead and vertical scaling limits on very large message tables.

Quiz 4: In CAP Theorem, which type (CP or AP) should a banking system and a social media like count be, and why?

Answer: Banking systems should be CP; social media like counts should be AP.

Explanation: Bank account balances require absolute accuracy. During a network partition, a temporary service outage is preferable to showing an incorrect balance — so CP is chosen for strong consistency. In contrast, the difference between 1,234,567 and 1,234,570 likes barely affects user experience. Keeping the service running (availability) during a partition is more important, so AP is chosen with eventual consistency accepted.

Quiz 5: Why is Adaptive Bitrate Streaming (ABR) necessary for video streaming services, and how does it work?

Answer: To provide uninterrupted streaming to users across diverse network conditions.

Explanation: Mobile users move between 4G and WiFi, above and below ground, causing rapid bandwidth fluctuations. Fixed-quality streaming causes buffering when the network degrades. ABR splits video into 2-10 second segments and pre-encodes them at multiple qualities (360p to 4K). The client player monitors buffer levels and download speeds, dynamically selecting the quality for the next segment. When the network degrades, it switches to lower quality to maintain uninterrupted playback.