FAANG System Design Interview Complete Guide

The system design interview is one of the most critical stages for senior engineering positions. This guide systematically covers the most frequently asked problems at FAANG (Facebook/Meta, Apple, Amazon, Netflix, Google) interviews.


1. System Design Interview Framework: RESHADED

The RESHADED framework helps you structure answers systematically within a 45-minute interview.

Step                          Content                                           Time
R - Requirements              Clarify functional/non-functional requirements    5 min
E - Estimation                Scale estimation (DAU, QPS, Storage)              5 min
S - Storage                   Data model and DB selection                       5 min
H - High-level design         Draft overall architecture                        10 min
A - APIs                      Design API endpoints                              5 min
D - Detailed design           Deep dive into key components                     10 min
E - Evaluation                Analyze trade-offs and bottlenecks                3 min
D - Distinguishing features   Propose differentiating enhancements              2 min

45-Minute Time Allocation Strategy

[0-5 min]   Gather requirements and define scope
[5-10 min]  Capacity estimation (Back-of-envelope calculation)
[10-20 min] High-level design (architecture diagram)
[20-35 min] Detailed design of key components
[35-43 min] Discuss trade-offs and improvements
[43-45 min] Handle interviewer questions

Good vs Bad Interviewee Behavior

Good interviewee:

  • Clarifies requirements first and explicitly states assumptions
  • Voluntarily discusses trade-offs while designing
  • Justifies design decisions with numbers
  • Explains clearly: "This approach has issue X, but I chose it because of Y"
  • Treats the interviewer as a partner in a collaborative dialogue

Bad interviewee:

  • Starts designing without clarifying requirements
  • Presents only one solution without considering alternatives
  • Gives abstract answers without concrete technology choices
  • Long silences or thinking only internally
  • Ignores hints from the interviewer

2. Key Concept Quick Review

2.1 Horizontal vs Vertical Scaling

Vertical Scaling (Scale-up):

  • Upgrade CPU, RAM, or disk on a single server
  • Pros: Simple, easy to maintain data consistency
  • Cons: Hardware limits exist, SPOF (Single Point of Failure) risk

Horizontal Scaling (Scale-out):

  • Add server instances to distribute load
  • Pros: Theoretically infinite scaling, easier fault recovery
  • Cons: State sharing complexity, network overhead

Vertical:   [Server 4GB RAM] → [Server 32GB RAM]
Horizontal: [Server] + [Server] + [Server] → placed behind Load Balancer

2.2 Load Balancer Algorithms

Algorithm              Description                                       Use Case
Round Robin            Distribute requests sequentially                  When server specs are identical
Least Connections      Route to server with fewest active connections    When request handling time varies
IP Hash                Fixed distribution based on client IP             When session persistence is required
Weighted Round Robin   Weight servers by capacity                        When server specs differ
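
The weighted variant can be sketched in a few lines of Python; the server names and weights here are illustrative, and real balancers (e.g. nginx) use a smoother interleaving algorithm rather than this naive expansion:

```python
import itertools

def weighted_round_robin(servers):
    """Yield server names in proportion to their integer weights.

    Naive expansion: a server with weight 3 appears 3 times per cycle.
    """
    expanded = [name for name, weight in servers.items() for _ in range(weight)]
    return itertools.cycle(expanded)

# "big" can handle 3x the load of "small"
picker = weighted_round_robin({"big": 3, "small": 1})
first_four = [next(picker) for _ in range(4)]  # one full cycle: 3x big, 1x small
```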

2.3 CDN (Content Delivery Network)

A CDN caches static content (images, JS, CSS, video) on edge servers worldwide, serving users from geographically closer locations.

User → [Nearest CDN Edge] → (Cache Hit) → Return content
                          → (Cache Miss) → [Origin Server] → Cache in CDN → Return content

Push CDN vs Pull CDN:

  • Push CDN: Pre-upload content to CDN (suitable for large static files)
  • Pull CDN: Fetch from Origin on first request, then cache (suitable for dynamic content)

2.4 Caching Strategies

Cache-aside (Lazy Loading):

1. App queries cache for data
2. Cache miss → query DB
3. Store DB result in cache
4. Subsequent requests hit cache
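
The four steps above can be sketched in Python; the `db` and `cache` dicts are in-memory stand-ins for a real database and Redis:

```python
# Cache-aside sketch: plain dicts stand in for the real database and Redis.
db = {"user:1": {"name": "alice"}}
cache = {}

def get_user(key):
    if key in cache:            # 1. query cache first
        return cache[key]
    value = db.get(key)         # 2. cache miss -> query DB
    if value is not None:
        cache[key] = value      # 3. store DB result in cache
    return value                # 4. subsequent requests hit the cache

get_user("user:1")  # miss: reads DB and fills the cache
get_user("user:1")  # hit: served straight from the cache
```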

Write-through:

1. App writes to cache
2. Cache synchronously writes to DB
Guarantees data consistency, incurs write latency

Write-back (Write-behind):

1. App writes only to cache
2. Cache asynchronously writes to DB
Excellent write performance, risk of data loss on cache failure

TTL (Time-To-Live): Sets cache expiration. Too short reduces cache efficiency; too long risks stale data.

2.5 Database: SQL vs NoSQL Selection Criteria

Choose SQL (Relational DB) when:

  • ACID transactions are required (payments, inventory)
  • Data structure is clear and changes infrequently
  • Complex JOIN queries are needed
  • Examples: PostgreSQL, MySQL

Choose NoSQL when:

  • Schema is flexible or changes frequently
  • Horizontal scaling is essential for large-scale services
  • Read performance is critical
  • Examples: MongoDB (document), Cassandra (column), Redis (key-value), Neo4j (graph)

2.6 CAP Theorem in Practice

CAP Theorem: A distributed system can only guarantee 2 of: Consistency, Availability, Partition tolerance simultaneously.

In practice, network Partition (P) is unavoidable, so:
  CP systems: Banking, stock trading (consistency first)
  AP systems: Social feeds, shopping carts (availability first)

System      Type           Reason
Zookeeper   CP             Distributed locks, config management
Cassandra   AP             Always writable, eventual consistency
HBase       CP             Strong consistency
DynamoDB    AP (default)   Configurable to CP

2.7 Consistent Hashing

Problem with regular hashing: Adding/removing servers causes most keys to be remapped.

Consistent Hashing: Servers and keys are placed on a ring structure. Adding/removing a server only remaps a fraction of keys.

Ring structure (0 to 2^32 - 1):

        0
       /|\
      / | \
Server A  Server B
    \   |   /
     Server C

Keys are assigned to the nearest server clockwise
When a server is removed, only its keys move to the next server

Virtual Nodes: Place each server multiple times on the ring to ensure even distribution.
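
A minimal ring with virtual nodes can be sketched as follows; MD5 is used purely as an illustrative hash function, and the vnode count is arbitrary:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of consistent hashing with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        # Each server is placed `vnodes` times on the ring for even spread.
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, key):
        # Assign the key to the nearest server clockwise on the ring.
        h = self._hash(key)
        idx = bisect.bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["A", "B", "C"])
server = ring.get_server("some-url-key")  # deterministic assignment
```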

2.8 Message Queue

Message Queues enable asynchronous communication between services and achieve loose coupling.

Kafka:

  • High throughput, persistence (log retention), Consumer Group support
  • Use: Event streaming, log aggregation, real-time analytics

RabbitMQ:

  • Complex routing, various messaging patterns
  • Use: Task queues, notification systems

Producer → [Message Queue] → Consumer

Benefits:
- Asynchronous processing reduces response time
- Traffic buffering (absorbs traffic spikes)
- Reduced coupling between services
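
The decoupling can be illustrated with Python's in-process `queue.Queue` standing in for a real broker like Kafka or RabbitMQ:

```python
import queue
import threading

mq = queue.Queue()   # stand-in for the message broker
processed = []

def consumer():
    # Consumer drains the queue at its own pace, independent of the producer.
    while True:
        msg = mq.get()
        if msg is None:  # sentinel: shut down
            break
        processed.append(f"handled:{msg}")

t = threading.Thread(target=consumer)
t.start()
for i in range(3):
    mq.put(i)          # producer returns immediately (asynchronous handoff)
mq.put(None)
t.join()
```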

3. URL Shortener Design (TinyURL / bit.ly)

3.1 Requirements Clarification

Functional Requirements:

  • Accept a long URL and generate a short URL
  • Redirect short URL to original URL
  • (Optional) Custom short URL alias
  • (Optional) URL expiration date

Non-Functional Requirements:

  • DAU: 100M (100 million daily active users)
  • Read:Write ratio = 100:1 (reads vastly outnumber writes)
  • Availability: 99.9% SLA
  • Redirect latency: under 100ms

3.2 Capacity Estimation

Write QPS:
  - New URLs per day: 100M / 100 = 1M per day
  - Write QPS: 1,000,000 / 86,400 ≈ 12 QPS

Read QPS:
  - Read = Write × 100 = 1,200 QPS
  - Peak: 1,200 × 5 = 6,000 QPS

Storage:
  - 1 URL record ≈ 500 bytes
  - 10-year data: 1M × 365 × 10 × 500 bytes ≈ 1.8 TB

Cache:
  - 80/20 rule: top 20% URLs generate 80% of traffic
  - Cache size: 1,200 QPS × 86,400 × 0.2 × 500 bytes ≈ 10 GB/day

3.3 API Design

POST /api/v1/urls
  Request:  { "long_url": "https://example.com/...", "expire_date": "2027-01-01" }
  Response: { "short_url": "https://tinyurl.com/abc123" }

GET /{short_code}
  Response: 301 Redirect (permanent) or 302 Redirect (temporary)

301: Browser caches redirect, reduces server load (no analytics)
302: Always goes through server (enables click analytics)

3.4 Base62 Encoding

Character set: [0-9, a-z, A-Z] = 62 characters
6-char Base62 = 62^6 ≈ 56.8 billion combinations
7-char Base62 = 62^7 ≈ 3.5 trillion combinations

long_url → MD5/SHA-256 → take first 7 bytes → Base62 encode → short code

Example:
  "https://www.example.com/long-path"
  → MD5: "1a2b3c4d..."
  → integer of first 7 bytes → Base62 → "aB3xY9z"
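
A sketch of this pipeline in Python; the helper names are illustrative, and slicing to 7 characters mirrors the 7-byte extraction above:

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n):
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(BASE62[r])
    return "".join(reversed(digits))

def shorten(long_url, length=7):
    """Hash the URL, take the first 7 bytes of the digest, Base62-encode."""
    digest = hashlib.md5(long_url.encode()).digest()
    n = int.from_bytes(digest[:7], "big")
    return base62_encode(n)[:length]  # truncate if the encoding overflows

code = shorten("https://www.example.com/long-path")  # deterministic short code
```

Note that hash-based codes can collide, so a production service would check the DB for an existing `short_code` and re-hash with a salt (or use a counter/Snowflake ID instead) on conflict.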

3.5 DB Schema Design

CREATE TABLE url_mappings (
  id          BIGINT PRIMARY KEY,        -- Snowflake ID
  short_code  VARCHAR(8) UNIQUE NOT NULL,
  long_url    TEXT NOT NULL,
  user_id     BIGINT,
  created_at  TIMESTAMP DEFAULT NOW(),
  expire_at   TIMESTAMP,
  click_count BIGINT DEFAULT 0
);

CREATE INDEX idx_short_code ON url_mappings(short_code);

DB Choice: MySQL (simple key-value lookup, ACID not required) or NoSQL (Cassandra) for horizontal scaling.

3.6 Overall Architecture Diagram

[Client]
    |
    v
[DNS] + [CDN] (static assets)
    |
    v
[Load Balancer]
    |
    +-----------+-----------+
    v           v           v
[API Server] [API Server] [API Server]
    |               |
    v               v
[Redis Cache]  [URL DB (Master)]
(short URL cache)   |
                [URL DB (Replica)]
                (read-only)

Write flow:
  Client → LB → API Server → Save to DB → Update Redis cache → Return short URL

Read flow:
  Client → LB → API Server → Check Redis cache
    Cache hit:  immediate redirect
    Cache miss: query DB → store in Redis → redirect

3.7 High Availability

  • Redis Cluster: Cache short URLs in Redis to reduce DB load by 90%
  • DB Replication: Master-Slave for read/write separation
  • Rate Limiting: Block excessive short URL creation from the same IP
  • Uniqueness: Use UUID or Snowflake ID to prevent duplicate short codes

4. Twitter/X Feed System Design

4.1 Requirements

  • DAU: 300M
  • Tweet creation: 5M per day
  • Timeline loading: latest 20 tweets
  • Follower count: average 300, max 100M (celebrities)

4.2 Fan-out Strategies

Fan-out on write (Push model):

On tweet creation:
  1. Save tweet to DB
  2. Retrieve all follower IDs
  3. Insert tweet ID into each follower's timeline cache (Redis)

Pros: Reads are very fast (direct lookup from Redis)
Cons: 100M followers = 100M writes → high write latency

Fan-out on read (Pull model):

On timeline fetch:
  1. Retrieve list of followed accounts
  2. Fetch latest tweets from each account
  3. Merge and sort by time

Pros: Simple writes
Cons: Read queries scale with follow count → high read latency

Hybrid Strategy (actual Twitter approach):

Regular users (followers < 10,000):  Fan-out on write
Celebrity users (followers >= 10,000): Fan-out on read

Timeline generation:
  1. Retrieve tweet IDs from Redis for followed regular users (fan-out on write results)
  2. Fetch latest tweets from followed celebrities (fan-out on read)
  3. Merge and sort both result sets
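
The merge step can be sketched as follows, with hard-coded `(timestamp, tweet_id)` pairs standing in for the Redis lookup and the celebrity fetch:

```python
import heapq
import itertools

# Each source is already sorted newest-first, as Redis ZREVRANGE and a
# per-author "latest tweets" query would return them.
precomputed = [(1700000005, "t5"), (1700000002, "t2")]  # fan-out-on-write (Redis)
celebrity   = [(1700000004, "t4"), (1700000001, "t1")]  # fan-out-on-read (fetched)

def merge_timeline(*sources, limit=20):
    """Merge descending-sorted tweet lists, keeping the newest `limit`."""
    merged = heapq.merge(*sources, reverse=True)
    return [tweet_id for _, tweet_id in itertools.islice(merged, limit)]

timeline = merge_timeline(precomputed, celebrity, limit=20)
# -> ["t5", "t4", "t2", "t1"]
```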

4.3 Timeline Caching (Redis Sorted Set)

Key:   timeline:{user_id}
Value: Sorted Set (score = tweet timestamp, member = tweet_id)

Example:
  ZADD timeline:123 1700000001 tweet:456
  ZADD timeline:123 1700000002 tweet:789
  ZREVRANGE timeline:123 0 19  # Get latest 20 tweets

Keep a max of 1,000 tweet IDs per user (memory management)

4.4 Tweet ID Generation (Snowflake ID)

Twitter's 64-bit distributed ID generation scheme:

64-bit layout:
  [1 bit: sign] [41 bits: timestamp] [10 bits: machine ID] [12 bits: sequence]

Benefits:
  - Time-sortable (timestamp embedded)
  - No collisions in distributed environments
  - Generates up to 4,096 IDs per millisecond per machine (12-bit sequence)
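
A simplified single-process sketch of this layout; `EPOCH` is Twitter's commonly cited custom epoch (2010-11-04, in ms), and clock-skew handling is omitted:

```python
import threading
import time

class SnowflakeGenerator:
    """Sketch of the 64-bit Snowflake layout:
    41-bit ms timestamp | 10-bit machine ID | 12-bit per-ms sequence."""

    EPOCH = 1288834974657  # Twitter's custom epoch in milliseconds

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:
                    # Sequence exhausted this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=1)
a, b = gen.next_id(), gen.next_id()  # strictly increasing, time-sortable
```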

4.5 Overall Architecture

[Mobile/Web Clients]
         |
    [API Gateway]
         |
   +-----------+
   |           |
[Tweet Service] [Timeline Service]
   |                |
[Tweet DB]   [Redis Timeline Cache]
(Cassandra)          |
                [Fan-out Service]
                (Consumes Kafka,
                 updates follower timelines)
                     |
              [Follower Graph DB]
              (stores follow relationships)

Media handling:
[Image/Video Upload] → [Object Storage (S3)] → [CDN]

5. YouTube / Netflix Video Streaming Design

5.1 Requirements

  • DAU: 2B (YouTube scale)
  • Video uploads: 500 hours/minute
  • Concurrent streams: hundreds of millions
  • Supported resolutions: 360p to 4K

5.2 Video Upload Pipeline

[User]
   |
   v
[Upload Service] → Store raw file in S3
   |
   v
[Message Queue (Kafka)]
   |
   v
[Transcoding Workers] (parallel processing)
   ├── 360p encode
   ├── 720p encode
   ├── 1080p encode
   └── 4K encode
   |
   v
[Distribute to CDN]
   |
   v
[Update Metadata DB] → Notify user of completion

Transcoding Optimization:

  • DAG (Directed Acyclic Graph)-based task splitting
  • Split video into GOP (Group of Pictures) units for parallel processing
  • Watermarking and thumbnail generation also included in pipeline

5.3 Adaptive Bitrate Streaming (ABR)

Automatically switches quality based on user network conditions:

[Good network]     → Request 1080p/4K segments
[Moderate network] → Request 720p segments
[Poor network]     → Request 360p segments

HLS (HTTP Live Streaming):
  - Video split into 2-10 second segments (.ts files)
  - M3U8 playlist file manages segment list
  - Client monitors buffer state and selects quality for next segment
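
A toy version of the client's quality decision; the bitrate ladder and buffer thresholds are illustrative, not taken from any real player:

```python
# Bitrate ladder: (required kbps, quality label), in ascending order.
LADDER = [(400, "360p"), (1500, "720p"), (4000, "1080p"), (12000, "4K")]

def next_quality(throughput_kbps, buffer_seconds):
    """Pick the highest quality whose bitrate fits the bandwidth budget."""
    # Be conservative when the buffer is low: use only half the measured
    # throughput; with a healthy buffer, use 80%.
    budget = throughput_kbps * (0.5 if buffer_seconds < 5 else 0.8)
    chosen = LADDER[0][1]  # worst case, fall back to the lowest rung
    for bitrate, label in LADDER:
        if bitrate <= budget:
            chosen = label
    return chosen

next_quality(6000, buffer_seconds=20)  # healthy buffer -> "1080p"
next_quality(6000, buffer_seconds=2)   # low buffer -> drops to "720p"
```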

5.4 CDN Strategy

[Origin Server] → [Regional CDN] → [Edge CDN] → [User]

Popular videos: Pre-cached on multiple edge nodes (Push CDN)
Less popular:   Fetched from Origin on first request (Pull CDN)

Netflix approach: Partners with ISPs to place cache servers inside ISP networks
                  (OCA: Open Connect Appliance)

5.5 Overall Architecture

Upload path:
[Creator] → [Upload API] → [S3] → [Kafka] → [Transcoding Cluster]
                                                     |
                                              [CDN Distribution]

Viewing path:
[User] → [API Gateway] → [Video Service]
                              |
                +-------------+-------------+
                |             |             |
           [Metadata]    [CDN Stream]  [Recommendation]
           (MySQL)       (HLS/DASH)    (ML Model)

6. Chat System Design (WhatsApp / Slack)

6.1 Requirements

  • DAU: 500M
  • 1:1 chat, group chat (up to 500 members)
  • Message delivery latency: under 100ms
  • Read receipts (WhatsApp blue ticks)
  • Online presence indicator

6.2 Real-Time Communication Protocol Comparison

Approach                   Behavior                         Pros                         Cons
Long Polling               Hold connection until response   Simple to implement          Wastes server resources
SSE (Server-Sent Events)   Server-to-client one-way         Good for one-way streaming   Cannot push from client
WebSocket                  Full-duplex bidirectional        Low latency, bidirectional   Complex connection management

Chat system choice: WebSocket

Client ←→ WebSocket connection ←→ Chat Server
  (persistent bidirectional communication)

6.3 Message Storage Strategy

Message characteristics:

  • Very high write frequency
  • Reads focus on recent messages
  • Rare deletions, no modifications

Why Cassandra / HBase:

Cassandra schema:
  Partition Key: channel_id
  Clustering Key: message_id (Snowflake, time-ordered)

  CREATE TABLE messages (
    channel_id  UUID,
    message_id  BIGINT,      -- Snowflake ID (time-embedded)
    sender_id   UUID,
    content     TEXT,
    created_at  TIMESTAMP,
    PRIMARY KEY (channel_id, message_id)
  ) WITH CLUSTERING ORDER BY (message_id DESC);

Very efficient for fetching latest messages per channel
Easy horizontal scaling

6.4 Read Receipts and Online Presence

Read Receipts:

Message states: SENT → DELIVERED → READ

1. Message sent → save to DB with state SENT
2. Reaches recipient device → update to DELIVERED → notify sender
3. Recipient views message → update to READ → notify sender

Online Presence:

Method 1: Heartbeat (ping server every 30 seconds)
  - Store user:{id}:last_seen = timestamp in Redis
  - TTL of 60 seconds → offline if no ping for 60+ seconds

Method 2: WebSocket connection tracking
  - Connect = online event, disconnect = offline event
  - Propagate status changes to friends via Pub/Sub
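
Method 1 can be sketched with an in-memory dict standing in for the Redis keys; an explicit `now` parameter makes the 60-second expiry easy to test:

```python
import time

HEARTBEAT_TTL = 60  # seconds without a ping before a user counts as offline
presence = {}       # user_id -> last heartbeat timestamp (stand-in for Redis)

def heartbeat(user_id, now=None):
    """Called every ~30s by the client to refresh its last_seen timestamp."""
    presence[user_id] = now if now is not None else time.time()

def is_online(user_id, now=None):
    now = now if now is not None else time.time()
    last = presence.get(user_id)
    return last is not None and (now - last) < HEARTBEAT_TTL

heartbeat("u1", now=1000)
is_online("u1", now=1030)  # True: pinged 30s ago
is_online("u1", now=1070)  # False: no ping for 70s
```

With real Redis, the TTL on the `user:{id}:last_seen` key does the expiry automatically, so no sweep over stale entries is needed.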

6.5 Group Chat Message Delivery

Small groups (N < 100):
  [Sender] → [Chat Server] → [Direct WebSocket delivery to each member]

Large groups (N >= 100):
  [Sender] → [Chat Server] → [Kafka Topic: group:{id}]
                                    |
                              [Consumer Cluster]
                                    |
                         [Push notification to each member device]

6.6 End-to-End Encryption Overview

WhatsApp's Signal Protocol-based approach:
  1. Each device generates a public/private key pair
  2. Only the public key is registered with the server
  3. Sender encrypts message with recipient's public key
  4. Server relays encrypted message (cannot decrypt content)
  5. Recipient decrypts with their private key

7. Frequently Asked Problems by Company

Google

Problem         Key Points
Search Engine   Crawler, indexing, PageRank, autocomplete
Web Crawler     URL frontier, deduplication, robots.txt, politeness
Google Maps     Map tiles, pathfinding (Dijkstra), ETA prediction
Google Drive    File upload/download, real-time co-editing, versioning

Meta (Facebook/Instagram)

Problem                  Key Points
News Feed                Fan-out strategy, EdgeRank algorithm
Instagram                Photo upload, follow graph, timeline
Facebook Messaging       Real-time chat, message sync
Friend Recommendations   Graph DB, mutual friend calculation

Amazon

Problem                 Key Points
E-commerce Cart         Session storage, inventory management, payment processing
Recommendation System   Collaborative Filtering, real-time vs batch
Amazon S3               Object storage, 11-9s durability, multipart upload
Order Processing        Distributed transactions, Saga pattern

Netflix

Problem                 Key Points
Video Streaming         ABR, CDN, transcoding
Recommendation System   A/B testing, personalized ML
API Gateway             Rate limiting, circuit breaker
Notifications           Real-time notification system

8. Key Design Patterns Summary

Database Patterns

Read scaling:    Master-Slave replication + read-only Replicas
Write scaling:   Sharding (Consistent Hashing)
Caching:         Redis (in-memory) → reduces DB load
Search:          Elasticsearch → full-text search
Time-series:     InfluxDB / TimescaleDB
Graph data:      Neo4j / Amazon Neptune

Async Processing Patterns

Task queue:       Async processing of heavy tasks (transcoding, email)
Event streaming:  Kafka → propagate events between services
CQRS:             Separate read/write models for performance
Saga pattern:     Handle distributed transactions

Quiz

Quiz 1: What is the difference between 301 and 302 redirects in a URL Shortener, and when should each be used?

Answer: 301 is a permanent redirect (Moved Permanently); 302 is a temporary redirect (Found/Temporary Redirect).

Explanation: With a 301, the browser caches the redirect, so subsequent requests skip the server entirely — reducing server load but making click analytics impossible. With a 302, every request goes through the server, enabling click tracking, A/B testing, and easy URL changes. If a URL shortener service collects advertising or analytics data, use 302. If minimizing server load is the priority, use 301.

Quiz 2: What problem occurs with Fan-out on write when a celebrity account with 100 million followers posts a tweet?

Answer: A "write storm" (also called the "Celebrity Problem" or "hotspot problem") occurs.

Explanation: Delivering a tweet ID via fan-out on write to 100 million followers requires 100 million write operations to Redis. This causes delays of several minutes and enormous resource consumption. Real Twitter applies a threshold (roughly 10,000 followers): celebrity accounts use fan-out on read instead. When a user loads their timeline, the system separately fetches the latest tweets from followed celebrities and merges them in.

Quiz 3: Name three reasons why Cassandra is well-suited for chat message storage.

Answer: High write throughput, time-ordered sorting support, and horizontal scalability.

Explanation: (1) Cassandra's LSM-Tree structure supports sequential writes without random disk I/O, enabling hundreds of thousands of message inserts per second. (2) Using a Snowflake ID as the Clustering Key automatically sorts messages by timestamp, making latest-message queries highly efficient. (3) Adding nodes provides linear performance scaling with automatic data rebalancing. In contrast, MySQL suffers from index overhead and vertical scaling limits on large message tables.

Quiz 4: In CAP Theorem, which type (CP or AP) should a banking system and a social media like count be, and why?

Answer: Banking systems should be CP; social media like counts should be AP.

Explanation: Bank account balances require absolute accuracy. During a network partition, a temporary service outage is preferable to showing an incorrect balance — so CP is chosen for strong consistency. In contrast, the difference between 1,234,567 and 1,234,570 likes barely affects user experience. Keeping the service running (availability) during a partition is more important, so AP is chosen with eventual consistency accepted.

Quiz 5: Why is Adaptive Bitrate Streaming (ABR) necessary for video streaming services, and how does it work?

Answer: To provide uninterrupted streaming to users across diverse network conditions.

Explanation: Mobile users move between 4G and WiFi, above and below ground, causing rapid bandwidth fluctuations. Fixed-quality streaming causes buffering when the network degrades. ABR splits video into 2-10 second segments and pre-encodes them at multiple qualities (360p to 4K). The client player monitors buffer levels and download speeds, dynamically selecting the quality for the next segment. When the network degrades, it switches to lower quality to maintain uninterrupted playback.