Author: Youngju Kim (@fjvbn20031)
FAANG System Design Interview Complete Guide
The system design interview is one of the most critical stages for senior engineering positions. This guide systematically covers the most frequently asked problems at FAANG (Facebook/Meta, Apple, Amazon, Netflix, Google) interviews.
1. System Design Interview Framework: RESHADED
The RESHADED framework helps you structure answers systematically within a 45-minute interview.
| Step | Content | Time |
|---|---|---|
| R - Requirements | Clarify functional/non-functional requirements | 5 min |
| E - Estimation | Scale estimation (DAU, QPS, Storage) | 5 min |
| S - Storage | Data model and DB selection | 5 min |
| H - High-level design | Draft overall architecture | 10 min |
| A - APIs | Design API endpoints | 5 min |
| D - Detailed design | Deep dive into key components | 10 min |
| E - Evaluation | Analyze trade-offs and bottlenecks | 3 min |
| D - Distinguishing features | Propose differentiating enhancements | 2 min |
45-Minute Time Allocation Strategy
[0-5 min] Gather requirements and define scope
[5-10 min] Capacity estimation (Back-of-envelope calculation)
[10-20 min] High-level design (architecture diagram)
[20-35 min] Detailed design of key components
[35-43 min] Discuss trade-offs and improvements
[43-45 min] Handle interviewer questions
Good vs Bad Interviewee Behavior
Good interviewee:
- Clarifies requirements first and explicitly states assumptions
- Voluntarily discusses trade-offs while designing
- Justifies design decisions with numbers
- Explains clearly: "This approach has issue X, but I chose it because of Y"
- Treats the interviewer as a partner in a collaborative dialogue
Bad interviewee:
- Starts designing without clarifying requirements
- Presents only one solution without considering alternatives
- Gives abstract answers without concrete technology choices
- Long silences or thinking only internally
- Ignores hints from the interviewer
2. Key Concept Quick Review
2.1 Horizontal vs Vertical Scaling
Vertical Scaling (Scale-up):
- Upgrade CPU, RAM, or disk on a single server
- Pros: Simple, easy to maintain data consistency
- Cons: Hardware limits exist, SPOF (Single Point of Failure) risk
Horizontal Scaling (Scale-out):
- Add server instances to distribute load
- Pros: Theoretically infinite scaling, easier fault recovery
- Cons: State sharing complexity, network overhead
Vertical: [Server 4GB RAM] → [Server 32GB RAM]
Horizontal: [Server] + [Server] + [Server] → placed behind Load Balancer
2.2 Load Balancer Algorithms
| Algorithm | Description | Use Case |
|---|---|---|
| Round Robin | Distribute requests sequentially | When server specs are identical |
| Least Connections | Route to server with fewest active connections | When request handling time varies |
| IP Hash | Fixed distribution based on client IP | When session persistence is required |
| Weighted Round Robin | Weight servers by capacity | When server specs differ |
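The table above is easy to make concrete in code. Below is a minimal sketch of Weighted Round Robin (the server names and weights are illustrative, not from any real deployment):

```python
import itertools

def weighted_round_robin(servers):
    """Yield server names in proportion to their weights.

    `servers` maps server name -> integer weight. Expanding the pool
    is the simplest (not the most memory-efficient) implementation.
    """
    pool = []
    for name, weight in servers.items():
        pool.extend([name] * weight)
    return itertools.cycle(pool)

# A "big" server with 3x the capacity of a "small" one:
balancer = weighted_round_robin({"big": 3, "small": 1})
first_eight = [next(balancer) for _ in range(8)]
# "big" receives 3 of every 4 requests across each cycle
```

Production balancers (e.g. NGINX) interleave weighted picks more smoothly, but the proportion of traffic per server is the same idea.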
2.3 CDN (Content Delivery Network)
A CDN caches static content (images, JS, CSS, video) on edge servers worldwide, serving users from geographically closer locations.
User → [Nearest CDN Edge] → (Cache Hit) → Return content
→ (Cache Miss) → [Origin Server] → Cache in CDN → Return content
Push CDN vs Pull CDN:
- Push CDN: Pre-upload content to CDN (suitable for large static files)
- Pull CDN: Fetch from Origin on first request, then cache (suitable for dynamic content)
2.4 Caching Strategies
Cache-aside (Lazy Loading):
1. App queries cache for data
2. Cache miss → query DB
3. Store DB result in cache
4. Subsequent requests hit cache
Write-through:
1. App writes to cache
2. Cache synchronously writes to DB
→ Guarantees data consistency, incurs write latency
Write-back (Write-behind):
1. App writes only to cache
2. Cache asynchronously writes to DB
→ Excellent write performance, risk of data loss on cache failure
TTL (Time-To-Live): Sets cache expiration. Too short reduces cache efficiency; too long risks stale data.
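The cache-aside flow above can be sketched as follows. The dict-backed `db` and the TTL value are stand-ins for a real database and Redis/Memcached:

```python
import time

class CacheAside:
    """Minimal cache-aside (lazy loading) sketch with a per-key TTL."""

    def __init__(self, db, ttl_seconds=60):
        self.db = db            # any dict-like backing store
        self.ttl = ttl_seconds
        self.cache = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                          # 1. cache hit
        value = self.db[key]                         # 2. miss -> query DB
        self.cache[key] = (value, time.time() + self.ttl)  # 3. populate cache
        return value                                 # 4. later calls hit cache

db = {"abc123": "https://example.com/long-path"}
store = CacheAside(db)
store.get("abc123")  # miss: reads DB, fills cache
store.get("abc123")  # hit: served from cache
```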
2.5 Database: SQL vs NoSQL Selection Criteria
Choose SQL (Relational DB) when:
- ACID transactions are required (payments, inventory)
- Data structure is clear and changes infrequently
- Complex JOIN queries are needed
- Examples: PostgreSQL, MySQL
Choose NoSQL when:
- Schema is flexible or changes frequently
- Horizontal scaling is essential for large-scale services
- Read performance is critical
- Examples: MongoDB (document), Cassandra (column), Redis (key-value), Neo4j (graph)
2.6 CAP Theorem in Practice
CAP Theorem: A distributed system can simultaneously guarantee at most two of Consistency, Availability, and Partition tolerance.
In practice, network Partition (P) is unavoidable, so:
CP systems: Banking, stock trading (consistency first)
AP systems: Social feeds, shopping carts (availability first)
| System | Type | Reason |
|---|---|---|
| Zookeeper | CP | Distributed locks, config management |
| Cassandra | AP | Always writable, eventual consistency |
| HBase | CP | Strong consistency |
| DynamoDB | AP (default) | Configurable to CP |
2.7 Consistent Hashing
Problem with regular hashing: Adding/removing servers causes most keys to be remapped.
Consistent Hashing: Servers and keys are placed on a ring structure. Adding/removing a server only remaps a fraction of keys.
Ring structure (0 to 2^32 - 1):

            0
        Server A
       /        \
  Server C      Server B
       \________/

Keys are assigned to the nearest server clockwise.
When a server is removed, only its keys move to the next server clockwise.
Virtual Nodes: Place each server multiple times on the ring to ensure even distribution.
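A minimal consistent-hash ring with virtual nodes, assuming MD5 as the ring hash (any uniform hash works; this sketch uses the full 128-bit digest space, but the 2^32 ring above is the same idea):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing sketch with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                # Each server appears `vnodes` times for even distribution
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, key):
        h = self._hash(key)
        # First virtual node clockwise from the key's position
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["A", "B", "C"])
ring.get_server("user:42")  # same key always maps to the same server
```

Removing a server means deleting only its virtual nodes; keys that hashed between a removed node and its predecessor simply fall through to the next node clockwise.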
2.8 Message Queue
Message Queues enable asynchronous communication between services and achieve loose coupling.
Kafka:
- High throughput, persistence (log retention), Consumer Group support
- Use: Event streaming, log aggregation, real-time analytics
RabbitMQ:
- Complex routing, various messaging patterns
- Use: Task queues, notification systems
Producer → [Message Queue] → Consumer
Benefits:
- Asynchronous processing reduces response time
- Traffic buffering (absorbs traffic spikes)
- Reduced coupling between services
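The producer/consumer decoupling above can be demonstrated with Python's standard `queue` module standing in for Kafka/RabbitMQ (the sentinel-based shutdown is an illustrative choice):

```python
import queue
import threading

tasks = queue.Queue(maxsize=100)  # bounded: applies backpressure when full
results = []

def consumer():
    while True:
        msg = tasks.get()
        if msg is None:           # sentinel: shut the worker down
            break
        results.append(f"processed:{msg}")
        tasks.task_done()

worker = threading.Thread(target=consumer)
worker.start()

for i in range(3):
    tasks.put(f"event-{i}")       # producer returns immediately (async)
tasks.put(None)
worker.join()
# results now holds processed:event-0 .. processed:event-2 in order
```

The producer never waits for processing, and a burst of puts is simply buffered, which is exactly the traffic-absorption benefit listed above.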
3. URL Shortener Design (TinyURL / bit.ly)
3.1 Requirements Clarification
Functional Requirements:
- Accept a long URL and generate a short URL
- Redirect short URL to original URL
- (Optional) Custom short URL alias
- (Optional) URL expiration date
Non-Functional Requirements:
- DAU: 100M (100 million daily active users)
- Read:Write ratio = 100:1 (reads vastly outnumber writes)
- Availability: 99.9% SLA
- Redirect latency: under 100ms
3.2 Capacity Estimation
Write QPS:
- New URLs per day: 100M / 100 = 1M per day
- Write QPS: 1,000,000 / 86,400 ≈ 12 QPS
Read QPS:
- Read = Write × 100 = 1,200 QPS
- Peak: 1,200 × 5 = 6,000 QPS
Storage:
- 1 URL record ≈ 500 bytes
- 10-year data: 1M × 365 × 10 × 500 bytes ≈ 1.8 TB
Cache:
- 80/20 rule: top 20% URLs generate 80% of traffic
- Cache size: 1,200 QPS × 86,400 × 0.2 × 500 bytes ≈ 10 GB/day
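These back-of-envelope figures are easy to verify in code (all numbers taken from the estimates above):

```python
# URL shortener capacity estimation: 100M DAU, 100:1 read:write ratio
DAU = 100_000_000
writes_per_day = DAU // 100             # 1M new URLs/day
write_qps = writes_per_day / 86_400     # ~12 QPS
read_qps = write_qps * 100              # ~1,200 QPS
peak_read_qps = read_qps * 5            # ~6,000 QPS

record_bytes = 500
ten_year_storage_tb = writes_per_day * 365 * 10 * record_bytes / 1e12
# ~1.8 TB over 10 years
```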
3.3 API Design
POST /api/v1/urls
Request: { "long_url": "https://example.com/...", "expire_date": "2027-01-01" }
Response: { "short_url": "https://tinyurl.com/abc123" }
GET /{short_code}
Response: 301 Redirect (permanent) or 302 Redirect (temporary)
→ 301: Browser caches redirect, reduces server load (no analytics)
→ 302: Always goes through server (enables click analytics)
3.4 Base62 Encoding
Character set: [0-9, a-z, A-Z] = 62 characters
6-char Base62 = 62^6 ≈ 56.8 billion combinations; 7-char = 62^7 ≈ 3.5 trillion (the example below uses a 7-char code)
long_url → MD5/SHA-256 → extract first 7 bytes → Base62 encode → short code
Example:
"https://www.example.com/long-path"
→ MD5: "1a2b3c4d..."
→ integer of first 7 bytes → Base62 → "aB3xY9z"
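A sketch of the hash-then-encode scheme above. The digest and byte choices are illustrative; a real service must still handle collisions (e.g. re-hash with a salt or fall back to a counter):

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n):
    """Encode a non-negative integer in Base62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def short_code(long_url, length=7):
    """Hash the URL, take the first 7 digest bytes as an int, Base62-encode."""
    digest = hashlib.md5(long_url.encode()).digest()
    n = int.from_bytes(digest[:7], "big")
    return base62(n)[:length]  # truncate to the desired code length

code = short_code("https://www.example.com/long-path")
# deterministic short code, at most 7 Base62 characters
```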
3.5 DB Schema Design
CREATE TABLE url_mappings (
id BIGINT PRIMARY KEY, -- Snowflake ID
short_code VARCHAR(8) UNIQUE NOT NULL,
long_url TEXT NOT NULL,
user_id BIGINT,
created_at TIMESTAMP DEFAULT NOW(),
expire_at TIMESTAMP,
click_count BIGINT DEFAULT 0
);
CREATE INDEX idx_short_code ON url_mappings(short_code);
DB Choice: MySQL works well for this simple key-value lookup; since strict ACID transactions are not required, a NoSQL store such as Cassandra is also a good fit when horizontal scaling matters.
3.6 Overall Architecture Diagram
[Client]
|
v
[DNS] → [CDN] (static assets)
|
v
[Load Balancer]
|
+-----------+-----------+
v v v
[API Server] [API Server] [API Server]
| |
v v
[Redis Cache] [URL DB (Master)]
(short URL cache) |
[URL DB (Replica)]
(read-only)
Write flow:
Client → LB → API Server → Save to DB → Update Redis cache → Return short URL
Read flow:
Client → LB → API Server → Check Redis cache
→ Cache hit: immediate redirect
→ Cache miss: query DB → store in Redis → redirect
3.7 High Availability
- Redis Cluster: Cache short URLs in Redis to reduce DB load by 90%
- DB Replication: Master-Slave for read/write separation
- Rate Limiting: Block excessive short URL creation from the same IP
- Uniqueness: Use UUID or Snowflake ID to prevent duplicate short codes
4. Twitter/X Feed System Design
4.1 Requirements
- DAU: 300M
- Tweet creation: 5M per day
- Timeline loading: latest 20 tweets
- Follower count: average 300, max 100M (celebrities)
4.2 Fan-out Strategies
Fan-out on write (Push model):
On tweet creation:
1. Save tweet to DB
2. Retrieve all follower IDs
3. Insert tweet ID into each follower's timeline cache (Redis)
Pros: Reads are very fast (direct lookup from Redis)
Cons: 100M followers = 100M writes → high write latency
Fan-out on read (Pull model):
On timeline fetch:
1. Retrieve list of followed accounts
2. Fetch latest tweets from each account
3. Merge and sort by time
Pros: Simple writes
Cons: Read queries scale with follow count → high read latency
Hybrid Strategy (actual Twitter approach):
Regular users (followers < 10,000): Fan-out on write
Celebrity users (followers >= 10,000): Fan-out on read
Timeline generation:
1. Retrieve tweet IDs from Redis for followed regular users (fan-out on write results)
2. Fetch latest tweets from followed celebrities (fan-out on read)
3. Merge and sort both result sets
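The three-step timeline generation above can be sketched with a heap-based merge (the tweet IDs and timestamps below are made up):

```python
import heapq
from itertools import islice

def build_timeline(precomputed, celebrity_feeds, limit=20):
    """Merge the precomputed (fan-out-on-write) timeline from Redis with
    celebrity tweets fetched on read. Items are (timestamp, tweet_id)."""
    streams = [sorted(precomputed, reverse=True)]        # newest first
    streams += [sorted(f, reverse=True) for f in celebrity_feeds]
    merged = heapq.merge(*streams, reverse=True)          # global newest-first
    return list(islice(merged, limit))

regular = [(1700000002, "t789"), (1700000001, "t456")]    # from Redis cache
celeb = [[(1700000003, "t999")]]                          # fetched on read
build_timeline(regular, celeb, limit=2)
# -> [(1700000003, 't999'), (1700000002, 't789')]
```

`heapq.merge` streams the result lazily, so only the top `limit` tweets are ever materialized regardless of how long the source feeds are.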
4.3 Timeline Caching (Redis Sorted Set)
Key: timeline:{user_id}
Value: Sorted Set (score = tweet timestamp, member = tweet_id)
Example:
ZADD timeline:123 1700000001 tweet:456
ZADD timeline:123 1700000002 tweet:789
ZREVRANGE timeline:123 0 19 # Get latest 20 tweets
Keep a max of 1,000 tweet IDs per user (memory management)
4.4 Tweet ID Generation (Snowflake ID)
Twitter's 64-bit distributed ID generation scheme:
64-bit layout:
[1 bit: sign] [41 bits: timestamp] [10 bits: machine ID] [12 bits: sequence]
Benefits:
- Time-sortable (timestamp embedded)
- No collisions in distributed environments
- Generates up to 4,096 IDs per millisecond per machine (12-bit sequence)
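A sketch of the 64-bit layout above (the epoch constant is Twitter's published custom epoch; the busy-wait on sequence exhaustion is a common simplification, and clock rollback is not handled here):

```python
import threading
import time

class SnowflakeGenerator:
    """41-bit ms timestamp | 10-bit machine ID | 12-bit per-ms sequence."""
    EPOCH = 1288834974657  # Twitter's custom epoch (2010-11-04), in ms

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024      # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:      # 4,096 IDs used this ms: wait
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=1)
a, b = gen.next_id(), gen.next_id()
# IDs are strictly increasing, hence time-sortable
```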
4.5 Overall Architecture
[Mobile/Web Clients]
|
[API Gateway]
|
+-----------+
| |
[Tweet Service] [Timeline Service]
| |
[Tweet DB] [Redis Timeline Cache]
(Cassandra) |
[Fan-out Service]
(Consumes Kafka,
updates follower timelines)
|
[Follower Graph DB]
(stores follow relationships)
Media handling:
[Image/Video Upload] → [Object Storage (S3)] → [CDN]
5. YouTube / Netflix Video Streaming Design
5.1 Requirements
- DAU: 2B (YouTube scale)
- Video uploads: 500 hours/minute
- Concurrent streams: hundreds of millions
- Supported resolutions: 360p to 4K
5.2 Video Upload Pipeline
[User]
|
v
[Upload Service] → Store raw in S3
|
v
[Message Queue (Kafka)]
|
v
[Transcoding Workers] (parallel processing)
├── 360p encode
├── 720p encode
├── 1080p encode
└── 4K encode
|
v
[Distribute to CDN]
|
v
[Update Metadata DB] → Notify user of completion
Transcoding Optimization:
- DAG (Directed Acyclic Graph)-based task splitting
- Split video into GOP (Group of Pictures) units for parallel processing
- Watermarking and thumbnail generation also included in pipeline
5.3 Adaptive Bitrate Streaming (ABR)
Automatically switches quality based on user network conditions:
[Good network] → Request 1080p/4K segments
[Moderate network] → Request 720p segments
[Poor network] → Request 360p segments
HLS (HTTP Live Streaming):
- Video split into 2-10 second segments (.ts files)
- M3U8 playlist file manages segment list
- Client monitors buffer state and selects quality for next segment
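The client-side quality selection described above might look like the following. The bitrate ladder and thresholds are illustrative, not taken from any real player:

```python
# Hypothetical bitrate ladder (kbps); real services tune this per title.
LADDER = [(400, "360p"), (1500, "720p"), (4000, "1080p"), (12000, "4K")]

def pick_quality(measured_kbps, buffer_seconds, safety=0.8):
    """Choose the highest rendition whose bitrate fits within a safety
    margin of measured throughput; drop to the floor if the buffer is low."""
    if buffer_seconds < 5:          # nearly stalled: take the safest option
        return LADDER[0][1]
    budget = measured_kbps * safety
    best = LADDER[0][1]
    for bitrate, name in LADDER:    # ladder is ordered low -> high
        if bitrate <= budget:
            best = name
    return best

pick_quality(6000, buffer_seconds=20)  # -> "1080p"
pick_quality(6000, buffer_seconds=2)   # -> "360p"
```

This is the per-segment decision HLS/DASH players repeat continuously, which is why quality can step up or down mid-playback without interrupting the stream.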
5.4 CDN Strategy
[Origin Server] → [Regional CDN] → [Edge CDN] → [User]
Popular videos: Pre-cached on multiple edge nodes (Push CDN)
Less popular: Fetched from Origin on first request (Pull CDN)
Netflix approach: Partners with ISPs to place cache servers inside ISP networks
(OCA: Open Connect Appliance)
5.5 Overall Architecture
Upload path:
[Creator] → [Upload API] → [S3] → [Kafka] → [Transcoding Cluster]
|
[CDN Distribution]
Viewing path:
[User] → [API Gateway] → [Video Service]
|
+-------------+-------------+
| | |
[Metadata] [CDN Stream] [Recommendation]
(MySQL) (HLS/DASH) (ML Model)
6. Chat System Design (WhatsApp / Slack)
6.1 Requirements
- DAU: 500M
- 1:1 chat, group chat (up to 500 members)
- Message delivery latency: under 100ms
- Read receipts (WhatsApp blue ticks)
- Online presence indicator
6.2 Real-Time Communication Protocol Comparison
| Approach | Behavior | Pros | Cons |
|---|---|---|---|
| Long Polling | Hold connection until response | Simple to implement | Wastes server resources |
| SSE (Server-Sent Events) | Server-to-client one-way | Good for one-way streaming | Cannot push from client |
| WebSocket | Full-duplex bidirectional | Low latency, bidirectional | Complex connection management |
Chat system choice: WebSocket
Client ←→ WebSocket connection ←→ Chat Server
(persistent bidirectional communication)
6.3 Message Storage Strategy
Message characteristics:
- Very high write frequency
- Reads focus on recent messages
- Rare deletions, no modifications
Why Cassandra / HBase:
Cassandra schema:
Partition Key: channel_id
Clustering Key: message_id (Snowflake, time-ordered)
CREATE TABLE messages (
channel_id UUID,
message_id BIGINT, -- Snowflake ID (time-embedded)
sender_id UUID,
content TEXT,
created_at TIMESTAMP,
PRIMARY KEY (channel_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
→ Very efficient for fetching latest messages per channel
→ Easy horizontal scaling
6.4 Read Receipts and Online Presence
Read Receipts:
Message states: SENT → DELIVERED → READ
1. Message sent → save to DB with state SENT
2. Reaches recipient device → update to DELIVERED → notify sender
3. Recipient views message → update to READ → notify sender
Online Presence:
Method 1: Heartbeat (ping server every 30 seconds)
- Store user:{id}:last_seen = timestamp in Redis
- TTL of 60 seconds → offline if no ping for 60+ seconds
Method 2: WebSocket connection tracking
- Connect = online event, disconnect = offline event
- Propagate status changes to friends via Pub/Sub
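Method 1 (heartbeat with TTL) can be sketched with a plain dict standing in for Redis (in Redis this would be `SET user:{id}:last_seen <ts> EX 60`):

```python
import time

class PresenceTracker:
    """Heartbeat-based online presence sketch."""
    TTL = 60  # seconds without a ping before a user counts as offline

    def __init__(self):
        self.last_seen = {}  # user_id -> last heartbeat timestamp

    def heartbeat(self, user_id, now=None):
        self.last_seen[user_id] = now if now is not None else time.time()

    def is_online(self, user_id, now=None):
        now = now if now is not None else time.time()
        ts = self.last_seen.get(user_id)
        return ts is not None and now - ts < self.TTL

p = PresenceTracker()
p.heartbeat("alice", now=1000)
p.is_online("alice", now=1030)   # True: pinged 30s ago
p.is_online("alice", now=1100)   # False: 100s of silence
```

The `now` parameter exists only to make the sketch testable; a real service would rely on key expiry in Redis rather than comparing timestamps itself.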
6.5 Group Chat Message Delivery
Small groups (N < 100):
[Sender] → [Chat Server] → [Direct WebSocket delivery to each member]
Large groups (N >= 100):
[Sender] → [Chat Server] → [Kafka Topic: group:{id}]
|
[Consumer Cluster]
|
[Push notification to each member device]
6.6 End-to-End Encryption Overview
WhatsApp's Signal Protocol-based approach:
1. Each device generates a public/private key pair
2. Only the public key is registered with the server
3. Sender encrypts message with recipient's public key
4. Server relays encrypted message (cannot decrypt content)
5. Recipient decrypts with their private key
7. Company-Specific System Design Question Trends
Google
| Problem | Key Points |
|---|---|
| Search Engine | Crawler, indexing, PageRank, autocomplete |
| Web Crawler | URL frontier, deduplication, robots.txt, politeness |
| Google Maps | Map tiles, pathfinding (Dijkstra), ETA prediction |
| Google Drive | File upload/download, real-time co-editing, versioning |
Meta (Facebook/Instagram)
| Problem | Key Points |
|---|---|
| News Feed | Fan-out strategy, EdgeRank algorithm |
| Instagram Feed | Photo upload, follow graph, timeline |
| Facebook Messaging | Real-time chat, message sync |
| Friend Recommendations | Graph DB, mutual friend calculation |
Amazon
| Problem | Key Points |
|---|---|
| E-commerce Cart | Session storage, inventory management, payment processing |
| Recommendation System | Collaborative Filtering, real-time vs batch |
| Amazon S3 | Object storage, 11-9s durability, multipart upload |
| Order Processing | Distributed transactions, Saga pattern |
Netflix
| Problem | Key Points |
|---|---|
| Video Streaming | ABR, CDN, transcoding |
| Recommendation System | A/B testing, personalized ML |
| API Gateway | Rate limiting, circuit breaker |
| Notifications | Real-time notification system |
8. Key Design Patterns Summary
Database Patterns
Read scaling: Master-Slave replication + read-only Replicas
Write scaling: Sharding (Consistent Hashing)
Caching: Redis (in-memory) → reduces DB load
Search: Elasticsearch → full-text search
Time-series: InfluxDB / TimescaleDB
Graph data: Neo4j / Amazon Neptune
Async Processing Patterns
Task queue: Async processing of heavy tasks (transcoding, email)
Event streaming: Kafka → propagate events between services
CQRS: Separate read/write models for performance
Saga pattern: Handle distributed transactions
Quiz
Quiz 1: What is the difference between 301 and 302 redirects in a URL Shortener, and when should each be used?
Answer: 301 is a permanent redirect (Moved Permanently); 302 is a temporary redirect (Found/Temporary Redirect).
Explanation: With a 301, the browser caches the redirect, so subsequent requests skip the server entirely — reducing server load but making click analytics impossible. With a 302, every request goes through the server, enabling click tracking, A/B testing, and easy URL changes. If a URL shortener service collects advertising or analytics data, use 302. If minimizing server load is the priority, use 301.
Quiz 2: What problem occurs with Fan-out on write when a celebrity account with 100 million followers posts a tweet?
Answer: A "write storm" (also called the "Celebrity Problem" or "hotspot problem") occurs.
Explanation: Delivering a tweet ID via fan-out on write to 100 million followers requires 100 million write operations to Redis. This causes delays of several minutes and enormous resource consumption. Real Twitter applies a threshold (roughly 10,000 followers): celebrity accounts use fan-out on read instead. When a user loads their timeline, the system separately fetches the latest tweets from followed celebrities and merges them in.
Quiz 3: Name three reasons why Cassandra is well-suited for chat message storage.
Answer: High write throughput, time-ordered sorting support, and horizontal scalability.
Explanation: (1) Cassandra's LSM-Tree structure supports sequential writes without random disk I/O, enabling hundreds of thousands of message inserts per second. (2) Using a Snowflake ID as the Clustering Key automatically sorts messages by timestamp, making latest-message queries highly efficient. (3) Adding nodes provides linear performance scaling with automatic data rebalancing. In contrast, MySQL suffers from index overhead and vertical scaling limits on large message tables.
Quiz 4: In CAP Theorem, which type (CP or AP) should a banking system and a social media like count be, and why?
Answer: Banking systems should be CP; social media like counts should be AP.
Explanation: Bank account balances require absolute accuracy. During a network partition, a temporary service outage is preferable to showing an incorrect balance — so CP is chosen for strong consistency. In contrast, the difference between 1,234,567 and 1,234,570 likes barely affects user experience. Keeping the service running (availability) during a partition is more important, so AP is chosen with eventual consistency accepted.
Quiz 5: Why is Adaptive Bitrate Streaming (ABR) necessary for video streaming services, and how does it work?
Answer: To provide uninterrupted streaming to users across diverse network conditions.
Explanation: Mobile users move between 4G and WiFi, above and below ground, causing rapid bandwidth fluctuations. Fixed-quality streaming causes buffering when the network degrades. ABR splits video into 2-10 second segments and pre-encodes them at multiple qualities (360p to 4K). The client player monitors buffer levels and download speeds, dynamically selecting the quality for the next segment. When the network degrades, it switches to lower quality to maintain uninterrupted playback.