FAANG System Design Interview Complete Guide

The system design interview is one of the most critical stages for senior engineering positions. This guide systematically covers the most frequently asked problems at FAANG (Facebook/Meta, Apple, Amazon, Netflix, Google) interviews.


1. System Design Interview Framework: RESHADED

The RESHADED framework helps you structure answers systematically within a 45-minute interview.

Step                          Content                                           Time
R - Requirements              Clarify functional/non-functional requirements    5 min
E - Estimation                Scale estimation (DAU, QPS, Storage)              5 min
S - Storage                   Data model and DB selection                       5 min
H - High-level design         Draft overall architecture                        10 min
A - APIs                      Design API endpoints                              5 min
D - Detailed design           Deep dive into key components                     10 min
E - Evaluation                Analyze trade-offs and bottlenecks                3 min
D - Distinguishing features   Propose differentiating enhancements              2 min

45-Minute Time Allocation Strategy

[0-5 min]   Gather requirements and define scope
[5-10 min]  Capacity estimation (Back-of-envelope calculation)
[10-20 min] High-level design (architecture diagram)
[20-35 min] Detailed design of key components
[35-43 min] Discuss trade-offs and improvements
[43-45 min] Handle interviewer questions

Good vs Bad Interviewee Behavior

Good interviewee:

  • Clarifies requirements first and explicitly states assumptions
  • Voluntarily discusses trade-offs while designing
  • Justifies design decisions with numbers
  • Explains clearly: "This approach has issue X, but I chose it because of Y"
  • Treats the interviewer as a partner in a collaborative dialogue

Bad interviewee:

  • Starts designing without clarifying requirements
  • Presents only one solution without considering alternatives
  • Gives abstract answers without concrete technology choices
  • Long silences or thinking only internally
  • Ignores hints from the interviewer

2. Key Concept Quick Review

2.1 Horizontal vs Vertical Scaling

Vertical Scaling (Scale-up):

  • Upgrade CPU, RAM, or disk on a single server
  • Pros: Simple, easy to maintain data consistency
  • Cons: Hardware limits exist, SPOF (Single Point of Failure) risk

Horizontal Scaling (Scale-out):

  • Add server instances to distribute load
  • Pros: Theoretically infinite scaling, easier fault recovery
  • Cons: State sharing complexity, network overhead

Vertical:   [Server 4GB RAM] → [Server 32GB RAM]
Horizontal: [Server] + [Server] + [Server] → placed behind Load Balancer

2.2 Load Balancer Algorithms

Algorithm              Description                                       Use Case
Round Robin            Distribute requests sequentially                  When server specs are identical
Least Connections      Route to server with fewest active connections    When request handling time varies
IP Hash                Fixed distribution based on client IP             When session persistence is required
Weighted Round Robin   Weight servers by capacity                        When server specs differ
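
The weighted variant can be sketched in a few lines of Python; the server names and weights here are illustrative, and real balancers (e.g. nginx) use a smoother interleaving algorithm rather than this naive expansion:

```python
import itertools

def weighted_round_robin(servers):
    """Yield server names in proportion to their integer weights.

    Naive expansion: a server with weight 3 appears 3 times per cycle.
    """
    expanded = [name for name, weight in servers.items() for _ in range(weight)]
    return itertools.cycle(expanded)

# "big" can handle 3x the load of "small"
picker = weighted_round_robin({"big": 3, "small": 1})
first_four = [next(picker) for _ in range(4)]  # one full cycle: 3x big, 1x small
```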

2.3 CDN (Content Delivery Network)

A CDN caches static content (images, JS, CSS, video) on edge servers worldwide, serving users from geographically closer locations.

User → [Nearest CDN Edge] → (Cache Hit) → Return content
                          → (Cache Miss) → [Origin Server] → Cache in CDN → Return content

Push CDN vs Pull CDN:

  • Push CDN: Pre-upload content to CDN (suitable for large static files)
  • Pull CDN: Fetch from Origin on first request, then cache (suitable for dynamic content)

2.4 Caching Strategies

Cache-aside (Lazy Loading):

1. App queries cache for data
2. Cache miss → query DB
3. Store DB result in cache
4. Subsequent requests hit cache
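
The four steps above can be sketched in Python; the `db` and `cache` dicts are in-memory stand-ins for a real database and Redis:

```python
# Cache-aside sketch: plain dicts stand in for the real database and Redis.
db = {"user:1": {"name": "alice"}}
cache = {}

def get_user(key):
    if key in cache:            # 1. query cache first
        return cache[key]
    value = db.get(key)         # 2. cache miss -> query DB
    if value is not None:
        cache[key] = value      # 3. store DB result in cache
    return value                # 4. subsequent requests hit the cache

get_user("user:1")  # miss: reads DB and fills the cache
get_user("user:1")  # hit: served straight from the cache
```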

Write-through:

1. App writes to cache
2. Cache synchronously writes to DB
Guarantees data consistency, incurs write latency

Write-back (Write-behind):

1. App writes only to cache
2. Cache asynchronously writes to DB
Excellent write performance, risk of data loss on cache failure

TTL (Time-To-Live): Sets cache expiration. Too short reduces cache efficiency; too long risks stale data.

2.5 Database: SQL vs NoSQL Selection Criteria

Choose SQL (Relational DB) when:

  • ACID transactions are required (payments, inventory)
  • Data structure is clear and changes infrequently
  • Complex JOIN queries are needed
  • Examples: PostgreSQL, MySQL

Choose NoSQL when:

  • Schema is flexible or changes frequently
  • Horizontal scaling is essential for large-scale services
  • Read performance is critical
  • Examples: MongoDB (document), Cassandra (column), Redis (key-value), Neo4j (graph)

2.6 CAP Theorem in Practice

CAP Theorem: A distributed system can only guarantee 2 of: Consistency, Availability, Partition tolerance simultaneously.

In practice, network Partition (P) is unavoidable, so:
  CP systems: Banking, stock trading (consistency first)
  AP systems: Social feeds, shopping carts (availability first)

System      Type           Reason
Zookeeper   CP             Distributed locks, config management
Cassandra   AP             Always writable, eventual consistency
HBase       CP             Strong consistency
DynamoDB    AP (default)   Configurable to CP

2.7 Consistent Hashing

Problem with regular hashing: Adding/removing servers causes most keys to be remapped.

Consistent Hashing: Servers and keys are placed on a ring structure. Adding/removing a server only remaps a fraction of keys.

Ring structure (0 to 2^32 - 1):

        0
       /|\
      / | \
Server A  Server B
    \   |   /
     Server C

Keys are assigned to the nearest server clockwise
When a server is removed, only its keys move to the next server

Virtual Nodes: Place each server multiple times on the ring to ensure even distribution.
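
A minimal ring with virtual nodes can be sketched as follows; MD5 is used purely as an illustrative hash function, and the vnode count is arbitrary:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of consistent hashing with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        # Each server is placed `vnodes` times on the ring for even spread.
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, key):
        # Assign the key to the nearest server clockwise on the ring.
        h = self._hash(key)
        idx = bisect.bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["A", "B", "C"])
server = ring.get_server("some-url-key")  # deterministic assignment
```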

2.8 Message Queue

Message Queues enable asynchronous communication between services and achieve loose coupling.

Kafka:

  • High throughput, persistence (log retention), Consumer Group support
  • Use: Event streaming, log aggregation, real-time analytics

RabbitMQ:

  • Complex routing, various messaging patterns
  • Use: Task queues, notification systems

Producer → [Message Queue] → Consumer

Benefits:
- Asynchronous processing reduces response time
- Traffic buffering (absorbs traffic spikes)
- Reduced coupling between services
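
The decoupling can be illustrated with Python's in-process `queue.Queue` standing in for a real broker like Kafka or RabbitMQ:

```python
import queue
import threading

mq = queue.Queue()   # stand-in for the message broker
processed = []

def consumer():
    # Consumer drains the queue at its own pace, independent of the producer.
    while True:
        msg = mq.get()
        if msg is None:  # sentinel: shut down
            break
        processed.append(f"handled:{msg}")

t = threading.Thread(target=consumer)
t.start()
for i in range(3):
    mq.put(i)          # producer returns immediately (asynchronous handoff)
mq.put(None)
t.join()
```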

3. URL Shortener Design (TinyURL / bit.ly)

3.1 Requirements Clarification

Functional Requirements:

  • Accept a long URL and generate a short URL
  • Redirect short URL to original URL
  • (Optional) Custom short URL alias
  • (Optional) URL expiration date

Non-Functional Requirements:

  • DAU: 100M (100 million daily active users)
  • Read:Write ratio = 100:1 (reads vastly outnumber writes)
  • Availability: 99.9% SLA
  • Redirect latency: under 100ms

3.2 Capacity Estimation

Write QPS:
  - New URLs per day: 100M / 100 = 1M per day
  - Write QPS: 1,000,000 / 86,400 ≈ 12 QPS

Read QPS:
  - Read = Write × 100 = 1,200 QPS
  - Peak: 1,200 × 5 = 6,000 QPS

Storage:
  - 1 URL record ≈ 500 bytes
  - 10-year data: 1M × 365 × 10 × 500 bytes ≈ 1.8 TB

Cache:
  - 80/20 rule: top 20% URLs generate 80% of traffic
  - Cache size: 1,200 QPS × 86,400 × 0.2 × 500 bytes ≈ 10 GB/day

3.3 API Design

POST /api/v1/urls
  Request:  { "long_url": "https://example.com/...", "expire_date": "2027-01-01" }
  Response: { "short_url": "https://tinyurl.com/abc123" }

GET /{short_code}
  Response: 301 Redirect (permanent) or 302 Redirect (temporary)

301: Browser caches redirect, reduces server load (no analytics)
302: Always goes through server (enables click analytics)

3.4 Base62 Encoding

Character set: [0-9, a-z, A-Z] = 62 characters
6-char Base62 = 62^6 ≈ 56.8 billion combinations
7-char Base62 = 62^7 ≈ 3.5 trillion combinations

long_url → MD5/SHA-256 → take first 7 bytes → Base62 encode → short code

Example:
  "https://www.example.com/long-path"
  → MD5: "1a2b3c4d..."
  → integer of first 7 bytes → Base62 → "aB3xY9z"
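
A sketch of this pipeline in Python; the helper names are illustrative, and slicing to 7 characters mirrors the 7-byte extraction above:

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n):
    """Encode a non-negative integer as a Base62 string."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(BASE62[r])
    return "".join(reversed(digits))

def shorten(long_url, length=7):
    """Hash the URL, take the first 7 bytes of the digest, Base62-encode."""
    digest = hashlib.md5(long_url.encode()).digest()
    n = int.from_bytes(digest[:7], "big")
    return base62_encode(n)[:length]  # truncate if the encoding overflows

code = shorten("https://www.example.com/long-path")  # deterministic short code
```

Note that hash-based codes can collide, so a production service would check the DB for an existing `short_code` and re-hash with a salt (or use a counter/Snowflake ID instead) on conflict.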

3.5 DB Schema Design

CREATE TABLE url_mappings (
  id          BIGINT PRIMARY KEY,        -- Snowflake ID
  short_code  VARCHAR(8) UNIQUE NOT NULL,
  long_url    TEXT NOT NULL,
  user_id     BIGINT,
  created_at  TIMESTAMP DEFAULT NOW(),
  expire_at   TIMESTAMP,
  click_count BIGINT DEFAULT 0
);

CREATE INDEX idx_short_code ON url_mappings(short_code);

DB Choice: MySQL (simple key-value lookup, ACID not required) or NoSQL (Cassandra) for horizontal scaling.

3.6 Overall Architecture Diagram

[Client]
    |
    v
[DNS] + [CDN] (static assets)
    |
    v
[Load Balancer]
    |
    +-----------+-----------+
    v           v           v
[API Server] [API Server] [API Server]
    |               |
    v               v
[Redis Cache]  [URL DB (Master)]
(short URL cache)   |
                [URL DB (Replica)]
                (read-only)

Write flow:
  Client → LB → API Server → Save to DB → Update Redis cache → Return short URL

Read flow:
  Client → LB → API Server → Check Redis cache
    Cache hit:  immediate redirect
    Cache miss: query DB → store in Redis → redirect

3.7 High Availability

  • Redis Cluster: Cache short URLs in Redis to reduce DB load by 90%
  • DB Replication: Master-Slave for read/write separation
  • Rate Limiting: Block excessive short URL creation from the same IP
  • Uniqueness: Use UUID or Snowflake ID to prevent duplicate short codes

4. Twitter/X Feed System Design

4.1 Requirements

  • DAU: 300M
  • Tweet creation: 5M per day
  • Timeline loading: latest 20 tweets
  • Follower count: average 300, max 100M (celebrities)

4.2 Fan-out Strategies

Fan-out on write (Push model):

On tweet creation:
  1. Save tweet to DB
  2. Retrieve all follower IDs
  3. Insert tweet ID into each follower's timeline cache (Redis)

Pros: Reads are very fast (direct lookup from Redis)
Cons: 100M followers = 100M writes → high write latency

Fan-out on read (Pull model):

On timeline fetch:
  1. Retrieve list of followed accounts
  2. Fetch latest tweets from each account
  3. Merge and sort by time

Pros: Simple writes
Cons: Read queries scale with follow count → high read latency

Hybrid Strategy (actual Twitter approach):

Regular users (followers < 10,000):  Fan-out on write
Celebrity users (followers >= 10,000): Fan-out on read

Timeline generation:
  1. Retrieve tweet IDs from Redis for followed regular users (fan-out on write results)
  2. Fetch latest tweets from followed celebrities (fan-out on read)
  3. Merge and sort both result sets
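
The merge step can be sketched as follows, with hard-coded `(timestamp, tweet_id)` pairs standing in for the Redis lookup and the celebrity fetch:

```python
import heapq
import itertools

# Each source is already sorted newest-first, as Redis ZREVRANGE and a
# per-author "latest tweets" query would return them.
precomputed = [(1700000005, "t5"), (1700000002, "t2")]  # fan-out-on-write (Redis)
celebrity   = [(1700000004, "t4"), (1700000001, "t1")]  # fan-out-on-read (fetched)

def merge_timeline(*sources, limit=20):
    """Merge descending-sorted tweet lists, keeping the newest `limit`."""
    merged = heapq.merge(*sources, reverse=True)
    return [tweet_id for _, tweet_id in itertools.islice(merged, limit)]

timeline = merge_timeline(precomputed, celebrity, limit=20)
# -> ["t5", "t4", "t2", "t1"]
```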

4.3 Timeline Caching (Redis Sorted Set)

Key:   timeline:{user_id}
Value: Sorted Set (score = tweet timestamp, member = tweet_id)

Example:
  ZADD timeline:123 1700000001 tweet:456
  ZADD timeline:123 1700000002 tweet:789
  ZREVRANGE timeline:123 0 19  # Get latest 20 tweets

Keep a max of 1,000 tweet IDs per user (memory management)

4.4 Tweet ID Generation (Snowflake ID)

Twitter's 64-bit distributed ID generation scheme:

64-bit layout:
  [1 bit: sign] [41 bits: timestamp] [10 bits: machine ID] [12 bits: sequence]

Benefits:
  - Time-sortable (timestamp embedded)
  - No collisions in distributed environments
  - Generates up to 4,096 IDs per millisecond per machine (12-bit sequence)
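
A simplified single-process sketch of this layout; `EPOCH` is Twitter's commonly cited custom epoch (2010-11-04, in ms), and clock-skew handling is omitted:

```python
import threading
import time

class SnowflakeGenerator:
    """Sketch of the 64-bit Snowflake layout:
    41-bit ms timestamp | 10-bit machine ID | 12-bit per-ms sequence."""

    EPOCH = 1288834974657  # Twitter's custom epoch in milliseconds

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # must fit in 10 bits
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:
                    # Sequence exhausted this millisecond: wait for the next.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeGenerator(machine_id=1)
a, b = gen.next_id(), gen.next_id()  # strictly increasing, time-sortable
```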

4.5 Overall Architecture

[Mobile/Web Clients]
         |
    [API Gateway]
         |
   +-----------+
   |           |
[Tweet Service] [Timeline Service]
   |                |
[Tweet DB]   [Redis Timeline Cache]
(Cassandra)          |
                [Fan-out Service]
                (Consumes Kafka,
                 updates follower timelines)
                     |
              [Follower Graph DB]
              (stores follow relationships)

Media handling:
[Image/Video Upload] → [Object Storage (S3)] → [CDN]

5. YouTube / Netflix Video Streaming Design

5.1 Requirements

  • DAU: 2B (YouTube scale)
  • Video uploads: 500 hours/minute
  • Concurrent streams: hundreds of millions
  • Supported resolutions: 360p to 4K

5.2 Video Upload Pipeline

[User]
   |
   v
[Upload Service] → Store raw file in S3
   |
   v
[Message Queue (Kafka)]
   |
   v
[Transcoding Workers] (parallel processing)
   ├── 360p encode
   ├── 720p encode
   ├── 1080p encode
   └── 4K encode
   |
   v
[Distribute to CDN]
   |
   v
[Update Metadata DB] → Notify user of completion

Transcoding Optimization:

  • DAG (Directed Acyclic Graph)-based task splitting
  • Split video into GOP (Group of Pictures) units for parallel processing
  • Watermarking and thumbnail generation also included in pipeline

5.3 Adaptive Bitrate Streaming (ABR)

Automatically switches quality based on user network conditions:

[Good network]     → Request 1080p/4K segments
[Moderate network] → Request 720p segments
[Poor network]     → Request 360p segments

HLS (HTTP Live Streaming):
  - Video split into 2-10 second segments (.ts files)
  - M3U8 playlist file manages segment list
  - Client monitors buffer state and selects quality for next segment
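
A toy version of the client's quality decision; the bitrate ladder and buffer thresholds are illustrative, not taken from any real player:

```python
# Bitrate ladder: (required kbps, quality label), in ascending order.
LADDER = [(400, "360p"), (1500, "720p"), (4000, "1080p"), (12000, "4K")]

def next_quality(throughput_kbps, buffer_seconds):
    """Pick the highest quality whose bitrate fits the bandwidth budget."""
    # Be conservative when the buffer is low: use only half the measured
    # throughput; with a healthy buffer, use 80%.
    budget = throughput_kbps * (0.5 if buffer_seconds < 5 else 0.8)
    chosen = LADDER[0][1]  # worst case, fall back to the lowest rung
    for bitrate, label in LADDER:
        if bitrate <= budget:
            chosen = label
    return chosen

next_quality(6000, buffer_seconds=20)  # healthy buffer -> "1080p"
next_quality(6000, buffer_seconds=2)   # low buffer -> drops to "720p"
```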

5.4 CDN Strategy

[Origin Server] → [Regional CDN] → [Edge CDN] → [User]

Popular videos: Pre-cached on multiple edge nodes (Push CDN)
Less popular:   Fetched from Origin on first request (Pull CDN)

Netflix approach: Partners with ISPs to place cache servers inside ISP networks
                  (OCA: Open Connect Appliance)

5.5 Overall Architecture

Upload path:
[Creator] → [Upload API] → [S3] → [Kafka] → [Transcoding Cluster]
                                                     |
                                              [CDN Distribution]

Viewing path:
[User] → [API Gateway] → [Video Service]
                              |
                +-------------+-------------+
                |             |             |
           [Metadata]    [CDN Stream]  [Recommendation]
           (MySQL)       (HLS/DASH)    (ML Model)

6. Chat System Design (WhatsApp / Slack)

6.1 Requirements

  • DAU: 500M
  • 1:1 chat, group chat (up to 500 members)
  • Message delivery latency: under 100ms
  • Read receipts (WhatsApp blue ticks)
  • Online presence indicator

6.2 Real-Time Communication Protocol Comparison

Approach                   Behavior                         Pros                         Cons
Long Polling               Hold connection until response   Simple to implement          Wastes server resources
SSE (Server-Sent Events)   Server-to-client one-way         Good for one-way streaming   Cannot push from client
WebSocket                  Full-duplex bidirectional        Low latency, bidirectional   Complex connection management

Chat system choice: WebSocket

Client ←→ WebSocket connection ←→ Chat Server
  (persistent bidirectional communication)

6.3 Message Storage Strategy

Message characteristics:

  • Very high write frequency
  • Reads focus on recent messages
  • Rare deletions, no modifications

Why Cassandra / HBase:

Cassandra schema:
  Partition Key: channel_id
  Clustering Key: message_id (Snowflake, time-ordered)

  CREATE TABLE messages (
    channel_id  UUID,
    message_id  BIGINT,      -- Snowflake ID (time-embedded)
    sender_id   UUID,
    content     TEXT,
    created_at  TIMESTAMP,
    PRIMARY KEY (channel_id, message_id)
  ) WITH CLUSTERING ORDER BY (message_id DESC);

Very efficient for fetching latest messages per channel
Easy horizontal scaling

6.4 Read Receipts and Online Presence

Read Receipts:

Message states: SENT → DELIVERED → READ

1. Message sent → save to DB with state SENT
2. Reaches recipient device → update to DELIVERED → notify sender
3. Recipient views message → update to READ → notify sender

Online Presence:

Method 1: Heartbeat (ping server every 30 seconds)
  - Store user:{id}:last_seen = timestamp in Redis
  - TTL of 60 seconds → offline if no ping for 60+ seconds

Method 2: WebSocket connection tracking
  - Connect = online event, disconnect = offline event
  - Propagate status changes to friends via Pub/Sub
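
Method 1 can be sketched with an in-memory dict standing in for the Redis keys; an explicit `now` parameter makes the 60-second expiry easy to test:

```python
import time

HEARTBEAT_TTL = 60  # seconds without a ping before a user counts as offline
presence = {}       # user_id -> last heartbeat timestamp (stand-in for Redis)

def heartbeat(user_id, now=None):
    """Called every ~30s by the client to refresh its last_seen timestamp."""
    presence[user_id] = now if now is not None else time.time()

def is_online(user_id, now=None):
    now = now if now is not None else time.time()
    last = presence.get(user_id)
    return last is not None and (now - last) < HEARTBEAT_TTL

heartbeat("u1", now=1000)
is_online("u1", now=1030)  # True: pinged 30s ago
is_online("u1", now=1070)  # False: no ping for 70s
```

With real Redis, the TTL on the `user:{id}:last_seen` key does the expiry automatically, so no sweep over stale entries is needed.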

6.5 Group Chat Message Delivery

Small groups (N < 100):
  [Sender] → [Chat Server] → [Direct WebSocket delivery to each member]

Large groups (N >= 100):
  [Sender] → [Chat Server] → [Kafka Topic: group:{id}]
                                    |
                              [Consumer Cluster]
                                    |
                         [Push notification to each member device]

6.6 End-to-End Encryption Overview

WhatsApp's Signal Protocol-based approach:
  1. Each device generates a public/private key pair
  2. Only the public key is registered with the server
  3. Sender encrypts message with recipient's public key
  4. Server relays encrypted message (cannot decrypt content)
  5. Recipient decrypts with their private key

7. Frequently Asked Problems by Company

Google

Problem         Key Points
Search Engine   Crawler, indexing, PageRank, autocomplete
Web Crawler     URL frontier, deduplication, robots.txt, politeness
Google Maps     Map tiles, pathfinding (Dijkstra), ETA prediction
Google Drive    File upload/download, real-time co-editing, versioning

Meta (Facebook/Instagram)

Problem                  Key Points
News Feed                Fan-out strategy, EdgeRank algorithm
Instagram                Photo upload, follow graph, timeline
Facebook Messaging       Real-time chat, message sync
Friend Recommendations   Graph DB, mutual friend calculation

Amazon

Problem                 Key Points
E-commerce Cart         Session storage, inventory management, payment processing
Recommendation System   Collaborative Filtering, real-time vs batch
Amazon S3               Object storage, 11-9s durability, multipart upload
Order Processing        Distributed transactions, Saga pattern

Netflix

Problem                 Key Points
Video Streaming         ABR, CDN, transcoding
Recommendation System   A/B testing, personalized ML
API Gateway             Rate limiting, circuit breaker
Notifications           Real-time notification system

8. Key Design Patterns Summary

Database Patterns

Read scaling:    Master-Slave replication + read-only Replicas
Write scaling:   Sharding (Consistent Hashing)
Caching:         Redis (in-memory) → reduces DB load
Search:          Elasticsearch → full-text search
Time-series:     InfluxDB / TimescaleDB
Graph data:      Neo4j / Amazon Neptune

Async Processing Patterns

Task queue:       Async processing of heavy tasks (transcoding, email)
Event streaming:  Kafka → propagate events between services
CQRS:             Separate read/write models for performance
Saga pattern:     Handle distributed transactions

Quiz

Quiz 1: What is the difference between 301 and 302 redirects in a URL Shortener, and when should each be used?

Answer: 301 is a permanent redirect (Moved Permanently); 302 is a temporary redirect (Found/Temporary Redirect).

Explanation: With a 301, the browser caches the redirect, so subsequent requests skip the server entirely — reducing server load but making click analytics impossible. With a 302, every request goes through the server, enabling click tracking, A/B testing, and easy URL changes. If a URL shortener service collects advertising or analytics data, use 302. If minimizing server load is the priority, use 301.

Quiz 2: What problem occurs with Fan-out on write when a celebrity account with 100 million followers posts a tweet?

Answer: A "write storm" (also called the "Celebrity Problem" or "hotspot problem") occurs.

Explanation: Delivering a tweet ID via fan-out on write to 100 million followers requires 100 million write operations to Redis. This causes delays of several minutes and enormous resource consumption. Real Twitter applies a threshold (roughly 10,000 followers): celebrity accounts use fan-out on read instead. When a user loads their timeline, the system separately fetches the latest tweets from followed celebrities and merges them in.

Quiz 3: Name three reasons why Cassandra is well-suited for chat message storage.

Answer: High write throughput, time-ordered sorting support, and horizontal scalability.

Explanation: (1) Cassandra's LSM-Tree structure supports sequential writes without random disk I/O, enabling hundreds of thousands of message inserts per second. (2) Using a Snowflake ID as the Clustering Key automatically sorts messages by timestamp, making latest-message queries highly efficient. (3) Adding nodes provides linear performance scaling with automatic data rebalancing. In contrast, MySQL suffers from index overhead and vertical scaling limits on large message tables.

Quiz 4: In CAP Theorem, which type (CP or AP) should a banking system and a social media like count be, and why?

Answer: Banking systems should be CP; social media like counts should be AP.

Explanation: Bank account balances require absolute accuracy. During a network partition, a temporary service outage is preferable to showing an incorrect balance — so CP is chosen for strong consistency. In contrast, the difference between 1,234,567 and 1,234,570 likes barely affects user experience. Keeping the service running (availability) during a partition is more important, so AP is chosen with eventual consistency accepted.

Quiz 5: Why is Adaptive Bitrate Streaming (ABR) necessary for video streaming services, and how does it work?

Answer: To provide uninterrupted streaming to users across diverse network conditions.

Explanation: Mobile users move between 4G and WiFi, above and below ground, causing rapid bandwidth fluctuations. Fixed-quality streaming causes buffering when the network degrades. ABR splits video into 2-10 second segments and pre-encodes them at multiple qualities (360p to 4K). The client player monitors buffer levels and download speeds, dynamically selecting the quality for the next segment. When the network degrades, it switches to lower quality to maintain uninterrupted playback.