
System Design Interview Complete Guide 2025: Scalable Architecture, Availability, and Design Patterns


Introduction

System design interviews are the most critical evaluation component in senior engineering hiring. While algorithm questions measure coding ability, system design evaluates comprehensive engineering capability -- the ability to architect large-scale systems and understand tradeoffs.

Companies like Google, Meta, Amazon, and Netflix ask candidates to design systems handling millions to hundreds of millions of users in 45-60 minutes. To succeed, you need more than technical knowledge -- you must analyze requirements, choose appropriate technologies, and clearly explain tradeoffs.

This guide systematically covers the 4-step framework, scalability, caching, databases, message queues, and real-world design examples.


1. The 4-Step System Design Interview Framework

1.1 Step 1: Clarify Requirements (5 minutes)

This is the most important step: jumping straight into design is the most common mistake.

Functional Requirements:

  • What are the core features?
  • What main actions will users perform?
  • What are the inputs and outputs?

Non-Functional Requirements:

  • Expected number of users / DAU
  • Read-to-write ratio
  • Latency requirements
  • Availability level (99.9%? 99.99%?)
  • Data consistency requirements
Example dialogue:

Interviewer: "Design a URL shortening service."

Good questions:
Q: "What's the expected DAU?"
A: "100 million."

Q: "Which is more important, URL shortening or redirection?"
A: "Redirection is more important. Reads are much more frequent."

Q: "Is there a length limit for shortened URLs?"
A: "As short as possible."

Q: "Should we support custom URLs?"
A: "Yes, optionally."

1.2 Step 2: High-Level Design (15-20 minutes)

Draw the big picture based on requirements.

+-----------+     +---------------+     +---------------+
|  Client   |---->| Load Balancer |---->|  Web Server   |
+-----------+     +---------------+     +------+--------+
                                               |
                    +--------------------------+
                    v                          v
             +-----------+            +---------------+
             |   Cache   |            |   Database    |
             +-----------+            +---------------+

What to cover at this stage:

  • API design (RESTful endpoints)
  • Data model design
  • Key components and their relationships
  • Data flow
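A first-pass API sketch helps anchor the high-level discussion. Below is a minimal in-memory sketch of the two core endpoints of the URL shortener example; the class and method names are illustrative, not a real framework API:

```python
import itertools

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

class UrlService:
    """In-memory stand-in for the shortener's two endpoints."""

    def __init__(self):
        self._counter = itertools.count(1)   # auto-increment ID source
        self._store = {}                     # short code -> long URL

    def shorten(self, long_url):
        """POST /api/v1/urls -> returns a short code."""
        n = next(self._counter)
        code = ""
        while n > 0:
            code = BASE62[n % 62] + code
            n //= 62
        self._store[code] = long_url
        return code

    def resolve(self, code):
        """GET /{code} -> redirect target, or None if unknown."""
        return self._store.get(code)

svc = UrlService()
code = svc.shorten("https://example.com/a/very/long/article")
```

In the real design, `_store` would be the sharded database fronted by a cache, and `_counter` a distributed ID generator.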

1.3 Step 3: Deep Dive (15-20 minutes)

Dig deep into specific components the interviewer is interested in.

  • Database schema and indexing strategies
  • Caching strategies and invalidation
  • Scalability bottlenecks and solutions
  • Failure handling and recovery

1.4 Step 4: Wrap Up (5 minutes)

  • Identify design bottlenecks
  • Discuss future scaling directions
  • Monitoring/alerting strategy
  • Suggest additional features

2. Scalability

2.1 Vertical Scaling

Upgrading hardware specifications of existing servers.

Before:                 After:
+---------------+      +---------------+
|  4 CPU        |      |  32 CPU       |
|  16GB RAM     | ---> |  256GB RAM    |
|  500GB SSD    |      |  4TB SSD      |
+---------------+      +---------------+

Pros: Simple implementation, easy data consistency, no software changes needed

Cons: Physical limits exist (single server ceiling), single point of failure (SPOF), costs increase exponentially

2.2 Horizontal Scaling

Adding more servers to distribute load.

                    +---------------+
              +---->|  Server 1     |
              |     +---------------+
+-----------+ |     +---------------+
|   Load    |-+---->|  Server 2     |
| Balancer  | |     +---------------+
+-----------+ |     +---------------+
              +---->|  Server 3     |
              |     +---------------+
              |     +---------------+
              +---->|  Server N     |
                    +---------------+

Pros: Theoretically unlimited scaling, fault isolation, incremental expansion

Cons: Data consistency management complexity, session management needed, operational complexity increase

2.3 Stateless Service Design

The key to horizontal scaling is making servers stateless.

Bad (Stateful):
+-----------+  session  +-----------+
|  Client   |---------->| Server A  |  (Session lost if Server A dies)
+-----------+           +-----------+

Good (Stateless):
+-----------+     +-----------+     +-----------+
|  Client   |---->|   Any     |---->|   Redis   |
+-----------+     |  Server   |     | (Session) |
                  +-----------+     +-----------+

By storing sessions, cache, and state in external stores (Redis, Memcached), any server can handle any request identically.
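The pattern above can be sketched as a session store behind a small interface; here a plain dict stands in for Redis, and the class name is illustrative. In production the same interface would wrap `redis.Redis` with SETEX/GET:

```python
import time

class SessionStore:
    """External session store so every web server is stateless."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._data = {}   # session_id -> (expires_at, payload)

    def save(self, session_id, payload):
        self._data[session_id] = (time.time() + self.ttl, payload)

    def load(self, session_id):
        entry = self._data.get(session_id)
        if entry is None or entry[0] < time.time():
            return None   # missing or expired
        return entry[1]

# Any server instance sharing this store can handle any request:
store = SessionStore(ttl_seconds=1800)
store.save("sess-abc", {"user_id": 1001})
```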

2.4 Auto-Scaling Strategies

# AWS Auto Scaling configuration example
AutoScalingGroup:
  MinSize: 2
  MaxSize: 20
  DesiredCapacity: 4
  TargetTrackingScaling:
    TargetValue: 70.0          # Maintain 70% CPU
    PredefinedMetricType: ASGAverageCPUUtilization

  # Schedule-based scaling (predictable traffic)
  ScheduledActions:
    - Schedule: "0 8 * * MON-FRI"   # Weekdays 8 AM
      MinSize: 6
    - Schedule: "0 22 * * *"        # Daily 10 PM
      MinSize: 2

| Strategy        | Based On            | Best For              |
| --------------- | ------------------- | --------------------- |
| Target Tracking | Metric target value | CPU, request count    |
| Step Scaling    | Threshold steps     | Sudden traffic spikes |
| Scheduled       | Time-based          | Predictable patterns  |
| Predictive      | ML-based            | Recurring patterns    |

3. Load Balancing

3.1 Load Balancing Algorithms

1. Round Robin (Sequential Distribution)
   Request 1 -> Server A
   Request 2 -> Server B
   Request 3 -> Server C
   Request 4 -> Server A  (back to start)

2. Weighted Round Robin
   Server A (weight=5): handles 5 requests
   Server B (weight=3): handles 3 requests
   Server C (weight=2): handles 2 requests

3. Least Connections
   Server A: 10 connections  <-- new request goes here
   Server B: 25 connections
   Server C: 18 connections

4. IP Hash
   hash(client_ip) % server_count = target_server
   -> Same client always goes to same server
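Three of these algorithms can be sketched in a few lines each (function names are illustrative); MD5 is used for IP Hash because Python's built-in `hash()` is salted per process:

```python
import hashlib
import itertools

def round_robin(servers):
    """Yield servers in sequential, wrapping order."""
    return itertools.cycle(servers)

def least_connections(connections):
    """Pick the server with the fewest active connections."""
    return min(connections, key=connections.get)

def ip_hash(client_ip, servers):
    """Stable mapping: the same client always reaches the same server."""
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

rr = round_robin(["A", "B", "C"])
order = [next(rr) for _ in range(4)]                      # A, B, C, A
target = least_connections({"A": 10, "B": 25, "C": 18})   # "A"
sticky = ip_hash("203.0.113.7", ["A", "B", "C"])
```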

3.2 L4 vs L7 Load Balancers

L4 (Transport Layer):
+----------+    TCP/UDP     +-----------+
| Client   |--------------->| L4 LB     |-- routes by IP:Port
+----------+                +-----------+
  Fast, simple, no SSL offload

L7 (Application Layer):
+----------+    HTTP/HTTPS  +-----------+
| Client   |--------------->| L7 LB     |-- routes by URL, Header, Cookie
+----------+                +-----------+
  Content-based routing, SSL termination, more flexible

| Feature         | L4        | L7                  |
| --------------- | --------- | ------------------- |
| Layer           | TCP/UDP   | HTTP/HTTPS          |
| Performance     | Very fast | Relatively slower   |
| Routing         | IP/Port   | URL, Header, Cookie |
| SSL Termination | No        | Yes                 |
| Examples        | AWS NLB   | AWS ALB, Nginx      |

3.3 Consistent Hashing

Regular hashing redistributes almost all keys when servers are added or removed. Consistent hashing solves this problem.

Regular Hashing:
hash(key) % 3 = server number
-> Adding a server: hash(key) % 4 = most keys move!

Consistent Hashing (Hash Ring):

        Server A
        /      \
       /   0    \
      /          \
  Server D     Server B
      \          /
       \   key  /
        \  |  /
        Server C

-> Adding a server: only adjacent keys move (~1/N of total)

Virtual Nodes for even distribution:

import hashlib

from sortedcontainers import SortedDict  # third-party: pip install sortedcontainers

class ConsistentHash:
    def __init__(self, nodes, virtual_nodes=150):
        self.ring = SortedDict()
        self.virtual_nodes = virtual_nodes

        for node in nodes:
            for i in range(virtual_nodes):
                key = self._hash(f"{node}-vn-{i}")
                self.ring[key] = node

    def get_node(self, data_key):
        if not self.ring:
            return None
        hash_val = self._hash(data_key)
        # Find nearest node clockwise on the hash ring
        idx = self.ring.bisect_right(hash_val)
        if idx == len(self.ring):
            idx = 0  # wrap around past the largest hash value
        return self.ring.peekitem(idx)[1]

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

4. Caching

4.1 Caching Strategies

1. Cache-Aside (Lazy Loading):
   +--------+  1.read   +---------+
   |  App   |---------->|  Cache  |
   +---+----+  miss     +---------+
       | 2.read DB         ^
       v             3.store in cache
   +--------+
   |   DB   |
   +--------+

2. Write-Through:
   +--------+  1.write  +---------+  2.sync write  +--------+
   |  App   |---------->|  Cache  |--------------->|   DB   |
   +--------+           +---------+                +--------+
   (Always consistent, higher write latency)

3. Write-Back (Write-Behind):
   +--------+  1.write  +---------+  2.async batch write  +--------+
   |  App   |---------->|  Cache  |======================>|   DB   |
   +--------+           +---------+                       +--------+
   (Great write performance, data loss risk)

4. Write-Around:
   +--------+  1.write  +--------+
   |  App   |---------->|   DB   |
   +---+----+           +--------+
       | 2.load to cache on read
       v
   +---------+
   |  Cache  |
   +---------+
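The cache-aside flow above can be sketched in a few lines. A plain dict stands in for Redis and `fake_db` stands in for the database layer; both names are illustrative:

```python
cache = {}
fake_db = {"user:1001": {"name": "Alice"}}
db_reads = 0   # instrumentation so the flow is visible

def get_user(key):
    global db_reads
    if key in cache:              # 1. read from cache
        return cache[key]
    db_reads += 1                 # 2. miss -> read from DB
    value = fake_db.get(key)
    if value is not None:
        cache[key] = value        # 3. store in cache for later readers
    return value

get_user("user:1001")   # miss: goes to the DB
get_user("user:1001")   # hit: served from cache, DB untouched
```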

4.2 Redis vs Memcached

# Redis - supports various data structures
import redis

r = redis.Redis(host='localhost', port=6379)

# String
r.set('user:1001:name', 'Alice', ex=3600)  # 1-hour TTL

# Hash
r.hset('user:1001', mapping={
    'name': 'Alice',
    'email': 'alice@example.com',
    'login_count': 42
})

# Sorted Set (leaderboard)
r.zadd('leaderboard', {'player_a': 1500, 'player_b': 1800})
top_10 = r.zrevrange('leaderboard', 0, 9, withscores=True)

# Pub/Sub
r.publish('notifications', 'New message!')

| Feature           | Redis                                     | Memcached                 |
| ----------------- | ----------------------------------------- | ------------------------- |
| Data Structures   | String, Hash, List, Set, Sorted Set, etc. | Key-Value only            |
| Persistence       | RDB/AOF supported                         | None                      |
| Replication       | Master-Replica                            | None                      |
| Clustering        | Redis Cluster                             | Client-side sharding      |
| Memory Efficiency | Relatively lower                          | Higher (simple structure) |
| Use Cases         | Sessions, leaderboards, queues, Pub/Sub   | Simple caching            |

4.3 CDN (Content Delivery Network)

Without CDN:
User(Seoul) ---- 5000km ----> Origin Server(US)
RTT: ~200ms

With CDN:
User(Seoul) ---- 50km ----> CDN Edge(Seoul) -- cache hit
RTT: ~5ms

On Cache Miss:
User(Seoul) -> CDN Edge(Seoul) -> Origin(US)
                | store in cache
         Next request served from Edge

CDN Cache Invalidation Strategies:

1. TTL-based: Cache-Control: max-age=86400
2. Versioning: /static/app.v2.3.1.js
3. Manual invalidation: CloudFront Invalidation
4. Tag-based: Surrogate-Key header for selective purging

4.4 Cache Problems and Solutions

1. Cache Stampede (Thundering Herd):
   TTL expires -> thousands of requests hit DB simultaneously
   Solution: Mutex lock + pre-refresh

2. Cache Penetration:
   Repeated queries for non-existent keys -> every request hits DB
   Solution: Bloom Filter + cache empty results

3. Hot Key Problem:
   Traffic concentrated on specific key -> single cache node overloaded
   Solution: Local cache + key replication (key_1, key_2, ...)
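The mutex solution to a stampede can be sketched with a per-process lock: on a miss, only one thread recomputes while the rest wait and then read the cached value. Across multiple servers the lock would need to be distributed (e.g., Redis SETNX); this single-process version only shows the shape:

```python
import threading

cache = {}
lock = threading.Lock()
db_calls = 0   # instrumentation: how many requests actually hit the DB

def expensive_db_read(key):
    global db_calls
    db_calls += 1
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with lock:
        if key in cache:           # double-check after acquiring the lock
            return cache[key]
        cache[key] = expensive_db_read(key)
        return cache[key]

# 50 concurrent requests for the same expired key:
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# db_calls is 1: only the first thread reached the database
```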

5. Database Design

5.1 SQL vs NoSQL

| Criteria     | SQL (RDBMS)                                  | NoSQL                                           |
| ------------ | -------------------------------------------- | ----------------------------------------------- |
| Data Model   | Normalized tables                            | Document, Key-Value, Column, Graph              |
| Schema       | Fixed schema                                 | Flexible schema                                 |
| Scaling      | Primarily vertical                           | Easy horizontal scaling                         |
| Transactions | Full ACID support                            | Limited (some support)                          |
| Joins        | Efficient support                            | Limited/impossible                              |
| Best For     | Complex relationships, transactions critical | Large volume, flexible schema, high performance |
| Products     | PostgreSQL, MySQL                            | MongoDB, Cassandra, DynamoDB                    |

SQL Selection Criteria:
- Complex data relationships with frequent joins
- ACID transactions essential (finance, payments)
- Data integrity is top priority
- Stable schema

NoSQL Selection Criteria:
- Data structure changes frequently
- Massive data volume (multiple TB+)
- Very low latency required
- Horizontal scaling essential
- Unstructured data (logs, social media)

5.2 Database Replication

Master-Slave Replication:
+-----------+     sync/async replication     +-----------+
|  Master   | =============================> |  Slave 1  |  <- Read-only
|  (Write)  | =============================> |  Slave 2  |  <- Read-only
+-----------+                                +-----------+

Multi-Master Replication:
+-----------+ <=============================> +-----------+
| Master 1  |    bidirectional replication     | Master 2  |
| (R/W)     |                                 | (R/W)     |
+-----------+                                 +-----------+
(Conflict resolution required)

5.3 Database Sharding

Before Sharding:
+-----------------------+
|   Single Database     |
|   10TB, latency grows |
+-----------------------+

After Sharding (Hash-based):
hash(user_id) % 4 = shard_number

+-----------+ +-----------+ +-----------+ +-----------+
|  Shard 0  | |  Shard 1  | |  Shard 2  | |  Shard 3  |
|  2.5TB    | |  2.5TB    | |  2.5TB    | |  2.5TB    |
+-----------+ +-----------+ +-----------+ +-----------+

Sharding Strategy Comparison:

| Strategy        | Pros                    | Cons                           |
| --------------- | ----------------------- | ------------------------------ |
| Range-based     | Efficient range queries | Hotspot potential              |
| Hash-based      | Even distribution       | Inefficient range queries      |
| Directory-based | Flexible mapping        | Directory is SPOF              |
| Geographic      | Locality advantage      | Cross-region queries difficult |

Sharding Considerations:

1. Join queries: Cross-shard joins impossible -> denormalization or application-level joins
2. Rebalancing: Shard redistribution on growth -> use Consistent Hashing
3. Hotspots: Traffic concentrated on specific shard -> redesign shard key
4. Transactions: Distributed transactions -> Saga pattern or 2PC
5. Unique IDs: Need globally unique IDs -> Snowflake ID, UUID
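Hash-based routing from the diagram above can be sketched in a few lines. MD5 is used instead of Python's built-in `hash()` because the latter is salted per process and would route the same user to different shards on different servers; a fixed modulo like this is what consistent hashing would replace when rebalancing matters:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id):
    """Stable shard assignment: hash(user_id) % NUM_SHARDS."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shard = shard_for(1001)                      # always the same shard for this user
used = {shard_for(i) for i in range(200)}    # keys spread across all shards
```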

5.4 Denormalization

-- Normalized design (joins needed):
SELECT u.name, o.total, p.name AS product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;

-- Denormalized design (single table):
-- orders table includes user_name, product_name
SELECT user_name, total, product_name
FROM denormalized_orders
WHERE user_id = 1001;

Denormalization improves read performance but increases data duplication and write complexity. Best suited when reads significantly outnumber writes.


6. CAP Theorem and PACELC

6.1 CAP Theorem

                 Consistency
                     /\
                    /  \
               CP  /    \  CA
                  /      \  (only possible without
                 /        \  partitions: single node /
                /__________\ single DC)
        Partition    AP    Availability
        Tolerance

In a distributed system, network partitions are unavoidable, so the practical
choice is between CP and AP.

During a network partition:
CP system -> maintains consistency, rejects some requests
AP system -> maintains availability, allows temporary inconsistency

Real-World CAP Choices:

| System        | Choice | Reason                             |
| ------------- | ------ | ---------------------------------- |
| Bank Accounts | CP     | Balance inconsistency unacceptable |
| DNS           | AP     | Slightly stale records acceptable  |
| MongoDB       | CP     | Strong consistency by default      |
| Cassandra     | AP     | Eventual consistency               |
| ZooKeeper     | CP     | Distributed coordination           |

6.2 PACELC Theory

PACELC extends CAP by describing tradeoffs when there is no partition.

if (Partition) then
    Choose between Availability and Consistency
else
    Choose between Latency and Consistency

Examples:
- DynamoDB:   PA/EL (availability + low latency)
- MongoDB:    PC/EC (consistency-first)
- Cassandra:  PA/EL (availability + low latency)
- PostgreSQL (single node): PC/EC (consistency in both cases)

7. Message Queues

7.1 The Need for Asynchronous Processing

Synchronous Processing:
Client -> API -> Send Email(3s) -> Resize Image(2s) -> Log(1s) -> Response
Total: 6 seconds waiting

Asynchronous Processing:
Client -> API -> Message Queue -> Response (immediately)
                    |
             +------+------+
             v             v
       Email Worker   Image Worker
         (3s)           (2s)
Total: 0.1 seconds waiting
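The asynchronous flow above can be sketched with the standard library's `queue.Queue`: the "API" side enqueues jobs and returns immediately, while background workers drain the queue. The worker count and task names are illustrative:

```python
import queue
import threading

jobs = queue.Queue()
done = []   # record of completed work (list.append is thread-safe under the GIL)

def worker():
    while True:
        job = jobs.get()
        if job is None:                       # sentinel: shut this worker down
            break
        done.append(f"processed:{job}")       # stand-in for the real work
        jobs.task_done()

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

# API side: enqueue and respond to the client immediately
for task in ["send_email", "resize_image", "write_log"]:
    jobs.put(task)

jobs.join()             # block here only so the demo can inspect the results
for _ in workers:
    jobs.put(None)      # stop the workers
for w in workers:
    w.join()
```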

7.2 Kafka vs RabbitMQ vs SQS

Apache Kafka:
+-----------+     +---------------------------+
| Producer  |---->|  Topic: orders            |
+-----------+     |  +-------+-------+------+ |
                  |  |  P0   |  P1   |  P2  | |
                  |  +---+---+---+---+--+---+ |
                  +------+-------+------+-----+
                         v       v      v
                   +-------------------------+
                   |  Consumer Group A       |
                   |  (one consumer per      |
                   |   partition)            |
                   +-------------------------+

| Feature           | Kafka                       | RabbitMQ                  | SQS                    |
| ----------------- | --------------------------- | ------------------------- | ---------------------- |
| Model             | Log-based streaming         | AMQP message broker       | Managed queue          |
| Throughput        | Millions/sec                | Tens of thousands/sec     | Auto-scaling           |
| Message Retention | Configurable retention      | Deleted after consumption | Up to 14 days          |
| Ordering          | Guaranteed within partition | Per-queue guarantee       | With FIFO queues       |
| Consumption Model | Pull (consumer poll)        | Push + Pull               | Pull (long polling)    |
| Use Cases         | Event streaming, logs       | Task queues, RPC          | Serverless, decoupling |

7.3 Message Delivery Guarantees

1. At-Most-Once:
   Producer -> Broker (no ACK wait)
   Fast but messages can be lost
   Use: Logs, metrics (loss acceptable)

2. At-Least-Once:
   Producer -> Broker -> ACK
   Retry on ACK failure -> duplicates possible
   Use: Most systems (idempotent processing needed)

3. Exactly-Once:
   Transaction-based or idempotency keys
   Performance overhead
   Use: Finance, payments (accuracy critical)
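At-least-once delivery implies duplicates, so consumers must be idempotent. A minimal sketch: remember processed message IDs so a redelivered duplicate becomes a no-op. In production the seen-set would live in Redis or the database, committed in the same transaction as the side effect; this in-memory version only shows the shape:

```python
seen_ids = set()   # stand-in for a durable deduplication store
applied = []       # stand-in for the actual side effect

def handle(message):
    """Process a message exactly once even if the broker redelivers it."""
    msg_id, payload = message["id"], message["payload"]
    if msg_id in seen_ids:        # duplicate redelivery: skip
        return False
    seen_ids.add(msg_id)
    applied.append(payload)       # perform the side effect
    return True

handle({"id": "m1", "payload": "charge $10"})
handle({"id": "m1", "payload": "charge $10"})   # broker retried: ignored
```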

8. Rate Limiting

8.1 Token Bucket Algorithm

import time
import threading

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # Max tokens in bucket
        self.tokens = capacity           # Current tokens
        self.refill_rate = refill_rate   # Tokens added per second
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def allow_request(self):
        with self.lock:
            now = time.time()
            # Refill tokens
            elapsed = now - self.last_refill
            self.tokens = min(
                self.capacity,
                self.tokens + elapsed * self.refill_rate
            )
            self.last_refill = now

            # Check if request allowed
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

# 10 requests/sec, burst up to 20
limiter = TokenBucket(capacity=20, refill_rate=10)

8.2 Sliding Window Log Algorithm

import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, window_size, max_requests):
        self.window_size = window_size   # Window size (seconds)
        self.max_requests = max_requests
        self.requests = deque()

    def allow_request(self):
        now = time.time()
        # Remove old requests outside window
        while self.requests and self.requests[0] < now - self.window_size:
            self.requests.popleft()

        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False

# Max 100 requests per minute
limiter = SlidingWindowLog(window_size=60, max_requests=100)

8.3 Sliding Window Counter

Sliding Window Counter (memory efficient):

Previous window count: 84
Current window count: 36
Window size: 60 seconds
Current position: 25% into window

Weighted count = 84 * (1 - 0.25) + 36 = 84 * 0.75 + 36 = 99

max_requests = 100 -> allowed (99 < 100)
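The calculation above can be implemented with two fixed-window counters and a weighted estimate of the previous window's overlap; sections 8.1 and 8.2 use the same style. This sketch takes an explicit `now` so the worked numbers can be replayed:

```python
import time

class SlidingWindowCounter:
    def __init__(self, window_size, max_requests):
        self.window_size = window_size     # seconds
        self.max_requests = max_requests
        self.current_window = 0            # fixed-window index
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_size)
        if window != self.current_window:
            # Roll forward; if whole windows were skipped, previous count is 0
            self.previous_count = self.current_count if window == self.current_window + 1 else 0
            self.current_window = window
            self.current_count = 0
        fraction = (now % self.window_size) / self.window_size
        weighted = self.previous_count * (1 - fraction) + self.current_count
        if weighted < self.max_requests:
            self.current_count += 1
            return True
        return False

# Replaying the worked example: previous=84, current=36, 25% into the window
limiter = SlidingWindowCounter(window_size=60, max_requests=100)
limiter.current_window = int(615 // 60)
limiter.previous_count = 84
limiter.current_count = 36
allowed = limiter.allow_request(now=615.0)   # 84*0.75 + 36 = 99 < 100
```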

| Algorithm              | Memory | Accuracy    | Implementation Complexity |
| ---------------------- | ------ | ----------- | ------------------------- |
| Token Bucket           | O(1)   | High        | Low                       |
| Leaky Bucket           | O(1)   | High        | Low                       |
| Sliding Window Log     | O(N)   | Very High   | Medium                    |
| Sliding Window Counter | O(1)   | Approximate | Medium                    |
| Fixed Window           | O(1)   | Low         | Very Low                  |

9. Real-World Design Examples

9.1 URL Shortener Design

Requirements:

  • 100M DAU, read:write = 100:1
  • Shortened URLs as short as possible (7 chars)
  • Custom URL support
  • Redirection latency under 100ms

Back-of-Envelope Estimation:

Writes: 100M DAU * 0.1 URLs/day = 10M/day
Write QPS = 10,000,000 / 86,400 = ~116 QPS
Read QPS = 116 * 100 = ~11,600 QPS
Peak QPS = 11,600 * 3 = ~35,000 QPS

Storage (5 years):
10,000,000 * 365 * 5 = 18.25 billion records
Each record ~500 bytes -> 18.25B * 500B = ~9.1TB

System Architecture:

+----------+     +------+     +---------------+
|  Client  |---->|  LB  |---->|  API Server   |
+----------+     +------+     +------+--------+
                                     |
                    +----------------+----------------+
                    v                v                v
              +-----------+  +-----------+  +-----------+
              |   Redis   |  |  MySQL    |  | Analytics |
              |   Cache   |  | (Sharded) |  |  (Kafka)  |
              +-----------+  +-----------+  +-----------+

URL Encoding Strategy:

import hashlib

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def to_base62(n):
    short = ""
    while n > 0:
        short = BASE62[n % 62] + short
        n //= 62
    return short

# Method 1: Base62 encoding of an auto-increment ID (collision-free)
def short_url_from_counter(counter):
    return to_base62(counter)  # e.g., "dK4x9Z"

# Method 2: MD5 hash + Base62 (first 7 chars; collisions must be handled)
def short_url_from_hash(long_url):
    hash_val = hashlib.md5(long_url.encode()).hexdigest()
    num = int(hash_val[:12], 16)
    short = ""
    for _ in range(7):
        short = BASE62[num % 62] + short
        num //= 62
    return short

# 62^7 = ~3.5 trillion unique URLs possible

9.2 Twitter/X Feed Design

Core Problem: Fan-out

Fan-out on Write (Push Model):
When user tweets -> pre-write to all followers' timelines

User A tweets -> Fan-out Service
                +--> Follower 1's timeline cache
                +--> Follower 2's timeline cache
                +--> Follower N's timeline cache

Pros: Reads are very fast (pre-computed)
Cons: Enormous write load for users with many followers

Fan-out on Read (Pull Model):
When timeline requested -> assemble tweets in real-time

User B requests timeline -> Query followed users
                           +--> User X's latest tweets
                           +--> User Y's latest tweets
                           +--> User Z's latest tweets
                           -> merge + sort -> return

Pros: Simple writes
Cons: Multiple DB queries on every read

Celebrity Problem Solution (Hybrid):

Regular users (followers < 10,000): Fan-out on Write
Celebrities (followers > 10,000): Fan-out on Read

Timeline fetch:
1. Get pre-fanned-out tweets from cache
2. Query latest tweets from followed celebrities
3. Merge both results + sort by time
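The merge step of the hybrid fetch can be sketched directly. Tweets are represented as (timestamp, text) pairs; the data and function name are illustrative:

```python
def fetch_timeline(fanned_out_cache, celebrity_tweets, limit=20):
    """Merge pre-fanned-out tweets with celebrity tweets pulled on read."""
    merged = fanned_out_cache + celebrity_tweets
    merged.sort(key=lambda t: t[0], reverse=True)   # newest first
    return merged[:limit]

cached = [(100, "friend: lunch"), (95, "friend: photo")]   # from fan-out on write
pulled = [(98, "celebrity: tour dates")]                   # pulled on read
timeline = fetch_timeline(cached, pulled, limit=3)
```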

9.3 Chat System Design

WebSocket-based Real-time Communication:

+----------+  WebSocket  +---------------+
| User A   |<===========>|  Chat         |
+----------+             |  Server       |
                         |  Cluster      |
+----------+  WebSocket  |               |
| User B   |<===========>|               |
+----------+             +-------+-------+
                                |
                    +-----------+-----------+
                    v           v           v
              +----------+ +--------+ +----------+
              |  Redis   | | Kafka  | | Message  |
              | Presence | |        | |    DB    |
              +----------+ +--------+ +----------+

Message Ordering:

Method 1: Server timestamps (simple but clock sync issues)
Method 2: Snowflake ID (time + server + sequence)

Snowflake ID Structure (64-bit):
+--------+---------------+------------+----------+--------------+
| Unused |  Timestamp    | Datacenter | Machine  |   Sequence   |
| 1 bit  |   41 bits     |   5 bits   |  5 bits  |   12 bits    |
+--------+---------------+------------+----------+--------------+
-> Time-sortable + globally unique
-> Can generate ~4 million IDs/sec per machine (4096 per millisecond)
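A Snowflake-style generator matching the layout above can be sketched as follows; the custom epoch value is an assumption for illustration, and backwards clock movement is not handled in this sketch:

```python
import threading
import time

class Snowflake:
    EPOCH = 1_577_836_800_000   # 2020-01-01 in ms (illustrative custom epoch)

    def __init__(self, datacenter_id, machine_id):
        self.datacenter_id = datacenter_id & 0x1F    # 5 bits
        self.machine_id = machine_id & 0x1F          # 5 bits
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000) - self.EPOCH
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12 bits
                if self.sequence == 0:            # sequence exhausted this ms:
                    while now <= self.last_ms:    # spin to the next millisecond
                        now = int(time.time() * 1000) - self.EPOCH
            else:
                self.sequence = 0
            self.last_ms = now
            # 41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence
            return (now << 22) | (self.datacenter_id << 17) | \
                   (self.machine_id << 12) | self.sequence

gen = Snowflake(datacenter_id=1, machine_id=3)
ids = [gen.next_id() for _ in range(1000)]   # unique and time-sortable
```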

Presence Management:

Heartbeat approach:
- Client sends heartbeat every 5 seconds
- Marked offline after 30 seconds without heartbeat
- Last heartbeat time stored in Redis

Status change notifications:
- Notify user's friend list on change
- For users with many friends: only notify currently online friends
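The heartbeat approach can be sketched with a last-seen map; a dict stands in for Redis, and the 30-second offline threshold matches the text above. An explicit `now` parameter makes the timing testable:

```python
import time

class PresenceTracker:
    OFFLINE_AFTER = 30   # seconds without a heartbeat -> considered offline

    def __init__(self):
        self._last_beat = {}   # user_id -> last heartbeat timestamp

    def heartbeat(self, user_id, now=None):
        self._last_beat[user_id] = time.time() if now is None else now

    def is_online(self, user_id, now=None):
        now = time.time() if now is None else now
        last = self._last_beat.get(user_id)
        return last is not None and now - last <= self.OFFLINE_AFTER

presence = PresenceTracker()
presence.heartbeat("user-1", now=1000.0)
presence.is_online("user-1", now=1020.0)   # 20s since heartbeat: online
presence.is_online("user-1", now=1040.0)   # 40s without heartbeat: offline
```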

9.4 Netflix / Video Streaming Design

Video Upload Pipeline:

+-----------+    +-----------+    +-------------------+
| Uploader  |--->|  Object   |--->|  Transcoding      |
|           |    |  Store    |    |  Pipeline          |
+-----------+    |  (S3)     |    |                    |
                 +-----------+    |  +---------------+ |
                                  |  | 240p          | |
                                  |  | 480p          | |
                                  |  | 720p          | |
                                  |  | 1080p         | |
                                  |  | 4K            | |
                                  |  +---------------+ |
                                  +---------+----------+
                                            v
                                  +-------------------+
                                  |   Deploy to CDN   |
                                  | (Global Edges)    |
                                  +-------------------+

Adaptive Bitrate Streaming (ABR):

Automatically adjusts quality based on user network:

Bandwidth 10Mbps -> request 1080p segments
Bandwidth 3Mbps  -> switch to 480p segments
Bandwidth recovers -> gradually increase quality

HLS (HTTP Live Streaming) approach:
1. Split video into 2-10 second segments
2. Encode each segment at multiple quality levels
3. Provide m3u8 manifest file listing segments
4. Client selects segments matching bandwidth
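The client-side selection step can be sketched as picking the highest rendition whose bitrate fits within measured bandwidth minus a safety margin. The rendition table and the 0.8 safety factor are illustrative assumptions:

```python
RENDITIONS = [          # (label, required bitrate in Mbps), ascending
    ("240p", 0.4),
    ("480p", 1.5),
    ("720p", 3.0),
    ("1080p", 6.0),
    ("4K", 16.0),
]

def pick_rendition(bandwidth_mbps, safety=0.8):
    """Choose the best quality that fits within usable bandwidth."""
    usable = bandwidth_mbps * safety   # leave headroom for jitter
    best = RENDITIONS[0][0]            # always fall back to the lowest quality
    for label, required in RENDITIONS:
        if required <= usable:
            best = label
    return best

pick_rendition(10.0)   # 8.0 Mbps usable -> "1080p"
pick_rendition(3.0)    # 2.4 Mbps usable -> "480p"
```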

10. Back-of-Envelope Estimation

10.1 Commonly Used Numbers

Latency:
- L1 cache reference: ~1ns
- L2 cache reference: ~4ns
- Main memory reference: ~100ns
- SSD random read: ~150us
- HDD seek: ~10ms
- Same DC round trip: ~0.5ms
- US <-> Europe round trip: ~75ms

Capacity:
- 1 char = 1 byte (ASCII) / 1-4 bytes (UTF-8)
- 1 UUID = 36 bytes (string) / 16 bytes (binary)
- 1 image (medium) = ~200KB
- 1 video (1 min, 720p) = ~50MB

Daily conversion:
- 1 day = 86,400 sec = ~100,000 sec (simplified)
- 1 million requests/day = ~12 QPS
- 100 million requests/day = ~1,200 QPS
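The daily conversions above are simple enough to wrap as helpers (function names are illustrative); note the text's ~1,200 QPS figure uses the simplified 100,000-second day, while the exact divisor gives ~1,157:

```python
SECONDS_PER_DAY = 86_400

def qps(requests_per_day):
    """Average queries per second for a daily request volume."""
    return requests_per_day / SECONDS_PER_DAY

def peak_qps(requests_per_day, peak_factor=3):
    """Rough peak estimate: average QPS times a peak multiplier."""
    return qps(requests_per_day) * peak_factor

round(qps(1_000_000))       # ~12 QPS
round(qps(100_000_000))     # ~1,157 QPS exact; ~1,200 with the simplification
```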

10.2 QPS Calculation Example

Twitter-scale estimation:
- DAU: 300 million
- Avg tweet views: 100/day
- Read QPS = 300M * 100 / 86,400 = ~347,000 QPS
- Peak QPS = 347,000 * 3 = ~1,000,000 QPS

- Avg tweets written: 2/day
- Write QPS = 300M * 2 / 86,400 = ~6,944 QPS

Storage:
- Tweet size: ~300 bytes (text + metadata)
- Daily: 600M tweets * 300B = ~180GB/day
- 5 years: 180GB * 365 * 5 = ~328TB
- With media: ~5x = ~1.6PB

11. Common Mistakes and Tips

11.1 Mistakes to Avoid

1. Jumping into design without clarifying requirements
   -> Always invest 5 minutes in questions

2. Focusing on a single component
   -> Big picture first, then deep dive where interviewer asks

3. Pursuing a perfect design
   -> Explaining tradeoffs is more important

4. Design without numbers
   -> Always calculate QPS, storage, bandwidth

5. Talking without interacting
   -> Converse with interviewer, accept feedback

6. Over-engineering
   -> Choose appropriate technology for the problem

11.2 Interview Tips

1. Structured Approach:
   - Always follow the 4-step framework
   - Time management in each step

2. Explain Tradeoffs:
   - "The advantage of this approach is A, the drawback is B."
   - "Alternative C exists, but D is more suitable for our situation."

3. Reference Real Examples:
   - "Netflix solved this problem using X approach."
   - "I experienced a similar issue in a previous project."

4. Whiteboard Usage:
   - Clean diagrams
   - Data flow arrows
   - Numbers/metrics displayed

5. Accept Interviewer Hints:
   - Interviewer questions are hints
   - "Great point, if we improve that part..."

12. Quiz

Q1. Scalability

Explain 3 key differences between horizontal and vertical scaling.

Answer:

  1. Cost Structure: Vertical scaling costs increase exponentially with high-spec hardware, while horizontal scaling costs grow linearly with commodity machines.
  2. Fault Isolation: A vertically scaled system remains a single point of failure (SPOF), while with horizontal scaling one server failure does not affect the whole system.
  3. Scaling Limits: Vertical scaling has physical limits, while horizontal scaling is theoretically unlimited.

The tradeoff for horizontal scaling is increased complexity in data consistency management, session handling, and operations.

Q2. CAP Theorem

Why do banking systems choose CP? What problems arise if they choose AP?

Answer: Banking systems choose CP because balance consistency is absolutely critical. If AP is chosen, during a network partition, withdrawals could happen simultaneously on two nodes, allowing more money to be withdrawn than the actual balance -- a double spending problem. This directly leads to financial loss, so consistency must be guaranteed even at the cost of some availability.

Q3. Caching

What is a Cache Stampede and how do you prevent it?

Answer: A Cache Stampede occurs when the TTL of a popular cache entry expires and thousands of requests simultaneously flood the database.

Prevention methods:

  1. Mutex/Distributed Lock: Only the first request fetches from DB, others wait
  2. Pre-refresh: Refresh in background before TTL expiration (at 80% of TTL)
  3. TTL Randomization: Add small random values to prevent simultaneous expiration
  4. External refresh: Dedicated worker periodically refreshes the cache

Q4. Message Queues

When should you choose Kafka vs RabbitMQ?

Answer: Choose Kafka:

  • Event streaming (log collection, real-time analytics)
  • High throughput needed (millions of messages per second)
  • Message replay needed (retained after consumption)
  • Event sourcing patterns

Choose RabbitMQ:

  • Traditional task queues (email sending, image processing)
  • Complex routing needed (Exchange, Routing Key)
  • Message priority needed
  • RPC patterns
  • Per-message ACK needed

Key difference: Kafka is an "event log", RabbitMQ is a "message broker".

Q5. Real Design

Why use Base62 encoding in a URL Shortener and how do you handle hash collisions?

Answer: Why Base62:

  • Uses only URL-safe characters (0-9, a-z, A-Z)
  • 62^7 = ~3.5 trillion unique URLs possible
  • Short strings represent large numbers

Collision Handling:

  1. Auto-increment ID + Base62: No collisions (recommended)
  2. MD5/SHA256 + first 7 chars: Append characters on collision
  3. Collision detection + retry: Check DB for duplicates, retry with different hash
  4. Bloom Filter: Quick existence check, regenerate on collision

In practice, Method 1 (using Snowflake ID in distributed environments) is most reliable.

