DB Sharding & Partitioning Complete Guide 2025: Horizontal Scaling, Range/Hash/Directory, Resharding, Vitess
TL;DR
- Sharding = the answer to horizontal scaling: essential at a scale where a single DB cannot cope
- Shard key selection is destiny: wrong choice leads to hot spots and resharding hell
- Three main strategies: Range, Hash, Directory (Lookup) — each has trade-offs
- Distributed transactions are hard: 2PC is slow, Saga is practical
- Vitess is the standard: used by YouTube/Slack/GitHub. A distributed layer on top of MySQL
1. What is Sharding?
1.1 Limits of a Single DB
[Single DB]
├─ CPU: up to one machine's worth
├─ RAM: 256GB? 512GB?
├─ Disk: 30TB
└─ Network: 25 Gbps
→ Eventually hits the limit
Vertical Scaling:
- Bigger machine
- Has limits (largest EC2 = 24TB RAM, $20+/hour)
- Single point of failure
Horizontal Scaling:
- Multiple machines
- Effectively unlimited scaling
- Increased complexity
1.2 Can Read Replicas Solve It?
[Primary] (writes)
↓ replication
[Replica 1] [Replica 2] [Replica 3] (reads)
Helps: distributes read traffic.
Limitation: writes still go through a single Primary. If writes are the bottleneck, this does not help.
→ Sharding is required.
1.3 Definition of Sharding
Sharding = distributing data across multiple independent DBs (shards).
[users table]
↓ shard by user_id
[Shard 1: user_id 1-1M]
[Shard 2: user_id 1M-2M]
[Shard 3: user_id 2M-3M]
...
Each shard is a completely independent DB with its own CPU, RAM, and disk.
1.4 Effects of Sharding
| | Single DB | 100 Shards |
|---|---|---|
| Data | 1 TB | 100 TB |
| Write TPS | 10,000 | 1,000,000 |
| Cost | $10K/month | $50K/month |
| Complexity | Low | Very High |
100x performance + 5x cost = good ROI. However, complexity is a serious problem.
2. Shard Key Selection — the Most Important Decision
2.1 Conditions for a Good Shard Key
- Even distribution — data spread evenly across all shards
- Immutable — rarely changes once set
- Matches query patterns — data queried together lives on the same shard
- High cardinality — enough unique values
- Not monotonically increasing — auto-increment IDs risk hot spots
2.2 Common Shard Key Candidates
user_id:
- Groups a user's data on one shard (related data together)
- Large users become hot spots
- Cross-user JOINs are hard
created_at:
- Monotonically increasing → all new data goes to one shard (hot spot)
- OK for time-series data
hash(id):
- Even distribution
- No range queries possible
- Related data lands on different shards
country:
- Geographic efficiency
- Very uneven (US is huge, Monaco is tiny)
organization_id:
- Ideal for B2B SaaS
- Large organizations become hot spots
2.3 Slack's Case
Slack's shard key = workspace_id.
Why:
- All data from the same workspace on one shard → fast lookups
- Almost no cross-workspace JOINs
- Natural isolation per workspace
Downside: a massive enterprise workspace overwhelms a single shard → the "Whale shard" problem.
Solution: huge workspaces get their own dedicated shard.
3. Sharding Strategies
3.1 Range Sharding
Partition data by key range.
Shard 1: id 1 ~ 1,000,000
Shard 2: id 1,000,001 ~ 2,000,000
Shard 3: id 2,000,001 ~ 3,000,000
Pros:
- Efficient range queries (WHERE id BETWEEN 500000 AND 600000)
- Simple routing
- Easy to add new shards
Cons:
- Hot spot risk: with monotonically increasing keys, new data lands entirely on the last shard
- Possible uneven distribution
Use cases: time-series data, BigTable, HBase
3.2 Hash Sharding
Partition by the hash of the shard key.
shard_id = hash(user_id) % num_shards
Pros:
- Even distribution — assuming a good hash
- Safe even for monotonic keys
Cons:
- No range queries (WHERE id BETWEEN ...)
- Adding a shard forces nearly all data to be redistributed (because % num_shards changes)
Solution: Consistent Hashing — only some data moves when shards are added or removed.
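To make the trade-off concrete, here is a minimal sketch (hypothetical key space, md5 used only as a stable hash) of modulo routing and of how much data moves when the shard count changes:

```python
import hashlib

def shard_for(user_id: int, num_shards: int) -> int:
    # Use a stable hash (Python's built-in hash() is salted per process)
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# How many of 100,000 keys move when we grow from 4 to 5 shards?
moved = sum(1 for uid in range(100_000)
            if shard_for(uid, 4) != shard_for(uid, 5))
print(f"{moved / 100_000:.0%} of keys change shards")  # roughly 80%
```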
3.3 Consistent Hashing
[Circular hash space]
        Shard A
       /       \
  Shard D     Shard B
       \       /
        Shard C
Each shard and data item is mapped to the same hash space. Data goes to the nearest shard clockwise.
When adding a shard: only the affected section is redistributed (1/N of the total).
Virtual Nodes: represent each shard with multiple virtual nodes → more even distribution.
Used by: Cassandra, DynamoDB, Riak, Memcached.
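A minimal consistent-hash ring sketch, assuming md5 as the ring hash and illustrative shard names; virtual nodes smooth out the distribution, and lookups walk clockwise to the next position on the ring:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = {}           # ring position -> physical shard
        self.sorted_keys = []
        for node in nodes:
            for i in range(vnodes):          # virtual nodes per physical shard
                pos = self._hash(f"{node}#{i}")
                self.ring[pos] = node
                bisect.insort(self.sorted_keys, pos)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # Clockwise lookup: first ring position >= hash(key), wrapping around
        idx = bisect.bisect(self.sorted_keys, self._hash(key)) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
print(ring.get_node("user:12345"))  # adding/removing a shard remaps only ~1/N of keys
```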
3.4 Directory-Based Sharding
Use a lookup table to explicitly map which data lives on which shard.
[Lookup]
user_1 → Shard A
user_2 → Shard B
user_3 → Shard A
...
Pros:
- Flexible: arbitrary mappings possible
- Handles uneven distributions
- Easier migrations
Cons:
- The lookup itself becomes a bottleneck (caching is mandatory)
- Extra infrastructure (Redis, Consul)
- Single point of failure
Used by: Vitess (a variation of this approach)
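A directory-based router in sketch form. The SHARD_MAP dict stands in for the real lookup store (Redis, Consul, or a metadata table), and an in-process cache keeps the directory off the hot path:

```python
import functools

# Stand-in for the real directory (Redis / Consul / a metadata table)
SHARD_MAP = {"user_1": "shard-a", "user_2": "shard-b", "user_3": "shard-a"}

@functools.lru_cache(maxsize=100_000)   # cache lookups; the directory is otherwise a bottleneck
def shard_for(user_key: str) -> str:
    shard = SHARD_MAP.get(user_key)
    if shard is None:
        raise KeyError(f"no shard mapping for {user_key}")
    return shard

# Moving a user is just a metadata update, not a rehash of the whole keyspace
SHARD_MAP["user_3"] = "shard-c"
shard_for.cache_clear()                 # invalidate the cache after remapping
print(shard_for("user_3"))              # shard-c
```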
3.5 Geographic Sharding
Shard by location:
- US users → US shard
- EU users → EU shard
- Asia users → Asia shard
Pros:
- Close to users → lower latency
- Compliance (GDPR — EU data stays in EU)
- Natural partitioning
Cons:
- Hard when users move
- Global queries are hard
- Very uneven (different user counts per region)
3.6 Comparison Table
| Strategy | Evenness | Range Queries | Adding Shards | Query Pattern |
|---|---|---|---|---|
| Range | Low | Yes | Easy | Range |
| Hash | High | No | Hard | Exact |
| Consistent Hash | High | No | Easy | Exact |
| Directory | High | Partial | Easy | Flexible |
| Geographic | Low | Partial | Moderate | Regional |
4. The Hot Spot Problem
4.1 What is a Hot Spot?
Traffic or data concentrating on a specific shard.
Causes:
- Wrong shard key (popular users, popular products)
- Monotonically increasing keys (created_at)
- Uneven distribution (a few giant organizations)
4.2 Impact
[Shard 1: 10% traffic]
[Shard 2: 5% traffic]
[Shard 3: 80% traffic] ← Hot spot! Dying
[Shard 4: 5% traffic]
The entire system is bottlenecked by the weakest shard.
4.3 Detection
-- Data volume per shard
SELECT shard_id, COUNT(*) as row_count
FROM users
GROUP BY shard_id;
-- Query count per shard (application metrics)
Warning signs:
- One shard is 5x+ larger than the average
- One shard's CPU/IO is at 100%
4.4 Solutions
1. Change the shard key
- Best but hardest (requires resharding)
2. Composite Key
shard_key = hash(user_id + timestamp_bucket)
Combines time-based + user-based factors (see the sketch after this list).
3. Sub-sharding
- Split hot shards into finer pieces
- Handle "Whale shards"
4. Caching
- Move hot data to Redis
- Reduce DB load
5. Read Replicas
- Distribute read load
- Writes remain hot
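A sketch of approach 2 (composite key), with illustrative names and an hourly bucket: a single hot user's writes spread across shards over time, at the cost of fanning reads for that user out to several shards.

```python
import hashlib
import time

NUM_SHARDS = 16

def shard_for(user_id: int, ts: float, bucket_seconds: int = 3600) -> int:
    # Combine the user with an hourly time bucket so one hot user's writes
    # do not all land on the same shard
    bucket = int(ts // bucket_seconds)
    key = f"{user_id}:{bucket}"
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

now = time.time()
print(shard_for(42, now))           # this hour's writes for user 42
print(shard_for(42, now - 86_400))  # yesterday's writes may sit on a different shard
```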
4.5 Example: Twitter's Celebrity Problem
Problem: a celebrity user (say, Elon Musk) tweets → must update timelines of hundreds of millions of followers.
Solution:
- Fan-out on write (regular users): pre-populate follower timelines on tweet
- Fan-in on read (celebrities): do not build timelines on tweet, fetch on read
- Hybrid: fan-out for active followers, fan-in for inactive ones
5. Difficulty of Distributed Queries
5.1 Single-Shard Queries
SELECT * FROM users WHERE user_id = 12345;
→ determine shard by user_id → query only that shard. Very fast.
5.2 Cross-Shard Queries
SELECT COUNT(*) FROM users WHERE country = 'KR';
→ query all shards → aggregate results. Slow + heavy load.
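What the router or application must do for the query above, in sketch form (the shard_connections list of DB-API connections is hypothetical): scatter the query to every shard in parallel, then aggregate the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

def count_kr_users(shard_connections) -> int:
    """Scatter-gather: run the same query on every shard and sum the partial counts."""
    def count_on(conn):
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM users WHERE country = %s", ("KR",))
        return cur.fetchone()[0]

    # Query shards in parallel: total latency ≈ the slowest shard, not the sum,
    # but every shard still pays the cost of the scan
    with ThreadPoolExecutor(max_workers=len(shard_connections)) as pool:
        return sum(pool.map(count_on, shard_connections))
```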
5.3 Difficulty of JOINs
SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.user_id
WHERE u.country = 'KR';
What if users and orders live on different shards? You need a distributed JOIN.
Strategies:
- Co-location: keep users and orders with the same user_id on the same shard
- Denormalization: copy user info into orders
- Distributed query engine: Vitess, Citus handle it automatically
- CQRS: use a separate DW (Snowflake) for queries
5.4 Distributed Transactions
def transfer(from_user, to_user, amount):
    # The two users may live on different shards
    db.update(f"... WHERE user_id={from_user}", -amount)  # debit the sender
    db.update(f"... WHERE user_id={to_user}", +amount)    # credit the receiver
Problem: first update succeeds + second fails → money disappears.
Solutions:
1. 2PC (Two-Phase Commit):
- Prepare → Commit
- All shards must agree to commit
- Slow + single point of failure → not recommended
2. Saga Pattern:
- Compensating transactions
- Eventually consistent
- De facto standard in practice (sketched after this list)
3. Keep in the same shard:
- When possible, put sender/receiver in the same shard
- Not always feasible
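A minimal Saga-style sketch of the transfer above (option 2 in the list): each step is a local, single-shard transaction, and a compensating action undoes step 1 if step 2 fails. debit/credit are hypothetical helpers; a real saga would also persist its state and retry.

```python
def debit(user_id: int, amount: int) -> None:
    ...  # hypothetical: local transaction on user_id's shard

def credit(user_id: int, amount: int) -> None:
    ...  # hypothetical: local transaction on user_id's shard

def transfer_saga(from_user: int, to_user: int, amount: int) -> None:
    debit(from_user, amount)              # step 1: sender's shard
    try:
        credit(to_user, amount)           # step 2: receiver's shard
    except Exception:
        credit(from_user, amount)         # compensating transaction: undo step 1
        raise
    # Between step 1 and step 2 the system is only eventually consistent
```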
6. Resharding — the Scariest Operation
6.1 Why is Resharding Hard?
- Data movement: moving tens of TB
- Zero downtime: no user impact
- Rollback: recover from problems
- Consistency: handle writes/reads mid-migration
6.2 Resharding Scenarios
1. Adding shards: 4 → 8 shards
2. Splitting a shard: dividing a huge shard in two
3. Merging shards: combining small shards
4. Changing the shard key: the hardest of all
6.3 Resharding Patterns
1. Bulk Copy + Switchover:
- Copy data to new shards
- Short maintenance window
- Switch traffic
- Drop old shards
Downside: downtime.
2. CDC + Dual Write:
- Sync old → new shards via CDC
- Application writes to both old + new (dual write)
- Verify data parity
- Switch reads to new
- Retire old shards
Zero downtime, complex.
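A sketch of the dual-write phase of pattern 2, with illustrative names and deliberately simplified error handling: the old shards stay the source of truth, the new layout is mirrored, and any mirror failure is left for the CDC/verification job to reconcile.

```python
import logging

def dual_write(old_shard, new_shard, sql: str, params: tuple) -> None:
    # Old shard layout remains the source of truth during migration
    old_shard.execute(sql, params)
    try:
        # Mirror the write to the new shard layout
        new_shard.execute(sql, params)
    except Exception:
        # Do not fail the user request; the CDC stream / verification job backfills
        logging.exception("dual-write to new shard failed; will be reconciled")
```

Once verification shows the two layouts agree, reads are switched to the new shards and the dual write is removed.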
3. Online Migration:
- Vitess's VReplication
- Automated zero-downtime migration
- Progress monitoring
6.4 Case: Slack's Sharding Evolution
- 2017: single MySQL → replicas
- 2018: first sharding (workspace_id)
- 2020: introduced Vitess
- 2022: handled whale workspaces
- 2024: thousands of shards, petabytes
Lesson: sharding is never one and done. It is a continuous evolution.
7. Vitess — a Distributed Layer on Top of MySQL
7.1 What is Vitess?
A MySQL sharding solution built by YouTube. CNCF graduated.
[Client]
↓
[VTGate] (query router)
↓
[VTTablet] [VTTablet] [VTTablet]
↓ ↓ ↓
[MySQL] [MySQL] [MySQL]
Pros:
- MySQL compatible — existing code works as is
- Automatic sharding — key-based routing
- Schema migrations — safe DDL
- VReplication — zero-downtime resharding
- Battle-tested at scale — YouTube, Slack, GitHub, Square
7.2 Companies Using It
- YouTube (original creator; petabytes of data)
- Slack (large scale)
- GitHub (migrated MySQL → Vitess)
- Square
- JD.com (China's largest e-commerce)
7.3 VSchema Example
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "hash"
        }
      ]
    }
  }
}
→ VTGate automatically shards based on user_id.
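Because VTGate speaks the MySQL wire protocol, the application connects with an ordinary MySQL driver and queries the sharded table as if it were a single database. Host, port, credentials, and the keyspace name below are placeholders, not part of the VSchema above:

```python
import pymysql  # any MySQL driver works; VTGate speaks the MySQL protocol

conn = pymysql.connect(host="vtgate.internal", port=3306,
                       user="app", password="app-password",  # placeholders
                       database="commerce")                   # keyspace name (placeholder)
with conn.cursor() as cur:
    # VTGate hashes user_id through the vindex defined above and routes to one shard
    cur.execute("SELECT name FROM users WHERE user_id = %s", (12345,))
    print(cur.fetchone())
```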
8. Other Distributed DB Options
8.1 Citus (PostgreSQL)
A PostgreSQL extension. Supports distributed queries and distributed transactions.
SELECT create_distributed_table('events', 'user_id');
Pros: PostgreSQL compatible, keep SQL as is. Cons: inherits PostgreSQL limitations (less mature than Vitess).
8.2 CockroachDB
Designed from scratch as a distributed SQL DB.
Pros:
- PostgreSQL compatible
- Automatic sharding (range-based)
- Global distribution support
- Distributed ACID transactions
Cons:
- Some PG features unsupported
- More expensive to operate
- Learning curve
Used by: DoorDash, Bose, Comcast.
8.3 YugabyteDB
Similar to CockroachDB. Both PostgreSQL compatible and Cassandra compatible.
8.4 TiDB
MySQL compatible + automatic distribution. Popular in China.
8.5 Comparison Table
| | Vitess | Citus | CockroachDB | YugabyteDB | TiDB |
|---|---|---|---|---|---|
| Base | MySQL | PostgreSQL | Custom | Custom | Custom |
| Compatibility | MySQL | PostgreSQL | PostgreSQL | PG + CQL | MySQL |
| Distributed Transactions | Limited | Yes | Yes | Yes | Yes |
| Automatic Sharding | Manual key | Manual key | Automatic | Automatic | Automatic |
| Operational Difficulty | Moderate | Low | Moderate | Moderate | Moderate |
| Validation | YouTube, Slack | Many | DoorDash | Many | Chinese giants |
9. In Practice — Designing E-commerce Sharding
9.1 Scenario
- 100 million users
- 10 million daily orders
- Single PostgreSQL hitting its limits
9.2 Analysis
Tables:
- users (100M rows)
- orders (1B rows)
- products (1M rows)
- reviews (5B rows)
9.3 Sharding Decisions
1. users:
- Shard key: user_id (hash)
- 16 shards
- Even distribution
2. orders:
- Shard key: user_id (hash, same shard as users)
- A user's row and their orders live on the same shard → JOIN is fast (routing sketch after this list)
3. products:
- Small (1M rows)
- No sharding — single DB + read replicas
- Or replicate across all shards
4. reviews:
- Shard key: product_id (hash)
- Fits product-based analytics
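A small routing sketch for decisions 1–2: users and orders route on the same hash of user_id, so a user's row and their orders are co-located and the JOIN stays on one shard. The connection pools are hypothetical.

```python
import hashlib

NUM_USER_SHARDS = 16

def user_shard(user_id: int) -> int:
    # users and orders both route on user_id, so they are co-located
    return int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % NUM_USER_SHARDS

def get_user_with_orders(pools, user_id: int):
    # pools: list of 16 connections/pools, indexed by shard id (hypothetical)
    conn = pools[user_shard(user_id)]
    cur = conn.cursor()
    cur.execute(
        "SELECT u.*, o.* FROM users u JOIN orders o ON o.user_id = u.user_id "
        "WHERE u.user_id = %s", (user_id,))
    return cur.fetchall()
```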
9.4 Query Patterns
-- User orders (same shard)
SELECT * FROM orders WHERE user_id = 12345;
-- Fast
-- User info + orders (same shard, JOIN OK)
SELECT u.*, o.*
FROM users u JOIN orders o ON o.user_id = u.user_id
WHERE u.user_id = 12345;
-- Fast
-- Product reviews (different shard)
SELECT * FROM reviews WHERE product_id = 67890;
-- Fast
-- All reviews written by a user (cross-shard)
SELECT * FROM reviews WHERE user_id = 12345;
-- Slow — scans every reviews shard
-- Solution: denormalize. Store review summaries on the users shard too.
9.5 Infrastructure
[Application]
↓
[ProxySQL / Vitess]
↓
[16 user shards]
[16 order shards (co-located)]
[1 products DB + 5 read replicas]
[8 reviews shards]
9.6 Monitoring
- Query count per shard
- Disk usage per shard
- Cross-shard query ratio
- Hot spot detection
10. Sharding Pitfalls and Lessons
10.1 Sharding Too Early
Wrong decision: sharding at 1000 QPS of traffic.
Reality:
- Single PostgreSQL = 50,000 QPS+
- Read replica = 200,000 QPS+
- Proper indexes + caching = even more
→ Try every other optimization first before sharding.
10.2 Wrong Shard Key
GitHub's first sharding attempt failure case:
- Sharded by repo_id → popular repos became hot spots
- Linus's linux repository overwhelmed a single shard
- Resharding required (months of work)
Lesson: the shard key is extremely hard to change. Be careful up front.
10.3 Avoiding Distributed Transactions
Distributed transactions really are hard. Whenever possible, keep transactions within a single shard.
While designing: shape the data model so that every write transaction fits in a single shard.
10.4 Operational Complexity
A sharded system means:
- Complex backups (16 shards simultaneously)
- Complex monitoring
- Harder debugging (which shard?)
- Harder migrations
→ Automation is essential. Have enough operations staff.
10.5 Is NewSQL the Answer?
NewSQL systems like CockroachDB and YugabyteDB offer automatic sharding + distributed transactions, removing much of the complexity of manual sharding.
When to go NewSQL:
- New projects
- Strong consistency required
- Limited operations staff
When to go manual sharding:
- Existing MySQL/PostgreSQL
- Extreme performance needed
- Concerns about NewSQL maturity
Quiz
1. Why can sharding not be replaced by read replicas?
Answer: read replicas only distribute reads. Writes still go through a single Primary. A system where writes are the bottleneck (e.g. 1 billion INSERTs per day) cannot be fixed with read replicas. Only sharding can spread writes across multiple machines. That said, sharding brings enormous complexity, so first check whether read replicas + caching + index optimization can solve the problem.
2. What are the pros and cons of hash sharding?
Answer: Pros: even distribution (with a good hash). Safe even with monotonically increasing keys. Cons: no range queries (WHERE id BETWEEN ...). Adding a shard redistributes nearly all data (because % num_shards changes). Solution: Consistent Hashing — only a fraction (1/N) of the data moves on add/remove. Used by Cassandra, DynamoDB, and Riak. Virtual nodes give more even distribution.
3. How do you solve the hot spot problem?
Answer: (1) Change the shard key — best, but requires resharding (hard), (2) Composite key — hash(user_id + timestamp_bucket) to spread across time/users, (3) Sub-sharding — split hot shards finer ("whale shard"), (4) Caching — move hot data to Redis, (5) Read replicas — distribute read load. The most common hot spots: popular users, popular products, monotonically increasing keys. Always review hot spot scenarios when choosing a shard key.
4. Why did Vitess become the standard for MySQL distribution?
Answer: (1) Born at YouTube — proven at massive scale, (2) MySQL compatible — existing code/tools keep working, (3) Automatic sharding — key-based routing, (4) VReplication — zero-downtime migrations and resharding, (5) CNCF graduated — cloud-native standard, (6) Proven users — Slack, GitHub, Square. Downside: distributed transactions are relatively weak (the Saga pattern is recommended). PostgreSQL users use Citus for a similar role.
5. When should you use NewSQL (CockroachDB) vs. manual sharding?
Answer: Pick NewSQL (CockroachDB, YugabyteDB) when: new project, strong consistency required (distributed ACID), limited operations staff, PostgreSQL compatibility is acceptable. Pick manual sharding (Vitess + MySQL) when: existing MySQL system, extreme performance needed, worried about NewSQL maturity, want more control. Reality: new systems increasingly go NewSQL; existing huge systems stick with Vitess. Both are valid — decide based on trade-offs.
References
- Designing Data-Intensive Applications — Martin Kleppmann (sharding chapter)
- Vitess — official YouTube project
- Citus Data
- CockroachDB Docs
- YugabyteDB
- TiDB
- Slack's Data Stores — Vitess migration
- GitHub MySQL → Vitess
- Stripe's Sharding Approach
- Discord's Trillion Messages — uses Cassandra
- Pinterest Sharding