DB Sharding & Partitioning Complete Guide 2025: Horizontal Scaling, Range/Hash/Directory, Resharding, Vitess

TL;DR

  • Sharding = the answer to horizontal scaling: essential at a scale where a single DB cannot cope
  • Shard key selection is destiny: wrong choice leads to hot spots and resharding hell
  • Three main strategies: Range, Hash, Directory (Lookup) — each has trade-offs
  • Distributed transactions are hard: 2PC is slow, Saga is practical
  • Vitess is the standard: used by YouTube/Slack/GitHub. A distributed layer on top of MySQL

1. What is Sharding?

1.1 Limits of a Single DB

[Single DB]
  ├─ CPU: up to one machine's worth
  ├─ RAM: 256GB? 512GB?
  ├─ Disk: 30TB
  └─ Network: 25 Gbps

Eventually hits the limit

Vertical Scaling:

  • Bigger machine
  • Has limits (largest EC2 = 24TB RAM, $20+/hour)
  • Single point of failure

Horizontal Scaling:

  • Multiple machines
  • Near-unlimited scaling
  • Increased complexity

1.2 Can Read Replicas Solve It?

[Primary] (writes)
   ↓ replication
[Replica 1] [Replica 2] [Replica 3] (reads)

Helps: distributes read traffic.

Limitation: writes still go through a single Primary. If writes are the bottleneck, this does not help.

Sharding is required.

1.3 Definition of Sharding

Sharding = distributing data across multiple independent DBs (shards).

[users table]
  ↓ shard by user_id
[Shard 1: user_id 1-1M]
[Shard 2: user_id 1M-2M]
[Shard 3: user_id 2M-3M]
...

Each shard is a completely independent DB with its own CPU, RAM, and disk.

1.4 Effects of Sharding

              Single DB     100 Shards
Data          1 TB          100 TB
Write TPS     10,000        1,000,000
Cost          $10K/month    $50K/month
Complexity    Low           Very High

100x performance + 5x cost = good ROI. However, complexity is a serious problem.


2. Shard Key Selection — the Most Important Decision

2.1 Conditions for a Good Shard Key

  1. Even distribution — data spread evenly across all shards
  2. Immutable — rarely changes once set
  3. Matches query patterns — data queried together lives on the same shard
  4. High cardinality — enough unique values
  5. Not monotonically increasing — auto-increment IDs risk hot spots

2.2 Common Shard Key Candidates

user_id:

  • Groups a user's data on one shard (related data together)
  • Large users become hot spots
  • Cross-user JOINs are hard

created_at:

  • Monotonically increasing → all new data goes to one shard (hot spot)
  • OK for time-series data

hash(id):

  • Even distribution
  • No range queries possible
  • Related data lands on different shards

country:

  • Geographic efficiency
  • Very uneven (US is huge, Monaco is tiny)

organization_id:

  • Ideal for B2B SaaS
  • Large organizations become hot spots

2.3 Slack's Case

Slack's shard key = workspace_id.

Why:

  • All data from the same workspace on one shard → fast lookups
  • Almost no cross-workspace JOINs
  • Natural isolation per workspace

Downside: a massive enterprise workspace overwhelms a single shard → the "Whale shard" problem.

Solution: huge workspaces get their own dedicated shard.


3. Sharding Strategies

3.1 Range Sharding

Partition data by key range.

Shard 1: id 1 ~ 1,000,000
Shard 2: id 1,000,001 ~ 2,000,000
Shard 3: id 2,000,001 ~ 3,000,000

Pros:

  • Efficient range queries (WHERE id BETWEEN 500000 AND 600000)
  • Simple routing
  • Easy to add new shards

Cons:

  • Hot spot risk: with monotonically increasing keys, new data lands entirely on the last shard
  • Possible uneven distribution

Use cases: time-series data, BigTable, HBase
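
A minimal routing sketch of the idea (the boundaries and function names below are illustrative, not taken from any specific system):

# Range sharding: route an id to the shard whose range contains it.
RANGES = [
    (1, 1_000_000, "shard_1"),
    (1_000_001, 2_000_000, "shard_2"),
    (2_000_001, 3_000_000, "shard_3"),
]

def shard_for_id(record_id):
    for low, high, shard in RANGES:
        if low <= record_id <= high:
            return shard
    raise ValueError(f"no shard covers id {record_id}")

# A range query only touches the shards whose ranges overlap the interval.
def shards_for_range(start, end):
    return [s for low, high, s in RANGES if low <= end and start <= high]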

3.2 Hash Sharding

Partition by the hash of the shard key.

shard_id = hash(user_id) % num_shards

Pros:

  • Even distribution — assuming a good hash
  • Safe even for monotonic keys

Cons:

  • No range queries (WHERE id BETWEEN ...)
  • Adding a shard forces nearly all data to be redistributed (because % num_shards changes)
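
A toy sketch of modulo routing, and of how much data moves when the shard count changes (the hash choice and numbers are illustrative):

import hashlib

def shard_id(user_id, num_shards):
    # Use a stable hash, not Python's built-in hash(), which varies per process.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Going from 4 to 5 shards changes the modulus, so most keys map to a new shard.
moved = sum(1 for uid in range(100_000) if shard_id(uid, 4) != shard_id(uid, 5))
print(f"{moved / 100_000:.0%} of keys would have to move")  # roughly 80%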

Solution: Consistent Hashing — only some data moves when shards are added or removed.

3.3 Consistent Hashing

[Circular hash space]
        Shard A
      /        \
 Shard D      Shard B
      \        /
        Shard C

Each shard and data item is mapped to the same hash space. Data goes to the nearest shard clockwise.

When adding a shard: only the affected section is redistributed (1/N of the total).

Virtual Nodes: represent each shard with multiple virtual nodes → more even distribution.

Used by: Cassandra, DynamoDB, Riak, Memcached.
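
A minimal consistent-hash ring with virtual nodes, sketching the idea rather than any particular library's implementation:

import bisect
import hashlib

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        # Place each shard on the ring at many virtual positions (virtual nodes).
        self.ring = sorted((_h(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self.positions = [pos for pos, _ in self.ring]

    def shard_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's position.
        idx = bisect.bisect(self.positions, _h(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["shard_a", "shard_b", "shard_c", "shard_d"])
print(ring.shard_for("user:12345"))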

3.4 Directory-Based Sharding

Use a lookup table to explicitly map which data lives on which shard.

[Lookup]
user_1 → Shard A
user_2 → Shard B
user_3 → Shard A
...

Pros:

  • Flexible: arbitrary mappings possible
  • Handles uneven distributions
  • Easier migrations

Cons:

  • The lookup itself becomes a bottleneck (caching is mandatory)
  • Extra infrastructure (Redis, Consul)
  • Single point of failure

Used by: Vitess (a variation of this approach)
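
A sketch of directory-based routing with a local cache in front of the lookup store (the Redis hostname and key layout are assumptions for illustration):

import redis  # the lookup store; Consul, etcd, or a small SQL table work the same way

lookup = redis.Redis(host="lookup-store", port=6379)  # hostname is illustrative
_local_cache = {}

def shard_for_user(user_id):
    # The lookup sits on every query path, so cache aggressively.
    if user_id in _local_cache:
        return _local_cache[user_id]
    shard = lookup.get(f"shard:{user_id}")
    if shard is None:
        raise KeyError(f"user {user_id} has no shard mapping")
    _local_cache[user_id] = shard.decode()
    return _local_cache[user_id]

def move_user(user_id, new_shard):
    # A migration is just a mapping update (after the rows themselves are copied).
    lookup.set(f"shard:{user_id}", new_shard)
    _local_cache.pop(user_id, None)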

3.5 Geographic Sharding

Shard by location:

  • US users → US shard
  • EU users → EU shard
  • Asia users → Asia shard

Pros:

  • Close to users → lower latency
  • Compliance (GDPR — EU data stays in EU)
  • Natural partitioning

Cons:

  • Hard when users move
  • Global queries are hard
  • Very uneven (different user counts per region)

3.6 Comparison Table

Strategy           Evenness   Range Queries   Adding Shards   Query Pattern
Range              Low        Yes             Easy            Range
Hash               High       No              Hard            Exact
Consistent Hash    High       No              Easy            Exact
Directory          High       Partial         Easy            Flexible
Geographic         Low        Partial         Moderate        Regional

4. The Hot Spot Problem

4.1 What is a Hot Spot?

Traffic or data concentrating on a specific shard.

Causes:

  • Wrong shard key (popular users, popular products)
  • Monotonically increasing keys (created_at)
  • Uneven distribution (a few giant organizations)

4.2 Impact

[Shard 1: 10% traffic]
[Shard 2: 5% traffic]
[Shard 3: 80% traffic] ← Hot spot! Dying
[Shard 4: 5% traffic]

The entire system is bottlenecked by the weakest shard.

4.3 Detection

-- Data volume per shard
SELECT shard_id, COUNT(*) as row_count
FROM users
GROUP BY shard_id;

-- Query count per shard (application metrics)

Warning signs:

  • One shard is 5x+ larger than the average
  • One shard's CPU/IO is at 100%

4.4 Solutions

1. Change the shard key

  • Best but hardest (requires resharding)

2. Composite Key

shard_key = hash(user_id + timestamp_bucket)

Combines time-based + user-based factors (see the sketch after this list).

3. Sub-sharding

  • Split hot shards into finer pieces
  • Handle "Whale shards"

4. Caching

  • Move hot data to Redis
  • Reduce DB load

5. Read Replicas

  • Distribute read load
  • Writes remain hot
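
A sketch of the composite-key idea from option 2 above: bucketing time so one hot user's writes spread over several shards (the bucket size and shard count are arbitrary examples):

import hashlib
import time

NUM_SHARDS = 16
BUCKET_SECONDS = 3600  # one-hour buckets; tune to the write rate

def composite_shard(user_id, ts=None):
    bucket = int((ts if ts is not None else time.time()) // BUCKET_SECONDS)
    key = f"{user_id}:{bucket}"
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

# One hot user's writes now spread across many shards as time advances,
# at the cost of fanning out when reading all of that user's recent data.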

4.5 Example: Twitter's Celebrity Problem

Problem: a celebrity user (say, Elon Musk) tweets → must update timelines of hundreds of millions of followers.

Solution:

  • Fan-out on write (regular users): pre-populate follower timelines at tweet time
  • Fan-out on read (celebrities): do not pre-build timelines on tweet; assemble them at read time
  • Hybrid: push to active followers at write time, pull for inactive ones at read time

5. Difficulty of Distributed Queries

5.1 Single-Shard Queries

SELECT * FROM users WHERE user_id = 12345;

→ determine shard by user_id → query only that shard. Very fast.

5.2 Cross-Shard Queries

SELECT COUNT(*) FROM users WHERE country = 'KR';

→ query all shards → aggregate the results. Slow + heavy load.
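
What the application (or the sharding middleware) effectively does for a cross-shard aggregate; a simplified scatter-gather sketch assuming a connections list of per-shard DB-API connections:

from concurrent.futures import ThreadPoolExecutor

def count_users_by_country(connections, country):
    # Scatter: run the same query on every shard in parallel.
    def query_one(conn):
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM users WHERE country = %s", (country,))
        return cur.fetchone()[0]

    with ThreadPoolExecutor(max_workers=len(connections)) as pool:
        partials = list(pool.map(query_one, connections))

    # Gather: merge the per-shard partial results in the application.
    return sum(partials)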

5.3 Difficulty of JOINs

SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.user_id
WHERE u.country = 'KR';

What if users and orders live on different shards? You need a distributed JOIN.

Strategies:

  1. Co-location: keep users and orders with the same user_id on the same shard
  2. Denormalization: copy user info into orders
  3. Distributed query engine: Vitess, Citus handle it automatically
  4. CQRS: use a separate DW (Snowflake) for queries

5.4 Distributed Transactions

def transfer(from_user, to_user, amount):
    # The two users may live on different shards
    db.update(f"... WHERE user_id={from_user}", -amount)
    db.update(f"... WHERE user_id={to_user}", +amount)

Problem: first update succeeds + second fails → money disappears.

Solutions:

1. 2PC (Two-Phase Commit):

  • Prepare → Commit
  • All shards must agree to commit
  • Slow + single point of failure → not recommended

2. Saga Pattern:

  • Compensating transactions
  • Eventually consistent
  • De facto standard in practice (see the sketch after this list)

3. Keep in the same shard:

  • When possible, put sender/receiver in the same shard
  • Not always feasible
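
A minimal sketch of the Saga idea applied to the transfer above: each step is a local, single-shard transaction, and a later failure triggers a compensating action (the debit/credit helpers are hypothetical):

def transfer_saga(from_user, to_user, amount):
    # Step 1: debit on the sender's shard (a local, single-shard transaction).
    debit(from_user, amount)
    try:
        # Step 2: credit on the receiver's shard (another local transaction).
        credit(to_user, amount)
    except Exception:
        # Compensating transaction: undo step 1 instead of relying on a global commit.
        credit(from_user, amount)  # refund the sender
        raise

In a real system the saga's progress is persisted and retried by an orchestrator or a message queue, because the compensating step itself can fail.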

6. Resharding — the Scariest Operation

6.1 Why is Resharding Hard?

  • Data movement: moving tens of TB
  • Zero downtime: no user impact
  • Rollback: recover from problems
  • Consistency: handle writes/reads mid-migration

6.2 Resharding Scenarios

  1. Adding shards: 4 → 8 shards
  2. Splitting a shard: dividing a huge shard in two
  3. Merging shards: combining small shards
  4. Changing the shard key: the hardest of all

6.3 Resharding Patterns

1. Bulk Copy + Switchover:

  1. Copy data to new shards
  2. Short maintenance window
  3. Switch traffic
  4. Drop old shards

Downside: downtime.

2. CDC + Dual Write:

  1. Sync old → new shards via CDC
  2. Application writes to both old + new (dual write)
  3. Verify data parity
  4. Switch reads to new
  5. Retire old shards

Zero downtime, complex.
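
A sketch of the dual-write step in pattern 2: writes go to both the old and the new layout while CDC backfills history (the router objects and the failure-logging helper are placeholders):

def create_order(order, old_router, new_router):
    # Old shards remain the source of truth until the cutover.
    old_router.shard_for(order.user_id).insert("orders", order)
    try:
        # Best-effort write to the new layout; divergences are caught by the
        # verification step before reads are switched over.
        new_router.shard_for(order.user_id).insert("orders", order)
    except Exception:
        # Hypothetical helper: record the miss so backfill/verification can repair it.
        log_dual_write_failure(order)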

3. Online Migration:

  • Vitess's VReplication
  • Automated zero-downtime migration
  • Progress monitoring

6.4 Case: Slack's Sharding Evolution

  • 2017: single MySQL → replicas
  • 2018: first sharding (workspace_id)
  • 2020: introduced Vitess
  • 2022: handled whale workspaces
  • 2024: thousands of shards, petabytes

Lesson: sharding is never one and done. It is a continuous evolution.


7. Vitess — a Distributed Layer on Top of MySQL

7.1 What is Vitess?

A MySQL sharding solution originally built at YouTube, now a CNCF graduated project.

[Client]
   ↓
[VTGate] (query router)
   ↓
[VTTablet] [VTTablet] [VTTablet]
   ↓          ↓          ↓
[MySQL]    [MySQL]    [MySQL]

Pros:

  • MySQL compatible — existing code works as is
  • Automatic sharding — key-based routing
  • Schema migrations — safe DDL
  • VReplication — zero-downtime resharding
  • Battle-tested at scale — YouTube, Slack, GitHub, Square

7.2 Companies Using It

  • YouTube (created Vitess; petabyte scale)
  • Slack (large scale)
  • GitHub (migrated MySQL → Vitess)
  • Square
  • Pinterest
  • JD.com (one of China's largest e-commerce companies)

7.3 VSchema Example

{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "hash"
        }
      ]
    }
  }
}

→ VTGate automatically shards based on user_id.
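
Because VTGate speaks the MySQL protocol, the application keeps using an ordinary MySQL driver. A sketch assuming VTGate is reachable at host "vtgate" and the keyspace is named "commerce" (both illustrative):

import pymysql  # any standard MySQL driver works; VTGate speaks the MySQL protocol

conn = pymysql.connect(host="vtgate", port=3306, user="app",
                       password="...", database="commerce")
with conn.cursor() as cur:
    # VTGate consults the VSchema, hashes user_id, and routes to the owning shard.
    cur.execute("SELECT name FROM users WHERE user_id = %s", (12345,))
    print(cur.fetchone())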


8. Other Distributed DB Options

8.1 Citus (PostgreSQL)

A PostgreSQL extension. Supports distributed queries and distributed transactions.

SELECT create_distributed_table('events', 'user_id');

Pros: PostgreSQL compatible, keep SQL as is. Cons: inherits PostgreSQL limitations (less mature than Vitess).

8.2 CockroachDB

Designed from scratch as a distributed SQL DB.

Pros:

  • PostgreSQL compatible
  • Automatic sharding (range-based)
  • Global distribution support
  • Distributed ACID transactions

Cons:

  • Some PG features unsupported
  • More expensive to operate
  • Learning curve

Used by: DoorDash, Bose, Comcast.

8.3 YugabyteDB

Similar to CockroachDB. Offers both a PostgreSQL-compatible API (YSQL) and a Cassandra-compatible API (YCQL).

8.4 TiDB

MySQL compatible + automatic distribution. Popular in China.

8.5 Comparison Table

                           Vitess           Citus        CockroachDB   YugabyteDB   TiDB
Base                       MySQL            PostgreSQL   Custom        Custom       Custom
Compatibility              MySQL            PostgreSQL   PostgreSQL    PG + CQL     MySQL
Distributed Transactions   Limited          Yes          Yes           Yes          Yes
Automatic Sharding         Manual key       Manual key   Automatic     Automatic    Automatic
Operational Difficulty     Moderate         Low          Moderate      Moderate     Moderate
Validation                 YouTube, Slack   Many         DoorDash      Many         Chinese giants

9. In Practice — Designing E-commerce Sharding

9.1 Scenario

  • 100 million users
  • 10 million daily orders
  • Single PostgreSQL hitting its limits

9.2 Analysis

Tables:

  • users (100M rows)
  • orders (1B rows)
  • products (1M rows)
  • reviews (5B rows)

9.3 Sharding Decisions

1. users:

  • Shard key: user_id (hash)
  • 16 shards
  • Even distribution

2. orders:

  • Shard key: user_id (hash, same shard as users)
  • A user's row and their orders live on the same shard → JOINs stay local and fast

3. products:

  • Small (1M rows)
  • No sharding — single DB + read replicas
  • Or replicate across all shards

4. reviews:

  • Shard key: product_id (hash)
  • Fits product-based analytics
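
A sketch of the resulting routing logic; shard counts and the hashing helper mirror the decisions above, and the names are illustrative:

import hashlib

def _bucket(value, num_shards):
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16) % num_shards

# users and orders share the same key (user_id) and shard count, so a user's
# profile and their orders always co-locate on one shard.
def users_shard(user_id):
    return _bucket(user_id, 16)

def orders_shard(user_id):
    return _bucket(user_id, 16)

def reviews_shard(product_id):
    return _bucket(product_id, 8)

# products is small: a single unsharded DB (plus read replicas), no routing needed.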

9.4 Query Patterns

-- User orders (same shard)
SELECT * FROM orders WHERE user_id = 12345;
-- Fast

-- User info + orders (same shard, JOIN OK)
SELECT u.*, o.*
FROM users u JOIN orders o ON o.user_id = u.user_id
WHERE u.user_id = 12345;
-- Fast

-- Product reviews (different shard)
SELECT * FROM reviews WHERE product_id = 67890;
-- Fast

-- All reviews written by a user (cross-shard)
SELECT * FROM reviews WHERE user_id = 12345;
-- Slow — scans every reviews shard

-- Solution: denormalize. Store review summaries on the users shard too.

9.5 Infrastructure

[Application]
[ProxySQL / Vitess]
[16 user shards]
[16 order shards (co-located)]
[1 products DB + 5 read replicas]
[8 reviews shards]

9.6 Monitoring

  • Query count per shard
  • Disk usage per shard
  • Cross-shard query ratio
  • Hot spot detection

10. Sharding Pitfalls and Lessons

10.1 Sharding Too Early

Wrong decision: sharding at 1000 QPS of traffic.

Reality:

  • Single PostgreSQL = 50,000 QPS+
  • Read replica = 200,000 QPS+
  • Proper indexes + caching = even more

Try every other optimization first before sharding.

10.2 Wrong Shard Key

GitHub's first sharding attempt failure case:

  • Sharded by repo_id → popular repos became hot spots
  • Linus's linux repository overwhelmed a single shard
  • Resharding required (months of work)

Lesson: the shard key is extremely hard to change. Be careful up front.

10.3 Avoiding Distributed Transactions

Distributed transactions really are hard. Whenever possible, keep transactions within a single shard.

While designing: shape the data model so that every write transaction fits in a single shard.

10.4 Operational Complexity

A sharded system means:

  • Complex backups (16 shards simultaneously)
  • Complex monitoring
  • Harder debugging (which shard?)
  • Harder migrations

Automation is essential. Have enough operations staff.

10.5 Is NewSQL the Answer?

NewSQL systems like CockroachDB and YugabyteDB offer automatic sharding + distributed transactions, removing much of the complexity of manual sharding.

When to go NewSQL:

  • New projects
  • Strong consistency required
  • Limited operations staff

When to go manual sharding:

  • Existing MySQL/PostgreSQL
  • Extreme performance needed
  • Concerns about NewSQL maturity

Quiz

1. Why can sharding not be replaced by read replicas?

Answer: read replicas only distribute reads. Writes still go through a single Primary. A system where writes are the bottleneck (e.g. 1 billion INSERTs per day) cannot be fixed with read replicas. Only sharding can spread writes across multiple machines. That said, sharding brings enormous complexity, so first check whether read replicas + caching + index optimization can solve the problem.

2. What are the pros and cons of hash sharding?

Answer: Pros: even distribution (with a good hash). Safe even with monotonically increasing keys. Cons: no range queries (WHERE id BETWEEN ...). Adding a shard redistributes nearly all data (because % num_shards changes). Solution: Consistent Hashing — only a fraction (1/N) of the data moves on add/remove. Used by Cassandra, DynamoDB, and Riak. Virtual nodes give more even distribution.

3. How do you solve the hot spot problem?

Answer: (1) Change the shard key — best, but requires resharding (hard), (2) Composite key — hash(user_id + timestamp_bucket) to spread across time/users, (3) Sub-sharding — split hot shards finer ("whale shard"), (4) Caching — move hot data to Redis, (5) Read replicas — distribute read load. The most common hot spots: popular users, popular products, monotonically increasing keys. Always review hot spot scenarios when choosing a shard key.

4. Why did Vitess become the standard for MySQL distribution?

Answer: (1) Born at YouTube — proven at massive scale, (2) MySQL compatible — existing code/tools keep working, (3) Automatic sharding — key-based routing, (4) VReplication — zero-downtime migrations and resharding, (5) CNCF graduated — cloud-native standard, (6) Proven users — Slack, GitHub, Square. Downside: distributed transactions are relatively weak (the Saga pattern is recommended). PostgreSQL users use Citus for a similar role.

5. When should you use NewSQL (CockroachDB) vs. manual sharding?

Answer: Pick NewSQL (CockroachDB, YugabyteDB) when: new project, strong consistency required (distributed ACID), limited operations staff, PostgreSQL compatibility is acceptable. Pick manual sharding (Vitess + MySQL) when: existing MySQL system, extreme performance needed, worried about NewSQL maturity, want more control. Reality: new systems increasingly go NewSQL; existing huge systems stick with Vitess. Both are valid — decide based on trade-offs.

