✍️ Transcription Mode: Consistent Hashing Complete Guide 2025: Virtual Nodes, Jump Hash, Rendezvous, and the Core Building Block of Distributed Systems
Introduction: Why Do We Need Consistent Hashing?
The Problem: The Trap of Hash-Based Sharding
Suppose you operate four cache servers and distribute keys with a simple hash:
def get_server(key, num_servers):
return hash(key) % num_servers
# Distribute across 4 servers
get_server("user:1234", 4) # server 2
get_server("user:5678", 4) # server 0
get_server("user:9012", 4) # server 3
Works fine in steady state. But what happens when traffic grows and you scale up to 5 servers?
get_server("user:1234", 5) # server 4 (was 2 → now 4)
get_server("user:5678", 5) # server 3 (was 0 → now 3)
get_server("user:9012", 5) # server 2 (was 3 → now 2)
Almost every key moves. Cache misses explode, traffic floods the database, and the service collapses. This is called the Rehashing Storm.
The Promise of Consistent Hashing
Consistent Hashing, published in 1997 by David Karger and colleagues at MIT, solves this elegantly:
When the number of nodes grows from N to N+1, on average only K/(N+1) keys need to move. (K is the total number of keys.)
So scaling from 4 to 5 servers moves only about 1/5 of keys. The other 4/5 stay put. This principle became the foundation of countless distributed systems: Amazon DynamoDB, Apache Cassandra, Riak, Memcached, Akamai CDN, and more.
graph LR
A[Key] --> B{Hashing method}
B -->|modulo| C[80% keys move<br/>Disaster]
B -->|Consistent Hash| D[20% keys move<br/>Normal]
1. The Core Idea: The Hash Ring
Place Nodes and Keys on a Circular Space
The core idea of Consistent Hashing is surprisingly simple:
- Treat the output space of the hash function as a circle (0 to 2^32-1).
- Hash each node (server) and place it on a point on the ring.
- Hash each key (data) and place it on a point on the ring.
- A key is stored on the nearest node in the clockwise direction.
                0 / 2^32
                   |
                 NodeA
               /       \
          NodeD         NodeB
               \       /
                 NodeC
Basic Python Implementation
import hashlib
from bisect import bisect_right, insort
class ConsistentHashBasic:
def __init__(self):
self.ring = {} # hash → node
self.sorted_keys = []
def _hash(self, key: str) -> int:
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_node(self, node: str):
h = self._hash(node)
self.ring[h] = node
insort(self.sorted_keys, h)
def remove_node(self, node: str):
h = self._hash(node)
del self.ring[h]
self.sorted_keys.remove(h)
def get_node(self, key: str) -> str:
if not self.ring:
return None
h = self._hash(key)
# Find the nearest node clockwise
idx = bisect_right(self.sorted_keys, h)
if idx == len(self.sorted_keys):
idx = 0 # Wrap around the ring
return self.ring[self.sorted_keys[idx]]
# Example usage
ch = ConsistentHashBasic()
for node in ["server-A", "server-B", "server-C", "server-D"]:
ch.add_node(node)
print(ch.get_node("user:1234")) # server-C
print(ch.get_node("user:5678")) # server-A
Impact of Adding/Removing Nodes
When you add node E:
- E's hash value lands somewhere between existing nodes.
- Only keys between E and the preceding node move to E.
- The rest of the keys stay put.
On average only K/(N+1) of the keys (here 1/5) move. That's the magic.
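To see this concretely, the short experiment below (reusing the ConsistentHashBasic class from above) counts how many of 100,000 keys change owners when a fifth server joins the ring:
# Measure key movement when server-E joins a 4-node ring
ch2 = ConsistentHashBasic()
for node in ["server-A", "server-B", "server-C", "server-D"]:
    ch2.add_node(node)

keys = [f"user:{i}" for i in range(100_000)]
before = {k: ch2.get_node(k) for k in keys}

ch2.add_node("server-E")
moved = sum(1 for k in keys if ch2.get_node(k) != before[k])

# Roughly 20% on average; variance is high because this basic version
# has no virtual nodes yet (section 3 fixes that)
print(f"moved: {moved / len(keys):.1%}")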
2. Problems With the Basic Approach: Imbalance
Basic Consistent Hashing isn't perfect. The biggest issue is uneven key distribution.
Problem 1: Skew With Few Nodes
With only 4 nodes on the ring, even if hash values are uniformly distributed, the size of the "arc" each node owns varies wildly:
NodeA: 0 ~ 5 (5%)
NodeB: 5 ~ 40 (35%) ← heavy load
NodeC: 40 ~ 50 (10%)
NodeD: 50 ~ 100 (50%) ← half the ring
Result: NodeD gets half of all traffic, and NodeA sits nearly idle.
Problem 2: Worsening Skew on Node Removal
If NodeB goes down, all of B's keys pile onto NodeC. A hotspot forms.
Problem 3: Hard to Reflect Heterogeneous Server Capacity
Some servers might have 16 cores and 64GB RAM; others 4 cores and 16GB. The basic scheme can't express capacity ratios.
3. The Fix: Virtual Nodes (vnodes)
The Idea
Represent each physical node as multiple virtual nodes scattered on the ring. For example, with 150 vnodes per node:
- 10 physical nodes × 150 = 1,500 virtual nodes distributed on the ring.
- By the law of large numbers, each physical node's arc becomes uniform.
4 physical nodes (150 vnodes each)
Distribution on ring:
A-1, B-7, C-3, A-99, D-42, B-15, C-88, D-1, A-15, ...
(nearly uniformly interleaved)
Virtual Node Implementation
class ConsistentHashWithVNodes:
def __init__(self, vnodes_per_node: int = 150):
self.vnodes_per_node = vnodes_per_node
self.ring = {}
self.sorted_keys = []
def _hash(self, key: str) -> int:
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_node(self, node: str, weight: int = 1):
# Higher weight → more vnodes
count = self.vnodes_per_node * weight
for i in range(count):
vnode_key = f"{node}#{i}"
h = self._hash(vnode_key)
self.ring[h] = node
insort(self.sorted_keys, h)
def remove_node(self, node: str, weight: int = 1):
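        # NOTE: must be called with the same weight used in add_node,
        # otherwise some of this node's vnodes are left behind on the ring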
count = self.vnodes_per_node * weight
for i in range(count):
vnode_key = f"{node}#{i}"
h = self._hash(vnode_key)
del self.ring[h]
self.sorted_keys.remove(h)
def get_node(self, key: str) -> str:
if not self.ring:
return None
h = self._hash(key)
idx = bisect_right(self.sorted_keys, h)
if idx == len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]
# Example: servers with different capacities
ch = ConsistentHashWithVNodes(vnodes_per_node=150)
ch.add_node("server-A", weight=2) # 2x performance → 300 vnodes
ch.add_node("server-B", weight=1) # standard → 150 vnodes
ch.add_node("server-C", weight=1)
ch.add_node("server-D", weight=1)
# Now A gets ~40% of keys, the rest get ~20% each
Vnode Count Selection Guide
| Vnode count | Std. deviation | Memory overhead |
|---|---|---|
| 10 | ±30% | low |
| 100 | ±10% | medium |
| 160 | ±6% | balanced (Ketama default) |
| 1000 | ±2% | high |
In practice, 100-200 is the sweet spot. Memcached's libketama uses 160; Cassandra defaults to 256 (num_tokens).
4. Benchmark: How Even Is the Actual Distribution?
import random
import statistics
def benchmark(vnodes_per_node: int, num_nodes: int, num_keys: int):
ch = ConsistentHashWithVNodes(vnodes_per_node)
for i in range(num_nodes):
ch.add_node(f"node-{i}")
counts = {f"node-{i}": 0 for i in range(num_nodes)}
for k in range(num_keys):
key = f"key-{random.random()}"
node = ch.get_node(key)
counts[node] += 1
values = list(counts.values())
avg = statistics.mean(values)
stddev = statistics.stdev(values)
max_val = max(values)
min_val = min(values)
print(f"vnodes={vnodes_per_node}: "
f"min={min_val}, max={max_val}, "
f"stddev={stddev:.0f} ({stddev/avg*100:.1f}% of avg)")
# Distribute 1,000,000 keys across 10 nodes
benchmark(vnodes_per_node=1, num_nodes=10, num_keys=1_000_000)
benchmark(vnodes_per_node=10, num_nodes=10, num_keys=1_000_000)
benchmark(vnodes_per_node=100, num_nodes=10, num_keys=1_000_000)
benchmark(vnodes_per_node=500, num_nodes=10, num_keys=1_000_000)
Expected output:
vnodes=1: min=30000, max=250000, stddev=70000 (70.0% of avg)
vnodes=10: min=60000, max=180000, stddev=35000 (35.0% of avg)
vnodes=100: min=92000, max=110000, stddev=5800 (5.8% of avg)
vnodes=500: min=97000, max=103000, stddev=2000 (2.0% of avg)
With just 100 virtual nodes you can achieve ±6% balance.
5. Jump Consistent Hash: Uniform Without Virtual Nodes
In 2014, John Lamping and Eric Veach at Google published Jump Consistent Hash, which achieves O(1) space and nearly perfect uniform distribution without virtual nodes.
Striking Properties
- Memory: O(1) (no node list needed!)
- Time: O(log N) (N is node count)
- Distribution: nearly perfect
- Downside: inconvenient node removal (only the last node can be removed)
The Jump Hash Algorithm
def jump_consistent_hash(key: int, num_buckets: int) -> int:
"""
Google Jump Consistent Hash
key: hashed integer
num_buckets: number of buckets (nodes)
"""
b = -1
j = 0
while j < num_buckets:
b = j
key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
return b
# Example
import hashlib
def hash_key(key: str) -> int:
return int(hashlib.md5(key.encode()).hexdigest()[:16], 16)
num_nodes = 10
key = hash_key("user:1234")
bucket = jump_consistent_hash(key, num_nodes)
print(f"user:1234 → bucket {bucket}")
Why Does This Work?
The core of Jump Hash is a pseudo-random probabilistic skip. Each iteration probabilistically decides the "next jump position." Mathematically proven properties:
- When the bucket count grows from N to N+1, the expected number of keys that move to the new bucket is exactly K/(N+1).
- Distribution is perfectly uniform (no virtual nodes required).
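The same kind of experiment as in section 1 confirms the minimal-disruption claim (reusing jump_consistent_hash and hash_key from above): when the bucket count grows from 10 to 11, only about 1/11 of keys should land in a new bucket.
# Count keys that change buckets when growing from 10 to 11 buckets
keys = [hash_key(f"user:{i}") for i in range(100_000)]
moved = sum(1 for k in keys
            if jump_consistent_hash(k, 10) != jump_consistent_hash(k, 11))
print(f"moved: {moved / len(keys):.1%}")  # ≈ 9.1%, i.e. K/(N+1) with N+1 = 11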
Limitation: Node Removal
Jump Hash can only remove the last node (highest index). Removing a middle node shifts all following indices, breaking every key mapping.
To work around this, add a Lookup Table or use Rendezvous Hashing, described next.
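One workaround is a small indirection layer of your own (a sketch under that assumption, not a standard API): Jump Hash always returns bucket indices 0..N-1, and a separate list maps each index to a live node. To remove an arbitrary node, swap its slot with the last slot and shrink the bucket count; only the removed node's keys and the former last bucket's keys move, instead of breaking every mapping.
# Sketch: indirection table so arbitrary nodes can leave a Jump Hash cluster
class JumpHashRouter:
    def __init__(self, nodes: list):
        self.nodes = list(nodes)          # index i == bucket i

    def get_node(self, key: str) -> str:
        bucket = jump_consistent_hash(hash_key(key), len(self.nodes))
        return self.nodes[bucket]

    def remove_node(self, node: str):
        # Move the last node into the departing node's slot, then shrink.
        idx = self.nodes.index(node)
        self.nodes[idx] = self.nodes[-1]
        self.nodes.pop()

router = JumpHashRouter(["server-A", "server-B", "server-C", "server-D"])
router.remove_node("server-B")            # server-D now serves B's old bucket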
6. Rendezvous Hashing (HRW: Highest Random Weight)
In 1997, David Thaler and Chinya Ravishankar proposed Rendezvous Hashing, another elegant solution.
The Idea
For each key, compute the hash with every node and pick the node with the highest (or lowest) value.
def rendezvous_hash(key: str, nodes: list) -> str:
"""
Assign each key to the node with the highest weight
"""
if not nodes:
return None
max_weight = -1
selected_node = None
for node in nodes:
combined = f"{key}:{node}"
h = int(hashlib.md5(combined.encode()).hexdigest(), 16)
if h > max_weight:
max_weight = h
selected_node = node
return selected_node
# Example
nodes = ["server-A", "server-B", "server-C", "server-D"]
print(rendezvous_hash("user:1234", nodes)) # e.g., "server-C"
Pros and Cons
Pros:
- Extremely simple (just needs a hash function)
- Perfectly uniform distribution
- Only K/N keys move on node add/remove
- Unlike Jump Hash, any node can be removed
- Great for replica selection: picking the Top-K nodes immediately gives K replica placements.
Cons:
- O(N) time complexity: slow with many nodes.
- Not suitable beyond 1000 nodes.
Replica Selection
Where Rendezvous Hashing shines is Top-K node selection:
def rendezvous_hash_topk(key: str, nodes: list, k: int) -> list:
weights = []
for node in nodes:
h = int(hashlib.md5(f"{key}:{node}".encode()).hexdigest(), 16)
weights.append((h, node))
weights.sort(reverse=True)
return [node for _, node in weights[:k]]
# Pick 3 nodes for 3 replicas
replicas = rendezvous_hash_topk("user:1234", nodes, k=3)
print(replicas) # ["server-C", "server-A", "server-D"]
This is similar to the idea used in Cassandra's replica placement strategy.
7. Anchor Hashing: A Modern Algorithm (2018)
Anchor Hashing, published in 2018 by Gal Mendelson et al., provides both Jump Hash's speed and Rendezvous's flexible node removal.
Key properties:
- O(log N) time complexity
- O(N) space (constant memory per node)
- Arbitrary node removal
- Perfectly uniform distribution
The real implementation is complex, so we only mention it here. Maglev Hash (Google) falls in the same category.
8. How Is This Used in Real Systems?
DynamoDB: Basic Consistent Hashing + vnodes
Amazon DynamoDB follows the design of the Amazon Dynamo paper (2007):
- Circular 128-bit hash space.
- Each physical node holds many virtual nodes (tokens).
- Each key is replicated to the next N clockwise nodes (Preference List).
- On node add/remove, data only moves between adjacent nodes.
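A preference list can be sketched on top of the ConsistentHashWithVNodes class from section 3 (a simplified illustration, not DynamoDB's actual code): walk clockwise from the key's position and collect distinct physical nodes until you have N of them.
# Sketch: N distinct physical nodes found by walking the ring clockwise
def preference_list(ch: ConsistentHashWithVNodes, key: str, n: int) -> list:
    if not ch.sorted_keys:
        return []
    start = bisect_right(ch.sorted_keys, ch._hash(key))
    nodes = []
    for i in range(len(ch.sorted_keys)):
        node = ch.ring[ch.sorted_keys[(start + i) % len(ch.sorted_keys)]]
        if node not in nodes:             # skip further vnodes of chosen nodes
            nodes.append(node)
        if len(nodes) == n:
            break
    return nodes

print(preference_list(ch, "user:1234", n=3))  # e.g. ['server-A', 'server-D', 'server-B']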
Cassandra: Token Ring
Cassandra has a similar structure but uses slightly different terminology:
- Token = virtual node (Murmur3 64-bit hash)
- Default num_tokens: 256 → each physical node owns 256 tokens.
- Replication is configured via a Replication Strategy (SimpleStrategy, NetworkTopologyStrategy).
# cassandra.yaml
num_tokens: 256
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Memcached (libketama): The Virtual Node Standard
libketama, built by Last.fm, is a consistent-hashing library for Memcached clients that became the de facto standard for Consistent Hashing:
- 160 virtual nodes per physical node.
- MD5 hash (128 bits truncated to 32 bits).
- Server info string → MD5 → four 32-bit integers → four virtual nodes.
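The point derivation can be sketched roughly as follows (a simplified illustration of the ketama scheme; the replica-string format and byte order may differ slightly from the real libketama source):
import hashlib
import struct

def ketama_points(server: str, replicas: int = 40) -> list:
    """Roughly ketama-style: 40 replica strings x 4 points each = 160 ring points."""
    points = []
    for r in range(replicas):
        digest = hashlib.md5(f"{server}-{r}".encode()).digest()
        # Split the 16-byte MD5 digest into four 32-bit integers
        points.extend(struct.unpack("<4I", digest))
    return points

print(len(ketama_points("10.0.0.1:11211")))  # 160 points for one server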
Nginx: upstream consistent hash
Nginx supports Consistent Hashing for upstream load balancing:
upstream backend {
hash $request_uri consistent;
server backend1.example.com;
server backend2.example.com;
server backend3.example.com;
}
The consistent keyword enables Ketama-based Consistent Hashing.
Envoy / HAProxy / Istio
Envoy's Ring Hash LB Policy and Maglev LB Policy implement Ketama-style and Google Maglev-style Consistent Hashing respectively.
# Envoy config
load_balancing_policy:
policies:
- typed_extension_config:
name: envoy.load_balancing_policies.ring_hash
typed_config:
"@type": type.googleapis.com/envoy.extensions.load_balancing_policies.ring_hash.v3.RingHash
minimum_ring_size: 1024
9. Common Mistakes and Pitfalls
Pitfall 1: Choice of Hash Function
Bad choice:
def bad_hash(key):
return hash(key) # Python's built-in hash() changes on restart!
Python's built-in hash() uses a different seed per process for security reasons. In a distributed system, having the mapping change on every restart is catastrophic.
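You can observe the instability directly by hashing the same string in two fresh interpreter processes (assuming PYTHONHASHSEED is not pinned in your environment):
import subprocess
import sys

# The same string hashed in two separate interpreter processes
cmd = [sys.executable, "-c", "print(hash('user:1234'))"]
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
# The two values differ → hash(key) % N would route the same key differently per process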
Good choices:
- MD5, SHA-1: slow but good distribution.
- MurmurHash3: fast with good distribution (used in Cassandra).
- xxHash: very fast (increasingly preferred).
Pitfall 2: Too Few Virtual Nodes
4 nodes × 1 vnode = 4 → severely unbalanced distribution. Always use at least 100 virtual nodes per node.
Pitfall 3: String Format of Node IDs
If vnode name format isn't consistent, the mapping isn't reproducible:
# Bad
f"{node}-{i}" # node1-0, node1-1, ...
f"{node}_{i}" # node1_0, node1_1, ... (different system)
# Good: document team/org standard
f"{node}#{i}" # explicitly documented
Pitfall 4: Hot Keys
Consistent Hashing only works well when key distribution is uniform. If one key (e.g., user:celebrity_account) gets huge traffic, one node is overloaded.
Solutions:
- Key Salting: append a suffix such as user:celebrity_account:{random_suffix} to spread the key across shards (see the sketch after this list).
- Local Cache: reduce distributed cache calls with in-app caching.
- Read Replica: add read-only replicas.
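Key salting can be sketched like this (the fan-out constant and the cache client are illustrative assumptions): every read picks one of a handful of salted copies at random, so the hot key is spread over several shards, while writes refresh all copies.
import random

HOT_KEY_FANOUT = 8  # illustrative: number of salted copies of the hot key

def salted_read_key(key: str) -> str:
    # Each read lands on one of 8 copies → up to 8 different shards share the load
    return f"{key}:{random.randrange(HOT_KEY_FANOUT)}"

def write_hot_key(cache, key: str, value) -> None:
    # 'cache' is any client with a set() method (hypothetical here);
    # writes update every salted copy so any read suffix returns fresh data
    for i in range(HOT_KEY_FANOUT):
        cache.set(f"{key}:{i}", value)

node = ch.get_node(salted_read_key("user:celebrity_account"))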
Pitfall 5: Rapid Cluster Churn
In environments like Kubernetes where nodes are frequently created/destroyed, data movement happens every time. Rebalancing Throttling is needed:
- Warm-up period: new nodes gradually take traffic.
- Cooldown: wait for data to fully migrate after node removal.
10. Going Further: Weighted Consistent Hashing
Maglev (Google)
Google's Maglev was designed with these requirements:
- Perfect uniformity (across similarly weighted servers)
- Minimum disruption (on node changes)
- O(1) lookup (fast runtime queries)
Maglev computes a preference permutation for each server and uses it to build a lookup table of size M. Lookup is lookup_table[hash(key) % M] in O(1).
# Conceptual pseudocode
def build_maglev_table(servers, table_size=65537):
# table_size should be a prime
permutations = {
s: generate_permutation(s, table_size) for s in servers
}
table = [None] * table_size
next_idx = {s: 0 for s in servers}
filled = 0
while filled < table_size:
for s in servers:
while table[permutations[s][next_idx[s]]] is not None:
next_idx[s] += 1
table[permutations[s][next_idx[s]]] = s
next_idx[s] += 1
filled += 1
if filled == table_size:
break
return table
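The generate_permutation helper is left undefined above. In the Maglev paper each server derives its permutation from two hashes of its name, an offset and a skip; a minimal sketch follows (the concrete hash functions here are an assumption, the paper leaves them open):
import hashlib

def generate_permutation(server: str, table_size: int) -> list:
    # permutation[i] = (offset + i * skip) % M, with M prime so every skip
    # value visits all table slots exactly once
    h1 = int(hashlib.md5(server.encode()).hexdigest(), 16)
    h2 = int(hashlib.sha1(server.encode()).hexdigest(), 16)
    offset = h1 % table_size
    skip = h2 % (table_size - 1) + 1
    return [(offset + i * skip) % table_size for i in range(table_size)]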
Maglev is available as Envoy's maglev load balancer policy and is used internally by Google Cloud Load Balancer.
Assigning Weights
With heterogeneous servers (e.g., 16-core vs. 4-core), reflect weights by:
- Vnode approach: create vnodes = base * weight virtual nodes per server.
- Maglev approach: apply the server's permutation weight times so it claims proportionally more table slots.
- Rendezvous weighting: scale each node's score by its weight, e.g. score = -weight / ln(h) with the hash h normalized into (0, 1) (see the sketch below).
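A minimal sketch of weighted Rendezvous Hashing under that scoring assumption:
import hashlib
import math

def weighted_rendezvous(key: str, nodes: dict) -> str:
    """nodes: {node_name: weight}. Higher-capacity nodes win proportionally more keys."""
    best_node, best_score = None, float("-inf")
    for node, weight in nodes.items():
        digest = hashlib.md5(f"{key}:{node}".encode()).hexdigest()
        h = (int(digest, 16) + 1) / (2 ** 128 + 1)   # normalize hash into (0, 1)
        score = -weight / math.log(h)                # weighted-HRW score
        if score > best_score:
            best_node, best_score = node, score
    return best_node

print(weighted_rendezvous("user:1234", {"server-A": 2, "server-B": 1, "server-C": 1}))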
11. Performance Comparison: Which Algorithm to Choose?
| Algorithm | Time | Space | Uniformity | Node removal | Main uses |
|---|---|---|---|---|---|
| Modulo Hash | O(1) | O(1) | Perfect | Full rebalance | Simple sharding |
| Consistent + vnode | O(log V) | O(V) | Good (with higher V) | Arbitrary | Cassandra, DynamoDB |
| Jump Hash | O(log N) | O(1) | Perfect | Last only | Google internal |
| Rendezvous Hash | O(N) | O(N) | Perfect | Arbitrary | Small clusters |
| Maglev | O(1) lookup | O(M) | Near perfect | Arbitrary | Google Cloud LB |
| Anchor Hashing | O(log N) | O(N) | Perfect | Arbitrary | Modern systems |
Selection Guide
- Nodes < 100, frequent churn: Rendezvous Hashing (simple, flexible)
- Large-scale cache/DB cluster: Consistent Hashing + vnodes (battle-tested)
- Static node set, extreme performance: Jump Hash (O(log N))
- L4/L7 load balancer: Maglev (O(1) lookup)
- Heterogeneous capacity servers: vnode weights or weighted Maglev
12. Operational Tips: Lessons From Production
Tip 1: Never Change the Vnode Count
Changing num_tokens from 150 to 200 in production causes nearly full data rebalance. Pick carefully at initial cluster design time.
Tip 2: Monitor Distribution With Logs
Regularly measure how uniform the actual key distribution is:
# Prometheus metrics
consistent_hash_node_keys{node="server-A"} 12050
consistent_hash_node_keys{node="server-B"} 10800
consistent_hash_node_keys{node="server-C"} 11200
consistent_hash_node_keys{node="server-D"} 12500
If standard deviation spikes suddenly, suspect a hot key.
Tip 3: Graceful Shutdown
Don't kill a node abruptly when removing it:
- Mark it "draining" (remove from ring).
- Wait for existing connections to complete.
- Verify data migration.
- Terminate the process.
Tip 4: Hash Function Speed vs. Quality
At hundreds of thousands of lookups per second, MD5 can be slow. Benchmark:
| Hash function | Speed (MB/s) | Distribution quality |
|---|---|---|
| xxHash3 | ~30,000 | Good |
| MurmurHash3 | ~6,000 | Good |
| CityHash | ~10,000 | Good |
| MD5 | ~500 | Excellent |
| SHA-256 | ~400 | Excellent |
For cache sharding, xxHash or Murmur is appropriate.
Quiz Review
Q1. When scaling from 4 to 5 servers, what percentage of keys move under plain modulo hashing (hash % N)?
A. About 80% of keys move, because the results of hash % 4 and hash % 5 are largely unrelated. Consistent Hashing, in contrast, moves only about 20% (K/N = K/5).
Q2. Why are virtual nodes needed in basic Consistent Hashing?
A. With few nodes, the distribution of nodes on the ring is uneven, causing some nodes to get disproportionate keys. Scattering each physical node as 100-200 virtual nodes gives nearly uniform distribution by the law of large numbers. It also lets you assign weights by tuning vnode counts per server capacity.
Q3. What are the biggest pros and cons of Jump Consistent Hash?
A. Pros: O(1) space and perfectly uniform distribution. Works well without virtual nodes and is very fast. Cons: Only the last (highest-index) node can be removed. Removing a middle node breaks the entire mapping. If arbitrary node removal is required, use Rendezvous Hashing or Anchor Hashing.
Q4. Why is Rendezvous Hashing useful for selecting N replicas?
A. Rendezvous Hashing computes the weight (hash) of every node for each key. Sorting them and picking the top K naturally determines K replica placements. This has the same effect as "next K clockwise nodes" on a Consistent Hashing ring, but with a more uniform distribution.
Q5. How many virtual nodes per node does Memcached's libketama use by default, and why?
A. Ketama uses 160 vnodes per node. It's a balance between balance (std. dev ~5%) and memory overhead. In practice, it splits the MD5 result into four 4-byte integers, creating 4 virtual nodes per "replica." So 40 replicas × 4 = 160.
Closing: Lessons From Consistent Hashing
Consistent Hashing was conceived in 1997 at MIT to solve the CDN web-cache problem. It birthed Akamai, and has been a core building block of distributed systems for over 20 years.
Five Core Principles
- Circular hash space: placing nodes and keys in the same space to define adjacency.
- Clockwise assignment: keys go to the nearest clockwise node.
- Virtual nodes: uniform distribution via the law of large numbers.
- Minimum disruption: only K/N keys move on node add/remove.
- Weight flexibility: vnode counts reflect capacity differences.
Modern Evolution
- Jump Hash: O(1) space for extreme efficiency.
- Rendezvous Hashing: the pinnacle of simplicity.
- Maglev: O(1) lookup + weighted LB.
- Anchor Hashing: the latest improvement.
Remember
"The answer to 'where should this data go' in distributed systems" is almost always a variant of Consistent Hashing.
The Redis Cluster, DynamoDB, Cassandra, Memcached, and CDN you use today all run on this nearly 30-year-old idea. Knowing the fundamentals changes the depth of debugging and design.
Next time you ask "why does this key go to that server?", picture the hash ring in your head. The answer is there.
References
- Karger et al., "Consistent Hashing and Random Trees" (1997) - The original paper
- Dynamo: Amazon's Highly Available Key-value Store (2007)
- Jump Consistent Hash (Google, 2014)
- Maglev: A Fast and Reliable Software Network Load Balancer (Google, 2016)
- Anchor Hashing (2018)
- libketama Source
- Cassandra Data Distribution
- Envoy Ring Hash Load Balancer