Distributed Locks Complete Guide 2025: Redlock, Zookeeper, etcd, Fencing Token, Safe Implementation
TL;DR
- Distributed locks are genuinely hard: completely different from single-machine mutexes
- Single Redis lock: simple but single point of failure
- Redlock: Redis cluster-based, contested
- Fencing Token: guarantees true lock safety
- Zookeeper/etcd: safer but more complex
- Conclusion: avoid distributed locks when possible
1. Why Are Distributed Locks Hard?
1.1 Single-Machine Mutex
import threading

lock = threading.Lock()
with lock:
    critical_section()
How it works:
- Lock variable in memory
- Atomic CPU instructions
- Managed by a single OS kernel
Guarantees:
- Only one thread at a time
- Released automatically when the holder exits or the process dies (no orphaned locks)
- Very fast (~10ns)
1.2 Differences in a Distributed Environment
[Server A] ←─ network ─→ [Lock Service] ←─ network ─→ [Server B]
Problems:
- Network latency: several ms
- Network partitions: packet loss
- Clock drift: clocks differ between machines
- GC pauses: JVM may stall for 1 second
- Failures: what if the lock holder dies?
- Partial failure: only some nodes die
→ All assumptions of a single machine break down.
1.3 Two Purposes of Distributed Locks
1. Efficiency:
- Avoid doing the same work twice
- Cost savings
- Lock failure is not a big deal (temporary inefficiency)
2. Correctness:
- Guarantee data consistency
- Lock failure = data corruption
- Requires very strict guarantees
Most people want efficiency but mistakenly believe they need correctness guarantees.
2. Single Redis Lock
2.1 The Simplest Attempt
def acquire_lock(key, ttl=10):
    return redis.set(key, "locked", nx=True, ex=ttl)

def release_lock(key):
    redis.delete(key)

# Usage
if acquire_lock("my-resource"):
    try:
        do_work()
    finally:
        release_lock("my-resource")
SET key value NX EX ttl:
- NX: only set if the key does not exist
- EX: set the TTL in seconds
- Atomic: Redis guarantees the whole command is a single atomic operation.
2.2 First Problem — Releasing Another Client's Lock
# Client A acquires the lock
acquire_lock("my-resource")
# Client A is working...
# Work takes 30 seconds (exceeds 10s TTL)
# Lock expires automatically
# Client B acquires the lock
acquire_lock("my-resource")
# Client A finishes and releases the lock
release_lock("my-resource")
# → Releases Client B's lock!
# Client C can now acquire the lock
# Client B and C work concurrently!
2.3 Fix — Lock Identifier
import uuid

def acquire_lock(key, ttl=10):
    token = str(uuid.uuid4())
    if redis.set(key, token, nx=True, ex=ttl):
        return token
    return None

def release_lock(key, token):
    # Atomicity via Lua script
    script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    else
        return 0
    end
    """
    redis.eval(script, 1, key, token)
Improvement: only releases your own lock.
2.4 Second Problem — Single Point of Failure
[Redis] ← if it dies, all locks are lost
↓
[Server A] [Server B] [Server C]
Replication:
[Master Redis]
↓ async replication
[Slave Redis]
Problem: async replication.
- Client A acquires lock on master
- Master dies (before replication)
- Slave is promoted to master (no lock)
- Client B acquires the lock
- Both hold the lock → data corruption
3. Redlock Algorithm
3.1 Salvatore Sanfilippo (Redis Creator)'s Answer
Use multiple independent Redis instances. Must acquire the lock on a majority to succeed.
[Redis 1] [Redis 2] [Redis 3] [Redis 4] [Redis 5]
↓ ↓ ↓ ↓ ↓
[Client] → try SET NX EX on all
→ 3/5 or more successes = lock acquired
3.2 Algorithm
import time

def acquire_redlock(key, ttl=10000):
    token = str(uuid.uuid4())
    quorum = len(redis_clients) // 2 + 1  # e.g. 3 out of 5
    start_time = time.time() * 1000

    acquired = []
    for client in redis_clients:
        try:
            # Very short per-instance timeout so one slow node can't stall us
            if client.set(key, token, nx=True, px=ttl):
                acquired.append(client)
        except Exception:
            pass

    elapsed = time.time() * 1000 - start_time
    if len(acquired) >= quorum and elapsed < ttl:
        return token  # Lock acquired
    else:
        # Failure - release everything we managed to acquire
        for client in acquired:
            release_lock(client, key, token)
        return None
3.3 Redlock's Assumptions
- Clocks are reasonably synchronized
- GC pauses are short
- Network latency < TTL
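The Redlock spec quantifies these assumptions by subtracting a clock-drift allowance when computing how long the acquired lock is actually usable. A minimal sketch of that check; the 1% factor and the 2 ms constant mirror the values commonly used in the reference implementations and should be treated as assumptions here:

    def remaining_validity(ttl_ms, elapsed_ms):
        # Clock drift allowance: a small fraction of the TTL plus a fixed few ms
        drift = ttl_ms * 0.01 + 2
        validity = ttl_ms - elapsed_ms - drift
        # Non-positive validity means the lock may already have expired on
        # some instances, so the acquisition should be treated as a failure
        return validity if validity > 0 else 0

In acquire_redlock above, the token would only be returned when this remaining validity is positive.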
3.4 Martin Kleppmann's Critique (2016)
The famous blog post "How to do distributed locking":
1. GC Pause Problem:
1. Client A acquires lock (10s TTL)
2. Client A stalls 15s on GC (lock expires)
3. Client B acquires lock
4. Client A wakes up → thinks it still holds the lock
5. Client A and B work concurrently
2. Clock Jump:
1. Client A acquires lock (10s TTL)
2. Redis clock suddenly jumps 5s (NTP sync)
3. Lock expires early
4. Client B acquires lock
Kleppmann's conclusion: Redlock is unsuitable for correctness-critical systems.
3.5 Antirez's Rebuttal
Salvatore Sanfilippo's response:
- Use monotonic clocks (no jumps)
- GC pauses are a universal problem
- Good enough for efficiency purposes
Outcome of the debate: no definitive consensus, though Kleppmann's position is the more widely accepted one for correctness-critical systems.
4. Fencing Token
4.1 Core Idea
Lock + monotonically increasing token.
1. Client A acquires lock, token = 33
2. Client A stalls on GC
3. Client B acquires lock, token = 34
4. Client B writes to storage (token 34)
5. Client A wakes up, writes to storage (token 33)
6. Storage sees 33 < 34 → rejects
4.2 Implementation
def acquire_lock_with_token(key):
    token = redis.incr(f"{key}:token")  # Monotonic
    if redis.set(key, token, nx=True, ex=10):
        return token
    return None

def write_to_storage(data, token):
    # Storage validates the token
    if storage.last_token >= token:
        raise StaleTokenError()
    storage.write(data)
    storage.last_token = token
4.3 Key Requirement
Storage must validate the token:
- DB tracks the token
- Rejects old tokens
Most storage systems do not support this → extra work needed.
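On a relational store, the check can be a single conditional UPDATE whose affected-row count tells you whether the token was accepted. A minimal sketch, assuming a hypothetical resource table with a last_token column and a psycopg2-style connection; StaleTokenError is the same placeholder exception as in the sketch above:

    def write_with_token(conn, resource_id, data, token):
        # Accept the write only if this token is newer than the last accepted one.
        # The WHERE clause makes the check-and-write one atomic statement.
        with conn.cursor() as cur:
            cur.execute(
                """
                UPDATE resource
                SET data = %s, last_token = %s
                WHERE id = %s AND last_token < %s
                """,
                (data, token, resource_id, token),
            )
            if cur.rowcount == 0:
                raise StaleTokenError(f"token {token} rejected: a newer token already wrote")
        conn.commit()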
4.4 Where Is It Used?
- HBase: fencing combined with ZooKeeper
- Apache Cassandra: lightweight transactions
- PostgreSQL: row versions (optimistic locking)
5. ZooKeeper-Based Locks
5.1 ZooKeeper's Strengths
- CP system (consistency first)
- Linearizable
- Ephemeral nodes (auto-deleted when client dies)
- Watches (change notifications)
5.2 Lock Implementation
Recipe: sequential ephemeral nodes.
def acquire_lock(zk, lock_path):
    # 1. Create sequential ephemeral node
    my_node = zk.create(
        f"{lock_path}/lock-",
        ephemeral=True,
        sequence=True
    )
    # e.g. /lock/lock-0000000005

    while True:
        # 2. Get all child nodes
        children = sorted(zk.get_children(lock_path))

        # 3. If I'm smallest, acquire the lock
        if my_node.endswith(children[0]):
            return my_node

        # 4. Otherwise watch the node right before me
        my_index = children.index(my_node.split('/')[-1])
        prev_node = children[my_index - 1]
        zk.exists(f"{lock_path}/{prev_node}", watch=callback)
        wait_for_callback()
5.3 ZooKeeper Advantages
1. Automatic release:
- Ephemeral node is auto-deleted on client death
- No zombie locks
2. Accurate ordering:
- Sequence numbers guarantee fairness
3. Network partition safety:
- ZooKeeper prevents split-brain
- If the lock holder is separated from the quorum, it cannot hold the lock
5.4 ZooKeeper Disadvantages
- Operationally complex: separate cluster
- Performance: slower than Redis
- GC pauses still apply: same problem if the client GCs
5.5 Curator Framework (Java)
import org.apache.curator.framework.recipes.locks.InterProcessMutex;

InterProcessMutex lock = new InterProcessMutex(client, "/my-lock");

if (lock.acquire(10, TimeUnit.SECONDS)) {
    try {
        // work
    } finally {
        lock.release();
    }
}
Curator abstracts away ZooKeeper's complexity.
6. etcd-Based Locks
6.1 Characteristics of etcd
- CP system (Raft-based)
- Linearizable
- Lease (similar to TTL)
- Watches
6.2 Lock Implementation
import etcd3

client = etcd3.client()

# Create lease (10 seconds)
lease = client.lease(ttl=10)

# Acquire lock
lock = client.lock("my-lock", ttl=10)
lock.acquire()
try:
    do_work()
finally:
    lock.release()
6.3 Comparison with ZooKeeper
| | ZooKeeper | etcd |
|---|---|---|
| Language | Java | Go |
| Consensus | ZAB | Raft |
| Operations | Complex | Relatively simple |
| Usage | Hadoop ecosystem | Kubernetes |
| API | Java-centric | gRPC |
Kubernetes uses etcd → etcd is increasingly becoming the standard.
7. Database-Based Locks
7.1 SELECT FOR UPDATE
BEGIN;
SELECT * FROM accounts WHERE id = 123 FOR UPDATE;
-- Hold the lock, other transactions wait
UPDATE accounts SET balance = balance - 100 WHERE id = 123;
COMMIT;
-- Release the lock
Pros:
- DB handles it
- Integrated with transactions
- Auto-release (commit/rollback)
Cons:
- DB load
- Performance degradation under lock contention
- Distributed transactions are difficult
7.2 Advisory Locks (PostgreSQL)
-- Acquire lock
SELECT pg_advisory_lock(12345);
-- Work
UPDATE ...;
-- Release lock
SELECT pg_advisory_unlock(12345);
Pros:
- Very fast
- In-memory lock (not a row lock)
- Auto-released when session ends
Use cases: background job synchronization, migrations.
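For background jobs that should simply skip a run when another instance already holds the lock, the non-blocking variant pg_try_advisory_lock fits well. A minimal sketch with psycopg2; the lock key and helper names are illustrative:

    import psycopg2

    JOB_LOCK_ID = 12345  # arbitrary application-chosen lock key

    def run_job_once(dsn):
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                # Non-blocking: returns true only if this session got the lock
                cur.execute("SELECT pg_try_advisory_lock(%s)", (JOB_LOCK_ID,))
                if not cur.fetchone()[0]:
                    return  # Another instance is already running the job
                do_work()
                cur.execute("SELECT pg_advisory_unlock(%s)", (JOB_LOCK_ID,))
        finally:
            conn.close()  # Ending the session also releases any advisory locks still held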
7.3 Optimistic Locking
-- 1. Read data and version
SELECT *, version FROM users WHERE id = 1;
-- version = 5
-- 2. Update after the work
UPDATE users SET name = 'New', version = version + 1
WHERE id = 1 AND version = 5;
-- If affected rows = 0, fail (someone else modified first)
Pros:
- No locks (no waiting)
- Good concurrency
- Distribution-friendly
Cons:
- Retry on conflict
- Inefficient under frequent conflicts
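The retry loop lives in application code. A minimal sketch against the users table above, assuming a psycopg2-style connection; the function name is illustrative:

    def rename_user(conn, user_id, new_name, max_retries=5):
        for _ in range(max_retries):
            with conn.cursor() as cur:
                # 1. Read the current value and its version
                cur.execute("SELECT name, version FROM users WHERE id = %s", (user_id,))
                _, version = cur.fetchone()

                # 2. Write back only if nobody bumped the version in the meantime
                cur.execute(
                    "UPDATE users SET name = %s, version = version + 1 "
                    "WHERE id = %s AND version = %s",
                    (new_name, user_id, version),
                )
                if cur.rowcount == 1:
                    conn.commit()
                    return True
                conn.rollback()  # Someone else won the race; retry with a fresh read
        return False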
8. Patterns and Pitfalls
8.1 Lock Renewal (Heartbeat)
For long-running work:
import threading
import time

class RenewingLock:
    def __init__(self, key, ttl=10):
        self.key = key
        self.token = None
        self.ttl = ttl
        self.stop_renewal = False

    def acquire(self):
        self.token = acquire_lock(self.key, self.ttl)
        if self.token:
            self.renewal_thread = threading.Thread(target=self._renew)
            self.renewal_thread.start()
        return self.token

    def _renew(self):
        while not self.stop_renewal:
            time.sleep(self.ttl / 3)
            redis.expire(self.key, self.ttl)

    def release(self):
        self.stop_renewal = True
        release_lock(self.key, self.token)
Problem: renewal cannot run during a GC pause → lock expires. Still not safe.
8.2 Backoff with Jitter
import random
import time

def acquire_with_retry(key, max_retries=10):
    for attempt in range(max_retries):
        if token := acquire_lock(key):
            return token
        # Exponential backoff + jitter
        delay = (2 ** attempt) * 0.1 + random.uniform(0, 0.5)
        time.sleep(delay)
    return None
Jitter matters: without it, all clients retry simultaneously — thundering herd.
8.3 Reentrant Lock
Allows the same client to re-enter.
class ReentrantLock:
    def __init__(self, key):
        self.key = key
        self.count = 0
        self.token = None

    def acquire(self):
        if self.count > 0:
            self.count += 1
            return True
        self.token = acquire_lock(self.key)
        if self.token:
            self.count = 1
            return True
        return False

    def release(self):
        self.count -= 1
        if self.count == 0:
            release_lock(self.key, self.token)
            self.token = None
8.4 Deadlock Avoidance
When using multiple locks:
# Wrong - deadlock possible
def transfer(from_id, to_id, amount):
    lock1 = acquire_lock(f"account:{from_id}")
    lock2 = acquire_lock(f"account:{to_id}")
    # ...

# Two users simultaneously transferring to each other:
# A → B: lock A, lock B
# B → A: lock B, lock A
# → Deadlock

# Correct - always acquire in sorted order
def transfer(from_id, to_id, amount):
    first, second = sorted([from_id, to_id])
    lock1 = acquire_lock(f"account:{first}")
    lock2 = acquire_lock(f"account:{second}")
    # ...
9. How to Avoid Distributed Locks
9.1 Single Ownership
Each task is processed by only one worker:
[Job Queue]
↓
[Worker 1] (jobs A, B)
[Worker 2] (jobs C, D)
[Worker 3] (jobs E, F)
Partitioning:
- Hash by user ID → which worker?
- Same user always goes to the same worker
- No locks needed
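A minimal sketch of the routing rule; the worker list and key names are illustrative:

    import hashlib

    WORKERS = ["worker-1", "worker-2", "worker-3"]

    def worker_for(user_id: str) -> str:
        # Stable hash so the same user always maps to the same worker;
        # that worker is the single owner of the user's jobs, so no lock is needed
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return WORKERS[int(digest, 16) % len(WORKERS)]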
9.2 Idempotency
def process_payment(payment_id, amount):
    # In practice the existence check and the mark should be a single atomic
    # set-if-absent step (e.g. SET NX), otherwise two workers can race here
    if db.exists(f"processed:{payment_id}"):
        return  # Already processed
    db.atomic_set(f"processed:{payment_id}", True)
    actually_charge(amount)
Safe to run multiple times → no locks needed.
9.3 Compare-and-Swap (CAS)
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 123 AND quantity > 0;
Atomic, no locks. If affected rows = 0, no inventory.
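In application code the only thing to check is the affected-row count. A short sketch assuming a psycopg2-style connection:

    def reserve_one_item(conn, product_id):
        with conn.cursor() as cur:
            # Atomic decrement guarded by the quantity check:
            # no lock and no read-modify-write race
            cur.execute(
                "UPDATE inventory SET quantity = quantity - 1 "
                "WHERE product_id = %s AND quantity > 0",
                (product_id,),
            )
            conn.commit()
            return cur.rowcount == 1  # False means out of stock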
9.4 Event Sourcing
Append events instead of mutating state:
events.append(OrderCreated(...))
events.append(InventoryReserved(...))
events.append(PaymentCharged(...))
Append-only, almost no locks. State is computed by replaying events.
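A minimal sketch of that replay step; the event classes mirror the ones above and the state fields are illustrative:

    def replay_order_state(events):
        # Fold the append-only log into the current state; nothing is
        # ever updated in place, so no lock is needed
        state = {"created": False, "inventory_reserved": False, "paid": False}
        for event in events:
            if isinstance(event, OrderCreated):
                state["created"] = True
            elif isinstance(event, InventoryReserved):
                state["inventory_reserved"] = True
            elif isinstance(event, PaymentCharged):
                state["paid"] = True
        return state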
9.5 Actor Model
Each actor owns its state and a mailbox.
class AccountActor extends Actor {
  var balance: BigDecimal = 0

  def receive = {
    case Deposit(amount)  => balance += amount
    case Withdraw(amount) => balance -= amount
  }
}
One message at a time → no locks needed.
Erlang/Elixir, Akka follow this model.
10. Recommendations
10.1 Decision Tree
Do you really need a distributed lock?
├─ No → use idempotency, CAS, partitioning
└─ Yes
   ├─ Efficiency goal (occasional duplicate work is tolerable)
   │  └─ Single Redis lock (simple)
   └─ Correctness goal (duplicate work is never acceptable)
      ├─ ZooKeeper / etcd
      └─ Fencing token (if storage supports it)
10.2 Kleppmann's Conclusion
"If efficiency is the goal, Redlock may be sufficient. If correctness is the goal, use consensus algorithms (ZAB, Raft)."
10.3 Practical Recommendations
Most cases:
- Simple Redis lock (SET NX EX)
- Short TTL
- Idempotency as backup
- Systems where "lock failure is not catastrophic"
Critical cases:
- ZooKeeper / etcd
- Fencing token
- Use transactions
- Guarantee correctness in code
Quiz
1. What is Redlock's core problem?
Answer: GC pauses and clock jumps. Martin Kleppmann's 2016 critique: (1) GC pause — Client A acquires a lock, then stalls 15s on GC, the lock expires, Client B acquires the lock, A wakes up thinking it still owns the lock → both work concurrently → data corruption. (2) Clock jump — NTP sync causes the clock to jump, expiring the lock early. Conclusion: Redlock is OK for efficiency, unsuitable for correctness. For correctness, use ZooKeeper/etcd plus a fencing token.
2. How does a Fencing Token guarantee safety?
Answer: The lock is issued together with a monotonically increasing token. When the lock holder writes to storage, it passes the token along. Storage validates the token — writes from old tokens are rejected. Scenario: Client A token=33 → GC pause → Client B token=34 → B writes to storage (last_token=34) → A wakes up and writes to storage (token=33) → storage sees 33 < 34 and rejects. Key requirement: storage must track the token. Most DBs do not natively support this, so extra work is needed.
3. Why are ZooKeeper ephemeral nodes good for distributed locks?
Answer: They are automatically deleted when the client disconnects. Scenario: the lock-holding client dies → ZooKeeper session times out → ephemeral node is auto-deleted → lock is released automatically → another client can immediately acquire it. No zombie locks. ZooKeeper is also a CP system (ZAB, Raft-like), preventing split-brain. Downside: if the client is alive but GCing, the same problem persists. Kubernetes uses etcd for similar reasons.
4. What are the limits of SELECT FOR UPDATE?
Answer: (1) DB load — DB resources are held for the duration of the lock, (2) Contention — many transactions locking the same row get serialized, (3) Deadlocks — occur when lock ordering differs, (4) Distributed transactions are hard — locks spanning multiple DBs are very hard (2PC), (5) Long-transaction risk — long transactions block other work. Alternatives: optimistic locking (version column), CAS, partitioning, idempotency. For simple cases, SELECT FOR UPDATE is the safest.
5. How can you avoid distributed locks?
Answer: Five patterns: (1) Single ownership — partition so the same key always goes to the same worker, no lock needed, (2) Idempotency — operations that are safe to run multiple times, (3) CAS (Compare-and-Swap) — the UPDATE ... WHERE version=? pattern, (4) Event Sourcing — append-only, no conflicts, (5) Actor Model — each actor processes one message at a time. Distributed locks are a last resort. Whenever possible, solve concurrency through the data model or algorithm.
References
- How to do distributed locking — Martin Kleppmann
- Is Redlock safe? — Antirez's response
- Redlock Spec — official
- ZooKeeper Lock Recipe
- etcd Concurrency
- Curator Framework — Java ZooKeeper client
- Redisson — Java Redis client with locks
- Designing Data-Intensive Applications — Martin Kleppmann
- PostgreSQL Advisory Locks
- Hazelcast — in-memory data grid
- Apache Curator Recipes