Distributed Locks Complete Guide 2025: Redlock, Zookeeper, etcd, Fencing Token, Safe Implementation
TL;DR
- Distributed locks are genuinely hard: completely different from single-machine mutexes
- Single Redis lock: simple but single point of failure
- Redlock: Redis cluster-based, contested
- Fencing Token: guarantees true lock safety
- Zookeeper/etcd: safer but more complex
- Conclusion: avoid distributed locks when possible
1. Why Are Distributed Locks Hard?
1.1 Single-Machine Mutex
import threading

lock = threading.Lock()
with lock:
    critical_section()
How it works:
- Lock variable in memory
- Atomic CPU instructions
- Managed by a single OS kernel
Guarantees:
- Only one thread at a time
- Released automatically when the holder exits or the process dies (no orphaned locks)
- Very fast (~10ns)
1.2 Differences in a Distributed Environment
[Server A] ←─ network ─→ [Lock Service] ←─ network ─→ [Server B]
Problems:
- Network latency: several ms
- Network partitions: packet loss
- Clock drift: clocks differ between machines
- GC pauses: JVM may stall for 1 second
- Failures: what if the lock holder dies?
- Partial failure: only some nodes die
→ All assumptions of a single machine break down.
1.3 Two Purposes of Distributed Locks
1. Efficiency:
- Avoid doing the same work twice
- Cost savings
- Lock failure is not a big deal (temporary inefficiency)
2. Correctness:
- Guarantee data consistency
- Lock failure = data corruption
- Requires very strict guarantees
Most people want efficiency but mistakenly believe they need correctness guarantees.
2. Single Redis Lock
2.1 The Simplest Attempt
def acquire_lock(key, ttl=10):
    return redis.set(key, "locked", nx=True, ex=ttl)

def release_lock(key):
    redis.delete(key)

# Usage
if acquire_lock("my-resource"):
    try:
        do_work()
    finally:
        release_lock("my-resource")
SET key value NX EX ttl:
- NX: only set if the key does not exist
- EX: set the TTL in seconds
- Atomic: Redis guarantees the whole command is a single atomic operation.
2.2 First Problem — Releasing Another Client's Lock
# Client A acquires the lock
acquire_lock("my-resource")
# Client A is working...
# Work takes 30 seconds (exceeds 10s TTL)
# Lock expires automatically
# Client B acquires the lock
acquire_lock("my-resource")
# Client A finishes and releases the lock
release_lock("my-resource")
# → Releases Client B's lock!
# Client C can now acquire the lock
# Client B and C work concurrently!
2.3 Fix — Lock Identifier
import uuid

def acquire_lock(key, ttl=10):
    token = str(uuid.uuid4())
    if redis.set(key, token, nx=True, ex=ttl):
        return token
    return None

def release_lock(key, token):
    # Atomicity via Lua script
    script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    else
        return 0
    end
    """
    redis.eval(script, 1, key, token)
Improvement: only releases your own lock.
2.4 Second Problem — Single Point of Failure
[Redis] ← if it dies, all locks are lost
↓
[Server A] [Server B] [Server C]
Replication:
[Master Redis]
↓ async replication
[Slave Redis]
Problem: async replication.
- Client A acquires lock on master
- Master dies (before replication)
- Slave is promoted to master (no lock)
- Client B acquires the lock
- Both hold the lock → data corruption
3. Redlock Algorithm
3.1 Salvatore Sanfilippo (Redis Creator)'s Answer
Use multiple independent Redis instances. Must acquire the lock on a majority to succeed.
[Redis 1] [Redis 2] [Redis 3] [Redis 4] [Redis 5]
↓ ↓ ↓ ↓ ↓
[Client] → try SET NX EX on all
→ 3/5 or more successes = lock acquired
3.2 Algorithm
import time

def acquire_redlock(key, ttl=10000):
    token = str(uuid.uuid4())
    quorum = len(redis_clients) // 2 + 1  # e.g. 3 out of 5
    start_time = time.time() * 1000

    acquired = []
    for client in redis_clients:
        try:
            # Very short per-instance timeout so one slow node can't stall us
            if client.set(key, token, nx=True, px=ttl):
                acquired.append(client)
        except Exception:
            pass

    elapsed = time.time() * 1000 - start_time
    if len(acquired) >= quorum and elapsed < ttl:
        return token  # Lock acquired
    else:
        # Failure - release everything we managed to acquire
        for client in acquired:
            release_lock(client, key, token)
        return None
3.3 Redlock's Assumptions
- Clocks are reasonably synchronized
- GC pauses are short
- Network latency < TTL
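The Redlock spec quantifies these assumptions by subtracting a clock-drift allowance when computing how long the acquired lock is actually usable. A minimal sketch of that check; the 1% factor and the 2 ms constant mirror the values commonly used in the reference implementations and should be treated as assumptions here:

    def remaining_validity(ttl_ms, elapsed_ms):
        # Clock drift allowance: a small fraction of the TTL plus a fixed few ms
        drift = ttl_ms * 0.01 + 2
        validity = ttl_ms - elapsed_ms - drift
        # Non-positive validity means the lock may already have expired on
        # some instances, so the acquisition should be treated as a failure
        return validity if validity > 0 else 0

In acquire_redlock above, the token would only be returned when this remaining validity is positive.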
3.4 Martin Kleppmann's Critique (2016)
The famous blog post "How to do distributed locking":
1. GC Pause Problem:
1. Client A acquires lock (10s TTL)
2. Client A stalls 15s on GC (lock expires)
3. Client B acquires lock
4. Client A wakes up → thinks it still holds the lock
5. Client A and B work concurrently
2. Clock Jump:
1. Client A acquires lock (10s TTL)
2. Redis clock suddenly jumps 5s (NTP sync)
3. Lock expires early
4. Client B acquires lock
Kleppmann's conclusion: Redlock is unsuitable for correctness-critical systems.
3.5 Antirez's Rebuttal
Salvatore Sanfilippo's response:
- Use monotonic clocks (no jumps)
- GC pauses are a universal problem
- Good enough for efficiency purposes
Outcome of the debate: no definitive consensus, though Kleppmann's position is the more widely accepted one for correctness-critical systems.
4. Fencing Token
4.1 Core Idea
Lock + monotonically increasing token.
1. Client A acquires lock, token = 33
2. Client A stalls on GC
3. Client B acquires lock, token = 34
4. Client B writes to storage (token 34)
5. Client A wakes up, writes to storage (token 33)
6. Storage sees 33 < 34 → rejects
4.2 Implementation
def acquire_lock_with_token(key):
    token = redis.incr(f"{key}:token")  # Monotonic
    if redis.set(key, token, nx=True, ex=10):
        return token
    return None

def write_to_storage(data, token):
    # Storage validates the token
    if storage.last_token >= token:
        raise StaleTokenError()
    storage.write(data)
    storage.last_token = token
4.3 Key Requirement
Storage must validate the token:
- DB tracks the token
- Rejects old tokens
Most storage systems do not support this → extra work needed.
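On a relational store, the check can be a single conditional UPDATE whose affected-row count tells you whether the token was accepted. A minimal sketch, assuming a hypothetical resource table with a last_token column and a psycopg2-style connection; StaleTokenError is the same placeholder exception as in the sketch above:

    def write_with_token(conn, resource_id, data, token):
        # Accept the write only if this token is newer than the last accepted one.
        # The WHERE clause makes the check-and-write one atomic statement.
        with conn.cursor() as cur:
            cur.execute(
                """
                UPDATE resource
                SET data = %s, last_token = %s
                WHERE id = %s AND last_token < %s
                """,
                (data, token, resource_id, token),
            )
            if cur.rowcount == 0:
                raise StaleTokenError(f"token {token} rejected: a newer token already wrote")
        conn.commit()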
4.4 Where Is It Used?
- HBase: fencing combined with ZooKeeper
- Apache Cassandra: lightweight transactions
- PostgreSQL: row versions (optimistic locking)
5. ZooKeeper-Based Locks
5.1 ZooKeeper's Strengths
- CP system (consistency first)
- Linearizable
- Ephemeral nodes (auto-deleted when client dies)
- Watches (change notifications)
5.2 Lock Implementation
Recipe: sequential ephemeral nodes.
def acquire_lock(zk, lock_path):
    # 1. Create sequential ephemeral node
    my_node = zk.create(
        f"{lock_path}/lock-",
        ephemeral=True,
        sequence=True
    )
    # e.g. /lock/lock-0000000005

    while True:
        # 2. Get all child nodes
        children = sorted(zk.get_children(lock_path))

        # 3. If I'm smallest, acquire the lock
        if my_node.endswith(children[0]):
            return my_node

        # 4. Otherwise watch the node right before me
        my_index = children.index(my_node.split('/')[-1])
        prev_node = children[my_index - 1]
        zk.exists(f"{lock_path}/{prev_node}", watch=callback)
        wait_for_callback()
5.3 ZooKeeper Advantages
1. Automatic release:
- Ephemeral node is auto-deleted on client death
- No zombie locks
2. Accurate ordering:
- Sequence numbers guarantee fairness
3. Network partition safety:
- ZooKeeper prevents split-brain
- If the lock holder is separated from the quorum, it cannot hold the lock
5.4 ZooKeeper Disadvantages
- Operationally complex: separate cluster
- Performance: slower than Redis
- GC pauses still apply: same problem if the client GCs
5.5 Curator Framework (Java)
import org.apache.curator.framework.recipes.locks.InterProcessMutex;

InterProcessMutex lock = new InterProcessMutex(client, "/my-lock");

if (lock.acquire(10, TimeUnit.SECONDS)) {
    try {
        // work
    } finally {
        lock.release();
    }
}
Curator abstracts away ZooKeeper's complexity.
6. etcd-Based Locks
6.1 Characteristics of etcd
- CP system (Raft-based)
- Linearizable
- Lease (similar to TTL)
- Watches
6.2 Lock Implementation
import etcd3

client = etcd3.client()

# Create lease (10 seconds)
lease = client.lease(ttl=10)

# Acquire lock
lock = client.lock("my-lock", ttl=10)
lock.acquire()
try:
    do_work()
finally:
    lock.release()
6.3 Comparison with ZooKeeper
| | ZooKeeper | etcd |
|---|---|---|
| Language | Java | Go |
| Consensus | ZAB | Raft |
| Operations | Complex | Relatively simple |
| Usage | Hadoop ecosystem | Kubernetes |
| API | Java-centric | gRPC |
Kubernetes uses etcd → etcd is increasingly becoming the standard.
7. Database-Based Locks
7.1 SELECT FOR UPDATE
BEGIN;
SELECT * FROM accounts WHERE id = 123 FOR UPDATE;
-- Hold the lock, other transactions wait
UPDATE accounts SET balance = balance - 100 WHERE id = 123;
COMMIT;
-- Release the lock
Pros:
- DB handles it
- Integrated with transactions
- Auto-release (commit/rollback)
Cons:
- DB load
- Performance degradation under lock contention
- Distributed transactions are difficult
7.2 Advisory Locks (PostgreSQL)
-- Acquire lock
SELECT pg_advisory_lock(12345);
-- Work
UPDATE ...;
-- Release lock
SELECT pg_advisory_unlock(12345);
Pros:
- Very fast
- In-memory lock (not a row lock)
- Auto-released when session ends
Use cases: background job synchronization, migrations.
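For background jobs that should simply skip a run when another instance already holds the lock, the non-blocking variant pg_try_advisory_lock fits well. A minimal sketch with psycopg2; the lock key and helper names are illustrative:

    import psycopg2

    JOB_LOCK_ID = 12345  # arbitrary application-chosen lock key

    def run_job_once(dsn):
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                # Non-blocking: returns true only if this session got the lock
                cur.execute("SELECT pg_try_advisory_lock(%s)", (JOB_LOCK_ID,))
                if not cur.fetchone()[0]:
                    return  # Another instance is already running the job
                do_work()
                cur.execute("SELECT pg_advisory_unlock(%s)", (JOB_LOCK_ID,))
        finally:
            conn.close()  # Ending the session also releases any advisory locks still held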
7.3 Optimistic Locking
-- 1. Read data and version
SELECT *, version FROM users WHERE id = 1;
-- version = 5
-- 2. Update after the work
UPDATE users SET name = 'New', version = version + 1
WHERE id = 1 AND version = 5;
-- If affected rows = 0, fail (someone else modified first)
Pros:
- No locks (no waiting)
- Good concurrency
- Distribution-friendly
Cons:
- Retry on conflict
- Inefficient under frequent conflicts
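The retry loop lives in application code. A minimal sketch against the users table above, assuming a psycopg2-style connection; the function name is illustrative:

    def rename_user(conn, user_id, new_name, max_retries=5):
        for _ in range(max_retries):
            with conn.cursor() as cur:
                # 1. Read the current value and its version
                cur.execute("SELECT name, version FROM users WHERE id = %s", (user_id,))
                _, version = cur.fetchone()

                # 2. Write back only if nobody bumped the version in the meantime
                cur.execute(
                    "UPDATE users SET name = %s, version = version + 1 "
                    "WHERE id = %s AND version = %s",
                    (new_name, user_id, version),
                )
                if cur.rowcount == 1:
                    conn.commit()
                    return True
                conn.rollback()  # Someone else won the race; retry with a fresh read
        return False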
8. Patterns and Pitfalls
8.1 Lock Renewal (Heartbeat)
For long-running work:
import threading
import time

class RenewingLock:
    def __init__(self, key, ttl=10):
        self.key = key
        self.token = None
        self.ttl = ttl
        self.stop_renewal = False

    def acquire(self):
        self.token = acquire_lock(self.key, self.ttl)
        if self.token:
            self.renewal_thread = threading.Thread(target=self._renew)
            self.renewal_thread.start()
        return self.token

    def _renew(self):
        while not self.stop_renewal:
            time.sleep(self.ttl / 3)
            redis.expire(self.key, self.ttl)

    def release(self):
        self.stop_renewal = True
        release_lock(self.key, self.token)
Problem: renewal cannot run during a GC pause → lock expires. Still not safe.
8.2 Backoff with Jitter
import random
import time

def acquire_with_retry(key, max_retries=10):
    for attempt in range(max_retries):
        if token := acquire_lock(key):
            return token
        # Exponential backoff + jitter
        delay = (2 ** attempt) * 0.1 + random.uniform(0, 0.5)
        time.sleep(delay)
    return None
Jitter matters: without it, all clients retry simultaneously — thundering herd.
8.3 Reentrant Lock
Allows the same client to re-enter.
class ReentrantLock:
    def __init__(self, key):
        self.key = key
        self.count = 0
        self.token = None

    def acquire(self):
        if self.count > 0:
            self.count += 1
            return True
        self.token = acquire_lock(self.key)
        if self.token:
            self.count = 1
            return True
        return False

    def release(self):
        self.count -= 1
        if self.count == 0:
            release_lock(self.key, self.token)
            self.token = None
8.4 Deadlock Avoidance
When using multiple locks:
# Wrong - deadlock possible
def transfer(from_id, to_id, amount):
    lock1 = acquire_lock(f"account:{from_id}")
    lock2 = acquire_lock(f"account:{to_id}")
    # ...

# Two users simultaneously transferring to each other:
# A → B: lock A, lock B
# B → A: lock B, lock A
# → Deadlock

# Correct - always acquire in sorted order
def transfer(from_id, to_id, amount):
    first, second = sorted([from_id, to_id])
    lock1 = acquire_lock(f"account:{first}")
    lock2 = acquire_lock(f"account:{second}")
    # ...
9. How to Avoid Distributed Locks
9.1 Single Ownership
Each task is processed by only one worker:
[Job Queue]
↓
[Worker 1] (jobs A, B)
[Worker 2] (jobs C, D)
[Worker 3] (jobs E, F)
Partitioning:
- Hash by user ID → which worker?
- Same user always goes to the same worker
- No locks needed
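A minimal sketch of the routing rule; the worker list and key names are illustrative:

    import hashlib

    WORKERS = ["worker-1", "worker-2", "worker-3"]

    def worker_for(user_id: str) -> str:
        # Stable hash so the same user always maps to the same worker;
        # that worker is the single owner of the user's jobs, so no lock is needed
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return WORKERS[int(digest, 16) % len(WORKERS)]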
9.2 Idempotency
def process_payment(payment_id, amount):
    # In practice the existence check and the mark should be a single atomic
    # set-if-absent step (e.g. SET NX), otherwise two workers can race here
    if db.exists(f"processed:{payment_id}"):
        return  # Already processed
    db.atomic_set(f"processed:{payment_id}", True)
    actually_charge(amount)
Safe to run multiple times → no locks needed.
9.3 Compare-and-Swap (CAS)
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 123 AND quantity > 0;
Atomic, no locks. If affected rows = 0, no inventory.
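In application code the only thing to check is the affected-row count. A short sketch assuming a psycopg2-style connection:

    def reserve_one_item(conn, product_id):
        with conn.cursor() as cur:
            # Atomic decrement guarded by the quantity check:
            # no lock and no read-modify-write race
            cur.execute(
                "UPDATE inventory SET quantity = quantity - 1 "
                "WHERE product_id = %s AND quantity > 0",
                (product_id,),
            )
            conn.commit()
            return cur.rowcount == 1  # False means out of stock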
9.4 Event Sourcing
Append events instead of mutating state:
events.append(OrderCreated(...))
events.append(InventoryReserved(...))
events.append(PaymentCharged(...))
Append-only, almost no locks. State is computed by replaying events.
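A minimal sketch of that replay step; the event classes mirror the ones above and the state fields are illustrative:

    def replay_order_state(events):
        # Fold the append-only log into the current state; nothing is
        # ever updated in place, so no lock is needed
        state = {"created": False, "inventory_reserved": False, "paid": False}
        for event in events:
            if isinstance(event, OrderCreated):
                state["created"] = True
            elif isinstance(event, InventoryReserved):
                state["inventory_reserved"] = True
            elif isinstance(event, PaymentCharged):
                state["paid"] = True
        return state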
9.5 Actor Model
Each actor owns its state and a mailbox.
class AccountActor extends Actor {
  var balance: BigDecimal = 0

  def receive = {
    case Deposit(amount)  => balance += amount
    case Withdraw(amount) => balance -= amount
  }
}
One message at a time → no locks needed.
Erlang/Elixir, Akka follow this model.
10. Recommendations
10.1 Decision Tree
Do you really need a distributed lock?
├─ No → use idempotency, CAS, partitioning
└─ Yes
   ├─ Efficiency goal (occasional duplicate work is tolerable)
   │  └─ Single Redis lock (simple)
   └─ Correctness goal (duplicate work is never acceptable)
      ├─ ZooKeeper / etcd
      └─ Fencing token (if storage supports it)
10.2 Kleppmann's Conclusion
"If efficiency is the goal, Redlock may be sufficient. If correctness is the goal, use consensus algorithms (ZAB, Raft)."
10.3 Practical Recommendations
Most cases:
- Simple Redis lock (SET NX EX)
- Short TTL
- Idempotency as backup
- Systems where "lock failure is not catastrophic"
Critical cases:
- ZooKeeper / etcd
- Fencing token
- Use transactions
- Guarantee correctness in code
Quiz
1. What is Redlock's core problem?
Answer: GC pauses and clock jumps. Martin Kleppmann's 2016 critique: (1) GC pause — Client A acquires a lock, then stalls 15s on GC, the lock expires, Client B acquires the lock, A wakes up thinking it still owns the lock → both work concurrently → data corruption. (2) Clock jump — NTP sync causes the clock to jump, expiring the lock early. Conclusion: Redlock is OK for efficiency, unsuitable for correctness. For correctness, use ZooKeeper/etcd plus a fencing token.
2. How does a Fencing Token guarantee safety?
Answer: The lock is issued together with a monotonically increasing token. When the lock holder writes to storage, it passes the token along. Storage validates the token — writes from old tokens are rejected. Scenario: Client A token=33 → GC pause → Client B token=34 → B writes to storage (last_token=34) → A wakes up and writes to storage (token=33) → storage sees 33 < 34 and rejects. Key requirement: storage must track the token. Most DBs do not natively support this, so extra work is needed.
3. Why are ZooKeeper ephemeral nodes good for distributed locks?
Answer: They are automatically deleted when the client disconnects. Scenario: the lock-holding client dies → ZooKeeper session times out → ephemeral node is auto-deleted → lock is released automatically → another client can immediately acquire it. No zombie locks. ZooKeeper is also a CP system (ZAB, Raft-like), preventing split-brain. Downside: if the client is alive but GCing, the same problem persists. Kubernetes uses etcd for similar reasons.
4. What are the limits of SELECT FOR UPDATE?
Answer: (1) DB load — DB resources are held for the duration of the lock, (2) Contention — many transactions locking the same row get serialized, (3) Deadlocks — occur when lock ordering differs, (4) Distributed transactions are hard — locks spanning multiple DBs are very hard (2PC), (5) Long-transaction risk — long transactions block other work. Alternatives: optimistic locking (version column), CAS, partitioning, idempotency. For simple cases, SELECT FOR UPDATE is the safest.
5. How can you avoid distributed locks?
Answer: Five patterns: (1) Single ownership — partition so the same key always goes to the same worker, no lock needed, (2) Idempotency — operations that are safe to run multiple times, (3) CAS (Compare-and-Swap) — the UPDATE ... WHERE version=? pattern, (4) Event Sourcing — append-only, no conflicts, (5) Actor Model — each actor processes one message at a time. Distributed locks are a last resort. Whenever possible, solve concurrency through the data model or algorithm.
References
- How to do distributed locking — Martin Kleppmann
- Is Redlock safe? — Antirez's response
- Redlock Spec — official
- ZooKeeper Lock Recipe
- etcd Concurrency
- Curator Framework — Java ZooKeeper client
- Redisson — Java Redis client with locks
- Designing Data-Intensive Applications — Martin Kleppmann
- PostgreSQL Advisory Locks
- Hazelcast — in-memory data grid
- Apache Curator Recipes