Distributed Lock Pattern Comparison: Redis Redlock vs ZooKeeper vs etcd — Consistency and Availability Trade-offs

Author: Youngju Kim (@fjvbn20031)

Contents
- Introduction
- Redis Single Instance Lock
- Redlock Algorithm
- Redlock Critique: Kleppmann vs Antirez Debate
- ZooKeeper Distributed Lock
- etcd Distributed Lock
- Three-Way Comparison Analysis
- Failure Cases and Recovery Procedures
- Operational Considerations
- Conclusion
- References
Introduction
In distributed systems, when multiple processes access the same resource concurrently, data consistency can break. Operations requiring mutual exclusion, such as inventory deduction, payment processing, and file writes, must be performed by only one process at a time. On a single server, a mutex or semaphore solves this; for processes spread across multiple servers, a Distributed Lock is needed.
Distributed lock use cases fall into two categories:
- Efficiency: To prevent duplicate execution of the same work. If a lock occasionally fails, only cost is wasted without data corruption.
- Correctness: To prevent data corruption from concurrent access. If a lock fails, data is corrupted, requiring much stricter guarantees.
For efficiency purposes, a Redis single instance lock is sufficient. However, for correctness purposes, a Fencing Token must accompany the lock, and this difference is the core of the Redis Redlock debate.
Redis Single Instance Lock
The simplest distributed lock uses Redis's SET NX PX command. The NX (Not eXists) option sets the key only when it doesn't exist, and PX specifies expiration in milliseconds.
Basic Implementation
```python
import redis
import uuid
import time

class RedisSimpleLock:
    """Redis single instance distributed lock"""

    def __init__(self, client: redis.Redis, resource: str, ttl_ms: int = 10000):
        self.client = client
        self.resource = resource
        self.ttl_ms = ttl_ms
        self.lock_value = str(uuid.uuid4())  # Ownership identifier

    def acquire(self, retry_count: int = 3, retry_delay_ms: int = 200) -> bool:
        """Attempt to acquire the lock, retrying on failure."""
        for attempt in range(retry_count):
            result = self.client.set(
                self.resource,
                self.lock_value,
                nx=True,
                px=self.ttl_ms,
            )
            if result:
                return True
            time.sleep(retry_delay_ms / 1000.0)
        return False

    def release(self) -> bool:
        """Release the lock after verifying ownership (via Lua script)."""
        lua_script = """
        if redis.call("GET", KEYS[1]) == ARGV[1] then
            return redis.call("DEL", KEYS[1])
        else
            return 0
        end
        """
        result = self.client.eval(lua_script, 1, self.resource, self.lock_value)
        return result == 1

# Usage example
client = redis.Redis(host="localhost", port=6379)
lock = RedisSimpleLock(client, "order:12345:lock", ttl_ms=5000)

if lock.acquire():
    try:
        # Perform critical section work
        print("Lock acquired, performing work...")
    finally:
        lock.release()
else:
    print("Failed to acquire lock")
```
Safe Lock Release with Lua Script
When releasing a lock, ownership must be verified. If GET and DEL are executed as separate commands, another client could acquire the lock between them. Lua scripts execute atomically in Redis, preventing this issue.
```lua
-- safe_unlock.lua
-- Safe lock release after ownership verification
-- KEYS[1]: lock key
-- ARGV[1]: owner identification value
-- Returns: 1 (success), 0 (ownership mismatch)
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
else
    return 0
end
```
The limitation of the single instance approach is clear. If the Redis master fails, lock information is lost. Even with failover to a replica, due to the asynchronous replication nature, if the master goes down before the lock key is replicated, two clients can simultaneously acquire the same lock.
Redlock Algorithm
Redis creator Salvatore Sanfilippo (antirez) proposed the Redlock algorithm to overcome single instance limitations. The core idea is to obtain majority consensus from N (typically 5) independent Redis master nodes.
Three-Step Algorithm
- Acquisition phase: Record the current time, then sequentially send SET NX PX commands to all N nodes. The timeout for each node is set much shorter than the total TTL.
- Validity check: If locks were acquired on a majority (N/2 + 1) of nodes, and the total elapsed time is less than the TTL, the acquisition succeeds. The remaining validity time is the TTL minus the elapsed time.
- Release phase: Send release commands unconditionally to all N nodes. Release is sent even to nodes where acquisition failed, to clean up partially set keys.
Redlock Python Implementation
```python
import redis
import uuid
import time
from typing import List, Optional, Tuple

class Redlock:
    """Redlock distributed lock algorithm implementation"""

    CLOCK_DRIFT_FACTOR = 0.01  # Clock drift correction factor
    RETRY_DELAY_MS = 200
    RETRY_COUNT = 3

    def __init__(self, nodes: List[dict], ttl_ms: int = 10000):
        self.nodes = [
            redis.Redis(host=n["host"], port=n["port"], socket_timeout=0.1)
            for n in nodes
        ]
        self.quorum = len(self.nodes) // 2 + 1
        self.ttl_ms = ttl_ms

    def _acquire_single(self, client: redis.Redis, resource: str,
                        value: str) -> bool:
        try:
            return bool(client.set(resource, value, nx=True, px=self.ttl_ms))
        except redis.RedisError:
            return False

    def _release_single(self, client: redis.Redis, resource: str,
                        value: str) -> None:
        lua = """
        if redis.call("GET", KEYS[1]) == ARGV[1] then
            return redis.call("DEL", KEYS[1])
        else
            return 0
        end
        """
        try:
            client.eval(lua, 1, resource, value)
        except redis.RedisError:
            pass

    def acquire(self, resource: str) -> Optional[Tuple[str, float]]:
        """Acquire the lock. Returns (lock_value, validity_time) on success."""
        for _ in range(self.RETRY_COUNT):
            lock_value = str(uuid.uuid4())
            start_time = time.monotonic()
            acquired_count = 0

            # Step 1: Attempt lock acquisition on all nodes
            for client in self.nodes:
                if self._acquire_single(client, resource, lock_value):
                    acquired_count += 1

            # Step 2: Validity check
            elapsed_ms = (time.monotonic() - start_time) * 1000
            drift = self.ttl_ms * self.CLOCK_DRIFT_FACTOR + 2
            validity_time = self.ttl_ms - elapsed_ms - drift

            if acquired_count >= self.quorum and validity_time > 0:
                return (lock_value, validity_time)

            # On failure, release on all nodes
            for client in self.nodes:
                self._release_single(client, resource, lock_value)
            time.sleep(self.RETRY_DELAY_MS / 1000.0)
        return None

    def release(self, resource: str, lock_value: str) -> None:
        """Release the lock on all nodes."""
        for client in self.nodes:
            self._release_single(client, resource, lock_value)

# Usage example
nodes = [
    {"host": "redis1.example.com", "port": 6379},
    {"host": "redis2.example.com", "port": 6379},
    {"host": "redis3.example.com", "port": 6379},
    {"host": "redis4.example.com", "port": 6379},
    {"host": "redis5.example.com", "port": 6379},
]
redlock = Redlock(nodes, ttl_ms=10000)

result = redlock.acquire("payment:order:99999")
if result:
    lock_value, validity_ms = result
    try:
        print(f"Lock acquired. Valid for: {validity_ms:.0f}ms")
        # Perform critical section work
    finally:
        redlock.release("payment:order:99999", lock_value)
```
Using time.monotonic() is important here. System time (time.time()) can go backward due to NTP corrections, but a monotonic clock always moves forward. The CLOCK_DRIFT_FACTOR also compensates for clock drift between nodes.
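As a quick illustration of the difference, the elapsed-time measurement used above can be isolated into a small helper. This is a minimal sketch; the helper name and the sleep duration are mine, chosen for demonstration:

```python
import time

def measure_elapsed_ms(fn) -> float:
    """Measure the wall-clock duration of fn using the monotonic clock.

    time.monotonic() is unaffected by NTP adjustments or manual clock
    changes, so the measured interval can never be negative.
    """
    start = time.monotonic()
    fn()
    return (time.monotonic() - start) * 1000.0

elapsed = measure_elapsed_ms(lambda: time.sleep(0.05))
print(f"elapsed: {elapsed:.1f}ms")  # roughly 50ms on an idle machine
```

Had `time.time()` been used instead, a backward NTP step during the measured interval could yield a negative or nonsensical validity window.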
Redlock Critique: Kleppmann vs Antirez Debate
In 2016, Martin Kleppmann identified fundamental problems with the Redlock algorithm in his article "How to do distributed locking." This debate became one of the most famous technical discussions in the distributed systems community.
Kleppmann's Core Critique
1. Timing Assumption Risks
Redlock relies on the timing assumption that processes complete quickly without interruption. But in reality:
- Client A successfully acquires locks on 3 of 5 nodes
- A GC (Garbage Collection) pause occurs, freezing the process for tens of seconds
- Meanwhile, the TTL expires and the lock is released
- Client B acquires the same lock and performs work
- Client A returns from GC and (believing it still holds the lock) performs work
- Both clients perform critical section work simultaneously, causing data corruption
2. Absence of Fencing Token
Kleppmann argued that safe distributed locks must have a Fencing Token. A Fencing Token is a monotonically increasing number transmitted along with resource access (e.g., database). When the resource side rejects requests with tokens lower than previously seen ones, delayed writes from expired lock holders can be safely blocked. Redlock lacks a mechanism to generate such Fencing Tokens.
3. Network Delay and Clock Jumps
If system clocks suddenly jump due to NTP synchronization failure or VM migration, TTL calculations become invalid. Redlock assumes that clocks between nodes are roughly synchronized, but this cannot be guaranteed in asynchronous distributed systems.
Antirez's Response
Salvatore Sanfilippo responded in his article "Is Redlock safe?" with the following counterarguments:
- The GC pause scenario is not specific to Redlock but applies to all distributed lock systems
- In reasonable operational environments, clock drift is limited and can be sufficiently corrected with CLOCK_DRIFT_FACTOR
- Fencing Tokens require support from the resource side, and if the resource supports them, it could potentially handle concurrency control on its own
Comparison Table: Redlock Debate Summary
| Point | Kleppmann (Critique) | Antirez (Defense) |
|---|---|---|
| Timing assumptions | Timing assumptions are dangerous in async systems | Sufficiently valid in reasonable operational environments |
| GC pause | Process can be suspended during lock validity | A common problem for all distributed systems |
| Fencing Token | Redlock cannot generate them, essential | If resource supports them, the lock itself may be unnecessary |
| Clock sync | Clock jump risk during NTP failure | Addressable with drift correction factor |
| Recommended use | Efficiency only, unsuitable for correctness | Sufficiently safe for most practical scenarios |
In the author's view, standalone Redlock use is not recommended when correctness is important. It's safer to use it with storage that supports Fencing Tokens, or to choose consensus-based systems like ZooKeeper or etcd.
ZooKeeper Distributed Lock
Apache ZooKeeper is a dedicated service designed for distributed system coordination. Through the Zab (ZooKeeper Atomic Broadcast) protocol, it guarantees Linearizability and provides built-in primitives needed for distributed lock implementation.
Ephemeral Sequential Node Pattern
ZooKeeper's distributed lock uses Ephemeral Sequential Znodes:
- The client creates an ephemeral sequential node under the lock path, e.g. /locks/resource-name/.
- ZooKeeper automatically assigns a sequence number (e.g., lock-0000000001).
- If the created node has the smallest number, the lock is acquired.
- Otherwise, a Watch is set on the immediately preceding node and the client waits.
- When the client session ends, the ephemeral node is automatically deleted, releasing the lock.
Setting a Watch only on the immediately preceding node is key. If all waiters Watch the smallest node, a Herd Effect occurs when the lock is released, as notifications are sent to all waiters simultaneously.
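The predecessor-selection logic can be sketched independently of any client library. The helper below is illustrative (its name is not part of any ZooKeeper API): given the children of the lock path, it picks the single node a waiter should watch:

```python
from typing import List, Optional

def node_to_watch(children: List[str], my_node: str) -> Optional[str]:
    """Return the znode this waiter should watch, or None if it holds the lock.

    Watching only the immediate predecessor means each release wakes
    exactly one waiter, avoiding the herd effect.
    """
    ordered = sorted(children)
    index = ordered.index(my_node)
    if index == 0:
        return None  # smallest sequence number: we hold the lock
    return ordered[index - 1]

children = ["lock-0000000003", "lock-0000000001", "lock-0000000002"]
print(node_to_watch(children, "lock-0000000001"))  # None (holds the lock)
print(node_to_watch(children, "lock-0000000003"))  # lock-0000000002
```

Sorting works because ZooKeeper zero-pads sequence numbers, so lexicographic order matches numeric order.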
ZooKeeper Distributed Lock Python Implementation
```python
from kazoo.client import KazooClient
from kazoo.recipe.lock import Lock
import logging

logging.basicConfig(level=logging.INFO)

class ZooKeeperDistributedLock:
    """ZooKeeper-based distributed lock (using the Kazoo library)"""

    def __init__(self, hosts: str, lock_path: str):
        self.zk = KazooClient(hosts=hosts)
        self.zk.start()
        self.lock = Lock(self.zk, lock_path)
        self.lock_path = lock_path

    def acquire(self, timeout: float = 30.0) -> bool:
        """Acquire the lock. Returns False if not acquired within timeout seconds."""
        try:
            return self.lock.acquire(timeout=timeout)
        except Exception as e:
            logging.error(f"Lock acquisition failed: {e}")
            return False

    def release(self) -> None:
        """Release the lock."""
        try:
            self.lock.release()
        except Exception as e:
            logging.error(f"Lock release failed: {e}")

    def get_fencing_token(self) -> int:
        """Use the zxid of the acquired lock node as a Fencing Token.

        Kazoo's Lock stores the name of the znode it created in
        self.lock.node; that node's creation zxid (czxid) is unique to
        this acquisition and monotonically increasing across acquisitions.
        (Using the czxid of the parent lock path would return the same
        value for every holder, which cannot serve as a fencing token.)
        """
        node_path = f"{self.lock_path}/{self.lock.node}"
        data, stat = self.zk.get(node_path)
        return stat.czxid  # Creation transaction ID (monotonically increasing)

    def close(self) -> None:
        self.zk.stop()
        self.zk.close()

# Usage example
zk_lock = ZooKeeperDistributedLock(
    hosts="zk1:2181,zk2:2181,zk3:2181",
    lock_path="/locks/payment/order-99999"
)

if zk_lock.acquire(timeout=10.0):
    try:
        fencing_token = zk_lock.get_fencing_token()
        logging.info(f"Lock acquired, fencing token: {fencing_token}")
        # Write to storage with fencing_token
        # storage.write(data, fencing_token=fencing_token)
    finally:
        zk_lock.release()
zk_lock.close()
```
ZooKeeper's strength is that zxid (ZooKeeper Transaction ID) can be used as a Fencing Token. Since zxid is globally monotonically increasing, if the storage side rejects writes with previous zxids, delayed writes from expired lock holders can be safely blocked.
Read-Write Lock Recipe
ZooKeeper supports not only exclusive locks but also Read-Write Locks. Read lock nodes use the read- prefix, and write lock nodes use the write- prefix. A read lock can be acquired when there are no write nodes ahead, and a write lock can only be acquired when it has the smallest number. This allows higher read concurrency while ensuring write mutual exclusion.
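As a rough sketch of the recipe's decision rules, assuming the read-/write- node-name prefixes described above (the helper functions and names are mine, not a ZooKeeper API):

```python
from typing import List

def _seq(node: str) -> str:
    """Zero-padded sequence suffix of a lock node name."""
    return node.rsplit("-", 1)[-1]

def can_acquire_read(children: List[str], my_node: str) -> bool:
    """A read lock is granted when no write- node has a smaller sequence."""
    for node in sorted(children, key=_seq):
        if node == my_node:
            return True
        if node.startswith("write-"):
            return False
    return True

def can_acquire_write(children: List[str], my_node: str) -> bool:
    """A write lock is granted only with the smallest sequence number."""
    return sorted(children, key=_seq)[0] == my_node

children = ["read-0000000001", "write-0000000002", "read-0000000003"]
print(can_acquire_read(children, "read-0000000001"))    # True: no write ahead
print(can_acquire_read(children, "read-0000000003"))    # False: write ahead
print(can_acquire_write(children, "write-0000000002"))  # False: read ahead
```

Multiple adjacent readers are granted together, while a writer waits until every earlier node (reader or writer) is gone.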
etcd Distributed Lock
etcd is a distributed key-value store widely known as the state store for Kubernetes. Based on the Raft consensus algorithm, it guarantees Strong Consistency and provides Lease and Revision mechanisms suitable for distributed lock implementation.
Lease-based TTL Management
etcd's Lease is a temporary token with TTL. When a key-value pair is attached to a Lease, the key is also deleted when the Lease expires. The client periodically calls KeepAlive to renew the Lease, and if the client fails, renewal stops and the lock is automatically released.
Automatic Fencing Token Generation via Revision Numbers
Every key change in etcd is assigned a globally monotonically increasing Revision number. The ability to directly use this Revision as a Fencing Token is a major advantage of etcd distributed locks. No separate token generation mechanism is needed.
etcd Distributed Lock Python Implementation
```python
import etcd3
import threading
import logging
from typing import Optional, Tuple

logging.basicConfig(level=logging.INFO)

class EtcdDistributedLock:
    """etcd Lease-based distributed lock"""

    def __init__(self, host: str = "localhost", port: int = 2379,
                 ttl: int = 10):
        self.client = etcd3.client(host=host, port=port)
        self.ttl = ttl
        self.lease: Optional[etcd3.Lease] = None
        self._keepalive_thread: Optional[threading.Thread] = None
        self._stop_keepalive = threading.Event()

    def acquire(self, lock_key: str,
                timeout: float = 30.0) -> Optional[Tuple[int, int]]:
        """
        Acquire the lock. Returns (revision, lease_id) on success.
        The revision can be used as a Fencing Token.
        """
        self.lease = self.client.lease(self.ttl)
        self.lock_key = lock_key

        # Atomic lock acquisition via a transaction:
        # create the key only when it doesn't exist (compare-and-swap)
        success, responses = self.client.transaction(
            compare=[
                self.client.transactions.create(lock_key) == 0
            ],
            success=[
                self.client.transactions.put(
                    lock_key, "locked", lease=self.lease
                )
            ],
            failure=[]
        )
        if success:
            # Use the revision as a Fencing Token
            revision = responses[0].header.revision
            self._start_keepalive()
            logging.info(
                f"Lock acquired. revision (fencing token): {revision}"
            )
            return (revision, self.lease.id)

        logging.warning("Lock acquisition failed: already held by another client")
        self.lease.revoke()
        return None

    def _start_keepalive(self) -> None:
        """Start the Lease auto-renewal thread."""
        self._stop_keepalive.clear()

        def keepalive_loop():
            while not self._stop_keepalive.is_set():
                try:
                    self.lease.refresh()
                except Exception as e:
                    logging.error(f"Lease renewal failed: {e}")
                    break
                self._stop_keepalive.wait(timeout=self.ttl / 3.0)

        self._keepalive_thread = threading.Thread(
            target=keepalive_loop, daemon=True
        )
        self._keepalive_thread.start()

    def release(self) -> None:
        """Release the lock."""
        self._stop_keepalive.set()
        if self.lease:
            try:
                self.lease.revoke()
                logging.info("Lock released")
            except Exception as e:
                logging.error(f"Lock release failed: {e}")

    def close(self) -> None:
        self.release()
        self.client.close()

# Usage example
lock = EtcdDistributedLock(host="etcd1.example.com", ttl=15)
result = lock.acquire("locks/payment/order-99999")
if result:
    fencing_token, lease_id = result
    try:
        # Write to storage with fencing_token (revision)
        logging.info(f"Performing work, fencing token: {fencing_token}")
    finally:
        lock.release()
```
Jepsen Test Results and Cautions
In the Jepsen etcd 3.4.3 test, it was confirmed that etcd locks may not be safe in certain network partition scenarios. Specifically, if Lease renewal is delayed during leader changes, the client may still believe it holds the lock while the Lease has actually expired and another client has acquired it. Therefore, etcd locks must also be used with Fencing Tokens.
Three-Way Comparison Analysis
Core Comparison Table
| Item | Redis Redlock | ZooKeeper | etcd |
|---|---|---|---|
| Consensus algorithm | None (independent node majority) | Zab (Atomic Broadcast) | Raft |
| Consistency model | Eventual consistency approximation | Linearizable | Linearizable |
| Fencing Token | Not supported | zxid available | Revision available |
| Fault tolerance | Up to N/2 node failures | Up to N/2 node failures | Up to N/2 node failures |
| Lock release mechanism | TTL expiration | Session expiry + ephemeral node deletion | Lease expiration |
| Performance (acquisition latency) | Very fast (1-5ms) | Medium (5-20ms) | Medium (5-15ms) |
| Throughput | High (10K+ ops/s) | Medium (1-5K ops/s) | Medium (2-8K ops/s) |
| Operational complexity | Low | High (dedicated ensemble) | Medium (already present in K8s) |
| Watch/notification | Pub/Sub (no guarantee) | Watch (ordered) | Watch (Revision-based) |
| Client ecosystem | Very rich | Rich | Rich (especially Go ecosystem) |
Selection Guide by Use Case
Choose Redis Redlock when:
- Preventing duplicate work for efficiency (cache warming, batch jobs, etc.)
- Fast millisecond-level lock acquisition is needed
- Redis is already in use and you don't want to add infrastructure
- Occasional double execution is acceptable
Choose ZooKeeper when:
- Correctness is paramount and Fencing Token is essential
- Various coordination needs like leader election, config management
- Already operating ZooKeeper-dependent systems (Hadoop, Kafka, etc.)
- Complex lock patterns like Read-Write Lock are needed
Choose etcd when:
- Already operating etcd in a Kubernetes environment
- Want to easily leverage Fencing Token (Revision)
- Prefer gRPC-based API and Go ecosystem
- Want strong consistency with relatively low operational burden
Cost-Complexity Matrix
| Criteria | Redis Redlock | ZooKeeper | etcd |
|---|---|---|---|
| Infrastructure cost | Low (reuse Redis) | High (dedicated cluster) | Medium (can be included in K8s) |
| Learning curve | Low | High | Medium |
| Consistency guarantee | Weak | Strong | Strong |
| Debugging ease | High | Medium | Medium |
| Community support | Very active | Mature | Growing |
Failure Cases and Recovery Procedures
Case 1: Dual Lock Acquisition During GC Pause with TTL Expiration
Scenario:
```
Time --------------------------------------------------------------------->
Client A: [Lock acquired] -- [GC pause starts] ..(TTL expires).. [GC ends, attempts write]
Client B:                                      [Lock acquired] -- [Write complete]
```
Result: Both A and B write, causing data corruption.
This scenario can occur not only with Redis Redlock but with all TTL-based locks. The defense pattern is to use Fencing Tokens:
```python
import threading
import logging

class FencingAwareStorage:
    """Storage wrapper that validates Fencing Tokens"""

    def __init__(self):
        self.last_token = 0
        self._lock = threading.Lock()

    def write(self, data: dict, fencing_token: int) -> bool:
        """Allow writes only when fencing_token is greater than the last seen value."""
        with self._lock:
            if fencing_token <= self.last_token:
                logging.warning(
                    f"Rejected: token {fencing_token} <= "
                    f"last {self.last_token}"
                )
                return False
            self.last_token = fencing_token
            # Perform the actual write
            self._do_write(data)
            logging.info(f"Write successful: token {fencing_token}")
            return True

    def _do_write(self, data: dict) -> None:
        # Actual storage write logic
        pass
```
Case 2: Lock Loss Due to ZooKeeper Session Expiry
Scenario:
When the connection between ZooKeeper client and ensemble is severed due to network partition, after session timeout the ephemeral node is deleted and the lock is released. Meanwhile, the client may still be performing work.
Defense pattern:
```python
import logging
from kazoo.client import KazooClient, KazooState

def connection_listener(state):
    """ZooKeeper connection state monitoring.

    pause_current_operations, abort_current_operations, and
    reacquire_lock are application-specific hooks.
    """
    if state == KazooState.SUSPENDED:
        # Connection suspended: pause ongoing work
        logging.warning("ZK connection suspended - pausing operations")
        pause_current_operations()
    elif state == KazooState.LOST:
        # Session expired: lock loss confirmed, stop work immediately
        logging.error("ZK session expired - lock lost, aborting operations")
        abort_current_operations()
    elif state == KazooState.CONNECTED:
        # Reconnection successful: attempt lock reacquisition
        logging.info("ZK reconnected - attempting lock reacquisition")
        reacquire_lock()

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.add_listener(connection_listener)
zk.start()
```
Recovery Procedure Checklist
- Immediate response: Stop current work immediately upon detecting lock loss
- State verification: Check data consistency of partially completed work
- Idempotency design: Design work to be idempotent ensuring same results on retry
- Compensating transactions: Execute compensation logic to revert partially completed state
- Alert dispatch: Notify operations team of potential double execution
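The idempotency item in the checklist can be sketched with an idempotency-key guard. In production the processed-key store would live in durable storage (e.g., a database table with a unique constraint); the in-memory version below is purely illustrative:

```python
import threading
from typing import Any, Callable, Dict

class IdempotentExecutor:
    """Run each logical operation at most once, keyed by an idempotency key.

    Retries after a lock loss or crash become safe: a repeated key
    returns the recorded result instead of re-executing the work.
    """
    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def execute(self, key: str, work: Callable[[], Any]) -> Any:
        with self._lock:
            if key in self._results:
                return self._results[key]  # duplicate call: cached result
            result = work()
            self._results[key] = result
            return result

executor = IdempotentExecutor()
print(executor.execute("order-99999", lambda: "charged"))        # runs the work
print(executor.execute("order-99999", lambda: "charged-twice"))  # returns cached "charged"
```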
Operational Considerations
Lock Granularity and Scope Design
Lock scope should be designed as narrow as possible. Broad-scoped locks increase contention and reduce throughput.
```
Bad:    /locks/orders              (single lock for all orders)
Medium: /locks/orders/user-123     (per-user lock)
Good:   /locks/orders/99999        (per-order lock)
```
Deadlock Detection and Timeout Strategy
In distributed environments, deadlocks can occur when two processes wait for each other's locks. Prevention strategies include:
- Fixed-order acquisition: When locks on multiple resources are needed, always acquire in the same order
- Timeout setting: Set timeouts on all lock acquisitions to prevent infinite waiting
- Lock hierarchies: Acquire locks from parent resources to child resources in order
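The fixed-order rule can be enforced mechanically by sorting resource keys before acquisition. The sketch below assumes caller-supplied acquire/release callables (placeholders, not any specific lock client):

```python
from typing import Callable, List

def acquire_in_order(resources: List[str],
                     acquire: Callable[[str], bool],
                     release: Callable[[str], None]) -> bool:
    """Acquire locks on several resources in a globally fixed (sorted) order.

    If any acquisition fails, already-held locks are released in reverse
    order and False is returned, so no partial hold remains to deadlock on.
    """
    held: List[str] = []
    for resource in sorted(resources):
        if acquire(resource):
            held.append(resource)
        else:
            for r in reversed(held):
                release(r)
            return False
    return True

acquired: List[str] = []
ok = acquire_in_order(["b", "a", "c"],
                      lambda r: acquired.append(r) or True,
                      lambda r: None)
print(ok, acquired)  # True ['a', 'b', 'c']
```

Because every process traverses the same sorted order, no cycle of mutual waiting can form.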
Monitoring Metrics
Metrics that must be collected when operating distributed locks:
| Metric | Description | Danger Threshold |
|---|---|---|
| lock_acquisition_time_p99 | Lock acquisition latency P99 | Exceeds 50% of TTL |
| lock_contention_rate | Lock contention rate (fails/total) | Exceeds 30% |
| lock_hold_duration_p99 | Lock hold duration P99 | Exceeds 80% of TTL |
| lock_timeout_rate | Timeout occurrence rate | Exceeds 5% |
| fencing_token_reject_rate | Fencing Token rejection rate | Investigate immediately if > 0 |
If fencing_token_reject_rate is greater than 0, a stale lock holder attempted a write, meaning two clients believed they held the lock at the same time. The rejection prevented corruption, but the underlying double acquisition must be investigated, so this metric requires the most urgent response.
Conclusion
Distributed locks are not simple API calls but architecture decisions about understanding and designing the trade-off between consistency and availability.
- For efficiency purposes, a Redis single instance lock is sufficient: simple and fast.
- If you want efficiency but are concerned about a Redis single point of failure, consider Redlock; keep in mind that its consistency guarantees are limited.
- When correctness matters, choose ZooKeeper or etcd and always pair the lock with Fencing Tokens.
- Regardless of implementation, Fencing Tokens, idempotent design, and monitoring are the three pillars of a complete distributed lock setup.
No perfect distributed lock exists. What matters is clearly defining the guarantee level your system requires, choosing the appropriate implementation, and having defense strategies for failure scenarios.
References
- Distributed Locks with Redis - Redis Official Documentation
- How to do distributed locking - Martin Kleppmann
- Is Redlock safe? - antirez
- ZooKeeper Recipes - Apache Official Documentation
- etcd Lock Tutorial - etcd Official Documentation
- Jepsen: etcd 3.4.3 Analysis
- Distributed Locking Practical Guide - Architecture Weekly