Distributed Locks Complete Guide 2025: Redlock, Zookeeper, etcd, Fencing Token, Safe Implementation

TL;DR

  • Distributed locks are genuinely hard: completely different from single-machine mutexes
  • Single Redis lock: simple but single point of failure
  • Redlock: Redis cluster-based, contested
  • Fencing Token: guarantees true lock safety
  • Zookeeper/etcd: safer but more complex
  • Conclusion: avoid distributed locks when possible

1. Why Are Distributed Locks Hard?

1.1 Single-Machine Mutex

lock = threading.Lock()

with lock:
    critical_section()

How it works:

  • Lock variable in memory
  • Atomic CPU instructions
  • Managed by a single OS kernel

Guarantees:

  • Only one thread at a time
  • Automatic release (GC, death)
  • Very fast (~10ns)

1.2 Differences in a Distributed Environment

[Server A] ←─ network ─→ [Lock Service] ←─ network ─→ [Server B]

Problems:

  1. Network latency: several ms
  2. Network partitions: packet loss
  3. Clock drift: clocks differ between machines
  4. GC pauses: JVM may stall for 1 second
  5. Failures: what if the lock holder dies?
  6. Partial failure: only some nodes die

All assumptions of a single machine break down.

1.3 Two Purposes of Distributed Locks

1. Efficiency:

  • Avoid doing the same work twice
  • Cost savings
  • Lock failure is not a big deal (temporary inefficiency)

2. Correctness:

  • Guarantee data consistency
  • Lock failure = data corruption
  • Requires very strict guarantees

Most people want efficiency but mistakenly believe they need correctness guarantees.


2. Single Redis Lock

2.1 The Simplest Attempt

from redis import Redis

redis = Redis()  # shared client, reused by the examples below

def acquire_lock(key, ttl=10):
    # SET with NX and EX in one atomic command
    return redis.set(key, "locked", nx=True, ex=ttl)

def release_lock(key):
    redis.delete(key)

# Usage
if acquire_lock("my-resource"):
    try:
        do_work()
    finally:
        release_lock("my-resource")

SET key value NX EX ttl:

  • NX: only if the key does not exist
  • EX: set TTL

Atomic: Redis guarantees it.

2.2 First Problem — Releasing Another Client's Lock

# Client A acquires the lock
acquire_lock("my-resource")
# Client A is working...
# Work takes 30 seconds (exceeds 10s TTL)
# Lock expires automatically

# Client B acquires the lock
acquire_lock("my-resource")

# Client A finishes and releases the lock
release_lock("my-resource")
# → Releases Client B's lock!

# Client C can now acquire the lock
# Client B and C work concurrently!

2.3 Fix — Lock Identifier

import uuid

def acquire_lock(key, ttl=10):
    token = str(uuid.uuid4())
    if redis.set(key, token, nx=True, ex=ttl):
        return token
    return None

def release_lock(key, token):
    # Atomicity via Lua script
    script = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('del', KEYS[1])
    else
        return 0
    end
    """
    redis.eval(script, 1, key, token)

Improvement: only releases your own lock.

2.4 Second Problem — Single Point of Failure

[Redis] ← if it dies, all locks are lost
[Server A] [Server B] [Server C]

Replication:

[Master Redis]
async replication
[Slave Redis]

Problem: async replication.

  1. Client A acquires lock on master
  2. Master dies (before replication)
  3. Slave is promoted to master (no lock)
  4. Client B acquires the lock
  5. Both hold the lock → data corruption

3. Redlock Algorithm

3.1 The Answer from Salvatore Sanfilippo (Creator of Redis)

Use multiple independent Redis instances. Must acquire the lock on a majority to succeed.

[Redis 1] [Redis 2] [Redis 3] [Redis 4] [Redis 5]
  ↓        ↓         ↓         ↓         ↓
[Client] → try SET NX EX on all
3/5 or more successes = lock acquired

3.2 Algorithm

import time
import uuid

def acquire_redlock(key, ttl=10000):
    token = str(uuid.uuid4())
    quorum = len(redis_clients) // 2 + 1  # e.g. 3 of 5
    
    start_time = time.time() * 1000
    acquired = []
    
    for client in redis_clients:
        try:
            # Each call should use a very short network timeout
            if client.set(key, token, nx=True, px=ttl):
                acquired.append(client)
        except Exception:
            pass  # an unreachable instance counts as a failed vote
    
    elapsed = time.time() * 1000 - start_time
    
    # Majority reached, and the lock is still valid for the remaining TTL
    if len(acquired) >= quorum and elapsed < ttl:
        return token  # Lock acquired
    else:
        # Failure - release whatever was acquired
        for client in acquired:
            release_lock(client, key, token)
        return None

3.3 Redlock's Assumptions

  1. Clocks are reasonably synchronized
  2. GC pauses are short
  3. Network latency < TTL

3.4 Martin Kleppmann's Critique (2016)

The famous blog post "How to do distributed locking":

1. GC Pause Problem:

1. Client A acquires lock (10s TTL)
2. Client A stalls 15s on GC (lock expires)
3. Client B acquires lock
4. Client A wakes up → thinks it still holds the lock
5. Client A and B work concurrently

2. Clock Jump:

1. Client A acquires lock (10s TTL)
2. Redis clock suddenly jumps 5s (NTP sync)
3. Lock expires early
4. Client B acquires lock

Kleppmann's conclusion: Redlock is unsuitable for correctness-critical systems.

3.5 Antirez's Rebuttal

Salvatore Sanfilippo's response:

  • Use monotonic clocks (no jumps)
  • GC pauses are a universal problem
  • Good enough for efficiency purposes

Outcome of the debate: never formally settled, though Kleppmann's position is the more widely accepted one.


4. Fencing Token

4.1 Core Idea

Lock + monotonically increasing token.

1. Client A acquires lock, token = 33
2. Client A stalls on GC
3. Client B acquires lock, token = 34
4. Client B writes to storage (token 34)
5. Client A wakes up, writes to storage (token 33)
6. Storage sees 33 < 34 → rejects

4.2 Implementation

def acquire_lock_with_token(key):
    # INCR yields a monotonically increasing counter. Note that INCR and
    # SET are two separate commands here, so in unlucky interleavings the
    # current holder can end up with an older token than a failed
    # contender; a Lua script can make the pair atomic.
    token = redis.incr(f"{key}:token")
    if redis.set(key, token, nx=True, ex=10):
        return token
    return None

def write_to_storage(data, token):
    # Storage (a hypothetical interface) validates the token on every write
    if storage.last_token >= token:
        raise StaleTokenError()
    storage.write(data)
    storage.last_token = token

4.3 Key Requirement

Storage must validate the token:

  • DB tracks the token
  • Rejects old tokens

Most storage systems do not support this → extra work needed.
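
As a sketch of what that extra work can look like: with a relational store, the check collapses into one conditional UPDATE (sqlite3-style placeholders; the resource table schema is assumed for illustration):

def fenced_write(conn, resource_id, data, token):
    # The database rejects stale writers atomically: the UPDATE matches
    # only if our token is newer than the last one it recorded
    cur = conn.execute(
        "UPDATE resource SET data = ?, last_token = ? "
        "WHERE id = ? AND last_token < ?",
        (data, token, resource_id, token),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise StaleTokenError()  # e.g. token 33 arriving after 34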

4.4 Where Is It Used?

  • HBase: fencing combined with ZooKeeper
  • Apache Cassandra: lightweight transactions
  • PostgreSQL: row versions (optimistic locking)

5. ZooKeeper-Based Locks

5.1 ZooKeeper's Strengths

  • CP system (consistency first)
  • Linearizable
  • Ephemeral nodes (auto-deleted when client dies)
  • Watches (change notifications)

5.2 Lock Implementation

Recipe: sequential ephemeral nodes.

import threading

def acquire_lock(zk, lock_path):
    # 1. Create a sequential ephemeral node (kazoo-style client assumed)
    my_node = zk.create(
        f"{lock_path}/lock-",
        ephemeral=True,
        sequence=True
    )
    # e.g. /lock/lock-0000000005
    my_name = my_node.split('/')[-1]
    
    while True:
        # 2. Get all child nodes, ordered by sequence number
        children = sorted(zk.get_children(lock_path))
        
        # 3. If my node is the smallest, I hold the lock
        if children[0] == my_name:
            return my_node
        
        # 4. Otherwise watch only the node right before mine
        #    (waking a single waiter avoids a thundering herd)
        event = threading.Event()
        prev_node = children[children.index(my_name) - 1]
        if zk.exists(f"{lock_path}/{prev_node}", watch=lambda _: event.set()):
            event.wait()

5.3 ZooKeeper Advantages

1. Automatic release:

  • Ephemeral node is auto-deleted on client death
  • No zombie locks

2. Accurate ordering:

  • Sequence numbers guarantee fairness

3. Network partition safety:

  • ZooKeeper prevents split-brain
  • If the lock holder is partitioned away from the quorum, its session expires and the lock is released safely

5.4 ZooKeeper Disadvantages

  • Operationally complex: separate cluster
  • Performance: slower than Redis
  • GC pauses still apply: a client that pauses past its session timeout may wake up believing it still holds the lock

5.5 Curator Framework (Java)

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;

// client: a started CuratorFramework instance
InterProcessMutex lock = new InterProcessMutex(client, "/my-lock");
if (lock.acquire(10, TimeUnit.SECONDS)) {
    try {
        // work
    } finally {
        lock.release();
    }
}

Curator abstracts away ZooKeeper's complexity.


6. etcd-Based Locks

6.1 Characteristics of etcd

  • CP system (Raft-based)
  • Linearizable
  • Lease (similar to TTL)
  • Watches

6.2 Lock Implementation

import etcd3

client = etcd3.client()

# Leases are etcd's TTL mechanism; keys attached to a lease expire with it
lease = client.lease(ttl=10)
client.put("my-key", "value", lease=lease)

# The built-in lock recipe manages its own lease internally
lock = client.lock("my-lock", ttl=10)
lock.acquire()

try:
    do_work()
finally:
    lock.release()

6.3 Comparison with ZooKeeper

             ZooKeeper           etcd
Language     Java                Go
Consensus    ZAB                 Raft
Operations   Complex             Relatively simple
Usage        Hadoop ecosystem    Kubernetes
API          Java-centric        gRPC

Kubernetes uses etcd → etcd is increasingly becoming the standard.


7. Database-Based Locks

7.1 SELECT FOR UPDATE

BEGIN;
SELECT * FROM accounts WHERE id = 123 FOR UPDATE;
-- Hold the lock, other transactions wait
UPDATE accounts SET balance = balance - 100 WHERE id = 123;
COMMIT;
-- Release the lock

Pros:

  • DB handles it
  • Integrated with transactions
  • Auto-release (commit/rollback)

Cons:

  • DB load
  • Performance degradation under lock contention
  • Distributed transactions are difficult

7.2 Advisory Locks (PostgreSQL)

-- Acquire lock
SELECT pg_advisory_lock(12345);

-- Work
UPDATE ...;

-- Release lock
SELECT pg_advisory_unlock(12345);

Pros:

  • Very fast
  • In-memory lock (not a row lock)
  • Auto-released when session ends

Use cases: background job synchronization, migrations.
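
A sketch of the background-job use case, assuming the psycopg2 driver and a hypothetical run_migration; pg_try_advisory_lock returns immediately instead of blocking:

import psycopg2  # assumed PostgreSQL driver

def run_exclusive_job(dsn, lock_key=12345):
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # Non-blocking variant: False if another session holds the lock
            cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key,))
            if not cur.fetchone()[0]:
                return False  # another instance is already running the job
            try:
                run_migration(conn)  # hypothetical job body
            finally:
                cur.execute("SELECT pg_advisory_unlock(%s)", (lock_key,))
        return True
    finally:
        conn.close()  # ending the session also releases advisory locks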

7.3 Optimistic Locking

-- 1. Read data and version
SELECT *, version FROM users WHERE id = 1;
-- version = 5

-- 2. Update after the work
UPDATE users SET name = 'New', version = version + 1
WHERE id = 1 AND version = 5;
-- If affected rows = 0, fail (someone else modified first)

Pros:

  • No locks (no waiting)
  • Good concurrency
  • Distribution-friendly

Cons:

  • Retry on conflict
  • Inefficient under frequent conflicts
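
Since a conflict surfaces only as "affected rows = 0", the caller needs a retry loop. A minimal sketch (sqlite3-style API; table and columns from the example above):

def rename_user(conn, user_id, new_name, max_retries=5):
    for _ in range(max_retries):
        # 1. Read the current version
        (version,) = conn.execute(
            "SELECT version FROM users WHERE id = ?", (user_id,)
        ).fetchone()

        # 2. Write back only if nobody bumped the version in the meantime
        cur = conn.execute(
            "UPDATE users SET name = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_name, user_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # our update won
        # rowcount == 0: a concurrent writer got there first; retry
    return False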

8. Patterns and Pitfalls

8.1 Lock Renewal (Heartbeat)

For long-running work:

import threading
import time

class RenewingLock:
    # Renew only while the token is still ours, atomically via Lua
    RENEW_SCRIPT = """
    if redis.call('get', KEYS[1]) == ARGV[1] then
        return redis.call('expire', KEYS[1], ARGV[2])
    else
        return 0
    end
    """

    def __init__(self, key, ttl=10):
        self.key = key
        self.token = None
        self.ttl = ttl
        self.stop_renewal = False
    
    def acquire(self):
        self.token = acquire_lock(self.key, self.ttl)
        if self.token:
            self.renewal_thread = threading.Thread(
                target=self._renew, daemon=True
            )
            self.renewal_thread.start()
        return self.token
    
    def _renew(self):
        while not self.stop_renewal:
            time.sleep(self.ttl / 3)
            # A blind EXPIRE could extend someone else's lock after ours
            # expired; the script checks ownership first
            redis.eval(self.RENEW_SCRIPT, 1, self.key, self.token, self.ttl)
    
    def release(self):
        self.stop_renewal = True
        release_lock(self.key, self.token)

Problem: renewal cannot run during a GC pause → lock expires. Still not safe.

8.2 Backoff with Jitter

import random
import time

def acquire_with_retry(key, max_retries=10):
    for attempt in range(max_retries):
        if token := acquire_lock(key):
            return token
        # Exponential backoff + jitter to spread retries apart
        delay = (2 ** attempt) * 0.1 + random.uniform(0, 0.5)
        time.sleep(delay)
    return None

Jitter matters: without it, all clients retry simultaneously — thundering herd.

8.3 Reentrant Lock

Allows the same client to re-enter.

class ReentrantLock:
    # Per-process wrapper: tracks how many times the same owner re-entered.
    # Not thread-safe by itself; assumes one owner per instance.
    def __init__(self, key):
        self.key = key
        self.count = 0
        self.token = None
    
    def acquire(self):
        if self.count > 0:
            self.count += 1  # already held: just increase the depth
            return True
        
        self.token = acquire_lock(self.key)
        if self.token:
            self.count = 1
            return True
        return False
    
    def release(self):
        self.count -= 1
        if self.count == 0:
            # Outermost release: actually give up the Redis lock
            release_lock(self.key, self.token)
            self.token = None

8.4 Deadlock Avoidance

When using multiple locks:

# Wrong - deadlock possible
def transfer(from_id, to_id, amount):
    lock1 = acquire_lock(f"account:{from_id}")
    lock2 = acquire_lock(f"account:{to_id}")
    # ...

# Two users simultaneously transferring to each other:
# A → B: lock A, lock B
# B → A: lock B, lock A
# → Deadlock

# Correct - always acquire in sorted order
def transfer(from_id, to_id, amount):
    first, second = sorted([from_id, to_id])
    lock1 = acquire_lock(f"account:{first}")
    lock2 = acquire_lock(f"account:{second}")
    # ...

9. How to Avoid Distributed Locks

9.1 Single Ownership

Each task is processed by only one worker:

[Job Queue]
[Worker 1] (jobs A, B)
[Worker 2] (jobs C, D)
[Worker 3] (jobs E, F)

Partitioning:

  • Hash by user ID → which worker?
  • Same user always goes to the same worker
  • No locks needed
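
A sketch of the routing function (any hash works, as long as it is deterministic and every dispatcher agrees on it):

import hashlib

def worker_for(user_id: str, num_workers: int) -> int:
    # Stable across processes and restarts, unlike Python's built-in hash()
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % num_workers

# worker_for("user-42", 3) always yields the same worker, so only that
# worker ever touches user-42's state - no cross-worker lock needed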

9.2 Idempotency

def process_payment(payment_id, amount):
    # One atomic "set if absent" marks the payment as processed; a separate
    # exists() check followed by a set would be a check-then-act race in
    # which two workers could both charge.
    # db.set_if_absent is an assumed API: returns False if the key existed.
    if not db.set_if_absent(f"processed:{payment_id}", True):
        return  # Already processed
    actually_charge(amount)

Safe to run multiple times → no locks needed.

9.3 Compare-and-Swap (CAS)

UPDATE inventory 
SET quantity = quantity - 1 
WHERE product_id = 123 AND quantity > 0;

Atomic, no locks. If affected rows = 0, no inventory.
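
Wrapped in application code (sqlite3-style API), the affected-row count becomes the success signal:

def reserve_one(conn, product_id):
    # Atomic decrement guarded by the WHERE clause; no lock is ever held
    cur = conn.execute(
        "UPDATE inventory SET quantity = quantity - 1 "
        "WHERE product_id = ? AND quantity > 0",
        (product_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # False → out of stock (or lost the race)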

9.4 Event Sourcing

Append events instead of mutating state:

events.append(OrderCreated(...))
events.append(InventoryReserved(...))
events.append(PaymentCharged(...))

Append-only, almost no locks. State is computed by replaying events.
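
A toy sketch of deriving state by replay (the event shape and names are assumed for illustration):

def current_balance(events):
    # State is computed by folding over the append-only log, never mutated
    balance = 0
    for event in events:
        if event["type"] == "Deposited":
            balance += event["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["amount"]
    return balance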

9.5 Actor Model

Each actor owns its state and a mailbox.

import akka.actor.Actor

case class Deposit(amount: BigDecimal)
case class Withdraw(amount: BigDecimal)

class AccountActor extends Actor {
  var balance: BigDecimal = 0
  
  // The mailbox delivers one message at a time, so no lock is needed
  def receive = {
    case Deposit(amount) => balance += amount
    case Withdraw(amount) => balance -= amount
  }
}

One message at a time → no locks needed.

Erlang/Elixir, Akka follow this model.


10. Recommendations

10.1 Decision Tree

Do you really need a distributed lock?
├─ No → use idempotency, CAS, partitioning
└─ Yes
   ├─ Efficiency goal (occasional duplicate work is OK)
   │  └─ Single Redis lock (simple)
   └─ Correctness goal (overlap must never happen)
      ├─ ZooKeeper / etcd
      └─ Fencing token (if storage supports it)

10.2 Kleppmann's Conclusion

"If efficiency is the goal, Redlock may be sufficient. If correctness is the goal, use consensus algorithms (ZAB, Raft)."

10.3 Practical Recommendations

Most cases:

  • Simple Redis lock (SET NX EX)
  • Short TTL
  • Idempotency as backup
  • Systems where "lock failure is not catastrophic"
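
A sketch of that combination, reusing acquire_lock/release_lock from section 2.3 (already_processed and mark_processed are assumed idempotency helpers):

def process_job(job_id):
    # Best-effort lock: avoids duplicate work most of the time
    token = acquire_lock(f"job:{job_id}", ttl=30)
    if not token:
        return  # someone else is probably on it; safe to skip

    try:
        # Idempotency is the real safety net if the lock ever misfires
        if not already_processed(job_id):
            do_work(job_id)
            mark_processed(job_id)
    finally:
        release_lock(f"job:{job_id}", token)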

Critical cases:

  • ZooKeeper / etcd
  • Fencing token
  • Use transactions
  • Guarantee correctness in code

Quiz

1. What is Redlock's core problem?

Answer: GC pauses and clock jumps. Martin Kleppmann's 2016 critique: (1) GC pause — Client A acquires a lock, then stalls 15s on GC, the lock expires, Client B acquires the lock, A wakes up thinking it still owns the lock → both work concurrently → data corruption. (2) Clock jump — NTP sync causes the clock to jump, expiring the lock early. Conclusion: Redlock is OK for efficiency, unsuitable for correctness. For correctness, use ZooKeeper/etcd plus a fencing token.

2. How does a Fencing Token guarantee safety?

Answer: The lock is issued together with a monotonically increasing token. When the lock holder writes to storage, it passes the token along. Storage validates the token — writes from old tokens are rejected. Scenario: Client A token=33 → GC pause → Client B token=34 → B writes to storage (last_token=34) → A wakes up and writes to storage (token=33) → storage sees 33 < 34 and rejects. Key requirement: storage must track the token. Most DBs do not natively support this, so extra work is needed.

3. Why are ZooKeeper ephemeral nodes good for distributed locks?

Answer: They are automatically deleted when the client disconnects. Scenario: the lock-holding client dies → ZooKeeper session times out → ephemeral node is auto-deleted → lock is released automatically → another client can immediately acquire it. No zombie locks. ZooKeeper is also a CP system built on the ZAB consensus protocol (comparable in spirit to Raft), preventing split-brain. Downside: if the client is alive but GCing, the same stale-holder problem persists. Kubernetes uses etcd for similar reasons.

4. What are the limits of SELECT FOR UPDATE?

Answer: (1) DB load — DB resources are held for the duration of the lock, (2) Contention — many transactions locking the same row get serialized, (3) Deadlocks — occur when lock ordering differs, (4) Distributed transactions are hard — locks spanning multiple DBs are very hard (2PC), (5) Long-transaction risk — long transactions block other work. Alternatives: optimistic locking (version column), CAS, partitioning, idempotency. For simple cases, SELECT FOR UPDATE is the safest.

5. How can you avoid distributed locks?

Answer: Five patterns: (1) Single ownership — partition so the same key always goes to the same worker, no lock needed, (2) Idempotency — operations that are safe to run multiple times, (3) CAS (Compare-and-Swap) — the UPDATE ... WHERE version=? pattern, (4) Event Sourcing — append-only, no conflicts, (5) Actor Model — each actor processes one message at a time. Distributed locks are a last resort. Whenever possible, solve concurrency through the data model or algorithm.

