
Redis Internals & Distributed Cache — Single Thread, Data Structures, Cluster, Sentinel, RDB/AOF, Redlock, Valkey, Dragonfly Deep Dive (2025)


"Redis is what happens when a C programmer falls in love with data structures." — Salvatore Sanfilippo (antirez)

Few systems have a gap as large as Redis between "just using it" and "really understanding it." Most developers use only GET/SET, and only as a string cache. But Redis is really an In-Memory Data Structure Server — caching is just a byproduct.

Salvatore Sanfilippo built it in 2009 for his analytics tool LLOOGG. MySQL couldn't maintain realtime rankings, so he wrote his own "Remote Dictionary Server." VMware hired him in 2010, then Pivotal (2013), then Redis Labs (now Redis Inc) in 2015. Then in March 2024, Redis abruptly abandoned its open-source license. The Linux Foundation forked Valkey, opening a new front in the cloud wars.

This article is a map for those who want to really understand Redis.


1. Why Redis Is Fast — The Single-Thread Paradox

The Common Misunderstanding

"Single-threaded yet fast? Isn't that wasting cores?"

No. Since Redis 6.0, network I/O is multi-threaded, but command execution remains single-threaded. And that's why it's fast.

The Logic of Single-Thread

  1. Memory speed approaches CPU speed — bottleneck shifts to network and syscalls
  2. No locks — zero mutex overhead on shared structures
  3. No context switches — maximum CPU cache locality
  4. Atomicity is free — every command is atomic
  5. Debuggable — bugs are reproducible

antirez: "Even in 2009 I knew lock-based multithreading was hard in practice. Let's make something fast without locks."

But How 1M QPS?

A single Redis instance routinely exceeds 1 million commands per second. The tricks:

  • I/O multiplexing — epoll (Linux) / kqueue (BSD) event loop watching thousands of sockets from one thread
  • RESP protocol — simple text-based framing, minimal parsing cost
  • Pipelining — batch multiple commands, receive bundled responses
  • Small payloads — not zero-copy, but requests and responses are small enough that copies are cheap
  • Mostly O(1) or O(log N) data structures

One Event Loop Iteration

1. epoll_wait() — find ready sockets
2. Parse request (RESP)
3. Execute command (single thread, memory-only)
4. Write response buffer
5. Move to next event
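The five steps above can be sketched with Python's selectors module, which wraps epoll/kqueue. This is a toy sketch, not real RESP: the protocol here is just PING → +PONG over a socketpair.

```python
import selectors
import socket

# Toy single-threaded event loop (selectors wraps epoll on Linux,
# kqueue on BSD). One thread, one loop, no locks.
sel = selectors.DefaultSelector()
client, server = socket.socketpair()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

client.sendall(b"PING\r\n")

for key, _ in sel.select(timeout=1):        # 1. find ready sockets
    data = key.fileobj.recv(64)             # 2. read and parse the request
    if data.strip() == b"PING":             # 3. execute the command in memory
        key.fileobj.sendall(b"+PONG\r\n")   # 4. write the response buffer
                                            # 5. loop continues to next event
reply = client.recv(64)
sel.close()
```

A real server would keep the loop running forever and register each accepted connection; the point here is that readiness notification, parsing, execution, and reply all happen on one thread.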

If one command takes long? Everything stalls. Hence KEYS * and FLUSHALL are forbidden in production. Use SCAN and UNLINK instead.


2. Data Structures — The Real Power

Run OBJECT ENCODING mykey and you'll be surprised. The same "String" is stored as int for integers, embstr for short strings (44 bytes or less), and raw for longer ones.

9 Core Data Structures

Structure | Use Case | Internal | Notes
String | strings/numbers/binary | SDS | up to 512MB
List | queue/stack | QuickList | O(1) push/pop at both ends
Hash | object fields | listpack/hashtable | listpack (formerly ziplist) for small hashes
Set | unique values | listpack/intset/hashtable | intset if all integers
Sorted Set | rankings/priority queue | Skip List + hashtable | historically interesting choice
Stream | event log | Radix Tree | Kafka-like consumer groups
HyperLogLog | cardinality estimation | fixed 12KB registers | probabilistic, 0.81% stddev
Bitmap | bit array | String-backed | 1B-user DAU in ~125MB
Geospatial | location queries | Sorted Set + Geohash | GEOADD/GEORADIUS

SDS — Why Not C Strings

struct sdshdr {
    int len;     // O(1) strlen
    int free;    // slack (fewer reallocs)
    char buf[];  // data + '\0'
};               // simplified: since Redis 3.2 there are sdshdr8/16/32/64 variants

  • O(1) strlen (C strings are O(N))
  • Binary-safe (embedded \0 OK)
  • Fewer reallocations (appends preallocate 2x, with slack capped at 1MB)
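As an illustration (not Redis's actual allocator), the len/free/buf bookkeeping can be modeled in a few lines of Python; the 2x growth rule here is the simple below-1MB case.

```python
# Toy model of SDS bookkeeping: O(1) length, binary-safe buffer,
# and greedy preallocation so repeated appends rarely reallocate.
class SDS:
    def __init__(self, data: bytes = b""):
        self.len = len(data)             # O(1) strlen
        self.buf = bytearray(data)       # binary-safe: embedded \0 is fine

    @property
    def free(self) -> int:
        return len(self.buf) - self.len  # slack left before a realloc

    def append(self, data: bytes) -> None:
        needed = self.len + len(data)
        if needed > len(self.buf):       # grow only when slack runs out,
            self.buf.extend(b"\0" * (2 * needed - len(self.buf)))  # to 2x needed
        self.buf[self.len:needed] = data
        self.len = needed

s = SDS(b"hello")
s.append(b" world")                      # one grow, then slack remains
```

After the append, `s.free` is still positive, so the next small append costs no reallocation at all.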

Why Skip List for Sorted Set

antirez explained on his blog:

  1. Simpler implementation — half the code of B-Tree/Red-Black
  2. Range query friendly — sorted linked list structure
  3. Decent memory locality
  4. Easy to debug — no tree rotations

"I chose Skip List not because it's optimal, but because it's simple to implement." — antirez

HyperLogLog — 12KB for Billions

  • "How many unique visitors today?" → Set would explode memory
  • HLL is probabilistic — 0.81% stddev
  • Fixed 12KB: 100M users or 10B users, still 12KB
PFADD visitors user:1 user:2 user:3
PFCOUNT visitors
PFMERGE today yesterday
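The 12KB and 0.81% figures fall straight out of the HyperLogLog formula. A quick check, using the register count m = 2^14 that Redis uses:

```python
import math

m = 2 ** 14                        # 16384 registers
stderr = 1.04 / math.sqrt(m)       # standard error of the HLL estimator
footprint_kb = m * 6 / 8 / 1024    # 6 bits per register

print(f"{stderr:.4%}")             # 0.8125%, rounded to 0.81% in the docs
print(f"{footprint_kb:.0f}KB")     # 12KB, regardless of cardinality
```

The footprint depends only on m, which is why 100M and 10B users cost the same 12KB.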

Bitmap — Power of SETBIT

SETBIT user:active:20260415 12345 1
BITCOUNT user:active:20260415
BITOP AND weekly user:active:20260415 user:active:20260416 user:active:20260417

(BITOP takes an explicit key list; globs like user:active:* are not expanded.)

1B users → 1B bits → ~125MB. Enables "users active every day for 30 days" queries faster than any relational database.

Stream — Kafka-Lite (2018)

Redis 5.0 added Streams with consumer groups and offsets. But persistence depends on Redis, so treat it as a lightweight message bus, not a Kafka replacement.


3. Persistence — RDB vs AOF Tradeoffs

RDB (Redis Database) — Snapshots

  • Periodic full dump to binary file
  • BGSAVE — fork(); child dumps, parent keeps serving
  • Copy-on-Write prevents parent memory doubling (usually)
  • Con: data after fork point may be lost
  • Pro: fast restart, small file

AOF (Append-Only File) — Log

  • Append every write to file
  • fsync policy:
    • always — per command (slow, no loss)
    • everysec — per second (default, max 1s loss)
    • no — OS-managed (fast, large loss)
  • AOF Rewrite — periodic compaction
  • Con: slow restart, large file
  • Pro: near-zero loss

Hybrid — Redis 4.0+

aof-use-rdb-preamble yes — RDB at the head, incremental AOF at the tail. Fast restart + minimal loss. De facto standard.

Decision Matrix

Scenario | RDB | AOF | Hybrid
Cache | OK (no persistence needed) | X | X
Session store | X | OK (everysec) | OK
Fast restart priority | OK | X | OK
Primary store | X | OK (always) | OK (best)
Disk I/O sensitive | OK | X | careful

Can You Use It as Primary Storage?

Not recommended. antirez warned repeatedly. Redis targets "fast cache + minimal loss," not "perfect durability." Put important data in Postgres/MySQL; use Redis as cache.


4. Redis Cluster — The Art of Hash Slots

Why Cluster?

  • Single Redis has memory/throughput limits (tens of GB per instance)
  • Multi-shard needed → Redis Cluster (3.0, 2015)

16384 Hash Slots

  • Hash key with CRC16, modulo 16384 (2^14)
  • Each slot assigned to a master
  • 3 masters → ~5461 slots each

Why 16384

From antirez's famous GitHub issue reply:

  1. Slot bitmap sent in gossip: 16384 bits = 2KB
  2. 65536 would be 8KB — too large for gossip
  3. Below 1000 nodes, 16384 gives enough distribution quality
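The bitmap arithmetic behind that reply is simple enough to check:

```python
# Every gossip packet carries a bitmap with one bit per hash slot.
slot_bitmap_bytes = 16384 // 8     # 2048 bytes = 2KB per heartbeat
bigger_bitmap_bytes = 65536 // 8   # 8192 bytes = 8KB: 4x the gossip cost

# At the ~1000-node practical ceiling, 16384 slots still average
# more than 16 slots per node, enough for even redistribution.
slots_per_node_at_max = 16384 / 1000
```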

MOVED & ASK — Redirection

  • MOVED <slot> host:port — "this slot lives there permanently"
  • ASK <slot> host:port — "migrating now; ask there once"

Smart clients (Lettuce, redis-py-cluster) cache the slot map and refresh on MOVED.

Hash Tag — Same-Slot Guarantee

SET {user:1}:profile "..."
SET {user:1}:sessions "..."

Only the part inside {} is hashed. Essential for MULTI/EXEC, SUNION, Lua scripts touching multiple keys.
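Slot routing and hash-tag extraction fit in a few lines. This sketch uses the CRC16 variant (XMODEM) that the cluster spec specifies:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the polynomial Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:               # non-empty tag: hash only its content
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

slot_a = key_hash_slot("{user:1}:profile")
slot_b = key_hash_slot("{user:1}:sessions")
```

Both keys hash only the tag content "user:1", so they land in the same slot, and multi-key operations across them stay legal on a cluster.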

Gossip & Failure Detection

  • Nodes exchange PING/PONG heartbeats
  • No PONG within cluster-node-timeout → PFAIL (subjective failure)
  • Majority of masters report PFAIL → FAIL (objective failure)
  • A replica of the failed master is auto-promoted

Limitations

  • Multi-key commands (SINTER) require same slot
  • No cross-slot transactions
  • Backups are complex (per-node)

5. Sentinel — Another Path to HA

Cluster bundles sharding and HA. Sentinel provides HA only.

Structure

  • 1 master + N replicas + M Sentinels (3+, odd)
  • Sentinels monitor master health
  • On failure, Sentinels hold a Raft-like vote; the elected leader Sentinel promotes a replica
  • Clients ask Sentinel for current master

Cluster vs Sentinel

Aspect | Sentinel | Cluster
Sharding | X | OK
HA | OK | OK
Config complexity | Low | Medium
Client support | Broad | Smart client required
Scale | small–medium | medium–large
Multi-key commands | Full | Hash Tag required

Rule of thumb: fits in one node's memory → Sentinel; otherwise → Cluster.


6. Cache Patterns — The Hot Potato

Cache-Aside (Lazy Loading)

def get_user(id):
    user = redis.get(f"user:{id}")
    if user is None:
        user = db.query(id)
        redis.set(f"user:{id}", user, ex=3600)
    return user
  • Most common
  • Pros: simple, DB only hit on miss
  • Cons: cold first request, possible stale data

Write-Through

def update_user(id, data):
    db.update(id, data)
    redis.set(f"user:{id}", data, ex=3600)
  • Updates DB + cache on every write
  • Good consistency
  • Cons: write latency increases, caches cold data too

Write-Behind (Write-Back)

def update_user(id, data):
    redis.set(f"user:{id}", data)
    queue.push({"id": id, "data": data})
  • Minimal write latency
  • Cons: data-loss risk, complex

Refresh-Ahead

  • Async refresh of near-expiry cache
  • Prevents Thundering Herd (below)

Production Mix

Most real systems: Cache-Aside + TTL + (conditional) Write-Through. TTL strategy matters:

  • Short TTL (1–5 min) — freshness matters
  • Long TTL (1 hr+) — rarely changes
  • Jitter — e.g. ex=3600 + random(-300, 300) — avoids synchronized expiry
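A jittered-TTL helper is a one-liner; the bounds here are illustrative:

```python
import random

def jittered_ttl(base: int = 3600, spread: int = 300) -> int:
    # keys written in the same burst now expire minutes apart,
    # instead of all missing at once an hour later
    return base + random.randint(-spread, spread)

# usage with a redis-py client: redis.set(key, value, ex=jittered_ttl())
ttls = [jittered_ttl() for _ in range(1000)]
```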

7. Thundering Herd & Cache Stampede

The most common, subtlest Redis failure cause.

Scenario

  1. Hot key expires
  2. Thousands of concurrent requests miss cache
  3. All hit the DB → DB overload → full outage

Fix 1 — Mutex Lock

def get_with_lock(key):
    value = redis.get(key)
    if value is not None:
        return value
    # only one caller wins the lock; everyone else backs off and retries
    lock = redis.set(f"lock:{key}", "1", nx=True, ex=10)
    if not lock:
        time.sleep(0.05)
        return get_with_lock(key)  # retry until the winner has filled the cache
    try:
        value = db.query(...)
        redis.set(key, value, ex=3600)
        return value
    finally:
        redis.delete(f"lock:{key}")

Fix 2 — Probabilistic Early Expiration (XFetch)

Refresh before expiry with probability. Paper: "Optimal Probabilistic Cache Stampede Prevention."

def fetch(key, beta=1.0):
    # assumes the cache stores, next to the value, its absolute expiry
    # time and delta, the time the last recompute took
    value, expiry, delta = redis.get_with_meta(key)
    now = time.time()
    # log(random()) is negative, so the subtraction pushes "now" forward;
    # the nearer expiry is (and the costlier the recompute), the more
    # likely we refresh early
    if value is None or now - delta * beta * math.log(random.random()) >= expiry:
        value = db.query(...)
        redis.set(key, value, ex=3600)
    return value

Fix 3 — Dual TTL (Soft & Hard)

  • Past soft TTL: background refresh, return stale
  • Past hard TTL: synchronous refresh

8. Distributed Locks — The Redlock Debate

SET key value NX PX 10000 makes a tempting lock.

Single-Instance Lock

SET lock:resource unique_id NX PX 30000
EVAL "if redis.call('get',KEYS[1])==ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 lock:resource unique_id

Redlock — 5-Instance Distributed Lock

antirez's algorithm:

  1. Attempt lock on 5 independent Redis instances
  2. Acquired if majority (3) succeed and total time < TTL
  3. On failure, release all
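The acceptance test in step 2 can be written down directly; drift_ms here is an illustrative allowance for clock drift, not a value from the algorithm's spec.

```python
def redlock_valid(grants: int, n: int = 5, ttl_ms: int = 10_000,
                  elapsed_ms: int = 0, drift_ms: int = 20) -> bool:
    # lock counts only if a majority of instances granted it AND the
    # time spent acquiring (plus drift allowance) still leaves TTL
    quorum = n // 2 + 1
    validity_ms = ttl_ms - elapsed_ms - drift_ms
    return grants >= quorum and validity_ms > 0

ok = redlock_valid(grants=3, elapsed_ms=150)          # majority, fast: valid
slow = redlock_valid(grants=3, elapsed_ms=10_000)     # took the whole TTL: invalid
minority = redlock_valid(grants=2, elapsed_ms=150)    # no quorum: invalid
```

On any invalid outcome the client must release the lock on all instances, including the ones that granted it.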

Martin Kleppmann's Critique

The DDIA author rebutted famously:

  • No fencing token — GC pause can hold expired lock
  • Clock-sync assumption weak — NTP jumps, VM pauses break TTL
  • If correctness matters, don't use Redlock — use ZooKeeper or etcd

antirez's Counter

  • Redlock is for performance — tolerable occasional double-exec
  • For correctness, use DB transactions
  • Fencing tokens are implementable (monotonic counter)

Practical Conclusion

Purpose | Tool
Dedup (performance) | Redis SET NX PX
Leader election (correctness) | ZooKeeper/etcd
Transactions | the DB itself
Money/payments | never Redis locks

9. The 2024 License Drama and Valkey Fork

On March 20, 2024, Redis Inc announced:

  • Redis 7.4+ switches BSD → SSPL/RSALv2 dual license
  • Aimed at cloud providers (AWS ElastiCache etc.)
  • Open-source community was outraged

Valkey — Counter-Strike in 48 Hours

  • March 28: Linux Foundation launches Valkey
  • Founding sponsors: AWS, Google Cloud, Oracle, Ericsson
  • Forked from Redis 7.2.4, stays BSD 3-Clause
  • Most core maintainers moved to Valkey

antirez Returns

November 2024, antirez rejoins Redis Inc. In 2025 focuses on vector search (RedisVL) and AI.

2025 Landscape

Product | License | Owner | Notes
Redis | SSPL/RSALv2 | Redis Inc | antirez returned
Valkey | BSD 3-Clause | Linux Foundation | backed by AWS/Google/Oracle
KeyDB | BSD 3-Clause | Snap | multi-thread fork
Dragonfly | BSL/Apache 2.0 | Dragonfly Labs | rewrite from scratch

Guide:

  • Public cloud managed → doesn't matter (ElastiCache moving to Valkey)
  • Self-hosted + open source purist → Valkey
  • Multi-thread extremes → Dragonfly
  • Need Redis 7.4+ features → Redis

10. Dragonfly — "25x Faster Than Redis"

Launched 2022, rewritten from scratch in C++20. Secrets:

  • Multi-thread shared-nothing — each thread owns a shard, no locks
  • io_uring — Linux's modern async I/O (faster than epoll)
  • Dash hashtable — cache-friendly, based on academic paper
  • 30x faster RDB save — novel snapshot algorithm

Performance (Vendor Benchmarks, 2025)

  • Single c7g.16xlarge: 6.5M QPS (vs ~200K for a single-threaded Redis instance)
  • ~30% better memory efficiency

Limits

  • Limited Lua scripting
  • Cluster protocol not 100% compatible
  • Some edge-case commands missing
  • Small ops tooling ecosystem

KeyDB

  • Snap's multi-thread fork, ~100% compatible
  • Development slowed post-2024; momentum moved to Dragonfly

11. Memory Management — 8 Ways to Avoid OOM

Redis OOM = immediate outage.

maxmemory + Eviction Policy

maxmemory 4gb
maxmemory-policy allkeys-lru

Policies:

  • noeviction — reject writes (default, risky)
  • allkeys-lru — LRU across all
  • volatile-lru — LRU among TTL keys
  • allkeys-lfu — LFU (4.0+, recommended)
  • volatile-ttl — nearest TTL first
  • allkeys-random / volatile-random

For caching, allkeys-lfu is preferred. LRU is weak against scan attacks.
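A toy LRU makes the scan weakness concrete: one pass over cold, never-read-again keys evicts the hot key that a frequency-based (LFU) policy would have kept.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least-recently-used

cache = LRUCache(3)
cache.put("hot", "popular value")
for i in range(3):                          # a one-off scan of cold keys...
    cache.put(f"scan:{i}", i)
evicted = cache.get("hot")                  # ...pushed the hot key out
```

LFU tracks access frequency instead of recency, so a single scan cannot displace a key that has been read thousands of times.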

Profiling

MEMORY USAGE user:1
MEMORY STATS
MEMORY DOCTOR

Big Key

Single key > 1MB is dangerous:

  • O(N) reads (HGETALL, LRANGE 0 -1) block the single thread
  • Network latency spikes when it's transferred
  • Moves as one unit during cluster resharding
  • Fix: split into multiple hashes/lists
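One common way to split, sketched here with hypothetical key names: route each field to one of N sub-hashes by a stable hash of the field, so no single key grows without bound.

```python
import zlib

def shard_key(base: str, field: str, shards: int = 16) -> str:
    # stable hash: the same field always lands in the same sub-hash,
    # so lookups stay O(1) with no index needed
    return f"{base}:{zlib.crc32(field.encode()) % shards}"

# instead of HSET user:1:follows <field> <value> on one giant hash:
# HSET shard_key("user:1:follows", field) <field> <value>
k1 = shard_key("user:1:follows", "42")
k2 = shard_key("user:1:follows", "42")
```

Each sub-hash stays small enough to keep the listpack encoding and to move cheaply during resharding.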

Hot Key

Single key hogs one CPU core:

  • Fix: client-side cache, shard, read from replicas

TTL Strategy

  • Always set TTL — infinite keys leak memory
  • Add jitter — avoid synchronized expiry
  • Hot keys longer TTL, cold shorter

12. Lua Scripting & Functions

Redis embeds Lua 5.1, executes multi-command atomic blocks.

-- GET returns false for a missing key; default to '0'
local stock = tonumber(redis.call('GET', KEYS[1]) or '0')
if stock >= tonumber(ARGV[1]) then
    redis.call('DECRBY', KEYS[1], ARGV[1])
    return 1
else
    return 0
end

Functions (Redis 7.0+)

Scripts stored as server-side libraries, managed per function.

Caveats

  • Lua blocks all other commands — no long scripts
  • For hot scripts use SCRIPT LOAD + EVALSHA

13. Client-Side Caching (Tracking) — Redis 6.0

Redis pushes invalidation to clients that hold local caches.

CLIENT TRACKING ON REDIRECT 1234 BCAST PREFIX user:
  • Removes roundtrips → ultra-low latency
  • Supported by Lettuce, redis-py, most modern drivers
  • Requires client-side memory management

14. Monitoring — 10 Must-Watch Metrics

Metric | Description | Alert
used_memory / maxmemory | memory usage | > 80%
evicted_keys | evictions | trending up
connected_clients | concurrent connections | > 10K: watch
instantaneous_ops_per_sec | QPS | deviation vs baseline
latency_percentiles_usec_* | p99 latency | > 1ms
rejected_connections | rejected connections | any (should be 0)
keyspace_hits / misses | hit ratio | < 90%
aof_current_size | AOF size | vs disk headroom
rdb_last_save_time | last RDB save | stale = warn
master_link_status (replica) | replication link | anything but "up"

Slow Log

CONFIG SET slowlog-log-slower-than 10000
SLOWLOG GET 10

Never MONITOR

MONITOR streams every command → 50%+ perf drop. Forbidden in production. Use SLOWLOG, LATENCY instead.


15. Anti-Patterns Top 10

  1. Mass insert without TTL — memory leak
  2. KEYS * or FLUSHALL — single-thread stall
  3. Giant Hash/List accumulation — second-long commands
  4. Only-in-cache data — unrecoverable on loss
  5. Same TTL for all keys — simultaneous expiry → stampede
  6. Redis lock for money — see Redlock debate
  7. Pub/Sub as durable queue — messages vanish, use Streams
  8. Write-Through with short TTL — pointless double work
  9. Writing to replicas — async replication, data loss
  10. Cross-slot transactions on cluster — fail with CROSSSLOT errors

16. Sensible Redis Checklist

  • Is it cache or storage? Be explicit
  • Set maxmemory and eviction policy explicitly
  • TTL on every key with jitter
  • AOF + everysec for 1s loss budget, consider hybrid
  • Ban MONITOR / KEYS * / FLUSHALL
  • Alert on Big Key / Hot Key (1MB / 10K QPS)
  • Target 90%+ hit rate; investigate misses
  • Thundering Herd plan — mutex or XFetch
  • Distributed lock is performance-only
  • Decide Cluster vs Sentinel by data size
  • Evaluate client-side caching for read-heavy workloads
  • Practice replica-promotion (failover drill)

Closing — The Elegance of Single Thread

Redis's success is paradoxical: in an age of parallelism, serving 1M QPS from one thread. Behind it is antirez's philosophy.

"Real engineering isn't making the complex simple; it's designing something simple from the start." — antirez

Like many "legacy" technologies, Redis is deeper than it looks. Data structures, I/O model, persistence, distribution, tradeoffs — every choice has a reason. The 2024 license fight is the old open-source vs commercialization tension erupting; Valkey/Dragonfly's rise shows we've entered an era where "the Redis interface matters more than Redis itself."


Next — PostgreSQL Internals & Query Optimization

If Redis is "the elegance of data structures," PostgreSQL is "the relational DB masterpiece." Next we'll cover:

  • MVCC, VACUUM, WAL and replication
  • Query planner internals
  • B-Tree/Hash/GiST/GIN/BRIN/HNSW (pgvector)
  • Partitioning, pgBouncer
  • JSONB vs Document DB
  • PostgreSQL 18 (2025) new features (AIO, DirectIO, UUIDv7)

Making the database a transparent engine, not a black box.


"Redis was originally a tool for me. Even when 10K, then 1M people used it, I kept making it 'easy for me to use.' That's Redis's secret." — Salvatore Sanfilippo (2025 return interview)
