✍️ Transcription Mode: Redis Internals & Distributed Cache — Single Thread, Data Structures, Cluster, Sentinel, RDB/AOF, Redlock, Valkey, Dragonfly Deep Dive (2025)
"Redis is what happens when a C programmer falls in love with data structures." — Salvatore Sanfilippo (antirez)
Few systems have a gap as large as Redis between "just using it" and "really understanding it." Most developers use only GET/SET, and only as a string cache. But Redis is really an In-Memory Data Structure Server — caching is just a byproduct.
Salvatore Sanfilippo built it in 2009 for his analytics tool LLOOGG. MySQL couldn't maintain realtime rankings, so he wrote his own "Remote Dictionary Server." VMware hired him in 2010, then Pivotal (2013), then Redis Labs (now Redis Inc) in 2015. Then in March 2024, Redis abruptly abandoned its open-source license. The Linux Foundation forked Valkey, opening a new front in the cloud wars.
This article is a map for those who want to really understand Redis.
1. Why Redis Is Fast — The Single-Thread Paradox
The Common Misunderstanding
"Single-threaded yet fast? Isn't that wasting cores?"
No. Since Redis 6.0, network I/O is multi-threaded, but command execution remains single-threaded. And that's why it's fast.
The Logic of Single-Thread
- Memory speed approaches CPU speed — bottleneck shifts to network and syscalls
- No locks — zero mutex overhead on shared structures
- No context switches — maximum CPU cache locality
- Atomicity is free — every command is atomic
- Debuggable — bugs are reproducible
antirez: "Even in 2009 I knew lock-based multithreading was hard in practice. Let's make something fast without locks."
But How 1M QPS?
A single Redis instance routinely exceeds 1 million commands per second. The tricks:
- I/O multiplexing — epoll (Linux) / kqueue (BSD) event loop watching thousands of sockets from one thread
- RESP protocol — simple text-based, minimal parsing cost
- Pipelining — batch multiple commands, receive bundled responses
- Not zero-copy, but payloads are small enough that copies barely matter
- Mostly O(1) or O(log N) data structure operations
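The RESP format mentioned above is simple enough to sketch in a few lines of Python. This is a toy encoder plus a parser for one-line replies only; real clients also handle bulk strings and arrays:

```python
def encode_command(*args: bytes) -> bytes:
    # A command is a RESP array of bulk strings: *<n>, then $<len> + payload.
    out = b"*%d\r\n" % len(args)
    for a in args:
        out += b"$%d\r\n%s\r\n" % (len(a), a)
    return out

def parse_simple(reply: bytes):
    # Handles only one-line replies: +simple, -error, :integer.
    kind, body = reply[:1], reply[1:-2]   # strip the trailing \r\n
    if kind == b"+":
        return body.decode()
    if kind == b"-":
        raise RuntimeError(body.decode())
    if kind == b":":
        return int(body)
    raise ValueError("bulk/array replies need a full parser")

encode_command(b"SET", b"k", b"v")
# b'*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n'
```

The length-prefixed framing is why parsing is so cheap: the server never scans for delimiters inside payloads.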
One Event Loop Iteration
1. epoll_wait() — find ready sockets
2. Parse request (RESP)
3. Execute command (single thread, memory-only)
4. Write response buffer
5. Move to next event
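The iteration above can be sketched with Python's selectors module. This toy loop handles a single hardcoded PING over a socket pair; real Redis drives epoll directly and runs a full RESP parser:

```python
import selectors
import socket

def run_once(sel: selectors.BaseSelector) -> None:
    # One iteration: wait for ready sockets, read, "execute", write.
    for key, _ in sel.select(timeout=1):      # 1. find ready sockets
        conn = key.fileobj
        data = conn.recv(4096)                # 2. read + parse request
        if data.strip() == b"PING":           # 3. execute command (in memory)
            conn.sendall(b"+PONG\r\n")        # 4. write response

sel = selectors.DefaultSelector()
server_side, client_side = socket.socketpair()
sel.register(server_side, selectors.EVENT_READ)

client_side.sendall(b"PING\r\n")
run_once(sel)
reply = client_side.recv(64)
print(reply)  # b'+PONG\r\n'
```

Note that "execute" happens inline on the same thread; nothing else runs until it finishes, which is exactly why a slow command stalls everything.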
If one command takes long? Everything stalls. Hence KEYS * and FLUSHALL are forbidden in production. Use SCAN and UNLINK instead.
2. Data Structures — The Real Power
Run OBJECT ENCODING mykey and you'll be surprised. The same "String" is stored as int for numbers, embstr for short, raw for long.
9 Core Data Structures
| Structure | Use Case | Internal | Notes |
|---|---|---|---|
| String | strings/numbers/binary | SDS | up to 512MB |
| List | queue/stack | QuickList | bidirectional O(1) push/pop |
| Hash | object fields | listpack/hashtable | listpack (formerly ziplist) used for small hashes |
| Set | unique values | listpack/intset/hashtable | intset if all integers |
| Sorted Set | rankings/priority queue | Skip List + hash | historically interesting choice |
| Stream | event log | Radix Tree | Kafka-like consumer group |
| HyperLogLog | cardinality estimation | fixed 12KB, 0.81% stddev | probabilistic |
| Bitmap | bit array | String-backed | 1B user DAU in 128MB |
| Geospatial | location queries | Sorted Set + Geohash | GEOADD/GEORADIUS |
SDS — Why Not C Strings
struct sdshdr {
    int len;     // O(1) strlen
    int free;    // slack (fewer reallocs)
    char buf[];  // data + '\0'
};
- O(1) strlen (C strings are O(N))
- Binary-safe (embedded \0 is fine)
- Fewer reallocations (2x preallocation)
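A Python sketch of the preallocation idea. The 1MB cap matches the constant in sds.c; the struct layout is simplified, and bytearray hides the actual realloc:

```python
class SDS:
    # Simplified Simple Dynamic String: O(1) length, binary-safe buffer,
    # and Redis's growth rule: double until 1MB, then grow by 1MB steps.
    SDS_MAX_PREALLOC = 1024 * 1024

    def __init__(self, data: bytes = b""):
        self.len = len(data)      # O(1) strlen, unlike C strings
        self.alloc = len(data)    # capacity; alloc - len is the free slack
        self.buf = bytearray(data)

    def append(self, more: bytes) -> None:
        needed = self.len + len(more)
        if needed > self.alloc:   # must grow: preallocate extra slack
            if needed < self.SDS_MAX_PREALLOC:
                self.alloc = needed * 2
            else:
                self.alloc = needed + self.SDS_MAX_PREALLOC
        self.buf += more          # embedded b'\x00' is fine: binary-safe
        self.len = needed

s = SDS(b"a")
s.append(b"b\x00c")   # len 4, capacity preallocated to 8
```

The slack means N repeated appends cost O(N) amortized instead of O(N²) worth of reallocations.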
Why Skip List for Sorted Set
antirez explained on his blog:
- Simpler implementation — half the code of B-Tree/Red-Black
- Range query friendly — sorted linked list structure
- Decent memory locality
- Easy to debug — no tree rotations
"I chose Skip List not because it's optimal, but because it's simple to implement." — antirez
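A minimal skip list illustrates why range queries come naturally: descend to the first node at or above the start score, then just walk level 0 of the sorted linked list. The MAX_LEVEL/P constants mirror the ones in Redis's t_zset.c; the rest is a simplified sketch (insert and range only, no delete, no backward links):

```python
import random

class Node:
    __slots__ = ("score", "member", "forward")
    def __init__(self, score, member, level):
        self.score, self.member = score, member
        self.forward = [None] * level   # forward[i] = next node at level i

class SkipList:
    MAX_LEVEL, P = 16, 0.25   # same constants as Redis's t_zset.c

    def __init__(self):
        self.head = Node(float("-inf"), None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        # geometric distribution: level k with probability P^(k-1)
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, score, member):
        update = [self.head] * self.MAX_LEVEL   # rightmost node per level
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] and x.forward[i].score < score:
                x = x.forward[i]
            update[i] = x
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = Node(score, member, lvl)
        for i in range(lvl):                    # splice in, no rotations
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def range(self, lo, hi):
        # descend to the first node with score >= lo, then walk level 0
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] and x.forward[i].score < lo:
                x = x.forward[i]
        x = x.forward[0]
        out = []
        while x and x.score <= hi:
            out.append((x.score, x.member))
            x = x.forward[0]
        return out
```

Compare the splice in insert() with a red-black tree's rebalancing rotations: there is simply nothing to rebalance, which is the "half the code" antirez refers to.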
HyperLogLog — 12KB for Billions
- "How many unique visitors today?" → Set would explode memory
- HLL is probabilistic — 0.81% stddev
- Fixed 12KB: 100M users or 10B users, still 12KB
PFADD visitors user:1 user:2 user:3
PFCOUNT visitors
PFMERGE today yesterday
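A toy HyperLogLog with the same geometry shows where the fixed 12KB comes from: 2^14 registers of about 6 bits each. This sketch implements only the small-range linear-counting correction, not Redis's full bias-corrected estimator, so treat it as a demonstration of the idea:

```python
import hashlib
import math

class HLL:
    P = 14                # 2^14 = 16384 registers, Redis's geometry
    M = 1 << P
    ALPHA = 0.7213 / (1 + 1.079 / M)

    def __init__(self):
        self.reg = bytearray(self.M)   # one small counter per register

    def add(self, item: bytes):
        h = int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")
        idx = h & (self.M - 1)         # low 14 bits pick the register
        w = h >> self.P                # remaining 50 bits
        rank = 50 - w.bit_length() + 1 # leading zeros + 1
        self.reg[idx] = max(self.reg[idx], rank)

    def count(self) -> float:
        e = self.ALPHA * self.M ** 2 / sum(2.0 ** -r for r in self.reg)
        if e <= 2.5 * self.M:          # small range: linear counting
            zeros = self.reg.count(0)
            if zeros:
                e = self.M * math.log(self.M / zeros)
        return e
```

Adding the same item twice only re-runs a max(), which is why PFADD is idempotent and PFMERGE is just an element-wise max of two register arrays.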
Bitmap — Power of SETBIT
SETBIT user:active:20260415 12345 1
BITCOUNT user:active:20260415
BITOP AND weekly user:active:*
1B users → 1B bits → 128MB. Enables "users active every day for 30 days" queries faster than any relational database.
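The byte math is easy to verify with a pure-Python bitmap mirroring SETBIT/BITCOUNT semantics (as in Redis, bit 0 is the most significant bit of byte 0, and the string grows to cover the highest offset set):

```python
class Bitmap:
    def __init__(self):
        self.buf = bytearray()

    def setbit(self, offset: int, value: int) -> None:
        byte, bit = divmod(offset, 8)
        if byte >= len(self.buf):                 # grow like a Redis string
            self.buf.extend(b"\x00" * (byte + 1 - len(self.buf)))
        mask = 1 << (7 - bit)                     # bit 0 = MSB of byte 0
        if value:
            self.buf[byte] |= mask
        else:
            self.buf[byte] &= ~mask

    def bitcount(self) -> int:
        return sum(bin(b).count("1") for b in self.buf)

dau = Bitmap()
dau.setbit(12345, 1)   # user 12345 was active today
```

Setting bit 12345 allocates only 1544 bytes; a billionth user costs 125MB total, not per user, which is the entire trick.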
Stream — Kafka-Lite (2018)
Redis 5.0 added Streams with consumer groups and offsets. But persistence depends on Redis, so treat it as a lightweight message bus, not a Kafka replacement.
3. Persistence — RDB vs AOF Tradeoffs
RDB (Redis Database) — Snapshots
- Periodic full dump to binary file
- BGSAVE — fork(); the child dumps, the parent keeps serving
- Copy-on-Write prevents the parent's memory from doubling (usually)
- Con: data after fork point may be lost
- Pro: fast restart, small file
AOF (Append-Only File) — Log
- Append every write to file
- fsync policy:
  - always — per command (slow, no loss)
  - everysec — per second (default, max 1s loss)
  - no — OS-managed (fast, large loss)
- AOF Rewrite — periodic compaction
- Con: slow restart, large file
- Pro: near-zero loss
Hybrid — Redis 4.0+
aof-use-rdb-preamble yes — RDB at the head, incremental AOF at the tail. Fast restart + minimal loss. De facto standard.
Decision Matrix
| Scenario | RDB | AOF | Hybrid |
|---|---|---|---|
| Cache | OK (no persistence needed) | X | X |
| Session store | X | OK (everysec) | OK |
| Fast restart priority | OK | X | OK |
| Primary store | X | OK (always) | OK (best) |
| Disk I/O sensitive | OK | X | careful |
Can You Use It as Primary Storage?
Not recommended. antirez warned repeatedly. Redis targets "fast cache + minimal loss," not "perfect durability." Put important data in Postgres/MySQL; use Redis as cache.
4. Redis Cluster — The Art of Hash Slots
Why Cluster?
- Single Redis has memory/throughput limits (tens of GB per instance)
- Multi-shard needed → Redis Cluster (3.0, 2015)
16384 Hash Slots
- Hash key with CRC16, modulo 16384 (2^14)
- Each slot assigned to a master
- 3 masters → ~5461 slots each
Why 16384
From antirez's famous GitHub issue reply:
- Slot bitmap sent in gossip: 16384 bits = 2KB
- 65536 would be 8KB — too large for gossip
- Below 1000 nodes, 16384 gives enough distribution quality
MOVED & ASK — Redirection
- MOVED <slot> host:port — "this slot lives there permanently"
- ASK <slot> host:port — "migrating now; ask there once"
Smart clients (Lettuce, redis-py-cluster) cache the slot map and refresh on MOVED.
Hash Tag — Same-Slot Guarantee
SET {user:1}:profile "..."
SET {user:1}:sessions "..."
Only the part inside {} is hashed. Essential for MULTI/EXEC, SUNION, Lua scripts touching multiple keys.
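Slot assignment and the hash-tag rule fit in a few lines. The CRC16 variant is CRC16-CCITT (XMODEM) — check value 0x31C3 for "123456789" — which is what the Redis Cluster spec prescribes:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): poly 0x1021, init 0, MSB-first.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash tag rule: if the key contains a non-empty {...}, only that
    # substring is hashed, so related keys land on the same slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

key_slot("{user:1}:profile") == key_slot("{user:1}:sessions")  # same slot
```

This is exactly what smart clients compute locally before choosing a connection, which is how they avoid a MOVED round trip on every request.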
Gossip & Failure Detection
- Nodes exchange PING/PONG
- Beyond cluster-node-timeout → PFAIL (subjective down)
- Majority of PFAIL reports → FAIL (objective down)
- A replica is auto-promoted
Limitations
- Multi-key commands (SINTER) require same slot
- No cross-slot transactions
- Backups are complex (per-node)
5. Sentinel — Another Path to HA
Cluster bundles sharding and HA. Sentinel provides HA only.
Structure
- 1 master + N replicas + M Sentinels (3+, odd)
- Sentinels monitor master health
- On failure, Raft-like consensus elects new leader
- Clients ask Sentinel for current master
Cluster vs Sentinel
| Aspect | Sentinel | Cluster |
|---|---|---|
| Sharding | X | OK |
| HA | OK | OK |
| Config complexity | Low | Medium |
| Client support | Broad | Smart client required |
| Scale | small–medium | medium–large |
| Multi-key commands | Full | Hash Tag required |
Rule of thumb: fits in one node's memory → Sentinel; otherwise → Cluster.
6. Cache Patterns — The Hot Potato
Cache-Aside (Lazy Loading)
def get_user(id):
    user = redis.get(f"user:{id}")
    if user is None:
        user = db.query(id)
        redis.set(f"user:{id}", user, ex=3600)
    return user
- Most common
- Pros: simple, DB only hit on miss
- Cons: cold first request, possible stale data
Write-Through
def update_user(id, data):
    db.update(id, data)
    redis.set(f"user:{id}", data, ex=3600)
- Updates DB + cache on every write
- Good consistency
- Cons: write latency increases, caches cold data too
Write-Behind (Write-Back)
def update_user(id, data):
    redis.set(f"user:{id}", data)
    queue.push({"id": id, "data": data})
- Minimal write latency
- Cons: data-loss risk, complex
Refresh-Ahead
- Async refresh of near-expiry cache
- Prevents Thundering Herd (below)
Production Mix
Most real systems: Cache-Aside + TTL + (conditional) Write-Through. TTL strategy matters:
- Short TTL (1–5 min) — freshness matters
- Long TTL (1 hr+) — rarely changes
- Jitter — ex=3600 + random(-300, 300) — avoid synchronized expiry
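A tiny helper for the jitter rule (the name and defaults are illustrative):

```python
import random

def jittered_ttl(base: int = 3600, spread: int = 300) -> int:
    # Spread expirations across [base - spread, base + spread] seconds
    # so keys cached in the same batch don't all expire at once.
    return base + random.randint(-spread, spread)

# usage sketch: redis.set(key, value, ex=jittered_ttl())
```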
7. Thundering Herd & Cache Stampede
The most common, subtlest Redis failure cause.
Scenario
- Hot key expires
- Thousands of concurrent requests miss cache
- All hit the DB → DB overload → full outage
Fix 1 — Mutex Lock
def get_with_lock(key):
    value = redis.get(key)
    if value is not None:
        return value
    # only one caller wins the lock; everyone else waits for the winner to fill the cache
    lock = redis.set(f"lock:{key}", "1", nx=True, ex=10)
    if not lock:
        for _ in range(100):            # bounded wait, then fall back to the DB
            time.sleep(0.05)
            value = redis.get(key)
            if value is not None:
                return value
        return db.query(...)
    try:
        value = db.query(...)
        redis.set(key, value, ex=3600)
        return value
    finally:
        # note: an ownership check (unique value + Lua delete) is safer here
        redis.delete(f"lock:{key}")
Fix 2 — Probabilistic Early Expiration (XFetch)
Refresh before expiry with probability. Paper: "Optimal Probabilistic Cache Stampede Prevention."
def fetch(key, beta=1.0):
    # get_with_meta is a sketch: returns the value, its absolute expiry
    # timestamp, and delta = how long the last recompute took
    value, expiry, delta = redis.get_with_meta(key)
    now = time.time()
    # log(random()) < 0, so the refresh fires early with rising probability
    # as expiry approaches; beta > 1 makes it more eager
    if value is None or now - delta * beta * math.log(random.random()) >= expiry:
        value = db.query(...)
        redis.set(key, value, ex=3600)
    return value
Fix 3 — Dual TTL (Soft & Hard)
- Past soft TTL: background refresh, return stale
- Past hard TTL: synchronous refresh
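A synchronous sketch of the dual-TTL idea. In production the soft-TTL branch would kick off an async refresh instead of just returning the stale value; DualTTLCache and its parameters are illustrative names, not a real API:

```python
import time

class DualTTLCache:
    def __init__(self, loader, soft=60, hard=300):
        self.loader = loader          # function that fetches from the DB
        self.soft, self.hard = soft, hard
        self.store = {}               # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(key)
        if entry:
            value, stored_at = entry
            age = now - stored_at
            if age < self.soft:
                return value          # fresh
            if age < self.hard:
                # past soft TTL: serve stale; production code would also
                # trigger a background refresh here
                return value
        # past hard TTL (or a plain miss): synchronous refresh
        value = self.loader(key)
        self.store[key] = (value, now)
        return value
```

The point is that only requests past the hard TTL ever block on the DB, so a hot key stays servable through its entire refresh window.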
8. Distributed Locks — The Redlock Debate
SET key value NX PX 10000 makes a tempting lock.
Single-Instance Lock
SET lock:resource unique_id NX PX 30000
EVAL "if redis.call('get',KEYS[1])==ARGV[1] then return redis.call('del',KEYS[1]) else return 0 end" 1 lock:resource unique_id
Redlock — 5-Instance Distributed Lock
antirez's algorithm:
- Attempt lock on 5 independent Redis instances
- Acquired if a majority (3) succeed and total elapsed time < TTL
- On failure, release all
Martin Kleppmann's Critique
The DDIA author rebutted famously:
- No fencing token — GC pause can hold expired lock
- Clock-sync assumption weak — NTP jumps, VM pauses break TTL
- If correctness matters, don't use Redlock — use ZooKeeper or etcd
antirez's Counter
- Redlock is for performance — tolerable occasional double-exec
- For correctness, use DB transactions
- Fencing tokens are implementable (monotonic counter)
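Kleppmann's fencing-token fix is easy to simulate: the lock service hands out a monotonically increasing token (in Redis, an INCR would do), and the protected storage rejects any write carrying a token older than the newest it has seen. All class names here are illustrative:

```python
import itertools

class FencedLock:
    # Lock service side: every acquisition gets a strictly increasing token.
    def __init__(self):
        self._counter = itertools.count(1)

    def acquire(self) -> int:
        return next(self._counter)    # in Redis: INCR lock:fencing

class FencedStore:
    # Storage side: refuse writes from stale lock holders.
    def __init__(self):
        self.highest = 0
        self.data = {}

    def write(self, token: int, key, value) -> bool:
        if token < self.highest:
            return False              # stale holder, e.g. woke up after a GC pause
        self.highest = token
        self.data[key] = value
        return True

lock = FencedLock()
store = FencedStore()
t1 = lock.acquire()               # client A gets token 1
t2 = lock.acquire()               # A's lease expired; client B gets token 2
ok_b = store.write(t2, "k", "B")  # accepted: newest token
ok_a = store.write(t1, "k", "A")  # rejected: A is fenced off
```

Note the fence lives in the storage layer, not the lock: without a token check there, no lock expiry tuning can save you from a paused client.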
Practical Conclusion
| Purpose | Tool |
|---|---|
| Dedup (performance) | Redis SET NX PX |
| Leader election (correctness) | ZooKeeper/etcd |
| Transactions | DB itself |
| Money/payments | Never Redis locks |
9. The 2024 License Drama and Valkey Fork
On March 20, 2024, Redis Inc announced:
- Redis 7.4+ switches BSD → SSPL/RSALv2 dual license
- Aimed at cloud providers (AWS ElastiCache etc.)
- Open-source community was outraged
Valkey — Counter-Strike in 48 Hours
- March 28: Linux Foundation launches Valkey
- Founding sponsors: AWS, Google Cloud, Oracle, Ericsson
- Forked from Redis 7.2.4, stays BSD 3-Clause
- Most core maintainers moved to Valkey
antirez Returns
In November 2024, antirez rejoined Redis Inc. In 2025 he is focused on vector search (RedisVL) and AI.
2025 Landscape
| Product | License | Owner | Contributors |
|---|---|---|---|
| Redis | SSPL/RSALv2 | Redis Inc | antirez returned |
| Valkey | BSD 3-Clause | Linux Foundation | AWS/Google/Oracle |
| KeyDB | BSD 3-Clause | Snap | multi-thread fork |
| Dragonfly | BSL/Apache 2.0 | Dragonfly Labs | rewrite from scratch |
Guide:
- Public cloud managed → doesn't matter (ElastiCache moving to Valkey)
- Self-hosted + open source purist → Valkey
- Multi-thread extremes → Dragonfly
- Need Redis 7.4+ features → Redis
10. Dragonfly — "25x Faster Than Redis"
Launched 2022, rewritten from scratch in C++20. Secrets:
- Multi-thread shared-nothing — each thread owns a shard, no locks
- io_uring — Linux's modern async I/O (faster than epoll)
- Dash hashtable — cache-friendly, based on academic paper
- 30x faster RDB save — novel snapshot algorithm
Performance (2025)
- Single c7g.16xlarge: 6.5M QPS (Redis ~200K)
- 30% better memory efficiency
Limits
- Limited Lua scripting
- Cluster protocol not 100% compatible
- Some edge-case commands missing
- Small ops tooling ecosystem
KeyDB
- Snap's multi-thread fork, ~100% compatible
- Development slowed post-2024; momentum moved to Dragonfly
11. Memory Management — 8 Ways to Avoid OOM
Redis OOM = immediate outage.
maxmemory + Eviction Policy
maxmemory 4gb
maxmemory-policy allkeys-lru
Policies:
- noeviction — reject writes (default, risky)
- allkeys-lru — LRU across all keys
- volatile-lru — LRU among keys with a TTL
- allkeys-lfu — LFU (4.0+, recommended)
- volatile-ttl — nearest TTL first
- allkeys-random / volatile-random
For caching, allkeys-lfu is preferred. LRU is weak against scan attacks.
Profiling
MEMORY USAGE user:1
MEMORY STATS
MEMORY DOCTOR
Big Key
Single key > 1MB is dangerous:
- KEYS scan cost
- Network latency
- Full move on cluster resharding
- Fix: split into hash/list
Hot Key
Single key hogs one CPU core:
- Fix: client-side cache, shard, read from replicas
TTL Strategy
- Always set TTL — infinite keys leak memory
- Add jitter — avoid synchronized expiry
- Hot keys longer TTL, cold shorter
12. Lua Scripting & Functions
Redis embeds Lua 5.1, executes multi-command atomic blocks.
local stock = tonumber(redis.call('GET', KEYS[1]) or 0)  -- missing key counts as 0
if stock >= tonumber(ARGV[1]) then
redis.call('DECRBY', KEYS[1], ARGV[1])
return 1
else
return 0
end
Functions (Redis 7.0+)
Scripts stored as server-side libraries, managed per function.
Caveats
- Lua blocks all other commands — no long scripts
- For hot scripts, use SCRIPT LOAD + EVALSHA
13. Client-Side Caching (Tracking) — Redis 6.0
Redis pushes invalidation to clients that hold local caches.
CLIENT TRACKING ON REDIRECT 1234 BCAST PREFIX user:
- Removes roundtrips → ultra-low latency
- Supported by Lettuce, redis-py, most modern drivers
- Requires client-side memory management
14. Monitoring — 10 Must-Watch Metrics
| Metric | Description | Alert |
|---|---|---|
| used_memory / maxmemory | memory usage | 80% |
| evicted_keys | evictions | trending up |
| connected_clients | concurrent conns | > 10K watch |
| instantaneous_ops_per_sec | QPS | vs baseline |
| latency_percentiles_usec_* | p99 latency | > 1ms |
| rejected_connections | rejections | 0 |
| keyspace_hits / misses | hit ratio | > 90% |
| aof_current_size | AOF size | disk headroom |
| rdb_last_save_time | last RDB | stale = warn |
| master_link_status (replica) | repl link | up |
Slow Log
CONFIG SET slowlog-log-slower-than 10000
SLOWLOG GET 10
Never MONITOR
MONITOR streams every command → 50%+ perf drop. Forbidden in production. Use SLOWLOG, LATENCY instead.
15. Anti-Patterns Top 10
- Mass insert without TTL — memory leak
- KEYS * or FLUSHALL — single-thread stall
- Giant Hash/List accumulation — seconds-long commands
- Only-in-cache data — unrecoverable on loss
- Same TTL for all keys — simultaneous expiry → stampede
- Redis lock for money — see Redlock debate
- Pub/Sub as durable queue — messages vanish, use Streams
- Write-Through with short TTL — pointless double work
- Writing to replicas — async replication, data loss
- Cross-slot transactions on cluster — silent failure
16. Sensible Redis Checklist
- Is it cache or storage? Be explicit
- Set maxmemory and eviction policy explicitly
- TTL on every key, with jitter
- AOF + everysec for 1s loss budget, consider hybrid
- Ban MONITOR / KEYS * / FLUSHALL
- Alert on Big Key / Hot Key (1MB / 10K QPS)
- Target 90%+ hit rate; investigate misses
- Thundering Herd plan — mutex or XFetch
- Distributed lock is performance-only
- Decide Cluster vs Sentinel by data size
- Evaluate client-side caching for read-heavy workloads
- Practice replica-promotion (failover drill)
Closing — The Elegance of Single Thread
Redis's success is paradoxical: in an age of parallelism, serving 1M QPS from one thread. Behind it is antirez's philosophy.
"Real engineering isn't making the complex simple; it's designing something simple from the start." — antirez
Like many "legacy" technologies, Redis is deeper than it looks. Data structures, I/O model, persistence, distribution, tradeoffs — every choice has a reason. The 2024 license fight is the old open-source vs commercialization tension erupting; Valkey/Dragonfly's rise shows we've entered an era where "the Redis interface matters more than Redis itself."
Next — PostgreSQL Internals & Query Optimization
If Redis is "the elegance of data structures," PostgreSQL is "the relational DB masterpiece." Next we'll cover:
- MVCC, VACUUM, WAL and replication
- Query planner internals
- B-Tree/Hash/GiST/GIN/BRIN/HNSW (pgvector)
- Partitioning, pgBouncer
- JSONB vs Document DB
- PostgreSQL 18 (2025) new features (AIO, DirectIO, UUIDv7)
Making the database a transparent engine, not a black box.
"Redis was originally a tool for me. Even when 10K, then 1M people used it, I kept making it 'easy for me to use.' That's Redis's secret." — Salvatore Sanfilippo (2025 return interview)