
- Overview
- Key Changes in Redis 7
- Cluster Architecture
- Hash Slots and Rebalancing
- Memory Optimization Strategies
- Per-Data-Structure Encoding Optimization
- Persistence (RDB/AOF) Configuration
- Monitoring and Alerting
- Troubleshooting
- Disaster Recovery Procedures
- Operations Checklist
- References
Overview
Redis is the leading in-memory data store, widely used for caching, session management, real-time analytics, and message brokering. Redis 7 added features such as Functions, ACL v2, and Multi-Part AOF, significantly improving operational stability and programmability. In 2025, the move to AGPLv3 licensing brought major changes to the ecosystem: alternatives such as Valkey and KeyDB have gained traction, diversifying the options for in-memory data stores.
This handbook covers the entire production operations process for Redis 7.x-based clusters, from architecture design to hash slot rebalancing, memory optimization strategies, per-data-structure encoding tuning, persistence configuration, monitoring, and disaster recovery. Each section includes practical commands and configuration examples for immediate application.
Key Changes in Redis 7
Redis Functions
Functions, introduced in Redis 7, are a first-class programming model that replaces EVAL-based Lua scripting. Functions are persisted to RDB and AOF files and automatically replicated from master to replicas. Multiple functions can be defined in a single library, enabling high code reusability.
#!lua name=mylib
-- Function that atomically increments view count and records view history
redis.register_function('increment_view', function(keys, args)
local current = redis.call('HINCRBY', keys[1], 'views', 1)
redis.call('ZADD', keys[1] .. ':history', redis.call('TIME')[1], args[1])
return current
end)
# Load the library
cat mylib.lua | redis-cli -x FUNCTION LOAD REPLACE
# Call the function
redis-cli FCALL increment_view 1 article:1001 "user:42"
# List registered functions
redis-cli FUNCTION LIST
ACL v2
ACL v2 adds the ability to finely control read/write permissions at the key level. It introduces the Selector concept, allowing multiple rule sets to be assigned to a single user. The root selector is evaluated first, followed by additional selectors in order.
# Cache-only user: read/write on cache:* keys, read-only on session:* keys
redis-cli ACL SETUSER cache_worker on >StrongP@ss123 \
~cache:* +@all \
(%R~session:* +@read)
# Verify ACL rules
redis-cli ACL GETUSER cache_worker
# Persist the current ACL rules (requires aclfile to be configured)
redis-cli ACL SAVE
Multi-Part AOF
Previously, the AOF rewrite process replaced a single large file. Redis 7's Multi-Part AOF stores base files (full data) and incr files (incremental data) separately in a dedicated directory. This reduces disk space waste and systematizes AOF history management.
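With the Redis 7 defaults (appendfilename "appendonly.aof", directory "appendonlydir"), a node's AOF directory looks like the sketch below; the sequence number in each file name advances when a rewrite creates a new base/incr pair:

```
appendonlydir/
├── appendonly.aof.1.base.rdb    # base file: full dataset (RDB format when aof-use-rdb-preamble is yes)
├── appendonly.aof.1.incr.aof    # incr file: write commands appended since the base
└── appendonly.aof.manifest      # manifest tracking the currently active base/incr set
```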
Cluster Architecture
Redis Cluster is a distributed architecture that shards data across multiple nodes for horizontal scalability. Each master node is responsible for a subset of the 16,384 hash slots, and clients access the appropriate node directly via the CRC16 hash of the key.
Minimum 6-Node Cluster Configuration
The recommended minimum configuration for production environments is 3 masters + 3 replicas for a total of 6 nodes.
# Configure 6 Redis instances (ports 7000-7005)
for port in 7000 7001 7002 7003 7004 7005; do
mkdir -p /opt/redis-cluster/${port}
cat > /opt/redis-cluster/${port}/redis.conf << CONF
port ${port}
cluster-enabled yes
cluster-config-file nodes-${port}.conf
cluster-node-timeout 5000
appendonly yes
appendfilename "appendonly.aof"
dir /opt/redis-cluster/${port}
maxmemory 4gb
maxmemory-policy allkeys-lfu
# Open bind is for lab setups only: restrict bind and enable auth in production
bind 0.0.0.0
protected-mode no
save 3600 1 300 100 60 10000
aof-use-rdb-preamble yes
CONF
redis-server /opt/redis-cluster/${port}/redis.conf &
done
# Create cluster (3 masters + 3 replicas)
redis-cli --cluster create \
192.168.1.10:7000 192.168.1.10:7001 192.168.1.10:7002 \
192.168.1.10:7003 192.168.1.10:7004 192.168.1.10:7005 \
--cluster-replicas 1
Redis vs Valkey vs KeyDB Comparison
Following Redis's license changes (from BSD 3-Clause to dual RSALv2/SSPLv1 in 2024, with AGPLv3 added in 2025), interest in Valkey and KeyDB has increased. All three projects are Redis protocol-compatible, so existing clients can be used as-is.
| Category | Redis 7.x / 8.x | Valkey 8.x | KeyDB |
|---|---|---|---|
| License | AGPLv3 (2025~) | BSD 3-Clause | BSD 3-Clause |
| Threading Model | Single-threaded event loop (I/O threads supported) | Enhanced I/O multi-threading | Native multi-threading |
| Throughput | Baseline | Similar to or slightly above Redis | 2-5x improvement with multi-core |
| Governance | Redis Ltd. | Linux Foundation | Snap Inc. |
| Managed Cloud | Redis Cloud, AWS ElastiCache | AWS ElastiCache, Google MemoryStore | Limited |
| Functions Support | Supported since 7.0 | Compatible support | Not supported (Lua only) |
| Best For | Stability, rich ecosystem | Open-source governance priority | High concurrent throughput needs |
Hash Slots and Rebalancing
All keys in Redis Cluster are mapped to hash slots using the formula CRC16(key) mod 16384. The choice of 16,384 slots balances fine-grained control over key distribution against cluster metadata overhead (each heartbeat message carries a slot bitmap). In theory a cluster can scale to 16,384 master nodes, but the practical recommended upper limit is approximately 1,000.
Hash Tags for Slot Control
For multi-key operations (MGET, pipelining, etc.), related keys must be in the same slot. Hash tag syntax allows only a specific part to be used for hash calculation.
# Place in the same slot: only the {user:1000} part is used for hash calculation
redis-cli SET "{user:1000}.profile" '{"name":"Kim"}'
redis-cli SET "{user:1000}.settings" '{"theme":"dark"}'
redis-cli SET "{user:1000}.cart" '["item1","item2"]'
# Verify they are in the same slot
redis-cli CLUSTER KEYSLOT "{user:1000}.profile"
redis-cli CLUSTER KEYSLOT "{user:1000}.settings"
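The mapping can be reproduced offline. Below is a minimal Python sketch of the CRC16 (XMODEM variant) and hash-tag rules described in the cluster spec; it mirrors what CLUSTER KEYSLOT does, but is illustrative rather than the server's implementation:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0, MSB-first, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """Map a key to one of the 16,384 slots, honoring {hash tag} syntax."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the tag content
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Keys sharing a hash tag land in the same slot
assert key_hash_slot("{user:1000}.profile") == key_hash_slot("{user:1000}.cart")
```

The cluster spec's reference value, CRC16("123456789") = 0x31C3, is a quick sanity check for the crc16 routine.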
Slot Rebalancing
When adding or removing nodes, slots must be redistributed. This can be performed during live traffic, but it is best to avoid peak times.
# Add a new node to the cluster
redis-cli --cluster add-node 192.168.1.10:7006 192.168.1.10:7000
# Auto rebalancing: distribute slots evenly across all masters
redis-cli --cluster rebalance 192.168.1.10:7000
# Manually move slots from one node to another
redis-cli --cluster reshard 192.168.1.10:7000 \
--cluster-from <source-node-id> \
--cluster-to <target-node-id> \
--cluster-slots 1000 \
--cluster-yes
# Check cluster status
redis-cli --cluster check 192.168.1.10:7000
redis-cli --cluster info 192.168.1.10:7000
Memory Optimization Strategies
Redis stores all data in memory, so memory efficiency directly translates to cost. With systematic memory optimization, you can handle more data on the same hardware.
maxmemory-policy Comparison
This policy determines which keys to evict when the memory limit is reached. Redis 7 provides 8 policies.
| Policy | Target Scope | Algorithm | Best For |
|---|---|---|---|
| noeviction | None | No eviction (returns write errors) | Environments where data loss is unacceptable |
| allkeys-lru | All keys | Evict least recently used keys | General-purpose caching |
| allkeys-lfu | All keys | Evict least frequently used keys | Cache with clear hot/cold patterns |
| allkeys-random | All keys | Random eviction | Uniform access patterns |
| volatile-lru | Keys with TTL set | Evict least recently used keys | Mixed cache + persistent data |
| volatile-lfu | Keys with TTL set | Evict least frequently used keys | Hot/cold separation among TTL data |
| volatile-random | Keys with TTL set | Random eviction | Uniform pattern among TTL keys |
| volatile-ttl | Keys with TTL set | Evict keys closest to expiration | Short-lived cache eviction priority |
For most cache-only environments, allkeys-lfu is optimal. For environments where cache and persistent data coexist, use volatile-lru and ensure all cache keys have a TTL set.
Analysis Using MEMORY Commands
# Check memory usage of a specific key (in bytes)
redis-cli MEMORY USAGE user:profile:10001
# (integer) 256
# For nested structures, specify sample count (0 for full traversal)
redis-cli MEMORY USAGE myhash SAMPLES 0
# Memory diagnostic report (returns human-readable advice, or a note that no issues were found)
redis-cli MEMORY DOCTOR
# Memory statistics summary
redis-cli MEMORY STATS
# Full server memory information
redis-cli INFO memory
redis.conf Memory Tuning Configuration
# Maximum memory and eviction policy
maxmemory 4gb
maxmemory-policy allkeys-lfu
# LFU algorithm tuning
# lfu-log-factor: higher values slow counter increments (default 10)
# lfu-decay-time: counter decay period in minutes (default 1)
lfu-log-factor 10
lfu-decay-time 1
# Enable memory defragmentation (requires a jemalloc build)
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 1
active-defrag-cycle-max 25
# Lazy Free settings: background processing for large key deletions
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush yes
# I/O thread settings (available since Redis 6)
io-threads 4
io-threads-do-reads yes
Per-Data-Structure Encoding Optimization
Redis automatically switches internal encodings based on data size. Since Redis 7, listpack replaces the older ziplist, trimming the header from 11 bytes to 6 for better memory efficiency. Proper threshold configuration can achieve 5-10x memory savings.
| Data Structure | Small Encoding | Large Encoding | Transition Setting |
|---|---|---|---|
| Hash | listpack | hashtable | hash-max-listpack-entries (128), hash-max-listpack-value (64) |
| List | listpack | quicklist | list-max-listpack-size (-2) |
| Set | listpack / intset | hashtable | set-max-listpack-entries (128), set-max-intset-entries (512) |
| Sorted Set | listpack | skiplist + hashtable | zset-max-listpack-entries (128), zset-max-listpack-value (64) |
| String | int / embstr | raw | Automatic (44-byte boundary) |
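The transition rules for hashes can be sketched as a small predicate. This models only the two hash thresholds from the table (defaults assumed) and counts characters where Redis compares byte lengths, so treat it as illustrative:

```python
HASH_MAX_LISTPACK_ENTRIES = 128  # hash-max-listpack-entries default
HASH_MAX_LISTPACK_VALUE = 64     # hash-max-listpack-value default (bytes)

def hash_encoding(fields: dict) -> str:
    """Predict the encoding Redis would choose for a hash with these fields."""
    if len(fields) > HASH_MAX_LISTPACK_ENTRIES:
        return "hashtable"  # too many entries for the compact form
    if any(len(k) > HASH_MAX_LISTPACK_VALUE or len(v) > HASH_MAX_LISTPACK_VALUE
           for k, v in fields.items()):
        return "hashtable"  # a field or value exceeds the size threshold
    return "listpack"

small = {f"f{i}": "v" for i in range(100)}   # stays compact
big = {f"f{i}": "v" for i in range(129)}     # exceeds the entry threshold
```

This mirrors the OBJECT ENCODING experiment in the next section: 129 fields tips a hash from listpack to hashtable.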
# redis.conf: encoding threshold adjustment examples
# Workloads with many small hashes: increase thresholds to maintain listpack
hash-max-listpack-entries 256
hash-max-listpack-value 128
# Ranking service with many small sorted sets
zset-max-listpack-entries 256
zset-max-listpack-value 64
# Tag system with many small sets
set-max-listpack-entries 256
set-max-intset-entries 1024
# Maximum size per list node (-2 = 8KB, -1 = 4KB)
list-max-listpack-size -2
list-compress-depth 1
Encoding Verification and Structure Transition Testing
# Check current encoding
redis-cli OBJECT ENCODING mykey
# Small hash (listpack)
redis-cli HSET small:hash f1 v1 f2 v2 f3 v3
redis-cli OBJECT ENCODING small:hash
# "listpack"
# Automatic transition to hashtable when threshold is exceeded
# Create a hash with 129 fields (exceeding default hash-max-listpack-entries=128)
for i in $(seq 1 129); do
redis-cli HSET big:hash "field_${i}" "value_${i}"
done
redis-cli OBJECT ENCODING big:hash
# "hashtable"
# Set composed of only integers: intset encoding
redis-cli SADD int:set 1 2 3 4 5
redis-cli OBJECT ENCODING int:set
# "intset"
Persistence (RDB/AOF) Configuration
Since Redis is an in-memory database, a proper persistence strategy is essential for data recovery in case of failure. Redis 7's Multi-Part AOF significantly improved the persistence mechanism.
RDB vs AOF vs Hybrid Mode Comparison
| Category | RDB | AOF | Hybrid (RDB + AOF) |
|---|---|---|---|
| Storage Method | Periodic snapshots | Log of all write commands | RDB preamble + AOF incremental |
| File Size | Small (binary compressed) | Large (command text) | Medium |
| Data Loss Risk | Loss since last snapshot | Minimized by fsync setting | Minimal |
| Recovery Speed | Fast | Slow (command replay) | Fast |
| Fork Overhead | Fork per snapshot | Fork only during rewrite | Fork during rewrite |
| Disk I/O | Low (intermittent) | High (continuous) | Medium |
| Best For | Backup, disaster recovery | Zero data loss required | Production recommended |
Production Recommended Persistence Configuration
# redis.conf: Hybrid mode (production recommended)
# RDB snapshot intervals: 3600s (1 hour) with 1 change, 300s with 100 changes, 60s with 10000 changes
save 3600 1 300 100 60 10000
# Enable AOF
appendonly yes
appendfilename "appendonly.aof"
# AOF fsync policy: everysec (balance between performance and durability)
appendfsync everysec
# Enable hybrid mode: use RDB format as preamble during AOF rewrite
aof-use-rdb-preamble yes
# Multi-Part AOF directory (Redis 7 default)
appenddirname "appendonlydir"
# AOF rewrite trigger conditions
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
# RDB compression and checksum
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
# Stop writes on background save failure
stop-writes-on-bgsave-error yes
Monitoring and Alerting
For stable Redis cluster operations, core metrics must be collected in real-time with threshold-based alerting configured.
Core Monitoring Metrics
# Check memory usage
redis-cli INFO memory | grep -E "used_memory_human|used_memory_rss_human|mem_fragmentation_ratio"
# used_memory_human:2.85G
# used_memory_rss_human:3.12G
# mem_fragmentation_ratio:1.09
# Check cluster status
redis-cli CLUSTER INFO
# cluster_state:ok
# cluster_slots_assigned:16384
# cluster_slots_ok:16384
# cluster_known_nodes:6
# cluster_size:3
# Commands processed per second
redis-cli INFO stats | grep instantaneous_ops_per_sec
# Check connection count
redis-cli INFO clients | grep connected_clients
# Keyspace hit rate (cache efficiency)
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"
# Check Slow Log (commands taking over 10ms)
redis-cli SLOWLOG GET 10
redis-cli SLOWLOG LEN
Python-Based Monitoring Script
from redis.cluster import RedisCluster, ClusterNode

# Redis Cluster connection (redis-py >= 4.1 expects ClusterNode objects)
startup_nodes = [
    ClusterNode("192.168.1.10", 7000),
    ClusterNode("192.168.1.10", 7001),
    ClusterNode("192.168.1.10", 7002),
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    password="your_secure_password",
    socket_timeout=5,
    retry_on_timeout=True,
)


def check_cluster_health():
    """Checks overall cluster status and returns a list of alert strings."""
    alerts = []

    # Check cluster state
    cluster_info = rc.cluster_info()
    if cluster_info.get("cluster_state") != "ok":
        alerts.append(f"[CRITICAL] Cluster state abnormal: {cluster_info.get('cluster_state')}")

    # Check memory usage per node
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            used = info["used_memory"]
            maxmem = info.get("maxmemory", 0)
            if maxmem > 0:
                usage_pct = (used / maxmem) * 100
                if usage_pct > 85:
                    alerts.append(
                        f"[WARNING] Node {node.host}:{node.port} "
                        f"memory usage {usage_pct:.1f}%"
                    )

            # Check memory fragmentation ratio
            frag_ratio = info.get("mem_fragmentation_ratio", 1.0)
            if frag_ratio > 1.5:
                alerts.append(
                    f"[WARNING] Node {node.host}:{node.port} "
                    f"memory fragmentation ratio {frag_ratio:.2f}"
                )
        except Exception as e:
            alerts.append(f"[ERROR] Node {node.host}:{node.port} connection failed: {e}")
    return alerts


def get_memory_report():
    """Generates a per-node memory usage report."""
    report = []
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            report.append({
                "node": f"{node.host}:{node.port}",
                "role": node.server_type,
                "used_memory_human": info["used_memory_human"],
                "used_memory_rss_human": info["used_memory_rss_human"],
                "fragmentation_ratio": info["mem_fragmentation_ratio"],
                "used_memory_peak_human": info["used_memory_peak_human"],
            })
        except Exception as e:
            report.append({"node": f"{node.host}:{node.port}", "error": str(e)})
    return report


if __name__ == "__main__":
    # Cluster health check
    alerts = check_cluster_health()
    if alerts:
        for alert in alerts:
            print(alert)
        # send_to_slack(alerts) or send_to_pagerduty(alerts)
    else:
        print("All nodes healthy")

    # Memory report output
    for item in get_memory_report():
        print(item)
Alert Threshold Standards
| Metric | WARNING | CRITICAL | Description |
|---|---|---|---|
| Memory Usage | over 75% | over 90% | used_memory relative to maxmemory |
| Memory Fragmentation Ratio | over 1.5 | over 2.0 | mem_fragmentation_ratio |
| Connection Count | over 5,000 | over 8,000 | connected_clients |
| Cache Hit Rate | under 90% | under 80% | keyspace_hits / (hits + misses) |
| Slow Log Frequency | over 10/min | over 50/min | Commands taking over 10ms |
| Replication Lag | over 1MB | over 10MB | master_repl_offset difference |
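The hit-rate and fragmentation rows above translate directly into code. A sketch of threshold classifiers using the same WARNING/CRITICAL boundaries as the table (function names are illustrative):

```python
def classify_hit_rate(hits: int, misses: int) -> str:
    """Classify cache hit rate: keyspace_hits / (hits + misses)."""
    total = hits + misses
    if total == 0:
        return "OK"  # no traffic yet, nothing to alert on
    rate = hits / total
    if rate < 0.80:
        return "CRITICAL"
    if rate < 0.90:
        return "WARNING"
    return "OK"

def classify_fragmentation(ratio: float) -> str:
    """Classify mem_fragmentation_ratio against the table's thresholds."""
    if ratio > 2.0:
        return "CRITICAL"
    if ratio > 1.5:
        return "WARNING"
    return "OK"
```

These plug straight into a monitoring loop such as check_cluster_health above, replacing hard-coded comparisons with named thresholds.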
Troubleshooting
Memory Fragmentation Issues
When mem_fragmentation_ratio exceeds 1.5, memory fragmentation is severe. It is typically caused by frequent creation and deletion of keys of varying sizes; enabling activedefrag (available on jemalloc builds) lets Redis reclaim the space at runtime.
# Check current fragmentation ratio
redis-cli INFO memory | grep mem_fragmentation_ratio
# Enable Active Defrag at runtime
redis-cli CONFIG SET activedefrag yes
redis-cli CONFIG SET active-defrag-threshold-lower 10
# Check defragmentation progress
redis-cli INFO memory | grep -E "active_defrag"
Big Key Detection
Excessive data stored in a single key can cause network latency, memory spikes, and slow log increases.
# Big key scan (SCAN-based, safe for production)
redis-cli --bigkeys
# Exact memory usage of a specific key
redis-cli MEMORY USAGE large:hash SAMPLES 0
# Asynchronous deletion of big keys (UNLINK = non-blocking DEL)
redis-cli UNLINK large:hash
Repeated MOVED Redirections
If a client keeps receiving MOVED responses, it is sending requests to nodes that do not own the key's slot. Verify that the client library supports cluster mode; standalone-mode clients do not perform slot routing.
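A cluster-aware client handles this transparently: it parses the MOVED error, refreshes its slot map, and retries against the indicated node. A minimal sketch of that parsing step (parse_moved is a hypothetical helper, not a redis-py API):

```python
def parse_moved(error: str) -> tuple:
    """Parse a 'MOVED <slot> <host>:<port>' error into (slot, host, port)."""
    kind, slot, addr = error.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED redirection: {error!r}")
    host, _, port = addr.rpartition(":")  # rpartition tolerates IPv6-style hosts
    return int(slot), host, int(port)

# A cluster-aware client would now update its slot table, e.g.
# slot_map[3999] = ("192.168.1.10", 7002), and retry the command there.
```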
Replica Synchronization Failure
If a replica keeps falling back to a full resync, increase repl-backlog-size. The default is 1MB, which should be expanded to 64MB-256MB in high-traffic environments so that partial resyncs can succeed after brief disconnections.
# redis.conf
repl-backlog-size 256mb
repl-backlog-ttl 3600
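A rough sizing rule: the backlog must retain at least write-throughput x longest-expected-disconnection bytes, or reconnecting replicas fall back to a full resync. An illustrative calculation (min_backlog_bytes is a hypothetical helper, the 2x headroom an assumption):

```python
def min_backlog_bytes(write_bytes_per_sec: int, max_disconnect_sec: int,
                      safety_factor: float = 2.0) -> int:
    """Smallest repl-backlog-size that still permits a partial resync
    after a disconnection, with a safety margin for bursts."""
    return int(write_bytes_per_sec * max_disconnect_sec * safety_factor)

# e.g. 1 MiB/s of writes, replicas may drop for up to 60 s:
needed = min_backlog_bytes(1 * 1024 * 1024, 60)  # 120 MiB with 2x headroom
```

For that example workload the 256mb setting above is comfortably sufficient.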
Disaster Recovery Procedures
Automatic Failover
Redis Cluster automatically detects node failures and promotes replicas to masters. A node that does not respond within cluster-node-timeout (default 15000ms) is marked PFAIL by its peers; once a majority of masters agree, it transitions to FAIL, and one of the failed master's replicas is automatically promoted.
# Check current cluster node status
redis-cli CLUSTER NODES
# Check if a specific node is in FAIL state
redis-cli CLUSTER INFO | grep cluster_state
# Manual failover (executed from a replica)
redis-cli -p 7003 CLUSTER FAILOVER
# Force failover (when master is unresponsive)
redis-cli -p 7003 CLUSTER FAILOVER FORCE
Node Replacement Procedure
- Remove the failed node:
redis-cli --cluster del-node <cluster-ip:port> <node-id>
- Start a new instance and join it to the cluster:
redis-cli --cluster add-node <new-ip:port> <cluster-ip:port>
- Assign it as a replica:
redis-cli -p <new-port> CLUSTER REPLICATE <master-node-id>
- Verify proper slot distribution:
redis-cli --cluster check <cluster-ip:port>
RDB/AOF-Based Data Recovery
# Verify AOF integrity (for Multi-Part AOF, point redis-check-aof at the manifest)
redis-check-aof --fix /opt/redis-cluster/7000/appendonlydir/appendonly.aof.manifest
# Verify RDB file integrity
redis-check-rdb /opt/redis-cluster/7000/dump.rdb
# Restore from backup: stop instance then replace RDB/AOF files
redis-cli -p 7000 SHUTDOWN NOSAVE
cp /backup/dump.rdb /opt/redis-cluster/7000/dump.rdb
redis-server /opt/redis-cluster/7000/redis.conf
Operations Checklist
A checklist for stable production Redis cluster operations.
Pre-Deployment Checks
- Are maxmemory and maxmemory-policy configured appropriately for the workload?
- Is the cluster node count at least 6 (3 masters + 3 replicas)?
- Is each node's memory at least 2x the actual data size (considering copy-on-write during fork)?
- Are ACL rules configured following the principle of least privilege?
- Do network bandwidth and latency meet cluster requirements?
Daily Checks
- Is memory usage below 75%?
- Is mem_fragmentation_ratio below 1.5?
- Are there any abnormal patterns in the Slow Log?
- Is replication lag within the acceptable range?
- Is the cache hit rate maintained at 90% or above?
Weekly Checks
- Have big key scans identified any abnormally sized keys?
- Have unused keys (based on idle time) been cleaned up?
- Has RDB/AOF backup file integrity been verified?
- Is cluster slot distribution even?
- Have security patches and Redis minor version updates been reviewed?
Monthly Checks
- Is the maxmemory setting appropriate for data growth trends?
- Are encoding thresholds (listpack entries/value) optimal for current data patterns?
- Has the disaster recovery procedure been validated through failover drills?
- Is the client library version up to date?
- Has capacity planning determined whether scaling is needed within 3-6 months?
References
- Redis 7.0 Official Release Blog - Redis 7 key features overview
- Redis Cluster Official Spec Documentation - Hash slots, failover mechanism details
- Redis Memory Optimization Official Guide - Encoding optimization, memory saving techniques
- Redis Persistence Official Documentation - RDB, AOF, hybrid mode configuration guide
- Redis Key Eviction Official Documentation - Detailed maxmemory-policy explanation
- Valkey vs KeyDB vs Redis Comparison Guide (2026) - In-memory data store comparison
- Redis Cluster Scaling Tutorial - Cluster scaling and hash slot management