
- Overview
- Key Changes in Redis 7
- Cluster Architecture
- Hash Slots and Rebalancing
- Memory Optimization Strategies
- Per-Data-Structure Encoding Optimization
- Persistence (RDB/AOF) Configuration
- Monitoring and Alerting
- Troubleshooting
- Disaster Recovery Procedures
- Operations Checklist
- References
Overview
Redis is the leading in-memory data store, widely used for caching, session management, real-time analytics, and message brokering. Redis 7 added features such as Functions, ACL v2, and Multi-Part AOF, significantly improving operational stability and programmability. In 2025, the move to AGPLv3 licensing brought major changes to the ecosystem: alternatives such as Valkey and KeyDB have gained traction, diversifying the options for in-memory data stores.
This handbook covers the entire production operations process for Redis 7.x-based clusters, from architecture design to hash slot rebalancing, memory optimization strategies, per-data-structure encoding tuning, persistence configuration, monitoring, and disaster recovery. Each section includes practical commands and configuration examples for immediate application.
Key Changes in Redis 7
Redis Functions
Functions, introduced in Redis 7, are a first-class programming model that replaces EVAL-based Lua scripting. Functions are persisted to RDB and AOF files and automatically replicated from master to replicas. Multiple functions can be defined in a single library, enabling high code reusability.
#!lua name=mylib
-- Function that atomically increments view count and records view history
redis.register_function('increment_view', function(keys, args)
local current = redis.call('HINCRBY', keys[1], 'views', 1)
redis.call('ZADD', keys[1] .. ':history', redis.call('TIME')[1], args[1])
return current
end)
# Load the library
cat mylib.lua | redis-cli -x FUNCTION LOAD REPLACE
# Call the function
redis-cli FCALL increment_view 1 article:1001 "user:42"
# List registered functions
redis-cli FUNCTION LIST
ACL v2
ACL v2 adds the ability to finely control read/write permissions at the key level. It introduces the Selector concept, allowing multiple rule sets to be assigned to a single user. The root selector is evaluated first, followed by additional selectors in order.
# Cache-only user: read/write on cache:* keys, read-only on session:* keys
redis-cli ACL SETUSER cache_worker on >StrongP@ss123 \
~cache:* +@all \
(%R~session:* +@read)
# Verify ACL rules
redis-cli ACL GETUSER cache_worker
# Persist the current ACL rules (requires aclfile to be configured)
redis-cli ACL SAVE
Multi-Part AOF
Previously, the AOF rewrite process replaced a single large file. Redis 7's Multi-Part AOF stores base files (full data) and incr files (incremental data) separately in a dedicated directory. This reduces disk space waste and systematizes AOF history management.
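With the Redis 7 defaults (appendfilename "appendonly.aof", directory "appendonlydir"), a node's AOF directory looks like the sketch below; the sequence number in each file name advances when a rewrite creates a new base/incr pair:

```
appendonlydir/
├── appendonly.aof.1.base.rdb    # base file: full dataset (RDB format when aof-use-rdb-preamble is yes)
├── appendonly.aof.1.incr.aof    # incr file: write commands appended since the base
└── appendonly.aof.manifest      # manifest tracking the currently active base/incr set
```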
Cluster Architecture
Redis Cluster is a distributed architecture that shards data across multiple nodes for horizontal scalability. Each master node is responsible for a subset of the 16,384 hash slots, and clients access the appropriate node directly via the CRC16 hash of the key.
Minimum 6-Node Cluster Configuration
The recommended minimum configuration for production environments is 3 masters + 3 replicas for a total of 6 nodes.
# Configure 6 Redis instances (ports 7000-7005)
for port in 7000 7001 7002 7003 7004 7005; do
mkdir -p /opt/redis-cluster/${port}
cat > /opt/redis-cluster/${port}/redis.conf << CONF
port ${port}
cluster-enabled yes
cluster-config-file nodes-${port}.conf
cluster-node-timeout 5000
appendonly yes
appendfilename "appendonly.aof"
dir /opt/redis-cluster/${port}
maxmemory 4gb
maxmemory-policy allkeys-lfu
# Open bind is for lab setups only: restrict bind and enable auth in production
bind 0.0.0.0
protected-mode no
save 3600 1 300 100 60 10000
aof-use-rdb-preamble yes
CONF
redis-server /opt/redis-cluster/${port}/redis.conf &
done
# Create cluster (3 masters + 3 replicas)
redis-cli --cluster create \
192.168.1.10:7000 192.168.1.10:7001 192.168.1.10:7002 \
192.168.1.10:7003 192.168.1.10:7004 192.168.1.10:7005 \
--cluster-replicas 1
Redis vs Valkey vs KeyDB Comparison
Following Redis's license changes (from BSD 3-Clause to dual RSALv2/SSPLv1 in 2024, with AGPLv3 added in 2025), interest in Valkey and KeyDB has increased. All three projects are Redis protocol-compatible, so existing clients can be used as-is.
| Category | Redis 7.x / 8.x | Valkey 8.x | KeyDB |
|---|---|---|---|
| License | AGPLv3 (2025~) | BSD 3-Clause | BSD 3-Clause |
| Threading Model | Single-threaded event loop (I/O threads supported) | Enhanced I/O multi-threading | Native multi-threading |
| Throughput | Baseline | Similar to or slightly above Redis | 2-5x improvement with multi-core |
| Governance | Redis Ltd. | Linux Foundation | Snap Inc. |
| Managed Cloud | Redis Cloud, AWS ElastiCache | AWS ElastiCache, Google MemoryStore | Limited |
| Functions Support | Supported since 7.0 | Compatible support | Not supported (Lua only) |
| Best For | Stability, rich ecosystem | Open-source governance priority | High concurrent throughput needs |
Hash Slots and Rebalancing
All keys in Redis Cluster are mapped to hash slots using the formula CRC16(key) mod 16384. The choice of 16,384 slots balances fine-grained control over key distribution against cluster metadata overhead (each heartbeat message carries a slot bitmap). In theory a cluster can scale to 16,384 master nodes, but the practical recommended upper limit is approximately 1,000.
Hash Tags for Slot Control
For multi-key operations (MGET, pipelining, etc.), related keys must be in the same slot. Hash tag syntax allows only a specific part to be used for hash calculation.
# Place in the same slot: only the {user:1000} part is used for hash calculation
redis-cli SET "{user:1000}.profile" '{"name":"Kim"}'
redis-cli SET "{user:1000}.settings" '{"theme":"dark"}'
redis-cli SET "{user:1000}.cart" '["item1","item2"]'
# Verify they are in the same slot
redis-cli CLUSTER KEYSLOT "{user:1000}.profile"
redis-cli CLUSTER KEYSLOT "{user:1000}.settings"
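The mapping can be reproduced offline. Below is a minimal Python sketch of the CRC16 (XMODEM variant) and hash-tag rules described in the cluster spec; it mirrors what CLUSTER KEYSLOT does, but is illustrative rather than the server's implementation:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): poly 0x1021, init 0, MSB-first, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """Map a key to one of the 16,384 slots, honoring {hash tag} syntax."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the tag content
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

# Keys sharing a hash tag land in the same slot
assert key_hash_slot("{user:1000}.profile") == key_hash_slot("{user:1000}.cart")
```

The cluster spec's reference value, CRC16("123456789") = 0x31C3, is a quick sanity check for the crc16 routine.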
Slot Rebalancing
When adding or removing nodes, slots must be redistributed. This can be performed during live traffic, but it is best to avoid peak times.
# Add a new node to the cluster
redis-cli --cluster add-node 192.168.1.10:7006 192.168.1.10:7000
# Auto rebalancing: distribute slots evenly across all masters
redis-cli --cluster rebalance 192.168.1.10:7000
# Manually move slots from one node to another
redis-cli --cluster reshard 192.168.1.10:7000 \
--cluster-from <source-node-id> \
--cluster-to <target-node-id> \
--cluster-slots 1000 \
--cluster-yes
# Check cluster status
redis-cli --cluster check 192.168.1.10:7000
redis-cli --cluster info 192.168.1.10:7000
Memory Optimization Strategies
Redis stores all data in memory, so memory efficiency directly translates to cost. With systematic memory optimization, you can handle more data on the same hardware.
maxmemory-policy Comparison
This policy determines which keys to evict when the memory limit is reached. Redis 7 provides 8 policies.
| Policy | Target Scope | Algorithm | Best For |
|---|---|---|---|
| noeviction | None | No eviction (returns write errors) | Environments where data loss is unacceptable |
| allkeys-lru | All keys | Evict least recently used keys | General-purpose caching |
| allkeys-lfu | All keys | Evict least frequently used keys | Cache with clear hot/cold patterns |
| allkeys-random | All keys | Random eviction | Uniform access patterns |
| volatile-lru | Keys with TTL set | Evict least recently used keys | Mixed cache + persistent data |
| volatile-lfu | Keys with TTL set | Evict least frequently used keys | Hot/cold separation among TTL data |
| volatile-random | Keys with TTL set | Random eviction | Uniform pattern among TTL keys |
| volatile-ttl | Keys with TTL set | Evict keys closest to expiration | Short-lived cache eviction priority |
For most cache-only environments, allkeys-lfu is optimal. For environments where cache and persistent data coexist, use volatile-lru and ensure all cache keys have a TTL set.
Analysis Using MEMORY Commands
# Check memory usage of a specific key (in bytes)
redis-cli MEMORY USAGE user:profile:10001
# (integer) 256
# For nested structures, specify sample count (0 for full traversal)
redis-cli MEMORY USAGE myhash SAMPLES 0
# Memory diagnostic report (returns human-readable advice, or a note that no issues were found)
redis-cli MEMORY DOCTOR
# Memory statistics summary
redis-cli MEMORY STATS
# Full server memory information
redis-cli INFO memory
redis.conf Memory Tuning Configuration
# Maximum memory and eviction policy
maxmemory 4gb
maxmemory-policy allkeys-lfu
# LFU algorithm tuning
# lfu-log-factor: higher values slow counter increments (default 10)
# lfu-decay-time: counter decay period in minutes (default 1)
lfu-log-factor 10
lfu-decay-time 1
# Enable memory defragmentation (requires a jemalloc build)
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 1
active-defrag-cycle-max 25
# Lazy Free settings: background processing for large key deletions
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush yes
# I/O thread settings (available since Redis 6)
io-threads 4
io-threads-do-reads yes
Per-Data-Structure Encoding Optimization
Redis automatically switches internal encodings based on data size. Since Redis 7, listpack replaces the older ziplist, trimming the header from 11 bytes to 6 for better memory efficiency. Proper threshold configuration can achieve 5-10x memory savings.
| Data Structure | Small Encoding | Large Encoding | Transition Setting |
|---|---|---|---|
| Hash | listpack | hashtable | hash-max-listpack-entries (128), hash-max-listpack-value (64) |
| List | listpack | quicklist | list-max-listpack-size (-2) |
| Set | listpack / intset | hashtable | set-max-listpack-entries (128), set-max-intset-entries (512) |
| Sorted Set | listpack | skiplist + hashtable | zset-max-listpack-entries (128), zset-max-listpack-value (64) |
| String | int / embstr | raw | Automatic (44-byte boundary) |
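The transition rules for hashes can be sketched as a small predicate. This models only the two hash thresholds from the table (defaults assumed) and counts characters where Redis compares byte lengths, so treat it as illustrative:

```python
HASH_MAX_LISTPACK_ENTRIES = 128  # hash-max-listpack-entries default
HASH_MAX_LISTPACK_VALUE = 64     # hash-max-listpack-value default (bytes)

def hash_encoding(fields: dict) -> str:
    """Predict the encoding Redis would choose for a hash with these fields."""
    if len(fields) > HASH_MAX_LISTPACK_ENTRIES:
        return "hashtable"  # too many entries for the compact form
    if any(len(k) > HASH_MAX_LISTPACK_VALUE or len(v) > HASH_MAX_LISTPACK_VALUE
           for k, v in fields.items()):
        return "hashtable"  # a field or value exceeds the size threshold
    return "listpack"

small = {f"f{i}": "v" for i in range(100)}   # stays compact
big = {f"f{i}": "v" for i in range(129)}     # exceeds the entry threshold
```

This mirrors the OBJECT ENCODING experiment in the next section: 129 fields tips a hash from listpack to hashtable.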
# redis.conf: encoding threshold adjustment examples
# Workloads with many small hashes: increase thresholds to maintain listpack
hash-max-listpack-entries 256
hash-max-listpack-value 128
# Ranking service with many small sorted sets
zset-max-listpack-entries 256
zset-max-listpack-value 64
# Tag system with many small sets
set-max-listpack-entries 256
set-max-intset-entries 1024
# Maximum size per list node (-2 = 8KB, -1 = 4KB)
list-max-listpack-size -2
list-compress-depth 1
Encoding Verification and Structure Transition Testing
# Check current encoding
redis-cli OBJECT ENCODING mykey
# Small hash (listpack)
redis-cli HSET small:hash f1 v1 f2 v2 f3 v3
redis-cli OBJECT ENCODING small:hash
# "listpack"
# Automatic transition to hashtable when threshold is exceeded
# Create a hash with 129 fields (exceeding default hash-max-listpack-entries=128)
for i in $(seq 1 129); do
redis-cli HSET big:hash "field_${i}" "value_${i}"
done
redis-cli OBJECT ENCODING big:hash
# "hashtable"
# Set composed of only integers: intset encoding
redis-cli SADD int:set 1 2 3 4 5
redis-cli OBJECT ENCODING int:set
# "intset"
Persistence (RDB/AOF) Configuration
Since Redis is an in-memory database, a proper persistence strategy is essential for data recovery in case of failure. Redis 7's Multi-Part AOF significantly improved the persistence mechanism.
RDB vs AOF vs Hybrid Mode Comparison
| Category | RDB | AOF | Hybrid (RDB + AOF) |
|---|---|---|---|
| Storage Method | Periodic snapshots | Log of all write commands | RDB preamble + AOF incremental |
| File Size | Small (binary compressed) | Large (command text) | Medium |
| Data Loss Risk | Loss since last snapshot | Minimized by fsync setting | Minimal |
| Recovery Speed | Fast | Slow (command replay) | Fast |
| Fork Overhead | Fork per snapshot | Fork only during rewrite | Fork during rewrite |
| Disk I/O | Low (intermittent) | High (continuous) | Medium |
| Best For | Backup, disaster recovery | Zero data loss required | Production recommended |
Production Recommended Persistence Configuration
# redis.conf: Hybrid mode (production recommended)
# RDB snapshot intervals: 3600s (1 hour) with 1 change, 300s with 100 changes, 60s with 10000 changes
save 3600 1 300 100 60 10000
# Enable AOF
appendonly yes
appendfilename "appendonly.aof"
# AOF fsync policy: everysec (balance between performance and durability)
appendfsync everysec
# Enable hybrid mode: use RDB format as preamble during AOF rewrite
aof-use-rdb-preamble yes
# Multi-Part AOF directory (Redis 7 default)
appenddirname "appendonlydir"
# AOF rewrite trigger conditions
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
# RDB compression and checksum
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
# Stop writes on background save failure
stop-writes-on-bgsave-error yes
Monitoring and Alerting
For stable Redis cluster operations, core metrics must be collected in real-time with threshold-based alerting configured.
Core Monitoring Metrics
# Check memory usage
redis-cli INFO memory | grep -E "used_memory_human|used_memory_rss_human|mem_fragmentation_ratio"
# used_memory_human:2.85G
# used_memory_rss_human:3.12G
# mem_fragmentation_ratio:1.09
# Check cluster status
redis-cli CLUSTER INFO
# cluster_state:ok
# cluster_slots_assigned:16384
# cluster_slots_ok:16384
# cluster_known_nodes:6
# cluster_size:3
# Commands processed per second
redis-cli INFO stats | grep instantaneous_ops_per_sec
# Check connection count
redis-cli INFO clients | grep connected_clients
# Keyspace hit rate (cache efficiency)
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"
# Check Slow Log (commands taking over 10ms)
redis-cli SLOWLOG GET 10
redis-cli SLOWLOG LEN
Python-Based Monitoring Script
from redis.cluster import RedisCluster, ClusterNode

# Redis Cluster connection (redis-py >= 4.1 expects ClusterNode objects)
startup_nodes = [
    ClusterNode("192.168.1.10", 7000),
    ClusterNode("192.168.1.10", 7001),
    ClusterNode("192.168.1.10", 7002),
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    password="your_secure_password",
    socket_timeout=5,
    retry_on_timeout=True,
)


def check_cluster_health():
    """Checks overall cluster status and returns a list of alert strings."""
    alerts = []

    # Check cluster state
    cluster_info = rc.cluster_info()
    if cluster_info.get("cluster_state") != "ok":
        alerts.append(f"[CRITICAL] Cluster state abnormal: {cluster_info.get('cluster_state')}")

    # Check memory usage per node
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            used = info["used_memory"]
            maxmem = info.get("maxmemory", 0)
            if maxmem > 0:
                usage_pct = (used / maxmem) * 100
                if usage_pct > 85:
                    alerts.append(
                        f"[WARNING] Node {node.host}:{node.port} "
                        f"memory usage {usage_pct:.1f}%"
                    )

            # Check memory fragmentation ratio
            frag_ratio = info.get("mem_fragmentation_ratio", 1.0)
            if frag_ratio > 1.5:
                alerts.append(
                    f"[WARNING] Node {node.host}:{node.port} "
                    f"memory fragmentation ratio {frag_ratio:.2f}"
                )
        except Exception as e:
            alerts.append(f"[ERROR] Node {node.host}:{node.port} connection failed: {e}")
    return alerts


def get_memory_report():
    """Generates a per-node memory usage report."""
    report = []
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            report.append({
                "node": f"{node.host}:{node.port}",
                "role": node.server_type,
                "used_memory_human": info["used_memory_human"],
                "used_memory_rss_human": info["used_memory_rss_human"],
                "fragmentation_ratio": info["mem_fragmentation_ratio"],
                "used_memory_peak_human": info["used_memory_peak_human"],
            })
        except Exception as e:
            report.append({"node": f"{node.host}:{node.port}", "error": str(e)})
    return report


if __name__ == "__main__":
    # Cluster health check
    alerts = check_cluster_health()
    if alerts:
        for alert in alerts:
            print(alert)
        # send_to_slack(alerts) or send_to_pagerduty(alerts)
    else:
        print("All nodes healthy")

    # Memory report output
    for item in get_memory_report():
        print(item)
Alert Threshold Standards
| Metric | WARNING | CRITICAL | Description |
|---|---|---|---|
| Memory Usage | over 75% | over 90% | used_memory relative to maxmemory |
| Memory Fragmentation Ratio | over 1.5 | over 2.0 | mem_fragmentation_ratio |
| Connection Count | over 5,000 | over 8,000 | connected_clients |
| Cache Hit Rate | under 90% | under 80% | keyspace_hits / (hits + misses) |
| Slow Log Frequency | over 10/min | over 50/min | Commands taking over 10ms |
| Replication Lag | over 1MB | over 10MB | master_repl_offset difference |
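The hit-rate and fragmentation rows above translate directly into code. A sketch of threshold classifiers using the same WARNING/CRITICAL boundaries as the table (function names are illustrative):

```python
def classify_hit_rate(hits: int, misses: int) -> str:
    """Classify cache hit rate: keyspace_hits / (hits + misses)."""
    total = hits + misses
    if total == 0:
        return "OK"  # no traffic yet, nothing to alert on
    rate = hits / total
    if rate < 0.80:
        return "CRITICAL"
    if rate < 0.90:
        return "WARNING"
    return "OK"

def classify_fragmentation(ratio: float) -> str:
    """Classify mem_fragmentation_ratio against the table's thresholds."""
    if ratio > 2.0:
        return "CRITICAL"
    if ratio > 1.5:
        return "WARNING"
    return "OK"
```

These plug straight into a monitoring loop such as check_cluster_health above, replacing hard-coded comparisons with named thresholds.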
Troubleshooting
Memory Fragmentation Issues
When mem_fragmentation_ratio exceeds 1.5, memory fragmentation is severe. It is typically caused by frequent creation and deletion of keys of varying sizes; enabling activedefrag (available on jemalloc builds) lets Redis reclaim the space at runtime.
# Check current fragmentation ratio
redis-cli INFO memory | grep mem_fragmentation_ratio
# Enable Active Defrag at runtime
redis-cli CONFIG SET activedefrag yes
redis-cli CONFIG SET active-defrag-threshold-lower 10
# Check defragmentation progress
redis-cli INFO memory | grep -E "active_defrag"
Big Key Detection
Excessive data stored in a single key can cause network latency, memory spikes, and slow log increases.
# Big key scan (SCAN-based, safe for production)
redis-cli --bigkeys
# Exact memory usage of a specific key
redis-cli MEMORY USAGE large:hash SAMPLES 0
# Asynchronous deletion of big keys (UNLINK = non-blocking DEL)
redis-cli UNLINK large:hash
Repeated MOVED Redirections
If a client keeps receiving MOVED responses, it is sending requests to nodes that do not own the key's slot. Verify that the client library supports cluster mode; standalone-mode clients do not perform slot routing.
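A cluster-aware client handles this transparently: it parses the MOVED error, refreshes its slot map, and retries against the indicated node. A minimal sketch of that parsing step (parse_moved is a hypothetical helper, not a redis-py API):

```python
def parse_moved(error: str) -> tuple:
    """Parse a 'MOVED <slot> <host>:<port>' error into (slot, host, port)."""
    kind, slot, addr = error.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED redirection: {error!r}")
    host, _, port = addr.rpartition(":")  # rpartition tolerates IPv6-style hosts
    return int(slot), host, int(port)

# A cluster-aware client would now update its slot table, e.g.
# slot_map[3999] = ("192.168.1.10", 7002), and retry the command there.
```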
Replica Synchronization Failure
If a replica keeps falling back to a full resync, increase repl-backlog-size. The default is 1MB, which should be expanded to 64MB-256MB in high-traffic environments so that partial resyncs can succeed after brief disconnections.
# redis.conf
repl-backlog-size 256mb
repl-backlog-ttl 3600
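A rough sizing rule: the backlog must retain at least write-throughput x longest-expected-disconnection bytes, or reconnecting replicas fall back to a full resync. An illustrative calculation (min_backlog_bytes is a hypothetical helper, the 2x headroom an assumption):

```python
def min_backlog_bytes(write_bytes_per_sec: int, max_disconnect_sec: int,
                      safety_factor: float = 2.0) -> int:
    """Smallest repl-backlog-size that still permits a partial resync
    after a disconnection, with a safety margin for bursts."""
    return int(write_bytes_per_sec * max_disconnect_sec * safety_factor)

# e.g. 1 MiB/s of writes, replicas may drop for up to 60 s:
needed = min_backlog_bytes(1 * 1024 * 1024, 60)  # 120 MiB with 2x headroom
```

For that example workload the 256mb setting above is comfortably sufficient.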
Disaster Recovery Procedures
Automatic Failover
Redis Cluster automatically detects node failures and promotes replicas to masters. A node that does not respond within cluster-node-timeout (default 15000ms) is marked PFAIL by its peers; once a majority of masters agree, it transitions to FAIL, and one of the failed master's replicas is automatically promoted.
# Check current cluster node status
redis-cli CLUSTER NODES
# Check if a specific node is in FAIL state
redis-cli CLUSTER INFO | grep cluster_state
# Manual failover (executed from a replica)
redis-cli -p 7003 CLUSTER FAILOVER
# Force failover (when master is unresponsive)
redis-cli -p 7003 CLUSTER FAILOVER FORCE
Node Replacement Procedure
- Remove the failed node:
redis-cli --cluster del-node <cluster-ip:port> <node-id>
- Start a new instance and join it to the cluster:
redis-cli --cluster add-node <new-ip:port> <cluster-ip:port>
- Assign it as a replica:
redis-cli -p <new-port> CLUSTER REPLICATE <master-node-id>
- Verify proper slot distribution:
redis-cli --cluster check <cluster-ip:port>
RDB/AOF-Based Data Recovery
# Verify AOF integrity (for Multi-Part AOF, point redis-check-aof at the manifest)
redis-check-aof --fix /opt/redis-cluster/7000/appendonlydir/appendonly.aof.manifest
# Verify RDB file integrity
redis-check-rdb /opt/redis-cluster/7000/dump.rdb
# Restore from backup: stop instance then replace RDB/AOF files
redis-cli -p 7000 SHUTDOWN NOSAVE
cp /backup/dump.rdb /opt/redis-cluster/7000/dump.rdb
redis-server /opt/redis-cluster/7000/redis.conf
Operations Checklist
A checklist for stable production Redis cluster operations.
Pre-Deployment Checks
- Are maxmemory and maxmemory-policy configured appropriately for the workload?
- Is the cluster node count at least 6 (3 masters + 3 replicas)?
- Is each node's memory at least 2x the actual data size (considering copy-on-write during fork)?
- Are ACL rules configured following the principle of least privilege?
- Do network bandwidth and latency meet cluster requirements?
Daily Checks
- Is memory usage below 75%?
- Is mem_fragmentation_ratio below 1.5?
- Are there any abnormal patterns in the Slow Log?
- Is replication lag within the acceptable range?
- Is the cache hit rate maintained at 90% or above?
Weekly Checks
- Have big key scans identified any abnormally sized keys?
- Have unused keys (based on idle time) been cleaned up?
- Has RDB/AOF backup file integrity been verified?
- Is cluster slot distribution even?
- Have security patches and Redis minor version updates been reviewed?
Monthly Checks
- Is the maxmemory setting appropriate for data growth trends?
- Are encoding thresholds (listpack entries/value) optimal for current data patterns?
- Has the disaster recovery procedure been validated through failover drills?
- Is the client library version up to date?
- Has capacity planning determined whether scaling is needed within 3-6 months?
References
- Redis 7.0 Official Release Blog - Redis 7 key features overview
- Redis Cluster Official Spec Documentation - Hash slots, failover mechanism details
- Redis Memory Optimization Official Guide - Encoding optimization, memory saving techniques
- Redis Persistence Official Documentation - RDB, AOF, hybrid mode configuration guide
- Redis Key Eviction Official Documentation - Detailed maxmemory-policy explanation
- Valkey vs KeyDB vs Redis Comparison Guide (2026) - In-memory data store comparison
- Redis Cluster Scaling Tutorial - Cluster scaling and hash slot management