Redis 7 Cluster Operations and Memory Optimization Handbook 2026

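Overview
Key Changes in Redis 7
Cluster Architecture
- Minimum 6-Node Cluster Configuration
- Redis vs Valkey vs KeyDB Comparison
Hash Slots and Rebalancing
- Hash Tags for Slot Control
- Slot Rebalancing
Memory Optimization Strategies
Per-Data-Structure Encoding Optimization
- Encoding Verification and Structure Transition Testing
Persistence (RDB/AOF) Configuration
- RDB vs AOF vs Hybrid Mode Comparison
- Production Recommended Persistence Configuration
Monitoring and Alerting
Troubleshooting
Disaster Recovery Procedures
Operations Checklist
References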

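Overview

Redis is one of the most widely used in-memory data stores, serving workloads from caching and session management to real-time analytics and message brokering. Redis 7 added Functions, ACL v2, and Multi-Part AOF, which strengthen both operational stability and programmability. Licensing has also shifted: Redis moved from BSD to RSALv2/SSPLv1 in 2024 and added AGPLv3 as an option in 2025, and alternatives such as Valkey and KeyDB have gained attention as a result.

This handbook covers the full production workflow for Redis 7.x clusters: architecture design, hash slot rebalancing, memory optimization, per-data-structure encoding tuning, persistence configuration, monitoring, and disaster recovery. Each section includes practical commands and configuration examples that can be applied directly.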

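Key Changes in Redis 7

Redis Functions

Functions, introduced in Redis 7, make server-side scripts first-class, persistent objects and are intended to supersede ad hoc EVAL-based Lua scripting (EVAL remains supported). Function libraries are persisted in the RDB and AOF files and are replicated automatically from master to replica. A single library can define multiple functions, which makes code easier to reuse.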

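#!lua name=mylib

-- Atomically increments the view count and records the viewer.
-- Both keys are passed explicitly and share a hash tag, so the
-- function stays valid in cluster mode.
redis.register_function('increment_view', function(keys, args)
  local current = redis.call('HINCRBY', keys[1], 'views', 1)
  redis.call('ZADD', keys[2], redis.call('TIME')[1], args[1])
  return current
end)

# Load the library
cat mylib.lua | redis-cli -x FUNCTION LOAD REPLACE

# Call the function (2 keys, both in the {article:1001} slot)
redis-cli FCALL increment_view 2 "{article:1001}" "{article:1001}:history" "user:42"

# List registered functions
redis-cli FUNCTION LIST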

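ACL v2

ACL v2 adds fine-grained, key-level control over read and write permissions. It introduces selectors, which let a single user carry several independent rule sets. The root selector is evaluated first, followed by any additional selectors in order.

# Cache-only user: read/write on cache:* keys, read-only on session:* keys
# (arguments are quoted so the shell does not interpret >, ~, * or parentheses)
redis-cli ACL SETUSER cache_worker on '>StrongP@ss123' \
  '~cache:*' '+@all' \
  '(%R~session:* +@read)'

# Verify ACL rules
redis-cli ACL GETUSER cache_worker

# Save current ACL rules to the ACL file (requires aclfile to be set in redis.conf)
redis-cli ACL SAVE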

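Multi-Part AOF

Before Redis 7, an AOF rewrite replaced one large file. Redis 7's Multi-Part AOF instead keeps a base file (a full snapshot of the data) and one or more incr files (changes since that snapshot) in a dedicated directory, tracked by a manifest file. This reduces wasted disk space during rewrites and makes AOF history easier to manage.

Cluster Architecture

Redis Cluster is a distributed architecture that shards data across multiple nodes for horizontal scalability. Each master node owns a subset of the 16,384 hash slots, and clients reach the right node directly by computing the CRC16 hash of the key.

Minimum 6-Node Cluster Configuration

The recommended minimum for production is 3 masters + 3 replicas, 6 nodes in total. The example below starts all six instances on one host for brevity; in production, place each master and its replica on different machines so that a single host failure cannot take out both.

# Configure 6 Redis instances (ports 7000-7005)
# Note: bind 0.0.0.0 with protected-mode no and no password is for lab use only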
for port in 7000 7001 7002 7003 7004 7005; do
  mkdir -p /opt/redis-cluster/${port}
  cat > /opt/redis-cluster/${port}/redis.conf << CONF
port ${port}
cluster-enabled yes
cluster-config-file nodes-${port}.conf
cluster-node-timeout 5000
appendonly yes
appendfilename "appendonly.aof"
dir /opt/redis-cluster/${port}
maxmemory 4gb
maxmemory-policy allkeys-lfu
bind 0.0.0.0
protected-mode no
save 3600 1 300 100 60 10000
aof-use-rdb-preamble yes
CONF
  redis-server /opt/redis-cluster/${port}/redis.conf &
done

# Create cluster (3 masters + 3 replicas)
redis-cli --cluster create \
  192.168.1.10:7000 192.168.1.10:7001 192.168.1.10:7002 \
  192.168.1.10:7003 192.168.1.10:7004 192.168.1.10:7005 \
  --cluster-replicas 1

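Redis vs Valkey vs KeyDB Comparison

Redis's license changes (BSD 3-Clause to RSALv2/SSPLv1 in 2024, with AGPLv3 added as an option in 2025) increased interest in Valkey and KeyDB. All three projects speak the Redis protocol, so existing clients generally work unchanged.

Category	Redis 7.x / 8.x	Valkey 8.x	KeyDB
License	BSD 3-Clause (through 7.2), RSALv2/SSPLv1 (7.4), plus AGPLv3 option (8.0, 2025~)	BSD 3-Clause	BSD 3-Clause
Threading Model	Single-threaded event loop (I/O threads supported)	Enhanced I/O multi-threading	Native multi-threading
Throughput	Baseline	Similar to or slightly above Redis	Reported 2-5x with multiple cores (workload-dependent)
Governance	Redis Ltd.	Linux Foundation	Snap Inc.
Managed Cloud	Redis Cloud, AWS ElastiCache	AWS ElastiCache, Google Cloud Memorystore	Limited
Functions Support	Supported since 7.0	Compatible support	Not supported (Lua only)
Best For	Stability, rich ecosystem	Open-source governance priority	High concurrent throughput needs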

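Hash Slots and Rebalancing

Every key in Redis Cluster maps to a hash slot via CRC16(key) mod 16384. The figure of 16,384 slots balances fine-grained key distribution against cluster metadata overhead. In theory a cluster could grow to 16,384 masters (one slot each), but the recommended practical ceiling is about 1,000 nodes.

Hash Tags for Slot Control

Multi-key operations (MGET, MSET, MULTI/EXEC transactions, Lua scripts, and so on) require all keys involved to live in the same slot. With a hash tag, only the substring between the first { and the next } is hashed, so keys that share a tag land in the same slot.

# Same slot: only the text inside the braces (user:1000) is hashed
redis-cli SET "{user:1000}.profile" '{"name":"Kim"}'
redis-cli SET "{user:1000}.settings" '{"theme":"dark"}'
redis-cli SET "{user:1000}.cart" '["item1","item2"]'

# Verify they are in the same slot
redis-cli CLUSTER KEYSLOT "{user:1000}.profile"
redis-cli CLUSTER KEYSLOT "{user:1000}.settings"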

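Slot Rebalancing

Slots must be redistributed whenever nodes are added or removed. This can be done under live traffic, but it is best to avoid peak hours.

# Add a new node to the cluster
redis-cli --cluster add-node 192.168.1.10:7006 192.168.1.10:7000

# Auto rebalancing: distribute slots evenly across all masters
# (--cluster-use-empty-masters lets the newly added, still-empty master receive slots)
redis-cli --cluster rebalance 192.168.1.10:7000 --cluster-use-empty-masters

# Manually move slots from one node to another
redis-cli --cluster reshard 192.168.1.10:7000 \
  --cluster-from <source-node-id> \
  --cluster-to <target-node-id> \
  --cluster-slots 1000 \
  --cluster-yes

# Check cluster status
redis-cli --cluster check 192.168.1.10:7000
redis-cli --cluster info 192.168.1.10:7000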

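Memory Optimization Strategies

Redis keeps all data in memory, so memory efficiency translates directly into cost. Systematic memory optimization lets the same hardware hold more data.

maxmemory-policy Comparison

maxmemory-policy determines which keys are evicted once the memory limit is reached. Redis 7 provides 8 policies.

Policy	Target Scope	Algorithm	Best For
noeviction	None	No eviction (returns write errors)	Environments where data loss is unacceptable
allkeys-lru	All keys	Evict least recently used keys	General-purpose caching
allkeys-lfu	All keys	Evict least frequently used keys	Cache with clear hot/cold patterns
allkeys-random	All keys	Random eviction	Uniform access patterns
volatile-lru	Keys with TTL set	Evict least recently used keys	Mixed cache + persistent data
volatile-lfu	Keys with TTL set	Evict least frequently used keys	Hot/cold separation among TTL data
volatile-random	Keys with TTL set	Random eviction	Uniform pattern among TTL keys
volatile-ttl	Keys with TTL set	Evict keys closest to expiration	Short-lived cache eviction priority

For most cache-only deployments, allkeys-lfu is a sound default. Where cache and persistent data share an instance, use volatile-lru and make sure every cache key has a TTL: volatile-* policies can only evict keys that carry a TTL, and when none qualify, writes fail just as under noeviction.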

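Analysis Using MEMORY Commands

# Check memory usage of a specific key (in bytes)
redis-cli MEMORY USAGE user:profile:10001
# (integer) 256

# For nested structures, specify sample count (0 for full traversal)
redis-cli MEMORY USAGE myhash SAMPLES 0

# Memory diagnostic report
redis-cli MEMORY DOCTOR
# "Hi Sam, I can't find any memory issue in your instance. ..." (healthy instance)

# Memory statistics summary
redis-cli MEMORY STATS

# Full server memory information
redis-cli INFO memory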

redis.conf Memory Tuning Configuration

# Maximum memory and eviction policy
maxmemory 4gb
maxmemory-policy allkeys-lfu

# LFU algorithm tuning
# lfu-log-factor: higher values slow counter increments (default 10)
# lfu-decay-time: counter decay period in minutes (default 1)
lfu-log-factor 10
lfu-decay-time 1

# Enable active defragmentation (requires a jemalloc build)
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 1
active-defrag-cycle-max 25

# Lazy Free settings: background processing for large key deletions
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush yes

# I/O thread settings (available since Redis 6)
io-threads 4
io-threads-do-reads yes

Per-Data-Structure Encoding Optimization

Redis switches internal encodings automatically as a value grows. Redis 7 replaced ziplist with listpack, a compact format with a 6-byte header. Keeping small collections under the thresholds below preserves the compact encoding; the Redis documentation cites savings of up to 10x (about 5x on average) for such data.

Data Structure	Small Encoding	Large Encoding	Transition Setting (default)
Hash	listpack	hashtable	hash-max-listpack-entries (512), hash-max-listpack-value (64)
List	listpack	quicklist	list-max-listpack-size (-2)
Set	listpack / intset	hashtable	set-max-listpack-entries (128), set-max-intset-entries (512)
Sorted Set	listpack	skiplist + hashtable	zset-max-listpack-entries (128), zset-max-listpack-value (64)
String	int / embstr	raw	Automatic (44-byte boundary)

# redis.conf: encoding threshold adjustment examples
# Workloads with many small hashes: raise thresholds above the defaults (512 / 64) to stay on listpack
hash-max-listpack-entries 1024
hash-max-listpack-value 128

# Ranking service with many small sorted sets
zset-max-listpack-entries 256
zset-max-listpack-value 64

# Tag system with many small sets
set-max-listpack-entries 256
set-max-intset-entries 1024

# Maximum size per list node (-2 = 8KB, -1 = 4KB)
list-max-listpack-size -2
list-compress-depth 1

Encoding Verification and Structure Transition Testing

# Check current encoding
redis-cli OBJECT ENCODING mykey

# Small hash (listpack)
redis-cli HSET small:hash f1 v1 f2 v2 f3 v3
redis-cli OBJECT ENCODING small:hash
# "listpack"

# Automatic transition to hashtable when threshold is exceeded
# Create a hash with 513 fields (exceeding the default hash-max-listpack-entries=512)
for i in $(seq 1 513); do
  redis-cli HSET big:hash "field_${i}" "value_${i}"
done
redis-cli OBJECT ENCODING big:hash
# "hashtable"

# Set composed of only integers: intset encoding
redis-cli SADD int:set 1 2 3 4 5
redis-cli OBJECT ENCODING int:set
# "intset"

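Persistence (RDB/AOF) Configuration

Because Redis is an in-memory database, recovering data after a failure depends on a sound persistence strategy. Redis 7's Multi-Part AOF significantly improved the persistence mechanism.

RDB vs AOF vs Hybrid Mode Comparison

Category	RDB	AOF	Hybrid (RDB + AOF)
Storage Method	Periodic snapshots	Log of all write commands	RDB preamble + AOF incremental
File Size	Small (binary compressed)	Large (command text)	Medium
Data Loss Risk	Loss since last snapshot	Minimized by fsync setting	Minimal
Recovery Speed	Fast	Slow (command replay)	Fast
Fork Overhead	Fork per snapshot	Fork only during rewrite	Fork during rewrite
Disk I/O	Low (intermittent)	High (continuous)	Medium
Best For	Backup, disaster recovery	Zero data loss required	Production recommended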

Production Recommended Persistence Configuration

# redis.conf: Hybrid mode (production recommended)

# RDB snapshot triggers: after 3600 s (1 hour) if at least 1 key changed, after 300 s if 100 changed, after 60 s if 10000 changed
save 3600 1 300 100 60 10000

# Enable AOF
appendonly yes
appendfilename "appendonly.aof"

# AOF fsync policy: everysec (balance between performance and durability)
appendfsync everysec

# Enable hybrid mode: use RDB format as preamble during AOF rewrite
aof-use-rdb-preamble yes

# Multi-Part AOF directory (Redis 7 default)
appenddirname "appendonlydir"

# AOF rewrite trigger conditions
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# RDB compression and checksum
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb

# Stop writes on background save failure
stop-writes-on-bgsave-error yes

Monitoring and Alerting

Stable cluster operation requires collecting core metrics in real time and alerting on thresholds.

Core Monitoring Metrics

# Check memory usage
redis-cli INFO memory | grep -E "used_memory_human|used_memory_rss_human|mem_fragmentation_ratio"
# used_memory_human:2.85G
# used_memory_rss_human:3.12G
# mem_fragmentation_ratio:1.09

# Check cluster status
redis-cli CLUSTER INFO
# cluster_state:ok
# cluster_slots_assigned:16384
# cluster_slots_ok:16384
# cluster_known_nodes:6
# cluster_size:3

# Commands processed per second
redis-cli INFO stats | grep instantaneous_ops_per_sec

# Check connection count
redis-cli INFO clients | grep connected_clients

# Keyspace hit rate (cache efficiency)
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"

# Check Slow Log: 10 most recent entries (logged when slower than slowlog-log-slower-than, default 10 ms)
redis-cli SLOWLOG GET 10
redis-cli SLOWLOG LEN

Python-Based Monitoring Script

from redis.cluster import ClusterNode, RedisCluster

# Redis Cluster connection (redis-py expects ClusterNode objects for startup_nodes)
startup_nodes = [
    ClusterNode("192.168.1.10", 7000),
    ClusterNode("192.168.1.10", 7001),
    ClusterNode("192.168.1.10", 7002),
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    password="your_secure_password",
    socket_timeout=5,
    retry_on_timeout=True,
)

def check_cluster_health():
    """Checks overall cluster status and returns a list of alert messages."""
    alerts = []

    # Check cluster state
    cluster_info = rc.cluster_info()
    if cluster_info.get("cluster_state") != "ok":
        alerts.append(f"[CRITICAL] Cluster state abnormal: {cluster_info.get('cluster_state')}")

    # Check memory usage per node
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            used = info["used_memory"]
            maxmem = info.get("maxmemory", 0)
            if maxmem > 0:
                usage_pct = (used / maxmem) * 100
                if usage_pct > 75:  # WARNING threshold from the alert table
                    alerts.append(
                        f"[WARNING] Node {node.host}:{node.port} "
                        f"memory usage {usage_pct:.1f}%"
                    )

            # Check memory fragmentation ratio
            frag_ratio = info.get("mem_fragmentation_ratio", 1.0)
            if frag_ratio > 1.5:
                alerts.append(
                    f"[WARNING] Node {node.host}:{node.port} "
                    f"memory fragmentation ratio {frag_ratio:.2f}"
                )
        except Exception as e:
            alerts.append(f"[ERROR] Node {node.host}:{node.port} connection failed: {e}")

    return alerts


def get_memory_report():
    """Generates a per-node memory usage report."""
    report = []
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            report.append({
                "node": f"{node.host}:{node.port}",
                "role": node.server_type,
                "used_memory_human": info["used_memory_human"],
                "used_memory_rss_human": info["used_memory_rss_human"],
                "fragmentation_ratio": info["mem_fragmentation_ratio"],
                "used_memory_peak_human": info["used_memory_peak_human"],
            })
        except Exception as e:
            report.append({"node": f"{node.host}:{node.port}", "error": str(e)})
    return report


if __name__ == "__main__":
    # Cluster health check
    alerts = check_cluster_health()
    if alerts:
        for alert in alerts:
            print(alert)
        # send_to_slack(alerts) or send_to_pagerduty(alerts)
    else:
        print("All nodes healthy")

    # Memory report output
    for item in get_memory_report():
        print(item)

Alert Threshold Standards

Metric	WARNING	CRITICAL	Description
Memory Usage	> 75%	> 90%	used_memory relative to maxmemory
Memory Fragmentation Ratio	> 1.5	> 2.0	mem_fragmentation_ratio
Connection Count	> 5,000	> 8,000	connected_clients
Cache Hit Rate	< 90%	< 80%	keyspace_hits / (hits + misses)
Slow Log Frequency	> 10/min	> 50/min	Commands taking over 10ms
Replication Lag	> 1MB	> 10MB	master_repl_offset difference

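Troubleshooting

Memory Fragmentation Issues

A mem_fragmentation_ratio above 1.5 usually indicates significant allocator (jemalloc) fragmentation, typically caused by frequent creation and deletion of keys of varying sizes. Enabling activedefrag lets Redis compact memory at runtime without a restart.

# Check current fragmentation ratio
redis-cli INFO memory | grep mem_fragmentation_ratio

# Enable Active Defrag at runtime
redis-cli CONFIG SET activedefrag yes
redis-cli CONFIG SET active-defrag-threshold-lower 10

# Check defragmentation progress
redis-cli INFO memory | grep -E "active_defrag"

Big Key Detection

Excessive data stored in a single key can cause network latency, memory spikes, and more slow log entries.

# Big key scan (SCAN-based, safe for production)
redis-cli --bigkeys

# Exact memory usage of a specific key
redis-cli MEMORY USAGE large:hash SAMPLES 0

# Asynchronous deletion of big keys (UNLINK = non-blocking DEL)
redis-cli UNLINK large:hash

Repeated MOVED Redirections

If a client keeps receiving MOVED responses, it is sending requests to nodes that do not own the slot and is not following redirections. Check that the client library is running in cluster mode; a standalone-mode client does no slot routing. With redis-cli, pass -c to follow redirections.

Replica Synchronization Failure

If a replica repeatedly falls back to a full resync, the replication backlog is too small to cover its disconnections, so increase repl-backlog-size. The default is 1MB; high-traffic environments typically need 64MB-256MB.

# redis.conf
repl-backlog-size 256mb
repl-backlog-ttl 3600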

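Disaster Recovery Procedures

Automatic Failover

Redis Cluster detects node failures and promotes a replica automatically. A node that stays unreachable for longer than cluster-node-timeout (default 15000 ms) is flagged PFAIL by the node that noticed; once a majority of masters report it as failing, it is marked FAIL and one of that master's replicas is promoted.

# Check the current state of all cluster nodes
redis-cli CLUSTER NODES

# Check whether any node is flagged as failing (fail? = PFAIL, fail = FAIL)
redis-cli CLUSTER NODES | grep fail

# Check overall cluster state
redis-cli CLUSTER INFO | grep cluster_state

# Manual failover (run on the replica)
redis-cli -p 7003 CLUSTER FAILOVER

# Forced failover (when the master is unreachable)
redis-cli -p 7003 CLUSTER FAILOVER FORCE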
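
The majority rule behind the PFAIL to FAIL transition can be sketched in a few lines. This is an illustrative model only, not Redis source code: the function names are hypothetical, and it assumes that only masters vote and that the failed master itself counts toward the cluster size.

```python
def majority(n_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    return n_masters // 2 + 1


def can_mark_fail(n_masters: int, failure_reports: int) -> bool:
    """True when enough masters agree that a node is unreachable."""
    return failure_reports >= majority(n_masters)


def tolerated_master_failures(n_masters: int) -> int:
    """How many masters can be down while the rest still form a majority."""
    return n_masters - majority(n_masters)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(n, "masters ->", tolerated_master_failures(n), "failure(s) tolerated")
```

Under these assumptions a 3-master cluster tolerates one master failure, which is one reason three masters is the practical minimum.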

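Node Replacement Procedure

1. Remove the failed node: redis-cli --cluster del-node <cluster-ip:port> <node-id>
2. Start the new instance and join it to the cluster: redis-cli --cluster add-node <new-ip:port> <cluster-ip:port>
3. Assign it as a replica: redis-cli -p <new-port> CLUSTER REPLICATE <master-node-id>
4. Confirm that slots are distributed correctly: redis-cli --cluster check <cluster-ip:port>

RDB/AOF-Based Data Recovery

# Verify AOF integrity (Redis 7: check via the manifest; add --fix only to repair a truncated tail)
redis-check-aof /opt/redis-cluster/7000/appendonlydir/appendonly.aof.manifest

# Verify RDB file integrity
redis-check-rdb /opt/redis-cluster/7000/dump.rdb

# Restore from backup: stop the instance, then replace the RDB/AOF files
# Note: with appendonly yes, Redis loads the AOF rather than dump.rdb at startup,
# so for an RDB-only restore move appendonlydir aside and start with AOF disabled.
redis-cli -p 7000 SHUTDOWN NOSAVE
cp /backup/dump.rdb /opt/redis-cluster/7000/dump.rdb
mv /opt/redis-cluster/7000/appendonlydir /opt/redis-cluster/7000/appendonlydir.bak
redis-server /opt/redis-cluster/7000/redis.conf --appendonly no
# Once the data has loaded, re-enable AOF: redis-cli -p 7000 CONFIG SET appendonly yes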

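Operations Checklist

A checklist, grouped by cadence, for running a production Redis cluster reliably.

Pre-Deployment Checks

Are maxmemory and maxmemory-policy set to match the workload?
Does the cluster have at least 6 nodes (3 masters + 3 replicas)?
Does each node have at least twice the memory of its actual data (to allow for copy-on-write during fork)?
Are ACL rules configured according to the principle of least privilege?
Do network bandwidth and latency meet the cluster's requirements?

Daily Checks

Is memory usage below 75%?
Is mem_fragmentation_ratio below 1.5?
Is the Slow Log free of abnormal patterns?
Is replication lag within the acceptable range?
Is the cache hit rate holding at 90% or above?

Weekly Checks

Has a big key scan been run to find abnormally large keys?
Have unused keys (by idle time) been cleaned up?
Has the integrity of RDB/AOF backup files been verified?
Are cluster slots evenly distributed?
Have security patches and Redis minor version updates been reviewed?

Monthly Checks

Does the maxmemory setting still fit the data growth trend?
Are the encoding thresholds (listpack entries/value) still optimal for current data patterns?
Has the disaster recovery procedure been validated with a failover drill?
Are client library versions up to date?
Has capacity planning determined whether expansion is needed within 3-6 months?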

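References

Redis 7.0 official release blog - Overview of major Redis 7 features
Redis Cluster official specification - Hash slots and failover mechanism in detail
Redis memory optimization official guide - Encoding optimization and memory-saving techniques
Redis persistence official documentation - RDB, AOF, and hybrid mode configuration guide
Redis Key Eviction official documentation - Detailed description of maxmemory-policy
Valkey vs KeyDB vs Redis comparison guide (2026) - In-memory data store comparison
Redis Cluster scaling tutorial - Cluster expansion and hash slot management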

Redis 7 Cluster Operations and Memory Optimization Handbook 2026

Overview
Key Changes in Redis 7
Cluster Architecture
- Minimum 6-Node Cluster Configuration
- Redis vs Valkey vs KeyDB Comparison
Hash Slots and Rebalancing
- Hash Tags for Slot Control
- Slot Rebalancing
Memory Optimization Strategies
Per-Data-Structure Encoding Optimization
- Encoding Verification and Structure Transition Testing
Persistence (RDB/AOF) Configuration
- RDB vs AOF vs Hybrid Mode Comparison
- Production Recommended Persistence Configuration
Monitoring and Alerting
Troubleshooting
Disaster Recovery Procedures
Operations Checklist
References
Quiz

Overview

Redis is one of the most widely used in-memory data stores, serving workloads from caching and session management to real-time analytics and message brokering. Redis 7 added Functions, ACL v2, and Multi-Part AOF, which strengthen both operational stability and programmability. Licensing has also shifted: Redis moved from BSD to RSALv2/SSPLv1 in 2024 and added AGPLv3 as an option in 2025, and alternatives such as Valkey and KeyDB have gained attention as a result.

This handbook covers the entire production operations process for Redis 7.x-based clusters, from architecture design to hash slot rebalancing, memory optimization strategies, per-data-structure encoding tuning, persistence configuration, monitoring, and disaster recovery. Each section includes practical commands and configuration examples for immediate application.

Key Changes in Redis 7

Redis Functions

Functions, introduced in Redis 7, make server-side scripts first-class, persistent objects and are intended to supersede ad hoc EVAL-based Lua scripting (EVAL remains supported). Function libraries are persisted in the RDB and AOF files and are replicated automatically from master to replica. A single library can define multiple functions, which makes code easier to reuse.

#!lua name=mylib

-- Atomically increments the view count and records the viewer.
-- Both keys are passed explicitly and share a hash tag, so the
-- function stays valid in cluster mode.
redis.register_function('increment_view', function(keys, args)
  local current = redis.call('HINCRBY', keys[1], 'views', 1)
  redis.call('ZADD', keys[2], redis.call('TIME')[1], args[1])
  return current
end)

# Load the library
cat mylib.lua | redis-cli -x FUNCTION LOAD REPLACE

# Call the function (2 keys, both in the {article:1001} slot)
redis-cli FCALL increment_view 2 "{article:1001}" "{article:1001}:history" "user:42"

# List registered functions
redis-cli FUNCTION LIST

ACL v2

ACL v2 adds the ability to finely control read/write permissions at the key level. It introduces the Selector concept, allowing multiple rule sets to be assigned to a single user. The root selector is evaluated first, followed by additional selectors in order.

# Cache-only user: read/write on cache:* keys, read-only on session:* keys
# (arguments are quoted so the shell does not interpret >, ~, * or parentheses)
redis-cli ACL SETUSER cache_worker on '>StrongP@ss123' \
  '~cache:*' '+@all' \
  '(%R~session:* +@read)'

# Verify ACL rules
redis-cli ACL GETUSER cache_worker

# Save current ACL rules to the ACL file (requires aclfile to be set in redis.conf)
redis-cli ACL SAVE

Multi-Part AOF

Before Redis 7, an AOF rewrite replaced one large file. Redis 7's Multi-Part AOF instead keeps a base file (a full snapshot of the data) and one or more incr files (changes since that snapshot) in a dedicated directory, tracked by a manifest file. This reduces wasted disk space during rewrites and makes AOF history easier to manage.

Cluster Architecture

Redis Cluster is a distributed architecture that shards data across multiple nodes for horizontal scalability. Each master node is responsible for a subset of the 16,384 hash slots, and clients access the appropriate node directly via the CRC16 hash of the key.

Minimum 6-Node Cluster Configuration

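The recommended minimum for production is 3 masters + 3 replicas, 6 nodes in total. The example below starts all six instances on one host for brevity; in production, place each master and its replica on different machines so that a single host failure cannot take out both.

# Configure 6 Redis instances (ports 7000-7005)
# Note: bind 0.0.0.0 with protected-mode no and no password is for lab use only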
for port in 7000 7001 7002 7003 7004 7005; do
  mkdir -p /opt/redis-cluster/${port}
  cat > /opt/redis-cluster/${port}/redis.conf << CONF
port ${port}
cluster-enabled yes
cluster-config-file nodes-${port}.conf
cluster-node-timeout 5000
appendonly yes
appendfilename "appendonly.aof"
dir /opt/redis-cluster/${port}
maxmemory 4gb
maxmemory-policy allkeys-lfu
bind 0.0.0.0
protected-mode no
save 3600 1 300 100 60 10000
aof-use-rdb-preamble yes
CONF
  redis-server /opt/redis-cluster/${port}/redis.conf &
done

# Create cluster (3 masters + 3 replicas)
redis-cli --cluster create \
  192.168.1.10:7000 192.168.1.10:7001 192.168.1.10:7002 \
  192.168.1.10:7003 192.168.1.10:7004 192.168.1.10:7005 \
  --cluster-replicas 1

Redis vs Valkey vs KeyDB Comparison

Redis's license changes (BSD 3-Clause to RSALv2/SSPLv1 in 2024, with AGPLv3 added as an option in 2025) increased interest in Valkey and KeyDB. All three projects speak the Redis protocol, so existing clients generally work unchanged.

Category	Redis 7.x / 8.x	Valkey 8.x	KeyDB
License	BSD 3-Clause (through 7.2), RSALv2/SSPLv1 (7.4), plus AGPLv3 option (8.0, 2025~)	BSD 3-Clause	BSD 3-Clause
Threading Model	Single-threaded event loop (I/O threads supported)	Enhanced I/O multi-threading	Native multi-threading
Throughput	Baseline	Similar to or slightly above Redis	Reported 2-5x with multiple cores (workload-dependent)
Governance	Redis Ltd.	Linux Foundation	Snap Inc.
Managed Cloud	Redis Cloud, AWS ElastiCache	AWS ElastiCache, Google Cloud Memorystore	Limited
Functions Support	Supported since 7.0	Compatible support	Not supported (Lua only)
Best For	Stability, rich ecosystem	Open-source governance priority	High concurrent throughput needs

Hash Slots and Rebalancing

Every key in Redis Cluster maps to a hash slot via CRC16(key) mod 16384. The figure of 16,384 slots balances fine-grained key distribution against cluster metadata overhead. In theory a cluster could grow to 16,384 masters (one slot each), but the recommended practical ceiling is about 1,000 nodes.

Hash Tags for Slot Control

Multi-key operations (MGET, MSET, MULTI/EXEC transactions, Lua scripts, and so on) require all keys involved to live in the same slot. With a hash tag, only the substring between the first { and the next } is hashed, so keys that share a tag land in the same slot.

# Same slot: only the text inside the braces (user:1000) is hashed
redis-cli SET "{user:1000}.profile" '{"name":"Kim"}'
redis-cli SET "{user:1000}.settings" '{"theme":"dark"}'
redis-cli SET "{user:1000}.cart" '["item1","item2"]'

# Verify they are in the same slot
redis-cli CLUSTER KEYSLOT "{user:1000}.profile"
redis-cli CLUSTER KEYSLOT "{user:1000}.settings"
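
The slot mapping can also be reproduced client-side, which helps when checking key naming schemes before deployment. The sketch below is a minimal Python model of CRC16(key) mod 16384 with hash tag handling; the function names are illustrative and not part of any Redis client library, and CLUSTER KEYSLOT remains the authoritative answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster hash slot for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


if __name__ == "__main__":
    for k in ("{user:1000}.profile", "{user:1000}.settings", "user:1000"):
        print(k, key_slot(k))
```

All three keys in the usage example hash only the text user:1000, so they print the same slot number.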

Slot Rebalancing

When adding or removing nodes, slots must be redistributed. This can be performed during live traffic, but it is best to avoid peak times.

# Add a new node to the cluster
redis-cli --cluster add-node 192.168.1.10:7006 192.168.1.10:7000

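# Auto rebalancing: distribute slots evenly across all masters
# (--cluster-use-empty-masters lets the newly added, still-empty master receive slots)
redis-cli --cluster rebalance 192.168.1.10:7000 --cluster-use-empty-masters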

# Manually move slots from one node to another
redis-cli --cluster reshard 192.168.1.10:7000 \
  --cluster-from <source-node-id> \
  --cluster-to <target-node-id> \
  --cluster-slots 1000 \
  --cluster-yes

# Check cluster status
redis-cli --cluster check 192.168.1.10:7000
redis-cli --cluster info 192.168.1.10:7000
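
To check slot balance from a script rather than by eye, the output of CLUSTER NODES can be parsed directly. The following is a rough sketch that assumes the documented line layout (id, address, flags, master id, ping, pong, epoch, link state, then slot ranges); the SAMPLE text and its node IDs are made up for illustration.

```python
def slots_per_master(cluster_nodes_output: str) -> dict:
    """Count the hash slots owned by each master in CLUSTER NODES output."""
    counts = {}
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a well-formed node line
        addr, flags = fields[1], fields[2].split(",")
        if "master" not in flags:
            continue
        total = 0
        for token in fields[8:]:
            if token.startswith("["):
                continue  # slot currently being migrated or imported
            lo, _, hi = token.partition("-")
            total += int(hi or lo) - int(lo) + 1
        counts[addr.split("@")[0]] = total
    return counts


# Hypothetical sample; real node IDs are 40-character hex strings.
SAMPLE = """\
aaa1 192.168.1.10:7000@17000 myself,master - 0 0 1 connected 0-5460
bbb2 192.168.1.10:7001@17001 master - 0 0 2 connected 5461-10922
ccc3 192.168.1.10:7002@17002 master - 0 0 3 connected 10923-16383
ddd4 192.168.1.10:7003@17003 slave aaa1 0 0 1 connected
"""

if __name__ == "__main__":
    counts = slots_per_master(SAMPLE)
    print(counts, "total:", sum(counts.values()))
```

A total below 16384, or a large spread between masters, is a signal to rerun the check or rebalance commands above.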

Memory Optimization Strategies

Redis stores all data in memory, so memory efficiency directly translates to cost. With systematic memory optimization, you can handle more data on the same hardware.

maxmemory-policy Comparison

This policy determines which keys to evict when the memory limit is reached. Redis 7 provides 8 policies.

Policy	Target Scope	Algorithm	Best For
noeviction	None	No eviction (returns write errors)	Environments where data loss is unacceptable
allkeys-lru	All keys	Evict least recently used keys	General-purpose caching
allkeys-lfu	All keys	Evict least frequently used keys	Cache with clear hot/cold patterns
allkeys-random	All keys	Random eviction	Uniform access patterns
volatile-lru	Keys with TTL set	Evict least recently used keys	Mixed cache + persistent data
volatile-lfu	Keys with TTL set	Evict least frequently used keys	Hot/cold separation among TTL data
volatile-random	Keys with TTL set	Random eviction	Uniform pattern among TTL keys
volatile-ttl	Keys with TTL set	Evict keys closest to expiration	Short-lived cache eviction priority

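For most cache-only deployments, allkeys-lfu is a sound default. Where cache and persistent data share an instance, use volatile-lru and make sure every cache key has a TTL: volatile-* policies can only evict keys that carry a TTL, and when none qualify, writes fail just as under noeviction.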

Analysis Using MEMORY Commands

# Check memory usage of a specific key (in bytes)
redis-cli MEMORY USAGE user:profile:10001
# (integer) 256

# For nested structures, specify sample count (0 for full traversal)
redis-cli MEMORY USAGE myhash SAMPLES 0

# Memory diagnostic report
redis-cli MEMORY DOCTOR
# "Hi Sam, I can't find any memory issue in your instance. ..." (healthy instance)

# Memory statistics summary
redis-cli MEMORY STATS

# Full server memory information
redis-cli INFO memory

redis.conf Memory Tuning Configuration

# Maximum memory and eviction policy
maxmemory 4gb
maxmemory-policy allkeys-lfu

# LFU algorithm tuning
# lfu-log-factor: higher values slow counter increments (default 10)
# lfu-decay-time: counter decay period in minutes (default 1)
lfu-log-factor 10
lfu-decay-time 1

# Enable active defragmentation (requires a jemalloc build)
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 1
active-defrag-cycle-max 25

# Lazy Free settings: background processing for large key deletions
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
replica-lazy-flush yes

# I/O thread settings (available since Redis 6)
io-threads 4
io-threads-do-reads yes
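
To build intuition for lfu-log-factor, the probabilistic counter can be simulated. The sketch below follows my reading of the logarithmic increment used by Redis (an 8-bit counter starting at 5, incremented with probability 1 / ((counter - 5) * factor + 1)); treat the constant and formula as assumptions to verify against the Redis version in use. Decay via lfu-decay-time is not modeled.

```python
import random

LFU_INIT_VAL = 5  # assumed starting counter for a new key


def lfu_log_incr(counter: int, lfu_log_factor: int, rng: random.Random) -> int:
    """Probabilistically increment an 8-bit LFU counter."""
    if counter >= 255:
        return 255
    baseval = max(counter - LFU_INIT_VAL, 0)
    p = 1.0 / (baseval * lfu_log_factor + 1)
    return counter + 1 if rng.random() < p else counter


def simulate(hits: int, lfu_log_factor: int, seed: int = 0) -> int:
    """Return the counter value after the given number of accesses."""
    rng = random.Random(seed)
    counter = LFU_INIT_VAL
    for _ in range(hits):
        counter = lfu_log_incr(counter, lfu_log_factor, rng)
    return counter


if __name__ == "__main__":
    for factor in (0, 1, 10, 100):
        print("factor", factor, "->", simulate(100_000, factor))
```

Larger factors make the counter saturate more slowly, so raise lfu-log-factor when very hot keys need to stay distinguishable from merely warm ones.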

Per-Data-Structure Encoding Optimization

Redis switches internal encodings automatically as a value grows. Redis 7 replaced ziplist with listpack, a compact format with a 6-byte header. Keeping small collections under the thresholds below preserves the compact encoding; the Redis documentation cites savings of up to 10x (about 5x on average) for such data.

Data Structure	Small Encoding	Large Encoding	Transition Setting (default)
Hash	listpack	hashtable	hash-max-listpack-entries (512), hash-max-listpack-value (64)
List	listpack	quicklist	list-max-listpack-size (-2)
Set	listpack / intset	hashtable	set-max-listpack-entries (128), set-max-intset-entries (512)
Sorted Set	listpack	skiplist + hashtable	zset-max-listpack-entries (128), zset-max-listpack-value (64)
String	int / embstr	raw	Automatic (44-byte boundary)

# redis.conf: encoding threshold adjustment examples
# Workloads with many small hashes: raise thresholds above the defaults (512 / 64) to stay on listpack
hash-max-listpack-entries 1024
hash-max-listpack-value 128

# Ranking service with many small sorted sets
zset-max-listpack-entries 256
zset-max-listpack-value 64

# Tag system with many small sets
set-max-listpack-entries 256
set-max-intset-entries 1024

# Maximum size per list node (-2 = 8KB, -1 = 4KB)
list-max-listpack-size -2
list-compress-depth 1

Encoding Verification and Structure Transition Testing

# Check current encoding
redis-cli OBJECT ENCODING mykey

# Small hash (listpack)
redis-cli HSET small:hash f1 v1 f2 v2 f3 v3
redis-cli OBJECT ENCODING small:hash
# "listpack"

# Automatic transition to hashtable when threshold is exceeded
# Create a hash with 513 fields (exceeding the default hash-max-listpack-entries=512)
for i in $(seq 1 513); do
  redis-cli HSET big:hash "field_${i}" "value_${i}"
done
redis-cli OBJECT ENCODING big:hash
# "hashtable"

# Set composed of only integers: intset encoding
redis-cli SADD int:set 1 2 3 4 5
redis-cli OBJECT ENCODING int:set
# "intset"
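
Before loading real data, it can be useful to predict whether a hash will stay in the compact encoding. The helper below is a hypothetical sketch of the two conditions (entry count and per-field/value byte length); pass in the thresholds reported by CONFIG GET hash-max-listpack-* on your server rather than relying on the small illustrative numbers used here.

```python
def hash_stays_listpack(fields: dict, max_entries: int, max_value: int) -> bool:
    """Predict whether a hash keeps the listpack encoding.

    A hash converts to hashtable once it has more than max_entries fields,
    or once any field name or value is longer than max_value bytes.
    """
    if len(fields) > max_entries:
        return False
    return all(
        len(str(name).encode("utf-8")) <= max_value
        and len(str(value).encode("utf-8")) <= max_value
        for name, value in fields.items()
    )


if __name__ == "__main__":
    profile = {"name": "Kim", "theme": "dark"}
    # Illustrative thresholds only; use the values from your own configuration.
    print(hash_stays_listpack(profile, max_entries=4, max_value=8))
    print(hash_stays_listpack({"bio": "x" * 100}, max_entries=4, max_value=8))
```

OBJECT ENCODING, as shown above, remains the way to confirm the actual encoding on a live key.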

Persistence (RDB/AOF) Configuration

Since Redis is an in-memory database, a proper persistence strategy is essential for data recovery in case of failure. Redis 7's Multi-Part AOF significantly improved the persistence mechanism.

RDB vs AOF vs Hybrid Mode Comparison

Category	RDB	AOF	Hybrid (RDB + AOF)
Storage Method	Periodic snapshots	Log of all write commands	RDB preamble + AOF incremental
File Size	Small (binary compressed)	Large (command text)	Medium
Data Loss Risk	Loss since last snapshot	Minimized by fsync setting	Minimal
Recovery Speed	Fast	Slow (command replay)	Fast
Fork Overhead	Fork per snapshot	Fork only during rewrite	Fork during rewrite
Disk I/O	Low (intermittent)	High (continuous)	Medium
Best For	Backup, disaster recovery	Zero data loss required	Production recommended

Production Recommended Persistence Configuration

# redis.conf: Hybrid mode (production recommended)

# RDB snapshot triggers: after 3600 s (1 hour) if at least 1 key changed, after 300 s if 100 changed, after 60 s if 10000 changed
save 3600 1 300 100 60 10000

# Enable AOF
appendonly yes
appendfilename "appendonly.aof"

# AOF fsync policy: everysec (balance between performance and durability)
appendfsync everysec

# Enable hybrid mode: use RDB format as preamble during AOF rewrite
aof-use-rdb-preamble yes

# Multi-Part AOF directory (Redis 7 default)
appenddirname "appendonlydir"

# AOF rewrite trigger conditions
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# RDB compression and checksum
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb

# Stop writes on background save failure
stop-writes-on-bgsave-error yes
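
The interaction between auto-aof-rewrite-percentage and auto-aof-rewrite-min-size is easier to see as code. This sketch models the trigger as I understand it (rewrite when the AOF exceeds the minimum size and has grown by at least the given percentage since the last rewrite); the function name is illustrative, and the real server applies further conditions, such as no other child process running.

```python
MB = 1024 * 1024


def should_rewrite_aof(current_size: int, base_size: int,
                       percentage: int = 100, min_size: int = 64 * MB) -> bool:
    """Approximate the automatic AOF rewrite trigger.

    current_size: AOF size now, in bytes
    base_size:    AOF size right after the previous rewrite, in bytes
    """
    if percentage == 0 or current_size <= min_size:
        return False  # feature disabled, or file still too small to bother
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100  # growth in percent
    return growth >= percentage


if __name__ == "__main__":
    print(should_rewrite_aof(128 * MB, 64 * MB))  # doubled since last rewrite
    print(should_rewrite_aof(100 * MB, 64 * MB))  # grew by roughly 56%
    print(should_rewrite_aof(32 * MB, 8 * MB))    # still under min-size
```

With the settings above (100 and 64mb), a 64 MB base file triggers a rewrite once the AOF reaches 128 MB.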

Monitoring and Alerting

For stable Redis cluster operations, core metrics must be collected in real-time with threshold-based alerting configured.

Core Monitoring Metrics

# Check memory usage
redis-cli INFO memory | grep -E "used_memory_human|used_memory_rss_human|mem_fragmentation_ratio"
# used_memory_human:2.85G
# used_memory_rss_human:3.12G
# mem_fragmentation_ratio:1.09

# Check cluster status
redis-cli CLUSTER INFO
# cluster_state:ok
# cluster_slots_assigned:16384
# cluster_slots_ok:16384
# cluster_known_nodes:6
# cluster_size:3

# Commands processed per second
redis-cli INFO stats | grep instantaneous_ops_per_sec

# Check connection count
redis-cli INFO clients | grep connected_clients

# Keyspace hit rate (cache efficiency)
redis-cli INFO stats | grep -E "keyspace_hits|keyspace_misses"

# Check Slow Log: 10 most recent entries (logged when slower than slowlog-log-slower-than, default 10 ms)
redis-cli SLOWLOG GET 10
redis-cli SLOWLOG LEN

Python-Based Monitoring Script

from redis.cluster import ClusterNode, RedisCluster

# Redis Cluster connection (redis-py expects ClusterNode objects for startup_nodes)
startup_nodes = [
    ClusterNode("192.168.1.10", 7000),
    ClusterNode("192.168.1.10", 7001),
    ClusterNode("192.168.1.10", 7002),
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    password="your_secure_password",
    socket_timeout=5,
    retry_on_timeout=True,
)

def check_cluster_health():
    """Checks overall cluster status and returns a list of alert messages."""
    alerts = []

    # Check cluster state
    cluster_info = rc.cluster_info()
    if cluster_info.get("cluster_state") != "ok":
        alerts.append(f"[CRITICAL] Cluster state abnormal: {cluster_info.get('cluster_state')}")

    # Check memory usage per node
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            used = info["used_memory"]
            maxmem = info.get("maxmemory", 0)
            if maxmem > 0:
                usage_pct = (used / maxmem) * 100
                if usage_pct > 75:  # WARNING threshold from the alert table
                    alerts.append(
                        f"[WARNING] Node {node.host}:{node.port} "
                        f"memory usage {usage_pct:.1f}%"
                    )

            # Check memory fragmentation ratio
            frag_ratio = info.get("mem_fragmentation_ratio", 1.0)
            if frag_ratio > 1.5:
                alerts.append(
                    f"[WARNING] Node {node.host}:{node.port} "
                    f"memory fragmentation ratio {frag_ratio:.2f}"
                )
        except Exception as e:
            alerts.append(f"[ERROR] Node {node.host}:{node.port} connection failed: {e}")

    return alerts


def get_memory_report():
    """Generates a per-node memory usage report."""
    report = []
    for node in rc.get_nodes():
        try:
            info = rc.get_redis_connection(node).info("memory")
            report.append({
                "node": f"{node.host}:{node.port}",
                "role": node.server_type,
                "used_memory_human": info["used_memory_human"],
                "used_memory_rss_human": info["used_memory_rss_human"],
                "fragmentation_ratio": info["mem_fragmentation_ratio"],
                "used_memory_peak_human": info["used_memory_peak_human"],
            })
        except Exception as e:
            report.append({"node": f"{node.host}:{node.port}", "error": str(e)})
    return report


if __name__ == "__main__":
    # Cluster health check
    alerts = check_cluster_health()
    if alerts:
        for alert in alerts:
            print(alert)
        # send_to_slack(alerts) or send_to_pagerduty(alerts)
    else:
        print("All nodes healthy")

    # Memory report output
    for item in get_memory_report():
        print(item)

Alert Threshold Standards

Metric	WARNING	CRITICAL	Description
Memory Usage	over 75%	over 90%	used_memory relative to maxmemory
Memory Fragmentation Ratio	over 1.5	over 2.0	mem_fragmentation_ratio
Connection Count	over 5,000	over 8,000	connected_clients
Cache Hit Rate	under 90%	under 80%	keyspace_hits / (keyspace_hits + keyspace_misses)
Slow Log Frequency	over 10/min	over 50/min	Commands taking over 10ms
Replication Lag	over 1MB	over 10MB	master_repl_offset difference
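
As a rough illustration of how the table could be applied in a monitoring script, the sketch below computes the cache hit rate from `keyspace_hits` and `keyspace_misses` and maps a metric value to a severity level. The `THRESHOLDS` mapping and the `hit_rate` and `classify` helpers are hypothetical names introduced here, not part of redis-py; the numbers are copied from the table above.

```python
# Thresholds copied from the alert table; all names here are illustrative only.
# Each entry: (warning, critical, higher_is_worse)
THRESHOLDS = {
    "memory_usage_pct": (75.0, 90.0, True),
    "mem_fragmentation_ratio": (1.5, 2.0, True),
    "connected_clients": (5000, 8000, True),
    "cache_hit_rate_pct": (90.0, 80.0, False),
}


def hit_rate(stats):
    """Return the cache hit rate in percent from an INFO stats dict."""
    hits = stats.get("keyspace_hits", 0)
    misses = stats.get("keyspace_misses", 0)
    total = hits + misses
    # With no lookups yet there is nothing to judge, so report 100%.
    return 100.0 if total == 0 else hits * 100 / total


def classify(metric, value):
    """Map a metric value to "OK", "WARNING", or "CRITICAL"."""
    warning, critical, higher_is_worse = THRESHOLDS[metric]
    if not higher_is_worse:
        # Flip the sign so the comparisons below work in both directions.
        value, warning, critical = -value, -warning, -critical
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"
```

For example, `classify("cache_hit_rate_pct", hit_rate(info))` could be called with the dict returned by an `INFO stats` query, assuming the client returns the counters as integers.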

Troubleshooting

Memory Fragmentation Issues

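When mem_fragmentation_ratio exceeds 1.5, the process holds more than 1.5 times the resident memory that Redis has actually allocated, which usually points to significant allocator (jemalloc) fragmentation. Frequent key creation and deletion is a common cause. The activedefrag feature can be enabled at runtime to reclaim the space without a restart.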

# Check current fragmentation ratio
redis-cli INFO memory | grep mem_fragmentation_ratio

# Enable Active Defrag at runtime
redis-cli CONFIG SET activedefrag yes
redis-cli CONFIG SET active-defrag-threshold-lower 10

# Check defragmentation progress
redis-cli INFO memory | grep -E "active_defrag"
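
To make the ratio less of a black box, the following sketch recomputes it from the raw `INFO memory` text. It relies on the fact that mem_fragmentation_ratio is the ratio of used_memory_rss to used_memory; the helper names `parse_info` and `fragmentation_summary` are hypothetical, and the 1.5 cutoff is the warning threshold used in this handbook. A ratio below 1.0 is treated as a hint that part of the dataset may have been swapped out, which is an interpretation rather than a certainty.

```python
def parse_info(text):
    """Parse the text returned by `redis-cli INFO <section>` into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_summary(info_text):
    """Recompute the fragmentation ratio and the bytes held beyond used_memory."""
    info = parse_info(info_text)
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    ratio = rss / used
    if ratio < 1.0:
        # RSS below used_memory usually means part of the data was swapped out.
        status = "possible swapping"
    elif ratio > 1.5:
        status = "fragmented"
    else:
        status = "ok"
    return {"ratio": round(ratio, 2), "overhead_bytes": rss - used, "status": status}
```

Note that the ratio also includes process overhead other than fragmentation, so on very small datasets a high value is not necessarily a problem.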

Big Key Detection

Storing an excessive amount of data under a single key can cause network latency, memory spikes, and more slow log entries.

# Big key scan (SCAN-based, safe for production)
redis-cli --bigkeys

# Exact memory usage of a specific key
redis-cli MEMORY USAGE large:hash SAMPLES 0

# Asynchronous deletion of big keys (UNLINK = non-blocking DEL)
redis-cli UNLINK large:hash
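
The same scan can be scripted when the results need to feed an alert or a report. The sketch below is one possible approach, assuming a redis-py style client for a single node that exposes `scan_iter` and `memory_usage`; the function name `find_big_keys` and the 1 MB default threshold are illustrative choices, not values from the Redis documentation. In a cluster it would need to run once per master.

```python
def find_big_keys(client, threshold_bytes=1_000_000, scan_count=1000):
    """Return (key, bytes) pairs at or above threshold_bytes, largest first.

    `client` is assumed to expose redis-py style `scan_iter` and
    `memory_usage` methods for a single node.
    """
    big = []
    # SCAN walks the keyspace incrementally instead of blocking like KEYS.
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # MEMORY USAGE returns nil if the key expired between SCAN and here.
        if size is not None and size >= threshold_bytes:
            big.append((key, size))
    return sorted(big, key=lambda item: item[1], reverse=True)
```

Each `memory_usage` call is a separate round trip, so on large keyspaces it may be worth running this against a replica or during off-peak hours.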

Repeated MOVED Redirections

If a client keeps receiving MOVED responses, it is sending requests to nodes that do not own the key's slot and is not updating its slot map afterward. Verify that the client library supports cluster mode and is configured to use it. Standalone mode clients do not perform slot routing.
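
To see what a cluster-aware client does that a standalone client does not, the sketch below computes a key's hash slot (CRC16 of the key, or of its hash tag, modulo 16384) and parses a MOVED error into the target address. `key_slot` and `parse_moved` are hypothetical helper names written for this example; a real client library performs these steps internally and also caches the slot map.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384


def key_slot(key):
    """Compute the cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end > start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 variant Redis Cluster uses.
    return crc_hqx(data, 0) % HASH_SLOTS


def parse_moved(error_message):
    """Split a 'MOVED <slot> <host>:<port>' error into (slot, host, port)."""
    kind, slot, address = error_message.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED redirection: {error_message!r}")
    host, _, port = address.rpartition(":")
    return int(slot), host, int(port)
```

A client that sees the same MOVED target repeatedly for the slot returned by `key_slot` is a sign that it never refreshes its routing table.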

Replica Synchronization Failure

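If a replica keeps falling back to a full resync after reconnecting, the replication backlog is likely too small to hold the writes made while it was disconnected, so repl-backlog-size should be increased. The default is 1MB; in high-traffic environments, 64MB to 256MB is a more realistic range.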

# redis.conf
repl-backlog-size 256mb
repl-backlog-ttl 3600
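
A common rule of thumb is that the backlog should hold at least the bytes written during the longest disconnect a replica is expected to survive. The sketch below encodes that estimate; the function name `recommended_backlog_mb` and the `safety_factor` of 2.0 are assumptions made for illustration, and the write rate would have to be measured separately, for example from the growth of master_repl_offset over a known interval.

```python
import math


def recommended_backlog_mb(write_bytes_per_sec, max_disconnect_sec, safety_factor=2.0):
    """Estimate repl-backlog-size in MB so a reconnecting replica can partial-resync."""
    needed_bytes = write_bytes_per_sec * max_disconnect_sec * safety_factor
    # Never go below the 1 MB default.
    return max(1, math.ceil(needed_bytes / (1024 * 1024)))
```

For instance, a master writing about 2 MB/s that should tolerate a 60-second disconnect comes out at 240 MB under these assumptions, which falls within the range suggested above.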

Disaster Recovery Procedures

Automatic Failover

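Redis Cluster automatically detects node failures and promotes replicas to masters. A node that stays unreachable for longer than cluster-node-timeout (default 15000ms) is first flagged PFAIL by the node observing it. Once a majority of masters report the node as PFAIL or FAIL within a short validity window (twice the node timeout in the current implementation), it transitions to FAIL state, and one of the failed master's replicas is automatically promoted.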

# Check current cluster node status
redis-cli CLUSTER NODES

# Check the overall cluster state (ok/fail); per-node fail flags appear in CLUSTER NODES
redis-cli CLUSTER INFO | grep cluster_state

# Manual failover (executed from a replica)
redis-cli -p 7003 CLUSTER FAILOVER

# Force failover (when master is unresponsive)
redis-cli -p 7003 CLUSTER FAILOVER FORCE
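
When deciding whether a manual failover is needed, it helps to pull the flagged nodes out of the `CLUSTER NODES` output programmatically. The sketch below is a minimal parser, assuming the standard line layout (node ID, address, comma-separated flags, and so on) in which PFAIL appears as the `fail?` flag and FAIL as `fail`. The function name `failing_nodes` and the returned dict shape are hypothetical.

```python
def failing_nodes(cluster_nodes_text):
    """Return nodes flagged PFAIL ("fail?") or FAIL in CLUSTER NODES output."""
    result = []
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        node_id, address, flags = parts[0], parts[1], parts[2].split(",")
        if "fail" in flags:
            state = "FAIL"
        elif "fail?" in flags:
            state = "PFAIL"
        else:
            continue
        role = "master" if "master" in flags else "replica"
        # The address field looks like ip:port@cport; keep only ip:port.
        result.append(
            {"id": node_id, "addr": address.split("@")[0], "role": role, "state": state}
        )
    return result
```

A master that stays in the FAIL state with no replica taking over is the case where `CLUSTER FAILOVER FORCE` on one of its replicas becomes relevant.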

Node Replacement Procedure

Remove the failed node: redis-cli --cluster del-node <cluster-ip:port> <node-id>
Start new instance and join cluster: redis-cli --cluster add-node <new-ip:port> <cluster-ip:port>
Assign as replica: redis-cli -p <new-port> CLUSTER REPLICATE <master-node-id>
Verify proper slot distribution: redis-cli --cluster check <cluster-ip:port>

RDB/AOF-Based Data Recovery

# Check the incremental AOF file; --fix truncates it at the first invalid entry, so back it up first
redis-check-aof --fix /opt/redis-cluster/7000/appendonlydir/appendonly.aof.1.incr.aof

# Verify RDB file integrity
redis-check-rdb /opt/redis-cluster/7000/dump.rdb

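# Restore from backup: stop the instance, then replace the RDB/AOF files
# Note: with appendonly yes, Redis loads the AOF at startup instead of dump.rdb,
# so restore the AOF files as well or start with AOF disabled when restoring from RDB only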
redis-cli -p 7000 SHUTDOWN NOSAVE
cp /backup/dump.rdb /opt/redis-cluster/7000/dump.rdb
redis-server /opt/redis-cluster/7000/redis.conf

Operations Checklist

A checklist for stable production Redis cluster operations.

Pre-Deployment Checks

Are maxmemory and maxmemory-policy configured appropriately for the workload
Is the cluster node count at least 6 (3 masters + 3 replicas)
Is each node's memory at least 2x the actual data size (considering COW during fork)
Are ACL rules configured following the principle of least privilege
Do network bandwidth and latency meet cluster requirements

Daily Checks

Is memory usage below 75%
Is mem_fragmentation_ratio below 1.5
Are there any abnormal patterns in the Slow Log
Is replication lag within acceptable range
Is cache hit rate maintained at 90% or above

Weekly Checks

Have big key scans identified any abnormally sized keys
Have unused keys (based on idle time) been cleaned up
Has RDB/AOF backup file integrity been verified
Is cluster slot distribution even
Have security patches and Redis minor version updates been reviewed

Monthly Checks

Is the maxmemory setting appropriate for data growth trends
Are encoding thresholds (listpack entries/value) optimal for current data patterns
Has the disaster recovery procedure been validated through failover drills
Is the client library version up to date
Has capacity planning determined whether scaling is needed within 3-6 months

