Redis Complete Guide 2025: Caching Strategies, Data Structures, Pub/Sub, and Redis Stack

Introduction

Redis is the most widely used in-memory data store in 2025. Beyond simple caching, it serves as a message broker, session store, real-time leaderboard, rate limiter, and distributed lock. From Redis 7.4 features and the emergence of Redis Stack to the Valkey fork controversy, this guide covers everything about Redis.


1. Redis Overview

What is Redis?

Redis (Remote Dictionary Server) is an in-memory key-value data store. Since all data is stored in memory, it provides microsecond response times.

Key Features in Recent Redis Versions

  • Redis Functions (7.0) — evolution of Lua scripts, managed as server-side libraries
  • ACL v2 (7.0) — fine-grained access control with key and command selectors
  • Client-side caching (6.0) — server-assisted invalidation of client-side caches
  • Multi-part AOF (7.0) — persistence improved by splitting the AOF into base and incremental files

The Valkey Fork Story

In 2024, Redis changed its license (SSPL + RSALv2), and the Linux Foundation forked Redis 7.2 as Valkey. AWS, Google, Oracle, and others support Valkey. Currently both projects maintain compatibility, but may diverge long-term.


2. Five Core Data Structures

2.1 String

The most basic data type. Can store up to 512MB.

# Basic SET/GET
SET user:1:name "Alice"
GET user:1:name

# Atomic increment/decrement
SET page:views 0
INCR page:views          # 1
INCRBY page:views 10     # 11

# TTL setting
SET session:abc123 "user_data" EX 3600   # Expires in 1 hour
TTL session:abc123                        # Check remaining time

# SET options
SET lock:resource "owner1" NX EX 30      # NX: Set only if key doesn't exist
SET user:1:name "Bob" XX                 # XX: Update only if key exists

Use cases: Session tokens, counters, temporary data, distributed locks

2.2 List

A doubly linked list (implemented internally as a quicklist since Redis 3.2). O(1) push/pop at both ends.

# Basic operations
LPUSH queue:emails "email1" "email2" "email3"
RPOP queue:emails                        # "email1" (FIFO queue)

# Range query
LRANGE queue:emails 0 -1                 # All elements

# Blocking pop (message queue pattern)
BRPOP queue:emails 30                    # Block up to 30 seconds waiting for an element

# Trimming (keep recent N items)
LPUSH notifications:user1 "new_msg"
LTRIM notifications:user1 0 99          # Keep only recent 100

Use cases: Message queues, recent activity feeds, job queues

2.3 Set

Unordered collection of unique elements. Supports set operations (union, intersection, difference).

# Basic operations
SADD tags:post:1 "python" "redis" "backend"
SADD tags:post:2 "python" "django" "orm"

# Membership check
SISMEMBER tags:post:1 "python"           # 1 (true)

# Set operations
SINTER tags:post:1 tags:post:2           # "python" (intersection)
SUNION tags:post:1 tags:post:2           # union
SDIFF tags:post:1 tags:post:2            # elements only in post:1

# Random sampling
SRANDMEMBER tags:post:1 2               # Random 2 elements

Use cases: Tag systems, unique visitor tracking, friend relationships, recommendation systems

2.4 Sorted Set (ZSet)

Collection of unique elements sorted by score. Ideal for leaderboards.

# Leaderboard implementation
ZADD leaderboard 1500 "player:alice"
ZADD leaderboard 2300 "player:bob"
ZADD leaderboard 1800 "player:charlie"

# Ranking query (descending by score)
ZREVRANGE leaderboard 0 2 WITHSCORES
# 1) "player:bob"     2) "2300"
# 3) "player:charlie" 4) "1800"
# 5) "player:alice"   6) "1500"

# Specific member rank (0-based)
ZREVRANK leaderboard "player:alice"      # 2

# Score increment
ZINCRBY leaderboard 500 "player:alice"   # 2000

# Range search
ZRANGEBYSCORE leaderboard 1500 2000 WITHSCORES

Use cases: Leaderboards, priority queues, time-ordered events, rate limiting

2.5 Hash

Map of field-value pairs. Ideal for representing objects.

# Store user profile
HSET user:1 name "Alice" email "alice@example.com" age "30" role "admin"

# Get individual field
HGET user:1 name                         # "Alice"

# Get all
HGETALL user:1

# Field increment
HINCRBY user:1 age 1                     # 31

# Existence check
HEXISTS user:1 email                     # 1 (true)

# Multiple fields at once
HMGET user:1 name email role

Use cases: User profiles, configuration values, session data, shopping carts


3. Advanced Data Structures

3.1 HyperLogLog

Probabilistic data structure for estimating unique element counts. Uses only 12KB of memory to count up to 2^64 elements (0.81% error rate).

# Estimate unique visitors
PFADD visitors:2025-03-23 "user1" "user2" "user3"
PFADD visitors:2025-03-23 "user1" "user4"          # user1 is duplicate

PFCOUNT visitors:2025-03-23                         # 4

# Merge multiple days
PFMERGE visitors:week visitors:2025-03-23 visitors:2025-03-24
PFCOUNT visitors:week
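The 12KB and 0.81% figures follow directly from the estimator's parameters: Redis uses 2^14 six-bit registers, so both numbers can be derived in a few lines.

```python
import math

m = 2 ** 14                 # 16384 registers in the Redis implementation
register_bits = 6

memory_kb = m * register_bits / 8 / 1024    # dense encoding size
std_error = 1.04 / math.sqrt(m)             # standard error of the HLL estimator

print(f"{memory_kb:.0f} KB, {std_error:.2%} error")   # 12 KB, 0.81% error
```

Since the error only shrinks with the square root of the register count, 16384 registers is a deliberate fixed trade-off between memory and accuracy.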

3.2 Bitmap

Bit-level operations. Memory-efficient for boolean state tracking.

# Daily attendance check
SETBIT attendance:2025-03-23 1001 1      # User 1001 present
SETBIT attendance:2025-03-23 1002 1
SETBIT attendance:2025-03-23 1003 0      # Absent

# Check attendance
GETBIT attendance:2025-03-23 1001        # 1

# Count attendees
BITCOUNT attendance:2025-03-23           # 2

# Consecutive attendance (AND operation)
BITOP AND consecutive attendance:2025-03-22 attendance:2025-03-23
BITCOUNT consecutive
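The bitmap commands above can be modeled with a plain Python integer (an illustrative sketch; Redis addresses bits most-significant-first within each byte, but the BITCOUNT and BITOP AND semantics shown here are the same):

```python
def setbit(bitmap: int, offset: int, value: int) -> int:
    # Mirror of SETBIT: returns the updated bitmap
    mask = 1 << offset
    return bitmap | mask if value else bitmap & ~mask

def getbit(bitmap: int, offset: int) -> int:
    # Mirror of GETBIT
    return (bitmap >> offset) & 1

day1 = setbit(setbit(0, 1001, 1), 1002, 1)   # users 1001 and 1002 present
day2 = setbit(0, 1001, 1)                    # only user 1001 present

print(getbit(day1, 1001))            # GETBIT               -> 1
print(bin(day1).count("1"))          # BITCOUNT             -> 2
print(bin(day1 & day2).count("1"))   # BITOP AND + BITCOUNT -> 1
```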

3.3 Geospatial

Location-based data. Supports radius search and distance calculation.

# Add locations (longitude, latitude)
GEOADD stores 126.9784 37.5665 "gangnam-store"
GEOADD stores 127.0276 37.4979 "samsung-store"
GEOADD stores 126.9316 37.5563 "hongdae-store"

# Distance calculation
GEODIST stores "gangnam-store" "hongdae-store" km    # ~4.3 km

# Radius search (Redis 6.2+)
GEOSEARCH stores FROMLONLAT 126.9784 37.5665 BYRADIUS 10 km ASC COUNT 5
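The GEODIST result can be sanity-checked with a plain haversine calculation; the Earth radius constant below is assumed to match the one Redis uses internally (~6372.8 km):

```python
import math

def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    # Great-circle distance between two (longitude, latitude) points
    R = 6372.8  # km; assumed close to the constant in Redis's geo implementation
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# gangnam-store -> hongdae-store, matching the GEOADD coordinates above
print(round(haversine_km(126.9784, 37.5665, 126.9316, 37.5563), 1))  # ≈ 4.3 (km)
```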

3.4 Redis Streams

Log-based message structure. Supports Consumer Groups similar to Kafka.

# Add messages
XADD events * type "order" user_id "123" amount "50000"
XADD events * type "payment" user_id "123" status "completed"

# Read
XRANGE events - + COUNT 10

# Create Consumer Group
XGROUP CREATE events analytics-group $ MKSTREAM

# Read as consumer
XREADGROUP GROUP analytics-group consumer-1 COUNT 5 BLOCK 2000 STREAMS events >

# ACK (processing complete)
XACK events analytics-group "1679000000000-0"

# Check pending messages
XPENDING events analytics-group

4. Caching Patterns

4.1 Cache-Aside (Lazy Loading)

The most common pattern. The application manages the cache directly.

import redis
import json

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_user(user_id: int) -> dict:
    cache_key = f"user:{user_id}"

    # 1. Check cache
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss -> Query DB
    user = db.query(User).filter(User.id == user_id).first()
    if not user:
        return None

    # 3. Store in cache
    r.setex(cache_key, 3600, json.dumps(user.to_dict()))
    return user.to_dict()

def update_user(user_id: int, data: dict):
    db.query(User).filter(User.id == user_id).update(data)
    db.commit()

    # Invalidate cache
    r.delete(f"user:{user_id}")

4.2 Write-Through

Updates cache and DB simultaneously on writes.

def save_user(user_id: int, data: dict):
    cache_key = f"user:{user_id}"

    # Update DB and cache simultaneously
    db.query(User).filter(User.id == user_id).update(data)
    db.commit()

    r.setex(cache_key, 3600, json.dumps(data))

4.3 Write-Behind (Write-Back)

Writes to cache first, then asynchronously persists to DB.

def save_user_async(user_id: int, data: dict):
    cache_key = f"user:{user_id}"

    # Write to cache first
    r.setex(cache_key, 3600, json.dumps(data))

    # Async DB persistence (Celery, etc.)
    sync_to_db_task.delay(user_id, data)

4.4 Caching Pattern Comparison

Pattern       | Read Perf | Write Perf | Consistency | Complexity
Cache-Aside   | High      | Medium     | Eventual    | Low
Write-Through | High      | Low        | Strong      | Medium
Write-Behind  | High      | High       | Eventual    | High
Read-Through  | High      | Medium     | Eventual    | Medium

5. Cache Invalidation

5.1 TTL-Based

The simplest approach. Auto-expires after a set time.

SET product:123 "data" EX 300           # Expires in 5 minutes

5.2 Event-Based Invalidation

Explicitly delete cache when data changes.

def update_product(product_id: int, data: dict):
    db.update(product_id, data)

    # Delete all related caches
    r.delete(f"product:{product_id}")
    r.delete(f"product_list:category:{data['category_id']}")
    r.delete("product_list:featured")

5.3 Versioned Keys

Include version in keys for bulk invalidation.

def get_product_list(category_id: int) -> list:
    version = r.get(f"product_version:{category_id}") or "1"
    cache_key = f"products:cat:{category_id}:v:{version}"

    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    products = db.query(Product).filter(Product.category_id == category_id).all()
    r.setex(cache_key, 3600, json.dumps([p.to_dict() for p in products]))
    return [p.to_dict() for p in products]

def invalidate_category(category_id: int):
    # Increment version to invalidate existing caches
    r.incr(f"product_version:{category_id}")

5.4 Thundering Herd Prevention

Prevents many requests from hitting the DB simultaneously when a hot cache entry expires.

import random
import time

def get_with_jitter(key: str, ttl: int = 3600) -> dict:
    lock_key = f"lock:{key}"
    while True:
        cached = r.get(key)
        if cached:
            return json.loads(cached)

        # Distributed lock — only one request queries the DB
        if r.set(lock_key, "1", nx=True, ex=10):
            try:
                data = fetch_from_db(key)
                # Random jitter spreads out expirations of related keys
                jitter = random.randint(0, 300)
                r.setex(key, ttl + jitter, json.dumps(data))
                return data
            finally:
                r.delete(lock_key)

        # Another process is refreshing — wait briefly and retry
        time.sleep(0.1)

6. Pub/Sub and Streams

6.1 Pub/Sub Basics

# Subscriber
SUBSCRIBE notifications:user:123

# Publisher
PUBLISH notifications:user:123 "You have a new message!"

# Pattern subscription
PSUBSCRIBE notifications:*

# Python subscriber
import redis

r = redis.Redis()
pubsub = r.pubsub()
pubsub.subscribe("notifications:user:123")

for message in pubsub.listen():
    if message["type"] == "message":
        print(f"Received: {message['data']}")

6.2 Redis Streams vs Kafka

Feature                | Redis Streams | Kafka
Message persistence    | Memory + AOF  | Disk
Throughput             | ~10K/sec      | ~100K+/sec
Consumer Groups        | Supported     | Supported
Message replay         | Supported     | Supported
Partitioning           | Not supported | Supported
Operational complexity | Low           | High
Suitable scale         | Small-Medium  | Large

6.3 Event-Driven Pattern with Streams

import redis
import time

r = redis.Redis(decode_responses=True)

# Publish event
def publish_event(stream: str, event_type: str, data: dict):
    r.xadd(stream, {"type": event_type, **data}, maxlen=10000)

# Consumer Group processing
def consume_events(stream: str, group: str, consumer: str):
    try:
        r.xgroup_create(stream, group, id="0", mkstream=True)
    except redis.ResponseError:
        pass

    while True:
        messages = r.xreadgroup(
            group, consumer,
            {stream: ">"},
            count=10,
            block=5000,
        )

        for stream_name, entries in messages:
            for msg_id, fields in entries:
                try:
                    process_event(fields)
                    r.xack(stream_name, group, msg_id)
                except Exception as e:
                    print(f"Error processing {msg_id}: {e}")

def process_event(fields: dict):
    event_type = fields.get("type")
    if event_type == "order_created":
        handle_order(fields)
    elif event_type == "payment_completed":
        handle_payment(fields)

7. Redis Stack

Redis Stack adds JSON, Search, TimeSeries, and Bloom Filter modules to Redis.

7.1 RedisJSON

# Store JSON document
JSON.SET user:1 $ '{"name":"Alice","age":30,"address":{"city":"Seoul","zip":"06000"},"tags":["python","redis"]}'

# Path-based query (a $ JSONPath returns an array of matches)
JSON.GET user:1 $.name                   # ["Alice"]
JSON.GET user:1 $.address.city           # ["Seoul"]

# Partial update
JSON.SET user:1 $.age 31
JSON.ARRAPPEND user:1 $.tags '"fastapi"'

# Numeric increment
JSON.NUMINCRBY user:1 $.age 1

7.2 RediSearch

# Create index
FT.CREATE idx:products
  ON JSON
  PREFIX 1 product:
  SCHEMA
    $.name AS name TEXT WEIGHT 5.0
    $.description AS description TEXT
    $.price AS price NUMERIC SORTABLE
    $.category AS category TAG

# Add documents
JSON.SET product:1 $ '{"name":"Redis in Action","description":"Complete guide to Redis","price":45000,"category":"book"}'
JSON.SET product:2 $ '{"name":"Python Cookbook","description":"Python recipes and patterns","price":38000,"category":"book"}'

# Search
FT.SEARCH idx:products "Redis guide"
FT.SEARCH idx:products "@category:{book} @price:[30000 50000]"
FT.SEARCH idx:products "@name:(Python)" SORTBY price ASC

7.3 RedisTimeSeries

# Create time series
TS.CREATE sensor:temperature:1 RETENTION 86400000 LABELS sensor_id 1 type temperature

# Add data
TS.ADD sensor:temperature:1 * 23.5
TS.ADD sensor:temperature:1 * 24.1
TS.ADD sensor:temperature:1 * 22.8

# Range query
TS.RANGE sensor:temperature:1 - + COUNT 10

# Aggregation (5-minute average)
TS.RANGE sensor:temperature:1 - + AGGREGATION avg 300000

# Downsampling rule (the destination series must be created first)
TS.CREATE sensor:temperature:1:avg
TS.CREATERULE sensor:temperature:1 sensor:temperature:1:avg AGGREGATION avg 300000

8. Lua Scripting

8.1 Basic Lua Script

# Atomic read-modify-write
EVAL "
  local current = redis.call('GET', KEYS[1])
  if current then
    local new_val = tonumber(current) + tonumber(ARGV[1])
    redis.call('SET', KEYS[1], new_val)
    return new_val
  end
  return nil
" 1 counter 5

8.2 Rate Limiter (Sliding Window)

RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove old requests outside window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Check current request count
local count = redis.call('ZCARD', key)

if count < limit then
    -- Allow: add new request
    redis.call('ZADD', key, now, now .. ':' .. math.random())
    redis.call('EXPIRE', key, window)
    return 1
else
    -- Deny
    return 0
end
"""

import redis
import time

r = redis.Redis()
rate_limit_sha = r.script_load(RATE_LIMIT_SCRIPT)

def is_allowed(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"ratelimit:{user_id}"
    now = int(time.time() * 1000)
    result = r.evalsha(rate_limit_sha, 1, key, limit, window * 1000, now)
    return bool(result)

8.3 Distributed Lock (Redlock Algorithm)

import redis
import time
import uuid

class DistributedLock:
    def __init__(self, redis_client: redis.Redis, resource: str, ttl: int = 10):
        self.redis = redis_client
        self.resource = f"lock:{resource}"
        self.ttl = ttl
        self.token = str(uuid.uuid4())

    def acquire(self) -> bool:
        return bool(self.redis.set(
            self.resource, self.token,
            nx=True, ex=self.ttl,
        ))

    def release(self) -> bool:
        # Atomic check + delete with Lua script
        script = """
        if redis.call('GET', KEYS[1]) == ARGV[1] then
            return redis.call('DEL', KEYS[1])
        end
        return 0
        """
        return bool(self.redis.eval(script, 1, self.resource, self.token))

    def __enter__(self):
        if not self.acquire():
            raise Exception("Could not acquire lock")
        return self

    def __exit__(self, *args):
        self.release()

# Usage
r = redis.Redis()
with DistributedLock(r, "order:process:123"):
    # Critical section — only one process executes
    process_order(123)

9. Redis Cluster

9.1 Hash Slots

Redis Cluster distributes data across 16384 hash slots. The CRC16 hash of a key modulo 16384 determines the slot number.

Node configuration example:
  Node A: slots 0-5460
  Node B: slots 5461-10922
  Node C: slots 10923-16383

With one replica per node:
  Node A -> Replica A'
  Node B -> Replica B'
  Node C -> Replica C'
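The slot mapping can be reproduced client-side. This sketch implements CRC16-XMODEM, the variant the Redis Cluster specification names, together with the hash-tag rule (only the part inside a non-empty {...} is hashed):

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # If the key contains a non-empty {tag}, only the tag is hashed
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("{user:1}:profile") == key_slot("{user:1}:settings"))  # True
```

This is essentially how cluster-aware clients route each command to the node owning the key's slot without asking the server first.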

9.2 Cluster Setup

# Create cluster (6 nodes: 3 masters + 3 replicas)
redis-cli --cluster create \
  127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 \
  127.0.0.1:7004 127.0.0.1:7005 127.0.0.1:7006 \
  --cluster-replicas 1

# Check cluster status
redis-cli -c -p 7001 cluster info
redis-cli -c -p 7001 cluster nodes

9.3 Failover and Resharding

# Manual failover
redis-cli -c -p 7004 cluster failover

# Resharding (move slots)
redis-cli --cluster reshard 127.0.0.1:7001

# Add node
redis-cli --cluster add-node 127.0.0.1:7007 127.0.0.1:7001

# Remove node
redis-cli --cluster del-node 127.0.0.1:7001 <node-id>

9.4 Hash Tags

Place keys in the same slot to enable multi-key operations.

# Hash tags — slot determined by content in curly braces
SET {user:1}:profile "data"
SET {user:1}:settings "data"
SET {user:1}:sessions "data"

# These three keys are in the same slot -> multi-key operations possible

10. Redis in Practice

10.1 Session Store

import redis
import json
import time
import uuid

r = redis.Redis(decode_responses=True)

def create_session(user_id: int, data: dict) -> str:
    session_id = str(uuid.uuid4())
    session_data = {
        "user_id": str(user_id),
        "created_at": str(time.time()),
        **data,
    }
    r.hset(f"session:{session_id}", mapping=session_data)
    r.expire(f"session:{session_id}", 86400)  # 24 hours
    return session_id

def get_session(session_id: str) -> dict | None:
    data = r.hgetall(f"session:{session_id}")
    if not data:
        return None
    # Refresh TTL on access
    r.expire(f"session:{session_id}", 86400)
    return data

def destroy_session(session_id: str):
    r.delete(f"session:{session_id}")

10.2 Rate Limiter (Fixed Window)

def check_rate_limit(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rate:{user_id}:{int(time.time()) // window}"

    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window)
    count, _ = pipe.execute()

    return count <= limit

10.3 Leaderboard

class Leaderboard:
    def __init__(self, name: str):
        self.key = f"leaderboard:{name}"

    def add_score(self, player_id: str, score: float):
        r.zadd(self.key, {player_id: score})

    def increment_score(self, player_id: str, delta: float):
        r.zincrby(self.key, delta, player_id)

    def get_rank(self, player_id: str) -> int | None:
        rank = r.zrevrank(self.key, player_id)
        return rank + 1 if rank is not None else None

    def get_top(self, count: int = 10) -> list[tuple[str, float]]:
        return r.zrevrange(self.key, 0, count - 1, withscores=True)

    def get_around(self, player_id: str, count: int = 5) -> list:
        rank = r.zrevrank(self.key, player_id)
        if rank is None:
            return []
        start = max(0, rank - count)
        end = rank + count
        return r.zrevrange(self.key, start, end, withscores=True)

# Usage
lb = Leaderboard("weekly")
lb.add_score("player:alice", 1500)
lb.increment_score("player:alice", 200)
print(lb.get_top(10))
print(lb.get_rank("player:alice"))

10.4 Simple Job Queue

import json
import time

def enqueue_job(queue: str, job_data: dict):
    job = {
        "id": str(uuid.uuid4()),
        "data": job_data,
        "created_at": time.time(),
    }
    r.lpush(f"queue:{queue}", json.dumps(job))

def dequeue_job(queue: str, timeout: int = 30) -> dict | None:
    result = r.brpop(f"queue:{queue}", timeout=timeout)
    if result:
        _, job_json = result
        return json.loads(job_json)
    return None

def worker(queue: str):
    while True:
        job = dequeue_job(queue)
        if job:
            try:
                process_job(job["data"])
            except Exception as e:
                # On failure, push the job to a dead-letter queue for later retry
                print(f"Job {job['id']} failed: {e}")
                r.lpush(f"queue:{queue}:failed", json.dumps(job))

11. Redis vs Memcached vs DragonflyDB

Feature           | Redis                          | Memcached      | DragonflyDB
Data structures   | Rich (String, List, Set, etc.) | String only    | Redis-compatible
Persistence       | RDB + AOF                      | None           | Snapshots
Clustering        | Redis Cluster                  | Client-side    | Single node (multithreaded)
Threading model   | Single thread (I/O threads)    | Multithreaded  | Multithreaded
Memory efficiency | Medium                         | High           | High
Pub/Sub           | Supported                      | Not supported  | Supported
Lua scripting     | Supported                      | Not supported  | Supported
Throughput        | ~100K ops/s                    | ~100K ops/s    | ~400K ops/s
Max value size    | 512MB                          | 1MB            | 512MB
Best for          | General purpose                | Simple caching | High-perf single node

12. Client Library Code Examples

12.1 Spring Data Redis (Java)

@Configuration
public class RedisConfig {
    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}

@Service
public class UserCacheService {
    private final RedisTemplate<String, Object> redisTemplate;
    private final Duration TTL = Duration.ofHours(1);

    public UserCacheService(RedisTemplate<String, Object> redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public void cacheUser(String userId, UserDto user) {
        String key = "user:" + userId;
        redisTemplate.opsForValue().set(key, user, TTL);
    }

    public UserDto getCachedUser(String userId) {
        String key = "user:" + userId;
        return (UserDto) redisTemplate.opsForValue().get(key);
    }

    public void addScore(String leaderboard, String playerId, double score) {
        redisTemplate.opsForZSet().add("lb:" + leaderboard, playerId, score);
    }
}

12.2 ioredis (Node.js)

import Redis from 'ioredis';

const redis = new Redis({
  host: 'localhost',
  port: 6379,
  retryStrategy: (times) => Math.min(times * 50, 2000),
  maxRetriesPerRequest: 3,
});

// Basic caching
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const user = await db.findUser(userId);
  if (user) {
    await redis.setex(cacheKey, 3600, JSON.stringify(user));
  }
  return user;
}

// Pipeline (batch processing)
async function getMultipleUsers(userIds) {
  const pipeline = redis.pipeline();
  userIds.forEach(id => pipeline.get(`user:${id}`));
  const results = await pipeline.exec();
  return results.map(([err, val]) => val ? JSON.parse(val) : null);
}

// Pub/Sub
const subscriber = new Redis();
subscriber.subscribe('notifications', (err, count) => {
  console.log(`Subscribed to ${count} channels`);
});

subscriber.on('message', (channel, message) => {
  console.log(`Received on ${channel}: ${message}`);
});

await redis.publish('notifications', JSON.stringify({ type: 'alert', message: 'Server update' }));

12.3 redis-py (Python)

import redis.asyncio as aioredis
import json

# Async Redis client
pool = aioredis.ConnectionPool.from_url(
    "redis://localhost:6379",
    max_connections=20,
    decode_responses=True,
)
r = aioredis.Redis(connection_pool=pool)

# Pipeline
async def batch_operations():
    async with r.pipeline(transaction=True) as pipe:
        pipe.set("key1", "value1")
        pipe.set("key2", "value2")
        pipe.get("key1")
        results = await pipe.execute()
        return results

# Cache decorator
import functools

def redis_cache(ttl: int = 3600):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            cache_key = f"cache:{func.__name__}:{args}:{kwargs}"
            cached = await r.get(cache_key)
            if cached:
                return json.loads(cached)
            result = await func(*args, **kwargs)
            await r.setex(cache_key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator

@redis_cache(ttl=600)
async def get_product_list(category_id: int):
    return await db.get_products(category_id)

13. Interview Questions (15)

Q1. How is Redis fast despite being single-threaded?

Redis uses an event loop-based single thread with all data in memory, eliminating disk I/O. It uses epoll/kqueue-based I/O multiplexing to efficiently handle thousands of connections. Since Redis 6.0, I/O threads parallelize network processing.

Q2. What is the difference between Cache-Aside and Write-Through?

Cache-Aside has the application manage the cache directly (on read miss, query DB then cache). Write-Through updates both cache and DB simultaneously on writes. Cache-Aside is simpler to implement; Write-Through provides stronger data consistency.

Q3. Explain Redis persistence mechanisms (RDB, AOF).

RDB (Redis Database) saves point-in-time snapshots to disk. AOF (Append Only File) logs every write operation. RDB is suitable for fast recovery; AOF minimizes data loss. Using both is recommended.

Q4. Why might MGET fail in Redis Cluster?

In Redis Cluster, keys are distributed across hash slots. If MGET keys are in different slots, they cannot be processed in a single command. Use hash tags (curly braces) to place keys in the same slot, or split into multiple requests at the client.

Q5. What is the Thundering Herd problem?

When a cache key expires, many requests simultaneously hit the DB. Solutions: distributed lock allowing only one request to access DB, random jitter added to TTL, proactive background cache refresh.

Q6. What is the difference between Redis Pub/Sub and Streams?

Pub/Sub is fire-and-forget; messages are lost if no subscriber is listening. Streams store messages in the stream itself, retained until explicitly trimmed (e.g. with MAXLEN), and Consumer Groups provide reliable processing with ACK, reprocessing, and history queries.

Q7. Why are big keys problematic in Redis?

Big keys block Redis during deletion, consume network bandwidth, and cause data skew in clusters. Use UNLINK for async deletion and split large hashes into smaller ones.

Q8. Explain Redis eviction policies.

When maxmemory is reached, keys are removed by eviction policy: noeviction (reject new writes), allkeys-lru (LRU), allkeys-lfu (LFU), volatile-lru (LRU among TTL keys), volatile-ttl (nearest expiration first).
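A minimal redis.conf sketch for a cache-only instance (values are illustrative):

```
# redis.conf — evict the least recently used key once 2 GB is reached
maxmemory 2gb
maxmemory-policy allkeys-lru
```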

Q9. Why use Lua scripting in Redis?

For atomic execution of multiple commands. Reduces network round-trips and executes complex logic atomically server-side. Common use cases: rate limiters, distributed lock release, conditional updates.

Q10. What is the difference between Redis Sentinel and Redis Cluster?

Sentinel provides high availability for master-slave setups (automatic failover). Cluster provides horizontal scaling by distributing data across nodes. Sentinel for small scale; Cluster for large datasets.

Q11. What is Redis pipelining?

Sending multiple commands at once to the server and receiving all responses together. Dramatically reduces network round-trips. 100 individual commands require 100 round-trips; pipelining needs just 1.

Q12. How does Redis handle key expiration?

Two mechanisms combined: lazy expiration checks expiry on key access; active expiration samples random expired keys every 100ms for deletion. This combination prevents expired keys from consuming excessive memory.

Q13. What is the Redlock algorithm?

A distributed lock algorithm proposed by Salvatore Sanfilippo (and critiqued by Martin Kleppmann). Lock is valid when successfully acquired on a majority (N/2+1) of N independent Redis instances. Safer than single-instance locks but not perfect.

Q14. What is Redis slow log?

Records commands exceeding a time threshold. Set threshold with slowlog-log-slower-than (microseconds). Use SLOWLOG GET to identify and optimize slow commands.
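Typical slow log usage looks like:

```
# Log commands slower than 10 ms (threshold is in microseconds)
CONFIG SET slowlog-log-slower-than 10000

SLOWLOG GET 10     # Show the 10 most recent slow entries
SLOWLOG RESET      # Clear the log
```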

Q15. What are the pros and cons of Redis as a session store?

Pros: fast reads/writes, automatic TTL expiry, horizontal scaling, cross-server session sharing. Cons: memory cost, session loss risk on Redis failure, network dependency. Enable persistence (AOF) and replication to mitigate risks.


14. Quiz

Q1. What is the difference between Redis SET NX and XX options?

NX (Not eXists) sets the key only if it does not exist. Used for distributed lock acquisition. XX (eXists) updates only if the key already exists. Used for safely updating existing values.

Q2. What is the time complexity of ZADD and why?

ZADD has O(log N) time complexity. Sorted Set internally uses a Skip List to maintain sorted order, and Skip List insertion is O(log N).

Q3. Why should KEYS command not be used in production?

KEYS iterates all keys with O(N) complexity, blocking Redis for extended periods with many keys. Use SCAN instead, which iterates incrementally with cursor-based approach, preventing blocking.
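A SCAN-based replacement for the KEYS pattern:

```
# Cursor-based iteration: returns a new cursor plus a batch of keys.
# Repeat with the returned cursor until it comes back as 0.
SCAN 0 MATCH user:* COUNT 100
```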

Q4. What problem does WATCH/MULTI/EXEC solve in Redis?

It implements optimistic locking. WATCH monitors keys, MULTI starts a transaction, EXEC executes it. If a WATCHed key is modified by another client, the transaction fails. Efficient in low-contention scenarios.
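A minimal check-and-set sequence:

```
WATCH balance:1
GET balance:1          # read, decide in the client
MULTI
DECRBY balance:1 100
EXEC                   # returns nil if balance:1 changed after WATCH
```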

Q5. HyperLogLog has ~0.81% error rate. Why use an inaccurate data structure?

Exact unique counts require O(N) memory (storing all elements). HyperLogLog uses fixed 12KB regardless of set size. Tracking 100 million unique visitors would need several GB with Set, but only 12KB with HyperLogLog. The 0.81% error is acceptable for most analytics use cases.


References

  1. Redis Official Documentation
  2. Redis University
  3. Redis Stack Documentation
  4. Valkey Project
  5. Redis in Action (Manning)
  6. ioredis GitHub
  7. redis-py Documentation
  8. Spring Data Redis
  9. Redis Best Practices
  10. Redlock Algorithm
  11. Martin Kleppmann's Redlock Analysis
  12. Redis Cluster Tutorial
  13. DragonflyDB
  14. Redis Streams Guide
  15. RediSearch Documentation