Message Queue Showdown: Kafka vs RabbitMQ vs SQS vs Pulsar vs NATS — 2025 Decision Guide
- Authors
- Youngju Kim (@fjvbn20031)
- 1. Why Message Queues?
- 2. Architecture Comparison: Log vs Broker vs Hybrid
- 3. Performance Benchmarks
- 4. 2025 New Feature Highlights
- 5. Managed Cloud Services Comparison + Pricing
- 6. Spring Boot Integration Code
- 7. Decision Flowchart
- 8. Real-World Architecture Patterns
- 9. Operations Checklist
- 10. Quiz
- References
1. Why Message Queues?
Market Size and Adoption
The global message queue middleware market has grown to approximately 6 billion USD as of 2025. According to Fortune Business Insights, 41% of enterprises have already adopted message-queue-based asynchronous architectures, with an additional 27% planning to adopt them.
Why are so many companies embracing message queues?
Synchronous vs Asynchronous Communication
graph LR
subgraph "Synchronous - Tight Coupling"
A[Order Service] -->|HTTP call| B[Payment Service]
B -->|HTTP call| C[Inventory Service]
C -->|HTTP call| D[Notification Service]
end
In synchronous communication, a 1-second delay in the payment service slows the entire chain by over 1 second. If one service dies, cascading failures occur.
graph LR
subgraph "Asynchronous - Loose Coupling"
A2[Order Service] -->|Publish event| MQ[Message Queue]
MQ -->|Subscribe| B2[Payment Service]
MQ -->|Subscribe| C2[Inventory Service]
MQ -->|Subscribe| D2[Notification Service]
end
In asynchronous communication, the order service only needs to publish a message. Even if the payment service goes down temporarily, the message is retained in the queue and processed after recovery.
5 Signs You Need a Message Queue
- Synchronous calls between services chain more than 3 levels deep
- Traffic spikes overload downstream services
- Event replay is needed (failure recovery, data reprocessing)
- Multiple consumers need to process the same data for different purposes
- Microservice migration requires reducing inter-service coupling
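The failure-isolation claim above can be illustrated with a tiny in-memory sketch (a toy model, not a real broker): the producer keeps publishing while the consumer is down, and nothing is lost once the consumer recovers.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal illustration of queue-based decoupling: messages published while
// the consumer is down stay in the queue and are processed after recovery.
public class QueueDecoupling {
    static final Queue<String> queue = new ArrayDeque<>();
    static boolean consumerUp = false;

    // The producer never blocks on consumer availability.
    static void publish(String msg) {
        queue.add(msg);
    }

    // Once the consumer is back up, it drains everything that accumulated.
    static int drain() {
        int processed = 0;
        while (consumerUp && !queue.isEmpty()) {
            queue.poll();
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        publish("order-1");          // consumer is down: message is retained
        publish("order-2");
        consumerUp = true;           // consumer recovers
        System.out.println(drain()); // -> 2
    }
}
```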
2. Architecture Comparison: Log vs Broker vs Hybrid
Message queues fall into three main architectural paradigms.
2.1 Apache Kafka — Distributed Commit Log
graph TB
subgraph "Kafka Cluster"
direction TB
KR[KRaft Controller] --> B1[Broker 1]
KR --> B2[Broker 2]
KR --> B3[Broker 3]
end
P[Producer] --> B1
B1 --> T["Topic: orders (3 partitions)"]
T --> P0[Partition 0 - Leader: B1]
T --> P1[Partition 1 - Leader: B2]
T --> P2[Partition 2 - Leader: B3]
P0 --> CG["Consumer Group"]
P1 --> CG
P2 --> CG
Key Characteristics:
- Append-Only Commit Log: Messages are appended sequentially to partitions and are immutable
- Consumer Offsets: Consumers manage their read positions, enabling replay
- KRaft (Kafka 4.0): Complete removal of ZooKeeper with Raft-based metadata management
- Retention Policy: Messages retained based on time or size (default 7 days)
- Partition Parallelism: Scale consumers in parallel up to the number of partitions
Best For: Event streaming, log aggregation, event sourcing, real-time analytics
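The commit-log-plus-offset model can be sketched in a few lines of plain Java (a toy model, not the Kafka client): reading never deletes data, so rewinding the offset replays history.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Kafka's core idea: an append-only log plus a consumer-managed
// offset. Rewinding the offset replays old records; nothing is deleted on read.
public class CommitLog {
    private final List<String> log = new ArrayList<>();

    /** Appends a record and returns its offset in the log. */
    public long append(String record) {
        log.add(record);
        return log.size() - 1;
    }

    /** Reading never removes data; the caller just advances its own offset. */
    public List<String> readFrom(long offset) {
        return new ArrayList<>(log.subList((int) offset, log.size()));
    }

    public static void main(String[] args) {
        CommitLog partition = new CommitLog();
        partition.append("order-created");
        partition.append("order-paid");
        long committed = 2;                                // caught up
        System.out.println(partition.readFrom(committed)); // -> []
        System.out.println(partition.readFrom(0));         // replay from start
    }
}
```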
2.2 RabbitMQ — AMQP Message Broker
graph LR
P[Producer] --> E["Exchange (type: topic)"]
E -->|"routing.key.a"| Q1[Queue A]
E -->|"routing.key.b"| Q2[Queue B]
E -->|"routing.#"| Q3[Queue C - wildcard]
Q1 --> C1[Consumer 1]
Q2 --> C2[Consumer 2]
Q3 --> C3[Consumer 3]
Key Characteristics:
- AMQP Protocol: Flexible routing model with Exchanges, Bindings, and Queues
- Exchange Types: Direct, Topic, Fanout, Headers for complex routing
- Message ACK: Messages are removed from the queue only after consumer acknowledgment
- Plugin Ecosystem: Shovel, Federation, Management UI, and more
- Priority Queue: Support for message priority levels
Best For: Complex routing, RPC patterns, task distribution, existing AMQP ecosystems
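The Exchange routing in the diagram hinges on AMQP topic matching, where `*` matches exactly one dot-separated word and `#` matches zero or more. A minimal matcher, for intuition only (the broker implements this internally):

```java
// Sketch of AMQP topic-exchange matching: '*' matches exactly one
// dot-separated word, '#' matches zero or more words.
public class TopicMatcher {
    public static boolean matches(String pattern, String routingKey) {
        return match(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean match(String[] p, int pi, String[] k, int ki) {
        if (pi == p.length) return ki == k.length;
        if (p[pi].equals("#")) {
            // '#' may consume zero or more remaining words
            for (int skip = ki; skip <= k.length; skip++) {
                if (match(p, pi + 1, k, skip)) return true;
            }
            return false;
        }
        if (ki == k.length) return false;
        if (p[pi].equals("*") || p[pi].equals(k[ki])) {
            return match(p, pi + 1, k, ki + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("routing.#", "routing.key.a")); // true
        System.out.println(matches("routing.*", "routing.key.a")); // false: '*' is one word
    }
}
```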
2.3 Amazon SQS — Fully Managed Queue
graph LR
P[Producer] -->|SendMessage| SQS["SQS Queue"]
SQS -->|ReceiveMessage| C1[Consumer 1]
SQS -->|ReceiveMessage| C2[Consumer 2]
SQS -.->|On failure| DLQ["Dead Letter Queue"]
Key Characteristics:
- Fully Managed: No server provisioning, patching, or scaling needed
- Pull-Based: Consumers actively fetch messages
- Standard vs FIFO: Unlimited throughput (Standard) or ordered delivery (FIFO, 300-3,000 TPS)
- Visibility Timeout: Messages are hidden from other consumers during processing
- Native DLQ Support: Automatic routing of failed messages
Best For: Serverless architecture, AWS-native workloads, minimal operational overhead
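The visibility-timeout mechanic can be modeled in a few lines (timestamps are illustrative; real SQS tracks this server-side):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of SQS visibility timeout: a received message is hidden from
// other consumers until the timeout passes or the message is deleted.
public class VisibilityTimeout {
    private final Map<String, Long> invisibleUntil = new HashMap<>();
    private final long timeoutMs;

    public VisibilityTimeout(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Returns true if the message is handed to this consumer (and hidden). */
    public boolean receive(String messageId, long nowMs) {
        Long hiddenUntil = invisibleUntil.get(messageId);
        if (hiddenUntil != null && nowMs < hiddenUntil) return false; // in flight
        invisibleUntil.put(messageId, nowMs + timeoutMs);
        return true;
    }

    public static void main(String[] args) {
        VisibilityTimeout q = new VisibilityTimeout(30_000);
        System.out.println(q.receive("m1", 0));      // true: delivered, now hidden
        System.out.println(q.receive("m1", 10_000)); // false: still being processed
        System.out.println(q.receive("m1", 40_000)); // true: timeout expired, redelivered
    }
}
```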
2.4 Apache Pulsar — Decoupled Architecture
graph TB
subgraph "Pulsar Cluster"
direction TB
B1[Broker 1 - Serving] --> BK1[BookKeeper 1 - Storage]
B2[Broker 2 - Serving] --> BK2[BookKeeper 2 - Storage]
B3[Broker 3 - Serving] --> BK3[BookKeeper 3 - Storage]
end
P[Producer] --> B1
B1 --> T["Topic: events"]
T --> SUB["Subscription (Shared/Key_Shared/Exclusive)"]
SUB --> C1[Consumer 1]
SUB --> C2[Consumer 2]
Key Characteristics:
- Compute/Storage Separation: Brokers (serving) and BookKeeper (storage) scale independently
- Multi-Tenancy: Isolation at the namespace and tenant level
- Geo-Replication: Native cross-datacenter replication
- Tiered Storage: Automatic offloading of old data to low-cost storage like S3
- Subscription Modes: Exclusive, Shared, Failover, Key_Shared
Best For: Multi-tenant environments, geo-distributed deployments, large-scale streaming
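Key_Shared ordering can be approximated with simple key hashing (a sketch of the idea only; Pulsar's actual implementation assigns hash ranges to consumers):

```java
// Sketch of Pulsar's Key_Shared idea: messages with the same key always go
// to the same consumer, so per-key ordering survives parallel consumption.
public class KeyShared {
    public static int consumerFor(String key, int consumerCount) {
        return Math.floorMod(key.hashCode(), consumerCount);
    }

    public static void main(String[] args) {
        int first = consumerFor("order-42", 3);
        int second = consumerFor("order-42", 3);
        System.out.println(first == second); // true: same key -> same consumer
    }
}
```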
2.5 NATS — Lightweight Message Mesh
graph LR
subgraph "NATS Cluster"
N1[NATS Server 1] --- N2[NATS Server 2]
N2 --- N3[NATS Server 3]
N1 --- N3
end
P[Publisher] --> N1
N1 -->|"subject: sensor.temp"| S1[Subscriber 1]
N2 -->|"subject: sensor.*"| S2[Subscriber 2 - wildcard]
N3 -->|JetStream| JS[JetStream Consumer]
Key Characteristics:
- Core NATS: Fire-and-forget Pub/Sub for minimal latency
- JetStream: Enable when persistence, replay, and exactly-once are needed
- Lightweight Binary: Single binary of about 20MB, written in Go
- Subject-Based Routing: Hierarchical subject names with wildcard matching
- Leaf Nodes: Lightweight node extension for edge computing
Best For: IoT, edge computing, fast inter-service communication, cloud-native
2.6 Redis Streams — In-Memory Log Structure
graph LR
P[Producer] -->|XADD| RS["Redis Stream (mystream)"]
RS -->|XREADGROUP| CG["Consumer Group"]
CG --> C1["Consumer 1 (pending entries)"]
CG --> C2["Consumer 2 (pending entries)"]
Key Characteristics:
- In-Memory: Ultra-low latency, high throughput
- Consumer Groups: Similar consumer group model to Kafka
- XADD/XREAD: Simple command-based API
- Pending Entries List (PEL): Tracks unacknowledged messages
- MAXLEN Trimming: Automatic cleanup within memory limits
Best For: Lightweight streaming, environments already using Redis, cache + queue integration
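The PEL's role is easiest to see in a toy model: delivered-but-unacked entries stay pending, so a crashed consumer's messages can be found (XPENDING) and claimed (XCLAIM) later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of Redis Streams' Pending Entries List: delivered-but-unacked
// entries stay pending until XACK, enabling crash recovery via XCLAIM.
public class PendingEntriesList {
    // entry id -> consumer it was delivered to
    private final Map<String, String> pending = new LinkedHashMap<>();

    public void deliver(String id, String consumer) {
        pending.put(id, consumer);
    }

    public void ack(String id) {
        pending.remove(id);
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingEntriesList pel = new PendingEntriesList();
        pel.deliver("1-0", "consumer-1");
        pel.deliver("1-1", "consumer-2");
        pel.ack("1-0");                         // consumer-1 finished its entry
        System.out.println(pel.pendingCount()); // -> 1: "1-1" was never acked
    }
}
```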
Architecture Paradigm Summary
| System | Paradigm | Message Storage | Consumption Model | Message Deletion |
|---|---|---|---|---|
| Kafka | Distributed Commit Log | Disk (retention policy) | Pull + Offset | After retention period |
| RabbitMQ | Message Broker | Memory + Disk | Push (delete on ACK) | Immediately on ACK |
| SQS | Managed Queue | AWS managed | Pull | After processing |
| Pulsar | Decoupled Log | BookKeeper + Tiered | Pull + Cursor | Retention-policy based |
| NATS | Message Mesh | Memory / JetStream | Push / Pull | Policy-based |
| Redis Streams | In-Memory Log | Memory (AOF optional) | Pull (Consumer Group) | MAXLEN trimming |
3. Performance Benchmarks
Throughput and Latency Comparison
The following figures are approximate, drawn from public benchmarks on comparable 3-node clusters (m5.2xlarge or equivalent); actual numbers vary with message size, durability settings, and hardware.
| System | Throughput (msgs/s) | P50 Latency | P99 Latency | Best For |
|---|---|---|---|---|
| Kafka | 500K - 1M | 5 - 15ms | 10 - 50ms | Event streams, replay, log aggregation |
| Pulsar | 1M - 2.6M | 5 - 10ms | Up to 300x lower than Kafka's | Multi-tenancy, geo-replication |
| NATS | 200K - 400K | sub-ms | 1 - 5ms | IoT, edge, fast messaging |
| RabbitMQ | 50K - 100K | 1 - 5ms | 5 - 20ms | Complex routing, RPC |
| Redis Streams | 100K - 500K | sub-ms | 1 - 3ms | Lightweight streaming, cache integration |
| SQS | 3K - 30K (Standard) | 20 - 50ms | 50 - 100ms | Serverless, AWS native |
Throughput Deep Dive
The secret behind Kafka's 1M msgs/s:
# Kafka throughput optimization configuration
num.partitions=12
batch.size=65536 # 64KB batch
linger.ms=5 # Wait 5ms before sending batch
compression.type=lz4 # LZ4 compression
buffer.memory=67108864 # 64MB buffer
acks=1 # Leader-only ACK (max throughput)
- With 12 partitions, each partition handles approximately 83K msgs/s
- Changing to acks=all reduces throughput by 30-40% but ensures data durability
- LZ4 compression saves 40-60% of network bandwidth
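The per-partition figure is plain arithmetic; a quick sketch of the math behind these bullets:

```java
// Back-of-the-envelope math: total throughput divided across partitions,
// and the bandwidth left over after compression savings.
public class ThroughputMath {
    public static int perPartition(int totalMsgsPerSec, int partitions) {
        return totalMsgsPerSec / partitions;
    }

    public static double bandwidthAfterCompression(double rawMBps, double savingsRatio) {
        return rawMBps * (1.0 - savingsRatio);
    }

    public static void main(String[] args) {
        // 1M msgs/s over 12 partitions -> ~83K msgs/s per partition
        System.out.println(perPartition(1_000_000, 12));
        // 1000 MB/s raw at 50% compression savings -> 500 MB/s on the wire
        System.out.println(bandwidthAfterCompression(1000, 0.5));
    }
}
```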
Why Pulsar excels at P99:
- Broker and BookKeeper separation means write I/O and read I/O don't compete
- BookKeeper's Journal (WAL) and Ledger (data) use physically separate disks
- Result: Up to 300x improvement in P99 tail latency compared to Kafka
NATS sub-ms requirements:
# NATS Core (JetStream disabled)
# - No persistence, in-memory only
# - Message size under 10KB
# - Communication within a single datacenter
Scalability Comparison
| System | Horizontal Scaling | Partition/Queue Limit | Max Cluster Size |
|---|---|---|---|
| Kafka | Add partitions + brokers | ~4,000 per broker recommended (millions per cluster with KRaft) | Hundreds of brokers |
| Pulsar | Independent broker/BookKeeper scaling | 1M+ topics possible | Thousands of nodes |
| NATS | Leaf node expansion | Unlimited (subject-based) | Super cluster |
| RabbitMQ | Quorum queues + node addition | Recommended thousands | Dozens of nodes |
| Redis Streams | Redis Cluster sharding | One stream per shard | Hundreds of nodes |
| SQS | Automatic (unlimited) | Unlimited | AWS managed |
4. 2025 New Feature Highlights
4.1 Kafka 4.0 — KRaft GA and Queue Mode
KRaft GA (KIP-833):
ZooKeeper has been completely removed in Kafka 4.0. All metadata management is now handled by KRaft controllers.
# Kafka 4.0 KRaft mode configuration
process.roles: controller,broker
node.id: 1
controller.quorum.voters: 1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
listeners: PLAINTEXT://:9092,CONTROLLER://:9093
# All ZooKeeper-related configuration removed
Improvements:
- 10x faster metadata propagation
- Partition count scalable beyond 1 million
- Drastically reduced operational complexity (no separate ZooKeeper cluster)
- Cluster startup time reduced from minutes to seconds
KIP-932: Queues for Kafka (Share Groups):
Traditional "queue" semantics have been added to Kafka. Previously, messages within a consumer group could only be distributed at the partition level. With Share Groups, messages can be distributed at the individual message level.
// Kafka 4.0 Share Group example
Properties props = new Properties();
props.put("group.id", "my-share-group");
props.put("group.type", "share"); // New Share type
KafkaShareConsumer<String, String> consumer =
new KafkaShareConsumer<>(props);
consumer.subscribe(List.of("orders"));
// Distributed per message, not bound to partitions
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
process(record);
consumer.acknowledge(record); // Per-message ACK
}
}
4.2 RabbitMQ 4.x — SQL Filters and AMQP 1.0
Native AMQP 1.0 Support:
Starting with RabbitMQ 4.0, AMQP 1.0 is supported as a core protocol rather than via a plugin; AMQP 0.9.1 remains supported alongside it.
Stream Filtering (4.2):
// RabbitMQ 4.2 Stream SQL Filter
// Server-side filtering saves network bandwidth
Consumer consumer = environment.consumerBuilder()
.stream("events")
.filter()
.values("region-us", "priority-high") // Server-side filter
.postFilter(msg -> msg.getProperties().getSubject().equals("order"))
.builder()
.messageHandler((context, message) -> {
// Only filtered messages received
processMessage(message);
context.storeOffset();
})
.build();
Performance: Maintains 4M+ msgs/s even with filtering enabled
Khepri Metadata Store:
A Raft-based metadata store replacing Mnesia, significantly improving stability in large-scale clusters.
4.3 Pulsar 4.1 — 19 PIPs
Apache Pulsar 4.1 includes 19 Pulsar Improvement Proposals (PIPs).
Key changes:
- Java 17 Required: Java 8/11 support dropped
- PIP-354: Topic Policy performance optimization
- PIP-362: Broker Level Dispatch Throttling
- PIP-374: Improved memory management reducing GC overhead
4.4 NATS 2.11 — Message TTL and Distributed Tracing
# NATS 2.11 JetStream message TTL configuration
nats stream add EVENTS \
--subjects "events.>" \
--max-msg-ttl 3600 \
--storage file \
--replicas 3
Key changes:
- Message-Level TTL: Set expiration times on individual messages, not entire streams
- Distributed Tracing: OpenTelemetry integration for message flow tracking
- SubjectTransform: Dynamic subject name transformation
- Improved Monitoring: Extended Prometheus metrics
5. Managed Cloud Services Comparison + Pricing
Major Managed Services
| Service | Underlying Tech | Provider | Key Features |
|---|---|---|---|
| Confluent Cloud | Kafka | Confluent | Schema Registry, ksqlDB, Connectors |
| Amazon MSK | Kafka | AWS | VPC integration, Serverless option |
| Amazon MSK Serverless | Kafka | AWS | Auto-scaling, pay-per-use |
| Amazon SQS | Proprietary | AWS | Fully managed, serverless integration |
| CloudAMQP | RabbitMQ | 84codes | Dedicated/Shared instances |
| StreamNative | Pulsar | StreamNative | Pulsar-specialized managed service |
| Synadia | NATS | Synadia | NATS-specialized managed service |
Monthly Cost Simulation (100M messages/day, 1KB average)
Scenario: 100M messages/day, 1KB average, 3-day retention
Confluent Cloud (Basic):
- Throughput: ~1,160 msgs/s average
- Estimated cost: ~800 - 1,200 USD/month
- Includes: Brokers, storage, network
Amazon MSK Provisioned (kafka.m5.large x 3):
- Instances: 0.21 USD/h x 3 = ~454 USD/month
- Storage: 300GB x 0.10 USD/GB = 30 USD
- Estimated total: ~500 - 700 USD/month
Amazon MSK Serverless:
- Cluster: 0.75 USD/h = ~540 USD/month
- Partitions: 0.0015 USD/h/partition
- Storage: 0.10 USD/GB
- Estimated total: ~600 - 1,000 USD/month
Amazon SQS Standard:
- First 1M requests free
- Then: 0.40 USD / 1M requests
- ~3B requests/month: ~1,200 USD
- Data transfer costs extra
CloudAMQP (Dedicated - Tiger):
- ~500 - 1,000 USD/month
- Cluster included, monitoring included
Self-Hosted Kafka (c5.2xlarge x 3 + EBS):
- EC2: 0.34 USD/h x 3 = ~734 USD/month
- EBS gp3 300GB x 3: ~72 USD/month
- Operations cost extra (at least 1 DevOps engineer)
- Estimated total: ~800 USD/month + ops costs
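As a sanity check on the SQS figure, the arithmetic can be scripted (assumed price of USD 0.40 per million requests past the free tier; a real bill adds data transfer and extra receive/delete requests):

```java
// Rough monthly cost arithmetic for the SQS Standard scenario above.
// Prices and the one-request-per-message simplification are assumptions.
public class SqsCostEstimate {
    public static double monthlyUsd(long messagesPerDay, double usdPerMillionRequests,
                                    long freeRequests) {
        long requestsPerMonth = messagesPerDay * 30;
        long billable = Math.max(0, requestsPerMonth - freeRequests);
        return billable / 1_000_000.0 * usdPerMillionRequests;
    }

    public static void main(String[] args) {
        // 100M messages/day -> ~3B requests/month -> ~1,200 USD
        System.out.println(monthlyUsd(100_000_000L, 0.40, 1_000_000L));
    }
}
```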
Selection Criteria Summary
- Minimize cost: SQS for small scale (pay-per-use), self-hosted Kafka for large scale
- Minimize ops overhead: SQS or MSK Serverless
- Maximize features: Confluent Cloud (Schema Registry, ksqlDB)
- Need RabbitMQ: CloudAMQP
6. Spring Boot Integration Code
6.1 Kafka + Spring Boot
// build.gradle
// implementation 'org.springframework.kafka:spring-kafka:3.3.0'
// application.yml
// spring:
// kafka:
// bootstrap-servers: localhost:9092
// producer:
// key-serializer: org.apache.kafka.common.serialization.StringSerializer
// value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
// consumer:
// group-id: order-service
// auto-offset-reset: earliest
// key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
// value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
// KafkaProducerService.java
@Service
@RequiredArgsConstructor
public class KafkaProducerService {
private final KafkaTemplate<String, OrderEvent> kafkaTemplate;
public CompletableFuture<SendResult<String, OrderEvent>> sendOrderEvent(
OrderEvent event) {
return kafkaTemplate.send("orders", event.getOrderId(), event);
}
}
// KafkaConsumerService.java
@Service
@Slf4j
public class KafkaConsumerService {
@KafkaListener(
topics = "orders",
groupId = "order-service",
containerFactory = "kafkaListenerContainerFactory"
)
public void handleOrderEvent(
@Payload OrderEvent event,
@Header(KafkaHeaders.RECEIVED_PARTITION) int partition,
@Header(KafkaHeaders.OFFSET) long offset) {
log.info("Received order: {} from partition: {}, offset: {}",
event.getOrderId(), partition, offset);
processOrder(event);
}
private void processOrder(OrderEvent event) {
// Order processing logic
}
}
6.2 RabbitMQ + Spring Boot
// build.gradle
// implementation 'org.springframework.boot:spring-boot-starter-amqp'
// application.yml
// spring:
// rabbitmq:
// host: localhost
// port: 5672
// username: guest
// password: guest
// RabbitMQConfig.java
@Configuration
public class RabbitMQConfig {
public static final String ORDER_EXCHANGE = "order.exchange";
public static final String ORDER_QUEUE = "order.queue";
public static final String ORDER_ROUTING_KEY = "order.created";
public static final String DLQ_QUEUE = "order.dlq";
@Bean
public TopicExchange orderExchange() {
return new TopicExchange(ORDER_EXCHANGE);
}
@Bean
public Queue orderQueue() {
return QueueBuilder.durable(ORDER_QUEUE)
.withArgument("x-dead-letter-exchange", "")
.withArgument("x-dead-letter-routing-key", DLQ_QUEUE)
.withArgument("x-message-ttl", 60000)
.build();
}
@Bean
public Queue deadLetterQueue() {
return QueueBuilder.durable(DLQ_QUEUE).build();
}
@Bean
public Binding orderBinding(Queue orderQueue, TopicExchange orderExchange) {
return BindingBuilder.bind(orderQueue)
.to(orderExchange)
.with(ORDER_ROUTING_KEY);
}
}
// RabbitMQProducer.java
@Service
@RequiredArgsConstructor
public class RabbitMQProducer {
private final RabbitTemplate rabbitTemplate;
public void sendOrderEvent(OrderEvent event) {
rabbitTemplate.convertAndSend(
RabbitMQConfig.ORDER_EXCHANGE,
RabbitMQConfig.ORDER_ROUTING_KEY,
event,
message -> {
message.getMessageProperties().setContentType("application/json");
message.getMessageProperties().setDeliveryMode(
MessageDeliveryMode.PERSISTENT);
return message;
}
);
}
}
// RabbitMQConsumer.java
@Service
@Slf4j
public class RabbitMQConsumer {
@RabbitListener(queues = RabbitMQConfig.ORDER_QUEUE)
public void handleOrderEvent(OrderEvent event, Channel channel,
@Header(AmqpHeaders.DELIVERY_TAG) long tag) throws IOException {
try {
log.info("Received order: {}", event.getOrderId());
processOrder(event);
channel.basicAck(tag, false);
} catch (Exception e) {
log.error("Failed to process order: {}", event.getOrderId(), e);
channel.basicNack(tag, false, false); // Move to DLQ
}
}
private void processOrder(OrderEvent event) {
// Order processing logic
}
}
6.3 SQS + Spring Boot
// build.gradle
// implementation 'io.awspring.cloud:spring-cloud-aws-starter-sqs:3.2.0'
// application.yml
// spring:
// cloud:
// aws:
// region:
// static: ap-northeast-2
// sqs:
// endpoint: https://sqs.ap-northeast-2.amazonaws.com
// SQSProducer.java
@Service
@RequiredArgsConstructor
public class SQSProducer {
private final SqsTemplate sqsTemplate;
public void sendOrderEvent(OrderEvent event) {
sqsTemplate.send(to -> to
.queue("order-queue")
.payload(event)
.header("eventType", "ORDER_CREATED")
);
}
// FIFO queue sending
public void sendOrderEventFifo(OrderEvent event) {
sqsTemplate.send(to -> to
.queue("order-queue.fifo")
.payload(event)
.header(SqsHeaders.MessageSystemAttributes.SQS_MESSAGE_GROUP_ID_HEADER,
event.getOrderId())
.header(SqsHeaders.MessageSystemAttributes
.SQS_MESSAGE_DEDUPLICATION_ID_HEADER,
UUID.randomUUID().toString())
);
}
}
// SQSConsumer.java
@Service
@Slf4j
public class SQSConsumer {
@SqsListener(value = "order-queue", maxConcurrentMessages = "10",
maxMessagesPerPoll = "5")
public void handleOrderEvent(@Payload OrderEvent event,
@Header("eventType") String eventType) {
log.info("Received order: {} with type: {}",
event.getOrderId(), eventType);
processOrder(event);
}
private void processOrder(OrderEvent event) {
// Order processing logic
}
}
6.4 Pulsar + Spring Boot
// build.gradle
// implementation 'org.springframework.pulsar:spring-pulsar-spring-boot-starter:1.2.0'
// application.yml
// spring:
// pulsar:
// client:
// service-url: pulsar://localhost:6650
// producer:
// topic-name: persistent://public/default/orders
// consumer:
// subscription-name: order-service-sub
// subscription-type: shared
// PulsarProducer.java
@Service
@RequiredArgsConstructor
public class PulsarProducer {
private final PulsarTemplate<OrderEvent> pulsarTemplate;
public void sendOrderEvent(OrderEvent event) {
pulsarTemplate.send("persistent://public/default/orders", event);
}
}
// PulsarConsumer.java
@Service
@Slf4j
public class PulsarConsumer {
@PulsarListener(
topics = "persistent://public/default/orders",
subscriptionName = "order-service-sub",
subscriptionType = SubscriptionType.Shared,
schemaType = SchemaType.JSON
)
public void handleOrderEvent(OrderEvent event) {
log.info("Received order: {}", event.getOrderId());
processOrder(event);
}
private void processOrder(OrderEvent event) {
// Order processing logic
}
}
6.5 NATS + Spring Boot
// build.gradle
// implementation 'io.nats:jnats:2.20.4'
// NATSConfig.java
@Configuration
public class NATSConfig {
@Bean
public Connection natsConnection() throws IOException, InterruptedException {
Options options = new Options.Builder()
.server("nats://localhost:4222")
.reconnectWait(Duration.ofSeconds(2))
.maxReconnects(-1) // Unlimited reconnects
.connectionListener((conn, type) -> {
System.out.println("NATS connection event: " + type);
})
.build();
return Nats.connect(options);
}
@Bean
public JetStream jetStream(Connection connection) throws IOException {
return connection.jetStream();
}
}
// NATSProducer.java
@Service
@RequiredArgsConstructor
public class NATSProducer {
private final JetStream jetStream;
private final ObjectMapper objectMapper;
public PublishAck sendOrderEvent(OrderEvent event) throws Exception {
byte[] data = objectMapper.writeValueAsBytes(event);
Message msg = NatsMessage.builder()
.subject("orders.created")
.data(data)
.build();
return jetStream.publish(msg);
}
}
// NATSConsumer.java
@Service
@Slf4j
@RequiredArgsConstructor
public class NATSConsumer {
private final Connection natsConnection;
private final ObjectMapper objectMapper;
@PostConstruct
public void startConsumer() throws Exception {
JetStream js = natsConnection.jetStream();
PushSubscribeOptions options = PushSubscribeOptions.builder()
.durable("order-consumer")
.build();
Dispatcher dispatcher = natsConnection.createDispatcher();
MessageHandler handler = msg -> {
try {
OrderEvent event = objectMapper.readValue(
msg.getData(), OrderEvent.class);
log.info("Received order: {}", event.getOrderId());
processOrder(event);
msg.ack();
} catch (Exception e) {
log.error("Failed to process message", e);
msg.nak();
}
};
// jnats push subscribe: subject, queue group, dispatcher, handler, autoAck, options
js.subscribe("orders.>", "order-queue", dispatcher, handler, false, options);
}
private void processOrder(OrderEvent event) {
// Order processing logic
}
}
6.6 Redis Streams + Spring Boot
// build.gradle
// implementation 'org.springframework.boot:spring-boot-starter-data-redis'
// RedisStreamConfig.java
@Configuration
public class RedisStreamConfig {
@Bean
public StreamMessageListenerContainer<String, MapRecord<String, String, String>>
streamListenerContainer(RedisConnectionFactory factory) {
var options = StreamMessageListenerContainer
.StreamMessageListenerContainerOptions.builder()
.pollTimeout(Duration.ofSeconds(1))
.batchSize(10)
.build();
var container = StreamMessageListenerContainer.create(factory, options);
container.receiveAutoAck(
Consumer.from("order-group", "consumer-1"),
StreamOffset.create("orders", ReadOffset.lastConsumed()),
new OrderStreamListener()
);
container.start();
return container;
}
}
// RedisStreamProducer.java
@Service
@RequiredArgsConstructor
public class RedisStreamProducer {
private final StringRedisTemplate redisTemplate;
private final ObjectMapper objectMapper;
public RecordId sendOrderEvent(OrderEvent event) throws Exception {
Map<String, String> fields = Map.of(
"orderId", event.getOrderId(),
"type", "ORDER_CREATED",
"payload", objectMapper.writeValueAsString(event)
);
return redisTemplate.opsForStream()
.add(StreamRecords.newRecord()
.ofMap(fields)
.withStreamKey("orders"));
}
}
// OrderStreamListener.java
@Slf4j
public class OrderStreamListener
implements StreamListener<String, MapRecord<String, String, String>> {
@Override
public void onMessage(MapRecord<String, String, String> message) {
Map<String, String> body = message.getValue();
log.info("Received order: {} from stream: {}",
body.get("orderId"), message.getStream());
processOrder(body);
}
private void processOrder(Map<String, String> body) {
// Order processing logic
}
}
7. Decision Flowchart
A decision tree for choosing a message queue system.
graph TD
START[Choose a Message Queue] --> Q1["Need event replay?"]
Q1 -->|Yes| Q1A["Need multi-tenancy or geo-replication?"]
Q1A -->|Yes| PULSAR["Apache Pulsar"]
Q1A -->|No| KAFKA["Apache Kafka"]
Q1 -->|No| Q2["Need complex routing?"]
Q2 -->|Yes| RABBIT["RabbitMQ"]
Q2 -->|No| Q3["AWS-native serverless?"]
Q3 -->|Yes| SQS["Amazon SQS"]
Q3 -->|No| Q4["Need sub-ms ultra-low latency?"]
Q4 -->|Yes| Q4A["Need persistence?"]
Q4A -->|Yes| NATS["NATS + JetStream"]
Q4A -->|No| NATS_CORE["NATS Core"]
Q4 -->|No| Q5["Already using Redis?"]
Q5 -->|Yes| REDIS["Redis Streams"]
Q5 -->|No| KAFKA2["Kafka (default choice)"]
Quick Selection Guide
| Requirement | Recommended System | Reason |
|---|---|---|
| Event streaming + replay | Kafka | Commit-log based, industry standard |
| Complex message routing | RabbitMQ | Exchange/Binding model |
| Serverless + all-in on AWS | SQS | Zero ops, Lambda triggers |
| Multi-tenancy + geo-replication | Pulsar | Native multi-tenancy |
| IoT + edge computing | NATS | Lightweight, sub-ms latency |
| Already using Redis | Redis Streams | No additional infrastructure |
| High-volume events + low cost | Kafka (self-hosted) | High throughput per dollar |
| No operations team | SQS or Confluent Cloud | Fully managed |
8. Real-World Architecture Patterns
8.1 E-Commerce: Order Processing Pipeline
graph LR
subgraph "Order Creation"
API[API Gateway] --> OS[Order Service]
end
OS -->|OrderCreated| KAFKA[Kafka - orders topic]
KAFKA --> PAY[Payment Service]
KAFKA --> INV[Inventory Service]
KAFKA --> NOTIF[Notification Service]
PAY -->|PaymentCompleted| KAFKA2[Kafka - payments topic]
KAFKA2 --> SHIP[Shipping Service]
INV -->|StockReserved| KAFKA3[Kafka - inventory topic]
KAFKA3 --> OS2[Order Service - status update]
SHIP -->|ShipmentCreated| KAFKA4[Kafka - shipments topic]
KAFKA4 --> NOTIF2[Notification Service - shipping alerts]
Why Kafka?
- Order event replay needed (reprocessing on payment failures)
- Multiple services consume the same event independently
- Can serve as an audit log
Key Configuration:
# Order topic configuration
topics:
orders:
partitions: 12
replication-factor: 3
configs:
retention.ms: 604800000 # 7 days
min.insync.replicas: 2
cleanup.policy: delete
payments:
partitions: 6
replication-factor: 3
configs:
retention.ms: 2592000000 # 30 days (audit log)
min.insync.replicas: 2
8.2 Real-Time Analytics Pipeline
graph LR
subgraph "Data Collection"
WEB[Web Clickstream] --> KAFKA[Kafka]
APP[App Events] --> KAFKA
IOT[IoT Sensors] --> NATS[NATS]
NATS -->|Bridge| KAFKA
end
subgraph "Stream Processing"
KAFKA --> FLINK[Apache Flink]
KAFKA --> KSQL[ksqlDB]
end
subgraph "Storage and Serving"
FLINK --> ES[Elasticsearch]
FLINK --> CH[ClickHouse]
KSQL --> REDIS[Redis Cache]
end
subgraph "Visualization"
ES --> GRAFANA[Grafana]
CH --> SUPERSET[Apache Superset]
end
Why the Kafka + NATS combination?
- Kafka: High-throughput event collection and Flink/ksqlDB integration
- NATS: Meets ultra-low latency requirements for IoT sensors, lightweight agents
- Bridge: Consolidates NATS-collected data into Kafka
8.3 IoT Sensor Data Collection
graph TB
subgraph "Edge Layer"
S1[Temperature Sensor] --> LN1[NATS Leaf Node - Factory A]
S2[Humidity Sensor] --> LN1
S3[Vibration Sensor] --> LN2[NATS Leaf Node - Factory B]
end
subgraph "Core Layer"
LN1 --> NATS[NATS Cluster - JetStream]
LN2 --> NATS
NATS --> PROC[Stream Processor]
PROC -->|Anomaly detection| ALERT[Alert Service - RabbitMQ]
PROC -->|Aggregated data| TS[TimescaleDB]
end
ALERT -->|Email| EMAIL[Email Service]
ALERT -->|Slack| SLACK[Slack Notification]
Why the NATS + RabbitMQ combination?
- NATS: Extends to the edge with Leaf Nodes, sub-ms latency
- JetStream: Sensor data persistence and replay
- RabbitMQ: Flexible alert routing (email, Slack, SMS distribution)
8.4 Saga Pattern — Distributed Transactions
sequenceDiagram
participant OS as Order Service
participant K as Kafka
participant PS as Payment Service
participant IS as Inventory Service
participant SS as Shipping Service
OS->>K: OrderCreated
K->>PS: Payment request
PS->>K: PaymentCompleted
K->>IS: Deduct inventory
IS->>K: StockReserved
K->>SS: Create shipment
SS->>K: ShipmentCreated
K->>OS: Order complete
Note over PS,IS: Compensating transactions on failure
PS-->>K: PaymentFailed (compensate)
K-->>IS: Restore inventory (compensate)
K-->>OS: Cancel order (compensate)
Choreography vs Orchestration:
| Approach | Pros | Cons | Recommended Scenario |
|---|---|---|---|
| Choreography | Low coupling, easy to scale | Hard to trace flow | Simple workflows |
| Orchestration | Central control, clear flow | Orchestrator as single point of failure | Complex business logic |
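A choreography-style compensation flow can be sketched as a stack of undo actions (event and action names here are illustrative, not a framework API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal choreography-style saga sketch: each completed step pushes a
// compensating action; on failure, compensations run in reverse order.
public class SagaSketch {
    public static List<String> run(boolean paymentSucceeds) {
        Deque<String> compensations = new ArrayDeque<>();
        List<String> events = new ArrayList<>();

        events.add("OrderCreated");
        compensations.push("CancelOrder");

        if (paymentSucceeds) {
            events.add("PaymentCompleted");
            compensations.push("RefundPayment");
            events.add("StockReserved");
        } else {
            events.add("PaymentFailed");
            // Unwind completed steps, most recent first
            while (!compensations.isEmpty()) {
                events.add(compensations.pop());
            }
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [OrderCreated, PaymentFailed, CancelOrder]
    }
}
```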
9. Operations Checklist
Common Monitoring Metrics
| Metric | Description | Warning Threshold |
|---|---|---|
| Consumer Lag | Number of unconsumed messages | Sustained above 10K |
| Message Failure Rate | DLQ ingestion rate | Above 1% |
| Broker Disk Usage | Storage capacity | Above 80% |
| Network I/O | In/Out traffic | Above 70% of bandwidth |
| JVM GC Time | Kafka/Pulsar/RabbitMQ | Full GC over 5 seconds |
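Consumer lag, the first metric in the table, is conventionally computed as log-end offset minus committed offset, summed over partitions:

```java
import java.util.Map;

// Consumer lag as typically computed: log-end offset minus committed offset,
// summed over partitions. Offset values here are illustrative.
public class ConsumerLag {
    public static long totalLag(Map<Integer, Long> logEndOffsets,
                                Map<Integer, Long> committedOffsets) {
        long lag = 0;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            lag += e.getValue() - committedOffsets.getOrDefault(e.getKey(), 0L);
        }
        return lag;
    }

    public static void main(String[] args) {
        // Partition 0: 1000 written, 900 consumed; partition 1: 2000 written, 1500 consumed
        long lag = totalLag(Map.of(0, 1_000L, 1, 2_000L), Map.of(0, 900L, 1, 1_500L));
        System.out.println(lag); // -> 600
    }
}
```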
Kafka Production Checklist
1. Broker Configuration
- min.insync.replicas = 2 (with replication-factor 3)
- unclean.leader.election.enable = false
- auto.create.topics.enable = false
2. Producer Configuration
- acks = all (prevent data loss)
- retries = Integer.MAX_VALUE
- enable.idempotence = true
- max.in.flight.requests.per.connection = 5
3. Consumer Configuration
- enable.auto.commit = false (manual commits)
- max.poll.records = 500
- session.timeout.ms = 30000
4. Monitoring
- Prometheus + Grafana (JMX Exporter)
- Consumer Lag alerts
- Under-Replicated Partitions alerts
RabbitMQ Production Checklist
1. Cluster Configuration
- Use Quorum Queues (replaces Classic Mirrored Queues)
- vm_memory_high_watermark = 0.4
- disk_free_limit = 2GB
2. Connection Management
- Use Connection Pools
- Heartbeat timeout: 60 seconds
- Reuse channels (channel creation is expensive)
3. Message Configuration
- Persistent delivery mode
- Enable Publisher Confirms
- Configure Dead Letter Exchange
4. Monitoring
- Management Plugin Dashboard
- rabbitmq_prometheus plugin
- Queue depth alerts
10. Quiz
Q1. The Biggest Change in Kafka 4.0
What core component was removed in Kafka 4.0, and what technology replaces it?
Answer: ZooKeeper was completely removed and replaced by KRaft (Kafka Raft).
- KRaft is a Raft consensus algorithm-based metadata management system
- Metadata propagation speed improved by 10x, partition count scalable beyond 1 million
- No need to manage a separate ZooKeeper cluster, greatly reducing operational complexity
Q2. Pull vs Push Model
What are the differences, pros, and cons between pull-based consumption (Kafka, SQS) and push-based consumption (RabbitMQ)?
Pull-Based (Kafka, SQS):
- Consumers actively fetch messages
- Pros: Consumers can fetch at their own processing speed (natural backpressure)
- Cons: Relies on polling (long polling reduces empty receives), so delivery latency is slightly higher
Push-Based (RabbitMQ):
- Broker pushes messages to consumers
- Pros: Immediate delivery on message arrival (low latency)
- Cons: Requires separate backpressure mechanisms when consumers are overloaded (prefetch count, etc.)
Q3. Exactly-Once Semantics
What 3 configurations are needed to achieve Exactly-Once Semantics (EOS) in Kafka?
Answer:
- enable.idempotence = true: Even if a producer sends duplicate messages, only one is stored
- transactional.id setting: A unique ID that identifies producer transactions
- isolation.level = read_committed: Consumers only read committed transaction messages
Combining all three guarantees end-to-end Exactly-Once from producer to consumer.
Q4. SQS Standard vs FIFO
What are the differences between Amazon SQS Standard and FIFO queues in terms of throughput, ordering, and deduplication?
| Feature | Standard | FIFO |
|---|---|---|
| Throughput | Nearly unlimited | 300 TPS (3,000 with batching) |
| Ordering | Best-effort (no guarantee) | Strict FIFO ordering |
| Delivery | At-Least-Once (duplicates possible) | Exactly-Once processing |
| Deduplication | None | 5-minute dedup window (content-based or explicit ID) |
| Price | 0.40 USD / 1M requests | 0.50 USD / 1M requests |
| Use Case | High throughput, order-independent | Payments, orders where sequence matters |
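The FIFO dedup window behaves like a time-bounded seen-set (a toy model; SQS handles this server-side):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of SQS FIFO deduplication: a message whose deduplication ID was
// seen within the last 5 minutes is dropped as a duplicate.
public class DedupWindow {
    private static final long WINDOW_MS = 5 * 60 * 1000;
    private final Map<String, Long> seen = new HashMap<>();

    /** Returns true if the message is accepted, false if it is a duplicate. */
    public boolean accept(String dedupId, long nowMs) {
        Long last = seen.get(dedupId);
        if (last != null && nowMs - last < WINDOW_MS) return false;
        seen.put(dedupId, nowMs);
        return true;
    }

    public static void main(String[] args) {
        DedupWindow q = new DedupWindow();
        System.out.println(q.accept("msg-1", 0));       // true: first delivery
        System.out.println(q.accept("msg-1", 60_000));  // false: inside 5-min window
        System.out.println(q.accept("msg-1", 400_000)); // true: window expired
    }
}
```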
Q5. Pulsar vs Kafka Architecture
What are 2 operational advantages of Pulsar's compute/storage separation architecture compared to Kafka?
Answer:
- Independent Scaling: Brokers (compute) and BookKeeper (storage) can be scaled separately. In Kafka, brokers store data directly, so scaling storage requires adding entire brokers. In Pulsar, you just add BookKeeper nodes.
- Expansion Without Rebalancing: When adding a broker in Kafka, partition rebalancing is needed, causing massive data movement. In Pulsar, new brokers can immediately take topic ownership, minimizing service impact during scaling.
References
- Apache Kafka 4.0 Release Notes — KRaft GA, Share Groups
- KIP-932: Queues for Kafka — Share Group specification
- RabbitMQ 4.0 Release Blog — AMQP 1.0, Khepri
- RabbitMQ Streams — Stream Plugin and filtering
- Amazon SQS Developer Guide — Standard vs FIFO
- Apache Pulsar 4.1 Release — 19 PIPs
- NATS 2.11 Release Notes — Message TTL, distributed tracing
- Confluent Benchmark: Kafka vs Pulsar — Performance comparison
- Spring for Apache Kafka — Official Spring Kafka docs
- Spring AMQP — Spring RabbitMQ integration
- Spring Cloud AWS SQS — SQS integration
- Spring for Apache Pulsar — Official Spring Pulsar docs
- Designing Data-Intensive Applications (Martin Kleppmann) — Message system theory
- Enterprise Integration Patterns — Messaging pattern bible
- Confluent Cloud Pricing — Cloud cost reference
- AWS MSK Pricing — MSK cost reference