Distributed Messaging 2026 — Kafka 3.9 / NATS / Redpanda / Pulsar / RabbitMQ 4 / WarpStream Deep-Dive

Prologue — "Should we use a message queue?" is now a weird question

A design meeting at some team in 2026.

Junior: "We send events through Kafka, right?" Senior: "What's the traffic?" Junior: "About 100 per second." Senior: "...why Kafka?"

That short exchange contains everything about 2026. On one side, a near-religious belief that "messaging equals Kafka". On the other, a veteran sigh of "please stop using Kafka for things that don't need Kafka". And in between sit NATS, Redpanda, Pulsar, RabbitMQ 4, RocketMQ, WarpStream, and Buf Stream.

As of 2026, distributed messaging splits into three paradigms: classic message queues (RabbitMQ), log-based streaming (Kafka family), and cloud-native pub/sub (NATS, Pulsar). On top of that, two operational axes: disk-based vs object-storage-based (WarpStream, Kafka tiered storage), self-hosted vs managed.

This post draws the whole map: from Kafka 3.9's KRaft transition, through WarpStream's Kafka on S3, NATS's single-infra ambition, Pulsar Functions, and RabbitMQ 4's Khepri move, all the way to what Korean and Japanese big tech actually run.


1 · The 2026 Map — Pub/Sub / Queue / Stream Paradigms

Terminology first. Messaging systems group into three kinds.

| Paradigm | Meaning | Examples | Retention |
| --- | --- | --- | --- |
| Queue (task queue) | 1:1 work distribution, consumer acks then deletes | RabbitMQ, SQS, Celery broker | Deleted on consume |
| Pub/Sub (broadcast) | 1:N, many subscribers per topic | NATS Core, Redis Pub/Sub, MQTT | None (or brief) |
| Stream (log) | Ordered log, multiple consumer groups | Kafka, Pulsar, Kinesis, JetStream | Time/size-based |

In 2026 the lines blur. Kafka mimics queues (static membership, share groups), RabbitMQ added streams (Streams plugin), NATS got persistent streams via JetStream, and Pulsar was both from day one.

Still, start with the paradigm. A decision tree:

  1. "One task processed by one consumer, then done" → Queue (RabbitMQ, SQS).
  2. "Same event consumed by multiple systems" → Stream (Kafka, Pulsar) or Pub/Sub (NATS).
  3. "Store events for days to years and replay" → Stream with long retention (Kafka + tiered storage, Pulsar tiered).
  4. "Tens to hundreds of events per second, simple" → RabbitMQ or NATS Core.
  5. "Multi-tenant, multi-region, tight P99" → NATS or Pulsar.
  6. "AWS only, want S3 costs to dominate" → WarpStream.

When Kafka is your hammer, every problem looks like a nail. The 2026 veteran says, "100 per second? RabbitMQ is plenty, and ops is 10x simpler."
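The decision tree above can be written down as a small function. This is only a sketch: the field names (`eventsPerSecond`, `consumersPerEvent`, `needsReplay`, and so on) and the order in which the rules are checked are assumptions made for illustration, and the 1,000/s cutoff is an arbitrary stand-in for "tens to hundreds per second".

```javascript
// Hypothetical workload descriptor; every field name here is illustrative.
function pickParadigm(w) {
  // Most specific constraints first (this ordering is an assumption).
  if (w.awsOnly && w.storageCostDominant) return 'WarpStream';
  if (w.multiTenant || w.multiRegion) return 'NATS or Pulsar';
  if (w.needsReplay) return 'Stream with long retention (Kafka + tiered storage, Pulsar tiered)';
  if (w.consumersPerEvent > 1) return 'Stream (Kafka, Pulsar) or Pub/Sub (NATS)';
  if (w.eventsPerSecond < 1000) return 'RabbitMQ or NATS Core';
  return 'Queue (RabbitMQ, SQS)';
}

// The junior's workload from the prologue: about 100 events per second, one consumer.
console.log(pickParadigm({ eventsPerSecond: 100, consumersPerEvent: 1 }));
// prints "RabbitMQ or NATS Core"
```

A real decision also weighs team experience and existing infrastructure, which a function like this cannot capture.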


2 · Kafka 3.9 (KRaft Default, ZooKeeper Deprecated) → 4.0 Outlook

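KRaft was declared production-ready for new clusters in Kafka 3.3 (2022), and ZooKeeper mode was deprecated in Kafka 3.5 (2023). Kafka 3.9 (late 2024) is the last release line that still supports ZooKeeper and serves as the bridge release for migration, and Kafka 4.0 (2025) removed the ZooKeeper code entirely.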

The operational impact is concrete.

| Item | ZK era | KRaft era |
| --- | --- | --- |
| Cluster coordination | External ZooKeeper ensemble | Internal Raft (KRaft) |
| Bootstrap | broker + 3 to 5 ZK | broker (controller combined or split) |
| Metadata store | ZK znodes | Internal Kafka topic (__cluster_metadata) |
| Metadata scale | ~200K partitions | Millions of partitions |
| Startup time | Tens of seconds to minutes | Seconds |
| Operational complexity | Two systems | One system |

The simplest KRaft bring-up.

# Kafka 3.9, single-node KRaft (for dev)
export KAFKA_HOME=/opt/kafka_2.13-3.9.0
# Generate a cluster ID and keep it for the format step
CLUSTER_ID=$($KAFKA_HOME/bin/kafka-storage.sh random-uuid)
echo "$CLUSTER_ID"
# Output: yzs8XK6oR1qV3R-... (CLUSTER_ID)

$KAFKA_HOME/bin/kafka-storage.sh format \
  -t "$CLUSTER_ID" \
  -c $KAFKA_HOME/config/kraft/server.properties

$KAFKA_HOME/bin/kafka-server-start.sh \
  $KAFKA_HOME/config/kraft/server.properties

For production, separate controllers from brokers (process.roles=controller vs process.roles=broker). The standard layout is 3 controllers plus N brokers.

The key KIPs as of 2026.

  • KIP-405 — Tiered Storage: auto-offload cold segments to S3/GCS/HDFS. Hot on local SSD, cold on object storage. Storage cost down 5x to 10x.
  • KIP-848 — Next Gen Consumer Rebalance: server-side rebalance, shorter stop-the-world.
  • KIP-932 — Queues for Kafka (Share Groups): the 4.0 headline. Native queue semantics on top of Kafka (multiple consumers competing on the same topic, ack/timeout/redelivery). This invades RabbitMQ's territory.
  • KIP-890 — Transactions v2 (Transactions Server-Side Defense): more reliable transaction coordinator.

KRaft + tiered storage + share groups together aim at "stream, queue, and long-term storage in one system" for Kafka 4.0.
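To make the "ack/timeout/redelivery" semantics of share groups concrete, here is a toy in-memory model of competing consumers. It is not the Kafka client API: the class `ToyShareQueue`, its methods, and the 30-second lock are all hypothetical, and time is passed in by hand so the behaviour is deterministic.

```javascript
// Toy model of queue semantics on a log (NOT the Kafka API; all names are illustrative).
class ToyShareQueue {
  constructor(lockMs) {
    this.lockMs = lockMs;
    this.records = []; // each: { value, state, lockedUntil, deliveries }
  }
  publish(value) {
    this.records.push({ value, state: 'available', lockedUntil: 0, deliveries: 0 });
  }
  // Any consumer may take the next available record.
  poll(now) {
    // Release records whose acquisition lock timed out without an ack.
    for (const r of this.records) {
      if (r.state === 'acquired' && now >= r.lockedUntil) r.state = 'available';
    }
    const r = this.records.find((x) => x.state === 'available');
    if (!r) return null;
    r.state = 'acquired';
    r.lockedUntil = now + this.lockMs;
    r.deliveries += 1;
    return r;
  }
  ack(record) {
    record.state = 'acked';
  }
}

const q = new ToyShareQueue(30000);
q.publish('job-1');
q.publish('job-2');
const a = q.poll(0);          // consumer A takes job-1
const b = q.poll(0);          // consumer B takes job-2 from the same topic
q.ack(b);                     // B finishes; A crashes without acking
const retry = q.poll(30000);  // the lock on job-1 has expired, so it is redelivered
console.log(retry.value, retry.deliveries); // prints "job-1 2"
```

The point of the sketch is the contrast with classic consumer groups: records are handed out individually rather than by partition ownership, so two consumers can work on the same partition at once.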


3 · Confluent Cloud vs Aiven vs Redpanda — Managed Kafka

Fewer teams self-operate Kafka every year. Managed options matured.

| Provider | Compatibility | Pricing model | Notes |
| --- | --- | --- | --- |
| Confluent Cloud | 100% (their OSS) | Throughput + storage (Basic/Standard/Dedicated) | Schema Registry, Connect, ksqlDB, Stream Designer bundled |
| Aiven for Kafka | 100% (Apache Kafka) | Instance hours + storage | Multi-cloud, Karapace (OSS schema registry) |
| AWS MSK | 100% (Apache Kafka) | Instance hours + storage | KRaft support, MSK Serverless separately |
| Azure Event Hubs | Kafka API (partial) | Throughput Unit + Capture | Azure integration |
| Redpanda Cloud | Kafka wire compatible | Cluster hours + storage | Own engine (C++), tiered storage |
| WarpStream | Kafka wire compatible | Throughput only (storage in customer S3) | Kafka on S3, BYOC by default |

Practical guidance for 2026.

  • Full-stack integration + enterprise support: Confluent Cloud.
  • Pure OSS Kafka, multi-cloud: Aiven.
  • AWS lock-in OK, want simplicity from MSK Serverless: MSK.
  • Operational simplicity, half the nodes: Redpanda Cloud.
  • Minimize storage cost and cross-AZ egress: WarpStream.

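The dominant cost variable is inter-AZ transfer. A typical production cluster runs replication factor 3 with replicas spread across three AZs, so every produced byte crosses an AZ boundary twice for replication, at roughly USD 0.01-0.02 per GB. At high throughput that bill can exceed the cost of the brokers themselves. WarpStream attacks the replication side by writing to S3 instead, and KIP-392 (fetch from follower) attacks the consumer side by letting consumers read from a replica in their own AZ.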
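A back-of-the-envelope calculation shows how quickly the inter-AZ bill grows. The inputs below are assumptions, not measurements: a hypothetical 100 MB/s produce rate, two cross-AZ replica copies (replication factor 3 over three AZs), the upper end of the USD 0.01-0.02 per GB range, and a 30-day month. Consumer fetch traffic is ignored.

```javascript
// Rough monthly cross-AZ replication cost; every input is an assumed example value.
function crossAzReplicationCostUsd({ mbPerSec, crossAzCopies, usdPerGb, days }) {
  const gbWritten = (mbPerSec * 86400 * days) / 1000; // decimal GB produced in the period
  return gbWritten * crossAzCopies * usdPerGb;
}

const monthly = crossAzReplicationCostUsd({
  mbPerSec: 100,     // hypothetical sustained produce rate
  crossAzCopies: 2,  // two followers, each in a different AZ from the leader
  usdPerGb: 0.02,    // upper end of the range quoted above
  days: 30,
});
console.log(Math.round(monthly)); // 10368 (USD) under these assumptions
```

Halving the rate or the per-GB price halves the result, so the number is only as good as the inputs you plug in.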


4 · Redpanda — Kafka Compatible in C++

Redpanda started in 2019 from Vectorized (now Redpanda Data). One-liner: Kafka rewritten in C++, no JVM. They borrowed ScyllaDB's Seastar thread-per-core model.

Key differences.

  • No JVM, no GC pause: stable P99 latency.
  • No ZooKeeper from day 1: their own Raft, well before Kafka.
  • Single binary: broker equals controller, 1-3 nodes is enough.
  • Kafka wire compatible: clients, connectors, UIs all reuse.
  • Built-in tiered storage: cold data to S3/GCS.
  • WASM transforms: Redpanda Data Transforms run WASM inside the broker.

Performance claims read "2-5x throughput on the same hardware, 1/3 the nodes". Benchmarks always depend on workload, but fewer nodes mean less operations burden, and that matches field experience.

Single-node Docker bring-up.

docker run -d --name redpanda \
  -p 9092:9092 -p 9644:9644 \
  redpandadata/redpanda:v24.3.1 \
  redpanda start --smp 1 --memory 1G --reserve-memory 0M \
  --overprovisioned --node-id 0 --check=false

Use any Kafka client directly.

docker exec -it redpanda rpk topic create test
docker exec -it redpanda rpk topic produce test
> hello
> ^D
docker exec -it redpanda rpk topic consume test --num 1

Redpanda's 2026 positioning: "teams that hate running JVM/ZK but need Kafka compatibility". Adoption rose in gaming, finance, and IoT where latency consistency matters.


5 · WarpStream (Confluent Acquisition, 2024) — Kafka on S3

WarpStream emerged in 2023 and Confluent acquired it in September 2024. The one-line idea: Kafka wire protocol with S3 as the storage tier.

Classic Kafka cost structure.

  1. Broker instance cost (EC2/EBS).
  2. Cross-AZ replication traffic (USD 0.01/GB).
  3. EBS storage cost.
  4. Cross-AZ fetch when consumers live elsewhere.

WarpStream architecture.

  1. Stateless Agents speak the Kafka wire protocol and write directly to S3.
  2. Metadata lives in the WarpStream control plane (or your own controller in BYOC mode).
  3. No disks means no cross-AZ replication traffic.
  4. All data goes to S3, so 11-nines durability is the cloud's problem.
  5. Compute (Agents) and storage (S3) decouple completely and scale independently.

The trade-off is latency. Classic Kafka delivers 5-50ms P99 produce; WarpStream sits at 200-500ms because of S3 PUT latency. So it doesn't fit "low-latency real-time", but it shines on "high-throughput, minimum cost, second-scale latency OK" workloads. Data lake ingestion, log and metric pipelines, and CDC sinks are the canonical cases.
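Where the extra latency comes from is easier to see in a toy model: if records are buffered and written to object storage in batches, a record waits for the batch to fill or for a timer to fire, and only then pays for the PUT. The sketch below is not WarpStream's implementation; `ToyAgentBuffer`, its parameters, and the injected `putObject` callback are hypothetical, and string length stands in for byte size.

```javascript
// Toy model of batching writes to object storage (illustrative only).
class ToyAgentBuffer {
  constructor({ maxBytes, maxWaitMs, putObject }) {
    this.maxBytes = maxBytes;
    this.maxWaitMs = maxWaitMs;
    this.putObject = putObject; // called once per flushed batch
    this.pending = [];
    this.bytes = 0;
    this.openedAt = null;       // time the current batch received its first record
  }
  append(record, now) {
    if (this.openedAt === null) this.openedAt = now;
    this.pending.push(record);
    this.bytes += record.length;
    if (this.bytes >= this.maxBytes) this.flush(); // size-based flush
  }
  tick(now) {
    // Time-based flush: bounds how long a record can sit in the buffer.
    if (this.openedAt !== null && now - this.openedAt >= this.maxWaitMs) this.flush();
  }
  flush() {
    if (this.pending.length === 0) return;
    this.putObject(this.pending);
    this.pending = [];
    this.bytes = 0;
    this.openedAt = null;
  }
}

const puts = [];
const agent = new ToyAgentBuffer({ maxBytes: 10, maxWaitMs: 250, putObject: (batch) => puts.push(batch) });
agent.append('aaaa', 0);
agent.append('bbbb', 10);
agent.tick(100);                   // too early, nothing is written
agent.tick(250);                   // timer fires: one PUT carrying two records
agent.append('cccccccccccc', 300); // 12 bytes >= 10: flushed immediately
console.log(puts.map((b) => b.length)); // prints [ 2, 1 ]
```

In this model the worst-case produce latency is roughly the wait interval plus the PUT time, which is why larger batches trade latency for fewer, cheaper object-store requests.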

After the acquisition, WarpStream became Confluent's bring-your-own-cloud (BYOC) offering. Confluent Cloud's Freight clusters target the same high-throughput, relaxed-latency workloads with a similar object-storage-backed design.


6 · NATS + JetStream + KV + ObjectStore

NATS started in 2011 as a lightweight pub/sub system. Synadia formed as a company around it in 2018, and NATS was donated to the CNCF the same year. The philosophy: simplicity and speed.

NATS evolution.

| Period | Features |
| --- | --- |
| 2011 | Core NATS — fire-and-forget pub/sub |
| 2018 | NATS Streaming (deprecated later) |
| 2020 | JetStream — persistent stream, KV, ObjectStore |
| 2024 | NATS 2.10 — domains, leaf node security |
| 2025 | NATS 2.11 — pull consumer improvements, ADR integration |
Start a single NATS node with JetStream enabled (clustering is simple by design).

# Enable JetStream
nats-server -js -sd /data/nats

Create a stream.

nats stream add ORDERS \
  --subjects "orders.>" \
  --storage file --retention limits \
  --max-msgs=1000000 --max-age=24h
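The stream above captures every subject under `orders.>`. NATS subjects are dot-separated tokens, where `*` matches exactly one token and `>` matches one or more trailing tokens. A minimal matcher, written here only to illustrate those rules (the function name `subjectMatches` is made up, and it assumes well-formed subjects with no empty tokens):

```javascript
// Illustrative NATS-style subject matching: '*' = one token, '>' = one or more trailing tokens.
function subjectMatches(pattern, subject) {
  const p = pattern.split('.');
  const s = subject.split('.');
  for (let i = 0; i < p.length; i++) {
    if (p[i] === '>') {
      // '>' is only valid as the last token and needs at least one token to match.
      return i === p.length - 1 && s.length > i;
    }
    if (i >= s.length) return false;
    if (p[i] !== '*' && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

console.log(subjectMatches('orders.>', 'orders.eu.created')); // true
console.log(subjectMatches('orders.>', 'orders'));            // false: '>' needs at least one token
console.log(subjectMatches('orders.*', 'orders.eu.created')); // false: '*' matches exactly one token
```

So a stream bound to `orders.>` stores `orders.eu.created` and `orders.kr.paid`, but not a message published to the bare subject `orders`.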

KV.

nats kv add config
nats kv put config feature.new-checkout '{"enabled":true}'
nats kv get config feature.new-checkout

ObjectStore (S3-like).

nats object add my-bucket
nats object put my-bucket large.zip ./large.zip
nats object get my-bucket large.zip > out.zip

NATS's strength is single infrastructure. Queue, pub/sub, stream, KV, object store, distributed locks, and leader election all live in one binary. For many microservices with multi-region or edge deployments, NATS is simpler than the sum of "one Kafka + one Redis + one Consul + one MinIO".

The weakness is ecosystem. Kafka's huge connector catalogue, Schema Registry, and stream processing (ksqlDB, Flink) are richer than NATS's.


7 · Apache Pulsar 4.0 + Pulsar Functions

Apache Pulsar came from Yahoo (open-sourced 2016) and became a top-level Apache project in 2018. The differentiator: compute and storage are separated. Brokers are stateless, and BookKeeper is the storage layer.

Pulsar 4.0 (late 2024) highlights.

  • Lakehouse Tiered Storage: cold data lands as Parquet in S3/GCS so Trino/Spark can query it.
  • Pulsar Functions stabilized: Lambda-like functions on top of streams, plus function mesh for workflows.
  • Topic compaction improvements.
  • Protocol gateways: unified WebSocket, HTTP, and MQTT.

Structural advantages of Pulsar.

| Item | Kafka | Pulsar |
| --- | --- | --- |
| Broker state | Stateful (disk) | Stateless |
| Storage | Broker disk | BookKeeper bookies |
| Rebalance on node add | Data movement required | Near-instant |
| Multi-tenancy | Cluster-level | Tenant/namespace/topic in one cluster |
| Geo replication | MirrorMaker 2 | Built-in geo-replication |

Pulsar's weakness is operational complexity. Brokers plus bookies plus a metadata store (ZooKeeper is still the default as of 2026, although the metadata layer is pluggable) make three components to operate. Adoption in Korea and Japan is low, so hiring operators is harder.

Pulsar's 2026 strength shows in large multi-tenant SaaS. Splunk, Yahoo, Tencent, and StreamNative run hundreds of thousands of topics on a single cluster.


8 · RabbitMQ 4.0 — Khepri (Raft) + Streams Plugin

RabbitMQ has been the AMQP 0.9.1 reference implementation since 2007 and the "standard message queue" for a generation. RabbitMQ 4.0 (2024) was a major shift.

The big changes.

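  • Mnesia to Khepri: cluster metadata can move from Erlang's Mnesia (the classic distributed DB) to Khepri, a Raft-based store built by the RabbitMQ team. Khepri is fully supported in 4.0 but opt-in, with Mnesia still the default. Recovery from network partitions becomes far more predictable.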
  • Streams plugin GA: Kafka-like append-only logs inside RabbitMQ (Stream Queue Type). Throughput up to 1M msg/s.
  • Quorum Queues as the replicated default: classic mirrored queues were removed in 4.0, and Raft-based Quorum Queues are the recommended replicated queue type.
  • MQTT 5 and AMQP 1.0 hardened.

Where RabbitMQ still fits.

  • Job queues: Celery, Sidekiq, Resque, etc.
  • RPC patterns: request-reply with reply-to queues.
  • Fanout / topic routing: routing-key based branching.
  • Back-office work: email sending, PDF rendering, payment processing.
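Routing-key based branching on a topic exchange follows two wildcard rules: `*` matches exactly one dot-separated word and `#` matches zero or more words. The matcher below is a standalone illustration of those rules, not RabbitMQ's own code; `topicMatches` is a made-up name.

```javascript
// Illustrative AMQP topic-exchange matching: '*' = exactly one word, '#' = zero or more words.
function topicMatches(bindingKey, routingKey) {
  const b = bindingKey.split('.');
  const r = routingKey === '' ? [] : routingKey.split('.');
  function match(i, j) {
    if (i === b.length) return j === r.length;
    if (b[i] === '#') {
      // '#' may absorb any number of words, including none: try each split point.
      for (let k = j; k <= r.length; k++) {
        if (match(i + 1, k)) return true;
      }
      return false;
    }
    if (j === r.length) return false;
    return (b[i] === '*' || b[i] === r[j]) && match(i + 1, j + 1);
  }
  return match(0, 0);
}

console.log(topicMatches('order.*.paid', 'order.eu.paid')); // true
console.log(topicMatches('order.#', 'order'));              // true: '#' can match zero words
console.log(topicMatches('order.*', 'order.eu.paid'));      // false: '*' is exactly one word
```

A queue bound with `order.#` therefore receives every order event, while `order.*.paid` receives only paid events one level down.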

Declare a Quorum Queue.

# CLI
rabbitmqadmin declare queue name=tasks queue_type=quorum

Publish from a Java client.

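import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Run inside a method declared `throws Exception` (IOException, TimeoutException)
ConnectionFactory factory = new ConnectionFactory();
factory.setHost("rabbitmq");
try (Connection conn = factory.newConnection();
     Channel ch = conn.createChannel()) {
    // durable=true, exclusive=false, autoDelete=false: quorum queues must be durable
    ch.queueDeclare("tasks", true, false, false,
        Map.of("x-queue-type", "quorum"));
    ch.basicPublish("", "tasks",
        MessageProperties.PERSISTENT_TEXT_PLAIN,
        "hello".getBytes(StandardCharsets.UTF_8));
}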

Replacing RabbitMQ with Kafka usually ends in regret. Thousands of jobs per second, complex routing, short retention — RabbitMQ is simpler and cheaper.


9 · Apache RocketMQ (Alibaba) — China Scale

Alibaba built RocketMQ in 2012, and it became a top-level Apache project in 2017. It powers the Double 11 (11/11) infrastructure, with publicly reported volumes on the order of a trillion messages per day.

Features.

  • Order modes: per-message-group ordering instead of per-partition (FIFO Topic).
  • Transactional messages: half-commit / commit / rollback in three steps — atomicity between DB transaction and message publish.
  • Scheduled and delayed messages built in.
  • Strong dashboard and tracing.
  • RocketMQ 5.0+ (2023+): compute-storage separation, cloud-native.

Almost unused in Korea, Japan, and the West, but extremely common in China and Southeast Asia as a Kafka alternative. Alibaba Cloud's message service is RocketMQ in disguise.

A transactional message in Java.

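import org.apache.rocketmq.client.producer.LocalTransactionState;
import org.apache.rocketmq.client.producer.TransactionListener;
import org.apache.rocketmq.client.producer.TransactionMQProducer;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageExt;

TransactionMQProducer producer = new TransactionMQProducer("group");
producer.setTransactionListener(new TransactionListener() {
    @Override
    public LocalTransactionState executeLocalTransaction(Message msg, Object arg) {
        // 1) run DB transaction
        // 2) success -> COMMIT, fail -> ROLLBACK, unsure -> UNKNOWN
        return LocalTransactionState.COMMIT_MESSAGE;
    }
    @Override
    public LocalTransactionState checkLocalTransaction(MessageExt msg) {
        // broker re-checks unresolved messages; look up the real DB state here
        return LocalTransactionState.COMMIT_MESSAGE;
    }
});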
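The half-commit / commit / rollback flow, including the broker's check-back for unresolved messages, can be simulated in a few lines. This is a toy model rather than RocketMQ code: `ToyTxBroker`, `placeOrder`, and the in-memory `db` set are hypothetical stand-ins for the broker, the producer, and the local database.

```javascript
// Toy model of transactional (half) messages; all names are illustrative.
class ToyTxBroker {
  constructor() {
    this.half = new Map(); // half messages: stored but not yet consumable
    this.visible = [];     // committed messages that consumers can read
  }
  sendHalf(id, body) {
    this.half.set(id, body);
  }
  resolve(id, state) {
    if (!this.half.has(id)) return;
    if (state === 'COMMIT') this.visible.push(this.half.get(id));
    if (state !== 'UNKNOWN') this.half.delete(id); // UNKNOWN stays pending for a later check
  }
  // Check-back: ask the producer about every half message still unresolved.
  checkBack(checkLocalTransaction) {
    for (const id of [...this.half.keys()]) {
      this.resolve(id, checkLocalTransaction(id));
    }
  }
}

const db = new Set(); // stands in for the local database
const broker = new ToyTxBroker();

function placeOrder(id, { crashBeforeReply }) {
  broker.sendHalf(id, `OrderCreated:${id}`); // step 1: half message
  db.add(id);                                // step 2: local transaction commits
  if (crashBeforeReply) return;              // producer dies before telling the broker
  broker.resolve(id, 'COMMIT');              // step 3: commit makes the message visible
}

placeOrder('o-1', { crashBeforeReply: false });
placeOrder('o-2', { crashBeforeReply: true });
console.log(broker.visible.length); // 1: o-2 is still a half message
broker.checkBack((id) => (db.has(id) ? 'COMMIT' : 'ROLLBACK'));
console.log(broker.visible.length); // 2: the check-back resolved o-2 from the DB state
```

The check-back callback is the important part: it must answer from durable local state, which is why returning a hard-coded commit is only acceptable in a demo.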

Few Korean teams pick RocketMQ for greenfield, but if you deploy on Alibaba Cloud or Tencent Cloud, managed RocketMQ is the natural choice.


10 · NSQ / Memphis (Superstream)

NSQ

NSQ is a distributed messaging system bitly open-sourced in 2013. Written in Go. The philosophy: no central broker, simple topology.

  • A single binary (nsqd) deploys per host so messages publish locally.
  • nsqlookupd handles discovery.
  • No clustering (each node independent).
  • Memory-first with disk overflow.

Simplicity is NSQ's weapon. It has a reputation as something big companies do not use, but it still works well for small, self-contained workloads such as IRC bots, domain indexing, and simple job queues. For a 2026 greenfield project, however, NATS does almost everything better.

Memphis to Superstream

Memphis launched in 2022 as a "developer-friendly Kafka" with GUI, schema management, and DLQ built in. In 2024 the company rebranded to Superstream and pivoted to a "cost optimization layer on top of Kafka".

Superstream's 2026 positioning is interesting.

  • It does not replace Kafka; it tunes configs with AI, compresses, and balances partitions.
  • It runs on top of Confluent Cloud / MSK / Aiven and promises 30-50% cost savings.
  • Common pattern: re-recommend misconfigured batch.size, linger.ms, and compression.type based on the actual workload.

The pitch is "keep Kafka, cut the bill" and teams paying eye-watering managed Kafka invoices are biting.


11 · Buf Stream — A New Take on gRPC Streams

Buf is famous in Protocol Buffers tooling. In 2024 they announced Buf Stream and stepped into messaging.

Core ideas.

  • Schema-first messaging: every topic has a Protobuf schema, and malformed messages are rejected at publish time.
  • Kafka API compatible: existing Kafka clients keep working.
  • Integrates with Buf Schema Registry to enforce schema evolution rules.
  • Stateless architecture: like WarpStream, S3 or object storage is the primary store.

Buf Stream's pitch.

  • "Stop bad data from entering topics at compile and deploy time."
  • "Buf CLI lints schema changes, checks compatibility, and blocks breaking changes."
  • "Keep your Kafka clients — replace only the infra."

In 2026 Buf Stream sees limited but real production use. Organizations where schema governance is the core value (finance, healthcare, B2B SaaS) are paying attention. The pitch is collapsing the "two systems" (Confluent Schema Registry + Kafka) into one.


12 · Event Sourcing / CQRS / Outbox / Saga Patterns

The system is only a tool; how you use it matters more. Four patterns that still hold up in 2026.

1) Event Sourcing

Store the sequence of events that produced the state instead of the state itself. Current state is the fold of events.

// Order state = fold of events
function orderState(events) {
  return events.reduce((state, e) => {
    switch (e.type) {
      case 'OrderCreated': return { id: e.id, status: 'PENDING', items: e.items };
      case 'OrderPaid':    return { ...state, status: 'PAID', paidAt: e.at };
      case 'OrderShipped': return { ...state, status: 'SHIPPED', tracking: e.tracking };
      default: return state;
    }
  }, null);
}

  • Pros: audit log, time travel, rebuilding new views.
  • Cons: queries are hard, so pair with CQRS.

2) CQRS (Command Query Responsibility Segregation)

Separate the write and read models. Read models are built asynchronously from the event stream.
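A minimal sketch of that split: the write side only appends events, and a projector catches up on the log to maintain a query-friendly view. The names (`eventLog`, `ordersByStatus`, `catchUp`) are illustrative, and the "asynchronous" part is reduced to calling the projector by hand.

```javascript
// Write side: append-only event log. Read side: order ids grouped by status.
const eventLog = [];
const ordersByStatus = new Map(); // read model: status -> Set of order ids
const statusOf = new Map();       // helper index: order id -> current status

function append(event) {
  eventLog.push(event);
}

function project(event) {
  const next = { OrderCreated: 'PENDING', OrderPaid: 'PAID', OrderShipped: 'SHIPPED' }[event.type];
  if (!next) return; // ignore event types this view does not care about
  const prev = statusOf.get(event.id);
  if (prev) ordersByStatus.get(prev).delete(event.id);
  if (!ordersByStatus.has(next)) ordersByStatus.set(next, new Set());
  ordersByStatus.get(next).add(event.id);
  statusOf.set(event.id, next);
}

let cursor = 0; // how far the projector has read
function catchUp() {
  while (cursor < eventLog.length) project(eventLog[cursor++]);
}

append({ type: 'OrderCreated', id: 'a' });
append({ type: 'OrderCreated', id: 'b' });
append({ type: 'OrderPaid', id: 'a' });
const sizeBefore = ordersByStatus.size; // 0: the read model lags until the projector runs
catchUp();
console.log(sizeBefore, [...ordersByStatus.get('PAID')], [...ordersByStatus.get('PENDING')]);
// prints 0 [ 'a' ] [ 'b' ]
```

Because the view is derived, it can be dropped and rebuilt from the log at any time, which is what makes adding new read models cheap.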

3) Transactional Outbox

The classic mistake is to treat "commit the DB transaction, then publish the event" as two separate steps: a failure between them leaves the database and the stream inconsistent. The outbox pattern instead INSERTs a row into an outbox table inside the same DB transaction, and a separate process (Debezium CDC or a custom poller) reads the outbox and publishes to Kafka.

BEGIN;
  INSERT INTO orders (id, user_id, total) VALUES (...);
  INSERT INTO outbox (aggregate_id, event_type, payload)
         VALUES ('order-123', 'OrderCreated', '{"...":"..."}');
COMMIT;
-- another process polls/CDCs the outbox and publishes to Kafka
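The polling side of the outbox can be sketched as a loop over unpublished rows. Everything here is an in-memory stand-in: `outbox` plays the table, `topic` plays the Kafka topic, and `relayOnce` and the injected `publish` function are hypothetical names.

```javascript
const outbox = []; // stands in for the outbox table
const topic = [];  // stands in for the Kafka topic
let nextId = 1;

function insertOutboxRow(eventType, payload) {
  outbox.push({ id: nextId++, eventType, payload, publishedAt: null });
}

// One polling pass. Rows that fail to publish stay pending and are retried next pass.
function relayOnce(publish, now) {
  for (const row of outbox) {
    if (row.publishedAt !== null) continue;
    try {
      publish(row);
      row.publishedAt = now; // mark only after the publish succeeded
    } catch (err) {
      break; // stop at the first failure so rows are never published out of order
    }
  }
}

insertOutboxRow('OrderCreated', { orderId: 'order-123' });
insertOutboxRow('OrderPaid', { orderId: 'order-123' });

let brokerUp = false;
const publish = (row) => {
  if (!brokerUp) throw new Error('broker unavailable');
  topic.push(row.id);
};

relayOnce(publish, 1); // broker down: nothing published, nothing lost
brokerUp = true;
relayOnce(publish, 2); // both rows go out, in insertion order
console.log(topic);    // prints [ 1, 2 ]
```

If the process dies after publishing but before marking the row, the row is published again on the next pass, so delivery is at-least-once and consumers should deduplicate on the outbox id.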

4) Saga

The distributed version of a multi-service transaction, with a compensating action per step.

  • Pay, then deduct inventory, then book shipping. Failure rolls earlier steps back (refund, restore inventory).
  • Two implementations: orchestration (central coordinator like Temporal, AWS Step Functions) vs choreography (each service reacts to events).
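The orchestration variant reduces to a loop that remembers which steps finished and runs their compensations in reverse when a later step fails. This is a bare sketch under simplifying assumptions: `runSaga` is a made-up helper, compensations are assumed never to fail, and nothing is persisted, which a real orchestrator such as Temporal or Step Functions would handle for you.

```javascript
// Minimal orchestration-style saga: run steps in order, compensate in reverse on failure.
async function runSaga(steps, ctx) {
  const done = [];
  try {
    for (const step of steps) {
      await step.action(ctx);
      done.push(step);
    }
    return { ok: true };
  } catch (err) {
    // Undo only the steps that completed, newest first.
    for (const step of done.reverse()) await step.compensate(ctx);
    return { ok: false, error: err.message };
  }
}

const log = [];
const steps = [
  { action: async () => log.push('pay'), compensate: async () => log.push('refund') },
  { action: async () => log.push('deduct-inventory'), compensate: async () => log.push('restore-inventory') },
  {
    action: async () => { throw new Error('no carrier available'); },
    compensate: async () => log.push('cancel-shipping'),
  },
];

const sagaResult = runSaga(steps, {}).then((result) => {
  console.log(result.ok, log);
  // prints false [ 'pay', 'deduct-inventory', 'restore-inventory', 'refund' ]
  return result;
});
```

The failed shipping step is never compensated because it never completed; only payment and inventory are rolled back, in the opposite order to how they ran.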

Running distributed messaging in production without these four patterns is a ticking time bomb of data inconsistency.


13 · KR/JP Adoption — Kakao, Toss, LINE, Mercari, ZOZO, CyberAgent

Kakao

Kakao operates Kafka as an internal managed service. KakaoTalk message flows, search indexing, and log pipelines all sit on Kafka. KRaft migration started in 2025 in phases. A few teams use RabbitMQ for back-office workers.

Toss

Toss's data engineering is well known as Kafka-centric. Payment events, CDC, and real-time ML feature stores all run on Kafka. Some workloads are evaluating WarpStream and tiered storage lately. Multiple talks at the in-house SLASH conference share Kafka operations know-how.

LINE

LINE's messaging infrastructure is a hybrid of an in-house queue (LMQ) and Kafka. Traffic is global and message ordering matters, so they keep proprietary tech. Ads, logs, and metrics ride Kafka. Java is strong at LINE, so their ZK-era operational experience is a real asset.

Mercari

Mercari is GCP-centric and Pub/Sub plus Dataflow is the main stack, but Kafka also appears in payments and inventory. The Mercari engineering blog has plenty of Pub/Sub vs Kafka write-ups.

ZOZO

ZOZO adopted Kafka for search and recommendation pipelines. Fashion catalogue changes flow through Kafka to update the search index. Internal talks share their Schema Registry operations.

CyberAgent / AbemaTV

AbemaTV runs live-streaming metadata and ad bidding pipelines on Kafka. CyberAgent group companies are partially adopting NATS and Pulsar, and Japanese conference talks on them are rising.

The common pattern: in 2026 the trend is "the right tool per workload" rather than "Kafka for everything". Payments and CDC pick Kafka, microservice messaging picks NATS, job queues pick RabbitMQ, and data lake ingestion picks WarpStream or MSK tiered. Three to four messaging systems coexisting inside one company is the new normal.


14 · Decision Checklist — What Do We Pick?

One-page decision sheet to close.

| Situation | First choice | Alternatives |
| --- | --- | --- |
| Under 100/s, job queue | RabbitMQ | NATS JetStream, SQS |
| 10K-100K/s, multi-consumer, event log | Kafka (KRaft) | Redpanda, Pulsar |
| Multi-region, P99 of 1ms goal | NATS | Pulsar |
| AWS only, storage cost dominant | WarpStream | MSK + Tiered Storage |
| Multi-tenant SaaS, hundreds of thousands of topics | Pulsar | Kafka with KRaft + careful partition mgmt |
| 0-1 ops people | Managed (MSK Serverless, Confluent Cloud, NATS Synadia Cloud) | - |
| China market | RocketMQ / Aliyun MQ | Kafka |
| Schema-first, strong governance | Buf Stream | Kafka + Confluent Schema Registry |
| Simple pub/sub, tiny | NATS Core | Redis Pub/Sub |
| Already on Kafka, want savings | Superstream + WarpStream | Manual tuning + Tiered Storage |

Operational must-haves.

  • Monitoring: Prometheus with Kafka Exporter / Burrow (consumer lag), JMX metrics, Datadog or Grafana.
  • Schema management: Confluent Schema Registry / Buf / Karapace. Topics without schemas in production are six-month-later regrets.
  • Backup and DR: MirrorMaker 2 (Kafka), Geo-replication (Pulsar), JetStream mirrors (NATS).
  • AuthN/Z: SASL/OAuth and ACLs — anonymous publish must not work in production.
  • Consumer lag alerts: lag over 30 minutes pages on call. The most common incident class.
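Consumer lag is the gap between each partition's log end offset and the group's committed offset, and a time-based alert needs a rough conversion from messages to minutes. The sketch below assumes you already have both offset maps (for example from an exporter) and a recent consume rate. `consumerLag` and `shouldPage` are hypothetical helper names, and treating a missing committed offset as 0 is a simplification.

```javascript
// Total lag in messages across partitions.
function consumerLag(endOffsets, committedOffsets) {
  let total = 0;
  for (const [partition, end] of Object.entries(endOffsets)) {
    const committed = committedOffsets[partition] ?? 0; // assumption: no commit means offset 0
    total += Math.max(0, end - committed);
  }
  return total;
}

// Page when the backlog would take longer than the threshold to drain at the current rate.
function shouldPage(lagMessages, consumeRatePerSec, thresholdMinutes = 30) {
  if (lagMessages === 0) return false;
  if (consumeRatePerSec <= 0) return true; // backlog with a stalled consumer
  return lagMessages / consumeRatePerSec / 60 > thresholdMinutes;
}

const lag = consumerLag({ 0: 1000000, 1: 800000 }, { 0: 400000, 1: 800000 });
console.log(lag);                   // 600000
console.log(shouldPage(lag, 200));  // true: 600000 / 200 / 60 = 50 minutes
console.log(shouldPage(lag, 1000)); // false: 10 minutes
```

Drain time is only an estimate of how stale the consumer is; comparing the timestamp of the oldest unconsumed record against the clock is more direct when that timestamp is available.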

Epilogue — Messages Are Contracts, Not Infrastructure

The real lesson of distributed messaging in 2026 is not the tools but the contracts. Between systems: what events, in what schema, with what guarantees (at-least-once, exactly-once, ordering, latency).

Tools change. The person who ran ZooKeeper five years ago runs KRaft today and might run Kafka on S3 next year. But teams that design contracts well survive any tool migration, and teams that design them badly meet the same problems with a new logo.

Next post candidates:

  • Outbox pattern deep-dive: the safest event publishing with Debezium CDC and Kafka
  • Getting consumer lag to zero: KIP-848 / partition assignment / backpressure
  • NATS multi-region: true global distribution with leaf nodes

"Picking a message queue is designing a contract between two systems. The tool comes after."

— Distributed Messaging 2026, end.

