Microservices Data Synchronization with the Outbox Pattern and CDC: A Practical Debezium Guide
- Introduction: The Dual Write Problem
- 1. Outbox Pattern Architecture
- 2. Outbox Table Design
- 3. CDC Concepts and How It Works
- 4. Debezium Installation and Connector Configuration
- 5. Kafka Connect Pipeline
- 6. Event Ordering Guarantees and Idempotency
- 7. Polling vs. CDC: A Comparison
- 8. Debezium vs. Maxwell vs. Canal
- 9. Failure Scenarios and Recovery Procedures
- 10. Production Operations Checklist

Introduction: The Dual Write Problem
One of the most common challenges in microservices architecture is the need to write data to both a database and a message broker simultaneously. Consider the case where an order service must create an order and publish an event to the inventory service at the same time. A naive implementation looks something like this:
// Dangerous dual-write pattern
@Transactional
public Order createOrder(OrderRequest request) {
    Order order = orderRepository.save(new Order(request)); // Step 1: Save to DB
    // Step 2: Publish event to message broker
    kafkaTemplate.send("order-events", new OrderCreatedEvent(order));
    return order;
}
This code harbors a critical flaw. If the DB save succeeds but the Kafka publish fails, the order exists but no other service is aware of it. Conversely, if the Kafka publish succeeds but the DB transaction rolls back, an event is propagated for an order that does not exist. This is the Dual Write problem.
While distributed transactions (2PC) could theoretically solve this, Kafka cannot participate in traditional XA transactions, and 2PC itself introduces performance bottlenecks and reduces availability. The pattern that fundamentally solves this problem is the Outbox Pattern, and the technology that efficiently implements it is CDC (Change Data Capture).
1. Outbox Pattern Architecture
1.1 Core Idea
The core idea of the Outbox Pattern is straightforward: instead of publishing events directly to a message broker, write them to an Outbox table within the same transaction as the business data. This way, the database's ACID transaction guarantees the atomicity of both operations. A separate process then reads the Outbox table and forwards the events to the message broker.
[Service Code]
     │
     ├── Business data INSERT ──┐
     │                          │  Same DB transaction
     └── Outbox table INSERT ───┘
                 │
                 ▼
          [Outbox Relay]
         (CDC or Polling)
                 │
                 ▼
        [Message Broker]
        (Kafka, RabbitMQ)
                 │
                 ▼
      [Other Microservices]
The benefits of this approach are clear. Since the business data and the event are bound within a single transaction, either both are persisted or neither is. At-least-once delivery is guaranteed, and if consumers implement idempotency, effectively-once semantics can also be achieved.
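As a minimal sketch of the core idea (using SQLite in place of a production database; the table and column names are illustrative, not the full schema from section 2), the business row and the outbox row are written in one atomic transaction:

```python
import json
import sqlite3
import uuid

def create_order(conn: sqlite3.Connection, user_id: str, amount: float) -> str:
    """Insert the order and its outbox event in one atomic transaction."""
    order_id = str(uuid.uuid4())
    payload = json.dumps({"orderId": order_id, "userId": user_id,
                          "totalAmount": amount})
    with conn:  # BEGIN ... COMMIT; any failure rolls back BOTH inserts
        conn.execute(
            "INSERT INTO orders (id, user_id, total_amount) VALUES (?, ?, ?)",
            (order_id, user_id, amount),
        )
        conn.execute(
            "INSERT INTO outbox_events "
            "(id, aggregate_type, aggregate_id, event_type, payload) "
            "VALUES (?, 'Order', ?, 'OrderCreated', ?)",
            (str(uuid.uuid4()), order_id, payload),
        )
    return order_id
```

If either INSERT fails, neither row is persisted, which is exactly the atomicity guarantee the dual-write code in the introduction lacks.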
1.2 Two Implementation Approaches for the Outbox Pattern
There are two primary approaches for relaying changes from the Outbox table to the message broker:
- Polling approach: Periodically query the Outbox table to retrieve unpublished events
- CDC approach: Monitor the database's transaction log (WAL/binlog) to capture changes
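The polling approach can be sketched as follows (SQLite stands in for the real database and the publish callable for the broker client; all names here are illustrative):

```python
import json
import sqlite3

def relay_pending(conn: sqlite3.Connection, publish, batch_size: int = 100) -> int:
    """Poll the outbox table for PENDING events and publish them."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox_events "
        "WHERE status = 'PENDING' ORDER BY created_at LIMIT ?",
        (batch_size,),
    ).fetchall()
    for event_id, event_type, payload in rows:
        # If publish raises, the row stays PENDING and is retried next cycle
        publish(event_type, json.loads(payload))
        conn.execute(
            "UPDATE outbox_events SET status = 'PUBLISHED' WHERE id = ?",
            (event_id,),
        )
        conn.commit()  # mark each event as soon as it is delivered
    return len(rows)
```

Running this on a schedule (cron or a background thread) gives at-least-once delivery: a crash between publish and the status update simply republishes the event on the next cycle, which is why consumers must be idempotent (section 6).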
2. Outbox Table Design
2.1 Basic Schema
-- PostgreSQL Outbox table DDL
CREATE TABLE outbox_events (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(255) NOT NULL,        -- Domain the event belongs to (e.g., 'Order')
    aggregate_id   VARCHAR(255) NOT NULL,        -- Domain entity ID (e.g., order ID)
    event_type     VARCHAR(255) NOT NULL,        -- Event type (e.g., 'OrderCreated')
    payload        JSONB NOT NULL,               -- Event body
    metadata       JSONB DEFAULT '{}',           -- Additional metadata
    created_at     TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    published_at   TIMESTAMP WITH TIME ZONE,     -- For polling: timestamp when published
    retry_count    INT DEFAULT 0,                -- For polling: retry count
    status         VARCHAR(20) DEFAULT 'PENDING' -- PENDING, PUBLISHED, FAILED
);

-- Partial index for querying unpublished events in the polling approach
CREATE INDEX idx_outbox_status_created ON outbox_events (status, created_at)
    WHERE status = 'PENDING';

-- For the CDC approach, the status/published_at columns are unnecessary,
-- and rows can be deleted after event publication
2.2 Key Design Considerations
Use aggregate_id as the Kafka partition key: Events with the same aggregate_id land in the same partition, ensuring event ordering for that entity. For example, the events 'OrderCreated' -> 'OrderPaid' -> 'OrderShipped' for order 123 are processed in order.
Include all necessary information in the payload: The payload should contain enough information for the consumer to complete its processing using the event alone. Consumers should not directly query the producer's database, as this increases inter-service coupling.
Prevent table bloat: With the CDC approach, periodically delete old rows after events have been captured. With the polling approach, archive or delete published events after a retention period.
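For the polling variant, the retention cleanup can be as simple as a scheduled delete (sketched here against SQLite for illustration; in PostgreSQL at scale, partition drops as shown in section 9.4 are preferable to row-by-row deletes):

```python
import sqlite3

def purge_published(conn: sqlite3.Connection, retention_days: int = 7) -> int:
    """Delete outbox rows published more than retention_days ago."""
    cur = conn.execute(
        "DELETE FROM outbox_events "
        "WHERE status = 'PUBLISHED' AND created_at < datetime('now', ?)",
        (f"-{retention_days} days",),
    )
    conn.commit()
    return cur.rowcount  # number of rows purged
```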
2.3 Trigger-Based Outbox for MySQL (Alternative)
-- Automatic Outbox recording using triggers in MySQL
DELIMITER //

CREATE TRIGGER after_order_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    INSERT INTO outbox_events (
        id, aggregate_type, aggregate_id, event_type, payload, created_at
    ) VALUES (
        UUID(),
        'Order',
        NEW.order_id,
        'OrderCreated',
        JSON_OBJECT(
            'orderId', NEW.order_id,
            'userId', NEW.user_id,
            'totalAmount', NEW.total_amount,
            'status', NEW.status,
            'createdAt', NEW.created_at
        ),
        NOW()
    );
END //

CREATE TRIGGER after_order_update
AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
    IF OLD.status != NEW.status THEN
        INSERT INTO outbox_events (
            id, aggregate_type, aggregate_id, event_type, payload, created_at
        ) VALUES (
            UUID(),
            'Order',
            NEW.order_id,
            CONCAT('OrderStatus', NEW.status),
            JSON_OBJECT(
                'orderId', NEW.order_id,
                'previousStatus', OLD.status,
                'newStatus', NEW.status,
                'updatedAt', NEW.updated_at
            ),
            NOW()
        );
    END IF;
END //

DELIMITER ;
The trigger-based approach has the advantage of requiring no application code changes, but the trigger execution cost is included in the main transaction, which can impact performance. Additionally, embedding complex business logic in triggers is difficult and debugging is cumbersome. In most cases, explicitly inserting Outbox rows at the application level is recommended.
3. CDC Concepts and How It Works
3.1 What Is CDC?
CDC (Change Data Capture) is a technology that captures database changes (INSERT, UPDATE, DELETE) in real time and delivers them to external systems. The key to CDC is reading the database's transaction log directly.
- PostgreSQL: Uses logical replication slots from the WAL (Write-Ahead Log)
- MySQL: Reads the binlog (binary log) to capture changes
- SQL Server: Has built-in CDC functionality that automatically creates change tables
- MongoDB: Streams changes from the oplog via Change Streams
3.2 Advantages of CDC
Log-based CDC offers several advantages over the polling approach:
- Extremely low latency: Changes are captured within milliseconds after transaction commit
- Minimal DB load: Reads a log stream rather than running index-based queries, imposing virtually no additional load on the database
- Ability to capture DELETEs: Polling struggles to detect deleted rows, but DELETE events are recorded in the log
- Captures all changes: No intermediate state changes are missed (polling can miss changes between intervals)
- Schema change tracking: Table structure changes are also recorded in the log, enabling schema evolution handling
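To make the log-based model concrete, a change record carries the full new row state plus metadata about where in the log it came from. The dictionary below sketches the general shape of a Debezium-style change event for an outbox INSERT (the field names follow Debezium's before/after/source/op envelope; the concrete values are invented for illustration):

```python
# Sketch of a Debezium-style change event for an INSERT into outbox_events.
# Field names follow Debezium's envelope (before/after/source/op/ts_ms);
# the values shown are illustrative only.
change_event = {
    "before": None,              # no prior row state for an INSERT
    "after": {                   # the newly inserted outbox row
        "id": "6f3c9a2e-0000-0000-0000-000000000000",
        "aggregate_type": "Order",
        "aggregate_id": "order-123",
        "event_type": "OrderCreated",
        "payload": '{"orderId": "order-123"}',
    },
    "source": {                  # position of the change in the log
        "connector": "postgresql",
        "db": "orderdb",
        "table": "outbox_events",
        "lsn": 23456789,
    },
    "op": "c",                   # c=create, u=update, d=delete, r=snapshot read
    "ts_ms": 1718000000000,
}
```

An UPDATE event would populate both "before" and "after", which is precisely what makes intermediate-state capture possible, and a DELETE would carry only "before".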
4. Debezium Installation and Connector Configuration
4.1 Debezium Overview
Debezium is an open-source CDC platform led by Red Hat that runs on top of the Kafka Connect framework. It supports a wide range of databases including PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, Cassandra, and Vitess. Through the Outbox Event Router SMT (Single Message Transformation), it provides native support for the Outbox Pattern.
4.2 Full Stack Setup with Docker Compose
# docker-compose.yml - Full Debezium + Kafka stack
version: '3.8'

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: orderdb
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secret
    command:
      - 'postgres'
      - '-c'
      - 'wal_level=logical' # Required setting for CDC
      - '-c'
      - 'max_replication_slots=4'
      - '-c'
      - 'max_wal_senders=4'
    ports:
      - '5432:5432'
    volumes:
      - postgres_data:/var/lib/postgresql/data

  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on:
      - zookeeper
    ports:
      - '9092:9092'
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1

  connect:
    image: debezium/connect:2.7
    depends_on:
      - kafka
      - postgres
    ports:
      - '8083:8083'
    environment:
      BOOTSTRAP_SERVERS: kafka:29092
      GROUP_ID: debezium-connect-group
      CONFIG_STORAGE_TOPIC: connect-configs
      OFFSET_STORAGE_TOPIC: connect-offsets
      STATUS_STORAGE_TOPIC: connect-status
      CONFIG_STORAGE_REPLICATION_FACTOR: 1
      OFFSET_STORAGE_REPLICATION_FACTOR: 1
      STATUS_STORAGE_REPLICATION_FACTOR: 1

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - '8080:8080'
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
      KAFKA_CLUSTERS_0_KAFKACONNECT_0_NAME: debezium
      KAFKA_CLUSTERS_0_KAFKACONNECT_0_ADDRESS: http://connect:8083

volumes:
  postgres_data:
4.3 Registering the Debezium Connector (with Outbox Event Router)
{
  "name": "order-outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "appuser",
    "database.password": "secret",
    "database.dbname": "orderdb",
    "topic.prefix": "order-service",
    "schema.include.list": "public",
    "table.include.list": "public.outbox_events",
    "tombstones.on.delete": "false",
    "slot.name": "order_outbox_slot",
    "plugin.name": "pgoutput",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.table.fields.additional.placement": "event_type:header:eventType",
    "transforms.outbox.table.field.event.id": "id",
    "transforms.outbox.table.field.event.key": "aggregate_id",
    "transforms.outbox.table.field.event.payload": "payload",
    "transforms.outbox.table.field.event.timestamp": "created_at",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "events.${routedByValue}",
    "transforms.outbox.table.expand.json.payload": "true",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "heartbeat.interval.ms": "10000",
    "snapshot.mode": "initial"
  }
}
The key points of this configuration are as follows:
- table.include.list monitors only the Outbox table, preventing unnecessary CDC events
- transforms.outbox.route.by.field is set to aggregate_type, so Kafka topics are automatically created per domain. For example, if aggregate_type is 'Order', the event is published to the events.Order topic
- table.field.event.key is set to aggregate_id, ensuring events for the same entity land in the same Kafka partition
- heartbeat.interval.ms ensures offsets are periodically updated even when there are no changes to the Outbox table. Without this setting, WAL retention can grow unnecessarily large
# Register connector
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @order-outbox-connector.json
# Check connector status
curl -s http://localhost:8083/connectors/order-outbox-connector/status | jq .
# Restart connector
curl -X POST http://localhost:8083/connectors/order-outbox-connector/restart
# List topics (via Kafka UI or CLI)
docker exec -it kafka kafka-topics --bootstrap-server localhost:9092 --list
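The status JSON returned by the Kafka Connect REST endpoint nests one connector state and a list of task states, so a health check needs to look at both. A small helper (the dict shape mirrors the /connectors/{name}/status response; the sample payloads below are illustrative):

```python
def connector_is_healthy(status: dict) -> bool:
    """True only if the connector AND every task report state RUNNING."""
    if status.get("connector", {}).get("state") != "RUNNING":
        return False
    tasks = status.get("tasks", [])
    # A connector with zero tasks is not doing any work
    return bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)

# Illustrative response shapes from GET /connectors/<name>/status
healthy = {
    "name": "order-outbox-connector",
    "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
    "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "connect:8083"}],
}
failed_task = {
    "name": "order-outbox-connector",
    "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
    "tasks": [{"id": 0, "state": "FAILED", "trace": "..."}],
}
```

Note that the connector itself can report RUNNING while a task has FAILED, which is why checking only the top-level state is a common monitoring mistake.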
5. Kafka Connect Pipeline
5.1 End-to-End Pipeline Flow
The full flow of a Kafka Connect-based CDC pipeline is as follows:
- The application writes business data and an Outbox event to the database within the same transaction
- The Debezium Source Connector reads the PostgreSQL WAL and detects changes to the Outbox table
- The Outbox Event Router SMT routes the raw CDC event to domain-specific Kafka topics
- Consumer microservices subscribe to the relevant topics and process the events
- Optionally, a Sink Connector forwards events to other data stores (Elasticsearch, S3, etc.)
5.2 Spring Boot Outbox Pattern Implementation
// OutboxEvent.java - Outbox entity
@Entity
@Table(name = "outbox_events")
public class OutboxEvent {

    // ObjectMapper is thread-safe and expensive to create, so reuse one instance
    private static final ObjectMapper MAPPER =
        new ObjectMapper().registerModule(new JavaTimeModule());

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;

    @Column(name = "aggregate_type", nullable = false)
    private String aggregateType;

    @Column(name = "aggregate_id", nullable = false)
    private String aggregateId;

    @Column(name = "event_type", nullable = false)
    private String eventType;

    @Column(name = "payload", columnDefinition = "jsonb", nullable = false)
    private String payload;

    @Column(name = "created_at", nullable = false)
    private Instant createdAt;

    // Default constructor, getters, setters omitted

    public static OutboxEvent create(String aggregateType, String aggregateId,
                                     String eventType, Object payload) {
        OutboxEvent event = new OutboxEvent();
        event.aggregateType = aggregateType;
        event.aggregateId = aggregateId;
        event.eventType = eventType;
        event.payload = toJson(payload);
        event.createdAt = Instant.now();
        return event;
    }

    private static String toJson(Object obj) {
        try {
            return MAPPER.writeValueAsString(obj);
        } catch (JsonProcessingException e) {
            throw new RuntimeException("JSON serialization failed", e);
        }
    }
}
// OrderService.java - Order service with the Outbox Pattern applied
@Service
@RequiredArgsConstructor
public class OrderService {

    private final OrderRepository orderRepository;
    private final OutboxEventRepository outboxRepository;

    @Transactional // A single transaction persists both business data and the event
    public Order createOrder(OrderRequest request) {
        // 1. Execute business logic
        Order order = Order.builder()
            .userId(request.getUserId())
            .items(request.getItems())
            .totalAmount(calculateTotal(request.getItems()))
            .status(OrderStatus.CREATED)
            .build();
        Order savedOrder = orderRepository.save(order);

        // 2. Record the event in the Outbox table (same transaction)
        OrderCreatedPayload eventPayload = OrderCreatedPayload.builder()
            .orderId(savedOrder.getId().toString())
            .userId(savedOrder.getUserId())
            .totalAmount(savedOrder.getTotalAmount())
            .items(savedOrder.getItems())
            .createdAt(savedOrder.getCreatedAt())
            .build();

        OutboxEvent outboxEvent = OutboxEvent.create(
            "Order",                       // aggregate_type -> determines Kafka topic
            savedOrder.getId().toString(), // aggregate_id -> Kafka partition key
            "OrderCreated",                // event_type
            eventPayload                   // payload
        );
        outboxRepository.save(outboxEvent);

        return savedOrder;
    }

    @Transactional
    public Order cancelOrder(UUID orderId) {
        Order order = orderRepository.findById(orderId)
            .orElseThrow(() -> new OrderNotFoundException(orderId));

        if (order.getStatus() != OrderStatus.CREATED) {
            throw new IllegalStateException(
                "Only orders in CREATED status can be cancelled. Current status: " + order.getStatus()
            );
        }

        order.setStatus(OrderStatus.CANCELLED);
        Order savedOrder = orderRepository.save(order);

        OutboxEvent outboxEvent = OutboxEvent.create(
            "Order",
            savedOrder.getId().toString(),
            "OrderCancelled",
            Map.of(
                "orderId", savedOrder.getId().toString(),
                "reason", "User request",
                "cancelledAt", Instant.now().toString()
            )
        );
        outboxRepository.save(outboxEvent);

        return savedOrder;
    }
}
6. Event Ordering Guarantees and Idempotency
6.1 Event Ordering Guarantees
Event ordering in the Outbox Pattern follows these principles:
- Ordering is guaranteed for the same aggregate: by using aggregate_id as the Kafka partition key, events for the same entity are stored in order within the same partition
- Ordering is not guaranteed across different aggregates: this is by design and aligns with the loose-coupling principle between microservices
- Exercise caution when changing partition counts: changing the number of partitions for a Kafka topic breaks the existing key-to-partition mapping, so it is safest to avoid changing partition counts in production
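The per-aggregate ordering guarantee rests on deterministic key-to-partition mapping. The sketch below illustrates the principle with a CRC32 hash; Kafka's default partitioner actually uses murmur2, but the property shown (same key always maps to the same partition) is identical:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition (illustrative; Kafka uses murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Every event keyed by "order-123" lands in the same partition, so
# OrderCreated -> OrderPaid -> OrderShipped keep their relative order.
p = partition_for("order-123", 6)
assert all(partition_for("order-123", 6) == p for _ in range(100))
```

This also shows why changing the partition count is dangerous: the modulus changes, so existing keys remap to different partitions and in-flight ordering per key is lost.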
6.2 Idempotency Handling (Python Consumer Example)
import json
import uuid
from datetime import datetime

from kafka import KafkaConsumer
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

# Processing history table for idempotency guarantees:
# CREATE TABLE processed_events (
#     event_id UUID PRIMARY KEY,
#     processed_at TIMESTAMP NOT NULL DEFAULT NOW()
# );

engine = create_engine('postgresql://user:pass@localhost:5432/inventorydb')

consumer = KafkaConsumer(
    'events.Order',
    bootstrap_servers=['localhost:9092'],
    group_id='inventory-service',
    auto_offset_reset='earliest',
    enable_auto_commit=False,  # Manual commit for precise control
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    max_poll_records=100,
)


def process_order_event(event_data: dict, event_id: str, event_type: str) -> None:
    """Process an event with idempotency guarantees."""
    with Session(engine) as session:
        with session.begin():
            # 1. Check whether the event has already been processed
            result = session.execute(
                text("SELECT 1 FROM processed_events WHERE event_id = :id"),
                {"id": event_id},
            )
            if result.fetchone():
                print(f"Skipping already processed event: {event_id}")
                return

            # 2. Execute business logic
            if event_type == 'OrderCreated':
                for item in event_data.get('items', []):
                    session.execute(
                        text("""
                            UPDATE inventory
                            SET reserved_quantity = reserved_quantity + :qty
                            WHERE product_id = :product_id
                              AND available_quantity >= :qty
                        """),
                        {"qty": item['quantity'], "product_id": item['productId']},
                    )
            elif event_type == 'OrderCancelled':
                session.execute(
                    text("""
                        UPDATE inventory i
                        SET reserved_quantity = reserved_quantity - oi.quantity
                        FROM order_items oi
                        WHERE oi.order_id = :order_id
                          AND oi.product_id = i.product_id
                    """),
                    {"order_id": event_data.get('orderId')},
                )

            # 3. Record processing history (same transaction)
            session.execute(
                text("""
                    INSERT INTO processed_events (event_id, processed_at)
                    VALUES (:id, :now)
                """),
                {"id": event_id, "now": datetime.utcnow()},
            )
            print(f"Event processed successfully: {event_id} ({event_type})")


# Main consumer loop
print("Starting event consumer...")
try:
    for message in consumer:
        try:
            event_data = message.value
            # The Outbox Event Router puts the event id in the 'id' header and,
            # per our connector config, the event type in the 'eventType' header
            headers = {k: v.decode('utf-8') for k, v in message.headers}
            event_id = headers.get('id', str(uuid.uuid4()))
            event_type = headers.get('eventType', '')
            process_order_event(event_data, event_id, event_type)
            # Commit the offset only after successful processing
            consumer.commit()
        except Exception as e:
            print(f"Event processing failed: {e}")
            # On failure, send to a DLQ (Dead Letter Queue) or apply retry logic;
            # here we simply log and move on to the next event
except KeyboardInterrupt:
    print("Consumer shutting down")
finally:
    consumer.close()
The key to idempotency handling is wrapping the event processing and the processing history record within the same database transaction. This ensures that if a failure occurs during event processing, the processing history is not recorded, allowing the same event to be reprocessed upon restart. Events that have already been processed are identified via the processing history table and skipped.
7. Polling vs. CDC: A Comparison
| Comparison Criteria | Polling Approach | CDC Approach (Debezium) |
|---|---|---|
| Latency | Depends on polling interval (seconds to minutes) | Near real-time at millisecond level |
| DB Load | Periodic SELECT query overhead | Minimal load via WAL/binlog reading |
| Implementation Complexity | Low (CRON + SQL) | High (Kafka Connect + Debezium) |
| Infrastructure Requirements | Only DB needed | Kafka + Kafka Connect + Debezium |
| DELETE Detection | Not possible (requires soft delete) | Possible (recorded in log) |
| Intermediate State Capture | Not possible (only last state) | Possible (captures all changes) |
| Ordering Guarantees | Timestamp-based (fragile) | Log-based (robust) |
| Scalability | DB connection overhead increases | Kafka Connect horizontally scalable |
| Operational Complexity | Low | High (requires Kafka cluster management) |
| Failure Recovery | Simple retry based on status column | Sophisticated recovery based on Kafka offsets |
| Best Suited For | Small scale, tolerant of higher latency | Large scale, real-time processing required |
When first adopting the Outbox Pattern, starting with the polling approach and incrementally transitioning to CDC as traffic grows is a valid strategy. Even the polling approach fully resolves the Dual Write problem, and may be sufficient if latency requirements are on the order of seconds.
8. Debezium vs. Maxwell vs. Canal
| Comparison Criteria | Debezium | Maxwell | Canal |
|---|---|---|---|
| Supported DBs | PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, Cassandra, etc. | MySQL only | MySQL only |
| Architecture | Distributed via Kafka Connect | Single Java process | Single Java process |
| Message Brokers | Kafka (default), Pulsar, NATS, etc. | Kafka, RabbitMQ, Redis, etc. | Kafka, RocketMQ, etc. |
| Outbox Support | Native via EventRouter SMT | Requires custom implementation | Requires custom implementation |
| Schema Evolution | Schema Registry integration | Limited | Limited |
| Scalability | Horizontally scalable via Kafka Connect distributed mode | Limited to single process | Limited to single process |
| Configuration Complexity | High (requires Kafka Connect knowledge) | Low (simple configuration) | Medium |
| Community | Very active (Red Hat sponsored) | Small | China-centric (Alibaba) |
| Operational Maturity | High | Medium | Medium |
| Best Suited For | Large scale, multi-DB, enterprise | Small services using only MySQL | MySQL-based Chinese ecosystem |
Selection Guide: For most production environments, Debezium is the recommended choice. It supports a wide range of databases, achieves high availability through Kafka Connect's distributed mode, and provides native Outbox Event Router support. If you only use MySQL and need a quick PoC, Maxwell's simplicity is appealing. Canal was developed by Alibaba and is primarily used within the Chinese ecosystem, with relatively lower adoption internationally.
9. Failure Scenarios and Recovery Procedures
9.1 Scenario 1: Debezium Connector Failure
Symptoms: The connector status in Kafka Connect changes to FAILED, and new events are no longer published to Kafka.
Common Causes: DB connection loss, WAL slot deletion, schema change compatibility issues, inability to connect to Kafka brokers, etc.
Recovery Procedure:
# 1. Check connector status
curl -s http://localhost:8083/connectors/order-outbox-connector/status | jq .
# 2. Attempt to restart the connector task
curl -X POST http://localhost:8083/connectors/order-outbox-connector/tasks/0/restart
# 3. If restart fails, delete and re-register the connector
curl -X DELETE http://localhost:8083/connectors/order-outbox-connector
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @order-outbox-connector.json
# 4. Check PostgreSQL WAL slot status
psql -c "SELECT * FROM pg_replication_slots;"
# 5. Re-create slot if necessary
psql -c "SELECT pg_drop_replication_slot('order_outbox_slot');"
# The slot is automatically created when the connector is re-registered
Important Note: If Debezium is stopped for an extended period while the WAL slot remains active, PostgreSQL cannot delete WAL files after that slot's position, which can fill up the disk. You must monitor WAL size and set an upper limit using the max_slot_wal_keep_size parameter.
9.2 Scenario 2: Kafka Broker Failure
Symptoms: The Debezium connector cannot publish events, and consumers cannot consume events.
Recovery Procedure:
- Restore the Kafka broker
- Debezium automatically resumes from the last successfully committed offset (managed by Kafka Connect's offset management)
- Consumers also resume from the last committed offset
- With the CDC approach, no data loss occurs as long as the database's WAL is preserved
9.3 Scenario 3: Consumer Processing Failure
Symptoms: A specific event repeatedly fails processing, preventing the consumer from making progress (poison pill).
Recovery Procedure:
- Apply a retry policy (maximum 3-5 attempts with exponential backoff)
- After exceeding the maximum retries, route the message to a DLQ (Dead Letter Queue) topic
- Manually analyze messages in the DLQ, fix the issue, and republish to the original topic
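The retry-then-DLQ policy above can be sketched as follows (handler, send_to_dlq, and the backoff parameters are illustrative placeholders, not a specific library's API):

```python
import time

def process_with_retry(handler, message, max_attempts: int = 4,
                       base_delay: float = 0.5, send_to_dlq=None):
    """Call handler(message), retrying with exponential backoff.

    After max_attempts failures the message is routed to the DLQ so the
    consumer can keep making progress past a poison pill.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except Exception as exc:
            if attempt == max_attempts:
                if send_to_dlq is not None:
                    send_to_dlq(message, reason=str(exc))
                return None
            time.sleep(base_delay * (2 ** (attempt - 1)))  # 0.5s, 1s, 2s, ...
```

In a real consumer the DLQ callback would produce the failed message, its headers, and the error to a dedicated topic (e.g. a ".dlq"-suffixed sibling of the source topic) so that offset commits can proceed.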
9.4 Scenario 4: Outbox Table Bloat
Symptoms: The Outbox table grows to millions of rows, degrading database performance.
Recovery Procedure:
- With the CDC approach, delete rows that have been captured after a certain retention period
- For PostgreSQL, apply partitioning and DROP old partitions
- Run periodic VACUUM operations to clean up dead tuples
-- Daily partitioned Outbox table
CREATE TABLE outbox_events (
    id             UUID NOT NULL DEFAULT gen_random_uuid(),
    aggregate_type VARCHAR(255) NOT NULL,
    aggregate_id   VARCHAR(255) NOT NULL,
    event_type     VARCHAR(255) NOT NULL,
    payload        JSONB NOT NULL,
    created_at     TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Automatic daily partition creation (using pg_partman)
SELECT partman.create_parent(
    'public.outbox_events',
    'created_at',
    'native',
    'daily'
);

-- Automatically drop partitions older than 7 days
UPDATE partman.part_config
SET retention = '7 days', retention_keep_table = false
WHERE parent_table = 'public.outbox_events';
10. Production Operations Checklist
Debezium/CDC Configuration:
- Verify that PostgreSQL's wal_level is set to logical and the server has been restarted
- Set max_replication_slots and max_wal_senders generously above the number of connectors
- Configure max_slot_wal_keep_size to prevent WAL disk exhaustion
- Always set the Debezium connector's heartbeat.interval.ms to ensure regular offset updates
- Choose snapshot.mode carefully (initial for the first deployment, schema_only for subsequent restarts)
Event Design:
- Include all information the consumer needs in the event payload (self-contained events)
- Design for schema evolution: allow field additions but avoid removing existing fields
- Use aggregate_id as the Kafka partition key to guarantee event ordering for the same entity
- Keep event size under 1MB (Kafka's default maximum message size)
Consumer Implementation:
- Implement idempotency in all consumers (using a processed_events table)
- Disable auto.commit and commit offsets manually
- Configure a DLQ to prevent processing stalls caused by poison pill messages
- Set appropriate session timeout and heartbeat interval for consumer groups
Monitoring:
- Periodically check Debezium connector status (RUNNING/FAILED)
- Monitor WAL slot size and disk usage
- Monitor event publishing latency (CDC lag)
- Monitor consumer lag
- Monitor Outbox table size and row count
- Set up alerts for message counts in the DLQ topic
Failure Preparedness:
- Run Kafka Connect in distributed mode so tasks are automatically reassigned when a worker node fails
- Periodically test the connector delete-and-re-register recovery procedure
- Document WAL slot-related failure scenarios in the operations manual
- Automate end-to-end testing of the entire pipeline