Authors

- Youngju Kim (@fjvbn20031)
1. Kinesis vs Apache Kafka Detailed Comparison
1.1 Architecture Differences
Apache Kafka and AWS Kinesis are both platforms for real-time data streaming, but their fundamental architectural philosophies differ.
Apache Kafka Architecture
==========================
+----------+     +------------------------------------------+
| Producer | --> |              Kafka Cluster               |
+----------+     |                                          |
                 |  Broker 1    Broker 2    Broker 3        |
                 |  +-------+   +-------+   +-------+       |
                 |  |Topic A|   |Topic A|   |Topic A|       |
                 |  |Part 0 |   |Part 1 |   |Part 2 |       |
                 |  |       |   |       |   |       |       |
                 |  |Topic B|   |Topic B|   |Topic B|       |
                 |  |Part 0 |   |Part 1 |   |Part 2 |       |
                 |  +-------+   +-------+   +-------+       |
                 |                                          |
                 |  KRaft (Kafka 4.0+, ZooKeeper removed)   |
                 +------------------------------------------+
                                     |
                                     v
                               +----------+
                               | Consumer |
                               |  Group   |
                               +----------+
AWS Kinesis Data Streams Architecture
=======================================
+----------+     +------------------------------------------+
| Producer | --> |      Kinesis Stream (Fully Managed)      |
+----------+     |                                          |
                 |  Shard 1     Shard 2     Shard 3         |
                 |  +-------+   +-------+   +-------+       |
                 |  |Records|   |Records|   |Records|       |
                 |  |1MB/s W|   |1MB/s W|   |1MB/s W|       |
                 |  |2MB/s R|   |2MB/s R|   |2MB/s R|       |
                 |  +-------+   +-------+   +-------+       |
                 |                                          |
                 |  Fully managed by AWS                    |
                 +------------------------------------------+
                                     |
                                     v
                               +----------+
                               | Consumer |
                               |  (KCL)   |
                               +----------+
1.2 Comprehensive Comparison Table
| Comparison | AWS Kinesis | Apache Kafka (Self-hosted) | Amazon MSK |
|---|---|---|---|
| Management | Fully managed | Self-operated | Managed Kafka |
| Scale unit | Shard | Partition + Broker | Broker |
| Write per shard/partition | 1 MB/s | No limit (disk I/O) | Disk I/O dependent |
| Read per shard/partition | 2 MB/s (shared) | No limit | Disk I/O dependent |
| Max retention | 365 days | Unlimited | Unlimited |
| Ordering | Within shard | Within partition | Within partition |
| Max record size | 1 MB | Default 1 MB (configurable) | Default 1 MB |
| Consumer model | KCL, Enhanced Fan-Out | Consumer Groups | Consumer Groups |
| Protocol | HTTPS/HTTP2 | Custom TCP protocol | Custom TCP |
| Ecosystem | AWS service integration | Kafka Connect, Schema Registry, etc. | Kafka ecosystem |
| Operational complexity | Low | High | Medium |
| Initial setup | Minutes | Hours to days | Tens of minutes |
1.3 Throughput and Latency
Throughput Comparison
======================
Kinesis (Provisioned):
Write: 1 MB/s x number of shards
Read: 2 MB/s x number of shards (shared)
2 MB/s x shards x consumers (enhanced fan-out)
Example: 100 shards
Write: 100 MB/s
Read: 200 MB/s (shared) or 200 MB/s x N (enhanced fan-out)
Kafka (Self-hosted):
Depends on broker performance
Single broker: hundreds of MB/s possible
Cluster: multiple GB/s achievable
Example: 6 broker cluster
Write: 600+ MB/s
Read: multiple GB/s
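The Kinesis side of this arithmetic is easy to sanity-check in code. Below is a minimal sketch of a shard estimator based on the published per-shard write limits (1 MB/s and 1,000 records/s); the function name is illustrative, not an AWS API, and the record rate in the example is an assumed figure:

```python
import math

def required_shards(write_mbps: float, records_per_sec: int) -> int:
    """Provisioned shards needed on the write side: 1 MB/s or
    1,000 records/s per shard, whichever limit binds first."""
    by_bytes = math.ceil(write_mbps / 1.0)        # 1 MB/s write per shard
    by_records = math.ceil(records_per_sec / 1000)  # 1,000 records/s per shard
    return max(by_bytes, by_records, 1)

# The 100-shard example above: 100 MB/s write (assuming 50k records/s)
print(required_shards(write_mbps=100, records_per_sec=50_000))  # -> 100
```

Read capacity follows the same shape: 2 MB/s per shard shared across consumers, or 2 MB/s per shard per consumer with enhanced fan-out.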
Latency Comparison
===================
Kinesis:
- PutRecord: tens of ms
- GetRecords (shared fan-out): ~200 ms
- Enhanced Fan-Out: ~70 ms
Kafka:
- Producer -> Consumer: ~2-10 ms (network dependent)
- End-to-end: ~10-50 ms
1.4 Cost Comparison
Monthly Cost Estimate (US East)
=================================
Scenario: 10 MB/s sustained ingestion, 3 consumers
Kinesis (Provisioned Mode):
- 10 shards needed (10 MB/s / 1 MB/s per shard)
- Shard cost: 10 x 0.015 x 720 hours = ~108 USD
- PUT units: ~360 USD (~25.9B units/month)
- Enhanced fan-out (3 consumers): ~324 USD
Total: ~792 USD/month
Kinesis (On-Demand Advantage):
- Data write: ~25.9 TB x 0.032 = ~829 USD
- Data read: ~25.9 TB x 3 x 0.016 = ~1,243 USD
Total: ~2,072 USD/month
Kafka (Self-hosted on EC2):
- 3 brokers (m5.xlarge): 3 x 140 = ~420 USD
- EBS storage (1TB x 3): ~300 USD
- Operations personnel: not included in this estimate
Total: ~720 USD/month + operational costs
Amazon MSK:
- 3 brokers (kafka.m5.large): ~456 USD
- Storage: ~300 USD
Total: ~756 USD/month
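The provisioned-mode estimate above can be reproduced with a small calculator. The rates below are the illustrative us-east-1 figures used in this scenario (shard-hour, PUT payload units per million, enhanced fan-out consumer-shard-hour); they are assumptions for the example, so always check the current price list:

```python
def provisioned_monthly_cost(shards, put_units_millions, efo_consumers,
                             hours=720,
                             shard_hour_usd=0.015,
                             put_per_million_usd=0.014,
                             efo_consumer_shard_hour_usd=0.015):
    """Rough monthly cost: shard hours + PUT payload units +
    enhanced fan-out consumer-shard hours (data retrieval excluded)."""
    shard_cost = shards * hours * shard_hour_usd
    put_cost = put_units_millions * put_per_million_usd
    efo_cost = shards * efo_consumers * hours * efo_consumer_shard_hour_usd
    return round(shard_cost + put_cost + efo_cost)

# 10 shards, ~25.9B PUT units, 3 enhanced fan-out consumers
print(provisioned_monthly_cost(10, 25_900, 3))  # -> 795, close to ~792 above
```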
1.5 When to Choose What
Choose Kinesis when:
- Architecture deeply integrated with AWS
- You want to minimize operational burden
- Direct integration with Lambda, Firehose, and other AWS services is needed
- Small to medium throughput (under tens of MB/s)
- Rapid prototyping is needed
Choose Kafka when:
- Very high throughput is required (multiple GB/s)
- Multi-cloud or hybrid environment
- Kafka Connect ecosystem is needed
- Ultra-low latency is required (single-digit ms)
- Unlimited data retention is needed
2. Kinesis vs SQS: When to Use Which
| Comparison | Kinesis Data Streams | Amazon SQS |
|---|---|---|
| Processing model | Streaming (continuous) | Message queue (individual) |
| Consumer count | Multiple simultaneous | Primarily single consumer |
| Ordering | Guaranteed within shard | FIFO queue only |
| Data retention | 24h to 365 days | Up to 14 days |
| Data replay | Possible (sequence number based) | Not possible (deleted after processing) |
| Throughput | 1 MB/s write per shard | Nearly unlimited |
| Message size | Up to 1 MB | Up to 256 KB |
| Latency | Milliseconds | Milliseconds |
| Cost model | Shard hours + data transfer | Request-based |
| Primary use | Real-time analytics, log collection | Microservice decoupling |
Usage Scenario Decision Flowchart
====================================
Analyze the data processing requirements:

1) Do multiple consumers need to read the same data?
   - No (each message needs single processing): use SQS.
   - Yes: go to 2).
2) Is real-time ordering required?
   - Yes: use Kinesis Data Streams.
   - No: use SQS FIFO or Kinesis.
3. Amazon Managed Service for Apache Flink
3.1 Overview
Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) allows you to run Apache Flink on fully managed infrastructure.
Note: The legacy Kinesis Data Analytics for SQL stopped accepting new applications in October 2025; AWS recommends migrating to Amazon Managed Service for Apache Flink.
3.2 Key Features
Managed Flink Architecture
============================
+-----------+ +----------------------------+ +-----------+
| Sources | | Managed Flink | | Sinks |
| | --> | | --> | |
| - Kinesis | | +------------------------+ | | - Kinesis |
| - MSK | | | Flink Application | | | - S3 |
| - S3 | | | | | | - DynamoDB|
| | | | - SQL queries | | | - Open- |
| | | | - Java/Scala apps | | | Search |
| | | | - Python (PyFlink) | | | - Redshift|
| | | | | | | |
| | | | Window aggregation | | | |
| | | | Pattern detection | | | |
| | | | CEP (Complex Event | | | |
| | | | Processing) | | | |
| | | +------------------------+ | | |
+-----------+ +----------------------------+ +-----------+
3.3 Window Processing Types
A core feature for grouping stream data by time-based windows for analysis.
Window Types
==============
1) Tumbling Window
- Fixed size, non-overlapping
|-------|-------|-------|-------|
0       5       10      15      20  (sec)
2) Sliding/Hopping Window
- Fixed size, slides at regular intervals
|-----------|
    |-----------|
        |-----------|
0   2   4   6   8   10  (sec)
Size: 6s, Slide: 2s
3) Session Window
- Activity-based, separated by inactivity gaps
|---event-event---| gap |--event-event-event--| gap |
<-- Session 1 --> <------ Session 2 ---->
4) Global Window
- Single window over the entire stream
3.4 Flink SQL Example
```sql
-- Define Kinesis source table
CREATE TABLE clickstream (
  user_id VARCHAR,
  page VARCHAR,
  action VARCHAR,
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kinesis',
  'stream' = 'clickstream-data',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',
  'format' = 'json'
);

-- Aggregate page views per 1-minute tumbling window
SELECT
  page,
  COUNT(*) AS view_count,
  COUNT(DISTINCT user_id) AS unique_users,
  TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
  TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end
FROM clickstream
WHERE action = 'view'
GROUP BY
  page,
  TUMBLE(event_time, INTERVAL '1' MINUTE);

-- Anomaly detection: 10+ clicks by the same user within 1 minute
SELECT
  user_id,
  COUNT(*) AS click_count,
  TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start
FROM clickstream
WHERE action = 'click'
GROUP BY
  user_id,
  TUMBLE(event_time, INTERVAL '1' MINUTE)
HAVING COUNT(*) >= 10;
```
4. Real-World Streaming Architecture Patterns
4.1 Log Aggregation Pipeline
Log Aggregation Architecture
==============================
+----------+ +----------+ +---------+ +----------+
| App | | Kinesis | | Firehose| | S3 |
| Server 1 | --> | Agent | --> | | --> | (Raw |
+----------+ +----------+ | | | Logs) |
| | +----------+
+----------+ +----------+ | | |
| App | | Kinesis | | | v
| Server 2 | --> | Agent | --> | | +----------+
+----------+ +----------+ | | | Athena |
| | | (Query) |
+----------+ +----------+ | | +----------+
| App | | Kinesis | | |
| Server N | --> | Agent | --> | |
+----------+ +----------+ +---------+
|
v
+----------+ +----------+
| Lambda | --> | Open- |
| (Real- | | Search |
| time | | (Search/ |
| alerts) | | Dashboard)|
+----------+ +----------+
4.2 Real-Time Analytics Dashboard
Real-Time Dashboard Architecture
==================================
+----------+ +----------+ +-----------+ +----------+
| Web/ | | API | | Kinesis | | Managed |
| Mobile | --> | Gateway | --> | Data | --> | Flink |
| Client | | | | Streams | | (Aggre- |
+----------+ +----------+ +-----------+ | gation) |
+----------+
|
+---------+----+----+---------+
| | | |
v v v v
+--------+ +--------+ +--------+ +--------+
|DynamoDB| |Time- | | S3 | |Cloud- |
|(Real- | |stream | |(Long- | |Watch |
| time) | |(Time- | | term) | |(Metric |
| | | series)| | | | Alarms)|
+--------+ +--------+ +--------+ +--------+
| |
v v
+--------------------+
| Dashboard App |
| (React/Vue + |
| WebSocket) |
+--------------------+
4.3 IoT Data Ingestion
```python
# IoT device simulator
import json
import random
import time
from datetime import datetime

import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')
STREAM_NAME = 'iot-sensor-data'

def simulate_sensor(device_id):
    """Simulate IoT sensor data"""
    return {
        'device_id': device_id,
        'temperature': round(random.uniform(15.0, 45.0), 2),
        'humidity': round(random.uniform(20.0, 90.0), 2),
        'pressure': round(random.uniform(990.0, 1030.0), 2),
        'battery_level': round(random.uniform(0.0, 100.0), 1),
        'location': {
            'lat': round(random.uniform(33.0, 38.0), 6),
            'lon': round(random.uniform(126.0, 130.0), 6)
        },
        'timestamp': datetime.utcnow().isoformat() + 'Z'
    }

def ingest_iot_data(num_devices=100, interval=1.0):
    """Ingest IoT data into Kinesis"""
    device_ids = [f'sensor-{i:04d}' for i in range(num_devices)]
    while True:
        records = []
        for device_id in device_ids:
            sensor_data = simulate_sensor(device_id)
            records.append({
                'Data': json.dumps(sensor_data).encode('utf-8'),
                'PartitionKey': device_id
            })
        # PutRecords accepts at most 500 records per call
        for batch_start in range(0, len(records), 500):
            batch = records[batch_start:batch_start + 500]
            response = kinesis.put_records(
                StreamName=STREAM_NAME,
                Records=batch
            )
            failed = response['FailedRecordCount']
            if failed > 0:
                print(f"Batch failed: {failed} records")
                # Simple fixed-delay retry of only the failed records;
                # see section 5.4 for an exponential backoff version
                retry_records = []
                for i, result in enumerate(response['Records']):
                    if 'ErrorCode' in result:
                        retry_records.append(batch[i])
                if retry_records:
                    time.sleep(0.5)
                    kinesis.put_records(
                        StreamName=STREAM_NAME,
                        Records=retry_records
                    )
        print(f"Ingested {len(records)} sensor readings")
        time.sleep(interval)

if __name__ == '__main__':
    ingest_iot_data()
```
4.4 Event Sourcing Pattern
Event Sourcing Architecture
=============================
+----------+ +----------+ +-----------+
| Command | | Kinesis | | Event |
| Handler | --> | Data | --> | Processor |
| | | Streams | | (Lambda/ |
| - Order | | (Event | | ECS) |
| - Payment| | Store) | +-----------+
| - Ship | +----------+ |
+----------+ | +-----+-----+
| | |
v v v
+----------+ +--------+ +--------+
| S3 | |DynamoDB| |SNS |
| (Event | |(Read | |(Notif- |
| Archive)| | Model) | | ication)|
+----------+ +--------+ +--------+
Event Flow Example:
1. OrderCreated -> Order creation event
2. PaymentProcessed -> Payment processing event
3. InventoryReserved -> Inventory reservation event
4. ShipmentCreated -> Shipment creation event
4.5 ML Feature Pipeline
ML Feature Pipeline
=====================
+----------+ +----------+ +-----------+ +----------+
| Event | | Kinesis | | Managed | | Feature |
| Sources | --> | Data | --> | Flink | --> | Store |
| | | Streams | | (Feature | | (Sage- |
| - Clicks | | | | Compute) | | Maker) |
| - Purch. | | | | | +----------+
| - Search | | | | Real-time:| |
+----------+ +----------+ | - Session | v
| count | +----------+
| - Recent | | ML Model |
| purch. | | Inference|
| - Avg | +----------+
| dwell |
+-----------+
5. Performance Optimization
5.1 Partition Key Design
Partition key design is the most critical factor for Kinesis performance.
Criteria for good partition keys:
- High cardinality (many unique values)
- Even distribution (no skew toward specific keys)
- At least 10x as many unique keys as the number of shards
Good Partition Key Examples
============================
1) User ID (high cardinality)
user-001 -> Shard 1
user-002 -> Shard 3
user-003 -> Shard 2
...
Evenly distributed
2) UUID (best distribution)
Random UUID -> Perfect distribution
Downside: Cannot guarantee ordering for same entity
3) Composite key
"region-userType-userId"
Fine-grained distribution control
Bad Partition Key Examples
===========================
1) Date ("2026-03-20")
All records to same shard -> Hot shard
2) Country code ("KR", "US", "JP")
Cardinality too low
Traffic skews toward specific countries
3) Fixed value ("default")
All load concentrated on single shard
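You can check a candidate key scheme before going to production. Kinesis routes each record by the MD5 hash of its partition key into a shard's hash-key range; the sketch below mimics that mapping for evenly split ranges (an approximation, since real shards can have uneven ranges after splits/merges), which is enough to expose a hot key:

```python
import hashlib
from collections import Counter

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Approximate Kinesis routing: MD5 of the key as a 128-bit int,
    mapped onto evenly split shard hash-key ranges."""
    h = int(hashlib.md5(partition_key.encode('utf-8')).hexdigest(), 16)
    return min(h * num_shards // 2**128, num_shards - 1)

def distribution(keys, num_shards):
    return Counter(shard_for_key(k, num_shards) for k in keys)

# High-cardinality user IDs spread evenly; a date key makes a hot shard
good = distribution([f'user-{i:04d}' for i in range(1000)], 4)
bad = distribution(['2026-03-20'] * 1000, 4)
print(dict(good))  # roughly 250 records per shard
print(dict(bad))   # all 1000 records on a single shard
```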
5.2 Aggregation with KPL
KPL Aggregation Optimization
==============================
Without aggregation:
Record 1 (100B) -> PutRecord -> 1 API call
Record 2 (200B) -> PutRecord -> 1 API call
Record 3 (150B) -> PutRecord -> 1 API call
Total: 3 API calls, 450B transmitted
With KPL aggregation:
Record 1 (100B) --+
Record 2 (200B) --+--> Aggregated record (450B) -> 1 API call
Record 3 (150B) --+
Total: 1 API call, 450B transmitted
Benefits:
- Dramatically reduced API calls
- Lower PUT unit costs
- Significantly improved throughput
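If you cannot run the KPL (for example, from a pure-Python producer), the batching idea can be approximated by hand. The sketch below packs small JSON records into newline-delimited blobs so one PutRecords entry carries many logical records. Note this is not the KPL wire format (the KPL uses a protobuf-based envelope that the KCL de-aggregates automatically), so consumers of these blobs must split on newlines themselves:

```python
import json

def aggregate(records, max_bytes=25 * 1024):
    """Pack records into newline-delimited blobs under max_bytes each."""
    batches, current, size = [], [], 0
    for rec in records:
        line = json.dumps(rec)
        # Flush the current blob when adding this line would overflow it
        if current and size + len(line) + 1 > max_bytes:
            batches.append('\n'.join(current).encode('utf-8'))
            current, size = [], 0
        current.append(line)
        size += len(line) + 1
    if current:
        batches.append('\n'.join(current).encode('utf-8'))
    return batches

events = [{'id': i, 'value': i * 2} for i in range(1000)]
blobs = aggregate(events)
print(f"{len(events)} events packed into {len(blobs)} Kinesis records")
```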
5.3 Enhanced Fan-Out Strategy
```python
# Register an enhanced fan-out consumer
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')

# Register the consumer against the stream
response = kinesis.register_stream_consumer(
    StreamARN='arn:aws:kinesis:us-east-1:123456789012:stream/my-stream',
    ConsumerName='analytics-consumer'
)
consumer_arn = response['Consumer']['ConsumerARN']
print(f"Consumer ARN: {consumer_arn}")

# Check consumer status (must be ACTIVE before subscribing)
response = kinesis.describe_stream_consumer(
    StreamARN='arn:aws:kinesis:us-east-1:123456789012:stream/my-stream',
    ConsumerName='analytics-consumer'
)
print(f"Status: {response['ConsumerDescription']['ConsumerStatus']}")
```
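Once the consumer is ACTIVE, it reads via SubscribeToShard, which pushes records over HTTP/2. A minimal sketch follows; the ARN and shard ID are placeholders, and a production consumer would re-subscribe before the roughly five-minute subscription window expires, resuming from its last checkpointed sequence number:

```python
def starting_position(sequence_number=None):
    """StartingPosition for SubscribeToShard: resume after a checkpointed
    sequence number, or start at the tip of the shard."""
    if sequence_number:
        return {'Type': 'AFTER_SEQUENCE_NUMBER',
                'SequenceNumber': sequence_number}
    return {'Type': 'LATEST'}

def consume_shard(consumer_arn, shard_id, region='us-east-1'):
    import boto3  # deferred so starting_position() works without AWS
    kinesis = boto3.client('kinesis', region_name=region)
    response = kinesis.subscribe_to_shard(
        ConsumerARN=consumer_arn,
        ShardId=shard_id,
        StartingPosition=starting_position()
    )
    # The EventStream yields SubscribeToShardEvent payloads as they arrive
    for event in response['EventStream']:
        for record in event['SubscribeToShardEvent']['Records']:
            print(record['SequenceNumber'], record['Data'][:50])
```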
5.4 Error Handling and Retry Strategy
```python
import random
import time

def put_records_with_retry(kinesis_client, stream_name, records, max_retries=3):
    """PutRecords retry with exponential backoff"""
    for attempt in range(max_retries):
        response = kinesis_client.put_records(
            StreamName=stream_name,
            Records=records
        )
        failed_count = response['FailedRecordCount']
        if failed_count == 0:
            return response
        # Extract only failed records
        retry_records = []
        for i, result in enumerate(response['Records']):
            if 'ErrorCode' in result:
                error_code = result['ErrorCode']
                if error_code == 'ProvisionedThroughputExceededException':
                    retry_records.append(records[i])
                else:
                    print(f"Non-retryable error: {error_code}")
        if not retry_records:
            return response
        records = retry_records
        # Exponential backoff + jitter
        backoff = min(2 ** attempt * 0.1, 5.0)
        jitter = random.uniform(0, backoff * 0.5)
        wait_time = backoff + jitter
        print(f"Retry {attempt + 1}: {len(retry_records)} records, "
              f"waiting {wait_time:.2f}s")
        time.sleep(wait_time)
    print(f"Failed after {max_retries} retries: {len(records)} records")
    return None
```
6. Monitoring: CloudWatch Metrics
6.1 Key Monitoring Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| IncomingBytes | Bytes entering the stream | 80% of shard capacity |
| IncomingRecords | Records entering the stream | 800 rec/s per shard |
| GetRecords.IteratorAgeMilliseconds | How far behind a consumer is | 60,000 ms (1 min) |
| WriteProvisionedThroughputExceeded | Write throttle count | Alarm when above 0 |
| ReadProvisionedThroughputExceeded | Read throttle count | Alarm when above 0 |
| GetRecords.Latency | GetRecords call latency | 1,000 ms |
| PutRecord.Latency | PutRecord call latency | 1,000 ms |
| GetRecords.Success | GetRecords success rate | Alarm when below 99% |
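These thresholds translate directly into CloudWatch alarms. Below is a sketch for the most important one, consumer lag; the alarm name, period, and evaluation settings are illustrative choices, and you would normally attach an SNS action as well:

```python
def iterator_age_alarm(stream_name, threshold_ms=60_000):
    """Alarm parameters for GetRecords.IteratorAgeMilliseconds,
    matching the 1-minute threshold in the table above."""
    return {
        'AlarmName': f'{stream_name}-iterator-age',
        'Namespace': 'AWS/Kinesis',
        'MetricName': 'GetRecords.IteratorAgeMilliseconds',
        'Dimensions': [{'Name': 'StreamName', 'Value': stream_name}],
        'Statistic': 'Maximum',
        'Period': 60,
        'EvaluationPeriods': 3,
        'Threshold': threshold_ms,
        'ComparisonOperator': 'GreaterThanThreshold',
    }

def create_iterator_age_alarm(stream_name):
    import boto3  # deferred so the parameter builder runs without AWS
    cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
    cloudwatch.put_metric_alarm(**iterator_age_alarm(stream_name))
```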
6.2 Enhanced Monitoring
Additional Metrics with Enhanced Monitoring
=============================================
Shard-level metrics:
- IncomingBytes (per shard)
- IncomingRecords (per shard)
- IteratorAgeMilliseconds (per shard)
- OutgoingBytes (per shard)
- OutgoingRecords (per shard)
- ReadProvisionedThroughputExceeded (per shard)
- WriteProvisionedThroughputExceeded (per shard)
Hot Shard Detection:
Shard 1: IncomingBytes = 200 KB/s [Normal]
Shard 2: IncomingBytes = 950 KB/s [Warning! Near limit]
Shard 3: IncomingBytes = 300 KB/s [Normal]
-> Recommend splitting Shard 2
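With shard-level IncomingBytes in hand from enhanced monitoring, the detection step above is a one-liner and the remediation is a SplitShard call. A sketch, with the threshold, shard IDs, and rates as illustrative values:

```python
def classify_shards(bytes_per_sec_by_shard, limit=1_000_000, warn_ratio=0.8):
    """Flag shards whose write rate is near the 1 MB/s limit."""
    return {shard: 'split-candidate' if rate >= warn_ratio * limit else 'ok'
            for shard, rate in bytes_per_sec_by_shard.items()}

def split_hot_shard(stream_name, shard_id, new_starting_hash_key):
    import boto3  # deferred so classify_shards() runs without AWS
    boto3.client('kinesis', region_name='us-east-1').split_shard(
        StreamName=stream_name,
        ShardToSplit=shard_id,
        NewStartingHashKey=new_starting_hash_key,  # e.g. midpoint of the hot range
    )

status = classify_shards({
    'shardId-000000000000': 200_000,
    'shardId-000000000001': 950_000,
    'shardId-000000000002': 300_000,
})
print(status)  # the 950 KB/s shard is flagged as a split candidate
```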
7. Best Practices and Anti-Patterns
7.1 Best Practices
1) Partition Key Design
- Use high-cardinality keys (user IDs, device IDs)
- Ensure at least 10x unique keys relative to shard count
- Use entity IDs as partition keys when ordering is needed
2) Producer Optimization
- Use PutRecords (batch) API to minimize API calls
- Use KPL for record aggregation and collection optimization
- Implement proper retry logic (exponential backoff + jitter)
3) Consumer Optimization
- Use enhanced fan-out for multiple consumers
- Use KCL for automated distributed processing
- Optimize checkpointing frequency (too frequent increases DynamoDB costs)
4) Capacity Management
- Predictable workloads: Provisioned mode
- Irregular workloads: On-demand mode
- Trigger auto-scaling with CloudWatch alarms
5) Cost Optimization
- Consider On-Demand Advantage mode (released 2025)
- Set retention period only as long as needed
- Deregister unnecessary enhanced fan-out consumers
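For the capacity-management point above, resizing a provisioned stream is an UpdateShardCount call. Below is a sketch of a utilization-driven sizing helper plus the API call; the 50% target and function names are illustrative choices, and note the API accepts at most a doubling or a halving per call:

```python
import math

def scaled_shard_count(current, utilization, target_utilization=0.5):
    """Propose a shard count that brings utilization back to target,
    clamped to the 2x-up / 0.5x-down window UpdateShardCount allows."""
    desired = math.ceil(current * utilization / target_utilization)
    return max(min(desired, current * 2), max(current // 2, 1))

def apply_shard_count(stream_name, target):
    import boto3  # deferred so the sizing helper runs without AWS
    boto3.client('kinesis', region_name='us-east-1').update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target,
        ScalingType='UNIFORM_SCALING',
    )

# 10 shards at 90% utilization -> propose 18 to land near 50%
print(scaled_shard_count(10, utilization=0.9))  # -> 18
```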
7.2 Anti-Patterns
1) Single Partition Key
- All data concentrates on one shard
- Immediately hits shard limits
2) Excessive Shard Count
- Increased costs and management complexity
- Increased DynamoDB lease table load for KCL
3) Not Using Checkpoints
- Duplicate processing or data loss on failure
- Always implement proper checkpointing strategy
4) Missing Error Handling
- Ignoring ProvisionedThroughputExceededException
- Losing failed data without retries
5) Excessive GetRecords Calls
- Must respect 5 calls per second per shard limit
- Set appropriate polling intervals
8. Comprehensive Comparison Summary Table
| Item | Kinesis Data Streams | Kinesis Firehose | Kafka | SQS | Managed Flink |
|---|---|---|---|---|---|
| Type | Data streaming | Data delivery | Data streaming | Message queue | Stream processing |
| Management | Fully managed | Fully managed | Self/MSK | Fully managed | Fully managed |
| Latency | ms | 60s+ | ms | ms | ms |
| Ordering | Within shard | None | Within partition | FIFO only | Input dependent |
| Replay | Yes | No | Yes | No | Input dependent |
| Scaling | Add shards | Automatic | Partition/broker | Automatic | Add KPUs |
| Transform | None (consumer) | Lambda | Kafka Streams | None | Flink app |
| Cost model | Shard+data | Data volume | Instance | Requests | KPU hours |
| AWS integration | High | Very high | Low/Medium | High | High |
| Best for | Real-time ingest | Auto delivery | High-volume streaming | Task queue | Real-time analytics |
9. Summary
Service Selection Guide
When designing a real-time streaming architecture, choosing the right service for your requirements is crucial.
- If simple data delivery is the goal: Use Amazon Data Firehose for direct delivery to S3, Redshift, etc.
- If real-time processing with multiple consumers is needed: Kinesis Data Streams + KCL or Enhanced Fan-Out
- If complex stream analytics are needed: Managed Service for Apache Flink
- If high-volume processing and ecosystem are needed: Apache Kafka or Amazon MSK
- If inter-microservice messaging is needed: Amazon SQS
Understand each service's strengths and consider combining multiple services for optimal production patterns. For example, collecting with Kinesis Data Streams, analyzing in real-time with Managed Flink, and storing long-term in S3 via Data Firehose is a very common architecture combination.