Database Complete Comparison 2025: PostgreSQL vs MySQL vs MongoDB vs Redis — Selection Guide

Introduction

The database ecosystem in 2025 is more diverse and complex than ever before. The explosive growth of AI/ML workloads has made vector search essential, and new requirements like real-time streaming processing, multi-model support, and serverless architectures are making database selection increasingly challenging.

In this article, we systematically compare the architecture, performance, cost, and use cases of the major databases, centered on PostgreSQL, MySQL, MongoDB, and Redis and extending to DynamoDB and Cassandra/ScyllaDB. Rather than a simple feature listing, we provide a practical guide to assist real-world production decision-making.


DB-Engines Rankings (March 2025)

| Rank | Database | Type | Score | YoY Change |
|------|----------|------|-------|------------|
| 1 | Oracle | RDBMS | 1,245 | -3.2% |
| 2 | MySQL | RDBMS | 1,142 | -1.8% |
| 3 | PostgreSQL | RDBMS | 685 | +12.4% |
| 4 | MongoDB | Document | 397 | +2.1% |
| 5 | Redis | Key-Value | 168 | -5.3% |
| 6 | Elasticsearch | Search | 155 | +1.7% |
| 7 | SQLite | RDBMS | 142 | +8.9% |
| 8 | Cassandra | Wide-Column | 112 | -4.1% |

1. AI is Changing Database Selection

As RAG (Retrieval-Augmented Generation) pipelines become ubiquitous, vector search capabilities have become a core criterion for database selection. PostgreSQL's pgvector, MongoDB Atlas Vector Search, and Redis Stack are leading this competition.

2. The Rise of Serverless Databases

Usage-based pricing models like Neon (PostgreSQL serverless), PlanetScale (MySQL serverless), MongoDB Atlas Serverless, and DynamoDB on-demand are becoming the standard.

3. Multi-Model Databases

The multi-model approach of handling relational, document, graph, and time-series data within a single database is growing. PostgreSQL leads this space with extensions like JSONB, pgvector, and PostGIS.

4. The Emergence of Valkey

In response to Redis's 2024 license change (from BSD to the source-available RSALv2/SSPLv1 dual license), the Valkey fork was created under the Linux Foundation. With AWS ElastiCache, Google Cloud Memorystore, and others adopting Valkey, significant changes are underway in the Redis ecosystem.


2. RDBMS vs NoSQL vs NewSQL — When to Use What

Understanding the CAP Theorem in Practice

The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance. In practice, network partitions are unavoidable, so the real choice during a partition is between consistency (CP) and availability (AP).

              Consistency (C)
                   /\
                  /  \
                 /    \
        CP      /      \      CA
       system  /  CAP   \  system
              / impossible\
             /_____________\
     Partition (P)      Availability (A)
           (AP system: bottom edge)

| Type | Combination | Databases | Characteristics |
|------|-------------|-----------|-----------------|
| CA | Consistency + Availability | Traditional RDBMS (single node) | Unavailable during network partitions |
| CP | Consistency + Partition Tolerance | MongoDB, HBase | Sacrifices some availability |
| AP | Availability + Partition Tolerance | Cassandra, DynamoDB | Eventual consistency |

When to Choose RDBMS

  • Transaction integrity is mandatory (finance, payments, inventory management)
  • Complex JOINs and aggregate queries are frequent
  • Data schema is clear and stable
  • ACID compliance is a legal/regulatory requirement

When to Choose NoSQL

  • Schema changes frequently or data is unstructured
  • Horizontal scaling is a core requirement
  • Sub-millisecond read/write latency needed
  • High-volume fast write processing

When to Choose NewSQL

NewSQL databases like CockroachDB, TiDB, and YugabyteDB provide both RDBMS transaction guarantees and NoSQL horizontal scalability.

  • Globally distributed transactional systems
  • Need horizontal scaling while maintaining RDBMS compatibility
  • However, maturity and community size still lag behind traditional RDBMS

3. PostgreSQL Deep Dive

Architecture Overview

PostgreSQL uses a process-based architecture. Each client connection gets a dedicated backend process.

PostgreSQL
┌─────────────────────────────────────────────┐
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ Backend  │  │ Backend  │  │ Backend  │  │
│  │ Process  │  │ Process  │  │ Process  │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  │
│       │             │             │         │
│  ┌────┴─────────────┴─────────────┴────┐    │
│  │          Shared Memory              │    │
│  │  ┌──────────┐  ┌──────────────┐    │    │
│  │  │ Shared   │  │    WAL       │    │    │
│  │  │ Buffers  │  │  Buffers     │    │    │
│  │  └──────────┘  └──────────────┘    │    │
│  └─────────────────────────────────────┘    │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │     Background Workers              │    │
│  │  WAL Writer | Checkpointer |        │    │
│  │  Autovacuum | Stats Collector       │    │
│  └─────────────────────────────────────┘    │
└─────────────────────────────────────────────┘

MVCC (Multi-Version Concurrency Control)

PostgreSQL's MVCC maintains multiple versions of each row to minimize lock conflicts between readers and writers.

-- Set transaction isolation level
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Verify MVCC behavior
BEGIN;
SELECT xmin, xmax, ctid, * FROM orders WHERE id = 1;
-- xmin: creation transaction ID, xmax: deletion transaction ID
-- ctid: physical location (page, offset)
COMMIT;

JSONB — The Ultimate Relational + Document Combination

-- Create table with JSONB column
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create GIN index (optimizes JSONB search)
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- JSONB query examples
SELECT name, metadata->>'category' AS category,
       (metadata->'specs'->>'weight')::numeric AS weight
FROM products
WHERE metadata @> '{"category": "electronics"}'
  AND (metadata->'specs'->>'weight')::numeric < 500;

-- JSONB aggregation
SELECT metadata->>'category' AS category,
       COUNT(*),
       AVG((metadata->'price'->>'amount')::numeric) AS avg_price
FROM products
GROUP BY metadata->>'category';

pgvector — The Game Changer for the AI/RAG Era

-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536)  -- OpenAI text-embedding-3-small dimensions
);

-- HNSW index (fast approximate nearest neighbor search)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- IVFFlat index (memory efficient)
CREATE INDEX ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);

-- Cosine similarity search
-- (query_embedding is a placeholder for a bound vector parameter)
SELECT id, title,
       1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;

-- Hybrid search (vector + full-text)
-- Column aliases cannot be reused inside ORDER BY expressions,
-- so compute the scores in a subquery first
SELECT id, title, vector_score * 0.7 + text_score * 0.3 AS hybrid_score
FROM (
    SELECT id, title,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', 'search term')) AS text_score,
           1 - (embedding <=> query_embedding) AS vector_score
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'search term')
) scored
ORDER BY hybrid_score DESC
LIMIT 10;

pg_trgm — Fuzzy Search and Similarity Matching

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index
CREATE INDEX idx_products_name_trgm ON products
USING GIN (name gin_trgm_ops);

-- Similarity search (typo-tolerant)
SELECT name, similarity(name, 'PostgreSLQ') AS sim
FROM products
WHERE name % 'PostgreSLQ'
ORDER BY sim DESC;

Partitioning

-- Range partitioning example
CREATE TABLE orders (
    id BIGSERIAL,
    order_date DATE NOT NULL,
    customer_id INTEGER NOT NULL,
    total_amount NUMERIC(10,2),
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2025_q1 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

-- Automatic partition management (pg_partman)
CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
    p_parent_table => 'public.orders',
    p_control => 'order_date',
    p_type => 'native',
    p_interval => '3 months'
);

PostgreSQL 17 Key Features

  • Incremental Backup: Incremental backup support in pg_basebackup
  • Enhanced SQL/JSON Standard Support: JSON_TABLE, JSON_QUERY, etc.
  • Performance Improvements: Improved parallel hash joins, bulk loading optimizations
  • Security Enhancements: Improved LDAP/SCRAM authentication

4. MySQL Deep Dive

InnoDB Architecture

MySQL Server
┌─────────────────────────────────────────────┐
│  ┌───────────────────────────────────────┐  │
│  │          Connection Pool              │  │
│  │   Thread 1 | Thread 2 | Thread N     │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │     Query Cache (removed in 8.0+)    │  │
│  │          Parser -> Optimizer          │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │            InnoDB Engine              │  │
│  │  ┌──────────┐  ┌──────────────────┐  │  │
│  │  │ Buffer   │  │  Change Buffer   │  │  │
│  │  │ Pool     │  │                  │  │  │
│  │  └──────────┘  └──────────────────┘  │  │
│  │  ┌──────────┐  ┌──────────────────┐  │  │
│  │  │ Redo Log │  │  Undo Log        │  │  │
│  │  └──────────┘  └──────────────────┘  │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘

MySQL uses a thread-based architecture, contrasting with PostgreSQL's process-based approach. This results in lower connection overhead and higher memory efficiency.

MySQL vs PostgreSQL Key Differences

-- MySQL: UPSERT (INSERT ... ON DUPLICATE KEY)
-- Note: VALUES() in this position is deprecated since MySQL 8.0.20;
-- the row-alias form is: VALUES (...) AS new ... stock = stock + new.stock
INSERT INTO products (id, name, stock)
VALUES (1, 'Widget', 100)
ON DUPLICATE KEY UPDATE stock = stock + VALUES(stock);

-- PostgreSQL: UPSERT (INSERT ... ON CONFLICT)
INSERT INTO products (id, name, stock)
VALUES (1, 'Widget', 100)
ON CONFLICT (id)
DO UPDATE SET stock = products.stock + EXCLUDED.stock;

HeatWave ML — ML Directly in MySQL

-- Train model with HeatWave ML
CALL sys.ML_TRAIN('my_schema.customer_data',
    'churn_label',
    JSON_OBJECT('task', 'classification'),
    @model_handle);

-- Run prediction
CALL sys.ML_PREDICT_ROW(
    JSON_OBJECT('age', 35, 'tenure', 24, 'monthly_charge', 79.99),
    @model_handle,
    @prediction);

SELECT @prediction;

Group Replication

MySQL Group Replication uses a Paxos-based consensus protocol for high availability.

-- Check Group Replication status
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

-- Single-primary mode (default, recommended)
-- One write node, rest are read-only
SET GLOBAL group_replication_single_primary_mode = ON;

Read/Write Splitting with ProxySQL

-- ProxySQL routing rules
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES
    (1, 1, '^SELECT .* FOR UPDATE', 0),   -- Writer group
    (2, 1, '^SELECT', 1);                  -- Reader group

-- Server group configuration
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES
    (0, 'primary.mysql.local', 3306, 1),    -- Writer
    (1, 'replica1.mysql.local', 3306, 50),  -- Reader
    (1, 'replica2.mysql.local', 3306, 50);  -- Reader

MySQL 9.0 Key Features

  • JavaScript Stored Programs: GraalVM-based JavaScript stored procedures
  • Vector Data Type: Native vector data type support
  • EXPLAIN Improvements: FORMAT=TREE default, more intuitive execution plans
  • Security: OpenID Connect authentication support

5. MongoDB Deep Dive

Strengths of the Document Model

// MongoDB document structure — embedding pattern
db.orders.insertOne({
  orderId: 'ORD-2025-001',
  customer: {
    name: 'John Dev',
    email: 'john@example.com',
    address: {
      street: '123 Tech Street',
      city: 'San Francisco',
      zipCode: '94102',
    },
  },
  items: [
    {
      productId: 'PROD-001',
      name: 'Wireless Keyboard',
      price: 89.0,
      quantity: 2,
    },
    {
      productId: 'PROD-002',
      name: 'Monitor Stand',
      price: 45.0,
      quantity: 1,
    },
  ],
  payment: {
    method: 'card',
    status: 'completed',
    paidAt: ISODate('2025-03-15T10:30:00Z'),
  },
  totalAmount: 223.0,
  createdAt: ISODate('2025-03-15T10:25:00Z'),
})

Atlas Vector Search

// Create vector search index
db.collection.createSearchIndex({
  name: 'vector_index',
  definition: {
    mappings: {
      dynamic: true,
      fields: {
        embedding: {
          type: 'knnVector',
          dimensions: 1536,
          similarity: 'cosine',
        },
      },
    },
  },
})

// Vector search query
db.documents.aggregate([
  {
    $vectorSearch: {
      index: 'vector_index',
      path: 'embedding',
      queryVector: queryEmbedding, // 1536-dimensional vector
      numCandidates: 100,
      limit: 10,
      filter: { category: 'tech' },
    },
  },
  {
    $project: {
      title: 1,
      content: 1,
      score: { $meta: 'vectorSearchScore' },
    },
  },
])

Advanced Aggregation Pipeline

// Monthly revenue analysis pipeline
db.orders.aggregate([
  // Period filter
  {
    $match: {
      createdAt: {
        $gte: ISODate('2025-01-01'),
        $lt: ISODate('2026-01-01'),
      },
      'payment.status': 'completed',
    },
  },
  // Decompose order items
  { $unwind: '$items' },
  // Group by month
  {
    $group: {
      _id: {
        year: { $year: '$createdAt' },
        month: { $month: '$createdAt' },
        category: '$items.category',
      },
      totalRevenue: { $sum: { $multiply: ['$items.price', '$items.quantity'] } },
      orderCount: { $sum: 1 },
      avgOrderValue: { $avg: { $multiply: ['$items.price', '$items.quantity'] } },
    },
  },
  // Sort
  { $sort: { '_id.year': 1, '_id.month': 1 } },
  // Transform output
  {
    $project: {
      _id: 0,
      period: { $concat: [{ $toString: '$_id.year' }, '-', { $toString: '$_id.month' }] },
      category: '$_id.category',
      totalRevenue: 1,
      orderCount: 1,
      avgOrderValue: { $round: ['$avgOrderValue', 0] },
    },
  },
])

MongoDB Sharding

// Shard key selection — consider cardinality, distribution, query patterns
sh.enableSharding('ecommerce')

// Hash sharding (even distribution)
sh.shardCollection('ecommerce.orders', { customerId: 'hashed' })

// Range sharding (optimized for range queries)
sh.shardCollection('ecommerce.logs', { timestamp: 1 })

// Zone sharding (region-based data separation)
sh.addShardTag('shard-us', 'AMERICAS')
sh.addShardTag('shard-eu', 'EUROPE')
sh.addTagRange('ecommerce.users', { region: 'US' }, { region: 'UT' }, 'AMERICAS')

MongoDB 8.0 Key Features

  • Queryable Encryption: Query data while encrypted
  • Cluster-to-Cluster Sync: Multi-cloud synchronization
  • Time Series Collection Improvements: More efficient time-series data handling
  • Performance: 20-35% improvement in read/write performance

6. Redis Deep Dive

Data Structures and Use Cases

-- Strings: cache, sessions, counters
SET user:session:abc123 "session_data" EX 3600
INCR page:views:homepage

-- Hashes: object storage
HSET user:1001 name "John Dev" email "john@dev.com" login_count 42
HINCRBY user:1001 login_count 1

-- Sorted Sets: leaderboards, rankings
ZADD leaderboard 9500 "player:alice"
ZADD leaderboard 8700 "player:bob"
ZADD leaderboard 9200 "player:charlie"
ZREVRANGE leaderboard 0 9 WITHSCORES  -- Top 10

-- Streams: event streaming
XADD mystream * sensor_id "temp-01" temperature "23.5" humidity "45"
XREAD COUNT 10 BLOCK 5000 STREAMS mystream 0

-- HyperLogLog: approximate unique counts
PFADD unique:visitors:2025-03-23 "user1" "user2" "user3"
PFCOUNT unique:visitors:2025-03-23

Redis Stack — Search, JSON, TimeSeries

-- Redis JSON
JSON.SET product:001 $ '{"name":"Wireless Keyboard","price":89.00,"tags":["electronics","keyboard"],"specs":{"weight":350,"wireless":true}}'
JSON.GET product:001 $.specs.weight

-- Redis Search index creation
FT.CREATE idx:products ON JSON PREFIX 1 product:
  SCHEMA
    $.name AS name TEXT SORTABLE
    $.price AS price NUMERIC SORTABLE
    $.tags[*] AS tags TAG
    $.specs.weight AS weight NUMERIC

-- Full-text search
FT.SEARCH idx:products "@name:keyboard @price:[50 100]"

-- Vector search (Redis Stack)
FT.CREATE idx:docs ON HASH PREFIX 1 doc:
  SCHEMA
    title TEXT
    embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE

-- KNN search
FT.SEARCH idx:docs "*=>[KNN 10 @embedding $query_vec AS score]"
  PARAMS 2 query_vec "\x00\x00..."
  SORTBY score
  RETURN 2 title score

-- Redis TimeSeries
TS.CREATE temperature:sensor1 RETENTION 86400000 LABELS sensor_id temp-01 location office
TS.ADD temperature:sensor1 * 23.5
TS.RANGE temperature:sensor1 - + AGGREGATION avg 60000  -- 1-minute average

Persistence: RDB vs AOF

| Feature | RDB (Snapshot) | AOF (Append Only File) |
|---------|----------------|------------------------|
| Mechanism | Periodic memory dump | Logs every write command |
| Data Loss | Since last snapshot | Depends on fsync policy |
| Recovery Speed | Fast (binary load) | Slow (command replay) |
| File Size | Small (compressed) | Large (all commands logged) |
| CPU Impact | High during fork() | Low with everysec |

# Recommended redis.conf settings
save 900 1       # RDB if 1+ changes in 900 seconds
save 300 10      # RDB if 10+ changes in 300 seconds
appendonly yes   # Enable AOF
appendfsync everysec  # fsync every second (performance/safety balance)

Redis Cluster

# Create 6-node cluster (3 masters + 3 replicas)
redis-cli --cluster create \
  node1:6379 node2:6379 node3:6379 \
  node4:6379 node5:6379 node6:6379 \
  --cluster-replicas 1

Redis Cluster uses 16384 hash slots to distribute keys. The key space is mapped to slots using CRC16 hashing.
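
The slot mapping can be sketched directly. This follows the published cluster spec (CRC16/XMODEM modulo 16384, plus the hash-tag rule), though a real application should rely on its client driver's implementation rather than this illustration:

```python
# Sketch: how Redis Cluster maps a key to one of 16384 hash slots.
# Uses CRC16/XMODEM (the variant the cluster spec mandates) and the
# hash-tag rule: if the key contains {...}, only the text inside the
# first non-empty braces is hashed.

def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end != -1 and end != start + 1:  # non-empty hash tag found
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing a hash tag land in the same slot, which is what makes
# multi-key operations possible in a cluster:
print(key_slot('{user:1000}.following') == key_slot('{user:1000}.followers'))  # True
```
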

The Valkey Fork — The Future of Redis?

After Redis (formerly Redis Labs) dropped the BSD license in favor of the RSALv2/SSPLv1 dual license in 2024, Valkey was born under the Linux Foundation.

| Item | Redis | Valkey |
|------|-------|--------|
| License | RSALv2 / SSPLv1 (2024~) | BSD-3-Clause |
| Cloud Support | Redis Cloud | AWS, GCP, Azure |
| Compatibility | Original | Redis 7.2 compatible |
| Development | Redis Ltd. | Linux Foundation |
| Community | Limited contributions | Open contributions |

7. DynamoDB — The Serverless NoSQL Standard

Core Concepts

DynamoDB Table
┌───────────────────────────────────────────┐
│  ┌─────────────────────────────────────┐  │
│  │  Partition Key (PK)  Sort Key (SK)  │  │
│  ├─────────────────────────────────────┤  │
│  │  USER#001            PROFILE        │  │
│  │  USER#001            ORDER#001      │  │
│  │  USER#001            ORDER#002      │  │
│  │  USER#002            PROFILE        │  │
│  │  USER#002            ORDER#001      │  │
│  └─────────────────────────────────────┘  │
│                                           │
│  GSI: OrderDate-Index                     │
│  ┌─────────────────────────────────────┐  │
│  │  OrderDate (PK)     UserId (SK)     │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘

Single-Table Design Pattern

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('EcommerceTable')

# Store user profile
table.put_item(Item={
    'PK': 'USER#001',
    'SK': 'PROFILE',
    'name': 'John Dev',
    'email': 'john@dev.com',
    'type': 'USER'
})

# Store order in the same table
table.put_item(Item={
    'PK': 'USER#001',
    'SK': 'ORDER#2025-03-15#ORD001',
    'items': [{'name': 'Keyboard', 'price': 89.00}],
    'total': 89.00,
    'status': 'completed',
    'type': 'ORDER'
})

# Query all orders for a user
response = table.query(
    KeyConditionExpression='PK = :pk AND begins_with(SK, :sk)',
    ExpressionAttributeValues={
        ':pk': 'USER#001',
        ':sk': 'ORDER#'
    }
)

DynamoDB Pricing Models

| Mode | Read Cost | Write Cost | Best For |
|------|-----------|------------|----------|
| On-Demand | $0.25/1M RRU | $1.25/1M WRU | Unpredictable traffic |
| Provisioned | $0.00065/RCU/hour | $0.00065/WCU/hour | Stable traffic |
| Reserved | Up to 77% discount | Up to 77% discount | 1-year/3-year commitment |
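
As a quick sanity check on the on-demand model, the monthly bill is simple arithmetic over request volume. The rates below are copied from the table above and treated as illustrative (actual pricing varies by region, item size, and consistency mode):

```python
# Back-of-envelope DynamoDB on-demand cost estimate.
# Rates taken from the table above; illustrative only.
RRU_PRICE = 0.25 / 1_000_000   # $ per read request unit
WRU_PRICE = 1.25 / 1_000_000   # $ per write request unit

def monthly_on_demand_cost(reads_per_sec: float, writes_per_sec: float,
                           seconds_per_month: int = 30 * 24 * 3600) -> float:
    """Cost of sustained traffic, assuming one request unit per request."""
    reads = reads_per_sec * seconds_per_month
    writes = writes_per_sec * seconds_per_month
    return reads * RRU_PRICE + writes * WRU_PRICE

# Hypothetical example: 100 reads/s and 20 writes/s, sustained all month
cost = monthly_on_demand_cost(100, 20)
print(f"${cost:,.0f}/month")  # $130/month
```

Note how writes dominate: at 5x the per-unit price, 20 writes/s costs as much as 100 reads/s.
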

8. Cassandra / ScyllaDB — The Kings of High-Volume Writes

Cassandra Architecture

Cassandra combines a Dynamo-style distributed hash table with a BigTable-style data model.

┌────────┐  ┌────────┐  ┌────────┐
│ Node A │──│ Node B │──│ Node C │
│ Token: │  │ Token: │  │ Token: │
│ 0-33   │  │ 34-66  │  │ 67-100 │
└───┬────┘  └───┬────┘  └───┬────┘
    │           │           │
    └───────────┼───────────┘
            Gossip Protocol

-- Create keyspace (with replication strategy)
CREATE KEYSPACE ecommerce WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'eu-west': 3
};

-- Create table (partition key + clustering key)
CREATE TABLE ecommerce.user_activity (
    user_id UUID,
    activity_date DATE,
    activity_time TIMESTAMP,
    activity_type TEXT,
    details MAP<TEXT, TEXT>,
    PRIMARY KEY ((user_id, activity_date), activity_time)
) WITH CLUSTERING ORDER BY (activity_time DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_size': 1,
                    'compaction_window_unit': 'DAYS'};

ScyllaDB — High-Performance Cassandra-Compatible Alternative

ScyllaDB is a Cassandra-compatible database rewritten in C++. Its shard-per-core architecture delivers 3-10x higher performance on identical hardware.

| Comparison | Cassandra | ScyllaDB |
|------------|-----------|----------|
| Language | Java | C++ (Seastar) |
| Architecture | Thread-based | Shard-per-core |
| GC Impact | GC pauses exist | No GC |
| Same Hardware Performance | Baseline | 3-10x |
| Cassandra Compatibility | Original | CQL compatible |

9. Performance Benchmark Comparison

YCSB Benchmark (Yahoo Cloud Serving Benchmark)

The following comparison is on identical hardware (8 vCPU, 32GB RAM, NVMe SSD):

| Operation | PostgreSQL | MySQL | MongoDB | Redis | DynamoDB | ScyllaDB |
|-----------|-----------|-------|---------|-------|----------|----------|
| Read Latency (p99) | 2.1ms | 1.8ms | 1.5ms | 0.3ms | 4.2ms | 0.9ms |
| Write Latency (p99) | 3.5ms | 2.9ms | 2.1ms | 0.4ms | 5.8ms | 1.2ms |
| Read TPS | 45K | 52K | 58K | 250K | 25K | 120K |
| Write TPS | 28K | 35K | 42K | 200K | 20K | 85K |
| Mixed (50/50) TPS | 35K | 42K | 48K | 220K | 22K | 100K |

Note: Benchmark results vary significantly based on workload patterns, data size, index configuration, etc.

Scalability Comparison

| Scaling Method | PostgreSQL | MySQL | MongoDB | Redis | DynamoDB | Cassandra |
|----------------|-----------|-------|---------|-------|----------|-----------|
| Vertical | Excellent | Excellent | Good | Good | N/A | Good |
| Read Scale | Replicas | Replicas | Replica Set | Cluster | Auto | Add Nodes |
| Write Scale | Limited | Limited | Sharding | Cluster | Auto | Add Nodes |
| Max Data Size | Several TB | Several TB | Several PB | Hundreds GB | Unlimited | Several PB |

10. Use Case Selection Matrix

Optimal Database Selection Guide

| Use Case | 1st Choice | 2nd Choice | Reason |
|----------|-----------|-----------|--------|
| E-commerce | PostgreSQL | MySQL | Transactions + JSONB flexibility |
| Social Media | MongoDB | Cassandra | Unstructured data + high writes |
| IoT Sensor Data | ScyllaDB | DynamoDB | High-volume time-series writes |
| Real-time Analytics | PostgreSQL | ClickHouse | Complex aggregate queries |
| RAG/AI Search | PostgreSQL (pgvector) | MongoDB (Atlas Vector) | Vector + full-text search |
| Cache | Redis | DragonflyDB | Ultra-low latency, rich data structures |
| Session Store | Redis | DynamoDB | Fast read/write, TTL |
| Chat/Messaging | MongoDB | Cassandra | Flexible schema, time-ordered |
| Game Leaderboard | Redis | DynamoDB | Sorted Set, fast updates |
| Log Management | Elasticsearch | MongoDB | Full-text search, time-series |
| Geospatial | PostgreSQL (PostGIS) | MongoDB | GIS capabilities, spatial indexes |
| Graph Data | Neo4j | PostgreSQL (AGE) | Optimized relationship traversal |

11. Multi-DB Architecture Patterns

Polyglot Persistence

┌─────────────────────────────────────────────────┐
│                Application Layer                │
└───────┬──────────┬──────────┬──────────┬────────┘
        │          │          │          │
   ┌────▼────┐ ┌──▼───┐ ┌───▼───┐ ┌───▼────┐
   │PostgreSQL│ │Redis │ │MongoDB│ │Elastic │
   │ (Orders/ │ │(Cache)│ │(Prods)│ │(Search)│
   │ Payment) │ │      │ │       │ │        │
   └──────────┘ └──────┘ └───────┘ └────────┘

CQRS Pattern (Command Query Responsibility Segregation)

Write (Command):                 Read (Query):
┌──────────┐                     ┌──────────┐
│  Client  │                     │  Client  │
└────┬─────┘                     └────┬─────┘
     │                                │
┌────▼─────┐                     ┌────▼─────┐
│ Command  │                     │  Query   │
│ Service  │                     │ Service  │
└────┬─────┘                     └────┬─────┘
     │                                │
┌────▼─────┐    ──Events──▶    ┌───────────┐
│PostgreSQL│                   │  Redis /  │
│ (Source  │                   │  MongoDB  │
│of Truth) │                   │(Read Model)│
└──────────┘                   └───────────┘

Cache-Aside Pattern

import redis
import psycopg2
import json

r = redis.Redis(host='localhost', port=6379)
# Reuse one connection; in production, use a connection pool
# instead of connecting on every call
conn = psycopg2.connect("dbname=ecommerce")

def get_product(product_id):
    # 1. Check cache
    cache_key = f"product:{product_id}"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss — query DB (select JSON-serializable columns explicitly)
    with conn.cursor() as cur:
        cur.execute("SELECT id, name FROM products WHERE id = %s", (product_id,))
        product = cur.fetchone()

    if product:
        # 3. Store in cache (TTL 1 hour)
        r.setex(cache_key, 3600, json.dumps(product))

    return product

def update_product(product_id, data):
    # 1. Update DB first
    with conn.cursor() as cur:
        cur.execute("UPDATE products SET name=%s WHERE id=%s",
                    (data['name'], product_id))
    conn.commit()

    # 2. Invalidate cache; the next read repopulates it
    r.delete(f"product:{product_id}")

12. Cost Comparison (Managed Services)

Monthly Cost Estimate (Medium-Scale Service: 100GB Data, 5000 TPS)

| Service | Provider | Instance | Monthly Cost (USD) |
|---------|----------|----------|--------------------|
| RDS PostgreSQL | AWS | db.r6g.xlarge | ~350 |
| Cloud SQL PostgreSQL | GCP | db-custom-4-16384 | ~320 |
| RDS MySQL | AWS | db.r6g.xlarge | ~350 |
| Atlas MongoDB | MongoDB | M40 | ~480 |
| ElastiCache Redis | AWS | cache.r6g.xlarge | ~450 |
| DynamoDB On-Demand | AWS | - | ~180-600 |
| Neon PostgreSQL | Neon | Scale | ~69-189 |
| PlanetScale MySQL | PlanetScale | Scaler Pro | ~39-299 |

Cost Optimization Strategies

  1. Reserved Instances: 40-70% discount with 1-3 year commitments
  2. Serverless Options: Neon, PlanetScale, DynamoDB on-demand for variable traffic
  3. Read Replicas: Distribute read load to save on primary instance specs
  4. Right-sizing: Check actual utilization via CloudWatch/monitoring before adjusting
  5. Data Archiving: Move old data to S3/Glacier

13. Interview Questions — 20 Essential Questions

Fundamentals (1-5)

Q1: Explain the difference between ACID and BASE.

ACID (Atomicity, Consistency, Isolation, Durability): Traditional RDBMS transaction properties. All operations either fully succeed or fully fail, guaranteeing data consistency.

BASE (Basically Available, Soft state, Eventually consistent): The NoSQL approach. Always available but provides eventual consistency instead of immediate consistency.

Selection criteria: ACID for financial/payment systems, BASE for social media feeds/like counters.
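
The atomicity half of ACID can be demonstrated in a few lines. Here sqlite3 stands in for any ACID-compliant RDBMS, and the table and amounts are made up for illustration: when the second half of a transfer violates a constraint, the first half is rolled back too.

```python
# Atomicity demo: a transfer either fully succeeds or fully fails.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    with conn:  # opens a transaction; rolls back on exception
        # Credit account 1 (succeeds)...
        conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 1")
        # ...then debit account 2, which violates the CHECK constraint
        conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 2")
except sqlite3.IntegrityError:
    pass  # the whole transfer is rolled back, including the first UPDATE

print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {1: 100, 2: 0}
```
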

Q2: Explain the CAP theorem with real-world examples.
  • CP System (MongoDB): During network partitions, sacrifices some node availability for consistency. Writes unavailable during primary election.
  • AP System (Cassandra, DynamoDB): All nodes respond even during network partitions, but data may temporarily differ between nodes.
  • CA System: Single-node systems without network partitions. Practically nonexistent in distributed environments.

Pro tip: Extending to the PACELC theorem, consider the trade-off between Latency and Consistency when there is no partition (Else).

Q3: Explain index types and their appropriate use cases.
  • B-Tree Index: Optimal for range searches and sorting. Most default indexes. O(log N)
  • Hash Index: Only for equality comparisons (=). O(1). Not crash-safe (unlogged) before PostgreSQL 10; WAL-logged and safe to use from 10 onward
  • GIN (Generalized Inverted Index): Suitable for arrays, JSONB, full-text search
  • GiST (Generalized Search Tree): Suitable for spatial data (PostGIS), range types
  • BRIN (Block Range Index): Effective for physically sorted large tables (time-series). Memory efficient
  • Bloom Index: Equality comparison filtering across many columns
Q4: What is the N+1 query problem and how do you solve it?

N+1 problem: An inefficient pattern where 1 query fetches N results, then N additional queries are executed for each result.

Solutions:

  • JOIN: Combine data in a single query in relational DBs
  • Eager Loading: Pre-load related data in ORMs (Django: select_related, prefetch_related)
  • DataLoader Pattern: Batch requests together in GraphQL
  • MongoDB: Embedding pattern to store related data in a single document
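
The pattern and its JOIN fix can be sketched with an in-memory sqlite3 database (table names and data are invented for the demo):

```python
# Sketch of the N+1 problem and the single-JOIN fix.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

# N+1 pattern: 1 query for the authors, then 1 more query per author
queries = 1
result_n_plus_1 = {}
for author_id, name in conn.execute("SELECT id, name FROM authors"):
    posts = conn.execute(
        "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,))
    result_n_plus_1[name] = [title for (title,) in posts]
    queries += 1
print(queries)  # 3 round trips for just 2 authors (1 + N)

# The fix: one JOIN retrieves everything in a single round trip
result_join = {}
rows = conn.execute("""
    SELECT a.name, p.title FROM authors a
    JOIN posts p ON p.author_id = a.id ORDER BY p.id
""")
for name, title in rows:
    result_join.setdefault(name, []).append(title)
print(result_join == result_n_plus_1)  # True
```

With a real ORM the N+1 loop is usually hidden inside lazy attribute access, which is why it goes unnoticed until the query log is inspected.
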
Q5: Explain the trade-offs of database normalization vs denormalization.

Normalization (1NF-BCNF):

  • Pros: Minimize data redundancy, easier consistency maintenance, save storage
  • Cons: Increased JOINs degrade read performance, complex queries

Denormalization:

  • Pros: Improved read performance (fewer JOINs), simpler queries
  • Cons: Data redundancy, update anomaly risks, increased storage

Practical guide: Normalize to 3NF+ for OLTP, consider intentional denormalization for OLAP/read-heavy workloads.

Advanced Topics (6-10)

Q6: Explain the differences between PostgreSQL MVCC and MySQL InnoDB MVCC.

PostgreSQL MVCC:

  • Stores multiple row versions directly in the table (xmin, xmax)
  • Old versions cleaned up by VACUUM
  • Pros: No read locks, high concurrency
  • Cons: Table bloat, VACUUM overhead

MySQL InnoDB MVCC:

  • Stores previous versions in Undo Log
  • Current row always has the latest version
  • Purge thread cleans unnecessary Undo entries
  • Pros: No table bloat, simpler management
  • Cons: Performance degradation when Undo log grows large
Q7: What are the types of sharding strategies and how do you choose?
  1. Hash Sharding: Hash key for even distribution. Inefficient for range queries.
  2. Range Sharding: Split by key range. Hotspot risk. Good for time-series.
  3. Geography-based Sharding: Split by user location. Useful for regional compliance.
  4. Directory-based Sharding: Flexible routing via mapping table. High management complexity.

Shard key selection criteria: High cardinality, even distribution, alignment with query patterns.
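
The first two strategies reduce to a routing function. The shard count, the ranges, and the MD5-based hash below are arbitrary illustration choices, not any particular database's scheme:

```python
# Sketch: hash-sharding vs range-sharding routing over 4 hypothetical shards.
import hashlib

NUM_SHARDS = 4

def hash_shard(key: str) -> int:
    # Use a stable hash (Python's built-in hash() is randomized per process)
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], 'big') % NUM_SHARDS

# Range sharding: boundaries chosen by the operator; hotspot risk when
# most traffic hits the newest range (e.g. monotonically increasing IDs)
RANGES = [(0, 250_000), (250_000, 500_000), (500_000, 750_000), (750_000, 1_000_000)]

def range_shard(user_id: int) -> int:
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= user_id < hi:
            return i
    raise ValueError("key outside all ranges")

# Hash sharding scatters adjacent keys across shards (even load,
# but range scans must hit every shard)...
print([hash_shard(f"user:{i}") for i in range(4)])
# ...while range sharding keeps adjacent keys together (efficient scans)
print([range_shard(i) for i in (100, 300_000, 600_000, 990_000)])  # [0, 1, 2, 3]
```
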

Q8: Redis is single-threaded — how is it so fast?
  1. In-memory Processing: All data in RAM, no disk I/O
  2. I/O Multiplexing: epoll/kqueue handles thousands of connections in a single thread
  3. No Locking Needed: Single-threaded means no concurrency control overhead
  4. Optimized Data Structures: SDS, ziplist, intset — memory-efficient structures
  5. Redis 6.0+: I/O threading introduced (command processing remains single-threaded)

Note: Redis 7.0+ further improved multi-threaded I/O, enhancing network processing performance.
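
Point 2 can be seen in miniature with Python's selectors module. This is a toy stand-in for the idea, not Redis's actual ae event loop: one thread registers many connections and a single select() call reports which are ready.

```python
# Tiny illustration of I/O multiplexing: one thread watches many sockets
# and only touches the ones that are ready.
import selectors
import socket

sel = selectors.DefaultSelector()
a1, b1 = socket.socketpair()   # "client 1" <-> server side
a2, b2 = socket.socketpair()   # "client 2" <-> server side
sel.register(b1, selectors.EVENT_READ)
sel.register(b2, selectors.EVENT_READ)

a2.send(b"PING")  # only client 2 sends anything

# One select() call tells us exactly which connection is readable,
# without polling or blocking on the idle one
ready = sel.select(timeout=1)
for key, _ in ready:
    data = key.fileobj.recv(16)
    print(data)  # b'PING'
```

Under the hood selectors uses epoll on Linux and kqueue on BSD/macOS, the same primitives Redis builds on.
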

Q9: When should you choose embedding vs referencing in MongoDB?

Embedding (subdocuments):

  • Data queried together
  • 1:1 or 1:few relationships
  • Within 16MB document size limit
  • Example: Orders and order items, users and addresses

Referencing (separate collections):

  • Data queried/updated independently
  • 1:many or many:many relationships
  • Document size grows rapidly
  • Example: Blog posts and comments, products and reviews

Hybrid pattern: Embed frequently-used data, reference the rest.

Q10: Why is connection pooling important and how do you configure it?

DB connection creation cost: TCP handshake + authentication + session allocation = several to tens of ms. Creating connections per request severely degrades performance.

Configuration example (HikariCP):

  • maximumPoolSize: Typically CPU cores x 2 + number of disks
  • minimumIdle: Minimum idle connections
  • connectionTimeout: Maximum time waiting for a connection from the pool
  • idleTimeout: How long idle connections are maintained

PostgreSQL is process-based, so using PgBouncer in front for connection pooling management is a best practice.
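
The checkout/return mechanics behind pools like HikariCP can be sketched with a bounded queue; sqlite3 stands in for a networked database here, and the pool size and timeout are arbitrary:

```python
# Minimal connection-pool sketch: pay the connection cost once up front,
# then check connections out and return them instead of reconnecting.
import sqlite3
import queue
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, size: int, timeout: float = 5.0):
        self._pool = queue.Queue(maxsize=size)
        self._timeout = timeout  # analogous to HikariCP's connectionTimeout
        for _ in range(size):
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._pool.get(timeout=self._timeout)  # blocks if pool exhausted
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool instead of closing

pool = ConnectionPool(size=3)
with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # (1,)
```

A real pool adds health checks, idle-timeout eviction, and statement resets on return; the bounded-queue core stays the same.
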

Practical System Design (11-15)

Q11: Design a DB architecture for a high-traffic e-commerce service.
  1. Product Catalog: MongoDB (flexible attributes, fast reads)
  2. Orders/Payments: PostgreSQL (ACID transactions)
  3. Cart/Sessions: Redis (TTL, fast read/write)
  4. Product Search: Elasticsearch (full-text search, facets)
  5. Recommendation System: PostgreSQL pgvector (vector similarity)
  6. Analytics/Reporting: ClickHouse or BigQuery

Caching strategy: Cache-Aside pattern with Redis for product info caching. CDN for static content caching.

Q12: Explain database migration strategies.
  1. Dual-Write Pattern: Write to both DBs simultaneously, gradually switch reads
  2. CDC (Change Data Capture): Real-time data sync with tools like Debezium
  3. Shadow Traffic: Mirror traffic to new DB for performance validation
  4. Blue-Green: Prepare two environments and switch at once

Key points: Rollback plan is essential, automate data consistency verification, gradual traffic migration.
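The Dual-Write pattern from step 1 can be sketched like this. The dicts stand in for clients of the old and new databases, and the cutover flag is an illustrative simplification (in practice it would be per-tenant or percentage-based and driven by configuration).

```python
# Dual-write migration sketch: write both stores, read from the old store
# until the new one is verified, then flip reads gradually.
old_db, new_db = {}, {}
read_from_new = False  # flipped gradually once the new store is trusted

def write(key, value):
    old_db[key] = value               # source of truth during migration
    try:
        new_db[key] = value           # failures must be logged and replayed
    except Exception:
        pass                          # never fail the request on the new store

def read(key):
    return (new_db if read_from_new else old_db).get(key)
```

Note that dual writes alone do not guarantee consistency (the second write can fail); that is why the key points above call for automated consistency verification and a rollback plan.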

Q13: How do you handle replication lag between read replicas and write master?
  1. Read-your-writes Consistency: Read from master for a short time after writes
  2. Minimum Replication Confirmation: PostgreSQL's synchronous_commit setting
  3. Lag Monitoring: Check replay_lag in pg_stat_replication
  4. GTID-based Routing: Route to replicas that have replicated up to a specific GTID in MySQL
  5. Cache Utilization: Store latest values in cache after writes, read from cache instead of replica
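Technique 1 (read-your-writes consistency) can be sketched as simple routing logic: pin a user's reads to the primary for a short window after each write. The 5-second window is an illustrative value; in practice it should exceed the observed replication lag.

```python
import time

# Pin reads to the primary for a short window after a user's write,
# so the user always sees their own changes despite replication lag.
PIN_SECONDS = 5.0              # illustrative; should exceed typical lag
last_write_at: dict = {}       # user_id -> timestamp of last write

def record_write(user_id: str):
    last_write_at[user_id] = time.time()

def choose_target(user_id: str) -> str:
    """Return 'primary' while the user's write may not have replicated yet."""
    wrote = last_write_at.get(user_id, 0.0)
    return "primary" if time.time() - wrote < PIN_SECONDS else "replica"
```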
Q14: How do you safely perform schema changes on large tables?

PostgreSQL:

  • ALTER TABLE ADD COLUMN (without DEFAULT): Instant completion
  • ALTER TABLE ADD COLUMN ... DEFAULT value: Instant in PostgreSQL 11+ for non-volatile defaults (stored as metadata, no table rewrite)
  • CREATE INDEX CONCURRENTLY: Index creation without table locks

MySQL:

  • pt-online-schema-change (Percona): Trigger-based online changes
  • gh-ost (GitHub): Binary log-based online changes
  • MySQL 8.0+ INSTANT DDL: Metadata-only change operations

MongoDB:

  • Schema-flexible, so many changes need no DDL, but existing documents may still require application-level migration
  • Index creation: createIndex (since MongoDB 4.2, index builds block reads/writes only briefly at the start and end of the build)
Q15: What are the most important metrics for DB performance monitoring?
  1. Query Performance: Slow query logs, p95/p99 response times
  2. Connection Count: Active/idle connections, connection pool utilization
  3. Cache Hit Ratio: Buffer Pool/Shared Buffer hit ratio (target: 99%+)
  4. Replication Lag: Master-replica delay time
  5. Disk I/O: IOPS, latency, utilization
  6. Lock Waits: Row lock/table lock wait time and frequency
  7. Transaction Throughput: TPS, commit/rollback ratios

Tools: pg_stat_statements (PostgreSQL), Performance Schema (MySQL), Atlas (MongoDB), Redis INFO.
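The p95/p99 response times from metric 1 can be computed with a nearest-rank percentile, a common choice in monitoring pipelines. The latency samples below are made up for illustration.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(n * p / 100)."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil without math import
    return ordered[int(rank) - 1]

# Illustrative query latencies in milliseconds; note how a few slow
# outliers dominate the tail percentiles while the median stays low.
latencies_ms = [12, 15, 14, 200, 13, 16, 14, 500, 15, 13]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```

This is why tail percentiles, not averages, are the headline query-performance metric: the median here is 14 ms while p95 is 500 ms.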

Advanced Practical Questions (16-20)

Q16: What is the Hot Partition problem and how do you solve it?

A problem where traffic concentrates on specific partitions/shards, creating performance bottlenecks.

DynamoDB example: When writes concentrate on a specific partition key, that partition gets throttled.

Solutions:

  1. Write Sharding: Add random suffix to key (e.g., USER#001#3)
  2. Composite Key: Include date/time in key for distribution
  3. Caching Layer: Absorb read hotspots with Redis
  4. Distributed Counters: Split one counter into N parts and sum
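Solutions 1 and 4 combine naturally: spread writes for a hot key across N suffixed keys, then have reads fan out over every suffix and merge. A minimal sketch, with the shard count and key format as illustrative choices:

```python
import random

SHARD_COUNT = 4  # number of suffixes; illustrative

def sharded_key(base_key: str) -> str:
    """Write sharding: spread writes for a hot key across N partition keys."""
    return f"{base_key}#{random.randrange(SHARD_COUNT)}"

def all_shard_keys(base_key: str):
    """Reads must fan out over every suffix and merge the results."""
    return [f"{base_key}#{i}" for i in range(SHARD_COUNT)]

# Distributed counter: increments land on random shards, reads sum them all.
counters: dict = {}
for _ in range(100):
    k = sharded_key("USER#001")
    counters[k] = counters.get(k, 0) + 1

total = sum(counters.get(k, 0) for k in all_shard_keys("USER#001"))
print(total)  # 100
```

The trade-off: write throughput scales with the shard count, but every read now costs N lookups, so this fits write-heavy counters better than read-heavy lookups.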
Q17: Explain backup strategy types and RPO/RTO.
  • RPO (Recovery Point Objective): Maximum acceptable amount of data loss, expressed as a time window
  • RTO (Recovery Time Objective): Maximum acceptable downtime before service is restored

Backup types:

  1. Full Backup: All data. Short RTO, large storage
  2. Incremental Backup: Changes since last backup. Small storage, complex recovery
  3. Continuous Archiving (WAL): PostgreSQL WAL archiving enables Point-in-Time Recovery (PITR). RPO near zero
  4. Snapshots: EBS snapshots, RDS automatic backups. Fast recovery
Q18: Describe the process for analyzing and optimizing slow queries.
  1. Identify: Enable slow query logs (MySQL: slow_query_log, PostgreSQL: log_min_duration_statement)
  2. Analyze: Check execution plan with EXPLAIN ANALYZE
  3. Optimization order:
    • Add/improve indexes
    • Refactor queries (remove unnecessary JOINs, convert subqueries to CTEs)
    • Apply partitioning
    • Archive data
    • Consider denormalization (read optimization)
  4. Verify: Compare before/after performance
  5. Monitor: Continuously observe pg_stat_statements, Performance Schema
Q19: How do you ensure data consistency in eventually consistent systems?
  1. Read Repair: Fix inconsistencies found during reads with latest values
  2. Anti-Entropy (Merkle Tree): Background data comparison/correction between nodes
  3. Conflict Resolution: Last-write-wins (LWW), vector clocks, custom merge logic
  4. Quorum Reads/Writes: R + W > N condition for strong consistency
  5. Saga Pattern: Compensating transactions instead of distributed transactions for consistency
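The quorum condition from technique 4 is simple enough to state as code. The intuition: if R + W > N, any read quorum and any write quorum must share at least one replica, so a read always contacts a node holding the latest write.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Quorum rule: R + W > N means every read quorum overlaps every
    write quorum in at least one replica."""
    return r + w > n

# N=3 replicas: R=2, W=2 gives strong consistency; R=1, W=1 does not.
print(is_strongly_consistent(3, 2, 2), is_strongly_consistent(3, 1, 1))
```

Lowering R or W below the quorum threshold trades consistency for latency and availability, which is exactly the knob Cassandra-style consistency levels expose.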
Q20: What are the top 3 factors to consider when selecting a database in 2025?
  1. AI/Vector Search Support: As RAG pipelines become standard, vector search is essential. Choose between pgvector, Atlas Vector Search, Redis Vector Search.

  2. Operational Complexity vs Cost: Convenience of managed services (RDS, Atlas, ElastiCache) vs cost savings of self-hosting. Decide based on team size and DevOps capabilities.

  3. Data Locality and Regulations: Data sovereignty regulations like GDPR and regional privacy laws. DynamoDB Global Tables, CockroachDB's region-based distribution can be solutions.


14. Quiz

Quiz 1: What are the 2 main differences between JSONB and JSON types in PostgreSQL?
  1. Storage Method: JSON stores the input as plain text, while JSONB decomposes it into a binary format. JSONB writes are slightly slower, but reads and searches are much faster.
  2. Indexing: JSONB supports GIN indexes for fast key/value and containment searches. The JSON type supports only expression indexes on extracted values.

Additional: JSONB does not preserve key order and removes duplicate keys.

Quiz 2: What data structure does Redis Sorted Set use internally?

Skip List + Hash Table combination.

  • Skip List: Range search in sorted order O(log N)
  • Hash Table: Score lookup O(1)

When elements are 128 or fewer and values are 64 bytes or less, ziplist (listpack in 7.0+) is used to save memory.
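The two-structure design can be sketched in Python, with a hash map for O(1) member-to-score lookup and a sorted list playing the role of the skip list for ordered range scans. This is a teaching sketch, not how Redis implements it (Redis uses a true skip list, and `bisect.insort` makes inserts O(N) here).

```python
import bisect

class TinySortedSet:
    """Sketch of Sorted Set internals: hash map for O(1) score lookup,
    plus an ordered structure for range queries by score."""

    def __init__(self):
        self.scores = {}    # member -> score (the hash table part)
        self.ordered = []   # sorted (score, member) pairs (skip-list stand-in)

    def zadd(self, member, score):
        if member in self.scores:                      # update: remove old pair
            old = (self.scores[member], member)
            self.ordered.pop(bisect.bisect_left(self.ordered, old))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zscore(self, member):
        return self.scores.get(member)                 # O(1) via the hash map

    def zrangebyscore(self, lo, hi):
        i = bisect.bisect_left(self.ordered, (lo, ""))
        j = bisect.bisect_right(self.ordered, (hi, "\uffff"))
        return [m for _, m in self.ordered[i:j]]       # ordered scan
```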

Quiz 3: What is the difference between GSI and LSI in DynamoDB?
| Item | GSI | LSI |
| --- | --- | --- |
| Partition Key | Can differ from table | Same as table |
| Sort Key | Freely chosen | Freely chosen |
| Size Limit | None | 10 GB per partition key value |
| Creation Timing | Any time after table creation | Only at table creation |
| Consistency | Eventually consistent only | Strong consistency available |
| Provisioning | Separate RCU/WCU | Shared with table |

Practical tip: GSI is more flexible, so prefer GSI in most cases.

Quiz 4: When should you use and avoid transactions in MongoDB?

Use when:

  • Atomic updates across multiple documents/collections (payment and inventory simultaneous changes)
  • Data consistency is a business requirement

Avoid when:

  • Single document updates (MongoDB is already atomic at document level)
  • High-throughput operations (transactions have performance overhead)
  • Frequent multi-shard transactions in sharded environments

Design principle: Embed related data in a single document when possible to reduce the need for transactions.

Quiz 5: Why is VACUUM important in PostgreSQL?

PostgreSQL's MVCC does not immediately remove previous versions during updates/deletes. VACUUM cleans up these "dead tuples."

Without it:

  1. Table Bloat: Wasted disk space, degraded sequential scan performance
  2. Index Bloat: Increased index size, degraded search performance
  3. Transaction ID Wraparound: Risk of data loss after approximately 2 billion transactions (XID wraparound)

Autovacuum is enabled by default, but tables with high write loads need autovacuum parameter tuning.


The eight databases covered in this article each have unique strengths. There is no single "best database," only the database that is optimal for a given workload. Weigh workload patterns, scalability requirements, team capabilities, and cost constraints together. Rather than relying on one database for everything, combining each database's strengths through a Polyglot Persistence strategy is the best practice for 2025.