Database Complete Comparison 2025: PostgreSQL vs MySQL vs MongoDB vs Redis — Selection Guide

Introduction

The database ecosystem in 2025 is more diverse and complex than ever before. The explosive growth of AI/ML workloads has made vector search essential, and new requirements like real-time streaming processing, multi-model support, and serverless architectures are making database selection increasingly challenging.

In this article, we systematically compare the architecture, performance, cost, and use cases of the major databases, centered on PostgreSQL, MySQL, MongoDB, and Redis and extending to DynamoDB and Cassandra/ScyllaDB. Rather than a simple feature listing, we provide a practical guide to assist real-world production decision-making.


DB-Engines Rankings (March 2025)

| Rank | Database | Type | Score | YoY Change |
|------|----------|------|-------|------------|
| 1 | Oracle | RDBMS | 1,245 | -3.2% |
| 2 | MySQL | RDBMS | 1,142 | -1.8% |
| 3 | PostgreSQL | RDBMS | 685 | +12.4% |
| 4 | MongoDB | Document | 397 | +2.1% |
| 5 | Redis | Key-Value | 168 | -5.3% |
| 6 | Elasticsearch | Search | 155 | +1.7% |
| 7 | SQLite | RDBMS | 142 | +8.9% |
| 8 | Cassandra | Wide-Column | 112 | -4.1% |

1. AI is Changing Database Selection

As RAG (Retrieval-Augmented Generation) pipelines become ubiquitous, vector search capabilities have become a core criterion for database selection. PostgreSQL's pgvector, MongoDB Atlas Vector Search, and Redis Stack are leading this competition.

2. The Rise of Serverless Databases

Usage-based pricing models like Neon (PostgreSQL serverless), PlanetScale (MySQL serverless), MongoDB Atlas Serverless, and DynamoDB on-demand are becoming the standard.

3. Multi-Model Databases

The multi-model approach of handling relational, document, graph, and time-series data within a single database is growing. PostgreSQL leads this space with extensions like JSONB, pgvector, and PostGIS.

4. The Emergence of Valkey

In response to Redis's 2024 license change (from BSD to the source-available RSALv2/SSPLv1 dual license), the Valkey fork was created under the Linux Foundation. With AWS ElastiCache, Google Cloud Memorystore, and others adopting Valkey, significant changes are underway in the Redis ecosystem.


2. RDBMS vs NoSQL vs NewSQL — When to Use What

Understanding the CAP Theorem in Practice

The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance. In practice, network partitions are unavoidable, so the real choice during a partition is between consistency (CP) and availability (AP).

              Consistency (C)
                   /\
                  /  \
                 /    \
        CP      /      \      CA
       system  /  CAP   \  system
              / impossible\
             /_____________\
     Partition (P)      Availability (A)
           (AP system: bottom edge)

| Type | Combination | Databases | Characteristics |
|------|-------------|-----------|-----------------|
| CA | Consistency + Availability | Traditional RDBMS (single node) | Unavailable during network partitions |
| CP | Consistency + Partition Tolerance | MongoDB, HBase | Sacrifices some availability |
| AP | Availability + Partition Tolerance | Cassandra, DynamoDB | Eventual consistency |

When to Choose RDBMS

  • Transaction integrity is mandatory (finance, payments, inventory management)
  • Complex JOINs and aggregate queries are frequent
  • Data schema is clear and stable
  • ACID compliance is a legal/regulatory requirement

When to Choose NoSQL

  • Schema changes frequently or data is unstructured
  • Horizontal scaling is a core requirement
  • Sub-millisecond read/write latency needed
  • High-volume fast write processing

When to Choose NewSQL

NewSQL databases like CockroachDB, TiDB, and YugabyteDB provide both RDBMS transaction guarantees and NoSQL horizontal scalability.

  • Globally distributed transactional systems
  • Need horizontal scaling while maintaining RDBMS compatibility
  • However, maturity and community size still lag behind traditional RDBMS

3. PostgreSQL Deep Dive

Architecture Overview

PostgreSQL uses a process-based architecture. Each client connection gets a dedicated backend process.

PostgreSQL
┌─────────────────────────────────────────────┐
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ Backend  │  │ Backend  │  │ Backend  │  │
│  │ Process  │  │ Process  │  │ Process  │  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  │
│       │             │             │         │
│  ┌────┴─────────────┴─────────────┴────┐    │
│  │          Shared Memory              │    │
│  │  ┌──────────┐  ┌──────────────┐    │    │
│  │  │ Shared   │  │    WAL       │    │    │
│  │  │ Buffers  │  │  Buffers     │    │    │
│  │  └──────────┘  └──────────────┘    │    │
│  └─────────────────────────────────────┘    │
│                                             │
│  ┌─────────────────────────────────────┐    │
│  │     Background Workers              │    │
│  │  WAL Writer | Checkpointer |        │    │
│  │  Autovacuum | Stats Collector       │    │
│  └─────────────────────────────────────┘    │
└─────────────────────────────────────────────┘

MVCC (Multi-Version Concurrency Control)

PostgreSQL's MVCC maintains multiple versions of each row to minimize lock conflicts between readers and writers.

-- Set transaction isolation level
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Verify MVCC behavior
BEGIN;
SELECT xmin, xmax, ctid, * FROM orders WHERE id = 1;
-- xmin: creation transaction ID, xmax: deletion transaction ID
-- ctid: physical location (page, offset)
COMMIT;

JSONB — The Ultimate Relational + Document Combination

-- Create table with JSONB column
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create GIN index (optimizes JSONB search)
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);

-- JSONB query examples
SELECT name, metadata->>'category' AS category,
       (metadata->'specs'->>'weight')::numeric AS weight
FROM products
WHERE metadata @> '{"category": "electronics"}'
  AND (metadata->'specs'->>'weight')::numeric < 500;

-- JSONB aggregation
SELECT metadata->>'category' AS category,
       COUNT(*),
       AVG((metadata->'price'->>'amount')::numeric) AS avg_price
FROM products
GROUP BY metadata->>'category';

pgvector — The Game Changer for the AI/RAG Era

-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536)  -- OpenAI text-embedding-3-small dimensions
);

-- HNSW index (fast approximate nearest neighbor search)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- IVFFlat index (memory efficient)
CREATE INDEX ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);

-- Cosine similarity search
-- (query_embedding is a placeholder for a bound vector parameter)
SELECT id, title,
       1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;

-- Hybrid search (vector + full-text)
-- Column aliases cannot be reused inside ORDER BY expressions,
-- so compute the scores in a subquery first
SELECT id, title, vector_score * 0.7 + text_score * 0.3 AS hybrid_score
FROM (
    SELECT id, title,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', 'search term')) AS text_score,
           1 - (embedding <=> query_embedding) AS vector_score
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'search term')
) scored
ORDER BY hybrid_score DESC
LIMIT 10;

pg_trgm — Fuzzy Search and Similarity Matching

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index
CREATE INDEX idx_products_name_trgm ON products
USING GIN (name gin_trgm_ops);

-- Similarity search (typo-tolerant)
SELECT name, similarity(name, 'PostgreSLQ') AS sim
FROM products
WHERE name % 'PostgreSLQ'
ORDER BY sim DESC;

Partitioning

-- Range partitioning example
CREATE TABLE orders (
    id BIGSERIAL,
    order_date DATE NOT NULL,
    customer_id INTEGER NOT NULL,
    total_amount NUMERIC(10,2),
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2025_q1 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
    FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

-- Automatic partition management (pg_partman)
CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
    p_parent_table => 'public.orders',
    p_control => 'order_date',
    p_type => 'native',
    p_interval => '3 months'
);

PostgreSQL 17 Key Features

  • Incremental Backup: Incremental backup support in pg_basebackup
  • Enhanced SQL/JSON Standard Support: JSON_TABLE, JSON_QUERY, etc.
  • Performance Improvements: Improved parallel hash joins, bulk loading optimizations
  • Security Enhancements: Improved LDAP/SCRAM authentication

4. MySQL Deep Dive

InnoDB Architecture

MySQL Server
┌─────────────────────────────────────────────┐
│  ┌───────────────────────────────────────┐  │
│  │          Connection Pool              │  │
│  │   Thread 1 | Thread 2 | Thread N     │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │     Query Cache (removed in 8.0+)    │  │
│  │          Parser -> Optimizer          │  │
│  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────┐  │
│  │            InnoDB Engine              │  │
│  │  ┌──────────┐  ┌──────────────────┐  │  │
│  │  │ Buffer   │  │  Change Buffer   │  │  │
│  │  │ Pool     │  │                  │  │  │
│  │  └──────────┘  └──────────────────┘  │  │
│  │  ┌──────────┐  ┌──────────────────┐  │  │
│  │  │ Redo Log │  │  Undo Log        │  │  │
│  │  └──────────┘  └──────────────────┘  │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘

MySQL uses a thread-based architecture, contrasting with PostgreSQL's process-based approach. This results in lower connection overhead and higher memory efficiency.

MySQL vs PostgreSQL Key Differences

-- MySQL: UPSERT (INSERT ... ON DUPLICATE KEY)
-- Note: VALUES() in this position is deprecated since MySQL 8.0.20;
-- the row-alias form is: VALUES (...) AS new ... stock = stock + new.stock
INSERT INTO products (id, name, stock)
VALUES (1, 'Widget', 100)
ON DUPLICATE KEY UPDATE stock = stock + VALUES(stock);

-- PostgreSQL: UPSERT (INSERT ... ON CONFLICT)
INSERT INTO products (id, name, stock)
VALUES (1, 'Widget', 100)
ON CONFLICT (id)
DO UPDATE SET stock = products.stock + EXCLUDED.stock;

HeatWave ML — ML Directly in MySQL

-- Train model with HeatWave ML
CALL sys.ML_TRAIN('my_schema.customer_data',
    'churn_label',
    JSON_OBJECT('task', 'classification'),
    @model_handle);

-- Run prediction
CALL sys.ML_PREDICT_ROW(
    JSON_OBJECT('age', 35, 'tenure', 24, 'monthly_charge', 79.99),
    @model_handle,
    @prediction);

SELECT @prediction;

Group Replication

MySQL Group Replication uses a Paxos-based consensus protocol for high availability.

-- Check Group Replication status
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

-- Single-primary mode (default, recommended)
-- One write node, rest are read-only
SET GLOBAL group_replication_single_primary_mode = ON;

Read/Write Splitting with ProxySQL

-- ProxySQL routing rules
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES
    (1, 1, '^SELECT .* FOR UPDATE', 0),   -- Writer group
    (2, 1, '^SELECT', 1);                  -- Reader group

-- Server group configuration
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES
    (0, 'primary.mysql.local', 3306, 1),    -- Writer
    (1, 'replica1.mysql.local', 3306, 50),  -- Reader
    (1, 'replica2.mysql.local', 3306, 50);  -- Reader

MySQL 9.0 Key Features

  • JavaScript Stored Programs: GraalVM-based JavaScript stored procedures
  • Vector Data Type: Native vector data type support
  • EXPLAIN Improvements: FORMAT=TREE default, more intuitive execution plans
  • Security: OpenID Connect authentication support

5. MongoDB Deep Dive

Strengths of the Document Model

// MongoDB document structure — embedding pattern
db.orders.insertOne({
  orderId: 'ORD-2025-001',
  customer: {
    name: 'John Dev',
    email: 'john@example.com',
    address: {
      street: '123 Tech Street',
      city: 'San Francisco',
      zipCode: '94102',
    },
  },
  items: [
    {
      productId: 'PROD-001',
      name: 'Wireless Keyboard',
      price: 89.0,
      quantity: 2,
    },
    {
      productId: 'PROD-002',
      name: 'Monitor Stand',
      price: 45.0,
      quantity: 1,
    },
  ],
  payment: {
    method: 'card',
    status: 'completed',
    paidAt: ISODate('2025-03-15T10:30:00Z'),
  },
  totalAmount: 223.0,
  createdAt: ISODate('2025-03-15T10:25:00Z'),
})

Atlas Vector Search

// Create vector search index
db.collection.createSearchIndex({
  name: 'vector_index',
  definition: {
    mappings: {
      dynamic: true,
      fields: {
        embedding: {
          type: 'knnVector',
          dimensions: 1536,
          similarity: 'cosine',
        },
      },
    },
  },
})

// Vector search query
db.documents.aggregate([
  {
    $vectorSearch: {
      index: 'vector_index',
      path: 'embedding',
      queryVector: queryEmbedding, // 1536-dimensional vector
      numCandidates: 100,
      limit: 10,
      filter: { category: 'tech' },
    },
  },
  {
    $project: {
      title: 1,
      content: 1,
      score: { $meta: 'vectorSearchScore' },
    },
  },
])

Advanced Aggregation Pipeline

// Monthly revenue analysis pipeline
db.orders.aggregate([
  // Period filter
  {
    $match: {
      createdAt: {
        $gte: ISODate('2025-01-01'),
        $lt: ISODate('2026-01-01'),
      },
      'payment.status': 'completed',
    },
  },
  // Decompose order items
  { $unwind: '$items' },
  // Group by month
  {
    $group: {
      _id: {
        year: { $year: '$createdAt' },
        month: { $month: '$createdAt' },
        category: '$items.category',
      },
      totalRevenue: { $sum: { $multiply: ['$items.price', '$items.quantity'] } },
      orderCount: { $sum: 1 },
      avgOrderValue: { $avg: { $multiply: ['$items.price', '$items.quantity'] } },
    },
  },
  // Sort
  { $sort: { '_id.year': 1, '_id.month': 1 } },
  // Transform output
  {
    $project: {
      _id: 0,
      period: { $concat: [{ $toString: '$_id.year' }, '-', { $toString: '$_id.month' }] },
      category: '$_id.category',
      totalRevenue: 1,
      orderCount: 1,
      avgOrderValue: { $round: ['$avgOrderValue', 0] },
    },
  },
])

MongoDB Sharding

// Shard key selection — consider cardinality, distribution, query patterns
sh.enableSharding('ecommerce')

// Hash sharding (even distribution)
sh.shardCollection('ecommerce.orders', { customerId: 'hashed' })

// Range sharding (optimized for range queries)
sh.shardCollection('ecommerce.logs', { timestamp: 1 })

// Zone sharding (region-based data separation)
sh.addShardTag('shard-us', 'AMERICAS')
sh.addShardTag('shard-eu', 'EUROPE')
sh.addTagRange('ecommerce.users', { region: 'US' }, { region: 'UT' }, 'AMERICAS')

MongoDB 8.0 Key Features

  • Queryable Encryption: Query data while encrypted
  • Cluster-to-Cluster Sync: Multi-cloud synchronization
  • Time Series Collection Improvements: More efficient time-series data handling
  • Performance: 20-35% improvement in read/write performance

6. Redis Deep Dive

Data Structures and Use Cases

-- Strings: cache, sessions, counters
SET user:session:abc123 "session_data" EX 3600
INCR page:views:homepage

-- Hashes: object storage
HSET user:1001 name "John Dev" email "john@dev.com" login_count 42
HINCRBY user:1001 login_count 1

-- Sorted Sets: leaderboards, rankings
ZADD leaderboard 9500 "player:alice"
ZADD leaderboard 8700 "player:bob"
ZADD leaderboard 9200 "player:charlie"
ZREVRANGE leaderboard 0 9 WITHSCORES  -- Top 10

-- Streams: event streaming
XADD mystream * sensor_id "temp-01" temperature "23.5" humidity "45"
XREAD COUNT 10 BLOCK 5000 STREAMS mystream 0

-- HyperLogLog: approximate unique counts
PFADD unique:visitors:2025-03-23 "user1" "user2" "user3"
PFCOUNT unique:visitors:2025-03-23

Redis Stack — Search, JSON, TimeSeries

-- Redis JSON
JSON.SET product:001 $ '{"name":"Wireless Keyboard","price":89.00,"tags":["electronics","keyboard"],"specs":{"weight":350,"wireless":true}}'
JSON.GET product:001 $.specs.weight

-- Redis Search index creation
FT.CREATE idx:products ON JSON PREFIX 1 product:
  SCHEMA
    $.name AS name TEXT SORTABLE
    $.price AS price NUMERIC SORTABLE
    $.tags[*] AS tags TAG
    $.specs.weight AS weight NUMERIC

-- Full-text search
FT.SEARCH idx:products "@name:keyboard @price:[50 100]"

-- Vector search (Redis Stack)
FT.CREATE idx:docs ON HASH PREFIX 1 doc:
  SCHEMA
    title TEXT
    embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE

-- KNN search
FT.SEARCH idx:docs "*=>[KNN 10 @embedding $query_vec AS score]"
  PARAMS 2 query_vec "\x00\x00..."
  SORTBY score
  RETURN 2 title score

-- Redis TimeSeries
TS.CREATE temperature:sensor1 RETENTION 86400000 LABELS sensor_id temp-01 location office
TS.ADD temperature:sensor1 * 23.5
TS.RANGE temperature:sensor1 - + AGGREGATION avg 60000  -- 1-minute average

Persistence: RDB vs AOF

| Feature | RDB (Snapshot) | AOF (Append Only File) |
|---------|----------------|------------------------|
| Mechanism | Periodic memory dump | Logs every write command |
| Data Loss | Since last snapshot | Depends on fsync policy |
| Recovery Speed | Fast (binary load) | Slow (command replay) |
| File Size | Small (compressed) | Large (all commands logged) |
| CPU Impact | High during fork() | Low with everysec |

# Recommended redis.conf settings
save 900 1       # RDB if 1+ changes in 900 seconds
save 300 10      # RDB if 10+ changes in 300 seconds
appendonly yes   # Enable AOF
appendfsync everysec  # fsync every second (performance/safety balance)

Redis Cluster

# Create 6-node cluster (3 masters + 3 replicas)
redis-cli --cluster create \
  node1:6379 node2:6379 node3:6379 \
  node4:6379 node5:6379 node6:6379 \
  --cluster-replicas 1

Redis Cluster uses 16384 hash slots to distribute keys. The key space is mapped to slots using CRC16 hashing.
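
The slot mapping can be sketched directly. This follows the published cluster spec (CRC16/XMODEM modulo 16384, plus the hash-tag rule), though a real application should rely on its client driver's implementation rather than this illustration:

```python
# Sketch: how Redis Cluster maps a key to one of 16384 hash slots.
# Uses CRC16/XMODEM (the variant the cluster spec mandates) and the
# hash-tag rule: if the key contains {...}, only the text inside the
# first non-empty braces is hashed.

def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end != -1 and end != start + 1:  # non-empty hash tag found
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing a hash tag land in the same slot, which is what makes
# multi-key operations possible in a cluster:
print(key_slot('{user:1000}.following') == key_slot('{user:1000}.followers'))  # True
```
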

The Valkey Fork — The Future of Redis?

After Redis (formerly Redis Labs) dropped the BSD license in favor of the RSALv2/SSPLv1 dual license in 2024, Valkey was born under the Linux Foundation.

| Item | Redis | Valkey |
|------|-------|--------|
| License | RSALv2 / SSPLv1 (2024~) | BSD-3-Clause |
| Cloud Support | Redis Cloud | AWS, GCP, Azure |
| Compatibility | Original | Redis 7.2 compatible |
| Development | Redis Ltd. | Linux Foundation |
| Community | Limited contributions | Open contributions |

7. DynamoDB — The Serverless NoSQL Standard

Core Concepts

DynamoDB Table
┌───────────────────────────────────────────┐
│  ┌─────────────────────────────────────┐  │
│  │  Partition Key (PK)  Sort Key (SK)  │  │
│  ├─────────────────────────────────────┤  │
│  │  USER#001            PROFILE        │  │
│  │  USER#001            ORDER#001      │  │
│  │  USER#001            ORDER#002      │  │
│  │  USER#002            PROFILE        │  │
│  │  USER#002            ORDER#001      │  │
│  └─────────────────────────────────────┘  │
│                                           │
│  GSI: OrderDate-Index                     │
│  ┌─────────────────────────────────────┐  │
│  │  OrderDate (PK)     UserId (SK)     │  │
│  └─────────────────────────────────────┘  │
└───────────────────────────────────────────┘

Single-Table Design Pattern

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('EcommerceTable')

# Store user profile
table.put_item(Item={
    'PK': 'USER#001',
    'SK': 'PROFILE',
    'name': 'John Dev',
    'email': 'john@dev.com',
    'type': 'USER'
})

# Store order in the same table
table.put_item(Item={
    'PK': 'USER#001',
    'SK': 'ORDER#2025-03-15#ORD001',
    'items': [{'name': 'Keyboard', 'price': 89.00}],
    'total': 89.00,
    'status': 'completed',
    'type': 'ORDER'
})

# Query all orders for a user
response = table.query(
    KeyConditionExpression='PK = :pk AND begins_with(SK, :sk)',
    ExpressionAttributeValues={
        ':pk': 'USER#001',
        ':sk': 'ORDER#'
    }
)

DynamoDB Pricing Models

| Mode | Read Cost | Write Cost | Best For |
|------|-----------|------------|----------|
| On-Demand | $0.25/1M RRU | $1.25/1M WRU | Unpredictable traffic |
| Provisioned | $0.00065/RCU/hour | $0.00065/WCU/hour | Stable traffic |
| Reserved | Up to 77% discount | Up to 77% discount | 1-year/3-year commitment |
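
As a quick sanity check on the on-demand model, the monthly bill is simple arithmetic over request volume. The rates below are copied from the table above and treated as illustrative (actual pricing varies by region, item size, and consistency mode):

```python
# Back-of-envelope DynamoDB on-demand cost estimate.
# Rates taken from the table above; illustrative only.
RRU_PRICE = 0.25 / 1_000_000   # $ per read request unit
WRU_PRICE = 1.25 / 1_000_000   # $ per write request unit

def monthly_on_demand_cost(reads_per_sec: float, writes_per_sec: float,
                           seconds_per_month: int = 30 * 24 * 3600) -> float:
    """Cost of sustained traffic, assuming one request unit per request."""
    reads = reads_per_sec * seconds_per_month
    writes = writes_per_sec * seconds_per_month
    return reads * RRU_PRICE + writes * WRU_PRICE

# Hypothetical example: 100 reads/s and 20 writes/s, sustained all month
cost = monthly_on_demand_cost(100, 20)
print(f"${cost:,.0f}/month")  # $130/month
```

Note how writes dominate: at 5x the per-unit price, 20 writes/s costs as much as 100 reads/s.
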

8. Cassandra / ScyllaDB — The Kings of High-Volume Writes

Cassandra Architecture

Cassandra combines a Dynamo-style distributed hash table with a BigTable-style data model.

┌────────┐  ┌────────┐  ┌────────┐
│ Node A │──│ Node B │──│ Node C │
│ Token: │  │ Token: │  │ Token: │
│ 0-33   │  │ 34-66  │  │ 67-100 │
└───┬────┘  └───┬────┘  └───┬────┘
    │           │           │
    └───────────┼───────────┘
            Gossip Protocol

-- Create keyspace (with replication strategy)
CREATE KEYSPACE ecommerce WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3,
    'eu-west': 3
};

-- Create table (partition key + clustering key)
CREATE TABLE ecommerce.user_activity (
    user_id UUID,
    activity_date DATE,
    activity_time TIMESTAMP,
    activity_type TEXT,
    details MAP<TEXT, TEXT>,
    PRIMARY KEY ((user_id, activity_date), activity_time)
) WITH CLUSTERING ORDER BY (activity_time DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_size': 1,
                    'compaction_window_unit': 'DAYS'};

ScyllaDB — High-Performance Cassandra-Compatible Alternative

ScyllaDB is a Cassandra-compatible database rewritten in C++. Its shard-per-core architecture delivers 3-10x higher performance on identical hardware.

| Comparison | Cassandra | ScyllaDB |
|------------|-----------|----------|
| Language | Java | C++ (Seastar) |
| Architecture | Thread-based | Shard-per-core |
| GC Impact | GC pauses exist | No GC |
| Same Hardware Performance | Baseline | 3-10x |
| Cassandra Compatibility | Original | CQL compatible |

9. Performance Benchmark Comparison

YCSB Benchmark (Yahoo Cloud Serving Benchmark)

The following comparison is on identical hardware (8 vCPU, 32GB RAM, NVMe SSD):

| Operation | PostgreSQL | MySQL | MongoDB | Redis | DynamoDB | ScyllaDB |
|-----------|-----------|-------|---------|-------|----------|----------|
| Read Latency (p99) | 2.1ms | 1.8ms | 1.5ms | 0.3ms | 4.2ms | 0.9ms |
| Write Latency (p99) | 3.5ms | 2.9ms | 2.1ms | 0.4ms | 5.8ms | 1.2ms |
| Read TPS | 45K | 52K | 58K | 250K | 25K | 120K |
| Write TPS | 28K | 35K | 42K | 200K | 20K | 85K |
| Mixed (50/50) TPS | 35K | 42K | 48K | 220K | 22K | 100K |

Note: Benchmark results vary significantly based on workload patterns, data size, index configuration, etc.

Scalability Comparison

| Scaling Method | PostgreSQL | MySQL | MongoDB | Redis | DynamoDB | Cassandra |
|----------------|-----------|-------|---------|-------|----------|-----------|
| Vertical | Excellent | Excellent | Good | Good | N/A | Good |
| Read Scale | Replicas | Replicas | Replica Set | Cluster | Auto | Add Nodes |
| Write Scale | Limited | Limited | Sharding | Cluster | Auto | Add Nodes |
| Max Data Size | Several TB | Several TB | Several PB | Hundreds GB | Unlimited | Several PB |

10. Use Case Selection Matrix

Optimal Database Selection Guide

| Use Case | 1st Choice | 2nd Choice | Reason |
|----------|-----------|-----------|--------|
| E-commerce | PostgreSQL | MySQL | Transactions + JSONB flexibility |
| Social Media | MongoDB | Cassandra | Unstructured data + high writes |
| IoT Sensor Data | ScyllaDB | DynamoDB | High-volume time-series writes |
| Real-time Analytics | PostgreSQL | ClickHouse | Complex aggregate queries |
| RAG/AI Search | PostgreSQL (pgvector) | MongoDB (Atlas Vector) | Vector + full-text search |
| Cache | Redis | DragonflyDB | Ultra-low latency, rich data structures |
| Session Store | Redis | DynamoDB | Fast read/write, TTL |
| Chat/Messaging | MongoDB | Cassandra | Flexible schema, time-ordered |
| Game Leaderboard | Redis | DynamoDB | Sorted Set, fast updates |
| Log Management | Elasticsearch | MongoDB | Full-text search, time-series |
| Geospatial | PostgreSQL (PostGIS) | MongoDB | GIS capabilities, spatial indexes |
| Graph Data | Neo4j | PostgreSQL (AGE) | Optimized relationship traversal |

11. Multi-DB Architecture Patterns

Polyglot Persistence

┌─────────────────────────────────────────────────┐
│                Application Layer                │
└───────┬──────────┬──────────┬──────────┬────────┘
        │          │          │          │
   ┌────▼────┐ ┌──▼───┐ ┌───▼───┐ ┌───▼────┐
   │PostgreSQL│ │Redis │ │MongoDB│ │Elastic │
   │ (Orders/ │ │(Cache)│ │(Prods)│ │(Search)│
   │ Payment) │ │      │ │       │ │        │
   └──────────┘ └──────┘ └───────┘ └────────┘

CQRS Pattern (Command Query Responsibility Segregation)

Write (Command):                 Read (Query):
┌──────────┐                     ┌──────────┐
│  Client  │                     │  Client  │
└────┬─────┘                     └────┬─────┘
     │                                │
┌────▼─────┐                     ┌────▼─────┐
│ Command  │                     │  Query   │
│ Service  │                     │ Service  │
└────┬─────┘                     └────┬─────┘
     │                                │
┌────▼─────┐    ──Events──▶    ┌───────────┐
│PostgreSQL│                   │  Redis /  │
│ (Source  │                   │  MongoDB  │
│of Truth) │                   │(Read Model)│
└──────────┘                   └───────────┘

Cache-Aside Pattern

import redis
import psycopg2
import json

r = redis.Redis(host='localhost', port=6379)
# Reuse one connection; in production, use a connection pool
# instead of connecting on every call
conn = psycopg2.connect("dbname=ecommerce")

def get_product(product_id):
    # 1. Check cache
    cache_key = f"product:{product_id}"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss — query DB (select JSON-serializable columns explicitly)
    with conn.cursor() as cur:
        cur.execute("SELECT id, name FROM products WHERE id = %s", (product_id,))
        product = cur.fetchone()

    if product:
        # 3. Store in cache (TTL 1 hour)
        r.setex(cache_key, 3600, json.dumps(product))

    return product

def update_product(product_id, data):
    # 1. Update DB first
    with conn.cursor() as cur:
        cur.execute("UPDATE products SET name=%s WHERE id=%s",
                    (data['name'], product_id))
    conn.commit()

    # 2. Invalidate cache; the next read repopulates it
    r.delete(f"product:{product_id}")

12. Cost Comparison (Managed Services)

Monthly Cost Estimate (Medium-Scale Service: 100GB Data, 5000 TPS)

| Service | Provider | Instance | Monthly Cost (USD) |
|---------|----------|----------|--------------------|
| RDS PostgreSQL | AWS | db.r6g.xlarge | ~350 |
| Cloud SQL PostgreSQL | GCP | db-custom-4-16384 | ~320 |
| RDS MySQL | AWS | db.r6g.xlarge | ~350 |
| Atlas MongoDB | MongoDB | M40 | ~480 |
| ElastiCache Redis | AWS | cache.r6g.xlarge | ~450 |
| DynamoDB On-Demand | AWS | - | ~180-600 |
| Neon PostgreSQL | Neon | Scale | ~69-189 |
| PlanetScale MySQL | PlanetScale | Scaler Pro | ~39-299 |

Cost Optimization Strategies

  1. Reserved Instances: 40-70% discount with 1-3 year commitments
  2. Serverless Options: Neon, PlanetScale, DynamoDB on-demand for variable traffic
  3. Read Replicas: Distribute read load to save on primary instance specs
  4. Right-sizing: Check actual utilization via CloudWatch/monitoring before adjusting
  5. Data Archiving: Move old data to S3/Glacier

13. Interview Questions — 20 Essential Questions

Fundamentals (1-5)

Q1: Explain the difference between ACID and BASE.

ACID (Atomicity, Consistency, Isolation, Durability): Traditional RDBMS transaction properties. All operations either fully succeed or fully fail, guaranteeing data consistency.

BASE (Basically Available, Soft state, Eventually consistent): The NoSQL approach. Always available but provides eventual consistency instead of immediate consistency.

Selection criteria: ACID for financial/payment systems, BASE for social media feeds/like counters.
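
The atomicity half of ACID can be demonstrated in a few lines. Here sqlite3 stands in for any ACID-compliant RDBMS, and the table and amounts are made up for illustration: when the second half of a transfer violates a constraint, the first half is rolled back too.

```python
# Atomicity demo: a transfer either fully succeeds or fully fails.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    with conn:  # opens a transaction; rolls back on exception
        # Credit account 1 (succeeds)...
        conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 1")
        # ...then debit account 2, which violates the CHECK constraint
        conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 2")
except sqlite3.IntegrityError:
    pass  # the whole transfer is rolled back, including the first UPDATE

print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {1: 100, 2: 0}
```
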

Q2: Explain the CAP theorem with real-world examples.
  • CP System (MongoDB): During network partitions, sacrifices some node availability for consistency. Writes unavailable during primary election.
  • AP System (Cassandra, DynamoDB): All nodes respond even during network partitions, but data may temporarily differ between nodes.
  • CA System: Single-node systems without network partitions. Practically nonexistent in distributed environments.

Pro tip: Extending to the PACELC theorem, consider the trade-off between Latency and Consistency when there is no partition (Else).

Q3: Explain index types and their appropriate use cases.
  • B-Tree Index: Optimal for range searches and sorting. Most default indexes. O(log N)
  • Hash Index: Only for equality comparisons (=). O(1). Not crash-safe (unlogged) before PostgreSQL 10; WAL-logged and safe to use from 10 onward
  • GIN (Generalized Inverted Index): Suitable for arrays, JSONB, full-text search
  • GiST (Generalized Search Tree): Suitable for spatial data (PostGIS), range types
  • BRIN (Block Range Index): Effective for physically sorted large tables (time-series). Memory efficient
  • Bloom Index: Equality comparison filtering across many columns
Q4: What is the N+1 query problem and how do you solve it?

N+1 problem: An inefficient pattern where 1 query fetches N results, then N additional queries are executed for each result.

Solutions:

  • JOIN: Combine data in a single query in relational DBs
  • Eager Loading: Pre-load related data in ORMs (Django: select_related, prefetch_related)
  • DataLoader Pattern: Batch requests together in GraphQL
  • MongoDB: Embedding pattern to store related data in a single document
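
The pattern and its JOIN fix can be sketched with an in-memory sqlite3 database (table names and data are invented for the demo):

```python
# Sketch of the N+1 problem and the single-JOIN fix.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

# N+1 pattern: 1 query for the authors, then 1 more query per author
queries = 1
result_n_plus_1 = {}
for author_id, name in conn.execute("SELECT id, name FROM authors"):
    posts = conn.execute(
        "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,))
    result_n_plus_1[name] = [title for (title,) in posts]
    queries += 1
print(queries)  # 3 round trips for just 2 authors (1 + N)

# The fix: one JOIN retrieves everything in a single round trip
result_join = {}
rows = conn.execute("""
    SELECT a.name, p.title FROM authors a
    JOIN posts p ON p.author_id = a.id ORDER BY p.id
""")
for name, title in rows:
    result_join.setdefault(name, []).append(title)
print(result_join == result_n_plus_1)  # True
```

With a real ORM the N+1 loop is usually hidden inside lazy attribute access, which is why it goes unnoticed until the query log is inspected.
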
Q5: Explain the trade-offs of database normalization vs denormalization.

Normalization (1NF-BCNF):

  • Pros: Minimize data redundancy, easier consistency maintenance, save storage
  • Cons: Increased JOINs degrade read performance, complex queries

Denormalization:

  • Pros: Improved read performance (fewer JOINs), simpler queries
  • Cons: Data redundancy, update anomaly risks, increased storage

Practical guide: Normalize to 3NF+ for OLTP, consider intentional denormalization for OLAP/read-heavy workloads.

Advanced Topics (6-10)

Q6: Explain the differences between PostgreSQL MVCC and MySQL InnoDB MVCC.

PostgreSQL MVCC:

  • Stores multiple row versions directly in the table (xmin, xmax)
  • Old versions cleaned up by VACUUM
  • Pros: No read locks, high concurrency
  • Cons: Table bloat, VACUUM overhead

MySQL InnoDB MVCC:

  • Stores previous versions in Undo Log
  • Current row always has the latest version
  • Purge thread cleans unnecessary Undo entries
  • Pros: No table bloat, simpler management
  • Cons: Performance degradation when Undo log grows large
Q7: What are the types of sharding strategies and how do you choose?
  1. Hash Sharding: Hash key for even distribution. Inefficient for range queries.
  2. Range Sharding: Split by key range. Hotspot risk. Good for time-series.
  3. Geography-based Sharding: Split by user location. Useful for regional compliance.
  4. Directory-based Sharding: Flexible routing via mapping table. High management complexity.

Shard key selection criteria: High cardinality, even distribution, alignment with query patterns.
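
The first two strategies reduce to a routing function. The shard count, the ranges, and the MD5-based hash below are arbitrary illustration choices, not any particular database's scheme:

```python
# Sketch: hash-sharding vs range-sharding routing over 4 hypothetical shards.
import hashlib

NUM_SHARDS = 4

def hash_shard(key: str) -> int:
    # Use a stable hash (Python's built-in hash() is randomized per process)
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], 'big') % NUM_SHARDS

# Range sharding: boundaries chosen by the operator; hotspot risk when
# most traffic hits the newest range (e.g. monotonically increasing IDs)
RANGES = [(0, 250_000), (250_000, 500_000), (500_000, 750_000), (750_000, 1_000_000)]

def range_shard(user_id: int) -> int:
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= user_id < hi:
            return i
    raise ValueError("key outside all ranges")

# Hash sharding scatters adjacent keys across shards (even load,
# but range scans must hit every shard)...
print([hash_shard(f"user:{i}") for i in range(4)])
# ...while range sharding keeps adjacent keys together (efficient scans)
print([range_shard(i) for i in (100, 300_000, 600_000, 990_000)])  # [0, 1, 2, 3]
```
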

Q8: Redis is single-threaded — how is it so fast?
  1. In-memory Processing: All data in RAM, no disk I/O
  2. I/O Multiplexing: epoll/kqueue handles thousands of connections in a single thread
  3. No Locking Needed: Single-threaded means no concurrency control overhead
  4. Optimized Data Structures: SDS, ziplist, intset — memory-efficient structures
  5. Redis 6.0+: I/O threading introduced (command processing remains single-threaded)

Note: Redis 7.0+ further improved multi-threaded I/O, enhancing network processing performance.
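
Point 2 can be seen in miniature with Python's selectors module. This is a toy stand-in for the idea, not Redis's actual ae event loop: one thread registers many connections and a single select() call reports which are ready.

```python
# Tiny illustration of I/O multiplexing: one thread watches many sockets
# and only touches the ones that are ready.
import selectors
import socket

sel = selectors.DefaultSelector()
a1, b1 = socket.socketpair()   # "client 1" <-> server side
a2, b2 = socket.socketpair()   # "client 2" <-> server side
sel.register(b1, selectors.EVENT_READ)
sel.register(b2, selectors.EVENT_READ)

a2.send(b"PING")  # only client 2 sends anything

# One select() call tells us exactly which connection is readable,
# without polling or blocking on the idle one
ready = sel.select(timeout=1)
for key, _ in ready:
    data = key.fileobj.recv(16)
    print(data)  # b'PING'
```

Under the hood selectors uses epoll on Linux and kqueue on BSD/macOS, the same primitives Redis builds on.
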

Q9: When should you choose embedding vs referencing in MongoDB?

Embedding (subdocuments):

  • Data queried together
  • 1:1 or 1:few relationships
  • Within 16MB document size limit
  • Example: Orders and order items, users and addresses

Referencing (separate collections):

  • Data queried/updated independently
  • 1:many or many:many relationships
  • Document size grows rapidly
  • Example: Blog posts and comments, products and reviews

Hybrid pattern: Embed frequently-used data, reference the rest.

Q10: Why is connection pooling important and how do you configure it?

DB connection creation cost: TCP handshake + authentication + session allocation = several to tens of ms. Creating connections per request severely degrades performance.

Configuration example (HikariCP):

  • maximumPoolSize: Typically CPU cores x 2 + number of disks
  • minimumIdle: Minimum idle connections
  • connectionTimeout: Maximum time waiting for a connection from the pool
  • idleTimeout: How long idle connections are maintained

PostgreSQL is process-based, so using PgBouncer in front for connection pooling management is a best practice.
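
The checkout/return mechanics behind pools like HikariCP can be sketched with a bounded queue; sqlite3 stands in for a networked database here, and the pool size and timeout are arbitrary:

```python
# Minimal connection-pool sketch: pay the connection cost once up front,
# then check connections out and return them instead of reconnecting.
import sqlite3
import queue
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, size: int, timeout: float = 5.0):
        self._pool = queue.Queue(maxsize=size)
        self._timeout = timeout  # analogous to HikariCP's connectionTimeout
        for _ in range(size):
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._pool.get(timeout=self._timeout)  # blocks if pool exhausted
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool instead of closing

pool = ConnectionPool(size=3)
with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # (1,)
```

A real pool adds health checks, idle-timeout eviction, and statement resets on return; the bounded-queue core stays the same.
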

Practical System Design (11-15)

Q11: Design a DB architecture for a high-traffic e-commerce service.
  1. Product Catalog: MongoDB (flexible attributes, fast reads)
  2. Orders/Payments: PostgreSQL (ACID transactions)
  3. Cart/Sessions: Redis (TTL, fast read/write)
  4. Product Search: Elasticsearch (full-text search, facets)
  5. Recommendation System: PostgreSQL pgvector (vector similarity)
  6. Analytics/Reporting: ClickHouse or BigQuery

Caching strategy: Cache-Aside pattern with Redis for product info caching. CDN for static content caching.

Q12: Explain database migration strategies.
  1. Dual-Write Pattern: Write to both DBs simultaneously, gradually switch reads
  2. CDC (Change Data Capture): Real-time data sync with tools like Debezium
  3. Shadow Traffic: Mirror traffic to new DB for performance validation
  4. Blue-Green: Prepare two environments and switch at once

Key points: Rollback plan is essential, automate data consistency verification, gradual traffic migration.
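The Dual-Write pattern from step 1 can be sketched like this. The dicts stand in for clients of the old and new databases, and the cutover flag is an illustrative simplification (in practice it would be per-tenant or percentage-based and driven by configuration).

```python
# Dual-write migration sketch: write both stores, read from the old store
# until the new one is verified, then flip reads gradually.
old_db, new_db = {}, {}
read_from_new = False  # flipped gradually once the new store is trusted

def write(key, value):
    old_db[key] = value               # source of truth during migration
    try:
        new_db[key] = value           # failures must be logged and replayed
    except Exception:
        pass                          # never fail the request on the new store

def read(key):
    return (new_db if read_from_new else old_db).get(key)
```

Note that dual writes alone do not guarantee consistency (the second write can fail); that is why the key points above call for automated consistency verification and a rollback plan.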

Q13: How do you handle replication lag between read replicas and write master?
  1. Read-your-writes Consistency: Read from master for a short time after writes
  2. Minimum Replication Confirmation: PostgreSQL's synchronous_commit setting
  3. Lag Monitoring: Check replay_lag in pg_stat_replication
  4. GTID-based Routing: Route to replicas that have replicated up to a specific GTID in MySQL
  5. Cache Utilization: Store latest values in cache after writes, read from cache instead of replica
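Technique 1 (read-your-writes consistency) can be sketched as simple routing logic: pin a user's reads to the primary for a short window after each write. The 5-second window is an illustrative value; in practice it should exceed the observed replication lag.

```python
import time

# Pin reads to the primary for a short window after a user's write,
# so the user always sees their own changes despite replication lag.
PIN_SECONDS = 5.0              # illustrative; should exceed typical lag
last_write_at: dict = {}       # user_id -> timestamp of last write

def record_write(user_id: str):
    last_write_at[user_id] = time.time()

def choose_target(user_id: str) -> str:
    """Return 'primary' while the user's write may not have replicated yet."""
    wrote = last_write_at.get(user_id, 0.0)
    return "primary" if time.time() - wrote < PIN_SECONDS else "replica"
```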
Q14: How do you safely perform schema changes on large tables?

PostgreSQL:

  • ALTER TABLE ADD COLUMN (without DEFAULT): Instant completion
  • ALTER TABLE ADD COLUMN ... DEFAULT value: Instant in PostgreSQL 11+ for non-volatile defaults (stored as metadata, no table rewrite)
  • CREATE INDEX CONCURRENTLY: Index creation without table locks

MySQL:

  • pt-online-schema-change (Percona): Trigger-based online changes
  • gh-ost (GitHub): Binary log-based online changes
  • MySQL 8.0+ INSTANT DDL: Metadata-only change operations

MongoDB:

  • Schema-flexible, so many changes need no DDL, but existing documents may still require application-level migration
  • Index creation: createIndex (since MongoDB 4.2, index builds block reads/writes only briefly at the start and end of the build)
Q15: What are the most important metrics for DB performance monitoring?
  1. Query Performance: Slow query logs, p95/p99 response times
  2. Connection Count: Active/idle connections, connection pool utilization
  3. Cache Hit Ratio: Buffer Pool/Shared Buffer hit ratio (target: 99%+)
  4. Replication Lag: Master-replica delay time
  5. Disk I/O: IOPS, latency, utilization
  6. Lock Waits: Row lock/table lock wait time and frequency
  7. Transaction Throughput: TPS, commit/rollback ratios

Tools: pg_stat_statements (PostgreSQL), Performance Schema (MySQL), Atlas (MongoDB), Redis INFO.
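The p95/p99 response times from metric 1 can be computed with a nearest-rank percentile, a common choice in monitoring pipelines. The latency samples below are made up for illustration.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(n * p / 100)."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil without math import
    return ordered[int(rank) - 1]

# Illustrative query latencies in milliseconds; note how a few slow
# outliers dominate the tail percentiles while the median stays low.
latencies_ms = [12, 15, 14, 200, 13, 16, 14, 500, 15, 13]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```

This is why tail percentiles, not averages, are the headline query-performance metric: the median here is 14 ms while p95 is 500 ms.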

Advanced Practical Questions (16-20)

Q16: What is the Hot Partition problem and how do you solve it?

A problem where traffic concentrates on specific partitions/shards, creating performance bottlenecks.

DynamoDB example: When writes concentrate on a specific partition key, that partition gets throttled.

Solutions:

  1. Write Sharding: Add random suffix to key (e.g., USER#001#3)
  2. Composite Key: Include date/time in key for distribution
  3. Caching Layer: Absorb read hotspots with Redis
  4. Distributed Counters: Split one counter into N parts and sum
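Solutions 1 and 4 combine naturally: spread writes for a hot key across N suffixed keys, then have reads fan out over every suffix and merge. A minimal sketch, with the shard count and key format as illustrative choices:

```python
import random

SHARD_COUNT = 4  # number of suffixes; illustrative

def sharded_key(base_key: str) -> str:
    """Write sharding: spread writes for a hot key across N partition keys."""
    return f"{base_key}#{random.randrange(SHARD_COUNT)}"

def all_shard_keys(base_key: str):
    """Reads must fan out over every suffix and merge the results."""
    return [f"{base_key}#{i}" for i in range(SHARD_COUNT)]

# Distributed counter: increments land on random shards, reads sum them all.
counters: dict = {}
for _ in range(100):
    k = sharded_key("USER#001")
    counters[k] = counters.get(k, 0) + 1

total = sum(counters.get(k, 0) for k in all_shard_keys("USER#001"))
print(total)  # 100
```

The trade-off: write throughput scales with the shard count, but every read now costs N lookups, so this fits write-heavy counters better than read-heavy lookups.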
Q17: Explain backup strategy types and RPO/RTO.
  • RPO (Recovery Point Objective): Maximum acceptable amount of data loss, expressed as a time window
  • RTO (Recovery Time Objective): Maximum acceptable downtime before service is restored

Backup types:

  1. Full Backup: All data. Short RTO, large storage
  2. Incremental Backup: Changes since last backup. Small storage, complex recovery
  3. Continuous Archiving (WAL): PostgreSQL WAL archiving enables Point-in-Time Recovery (PITR). RPO near zero
  4. Snapshots: EBS snapshots, RDS automatic backups. Fast recovery
Q18: Describe the process for analyzing and optimizing slow queries.
  1. Identify: Enable slow query logs (MySQL: slow_query_log, PostgreSQL: log_min_duration_statement)
  2. Analyze: Check execution plan with EXPLAIN ANALYZE
  3. Optimization order:
    • Add/improve indexes
    • Refactor queries (remove unnecessary JOINs, convert subqueries to CTEs)
    • Apply partitioning
    • Archive data
    • Consider denormalization (read optimization)
  4. Verify: Compare before/after performance
  5. Monitor: Continuously observe pg_stat_statements, Performance Schema
Q19: How do you ensure data consistency in eventually consistent systems?
  1. Read Repair: Fix inconsistencies found during reads with latest values
  2. Anti-Entropy (Merkle Tree): Background data comparison/correction between nodes
  3. Conflict Resolution: Last-write-wins (LWW), vector clocks, custom merge logic
  4. Quorum Reads/Writes: R + W > N condition for strong consistency
  5. Saga Pattern: Compensating transactions instead of distributed transactions for consistency
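The quorum condition from technique 4 is simple enough to state as code. The intuition: if R + W > N, any read quorum and any write quorum must share at least one replica, so a read always contacts a node holding the latest write.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Quorum rule: R + W > N means every read quorum overlaps every
    write quorum in at least one replica."""
    return r + w > n

# N=3 replicas: R=2, W=2 gives strong consistency; R=1, W=1 does not.
print(is_strongly_consistent(3, 2, 2), is_strongly_consistent(3, 1, 1))
```

Lowering R or W below the quorum threshold trades consistency for latency and availability, which is exactly the knob Cassandra-style consistency levels expose.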
Q20: What are the top 3 factors to consider when selecting a database in 2025?
  1. AI/Vector Search Support: As RAG pipelines become standard, vector search is essential. Choose between pgvector, Atlas Vector Search, Redis Vector Search.

  2. Operational Complexity vs Cost: Convenience of managed services (RDS, Atlas, ElastiCache) vs cost savings of self-hosting. Decide based on team size and DevOps capabilities.

  3. Data Locality and Regulations: Data sovereignty regulations like GDPR and regional privacy laws. DynamoDB Global Tables, CockroachDB's region-based distribution can be solutions.


14. Quiz

Quiz 1: What are the 2 main differences between JSONB and JSON types in PostgreSQL?
  1. Storage Method: JSON stores the input as plain text, while JSONB decomposes it into a binary format. JSONB writes are slightly slower, but reads and searches are much faster.
  2. Indexing: JSONB supports GIN indexes for fast key/value and containment searches. The JSON type supports only expression indexes on extracted values.

Additional: JSONB does not preserve key order and removes duplicate keys.

Quiz 2: What data structure does Redis Sorted Set use internally?

Skip List + Hash Table combination.

  • Skip List: Range search in sorted order O(log N)
  • Hash Table: Score lookup O(1)

When elements are 128 or fewer and values are 64 bytes or less, ziplist (listpack in 7.0+) is used to save memory.
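The two-structure design can be sketched in Python, with a hash map for O(1) member-to-score lookup and a sorted list playing the role of the skip list for ordered range scans. This is a teaching sketch, not how Redis implements it (Redis uses a true skip list, and `bisect.insort` makes inserts O(N) here).

```python
import bisect

class TinySortedSet:
    """Sketch of Sorted Set internals: hash map for O(1) score lookup,
    plus an ordered structure for range queries by score."""

    def __init__(self):
        self.scores = {}    # member -> score (the hash table part)
        self.ordered = []   # sorted (score, member) pairs (skip-list stand-in)

    def zadd(self, member, score):
        if member in self.scores:                      # update: remove old pair
            old = (self.scores[member], member)
            self.ordered.pop(bisect.bisect_left(self.ordered, old))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zscore(self, member):
        return self.scores.get(member)                 # O(1) via the hash map

    def zrangebyscore(self, lo, hi):
        i = bisect.bisect_left(self.ordered, (lo, ""))
        j = bisect.bisect_right(self.ordered, (hi, "\uffff"))
        return [m for _, m in self.ordered[i:j]]       # ordered scan
```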

Quiz 3: What is the difference between GSI and LSI in DynamoDB?
| Item | GSI | LSI |
| --- | --- | --- |
| Partition Key | Can differ from table | Same as table |
| Sort Key | Freely chosen | Freely chosen |
| Size Limit | None | 10 GB per partition key value |
| Creation Timing | Any time after table creation | Only at table creation |
| Consistency | Eventually consistent only | Strong consistency available |
| Provisioning | Separate RCU/WCU | Shared with table |

Practical tip: GSI is more flexible, so prefer GSI in most cases.

Quiz 4: When should you use and avoid transactions in MongoDB?

Use when:

  • Atomic updates across multiple documents/collections (payment and inventory simultaneous changes)
  • Data consistency is a business requirement

Avoid when:

  • Single document updates (MongoDB is already atomic at document level)
  • High-throughput operations (transactions have performance overhead)
  • Frequent multi-shard transactions in sharded environments

Design principle: Embed related data in a single document when possible to reduce the need for transactions.

Quiz 5: Why is VACUUM important in PostgreSQL?

PostgreSQL's MVCC does not immediately remove previous versions during updates/deletes. VACUUM cleans up these "dead tuples."

Without it:

  1. Table Bloat: Wasted disk space, degraded sequential scan performance
  2. Index Bloat: Increased index size, degraded search performance
  3. Transaction ID Wraparound: Risk of data loss after approximately 2 billion transactions (XID wraparound)

Autovacuum is enabled by default, but tables with high write loads need autovacuum parameter tuning.


The eight databases covered in this article each have unique strengths. There is no single "best database," only the database that is optimal for a given workload. Weigh workload patterns, scalability requirements, team capabilities, and cost constraints together. Rather than relying on one database for everything, combining each database's strengths through a Polyglot Persistence strategy is the best practice for 2025.