✍️ Transcription Mode: Distributed Databases 2025 — CockroachDB, Spanner, TiDB, Yugabyte, Aurora DSQL, Neon, PlanetScale, Turso, D1 (S7 E3)
Prologue — why "which DB?" got hard again
In 2015 the answer was simple: Postgres for relational, MySQL for speed, Mongo for documents, Redis for cache. In 2026 you get handed a list of 10+ options: Neon, Supabase, PlanetScale, CockroachDB, Spanner, TiDB, Yugabyte, Aurora DSQL, Turso, D1 — and even saying "Postgres" now needs a qualifier.
Three forces pushed the market here:
- Serverless and edge runtimes became default. Vercel, Cloudflare, Lambda — short-lived workers don't play well with DBs that expect a long-lived connection pool.
- Multi-region users. One Postgres in us-east-1 can't serve Seoul, Frankfurt, and São Paulo with sub-200ms writes.
- Billing shifted to usage. Scale-to-zero DBs became the default for startups and side projects where RDS's idle bill was absurd.
This post compares distributed SQL vs serverless Postgres vs edge SQLite — with a decision tree for picking one, not a "best of" list.
1. Re-read CAP and PACELC
PACELC is more useful than CAP: if Partitioned, Availability or Consistency? Else, Latency or Consistency? Even without failures, strict consistency costs latency.
| System | On partition | On steady state |
|---|---|---|
| DynamoDB, Cassandra | A | L |
| Spanner, CockroachDB | C | C |
| MongoDB (default) | A | L |
| MongoDB (majority) | C | L→C |
Strict-serializable is not free. Spanner uses TrueTime (atomic clocks + GPS) with commit-wait. Cockroach uses HLC plus clock-uncertainty restarts. Multi-region writes always cost a consensus round trip.
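The HLC idea Cockroach relies on can be sketched in a few lines: a hybrid logical clock pairs physical time with a logical counter so timestamps never go backwards and causally related events stay ordered despite clock skew. A toy version, not CockroachDB's actual implementation (class and method names are made up):

```python
import time

class HLC:
    """Toy hybrid logical clock: (wall_time, logical) pairs that
    never decrease, even if the physical clock stalls or jumps back."""

    def __init__(self):
        self.wall = 0      # last observed physical time (ns)
        self.logical = 0   # tie-breaker counter

    def now(self):
        """Timestamp a local event."""
        pt = time.time_ns()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1  # physical clock didn't advance
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        pt = time.time_ns()
        rw, rl = remote
        if pt > self.wall and pt > rw:
            self.wall, self.logical = pt, 0
        elif rw > self.wall:
            self.wall, self.logical = rw, rl + 1
        elif rw == self.wall:
            self.logical = max(self.logical, rl) + 1
        else:
            self.logical += 1
        return (self.wall, self.logical)

clock = HLC()
a = clock.now()
b = clock.now()
# A message from a node whose clock runs 5ms "ahead" still orders after b.
c = clock.update((a[0] + 5_000_000, 3))
assert a < b < c
```

The real systems add a bounded max-offset assumption on top; when a read lands inside that uncertainty window, Cockroach restarts the transaction rather than waiting it out the way Spanner's commit-wait does.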
Practical checks
- Can you tolerate P99 write ≥ 100ms? If not, stay single-region.
- What % of transactions genuinely need strict consistency? (Balances, inventory, seats — yes. Feeds, metrics — no.)
- Do you have data residency rules? EU data in EU, Korean finance in KR? Then multi-region + region pinning is mandatory.
2. Distributed SQL big three — Spanner, CockroachDB, TiDB
Spanner
Google's original global distributed RDB, powering Search and AdWords. The Postgres interface GA'd in late 2023, making it "Postgres dialect for a globally consistent DB."
- TrueTime → external consistency guarantee.
- Multi-region writes as a button. US + EU + Asia with strict reads.
- 2024 additions: Spanner Graph, Full-text Search.
- Expensive. High floor price, premium storage.
Use when: already on GCP, need multi-region strict consistency, want minimal ops. Avoid when: single region is enough — Cloud SQL for Postgres is 10× cheaper.
CockroachDB
Open-source "Spanner-inspired." Postgres wire protocol, Raft, HLC.
- 2024–2025: vector indexes, CDC webhook sinks, improved query planner.
- License tightened (2024) — self-hosted has paid thresholds now. Biggest friction point in choosing it.
- Serverless rebranded to "Cockroach Cloud Basic/Standard."
Pros: multi-region topology via SQL (ALTER TABLE ... SET LOCALITY REGIONAL BY ROW), online schema changes, high Postgres compatibility.
Cons: single-node slower than Postgres, complex joins still tricky, license ambiguity has driven some teams to Yugabyte.
TiDB
PingCAP's MySQL-compatible distributed SQL. Strong traction in Asia.
Architecture: TiDB (SQL) + TiKV (Raft KV) + PD (placement) + TiFlash (columnar, analytics).
2024–2025: TiDB Serverless rolled out globally. Native vector type + HNSW. TiDB MCP for AI agents (2025).
Strengths: MySQL ecosystem carries over, HTAP built-in (row + column store in one cluster). Weaknesses: not for Postgres-first teams, multi-region writes less mature than Spanner/Cockroach.
Summary
| Item | Spanner | Cockroach | TiDB |
|---|---|---|---|
| Wire protocol | Postgres | Postgres | MySQL |
| Multi-region writes | Strongest | Strong | OK |
| HTAP | Medium | Weak | Strong (TiFlash) |
| Vendor lock | GCP | Low | Low |
| Cost | $$$ | $$ | $$ |
| OSS license | No | BSL (complex) | Apache 2.0 |
3. Yugabyte — "actually Postgres, distributed"
YugabyteDB forks real Postgres 13+ code for YSQL, so extensions like pgvector, pg_trgm, postgis mostly work unmodified — a differentiator versus re-implementations.
- Dual API: YSQL (Postgres) + YCQL (Cassandra-like).
- Multi-region: xCluster (async), geo-partitioning (sync).
- Yugabyte Aeon: serverless with scale-to-zero under conditions.
- Apache 2.0 core — the #1 landing pad after Cockroach license friction.
Trade-off: single-region perf still behind vanilla Postgres, smaller community than Cockroach/TiDB.
4. Aurora DSQL — AWS's answer
Announced at re:Invent 2024: distributed + serverless + Postgres-compatible.
- "Spanner-class" strict consistency with Postgres wire protocol.
- Active-active multi-region writes.
- Scale-to-zero with pay-per-request billing.
- Storage, transaction manager, and query processor all disaggregated.
- IAM-based auth natively — perfect for Lambda.
Caveats: not a complete Postgres in 2025 — FK, sequences, some extensions limited. AWS lock-in. Still maturing on complex transaction workloads.
Use when: on AWS, want serverless strict-consistency DB, can live with feature gaps.
5. Serverless Postgres — Neon, Supabase, PlanetScale
Neon
- Storage–compute separation. Compute auto-suspends when idle.
- Git-like branching — branch from main, run migrations on a copy, merge. CoW so cost is near-zero.
- Point-in-time restore to any moment.
- Acquired by Databricks in 2025 — now positioned as "DB for AI agents" (spin up a DB per agent/task).
Caveat: single region for writes. Global is read-replica only.
Supabase
Firebase-style backend stack built around Postgres.
- Bundled Auth, Storage, Realtime, Edge Functions, Queues, Cron.
- pgvector built-in → RAG-ready out of the box.
- RLS deeply integrated with Auth — clients can hit PostgREST directly.
- Self-hostable.
Caveat: RLS is powerful but a foot-gun (teams ship with accidentally open tables). Managed offering is single-region.
PlanetScale
Started as Vitess-based MySQL, pioneered branching + deploy requests.
- 2024: killed free tier — shockwave for the community, drove Neon/Supabase migrations.
- 2025: launched PlanetScale for Postgres — head-to-head with Neon.
- Best-in-class schema-change process (branch → deploy request → safe migrations).
Caveat: no free tier, less approachable for hobby projects.
Summary
| Item | Neon | Supabase | PlanetScale |
|---|---|---|---|
| Protocol | Postgres | Postgres | MySQL + Postgres |
| Branching | Best | Yes | Strong |
| Scale-to-zero | Yes | Yes (Pro) | Limited |
| Free tier | Yes | Yes | No |
| BaaS bundle | No | Yes | No |
| Vector | pgvector | pgvector | pgvector |
6. Edge SQLite — Turso/libSQL, Cloudflare D1
Turso / libSQL
SQLite fork with embedded replicas:
- Writer lives in one region.
- Each edge node holds a local SQLite file replica.
- Reads hit the local file — microsecond latency.
- Writes go to the primary.
Wins: read latency has no network round trip. DB-per-tenant becomes cheap (new DB in seconds). Limits: single-writer ceiling. Analytical joins still want Postgres.
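The DB-per-tenant pattern Turso makes cheap can be sketched with the stdlib sqlite3 module (in-memory databases stand in for per-tenant files like acme.db; the schema is illustrative):

```python
import sqlite3

def tenant_db(tenant_id: str) -> sqlite3.Connection:
    """One SQLite database per tenant: structural isolation,
    per-tenant backup/restore, no noisy-neighbor queries.
    ':memory:' stands in for a per-tenant file here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
    )
    return conn

acme = tenant_db("acme")
globex = tenant_db("globex")
acme.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

# Isolation is structural: globex's database simply has no acme rows,
# so there is no WHERE tenant_id = ? clause to forget.
assert acme.execute("SELECT COUNT(*) FROM notes").fetchone()[0] == 1
assert globex.execute("SELECT COUNT(*) FROM notes").fetchone()[0] == 0
```

The design choice this illustrates: tenant isolation moves from a query predicate (easy to forget) into the storage layout itself.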
Cloudflare D1
SQLite integrated into Cloudflare Workers. 2025: Read Replicas shipped. Access is env.DB.prepare(...) — no cold-start, no pool.
Caveats: per-DB size cap, transaction isolation evolving.
Use edge SQLite for
- Global read-heavy products (docs, catalogs, CMS) — unbeatable latency.
- Multi-tenant SaaS where "DB-per-tenant" is natural.
Skip for global-write collaboration (Figma-like). Those want distributed SQL.
7. The Postgres extensions arms race
Postgres becoming a platform is the biggest story of 2022–2025.
- pgvector — embeddings, HNSW/IVFFlat. Default RAG storage.
- Citus — sharded Postgres, Microsoft-owned, in Azure Postgres.
- TimescaleDB — time-series hypertables. 2024 license shift → available across managed services.
- pgmq — Postgres as a message queue. Replaces SQS for small shops.
- PostgREST — tables as REST, foundation of Supabase.
- pg_graphql — GraphQL auto-generation.
- pg_trgm / pg_bigm — full-text for non-English (Korean/Japanese N-gram).
Advice: list your required extensions first, then pick a provider that supports all of them. Moving later is painful.
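Why pg_trgm-style N-gram matching works for languages without clean word boundaries can be shown in pure Python. This is a toy Jaccard-style trigram similarity; pg_trgm's actual padding rules and % operator semantics differ in detail:

```python
def trigrams(s: str) -> set:
    """All 3-character windows of a padded, lowercased string.
    Padding (roughly what pg_trgm does) lets prefixes match."""
    s = f"  {s.lower()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a: str, b: str) -> float:
    """Shared trigrams over total trigrams — the intuition behind
    pg_trgm's similarity(); no tokenizer or stemmer required."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# Typo-tolerant matching without word segmentation:
assert similarity("postgres", "postgress") > similarity("postgres", "mysql")
```

Because nothing here depends on whitespace-delimited words, the same mechanism handles Korean and Japanese text where a dictionary-based tokenizer would otherwise be needed.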
8. Connection management — the real serverless bottleneck
The actual problem in serverless isn't DB throughput, it's connection explosion.
| Tool | Where | Use case |
|---|---|---|
| PgBouncer | In front of DB | Classic, well understood |
| Supavisor | Supabase-built | Wire-protocol pooler (Elixir) |
| Neon Pooler | Neon-built | Optimized for serverless |
| RDS Proxy | AWS-managed | Lambda-friendly |
| Prisma Accelerate | SaaS | Global edge pool + query cache |
| HTTP drivers (Neon/PlanetScale) | Client | Skip TCP entirely, talk over HTTPS |
Rules of thumb
- Vercel/Lambda + Postgres → HTTP driver first (Neon serverless driver, Prisma Accelerate).
- Containers (ECS, Fargate) + Postgres → PgBouncer or managed pooler.
- Big monolith → HikariCP or in-app pool.
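Whichever pooler you pick, the core mechanism is the same: a bounded queue of reusable connections, so concurrency is capped at pool size instead of at "one connection per Lambda." A minimal in-app sketch (TinyPool is a made-up name; sqlite3 stands in for a real Postgres driver):

```python
import contextlib
import queue
import sqlite3

class TinyPool:
    """Minimal fixed-size connection pool — the mechanism PgBouncer
    and HikariCP implement at production scale."""

    def __init__(self, make_conn, size=5):
        self._q = queue.Queue(maxsize=size)
        for _ in range(size):
            self._q.put(make_conn())

    @contextlib.contextmanager
    def connection(self, timeout=5.0):
        # Blocks when the pool is exhausted, instead of opening
        # connection #1001 and blowing up DB memory.
        conn = self._q.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._q.put(conn)  # always return the connection

pool = TinyPool(lambda: sqlite3.connect(":memory:"), size=2)
with pool.connection() as c:
    assert c.execute("SELECT 1").fetchone() == (1,)
```

The HTTP drivers in the table sidestep this entirely by making each query a stateless HTTPS request, moving the pooling problem to the provider's side.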
9. Migrations in 2025
The real differentiator between products is migration UX.
Tools
- Atlas — declarative schema + CI, for Postgres and MySQL.
- pgroll — online migrations with shadow schema + dual-read views.
- Neon Branch — branch → apply → promote.
- PlanetScale Deploy Request — the original reviewable migration flow.
- Prisma Migrate / Drizzle Kit / Sqitch — per-ORM chains.
Principle — Expand & Contract
- Add new column (nullable, no default).
- Write to both old and new.
- Backfill.
- Switch reads.
- Drop old.
Every step is backward-compatible — deploy rollbacks don't break the system.
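The five steps, with sqlite3 standing in for Postgres (table and column names are illustrative — say, renaming users.name to full_name):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('Ada'), ('Lin')")

# 1. Expand: nullable, no default -> old code keeps working, no table rewrite.
db.execute("ALTER TABLE users ADD COLUMN full_name TEXT")

# 2. Dual-write: the application writes both columns during the transition.
db.execute("INSERT INTO users (name, full_name) VALUES ('Grace', 'Grace')")

# 3. Backfill existing rows (in batches in production; one batch here).
db.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

# 4. Switch reads to the new column.
rows = db.execute("SELECT full_name FROM users ORDER BY id").fetchall()
assert rows == [("Ada",), ("Lin",), ("Grace",)]

# 5. Contract: drop the old column once no running deploy reads it.
if sqlite3.sqlite_version_info >= (3, 35, 0):  # DROP COLUMN needs SQLite 3.35+
    db.execute("ALTER TABLE users DROP COLUMN name")
```

At every point between steps, both the previous and the next application version can run against the schema — which is exactly what makes rollbacks safe.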
10. Observability, backup, DR
Feature parity is close; operational quality varies.
Checklist:
- Slow query logs, pg_stat_statements, EXPLAIN UX, metric dashboards.
- PITR — how fine-grained? Cross-region copies?
- Region failover — automated? RPO/RTO numbers documented?
- Security — IP allowlists, VPC peering/PrivateLink, encryption, audit logs.
- Compliance — SOC2 / ISO27001 / HIPAA / PCI / GDPR.
| Product | PITR granularity | Cross-region backup | VPC peering |
|---|---|---|---|
| Aurora | second | Yes | Yes |
| Cockroach Cloud | minute | Yes | Yes |
| Neon | arbitrary time | Conditional | Enterprise |
| Supabase | daily (PITR paid) | Paid | Enterprise |
| Turso | minute | Per-region | Limited |
11. Five places money leaks
- Storage/compute split billing — Aurora, DSQL, Neon: scaling compute to zero still leaves a fixed storage cost.
- Egress — cross-region or out-of-cloud traffic is the silent budget killer.
- Connection memory — thousands of Lambdas connecting directly → per-connection memory overhead → bigger instance → higher bill.
- Branch/snapshot accumulation — branch-per-PR CI can pile up CoW storage. Clean them up.
- Vector index RAM — pgvector HNSW is fast but memory-heavy. 10M × 1536-d vectors ≈ 60GB+.
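The last estimate is simple arithmetic worth making explicit — raw float4 vectors alone, before the HNSW graph's per-node neighbor lists add more on top:

```python
# Back-of-envelope for the pgvector figure above.
vectors = 10_000_000
dims = 1536                 # e.g. a common embedding dimension
bytes_per_component = 4     # pgvector stores float4 components

raw = vectors * dims * bytes_per_component
print(f"{raw / 1e9:.1f} GB")  # → 61.4 GB before any index overhead
```

That is why "just add pgvector" quietly becomes "provision a 64GB+ instance" at this scale, and why quantized or disk-based ANN indexes are an active area.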
12. Decision tree
Q1: Strict consistency + global writes required?
├─ Yes → Spanner / Cockroach / Yugabyte / Aurora DSQL
└─ No → Q2
Q2: Serverless/edge is your primary runtime?
├─ Read-heavy + edge: Turso (libSQL), Cloudflare D1
├─ Normal CRUD: Neon, Supabase, PlanetScale
└─ No → Q3
Q3: Need large analytics / HTAP?
├─ Yes: TiDB (HTAP), or split OLAP to BigQuery/Snowflake/ClickHouse
└─ No → Q4
Q4: Want full Postgres extension ecosystem?
├─ Managed: Supabase, Neon, RDS for Postgres
├─ Self-hosted: Postgres + Citus + TimescaleDB
└─ No: RDS MySQL, Aurora MySQL, Cloud SQL
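The tree condenses into a small function — the labels are this post's shorthand, not a ranking, and the flag names are invented for the sketch:

```python
def pick_db(global_writes=False, serverless_edge=False,
            read_heavy_edge=False, htap=False, pg_extensions=False):
    """Encode the Q1-Q4 decision tree above. Returns candidate
    shortlists, not a single 'best' answer."""
    if global_writes:        # Q1: strict consistency + global writes
        return ["Spanner", "CockroachDB", "Yugabyte", "Aurora DSQL"]
    if serverless_edge:      # Q2: serverless/edge primary runtime
        if read_heavy_edge:
            return ["Turso (libSQL)", "Cloudflare D1"]
        return ["Neon", "Supabase", "PlanetScale"]
    if htap:                 # Q3: large analytics / HTAP
        return ["TiDB", "split OLAP to BigQuery/Snowflake/ClickHouse"]
    if pg_extensions:        # Q4: full Postgres extension ecosystem
        return ["Supabase", "Neon", "RDS for Postgres"]
    return ["RDS MySQL", "Aurora MySQL", "Cloud SQL"]

assert pick_db(global_writes=True)[0] == "Spanner"
assert pick_db(serverless_edge=True, read_heavy_edge=True)[0] == "Turso (libSQL)"
```

Note the question order is the point: consistency requirements dominate runtime shape, which dominates analytics, which dominates ecosystem preference.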
13. Watch in 2026 and beyond
- AI-native DBs. Neon (Databricks), TiDB, MotherDuck — "DB as agent workspace" with instant branches and conversational querying.
- Distributed vector search. pgvector HNSW is hard to shard. 2026 will be the battleground for distributed ANN.
- Aurora DSQL GA. A first-party AWS distributed SQL shifts the market hard.
- Sovereign cloud. EU, Middle East, Japan regulations push multi-region from option to requirement.
- Lakehouse creep. MotherDuck and kin bring DuckDB/Parquet into the OLTP-adjacent stack.
12-question adoption checklist
- Do you have actual metrics for query load and connection peaks?
- Which tables/transactions truly need strict consistency?
- Is data residency in writing?
- Do all candidate DBs support every extension you need?
- Is scale-to-zero valuable, or is your load 24/7?
- Will branching actually be used (branch-per-PR)?
- Do you have RPO/RTO numbers for backup/PITR?
- Is your connection-pool strategy decided (HTTP vs PgBouncer vs Proxy)?
- Are schema changes managed (Atlas/pgroll/Deploy Request)?
- Can you integrate observability (Datadog, Grafana, pg_stat_statements)?
- Is the exit cost calculated (how painful to migrate away)?
- Does the pricing model still hold at year-3 scale?
10 common mistakes
- Treating "Postgres" as a single thing — RDS/Aurora/Neon/Supabase differ deeply.
- Demanding strict consistency on every table — 80% doesn't need it.
- Postponing multi-region "for later" — retrofit pain is severe.
- Not cleaning branches — thousands accumulate and the bill piles up.
- Serverless without a pooler — the most common outage cause.
- Ignoring vendor lock-in — Spanner/DSQL are sticky.
- Skipping EXPLAIN ANALYZE — distributed plans are opaque without it.
- Timezone chaos — store UTC, convert at render time only.
- Starting with no foreign keys — adding them at scale is very expensive.
- "We won't migrate." A 5+ year service does migrate at least once. Add abstraction early.
Next episode
Season 7 Episode 4: Message Queues & Event Streaming 2025 — Kafka, Pulsar, NATS, Redpanda, NSQ, pgmq, SQS/SNS, Pub/Sub compared. Why distributed transactions are hard, event sourcing + CQRS in practice, and why "exactly-once" is mostly a marketing phrase.
— End of Distributed Databases.