
The 2026 Vector DB Landscape — Pinecone Serverless, Turbopuffer, pgvectorscale, Qdrant, Weaviate, Vespa, and What Actually Changed


Prologue — Two years, a different map

When I started a RAG pilot in spring 2024, the menu was short. Pinecone was expensive but easy. Weaviate was modular but rough to self-host. Chroma was a notebook toy. pgvector had a reputation as "small datasets only".

By May 2026 almost every part of that received wisdom is wrong.

  • Pinecone Serverless collapsed the price curve. Free tier of 2GB and 5 indexes, standard rate around 8.25 dollars per 1M read units, 2 dollars per 1M writes, 0.33 dollars per GB-month. Small RAG is effectively free.
  • Turbopuffer put 2.5 trillion vectors on S3 and made "object storage plus SSD cache" the industry pattern. Cursor and Notion run on it.
  • pgvectorscale (Timescale, now Tiger Data) shipped StreamingDiskANN. On 50M Cohere-768 vectors, self-hosted on AWS EC2, it beats Pinecone s1 by 28x on p95 latency and costs 75% less.
  • Qdrant dropped RocksDB entirely in v1.17 in favor of an in-house engine (gridstore), and raised a 50M-dollar Series B in March 2026.
  • DuckDB VSS dropped HNSW into the middle of analytical workflows. You can do vector search inside your data pipeline.
  • Vespa rebranded as an "AI Search Platform" and still runs Spotify's real-time HNSW.
  • Hybrid search (BM25 + dense + reranker) is the standard. Cohere Rerank 3.5, Voyage Rerank 2.5, Jina Reranker v3, BGE-M3 all hold real positions.

This post follows up on last year's "Vector DB comparison — Pinecone, Weaviate, Chroma, pgvector". That was an intro. This is the May-2026 map with cost, architecture, and failure stories on one page.

Flow:

  1. What changed — price collapse and the object-storage shift
  2. Managed SaaS — Pinecone Serverless, Turbopuffer, Zilliz Cloud
  3. OSS-first engines — Qdrant, Weaviate, Milvus, Vespa, Vald
  4. The Postgres camp — pgvector plus pgvectorscale on Supabase, Neon, Tiger
  5. Embedded and analytical — Chroma, LanceDB, DuckDB VSS
  6. Vectors bolted onto general DBs — MongoDB Atlas, Couchbase, Elasticsearch/OpenSearch
  7. Hybrid search standard — BM25, dense, rerankers
  8. Cost matrix — what does 1M vectors actually cost per month
  9. Decision diagram — "pgvector vs dedicated"
  10. Operational anti-patterns
  11. Epilogue — checklist and the next post

1. What changed — price collapse and the object-storage shift

1-1. The price curve

In 2023 and early 2024 a Pinecone p1.x1 pod ran roughly 0.10 dollars an hour. A stable 100M-vector workload meant a four-figure monthly bill before anything else. The Serverless architecture announced in January 2024 broke the model — compute and storage separated, cold data shipped to object storage, billing per query. May-2026 Standard pricing:

| Line item | Rate | Notes |
|---|---|---|
| Read | 8.25 USD per 1M read units | 1 RU is roughly a 1KB-payload search |
| Write | 2 USD per 1M write units | Upserts and deletes both |
| Storage | 0.33 USD per GB-month | Metadata included |
| Reserved capacity | Separate hourly fee | Kicks in on sustained high concurrency |
| Free tier | 2GB, 5 indexes | 1 project, no SLA, no RBAC, no support |

Small RAG (1M vectors, tens of thousands of daily queries) dropped to 5 to 15 dollars a month. Large workloads are still pricey, and the reserved-capacity line is the bill that tends to spike unexpectedly — watch that one.
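
Back-of-the-envelope arithmetic with the rates above, as a sketch only: the traffic figures are assumptions, one query is assumed to cost a single read unit (larger namespaces burn several), and the billed storage footprint can differ from the raw fp32 size.

# Rough monthly estimate for 1M x 1536-dim vectors on Pinecone Serverless Standard.
# Assumptions: ~20k queries/day, one read unit per query, one full re-upsert per month.
VECTORS, DIMS = 1_000_000, 1536
QUERIES_PER_DAY = 20_000
WRITES_PER_MONTH = 1_000_000

raw_gb = VECTORS * DIMS * 4 / 1e9             # ~6.1 GB of raw fp32
storage = raw_gb * 0.33                       # USD per GB-month
reads = QUERIES_PER_DAY * 30 / 1e6 * 8.25     # USD per 1M read units
writes = WRITES_PER_MONTH / 1e6 * 2.00        # USD per 1M write units

print(f"storage {storage:.2f} + reads {reads:.2f} + writes {writes:.2f} "
      f"= {storage + reads + writes:.2f} USD/month")

That lands around 9 USD a month. Push query volume or read units per query up and the read line dominates quickly, which is the same dynamic behind the reserved-capacity spikes mentioned above.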

1-2. The object-storage shift — Turbopuffer

Simon Eskildsen built Turbopuffer, the engine behind Cursor and Notion, around one rule: object storage is the source of truth, SSD is just a cache. The cost is roughly 70 dollars per TB-month, versus about 600 for triple-replicated SSD and about 1,600 for RAM-cache incumbents. A one- to two-order-of-magnitude difference.

Operating scale, May 2026:

  • 4T+ documents
  • 10M+ writes per second, 25k+ queries per second
  • Query unit prices cut up to 94% (July 2024)

Zilliz pushed back with "storage isn't the whole story — cold misses penalize p99". The counter to the counter is that most production workloads are mostly cold: long-tail search, archive, multilingual indexes. For those, Turbopuffer's model wins on price by an embarrassing margin.

1-3. The "just use Postgres" movement

Timescale (now Tiger Data) shipped pgvectorscale on top of pgvector with a new index type called StreamingDiskANN. On 50M vectors at 768 dimensions:

  • p95 latency: 28x lower than Pinecone s1
  • Throughput: 16x higher
  • Cost: 75% lower when self-hosted on AWS EC2 at 99% recall

Add Statistical Binary Quantization (SBQ) and Filtered DiskANN (label-aware pre-filtering inside the index) and the case gets stronger. Version 0.9.0 in November 2025 added PG18 support and concurrent index builds, and TimescaleDB 2.26.0 in March 2026 brought it to Tiger Cloud.

The headline is short: if you already run Postgres, do not introduce a new database.


2. Managed SaaS — Pinecone Serverless, Turbopuffer, Zilliz Cloud

2-1. Pinecone Serverless

May 2026: still the safest managed default. Free tier covers most prototypes; ops burden is zero. Two real downsides:

  • Lock-in: index format is not interoperable. Migration is upsert-by-upsert.
  • Concurrency bill spikes: reserved-capacity fees activate under sustained load. Agentic workloads that fire tens of queries per second can hit this hard.

When to pick it:

  • No infra team, 1M to 10M vectors
  • "Need results this quarter," nobody to run a system
  • Multi-region and HA are SLA requirements

2-2. Turbopuffer

The company that put "object-storage-first" in our vocabulary. One price book scales from a hobby index to four trillion documents. 2026 published tiers:

  • Launch: 64 USD per month
  • Scale: 256 USD per month
  • Enterprise: contact

Storage at roughly 0.02 USD per GB (S3, GCS pass-through), SSD cache at roughly 0.1 USD per GB, queries and writes usage-based. 100x cheaper cold, 6 to 20x cheaper warm. Cursor and Notion are the production proofs.

Weak spots:

  • Latency distribution: cache misses trigger an S3 fetch and p99 spikes. Not recommended for real-time recommendations.
  • Ecosystem: fewer integrations and tutorials than Pinecone or Qdrant.

When to pick it:

  • 100M+ vectors with heavy cold tail
  • Cost-sensitive search and archive
  • "200ms per query is fine"

2-3. Zilliz Cloud (Milvus 2.6 managed)

Zilliz Cloud went GA with Milvus 2.6.x in January 2026: 0.096 USD per CU-hour compute, 0.02 USD per GB-month storage. Two real strengths:

  • CAGRA GPU index (NVIDIA): hundreds of thousands of QPS on a single-GPU node at 100M scale
  • Hybrid GPU/CPU deployments: GPU builds the graph, CPU serves search

When to pick it:

  • 100M to 10B vectors with access to GPU infra
  • Multi-tenant SaaS with tens of thousands of QPS
  • Already running Spark or NVIDIA RAPIDS pipelines

3. OSS-first engines — Qdrant, Weaviate, Milvus, Vespa, Vald

3-1. Qdrant — Rust, filter-strong

Qdrant in 2026 has two headline events:

  • v1.17 removed RocksDB in favor of gridstore. Direct jumps from v1.15 to v1.17 are not supported; step through the intermediate release.
  • Series B of 50M USD in March 2026: enterprise sales motion.

Strengths:

  • Payload filtering is integrated with the index: precise filters like category = 'X' AND created_at > '2026-01-01' are routinely faster than Weaviate or Pinecone at the same recall (see the sketch below).
  • Rust-native: memory safety, single-binary deploy, one Helm chart and done.
  • Sharding and replication: easy to self-host.
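
A minimal sketch of that integrated filtering with the Python qdrant-client, assuming a collection called docs with category and created_at payload fields (names are illustrative) and payload indexes already created on both:

from datetime import datetime
from qdrant_client import QdrantClient
from qdrant_client.models import DatetimeRange, FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

# The filter travels with the ANN query and is applied inside the index,
# not as a post-filter over an oversized candidate list.
hits = client.query_points(
    collection_name="docs",                  # illustrative collection name
    query=[0.12] * 768,                      # your query embedding
    query_filter=Filter(must=[
        FieldCondition(key="category", match=MatchValue(value="X")),
        FieldCondition(key="created_at", range=DatetimeRange(gt=datetime(2026, 1, 1))),
    ]),
    limit=10,
)
for point in hits.points:
    print(point.id, point.score)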

Weak spots:

  • No modular RAG (no rerankers or generators as a module). You bring the pipeline.

3-2. Weaviate — modular and multi-vector

Weaviate in 2026 leads on hybrid search, ColBERT multi-vector, and reranker modules. BM25F plus dense in a single query with configurable fusion is a first-class citizen. Rerankers and generators live as modules — you can let the DB absorb part of the RAG pipeline.
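
A minimal single-query hybrid sketch with the v4 Python client; the Docs collection name is illustrative, and alpha moves the fusion between pure BM25 (0) and pure vector (1):

import weaviate

client = weaviate.connect_to_local()          # or connect_to_weaviate_cloud(...)
docs = client.collections.get("Docs")         # illustrative collection name

# One request runs BM25 and dense search and fuses the scores.
res = docs.query.hybrid(
    query="how do I rotate API keys",
    alpha=0.5,
    limit=10,
)
for obj in res.objects:
    print(obj.properties)

client.close()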

When to pick it:

  • GraphQL-friendly, OK with the DB hosting part of the pipeline
  • ColBERT v2 style late-interaction multi-vectors
  • Hybrid search in a single query

3-3. Milvus — scale and GPU

OSS Milvus 2.6.x added more flexible CAGRA deployments (GPU plus CPU), decoupled storage and indexer nodes, and a lighter metadata service. Self-hostable in principle, but production almost always ends up on Zilliz Cloud.

3-4. Vespa — real-time search plus vector hybrid

Spun out of Yahoo, Vespa rebranded in 2026 as an "AI Search Platform". The strength is "vectors plus BM25 plus tensors plus ML ranking in one query". Spotify's search runs real-time HNSW on it.

Special features:

  • Tensor storage: multi-dimensional representations (text, image, numeric) in one place
  • Ranking functions are first-class: ML ranking executes inside the DB
  • Highest operational burden: Vespa is not light. You need a search team.

When to pick it:

  • Search is your product
  • Recommendation, ranking, and hybrid all in one query
  • You have Lucene or Solr-era people on staff

3-5. Vald — Yahoo Japan's OSS

Vald is Yahoo JAPAN's Kubernetes-native ANN engine on top of NGT. Internal Japanese deployments demonstrate billion-scale millisecond search. Global momentum has cooled since 2024, but K8s affinity and gRPC-first design still have value. Worth a look if you are in Japan or specifically need NGT.


4. The Postgres camp — pgvector + pgvectorscale + Supabase/Neon/Tiger

4-1. pgvector + pgvectorscale

May 2026: "just use Postgres" is not a joke.

  • pgvector 0.9.x: HNSW and IVFFlat, all in the same place as transactions, JOINs, JSONB.
  • pgvectorscale: StreamingDiskANN. On the 50M-vector bench, p95 28x below Pinecone s1, 75% cheaper.
  • Filtered DiskANN: label-based pre-filtering inside the index. Multi-tenant RAG with "tenant_id = X AND embedding similarity" is the kind of query this was built for (sketched below).
  • SBQ: 1/32-scale memory footprint.
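
To ground the bullets above, a minimal sketch with psycopg; the chunks table, the tenant filter, and the DSN are illustrative, and the vector and vectorscale extensions must already be installed on the server:

import psycopg

setup = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE EXTENSION IF NOT EXISTS vectorscale",
    """CREATE TABLE IF NOT EXISTS chunks (
           id bigserial PRIMARY KEY,
           tenant_id text NOT NULL,
           body text,
           embedding vector(768))""",
    # StreamingDiskANN index from pgvectorscale, cosine distance
    """CREATE INDEX IF NOT EXISTS chunks_embedding_idx
           ON chunks USING diskann (embedding vector_cosine_ops)""",
]

query = """
    SELECT id, body
    FROM chunks
    WHERE tenant_id = %s                 -- relational filter and ANN search in one statement
    ORDER BY embedding <=> %s::vector    -- pgvector cosine distance operator
    LIMIT 10
"""

query_vec = "[" + ",".join(["0"] * 768) + "]"    # your query embedding as a vector literal

with psycopg.connect("postgresql://localhost/rag") as conn:    # illustrative DSN
    for stmt in setup:
        conn.execute(stmt)
    rows = conn.execute(query, ("tenant_42", query_vec)).fetchall()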

Self-hosted weak spots:

  • HNSW builds can take a long time (tens of minutes to hours at tens of millions of vectors).
  • HA and replication: same Postgres ops problem as ever — not trivial.
  • VACUUM and pg_repack strategy has to be rewritten for vector indexes.

4-2. Managed — Supabase, Neon, Tiger Cloud

  • Supabase: pgvector as a first-class citizen, plus auth, Realtime, Edge Functions. Excellent value up to about 100M vectors.
  • Neon: serverless Postgres plus pgvector. Branching and copy-on-write are great for RAG experiments.
  • Tiger Cloud (formerly Timescale): the first-class home for pgvectorscale. Smoothest path for time-series plus vectors.

When to pick this camp:

  • Already running Postgres, under 1B vectors
  • Need vectors in the same transaction as relational data
  • Your infra team refuses to add a new DB (the realistic case)

5. Embedded and analytical — Chroma, LanceDB, DuckDB VSS

5-1. Chroma — developer-friendly, now with Cloud

Chroma is long past 1.0 — 1.5.9 as of early May 2026. Chroma Cloud GA in August 2025, BM25 plus SPLADE sparse vectors as first-class in November 2025, CMEK in January 2026, array metadata with $contains operators in February 2026.

Still the easiest start. pip install chromadb, five minutes to a RAG demo. Above ~100M vectors, pick something else.
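
The five-minute start looks roughly like this; collection name and documents are illustrative, and Chroma embeds the text with its default embedding function unless you pass your own:

import chromadb

client = chromadb.Client()                 # in-memory; use PersistentClient(path=...) to keep data
notes = client.create_collection("notes")  # illustrative collection name

notes.add(
    ids=["a", "b"],
    documents=["pgbouncer connection pooling", "rotating API keys in production"],
)

# The query text is embedded with the collection's embedding function, then searched.
res = notes.query(query_texts=["how do I pool connections?"], n_results=1)
print(res["documents"])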

5-2. LanceDB — Lance columnar, embedded-first

LanceDB is an in-process vector DB on top of the Lance columnar format. Core in Rust, clients in Python, Node, Rust, and a REST API. Positioning is clear: vectors on top of a data lake.

Strengths:

  • In-process and zero-copy: fast without a server
  • Columnar analytics: vectors, text, and metadata in one file, SQL-queryable via DataFrame
  • Versioning: the Lance format supports transactions and versions
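
A minimal sketch of that embedded, zero-server workflow, assuming precomputed embeddings; the table name and local path are illustrative (an s3:// URI works the same way):

import lancedb

db = lancedb.connect("./lance_data")       # or "s3://bucket/path"
tbl = db.create_table(
    "docs",                                 # illustrative table name
    data=[
        {"id": 1, "text": "pgbouncer pooling", "vector": [0.1] * 768},
        {"id": 2, "text": "rotating API keys", "vector": [0.2] * 768},
    ],
)

# ANN search over the vector column plus a SQL-style metadata filter.
hits = tbl.search([0.1] * 768).where("id > 0").limit(5).to_list()
print(hits[0]["text"])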

When to pick it:

  • Notebook, CLI, or local RAG
  • You want vectors next to S3 or GCS data
  • Multimodal ML workflows

5-3. DuckDB VSS — vectors inside analytics

DuckDB's VSS extension adds an HNSW index to ARRAY columns. Two lines: INSTALL vss; LOAD vss;. array_cosine_distance and array_negative_inner_product are index-accelerated.

-- install and load the (experimental) vector similarity search extension
INSTALL vss;
LOAD vss;
-- HNSW indexes work on fixed-size FLOAT[] array columns
CREATE TABLE docs (id INTEGER, embedding FLOAT[768]);
CREATE INDEX idx ON docs USING HNSW (embedding)
  WITH (metric = 'cosine');
-- the ORDER BY ... LIMIT pattern is rewritten into an index scan when the metric matches
SELECT id FROM docs
ORDER BY array_cosine_distance(embedding, ?::FLOAT[768])
LIMIT 5;

VSS is still experimental — not for production. But for vector search inside analytical pipelines (a data scientist materializing an ad-hoc index in a notebook) it is already extremely useful.

When to pick it:

  • Data analysis with a vector-search step
  • "Run ETL once, build an index, query, write the report"

6. Vectors bolted onto general DBs — MongoDB Atlas, Couchbase, Elasticsearch/OpenSearch

General-purpose DBs adding vector columns is the dominant 2024-2026 storyline.

  • MongoDB Atlas Vector Search: managed, document plus vector in one collection. Natural choice for Mongo shops.
  • Couchbase Vector Search: also managed, FTS plus vector combined.
  • Elasticsearch/OpenSearch kNN: BM25 plus dense in one query. Operational cost is real, but obvious if you already run ES.
  • Redis Stack vector index: cache plus session plus vector. Great for small, hot datasets.

The pitch is simple: use what you already operate, add one more column type. Operational consistency is the biggest win.


7. Hybrid search standard — BM25 + dense + rerankers

"Vectors alone are enough" is essentially a dead position in 2026. The standard RAG pipeline is three stages.

Query
  ├─ BM25 search (sparse) ───┐
  ├─ Dense vector search ────┼──> Reciprocal Rank Fusion or weighted sum
  └─ (optional) ColBERT/late ┘
                       Candidates: 50 to 200
                     Cross-encoder reranker
                     (Cohere Rerank 3.5,
                      Voyage Rerank 2.5,
                      Jina Reranker v3,
                      BGE Reranker)
                       Top 5 to 10 to LLM
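
Reciprocal Rank Fusion is simple enough to write out; a minimal sketch that merges a BM25 ranking and a dense ranking by document id, with the usual k=60 smoothing constant:

from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids; higher fused score means better."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc7", "doc2", "doc9"]      # ids from the sparse leg
dense_hits = ["doc2", "doc5", "doc7"]     # ids from the dense leg
candidates = rrf([bm25_hits, dense_hits])[:200]   # hand these to the cross-encoder reranker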

Embedding and reranking landscape (April-2026 MTEB)

  • Embedding — voyage-3-large (Voyage AI): retrieval near the top
  • NV-Embed-v2: in the top tier by overall MTEB average
  • OpenAI text-embedding-3-large: solid general default
  • Cohere embed-v3 plus Rerank v3.5: strong when paired
  • BGE-M3: 100+ languages, the self-hosted multilingual default
  • Nomic Embed v2: lower-cost multilingual
  • Jina Reranker v3: top BEIR nDCG@10 at 61.94, 188ms
  • Cohere Rerank 3.5 / Voyage Rerank 2.5: 600ms-ish average

Operational notes:

  • Do not turn BM25 off. Dense vectors are weak on proper nouns, code, and acronyms.
  • Rerankers must be cross-encoders. Running bi-encoders twice is not the same thing (see the sketch after these notes).
  • You will swap embedding models. Budget for full reindex from day one.
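
Cross-encoder means the model reads each (query, document) pair jointly and emits one relevance score; a minimal sketch with sentence-transformers and an open-weights reranker (the model choice is illustrative, the hosted rerankers above follow the same pattern through their APIs):

from sentence_transformers import CrossEncoder

# A cross-encoder scores query and document together, unlike a bi-encoder
# that embeds them separately and compares vectors.
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")   # illustrative open-weights choice

query = "how do I rotate API keys"
candidates = ["doc about key rotation", "doc about pgbouncer", "doc about RAGAS"]

scores = reranker.predict([(query, doc) for doc in candidates])
top = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:2]
print(top)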

8. Cost matrix — what does 1M vectors actually cost per month

Rough self-estimate: 1M vectors at 1536 dimensions, 10k queries per day, on US or KR managed standard plans or comparable self-host. Real prices depend on commitments and load shape.

| Option | Monthly cost estimate | Notes |
|---|---|---|
| Pinecone Serverless (Standard) | 5 to 30 USD | Free tier fits 1M-1536D in most cases |
| Turbopuffer (Launch) | 64 USD flat + usage | Multiple indexes still fit Launch |
| Zilliz Cloud (Serverless small) | 30 to 80 USD | 0.096 USD per CU-hour plus storage |
| Qdrant Cloud | 25 to 70 USD | Single-node baseline |
| Weaviate Cloud | 25 to 80 USD | Varies with enabled modules |
| pgvector + pgvectorscale on Tiger | 20 to 60 USD | Best when you also need time-series |
| Supabase (Pro plus pgvector) | from 25 USD | Includes auth and Realtime |
| Neon (serverless pgvector) | from 19 USD | Branching is a real superpower |
| LanceDB OSS | 0 USD | Storage cost only (S3) |
| Chroma OSS | 0 USD | Infra cost only |
| DuckDB VSS | 0 USD | In-process, experimental |
| MongoDB Atlas Vector | 30 to 100 USD | From an M10 instance up |
| Elasticsearch (Elastic Cloud) | from 95 USD | BM25 plus kNN |

The gap widens at 100M vectors: Turbopuffer, pgvectorscale, and self-hosted Milvus dominate on price per QPS. Above 1B vectors, you essentially need a distributed engine — Milvus, Vespa, or Vald.


9. Decision diagram — "pgvector vs dedicated"

                          ┌──────────────────────────────┐
                          │  Under 100k vectors,         │──> Chroma / LanceDB / DuckDB VSS
                          │  prototype phase             │
                          └──────────────────────────────┘
                          ┌──────────────────────────────┐
                          │ Already running Postgres?    │
                          └─────────────┬────────────────┘
                          Yes ◀─────────┴─────────▶ No
                          │                          │
                          ▼                          ▼
            ┌────────────────────────┐   ┌──────────────────────────┐
            │ Under 1B vectors,       │   │ No infra team,           │
            │ need transactions/JOINs │   │ "ship this quarter"      │──> Pinecone Serverless
            │ → pgvector +            │   └─────────────┬────────────┘
            │   pgvectorscale         │                 │
            │   (Tiger / Supabase /   │                 ▼
            │    Neon)                │   ┌──────────────────────────┐
            └────────────────────────┘   │ Cold-heavy, cost-sensitive│──> Turbopuffer
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ Self-host OK,             │
                                         │ filter-heavy workload     │──> Qdrant
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ ColBERT multi-vector +    │
                                         │ modular RAG              │──> Weaviate
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ 100M+ vectors + GPU       │──> Milvus / Zilliz
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ Search is the product,    │
                                         │ ML ranking integrated     │──> Vespa
                                         └──────────────────────────┘

The default branch is simple.

  1. Already on Postgres? Try pgvector + pgvectorscale first. 90% of RAG ends here.
  2. No ops team? Pinecone Serverless. Free tier is generous.
  3. Cold-heavy? Turbopuffer. Price difference is qualitative.
  4. Filter-heavy: Qdrant. Modular RAG: Weaviate. Scale plus GPU: Milvus or Zilliz. Search is the product: Vespa.
  5. Notebook, CLI, embedded: Chroma, LanceDB, DuckDB VSS.

10. Operational anti-patterns

The mistakes you only learn the hard way.

  • Silent embedding-model swap. Going from text-embedding-ada-002 to voyage-3-large without a full reindex leaves old and new vectors in incompatible spaces, and retrieval quality collapses. Always reindex, and verify with an A/B comparison.
  • Turning BM25 off. Dense vectors are bad at code, proper nouns, and acronyms. If your internal wiki cannot find K8s, pgbouncer, or RAGAS, this is almost certainly the cause.
  • Skipping the reranker. Without a cross-encoder over top-50 candidates, you lose more than 30% of your achievable answer quality.
  • Default HNSW parameters. The default M and ef_construction look fine on small datasets and ruin recall above 1M vectors. Tune them to your dimensionality, distribution, and query shape.
  • Filtering outside the index. Qdrant and pgvectorscale both reward integrated filters. Multi-tenant RAG should pin tenant_id as a payload filter inside the index.
  • Storing blobs in metadata. Putting full text or images in the vector DB drives read units through the roof. Reference only — keep bodies in object storage.
  • No retry or backoff. Managed SaaS will return 429. Exponential backoff with jitter is not optional (a sketch follows this list).
  • Monitoring equals average latency. p50 means little. You need p95, p99, and recall@k together.
  • Ignoring lock-in. Pinecone migrations are upsert-by-upsert. Always keep the embedding source in S3 or GCS in parallel.
  • "Index everything and pray". If 50% of your retrieval pool is noise, no database choice can save the answer quality. Chunking and selection strategy come before DB choice.

11. Epilogue — checklist and the next post

Pre-launch checklist

Twelve questions to answer before going live.

  • Vector count at 6 and 12 months?
  • Dimensionality? 1536, 3072, 768?
  • Daily queries and writes?
  • p95 latency SLO — 100ms, 300ms, 1s?
  • Multi-tenant? How is isolation done — per-index or payload filter?
  • Is BM25 or keyword search needed?
  • Will you add a reranker? Which one?
  • Which embedding model? Likelihood of swap within 6 months?
  • Already on Postgres?
  • Headcount on ops, room to introduce a new system?
  • Monthly cost ceiling?
  • How will you handle lock-in? Where do embedding sources live?

Next post

Next up: "RAG evaluation pipeline hands-on — measuring retriever, reranker, and generator separately with Ragas, Phoenix, and LangSmith". Picking a vector DB is not the end. When answer quality drops, you need to know whether it fell at retrieval recall, rerank precision, or generation faithfulness. Every DB in this post becomes comparable under the same eval frame.

