
The 2026 Vector DB Landscape — Pinecone Serverless, Turbopuffer, pgvectorscale, Qdrant, Weaviate, Vespa, and What Actually Changed


Prologue — Two years, a different map

When I started a RAG pilot in spring 2024, the menu was short. Pinecone was expensive but easy. Weaviate was modular but rough to self-host. Chroma was a notebook toy. pgvector had a reputation as "small datasets only".

By May 2026 almost every part of that received wisdom is wrong.

  • Pinecone Serverless collapsed the price curve. Free tier of 2GB and 5 indexes, standard rate around 8.25 dollars per 1M read units, 2 dollars per 1M writes, 0.33 dollars per GB-month. Small RAG is effectively free.
  • Turbopuffer put 2.5 trillion vectors on S3 and made "object storage plus SSD cache" the industry pattern. Cursor and Notion run on it.
  • pgvectorscale (Timescale, now Tiger Data) shipped StreamingDiskANN. On 50M Cohere-768 vectors, self-hosted on AWS EC2, it beats Pinecone s1 by 28x on p95 latency and costs 75% less.
  • Qdrant dropped RocksDB entirely in v1.17 in favor of an in-house engine (gridstore), and raised a 50M-dollar Series B in March 2026.
  • DuckDB VSS dropped HNSW into the middle of analytical workflows. You can do vector search inside your data pipeline.
  • Vespa rebranded as an "AI Search Platform" and still runs Spotify's real-time HNSW.
  • Hybrid search (BM25 + dense + reranker) is the standard. Cohere Rerank 3.5, Voyage Rerank 2.5, Jina Reranker v3, BGE-M3 all hold real positions.

This post follows up on last year's "Vector DB comparison — Pinecone, Weaviate, Chroma, pgvector". That was an intro. This is the May-2026 map with cost, architecture, and failure stories on one page.

Flow:

  1. What changed — price collapse and the object-storage shift
  2. Managed SaaS — Pinecone Serverless, Turbopuffer, Zilliz Cloud
  3. OSS-first engines — Qdrant, Weaviate, Milvus, Vespa, Vald
  4. The Postgres camp — pgvector plus pgvectorscale on Supabase, Neon, Tiger
  5. Embedded and analytical — Chroma, LanceDB, DuckDB VSS
  6. Vectors bolted onto general DBs — MongoDB Atlas, Couchbase, Elasticsearch/OpenSearch
  7. Hybrid search standard — BM25, dense, rerankers
  8. Cost matrix — what does 1M vectors actually cost per month
  9. Decision diagram — "pgvector vs dedicated"
  10. Operational anti-patterns
  11. Epilogue — checklist and the next post

1. What changed — price collapse and the object-storage shift

1-1. The price curve

In 2023 and early 2024 a Pinecone p1.x1 pod ran roughly 0.10 dollars an hour. A stable 100M-vector workload meant a four-figure monthly bill before anything else. The Serverless architecture announced in January 2024 broke the model — compute and storage separated, cold data shipped to object storage, billing per query. May-2026 Standard pricing:

| Line item | Rate | Notes |
|---|---|---|
| Read | 8.25 USD per 1M read units | 1 RU is roughly a 1KB-payload search |
| Write | 2 USD per 1M write units | Upserts and deletes both |
| Storage | 0.33 USD per GB-month | Metadata included |
| Reserved capacity | Separate hourly fee | Kicks in on sustained high concurrency |
| Free tier | 2GB, 5 indexes | 1 project, no SLA, no RBAC, no support |

Small RAG (1M vectors, tens of thousands of daily queries) dropped to 5 to 15 dollars a month. Large workloads are still pricey, and the reserved-capacity line is the bill that tends to spike unexpectedly — watch that one.
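
Back-of-the-envelope arithmetic with the rates above, as a sketch only: the traffic figures are assumptions, one query is assumed to cost a single read unit (larger namespaces burn several), and the billed storage footprint can differ from the raw fp32 size.

# Rough monthly estimate for 1M x 1536-dim vectors on Pinecone Serverless Standard.
# Assumptions: ~20k queries/day, one read unit per query, one full re-upsert per month.
VECTORS, DIMS = 1_000_000, 1536
QUERIES_PER_DAY = 20_000
WRITES_PER_MONTH = 1_000_000

raw_gb = VECTORS * DIMS * 4 / 1e9             # ~6.1 GB of raw fp32
storage = raw_gb * 0.33                       # USD per GB-month
reads = QUERIES_PER_DAY * 30 / 1e6 * 8.25     # USD per 1M read units
writes = WRITES_PER_MONTH / 1e6 * 2.00        # USD per 1M write units

print(f"storage {storage:.2f} + reads {reads:.2f} + writes {writes:.2f} "
      f"= {storage + reads + writes:.2f} USD/month")

That lands around 9 USD a month. Push query volume or read units per query up and the read line dominates quickly, which is the same dynamic behind the reserved-capacity spikes mentioned above.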

1-2. The object-storage shift — Turbopuffer

Simon Eskildsen built Turbopuffer, the engine behind Cursor and Notion, around one rule: object storage is the source of truth, SSD is just a cache. The cost is roughly 70 dollars per TB-month, versus about 600 for triple-replicated SSD and about 1,600 for RAM-cache incumbents. A one- to two-order-of-magnitude difference.

Operating scale, May 2026:

  • 4T+ documents
  • 10M+ writes per second, 25k+ queries per second
  • Query unit prices cut up to 94% (July 2024)

Zilliz pushed back with "storage isn't the whole story — cold misses penalize p99". The counter to the counter is that most production workloads are mostly cold: long-tail search, archive, multilingual indexes. For those, Turbopuffer's model wins on price by an embarrassing margin.

1-3. The "just use Postgres" movement

Timescale (now Tiger Data) shipped pgvectorscale on top of pgvector with a new index type called StreamingDiskANN. On 50M vectors at 768 dimensions:

  • p95 latency: 28x lower than Pinecone s1
  • Throughput: 16x higher
  • Cost: 75% lower when self-hosted on AWS EC2 at 99% recall

Add Statistical Binary Quantization (SBQ) and Filtered DiskANN (label-aware pre-filtering inside the index) and the case gets stronger. Version 0.9.0 in November 2025 added PG18 support and concurrent index builds, and TimescaleDB 2.26.0 in March 2026 brought it to Tiger Cloud.

The headline is short: if you already run Postgres, do not introduce a new database.


2. Managed SaaS — Pinecone Serverless, Turbopuffer, Zilliz Cloud

2-1. Pinecone Serverless

May 2026: still the safest managed default. Free tier covers most prototypes; ops burden is zero. Two real downsides:

  • Lock-in: index format is not interoperable. Migration is upsert-by-upsert.
  • Concurrency bill spikes: reserved-capacity fees activate under sustained load. Agentic workloads that fire tens of queries per second can hit this hard.

When to pick it:

  • No infra team, 1M to 10M vectors
  • "Need results this quarter," nobody to run a system
  • Multi-region and HA are SLA requirements

2-2. Turbopuffer

The company that put "object-storage-first" in our vocabulary. One price book scales from a hobby index to four trillion documents. 2026 published tiers:

  • Launch: 64 USD per month
  • Scale: 256 USD per month
  • Enterprise: contact

Storage at roughly 0.02 USD per GB (S3, GCS pass-through), SSD cache at roughly 0.1 USD per GB, queries and writes usage-based. 100x cheaper cold, 6 to 20x cheaper warm. Cursor and Notion are the production proofs.

Weak spots:

  • Latency distribution: cache misses trigger an S3 fetch and p99 spikes. Not recommended for real-time recommendations.
  • Ecosystem: fewer integrations and tutorials than Pinecone or Qdrant.

When to pick it:

  • 100M+ vectors with heavy cold tail
  • Cost-sensitive search and archive
  • "200ms per query is fine"

2-3. Zilliz Cloud (Milvus 2.6 managed)

Zilliz Cloud went GA with Milvus 2.6.x in January 2026: 0.096 USD per CU-hour compute, 0.02 USD per GB-month storage. Two real strengths:

  • CAGRA GPU index (NVIDIA): hundreds of thousands of QPS on a single-GPU node at 100M scale
  • Hybrid GPU/CPU deployments: GPU builds the graph, CPU serves search

When to pick it:

  • 100M to 10B vectors with access to GPU infra
  • Multi-tenant SaaS with tens of thousands of QPS
  • Already running Spark or NVIDIA RAPIDS pipelines

3. OSS-first engines — Qdrant, Weaviate, Milvus, Vespa, Vald

3-1. Qdrant — Rust, filter-strong

Qdrant in 2026 has two headline events:

  • v1.17 removed RocksDB in favor of gridstore. Direct jumps from v1.15 to v1.17 are not supported; step through the intermediate release.
  • Series B of 50M USD in March 2026: enterprise sales motion.

Strengths:

  • Payload filtering is integrated with the index: precise filters like category = 'X' AND created_at > '2026-01-01' are routinely faster than Weaviate or Pinecone at the same recall (see the sketch below).
  • Rust-native: memory safety, single-binary deploy, one Helm chart and done.
  • Sharding and replication: easy to self-host.
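
A minimal sketch of that integrated filtering with the Python qdrant-client, assuming a collection called docs with category and created_at payload fields (names are illustrative) and payload indexes already created on both:

from datetime import datetime
from qdrant_client import QdrantClient
from qdrant_client.models import DatetimeRange, FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

# The filter travels with the ANN query and is applied inside the index,
# not as a post-filter over an oversized candidate list.
hits = client.query_points(
    collection_name="docs",                  # illustrative collection name
    query=[0.12] * 768,                      # your query embedding
    query_filter=Filter(must=[
        FieldCondition(key="category", match=MatchValue(value="X")),
        FieldCondition(key="created_at", range=DatetimeRange(gt=datetime(2026, 1, 1))),
    ]),
    limit=10,
)
for point in hits.points:
    print(point.id, point.score)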

Weak spots:

  • No modular RAG (no rerankers or generators as a module). You bring the pipeline.

3-2. Weaviate — modular and multi-vector

Weaviate in 2026 leads on hybrid search, ColBERT multi-vector, and reranker modules. BM25F plus dense in a single query with configurable fusion is a first-class citizen. Rerankers and generators live as modules — you can let the DB absorb part of the RAG pipeline.
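
A minimal single-query hybrid sketch with the v4 Python client; the Docs collection name is illustrative, and alpha moves the fusion between pure BM25 (0) and pure vector (1):

import weaviate

client = weaviate.connect_to_local()          # or connect_to_weaviate_cloud(...)
docs = client.collections.get("Docs")         # illustrative collection name

# One request runs BM25 and dense search and fuses the scores.
res = docs.query.hybrid(
    query="how do I rotate API keys",
    alpha=0.5,
    limit=10,
)
for obj in res.objects:
    print(obj.properties)

client.close()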

When to pick it:

  • GraphQL-friendly, OK with the DB hosting part of the pipeline
  • ColBERT v2 style late-interaction multi-vectors
  • Hybrid search in a single query

3-3. Milvus — scale and GPU

OSS Milvus 2.6.x added more flexible CAGRA deployments (GPU plus CPU), decoupled storage and indexer nodes, and a lighter metadata service. Self-hostable in principle, but production almost always ends up on Zilliz Cloud.

3-4. Vespa — real-time search plus vector hybrid

Spun out of Yahoo, Vespa rebranded in 2026 as an "AI Search Platform". The strength is "vectors plus BM25 plus tensors plus ML ranking in one query". Spotify's search runs real-time HNSW on it.

Special features:

  • Tensor storage: multi-dimensional representations (text, image, numeric) in one place
  • Ranking functions are first-class: ML ranking executes inside the DB
  • Highest operational burden: Vespa is not light. You need a search team.

When to pick it:

  • Search is your product
  • Recommendation, ranking, and hybrid all in one query
  • You have Lucene or Solr-era people on staff

3-5. Vald — Yahoo Japan's OSS

Vald is Yahoo JAPAN's Kubernetes-native ANN engine on top of NGT. Internal Japanese deployments demonstrate billion-scale millisecond search. Global momentum has cooled since 2024, but K8s affinity and gRPC-first design still have value. Worth a look if you are in Japan or specifically need NGT.


4. The Postgres camp — pgvector + pgvectorscale + Supabase/Neon/Tiger

4-1. pgvector + pgvectorscale

May 2026: "just use Postgres" is not a joke.

  • pgvector 0.9.x: HNSW and IVFFlat, all in the same place as transactions, JOINs, JSONB.
  • pgvectorscale: StreamingDiskANN. On the 50M-vector bench, p95 28x below Pinecone s1, 75% cheaper.
  • Filtered DiskANN: label-based pre-filtering inside the index. Multi-tenant RAG with "tenant_id = X AND embedding similarity" is the kind of query this was built for (sketched below).
  • SBQ: 1/32-scale memory footprint.
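
To ground the bullets above, a minimal sketch with psycopg; the chunks table, the tenant filter, and the DSN are illustrative, and the vector and vectorscale extensions must already be installed on the server:

import psycopg

setup = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE EXTENSION IF NOT EXISTS vectorscale",
    """CREATE TABLE IF NOT EXISTS chunks (
           id bigserial PRIMARY KEY,
           tenant_id text NOT NULL,
           body text,
           embedding vector(768))""",
    # StreamingDiskANN index from pgvectorscale, cosine distance
    """CREATE INDEX IF NOT EXISTS chunks_embedding_idx
           ON chunks USING diskann (embedding vector_cosine_ops)""",
]

query = """
    SELECT id, body
    FROM chunks
    WHERE tenant_id = %s                 -- relational filter and ANN search in one statement
    ORDER BY embedding <=> %s::vector    -- pgvector cosine distance operator
    LIMIT 10
"""

query_vec = "[" + ",".join(["0"] * 768) + "]"    # your query embedding as a vector literal

with psycopg.connect("postgresql://localhost/rag") as conn:    # illustrative DSN
    for stmt in setup:
        conn.execute(stmt)
    rows = conn.execute(query, ("tenant_42", query_vec)).fetchall()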

Self-hosted weak spots:

  • HNSW builds can take a long time (tens of minutes to hours at tens of millions of vectors).
  • HA and replication: same Postgres ops problem as ever — not trivial.
  • VACUUM and pg_repack strategy has to be rewritten for vector indexes.

4-2. Managed — Supabase, Neon, Tiger Cloud

  • Supabase: pgvector as a first-class citizen, plus auth, Realtime, Edge Functions. Excellent value up to about 100M vectors.
  • Neon: serverless Postgres plus pgvector. Branching and copy-on-write are great for RAG experiments.
  • Tiger Cloud (formerly Timescale): the first-class home for pgvectorscale. Smoothest path for time-series plus vectors.

When to pick this camp:

  • Already running Postgres, under 1B vectors
  • Need vectors in the same transaction as relational data
  • Your infra team refuses to add a new DB (the realistic case)

5. Embedded and analytical — Chroma, LanceDB, DuckDB VSS

5-1. Chroma — developer-friendly, now with Cloud

Chroma is long past 1.0 — 1.5.9 as of early May 2026. Chroma Cloud GA in August 2025, BM25 plus SPLADE sparse vectors as first-class in November 2025, CMEK in January 2026, array metadata with $contains operators in February 2026.

Still the easiest start. pip install chromadb, five minutes to a RAG demo. Above ~100M vectors, pick something else.
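
The five-minute start looks roughly like this; collection name and documents are illustrative, and Chroma embeds the text with its default embedding function unless you pass your own:

import chromadb

client = chromadb.Client()                 # in-memory; use PersistentClient(path=...) to keep data
notes = client.create_collection("notes")  # illustrative collection name

notes.add(
    ids=["a", "b"],
    documents=["pgbouncer connection pooling", "rotating API keys in production"],
)

# The query text is embedded with the collection's embedding function, then searched.
res = notes.query(query_texts=["how do I pool connections?"], n_results=1)
print(res["documents"])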

5-2. LanceDB — Lance columnar, embedded-first

LanceDB is an in-process vector DB on top of the Lance columnar format. Core in Rust, clients in Python, Node, Rust, and a REST API. Positioning is clear: vectors on top of a data lake.

Strengths:

  • In-process and zero-copy: fast without a server
  • Columnar analytics: vectors, text, and metadata in one file, SQL-queryable via DataFrame
  • Versioning: the Lance format supports transactions and versions
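
A minimal sketch of that embedded, zero-server workflow, assuming precomputed embeddings; the table name and local path are illustrative (an s3:// URI works the same way):

import lancedb

db = lancedb.connect("./lance_data")       # or "s3://bucket/path"
tbl = db.create_table(
    "docs",                                 # illustrative table name
    data=[
        {"id": 1, "text": "pgbouncer pooling", "vector": [0.1] * 768},
        {"id": 2, "text": "rotating API keys", "vector": [0.2] * 768},
    ],
)

# ANN search over the vector column plus a SQL-style metadata filter.
hits = tbl.search([0.1] * 768).where("id > 0").limit(5).to_list()
print(hits[0]["text"])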

When to pick it:

  • Notebook, CLI, or local RAG
  • You want vectors next to S3 or GCS data
  • Multimodal ML workflows

5-3. DuckDB VSS — vectors inside analytics

DuckDB's VSS extension adds an HNSW index to ARRAY columns. Two lines: INSTALL vss; LOAD vss;. array_cosine_distance and array_negative_inner_product are index-accelerated.

-- install and load the (experimental) vector similarity search extension
INSTALL vss;
LOAD vss;
-- HNSW indexes work on fixed-size FLOAT[] array columns
CREATE TABLE docs (id INTEGER, embedding FLOAT[768]);
CREATE INDEX idx ON docs USING HNSW (embedding)
  WITH (metric = 'cosine');
-- the ORDER BY ... LIMIT pattern is rewritten into an index scan when the metric matches
SELECT id FROM docs
ORDER BY array_cosine_distance(embedding, ?::FLOAT[768])
LIMIT 5;

VSS is still experimental — not for production. But for vector search inside analytical pipelines (a data scientist materializing an ad-hoc index in a notebook) it is already extremely useful.

When to pick it:

  • Data analysis with a vector-search step
  • "Run ETL once, build an index, query, write the report"

6. Vectors bolted onto general DBs — MongoDB Atlas, Couchbase, Elasticsearch/OpenSearch

General-purpose DBs adding vector columns is the dominant 2024-2026 storyline.

  • MongoDB Atlas Vector Search: managed, document plus vector in one collection. Natural choice for Mongo shops.
  • Couchbase Vector Search: also managed, FTS plus vector combined.
  • Elasticsearch/OpenSearch kNN: BM25 plus dense in one query. Operational cost is real, but obvious if you already run ES.
  • Redis Stack vector index: cache plus session plus vector. Great for small, hot datasets.

The pitch is simple: use what you already operate, add one more column type. Operational consistency is the biggest win.


7. Hybrid search standard — BM25 + dense + rerankers

"Vectors alone are enough" is essentially a dead position in 2026. The standard RAG pipeline is three stages.

Query
  ├─ BM25 search (sparse) ───┐
  ├─ Dense vector search ────┼──> Reciprocal Rank Fusion or weighted sum
  └─ (optional) ColBERT/late ┘
                       Candidates: 50 to 200
                     Cross-encoder reranker
                     (Cohere Rerank 3.5,
                      Voyage Rerank 2.5,
                      Jina Reranker v3,
                      BGE Reranker)
                       Top 5 to 10 to LLM
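
Reciprocal Rank Fusion is simple enough to write out; a minimal sketch that merges a BM25 ranking and a dense ranking by document id, with the usual k=60 smoothing constant:

from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids; higher fused score means better."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc7", "doc2", "doc9"]      # ids from the sparse leg
dense_hits = ["doc2", "doc5", "doc7"]     # ids from the dense leg
candidates = rrf([bm25_hits, dense_hits])[:200]   # hand these to the cross-encoder reranker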

Embedding and reranking landscape (April-2026 MTEB)

  • Embedding — voyage-3-large (Voyage AI): retrieval near the top
  • NV-Embed-v2: in the top tier by overall MTEB average
  • OpenAI text-embedding-3-large: solid general default
  • Cohere embed-v3 plus Rerank v3.5: strong when paired
  • BGE-M3: 100+ languages, the self-hosted multilingual default
  • Nomic Embed v2: lower-cost multilingual
  • Jina Reranker v3: top BEIR nDCG@10 at 61.94, 188ms
  • Cohere Rerank 3.5 / Voyage Rerank 2.5: 600ms-ish average

Operational notes:

  • Do not turn BM25 off. Dense vectors are weak on proper nouns, code, and acronyms.
  • Rerankers must be cross-encoders. Running bi-encoders twice is not the same thing (see the sketch after these notes).
  • You will swap embedding models. Budget for full reindex from day one.
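
Cross-encoder means the model reads each (query, document) pair jointly and emits one relevance score; a minimal sketch with sentence-transformers and an open-weights reranker (the model choice is illustrative, the hosted rerankers above follow the same pattern through their APIs):

from sentence_transformers import CrossEncoder

# A cross-encoder scores query and document together, unlike a bi-encoder
# that embeds them separately and compares vectors.
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")   # illustrative open-weights choice

query = "how do I rotate API keys"
candidates = ["doc about key rotation", "doc about pgbouncer", "doc about RAGAS"]

scores = reranker.predict([(query, doc) for doc in candidates])
top = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:2]
print(top)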

8. Cost matrix — what does 1M vectors actually cost per month

Rough self-estimate: 1M vectors at 1536 dimensions, 10k queries per day, on US or KR managed standard plans or comparable self-host. Real prices depend on commitments and load shape.

| Option | Monthly cost estimate | Notes |
|---|---|---|
| Pinecone Serverless (Standard) | 5 to 30 USD | Free tier fits 1M-1536D in most cases |
| Turbopuffer (Launch) | 64 USD flat + usage | Multiple indexes still fit Launch |
| Zilliz Cloud (Serverless small) | 30 to 80 USD | 0.096 USD per CU-hour plus storage |
| Qdrant Cloud | 25 to 70 USD | Single-node baseline |
| Weaviate Cloud | 25 to 80 USD | Varies with enabled modules |
| pgvector + pgvectorscale on Tiger | 20 to 60 USD | Best when you also need time-series |
| Supabase (Pro plus pgvector) | from 25 USD | Includes auth and Realtime |
| Neon (serverless pgvector) | from 19 USD | Branching is a real superpower |
| LanceDB OSS | 0 USD | Storage cost only (S3) |
| Chroma OSS | 0 USD | Infra cost only |
| DuckDB VSS | 0 USD | In-process, experimental |
| MongoDB Atlas Vector | 30 to 100 USD | From an M10 instance up |
| Elasticsearch (Elastic Cloud) | from 95 USD | BM25 plus kNN |

The gap widens at 100M vectors: Turbopuffer, pgvectorscale, and self-hosted Milvus dominate on price per QPS. Above 1B vectors, you essentially need a distributed engine — Milvus, Vespa, or Vald.


9. Decision diagram — "pgvector vs dedicated"

                          ┌──────────────────────────────┐
                          │  Under 100k vectors,         │──> Chroma / LanceDB / DuckDB VSS
                          │  prototype phase             │
                          └──────────────────────────────┘
                          ┌──────────────────────────────┐
                          │ Already running Postgres?    │
                          └─────────────┬────────────────┘
                          Yes ◀─────────┴─────────▶ No
                          │                          │
                          ▼                          ▼
            ┌────────────────────────┐   ┌──────────────────────────┐
            │ Under 1B vectors,       │   │ No infra team,           │
            │ need transactions/JOINs │   │ "ship this quarter"      │──> Pinecone Serverless
            │ → pgvector +            │   └─────────────┬────────────┘
            │   pgvectorscale         │                 │
            │   (Tiger / Supabase /   │                 ▼
            │    Neon)                │   ┌──────────────────────────┐
            └────────────────────────┘   │ Cold-heavy, cost-sensitive│──> Turbopuffer
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ Self-host OK,             │
                                         │ filter-heavy workload     │──> Qdrant
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ ColBERT multi-vector +    │
                                         │ modular RAG              │──> Weaviate
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ 100M+ vectors + GPU       │──> Milvus / Zilliz
                                         └─────────────┬────────────┘
                                         ┌──────────────────────────┐
                                         │ Search is the product,    │
                                         │ ML ranking integrated     │──> Vespa
                                         └──────────────────────────┘

The default branch is simple.

  1. Already on Postgres? Try pgvector + pgvectorscale first. 90% of RAG ends here.
  2. No ops team? Pinecone Serverless. Free tier is generous.
  3. Cold-heavy? Turbopuffer. Price difference is qualitative.
  4. Filter-heavy: Qdrant. Modular RAG: Weaviate. Scale plus GPU: Milvus or Zilliz. Search is the product: Vespa.
  5. Notebook, CLI, embedded: Chroma, LanceDB, DuckDB VSS.

10. Operational anti-patterns

The mistakes you only learn the hard way.

  • Silent embedding-model swap. Going from text-embedding-ada-002 to voyage-3-large without a full reindex leaves old and new vectors in incompatible spaces, and retrieval quality collapses. Always reindex, and verify with an A/B comparison.
  • Turning BM25 off. Dense vectors are bad at code, proper nouns, and acronyms. If your internal wiki cannot find K8s, pgbouncer, or RAGAS, this is almost certainly the cause.
  • Skipping the reranker. Without a cross-encoder over top-50 candidates, you lose more than 30% of your achievable answer quality.
  • Default HNSW parameters. The default M and ef_construction look fine on small datasets and ruin recall above 1M vectors. Tune them to your dimensionality, distribution, and query shape.
  • Filtering outside the index. Qdrant and pgvectorscale both reward integrated filters. Multi-tenant RAG should pin tenant_id as a payload filter inside the index.
  • Storing blobs in metadata. Putting full text or images in the vector DB drives read units through the roof. Reference only — keep bodies in object storage.
  • No retry or backoff. Managed SaaS will return 429. Exponential backoff with jitter is not optional (a sketch follows this list).
  • Monitoring equals average latency. p50 means little. You need p95, p99, and recall@k together.
  • Ignoring lock-in. Pinecone migrations are upsert-by-upsert. Always keep the embedding source in S3 or GCS in parallel.
  • "Index everything and pray". If 50% of your retrieval pool is noise, no database choice can save the answer quality. Chunking and selection strategy come before DB choice.

11. Epilogue — checklist and the next post

Pre-launch checklist

Twelve questions to answer before going live.

  • Vector count at 6 and 12 months?
  • Dimensionality? 1536, 3072, 768?
  • Daily queries and writes?
  • p95 latency SLO — 100ms, 300ms, 1s?
  • Multi-tenant? How is isolation done — per-index or payload filter?
  • Is BM25 or keyword search needed?
  • Will you add a reranker? Which one?
  • Which embedding model? Likelihood of swap within 6 months?
  • Already on Postgres?
  • Headcount on ops, room to introduce a new system?
  • Monthly cost ceiling?
  • How will you handle lock-in? Where do embedding sources live?

Next post

Next up: "RAG evaluation pipeline hands-on — measuring retriever, reranker, and generator separately with Ragas, Phoenix, and LangSmith". Picking a vector DB is not the end. When answer quality drops, you need to know whether it fell at retrieval recall, rerank precision, or generation faithfulness. Every DB in this post becomes comparable under the same eval frame.

