Author: Youngju Kim (@fjvbn20031)
- Introduction
- Why Vector Databases Exist
- Core Algorithms: HNSW vs IVF
- Pinecone: Zero-Ops Managed SaaS
- Weaviate: Built-In Hybrid Search
- Chroma: Fastest Path to Working Code
- pgvector: Leverage Your Existing PostgreSQL
- Decision Matrix
- Performance Reference
- Portable Abstraction Layer
- Conclusion
Introduction
Building a RAG system means picking a vector database, and the options are overwhelming: Pinecone, Weaviate, Chroma, pgvector, Qdrant, Milvus... Each is optimized for different scenarios. A rapid prototype needs something completely different from a production system handling hundreds of millions of vectors.
This guide gives you real code, honest trade-offs, and a clear decision framework.
Why Vector Databases Exist
Traditional databases are built for exact-match queries. Vector databases are built for similarity search.
```sql
-- Traditional DB: exact match
SELECT * FROM products WHERE id = 123;
SELECT * FROM documents WHERE title = 'AI Guide';

-- Vector DB: semantic similarity
-- "Tell me about AI" query → return the 5 most relevant documents
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
-- Uses the <=> (cosine distance) operator instead of =
```
Vector databases are purpose-built to find "nearest neighbors" among millions of high-dimensional vectors in milliseconds — something impossible with B-tree indexes.
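For intuition, here is what similarity search looks like without any index: a plain linear scan in Python (every name below is invented for illustration). A vector database's whole job is to avoid this O(n) loop:

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_top_k(query, embeddings, k=5):
    """Exact nearest neighbors by scoring every vector: O(n * d) per query."""
    scored = sorted(
        ((i, cosine(query, e)) for i, e in enumerate(embeddings)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:k]

random.seed(0)
docs = [[random.gauss(0, 1) for _ in range(64)] for _ in range(2000)]  # fake embeddings
hits = brute_force_top_k(docs[42], docs, k=3)  # doc 42 matches itself first
```

At 2,000 small vectors this is instant; at millions of 1536-dimensional vectors it is not, which is where the index structures in the next section come in.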
Core Algorithms: HNSW vs IVF
HNSW (Hierarchical Navigable Small World):
- Structure: layered graph (like zoom levels on a map)
- Build time: O(n log n)
- Query time: O(log n), approximate
- Memory: high (the entire graph lives in RAM)
- Accuracy: high, tunable
- Pros: fast queries, high recall, supports dynamic inserts
- Cons: high memory usage, slow builds on large datasets
- Best for: 1M–100M vectors, high recall requirements

IVF (Inverted File Index):
- Structure: k-means clusters, each cluster as a sub-index
- Build time: fast
- Query time: depends on the nprobe parameter
- Memory: lower than HNSW
- Accuracy: increases with nprobe (more clusters searched)
- Pros: memory efficient, can handle billions of vectors
- Cons: requires nprobe tuning; accuracy vs. speed trade-off
- Best for: billion-scale datasets
For most RAG systems (under 10M vectors), HNSW is the better choice.
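To make the IVF idea concrete, here is a toy pure-Python sketch (all names are invented for illustration; real engines use heavily optimized k-means and quantization): vectors are grouped into clusters, and a query scans only the nprobe clusters whose centroids lie nearest. That is exactly the accuracy-vs-speed dial described above.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, k=10, iters=5, seed=0):
    """Tiny Lloyd's k-means: returns centroids plus per-cluster member indices."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: l2(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [vectors[i] for i, c in enumerate(assign) if c == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the inverted lists match the final centroids
    assign = [min(range(k), key=lambda j: l2(v, centroids[j])) for v in vectors]
    clusters = [[] for _ in range(k)]
    for i, c in enumerate(assign):
        clusters[c].append(i)
    return centroids, clusters

def ivf_search(query, vectors, centroids, clusters, nprobe=2, top_k=3):
    """Scan only the nprobe clusters nearest to the query, then rank exactly."""
    probe = sorted(range(len(centroids)), key=lambda j: l2(query, centroids[j]))[:nprobe]
    candidates = [i for j in probe for i in clusters[j]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:top_k]

random.seed(1)
vecs = [[random.gauss(0, 1) for _ in range(8)] for _ in range(500)]
cents, clus = build_ivf(vecs, k=10)
hits = ivf_search(vecs[7], vecs, cents, clus, nprobe=3, top_k=3)
```

Raising nprobe searches more clusters, which raises recall and cost in lockstep; nprobe equal to k degenerates to the brute-force scan.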
Pinecone: Zero-Ops Managed SaaS
```python
from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create index (one-time setup)
pc.create_index(
    name="my-rag-index",
    dimension=1536,   # Must match your embedding model's output
    metric="cosine",  # cosine, euclidean, or dotproduct
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

index = pc.Index("my-rag-index")

# Upsert vectors (embedding_vector is a placeholder for your embedding model's output)
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": embedding_vector,
        "metadata": {
            "text": "original document text",
            "source": "document.pdf",
            "page": 1
        }
    }
])

# Similarity search with metadata filtering
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "document.pdf"}}  # Metadata pre-filter
)

for match in results.matches:
    print(f"Score: {match.score:.3f} | {match.metadata['text'][:100]}")

# Namespace-based multi-tenancy (isolate data per customer)
index.upsert(
    vectors=[{"id": "doc1", "values": embedding, "metadata": {...}}],
    namespace="customer-123"
)
results = index.query(vector=query_emb, top_k=5, namespace="customer-123")
```
Pros: Zero infrastructure management, auto-scaling, 99.99% SLA, fast time-to-market.
Cons: Expensive at scale (at 100M vectors, serverless starts at $70+/month; dedicated pods run hundreds to thousands of dollars per month). Data lives in Pinecone's infrastructure (vendor lock-in). Metadata filtering can degrade performance at large scale.
Best for: Fast production launch, no DevOps resources, cost is less important than speed.
Weaviate: Built-In Hybrid Search
```python
import weaviate
from weaviate.classes.config import Configure, Property, DataType, VectorDistances
import weaviate.classes.query as wq

# Connect to a local Docker instance or Weaviate Cloud Services
client = weaviate.connect_to_local()
# or: client = weaviate.connect_to_wcs(cluster_url="your-url", auth_credentials=...)

# Define the schema
client.collections.create(
    "Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),  # Auto-vectorize on insert
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE
    ),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="page", data_type=DataType.INT)
    ]
)

collection = client.collections.get("Document")

# Insert: Weaviate vectorizes automatically if a vectorizer is configured
collection.data.insert({
    "content": "RAG combines retrieval with language model generation",
    "source": "guide.pdf",
    "page": 1
})

# Pure semantic search
results = collection.query.near_text(
    query="how does retrieval augmented generation work?",
    limit=5,
    return_metadata=["distance"]
)

# Hybrid search: vector + BM25 keyword (Weaviate's superpower)
hybrid_results = collection.query.hybrid(
    query="RAG retrieval system",
    alpha=0.5,  # 0.0 = pure BM25, 1.0 = pure vector, 0.5 = balanced
    limit=5
)

# Filter + semantic search
filtered_results = collection.query.near_text(
    query="AI technology",
    filters=wq.Filter.by_property("page").greater_than(0),
    limit=10
)
```
Pros: Hybrid search (vector + BM25) built in, multimodal support (text, image, audio), auto-vectorization, self-hostable or managed.
Cons: Complex setup, resource-hungry (2GB+ RAM for basic Docker install), steep learning curve.
Best for: Hybrid search required, self-hosted infrastructure, complex schemas.
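Conceptually, hybrid search boils down to score fusion. The sketch below shows the idea behind the alpha parameter by blending min-max-normalized BM25 and vector scores; this is an illustration, not necessarily Weaviate's exact fusion algorithm, and all names are invented:

```python
def normalize(scores):
    """Min-max normalize a list of scores to [0, 1]; constant lists map to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(bm25, vector, alpha=0.5):
    """Blend keyword and vector scores for doc-id -> raw-score dicts.
    alpha=0.0 is pure BM25, alpha=1.0 is pure vector."""
    ids = sorted(set(bm25) | set(vector))
    b = dict(zip(ids, normalize([bm25.get(i, 0.0) for i in ids])))
    v = dict(zip(ids, normalize([vector.get(i, 0.0) for i in ids])))
    fused = {i: (1 - alpha) * b[i] + alpha * v[i] for i in ids}
    return sorted(fused, key=fused.get, reverse=True)

ranked = hybrid_rank(
    bm25={"d1": 12.0, "d2": 3.0, "d3": 0.5},    # raw keyword scores
    vector={"d2": 0.91, "d3": 0.88, "d1": 0.10},  # raw cosine similarities
    alpha=0.5,
)
```

Here `alpha` plays the same role as Weaviate's alpha parameter: the keyword-dominant document and the vector-dominant document trade places as it moves between 0.0 and 1.0.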
Chroma: Fastest Path to Working Code
```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# In-memory (development/testing)
client = chromadb.Client()
# Persistent local storage
client = chromadb.PersistentClient(path="./chroma_db")

# Configure an embedding function (auto-embeds on insert)
embedding_fn = OpenAIEmbeddingFunction(
    api_key="your-openai-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="my_documents",
    embedding_function=embedding_fn,
    metadata={"hnsw:space": "cosine"}
)

# Add documents: embedding happens automatically
collection.add(
    documents=[
        "RAG is retrieval-augmented generation",
        "Embeddings convert text into vectors",
        "Vector databases optimize similarity search"
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"source": "guide.pdf", "page": 1},
        {"source": "guide.pdf", "page": 2},
        {"source": "guide.pdf", "page": 3}
    ]
)

# Query with just text: no manual embedding needed
results = collection.query(
    query_texts=["how does text search work?"],
    n_results=3,
    where={"source": "guide.pdf"}  # Metadata filter
)

print(results["documents"])
print(results["distances"])
```
Pros: Simplest API (5 lines to working search), Python-native, open-source, free, works in Jupyter immediately.
Cons: Not production-ready for large scale (degrades at millions of documents), no distributed processing, no enterprise features.
Best for: Prototypes, demos, development environments, small datasets.
pgvector: Leverage Your Existing PostgreSQL
```sql
-- Install the extension
CREATE EXTENSION vector;

-- Table with a vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    source VARCHAR(255),
    page INTEGER,
    embedding vector(1536),  -- Stores the embedding
    created_at TIMESTAMP DEFAULT NOW()
);

-- HNSW index (critical for performance)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- m: connections per node (higher = more accurate, more memory)
-- ef_construction: build quality (higher = better, slower build)

-- Semantic similarity search
SELECT
    id,
    content,
    source,
    1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1  -- <=> is the cosine distance operator
LIMIT 5;

-- Combine with regular PostgreSQL filters
SELECT id, content, similarity
FROM (
    SELECT
        id,
        content,
        1 - (embedding <=> $1) AS similarity
    FROM documents
    WHERE source = 'guide.pdf'  -- Standard SQL filter
      AND created_at > NOW() - INTERVAL '30 days'
) sub
WHERE similarity > 0.75
ORDER BY similarity DESC
LIMIT 10;
```
Python integration:
```python
import psycopg2
from openai import OpenAI

openai_client = OpenAI()

def index_document(content: str, source: str, page: int, conn) -> None:
    """Embed and store a document"""
    embedding = openai_client.embeddings.create(
        input=content,
        model="text-embedding-3-small"
    ).data[0].embedding
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (content, source, page, embedding) VALUES (%s, %s, %s, %s::vector)",
            (content, source, page, str(embedding))
        )
    conn.commit()

def semantic_search(query: str, conn, top_k: int = 5) -> list:
    """Search documents by semantic similarity"""
    query_embedding = openai_client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    ).data[0].embedding
    with conn.cursor() as cur:
        cur.execute("""
            SELECT id, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (str(query_embedding), str(query_embedding), top_k))
        rows = cur.fetchall()
    return [
        {"id": r[0], "content": r[1], "source": r[2], "similarity": r[3]}
        for r in rows
    ]
```
Pros: Reuse existing PostgreSQL infrastructure, full ACID compliance (atomic updates to vectors and metadata), combine with complex SQL, no additional DB to operate.
Cons: Performance drops compared to dedicated vector DBs above 10-50M vectors, entire HNSW index must fit in RAM for best performance, limited horizontal scaling.
Best for: Teams already on PostgreSQL, data that must join with relational data, cost minimization.
Decision Matrix
| Situation | Recommended | Why |
|---|---|---|
| Prototype / PoC | Chroma | Easiest start, 5 lines of code |
| Team already on PostgreSQL | pgvector | Reuse infrastructure, lowest ops overhead |
| Fast production launch | Pinecone | Zero ops, instant scaling |
| Self-hosted + large scale | Weaviate or Qdrant | Open source, full control |
| Hybrid search required | Weaviate | Native vector + BM25 |
| Billion-scale vectors | Qdrant or Milvus | Optimized for extreme scale |
| Minimize cost | pgvector | No separate DB costs |
Performance Reference
Test: 1M vectors, 1536 dimensions, cosine similarity, top-5 query
Approximate QPS (queries per second):
- Pinecone Serverless: ~2,000 QPS (auto-scales)
- Weaviate (HNSW): ~1,500 QPS (single instance)
- pgvector (HNSW): ~800 QPS (index fully in RAM)
- Chroma: ~200 QPS (single-threaded limit)

Latency p99 at 1M vectors:
- Pinecone: ~50 ms (network included)
- Weaviate: ~30 ms (local)
- pgvector: ~20 ms (index in memory)
- Chroma: ~100 ms+
These numbers vary significantly based on hardware and configuration. Always benchmark with your actual data.
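A minimal harness along these lines is enough to get those two numbers on your own hardware (names are hypothetical; pass any search callable and your own query set):

```python
import time

def benchmark(search, queries, warmup=5):
    """Return mean QPS and p99 latency (in ms) for a search callable."""
    for q in queries[:warmup]:  # warm caches before timing
        search(q)
    latencies = []
    for q in queries:
        t0 = time.perf_counter()
        search(q)
        latencies.append(time.perf_counter() - t0)
    latencies.sort()
    p99 = latencies[min(len(latencies) - 1, int(len(latencies) * 0.99))]
    return {"qps": len(latencies) / sum(latencies), "p99_ms": p99 * 1000}

# Toy stand-in for a real vector search call
stats = benchmark(lambda q: sorted(q)[:5], [[i % 7, i % 3, i % 5] for i in range(200)])
```

For a fair comparison, point it at each candidate store's real query path with production-sized embeddings and a realistic query mix.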
Portable Abstraction Layer
Building a thin abstraction over your vector DB makes migration painless when you outgrow your initial choice.
```python
from abc import ABC, abstractmethod

class VectorStoreBase(ABC):
    @abstractmethod
    def upsert(self, ids: list, embeddings: list, metadatas: list) -> None: ...

    @abstractmethod
    def search(self, query_embedding: list, top_k: int = 5) -> list: ...

    @abstractmethod
    def delete(self, ids: list) -> None: ...

class ChromaVectorStore(VectorStoreBase):
    def __init__(self, collection_name: str):
        import chromadb
        self.collection = chromadb.PersistentClient("./chroma").get_or_create_collection(collection_name)

    def upsert(self, ids, embeddings, metadatas):
        self.collection.upsert(ids=ids, embeddings=embeddings, metadatas=metadatas)

    def search(self, query_embedding, top_k=5):
        r = self.collection.query(query_embeddings=[query_embedding], n_results=top_k)
        return [{"id": i, "score": 1 - d} for i, d in zip(r["ids"][0], r["distances"][0])]

    def delete(self, ids):
        self.collection.delete(ids=ids)

class PineconeVectorStore(VectorStoreBase):
    def __init__(self, index_name: str):
        from pinecone import Pinecone
        self.index = Pinecone(api_key="your-key").Index(index_name)

    def upsert(self, ids, embeddings, metadatas):
        self.index.upsert([{"id": i, "values": e, "metadata": m}
                           for i, e, m in zip(ids, embeddings, metadatas)])

    def search(self, query_embedding, top_k=5):
        r = self.index.query(vector=query_embedding, top_k=top_k)
        return [{"id": m.id, "score": m.score} for m in r.matches]

    def delete(self, ids):
        self.index.delete(ids=ids)

# Business logic only touches the interface
def build_rag_pipeline(provider: str) -> VectorStoreBase:
    stores = {"chroma": ChromaVectorStore, "pinecone": PineconeVectorStore}
    return stores[provider]("my_collection")
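One payoff of the interface: a throwaway in-memory store lets unit tests exercise the whole pipeline without any external service. This sketch is invented here (duck-typed to the same upsert/search/delete methods rather than formally subclassing the base class, so it runs standalone):

```python
import math

class InMemoryVectorStore:
    """In-memory stand-in for a vector store; handy in unit tests."""
    def __init__(self):
        self._rows = {}  # id -> (embedding, metadata)

    def upsert(self, ids, embeddings, metadatas):
        for i, e, m in zip(ids, embeddings, metadatas):
            self._rows[i] = (e, m)

    def search(self, query_embedding, top_k=5):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb)
        scored = [{"id": i, "score": cosine(query_embedding, e)}
                  for i, (e, _) in self._rows.items()]
        return sorted(scored, key=lambda r: -r["score"])[:top_k]

    def delete(self, ids):
        for i in ids:
            self._rows.pop(i, None)

store = InMemoryVectorStore()
store.upsert(["a", "b"], [[1.0, 0.0], [0.0, 1.0]], [{}, {}])
top = store.search([0.9, 0.1], top_k=1)  # nearest to "a"
```

Swapping this in during tests, and a real store in production, is exactly the migration path the abstraction is meant to protect.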
Conclusion
There's no universally right vector database — it depends on your constraints.
Start fast: Use Chroma for the prototype. It gets you to working search in 10 minutes. Migrate when you need to.
Already on PostgreSQL: Try pgvector first. It handles millions of vectors comfortably, and your ops burden stays near zero.
No DevOps resources: Pinecone Serverless. The cost is real, but so is the time you save not managing infrastructure.
Large-scale self-hosted: Evaluate Weaviate (if you need hybrid search) or Qdrant (if you need pure vector performance at scale).
The one principle that holds regardless of which you choose: build a thin abstraction over it from day one. Switching vector databases is inevitable as requirements evolve.