Vector DB Complete Comparison 2025: Which to Choose Among Pinecone, Weaviate, Chroma, and pgvector

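Introduction
Why Do You Need a Vector DB?
Core Algorithms: HNSW vs IVF
Detailed Comparison of Each DB
Decision Matrix
Performance Comparison (Same-Condition Benchmark)
Migration Strategy
Conclusion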

Introduction

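When you build a RAG system, sooner or later you have to choose a vector database. Pinecone, Weaviate, Chroma, pgvector, Qdrant... the options are overwhelming.

Each tool is optimized for a different situation. Building a prototype quickly and running a production environment with hundreds of millions of vectors call for completely different choices. This article compares each DB on practical criteria and presents a decision framework.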

Why Do You Need a Vector DB?

The value of a vector DB becomes clear once you understand how it differs from a conventional database.

-- Conventional DB: exact match
SELECT * FROM products WHERE id = 123;
SELECT * FROM documents WHERE title = 'AI Guide';

-- Vector DB: semantic similarity search (pgvector syntax)
-- Query "Tell me about AI" → return the 5 most relevant documents
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
-- Uses the "<=>" (cosine distance) operator instead of "="

Vector DBs are specialized for finding the "nearest neighbors" among millions of high-dimensional vectors within milliseconds.
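To make "nearest neighbor" concrete, here is a brute-force sketch in plain Python. It computes cosine distance (1 minus cosine similarity, the quantity pgvector's `<=>` returns) against every stored vector and keeps the k smallest. The three-dimensional vectors and document ids are made up for illustration. This exact scan costs O(n·d) per query, and that is the work HNSW and IVF indexes avoid by giving up a little accuracy.

```python
import math

def cosine_distance(a: list, b: list) -> float:
    """1 - cosine similarity: the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def exact_knn(query: list, vectors: dict, k: int = 5) -> list:
    """Brute-force nearest neighbors: score every stored vector, keep the k closest.

    Cost is O(n * d) per query, the full scan that HNSW and IVF indexes avoid.
    """
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]

# Toy 3-dimensional "embeddings" (real models produce hundreds or thousands of dims)
docs = {
    "ai-guide": [0.9, 0.1, 0.0],
    "cooking":  [0.0, 0.2, 0.9],
    "ml-intro": [0.8, 0.3, 0.1],
}
for doc_id, distance in exact_knn([1.0, 0.2, 0.0], docs, k=2):
    print(doc_id, round(distance, 4))
```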

Core Algorithms: HNSW vs IVF

To understand vector DBs, you need to know their core indexing algorithms.

HNSW (Hierarchical Navigable Small World):
─────────────────────────────────────────
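Structure: hierarchical graph (like zoom levels on a map)
Build time: O(n log n)
Query time: O(log n) approximate
Memory: high (the whole graph is kept in RAM)
Accuracy: high (tunable)

Pros: fast queries, high accuracy, supports dynamic inserts
Cons: large memory footprint, long build times
Best for: millions to hundreds of millions of vectors, high recall requirements

IVF (Inverted File Index):
──────────────────────────
Structure: k-means clustering; each vector is stored in the inverted list of its nearest centroid
Build time: fast (k-means training plus one assignment pass)
Query time: depends on the nprobe parameter (number of clusters scanned)
Memory: lower than HNSW
Accuracy: improves as nprobe increases

Pros: memory-efficient, can handle billions of vectors
Cons: nprobe must be tuned at query time; accuracy-speed tradeoff
Best for: ultra-large datasets of billions of vectors or more

For most RAG systems (a few million vectors or fewer), HNSW is the better choice.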
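The IVF idea above fits in a few dozen lines of plain Python. This is a toy built on several assumptions: plain Lloyd's k-means with a fixed iteration count, squared Euclidean distance, and in-memory lists. `ToyIVF` and its parameters are hypothetical names and do not belong to any library's API. The part worth reading is the query path: rank the centroids, then scan only the `nprobe` closest inverted lists.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance; for unit-length vectors it ranks like cosine."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(v, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))

def kmeans(vectors, n_clusters, iters=10, seed=0):
    """Plain Lloyd's k-means with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

class ToyIVF:
    def __init__(self, vectors, n_clusters, seed=0):
        self.centroids = kmeans(vectors, n_clusters, seed=seed)
        # Inverted lists: cluster index -> [(vector id, vector), ...]
        self.lists = [[] for _ in range(n_clusters)]
        for vid, v in enumerate(vectors):
            self.lists[nearest_centroid(v, self.centroids)].append((vid, v))

    def search(self, query, k=5, nprobe=1):
        # 1) rank clusters by centroid distance and keep the nprobe closest
        order = sorted(range(len(self.centroids)), key=lambda c: sq_dist(query, self.centroids[c]))
        # 2) scan only the vectors stored in those clusters
        candidates = [(vid, sq_dist(query, v)) for c in order[:nprobe] for vid, v in self.lists[c]]
        candidates.sort(key=lambda pair: pair[1])
        return [vid for vid, _ in candidates[:k]]

rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
ivf = ToyIVF(data, n_clusters=10)
query = data[0]
print(ivf.search(query, k=3, nprobe=1))   # scans roughly 1/10 of the data
print(ivf.search(query, k=3, nprobe=10))  # scans every cluster: same as exact search
```

When `nprobe` equals the number of clusters, every list is scanned and the result matches an exact search. Smaller values scan less data but can miss neighbors that landed in an adjacent cluster. That is the accuracy-speed tradeoff listed above.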

Detailed Comparison of Each DB

Pinecone: Zero-Ops Managed SaaS

from pinecone import Pinecone, ServerlessSpec

# Initialize the client
pc = Pinecone(api_key="your-api-key")

# Create the index (one-time setup; use pc.Index() below to get a handle for reads and writes)
pc.create_index(
    name="my-rag-index",
    dimension=1536,          # must match your embedding model's output dimension
    metric="cosine",         # choose cosine, euclidean, or dotproduct
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

# Upsert vectors (insert or update); embedding_vector, embedding_vector2 and
# query_embedding are embeddings produced by your model
index = pc.Index("my-rag-index")
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": embedding_vector,
        "metadata": {
            "text": "original text content",
            "source": "document.pdf",
            "page": 1,
            "created_at": "2025-01-01"
        }
    },
    {
        "id": "doc2",
        "values": embedding_vector2,
        "metadata": {"text": "second document content"}
    }
])

# Similarity search
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "document.pdf"}}  # metadata filtering
)

for match in results.matches:
    print(f"ID: {match.id}, Score: {match.score:.3f}")
    print(f"Text: {match.metadata['text'][:100]}")

# Multi-tenancy via namespaces
index.upsert(
    vectors=[...],
    namespace="customer-123"  # isolated per customer
)

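Pros:

Zero ops: infrastructure management, scaling, and backups are automatic
Automatically handles tens of thousands of queries per second
99.99% SLA

Cons:

Expensive at scale: serverless pricing is usage-based (storage plus reads and writes), and 100M 1536-dimension float32 vectors come to roughly 600 GB of raw data; dedicated pods cost hundreds to thousands of dollars per month
Data is stored on Pinecone's infrastructure (vendor lock-in)
Performance may degrade on complex filtering queries

Best for: fast production launch, no DevOps resources, speed matters more than cost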

Weaviate: The Hybrid Search Powerhouse

import weaviate
from weaviate.classes.config import Configure, Property, DataType, VectorDistances

# Local connection (running in Docker) or Weaviate Cloud.
# text2vec_openai needs an OpenAI API key, passed as a request header.
client = weaviate.connect_to_local(headers={"X-OpenAI-Api-Key": "your-openai-key"})
# or: client = weaviate.connect_to_weaviate_cloud(cluster_url="your-url", auth_credentials=..., headers=...)

# Define the schema (create a collection)
client.collections.create(
    "Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),  # automatic embedding
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE
    ),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="page", data_type=DataType.INT)
    ]
)

# Insert a document (embeddings are generated automatically when a vectorizer is configured)
collection = client.collections.get("Document")
collection.data.insert({
    "content": "AI is changing the world",
    "source": "article.pdf",
    "page": 1
})

# Semantic search
results = collection.query.near_text(
    query="AI technology trends",
    limit=5,
    return_metadata=["distance"]  # near_text reports distance; score applies to hybrid/BM25 queries
)

# Hybrid search (vector + BM25 keywords)
hybrid_results = collection.query.hybrid(
    query="AI technology trends",
    alpha=0.5,   # 0: pure BM25, 1: pure vector, 0.5: balanced
    limit=5
)

# Semantic search combined with a property filter
import weaviate.classes.query as wq
complex_results = collection.query.near_text(
    query="AI technology trends",
    filters=wq.Filter.by_property("page").greater_than(0),
    limit=10
)

client.close()  # release the connection when done

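Pros:

Built-in hybrid search (vector + keyword)
Multimodal support (text, image, audio)
Automatic vectorization (no embedding code needed when a vectorizer is configured)
Available as both open source and a managed cloud service

Cons:

Complex configuration and heavy resource use
A basic Docker install needs 2GB+ RAM
Steep learning curve

Best for: hybrid search required, you can run your own infrastructure, complex schemas

Chroma: The Easiest Starting Point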

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Option 1: in-memory (for development/testing)
client = chromadb.Client()

# Option 2: persistent storage in local files (used below)
client = chromadb.PersistentClient(path="./chroma_db")

# Set up the embedding function (automatic embedding)
embedding_fn = OpenAIEmbeddingFunction(
    api_key="your-openai-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="my_documents",
    embedding_function=embedding_fn,
    metadata={"hnsw:space": "cosine"}
)

# Add documents (text is embedded automatically)
collection.add(
    documents=[
        "RAG is a retrieval-augmented generation technique",
        "Embeddings convert text into vectors",
        "Vector DBs are optimized for similarity search"
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"source": "guide.pdf", "page": 1},
        {"source": "guide.pdf", "page": 2},
        {"source": "guide.pdf", "page": 3}
    ]
)

# Query (providing text is enough)
results = collection.query(
    query_texts=["Tell me about text search techniques"],
    n_results=3,
    where={"source": "guide.pdf"}  # metadata filter
)

print(results["documents"])
print(results["distances"])

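Pros:

Simplest API; about 5 lines of code to get started
Python-native, runs directly in Jupyter
Open source and free

Cons:

Not suited to large-scale production (performance degrades beyond millions of vectors)
No distributed processing support
Lacks enterprise features

Best for: prototypes, demos, development environments, small datasets

pgvector: If You Already Use PostgreSQL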

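-- Install the PostgreSQL extension
CREATE EXTENSION vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    source VARCHAR(255),
    page INTEGER,
    embedding vector(1536),          -- embedding column
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create an HNSW index (the key to query performance)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- m: max connections per node (higher = more accurate, more memory); pgvector default 16
-- ef_construction: candidate list size during build (higher = better graph, slower build); default 64

-- Cosine similarity search
SELECT
    id,
    content,
    source,
    1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1  -- <=> is cosine distance
LIMIT 5;

-- Combined with metadata filtering.
-- Keep ORDER BY on the distance expression plus a LIMIT so the HNSW index can be used;
-- the similarity threshold is then applied to that top-k in the outer query.
-- HNSW applies the WHERE filter after the index scan, so if fewer rows than expected
-- come back, raise hnsw.ef_search (default 40), e.g. SET hnsw.ef_search = 100;
SELECT id, content, similarity
FROM (
    SELECT
        id,
        content,
        1 - (embedding <=> $1) AS similarity
    FROM documents
    WHERE source = 'guide.pdf'   -- ordinary PostgreSQL filter
    ORDER BY embedding <=> $1
    LIMIT 10
) sub
WHERE similarity > 0.8           -- similarity threshold
ORDER BY similarity DESC;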

Using it from Python:

import psycopg2
from openai import OpenAI

client_openai = OpenAI()

def semantic_search(query: str, conn, top_k: int = 5) -> list:
    """Semantic search using pgvector"""

    # Create the query embedding
    query_embedding = client_openai.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    ).data[0].embedding

    # Convert the vector to pgvector's text format, e.g. '[0.1, 0.2, ...]'
    embedding_str = str(query_embedding)

    with conn.cursor() as cur:
        cur.execute("""
            SELECT id, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (embedding_str, embedding_str, top_k))

        results = cur.fetchall()
        return [
            {"id": r[0], "content": r[1], "source": r[2], "similarity": r[3]}
            for r in results
        ]

# Usage example
conn = psycopg2.connect("postgresql://user:pass@localhost/mydb")
results = semantic_search("AI technology trends", conn)
for r in results:
    print(f"[{r['similarity']:.3f}] {r['content'][:80]}")

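Pros:

Reuses existing PostgreSQL infrastructure
ACID transactions (vectors and ordinary data are updated atomically)
Combines naturally with complex SQL queries
Lower operating costs

Cons:

Falls behind dedicated vector DBs in performance beyond tens of millions of vectors
The index must stay resident in memory for best performance
Limited distributed processing

Best for: already using PostgreSQL, data must be combined with relational data, cost savings come first

Decision Matrix

Situation	Best choice	Reason
Prototype/PoC	Chroma	Easiest to start
PostgreSQL team	pgvector	Reuses infrastructure, low operating cost
Fast production launch	Pinecone	Zero ops, instant scaling
Self-hosted + large scale	Weaviate or Qdrant	Open source, full control
Hybrid search required	Weaviate	Native vector + BM25 support
Billions of vectors or more	Qdrant or Milvus	Optimized for extreme scale
Minimizing cost	pgvector	No separate DB cost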

Performance Comparison (Same-Condition Benchmark)

Test conditions: 1M vectors, 1536 dimensions, cosine similarity, top-5 search

Queries per second (QPS):
─────────────────────────────────────────
Pinecone Serverless:  ~2,000 QPS (auto-scaling)
Weaviate (HNSW):      ~1,500 QPS (single instance)
pgvector (HNSW):      ~800 QPS (single instance, index loaded in RAM)
Chroma:               ~200 QPS (single-thread limit)

Latency (p99, 1M vectors):
─────────────────────────────────────────
Pinecone:   ~50ms
Weaviate:   ~30ms (local)
pgvector:   ~20ms (index fully loaded in memory)
Chroma:     ~100ms+

Caution: these numbers vary greatly depending on the environment. Running the benchmark yourself on your real data is strongly recommended.
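When you run your own benchmark, measure recall alongside latency, because an index can look fast simply by returning worse neighbors. Below is a minimal harness sketch. It assumes you can wrap the database query in a function `search(query) -> list of ids` and can compute ground truth with an exact, brute-force search. `benchmark` and `recall_at_k` are hypothetical helpers. The single-client QPS they report is not comparable to throughput measured under concurrent load.

```python
import statistics
import time
from typing import Callable, Sequence

def recall_at_k(approx_ids: list, exact_ids: list, k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    return len(truth & set(approx_ids[:k])) / len(truth)

def benchmark(search: Callable[[list], list], exact: Callable[[list], list],
              queries: Sequence[list], k: int = 5) -> dict:
    """Run each query once, sequentially, and report recall@k plus latency stats."""
    latencies_ms, recalls = [], []
    for q in queries:
        start = time.perf_counter()
        approx_ids = search(q)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        recalls.append(recall_at_k(approx_ids, exact(q), k))
    total_s = max(sum(latencies_ms) / 1000, 1e-9)  # guard against a zero timer reading
    return {
        "recall@k": statistics.mean(recalls),
        # quantiles(n=100) returns the 1st..99th percentile cut points; index 98 is p99
        "p99_ms": statistics.quantiles(latencies_ms, n=100)[98],
        "qps_single_client": len(queries) / total_s,
    }

# Usage with stand-in functions (replace with your DB client and an exact search)
exact = lambda q: [0, 1, 2, 3, 4]
approx = lambda q: [0, 1, 2, 3, 9]  # misses one true neighbor
report = benchmark(approx, exact, queries=[[0.0]] * 20, k=5)
print(report["recall@k"])  # 0.8
```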

Migration Strategy

An abstraction layer in case you need to swap vector DBs later:

from abc import ABC, abstractmethod

class VectorStoreBase(ABC):
    """Vector DB abstraction layer"""

    @abstractmethod
    def upsert(self, ids: list, embeddings: list, metadatas: list) -> None:
        pass

    @abstractmethod
    def search(self, query_embedding: list, top_k: int = 5) -> list:
        pass

    @abstractmethod
    def delete(self, ids: list) -> None:
        pass


class ChromaVectorStore(VectorStoreBase):
    def __init__(self, collection_name: str):
        import chromadb
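        self.client = chromadb.PersistentClient(path="./chroma_db")
        # cosine space so that 1 - distance in search() is a similarity (Chroma's default space is L2)
        self.collection = self.client.get_or_create_collection(
            collection_name, metadata={"hnsw:space": "cosine"}
        )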

    def upsert(self, ids, embeddings, metadatas):
        self.collection.upsert(ids=ids, embeddings=embeddings, metadatas=metadatas)

    def search(self, query_embedding, top_k=5):
        results = self.collection.query(query_embeddings=[query_embedding], n_results=top_k)
        return [{"id": id_, "score": 1-dist}
                for id_, dist in zip(results["ids"][0], results["distances"][0])]

    def delete(self, ids):
        self.collection.delete(ids=ids)


class PineconeVectorStore(VectorStoreBase):
    def __init__(self, index_name: str):
        from pinecone import Pinecone
        pc = Pinecone(api_key="your-key")
        self.index = pc.Index(index_name)

    def upsert(self, ids, embeddings, metadatas):
        vectors = [{"id": id_, "values": emb, "metadata": meta}
                   for id_, emb, meta in zip(ids, embeddings, metadatas)]
        self.index.upsert(vectors=vectors)

    def search(self, query_embedding, top_k=5):
        results = self.index.query(vector=query_embedding, top_k=top_k)
        return [{"id": m.id, "score": m.score} for m in results.matches]

    def delete(self, ids):
        self.index.delete(ids=ids)


# Swap the implementation depending on the environment
def get_vector_store(provider: str) -> VectorStoreBase:
    if provider == "chroma":
        return ChromaVectorStore("my_collection")
    elif provider == "pinecone":
        return PineconeVectorStore("my-index")
    else:
        raise ValueError(f"Unknown provider: {provider}")

With this pattern, you can start with Chroma and later migrate to Pinecone or Weaviate with minimal changes to business logic.
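The abstraction covers reads and writes, but a migration still needs a one-time copy loop. Below is a minimal sketch with two assumptions. First, you can export `(id, embedding, metadata)` tuples from the old store; the interface above has no listing method, so that export is left to the source DB's own tooling. Second, the target uses the same embedding model and dimension; if not, everything must be re-embedded. `migrate` and `batched` are hypothetical helpers. Python 3.12 ships `itertools.batched`, but it is written out here so the code runs on older versions.

```python
from typing import Iterable, Iterator

def batched(records: Iterable[tuple], size: int) -> Iterator[list]:
    """Yield lists of at most `size` records."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def migrate(records: Iterable[tuple], target, batch_size: int = 100) -> int:
    """Copy (id, embedding, metadata) records into any object with
    upsert(ids, embeddings, metadatas), such as a VectorStoreBase implementation."""
    copied = 0
    for batch in batched(records, batch_size):
        ids, embeddings, metadatas = (list(col) for col in zip(*batch))
        target.upsert(ids, embeddings, metadatas)
        copied += len(ids)
    return copied

class RecordingTarget:
    """Stand-in target that records the ids of each upsert call."""
    def __init__(self):
        self.calls = []

    def upsert(self, ids, embeddings, metadatas):
        self.calls.append(ids)

target = RecordingTarget()
rows = [(f"doc{i}", [float(i), 0.0], {"page": i}) for i in range(250)]
print(migrate(rows, target, batch_size=100))  # 250
print([len(c) for c in target.calls])        # [100, 100, 50]
```

Batching keeps each request within the size limits that hosted services impose. Because upserts are idempotent by id, a failed run can simply be restarted.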

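Conclusion

There is no single right answer when choosing a vector DB; it depends on your situation.

If you need to start fast: prototype with Chroma, then move to a suitable production DB once the requirements are clear.

If you already have PostgreSQL: try pgvector first. It is sufficient up to a few million vectors and has the lowest operational complexity.

If your team has no operations resources: start with Pinecone Serverless. If cost becomes a problem later, consider migrating then.

If you have large-scale in-house infrastructure: evaluate Weaviate or Qdrant. If you need hybrid search, Weaviate is the strongest candidate.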

Vector Database Comparison 2025: Pinecone vs Weaviate vs Chroma vs pgvector

Introduction
Why Vector Databases Exist
Core Algorithms: HNSW vs IVF
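Pinecone: Zero-Ops Managed SaaS
Weaviate: Built-In Hybrid Search
Chroma: Fastest Path to Working Code
pgvector: Leverage Your Existing PostgreSQL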
Decision Matrix
Performance Reference
Portable Abstraction Layer
Conclusion

Introduction

Building a RAG system means picking a vector database, and the options are overwhelming: Pinecone, Weaviate, Chroma, pgvector, Qdrant, Milvus... Each is optimized for different scenarios. A rapid prototype needs something completely different from a production system handling hundreds of millions of vectors.

This guide gives you real code, honest trade-offs, and a clear decision framework.

Why Vector Databases Exist

Traditional databases are built for exact-match queries. Vector databases are built for similarity search.

-- Traditional DB: exact match
SELECT * FROM products WHERE id = 123;
SELECT * FROM documents WHERE title = 'AI Guide';

-- Vector DB: semantic similarity
-- "Tell me about AI" query → return 5 most relevant documents
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 5;
-- Uses "<=> (cosine distance)" not "=" operator

Vector databases are purpose-built to find "nearest neighbors" among millions of high-dimensional vectors in milliseconds, something B-tree indexes (designed for exact matches and range scans) cannot do efficiently.

Core Algorithms: HNSW vs IVF

HNSW (Hierarchical Navigable Small World):
──────────────────────────────────────────
Structure: Layered graph (like zoom levels on a map)
Build time: O(n log n)
Query time: O(log n) approximate
Memory:     High (entire graph lives in RAM)
Accuracy:   High, tunable

Pros: Fast queries, high recall, supports dynamic inserts
Cons: High memory usage, slow build on large datasets
Best for: 1M–100M vectors, high recall requirements

IVF (Inverted File Index):
───────────────────────────
Structure: k-means clusters; each vector is stored in the inverted list of its nearest centroid
Build time: Fast (k-means training plus one assignment pass)
Query time: Depends on nprobe parameter
Memory:     Lower than HNSW
Accuracy:   Increases with nprobe (more clusters searched)

Pros: Memory efficient, can handle billions of vectors
Cons: Requires nprobe tuning, accuracy vs speed tradeoff
Best for: Billion-scale datasets

For most RAG systems (under 10M vectors), HNSW is the better choice.
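The "layered graph" description hides how simple the query step is. On each HNSW layer, search is a greedy best-first walk. It expands the closest unexplored node, keeps the `ef` best candidates seen so far, and stops when nothing closer remains. The sketch below shows that walk on a single layer. Several assumptions apply. The graph is a brute-force k-nearest-neighbor graph, not HNSW's incremental insertion with neighbor pruning. There is no upper layer to choose the entry point. `greedy_search` and `build_knn_graph` are hypothetical names. `ef` plays the role of HNSW's `ef_search` (pgvector's `hnsw.ef_search`).

```python
import heapq
import math

def build_knn_graph(vectors, m):
    """Link each node to its m nearest neighbors, in both directions (toy, O(n^2))."""
    graph = {i: set() for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        others = sorted((j for j in range(len(vectors)) if j != i),
                        key=lambda j: math.dist(v, vectors[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)  # HNSW also adds the reverse edge when it links a node
    return graph

def greedy_search(graph, vectors, query, entry, k=5, ef=16):
    """Best-first search over one proximity-graph layer, HNSW style.

    candidates: min-heap of (distance, node) still to expand, closest first
    best:       max-heap (negated distances) holding the ef closest nodes seen
    """
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]
    best = [(-d0, entry)]
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # nearest unexpanded node is farther than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = math.dist(query, vectors[nb])
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the farthest kept node
    return [node for _, node in sorted((-neg_d, n) for neg_d, n in best)][:k]

# Points on a line, walked from the far end (entry=0) toward the query
points = [[float(i), 0.0] for i in range(20)]
graph = build_knn_graph(points, m=2)
print(greedy_search(graph, points, [13.2, 0.0], entry=0, k=3, ef=4))
```

A larger `ef` keeps more candidates alive, so the walk explores more of the graph: recall goes up and queries get slower. The upper HNSW layers exist to give this walk a starting point that is already close to the query.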

Pinecone: Zero-Ops Managed SaaS

from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create index (one-time setup)
pc.create_index(
    name="my-rag-index",
    dimension=1536,          # Must match your embedding model's output
    metric="cosine",         # cosine, euclidean, or dotproduct
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

index = pc.Index("my-rag-index")

# Upsert vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": embedding_vector,
        "metadata": {
            "text": "original document text",
            "source": "document.pdf",
            "page": 1
        }
    }
])

# Similarity search with metadata filtering
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "document.pdf"}}  # Metadata pre-filter
)

for match in results.matches:
    print(f"Score: {match.score:.3f} | {match.metadata['text'][:100]}")

# Namespace-based multi-tenancy (isolate data per customer)
index.upsert(
    vectors=[{"id": "doc1", "values": embedding_vector, "metadata": {...}}],
    namespace="customer-123"
)
results = index.query(vector=query_embedding, top_k=5, namespace="customer-123")

Pros: Zero infrastructure management, auto-scaling, 99.99% SLA, fast time-to-market.

Cons: Expensive at scale. Serverless pricing is usage-based (storage plus reads and writes), and 100M 1536-dimension float32 vectors come to roughly 600 GB of raw data; dedicated pods cost hundreds to thousands of dollars per month. Data lives in Pinecone's infrastructure (vendor lock-in). Metadata filtering can degrade performance at large scale.

Best for: Fast production launch, no DevOps resources, cost is less important than speed.

Weaviate: Built-In Hybrid Search

import weaviate
from weaviate.classes.config import Configure, Property, DataType, VectorDistances

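# Connect to a local Docker instance or Weaviate Cloud.
# text2vec_openai needs an OpenAI API key, passed as a request header.
client = weaviate.connect_to_local(headers={"X-OpenAI-Api-Key": "your-openai-key"})
# or: client = weaviate.connect_to_weaviate_cloud(cluster_url="your-url", auth_credentials=..., headers=...)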

# Define schema
client.collections.create(
    "Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),  # Auto-vectorize on insert
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE
    ),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="page", data_type=DataType.INT)
    ]
)

collection = client.collections.get("Document")

# Insert — Weaviate vectorizes automatically if vectorizer is configured
collection.data.insert({
    "content": "RAG combines retrieval with language model generation",
    "source": "guide.pdf",
    "page": 1
})

# Pure semantic search
results = collection.query.near_text(
    query="how does retrieval augmented generation work?",
    limit=5,
    return_metadata=["distance"]
)

# Hybrid search: vector + BM25 keyword (Weaviate's superpower)
hybrid_results = collection.query.hybrid(
    query="RAG retrieval system",
    alpha=0.5,  # 0.0 = pure BM25, 1.0 = pure vector, 0.5 = balanced
    limit=5
)

# Filter + semantic search
import weaviate.classes.query as wq
filtered_results = collection.query.near_text(
    query="AI technology",
    filters=wq.Filter.by_property("page").greater_than(0),
    limit=10
)

client.close()  # release the connection when done

Pros: Hybrid search (vector + BM25) built in, multimodal support (text, image, audio), auto-vectorization, self-hostable or managed.

Cons: Complex setup, resource-hungry (2GB+ RAM for basic Docker install), steep learning curve.

Best for: Hybrid search required, self-hosted infrastructure, complex schemas.
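How does `alpha` blend two very different score scales? Weaviate documents one approach as relative score fusion. It min-max normalizes each result list to [0, 1] and then computes `alpha * vector + (1 - alpha) * keyword`. The sketch below implements that formula in plain Python, for illustration only. The exact normalization and fusion Weaviate applies may differ by version, and `hybrid_fuse` is a hypothetical helper, not part of the client library. Vector scores are assumed to be similarities, where higher is better.

```python
def min_max(scores: dict) -> dict:
    """Rescale one result list's scores to [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_fuse(vector_scores: dict, keyword_scores: dict,
                alpha: float = 0.5, limit: int = 5) -> list:
    """alpha=1.0 ranks by vector similarity only, alpha=0.0 by BM25 only.

    A document missing from one list contributes 0 for that side.
    """
    vec, kw = min_max(vector_scores), min_max(keyword_scores)
    fused = {doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
             for doc in vec.keys() | kw.keys()}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:limit]

# Cosine similarities from the vector side, raw BM25 scores from the keyword side
vector_scores = {"a": 0.92, "b": 0.85, "c": 0.40}
bm25_scores = {"c": 7.0, "a": 4.0, "d": 1.0}
for alpha in (0.0, 0.5, 1.0):
    print(alpha, [doc for doc, _ in hybrid_fuse(vector_scores, bm25_scores, alpha)])
```

Document "c" ranks poorly on meaning but matches the keywords strongly. At `alpha=0.5` it moves ahead of "b", which only the vector side liked. That is the behavior the hybrid query is meant to provide.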

Chroma: Fastest Path to Working Code

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Option 1: in-memory (development/testing)
client = chromadb.Client()

# Option 2: persistent local storage (used below)
client = chromadb.PersistentClient(path="./chroma_db")

# Configure embedding function (auto-embeds on insert)
embedding_fn = OpenAIEmbeddingFunction(
    api_key="your-openai-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="my_documents",
    embedding_function=embedding_fn,
    metadata={"hnsw:space": "cosine"}
)

# Add documents — embedding happens automatically
collection.add(
    documents=[
        "RAG is retrieval-augmented generation",
        "Embeddings convert text into vectors",
        "Vector databases optimize similarity search"
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"source": "guide.pdf", "page": 1},
        {"source": "guide.pdf", "page": 2},
        {"source": "guide.pdf", "page": 3}
    ]
)

# Query with just text — no manual embedding needed
results = collection.query(
    query_texts=["how does text search work?"],
    n_results=3,
    where={"source": "guide.pdf"}  # Metadata filter
)

print(results["documents"])
print(results["distances"])

Pros: Simplest API (5 lines to working search), Python-native, open-source, free, works in Jupyter immediately.

Cons: Not production-ready for large scale (degrades at millions of documents), no distributed processing, no enterprise features.

Best for: Prototypes, demos, development environments, small datasets.

pgvector: Leverage Your Existing PostgreSQL

-- Install the extension
CREATE EXTENSION vector;

-- Table with vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    source VARCHAR(255),
    page INTEGER,
    embedding vector(1536),      -- Stores the embedding
    created_at TIMESTAMP DEFAULT NOW()
);

-- HNSW index (critical for performance)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- m: max connections per node (higher = more accurate, more memory); pgvector default 16
-- ef_construction: candidate list size during build (higher = better graph, slower build); default 64

-- Semantic similarity search
SELECT
    id,
    content,
    source,
    1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1   -- <=> is cosine distance operator
LIMIT 5;

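-- Combine with regular PostgreSQL filters.
-- Keep ORDER BY on the distance expression plus a LIMIT so the HNSW index can be used;
-- the threshold is then applied to that top-k in the outer query.
-- HNSW applies the WHERE filter after the index scan, so if fewer rows than expected
-- come back, raise hnsw.ef_search (default 40), e.g. SET hnsw.ef_search = 100;
SELECT id, content, similarity
FROM (
    SELECT
        id,
        content,
        1 - (embedding <=> $1) AS similarity
    FROM documents
    WHERE source = 'guide.pdf'     -- Standard SQL filter
      AND created_at > NOW() - INTERVAL '30 days'
    ORDER BY embedding <=> $1
    LIMIT 10
) sub
WHERE similarity > 0.75
ORDER BY similarity DESC;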

Python integration:

import psycopg2
from openai import OpenAI

openai_client = OpenAI()

def index_document(content: str, source: str, page: int, conn) -> None:
    """Embed and store a document"""
    embedding = openai_client.embeddings.create(
        input=content,
        model="text-embedding-3-small"
    ).data[0].embedding

    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO documents (content, source, page, embedding) VALUES (%s, %s, %s, %s::vector)",
            (content, source, page, str(embedding))
        )
    conn.commit()

def semantic_search(query: str, conn, top_k: int = 5) -> list:
    """Search documents by semantic similarity"""
    query_embedding = openai_client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    ).data[0].embedding

    with conn.cursor() as cur:
        cur.execute("""
            SELECT id, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (str(query_embedding), str(query_embedding), top_k))
        rows = cur.fetchall()

    return [
        {"id": r[0], "content": r[1], "source": r[2], "similarity": r[3]}
        for r in rows
    ]

Pros: Reuse existing PostgreSQL infrastructure, full ACID compliance (atomic updates to vectors and metadata), combine with complex SQL, no additional DB to operate.

Cons: Performance drops compared to dedicated vector DBs above 10-50M vectors, entire HNSW index must fit in RAM for best performance, limited horizontal scaling.

Best for: Teams already on PostgreSQL, data that must join with relational data, cost minimization.

Decision Matrix

Situation	Recommended	Why
Prototype / PoC	Chroma	Easiest start, 5 lines of code
Team already on PostgreSQL	pgvector	Reuse infrastructure, lowest ops overhead
Fast production launch	Pinecone	Zero ops, instant scaling
Self-hosted + large scale	Weaviate or Qdrant	Open source, full control
Hybrid search required	Weaviate	Native vector + BM25
Billion-scale vectors	Qdrant or Milvus	Optimized for extreme scale
Minimize cost	pgvector	No separate DB costs

Performance Reference

Test: 1M vectors, 1536 dimensions, cosine similarity, top-5 query

Approximate QPS (queries per second):
────────────────────────────────────────
Pinecone Serverless:  ~2,000 QPS (auto-scales)
Weaviate (HNSW):      ~1,500 QPS (single instance)
pgvector (HNSW):      ~800 QPS (index fully in RAM)
Chroma:               ~200 QPS (single-threaded limit)

Latency p99 at 1M vectors:
────────────────────────────────────────
Pinecone:    ~50ms (network included)
Weaviate:    ~30ms (local)
pgvector:    ~20ms (index in memory)
Chroma:      ~100ms+

These numbers vary significantly based on hardware and configuration. Always benchmark with your actual data.

Portable Abstraction Layer

Building a thin abstraction over your vector DB confines a migration to one adapter class when you outgrow your initial choice.

from abc import ABC, abstractmethod

class VectorStoreBase(ABC):
    @abstractmethod
    def upsert(self, ids: list, embeddings: list, metadatas: list) -> None: ...

    @abstractmethod
    def search(self, query_embedding: list, top_k: int = 5) -> list: ...

    @abstractmethod
    def delete(self, ids: list) -> None: ...


class ChromaVectorStore(VectorStoreBase):
    def __init__(self, collection_name: str):
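        import chromadb
        # cosine space so that 1 - distance in search() is a similarity (Chroma's default space is L2)
        self.collection = chromadb.PersistentClient("./chroma").get_or_create_collection(
            collection_name, metadata={"hnsw:space": "cosine"}
        )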

    def upsert(self, ids, embeddings, metadatas):
        self.collection.upsert(ids=ids, embeddings=embeddings, metadatas=metadatas)

    def search(self, query_embedding, top_k=5):
        r = self.collection.query(query_embeddings=[query_embedding], n_results=top_k)
        return [{"id": i, "score": 1-d} for i, d in zip(r["ids"][0], r["distances"][0])]

    def delete(self, ids):
        self.collection.delete(ids=ids)


class PineconeVectorStore(VectorStoreBase):
    def __init__(self, index_name: str):
        from pinecone import Pinecone
        self.index = Pinecone(api_key="your-key").Index(index_name)

    def upsert(self, ids, embeddings, metadatas):
        self.index.upsert([{"id": i, "values": e, "metadata": m}
                           for i, e, m in zip(ids, embeddings, metadatas)])

    def search(self, query_embedding, top_k=5):
        r = self.index.query(vector=query_embedding, top_k=top_k)
        return [{"id": m.id, "score": m.score} for m in r.matches]

    def delete(self, ids):
        self.index.delete(ids=ids)


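# Business logic only touches the interface
def get_vector_store(provider: str) -> VectorStoreBase:
    stores = {
        "chroma": lambda: ChromaVectorStore("my_collection"),
        # Pinecone index names allow only lowercase letters, digits, and hyphens
        "pinecone": lambda: PineconeVectorStore("my-index"),
    }
    if provider not in stores:
        raise ValueError(f"Unknown provider: {provider}")
    return stores[provider]()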
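The interface also makes code testable: business logic can be unit-tested against a dict-backed store with no database running. Below is a minimal sketch. `InMemoryVectorStore` and `retrieve_ids` are hypothetical. The interface is repeated so the snippet runs on its own. The score is cosine similarity (higher is better), the same direction as Pinecone's cosine scores.

```python
import math
from abc import ABC, abstractmethod

class VectorStoreBase(ABC):
    @abstractmethod
    def upsert(self, ids: list, embeddings: list, metadatas: list) -> None: ...

    @abstractmethod
    def search(self, query_embedding: list, top_k: int = 5) -> list: ...

    @abstractmethod
    def delete(self, ids: list) -> None: ...

class InMemoryVectorStore(VectorStoreBase):
    """Dict-backed store for unit tests; brute-force cosine search."""

    def __init__(self) -> None:
        self._rows = {}  # id -> (embedding, metadata)

    def upsert(self, ids, embeddings, metadatas):
        for i, e, m in zip(ids, embeddings, metadatas):
            self._rows[i] = (list(e), dict(m))  # same id overwrites: upsert semantics

    def search(self, query_embedding, top_k=5):
        def cosine(a, b):
            return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
        scored = [{"id": i, "score": cosine(query_embedding, e)} for i, (e, _) in self._rows.items()]
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]

    def delete(self, ids):
        for i in ids:
            self._rows.pop(i, None)

def retrieve_ids(store: VectorStoreBase, query_embedding: list, top_k: int = 2) -> list:
    """Example business logic written against the interface only."""
    return [hit["id"] for hit in store.search(query_embedding, top_k=top_k)]

store = InMemoryVectorStore()
store.upsert(["a", "b", "c"], [[1, 0], [0.7, 0.7], [0, 1]], [{}, {}, {}])
print(retrieve_ids(store, [1, 0.1]))  # ['a', 'b']
store.delete(["a"])
print(retrieve_ids(store, [1, 0.1]))  # ['b', 'c']
```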

Conclusion

There's no universally right vector database — it depends on your constraints.

Start fast: Use Chroma for the prototype. It gets you to working search in 10 minutes. Migrate when you need to.

Already on PostgreSQL: Try pgvector first. It handles millions of vectors comfortably, and your ops burden stays near zero.

No DevOps resources: Pinecone Serverless. The cost is real, but so is the time you save not managing infrastructure.

Large-scale self-hosted: Evaluate Weaviate (if you need hybrid search) or Qdrant (if you need pure vector performance at scale).

The one principle that holds regardless of which you choose: build a thin abstraction over it from day one. Requirements evolve, and switching vector databases later is a real possibility.