Vector Database Complete Guide 2025: Embeddings, Similarity Search, Pinecone/Weaviate/Qdrant/pgvector
- 1. Why Vector Databases
- 2. Vector Embedding Fundamentals
- 3. Distance Metrics (Similarity Measurement)
- 4. Indexing Algorithms
- 5. Vector Database Comparison
- 6. pgvector Deep Dive
- 7. Hybrid Search
- 8. Metadata Filtering
- 9. Namespace and Collection Management
- 10. Production Operations
- 11. Performance Benchmarks
- 12. Cost Analysis
- 13. Practical Implementation: Vector DB in a RAG Pipeline
- 14. Quiz
- 15. References
1. Why Vector Databases
1.1 Limitations of Traditional Search
Traditional databases rely on exact keyword matching. If you search for "a puppy playing in the park," you will not find "a dog running on the lawn." These two sentences are semantically nearly identical, but their keywords differ.
Traditional search: "puppy park" → keyword matching → only returns docs containing "puppy" AND "park"
Vector search: "puppy park" → semantic vectorization → returns all semantically similar docs
→ "a dog running on the lawn" ✅ found
→ "best pet walking spots" ✅ found
1.2 Problems Vector Databases Solve
Vector databases store data as high-dimensional vectors (arrays of numbers) and search based on vector similarity.
Core Use Cases:
| Domain | Description | Example |
|---|---|---|
| RAG (Retrieval-Augmented Generation) | Provide relevant document context to LLMs | ChatGPT + internal docs |
| Semantic Search | Meaning-based search | Natural language question search |
| Recommendation Systems | Find similar items | Product/content recommendations |
| Image Search | Visual similarity | "Products similar to this outfit" |
| Anomaly Detection | Data deviating from normal patterns | Fraud detection |
| Duplicate Detection | Identify similar content | Plagiarism detection, dedup |
1.3 Market Growth
The vector database market is projected to grow from $1.5 billion in 2024 to roughly $6 billion by 2028. Explosive adoption of RAG pipelines is the key driver.
2. Vector Embedding Fundamentals
2.1 What Are Embeddings
Embeddings convert unstructured data such as text, images, and audio into numeric vectors in high-dimensional space. Semantically similar data is located close together in vector space.
from openai import OpenAI
client = OpenAI()
# Convert text to vector
response = client.embeddings.create(
model="text-embedding-3-large",
input="Vector databases are essential AI infrastructure"
)
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}") # 3072
print(f"Vector sample: {embedding[:5]}") # [0.023, -0.041, 0.017, ...]
2.2 Text Embedding Model Comparison
| Model | Provider | Dimensions | MTEB Score | Cost | Features |
|---|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 64.6 | Paid | Dimension reduction support |
| text-embedding-3-small | OpenAI | 1536 | 62.3 | Low cost | Best cost-performance ratio |
| embed-v3.0 | Cohere | 1024 | 64.5 | Paid | Excellent multilingual |
| BGE-M3 | BAAI | 1024 | 68.2 | Free | Best open-source |
| Jina-embeddings-v3 | Jina AI | 1024 | 65.5 | Free | Multilingual specialized |
| all-MiniLM-L6-v2 | SBERT | 384 | 56.3 | Free | Lightweight and fast |
| nomic-embed-text | Nomic | 768 | 62.4 | Free | Long context |
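The "dimension reduction support" noted for text-embedding-3-large relies on Matryoshka-style training: the API's `dimensions` parameter effectively keeps the leading components of the vector and re-normalizes. A minimal numpy sketch of that truncation step, assuming a random unit vector as a stand-in for a real embedding:

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components, then re-normalize to unit length
    (Matryoshka-style truncation)."""
    v = np.asarray(vec, dtype=np.float32)[:dims]
    return v / np.linalg.norm(v)

# Stand-in for a 3072-dim text-embedding-3-large vector
full = np.random.default_rng(0).normal(size=3072)
full = full / np.linalg.norm(full)

short = truncate_embedding(full, 512)
print(len(short))  # 512
```

The shortened vector stays usable for cosine similarity because it is re-normalized, at the cost of some recall.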
2.3 Image Embeddings
CLIP (Contrastive Language-Image Pre-Training) maps text and images into the same vector space.
from sentence_transformers import SentenceTransformer
from PIL import Image
model = SentenceTransformer("clip-ViT-B-32")
# Image embedding
img = Image.open("cat_photo.jpg")
img_embedding = model.encode(img)
# Text embedding (same vector space)
text_embedding = model.encode("a cute orange cat")
# Cross-modal search is possible!
from numpy import dot
from numpy.linalg import norm
similarity = dot(img_embedding, text_embedding) / (
norm(img_embedding) * norm(text_embedding)
)
print(f"Similarity: {similarity:.4f}") # 0.28+ (higher if related)
2.4 Multimodal Embeddings
Modern models unify text, images, and audio into a single vector space.
Text: "sunset over the ocean" ──┐
├──→ Same vector space → similarity comparison possible
Image: (sunset photo) ──┘
3. Distance Metrics (Similarity Measurement)
3.1 Cosine Similarity
Measures the angle between two vectors. Ignores vector magnitude (length) and compares only direction.
import numpy as np
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Example: 3-dimensional vectors
vec_a = np.array([1, 2, 3])
vec_b = np.array([2, 4, 6]) # Same direction, different magnitude
vec_c = np.array([-1, -2, -3]) # Opposite direction
print(cosine_similarity(vec_a, vec_b)) # 1.0 (exact same direction)
print(cosine_similarity(vec_a, vec_c)) # -1.0 (exact opposite)
When to use: Text embeddings (most common), normalized vectors
3.2 Euclidean Distance (L2)
Measures the straight-line distance between two vectors. Smaller values mean more similar.
def euclidean_distance(a, b):
return np.linalg.norm(a - b)
vec_a = np.array([1, 2])
vec_b = np.array([4, 6])
print(euclidean_distance(vec_a, vec_b)) # 5.0
When to use: When vector magnitude is meaningful, clustering
3.3 Dot Product (Inner Product)
The inner product of two vectors. Considers both direction and magnitude.
def dot_product(a, b):
return np.dot(a, b)
vec_a = np.array([1, 2, 3])
vec_b = np.array([4, 5, 6])
print(dot_product(vec_a, vec_b)) # 32
When to use: Maximum Inner Product Search (MIPS), equivalent to cosine similarity on normalized vectors
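The equivalence claimed above is easy to verify: on unit-length vectors the denominator of cosine similarity is 1, so it collapses to the plain dot product. A quick numpy check:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

a = normalize(np.array([1.0, 2.0, 3.0]))
b = normalize(np.array([4.0, 5.0, 6.0]))

# Cosine similarity and dot product coincide on normalized vectors
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
dot = float(np.dot(a, b))
print(abs(cosine - dot) < 1e-12)  # True
```

This is why many engines store normalized vectors and use the cheaper dot product internally even when you configure "cosine" distance.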
3.4 Manhattan Distance (L1)
Sum of absolute differences across each dimension.
def manhattan_distance(a, b):
return np.sum(np.abs(a - b))
vec_a = np.array([1, 2, 3])
vec_b = np.array([4, 6, 3])
print(manhattan_distance(vec_a, vec_b)) # 7 (3 + 4 + 0)
3.5 Metric Selection Guide
Text embedding search → Cosine Similarity (default recommendation)
Image similarity search → Euclidean Distance
Recommendation (MIPS) → Dot Product
High-dimensional sparse → Cosine Similarity
Clustering / Classification → Euclidean Distance
4. Indexing Algorithms
When you have millions of vectors, comparing against every single one (brute-force) is too slow. Indexing algorithms speed up search by thousands of times.
4.1 HNSW (Hierarchical Navigable Small World)
The most widely used ANN (Approximate Nearest Neighbor) algorithm.
How it works:
Layer 3: [A] ────────────────── [B] (few nodes, long-range links)
\ /
Layer 2: [A] ── [C] ── [D] ── [B] (medium nodes)
\ / \ / \ / \
Layer 1: [A]-[E]-[C]-[F]-[D]-[G]-[B] (most nodes, short-range links)
| | | | | | |
Layer 0: [All vectors exist at base layer] (all nodes)
- Navigate from upper layers for coarse positioning, then refine at lower layers
- Graph-based so it uses more memory, but very fast
# HNSW configuration in Qdrant
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, HnswConfigDiff
client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE
    ),
    hnsw_config=HnswConfigDiff(
        m=16,  # connections per node (higher = more accurate, more memory)
        ef_construct=128,  # search scope during index building
        full_scan_threshold=10000  # below this threshold (in KB), exact search is used instead of HNSW
    )
)
Key Parameters:
| Parameter | Description | Default | Effect |
|---|---|---|---|
| M | Connections per node | 16 | Higher increases accuracy and memory |
| ef_construct | Build search scope | 128 | Higher increases index quality |
| ef_search | Query search scope | 64 | Higher increases accuracy, decreases speed |
4.2 IVF (Inverted File Index)
Divides vectors into clusters and searches only the nearest clusters at query time.
Vector space:
┌─────────────────────────────────┐
│ * * X * │
│ * Cluster1 * * │
│ * * X │
│ * * Cluster2 * │
│ * * * * │
│ * * │
│ * * * X │
│ Cluster3 Cluster4 │
│ * * * * * * │
└─────────────────────────────────┘
X = cluster centroid
Search: only search within the nprobe closest clusters to the query
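The mechanics above can be sketched in plain numpy. This is a toy illustration, not a production index: a few Lloyd iterations stand in for trained k-means, and vectors are stored uncompressed (a real IVF index would pair this with quantization):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(2000, 32)).astype(np.float32)
n_clusters = 16

def kmeans(x, k, iters=10):
    """A few Lloyd iterations -- enough to illustrate centroid assignment."""
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, assign

centroids, assign = kmeans(data, n_clusters)
inverted_lists = {c: np.where(assign == c)[0] for c in range(n_clusters)}

def ivf_search(query, nprobe=4, topk=5):
    # 1) rank clusters by centroid distance, 2) scan only the nprobe nearest lists
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([inverted_lists[c] for c in order])
    d = ((data[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d)[:topk]]

# Probing every list is exhaustive search, so the query finds itself first
print(int(ivf_search(data[0], nprobe=n_clusters, topk=1)[0]))  # 0
```

Lowering `nprobe` trades recall for speed: fewer lists are scanned, so a true neighbor sitting in an unprobed cluster is missed.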
4.3 IVF-PQ (IVF + Product Quantization)
PQ compresses vectors to reduce memory usage.
# Creating IVF-PQ index with FAISS
import faiss
dimension = 1536
nlist = 256 # number of clusters
m_pq = 48 # number of sub-vectors (splits dimensions into m_pq parts)
nbits = 8 # codebook size per sub-vector
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m_pq, nbits)
# Train clusters + codebook on training data
index.train(training_vectors)
index.add(all_vectors)
# Set nprobe at search time
index.nprobe = 16 # number of clusters to probe
distances, indices = index.search(query_vector, k=10)
Memory comparison:
Original (float32, 1536 dims): 1536 x 4 bytes = 6,144 bytes/vector
PQ compressed (48 sub-vectors): 48 x 1 byte = 48 bytes/vector
→ 128x compression! 100M vectors: 614GB → 4.8GB
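The compression figures above can be sanity-checked with back-of-the-envelope arithmetic:

```python
dim = 1536
n = 100_000_000  # 100M vectors

raw_bytes_per_vec = dim * 4   # float32: 4 bytes per component
pq_bytes_per_vec = 48 * 1     # 48 sub-vectors x one 8-bit code each

print(raw_bytes_per_vec)                     # 6144
print(raw_bytes_per_vec // pq_bytes_per_vec) # 128 (compression ratio)
print(raw_bytes_per_vec * n / 1e9)           # 614.4 (GB, uncompressed)
print(pq_bytes_per_vec * n / 1e9)            # 4.8 (GB, PQ-compressed)
```

Note the 4.8GB excludes the codebooks and the IVF structure itself, which add a small constant overhead.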
4.4 ScaNN (Scalable Nearest Neighbors)
Developed by Google. Uses anisotropic vector quantization (a score-aware quantization loss) to deliver fast search even at high recall levels.
4.5 Annoy (Approximate Nearest Neighbors Oh Yeah)
Developed by Spotify. Recursively partitions space using random hyperplanes to build a tree structure. Read-only index with memory-mapped file support, shareable across multiple processes.
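A toy sketch of the random-hyperplane intuition behind Annoy. The library itself builds trees of recursive splits; this only shows why nearby vectors tend to fall on the same side of random hyperplanes, which is what makes the partitioning useful:

```python
import numpy as np

rng = np.random.default_rng(7)
dim, n_planes = 64, 12
planes = rng.normal(size=(n_planes, dim))  # random hyperplanes through the origin

def bucket(v):
    """Hash a vector by which side of each hyperplane it falls on."""
    return tuple((planes @ v > 0).astype(int))

v = rng.normal(size=dim)
near = v + 0.01 * rng.normal(size=dim)  # tiny perturbation of v
far = rng.normal(size=dim)              # unrelated vector

agree_near = sum(a == b for a, b in zip(bucket(v), bucket(near)))
agree_far = sum(a == b for a, b in zip(bucket(v), bucket(far)))
print(agree_near, agree_far)  # nearby vectors agree on most sides; unrelated ones on ~half
```

Each tree in Annoy repeats such splits recursively, so close vectors usually end up in the same leaf; multiple trees compensate for the occasional bad split.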
4.6 Algorithm Comparison
| Algorithm | Search Speed | Memory | Index Build | Accuracy | Suitable Scale |
|---|---|---|---|---|---|
| Flat (Brute-force) | Slow | High | None | 100% | Under 100K |
| HNSW | Very Fast | High | Slow | Very High | Millions |
| IVF-Flat | Fast | Medium | Medium | High | Tens of millions |
| IVF-PQ | Fast | Low | Medium | Medium | Hundreds of millions |
| ScaNN | Very Fast | Medium | Medium | High | Hundreds of millions |
| Annoy | Fast | Low | Fast | Medium | Tens of millions |
5. Vector Database Comparison
5.1 Major Solutions Comparison
| Feature | Pinecone | Weaviate | Qdrant | Milvus | Chroma | pgvector |
|---|---|---|---|---|---|---|
| Type | Managed SaaS | Open-source/Cloud | Open-source/Cloud | Open-source | Open-source | PostgreSQL extension |
| Index | Proprietary | HNSW | HNSW | HNSW/IVF/DiskANN | HNSW | HNSW/IVF |
| Hybrid Search | Sparse-Dense | BM25 + Vector | Sparse + Dense | Supported | Limited | Full-text + Vector |
| Filtering | Metadata filter | GraphQL filter | Payload filter | Attribute filter | Where clause | SQL WHERE |
| Multi-tenancy | Namespaces | Tenant isolation | Collection/Payload | Partitions | Collections | Schema/RLS |
| Max Dimensions | 20,000 | 65,535 | 65,535 | 32,768 | Unlimited | 2,000 |
| Distributed | Automatic | Raft consensus | Raft consensus | Distributed arch | Not supported | Citus extension |
| SDKs | Python/JS/Go/Java | Python/JS/Go/Java | Python/JS/Rust/Go | Python/JS/Go/Java | Python/JS | SQL |
| Pricing | Pod/Serverless plans | Open-source free | Open-source free | Open-source free | Open-source free | Free |
| GPU Support | Internal | No | No | NVIDIA GPU | No | No |
| Real-time Updates | Yes | Yes | Yes | Yes | Yes | Yes |
| Backup/Restore | Automatic | Collections API | Snapshot | Supported | Limited | pg_dump |
| Community | Medium | Active | Active | Very Active | Active | Very Active |
| Serverless | Yes | Yes | Yes (Cloud) | Zilliz Cloud | No | Supabase/Neon |
5.2 Pinecone
Fully managed vector database. Use via API without any infrastructure management.
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="YOUR_KEY")
# Create index
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(
cloud="aws",
region="us-east-1"
)
)
index = pc.Index("my-index")
# Upsert vectors
index.upsert(
vectors=[
("id1", [0.1, 0.2, ...], {"title": "Document Title", "category": "tech"}),
("id2", [0.3, 0.4, ...], {"title": "Another Doc", "category": "science"}),
],
namespace="articles"
)
# Search (filter + vector)
results = index.query(
vector=[0.15, 0.25, ...],
top_k=5,
filter={"category": "tech"},
namespace="articles",
include_metadata=True
)
Pros: Fully managed, instant scaling, Serverless option Cons: Vendor lock-in, higher cost, no self-hosting
5.3 Weaviate
Open-source vector database with GraphQL-based API and built-in vectorization modules.
import weaviate
from weaviate.classes.config import Configure, Property, DataType
client = weaviate.connect_to_local()
# Create collection (built-in vectorization)
collection = client.collections.create(
name="Article",
vectorizer_config=Configure.Vectorizer.text2vec_openai(
model="text-embedding-3-small"
),
properties=[
Property(name="title", data_type=DataType.TEXT),
Property(name="content", data_type=DataType.TEXT),
Property(name="category", data_type=DataType.TEXT),
]
)
# Add data (automatic vectorization)
articles = client.collections.get("Article")
articles.data.insert(
properties={
"title": "Vector DB Guide",
"content": "Vector databases are essential for AI...",
"category": "tech"
}
)
# Semantic search
response = articles.query.near_text(
query="artificial intelligence infrastructure",
limit=5,
filters=weaviate.classes.query.Filter.by_property("category").equal("tech")
)
Pros: Built-in vectorization, GraphQL API, module ecosystem Cons: Resource heavy, learning curve
5.4 Qdrant
High-performance vector database written in Rust. Strong in payload filtering and quantization.
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct,
Filter, FieldCondition, MatchValue
)
client = QdrantClient("localhost", port=6333)
# Create collection
client.create_collection(
collection_name="articles",
vectors_config=VectorParams(
size=1536,
distance=Distance.COSINE
)
)
# Upsert vectors
client.upsert(
collection_name="articles",
points=[
PointStruct(
id=1,
vector=[0.1, 0.2, ...],
payload={"title": "Vector DB Guide", "category": "tech", "views": 1500}
),
]
)
# Filter + vector search
results = client.search(
collection_name="articles",
query_vector=[0.15, 0.25, ...],
query_filter=Filter(
must=[
FieldCondition(key="category", match=MatchValue(value="tech")),
]
),
limit=5
)
Pros: Rust-based performance, fine-grained filtering, Scalar/Binary quantization Cons: Relatively smaller ecosystem
5.5 Milvus
Large-scale vector database specialized for distributed architectures. Supports GPU acceleration.
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
connections.connect("default", host="localhost", port="19530")
# Define schema
fields = [
FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=200),
FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
]
schema = CollectionSchema(fields, description="Articles")
collection = Collection("articles", schema)
# Create index
index_params = {
"index_type": "HNSW",
"metric_type": "COSINE",
"params": {"M": 16, "efConstruction": 256}
}
collection.create_index("embedding", index_params)
# Search
collection.load()
results = collection.search(
data=[[0.1, 0.2, ...]],
anns_field="embedding",
param={"metric_type": "COSINE", "params": {"ef": 128}},
limit=5,
output_fields=["title"]
)
Pros: Large-scale distributed, GPU support, diverse indexes Cons: Operational complexity, resource requirements
5.6 ChromaDB
Lightweight, developer-friendly open-source vector database. Ideal for prototyping.
import chromadb
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.create_collection(
name="articles",
metadata={"hnsw:space": "cosine"}
)
# Add documents (automatic embedding)
collection.add(
documents=["Vector DBs are core AI infrastructure", "RAG is retrieval augmented generation"],
metadatas=[{"category": "tech"}, {"category": "ai"}],
ids=["doc1", "doc2"]
)
# Search
results = collection.query(
query_texts=["artificial intelligence database"],
n_results=5,
where={"category": "tech"}
)
Pros: Ultra-simple API, built-in embeddings, local execution Cons: Limited production scaling, no distributed support
6. pgvector Deep Dive
6.1 Why pgvector
If you already use PostgreSQL, you can add vector search to your existing infrastructure without a separate vector database. Combine the rich features of SQL with vector search.
6.2 Installation and Setup
-- Requires PostgreSQL 14+
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table with vector column
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT,
category TEXT,
embedding VECTOR(1536), -- OpenAI text-embedding-3-small dimensions
created_at TIMESTAMPTZ DEFAULT NOW()
);
6.3 Data Insertion and Search
-- Insert vector
INSERT INTO documents (title, content, category, embedding)
VALUES (
'Vector DB Guide',
'Vector databases are essential for AI...',
'tech',
'[0.1, 0.2, 0.3, ...]'::vector -- 1536-dimensional vector
);
-- Cosine distance search (top 5 most similar)
SELECT id, title, 1 - (embedding <=> '[0.15, 0.25, ...]'::vector) AS similarity
FROM documents
WHERE category = 'tech'
ORDER BY embedding <=> '[0.15, 0.25, ...]'::vector
LIMIT 5;
-- L2 distance search
SELECT id, title, embedding <-> '[0.15, 0.25, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.15, 0.25, ...]'::vector
LIMIT 5;
-- Inner product search (Maximum Inner Product)
SELECT id, title, (embedding <#> '[0.15, 0.25, ...]'::vector) * -1 AS inner_product
FROM documents
ORDER BY embedding <#> '[0.15, 0.25, ...]'::vector
LIMIT 5;
6.4 HNSW vs IVFFlat Index
-- HNSW index (recommended)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
-- Set ef_search at query time
SET hnsw.ef_search = 100;
-- IVFFlat index
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- number of clusters (rows/1000 up to ~1M rows, sqrt(rows) beyond)
-- Set probes at query time
SET ivfflat.probes = 10;
HNSW vs IVFFlat comparison:
| Feature | HNSW | IVFFlat |
|---|---|---|
| Search Speed | Faster | Fast |
| Index Build | Slow | Fast |
| Memory | Higher | Lower |
| Accuracy (Recall) | Higher | Medium |
| Real-time Insert | Excellent | May need rebuild |
| Recommendation | Default choice | When memory constrained |
6.5 Query Optimization
-- 1. Partial index (specific category only)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WHERE category = 'tech';
-- 2. Filter + vector search with denormalization
-- Slow pattern: vector search then filter
SELECT * FROM documents
WHERE category = 'tech'
ORDER BY embedding <=> query_vec
LIMIT 10;
-- Fast pattern: use partitioned tables
CREATE TABLE documents_tech PARTITION OF documents
FOR VALUES IN ('tech');
-- 3. Confirm index usage with EXPLAIN
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.15, 0.25, ...]'::vector
LIMIT 5;
-- Verify: Index Scan using documents_embedding_idx
6.6 Using pgvector with Python
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("postgresql://user:pass@localhost/mydb")
register_vector(conn)  # teaches psycopg2 to adapt numpy arrays to the vector type
cur = conn.cursor()
# Vector search
query_embedding = np.array([0.1, 0.2, ...])  # 1536 dimensions
cur.execute("""
SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
FROM documents
WHERE category = %s
ORDER BY embedding <=> %s::vector
LIMIT %s
""", (query_embedding, "tech", query_embedding, 5))
results = cur.fetchall()
for row in results:
print(f"ID: {row[0]}, Title: {row[1]}, Similarity: {row[2]:.4f}")
7. Hybrid Search
7.1 Why Hybrid Search
Vector search alone is insufficient in some cases.
Query: "PostgreSQL 16 release notes"
Vector only: "MySQL 8.0 new features" may also rank high (similar semantics)
Keyword only: only documents containing exactly "PostgreSQL 16"
Hybrid: semantically similar + contains "PostgreSQL 16" = optimal results
7.2 BM25 + Vector Combination
# Hybrid search in Weaviate (v4 client)
from weaviate.classes.query import HybridFusion
response = articles.query.hybrid(
    query="PostgreSQL vector search performance",
    alpha=0.5,  # 0 = keyword only, 1 = vector only, 0.5 = balanced
    limit=10,
    fusion_type=HybridFusion.RELATIVE_SCORE  # or HybridFusion.RANKED
)
7.3 Reciprocal Rank Fusion (RRF)
An algorithm that combines rankings from two search results.
def reciprocal_rank_fusion(keyword_results, vector_results, k=60):
"""
RRF score = sum(1 / (k + rank_i))
k = 60 is standard
"""
scores = {}
for rank, doc_id in enumerate(keyword_results):
scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
for rank, doc_id in enumerate(vector_results):
scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
# Sort by score
return sorted(scores.items(), key=lambda x: x[1], reverse=True)
# Example
keyword_hits = ["doc_A", "doc_B", "doc_C", "doc_D"]
vector_hits = ["doc_C", "doc_A", "doc_E", "doc_B"]
fused = reciprocal_rank_fusion(keyword_hits, vector_hits)
# doc_A and doc_C appear top in both → final top results
7.4 pgvector + Full-Text Search
-- Hybrid search (PostgreSQL)
WITH vector_search AS (
SELECT id, title, content,
1 - (embedding <=> query_vec) AS vector_score,
ROW_NUMBER() OVER (ORDER BY embedding <=> query_vec) AS vector_rank
FROM documents
ORDER BY embedding <=> query_vec
LIMIT 20
),
keyword_search AS (
    SELECT id, title, content,
           ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'vector database')) AS keyword_score, -- ts_rank is not true BM25, but plays the same role
           ROW_NUMBER() OVER (ORDER BY ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'vector database')) DESC) AS keyword_rank
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'vector database')
    ORDER BY keyword_rank
    LIMIT 20
)
SELECT
COALESCE(v.id, k.id) AS id,
COALESCE(v.title, k.title) AS title,
-- RRF score
COALESCE(1.0 / (60 + v.vector_rank), 0) +
COALESCE(1.0 / (60 + k.keyword_rank), 0) AS rrf_score
FROM vector_search v
FULL OUTER JOIN keyword_search k ON v.id = k.id
ORDER BY rrf_score DESC
LIMIT 10;
8. Metadata Filtering
8.1 Filtering Strategies
Three strategies for combining vector search with metadata filters.
Pre-filtering: Apply filter → search only filtered vectors (accurate but can be slow)
Post-filtering: Vector search → apply filter to results (fast but may return fewer results)
In-filtering: Apply filter during search simultaneously (optimal but complex to implement)
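The three strategies can be contrasted with a small numpy demo, assuming a 10%-selective filter: post-filtering the top-k usually leaves fewer than k results, while pre-filtering always returns k when enough matches exist:

```python
import numpy as np

rng = np.random.default_rng(1)
vecs = rng.normal(size=(1000, 8))
category = rng.choice(["tech", "other"], size=1000, p=[0.1, 0.9])  # ~10% selective
query = rng.normal(size=8)
dist = ((vecs - query) ** 2).sum(axis=1)
k = 10

# Post-filtering: take top-k first, then filter -> often fewer than k survive
topk = np.argsort(dist)[:k]
post = [i for i in topk if category[i] == "tech"]

# Pre-filtering: restrict to the filtered subset, then take top-k
subset = np.where(category == "tech")[0]
pre = subset[np.argsort(dist[subset])[:k]]

print(len(post), len(pre))  # post-filtering typically returns far fewer than k here
```

Post-filtering is only cheap when the filter is not very selective; with a 10% filter you would need to over-fetch roughly 10x to reliably fill k slots.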
8.2 Advanced Filtering in Qdrant
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
# Complex filter
results = client.search(
collection_name="products",
query_vector=query_embedding,
query_filter=Filter(
must=[
FieldCondition(key="category", match=MatchValue(value="electronics")),
FieldCondition(key="price", range=Range(gte=100, lte=500)),
],
must_not=[
FieldCondition(key="out_of_stock", match=MatchValue(value=True)),
],
should=[
FieldCondition(key="brand", match=MatchValue(value="Apple")),
FieldCondition(key="brand", match=MatchValue(value="Samsung")),
]
),
limit=10
)
8.3 Efficient Metadata Design
Recommended:
- Create payload indexes on frequently filtered fields
- Works best for low-cardinality fields (category, status)
- Date range filters need indexes
- Prefer flat structures over nested objects
Not recommended:
- Indexing fields with very high cardinality (user IDs, etc.)
- Using only metadata for search without vectors (a regular DB is better)
9. Namespace and Collection Management
9.1 Multi-tenancy Strategies
Strategy 1: Namespace/Partition Separation (Pinecone, Milvus)
┌──────────── Index ────────────┐
│ Namespace: tenant_A ***** │
│ Namespace: tenant_B ooooo │
│ Namespace: tenant_C ^^^^^ │
└───────────────────────────────┘
- Pros: Simple, data isolation
- Cons: Cross-tenant search difficult
Strategy 2: Metadata Filter (Qdrant, Weaviate)
┌──────── Collection ───────────┐
│ *(A) o(B) ^(C) *(A) o(B) │
│ ^(C) *(A) *(A) o(B) ^(C) │
│ Separated by tenant_id filter│
└───────────────────────────────┘
- Pros: Flexible, cross-search possible
- Cons: Performance degrades with many tenants
Strategy 3: Collection Separation (few large tenants)
┌─ Collection: tenant_A ─┐
│ ****************** │
└────────────────────────┘
┌─ Collection: tenant_B ─┐
│ oooooooooooooooooo │
└────────────────────────┘
- Pros: Perfect isolation, independent scaling
- Cons: Management overhead
9.2 Collection Lifecycle Management
# Qdrant example: collection management
# List collections
collections = client.get_collections()
# Collection info
info = client.get_collection("articles")
print(f"Vector count: {info.points_count}")
print(f"Index status: {info.status}")
# Delete collection
client.delete_collection("old_articles")
# Aliases (blue-green deployment)
client.update_collection_aliases(
change_aliases_operations=[
{"create_alias": {"collection_name": "articles_v2", "alias_name": "articles_prod"}}
]
)
10. Production Operations
10.1 Sharding Strategy
Horizontal Sharding: distribute data across multiple nodes
┌─ Node 1 ─┐ ┌─ Node 2 ─┐ ┌─ Node 3 ─┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ 33% data │ │ 33% data │ │ 33% data │
└───────────┘ └───────────┘ └───────────┘
↕ Query all nodes at search time, then merge
Qdrant sharding config:
- auto sharding: automatically distributes based on shard_number
- custom sharding: tenant-level separation via shard_key
10.2 Replication
# Qdrant replication setup
client.create_collection(
collection_name="articles",
vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
replication_factor=3, # replicate to 3 nodes
write_consistency_factor=2 # write succeeds after 2 nodes confirm
)
10.3 Backup and Restore
# Qdrant snapshot creation
curl -X POST "http://localhost:6333/collections/articles/snapshots"
# Snapshot recovery
curl -X PUT "http://localhost:6333/collections/articles/snapshots/recover" \
-H "Content-Type: application/json" \
-d '{"location": "http://backup-server/snapshot.tar"}'
# pgvector: use pg_dump
pg_dump -t documents mydb > documents_backup.sql
10.4 Monitoring
# Key metrics
monitoring_metrics = {
"Search latency (p50, p95, p99)": "Target: p99 under 100ms",
"QPS (queries per second)": "Monitor under load",
"Recall@K": "Accuracy. Target 0.95+",
"Index size / memory usage": "Prevent OOM",
"Insert latency": "Critical for real-time updates",
"Disk usage": "Capacity planning",
"Replication lag": "Data consistency",
}
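The latency percentiles listed above are computed directly from raw per-query samples; the lognormal data below is simulated stand-in traffic (real monitoring would feed in measured latencies):

```python
import numpy as np

# Simulated per-query latencies in ms (lognormal is a common latency shape)
rng = np.random.default_rng(3)
latencies = rng.lognormal(mean=1.5, sigma=0.5, size=10_000)

p50, p95, p99 = np.percentile(latencies, [50, 95, 99])
print(f"p50={p50:.1f}ms p95={p95:.1f}ms p99={p99:.1f}ms")

# Alert on the SLO from the table above
assert p99 < 100, "SLO violated: p99 must stay under 100ms"
```

Tracking p99 rather than the mean matters because ANN search latency is heavy-tailed: a few filtered or cache-missing queries dominate user-perceived slowness.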
11. Performance Benchmarks
11.1 ANN Benchmark Results (1M vectors, 128 dimensions)
| DB / Algorithm | QPS (Recall 0.95) | QPS (Recall 0.99) | Index Time | Memory |
|---|---|---|---|---|
| Qdrant HNSW | 8,500 | 4,200 | 12 min | 2.1GB |
| Weaviate HNSW | 7,800 | 3,900 | 14 min | 2.3GB |
| Milvus HNSW | 9,200 | 4,500 | 11 min | 2.0GB |
| pgvector HNSW | 3,400 | 1,600 | 25 min | 2.5GB |
| Pinecone (p2) | 5,000 | 2,800 | N/A | N/A |
| FAISS IVF-PQ | 12,000 | 5,500 | 8 min | 0.4GB |
11.2 Real-World Benchmark (10M vectors, 1536 dimensions)
Test environment: AWS r6g.2xlarge (8vCPU, 64GB RAM)
Qdrant:
- Index build: 45 min
- Memory: 48GB
- p50 latency: 3ms, p99 latency: 12ms
- QPS: 2,100 (Recall@10 = 0.96)
pgvector (HNSW):
- Index build: 2h 30min
- Memory: 52GB
- p50 latency: 8ms, p99 latency: 35ms
- QPS: 800 (Recall@10 = 0.95)
Milvus (DiskANN):
- Index build: 35 min
- Memory: 12GB (disk-based)
- p50 latency: 5ms, p99 latency: 18ms
- QPS: 1,800 (Recall@10 = 0.95)
11.3 Benchmark Summary
Small scale (under 100K) + existing PostgreSQL → pgvector
Small scale prototyping → ChromaDB
Medium scale (100K-10M) + self-hosted → Qdrant or Weaviate
Large scale (10M+) + self-hosted → Milvus
Want managed service → Pinecone or Zilliz Cloud
Hybrid search priority → Weaviate
Filtering priority → Qdrant
12. Cost Analysis
12.1 Managed Service Cost Comparison
| Service | Free Tier | Starting Price | Est. Cost for 1M Vectors |
|---|---|---|---|
| Pinecone Serverless | 2GB storage | Pay per read/write | ~70 USD/month |
| Pinecone Pod (s1) | - | 70 USD/month | ~140 USD/month |
| Weaviate Cloud | 14-day free | 25 USD/month | ~100 USD/month |
| Qdrant Cloud | 1GB free | 25 USD/month | ~65 USD/month |
| Zilliz Cloud | Free tier | Pay-as-you-go | ~90 USD/month |
12.2 Self-Hosting Costs
1M vectors (1536 dims, HNSW) estimated infrastructure:
- RAM needed: ~12GB
- Disk: ~20GB
- AWS r6g.xlarge (4vCPU, 32GB): ~150 USD/month
- Operations personnel cost separate
10M vectors:
- RAM needed: ~100GB
- AWS r6g.4xlarge (16vCPU, 128GB): ~600 USD/month
Cost reduction tips:
- Quantization saves 60-75% memory
- DiskANN leverages SSD (80% memory savings)
- Dimension reduction (1536 to 512) saves 3x memory
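The RAM figures above can be approximated with a simple formula. The 2x overhead factor for HNSW graph links and bookkeeping is an assumption to tune per engine; quantization or DiskANN would lower it substantially:

```python
def vector_memory_gb(n_vectors, dim, bytes_per_component=4, overhead=2.0):
    """Rough RAM estimate for an in-memory HNSW index.
    `overhead` (~1.5-2x) covers graph links and metadata -- an assumed
    factor, not a measured one."""
    return n_vectors * dim * bytes_per_component * overhead / 1e9

print(round(vector_memory_gb(1_000_000, 1536), 1))   # 12.3 -> matches the ~12GB above
print(round(vector_memory_gb(10_000_000, 1536), 1))  # 122.9
print(round(vector_memory_gb(1_000_000, 512), 1))    # 4.1 -> the 3x saving from 1536->512
```

The 10M estimate comes out above the ~100GB quoted earlier because the overhead factor varies by engine; treat both numbers as planning ranges, not guarantees.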
13. Practical Implementation: Vector DB in a RAG Pipeline
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance
openai_client = OpenAI()
qdrant_client = QdrantClient("localhost", port=6333)
# 0. Create the collection once (size must match the embedding model's dimensions)
if not qdrant_client.collection_exists("knowledge_base"):
    qdrant_client.create_collection(
        collection_name="knowledge_base",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
# 1. Embed and store documents
def embed_and_store(documents):
for doc in documents:
response = openai_client.embeddings.create(
model="text-embedding-3-small",
input=doc["content"]
)
embedding = response.data[0].embedding
qdrant_client.upsert(
collection_name="knowledge_base",
points=[PointStruct(
id=doc["id"],
vector=embedding,
payload={"title": doc["title"], "content": doc["content"]}
)]
)
# 2. Search + LLM generation
def rag_query(question):
# Embed question
q_response = openai_client.embeddings.create(
model="text-embedding-3-small",
input=question
)
q_embedding = q_response.data[0].embedding
# Vector search
results = qdrant_client.search(
collection_name="knowledge_base",
query_vector=q_embedding,
limit=5
)
# Build context
context = "\n\n".join([
f"[{r.payload['title']}]\n{r.payload['content']}"
for r in results
])
# LLM generation
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": f"Answer based on the following context:\n\n{context}"},
{"role": "user", "content": question}
]
)
return response.choices[0].message.content
14. Quiz
Q1. What is the difference between cosine similarity and Euclidean distance?
Cosine similarity compares only the direction (angle) of two vectors. It equals 1 when vectors point in the same direction regardless of magnitude. Euclidean distance measures the straight-line distance between two vectors, where magnitude also affects the result. For normalized vectors, both metrics produce the same ordering. Cosine similarity is the default recommendation for text embeddings.
Q2. What happens when you increase the M parameter in the HNSW algorithm?
M is the maximum number of connections per node. Increasing M makes the graph denser, which improves search accuracy (recall), but increases memory usage and lengthens index build time. The default M=16 works well generally. For higher recall requirements, you can increase to M=32-64. Very high values show diminishing returns.
Q3. What is the role of RRF (Reciprocal Rank Fusion) in hybrid search?
RRF is an algorithm that combines results from keyword search and vector search based on ranking. The RRF score for each document is calculated as 1/(k + rank), and documents that rank highly in both search results appear at the top of the final results. k=60 is the standard, and it has the advantage of not requiring score normalization.
Q4. Should you choose HNSW or IVFFlat index for pgvector?
HNSW is the recommended default choice. HNSW offers higher recall, faster search speed, and real-time insert support. IVFFlat should be considered when memory constraints are severe. IVFFlat has faster index build time and uses less memory, but may require rebuilding (REINDEX) when data changes frequently.
Q5. What is a cost-effective Vector DB choice at 10M vector scale?
If self-hosting is possible: Qdrant or Milvus are cost-effective. Using Milvus with DiskANN index can significantly reduce memory usage. If you want managed: Pinecone Serverless offers pay-per-use cost management. Applying quantization (Scalar/Binary) can reduce memory by 60-75%, greatly cutting infrastructure costs. Dimension reduction (1536 to 512) is also effective.
15. References
- Pinecone Documentation - https://docs.pinecone.io/
- Weaviate Documentation - https://weaviate.io/developers/weaviate
- Qdrant Documentation - https://qdrant.tech/documentation/
- Milvus Documentation - https://milvus.io/docs
- pgvector GitHub - https://github.com/pgvector/pgvector
- ChromaDB Documentation - https://docs.trychroma.com/
- ANN Benchmarks - https://ann-benchmarks.com/
- FAISS Wiki - https://github.com/facebookresearch/faiss/wiki
- OpenAI Embeddings Guide - https://platform.openai.com/docs/guides/embeddings
- Cohere Embed Documentation - https://docs.cohere.com/reference/embed
- HNSW Original Paper - https://arxiv.org/abs/1603.09320
- Product Quantization Paper - https://hal.inria.fr/inria-00514462v2/document
- MTEB Leaderboard - https://huggingface.co/spaces/mteb/leaderboard
- Reciprocal Rank Fusion Paper - https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf