PostgreSQL + pgvector Vector Search Practical Guide: From RAG to Hybrid Search
- 1. Why PostgreSQL + pgvector
- 2. Installation and Configuration
- 3. Basic Usage
- 4. Indexes: HNSW vs IVFFlat
- 5. Hybrid Search: Vector + Full-Text Search
- 6. RAG Pipeline Integration
- 7. Performance Tuning
- 8. Quiz
1. Why PostgreSQL + pgvector
Reasons to choose PostgreSQL + pgvector over dedicated vector DBs (Pinecone, Qdrant, Weaviate):
| Characteristic | Dedicated Vector DB | PostgreSQL + pgvector |
|---|---|---|
| Additional infra | Required | Not needed (use existing PG) |
| ACID transactions | Limited | Fully supported |
| JOIN / relational | Not possible | Freely combinable |
| Hybrid search | Limited | tsvector + vector |
| Operational overhead | High | Low (leverage existing DBA) |
| Scalability | Billions of vectors | Tens of millions |
Conclusion: If your vector count is in the tens of millions or fewer and your vectors need to be managed alongside relational data, pgvector is the optimal choice.
2. Installation and Configuration
2.1 Installing pgvector
# Ubuntu/Debian
sudo apt install postgresql-17-pgvector
# macOS (Homebrew)
brew install pgvector
# Docker
docker run -d --name pgvector \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
pgvector/pgvector:pg17
2.2 Enabling the Extension
-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Check version
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- 0.8.0
3. Basic Usage
3.1 Creating Tables
-- Documents table (1536 dimensions = OpenAI text-embedding-3-small)
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- Vector column!
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- 384 dimensions (sentence-transformers/all-MiniLM-L6-v2)
CREATE TABLE chunks (
id BIGSERIAL PRIMARY KEY,
doc_id BIGINT REFERENCES documents(id),
chunk_text TEXT NOT NULL,
embedding vector(384),
chunk_index INT
);
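Filling the chunks table requires splitting documents first. The guide doesn't prescribe a chunker, so here is a minimal fixed-size sketch with character overlap (the chunk_size and overlap defaults are illustrative assumptions, not values from the guide):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Defaults are illustrative; tune them to your embedding model's input
    limit (e.g. all-MiniLM-L6-v2 truncates long inputs)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and inserted into chunks with its doc_id and chunk_index; the overlap keeps a sentence that straddles a boundary retrievable from either side.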
3.2 Inserting Data
-- Single insert
INSERT INTO documents (title, content, embedding)
VALUES (
'Kubernetes RBAC Guide',
'RBAC is a method of regulating access...',
'[0.1, 0.2, 0.3, ...]'::vector -- 1536-dimension vector
);
-- Batch insert from Python
import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np
conn = psycopg2.connect("dbname=mydb user=postgres password=secret")
register_vector(conn)
cur = conn.cursor()
# Generate OpenAI embeddings
from openai import OpenAI
client = OpenAI()
texts = ["Kubernetes RBAC guide", "Docker networking basics", ...]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)
# Batch insert
for text, emb_data in zip(texts, response.data):
    embedding = np.array(emb_data.embedding)
    cur.execute(
        "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
        (text, text, embedding)
    )
conn.commit()
3.3 Similarity Search
-- Cosine similarity (most common)
SELECT id, title,
1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
-- L2 distance
SELECT id, title,
embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 10;
-- Inner Product — equivalent to cosine for normalized vectors
SELECT id, title,
(embedding <#> '[0.1, 0.2, ...]'::vector) * -1 AS similarity
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, ...]'::vector
LIMIT 10;
Operator summary:
| Operator | Meaning | Use case |
|---|---|---|
| <-> | L2 distance | Euclidean distance ranking |
| <=> | Cosine distance | Directional similarity (most common) |
| <#> | Negative inner product | Normalized vectors |
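The note above that negative inner product matches cosine ranking on normalized vectors is easy to verify without a database. A pure-Python sanity check (illustrative vectors, no pgvector involved; L2 agrees as well, since for unit vectors ||a - b||² = 2(1 - a·b)):

```python
import math

def norm(v): return math.sqrt(sum(x * x for x in v))
def normalize(v):
    n = norm(v)
    return [x / n for x in v]
def dot(a, b): return sum(x * y for x, y in zip(a, b))

q = normalize([0.3, -0.5, 0.8])
docs = [normalize(v) for v in ([1.0, 0.1, 0.2], [-0.4, 0.9, 0.3], [0.2, -0.6, 0.9])]

cosine_dist = [1 - dot(d, q) for d in docs]                    # what <=> computes
l2_dist = [norm([x - y for x, y in zip(d, q)]) for d in docs]  # what <-> computes
neg_inner = [-dot(d, q) for d in docs]                         # what <#> computes

rank = lambda xs: sorted(range(len(xs)), key=lambda i: xs[i])
# On unit vectors, all three distances produce the same ordering.
assert rank(cosine_dist) == rank(l2_dist) == rank(neg_inner)
```

This is why normalizing embeddings lets you pick the operator freely; OpenAI embeddings, for instance, are already unit-normalized.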
4. Indexes: HNSW vs IVFFlat
4.1 IVFFlat
-- Create IVFFlat index
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Set probe count for search (accuracy vs speed trade-off)
SET ivfflat.probes = 10;
| Parameter | Description | Recommended value |
|---|---|---|
| lists | Number of clusters | rows/1000 up to ~1M rows; sqrt(rows) beyond |
| probes | Clusters probed per query | lists/10 ~ lists/5 (sqrt(lists) is a common starting point) |
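The pgvector README suggests lists = rows/1000 up to roughly 1M rows and sqrt(rows) beyond, with probes starting near sqrt(lists) and tuned upward for recall. A sketch of that heuristic (the function name is mine, not a pgvector API; the returned values are starting points, not optima):

```python
import math

def suggested_ivfflat_params(rows: int) -> tuple[int, int]:
    """Starting points per the pgvector README heuristic:
    lists = rows/1000 (<= ~1M rows) or sqrt(rows) (above that),
    probes ~= sqrt(lists). Tune probes upward if recall is too low."""
    if rows <= 1_000_000:
        lists = max(1, rows // 1000)
    else:
        lists = round(math.sqrt(rows))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes
```

For example, suggested_ivfflat_params(1_000_000) yields (1000, 32); remember to re-run ANALYZE and re-check recall whenever the table grows substantially, since lists is fixed at index build time.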
4.2 HNSW (Recommended)
-- Create HNSW index (longer build time but faster search)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- Set ef_search for search
SET hnsw.ef_search = 100;
| Parameter | Description | Recommended value |
|---|---|---|
| m | Connections per node | 16~64 |
| ef_construction | Candidate list size at build time | 200+ |
| ef_search | Candidate list size at query time | 40~200 |
4.3 HNSW vs IVFFlat Comparison
| Characteristic | IVFFlat | HNSW |
|---|---|---|
| Build speed | Fast | Slow |
| Search speed | Moderate | Fast |
| Recall | Moderate (probe-dependent) | High |
| Memory | Low | High |
| Updates | Requires rebuild | Real-time possible |
Recommendation: Use HNSW in most cases. Use IVFFlat if data changes frequently or memory is limited.
5. Hybrid Search: Vector + Full-Text Search
5.1 Full-Text Search + Vector Search
-- Add tsvector column
ALTER TABLE documents ADD COLUMN tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;
CREATE INDEX ON documents USING gin(tsv);
-- Hybrid search function
CREATE OR REPLACE FUNCTION hybrid_search(
query_text TEXT,
query_embedding vector(1536),
match_count INT DEFAULT 10,
vector_weight FLOAT DEFAULT 0.7,
text_weight FLOAT DEFAULT 0.3
)
RETURNS TABLE (id BIGINT, title TEXT, score FLOAT) AS $$
BEGIN
RETURN QUERY
WITH vector_results AS (
SELECT d.id, d.title,
1 - (d.embedding <=> query_embedding) AS vector_score
FROM documents d
ORDER BY d.embedding <=> query_embedding
LIMIT match_count * 3
),
text_results AS (
SELECT d.id, d.title,
ts_rank(d.tsv, plainto_tsquery('english', query_text)) AS text_score
FROM documents d
WHERE d.tsv @@ plainto_tsquery('english', query_text)
ORDER BY text_score DESC
LIMIT match_count * 3
),
combined AS (
SELECT
COALESCE(v.id, t.id) AS id,
COALESCE(v.title, t.title) AS title,
COALESCE(v.vector_score, 0) * vector_weight +
COALESCE(t.text_score, 0) * text_weight AS score
FROM vector_results v
FULL OUTER JOIN text_results t ON v.id = t.id
)
SELECT c.id, c.title, c.score
FROM combined c
ORDER BY c.score DESC
LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT * FROM hybrid_search(
'Kubernetes RBAC security',
'[0.1, 0.2, ...]'::vector(1536)
);
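The weighted combination in the combined CTE can be mirrored in pure Python to build intuition (illustrative document ids and scores). One caveat worth noticing: ts_rank scores are not bounded to [0, 1] like cosine similarity, so raw weighted sums implicitly favor whichever scale runs larger — a motivation for the RRF variant in the next section.

```python
def weighted_fusion(vector_scores: dict[int, float], text_scores: dict[int, float],
                    vector_weight: float = 0.7, text_weight: float = 0.3):
    """Mirror of the combined CTE: FULL OUTER JOIN on id, missing scores -> 0."""
    ids = vector_scores.keys() | text_scores.keys()
    scored = [(i, vector_scores.get(i, 0.0) * vector_weight
                  + text_scores.get(i, 0.0) * text_weight) for i in ids]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

# Doc 2 appears in both result sets, so it edges out the pure-vector hit (doc 1).
ranking = weighted_fusion({1: 0.9, 2: 0.5}, {2: 1.0, 3: 0.8})
```

In practice you would normalize the text scores (or cap them) before weighting if you stay with this scheme.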
5.2 Reciprocal Rank Fusion (RRF)
CREATE OR REPLACE FUNCTION rrf_search(
query_text TEXT,
query_embedding vector(1536),
match_count INT DEFAULT 10,
rrf_k INT DEFAULT 60
)
RETURNS TABLE (id BIGINT, title TEXT, rrf_score FLOAT) AS $$
BEGIN
RETURN QUERY
WITH vector_ranked AS (
SELECT d.id, d.title,
ROW_NUMBER() OVER (ORDER BY d.embedding <=> query_embedding) AS rank
FROM documents d
ORDER BY rank
LIMIT match_count * 5
),
text_ranked AS (
SELECT d.id, d.title,
ROW_NUMBER() OVER (
ORDER BY ts_rank(d.tsv, plainto_tsquery('english', query_text)) DESC
) AS rank
FROM documents d
WHERE d.tsv @@ plainto_tsquery('english', query_text)
ORDER BY rank
LIMIT match_count * 5
),
fused AS (
SELECT
COALESCE(v.id, t.id) AS id,
COALESCE(v.title, t.title) AS title,
COALESCE(1.0 / (rrf_k + v.rank), 0)::float +
COALESCE(1.0 / (rrf_k + t.rank), 0)::float AS rrf_score
FROM vector_ranked v
FULL OUTER JOIN text_ranked t ON v.id = t.id
)
SELECT f.id, f.title, f.rrf_score
FROM fused f
ORDER BY f.rrf_score DESC
LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
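The same fusion expressed outside SQL, to make the rank-reciprocal arithmetic concrete (a pure-Python sketch; k = 60 is the conventional constant used in the function above):

```python
def rrf_fuse(*ranked_lists: list[int], k: int = 60) -> list[int]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d),
    ranks starting at 1. Only ranks matter; raw scores never enter."""
    scores: dict[int, float] = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Doc 3 (ranks 3 and 1) narrowly beats doc 2 (ranks 2 and 2):
# 1/63 + 1/61 > 2/62.
fused = rrf_fuse([1, 2, 3], [3, 2, 4])
```

Because only ranks are combined, the mismatched scales of cosine similarity and ts_rank stop mattering, which is RRF's main appeal over weighted score fusion.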
6. RAG Pipeline Integration
6.1 Full Python Example
import psycopg2
from pgvector.psycopg2 import register_vector
from openai import OpenAI
import numpy as np
client = OpenAI()
conn = psycopg2.connect("dbname=ragdb user=postgres password=secret")
register_vector(conn)
def embed(text: str) -> list[float]:
    resp = client.embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return resp.data[0].embedding
def rag_query(question: str, top_k: int = 5) -> str:
    # Wrap in np.array so register_vector adapts it as a vector literal
    query_vec = np.array(embed(question))
    cur = conn.cursor()
    cur.execute("""
        SELECT title, content,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        WHERE 1 - (embedding <=> %s::vector) > 0.7
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_vec, query_vec, query_vec, top_k))
    results = cur.fetchall()
    context = "\n\n".join(
        f"[{title}] (similarity: {sim:.3f})\n{content}"
        for title, content, sim in results
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer based on context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
# Usage
answer = rag_query("What is the difference between ClusterRole and Role in Kubernetes RBAC?")
print(answer)
7. Performance Tuning
7.1 Key Settings
-- Memory for index builds (raise for the HNSW build session, revert afterwards)
SET maintenance_work_mem = '2GB';
-- Memory for sorts during search
SET work_mem = '256MB';
-- Parallel processing (speeds up index builds and scans)
SET max_parallel_workers_per_gather = 4;
SET max_parallel_maintenance_workers = 4;
7.2 Benchmark
1 million vectors (1536 dimensions), PostgreSQL 17 + pgvector 0.8.0:
| Index | Build time | Search latency (p50) | Recall@10 | Memory |
|---|---|---|---|---|
| None (brute) | - | 850ms | 100% | 0 |
| IVFFlat (lists=1000, probes=50) | 45s | 8ms | 95% | 1.2GB |
| HNSW (m=16, ef=200) | 12min | 3ms | 99% | 2.8GB |
8. Quiz
Q1. What does the pgvector <=> operator compute?
Cosine distance (1 - cosine_similarity). Smaller values mean more similar. To convert to similarity: 1 - (a <=> b).
Q2. Which is faster for search, HNSW or IVFFlat?
HNSW. Build is slower but search is faster with higher recall. Recommended for most production environments.
Q3. Why is hybrid search better than pure vector search?
Vector search excels at semantic similarity but is weak at exact keyword matching. Combining it with full-text search captures both semantic similarity and keyword precision.
Q4. What is the principle behind RRF (Reciprocal Rank Fusion)?
It sums the reciprocal of each result's rank. The advantage is that it can combine search results with different score scales without normalization.
Q5. What is the relationship between IVFFlat's lists and probes parameters?
lists is the number of clusters, probes is the number of clusters to search at query time. Higher probes means more accurate but slower. Typically probes = lists/10 ~ lists/5.
Q6. When should you choose pgvector over a dedicated vector DB?
(1) Need JOINs with relational data (2) Need ACID transactions (3) Vector count is in the tens of millions or fewer (4) Want to minimize additional infrastructure overhead.