PostgreSQL + pgvector Vector Search Practical Guide: From RAG to Hybrid Search


1. Why PostgreSQL + pgvector

Reasons to choose PostgreSQL + pgvector over dedicated vector DBs (Pinecone, Qdrant, Weaviate):

| Characteristic | Dedicated Vector DB | PostgreSQL + pgvector |
|---|---|---|
| Additional infra | Required | Not needed (use existing PG) |
| ACID transactions | Limited | Fully supported |
| JOIN / relational | Not possible | Freely combinable |
| Hybrid search | Limited | tsvector + vector |
| Operational overhead | High | Low (leverage existing DBA) |
| Scalability | Billions of vectors | Tens of millions |

Conclusion: If your vector count is in the tens of millions or fewer and you need to manage data alongside relational data, pgvector is the optimal choice.

2. Installation and Configuration

2.1 Installing pgvector

# Ubuntu/Debian
sudo apt install postgresql-17-pgvector

# macOS (Homebrew)
brew install pgvector

# Docker
docker run -d --name pgvector \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  pgvector/pgvector:pg17

2.2 Enabling the Extension

-- Install pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Check version
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- 0.8.0

3. Basic Usage

3.1 Creating Tables

-- Documents table (1536 dimensions = OpenAI text-embedding-3-small)
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536),  -- Vector column!
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- 384 dimensions (sentence-transformers/all-MiniLM-L6-v2)
CREATE TABLE chunks (
    id BIGSERIAL PRIMARY KEY,
    doc_id BIGINT REFERENCES documents(id),
    chunk_text TEXT NOT NULL,
    embedding vector(384),
    chunk_index INT
);

3.2 Inserting Data

-- Single insert
INSERT INTO documents (title, content, embedding)
VALUES (
    'Kubernetes RBAC Guide',
    'RBAC is a method of regulating access...',
    '[0.1, 0.2, 0.3, ...]'::vector  -- 1536-dimension vector
);

-- Batch insert from Python
import psycopg2
from pgvector.psycopg2 import register_vector
import numpy as np

conn = psycopg2.connect("dbname=mydb user=postgres password=secret")
register_vector(conn)

cur = conn.cursor()

# Generate OpenAI embeddings
from openai import OpenAI
client = OpenAI()

texts = ["Kubernetes RBAC guide", "Docker networking basics", ...]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)

# Batch insert
for text, emb_data in zip(texts, response.data):
    embedding = np.array(emb_data.embedding)
    cur.execute(
        "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
        (text, text, embedding)
    )

conn.commit()

3.3 Similarity Search

-- Cosine similarity (most common)
SELECT id, title,
       1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- L2 distance
SELECT id, title,
       embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- Negative inner product (same ranking as cosine for normalized vectors)
SELECT id, title,
       (embedding <#> '[0.1, 0.2, ...]'::vector) * -1 AS similarity
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, ...]'::vector
LIMIT 10;

Operator summary:

| Operator | Meaning | Use case |
|---|---|---|
| `<->` | L2 distance | Euclidean distance-based search |
| `<=>` | Cosine distance | Directional similarity (most common) |
| `<#>` | Negative inner product | Normalized vectors |
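What the operators compute can be sanity-checked directly in NumPy; a small sketch with made-up vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the length

# <=> cosine distance: 1 - cos(a, b)
cos_dist = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# <-> L2 distance
l2 = np.linalg.norm(a - b)

# <#> negative inner product
neg_ip = -np.dot(a, b)

# a and b point the same way, so cosine distance is ~0 even though L2 is not
print(cos_dist, l2, neg_ip)
```

For unit-length vectors the inner product equals cosine similarity, which is why `ORDER BY embedding <#> q` and `ORDER BY embedding <=> q` produce the same ranking on normalized embeddings.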

4. Indexes: HNSW vs IVFFlat

4.1 IVFFlat

-- Create IVFFlat index
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Set probe count for search (accuracy vs speed trade-off)
SET ivfflat.probes = 10;

| Parameter | Description | Recommended value |
|---|---|---|
| lists | Number of clusters | sqrt(rows) ~ rows/1000 |
| probes | Clusters to search | lists/10 ~ lists/5 |
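One reading of these heuristics, wrapped in a tiny helper (this follows pgvector's own guidance of roughly rows/1000 up to ~1M rows and sqrt(rows) beyond, with the table's probes = lists/10; treat the numbers as starting points, not hard rules):

```python
import math

def ivfflat_params(row_count: int) -> dict:
    """Rule-of-thumb IVFFlat parameters for a given table size."""
    # lists: rows/1000 for small tables, sqrt(rows) once past ~1M rows
    lists = row_count // 1000 if row_count <= 1_000_000 else int(math.sqrt(row_count))
    lists = max(lists, 1)
    # probes: lower end of the lists/10 ~ lists/5 range
    probes = max(lists // 10, 1)
    return {"lists": lists, "probes": probes}

print(ivfflat_params(1_000_000))  # → {'lists': 1000, 'probes': 100}
```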
4.2 HNSW

-- Create HNSW index (longer build time but faster search)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);

-- Set ef_search for search
SET hnsw.ef_search = 100;

| Parameter | Description | Recommended value |
|---|---|---|
| m | Connections per node | 16~64 |
| ef_construction | Build-time candidate list size (build quality) | 200+ |
| ef_search | Query-time candidate list size (search quality) | 40~200 |

4.3 HNSW vs IVFFlat Comparison

| Characteristic | IVFFlat | HNSW |
|---|---|---|
| Build speed | Fast | Slow |
| Search speed | Moderate | Fast |
| Recall | Moderate (probe-dependent) | High |
| Memory | Low | High |
| Updates | Real-time possible | Real-time possible |

Recommendation: Use HNSW in most cases. Use IVFFlat if data changes frequently or memory is limited.

5. Hybrid Search

5.1 Weighted Score Fusion

-- Add tsvector column
ALTER TABLE documents ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;

CREATE INDEX ON documents USING gin(tsv);

-- Hybrid search function
CREATE OR REPLACE FUNCTION hybrid_search(
    query_text TEXT,
    query_embedding vector(1536),
    match_count INT DEFAULT 10,
    vector_weight FLOAT DEFAULT 0.7,
    text_weight FLOAT DEFAULT 0.3
)
RETURNS TABLE (id BIGINT, title TEXT, score FLOAT) AS $$
BEGIN
    RETURN QUERY
    WITH vector_results AS (
        SELECT d.id, d.title,
               1 - (d.embedding <=> query_embedding) AS vector_score
        FROM documents d
        ORDER BY d.embedding <=> query_embedding
        LIMIT match_count * 3
    ),
    text_results AS (
        SELECT d.id, d.title,
               ts_rank(d.tsv, plainto_tsquery('english', query_text)) AS text_score
        FROM documents d
        WHERE d.tsv @@ plainto_tsquery('english', query_text)
        ORDER BY text_score DESC
        LIMIT match_count * 3
    ),
    combined AS (
        SELECT
            COALESCE(v.id, t.id) AS id,
            COALESCE(v.title, t.title) AS title,
            COALESCE(v.vector_score, 0) * vector_weight +
            COALESCE(t.text_score, 0) * text_weight AS score
        FROM vector_results v
        FULL OUTER JOIN text_results t ON v.id = t.id
    )
    SELECT c.id, c.title, c.score
    FROM combined c
    ORDER BY c.score DESC
    LIMIT match_count;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM hybrid_search(
    'Kubernetes RBAC security',
    '[0.1, 0.2, ...]'::vector(1536)
);
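The score combination the function performs (FULL OUTER JOIN, COALESCE missing scores to 0, then weight) is easy to sanity-check in plain Python; a sketch with made-up scores:

```python
def weighted_fusion(vector_scores: dict, text_scores: dict,
                    vector_weight: float = 0.7, text_weight: float = 0.3) -> list:
    """Mimic the FULL OUTER JOIN + COALESCE logic: a doc missing from one
    result set contributes 0 for that component."""
    ids = set(vector_scores) | set(text_scores)
    combined = {
        doc_id: vector_scores.get(doc_id, 0.0) * vector_weight
                + text_scores.get(doc_id, 0.0) * text_weight
        for doc_id in ids
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Doc 1 is strong on vectors only, doc 2 decent on both, doc 3 text only
ranked = weighted_fusion({1: 0.9, 2: 0.6}, {2: 0.5, 3: 0.8})
print(ranked)
```

With the default 0.7/0.3 weights, a vector-only hit (doc 1, 0.63) still edges out a doc that matched both modestly (doc 2, 0.57), which is why the weights deserve tuning per corpus.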

5.2 Reciprocal Rank Fusion (RRF)

CREATE OR REPLACE FUNCTION rrf_search(
    query_text TEXT,
    query_embedding vector(1536),
    match_count INT DEFAULT 10,
    rrf_k INT DEFAULT 60
)
RETURNS TABLE (id BIGINT, title TEXT, rrf_score FLOAT) AS $$
BEGIN
    RETURN QUERY
    WITH vector_ranked AS (
        SELECT d.id, d.title,
               ROW_NUMBER() OVER (ORDER BY d.embedding <=> query_embedding) AS rank
        FROM documents d
        ORDER BY d.embedding <=> query_embedding
        LIMIT match_count * 5
    ),
    text_ranked AS (
        SELECT d.id, d.title,
               ROW_NUMBER() OVER (
                   ORDER BY ts_rank(d.tsv, plainto_tsquery('english', query_text)) DESC
               ) AS rank
        FROM documents d
        WHERE d.tsv @@ plainto_tsquery('english', query_text)
        ORDER BY ts_rank(d.tsv, plainto_tsquery('english', query_text)) DESC
        LIMIT match_count * 5
    ),
    fused AS (
        SELECT
            COALESCE(v.id, t.id) AS id,
            COALESCE(v.title, t.title) AS title,
            COALESCE(1.0 / (rrf_k + v.rank), 0) +
            COALESCE(1.0 / (rrf_k + t.rank), 0) AS rrf_score
        FROM vector_ranked v
        FULL OUTER JOIN text_ranked t ON v.id = t.id
    )
    SELECT f.id, f.title, f.rrf_score
    FROM fused f
    ORDER BY f.rrf_score DESC
    LIMIT match_count;
END;
$$ LANGUAGE plpgsql;
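The fusion step itself is independent of SQL; a minimal sketch of RRF over two rank lists (document IDs are made up):

```python
def rrf(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank),
    with ranks starting at 1. No score normalization is needed."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_ranking = [101, 102, 103]   # best-first by vector distance
text_ranking = [103, 101, 104]     # best-first by ts_rank
print(rrf([vector_ranking, text_ranking]))
```

Doc 101 (ranks 1 and 2) beats doc 103 (ranks 3 and 1): appearing near the top of both lists outweighs a single first place, which is the behavior the SQL function inherits.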

6. RAG Pipeline Integration

6.1 Full Python Example

import psycopg2
from pgvector.psycopg2 import register_vector
from openai import OpenAI
import numpy as np

client = OpenAI()
conn = psycopg2.connect("dbname=ragdb user=postgres password=secret")
register_vector(conn)

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return resp.data[0].embedding

def rag_query(question: str, top_k: int = 5) -> str:
    query_vec = np.array(embed(question))  # register_vector adapts numpy arrays

    cur = conn.cursor()
    cur.execute("""
        SELECT title, content,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        WHERE 1 - (embedding <=> %s::vector) > 0.7
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_vec, query_vec, query_vec, top_k))

    results = cur.fetchall()
    context = "\n\n".join([
        f"[{r[0]}] (similarity: {r[2]:.3f})\n{r[1]}"
        for r in results
    ])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer based on context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Usage
answer = rag_query("What is the difference between ClusterRole and Role in Kubernetes RBAC?")
print(answer)

7. Performance Tuning

7.1 Key Settings

-- Memory settings
SET maintenance_work_mem = '4GB';  -- During HNSW index builds (revert afterwards)
SET work_mem = '256MB';            -- During search

-- Parallel processing
SET max_parallel_workers_per_gather = 4;
SET max_parallel_maintenance_workers = 4;

7.2 Benchmark

1 million vectors (1536 dimensions), PostgreSQL 17 + pgvector 0.8.0:

| Index | Build time | Search latency (p50) | Recall@10 | Memory |
|---|---|---|---|---|
| None (brute force) | - | 850ms | 100% | 0 |
| IVFFlat (lists=1000, probes=50) | 45s | 8ms | 95% | 1.2GB |
| HNSW (m=16, ef=200) | 12min | 3ms | 99% | 2.8GB |

8. Quiz

Q1. What does the pgvector <=> operator compute?

Cosine distance (1 - cosine_similarity). Smaller values mean more similar. To convert to similarity: 1 - (a <=> b).

Q2. Which is faster for search, HNSW or IVFFlat?

HNSW. Build is slower but search is faster with higher recall. Recommended for most production environments.

Q3. Why is hybrid search better than pure vector search?

Vector search excels at semantic similarity but is weak at exact keyword matching. Combining it with full-text search captures both semantic similarity and keyword precision.

Q4. What is the principle behind RRF (Reciprocal Rank Fusion)?

It sums 1/(k + rank) for each document across the result lists (k, typically 60, damps the dominance of the very top ranks). The advantage is that it can combine rankings with different score scales without any normalization.

Q5. What is the relationship between IVFFlat's lists and probes parameters?

lists is the number of clusters, probes is the number of clusters to search at query time. Higher probes means more accurate but slower. Typically probes = lists/10 ~ lists/5.

Q6. When should you choose pgvector over a dedicated vector DB?

(1) Need JOINs with relational data (2) Need ACID transactions (3) Vector count is in the tens of millions or fewer (4) Want to minimize additional infrastructure overhead.