Database Engineering Complete Guide: From SQL to Vector DBs and AI RAG Systems
- Author: Youngju Kim (@fjvbn20031)
Even in the AI era, database engineering remains the backbone of every production system. No matter how powerful an LLM is, you cannot build reliable software without efficiently storing and retrieving data. The rise of vector search and RAG architectures has actually made database engineering more important, not less.
This guide takes you from advanced SQL through PostgreSQL internals, NoSQL systems, vector databases, distributed DB theory, and finally into LLM + DB integration patterns — all with production-ready code examples.
1. Relational DB Core: Advanced SQL
1.1 Window Functions
Window functions compute results across a set of rows related to the current row without collapsing them into a single group — indispensable for analytical queries.
-- Salary rank and rolling average per department
SELECT
employee_id,
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
SUM(salary) OVER (PARTITION BY department) AS dept_total,
AVG(salary) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS rolling_avg_3
FROM employees;
Key window functions:
- ROW_NUMBER(): unique sequential number with no ties
- RANK(): tied rows get the same rank; the next rank skips numbers
- DENSE_RANK(): tied rows get the same rank; the next rank is consecutive
- LAG() / LEAD(): access values from the previous / next row
- FIRST_VALUE() / LAST_VALUE(): first / last value within the window frame
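The difference between the three numbering functions is easiest to see on tied values. A minimal pure-Python sketch of the semantics (not PostgreSQL code, just an illustration of how ties are numbered):

```python
def number_rows(values):
    """Assign ROW_NUMBER, RANK, and DENSE_RANK to values sorted descending."""
    ordered = sorted(values, reverse=True)
    row_number, rank, dense_rank = [], [], []
    for i, v in enumerate(ordered):
        row_number.append(i + 1)                # always unique
        if i > 0 and v == ordered[i - 1]:
            rank.append(rank[-1])               # ties share a rank
            dense_rank.append(dense_rank[-1])
        else:
            rank.append(i + 1)                  # RANK skips numbers after ties
            dense_rank.append((dense_rank[-1] + 1) if dense_rank else 1)
    return list(zip(ordered, row_number, rank, dense_rank))
```

For salaries 500, 400, 400, 300 this yields ROW_NUMBER 1,2,3,4; RANK 1,2,2,4; DENSE_RANK 1,2,2,3.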
-- Month-over-month revenue growth rate
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
ROUND(
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0
/ NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0),
2
) AS growth_rate_pct
FROM monthly_sales;
1.2 CTEs (Common Table Expressions)
CTEs let you break complex queries into named, readable steps. Recursive CTEs are especially powerful for hierarchical traversal.
-- Recursive CTE for org chart traversal
WITH RECURSIVE org_tree AS (
-- Base case: top-level employees
SELECT
employee_id,
name,
manager_id,
0 AS depth,
name::TEXT AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case
SELECT
e.employee_id,
e.name,
e.manager_id,
ot.depth + 1,
ot.path || ' > ' || e.name
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.employee_id
)
SELECT employee_id, name, depth, path
FROM org_tree
ORDER BY path;
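The recursive CTE above is equivalent to a level-by-level walk starting from the root rows. A hypothetical in-memory sketch of the same traversal, useful for checking expected output against a small dataset:

```python
def org_tree(employees):
    """employees: list of (employee_id, name, manager_id) tuples.
    Returns (employee_id, name, depth, path) rows, mirroring the recursive CTE."""
    by_manager = {}
    for emp_id, name, mgr_id in employees:
        by_manager.setdefault(mgr_id, []).append((emp_id, name))
    result = []
    # Base case: top-level employees (manager_id IS NULL)
    frontier = [(e, n, 0, n) for e, n in by_manager.get(None, [])]
    while frontier:
        emp_id, name, depth, path = frontier.pop(0)
        result.append((emp_id, name, depth, path))
        # Recursive case: direct reports, extending depth and path
        for child_id, child_name in by_manager.get(emp_id, []):
            frontier.append((child_id, child_name, depth + 1,
                             path + " > " + child_name))
    return sorted(result, key=lambda r: r[3])  # ORDER BY path
```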
-- Multi-step analysis broken into readable CTEs
WITH
top_customers AS (
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
WHERE created_at >= NOW() - INTERVAL '90 days'
GROUP BY customer_id
HAVING SUM(amount) > 1000
),
customer_details AS (
SELECT c.*, tc.total_spent
FROM customers c
JOIN top_customers tc ON c.id = tc.customer_id
),
ranked AS (
SELECT *,
NTILE(4) OVER (ORDER BY total_spent DESC) AS quartile
FROM customer_details
)
SELECT * FROM ranked WHERE quartile = 1;
1.3 Reading EXPLAIN ANALYZE
Understanding the query execution plan is the single most powerful skill for diagnosing performance problems.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
Sample plan output:
Limit (cost=1234.56..1234.57 rows=10 width=72) (actual time=45.123..45.125 rows=10 loops=1)
  ->  Sort (cost=1234.56..1259.56 rows=10000 width=72) (actual time=45.120..45.121 rows=10 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate (cost=876.00..976.00 rows=10000 width=72) (actual time=38.456..42.234 rows=8523 loops=1)
              Group Key: u.id
              ->  Hash Left Join (cost=345.00..801.00 rows=15000 width=40)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders o (cost=0.00..312.00 rows=15000 width=16)
                    ->  Hash (cost=270.00..270.00 rows=6000 width=32)
                          ->  Index Scan using idx_users_created_at on users u
Planning Time: 1.234 ms
Execution Time: 45.456 ms
Key signals to watch:
- Seq Scan on a large table: investigate whether an index would help
- Large gap between estimated and actual rows: run ANALYZE to refresh statistics
- Buffers: shared read vs hit: a low cache hit ratio suggests tuning shared_buffers
- High loops on nested loop joins: consider a hash join or a better index
1.4 Index Design
-- Composite index: equality-filtered columns first, range/sort columns last
CREATE INDEX idx_orders_user_status_date
ON orders (user_id, status, created_at DESC);
-- Partial index: only index rows meeting a condition
CREATE INDEX idx_active_users
ON users (email)
WHERE deleted_at IS NULL AND status = 'active';
-- Expression index: index the result of a function
CREATE INDEX idx_users_lower_email
ON users (LOWER(email));
-- BRIN index: very small, ideal for naturally ordered time-series
CREATE INDEX idx_logs_timestamp_brin
ON application_logs USING BRIN (created_at);
-- GIN index: arrays, JSONB, full-text search
CREATE INDEX idx_products_tags_gin
ON products USING GIN (tags);
1.5 Transactions and ACID
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Atomic bank transfer
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- Guard clause
DO $$
DECLARE
bal NUMERIC;
BEGIN
SELECT balance INTO bal FROM accounts WHERE id = 1;
IF bal < 0 THEN
RAISE EXCEPTION 'Insufficient funds';
END IF;
END $$;
COMMIT;
Isolation level anomalies:
| Isolation Level | Dirty Read | Non-repeatable Read | Phantom Read |
|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible |
| READ COMMITTED | Prevented | Possible | Possible |
| REPEATABLE READ | Prevented | Prevented | Possible |
| SERIALIZABLE | Prevented | Prevented | Prevented |
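Under SERIALIZABLE, PostgreSQL may abort a transaction with a serialization failure (SQLSTATE 40001), so production code wraps serializable transactions in a retry loop. A driver-agnostic sketch; `SerializationFailure` here is a hypothetical stand-in for your driver's error class (e.g. psycopg's 40001 error):

```python
import time

class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error (SQLSTATE 40001)."""

def run_serializable(txn_fn, max_retries=3, backoff=0.05):
    """Run txn_fn(), retrying with exponential backoff when the DB aborts it."""
    for attempt in range(max_retries):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * (2 ** attempt))
```

In real code, `txn_fn` would open a transaction, run the transfer statements, and commit; the retry loop simply replays the whole transaction on conflict.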
2. PostgreSQL in Practice
2.1 JSONB and Semi-Structured Data
PostgreSQL's JSONB stores JSON in a parsed binary format enabling indexing and fast operators.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
-- Containment check (@>), key existence (?), path extraction (->>)
SELECT * FROM products
WHERE metadata @> '{"category": "electronics", "in_stock": true}';
SELECT
name,
metadata->>'brand' AS brand,
(metadata->>'price')::NUMERIC AS price,
metadata->'specs'->>'cpu' AS cpu
FROM products
WHERE metadata ? 'discount_pct'
AND (metadata->>'discount_pct')::NUMERIC > 10;
-- Update a single key without replacing the whole document
UPDATE products
SET metadata = jsonb_set(metadata, '{price}', '29900'::jsonb)
WHERE id = 42;
2.2 Table Partitioning
-- Range partitioning for time-series data
CREATE TABLE events (
id BIGSERIAL,
user_id INT,
event_type TEXT,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);
CREATE OR REPLACE FUNCTION create_monthly_partition(target_date DATE)
RETURNS VOID AS $$
DECLARE
partition_name TEXT;
start_date DATE;
end_date DATE;
BEGIN
start_date := DATE_TRUNC('month', target_date);
end_date := start_date + INTERVAL '1 month';
partition_name := 'events_' || TO_CHAR(start_date, 'YYYY_MM');
EXECUTE FORMAT(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF events
FOR VALUES FROM (%L) TO (%L)',
partition_name, start_date, end_date
);
END;
$$ LANGUAGE plpgsql;
SELECT create_monthly_partition('2026-03-01');
2.3 Logical Replication
-- On the primary: create a publication
ALTER SYSTEM SET wal_level = logical;  -- requires a server restart to take effect
CREATE PUBLICATION app_publication
FOR TABLE users, orders, products
WITH (publish = 'insert, update, delete');
-- On the replica: subscribe
CREATE SUBSCRIPTION app_subscription
CONNECTION 'host=primary-db port=5432 dbname=myapp user=replicator'
PUBLICATION app_publication;
-- Monitor replication lag
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;
3. NoSQL Databases
3.1 Redis: Caching Patterns
import redis
import json
import hashlib
import time
from typing import Optional
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
# Cache-Aside (Lazy Loading); `db` below stands for your relational DB client
def get_user_profile(user_id: int) -> dict:
    cache_key = f"user:profile:{user_id}"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    r.setex(cache_key, 3600, json.dumps(user))
    return user

# Write-Through: update cache synchronously on every write
def update_user_profile(user_id: int, data: dict) -> None:
    cache_key = f"user:profile:{user_id}"
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    updated = get_user_from_db(user_id)
    r.setex(cache_key, 3600, json.dumps(updated))

# Write-Behind (Write-Back): write to cache immediately, flush DB asynchronously
class WriteBehindCache:
    def __init__(self):
        self.dirty_keys_set = "cache:dirty_keys"

    def write(self, key: str, value: dict, ttl: int = 3600):
        r.setex(key, ttl, json.dumps(value))
        r.sadd(self.dirty_keys_set, key)

    def flush_to_db(self):
        dirty_keys = r.smembers(self.dirty_keys_set)
        for key in dirty_keys:
            data = r.get(key)
            if data:
                db.upsert(json.loads(data))
            r.srem(self.dirty_keys_set, key)

# Single-instance lock via SET NX EX (full Redlock spans multiple Redis nodes)
def acquire_lock(lock_name: str, timeout: int = 10) -> Optional[str]:
    identifier = str(time.time())
    acquired = r.set(f"lock:{lock_name}", identifier, nx=True, ex=timeout)
    return identifier if acquired else None

def release_lock(lock_name: str, identifier: str) -> bool:
    # Lua makes check-and-delete atomic, so we never delete another client's lock
    lua = """
    if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
    else
        return 0
    end
    """
    return bool(r.eval(lua, 1, f"lock:{lock_name}", identifier))
Additional Redis data structures:
# Sorted Set: real-time leaderboard
def update_score(player: str, score: int):
    r.zadd("leaderboard", {player: score})

def get_top_players(n: int = 10):
    return r.zrevrange("leaderboard", 0, n - 1, withscores=True)

# HyperLogLog: approximate unique visitor count (very memory-efficient)
def track_visitor(page: str, user_id: str):
    r.pfadd(f"visitors:{page}", user_id)

def get_unique_visitors(page: str) -> int:
    return r.pfcount(f"visitors:{page}")
3.2 MongoDB: Document Modeling
from pymongo import MongoClient
from datetime import datetime
client = MongoClient('mongodb://localhost:27017/')
db = client['ecommerce']
# Embedded documents vs references
# Embed when data is frequently read together and has limited growth
sample_order = {
    "_id": "order_12345",
    "user_id": "user_67890",
    "status": "shipped",
    "created_at": datetime.utcnow(),
    "shipping_address": {  # Embedded: snapshot at time of order
        "street": "123 Main St",
        "city": "Seoul",
        "zip": "06234"
    },
    "items": [  # Embedded: price snapshot preserved
        {"product_id": "p001", "name": "Laptop", "price": 1200, "qty": 1},
        {"product_id": "p002", "name": "Mouse", "price": 35, "qty": 2}
    ],
    "total": 1270
}

# Aggregation Pipeline
pipeline = [
    {"$match": {
        "created_at": {"$gte": datetime(2026, 2, 17)},
        "status": {"$in": ["delivered", "shipped"]}
    }},
    {"$unwind": "$items"},
    {"$group": {
        "_id": "$items.product_id",
        "product_name": {"$first": "$items.name"},
        "total_qty": {"$sum": "$items.qty"},
        "total_revenue": {"$sum": {"$multiply": ["$items.price", "$items.qty"]}}
    }},
    {"$sort": {"total_revenue": -1}},
    {"$limit": 10},
    {"$project": {
        "product_id": "$_id",
        "product_name": 1,
        "total_qty": 1,
        "total_revenue": 1,
        "_id": 0
    }}
]
top_products = list(db['orders'].aggregate(pipeline))
3.3 Cassandra: Wide-Column Design
Cassandra forces you to design tables around query patterns, not entity relationships.
-- Query-first table design
CREATE TABLE user_timeline (
user_id UUID,
created_at TIMEUUID,
post_id UUID,
content TEXT,
PRIMARY KEY (user_id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC)
AND compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_unit': 'DAYS',
'compaction_window_size': 7
};
-- Denormalized table for tag-based search (separate query pattern)
CREATE TABLE posts_by_tag (
tag TEXT,
created_at TIMEUUID,
post_id UUID,
user_id UUID,
title TEXT,
PRIMARY KEY (tag, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC);
4. Vector Databases: The AI-Era Core
4.1 pgvector: Vector Search Inside PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE document_embeddings (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- HNSW index for fast approximate nearest-neighbor search
CREATE INDEX idx_doc_hnsw
ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Cosine similarity search (lower distance = more similar)
SELECT
id,
content,
metadata,
1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM document_embeddings
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
-- Hybrid: metadata filter + vector search
SELECT
id,
content,
metadata->>'source' AS source,
1 - (embedding <=> query_vec) AS similarity
FROM document_embeddings
WHERE metadata->>'language' = 'en'
AND metadata->>'category' = 'technical'
ORDER BY embedding <=> query_vec
LIMIT 10;
Distance operators in pgvector:
- <=>: cosine distance (best for text embeddings)
- <->: L2 / Euclidean distance (e.g. image feature vectors)
- <#>: negative inner product (on normalized vectors, equivalent to cosine ordering)
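These operators correspond to simple vector formulas. A pure-Python sketch of the math (pgvector computes this in C; this is only to make the definitions concrete), including why inner product and cosine agree on unit-normalized vectors:

```python
import math

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    """L2 (Euclidean) distance, the <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Cosine distance, the <=> operator."""
    na = math.sqrt(inner(a, a))
    nb = math.sqrt(inner(b, b))
    return 1 - inner(a, b) / (na * nb)

def normalize(v):
    n = math.sqrt(inner(v, v))
    return [x / n for x in v]

# For unit vectors, cosine distance = 1 - inner product, so ordering by
# <#> (negative inner product) matches ordering by <=> (cosine distance).
```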
4.2 IVFFlat vs HNSW
-- IVFFlat: faster build, lower memory, good for large initial datasets
CREATE INDEX idx_ivfflat
ON document_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- rule of thumb: sqrt(row count)
SET ivfflat.probes = 10; -- more probes = higher recall, slower query
-- HNSW: higher recall, faster queries, more memory / slower build
CREATE INDEX idx_hnsw
ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
SET hnsw.ef_search = 40; -- higher = more accurate, slower
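The rules of thumb in the comments above can be packaged as a starting-point calculator. This is a hypothetical helper encoding those heuristics, not official pgvector guidance; always validate against your own recall/latency benchmarks:

```python
import math

def ivfflat_params(row_count: int, target_recall: str = "balanced") -> dict:
    """Heuristic IVFFlat starting points: lists ~ sqrt(N), probes as a
    fraction of lists depending on how much recall you want to trade."""
    lists = max(10, int(math.sqrt(row_count)))   # rule of thumb: sqrt(N)
    probes = {
        "fast": max(1, lists // 20),
        "balanced": max(1, lists // 10),
        "accurate": max(1, lists // 4),
    }[target_recall]
    return {"lists": lists, "probes": probes}
```

For one million rows this suggests lists = 1000 and probes = 100 as a balanced starting point.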
4.3 Choosing a Vector Database
| Property | pgvector | Pinecone | Weaviate | Milvus |
|---|---|---|---|---|
| Deployment | PG extension | SaaS | Self-hosted / cloud | Self-hosted / cloud |
| Filtering | Full SQL | Metadata filters | GraphQL + filters | Rich filters |
| Scale | Millions | Billions+ | Hundreds of millions | Billions+ |
| Cost | PG infra only | Usage-based | Open-source free | Open-source free |
| Hybrid search | BM25 + vector | Built-in | BM25 + vector native | Rich support |
| Best for | Existing PG stack | Fast prototypes | Knowledge graphs | Massive scale |
5. Distributed Databases: Theory and Practice
5.1 CAP Theorem
The CAP theorem states that a distributed system can satisfy at most two of:
- C (Consistency): Every node returns the latest data
- A (Availability): Every request gets a response
- P (Partition Tolerance): The system keeps working despite network splits
Because network partitions are inevitable in distributed systems, the real choice is between CP and AP.
CP systems (consistency over availability):
- Refuse requests during a partition to preserve correctness
- Examples: ZooKeeper, HBase, MongoDB with w: majority
- Use cases: financial transactions, inventory management
AP systems (availability over consistency):
- Return possibly stale data during a partition
- Examples: DynamoDB, Cassandra, CouchDB
- Use cases: social feeds, DNS, notification systems
5.2 Consistency Models
From strongest (highest latency, lowest throughput) to weakest (lowest latency, highest throughput):
- Strong consistency: every read sees the latest committed write
- Sequential consistency: all nodes observe operations in the same order
- Causal consistency: causally related operations are seen in order
- Monotonic reads (session guarantee): a client never sees older data than it already read
- Eventual consistency: replicas converge once writes stop
5.3 Sharding Strategies
# Range sharding: consecutive ID ranges per shard
def range_shard(user_id: int, num_shards: int = 4) -> int:
    shard_size = 250_000_000
    return min(user_id // shard_size, num_shards - 1)

# Hash sharding: uniform distribution, prevents hotspots
import hashlib

def hash_shard(key: str, num_shards: int = 8) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

# Consistent hashing: minimal rehashing when nodes are added/removed
import bisect
from typing import Optional

class ConsistentHashRing:
    def __init__(self, nodes: list, replicas: int = 150):
        self.replicas = replicas
        self.ring: dict = {}
        self.sorted_keys: list = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str):
        # Each physical node gets `replicas` virtual points on the ring
        for i in range(self.replicas):
            key = self._hash(f"{node}:{i}")
            self.ring[key] = node
            bisect.insort(self.sorted_keys, key)

    def get_node(self, key: str) -> Optional[str]:
        if not self.ring:
            return None
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
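The payoff of consistent hashing shows up when the shard count changes: with naive hash-mod-N sharding, adding one shard remaps most keys, while a consistent hash ring moves only about 1/N of them. A quick sketch measuring the mod-N side (re-defining the same md5-based shard function so the snippet is self-contained):

```python
import hashlib

def hash_shard(key: str, num_shards: int) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

def moved_fraction(keys, before: int, after: int) -> float:
    """Fraction of keys whose shard changes when count goes before -> after."""
    moved = sum(1 for k in keys if hash_shard(k, before) != hash_shard(k, after))
    return moved / len(keys)

keys = [f"user:{i}" for i in range(10_000)]
# Growing from 8 to 9 shards with mod-N remaps roughly 8/9 of all keys;
# a consistent hash ring would move only about 1/9 of them.
```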
6. Data Modeling
6.1 Normalization vs Denormalization
Normalization (through 3NF):
- Eliminates redundancy and update anomalies
- Excellent write performance
- Requires JOINs for reads
Denormalization:
- Optimizes read performance at the cost of redundancy
- Well-suited for OLAP and data warehouses
-- Normalized schema
CREATE TABLE categories (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
category_id INT REFERENCES categories(id)
);
-- Denormalized schema (eliminates the JOIN)
CREATE TABLE products_denormalized (
id SERIAL PRIMARY KEY,
name TEXT,
category_id INT,
category_name TEXT -- duplicated to avoid JOIN
);
6.2 Star Schema for Data Warehouses
-- Fact table (measurements / events)
CREATE TABLE fact_sales (
sale_id BIGINT,
date_key INT,
product_key INT,
customer_key INT,
store_key INT,
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(10,2)
);
-- Dimension tables (descriptive context)
CREATE TABLE dim_date (
date_key INT PRIMARY KEY,
full_date DATE,
year INT, quarter INT, month INT, week INT, day_of_week INT
);
CREATE TABLE dim_product (
product_key INT PRIMARY KEY,
product_id TEXT,
name TEXT, category TEXT, brand TEXT, unit_cost DECIMAL(10,2)
);
7. AI Integration: LLM + DB Patterns
7.1 Building a RAG System with LangChain + pgvector
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_postgres import PGVector
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.schema import Document
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
CONNECTION_STRING = "postgresql+psycopg://user:password@localhost:5432/vectordb"
vector_store = PGVector(
embeddings=embeddings,
collection_name="documents",
connection=CONNECTION_STRING,
use_jsonb=True,
)
# Ingest: chunk, embed, and store
def ingest_documents(file_path: str, metadata: dict):
    with open(file_path, 'r', encoding='utf-8') as f:
        raw_text = f.read()
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=50,
        separators=["\n\n", "\n", ".", " "]
    )
    chunks = splitter.split_text(raw_text)
    documents = [
        Document(page_content=chunk, metadata={**metadata, "chunk_index": i})
        for i, chunk in enumerate(chunks)
    ]
    return vector_store.add_documents(documents)

# Build the RAG chain
def build_rag_chain(k: int = 5, score_threshold: float = 0.7):
    retriever = vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"k": k, "score_threshold": score_threshold}
    )
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,
    )
rag_chain = build_rag_chain()
result = rag_chain.invoke({"query": "How does PostgreSQL MVCC handle concurrency?"})
print(result['result'])
7.2 Hybrid Search: BM25 + Vector via RRF
from sqlalchemy import create_engine, text

# SQLAlchemy engine for the same database; adjust credentials as needed
engine = create_engine("postgresql+psycopg://user:password@localhost:5432/vectordb")

def hybrid_search(
    query: str,
    query_embedding: list,
    k: int = 10,
    alpha: float = 0.5,  # 0 = BM25 only, 1 = vector only
) -> list:
    sql = text("""
        WITH
        vector_results AS (
            SELECT id, content, metadata,
                   ROW_NUMBER() OVER (
                       ORDER BY embedding <=> CAST(:embedding AS vector)
                   ) AS rank
            FROM document_embeddings
            ORDER BY embedding <=> CAST(:embedding AS vector)
            LIMIT :k
        ),
        bm25_results AS (
            SELECT id, content, metadata,
                   ROW_NUMBER() OVER (ORDER BY ts_rank(
                       to_tsvector('english', content),
                       plainto_tsquery('english', :query)
                   ) DESC) AS rank
            FROM document_embeddings
            WHERE to_tsvector('english', content) @@ plainto_tsquery('english', :query)
            ORDER BY rank
            LIMIT :k
        ),
        rrf AS (
            SELECT
                COALESCE(v.id, b.id) AS id,
                COALESCE(v.content, b.content) AS content,
                COALESCE(v.metadata, b.metadata) AS metadata,
                COALESCE(1.0 / (60 + v.rank), 0) * :alpha +
                COALESCE(1.0 / (60 + b.rank), 0) * (1 - :alpha) AS score
            FROM vector_results v
            FULL OUTER JOIN bm25_results b ON v.id = b.id
        )
        SELECT id, content, metadata, score
        FROM rrf
        ORDER BY score DESC
        LIMIT :k
    """)
    with engine.connect() as conn:
        rows = conn.execute(sql, {
            "embedding": str(query_embedding),
            "query": query,
            "k": k,
            "alpha": alpha,
        })
        return [dict(row._mapping) for row in rows]
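The RRF scoring inside that SQL can be checked in isolation. A minimal pure-Python version of the same fusion formula, with the conventional k = 60 smoothing constant and the same alpha weighting:

```python
def rrf_fuse(vector_ranked, bm25_ranked, alpha=0.5, k=60):
    """vector_ranked / bm25_ranked: lists of doc ids, best match first.
    Returns doc ids sorted by the fused reciprocal-rank score."""
    scores = {}
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha * (1.0 / (k + rank))
    for rank, doc_id in enumerate(bm25_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) * (1.0 / (k + rank))
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both retrievers accumulates score from both lists, which is exactly what the FULL OUTER JOIN achieves in the SQL version.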
7.3 Embedding Cache with Redis
import hashlib, json
from typing import Optional
class EmbeddingCache:
    def __init__(self, redis_client, ttl: int = 86400 * 7):
        self.redis = redis_client
        self.ttl = ttl

    def _key(self, text: str, model: str) -> str:
        h = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
        return f"embedding:{h}"

    def get(self, text: str, model: str) -> Optional[list]:
        raw = self.redis.get(self._key(text, model))
        return json.loads(raw) if raw else None

    def set(self, text: str, model: str, embedding: list):
        self.redis.setex(self._key(text, model), self.ttl, json.dumps(embedding))

    def get_or_compute(self, text: str, model: str, compute_fn) -> list:
        cached = self.get(text, model)
        if cached:
            return cached
        embedding = compute_fn(text)
        self.set(text, model, embedding)
        return embedding
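A cache like this can be exercised without a running Redis server by stubbing the two client methods it uses. The sketch below is self-contained for illustration: `FakeRedis` is a hypothetical in-memory stand-in, and the cache class mirrors the one above:

```python
import hashlib, json

class FakeRedis:
    """Minimal in-memory stand-in implementing only get/setex (TTL ignored)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value

class EmbeddingCache:
    def __init__(self, redis_client, ttl=86400 * 7):
        self.redis = redis_client
        self.ttl = ttl
    def _key(self, text, model):
        h = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
        return f"embedding:{h}"
    def get_or_compute(self, text, model, compute_fn):
        raw = self.redis.get(self._key(text, model))
        if raw:
            return json.loads(raw)       # cache hit: no model call
        embedding = compute_fn(text)     # cache miss: compute and store
        self.redis.setex(self._key(text, model), self.ttl, json.dumps(embedding))
        return embedding

cache = EmbeddingCache(FakeRedis())
calls = []
def fake_embed(text):
    calls.append(text)
    return [0.1, 0.2, 0.3]
```

The second lookup for the same (text, model) pair is served from the cache, so the embedding API is called exactly once.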
Quiz: Test Your Knowledge
Q1. When should you use a B-Tree index versus a Hash index?
Answer: B-Tree is best for range queries, sorting, and prefix LIKE patterns. Hash indexes only support equality comparisons (=).
Explanation: B-Tree stores keys in a sorted tree, making it ideal for WHERE age > 30, ORDER BY name, and BETWEEN operations. Hash indexes compute a hash of the key for O(1) point lookups (WHERE id = 42) but are completely useless for range queries or sorting. In PostgreSQL, Hash indexes gained full WAL logging support in v10, making them production-safe.
Q2. How does PostgreSQL MVCC handle concurrent reads and writes?
Answer: Each transaction takes a snapshot at its start time and only sees row versions that were committed before that snapshot. Readers never block writers and writers never block readers.
Explanation: When a row is updated, PostgreSQL writes a new version of the row tagged with xmin and xmax transaction IDs rather than overwriting the old version. Each transaction uses its own xid snapshot to determine which row version is visible. Old versions (dead tuples) are cleaned up by the VACUUM background process. This is why PostgreSQL can achieve very high read concurrency with minimal locking.
Q3. What are the trade-offs between CP and AP systems in the CAP theorem?
Answer: CP systems sacrifice availability during a network partition to preserve data consistency. AP systems stay available during a partition but may return stale data, offering only eventual consistency.
Explanation: A CP system like ZooKeeper will refuse requests or return errors during a partition, ensuring every response reflects the latest committed state — critical for distributed coordination, financial ledgers, and inventory. An AP system like Cassandra will always return a response even if some nodes are unreachable, but different clients might temporarily see different versions of data. The right choice depends on whether inconsistency or downtime is the bigger risk for your use case.
Q4. Why does HNSW outperform brute-force KNN for approximate nearest-neighbor search in vector databases?
Answer: HNSW builds a hierarchical small-world graph where each node connects to a small number of neighbors. Search navigates from coarse upper layers to fine-grained lower layers in O(log N) time, achieving high recall with far fewer distance computations than brute-force O(N).
Explanation: Exact KNN over millions of vectors requires computing the distance to every vector — prohibitively slow at scale. HNSW (Hierarchical Navigable Small World) constructs multiple graph layers: upper layers enable fast long-range navigation while the bottom layer provides precise local search. The build parameters m (edges per node) and ef_construction (beam width during construction) determine index quality. At query time, ef_search controls the recall/speed trade-off. For most production workloads, HNSW achieves 95%+ recall with millisecond latency.
Q5. What is the difference between write-through and write-behind caching in Redis?
Answer: Write-through updates the cache and the database synchronously on every write, guaranteeing consistency at the cost of write latency. Write-behind updates the cache immediately and flushes dirty entries to the database asynchronously, improving throughput at the risk of data loss on failure.
Explanation: In write-through, every write goes to both the cache and the DB before acknowledging the client, so the cache is never stale but writes are as slow as the DB. In write-behind (write-back), the application writes only to the cache and a background process batches DB writes later. This gives much lower write latency and higher throughput, but if the cache node crashes before flushing, those writes are lost. Write-behind is appropriate for high-frequency, loss-tolerant workloads like shopping carts, game state, and analytics counters.
Conclusion
Database engineering extends far beyond writing SQL. It encompasses data modeling, index strategy, distributed systems theory, and now — increasingly — vector search and AI pipeline integration. In the AI era, the engineer who can design a schema for embedding storage, tune HNSW parameters, implement a Redis caching layer, and wire everything into a LangChain RAG pipeline holds a rare and valuable skill set.
Start with pgvector for vector search inside your existing PostgreSQL infrastructure, add Redis for caching hot data, and apply the principles of proper index design and query plan analysis to keep everything fast. The foundations in this guide will serve you well regardless of which specific tools the ecosystem moves to next.