- Published on
PostgreSQL Advanced Indexing Guide: GIN, GiST, BRIN, and Partial Index in Practice
- Authors
- Name
- Introduction
- PostgreSQL Index Types Overview
- B-tree Limitations and the Need for Advanced Indexes
- GIN Index Deep Dive
- GiST Index -- Spatial Data and Range Types
- BRIN Index -- Time-Series and Large Tables
- Partial Index and Expression Index
- EXPLAIN ANALYZE for Index Performance Analysis
- Index Bloat Management and REINDEX
- Index Type Comparison Table
- Operations Checklist
- Conclusion

Introduction
PostgreSQL offers six index types: B-tree, Hash, GIN, GiST, SP-GiST, and BRIN. While most tutorials stop at B-tree, real-world applications frequently encounter problems that B-tree alone cannot efficiently solve -- JSONB queries, full-text search, spatial queries, and time-series tables with billions of rows.
This article goes beyond B-tree to explore the internal structures of GIN, GiST, BRIN, Partial Index, and Expression Index, covering when and how to apply each with EXPLAIN ANALYZE-based performance data.
PostgreSQL Index Types Overview
A quick summary of PostgreSQL index types:
| Index Type | Internal Structure | Best Use Cases | Size | Write Cost |
|---|---|---|---|---|
| B-tree | Balanced tree | Equality, range, sort, UNIQUE | Medium | Low |
| GIN | Inverted index (Posting List/Tree) | JSONB, arrays, tsvector | Large | High |
| GiST | Generalized search tree | Spatial data, ranges, proximity | Medium | Medium |
| BRIN | Block range summary | Time-series, append-only large tables | Very small | Very low |
| Hash | Hash table | Pure equality lookups | Small | Low |
| SP-GiST | Space-partitioned tree | Phone numbers, IP addresses, unbalanced trees | Medium | Medium |
B-tree Limitations and the Need for Advanced Indexes
B-tree is optimized for scalar value equality comparisons and range searches. However, it becomes inefficient or unusable in the following scenarios:
- JSONB containment queries: Operators like
@>inWHERE metadata @> '...'cannot be indexed with B-tree. - Full-text search:
to_tsvector()-based search requires GIN indexes. - Spatial queries: PostGIS functions like
ST_DWithin()andST_Contains()need GiST indexes. - Billion-row time-series tables: The B-tree index itself grows to tens of GB, causing memory pressure.
PostgreSQL provides specialized index types to address these challenges.
GIN Index Deep Dive
Internal Structure
GIN (Generalized Inverted Index) uses an inverted index structure. Internally, it builds a B-tree over key values, and each leaf node stores a Posting List or Posting Tree -- a list of TIDs (Tuple Identifiers) for rows containing that key.
For example, if a JSONB column contains "tags": ["python", "database"], the GIN index registers both "python" and "database" as keys and adds the row TID to each key's Posting List.
JSONB Indexing
Creating a GIN index on a JSONB column enables index scans for operators like @>, ?, ?|, and ?&.
-- Create table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
metadata JSONB NOT NULL
);
-- Insert 1 million test rows
INSERT INTO products (name, metadata)
SELECT
'product_' || i,
jsonb_build_object(
'category', (ARRAY['electronics', 'clothing', 'food', 'toys'])[1 + (i % 4)],
'price', (random() * 1000)::int,
'tags', jsonb_build_array(
(ARRAY['sale', 'new', 'popular', 'limited'])[1 + (i % 4)],
(ARRAY['premium', 'budget', 'mid-range'])[1 + (i % 3)]
),
'in_stock', (i % 2 = 0)
)
FROM generate_series(1, 1000000) AS i;
-- GIN index with default jsonb_ops
CREATE INDEX idx_products_metadata_gin ON products USING gin (metadata);
-- GIN index with jsonb_path_ops (smaller index, @> only)
CREATE INDEX idx_products_metadata_path ON products USING gin (metadata jsonb_path_ops);
-- Containment query with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT id, name
FROM products
WHERE metadata @> '{"category": "electronics", "in_stock": true}';
Example output:
Bitmap Heap Scan on products (cost=52.01..2084.18 rows=500 width=20)
(actual time=1.234..5.678 rows=125000 loops=1)
Recheck Cond: (metadata @> '{"category": "electronics", "in_stock": true}'::jsonb)
Heap Blocks: exact=8334
-> Bitmap Index Scan on idx_products_metadata_gin (cost=0.00..51.88 rows=500 width=0)
(actual time=0.891..0.891 rows=125000 loops=1)
Index Cond: (metadata @> '{"category": "electronics", "in_stock": true}'::jsonb)
Planning Time: 0.152 ms
Execution Time: 12.345 ms
jsonb_ops vs jsonb_path_ops comparison:
| Property | jsonb_ops (default) | jsonb_path_ops |
|---|---|---|
| Supported operators | @>, ?, ?|, ?&, @@, @? | @>, @@, @? |
| Index size | Large (indexes keys + paths) | Small (stores path hashes only) |
| Key existence checks | Supported | Not supported |
| Containment search speed | Fast | Faster |
Full-Text Search
-- Full-text search table with GIN index
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
tsv tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B')
) STORED
);
-- Create GIN index
CREATE INDEX idx_articles_tsv ON articles USING gin (tsv);
-- Full-text search query
EXPLAIN ANALYZE
SELECT id, title, ts_rank(tsv, q) AS rank
FROM articles, to_tsquery('english', 'postgresql & indexing') AS q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT 10;
Array Indexing
-- GIN index on tag arrays
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title TEXT,
tags TEXT[]
);
CREATE INDEX idx_posts_tags_gin ON posts USING gin (tags);
-- Array containment search
EXPLAIN ANALYZE
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];
-- Array overlap search
EXPLAIN ANALYZE
SELECT * FROM posts WHERE tags && ARRAY['database', 'backend'];
GIN Pending List and fastupdate
By default, GIN indexes are configured with fastupdate=on. When new rows are inserted, they are not immediately merged into the index -- instead, they are temporarily stored in a Pending List and batch-merged during VACUUM or when the Pending List exceeds its size limit.
- Advantage: Improved write performance, especially during bulk INSERTs
- Disadvantage: CPU/IO spikes during Pending List merge; search queries must also scan the Pending List
In production, tune the gin_pending_list_limit parameter or manually call SELECT gin_clean_pending_list('idx_name') before peak hours.
GiST Index -- Spatial Data and Range Types
Internal Structure
GiST (Generalized Search Tree) is an extensible balanced tree framework. Unlike B-tree, GiST supports various data types and search strategies through operator class methods: consistent, union, penalty, picksplit, and others.
Internally, it uses an R-tree structure that wraps spatial data in bounding boxes (MBR: Minimum Bounding Rectangle) and organizes them hierarchically.
Spatial Data Indexing (PostGIS)
-- Install PostGIS extension
CREATE EXTENSION IF NOT EXISTS postgis;
-- Spatial data table
CREATE TABLE stores (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
location GEOMETRY(Point, 4326) NOT NULL
);
-- Insert 1 million test rows (random coordinates around Seoul)
INSERT INTO stores (name, location)
SELECT
'store_' || i,
ST_SetSRID(ST_MakePoint(
126.9 + random() * 0.2, -- longitude
37.4 + random() * 0.2 -- latitude
), 4326)
FROM generate_series(1, 1000000) AS i;
-- Create GiST index
CREATE INDEX idx_stores_location_gist ON stores USING gist (location);
-- Find stores within 1km radius
EXPLAIN ANALYZE
SELECT id, name,
ST_Distance(location::geography,
ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326)::geography) AS distance_m
FROM stores
WHERE ST_DWithin(location::geography,
ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326)::geography,
1000)
ORDER BY distance_m
LIMIT 20;
Example output:
Limit (cost=8.45..8.50 rows=20 width=44)
(actual time=2.345..2.678 rows=20 loops=1)
-> Sort (cost=8.45..8.52 rows=25 width=44)
(actual time=2.340..2.350 rows=20 loops=1)
Sort Key: (st_distance(...))
Sort Method: top-N heapsort Memory: 27kB
-> Index Scan using idx_stores_location_gist on stores
(cost=0.42..7.89 rows=25 width=44)
(actual time=0.234..2.123 rows=785 loops=1)
Index Cond: (location && ...)
Filter: st_dwithin(...)
Planning Time: 0.456 ms
Execution Time: 2.789 ms
Without the GiST index, a Sequential Scan across all 1 million rows would take several seconds.
Range Type Indexing
-- Overlapping time range search for a reservation system
CREATE TABLE reservations (
id SERIAL PRIMARY KEY,
room_id INT NOT NULL,
period TSTZRANGE NOT NULL,
guest_name TEXT
);
-- GiST index for range overlap optimization
CREATE INDEX idx_reservations_period_gist ON reservations USING gist (period);
-- Find overlapping reservations
EXPLAIN ANALYZE
SELECT * FROM reservations
WHERE period && tstzrange('2026-03-10 14:00', '2026-03-10 18:00', '[)');
-- EXCLUDE constraint to prevent overlaps (requires GiST)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING gist (room_id WITH =, period WITH &&);
GiST supports range operators: && (overlap), @> (contains), <@ (contained by), << (left of), >> (right of), -|- (adjacent).
KNN (K-Nearest Neighbor) Search
GiST indexes support KNN searches that handle ORDER BY distance patterns at the index level.
-- Find the 10 nearest stores using index scan
SELECT id, name, location <-> ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326) AS dist
FROM stores
ORDER BY location <-> ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326)
LIMIT 10;
The <-> operator combined with ORDER BY ... LIMIT performs efficient KNN search through the GiST index. Rather than sorting the entire table, it traverses the index tree and returns results in nearest-first order.
BRIN Index -- Time-Series and Large Tables
Internal Structure
BRIN (Block Range Index) stores only summary information -- minimum and maximum values -- for each physical block range (default: 128 pages) in the table. This makes the index extremely small, typically just a few MB even for tables with billions of rows.
The key prerequisite: there must be a strong correlation between column values and their physical storage order. Timestamp columns in time-series data are the classic example.
Time-Series Data Usage
-- Event log table (time-series)
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
event_type TEXT NOT NULL,
payload JSONB
);
-- Insert 50 million test rows
INSERT INTO events (created_at, event_type, payload)
SELECT
'2025-01-01'::timestamptz + (i || ' seconds')::interval,
(ARRAY['click', 'view', 'purchase', 'signup'])[1 + (i % 4)],
jsonb_build_object('user_id', (i % 100000), 'value', random() * 100)
FROM generate_series(1, 50000000) AS i;
-- B-tree index for comparison
CREATE INDEX idx_events_created_btree ON events (created_at);
-- BRIN index
CREATE INDEX idx_events_created_brin ON events USING brin (created_at)
WITH (pages_per_range = 128);
-- Compare index sizes
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'events' AND indexname LIKE 'idx_events_created%';
Size comparison example:
indexname | index_size
------------------------------+------------
idx_events_created_btree | 1071 MB
idx_events_created_brin | 128 kB
The BRIN index is approximately 8,500 times smaller than B-tree while handling the same range queries.
-- Range query using BRIN index
EXPLAIN ANALYZE
SELECT count(*) FROM events
WHERE created_at BETWEEN '2025-06-01' AND '2025-06-30';
Example output:
Aggregate (cost=456789.12..456789.13 rows=1 width=8)
(actual time=234.567..234.568 rows=1 loops=1)
-> Bitmap Heap Scan on events (cost=48.12..445678.90 rows=2592000 width=0)
(actual time=12.345..198.765 rows=2592000 loops=1)
Recheck Cond: (created_at >= ... AND created_at <= ...)
Rows Removed by Recheck: 45678
Heap Blocks: lossy=19200
-> Bitmap Index Scan on idx_events_created_brin (cost=0.00..47.50 rows=2600000 width=0)
(actual time=0.234..0.234 rows=192000 loops=1)
Planning Time: 0.123 ms
Execution Time: 256.789 ms
pages_per_range Tuning
The pages_per_range value is a trade-off between accuracy and index size.
| pages_per_range | Index Size | Accuracy | Best For |
|---|---|---|---|
| 32 | Larger | High | Frequent small-range queries |
| 128 (default) | Medium | Medium | General time-series data |
| 256+ | Very small | Low | Primarily large-range queries |
When BRIN Is Not Suitable
- Data is inserted randomly with no correlation between physical and logical order
- UPDATE operations move rows to different pages (HOT failure), breaking the correlation
- Point queries (single row lookups) are the primary workload
Check the correlation value in pg_stats to assess BRIN suitability. Values closer to 1 (absolute) are ideal.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';
-- correlation above 0.95 indicates good BRIN suitability
Partial Index and Expression Index
Partial Index
Create an index on only a subset of table rows to reduce index size and improve write performance.
-- Order table: index only active orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
total_amount NUMERIC(10,2),
created_at TIMESTAMPTZ DEFAULT now()
);
-- Full index vs Partial index size comparison
CREATE INDEX idx_orders_status_full ON orders (status, created_at);
CREATE INDEX idx_orders_status_partial ON orders (created_at)
WHERE status IN ('pending', 'processing');
-- Query that uses the partial index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending'
AND created_at > now() - interval '7 days'
ORDER BY created_at DESC;
The key to partial indexes is that the query WHERE clause must imply the index predicate. The PostgreSQL optimizer can recognize simple inequality implications, but may not recognize complex expressions.
Unique Partial Index for Conditional Uniqueness
-- Allow only one active email per user
CREATE UNIQUE INDEX idx_unique_active_email
ON users (email) WHERE is_active = true;
-- Rows where is_active = false can have duplicate emails
-- Rows where is_active = true enforce email uniqueness
Expression Index
Create an index on the result of transforming column values. The query must use the exact same expression for the index to be utilized.
-- Expression index for case-insensitive search
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- This query uses the index
EXPLAIN ANALYZE
SELECT * FROM users WHERE lower(email) = 'user@example.com';
-- Date extraction expression index
CREATE INDEX idx_orders_created_date ON orders ((created_at::date));
-- Used for date-based aggregations
EXPLAIN ANALYZE
SELECT created_at::date AS order_date, count(*)
FROM orders
WHERE created_at::date = '2026-03-10'
GROUP BY created_at::date;
-- Expression index on a specific JSONB key (smaller than full GIN)
CREATE INDEX idx_products_category ON products ((metadata->>'category'));
-- B-tree based, so equality and range searches work
EXPLAIN ANALYZE
SELECT * FROM products WHERE metadata->>'category' = 'electronics';
Combining Partial + Expression
-- Optimize email search for recently active users
CREATE INDEX idx_recent_active_users_email
ON users (lower(email))
WHERE last_login_at > now() - interval '7 days'
AND is_active = true;
This combination dramatically reduces index size while providing optimal performance for specific query patterns.
EXPLAIN ANALYZE for Index Performance Analysis
The most reliable way to verify index usage is EXPLAIN ANALYZE.
-- Basic EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM products
WHERE metadata @> '{"category": "electronics"}';
-- Key points to check in the execution plan:
-- 1. Scan type: Index Scan vs Bitmap Index Scan vs Seq Scan
-- 2. actual time: Real elapsed time
-- 3. rows: Estimated vs actual row count discrepancy
-- 4. Buffers: shared hit (cache) vs shared read (disk)
Diagnosing Index Non-Usage
Common reasons why an existing index is not used:
- Stale statistics: ANALYZE not run, causing the optimizer to estimate incorrect cardinality
- Low selectivity: When results exceed 10-15% of the table, Sequential Scan is more efficient
- Type mismatch: Query condition data type differs from the indexed column type
- Expression mismatch: Expression Index and query expression do not match exactly
- enable_indexscan = off: Index scan disabled at the session level
-- Find unused indexes
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
idx_scan AS times_used,
idx_tup_read AS tuples_read
FROM pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
AND NOT indisunique
AND NOT indisprimary
ORDER BY pg_relation_size(i.indexrelid) DESC;
Index Bloat Management and REINDEX
What Is Index Bloat
Due to PostgreSQL MVCC architecture, UPDATE is internally handled as DELETE + INSERT. Index entries for deleted tuples occupy space until VACUUM cleans them up. When this accumulates, index bloat occurs -- the index grows abnormally large.
Measuring Bloat
-- Measure bloat using pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
avg_leaf_density,
leaf_pages,
empty_pages,
deleted_pages,
round(100 - avg_leaf_density, 1) AS bloat_pct
FROM pgstatindex('idx_orders_status_full');
-- Consider REINDEX when bloat_pct exceeds 20%
REINDEX CONCURRENTLY
In production, use REINDEX CONCURRENTLY to rebuild indexes without locking the table.
-- Rebuild a specific index (non-blocking)
REINDEX INDEX CONCURRENTLY idx_orders_status_full;
-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY orders;
-- Rebuild all indexes in a schema
REINDEX SCHEMA CONCURRENTLY public;
Important caveats:
- If REINDEX CONCURRENTLY fails midway, it leaves behind an invalid index with a
_ccnewsuffix. You must check for and drop these. - REINDEX CONCURRENTLY holds the xmin horizon during execution, which can delay dead tuple cleanup by other VACUUM processes.
-- Check for invalid indexes
SELECT indexrelid::regclass AS index_name,
indisvalid
FROM pg_index
WHERE NOT indisvalid;
-- Drop invalid indexes
-- DROP INDEX CONCURRENTLY idx_name_ccnew;
Automated Management Strategy
-- Monitoring query to find indexes with bloat exceeding 20%
SELECT
schemaname || '.' || tablename AS table,
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
round(100 - (pgstatindex(indexname)).avg_leaf_density, 1) AS bloat_pct
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY bloat_pct DESC NULLS LAST;
Index Type Comparison Table
| Property | B-tree | GIN | GiST | BRIN |
|---|---|---|---|---|
| Best data types | Scalar values | JSONB, arrays, tsvector | Spatial, ranges | Time-series, sequential |
| Index size | Medium | Large (60-80% of table) | Medium | Extremely small (KB) |
| Write overhead | Low | High | Medium | Very low |
| Point queries | Optimal | Supported | Supported | Inefficient |
| Range queries | Optimal | Not supported | Supported | Optimal (high correlation) |
| Full-text search | Not supported | Optimal | Not supported | Not supported |
| Spatial queries | Not supported | Not supported | Optimal | Limited |
| KNN search | Not supported | Not supported | Optimal | Not supported |
| Unique constraints | Supported | Not supported | Not supported | Not supported |
| Index-only scan | Supported | Not supported | Not supported | Not supported |
| CONCURRENTLY create | Supported | Supported | Supported | Supported |
| Physical sort required | No | No | No | Yes |
Operations Checklist
Pre-Index Creation Checklist
- Analyze query patterns: Check high-frequency queries in
pg_stat_statements - Verify the target column data type and operators (determine if B-tree suffices)
- Check the current execution plan with
EXPLAIN ANALYZE - Estimate table size and expected index size
- Plan for non-blocking creation with the CONCURRENTLY option
Regular Monitoring Items
- Unused indexes (
pg_stat_user_indexes.idx_scan = 0) - Index bloat percentage (
pgstatindexfunction) - Index size relative to table size
- BRIN index correlation (
pg_stats.correlation) - GIN Pending List size (
pg_stat_all_indexes.idx_tup_insert)
Index Type Selection Flow
- Equality / range / sort queries -> B-tree
- JSONB containment, arrays, full-text search -> GIN
- Spatial data, range overlaps, KNN -> GiST
- Time-series / append-only large table range queries -> BRIN
- Frequently querying only specific row subsets -> Partial Index
- Searching on function/transformation results -> Expression Index
Conclusion
Each PostgreSQL advanced index has a clear design purpose and optimal scenario. The key is to select the right index type based on your data characteristics and query patterns.
- If you have many JSONB or full-text search queries, consider GIN -- but account for write overhead and index size.
- For spatial data or range overlap queries, GiST is practically mandatory.
- For billion-row time-series tables, BRIN can reduce index size by over 99%.
- Partial Index and Expression Index can be combined with any index type for additional optimization.
Most importantly, always validate actual execution plans with EXPLAIN ANALYZE and continuously monitor index bloat in production. Indexes are not fire-and-forget assets -- they are operational objects that require ongoing maintenance.