PostgreSQL Advanced Indexing Guide: GIN, GiST, BRIN, and Partial Index in Practice

Introduction
PostgreSQL Index Types Overview
B-tree Limitations and the Need for Advanced Indexes
GIN Index Deep Dive
GiST Index -- Spatial Data and Range Types
BRIN Index -- Time-Series and Large Tables
Partial Index and Expression Index
EXPLAIN ANALYZE for Index Performance Analysis
- Diagnosing Index Non-Usage
Index Bloat Management and REINDEX
Index Type Comparison Table
Operations Checklist
Conclusion

Introduction

PostgreSQL offers six index types: B-tree, Hash, GIN, GiST, SP-GiST, and BRIN. While most tutorials stop at B-tree, real-world applications frequently encounter problems that B-tree alone cannot efficiently solve -- JSONB queries, full-text search, spatial queries, and time-series tables with billions of rows.

This article goes beyond B-tree to explore the internal structures of GIN, GiST, BRIN, Partial Index, and Expression Index, covering when and how to apply each with EXPLAIN ANALYZE-based performance data.

PostgreSQL Index Types Overview

A quick summary of PostgreSQL index types:

Index Type	Internal Structure	Best Use Cases	Size	Write Cost
B-tree	Balanced tree	Equality, range, sort, UNIQUE	Medium	Low
GIN	Inverted index (Posting List/Tree)	JSONB, arrays, tsvector	Large	High
GiST	Generalized search tree	Spatial data, ranges, proximity	Medium	Medium
BRIN	Block range summary	Time-series, append-only large tables	Very small	Very low
Hash	Hash table	Pure equality lookups	Small	Low
SP-GiST	Space-partitioned tree	Phone numbers, IP addresses, unbalanced trees	Medium	Medium

B-tree Limitations and the Need for Advanced Indexes

B-tree is optimized for scalar value equality comparisons and range searches. However, it becomes inefficient or unusable in the following scenarios:

JSONB containment queries: Operators like @> in WHERE metadata @> '...' cannot be indexed with B-tree.
Full-text search: to_tsvector()-based search requires GIN indexes.
Spatial queries: PostGIS functions like ST_DWithin() and ST_Contains() need GiST indexes.
Billion-row time-series tables: The B-tree index itself grows to tens of GB, causing memory pressure.

PostgreSQL provides specialized index types to address these challenges.

GIN Index Deep Dive

Internal Structure

GIN (Generalized Inverted Index) uses an inverted index structure. Internally, it builds a B-tree over key values, and each leaf node stores a Posting List or Posting Tree -- a list of TIDs (Tuple Identifiers) for rows containing that key.

For example, if a JSONB column contains "tags": ["python", "database"], the GIN index registers both "python" and "database" as keys and adds the row TID to each key's Posting List.

JSONB Indexing

Creating a GIN index on a JSONB column enables index scans for operators like @>, ?, ?|, and ?&.

-- Create table
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    metadata JSONB NOT NULL
);

-- Insert 1 million test rows
INSERT INTO products (name, metadata)
SELECT
    'product_' || i,
    jsonb_build_object(
        'category', (ARRAY['electronics', 'clothing', 'food', 'toys'])[1 + (i % 4)],
        'price', (random() * 1000)::int,
        'tags', jsonb_build_array(
            (ARRAY['sale', 'new', 'popular', 'limited'])[1 + (i % 4)],
            (ARRAY['premium', 'budget', 'mid-range'])[1 + (i % 3)]
        ),
        'in_stock', (i % 2 = 0)
    )
FROM generate_series(1, 1000000) AS i;

-- GIN index with default jsonb_ops
CREATE INDEX idx_products_metadata_gin ON products USING gin (metadata);

-- GIN index with jsonb_path_ops (smaller index, @> only)
CREATE INDEX idx_products_metadata_path ON products USING gin (metadata jsonb_path_ops);

-- Containment query with EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT id, name
FROM products
WHERE metadata @> '{"category": "electronics", "in_stock": true}';

Example output:

Bitmap Heap Scan on products  (cost=52.01..2084.18 rows=500 width=20)
  (actual time=1.234..5.678 rows=125000 loops=1)
  Recheck Cond: (metadata @> '{"category": "electronics", "in_stock": true}'::jsonb)
  Heap Blocks: exact=8334
  ->  Bitmap Index Scan on idx_products_metadata_gin  (cost=0.00..51.88 rows=500 width=0)
      (actual time=0.891..0.891 rows=125000 loops=1)
        Index Cond: (metadata @> '{"category": "electronics", "in_stock": true}'::jsonb)
Planning Time: 0.152 ms
Execution Time: 12.345 ms

jsonb_ops vs jsonb_path_ops comparison:

Property	jsonb_ops (default)	jsonb_path_ops
Supported operators	`@>`, `?`, `?\|`, `?&`, `@@`, `@?`	`@>`, `@@`, `@?`
Index size	Large (indexes keys + paths)	Small (stores path hashes only)
Key existence checks	Supported	Not supported
Containment search speed	Fast	Faster

Full-Text Search

-- Full-text search table with GIN index
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    tsv tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', title), 'A') ||
        setweight(to_tsvector('english', body), 'B')
    ) STORED
);

-- Create GIN index
CREATE INDEX idx_articles_tsv ON articles USING gin (tsv);

-- Full-text search query
EXPLAIN ANALYZE
SELECT id, title, ts_rank(tsv, q) AS rank
FROM articles, to_tsquery('english', 'postgresql & indexing') AS q
WHERE tsv @@ q
ORDER BY rank DESC
LIMIT 10;

Array Indexing

-- GIN index on tag arrays
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title TEXT,
    tags TEXT[]
);

CREATE INDEX idx_posts_tags_gin ON posts USING gin (tags);

-- Array containment search
EXPLAIN ANALYZE
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];

-- Array overlap search
EXPLAIN ANALYZE
SELECT * FROM posts WHERE tags && ARRAY['database', 'backend'];

GIN Pending List and fastupdate

By default, GIN indexes are configured with fastupdate=on. When new rows are inserted, they are not immediately merged into the index -- instead, they are temporarily stored in a Pending List and batch-merged during VACUUM or when the Pending List exceeds its size limit.

Advantage: Improved write performance, especially during bulk INSERTs
Disadvantage: CPU/IO spikes during Pending List merge; search queries must also scan the Pending List

In production, tune the gin_pending_list_limit parameter or manually call SELECT gin_clean_pending_list('idx_name') before peak hours.

GiST Index -- Spatial Data and Range Types

Internal Structure

GiST (Generalized Search Tree) is an extensible balanced tree framework. Unlike B-tree, GiST supports various data types and search strategies through operator class methods: consistent, union, penalty, picksplit, and others.

Internally, it uses an R-tree structure that wraps spatial data in bounding boxes (MBR: Minimum Bounding Rectangle) and organizes them hierarchically.

Spatial Data Indexing (PostGIS)

-- Install PostGIS extension
CREATE EXTENSION IF NOT EXISTS postgis;

-- Spatial data table
CREATE TABLE stores (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    location GEOMETRY(Point, 4326) NOT NULL
);

-- Insert 1 million test rows (random coordinates around Seoul)
INSERT INTO stores (name, location)
SELECT
    'store_' || i,
    ST_SetSRID(ST_MakePoint(
        126.9 + random() * 0.2,   -- longitude
        37.4 + random() * 0.2     -- latitude
    ), 4326)
FROM generate_series(1, 1000000) AS i;

-- Create GiST index
CREATE INDEX idx_stores_location_gist ON stores USING gist (location);

-- Find stores within 1km radius
EXPLAIN ANALYZE
SELECT id, name,
       ST_Distance(location::geography,
                   ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326)::geography) AS distance_m
FROM stores
WHERE ST_DWithin(location::geography,
                 ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326)::geography,
                 1000)
ORDER BY distance_m
LIMIT 20;

Example output:

Limit  (cost=8.45..8.50 rows=20 width=44)
  (actual time=2.345..2.678 rows=20 loops=1)
  ->  Sort  (cost=8.45..8.52 rows=25 width=44)
      (actual time=2.340..2.350 rows=20 loops=1)
        Sort Key: (st_distance(...))
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Index Scan using idx_stores_location_gist on stores
            (cost=0.42..7.89 rows=25 width=44)
            (actual time=0.234..2.123 rows=785 loops=1)
              Index Cond: (location && ...)
              Filter: st_dwithin(...)
Planning Time: 0.456 ms
Execution Time: 2.789 ms

Without the GiST index, a Sequential Scan across all 1 million rows would take several seconds.

Range Type Indexing

-- Overlapping time range search for a reservation system
CREATE TABLE reservations (
    id SERIAL PRIMARY KEY,
    room_id INT NOT NULL,
    period TSTZRANGE NOT NULL,
    guest_name TEXT
);

-- GiST index for range overlap optimization
CREATE INDEX idx_reservations_period_gist ON reservations USING gist (period);

-- Find overlapping reservations
EXPLAIN ANALYZE
SELECT * FROM reservations
WHERE period && tstzrange('2026-03-10 14:00', '2026-03-10 18:00', '[)');

-- EXCLUDE constraint to prevent overlaps (requires GiST)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING gist (room_id WITH =, period WITH &&);

GiST supports range operators: && (overlap), @> (contains), <@ (contained by), << (left of), >> (right of), -|- (adjacent).

KNN (K-Nearest Neighbor) Search

GiST indexes support KNN searches that handle ORDER BY distance patterns at the index level.

-- Find the 10 nearest stores using index scan
SELECT id, name, location <-> ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326) AS dist
FROM stores
ORDER BY location <-> ST_SetSRID(ST_MakePoint(127.0, 37.5), 4326)
LIMIT 10;

The <-> operator combined with ORDER BY ... LIMIT performs efficient KNN search through the GiST index. Rather than sorting the entire table, it traverses the index tree and returns results in nearest-first order.

BRIN Index -- Time-Series and Large Tables

Internal Structure

BRIN (Block Range Index) stores only summary information -- minimum and maximum values -- for each physical block range (default: 128 pages) in the table. This makes the index extremely small, typically just a few MB even for tables with billions of rows.

The key prerequisite: there must be a strong correlation between column values and their physical storage order. Timestamp columns in time-series data are the classic example.

Time-Series Data Usage

-- Event log table (time-series)
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    event_type TEXT NOT NULL,
    payload JSONB
);

-- Insert 50 million test rows
INSERT INTO events (created_at, event_type, payload)
SELECT
    '2025-01-01'::timestamptz + (i || ' seconds')::interval,
    (ARRAY['click', 'view', 'purchase', 'signup'])[1 + (i % 4)],
    jsonb_build_object('user_id', (i % 100000), 'value', random() * 100)
FROM generate_series(1, 50000000) AS i;

-- B-tree index for comparison
CREATE INDEX idx_events_created_btree ON events (created_at);

-- BRIN index
CREATE INDEX idx_events_created_brin ON events USING brin (created_at)
WITH (pages_per_range = 128);

-- Compare index sizes
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'events' AND indexname LIKE 'idx_events_created%';

Size comparison example:

       indexname              | index_size
------------------------------+------------
 idx_events_created_btree     | 1071 MB
 idx_events_created_brin      | 128 kB

The BRIN index is approximately 8,500 times smaller than B-tree while handling the same range queries.

-- Range query using BRIN index
EXPLAIN ANALYZE
SELECT count(*) FROM events
WHERE created_at BETWEEN '2025-06-01' AND '2025-06-30';

Example output:

Aggregate  (cost=456789.12..456789.13 rows=1 width=8)
  (actual time=234.567..234.568 rows=1 loops=1)
  ->  Bitmap Heap Scan on events  (cost=48.12..445678.90 rows=2592000 width=0)
      (actual time=12.345..198.765 rows=2592000 loops=1)
        Recheck Cond: (created_at >= ... AND created_at <= ...)
        Rows Removed by Recheck: 45678
        Heap Blocks: lossy=19200
        ->  Bitmap Index Scan on idx_events_created_brin  (cost=0.00..47.50 rows=2600000 width=0)
            (actual time=0.234..0.234 rows=192000 loops=1)
Planning Time: 0.123 ms
Execution Time: 256.789 ms

pages_per_range Tuning

The pages_per_range value is a trade-off between accuracy and index size.

pages_per_range	Index Size	Accuracy	Best For
32	Larger	High	Frequent small-range queries
128 (default)	Medium	Medium	General time-series data
256+	Very small	Low	Primarily large-range queries

When BRIN Is Not Suitable

Data is inserted randomly with no correlation between physical and logical order
UPDATE operations move rows to different pages (HOT failure), breaking the correlation
Point queries (single row lookups) are the primary workload

Check the correlation value in pg_stats to assess BRIN suitability. Values closer to 1 (absolute) are ideal.

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';
-- correlation above 0.95 indicates good BRIN suitability

Partial Index and Expression Index

Partial Index

Create an index on only a subset of table rows to reduce index size and improve write performance.

-- Order table: index only active orders
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    total_amount NUMERIC(10,2),
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Full index vs Partial index size comparison
CREATE INDEX idx_orders_status_full ON orders (status, created_at);
CREATE INDEX idx_orders_status_partial ON orders (created_at)
    WHERE status IN ('pending', 'processing');

-- Query that uses the partial index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending'
  AND created_at > now() - interval '7 days'
ORDER BY created_at DESC;

The key to partial indexes is that the query WHERE clause must imply the index predicate. The PostgreSQL optimizer can recognize simple inequality implications, but may not recognize complex expressions.

Unique Partial Index for Conditional Uniqueness

-- Allow only one active email per user
CREATE UNIQUE INDEX idx_unique_active_email
ON users (email) WHERE is_active = true;

-- Rows where is_active = false can have duplicate emails
-- Rows where is_active = true enforce email uniqueness

Expression Index

Create an index on the result of transforming column values. The query must use the exact same expression for the index to be utilized.

-- Expression index for case-insensitive search
CREATE INDEX idx_users_email_lower ON users (lower(email));

-- This query uses the index
EXPLAIN ANALYZE
SELECT * FROM users WHERE lower(email) = 'user@example.com';

-- Date extraction expression index
CREATE INDEX idx_orders_created_date ON orders ((created_at::date));

-- Used for date-based aggregations
EXPLAIN ANALYZE
SELECT created_at::date AS order_date, count(*)
FROM orders
WHERE created_at::date = '2026-03-10'
GROUP BY created_at::date;

-- Expression index on a specific JSONB key (smaller than full GIN)
CREATE INDEX idx_products_category ON products ((metadata->>'category'));

-- B-tree based, so equality and range searches work
EXPLAIN ANALYZE
SELECT * FROM products WHERE metadata->>'category' = 'electronics';

Combining Partial + Expression

-- Optimize email search for recently active users
CREATE INDEX idx_recent_active_users_email
ON users (lower(email))
WHERE last_login_at > now() - interval '7 days'
  AND is_active = true;

This combination dramatically reduces index size while providing optimal performance for specific query patterns.

EXPLAIN ANALYZE for Index Performance Analysis

The most reliable way to verify index usage is EXPLAIN ANALYZE.

-- Basic EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM products
WHERE metadata @> '{"category": "electronics"}';

-- Key points to check in the execution plan:
-- 1. Scan type: Index Scan vs Bitmap Index Scan vs Seq Scan
-- 2. actual time: Real elapsed time
-- 3. rows: Estimated vs actual row count discrepancy
-- 4. Buffers: shared hit (cache) vs shared read (disk)

Diagnosing Index Non-Usage

Common reasons why an existing index is not used:

Stale statistics: ANALYZE not run, causing the optimizer to estimate incorrect cardinality
Low selectivity: When results exceed 10-15% of the table, Sequential Scan is more efficient
Type mismatch: Query condition data type differs from the indexed column type
Expression mismatch: Expression Index and query expression do not match exactly
enable_indexscan = off: Index scan disabled at the session level

-- Find unused indexes
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    idx_scan AS times_used,
    idx_tup_read AS tuples_read
FROM pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
  AND NOT indisunique
  AND NOT indisprimary
ORDER BY pg_relation_size(i.indexrelid) DESC;

Index Bloat Management and REINDEX

What Is Index Bloat

Due to PostgreSQL MVCC architecture, UPDATE is internally handled as DELETE + INSERT. Index entries for deleted tuples occupy space until VACUUM cleans them up. When this accumulates, index bloat occurs -- the index grows abnormally large.

Measuring Bloat

-- Measure bloat using pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
    avg_leaf_density,
    leaf_pages,
    empty_pages,
    deleted_pages,
    round(100 - avg_leaf_density, 1) AS bloat_pct
FROM pgstatindex('idx_orders_status_full');

-- Consider REINDEX when bloat_pct exceeds 20%

REINDEX CONCURRENTLY

In production, use REINDEX CONCURRENTLY to rebuild indexes without locking the table.

-- Rebuild a specific index (non-blocking)
REINDEX INDEX CONCURRENTLY idx_orders_status_full;

-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY orders;

-- Rebuild all indexes in a schema
REINDEX SCHEMA CONCURRENTLY public;

Important caveats:

If REINDEX CONCURRENTLY fails midway, it leaves behind an invalid index with a _ccnew suffix. You must check for and drop these.
REINDEX CONCURRENTLY holds the xmin horizon during execution, which can delay dead tuple cleanup by other VACUUM processes.

-- Check for invalid indexes
SELECT indexrelid::regclass AS index_name,
       indisvalid
FROM pg_index
WHERE NOT indisvalid;

-- Drop invalid indexes
-- DROP INDEX CONCURRENTLY idx_name_ccnew;

Automated Management Strategy

-- Monitoring query to find indexes with bloat exceeding 20%
SELECT
    schemaname || '.' || tablename AS table,
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
    round(100 - (pgstatindex(indexname)).avg_leaf_density, 1) AS bloat_pct
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY bloat_pct DESC NULLS LAST;

Index Type Comparison Table

Property	B-tree	GIN	GiST	BRIN
Best data types	Scalar values	JSONB, arrays, tsvector	Spatial, ranges	Time-series, sequential
Index size	Medium	Large (60-80% of table)	Medium	Extremely small (KB)
Write overhead	Low	High	Medium	Very low
Point queries	Optimal	Supported	Supported	Inefficient
Range queries	Optimal	Not supported	Supported	Optimal (high correlation)
Full-text search	Not supported	Optimal	Not supported	Not supported
Spatial queries	Not supported	Not supported	Optimal	Limited
KNN search	Not supported	Not supported	Optimal	Not supported
Unique constraints	Supported	Not supported	Not supported	Not supported
Index-only scan	Supported	Not supported	Not supported	Not supported
CONCURRENTLY create	Supported	Supported	Supported	Supported
Physical sort required	No	No	No	Yes

Operations Checklist

Pre-Index Creation Checklist

Analyze query patterns: Check high-frequency queries in pg_stat_statements
Verify the target column data type and operators (determine if B-tree suffices)
Check the current execution plan with EXPLAIN ANALYZE
Estimate table size and expected index size
Plan for non-blocking creation with the CONCURRENTLY option

Regular Monitoring Items

Unused indexes (pg_stat_user_indexes.idx_scan = 0)
Index bloat percentage (pgstatindex function)
Index size relative to table size
BRIN index correlation (pg_stats.correlation)
GIN Pending List size (pg_stat_all_indexes.idx_tup_insert)

Index Type Selection Flow

Equality / range / sort queries -> B-tree
JSONB containment, arrays, full-text search -> GIN
Spatial data, range overlaps, KNN -> GiST
Time-series / append-only large table range queries -> BRIN
Frequently querying only specific row subsets -> Partial Index
Searching on function/transformation results -> Expression Index

Conclusion

Each PostgreSQL advanced index has a clear design purpose and optimal scenario. The key is to select the right index type based on your data characteristics and query patterns.

If you have many JSONB or full-text search queries, consider GIN -- but account for write overhead and index size.
For spatial data or range overlap queries, GiST is practically mandatory.
For billion-row time-series tables, BRIN can reduce index size by over 99%.
Partial Index and Expression Index can be combined with any index type for additional optimization.

Most importantly, always validate actual execution plans with EXPLAIN ANALYZE and continuously monitor index bloat in production. Indexes are not fire-and-forget assets -- they are operational objects that require ongoing maintenance.