PostgreSQL Query Optimization
EXPLAIN Basics
Running EXPLAIN
-- Basic plan (no execution) EXPLAIN SELECT * FROM orders WHERE customer_id = 123; -- With actual execution times and row counts EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123; -- With buffer/IO statistics EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 123; -- Full verbose output with all options EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON) SELECT * FROM orders WHERE customer_id = 123;
Key Metrics to Watch
- •actual time: First row time..last row time in milliseconds
- •rows: Estimated vs actual row counts (large differences indicate stale statistics)
- •loops: How many times the node executed (important for nested loops)
- •Buffers: shared hit (cache) vs shared read (disk)—high read count = slow
- •Planning Time: Query planning overhead
- •Execution Time: Actual query execution time
Scan Types
Sequential Scan (Seq Scan)
Reads every row in the table. Acceptable for:
- •Small tables (<10K rows typically)
- •Queries returning large % of table (>5-10%)
- •No suitable index exists
Red flag: Seq Scan on large table with highly selective WHERE clause.
Index Scan
Uses B-tree index to find rows, then fetches from heap. Best for:
- •Highly selective queries (<5% of rows)
- •Sorted output matching index order
Index Only Scan
Answers query entirely from index (no heap fetch). Requires:
- •All needed columns in index (via INCLUDE or as key columns)
- •Table's visibility map is up-to-date (run VACUUM)
Goal: Convert Index Scan → Index Only Scan for read-heavy queries.
Bitmap Index Scan
Combines multiple index conditions or handles medium selectivity. Pattern:
- •Bitmap Index Scan: Build bitmap of matching pages
- •Bitmap Heap Scan: Fetch pages and recheck conditions
Good for OR conditions and medium selectivity (5-20% of rows).
Join Types
Nested Loop
For each row in outer table, scan inner table. Best when:
- •Inner table has good index
- •Outer table is small
- •Join returns few rows
Hash Join
Builds hash table from smaller table, probes with larger. Best for:
- •Larger joins without useful indexes
- •Equality joins only
Watch for: Batches > 1 means hash table spilled to disk (increase work_mem).
Merge Join
Both inputs sorted, merge together. Best for:
- •Large sorted datasets
- •Indexes provide sort order
- •Multiple equality conditions
Common Performance Issues
N+1 Query Problem
Symptom: Many small queries instead of one efficient join.
-- Bad: N+1 pattern (in application) SELECT * FROM orders WHERE id = 1; SELECT * FROM order_items WHERE order_id = 1; SELECT * FROM orders WHERE id = 2; SELECT * FROM order_items WHERE order_id = 2; -- ... repeated N times -- Good: Single query with JOIN SELECT o.*, oi.* FROM orders o JOIN order_items oi ON o.id = oi.order_id WHERE o.customer_id = 123;
Missing Index on FK
Symptom: Slow deletes on parent table, slow joins.
-- Check for missing FK indexes
SELECT
c.conrelid::regclass AS table_name,
a.attname AS column_name,
c.confrelid::regclass AS referenced_table
FROM pg_constraint c
JOIN pg_attribute a ON a.attrelid = c.conrelid AND a.attnum = ANY(c.conkey)
WHERE c.contype = 'f'
AND NOT EXISTS (
SELECT 1 FROM pg_index i
WHERE i.indrelid = c.conrelid
AND a.attnum = ANY(i.indkey)
);
Over-Indexing
Symptom: Slow inserts/updates, excessive disk usage.
-- Find unused indexes
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS size,
idx_scan AS scans
FROM pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
AND NOT indisunique
AND NOT indisprimary
ORDER BY pg_relation_size(i.indexrelid) DESC;
Stale Statistics
Symptom: Planner estimates wildly wrong vs actual rows.
-- Check table statistics freshness
SELECT
schemaname,
relname,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze,
n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Force statistics update
ANALYZE table_name;
ANALYZE VERBOSE table_name; -- with progress
Inefficient Pagination
Symptom: OFFSET-based pagination gets slower for higher pages.
-- Bad: OFFSET pagination (rescans all previous rows) SELECT * FROM orders ORDER BY created_at DESC OFFSET 10000 LIMIT 20; -- Good: Keyset/cursor pagination SELECT * FROM orders WHERE created_at < '2024-01-15T10:30:00Z' ORDER BY created_at DESC LIMIT 20;
Index Selection Strategy
When to Create Indexes
- •WHERE clause columns: Frequently filtered columns
- •JOIN columns: Both sides of JOIN conditions
- •ORDER BY columns: For sorted output without extra sort
- •Foreign keys: Always index FK columns (PostgreSQL doesn't auto-create)
- •Unique constraints: Already create indexes automatically
Index Types by Use Case
| Use Case | Index Type | Example |
|---|---|---|
| Equality, range, ORDER BY | B-tree (default) | CREATE INDEX ON orders (status) |
| JSONB containment | GIN | CREATE INDEX ON docs USING GIN (data) |
| Array operations | GIN | CREATE INDEX ON posts USING GIN (tags) |
| Full-text search | GIN | CREATE INDEX ON articles USING GIN (to_tsvector('english', body)) |
| Range overlap | GiST | CREATE INDEX ON bookings USING GiST (daterange) |
| Time-series (huge tables) | BRIN | CREATE INDEX ON logs USING BRIN (created_at) |
| Pattern matching (LIKE) | GIN + pg_trgm | CREATE INDEX ON users USING GIN (name gin_trgm_ops) |
Partial Indexes
Index only rows matching a condition:
-- Only index active orders (common query pattern) CREATE INDEX ON orders (customer_id) WHERE status = 'active'; -- Only index non-null values CREATE INDEX ON users (referral_code) WHERE referral_code IS NOT NULL;
Covering Indexes
Include extra columns for index-only scans:
-- Query: SELECT name, email FROM users WHERE id = ? CREATE INDEX ON users (id) INCLUDE (name, email);
Expression Indexes
Index computed values:
-- Case-insensitive email lookup CREATE INDEX ON users (LOWER(email)); -- JSONB field extraction CREATE INDEX ON products ((data->>'category')); -- Date part extraction CREATE INDEX ON events (DATE(created_at));
pg_stat_statements
Essential extension for query performance monitoring.
Enable and Configure
-- In postgresql.conf shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.track = all -- Create extension CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Find Slowest Queries
-- Top 10 by total time
SELECT
calls,
round(total_exec_time::numeric, 2) AS total_time_ms,
round(mean_exec_time::numeric, 2) AS mean_time_ms,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_total,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Queries with high variance (inconsistent performance)
SELECT
calls,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(stddev_exec_time::numeric, 2) AS stddev_ms,
query
FROM pg_stat_statements
WHERE calls > 100
ORDER BY stddev_exec_time DESC
LIMIT 10;
Reset Statistics
SELECT pg_stat_statements_reset();
Lock Monitoring
Current Locks
SELECT
pg_class.relname,
pg_locks.mode,
pg_locks.granted,
pg_stat_activity.query,
pg_stat_activity.pid
FROM pg_locks
JOIN pg_class ON pg_locks.relation = pg_class.oid
JOIN pg_stat_activity ON pg_locks.pid = pg_stat_activity.pid
WHERE pg_class.relname NOT LIKE 'pg_%'
ORDER BY pg_class.relname;
Blocked Queries
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query,
now() - blocked.query_start AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
AND blocked_locks.relation = blocking_locks.relation
AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted;
Long-Running Transactions
SELECT
pid,
now() - xact_start AS duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND xact_start < now() - interval '5 minutes'
ORDER BY duration DESC;
Connection Management
Current Connections
SELECT
state,
COUNT(*) AS count,
MAX(now() - state_change) AS max_duration
FROM pg_stat_activity
GROUP BY state;
Connection Pool Sizing
Rule of thumb: connections = (core_count * 2) + effective_spindle_count
For most web apps with SSD: 10-20 connections per CPU core is reasonable.
Idle Connection Cleanup
-- Terminate idle connections older than 10 minutes SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND state_change < now() - interval '10 minutes';
Memory Settings
work_mem
Memory per operation (sort, hash). Default 4MB is often too low.
-- Check current setting SHOW work_mem; -- Set for session (for heavy analytical queries) SET work_mem = '256MB'; -- Check if sorts are spilling to disk EXPLAIN (ANALYZE, BUFFERS) SELECT ... ORDER BY ...; -- Look for: "Sort Method: external merge Disk:"
shared_buffers
Main memory cache. Start with 25% of system RAM.
effective_cache_size
Hint to planner about total cache (OS + PostgreSQL). Set to ~75% of RAM.
Query Patterns to Avoid
SELECT *
Fetch only needed columns for smaller data transfer and potential index-only scans.
Functions on Indexed Columns
-- Bad: Can't use index on created_at WHERE DATE(created_at) = '2024-01-15' -- Good: Range query uses index WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16'
OR with Different Columns
-- Bad: Often causes Seq Scan WHERE status = 'active' OR customer_id = 123 -- Better: UNION of indexed queries SELECT * FROM orders WHERE status = 'active' UNION ALL SELECT * FROM orders WHERE customer_id = 123 AND status != 'active'
DISTINCT When Not Needed
DISTINCT adds sort/hash overhead. Ensure your query design eliminates duplicates naturally through proper joins.
Correlated Subqueries
-- Bad: Executes subquery for each row
SELECT * FROM orders o
WHERE total > (SELECT AVG(total) FROM orders WHERE customer_id = o.customer_id);
-- Good: Use window function or CTE
WITH customer_avg AS (
SELECT customer_id, AVG(total) AS avg_total
FROM orders
GROUP BY customer_id
)
SELECT o.*
FROM orders o
JOIN customer_avg ca ON o.customer_id = ca.customer_id
WHERE o.total > ca.avg_total;
Performance Checklist
- •EXPLAIN ANALYZE your slow queries
- •Check estimated vs actual rows (stale stats?)
- •Look for Seq Scans on large tables
- •Verify indexes exist for WHERE/JOIN columns
- •Check for missing FK indexes
- •Review work_mem for sorts spilling to disk
- •Consider partial/covering indexes for hot queries
- •Use keyset pagination instead of OFFSET
- •Enable pg_stat_statements for ongoing monitoring
- •Regular VACUUM ANALYZE for fresh statistics