Overview

When building a data model for PostgreSQL, you're more than likely to have foreign keys to define relationships between tables. When you have those foreign keys, one quick way to optimize your queries and allow them to be significantly faster is to index on those foreign key references.

Reference Checks

When updates or deletes are performed on the primary table, it has to check if any other tables references it. If there isn't an index on that table that references it, it will do an entire sequential scan on that table to see if any of the rows in the table reference the primary table. That can cause queries to slow down significantly. If there is an index, the PostgreSQL can quickly verify if there are any referencing rows without needing to scan the entire table.

The same logic applies to queries with a cascading effect. These queries typically look like ON DELETE CASCADE or ON UPDATE CASCADE which will cause the PostgreSQL planner to locate the rows referencing that and delete or update them. This is much faster if the lookup is in-memory using an index.

Join Performance

If you're building a model out with foreign keys, there's a decent likelihood that you're going to need to join those tables together to form the combined data output that you desire. In those cases, doing a JOIN on a foreign key column is common. With that in mind, having indexes on those foreign keys can make your query a lot faster especially when the referencing table has a lot of rows in it.

Example Schema

Let's say you have a simple blog system where each blog can have many posts. You can end up with this simple schema. You'll want to make sure that there's an index on the foreign key in the posts table that references an id in the blog table.

-- Primary table
CREATE TABLE blog (
    id SERIAL PRIMARY KEY,
    name TEXT
);

-- Referencing table
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    blog_id INT REFERENCES blog(id)
);

-- Adding index on the foreign key
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_posts_blog_id ON posts(blog_id);
💡
The use of CONCURRENTLY avoids locking the table when creating the index

Find All Missing Indexes

If you're working in an existing large schema, it's important to identify where foreign keys are not indexed. I have seen this have a large impact on multiple scaled PostgreSQL databases.

You can find out what those foreign keys columns are with the following query:

SELECT DISTINCT
  tc.table_name,
  kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
  AND NOT EXISTS (
    SELECT 1 FROM pg_indexes
    WHERE tablename = tc.table_name
    AND indexdef LIKE '%' || kcu.column_name || '%'
  )
ORDER BY tc.table_name, kcu.column_name;

Then, if you want to get CREATE INDEX commands for all of those, you can take it a step further with:

SELECT 
  'CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_' || 
  LOWER(table_name) || '_' || 
  LOWER(column_name) || 
  ' ON ' || table_name || '(' || column_name || ');' AS create_index_command
FROM (
  SELECT DISTINCT
    tc.table_name,
    kcu.column_name
  FROM information_schema.table_constraints tc
  JOIN information_schema.key_column_usage kcu 
    ON tc.constraint_name = kcu.constraint_name
  WHERE tc.constraint_type = 'FOREIGN KEY'
    AND NOT EXISTS (
      SELECT 1 FROM pg_indexes 
      WHERE tablename = tc.table_name 
      AND indexdef LIKE '%' || kcu.column_name || '%'
    )
) AS unique_fks
ORDER BY table_name, column_name;

This will get you a long list that you can then add into a database migration system of your choice to get your foreign keys indexed. You should see an impact on queries after applying these especially at scale.

⚠️
Keep the size of your tables in mind. Adding many indexes on large tables at once can cause heavy load on your database. It may be good to batch these database migrations.

Conclusion

Adding indexes to your PostgreSQL foreign key references can reduce load on your database and improve performance. It's a simple change that can pay off dividends as you scale up. You can take all of this a step further with SQL linting or automatically adding these migrations when a schema change is made.