Skip to content
Phil edited this page Feb 21, 2024 · 26 revisions

Designing indexes for full-text search queries in PostgreSQL is a delicate art. Here are some tips to help guide developers to find an index definition that works for their application.

Please update this page with any new tips or tricks you might discover.

Option 1: Using a dedicated index

You must use GIN or GIST indexes for full text search

Essentially you look in a SQL statement for the operator (% or @@) and grab the operand that isn't the query.

Then you do something like this:

CREATE INDEX my_cool_index ON my_cool_table USING gist (([expression]));

or (for trigram only) (Trigram seems to underperform the other types).

CREATE INDEX my_cool_index ON my_cool_table USING gist (([expression]) gist_trgm_ops);

Searching against multiple columns and/or weighted columns

If you are using multiple columns and/or weighted columns then you want to grab the whole expression; that is, you should have one index, not one index per column. You also want to include the setweight function as part of the index if you are utilizing weighted columns. For example, if you have a Page model with the following configuration:

pg_search_scope :search_full_text,
                :against => {
                  name: "A",
                  keywords: "B",
                  body: "C"
                },
                :using => {
                  :tsearch => {
                    :prefix => true,
                    :any_word => true,
                    :dictionary => "english",
                  }
                }

then your index will end up being something like this:

CREATE INDEX "ts_vector_index_on_pages" ON "pages" USING gin (
  setweight(to_tsvector('english', coalesce("pages"."name"::text, '')), 'A') || 
  setweight(to_tsvector('english', coalesce("pages"."keywords"::text, '')), 'B') || 
  setweight(to_tsvector('english', coalesce("pages"."body"::text, '')), 'C')
)

For apps using multisearchable and a pg_search_documents table, the index migration would look something like this:

# IMPORTANT you must substitute in the correct to_tsvector
# expression. Look for "to_tsvector" in your your SQL query logs:
add_index :pg_search_documents, YOUR_EXPRESSION_HERE, using: :gin, name: "index_pg_search_documents_on_content"

# For example for a tsvector column
# add_index :pg_search_documents, %[to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))], using: :gin, name: "index_pg_search_documents_on_content"

# For example for a content column where you are `using: :trigram`
execute <<~SQL
  CREATE INDEX pg_search_documents_on_content ON pg_search_documents USING gin(coalesce(content, ''::text) gin_trgm_ops)
SQL

Note: not sure about multisearchable, but for pg_search_documents, if you’re setting it against multiple fields, you need to create a multicolumn index that includes all those fields. This blog post describes one approach to this.

How to create a model scope and use multisearch index

First, you add the index like this:

add_index :pg_search_documents, %[to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))], using: :gin, name: "index_pg_search_documents_on_content"

Second, you change your model to multisearch and add the scope:

class User < ApplicationRecord
  multisearchable(
    against: [:name]
  )
  
  scope :full_text_search_for, -> (term) do
    joins(:pg_search_document).merge(
      PgSearch.multisearch(term).where(searchable_type: klass.to_s)
    )
  end
end

Then, you can use the scope and the index:

User.full_text_search_for('term')

Option 2: Adding a dedicated tsv column and trigger

Alternatively, you can use a trigger to automatically populate a column of type tsvector with the [expression] from above, and then use the :tsvector_column option in pg_search to tell it to use that column instead. Then you'd make a very similar index as the above (option 1) but put the column in instead of [expression]

You'll need enough rows in your table for EXPLAIN ANALYZE to show up using the index, and the index is brittle on the expression, so if you change the pg_search configuration, you'll have to change the index.

Begin by making sure you have a tsvector column in your table:

add_column :posts, :tsv_body, :tsvector

Add an index on the tsv_body column. As of ActiveRecord 4.0, you can add indexes and specify the "type" of index easily:

add_index(:posts, :tsv_body, using: 'gin')

For the triggers, you can use hair_trigger to add database triggers via your migrations:

create_trigger(compatibility: 1).on(:posts).before(:insert, :update) do
  "new.tsv_body := to_tsvector('pg_catalog.english', coalesce(new.body,''));"
end

You can also execute plain SQL for adding trigger function in your migration code:

say_with_time("Adding trigger function on posts for updating tsv_body column") do

sql = <<-MIGRATION
  CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
  ON posts FOR EACH ROW EXECUTE PROCEDURE
  tsvector_update_trigger(tsv_body, 'pg_catalog.simple', body);
MIGRATION

  execute(sql)
end

To update tsv_body column for existing records, you can simply run

Post.find_each(&:touch)

or

execute("UPDATE posts SET id = id")

The latter will be faster since it won't load each post into memory as a Ruby object.

Don't forget to add the column to your pg_search specification, or the DB won't use your index.

    using: {
      tsearch: {
        tsvector_column: "tsv_body"
      }
    }

To ensure the DB is using your index, copy the search's SELECT query from your development log and run an EXPLAIN (or use Rails' explain function if you prefer). Look for a bitmap scan on your tsv index. If your index doesn't appear in the EXPLAIN output, it's not being used.