Zero-downtime reindexing of ActiveRecord into Elasticsearch

carwow/zelastic

Zelastic

Zero-downtime Elasticsearch tooling for managing indices and indexing from ActiveRecord with PostgreSQL to Elasticsearch.

Installation

Add this line to your application's Gemfile:

gem 'zelastic'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install zelastic

Usage

Setup

For each ActiveRecord scope you want to index, you'll need a configuration:

class MyModel < ApplicationRecord
  ...
end

MyModelIndex = Zelastic::Config.new(
  client: Elasticsearch::Client.new(...),
  mapping: {
    ...
  },
  data_source: MyModel.some_scope
) do |my_model|
  # this block transforms an instance of MyModel into the hash which goes into Elasticsearch
  {
    attr_1: my_model.attr_1,
    attr_2: my_model.attr_2,
    attr_3: my_model.attr_3
  }
end

You can also override some defaults, if you wish:

  • index_settings: by default there aren't any, but you can provide, for example, custom analysers here
  • read_alias: by default this is the table name of the data_source
  • write_alias: by default this is the read_alias, with _write appended
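To make the default naming concrete, here is a minimal sketch of how the two aliases relate to a table name. `default_aliases` is a hypothetical helper written for illustration only; it is not part of Zelastic, and `my_models` is an assumed table name.

```ruby
# Hypothetical helper (not part of Zelastic): mirrors the documented defaults.
def default_aliases(table_name)
  read_alias = table_name.to_s         # default: the data_source's table name
  write_alias = "#{read_alias}_write"  # default: read_alias with _write appended
  { read_alias: read_alias, write_alias: write_alias }
end

default_aliases('my_models')
# => {:read_alias=>"my_models", :write_alias=>"my_models_write"}
```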

If you pass an array as the client argument, all writes will be applied to every client in the array.

Normal usage

You'll need to make sure the following gets run whenever an instance of MyModel is updated:

indexer = Zelastic::Indexer.new(MyModelIndex)
indexer.index_record(my_model)

And when an instance of MyModel gets deleted:

indexer = Zelastic::Indexer.new(MyModelIndex)
indexer.delete_by_id(my_model.id)
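One way to keep those two calls in a single place is a small wrapper object. This is only a sketch: `SearchSync` and its method names are hypothetical and not part of Zelastic. It assumes the object you pass in responds to `index_record` and `delete_by_id`, as `Zelastic::Indexer` does in the snippets above.

```ruby
# Hypothetical wrapper (not part of Zelastic). `indexer` is any object that
# responds to index_record and delete_by_id, such as Zelastic::Indexer.
class SearchSync
  def initialize(indexer)
    @indexer = indexer
  end

  # Call after a record has been created or updated.
  def saved(record)
    @indexer.index_record(record)
  end

  # Call after a record has been deleted.
  def destroyed(record)
    @indexer.delete_by_id(record.id)
  end
end
```

In a Rails app you would probably build it as `SearchSync.new(Zelastic::Indexer.new(MyModelIndex))` and call `saved` and `destroyed` from `after_commit` callbacks on the model, so Elasticsearch is only touched once the database transaction has committed.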

There are also some bulk-change methods which may be useful:

indexer = Zelastic::Indexer.new(MyModelIndex)
indexer.index_batch(MyModel.where(id: [...]))
indexer.delete_by_ids([1, 2, 3])
indexer.delete_by_query(elasticsearch_query)

Re-indexing

Sometimes you'll need to do a full reindex - maybe because of a bug which left the index in a bad state, or because of a new index definition, or...anything else.

We use index aliases to make zero-downtime reindexing easy. The actual indices are named <read_alias>_<random>. The read_alias points to the single "current" index. The write_alias normally points to that same index; during re-indexing it points at both the old and new indices, so both receive writes. The following steps run a full reindex:

  1. new_name = SecureRandom.hex(3)
  2. index_manager = Zelastic::IndexManager.new(MyModelIndex, client: Elasticsearch::Client.new(...))
  3. index_manager.create_index(new_name)
  4. index_manager.populate_index(new_name, batch_size: 3000)
  5. Check that the new index is looking alrightish
  6. index_manager.switch_read_index(new_name)
  7. Probably do some more checks, then
  8. index_manager.stop_dual_writes
  9. index_manager.cleanup_old_indices
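To picture what the steps above do to the two aliases, here is a tiny in-memory model. It is an illustration only, not Zelastic's implementation: `AliasModel` and the index names are made up, and it assumes dual writes begin as soon as the new index has been created.

```ruby
# Illustrative in-memory model of the alias layout. Not Zelastic's implementation.
class AliasModel
  attr_reader :read, :write

  def initialize(current_index)
    @read = current_index     # read alias: always exactly one index
    @write = [current_index]  # write alias: one index outside a reindex
  end

  # Assumption: dual writes begin once the new index exists.
  def start_reindex(new_index)
    @write |= [new_index]
  end

  # Reads move to the new index; writes still go to both.
  def switch_read(new_index)
    @read = new_index
  end

  # After this, only the index behind the read alias receives writes.
  def stop_dual_writes
    @write = [@read]
  end
end

aliases = AliasModel.new('my_models_a1b2c3')
aliases.start_reindex('my_models_d4e5f6')
aliases.write # => ["my_models_a1b2c3", "my_models_d4e5f6"]
```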

The client keyword argument to Zelastic::IndexManager.new is optional. It defaults to the client passed to Zelastic::Config.new or, if an array of clients was passed, to the first client in that array.
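The reindex steps listed above could be wrapped in one method so that the manual checks become explicit gates. This is a sketch under assumptions: `full_reindex` and its block protocol are hypothetical, not part of Zelastic. Only the `index_manager` method names come from the steps above.

```ruby
require 'securerandom'

# Hypothetical helper (not part of Zelastic). `index_manager` is expected to
# respond to the IndexManager methods used in the reindex steps.
# The required block is an assumed hook for your own checks: it receives the
# phase (:populated or :switched) and the new index name, and a falsy result
# stops the run before the next step.
def full_reindex(index_manager, batch_size: 3000, new_name: SecureRandom.hex(3))
  index_manager.create_index(new_name)
  index_manager.populate_index(new_name, batch_size: batch_size)
  return false unless yield(:populated, new_name)

  index_manager.switch_read_index(new_name)
  return false unless yield(:switched, new_name)

  index_manager.stop_dual_writes
  index_manager.cleanup_old_indices
  true
end
```

A call might look like `full_reindex(Zelastic::IndexManager.new(MyModelIndex)) { |phase, name| my_checks_pass?(phase, name) }`, where `my_checks_pass?` stands in for whatever verification you run. When a check fails the method returns false and leaves both indices in place, so you can inspect them before deciding what to do.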

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/carwow/zelastic.

License

The gem is available as open source under the terms of the MIT License.
