Marcos G. Zimmermann edited this page Sep 26, 2023 · 12 revisions

The Esse::Index class is an abstraction of an Elasticsearch index. It is responsible for defining the index name, settings, mappings, datasources, and documents.

Here is a minimal example of an index:

class ArticlesIndex < Esse::Index
  repository :article do
    collection do |**context, &block|
      batch = [
        { id: 1, title: 'Article 1', body: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.' },
        { id: 2, title: 'Article 2', body: 'Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium.' },
      ]
      batch.delete_if { |item| item[:id] != context[:id] } if context[:id] # Just to simulate a filter
      block.call(batch, **context)
    end

    document do |item, **_context|
      {
        _id: item[:id], # The _id is a convention to define the document id. More on this later.
        title: item[:title],
        body: item[:body],
      }
    end
  end
end

Now, let's see what's happening here:

  • The ArticlesIndex class inherits from Esse::Index and defines a repository block.
  • The repository block defines a new repo identified by :article, with a collection and a document.
  • The collection block is responsible for fetching data from a datasource. It may receive a context that can be used to filter the data, and a block that must be called with the fetched data.
  • The document block is responsible for transforming each item of the collection into an Esse::Document. Note that we are using a Hash as the document to keep things simple; under the hood, it is converted to a generic Esse::HashDocument object. In real applications, prefer implementing your own Esse::Document class.
> ArticlesIndex.documents
=> #<Enumerator: ...>
> ArticlesIndex.documents.to_a
=> [
  #<Esse::HashDocument @object={:_id=>1, :title=>"Article 1", :body=>"Lorem ipsum dolor sit amet, consectetur adipiscing elit."}, @options={}>,
  #<Esse::HashDocument @object={:_id=>2, :title=>"Article 2", :body=>"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium."}, @options={}>
]
> ArticlesIndex.documents(id: 1).to_a
=> [
  #<Esse::HashDocument @object={:_id=>1, :title=>"Article 1", :body=>"Lorem ipsum dolor sit amet, consectetur adipiscing elit."}, @options={}>
]
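
The contract between the collection and document blocks can be sketched in plain Ruby, without the gem. This is only an illustration of the flow described above, not Esse's actual implementation; `ARTICLES`, `COLLECTION`, `DOCUMENT`, and `documents_for` are hypothetical names introduced for the sketch.

```ruby
# Plain-Ruby illustration of the collection/document contract.
# Not Esse's implementation; every name here is hypothetical.

ARTICLES = [
  { id: 1, title: 'Article 1' },
  { id: 2, title: 'Article 2' },
].freeze

# Mirrors the `collection` block: receives a context, yields one batch.
COLLECTION = lambda do |**context, &block|
  batch = ARTICLES
  batch = batch.select { |item| item[:id] == context[:id] } if context[:id]
  block.call(batch, **context)
end

# Mirrors the `document` block: turns one item into a document hash.
DOCUMENT = lambda do |item, **_context|
  { _id: item[:id], title: item[:title] }
end

# Lazily walks every batch and maps each item to a document.
def documents_for(collection, document, **context)
  Enumerator.new do |yielder|
    # A lambda is used as the receiver so the batch array is passed
    # through as a single argument regardless of the Ruby version.
    receiver = lambda do |batch, **ctx|
      batch.each { |item| yielder << document.call(item, **ctx) }
    end
    collection.call(**context, &receiver)
  end
end
```

Calling `documents_for(COLLECTION, DOCUMENT, id: 1).to_a` in this sketch returns only the first article, which is the same filtering behavior the console session above shows for `ArticlesIndex.documents(id: 1)`.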

Now let's go deeper in each part of the index definition.

Repository

The repository is used to define a data source for the index. It can be a database table, a file, a web service, etc. The repository is responsible for fetching the data, enriching it and transforming it into documents. One index can have multiple repositories.

Defining a repository with a block:

class GeosIndex < Esse::Index
  repository :county do
    # ...
  end

  repository :city do
    # ...
  end
end

The identifier of the repository must be unique within the index. By default, a constantized version of the identifier is used as the repository class name. In the example above, the :county repository is represented by the GeosIndex::County class and the :city repository by the GeosIndex::City class. You can also access the repositories using the GeosIndex.repo method:

> GeosIndex.repo(:county) == GeosIndex::County
=> true

> GeosIndex.repo_hash
=> {"county"=>GeosIndex::County, "city"=>GeosIndex::City}

If you don't want to generate the repo constant, you can pass const: false to the repository method:

class GeosIndex < Esse::Index
  repository :county, const: false do
    # ...
  end
end
> GeosIndex.constants.include?(:County)
=> false
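
To make the naming rule concrete, here is a toy registry in plain Ruby that mimics the behavior described above: each identifier is stored under its string name, and a constant is generated unless `const: false` is passed. It is a sketch under stated assumptions, not how Esse is implemented; `TinyIndex`, `GeosSketch`, and the `camelize` helper are hypothetical.

```ruby
# Toy registry illustrating identifier -> constant naming.
# Hypothetical code; it does not reproduce Esse's internals.
class TinyIndex
  class << self
    # Maps the stringified identifier to the generated class.
    def repo_hash
      @repo_hash ||= {}
    end

    def repository(name, const: true, &block)
      klass = Class.new
      klass.class_eval(&block) if block
      # :county becomes the constant County under the index class.
      const_set(camelize(name), klass) if const
      repo_hash[name.to_s] = klass
    end

    def repo(name)
      repo_hash.fetch(name.to_s)
    end

    private

    # Assumed conversion rule: snake_case to CamelCase.
    def camelize(name)
      name.to_s.split('_').map(&:capitalize).join
    end
  end
end

class GeosSketch < TinyIndex
  repository(:county) {}
  repository(:city, const: false) {}
end
```

In this sketch, `GeosSketch.repo(:county)` and `GeosSketch::County` refer to the same class, while `:city` is reachable only through `GeosSketch.repo(:city)` because no constant was generated for it.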

Collection

The collection block is responsible for fetching the data from the datasource. It receives the context as keyword arguments, plus a block that must be called with the fetched data. The context can be anything you want, but it's important to implement an :id filter so that a single document can be fetched.

A collection can be defined through a block or a class that implements the Enumerable interface.

# app/indices/geos_index.rb
class GeosIndex < Esse::Index
  repository :county do
    collection do |**context, &block|
      # ...
    end
  end

  repository :city do
    collection Collections::CityCollection
  end
end

# app/indices/geos_index/collections/city_collection.rb
class GeosIndex::Collections::CityCollection
  include Enumerable

  # @param [Hash] context
  def initialize(**context)
    @context = context
  end

  # @yield [Array<Object>] batch of objects
  def each(&block)
    # ...
  end
end
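
As an illustration of what the elided `each` body might look like, the following sketch yields batches from an in-memory array and honors the `:id` filter. It assumes that the collection is expected to yield arrays of objects, as the `@yield` comment above suggests. `CITY_ROWS` and `BATCH_SIZE` are hypothetical, and the class is defined at the top level (rather than under `GeosIndex::Collections`) so the example runs on its own.

```ruby
# Hypothetical in-memory stand-in for a database table.
CITY_ROWS = [
  { id: 1, name: 'Chicago' },
  { id: 2, name: 'Denver' },
  { id: 3, name: 'Boston' },
].freeze

class CityCollection
  include Enumerable

  BATCH_SIZE = 2 # assumed batch size, for illustration only

  # @param [Hash] context
  def initialize(**context)
    @context = context
  end

  # Yields arrays (batches) of rows, applying the optional :id filter.
  # @yield [Array<Hash>] batch of rows
  def each
    return enum_for(:each) unless block_given?

    rows = CITY_ROWS
    rows = rows.select { |row| row[:id] == @context[:id] } if @context[:id]
    rows.each_slice(BATCH_SIZE) { |batch| yield batch }
  end
end
```

Because the class includes Enumerable, `CityCollection.new(id: 3).to_a` returns a list containing a single batch with the matching row. A real collection would typically replace `CITY_ROWS` with a paginated database query.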

Document

The document block is responsible for coercing each item of the collection into an Esse::Document. It always receives one item of the collection and the context as keyword arguments. The context can be anything you want and can be used to apply filters, policies, etc.

A document can be defined through a block or a class that implements the Esse::Document interface.

# app/indices/geos_index.rb
class GeosIndex < Esse::Index
  repository :county do
    document do |item, **context|
      { _id: item.id, name: item.name }
    end
  end

  repository :city do
    document Documents::CityDocument
  end
end

# app/indices/geos_index/documents/city_document.rb
class GeosIndex::Documents::CityDocument < Esse::Document
  # @return [String]
  def id
    object.id
  end

  # @return [Hash]
  def source
    # You can access the context using the `options` method
    { name: object.name }
  end
end
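
The comment about `options` can be made concrete with a small sketch in which the context changes what `source` returns. To keep it runnable without the gem, it uses `BaseDocument`, a hypothetical stand-in that exposes only the `object` and `options` readers mentioned on this page; the constructor signature is an assumption, and in a real index the class would inherit from Esse::Document. The `City` struct and the `include_population` key are also made up for illustration.

```ruby
# Stand-in for Esse::Document exposing only `object` and `options`.
# Assumption: the document is built from an object plus keyword options.
class BaseDocument
  attr_reader :object, :options

  def initialize(object, **options)
    @object = object
    @options = options
  end
end

# Hypothetical model returned by the collection.
City = Struct.new(:id, :name, :population, keyword_init: true)

class CityDocument < BaseDocument
  # @return [Integer]
  def id
    object.id
  end

  # @return [Hash]
  def source
    data = { name: object.name }
    # `include_population` is a hypothetical context key.
    data[:population] = object.population if options[:include_population]
    data
  end
end
```

With this sketch, a document built without options exposes only the name, while passing `include_population: true` adds the population field to the source.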