# Index
The `Esse::Index` class is an abstraction of an Elasticsearch index. It is responsible for defining the index name, settings, mappings, data sources, and documents.

Here is a minimal example of an index:
```ruby
class ArticlesIndex < Esse::Index
  repository :article do
    collection do |**context, &block|
      batch = [
        { id: 1, title: 'Article 1', body: 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.' },
        { id: 2, title: 'Article 2', body: 'Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium.' },
      ]
      batch.delete_if { |item| item[:id] != context[:id] } if context[:id] # Just to simulate a filter
      block.call(batch, **context)
    end

    document do |item, **_context|
      {
        _id: item[:id], # The _id is a convention to define the document id. More on this later.
        title: item[:title],
        body: item[:body],
      }
    end
  end
end
```
Now, let's see what's happening here:

- The `ArticlesIndex` class inherits from `Esse::Index` and defines a `repository` block.
- The `repository` block defines a new repository identified by `:article` with a `collection` and a `document`.
- The `collection` block is responsible for fetching data from a data source. It may receive a `context` that can be used to filter the data, and a `block` that must be called with the fetched data.
- The `document` block is responsible for transforming each item of the collection into an `Esse::Document`. Note that we are using a `Hash` as the document to keep things simple, but under the hood it is converted to a generic `Esse::HashDocument` object. Always prefer implementing your own `Esse::Document` class.
```ruby
> ArticlesIndex.documents
=> #<Enumerator: ...>
> ArticlesIndex.documents.to_a
=> [
  #<Esse::HashDocument @object={:_id=>1, :title=>"Article 1", :body=>"Lorem ipsum dolor sit amet, consectetur adipiscing elit."}, @options={}>,
  #<Esse::HashDocument @object={:_id=>2, :title=>"Article 2", :body=>"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium."}, @options={}>
]
> ArticlesIndex.documents(id: 1).to_a
=> [
  #<Esse::HashDocument @object={:_id=>1, :title=>"Article 1", :body=>"Lorem ipsum dolor sit amet, consectetur adipiscing elit."}, @options={}>
]
```
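To make the contract between the `collection` and `document` blocks concrete, here is a plain-Ruby sketch that runs without the esse gem. `ARTICLES`, `COLLECTION`, `DOCUMENT` and `documents` are hypothetical stand-ins, not esse APIs, and the sketch returns plain hashes instead of wrapping them in `Esse::HashDocument` as the real `ArticlesIndex.documents` does.

```ruby
# Hypothetical in-memory datasource (stand-in for ArticlesIndex's inline batch).
ARTICLES = [
  { id: 1, title: 'Article 1', body: 'Lorem ipsum dolor sit amet.' },
  { id: 2, title: 'Article 2', body: 'Sed ut perspiciatis unde omnis.' },
].freeze

# Stand-in for the `collection` block: yields one batch, honouring the :id filter.
COLLECTION = lambda do |**context, &block|
  batch = ARTICLES.dup
  batch.delete_if { |item| item[:id] != context[:id] } if context[:id]
  block.call(batch, **context)
end

# Stand-in for the `document` block: turns one item into a document hash.
DOCUMENT = lambda do |item, **_context|
  { _id: item[:id], title: item[:title], body: item[:body] }
end

# Hypothetical approximation of `ArticlesIndex.documents(**context)`:
# lazily runs the collection and maps every item of every batch through DOCUMENT.
def documents(**context)
  Enumerator.new do |yielder|
    # A lambda is used as the consumer so the batch Array is never auto-splatted
    # into separate arguments, as a plain proc could do on some Ruby versions.
    consumer = lambda do |batch, **ctx|
      batch.each { |item| yielder << DOCUMENT.call(item, **ctx) }
    end
    COLLECTION.call(**context, &consumer)
  end
end

documents.to_a.size                  # => 2
documents(id: 1).map { |d| d[:_id] } # => [1]
```

Returning an `Enumerator` mirrors the `#<Enumerator: ...>` shown in the console session: nothing is fetched until the caller iterates.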
Now let's go deeper into each part of the index definition.
The `repository` is used to define a data source for the index. It can be a database table, a file, a web service, etc. The repository is responsible for fetching the data, enriching it, and transforming it into documents. One index can have multiple repositories.

Defining a repository with a block:
```ruby
class GeosIndex < Esse::Index
  repository :county do
    # ...
  end

  repository :city do
    # ...
  end
end
```
The identifier of the repository must be unique within the index. By default, a constantized version of the identifier is used as the repository class name. In the example above, the `:county` repository is represented by the `GeosIndex::County` class and the `:city` repository by the `GeosIndex::City` class. You can also access the repositories using the `GeosIndex.repo` method:
```ruby
> GeosIndex.repo(:county) == GeosIndex::County
=> true
> GeosIndex.repo_hash
=> {"county"=>GeosIndex::County, "city"=>GeosIndex::City}
```
If you don't want to generate the repository constant, pass `const: false` to the `repository` method:
```ruby
class GeosIndex < Esse::Index
  repository :county, const: false do
    # ...
  end
end

GeosIndex.constants.include?(:County)
# => false
```
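The naming and `const:` behaviour described above can be modelled in a few lines of plain Ruby. This is a simplified sketch, not esse's implementation: `MiniIndex` and `MiniGeosIndex` are hypothetical, and the camel-casing rule (`split('_')` plus `capitalize`) is an assumption, so check the esse source for the exact constantization rules.

```ruby
# Hypothetical, simplified model of how an index could register repositories.
class MiniIndex
  # Each subclass gets its own registry (class-level instance variable).
  def self.repo_hash
    @repo_hash ||= {}
  end

  def self.repo(name)
    repo_hash.fetch(name.to_s)
  end

  # Mimics `repository :county, const: true`: builds a repo class, stores it
  # under its string identifier and, unless const: false, exposes it as a constant.
  def self.repository(identifier, const: true)
    repo_class = Class.new
    repo_hash[identifier.to_s] = repo_class
    if const
      # Assumed camel-casing: :county => County, :county_seat => CountySeat
      const_name = identifier.to_s.split('_').map(&:capitalize).join
      const_set(const_name, repo_class)
    end
    repo_class
  end
end

class MiniGeosIndex < MiniIndex
  repository :county
  repository :city, const: false
end

MiniGeosIndex.repo(:county) == MiniGeosIndex::County # => true
MiniGeosIndex.constants.include?(:City)              # => false
MiniGeosIndex.repo_hash.keys                         # => ["county", "city"]
```

Note that the `:city` repository is still reachable through `repo(:city)`; `const: false` only skips the constant.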
The `collection` block is responsible for fetching the data from the data source. It must accept `context` keyword arguments and a block that must be called with the fetched data. The `context` can be anything you want, but it's important to implement an `:id` filter so that a single document can be fetched.

A collection can be defined through a block or through a class that implements the `Enumerable` interface.
```ruby
# app/indices/geos_index.rb
class GeosIndex < Esse::Index
  repository :county do
    collection do |**context, &block|
      # ...
    end
  end

  repository :city do
    collection Collections::CityCollection
  end
end
```

```ruby
# app/indices/geos_index/collections/city_collection.rb
class GeosIndex::Collections::CityCollection
  include Enumerable

  # @param [Hash] context
  def initialize(**context)
    @context = context
  end

  # @yield [Array<Object>] batch of objects
  def each(&block)
    # ...
  end
end
```
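As an illustration of the `each` contract, here is a self-contained collection class that yields batches (arrays) of records and honours the `:id` filter. It is a sketch under assumptions: `CityCollection` is a top-level stand-in for `GeosIndex::Collections::CityCollection`, `CITIES` replaces a real data source, and `BATCH_SIZE` is an arbitrary illustrative value.

```ruby
# Hypothetical in-memory data standing in for a real datasource.
CITIES = [
  { id: 1, name: 'Belo Horizonte', state_abbr: 'MG' },
  { id: 2, name: 'Uberlandia', state_abbr: 'MG' },
  { id: 3, name: 'Campinas', state_abbr: 'SP' },
].freeze

class CityCollection
  include Enumerable

  BATCH_SIZE = 2 # deliberately small so the batching is visible

  # @param [Hash] context
  def initialize(**context)
    @context = context
  end

  # Yields arrays (batches) of matching records; returns an Enumerator without a block.
  def each
    return enum_for(:each) unless block_given?

    records = CITIES
    records = records.select { |city| city[:id] == @context[:id] } if @context[:id]
    records.each_slice(BATCH_SIZE) { |batch| yield batch }
  end
end

CityCollection.new.to_a.map(&:size) # => [2, 1]
CityCollection.new(id: 3).to_a.size # => 1
```

Because the class includes `Enumerable`, helpers such as `to_a`, `first` and `flat_map` work on batches for free.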
The `document` block is responsible for coercing each item of the collection into an `Esse::Document`. It always receives an item of the collection and `context` keyword arguments. The `context` can carry anything you need, such as filters or policies.

A document can be defined through a block or through a class that implements the `Esse::Document` interface.
```ruby
# app/indices/geos_index.rb
class GeosIndex < Esse::Index
  repository :county do
    document do |item, **context|
      { _id: item.id, name: item.name }
    end
  end

  repository :city do
    document Documents::CityDocument
  end
end
```

```ruby
# app/indices/geos_index/documents/city_document.rb
class GeosIndex::Documents::CityDocument < Esse::Document
  # @return [String]
  def id
    object.id
  end

  # @return [Hash]
  def source
    # You can access the context using the `options` method
    { name: object.name }
  end
end
```
Elasticsearch 5.x or lower requires a `type` to be defined for each document. You can define it using the `#type` method, or by rendering `_type: 'doc_type'` in hash documents.

At the document level you can also define the `#routing` method to set the routing of the document.

Please look at the `Esse::Document` source code to see all the methods you can override.
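To see why `id`, `source` and `routing` matter, this sketch turns a document-like object into the two NDJSON lines of an Elasticsearch `_bulk` "index" action. Everything here is hypothetical and independent of esse: `CityDocumentSketch` is a plain class (not an `Esse::Document` subclass), `bulk_index_lines` is an illustrative helper, routing by state is an arbitrary choice, and the metadata shape assumes Elasticsearch 7+ (no `_type`).

```ruby
require 'json'

# Hypothetical plain-Ruby stand-in for an Esse::Document subclass (no esse gem needed).
class CityDocumentSketch
  attr_reader :object, :options # options plays the role of the context

  def initialize(object, **options)
    @object = object
    @options = options
  end

  def id
    object[:id]
  end

  def source
    { name: object[:name] }
  end

  # Illustrative choice: send all cities of the same state to the same shard.
  def routing
    object[:state_abbr]
  end
end

# Builds the metadata line and the source line that a `_bulk` request expects
# for one "index" action. Assumes Elasticsearch 7+ metadata (no _type).
def bulk_index_lines(index_name, doc)
  meta = { _index: index_name, _id: doc.id.to_s }
  meta[:routing] = doc.routing if doc.routing
  [JSON.generate({ index: meta }), JSON.generate(doc.source)].join("\n") + "\n"
end

doc = CityDocumentSketch.new({ id: 7, name: 'Campinas', state_abbr: 'SP' })
puts bulk_index_lines('geos', doc)
# {"index":{"_index":"geos","_id":"7","routing":"SP"}}
# {"name":"Campinas"}
```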
As many Ruby applications are built on Rails, I'm going to show how to create an Esse index that loads data from an ActiveRecord model.
```ruby
# app/indices/geographies_index.rb
class GeographiesIndex < Esse::Index
  repository :city do
    collection do |**context, &block|
      query = ::City.includes(:state)
      query = query.where(id: context[:id]) if context[:id]
      query = query.where(state_abbr: context[:state_abbr]) if context[:state_abbr]
      query.find_in_batches(&block)
    end

    document do |city, **_context|
      {
        _id: city.id,
        name: city.name,
        state: {
          id: city.state.id,
          name: city.state.name,
        }
      }
    end
  end
end
```
But thanks to the plugin system, we can use the `esse-active_record` plugin and simplify the implementation above to a few lines of code:
```ruby
# app/indices/geographies_index.rb
class GeographiesIndex < Esse::Index
  plugin :active_record

  repository :city do
    collection ::City.includes(:state) do
      scope :state_abbr, ->(abbr) { where(state_abbr: abbr) }
    end
    document Documents::CityDocument
  end
end
```
Much better, huh? The `esse-active_record` plugin automatically creates the `collection` block. You can define multiple scopes to handle the `context` filters. There is also a pretty nice feature named `batch_context` that can be useful to preload associations. Please refer to the esse-active_record documentation for more details and examples.
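Conceptually, a named scope maps a `context` key to a filter that is applied only when that key is present. The sketch below models that idea over in-memory rows; it is not the plugin's implementation (which works on ActiveRecord relations), and `CITY_ROWS`, `SCOPES` and `apply_scopes` are hypothetical names. Ignoring keys without a registered scope is a choice of this sketch, not documented plugin behaviour.

```ruby
# Hypothetical rows standing in for ::City records.
CITY_ROWS = [
  { id: 1, name: 'Belo Horizonte', state_abbr: 'MG' },
  { id: 2, name: 'Uberlandia', state_abbr: 'MG' },
  { id: 3, name: 'Campinas', state_abbr: 'SP' },
].freeze

# Hypothetical scope registry: context key => filter lambda.
SCOPES = {
  id: ->(rows, id) { rows.select { |row| Array(id).include?(row[:id]) } },
  state_abbr: ->(rows, abbr) { rows.select { |row| row[:state_abbr] == abbr } },
}.freeze

# Applies, in order, the scope registered for each context key; keys without a
# scope are ignored in this sketch.
def apply_scopes(rows, **context)
  context.reduce(rows) do |scoped, (key, value)|
    scope = SCOPES[key]
    scope ? scope.call(scoped, value) : scoped
  end
end

apply_scopes(CITY_ROWS, state_abbr: 'MG').map { |row| row[:id] }            # => [1, 2]
apply_scopes(CITY_ROWS, id: [2, 3], state_abbr: 'MG').map { |row| row[:id] } # => [2]
```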
Index settings are defined using the `settings` method, which accepts either a `Hash` or a block as argument:
```ruby
class ArticlesIndex < Esse::Index
  settings number_of_shards: 2, number_of_replicas: 1
end

# or

class ArticlesIndex < Esse::Index
  settings do
    # Useful when you need to define dynamic settings
    {
      number_of_shards: 2,
      number_of_replicas: 1,
    }
  end
end
```
If you need something more complex, you can pass any object as the argument. The object must respond to `#to_h` and return a `Hash` with the settings definition.
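A minimal sketch of such an object follows. `ArticlesSettings` and its keyword arguments are hypothetical; the only requirement taken from the text above is that it responds to `#to_h` with a `Hash`.

```ruby
# Hypothetical settings object: anything that responds to #to_h and returns a Hash.
class ArticlesSettings
  def initialize(shards:, replicas:)
    @shards = shards
    @replicas = replicas
  end

  def to_h
    { number_of_shards: @shards, number_of_replicas: @replicas }
  end
end

ArticlesSettings.new(shards: 2, replicas: 1).to_h
# returns { number_of_shards: 2, number_of_replicas: 1 }

# Inside an index, the object itself would be passed as the argument:
#   class ArticlesIndex < Esse::Index
#     settings ArticlesSettings.new(shards: 2, replicas: 1)
#   end
```

Wrapping settings in a class like this makes it easy to share or compute them (for example per environment) without duplicating hash literals across indices.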
Note that settings can also be defined globally in `Esse.config.cluster`. The global settings are deep merged with the settings defined in the index:
```ruby
# config/initializers/esse.rb
Esse.configure do |config|
  config.cluster do |cluster|
    cluster.settings = {
      number_of_shards: 2,
      number_of_replicas: 0,
      refresh_interval: '30s',
    }
  end
end
```

```ruby
# app/indices/articles_index.rb
class ArticlesIndex < Esse::Index
  settings number_of_replicas: 1
end

ArticlesIndex.settings_hash
# => {:settings=>{:number_of_shards=>2, :number_of_replicas=>1, :refresh_interval=>"30s"}}
```
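The merge result above can be reproduced with a short recursive merge built on `Hash#merge` with a block. This `deep_merge` is a hypothetical helper written for illustration, not esse's code; in particular, letting the index value win outright for non-hash values (including arrays) is an assumption of this sketch.

```ruby
# Recursively merges two hashes: when a key exists in both and both values are
# hashes, they are merged; otherwise the value from `override` wins.
def deep_merge(base, override)
  base.merge(override) do |_key, base_val, override_val|
    if base_val.is_a?(Hash) && override_val.is_a?(Hash)
      deep_merge(base_val, override_val)
    else
      override_val
    end
  end
end

cluster_settings = { number_of_shards: 2, number_of_replicas: 0, refresh_interval: '30s' }
index_settings   = { number_of_replicas: 1 }

deep_merge(cluster_settings, index_settings)
# returns { number_of_shards: 2, number_of_replicas: 1, refresh_interval: '30s' }

# Nested hashes (such as mapping properties) are combined rather than replaced:
deep_merge({ properties: { created_at: { type: 'date' } } },
           { properties: { title: { type: 'text' } } })
# returns { properties: { created_at: { type: 'date' }, title: { type: 'text' } } }
```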
Index mappings are defined using the `mappings` method:
```ruby
class ArticlesIndex < Esse::Index
  mappings do
    {
      properties: {
        title: { type: 'text' },
        body: { type: 'text' },
      }
    }
  end
end
```
If you need something more complex, you can pass any object as the argument. The object must respond to `#to_h` and return a `Hash` with the mappings definition.
Note that mappings can also be defined globally in `Esse.config.cluster`. The global mappings are deep merged with the mappings defined in the index:
```ruby
# config/initializers/esse.rb
Esse.configure do |config|
  config.cluster do |cluster|
    cluster.mappings = {
      dynamic_templates: [
        {
          strings_as_keywords: {
            mapping: {
              ignore_above: 1024,
              type: 'keyword',
            },
            match_mapping_type: 'string',
          },
        },
      ],
      properties: {
        created_at: { type: 'date' },
      },
    }
  end
end
```

```ruby
# app/indices/articles_index.rb
class ArticlesIndex < Esse::Index
  mappings do
    {
      properties: {
        title: { type: 'text' },
        body: { type: 'text' },
      }
    }
  end
end

ArticlesIndex.mappings_hash
# => {:mappings=>
#      {:dynamic_templates=>[{:strings_as_keywords=>{:mapping=>{:ignore_above=>1024, :type=>"keyword"}, :match_mapping_type=>"string"}}],
#       :properties=>{:created_at=>{:type=>"date"}, :title=>{:type=>"text"}, :body=>{:type=>"text"}}}}
```
If you are working with Elasticsearch 5.x or lower, you must also define a `type` in the mappings. Please adjust them according to your needs.
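For reference, Elasticsearch 5.x and earlier expect the mappings of a create-index request to be keyed by a type name, while Elasticsearch 7+ uses typeless mappings. The hypothetical `create_index_body` helper below (not part of esse) only shows the two request-body shapes side by side; `:article` is an arbitrary type name.

```ruby
# Hypothetical helper (not part of esse): builds a create-index request body.
# With `type`, mappings are nested under that type name (Elasticsearch 5.x and
# earlier); without it, they stay typeless (Elasticsearch 7+).
def create_index_body(settings:, mappings:, type: nil)
  { settings: settings, mappings: type ? { type => mappings } : mappings }
end

mappings = { properties: { title: { type: 'text' }, body: { type: 'text' } } }

create_index_body(settings: { number_of_shards: 2 }, mappings: mappings, type: :article)
# returns { settings: { number_of_shards: 2 },
#           mappings: { article: { properties: { title: { type: 'text' }, body: { type: 'text' } } } } }

create_index_body(settings: { number_of_shards: 2 }, mappings: mappings)
# returns { settings: { number_of_shards: 2 },
#           mappings: { properties: { title: { type: 'text' }, body: { type: 'text' } } } }
```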