[EPIC] Redesign application architecture #7

Merged: 41 commits, Jul 5, 2023

@marcosgz marcosgz commented Jul 5, 2023

[EPIC] Redesign application architecture

The current application architecture is not well suited to current elasticsearch and opensearch versions. The project was initially built around indexing types, which are no longer supported. It is still necessary, however, to have indexes composed of many data types or, better, many data sources. So the new architecture introduces the concept of Repository as a replacement for Type.

Another big change is the separation of the index business logic from the lower-level elasticsearch client. We are introducing a new layer called Esse::Transport as a proxy to the elasticsearch client. This layer is responsible for all communication with the elasticsearch client and is the only one that knows about it. Esse::Transport handles index management: index creation, index deletion, index aliases, index settings, index mappings, index documents, etc. These methods are no longer isolated in the Esse::Index.elasticsearch interface; they are now available directly in the Esse::Index interface.
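The layering described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: FakeClient, SketchTransport and SketchIndex are hypothetical stand-ins, not Esse classes, and the method signatures are invented for the example.

```ruby
# Hypothetical sketch: none of these classes are Esse's real implementation.

# Stands in for the low-level elasticsearch client.
class FakeClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def create_index(name, body)
    @calls << [:create_index, name, body]
    { 'acknowledged' => true }
  end
end

# The transport is the only object that holds a reference to the client.
class SketchTransport
  def initialize(client)
    @client = client
  end

  def create_index(index:, body: {})
    @client.create_index(index, body)
  end
end

# Index-level logic talks to the transport, never to the client.
class SketchIndex
  def initialize(name, transport)
    @name = name
    @transport = transport
  end

  def create_index(settings: {})
    @transport.create_index(index: @name, body: { settings: settings })
  end
end

client = FakeClient.new
index = SketchIndex.new('geos', SketchTransport.new(client))
response = index.create_index(settings: { number_of_shards: 1 })
```

Because the index only depends on the transport, swapping the underlying client means changing one class rather than every index operation.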

There are too many changes in this PR to describe in a single PR description. I'll try to describe the most important ones and add more documentation to the project in the future.

Esse::Index.elasticsearch methods

Below is a list of methods that were removed from the Esse::Index.elasticsearch interface and are now available in the Esse::Index interface. Old methods are now deprecated and will be removed in the next major version.

Note that Esse::Index.backed is an alias to Esse::Index.elasticsearch. So if you are using Esse::Index.backed you will need to change accordingly.

  • Esse::Index.elasticsearch.aliases is now Esse::Index.aliases
  • Esse::Index.elasticsearch.indices is now Esse::Index.indices_pointing_to_alias
  • Esse::Index.elasticsearch.update_aliases! is now Esse::Index.update_aliases
  • Esse::Index.elasticsearch.update_aliases is now Esse::Index.update_aliases
  • Esse::Index.elasticsearch.create_index is now Esse::Index.create_index
  • Esse::Index.elasticsearch.create_index! is now Esse::Index.create_index
  • Esse::Index.elasticsearch.close! is now Esse::Index.close
  • Esse::Index.elasticsearch.close is now Esse::Index.close
  • Esse::Index.elasticsearch.open! is now Esse::Index.open
  • Esse::Index.elasticsearch.open is now Esse::Index.open
  • Esse::Index.elasticsearch.refresh is now Esse::Index.refresh
  • Esse::Index.elasticsearch.refresh! is now Esse::Index.refresh
  • Esse::Index.elasticsearch.delete_index is now Esse::Index.delete_index
  • Esse::Index.elasticsearch.delete_index! is now Esse::Index.delete_index
  • Esse::Index.elasticsearch.update_mapping is now Esse::Index.update_mapping
  • Esse::Index.elasticsearch.update_mapping! is now Esse::Index.update_mapping
  • Esse::Index.elasticsearch.update_settings is now Esse::Index.update_settings
  • Esse::Index.elasticsearch.update_settings! is now Esse::Index.update_settings
  • Esse::Index.elasticsearch.reset_index! is now Esse::Index.reset_index
  • Esse::Index.elasticsearch.import! is now Esse::Index.import
  • Esse::Index.elasticsearch.import is now Esse::Index.import
  • Esse::Index.elasticsearch.bulk! is now Esse::Index.bulk
  • Esse::Index.elasticsearch.bulk is now Esse::Index.bulk
  • Esse::Index.elasticsearch.index! is now Esse::Index.index
  • Esse::Index.elasticsearch.index is now Esse::Index.index
  • Esse::Index.elasticsearch.update! is now Esse::Index.update
  • Esse::Index.elasticsearch.update is now Esse::Index.update
  • Esse::Index.elasticsearch.delete! is now Esse::Index.delete
  • Esse::Index.elasticsearch.delete is now Esse::Index.delete
  • Esse::Index.elasticsearch.count is now Esse::Index.count
  • Esse::Index.elasticsearch.exist? is now Esse::Index.exist?
  • Esse::Index.elasticsearch.find! is now Esse::Index.get
  • Esse::Index.elasticsearch.find is now Esse::Index.get
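
One way a deprecated namespace like this can keep working while pointing callers at the new methods is a forwarding shim. The sketch below is an assumption about the general pattern, not Esse's actual code: LegacyAwareIndex and LegacyBackend are hypothetical names, and real renames such as find -> get would need an explicit mapping that is left out here.

```ruby
# Hypothetical sketch of a deprecation shim; not Esse's actual code.
class LegacyAwareIndex
  # New-style class-level methods (return values are placeholders).
  def self.create_index(**options)
    [:create_index, options]
  end

  def self.refresh(**options)
    [:refresh, options]
  end

  # Legacy namespace: forwards `name` and `name!` to the class-level method.
  class LegacyBackend
    def initialize(index)
      @index = index
    end

    def method_missing(name, *args, **kwargs, &block)
      target = name.to_s.delete_suffix('!').to_sym
      return super unless @index.respond_to?(target)

      warn "[deprecated] elasticsearch.#{name} is now #{@index}.#{target}"
      @index.public_send(target, *args, **kwargs, &block)
    end

    def respond_to_missing?(name, include_private = false)
      @index.respond_to?(name.to_s.delete_suffix('!').to_sym) || super
    end
  end

  def self.elasticsearch
    LegacyBackend.new(self)
  end
end

# Old call style still works, but prints a deprecation warning to stderr.
legacy_result = LegacyAwareIndex.elasticsearch.create_index!(force: true)
```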

Esse::IndexType.elasticsearch methods

The rename of Esse::IndexType to Esse::Repository is already in the master branch, done in commit b7a63812. I'm mentioning it here because this PR is where we actually drop the concept of type and introduce the concept of repository.

Below is a list of methods that were removed from the Esse::IndexType.elasticsearch interface. Most of them are now available in the Esse::Index interface since the concept of type was changed to repository. Old methods are now deprecated and will be removed in the next major version.

Note that Esse::IndexType.backed is an alias to Esse::IndexType.elasticsearch. So if you are using Esse::IndexType.backed you will need to change accordingly.

  • Esse::IndexType.elasticsearch.import! is now Esse::Repository.import
  • Esse::IndexType.elasticsearch.import is now Esse::Repository.import
  • Esse::IndexType.elasticsearch.bulk! is now Esse::Index.bulk
  • Esse::IndexType.elasticsearch.bulk is now Esse::Index.bulk
  • Esse::IndexType.elasticsearch.index! is now Esse::Index.index
  • Esse::IndexType.elasticsearch.index is now Esse::Index.index
  • Esse::IndexType.elasticsearch.index_document is now Esse::Index.index
  • Esse::IndexType.elasticsearch.update! is now Esse::Index.update
  • Esse::IndexType.elasticsearch.update is now Esse::Index.update
  • Esse::IndexType.elasticsearch.delete! is now Esse::Index.delete
  • Esse::IndexType.elasticsearch.delete is now Esse::Index.delete
  • Esse::IndexType.elasticsearch.delete_document is now Esse::Index.delete
  • Esse::IndexType.elasticsearch.count is now Esse::Index.count
  • Esse::IndexType.elasticsearch.exist? is now Esse::Index.exist?
  • Esse::IndexType.elasticsearch.find! is now Esse::Index.get
  • Esse::IndexType.elasticsearch.find is now Esse::Index.get

Refer to Esse::Repository instead of Esse::IndexType if for some reason you are using the master branch instead of the published gem.

Esse::Serializer -> Esse::Document

Esse::Serializer is now Esse::Document. Esse::Serializer was a bad name for the class responsible for serializing the data to be indexed. Esse::Document better describes the purpose of a class that formats the data to be indexed.

We continue to support Hash objects as valid documents. They will be automatically converted to Esse::HashDocument objects. But I recommend inheriting from Esse::Document instead of using Hash objects.

The document also has routing support. You can define the routing in the document class:

class AddressDocument < Esse::Document
  def routing
    # return the routing value
    object.state_abbr
  end
end

Or add a _routing value when using a Hash document:

{
  _id: 1,
  _routing: 'CA',
  street: '123 Main St',
}
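
To make the Hash form concrete, here is a minimal sketch of how meta keys such as _id and _routing could be separated from the indexed body. SketchHashDocument is a hypothetical stand-in written for this example; the real Esse::HashDocument may behave differently.

```ruby
# Hypothetical stand-in for a hash-backed document; Esse::HashDocument may differ.
class SketchHashDocument
  META_KEYS = %i[_id _routing].freeze

  def initialize(hash)
    @hash = hash.transform_keys(&:to_sym)
  end

  def id
    @hash[:_id]
  end

  def routing
    @hash[:_routing]
  end

  # Everything except the meta keys becomes the indexed body.
  def source
    @hash.reject { |key, _value| META_KEYS.include?(key) }
  end
end

doc = SketchHashDocument.new({ _id: 1, _routing: 'CA', street: '123 Main St' })
```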

Event system

New events were added to the pub/sub system. The events are:

  • 'elasticsearch.bulk'
  • 'elasticsearch.close'
  • 'elasticsearch.create_index'
  • 'elasticsearch.delete_index'
  • 'elasticsearch.execute_search_query'
  • 'elasticsearch.index_exist'
  • 'elasticsearch.open'
  • 'elasticsearch.refresh'
  • 'elasticsearch.search'
  • 'elasticsearch.update_aliases'
  • 'elasticsearch.update_mapping'
  • 'elasticsearch.update_settings'
  • 'elasticsearch.index'
  • 'elasticsearch.update'
  • 'elasticsearch.delete'
  • 'elasticsearch.exist'
  • 'elasticsearch.count'
  • 'elasticsearch.get'

The events are triggered in the Esse::Transport layer.
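
As a rough illustration of events being published from a transport-like layer, the sketch below uses a tiny hand-rolled pub/sub. SketchEvents and its subscribe/instrument methods are assumptions made for this example; the real subscription API of Esse is not shown in this description. Only the event name comes from the list above.

```ruby
# Hypothetical minimal pub/sub; not Esse's real event API.
module SketchEvents
  @listeners = Hash.new { |hash, key| hash[key] = [] }

  def self.subscribe(event, &block)
    @listeners[event] << block
  end

  # Runs the block, then notifies every listener registered for the event.
  def self.instrument(event, payload = {})
    result = yield
    @listeners[event].each { |listener| listener.call(event, payload) }
    result
  end
end

log = []
SketchEvents.subscribe('elasticsearch.create_index') do |event, payload|
  log << [event, payload[:index]]
end

# A transport-like method publishes the event around the client call.
result = SketchEvents.instrument('elasticsearch.create_index', { index: 'geos_v1' }) do
  { 'acknowledged' => true } # stands in for the real client response
end
```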

Esse::Cluster

Cluster now has a new method #readonly= that puts the cluster in read-only mode. This is useful when you want to perform maintenance on the cluster and need to prevent it from indexing data while the maintenance is in progress.
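
The idea can be sketched as a write guard. Only the readonly= writer is named in this description; the SketchCluster class, the readonly? predicate, the error class and the in-memory storage below are all assumptions made for illustration.

```ruby
# Hypothetical sketch: only `readonly=` comes from the description above.
class SketchCluster
  class ReadonlyError < StandardError; end

  attr_writer :readonly

  def initialize
    @readonly = false
    @documents = {}
  end

  def readonly?
    @readonly == true
  end

  # Write operations are rejected while the cluster is read-only.
  def index(id, body)
    raise ReadonlyError, 'cluster is in read-only mode' if readonly?

    @documents[id] = body
  end

  # Read operations keep working in read-only mode.
  def get(id)
    @documents[id]
  end
end

cluster = SketchCluster.new
cluster.index(1, { name: 'Austin' })
cluster.readonly = true
```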

Esse::Index definition

The core of this project is the Esse::Index class. The Esse::Index class is responsible for the index definition: index name, index settings, index mappings, index aliases, repository, document format, etc. The Esse::Index class is the interface for performing any operation on the elasticsearch index.

The next sections will describe the most important changes in the Esse::Index class.

index_version -> index_prefix

The attribute self.index_version= was renamed to self.index_prefix=. Prefix is a better name for an attribute that is used to compose the index name. The index name is composed of the prefix and the version. The version is a timestamp that is automatically generated by the Esse::Index class.

define_type -> repository

The define_type method was renamed to repository. The repository method is responsible for defining the repository class. Repository should contain the collection and document definitions. The collection is responsible for streaming the data to be indexed. The document is responsible for formatting the data to be indexed. They are not allowed to be defined in the Esse::Index class anymore.

class GeosIndex < Esse::Index
  repository :city do
    collection Collections::CityCollection
    document Documents::CityDocument
  end

  repository :county do
    collection Collections::CountyCollection
    document Documents::CountyDocument
  end
end
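
The split of responsibilities between collection and document can be sketched with plain Ruby stand-ins. Everything below is hypothetical (the City struct, both Sketch classes, the batch_size option and the sample data); it only illustrates a collection that yields batches and a document that formats each record.

```ruby
# Hypothetical plain-Ruby stand-ins; not the classes shipped with Esse.
City = Struct.new(:uuid, :name)

class SketchCityCollection
  include Enumerable

  def initialize(cities, batch_size: 2)
    @cities = cities
    @batch_size = batch_size
  end

  # Yields one batch (an Array of records) at a time.
  def each(&block)
    @cities.each_slice(@batch_size, &block)
  end
end

class SketchCityDocument
  def initialize(city)
    @city = city
  end

  def id
    @city.uuid
  end

  # The formatted body that would be sent to the index.
  def source
    { name: @city.name }
  end
end

cities = [City.new('a1', 'Austin'), City.new('b2', 'Boston'), City.new('c3', 'Chicago')]
batches = SketchCityCollection.new(cities).map do |batch|
  batch.map { |city| SketchCityDocument.new(city) }
       .map { |doc| { _id: doc.id, **doc.source } }
end
```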

serializer -> document

The serializer method was renamed to document. The document method is responsible for defining the document class. The document class is responsible for formatting the data to be indexed.

class GeosIndex < Esse::Index
  repository :city do
    collection Collections::CityCollection
    
    # Define using a class that inherits from Esse::Document
    document Documents::CityDocument
    
    # or using a block that returns a Hash object
    document do |city, **context|
      {
        _id: city.uuid,
        name: city.name,
      }
    end
  end
end

@marcosgz marcosgz merged commit 35e9803 into master Jul 5, 2023
@marcosgz marcosgz deleted the marcosgz/redesign branch July 5, 2023 15:59
marcosgz added a commit that referenced this pull request Aug 23, 2024
* during import, use the doc directly should reduce amount of allocated objects

* feat: start updating bulk to work with std hash as default

* reduce memory usage by creating raw hash instead of coercing them to doc/header doc

* keep bulk order

* minor refactoring

* fix: do not compare document source, they may have attributes dynamically generated

* feat: do not create new array when values are already a doc instances

* feat: simplify data structure

* chore: rename lazy_attributes to eager_load_lazy_attributes

* feat: refactoring by reusing each_serialized_batch

* feat: update rexml close #7