-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce cross-cluster replication #30086
Labels
Comments
elasticmachine
added
:Distributed Indexing/CCR
Issues around the Cross Cluster State Replication features
Meta
labels
Apr 25, 2018
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this issue
Apr 25, 2018
The shard changes api returns the minimum IndexMetadata version the leader index needs to have. If the leader side is behind on IndexMetadata version then follow shard task waits with processing write operations until the mapping has been fetched from leader index and applied in follower index in the background. The cluster state api is used to fetch the leader mapping and put mapping api to apply the mapping in the follower index. This works because put mapping api accepts fields that are already defined. Relates to elastic#30086
martijnvg
added
:Distributed Indexing/CCR
Issues around the Cross Cluster State Replication features
and removed
:Distributed Indexing/CCR
Issues around the Cross Cluster State Replication features
labels
Apr 26, 2018
Pinging @elastic/es-distributed |
dnhatn
pushed a commit
that referenced
this issue
May 9, 2018
dnhatn
pushed a commit
that referenced
this issue
May 10, 2018
martijnvg
added a commit
that referenced
this issue
May 28, 2018
The shard changes api returns the minimum IndexMetadata version the leader index needs to have. If the leader side is behind on IndexMetadata version then follow shard task waits with processing write operations until the mapping has been fetched from leader index and applied in follower index in the background. The cluster state api is used to fetch the leader mapping and put mapping api to apply the mapping in the follower index. This works because put mapping api accepts fields that are already defined. Relates to #30086
martijnvg
added a commit
that referenced
this issue
May 28, 2018
The shard changes api returns the minimum IndexMetadata version the leader index needs to have. If the leader side is behind on IndexMetadata version then follow shard task waits with processing write operations until the mapping has been fetched from leader index and applied in follower index in the background. The cluster state api is used to fetch the leader mapping and put mapping api to apply the mapping in the follower index. This works because put mapping api accepts fields that are already defined. Relates to #30086
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this issue
Jun 5, 2018
* A single ChunksCoordinator is now in charge of following a shard and keeps on coordinating until the persistent task has been stopped. Whereas before a ChunksCoordinator's job was to process a finite amount of chunks and then a new ChunksCoordinator instance would process the next chunks. * Instead of consuming the chunks queue and waiting for all workers to complete, another background thread will continuously and chunks to the queue, so that the workers never run out of chunks to process if the leader shard has unprocessed write operations. Relates to elastic#30086
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this issue
Jun 14, 2018
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this issue
Jun 27, 2018
The current shard follow mechanism is complex and does not give us easy ways the have visibility into the system (e.g. why we are falling behind). The main reason why it is complex is because the current design is highly asynchronous. Also in the current model it is hard to apply backpressure other than reducing the concurrent reads from the leader shard. This PR has the following changes: * Rewrote the shard follow task to coordinate the shard follow mechanism between a leader and follow shard in a single threaded manner. This allows for better unit testing and makes it easier to add stats. * All write operations read from the shard changes api should be added to a buffer instead of directly sending it to the bulk shard operations api. This allows to apply backpressure. In this PR there is a limit that controls how many write ops are allowed in the buffer after which no new reads will be performed until the number of ops is below that limit. * The shard changes api includes the current global checkpoint on the leader shard copy. This allows reading to be a more self sufficient process; instead of relying on a background thread to fetch the leader shard's global checkpoint. * Reading write operations from the leader shard (via shard changes api) is a seperate step then writing the write operations (via bulk shards operations api). Whereas before a read would immediately result into a write. * The bulk shard operations api returns the local checkpoint on the follow primary shard, to keep the shard follow task up to date with what has been written. * Moved the shard follow logic that was previously in ShardFollowTasksExecutor to ShardFollowNodeTask. * Moved over the changes from elastic#31242 to make shard follow mechanism resilient from node and shard failures. Relates to elastic#30086
This was referenced Aug 29, 2018
dnhatn
added a commit
that referenced
this issue
Aug 31, 2018
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (#31106) - Soft-deletes retention policy based on the global checkpoint (#30335) - Read operation history from Lucene instead of translog (#30120) - Use Lucene history in peer-recovery (#30522) Relates #30086 Closes #29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand jpountz@gmail.com Co-authored-by: Boaz Leskes b.leskes@gmail.com Co-authored-by: Jason Tedor jason@tedor.me Co-authored-by: Martijn van Groningen martijn.v.groningen@gmail.com Co-authored-by: Nhat Nguyen nhat.nguyen@elastic.co Co-authored-by: Simon Willnauer simonw@apache.org
dnhatn
added a commit
to dnhatn/elasticsearch
that referenced
this issue
Aug 31, 2018
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (elastic#31106) - Soft-deletes retention policy based on the global checkpoint (elastic#30335) - Read operation history from Lucene instead of translog (elastic#30120) - Use Lucene history in peer-recovery (elastic#30522) Relates elastic#30086 Closes elastic#29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <jpountz@gmail.com> Co-authored-by: Boaz Leskes <b.leskes@gmail.com> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: Simon Willnauer <simonw@apache.org>
dnhatn
added a commit
that referenced
this issue
Aug 31, 2018
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch. Highlight works in this PR include: - Replace hard-deletes by soft-deletes in InternalEngine - Use _recovery_source if _source is disabled or modified (#31106) - Soft-deletes retention policy based on the global checkpoint (#30335) - Read operation history from Lucene instead of translog (#30120) - Use Lucene history in peer-recovery (#30522) Relates #30086 Closes #29530 --- These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing: Co-authored-by: Adrien Grand <jpountz@gmail.com> Co-authored-by: Boaz Leskes <b.leskes@gmail.com> Co-authored-by: Jason Tedor <jason@tedor.me> Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com> Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co> Co-authored-by: Simon Willnauer <simonw@apache.org>
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this issue
Sep 4, 2018
Improve failure handling of retryable errors by retrying remote calls in a exponential backoff like manner. The delay between a retry would not be longer than the configured max retry delay. Also retryable errors will be retried indefinitely. Relates to elastic#30086
martijnvg
added a commit
to martijnvg/elasticsearch
that referenced
this issue
Sep 8, 2018
For correctness we need to verify whether the history uuid of the leader index shards never changes while that index is being followed. * The history UUIDs are recorded as custom index metadata in the follow index. * The follow api validates whether the current history UUIDs of the leader index shards are the same as the recorded history UUIDs. If not the follow api fails. * While a follow index is following a leader index; shard follow tasks on each shard changes api call verify whether their current history uuid is the same as the recorded history uuid. Relates to elastic#30086
martijnvg
added a commit
that referenced
this issue
Sep 12, 2018
Improve failure handling of retryable errors by retrying remote calls in a exponential backoff like manner. The delay between a retry would not be longer than the configured max retry delay. Also retryable errors will be retried indefinitely. Relates to #30086
martijnvg
added a commit
that referenced
this issue
Sep 12, 2018
Improve failure handling of retryable errors by retrying remote calls in a exponential backoff like manner. The delay between a retry would not be longer than the configured max retry delay. Also retryable errors will be retried indefinitely. Relates to #30086
martijnvg
added a commit
that referenced
this issue
Sep 12, 2018
For correctness we need to verify whether the history uuid of the leader index shards never changes while that index is being followed. * The history UUIDs are recorded as custom index metadata in the follow index. * The follow api validates whether the current history UUIDs of the leader index shards are the same as the recorded history UUIDs. If not the follow api fails. * While a follow index is following a leader index; shard follow tasks on each shard changes api call verify whether their current history uuid is the same as the recorded history uuid. Relates to #30086 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
martijnvg
added a commit
that referenced
this issue
Sep 12, 2018
For correctness we need to verify whether the history uuid of the leader index shards never changes while that index is being followed. * The history UUIDs are recorded as custom index metadata in the follow index. * The follow api validates whether the current history UUIDs of the leader index shards are the same as the recorded history UUIDs. If not the follow api fails. * While a follow index is following a leader index; shard follow tasks on each shard changes api call verify whether their current history uuid is the same as the recorded history uuid. Relates to #30086 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
ycombinator
added a commit
to ycombinator/elasticsearch
that referenced
this issue
Sep 18, 2018
) Follow up to elastic#33617. Relates to elastic#30086. As with all other per-index Monitoring collectors, the `CcrStatsCollector` should only collect stats for the indices the user wants to monitor. This list is controlled by the `xpack.monitoring.collection.indices` setting and defaults to all indices.
ycombinator
added a commit
that referenced
this issue
Sep 18, 2018
Backport of #33646 to 6.x. Original message: Follow up to #33617. Relates to #30086. As with all other per-index Monitoring collectors, the `CcrStatsCollector` should only collect stats for the indices the user wants to monitor. This list is controlled by the `xpack.monitoring.collection.indices` setting and defaults to all indices.
dnhatn
added a commit
that referenced
this issue
Jan 24, 2019
Today, the mapping on the follower is managed and replicated from its leader index by the ShardFollowTask. Thus, we should prevent users from modifying the mapping on the follower indices. Relates #30086
dnhatn
added a commit
that referenced
this issue
Jan 28, 2019
Today, the mapping on the follower is managed and replicated from its leader index by the ShardFollowTask. Thus, we should prevent users from modifying the mapping on the follower indices. Relates #30086
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Issue description
Original comment by @jasontedor:
The goal of cross-cluster replication is to enable users to replicate operations from one cluster to another cluster via active-passive replication. There are three drivers for providing CCR functionality in X-Pack:
The purpose of this meta-issue is to serve as a high-level plan for the initial implementation.
Our initial implementation will build a shard-to-shard pull-based model without automatic setup built on the transport layer.
In this model, shards on a following index are responsible for pulling from shards on the leader index. We chose this model because:
As far as automatic setup, for now users will have to manually set up a following index to a leader index, and would have to use on bootstrapping existing data via snapshot/restore. This does not necessarily mean that we will not have something more sophisticated when the first version ships, only that for the initial implementation we will not look at anything else. We can consider automatic setup (for example, for time-based indices) and remote recovery infrastructure later.
Utilizing the transport layer allows us to reuse existing infrastructure, and we can follow the path blazed by cross-cluster search for reading from remote clusters.
Basic infrastructure:
Things to do and investigate:
[ ] Old indices cannot be followed. The soft delete index setting should not allowed to be set on older indices (indices that have been created prior to when ccr was released) (Nhat). We discussed and decided not to do this for now.[ ] Until rollbacks are fully implemented for Lucene, lucene make contain seq# collisions. These can be resolved by making CCR terms aware - i.e., use terms as a secondary sort and us it to dedup operations. We should decide if needs to be done, especially when at some point Lucene rollbacks will be implemented (and are expected to clean these collisions). @dnhatn (yes). Discussed this with Martijn, the dedup logic we have in LuceneChangesSnapshot is sufficient.index.xpack.ccr.following_index
setting (Nhat) (Reject follow request if following setting not enabled on follower #32448).index.xpack.ccr.following_index
a final / internal setting. To avoid directly writing into the following index. Only CCR can write into a follow index. (@martijnvg) (yes)- [ ] Disable external usage of the follow api until we can safely execute a follow call after an unfollow call. (no)search.remote.*
settings tocluster.remote.*
(@jasontedor) (Generalize search.remote settings to cluster.remote #33413)search.remote.*
in the cluster state tocluster.remote.*
(@jasontedor) Add infrastructure to upgrade settings #33536, Upgrade remote cluster settings #33537The text was updated successfully, but these errors were encountered: