[Backport 2.11] [Backport 2.x] Segment Replication - Fix ShardLockObtained error during corruption cases #10370 #10429

Merged
merged 1 commit into from
Oct 5, 2023


opensearch-trigger-bot[bot]

Backport cdf5e1a from #10418.

…ng corruption cases #10370 (#10418)

* Segment Replication - Fix ShardLockObtained error during corruption cases (#10370)

* Segment Replication - Fix ShardLockObtained error during corruption cases

This change fixes a bug where shards could not be recreated locally after corruption.
This occurred because the store was not decref'd to 0 when the commit on close failed
with a corruption exception.
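
To illustrate the failure mode described above, here is a minimal sketch. `Store`, `Engine`, `closeLeaky`, and `closeSafely` are hypothetical stand-ins written for this example, not the OpenSearch classes, and the `try`/`finally` shape is only one plausible way to guarantee the release; the actual patch may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration only; these are not the OpenSearch classes.
public class RefCountSketch {

    /** A reference-counted resource that releases a "shard lock" when the count reaches 0. */
    static class Store {
        private final AtomicInteger refCount = new AtomicInteger(1);
        private boolean lockHeld = true;

        void incRef() {
            refCount.incrementAndGet();
        }

        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                lockHeld = false; // last reference gone: release the lock
            }
        }

        boolean isLockHeld() {
            return lockHeld;
        }
    }

    /** An engine that holds one store reference and commits when it closes. */
    static class Engine {
        private final Store store;
        private final boolean corrupt;

        Engine(Store store, boolean corrupt) {
            this.store = store;
            this.corrupt = corrupt;
            store.incRef();
        }

        private void commitOnClose() {
            if (corrupt) {
                throw new IllegalStateException("simulated corruption during commit");
            }
        }

        /** Leaky shape: an exception from the commit skips the decRef. */
        void closeLeaky() {
            commitOnClose();
            store.decRef();
        }

        /** Safer shape: the reference is released even if the commit throws. */
        void closeSafely() {
            try {
                commitOnClose();
            } finally {
                store.decRef();
            }
        }
    }

    /** Closes a corrupt engine with the chosen variant and reports whether the lock is still held. */
    static boolean lockHeldAfterClose(boolean useSafeClose) {
        Store store = new Store();
        Engine engine = new Engine(store, true);
        try {
            if (useSafeClose) {
                engine.closeSafely();
            } else {
                engine.closeLeaky();
            }
        } catch (IllegalStateException e) {
            // Corruption is expected here; a real caller would fail the shard.
        }
        store.decRef(); // the shard drops its own reference
        return store.isLockHeld();
    }

    public static void main(String[] args) {
        System.out.println("leaky close, lock still held: " + lockHeldAfterClose(false));
        System.out.println("safe close, lock still held: " + lockHeldAfterClose(true));
    }
}
```

In the leaky variant the count never reaches 0, so the lock stays held and a new local copy of the shard could not acquire it.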

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Remove extra logs

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Remove flaky assertion on store refcount

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Remove flaky test.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR Feedback.

Remove hacky handling of corruption when fetching metadata. This change now checks for store corruption
when replication has failed and fails the shard accordingly.
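
As a rough sketch of that control flow, assuming a hypothetical `StoreView` interface and `onReplicationFailure` helper (neither is part of OpenSearch, and the real handler may look quite different):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names and types are assumptions, not the OpenSearch API.
public class ReplicationFailureSketch {

    /** Hypothetical view of a shard's store. */
    interface StoreView {
        boolean isMarkedCorrupted();
    }

    enum Action { FAIL_SHARD, RETRY_LATER }

    /**
     * Decides what to do after a failed replication event:
     * fail the shard if the store is corrupt, otherwise leave it for a later retry.
     */
    static Action onReplicationFailure(StoreView store, Exception cause, List<String> log) {
        if (store.isMarkedCorrupted()) {
            log.add("failing shard, store is corrupt: " + cause.getMessage());
            return Action.FAIL_SHARD;
        }
        log.add("replication failed, will retry: " + cause.getMessage());
        return Action.RETRY_LATER;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Exception cause = new RuntimeException("simulated replication failure");
        System.out.println(onReplicationFailure(() -> true, cause, log));
        System.out.println(onReplicationFailure(() -> false, cause, log));
        log.forEach(System.out::println);
    }
}
```

The point of the sketch is that the corruption check happens once, at the failure-handling site, instead of being special-cased while fetching metadata.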

This commit also fixes logging in NRTReplicationEngine.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix unit test.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix test failure testSegRepSucceedsOnPreviousCopiedFiles.

This test broke because we invoked target.indexShard on a closed replicationTarget.
In these cases we can assume the store is not corrupt.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* spotless

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Revert flaky IT

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix flakiness failure by expecting RTE when check index fails.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* reintroduce ITs and use recoveries API instead of waiting on shard state.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix edge case where flush failures would not get reported as corruption.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix breaking change only on main.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
(cherry picked from commit cdf5e1a)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

github-actions bot commented Oct 5, 2023

Compatibility status:

Checks if related components are compatible with change fe82d17

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git]


github-actions bot commented Oct 5, 2023

Gradle Check (Jenkins) Run Completed with:


codecov bot commented Oct 5, 2023

Codecov Report

❗ No coverage uploaded for pull request base (2.11@049590a). Click here to learn what that means.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             2.11   #10429   +/-   ##
=======================================
  Coverage        ?   71.00%           
  Complexity      ?    58580           
=======================================
  Files           ?     4829           
  Lines           ?   276344           
  Branches        ?    40578           
=======================================
  Hits            ?   196223           
  Misses          ?    63402           
  Partials        ?    16719           

@kotwanikunal kotwanikunal merged commit f164bf5 into 2.11 Oct 5, 2023
@github-actions github-actions bot deleted the backport/backport-10418-to-2.11 branch October 5, 2023 23:32