Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Handling resource close #6795

Merged
merged 2 commits into from
Mar 23, 2023

Conversation

dreamer-89
Copy link
Member

@dreamer-89 dreamer-89 commented Mar 22, 2023

Description

Changes

  1. Store decRef handling. Inside finalize step, sometimes the call to fetch store reference fails due to various reasons (ReplicationTarget refCounts to 0 - segrep cancelleation, store already closed - close/delete index etc). This results in skipping the incRef() on ontained store reference. The subsequent call to decRef inside finalize block thus results in extra decRef call on store object, causing failures in store references outside of segrep (index deletion, read store etc).

  2. Handle closed shard. When shard is already closed (delete index/close index), the call to obtain store reference fails as resources are released by event listeners. We check for event cancellation before every step on target but it is not sufficient. The propose change here is to catch such exceptions and handle them gracefully.

  3. Obtain safe reference of IndexService while results in IndexNotFoundException for deleted/closed indices. This is better than null pointer exception. Will raise a separate for handling on target and not fail segment replication and thus recovery.

  4. Changes ReplicationFailedException to OpenSearchException on ReplicationTarget which closely resembles to situation when target is already decRef'ed (closed).

Issues Resolved

#6776
#6795

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Member Author

Gradle Check (Jenkins) Run Completed with:

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.indices.replication.OngoingSegmentReplicationsTests.testCancelReplication" -Dtests.seed=1ED659A541FAB9F9 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sq -Dtests.timezone=Antarctica/Rothera -Druntime.java=19

org.opensearch.indices.replication.OngoingSegmentReplicationsTests > testCancelReplication FAILED
    java.lang.NullPointerException: Cannot invoke "org.opensearch.index.IndexService.getShard(int)" because "indexService" is null
        at __randomizedtesting.SeedInfo.seed([1ED659A541FAB9F9:E498644503DDF453]:0)
        at org.opensearch.indices.replication.OngoingSegmentReplications.getCachedCopyState(OngoingSegmentReplications.java:84)
        at org.opensearch.indices.replication.OngoingSegmentReplications.prepareForReplication(OngoingSegmentReplications.java:149)
        at org.opensearch.indices.replication.OngoingSegmentReplicationsTests.testCancelReplication(OngoingSegmentReplicationsTests.java:147)

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

Codecov Report

Merging #6795 (7b95eb6) into main (f40fa82) will increase coverage by 0.01%.
The diff coverage is 50.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6795      +/-   ##
============================================
+ Coverage     70.75%   70.77%   +0.01%     
+ Complexity    59230    59227       -3     
============================================
  Files          4810     4810              
  Lines        283487   283491       +4     
  Branches      40877    40879       +2     
============================================
+ Hits         200577   200635      +58     
+ Misses        66408    66350      -58     
- Partials      16502    16506       +4     
Impacted Files Coverage Δ
...rg/opensearch/indices/recovery/RecoveryTarget.java 73.07% <ø> (-0.35%) ⬇️
...s/replication/SegmentReplicationTargetService.java 48.10% <0.00%> (ø)
.../indices/replication/SegmentReplicationTarget.java 90.26% <42.85%> (-3.20%) ⬇️
...ndices/replication/OngoingSegmentReplications.java 90.36% <100.00%> (ø)
.../indices/replication/common/ReplicationTarget.java 86.66% <100.00%> (+6.66%) ⬆️

... and 488 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@dreamer-89 dreamer-89 merged commit e58a33d into opensearch-project:main Mar 23, 2023
@dreamer-89 dreamer-89 added the backport 2.x Backport to 2.x branch label Mar 23, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 23, 2023
* [Segment Replication] Fix resource close handling

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix unit tests and change exception type

Signed-off-by: Suraj Singh <surajrider@gmail.com>

---------

Signed-off-by: Suraj Singh <surajrider@gmail.com>
(cherry picked from commit e58a33d)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
dreamer-89 added a commit to dreamer-89/OpenSearch that referenced this pull request Mar 23, 2023
* [Segment Replication] Fix resource close handling

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix unit tests and change exception type

Signed-off-by: Suraj Singh <surajrider@gmail.com>

---------

Signed-off-by: Suraj Singh <surajrider@gmail.com>
dreamer-89 pushed a commit that referenced this pull request Mar 23, 2023
* [Segment Replication] Fix resource close handling



* Fix unit tests and change exception type



---------


(cherry picked from commit e58a33d)

Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mitrofmep pushed a commit to mitrofmep/OpenSearch that referenced this pull request Apr 5, 2023
* [Segment Replication] Fix resource close handling

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Fix unit tests and change exception type

Signed-off-by: Suraj Singh <surajrider@gmail.com>

---------

Signed-off-by: Suraj Singh <surajrider@gmail.com>
Signed-off-by: Valentin Mitrofanov <mitrofmep@gmail.com>
@dreamer-89
Copy link
Member Author

This PR also prevents NPE by invoking safe to use instance and thus resolves #6724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IndexWriter AlreadyClosedException on segment replication failure
3 participants