[Backport 2.x] Implement segment replication event cancellation. #4387

Merged
mch2 merged 1 commit into opensearch-project:2.x from backport/backport-4225-to-2.x on Sep 2, 2022

mch2
Member

@mch2 mch2 commented Sep 1, 2022

Manual backport of #4225 to 2.x.

@mch2 mch2 requested review from a team and reta as code owners September 1, 2022 20:24
@github-actions
Contributor

github-actions bot commented Sep 1, 2022

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Member

Failing due to heap space issue. Tracked in #3973.
Refiring!

1: Task failed with an exception.
-----------
* What went wrong:
Execution failed for task ':plugins:discovery-azure-classic:compileInternalClusterTestJava'.
> java.lang.OutOfMemoryError: Java heap space

@github-actions
Contributor

github-actions bot commented Sep 1, 2022

Gradle Check (Jenkins) Run Completed with:

@dreamer-89 dreamer-89 self-requested a review September 1, 2022 21:35
@mch2
Member Author

mch2 commented Sep 1, 2022

Execution failed for task ':distribution:bwc:bugfix:buildBwcLinuxTar'.
> Building 2.2.1 didn't generate expected file /var/jenkins/workspace/gradle-check/search/distribution/bwc/bugfix/build/bwc/checkout-2.2/distribution/archives/linux-tar/build/distributions/opensearch-min-2.2.1-SNAPSHOT-linux-x64.tar.gz

@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

@mch2 mch2 force-pushed the backport/backport-4225-to-2.x branch from b0d892b to 6818bda Compare September 2, 2022 04:53
@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

@mch2
Member Author

mch2 commented Sep 2, 2022

  2> REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark" -Dtests.seed=2C6090E329EBBFC4 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-HN -Dtests.timezone=Asia/Karachi -Druntime.java=17
  2> java.lang.AssertionError: 
    Expected: an empty collection
         but: <[[qfddaotgwv][1], node[Ak81Pn6GRbeGBUORxWl2bg], [P], s[STARTED], a[id=zu2e5c5bSwyE-ZDsdLv4_g], [qfddaotgwv][5], node[Ak81Pn6GRbeGBUORxWl2bg], [P], s[STARTED], a[id=bO_j9IkxT6Ot4qY8ucWCPg]]>
        at __randomizedtesting.SeedInfo.seed([2C6090E329EBBFC4:3ED836824252A07C]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.lambda$testRestoreSnapshotAllocationDoesNotExceedWatermark$2(DiskThresholdDeciderIT.java:260)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1049)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1022)
        at org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark(DiskThresholdDeciderIT.java:260)

@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter

codecov-commenter commented Sep 2, 2022

Codecov Report

Merging #4387 (1a9712c) into 2.x (b202fd1) will increase coverage by 0.13%.
The diff coverage is 74.82%.

@@             Coverage Diff              @@
##                2.x    #4387      +/-   ##
============================================
+ Coverage     70.54%   70.68%   +0.13%     
- Complexity    56942    57146     +204     
============================================
  Files          4572     4584      +12     
  Lines        273816   274484     +668     
  Branches      40152    40223      +71     
============================================
+ Hits         193170   194008     +838     
+ Misses        64455    64263     -192     
- Partials      16191    16213      +22     
Impacted Files Coverage Δ
.../org/opensearch/client/support/AbstractClient.java 34.96% <0.00%> (-0.19%) ⬇️
...pensearch/common/settings/IndexScopedSettings.java 100.00% <ø> (ø)
.../java/org/opensearch/common/util/FeatureFlags.java 50.00% <ø> (ø)
...rc/main/java/org/opensearch/index/IndexModule.java 82.03% <ø> (ø)
.../java/org/opensearch/plugins/IndexStorePlugin.java 100.00% <ø> (ø)
.../java/org/opensearch/snapshots/RestoreService.java 55.55% <0.00%> (-6.55%) ⬇️
...opensearch/cluster/routing/ShardRoutingHelper.java 95.65% <ø> (ø)
...ore/restore/TransportRestoreRemoteStoreAction.java 25.00% <25.00%> (ø)
...c/main/java/org/opensearch/index/IndexService.java 74.03% <28.57%> (-0.11%) ⬇️
...on/admin/cluster/RestRestoreRemoteStoreAction.java 33.33% <33.33%> (ø)
... and 535 more


@dreamer-89
Copy link
Member

@mch2: Looks like there is a conflict. Can you rebase and merge?

…n. (opensearch-project#4225)

* Segment Replication.  Fix Cancellation of replication events.

This PR updates segment replication paths to correctly cancel replication events on the primary and replica.
In the source service, any ongoing event for a primary that is sending to a replica that shuts down or is promoted as a new primary is cancelled.
In the target service, any ongoing event for a replica that is promoted as a new primary or is fetching from a primary that shuts down is cancelled.
It wires up SegmentReplicationSourceService as an IndexEventListener so that it can respond to events and cancel any ongoing transfer state.
This change also includes some test cleanup for segment replication to rely on actual components over mocks.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Fix to not start/stop SegmentReplicationSourceService as a lifecycle component with feature flag off.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Update logic to properly mark SegmentReplicationTarget as cancelled when cancel initiated by primary.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Minor updates from self review.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add missing changelog entry.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>
(cherry picked from commit 19d1a2b)
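
The cancellation behaviour described in the commit message above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `OngoingReplication`, `ShardEventListener` and `ReplicationTracker` are hypothetical stand-ins invented for illustration, and they only mirror the idea of a service that listens for shard lifecycle events (shard closed, shard promoted to primary) and cancels any transfer still in flight for that shard.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

public class ReplicationCancellationSketch {

    /** Hypothetical stand-in for one in-flight segment copy involving a shard. */
    static final class OngoingReplication {
        final String shardId;
        final String peerNodeId;
        // null while running; holds the cancellation reason once cancelled.
        private final AtomicReference<String> cancelReason = new AtomicReference<>();

        OngoingReplication(String shardId, String peerNodeId) {
            this.shardId = shardId;
            this.peerNodeId = peerNodeId;
        }

        /** Idempotent: only the first caller's reason is recorded. */
        void cancel(String reason) {
            cancelReason.compareAndSet(null, reason);
        }

        boolean isCancelled() {
            return cancelReason.get() != null;
        }

        String cancelReason() {
            return cancelReason.get();
        }
    }

    /** Hypothetical listener interface (an assumption, not the OpenSearch API). */
    interface ShardEventListener {
        void beforeShardClosed(String shardId);

        void shardPromotedToPrimary(String shardId);
    }

    /** Tracks ongoing events per shard and cancels them when the shard's state changes. */
    static final class ReplicationTracker implements ShardEventListener {
        private final Map<String, List<OngoingReplication>> ongoingByShard = new ConcurrentHashMap<>();

        OngoingReplication start(String shardId, String peerNodeId) {
            OngoingReplication event = new OngoingReplication(shardId, peerNodeId);
            ongoingByShard.computeIfAbsent(shardId, k -> new CopyOnWriteArrayList<>()).add(event);
            return event;
        }

        @Override
        public void beforeShardClosed(String shardId) {
            cancelAll(shardId, "shard closed");
        }

        @Override
        public void shardPromotedToPrimary(String shardId) {
            cancelAll(shardId, "shard promoted to primary");
        }

        // Removes every tracked event for the shard and marks each one cancelled.
        private void cancelAll(String shardId, String reason) {
            List<OngoingReplication> events = ongoingByShard.remove(shardId);
            if (events == null) {
                return;
            }
            for (OngoingReplication event : events) {
                event.cancel(reason);
            }
        }

        int ongoingCount(String shardId) {
            List<OngoingReplication> events = ongoingByShard.get(shardId);
            return events == null ? 0 : events.size();
        }
    }

    public static void main(String[] args) {
        ReplicationTracker tracker = new ReplicationTracker();
        OngoingReplication toReplica = tracker.start("index[0]", "replica-node");
        OngoingReplication untouched = tracker.start("index[1]", "replica-node");

        tracker.shardPromotedToPrimary("index[0]");

        System.out.println("index[0] cancelled: " + toReplica.isCancelled() + " (" + toReplica.cancelReason() + ")");
        System.out.println("index[1] cancelled: " + untouched.isCancelled());
    }
}
```

In this sketch the tracker removes the shard's events from its map before cancelling them, so a repeated lifecycle event for the same shard is a no-op. A production implementation would also need to guard against a new event being registered concurrently with the removal, which this example does not attempt.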
@mch2 mch2 force-pushed the backport/backport-4225-to-2.x branch from 6818bda to 1a9712c Compare September 2, 2022 18:04
@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

@mch2
Member Author

mch2 commented Sep 2, 2022


REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness" -Dtests.seed=196AC1473FA624F2 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=fi-FI -Dtests.timezone=Pacific/Apia -Druntime.java=17

@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Gradle Check (Jenkins) Run Completed with:

@mch2 mch2 merged commit 1edb733 into opensearch-project:2.x Sep 2, 2022
@mch2 mch2 deleted the backport/backport-4225-to-2.x branch September 2, 2022 19:25
3 participants