Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding index create block when all nodes have breached high disk watermark #5852

Merged
merged 2 commits into from
Jan 19, 2023

Conversation

RS146BIJAY
Copy link
Contributor

@RS146BIJAY RS146BIJAY commented Jan 13, 2023

…rmark

Signed-off-by: Rishav Sagar rissag@amazon.com

Description

OpenSearch stops allocating any shards to nodes that have breached High Disk Watermark. If a scenario arises that all the nodes in the cluster breached high disk watermark, no new shards will be created on this cluster. Now if we try to create a new index on this cluster, it will be a red index (since no primary or replica shards can be created for this index).

In order to prevent red cluster, we propose that whenever all the nodes in the cluster has breached high disk watermark we apply a INDEX CREATE BLOCK on the cluster to prevent creation of any red indices. We will modify the DiskThresholdMonitor (which monitors for disk watermarks thresholds on a domain and take appropriate action) to include an extra check on whether low disk watermark is breached on all the nodes in cluster. If it has, it will apply an index create block on the entire cluster.

Issues Resolved

#4456

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@RS146BIJAY RS146BIJAY force-pushed the index-block-guardrails branch 3 times, most recently from fd39829 to 76b9782 Compare January 13, 2023 09:32
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testCancelPrimaryAllocation
      1 org.opensearch.indices.replication.SegmentReplicationIT.testCancelPrimaryAllocation

@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2023

Codecov Report

Merging #5852 (0c5ed1f) into main (3e2d8cc) will increase coverage by 0.27%.
The diff coverage is 17.39%.

@@             Coverage Diff              @@
##               main    #5852      +/-   ##
============================================
+ Coverage     70.57%   70.84%   +0.27%     
- Complexity    58486    58730     +244     
============================================
  Files          4770     4770              
  Lines        280721   280743      +22     
  Branches      40536    40539       +3     
============================================
+ Hits         198123   198901     +778     
+ Misses        66228    65502     -726     
+ Partials      16370    16340      -30     
Impacted Files Coverage Δ
...uster/routing/allocation/DiskThresholdMonitor.java 81.73% <17.39%> (-6.09%) ⬇️
...g/opensearch/index/analysis/CharFilterFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...rch/test/junit/listeners/ReproduceInfoPrinter.java 9.52% <0.00%> (-68.26%) ⬇️
.../java/org/opensearch/search/dfs/AggregatedDfs.java 51.61% <0.00%> (-45.17%) ⬇️
...opensearch/persistent/PersistentTasksExecutor.java 22.22% <0.00%> (-44.45%) ⬇️
.../admin/cluster/reroute/ClusterRerouteResponse.java 60.00% <0.00%> (-40.00%) ⬇️
...ava/org/opensearch/search/dfs/DfsSearchResult.java 47.87% <0.00%> (-39.37%) ⬇️
...luster/routing/allocation/RoutingExplanations.java 62.06% <0.00%> (-37.94%) ⬇️
...ion/admin/cluster/node/info/PluginsAndModules.java 53.12% <0.00%> (-34.38%) ⬇️
...cluster/routing/allocation/RerouteExplanation.java 70.00% <0.00%> (-30.00%) ⬇️
... and 522 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@RS146BIJAY RS146BIJAY force-pushed the index-block-guardrails branch 2 times, most recently from a8787d6 to 888c720 Compare January 13, 2023 15:30
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWhenAllNodesExceededHighWatermark
      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWithAReadOnlyBlock
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWhenAllNodesExceededHighWatermark

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWithAReadOnlyBlock
      1 org.opensearch.indices.replication.SegmentReplicationIT.testCancelPrimaryAllocation
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWhenAllNodesExceededHighWatermark

@RS146BIJAY RS146BIJAY force-pushed the index-block-guardrails branch 2 times, most recently from dceeb82 to b8dfc29 Compare January 13, 2023 19:15
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.cluster.service.MasterServiceTests.classMethod
      1 org.opensearch.indices.replication.SegmentReplicationRelocationIT.testCancellation
      1 org.opensearch.cluster.service.MasterServiceTests.testThrottlingForMultipleTaskTypes

@RS146BIJAY RS146BIJAY marked this pull request as ready for review January 13, 2023 20:18
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

…rmark

Signed-off-by: Rishav Sagar <rissag@amazon.com>
@RS146BIJAY RS146BIJAY force-pushed the index-block-guardrails branch 2 times, most recently from b8d232d to f401496 Compare January 19, 2023 10:33
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines 411 to 437
ActionListener<Void> wrappedListener = ActionListener.wrap(r -> {
setLastRunTimeMillis();
listener.onResponse(r);
}, e -> {
ActionListener<Void> wrappedListener = ActionListener.wrap(r -> { listener.onResponse(r); }, e -> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my understanding y did we remove last run time?

Copy link
Contributor Author

@RS146BIJAY RS146BIJAY Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented by Gaurav here, setLastRunTimeMillis() is only meant for reroute operations as it needs to honor diskThresholdSettings.getRerouteInterval().millis(), so removing it from updateIndicesReadOnly as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove setLastRunTimeMillis it will try to reroute once even though we don't need to reroute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the comment

Signed-off-by: Rishav Sagar <rissag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@Bukhtawar Bukhtawar merged commit 6309611 into opensearch-project:main Jan 19, 2023
@andrross andrross added the backport 2.x Backport to 2.x branch label Jan 19, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 19, 2023
…rmark (#5852)

* Adding index create block when all nodes have breached high disk watermark

Signed-off-by: Rishav Sagar <rissag@amazon.com>
(cherry picked from commit 6309611)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@andrross andrross added the v2.6.0 'Issues and PRs related to version v2.6.0' label Jan 26, 2023
@andrross andrross mentioned this pull request Jan 26, 2023
6 tasks
@andrross andrross added backport 2.x Backport to 2.x branch and removed backport 2.x Backport to 2.x branch labels Feb 2, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 2, 2023
…rmark (#5852)

* Adding index create block when all nodes have breached high disk watermark

Signed-off-by: Rishav Sagar <rissag@amazon.com>
(cherry picked from commit 6309611)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request Feb 2, 2023
…rmark (#5852) (#6165)

* Adding index create block when all nodes have breached high disk watermark


(cherry picked from commit 6309611)

Signed-off-by: Rishav Sagar <rissag@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@dennisoelkers
Copy link

@RS146BIJAY: Will the index creation cluster block be lifted automatically when one or more nodes fall below the watermark again?

@RS146BIJAY
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch v2.6.0 'Issues and PRs related to version v2.6.0'
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants