Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Weighted Routing] Support for fail open stats #6041

Merged
merged 10 commits into from
Feb 1, 2023

Conversation

anshu1106
Copy link
Contributor

Signed-off-by: Anshu Agarwal anshukag@amazon.com

Description

This PR adds support to capture fail open stats. Currently it captures number of times fail open is executed for search requests using weighted shard routing.

Issues Resolved

#5037

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@anshu1106 anshu1106 marked this pull request as ready for review January 27, 2023 15:03
@anshu1106 anshu1106 changed the title [Weighted Routing] Add fail open stats [Weighted Routing] Support for fail open stats Jan 27, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

*
* @opensearch.internal
*/
public class FailOpenWeightedRoutingStats implements ToXContentFragment, Writeable {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define a generic weighted shard robin stats

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines +130 to +131
@Nullable
private WeightedRoutingStats weightedRoutingStats;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only related to Search, does it make sense to be placed in SearchStats instead ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense. Tried to go through the flow but looks like since failopen stats is captured inside FetchPhase, there isn't an easy way to capture this since the current implementation is just looking for fetchphase https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/shard/SearchOperationListener.java. Will require some deeper understanding in this as we need to capture fail open stats in all the actions modified in https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/shard/SearchOperationListener.java. Do you have any pointers on this?

Copy link
Collaborator

@gbbafna gbbafna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testWeightedRoutingFailOpenStats

@@ -162,6 +166,11 @@ public NodeStats(StreamInput in) throws IOException {
} else {
clusterManagerThrottlingStats = null;
}
if (in.getVersion().onOrAfter(Version.CURRENT)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 2_6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixed cluster tests fail with 2_6. https://build.ci.opensearch.org/job/gradle-check/10046/

@@ -356,6 +371,9 @@ public void writeTo(StreamOutput out) throws IOException {
if (out.getVersion().onOrAfter(Version.V_2_6_0)) {
out.writeOptionalWriteable(clusterManagerThrottlingStats);
}
if (out.getVersion().onOrAfter(Version.CURRENT)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same 2_6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixed cluster tests fail with 2_6. https://build.ci.opensearch.org/job/gradle-check/10046/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah needs a backport

@@ -46,6 +46,8 @@ public static FailAwareWeightedRouting getInstance() {
return INSTANCE;
}

public static WeightedRoutingStats weightedRoutingStats = new WeightedRoutingStats();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static member variables have caps on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

// number of times fail open has to be executed for search requests
private AtomicInteger failOpenCount;

public WeightedRoutingStats() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a singleton instance in this class that can be shared across consumers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testIndexDeletionDuringSnapshotCreationInQueue
      1 org.opensearch.search.SearchWeightedRoutingIT.testWeightedRoutingFailOpenStats
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockWithAReadOnlyBlock

@codecov-commenter
Copy link

codecov-commenter commented Jan 31, 2023

Codecov Report

Merging #6041 (1329f7e) into main (b1cf2d1) will decrease coverage by 0.12%.
The diff coverage is 50.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6041      +/-   ##
============================================
- Coverage     70.77%   70.66%   -0.12%     
+ Complexity    58744    58643     -101     
============================================
  Files          4775     4776       +1     
  Lines        281000   281042      +42     
  Branches      40592    40597       +5     
============================================
- Hits         198873   198586     -287     
- Misses        65812    66080     +268     
- Partials      16315    16376      +61     
Impacted Files Coverage Δ
...min/cluster/stats/TransportClusterStatsAction.java 69.56% <ø> (ø)
...src/main/java/org/opensearch/node/NodeService.java 70.96% <0.00%> (-1.17%) ⬇️
...search/cluster/MockInternalClusterInfoService.java 0.00% <0.00%> (ø)
.../java/org/opensearch/test/InternalTestCluster.java 58.52% <ø> (+0.40%) ⬆️
...rch/action/admin/cluster/node/stats/NodeStats.java 53.54% <44.44%> (-0.57%) ⬇️
...ensearch/cluster/routing/WeightedRoutingStats.java 50.00% <50.00%> (ø)
...arch/cluster/routing/FailAwareWeightedRouting.java 78.00% <66.66%> (-0.73%) ⬇️
...on/admin/cluster/node/stats/NodesStatsRequest.java 93.05% <100.00%> (+0.09%) ⬆️
.../cluster/node/stats/TransportNodesStatsAction.java 100.00% <100.00%> (ø)
...java/org/opensearch/client/indices/DataStream.java 0.00% <0.00%> (-76.09%) ⬇️
... and 495 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testWeightedRoutingFailOpenStats

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.extensions.ExtensionsManagerTests.classMethod
      1 org.opensearch.search.SearchWeightedRoutingIT.testSearchAggregationWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.extensions.ExtensionsManagerTests.testInitialize
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermark

@anshu1106
Copy link
Contributor Author

testSearchAggregationWithNetworkDisruption_FailOpenEnabled

#5957

Comment on lines +81 to +84
public void resetFailOpenCount() {
failOpenCount.set(0);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make package private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't compile since test is in a different package.

@Bukhtawar Bukhtawar merged commit 26ec5fc into opensearch-project:main Feb 1, 2023
@Bukhtawar Bukhtawar added the backport 2.x Backport to 2.x branch label Feb 1, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 1, 2023
* Support for fail open stats

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
(cherry picked from commit 26ec5fc)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
kotwanikunal pushed a commit that referenced this pull request Feb 1, 2023
* [Weighted Routing] Support for fail open stats (#6041)

* Support for fail open stats

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
(cherry picked from commit 26ec5fc)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix weighted routing stats version (#6128)

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Co-authored-by: Anshu Agarwal <anshukag@amazon.com>

---------

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Anshu Agarwal <anshuagarwal11@gmail.com>
Co-authored-by: Anshu Agarwal <anshukag@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants