Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing flow enhancements for admission controller #8911

Open
ajaymovva opened this issue Jul 27, 2023 · 0 comments
Open

Indexing flow enhancements for admission controller #8911

ajaymovva opened this issue Jul 27, 2023 · 0 comments
Labels
enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing

Comments

@ajaymovva
Copy link
Contributor

Overview

The parent RFC #8910 discusses the admission control framework, which limits and restricts the incoming requests early when a node begins to go under stress.

OpenSearch today provides few gating mechanisms to protect a node when under duress, via the concepts of queue rejections and circuit breakers. However, queue sizes are fixed and isolated and do not effectively represent the total work required to be done. Similarly, Circuit breakers act as a last line of defence, are mostly too late to act upon, and do not offer fairness. Some of these gaps create availability issues with the cluster when under duress due to hardware failures, node performance degradation, or traffic bursts.

Shard Indexing Pressure today tries to address this to some extent by rejecting indexing requests based on the memory accounting at the shard level along with other key performance factors like throughput and last successful requests.

Challenges in the Shard Indexing BackPressure

  1. Shard Indexing Backpressure mechanisms only account for memory usage and throughput degradation of the shards / nodes to reject requests. They don’t consider other resource utilisation parameters such as CPU , I/O etc. Refer to similar [RFC] Shard Indexing backpressure mechanism should also protect from any CPU contention on nodes #7638.
  2. The coordinator rejects requests based on the local view it constructs for downstream nodes and rejects if memory utilisation for the shard is breached. But it does not consider the collective resource contention on the target node based on requests from other coordinator nodes.

Proposed Solution

The admission controller framework we are suggesting will track the resource utilisation for all the downstream nodes and maintain these stats at the coordinator. Along with the shard indexing backpressure, which tracks the shard level view for downstream nodes, we are proposing a new solution that will give the node-level performance stats for all the downstream nodes.

In the admission controller framework, we will support multiple levels of rejection, such as rejection at the coordinator based on the coordinator's or downstream nodes resource utilisation and rejection at the target node based on the target node's resource utilisation. As part of it, we will enhance the indexing flow to proactively reject requests if the data nodes are stressed.

Reject the Incoming Indexing request based on the resource utilization of downstream nodes

AdmissionControllerIndexingFlow (1)

Proposed Indexing Flow

For every bulk shard request, we will evaluate the performance stats of all nodes that are associated with primary or replica shards of the indexing operation and proceed with the request either by allowing it or rejecting it.

Primary Shard Node is in Stress

  1. When the primary shard node is in stress, we will reject the whole bulk shard request at the coordinator only.

Replica Shard Node is in Stress

  1. When all the replica nodes are in stress, we will reject the whole bulk shard reject at the coordinator only.
  2. When few replica nodes are in stress we will allow the request to proceed from the coordinator as the primary and more than one replica nodes are not in stress.
  3. While processing replica request at target node we will evaluate the resource utilization and proceed with replication action.

This will be a further extension to node-level rejections based on nodeLimitBreached at the coordinator which now consider rejections based on the CPU/JVM/IO usage of the downstream nodes.

Rejection of Incoming Indexing requests at target node

The coordinator will allow the requests to the stressed target nodes, when it doesn’t have the latest stats for the downstream nodes. In such cases, we will reject requests at target/data nodes.

Primary Shard Node is in Stress

  1. We will extend existing shard indexing backpressure and add CPU/JVM limits to isPrimaryNodeLimitBreached method.

Replica Shard Node is in Stress

  1. Similar to the shard indexing backpressure, we will have a higher threshold for replica requests compared to primary requests and reject the requests at the target node based on the CPU/JVM/IO usage of that node. This mechanism is to make sure to fail fast the indexing requests (rejecting primary) and to reduce shard failures as well (high threshold to replica). This will also be an extension to the node-level rejections based on the replicaLimitBreached at the target node based on resource usage.
  2. Replica shard failure - If we fail the replica request on any one of the nodes, there will be shard failures, so it will be a configurable option to fail the replica or not. All the threshold limits for the requests are dynamically configurable, so as a default option, we will try to make sure there won’t be any data loss and nodes are not under duress. Even today, the shard indexing backpressure can reject the replica action at the replica node when the threshold limit breaches.

Co-authored by @bharath-techie

@ajaymovva ajaymovva added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 27, 2023
@Xtansia Xtansia added the Indexing Indexing, Bulk Indexing and anything related to indexing label Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing
Projects
Status: New
Development

No branches or pull requests

3 participants