Time series based workload desc order optimization through reverse segment read #7244
Merged
Signed-off-by: gashutos <gashutos@amazon.com>
Gradle Check (Jenkins) Run Completed with:
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-7244-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4c98b3d38064b94a87f6d7a1359e623849459bac
# Push it to GitHub
git push --set-upstream origin backport/backport-7244-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x
Then, create a pull request where the
austintlee
pushed a commit
to austintlee/OpenSearch
that referenced
this pull request
Apr 28, 2023
…gment read (opensearch-project#7244) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <gashutos@amazon.com> Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
May 8, 2023
…gment read (opensearch-project#7244) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <gashutos@amazon.com> Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
reta
pushed a commit
that referenced
this pull request
May 8, 2023
…gment read (#7244) (#7457) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <gashutos@amazon.com> Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878.
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <andrross@amazon.com>
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <andrross@amazon.com>
andrross
added a commit
to andrross/OpenSearch
that referenced
this pull request
Jun 2, 2023
…verse segment read (opensearch-project#7244)" This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <andrross@amazon.com>
andrross
added a commit
that referenced
this pull request
Jun 2, 2023
andrross
added a commit
that referenced
this pull request
Jun 2, 2023
reta
pushed a commit
that referenced
this pull request
Jun 2, 2023
…tion through re… (#7895) * Revert "Time series based workload desc order optimization through reverse segment read (#7244)" (#7892) This reverts commit 4c98b3d. Reverting due to issue reported in #7878. Signed-off-by: Andrew Ross <andrross@amazon.com> (cherry picked from commit bb26536) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Remove unused imports Signed-off-by: Andrew Ross <andrross@amazon.com> --------- Signed-off-by: Andrew Ross <andrross@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Andrew Ross <andrross@amazon.com>
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
Jun 8, 2023
…rough reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536.
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
Jun 8, 2023
…rough reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <gashutos@amazon.com>
andrross
pushed a commit
that referenced
this pull request
Jun 12, 2023
…gh reverse segment read (#7244)] with fixes (#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (#7244)" (#7892)" This reverts commit bb26536. Signed-off-by: gashutos <gashutos@amazon.com> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <gashutos@amazon.com> * Modifying CHANGELOG Signed-off-by: gashutos <gashutos@amazon.com> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <gashutos@amazon.com> --------- Signed-off-by: gashutos <gashutos@amazon.com>
gashutos
added a commit
to gashutos/OpenSearch
that referenced
this pull request
Jun 13, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <gashutos@amazon.com> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <gashutos@amazon.com> * Modifying CHANGELOG Signed-off-by: gashutos <gashutos@amazon.com> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <gashutos@amazon.com> --------- Signed-off-by: gashutos <gashutos@amazon.com>
gaiksaya
pushed a commit
to gaiksaya/OpenSearch
that referenced
this pull request
Jun 26, 2023
…tion through re… (opensearch-project#7895) * Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892) This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <andrross@amazon.com> (cherry picked from commit bb26536) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> * Remove unused imports Signed-off-by: Andrew Ross <andrross@amazon.com> --------- Signed-off-by: Andrew Ross <andrross@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Andrew Ross <andrross@amazon.com>
gaiksaya
pushed a commit
to gaiksaya/OpenSearch
that referenced
this pull request
Jun 26, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) (opensearch-project#8037) Signed-off-by: gashutos <gashutos@amazon.com>
imRishN
pushed a commit
to imRishN/OpenSearch
that referenced
this pull request
Jun 27, 2023
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <gashutos@amazon.com> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <gashutos@amazon.com> * Modifying CHANGELOG Signed-off-by: gashutos <gashutos@amazon.com> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <gashutos@amazon.com> --------- Signed-off-by: gashutos <gashutos@amazon.com> Signed-off-by: Rishab Nahata <rnnahata@amazon.com>
shiv0408
pushed a commit
to Gaurav614/OpenSearch
that referenced
this pull request
Apr 25, 2024
…gment read (opensearch-project#7244) This commit changes the EngineConfig for timeseries indexes only (e.g., indexes that use the @timestamp metadata field) so that a descending LeafSorter comparator is used to visit segments in order of most newest to oldest. For the more infrequent case that a user chooses to sort query results by ASC time, this would cause a search regression so the ContextIndexSearcher is updated to inspect the sort order from the search request and reverse the comparator so segments are visited in ascending order. LeafSorter behavior for non-timeseries indexes is left the same. Signed-off-by: gashutos <gashutos@amazon.com> Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
shiv0408
pushed a commit
to Gaurav614/OpenSearch
that referenced
this pull request
Apr 25, 2024
…verse segment read (opensearch-project#7244)" (opensearch-project#7892) This reverts commit 4c98b3d. Reverting due to issue reported in opensearch-project#7878. Signed-off-by: Andrew Ross <andrross@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
shiv0408
pushed a commit
to Gaurav614/OpenSearch
that referenced
this pull request
Apr 25, 2024
…gh reverse segment read (opensearch-project#7244)] with fixes (opensearch-project#7967) * Revert "Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)" This reverts commit bb26536. Signed-off-by: gashutos <gashutos@amazon.com> * Enable time series optimization only if it is not IndexSorted index, also ASC order reverse should only consider in @timestamp field Signed-off-by: gashutos <gashutos@amazon.com> * Modifying CHANGELOG Signed-off-by: gashutos <gashutos@amazon.com> * Adding integ test for scroll API where sort by _doc is getting early termination Signed-off-by: gashutos <gashutos@amazon.com> --------- Signed-off-by: gashutos <gashutos@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
EDIT: After incorporating comments and suggestions from Nick, Andriya, and andrross, we decided to limit this change to indices that have an @timestamp field. For these indices, the default segment search order is now descending (from the newest segments to the oldest). The rarer ASC queries on time series workloads are safeguarded so that they do not traverse segments in the default DESC order, which avoids any regression. The performance gains remain similar.
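The ordering rule described in the EDIT above can be sketched roughly as follows. This is only an illustration in plain Java, not the code from this PR: `Leaf`, `NEWEST_FIRST`, and `visitOrder` are hypothetical names, and a real implementation would compare Lucene leaf readers rather than this stand-in class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LeafOrderDemo {

    /** Hypothetical stand-in for a segment (leaf) and the largest @timestamp it contains. */
    static final class Leaf {
        final String name;
        final long maxTimestamp;

        Leaf(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Assumed default for @timestamp indices: visit the newest segments first. */
    static final Comparator<Leaf> NEWEST_FIRST =
        Comparator.comparingLong((Leaf leaf) -> leaf.maxTimestamp).reversed();

    /**
     * Returns the segments in the order they would be visited.
     * For an ASC sort on @timestamp the default comparator is reversed,
     * so the oldest segments are visited first.
     */
    static List<Leaf> visitOrder(List<Leaf> leaves, boolean ascendingSortOnTimestamp) {
        List<Leaf> ordered = new ArrayList<>(leaves);
        ordered.sort(ascendingSortOnTimestamp ? NEWEST_FIRST.reversed() : NEWEST_FIRST);
        return ordered;
    }

    /** Helper that extracts segment names so the order is easy to print or compare. */
    static List<String> names(List<Leaf> leaves) {
        List<String> result = new ArrayList<>();
        for (Leaf leaf : leaves) {
            result.add(leaf.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Leaf> leaves = Arrays.asList(new Leaf("a", 10), new Leaf("b", 30), new Leaf("c", 20));
        System.out.println(names(visitOrder(leaves, false))); // [b, c, a]
        System.out.println(names(visitOrder(leaves, true)));  // [a, c, b]
    }
}
```

Keeping a single default comparator and reversing it only for ASC sorts means non-sorted and DESC queries share one code path, which matches the intent stated above.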
As described in issue OpenSearch-6814, descending sort queries take 3x or more longer than ascending sort queries on the same time series data set.
That is because the latest documents in a time series workload are in the latest segments, while our IndexSearcher searches segments in order from oldest to newest.
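To illustrate why visit order matters, here is a simplified model in plain Java. It is not how Lucene is implemented internally: `Segment`, `segmentsScanned`, and `timeSeriesSegments` are hypothetical names, and the model assumes a segment can be skipped whenever its maximum timestamp cannot beat the current top-k.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class SegmentVisitOrderDemo {

    /** Hypothetical stand-in for a segment: the timestamps of its documents. */
    static final class Segment {
        final long[] timestamps;
        final long maxTimestamp;

        Segment(long[] timestamps) {
            this.timestamps = timestamps;
            long max = Long.MIN_VALUE;
            for (long t : timestamps) {
                max = Math.max(max, t);
            }
            this.maxTimestamp = max;
        }
    }

    /** Counts the segments that must be scanned to collect the k newest timestamps. */
    static int segmentsScanned(List<Segment> visitOrder, int k) {
        PriorityQueue<Long> topK = new PriorityQueue<>(); // min-heap: head is the weakest hit
        int scanned = 0;
        for (Segment segment : visitOrder) {
            // Once k hits are collected, a segment whose newest document
            // cannot beat the weakest hit is non-competitive and is skipped.
            if (topK.size() == k && segment.maxTimestamp <= topK.peek()) {
                continue;
            }
            scanned++;
            for (long t : segment.timestamps) {
                if (topK.size() < k) {
                    topK.add(t);
                } else if (t > topK.peek()) {
                    topK.poll();
                    topK.add(t);
                }
            }
        }
        return scanned;
    }

    /** Builds segments with ever-increasing timestamps, oldest segment first. */
    static List<Segment> timeSeriesSegments(int segmentCount, int docsPerSegment) {
        List<Segment> segments = new ArrayList<>();
        long timestamp = 0;
        for (int i = 0; i < segmentCount; i++) {
            long[] docs = new long[docsPerSegment];
            for (int j = 0; j < docsPerSegment; j++) {
                docs[j] = timestamp++;
            }
            segments.add(new Segment(docs));
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Segment> oldestFirst = timeSeriesSegments(10, 100);
        List<Segment> newestFirst = new ArrayList<>(oldestFirst);
        Collections.reverse(newestFirst);
        System.out.println("oldest first: " + segmentsScanned(oldestFirst, 10)); // oldest first: 10
        System.out.println("newest first: " + segmentsScanned(newestFirst, 10)); // newest first: 1
    }
}
```

In this model, visiting the oldest segments first means every later segment still contains newer documents, so none can be skipped. Visiting the newest first fills the top-k immediately and lets the remaining segments be skipped.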
In this PR, we reverse the order of the LeafReaderContexts (segments) in ContextIndexSearcher if the query is a descending sort. This will be the default behaviour for all types of workloads, not just time series workloads. However, it will only benefit workloads whose data values are ever increasing. Other types of workload should not regress or be impacted, since their data is randomly distributed across all segments.
As a precaution, in case any user or workload takes a search latency hit, I have added an index-level setting to disable this optimization.
Optimization gains
We are seeing 3x to 5x gains on the configuration below.
Issues Resolved
OpenSearch-6814
Next action item
On the same segment of a time series workload, an ASC query is faster than a descending one. We need to look into that further.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.