-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply the date histogram rewrite optimization to range aggregation #13865
Conversation
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
❌ Gradle check result for c5d2175: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for ed79e02: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
6d41421
to
8ec5f58
Compare
8ec5f58
to
67c281c
Compare
❌ Gradle check result for 8ec5f58: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 67c281c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
❌ Gradle check result for c10c775: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 783b14a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for 783b14a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@github-actions commented on Jun 18, 2024, 11:17 PM PDT:
|
…13865) * Refactor the ranges representation Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor try fast filter Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Main work finished; left the handling of different numeric data types Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * buildRanges accepts field type Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * first working draft probably Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add change log Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * accommodate geo distance agg Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix test support all numeric types minus one on the upper range Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * [Refactor] range is lower inclusive, right exclusive Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * adding test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Adding test and refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test and update the compare logic in tree traversal Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix test, add random test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor to address comments Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * small potential performance update Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix precommit Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * set refresh_interval to -1 Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> (cherry picked from commit 57fb50b) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…13865) (#14463) * Refactor the ranges representation * Refactor try fast filter * Main work finished; left the handling of different numeric data types * buildRanges accepts field type * first working draft probably * add change log * accommodate geo distance agg * Fix test support all numeric types minus one on the upper range * [Refactor] range is lower inclusive, right exclusive * adding test * Adding test and refactor * refactor * add test * add test and update the compare logic in tree traversal * fix test, add random test * refactor to address comments * small potential performance update * fix precommit * refactor * refactor * set refresh_interval to -1 * address comment * address comment * address comment * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. --------- (cherry picked from commit 57fb50b) Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@bowenlan-amzn Maybe a test failure here related to this change: https://build.ci.opensearch.org/job/gradle-check/41411/consoleText
That will reproduce for me, though not every time |
…pensearch-project#13865) * Refactor the ranges representation Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor try fast filter Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Main work finished; left the handling of different numeric data types Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * buildRanges accepts field type Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * first working draft probably Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add change log Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * accommodate geo distance agg Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix test support all numeric types minus one on the upper range Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * [Refactor] range is lower inclusive, right exclusive Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * adding test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Adding test and refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test and update the compare logic in tree traversal Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix test, add random test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor to address comments Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * small potential performance update Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix precommit Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * set refresh_interval to -1 Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
…pensearch-project#13865) (opensearch-project#14463) * Refactor the ranges representation * Refactor try fast filter * Main work finished; left the handling of different numeric data types * buildRanges accepts field type * first working draft probably * add change log * accommodate geo distance agg * Fix test support all numeric types minus one on the upper range * [Refactor] range is lower inclusive, right exclusive * adding test * Adding test and refactor * refactor * add test * add test and update the compare logic in tree traversal * fix test, add random test * refactor to address comments * small potential performance update * fix precommit * refactor * refactor * set refresh_interval to -1 * address comment * address comment * address comment * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. --------- (cherry picked from commit 57fb50b) Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: kkewwei <kkewwei@163.com>
…pensearch-project#13865) * Refactor the ranges representation Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Refactor try fast filter Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Main work finished; left the handling of different numeric data types Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * buildRanges accepts field type Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * first working draft probably Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add change log Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * accommodate geo distance agg Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix test support all numeric types minus one on the upper range Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * [Refactor] range is lower inclusive, right exclusive Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * adding test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Adding test and refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * add test and update the compare logic in tree traversal Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix test, add random test Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor to address comments Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * small potential performance update Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * fix precommit Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * refactor Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * set refresh_interval to -1 Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * address comment Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Fix test To understand fully about the double and bigdecimal usage in scaled float field will take more time. Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> --------- Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
Description
Background
Previously we optimized date histogram aggregation by utilizing the index structure of date field — BKD tree/Points to provide the documents count of date histogram buckets results. We first build date ranges from the date histogram query input like daily interval histogram. Then we "query" the index of date for how many documents are in these date ranges.
Idea
Date is actually saved as numeric data of long data type. This leads to the idea that we can extend the optimization to the RangeAggregator which also perform ranges aggregation on numeric data.
Implementation
The core optimization algorithm remains the same. (Details #13317)
Note the algorithm has a hidden pre-condition: all the ranges are not overlapping (because date interval cannot produce overlapping date ranges)
Another difference is that range aggregation buildRange method won't be built from segment level match all path. The ranges are provided by user directly, not like date histogram needs to check the boundaries of shard or segment or accommodate top level range query. (Details #12073)
Changes
long[]
Possible Follow Ups
Benchmarks
However, considering this is already ~10ms. We can look into the reason later as a follow up. For the other workloads, big5, nyc, http, they are all even faster.
big5
noaa
Related Issues
Resolves #13531
Check List
New functionality has been documented.API changes companion pull request created.Public documentation issue/PR createdBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.