Unify precomputation of aggregations behind a common API #16733
❌ Gradle check result for 4d5c32b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/search/aggregations/AggregatorBase.java
Regarding the implementation, I have one more alternative that I think is worth discussing: how about bringing this abstraction into ContextIndexSearcher itself?
Basically, if we already have precomputed aggregations, we assign it as EarlyTerminationCollector. What I'm thinking about is cases with sub-aggregations that we can precompute, which is highly relevant for star-tree precomputation (e.g. #16674), and whether a dedicated abstraction for star-tree preCompute in ContextIndexSearcher would make more sense or not.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16733      +/-   ##
============================================
- Coverage     72.41%   72.32%   -0.09%
- Complexity    65626    65712      +86
============================================
  Files          5306     5319      +13
  Lines        304927   305722     +795
  Branches      44257    44348      +91
============================================
+ Hits         220804   221107     +303
- Misses        66007    66573     +566
+ Partials      18116    18042      -74
@jainankitk -- you're probably the maintainer (other than me) with the most context into this change. What do you think? |
We've had a series of aggregation speedups that use the same strategy: instead of iterating through documents that match the query one-by-one, we can look at a Lucene segment and compute the aggregation directly (if some particular conditions are met). In every case, we've hooked that into custom logic that hijacks the getLeafCollector method and throws CollectionTerminatedException. This creates the illusion that we're implementing a custom LeafCollector, when really we're not collecting at all (which is the whole point). With this refactoring, the mechanism (hijacking getLeafCollector) is moved into AggregatorBase. Aggregators that have a strategy to precompute their answer can override tryPrecomputeAggregationForLeaf, which is expected to return true if they managed to precompute. This should also make it easier to keep track of which aggregations have precomputation approaches (since they override this method). Signed-off-by: Michael Froh <froh@amazon.com>
Not sure why I added this, when the existing implementation didn't have it. That said, we *should* call finishLeaf() before precomputing the current leaf. Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Michael Froh <froh@amazon.com>
c3897a0 to 19a40cc
@expani, @sandeshkr419 -- I resolved conflicts with your recent star-tree changes. Can you please take a look? |
❌ Gradle check result for 19a40cc: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
One high level class I see missing among the metric aggregators is |
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java
server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java
Signed-off-by: Michael Froh <froh@amazon.com>
4ac8bcb to caceb62
Signed-off-by: Michael Froh <froh@amazon.com>
* Unify precomputation of aggregations behind a common API

  We've had a series of aggregation speedups that use the same strategy: instead of iterating through documents that match the query one-by-one, we can look at a Lucene segment and compute the aggregation directly (if some particular conditions are met).

  In every case, we've hooked that into custom logic that hijacks the getLeafCollector method and throws CollectionTerminatedException. This creates the illusion that we're implementing a custom LeafCollector, when really we're not collecting at all (which is the whole point).

  With this refactoring, the mechanism (hijacking getLeafCollector) is moved into AggregatorBase. Aggregators that have a strategy to precompute their answer can override tryPrecomputeAggregationForLeaf, which is expected to return true if they managed to precompute.

  This should also make it easier to keep track of which aggregations have precomputation approaches (since they override this method).

  Signed-off-by: Michael Froh <froh@amazon.com>

* Remove subaggregator check from CompositeAggregator

  Not sure why I added this, when the existing implementation didn't have it. That said, we *should* call finishLeaf() before precomputing the current leaf.

  Signed-off-by: Michael Froh <froh@amazon.com>

* Resolve conflicts with star-tree changes

  Signed-off-by: Michael Froh <froh@amazon.com>

* Skip precomputation when valuesSource is null

  Signed-off-by: Michael Froh <froh@amazon.com>

* Add comment as suggested by @bowenlan-amzn

  Signed-off-by: Michael Froh <froh@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>
(cherry picked from commit 2847695)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@msfroh Since this change is not a feature update, should we create a One major advantage I see in backporting to 2.19 is that if we later need to backport any critical bug fixes to 2.19, we can do so easily without having to make too many manual changes. Thoughts? cc - @rishabh6788 (2.19 Release Manager)
That's a good question. Part of me says, "Well, I missed the 2.19 cut-off, so too bad". On the other hand, your argument about avoiding merge conflicts is also relevant. I'll defer to @rishabh6788's judgement. |
Discussed with @rishabh6788 offline. We agreed to include this for the aforementioned reason. Adding up
Description
We've had a series of aggregation speedups that use the same strategy: instead of iterating through documents that match the query one-by-one, we can look at a Lucene segment and compute the aggregation directly (if some particular conditions are met).
In every case, we've hooked that into custom logic that hijacks the getLeafCollector method and throws CollectionTerminatedException. This creates the illusion that we're implementing a custom LeafCollector, when really we're not collecting at all (which is the whole point).
With this refactoring, the mechanism (hijacking getLeafCollector) is moved into AggregatorBase. Aggregators that have a strategy to precompute their answer can override tryPrecomputeAggregationForLeaf, which is expected to return true if they managed to precompute.
This should also make it easier to keep track of which aggregations have precomputation approaches (since they override this method).
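To make the shape of the refactoring concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use Lucene or OpenSearch classes: Segment, LeafCollector, CollectionTerminatedException, doGetLeafCollector, and the search driver are hypothetical stand-ins written for this illustration, and the method signatures are assumptions rather than the real ones. Only the names AggregatorBase, getLeafCollector, and tryPrecomputeAggregationForLeaf come from the description.

```java
import java.util.List;

final class PrecomputeSketch {

    /** Local stand-in for Lucene's CollectionTerminatedException (assumption: same role). */
    static final class CollectionTerminatedException extends RuntimeException {}

    /** Hypothetical segment: per-document values plus an optional precomputed max. */
    static final class Segment {
        final long[] values;
        final Long precomputedMax; // null when the segment cannot answer directly

        Segment(long[] values, Long precomputedMax) {
            this.values = values;
            this.precomputedMax = precomputedMax;
        }
    }

    /** Hypothetical per-document collector. */
    interface LeafCollector {
        void collect(int doc);
    }

    /** The "hijack getLeafCollector" mechanism lives in the base class, once. */
    abstract static class AggregatorBase {
        final LeafCollector getLeafCollector(Segment segment) {
            if (tryPrecomputeAggregationForLeaf(segment)) {
                // Signal that this segment needs no per-document collection.
                throw new CollectionTerminatedException();
            }
            return doGetLeafCollector(segment);
        }

        /** Default: no precomputation strategy. Subclasses override to opt in. */
        protected boolean tryPrecomputeAggregationForLeaf(Segment segment) {
            return false;
        }

        /** Hypothetical name for the regular per-document path. */
        protected abstract LeafCollector doGetLeafCollector(Segment segment);
    }

    /** Toy max aggregation that can sometimes answer from segment-level data. */
    static final class MaxAggregator extends AggregatorBase {
        long max = Long.MIN_VALUE;
        int docsVisited = 0;

        @Override
        protected boolean tryPrecomputeAggregationForLeaf(Segment segment) {
            if (segment.precomputedMax == null) {
                return false; // conditions not met, fall back to collecting
            }
            max = Math.max(max, segment.precomputedMax.longValue());
            return true;
        }

        @Override
        protected LeafCollector doGetLeafCollector(Segment segment) {
            return doc -> {
                docsVisited++;
                max = Math.max(max, segment.values[doc]);
            };
        }
    }

    /** Hypothetical driver playing the searcher's role: skip segments that terminate early. */
    static void search(AggregatorBase aggregator, List<Segment> segments) {
        for (Segment segment : segments) {
            LeafCollector collector;
            try {
                collector = aggregator.getLeafCollector(segment);
            } catch (CollectionTerminatedException e) {
                continue; // segment was answered by precomputation
            }
            for (int doc = 0; doc < segment.values.length; doc++) {
                collector.collect(doc);
            }
        }
    }

    public static void main(String[] args) {
        MaxAggregator aggregator = new MaxAggregator();
        search(aggregator, List.of(
            new Segment(new long[] {3, 9, 4}, 9L),   // answered without visiting documents
            new Segment(new long[] {7, 12}, null))); // collected document by document
        System.out.println("max=" + aggregator.max + " docsVisited=" + aggregator.docsVisited);
        // prints "max=12 docsVisited=2"
    }
}
```

In this sketch an aggregator opts in simply by overriding tryPrecomputeAggregationForLeaf, so searching for overrides of that one method would list every aggregation with a precomputation path, which matches the maintainability benefit mentioned above.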
Related Issues
N/A
Check List
* Functionality includes testing.
* API changes companion pull request created, if applicable.
* Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.