Partitionable aggregations #21487
@jpountz we discussed this on FixItFriday but didn't reach a conclusion - said we'd hold off for your input
This kind of feature would help compute exhaustive results, which can't be done with the current API. However it's not clear to me which aggregations would benefit from it in terms of memory usage (e.g. the example agg should work pretty well with breadth_first?). Currently I tend to see it more as a solution to #4915 (with less scope, which is needed anyway since pagination in general is not something we can implement) than to memory-usage issues. Reading your proposal makes me wonder whether we could achieve the same result without modifying aggregations but extending the current …
A breadth_first request would be ignored (or error?) because it is reliant on results from the child agg.
It can be more complex than that - sometimes we are talking about multi-value fields e.g. an exhaustive analysis of all tags used in stackoverflow articles.
Aggs themselves wouldn't change - the proposal is isolated to adding another type of filter to the existing IncludeExclude class.
I prototyped this in IncludeExclude and benchmarked a high-cardinality query (finding which of the 2.7m user accounts on StackOverflow look like they haven't been active since 2010). Each question doc has many user accounts associated with it. Example query:
Without term partitioning, getting an exhaustive list simply would not be feasible as a single request on the existing data model and would require resorting to entity-centric indexing.
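The hash-modulo-N filter described in the proposal can be sketched as a simple membership test (a Python stand-in for the idea, not the actual Java IncludeExclude implementation):

```python
import zlib

def in_partition(term: str, partition: int, num_partitions: int) -> bool:
    """True if this term belongs to the requested partition.

    Stand-in for the proposed IncludeExclude filter: each term is
    deterministically assigned to exactly one of N partitions via
    hash-modulo-N, so requests for partitions 0..N-1 together cover
    the whole term set exactly once.
    """
    # zlib.crc32 is a stable hash, unlike Python's randomized hash()
    return zlib.crc32(term.encode("utf-8")) % num_partitions == partition

# Partitions are disjoint and exhaustive over any term set:
terms = [f"user_{i}" for i in range(1000)]
covered = [t for p in range(20) for t in terms if in_partition(t, p, 20)]
assert sorted(covered) == sorted(terms)
```

This is why repeated requests over all partitions yield an exhaustive result: every term lands in exactly one partition.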
The breadth_first would work, it's just that in this case the child agg (bucket_selector) would be executed with the terms aggregation and the second round would compute the cardinality for the selected terms. Isn't it possible to do this with a scripted terms agg? The execution time should be similar since you need to access the term (and not just the global ordinals) to compute the hash?
Breadth first is ignored in my first example because the top-level terms agg is sorted on the child numSessions cardinality agg. This code is where it decides NOT to do breadth_first when it sees a child agg is responsible for sorting.
Yes, but being a script …
I think Jim still made a good point that it would be nice to figure out whether we could do the partitioning differently so that we do not have to access values to know whether they are part of the current partition. One way would be to use the ordinal range. Another way would be to use ranges of terms: for instance, if there are 256 partitions in total, we could partition based on the first byte (we could figure out the ordinal range by calling …). Something I like about this proposal in general is that hopefully it would solve some of the use-cases behind the "paging support for aggregations" request (#4915).
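The terms-range alternative floated above could look something like this (a hypothetical sketch of the 256-partition first-byte scheme, not an actual implementation):

```python
def partition_by_first_byte(term: str) -> int:
    """Assign a term to one of 256 partitions using its first UTF-8 byte.

    Unlike hash-modulo partitioning, each partition is then a contiguous
    range of the sorted terms dictionary, so it could be mapped onto a
    contiguous global-ordinal range instead of hashing every value.
    """
    return term.encode("utf-8")[0]

# Sorting terms groups each partition into a contiguous run:
terms = sorted(["apple", "avocado", "banana", "blueberry", "cherry"])
parts = [partition_by_first_byte(t) for t in terms]
assert parts == sorted(parts)
```

The attraction is that a contiguous range can be resolved to ordinal bounds once per segment, rather than testing each value.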
I think we're on the same page with the ordinal support? Rough PR here: #21626
Right I missed the fact that the bucket selector needs to access the result of the cardinality ;).
I like the global ords proposal but how can we ensure that maxOrd does not change between requests?
Indeed we can't. So this would be best-effort only, similarly to pagination when not using the scroll API.
Note: from my benchmark tests there's a sweet spot to the number of partitions you select, because bigger numbers = lower-memory requests and faster (individual) responses, but you have to do more requests to process all the data. The costs start to accumulate with high partition numbers because there's a fixed-cost element to running each request, which is the feeding of all terms into the Include filter.
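That sweet spot can be illustrated with a toy cost model (all constants invented for illustration; `filter_cost` stands for the per-term overhead of feeding every term through the include filter on each request):

```python
def total_cost(num_partitions: int,
               num_terms: int = 2_700_000,
               filter_cost: float = 1.0,
               agg_cost: float = 5.0) -> float:
    """Toy model: every request pays a fixed cost proportional to the
    FULL term count (the include filter must test every term), plus
    real aggregation work for its own 1/N share of the terms."""
    per_request_fixed = num_terms * filter_cost
    per_request_agg = (num_terms / num_partitions) * agg_cost
    return num_partitions * (per_request_fixed + per_request_agg)

def per_request_memory(num_partitions: int,
                       num_terms: int = 2_700_000) -> float:
    """Rough proxy: buckets held in memory by any single request."""
    return num_terms / num_partitions

# More partitions -> lower memory per request, but higher total cost,
# because the fixed filtering cost is paid once per request.
assert per_request_memory(100) < per_request_memory(10)
assert total_cost(10) < total_cost(100)
```

The aggregation work totals the same regardless of N; only the repeated filtering cost grows, which is why very high partition counts stop paying off.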
The problem (like with pagination) is that there is no way to check if a response is valid or not. This makes this feature usable only on static indices; otherwise the results could be completely wrong.
The client can use this to avoid ordinals: …
@markharwood sorry, I jumped from this issue to the PR and realized that afterwards. Though I agree with Adrien here: we should be able to use the ordinals for the terms agg and the strings for the partitioning. This way the aggregation is still fast and always accurate? I really don't know how to use this if I need to ensure that my index is not updated meanwhile.
Refresh problems aside, presumably there's a more fundamental issue: partitioning on global ordinals would give wrong results in a multi-shard system, because "global" only means spanning segments, not shards. For this reason we have to partition on values.
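A small sketch of that shard problem (hypothetical shard contents): "global" ordinals are assigned per shard from that shard's sorted terms dictionary, so the same term can get different ordinals on different shards, whereas a hash of the value itself is the same everywhere:

```python
import zlib

def ordinals(shard_terms):
    """Per-shard 'global' ordinals: a term's position in that shard's
    sorted terms dictionary (global across segments, not shards)."""
    return {t: i for i, t in enumerate(sorted(set(shard_terms)))}

shard_a = ordinals(["alice", "bob", "carol"])
shard_b = ordinals(["bob", "dave"])

# Same term, different ordinal per shard: an ordinal-range partition
# would route "bob" into different partitions on different shards.
assert shard_a["bob"] != shard_b["bob"]

def hash_partition(term: str, num_partitions: int) -> int:
    # Hashing the value itself is independent of shard contents,
    # so every shard agrees on the term's partition.
    return zlib.crc32(term.encode("utf-8")) % num_partitions
```

This is why value-based partitioning stays correct across shards even though it costs a value lookup per term.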
@markharwood lol!
…ions so that multiple requests can be done without trying to compute everything in one request. Closes #21487
Currently users frequently run into memory/circuit-breaker issues trying to perform analytics on high cardinality fields e.g. when finding IP addresses that have had more than 3 sessions.
The combination of expensive aggs such as `cardinality` under fields with many terms has an explosive effect. Entity-centric indexing or `collect_mode: breadth_first` can help but aren't always a solution.

This proposal is that the `terms` agg should allow `include` clauses that help partition high-cardinality fields so that a client request can focus on just a subset of the overall data, i.e.: …

The client could then make repeated requests for partition 1, then 2, etc. Internally the `terms` agg would filter out any term whose hash-modulo-N did not match the chosen partition number.

The fuller "IP addresses with many sessions" example would then look like this (using pipeline aggs to remove uninteresting results): …
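A hedged reconstruction of what such a request body might look like (the field names `ip` and `session_id`, the agg names, and the thresholds are invented for illustration; only the partitioned `include` shape reflects the proposal):

```python
import json

# Hypothetical partitioned request built as a Python dict.
request = {
    "size": 0,
    "aggs": {
        "ips": {
            "terms": {
                "field": "ip",
                # Proposed syntax: only consider terms whose
                # hash % num_partitions == partition.
                "include": {"partition": 0, "num_partitions": 20},
                "size": 10000,
            },
            "aggs": {
                "numSessions": {"cardinality": {"field": "session_id"}},
                "keep_busy_ips": {
                    # Pipeline agg pruning buckets with few sessions.
                    "bucket_selector": {
                        "buckets_path": {"sessions": "numSessions"},
                        "script": "params.sessions > 3",
                    }
                },
            },
        }
    },
}

# The client would loop partition = 0..19 to cover every IP once.
assert json.loads(json.dumps(request)) == request
```

Each of the twenty requests then holds only roughly 1/20th of the IP buckets in memory, which is the whole point of the proposal.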
Users today could of course create hashed forms of indexed values in the index and query ranges of those values to achieve the same effect (perhaps more efficiently), but this new syntax is arguably less work for the client and works with existing indices. Thoughts?