
Partitionable aggregations #21487

Closed
markharwood opened this issue Nov 11, 2016 · 16 comments

@markharwood (Contributor)

Currently users frequently run into memory/circuit-breaker issues trying to perform analytics on high-cardinality fields, e.g. when finding IP addresses that have had more than 3 sessions.
The combination of expensive aggs such as cardinality under fields with many terms has an explosive effect. Entity-centric indexing or collect_mode:breadth_first can help but aren't always a solution.
The proposal is that the terms agg should allow include clauses that help partition high-cardinality fields so that a client request can focus on just a subset of the overall data, e.g.:

"terms": {
  "field": "IP_ADDRESS",
  "include":{
	"partition":1,
	"of":20
  }

The client could then make repeated requests for partition 1, then 2, and so on. Internally the terms agg would filter out any term whose hash-modulo-N did not match the chosen partition number.
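For illustration, a minimal sketch of that partition test, assuming Lucene's murmur3 implementation (StringHelper.murmurhash3_x86_32); the class and method names here are hypothetical, not the final implementation:

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.StringHelper;

// Hypothetical sketch: accept a term iff its hash maps to the requested partition.
class PartitionFilter {
    private final int partition;     // "partition" in the proposed syntax
    private final int numPartitions; // "of" in the proposed syntax

    PartitionFilter(int partition, int numPartitions) {
        this.partition = partition;
        this.numPartitions = numPartitions;
    }

    boolean accept(BytesRef term) {
        int hash = StringHelper.murmurhash3_x86_32(term, 0);
        // floorMod keeps the result non-negative even when the hash is negative
        return Math.floorMod(hash, numPartitions) == partition;
    }
}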

The fuller "IP addresses with many sessions" example would then look like this (using a pipeline agg to remove uninteresting results):

{
  "aggs": {
    "anomalousIPs": {
      "terms": {
        "field": "IP_ADDRESS",
        "size": 10000,
        "order": {
          "numSessions": "desc"
        },
        "include": {
          "partition": 1,
          "of": 20
        }
      },
      "aggs": {
        "numSessions": {
          "cardinality": {
            "field": "session_id",
            "precision_threshold": 100
          }
        },
        "tooManySessions": {
          "bucket_selector": {
            "buckets_path": {
              "numSessions": "numSessions"
            },
            "script": "params.numSessions > 3"
          }
        }
      }
    }
  }
}

Users today could of course index hashed forms of values and query ranges of those hashes to achieve the same effect (perhaps more efficiently), but this new syntax is arguably less work for the client and works with existing indices; a sketch of that workaround follows below. Thoughts?
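For reference, a sketch of that index-time workaround; the hash choice and the ip_hash field name are made up for illustration. The client stores hashOf(value) in an extra numeric field (say ip_hash) when indexing, and each "partition" request then filters one slice of the hash space with a range query:

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.StringHelper;

class HashedFieldWorkaround {
    // Value to index alongside the original, e.g. in a hypothetical "ip_hash" field.
    static long hashOf(String value) {
        // widen to an unsigned 32-bit range so the slice arithmetic below stays simple
        return StringHelper.murmurhash3_x86_32(new BytesRef(value), 0) & 0xFFFFFFFFL;
    }

    // [from, to) slice of the hash space covered by one of numPartitions partitions.
    static long[] rangeFor(int partition, int numPartitions) {
        long span = (1L << 32) / numPartitions;
        long from = partition * span;
        long to = (partition == numPartitions - 1) ? (1L << 32) : from + span;
        return new long[] { from, to };
    }
}

Querying ranges rather than storing a partition number means the number of partitions can vary per request; the downside is the re-indexing, which is exactly what the proposed syntax avoids.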

@markharwood (Contributor, Author)

@jpountz we discussed this on FixItFriday but didn't reach a conclusion; we said we'd hold off for your input.

@jpountz (Contributor) commented Nov 14, 2016

This kind of feature would help compute exhaustive results, which can't be done with the current API. However it's not clear to me which aggregations would benefit from it in terms of memory usage (e.g. the example agg should work pretty well with breadth_first?). Currently I tend to see it more as a solution to #4915 (with less scope, which is needed anyway since pagination in general is not something we can implement) than to memory-usage issues.

Reading your proposal makes me wonder whether we could achieve the same result without modifying aggregations but by extending the current slice parameter to also work with non-scroll requests and non-numeric fields.

@markharwood (Contributor, Author) commented Nov 14, 2016

e.g. the example agg should work pretty well with breadth_first?

A breadth_first request would be ignored (or error?) because the terms agg relies on results from the child agg.

extending the current slice parameter to also work with non-scroll requests and non-numeric fields

It can be more complex than that; sometimes we are talking about multi-value fields, e.g. an exhaustive analysis of all tags used in StackOverflow articles.

without modifying aggregations

Aggs themselves wouldn't change - the proposal is isolated to adding another type of filter to the existing IncludeExclude class.

@markharwood (Contributor, Author)

I prototyped this in IncludeExclude and benchmarked a high-cardinality query (finding which of the 2.7m user accounts on StackOverflow look like they haven't been active since 2010). Each question doc has many user accounts associated with it. Example query:

GET stackq3/_search
{
   "size": 0,
   "aggs": {
      "accountRetirementCandidates": {
         "terms": {
            "field": "user",
            "size": 10000,
            "include": {
               "partition": 1,
               "of": 100
            },
            "order": {
               "lastVisit": "asc"
            }
         },
         "aggs": {
            "lastVisit": {
               "max": {
                  "field": "lastUpdateDate"
               }
            },
            "lastActiveBefore2010": {
               "bucket_selector": {
                  "buckets_path": {
                     "lastVisit": "lastVisit"
                  },
                  "script": "params.lastVisit<1262445274573l"
               }
            }
         }
      }
   }
}

Without term partitioning, getting an exhaustive list simply would not be feasible as a single request on the existing data model; it would require resorting to entity-centric indexing.
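The client-side harness is then just a loop over the partitions. A hedged sketch (Java 15+ text blocks, zero-based partition numbers assumed, JSON parsing and error handling elided; the bucket_selector is dropped for brevity):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PartitionWalker {
    public static void main(String[] args) throws Exception {
        int numPartitions = 100;
        HttpClient client = HttpClient.newHttpClient();
        for (int p = 0; p < numPartitions; p++) {
            // Same terms agg as the example above, one partition per request
            String body = """
                {"size":0,"aggs":{"accountRetirementCandidates":{
                  "terms":{"field":"user","size":10000,
                    "include":{"partition":%d,"of":%d},
                    "order":{"lastVisit":"asc"}},
                  "aggs":{"lastVisit":{"max":{"field":"lastUpdateDate"}}}}}}
                """.formatted(p, numPartitions);
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/stackq3/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
            // accumulate the returned buckets from response.body() here
        }
    }
}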

@jimczi (Contributor) commented Nov 17, 2016

A breadth_first request would be ignored (or error?) because the terms agg relies on results from the child agg.

The breadth_first mode would work; it's just that in this case the child agg (bucket_selector) would be executed with the terms aggregation, and the second round would compute the cardinality for the selected terms.

Isn't it possible to do this with a scripted terms agg? The execution time should be similar since you need to access the term (and not just the global ordinals) to compute the hash?

@markharwood (Contributor, Author)

the second round would compute the cardinality for the selected terms.

Breadth first is ignored in my first example because the top-level terms agg is sorted on the child numSessions cardinality agg. This code is where it decides NOT to do breadth_first on a child agg, because it sees the child agg is responsible for the sort order.

Isn't it possible to do this with a scripted terms aggs ?

Yes, but being a script:

  1. I expect it to be slower
  2. It needs to get hold of a decent hash algo e.g. MurmurHash
  3. It would need to work with multi-value fields (see my last StackOverflow example query)
  4. It doesn't have (faster) access to ordinals.

@jpountz (Contributor) commented Nov 17, 2016

I think Jim still made a good point that it would be nice to figure out whether we could do the partitioning differently so that we do not have to access values to know whether they are part of the current partition.

One way would be to use the ordinal range [partition*maxOrd/numPartitions, (partition+1)*maxOrd/numPartitions), even though it also has drawbacks, e.g. the partitions would change if some terms are added or removed before we have had time to retrieve data for all partitions.

Another way would be to use ranges of terms. For instance if there are 256 partitions in total, we could partition based on the first byte (we could figure out the ordinal range by calling lookupTerm on the min/max values). But this has a different issue: partitions can have very different sizes (especially in the worst case where the terms all share a common prefix, like URLs).

Something I like about this proposal in general is that hopefully it would solve some of the use-cases behind the "paging support for aggregations" request (#4915).
Based on the pros/cons of all the options, I think I would lean towards using ordinal ranges ([partition*maxOrd/numPartitions, (partition+1)*maxOrd/numPartitions)) for strings and hashing for numbers?
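Concretely, the ordinal-range test would be pure arithmetic, something like this sketch (names illustrative; maxOrd would come from the global ordinals map at search time):

class OrdinalRangePartition {
    // Partition i of N owns global ordinals in [i*maxOrd/N, (i+1)*maxOrd/N).
    static boolean accept(long globalOrd, long maxOrd, int partition, int numPartitions) {
        long from = partition * maxOrd / numPartitions;
        long to = (partition + 1) * maxOrd / numPartitions; // exclusive
        return globalOrd >= from && globalOrd < to;
    }
}

The integer division makes the numPartitions ranges tile [0, maxOrd) exactly, so every ordinal falls in exactly one partition.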

@markharwood (Contributor, Author)

I think we're on the same page with the ordinal support? Rough PR here: #21626

@jimczi (Contributor) commented Nov 17, 2016

Breadth first is ignored in my first example because the top-level terms agg is sorted on the child numSessions cardinality agg. This code is where it decides NOT to do breadth_first on a child agg, because it sees the child agg is responsible for the sort order.

Right, I missed the fact that the bucket_selector needs to access the result of the cardinality ;).

I think Jim still made a good point that it would be nice to figure out whether we could do the partitioning differently so that we do not have to access values to know whether they are part of the current partition.

I like the global ords proposal, but how can we ensure that maxOrd does not change between requests?

@jpountz (Contributor) commented Nov 17, 2016

I like the global ords proposal, but how can we ensure that maxOrd does not change between requests?

Indeed we can't. So this would be best-effort only, similarly to pagination when not using the scroll API.

@markharwood (Contributor, Author)

Note: my benchmark tests show there's a sweet spot in the number of partitions you select, because bigger numbers mean lower-memory requests and faster (individual) responses, but you have to make more requests to process all the data. The costs start to accumulate at high partition counts because there's a fixed-cost element to running each request: feeding all the terms through the Include filter. (Roughly, total cost is numPartitions times that fixed cost plus the data-dependent work, so once the per-partition work is small relative to the fixed cost, adding partitions only adds overhead.)

@jimczi (Contributor) commented Nov 17, 2016

Indeed we can't. So this would be best-effort only, similarly to pagination when not using the scroll API.

The problem (as with pagination) is that there is no way to check whether a response is valid or not. This makes the feature usable only on static indices; otherwise the results could be completely wrong.
Since this is intended to exhaust a terms aggregation with high cardinality, I think that precision should come before speed. Could we just rely on the term itself for the partitioning? I know it will be slower but at least it will always work. Maybe we can figure out afterwards how to speed things up, but the ordinal-based partitioning seems risky to me.

@markharwood (Contributor, Author)

The client can use this to avoid ordinals:

     "execution_hint":"map",

@jimczi (Contributor) commented Nov 17, 2016

@markharwood sorry, I jumped from this issue to the PR and realized that afterwards. Though I agree with Adrien here: we should be able to use the ordinals for the terms agg and the strings for the partitioning. That way the aggregation is still fast and always accurate? I really don't know how to use this feature if I need to ensure that my index is not updated in the meantime.
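Something like this sketch of the hybrid, as I understand the doc-values APIs (illustrative only): hash each unique term once up front to build an accepted-ordinals bitset, so the per-document hot path only ever tests ordinals:

import java.io.IOException;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LongBitSet;
import org.apache.lucene.util.StringHelper;

class AcceptedOrdinals {
    static LongBitSet build(SortedSetDocValues values, int partition, int numPartitions)
            throws IOException {
        long maxOrd = values.getValueCount();
        LongBitSet accepted = new LongBitSet(maxOrd);
        for (long ord = 0; ord < maxOrd; ord++) {
            // one string lookup and one hash per unique term, not per document
            BytesRef term = values.lookupOrd(ord);
            int hash = StringHelper.murmurhash3_x86_32(term, 0);
            if (Math.floorMod(hash, numPartitions) == partition) {
                accepted.set(ord);
            }
        }
        return accepted;
    }
}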

@markharwood (Contributor, Author)

Refresh problems aside, presumably there's a more fundamental issue: partitioning on global ordinals would give wrong results in a multi-shard system, because "global" only means spanning segments, not shards. The same term can have different global ordinals on different shards, so it could land in different partitions. For this reason we have to partition on values.

@jimczi (Contributor) commented Nov 18, 2016

@markharwood lol !

markharwood added a commit to markharwood/elasticsearch that referenced this issue Nov 24, 2016
…ions so that multiple requests can be done without trying to compute everything in one request.

Closes elastic#21487
markharwood added a commit that referenced this issue Nov 24, 2016
…ions so that multiple requests can be done without trying to compute everything in one request.

Closes #21487