Add a new quantile histogram aggregation for numeric fields #50386

agirbal · 2019-12-19T17:21:45Z

This issue is related to #31828 and to some extent #28993. It would be more useful for most of our use cases than #31828 (cc @pcsanwald).
I talked about this feature to @smayzak and @AlonaNadler a bit.

Problem: when building a histogram over a numeric value (on the X-axis), the documents are very often concentrated in a tiny portion of the histogram. A common example: if you plot, say, "user request latency" for a production system, 90+% of the documents are going to be concentrated in the 1st bucket. It is a long-tail problem that is common to most production datasets. Filtering out the higher values is very tedious, and you still end up with a distribution that is not conducive to any analysis or conclusions.

Ideal solution: most data analysis (that we base decisions on) instead uses a quantile distribution on the X-axis, meaning that each bucket represents an equal portion of the data. For example, the first bucket would be the 10% of users with the best "request latency" (call it p0-10), the next would be the 10-20% best (p10-20), etc., and the last bucket would be the 10% of users with the worst performance (p90-100). In turn this lets the operator do very clear analysis: "this change in my software is hurting performance by 5% for my 10% best connected users but improves it by 15% for my p90 users, so it's a very positive change." The buckets could either each cover an equal portion of the dataset or, better, you could customize the ranges as percentile ranks, just like you do in the percentiles value function.
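To make the difference concrete, here is a minimal sketch in plain Python (no Elasticsearch involved) that counts documents per bucket with fixed-width buckets and with quantile buckets. The function names and the `latencies_ms` sample are made up for illustration; the sample is synthetic long-tail data, not real measurements.

```python
from bisect import bisect_right
from statistics import quantiles


def fixed_width_counts(values, n_buckets):
    """Count values per bucket when the X-axis is split into equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0  # avoid a zero width if all values are equal
    counts = [0] * n_buckets
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts


def quantile_counts(values, n_buckets):
    """Count values per bucket when the cut points are quantiles of the data."""
    # n_buckets - 1 cut points; "inclusive" treats the data as the full population.
    cuts = quantiles(values, n=n_buckets, method="inclusive")
    counts = [0] * n_buckets
    for v in values:
        # A value equal to a cut point goes to the bucket above it.
        counts[bisect_right(cuts, v)] += 1
    return counts


# Synthetic long-tail sample: 95 fast requests and 5 very slow ones.
latencies_ms = [10 + i for i in range(95)] + [5000, 8000, 12000, 20000, 30000]

print("fixed width:", fixed_width_counts(latencies_ms, 10))
print("quantile:   ", quantile_counts(latencies_ms, 10))
```

With data shaped like this, the fixed-width version puts almost everything in the first bucket, while the quantile version spreads the documents roughly evenly, which is the behaviour this issue asks for.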

Workaround: as suggested by @jpountz, you can make a pre-flight request to ES to obtain the quantile bucket bounds, then make a second request for a standard histogram with those known buckets. I have done this and it works, but it is extremely cumbersome and not really a viable solution beyond a fun experiment. I had to create a complex HTML form to pick the fields, the percentiles, the function to apply to the Y-axis, etc., then hack together a complex URL query string to generate the Kibana histogram, which is guaranteed to break. From there, the display in Kibana is not really shareable: you can't change the time window or any filter without redoing the whole thing, because the buckets need to be recalculated.
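For reference, the two requests of this workaround can be sketched roughly as below. This is only an illustration under assumptions, not the exact queries I used: it builds the request bodies as plain dicts with the `percentiles` aggregation for the first pass and, as one way to get "known buckets", a `range` aggregation for the second pass. The helper names, the aggregation names `bounds` and `quantile_buckets`, and the field `latency_ms` are hypothetical, and sending the bodies to the `_search` endpoint is left out.

```python
from typing import List


def percentile_request(field: str, percents: List[float]) -> dict:
    """Pass 1: ask ES for the field values at the given percentile ranks."""
    return {
        "size": 0,
        "aggs": {"bounds": {"percentiles": {"field": field, "percents": percents}}},
    }


def cuts_from_response(response: dict) -> List[float]:
    """Extract sorted, de-duplicated cut points from the pass-1 response."""
    values = response["aggregations"]["bounds"]["values"]
    # ES may return null for a percentile when no documents match; skip those.
    return sorted({v for v in values.values() if v is not None})


def range_request(field: str, cuts: List[float]) -> dict:
    """Pass 2: one range bucket between each pair of consecutive cut points.

    The first and last buckets are left open-ended so that no document
    falls outside the buckets.
    """
    edges = [None] + list(cuts) + [None]
    ranges = []
    for lo, hi in zip(edges, edges[1:]):
        bucket = {}
        if lo is not None:
            bucket["from"] = lo  # inclusive lower bound
        if hi is not None:
            bucket["to"] = hi    # exclusive upper bound
        ranges.append(bucket)
    return {
        "size": 0,
        "aggs": {"quantile_buckets": {"range": {"field": field, "ranges": ranges}}},
    }
```

Both requests have to carry the same query and time filter, which is why any change to the time window or filters forces the bounds to be recomputed. Also note that the percentiles aggregation is approximate, so the resulting buckets hold only roughly equal shares of the documents.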

Note there are already Kibana tickets about this: elastic/kibana#3905 and elastic/kibana#3757.
But for this to work seamlessly in Kibana, it really seems that ES should support it as a native aggregation.
Thanks much!

elasticmachine · 2019-12-20T00:28:31Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

talevy · 2020-07-23T15:10:41Z

It would be great to do this in two passes, on sorted data. Blocked on multi-pass aggregation support.

elasticsearchmachine · 2024-06-10T20:18:15Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

Tim-Brooks added the :Analytics/Aggregations Aggregations label Dec 20, 2019

agirbal mentioned this issue Feb 24, 2020

Cumulative distribution function elastic/kibana#3905

Closed

rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020

imotov added the team-discuss label Jun 26, 2020

imotov removed the team-discuss label Jul 23, 2020

not-napoleon mentioned this issue Jul 23, 2020

Feature request: Aggregation to produce buckets with a fixed number of documents in them #50120

Open

polyfractal mentioned this issue Jul 28, 2020

Freedman-Diaconis histogram #60312

Closed

wchaparro added the >feature label May 5, 2022

wchaparro added the >enhancement label Jun 10, 2024
