-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify MinMaxLastSumCount aggregation be exposed as fields of SummaryDataPoint #170
Conversation
I feel that it is a small conflict between this and #171. Do you want to end up with quantiles (0,1) plus min and max? |
Yes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are some conflicts to resolve, but the change itself looks good
@jmacd I talked to someone from Splunk (good at math and statistics) to help us clarify if quantiles 0 and 1 can be treated as min and max. Will wait Monday to have an official answer, but he pointed me to some examples where this is already happening:
Will wait to chat more with him Monday and ask him to post an answer here. Hope you don't feel bad that I asked for some external help. |
@bogdandrutu That's great. I find the definition of quantile to be quite confusing, but it's the same as the definition for percentile. A question on this topic was raised in the SDK specification draft much earlier, and I did some research: |
More justification related to min/max value encoding: #171 (comment) |
Couple of suggestions:
|
While it's true the extreme quantiles are the only ones that can be computed exactly by aggregating those quantiles calculated separately on the elements of a partition, in general aggregated quantiles do possess a meaning. More precisely, if an ordered set S is expressed as a disjoint union of (ordered) subsets S_1, ..., S_n, and a particular quantile (e.g., P90) is estimated on each, producing values q(S_1), ..., q(S_n), then the range [min(q(S_1), ..., q(S_n)), max(q(S_1), ..., q(S_n))] is guaranteed to contain the true quantile (i.e., that of S itself). It just so happens that for q=0 and q=1 we can do a little better and ignore most of the range. A sample quantile is based on one (in rank-based methods) or two (if one wishes to interpolate) order statistics. Hyndman and Fan ("Sample Quantiles in Statistical Packages") contains a theoretical discussion of various approaches. There is no ambiguity about which order statistic should participate for q=0 or q=1. |
This corresponds with OTEP open-telemetry/oteps#117.