Update SummaryDataPoint percentile comment #127
Include information on the way we are fitting the `MinMaxSumCount` aggregation into the `Summary` metric kind.
// To support the Min and Max values of a MinMaxSumCount aggregation the
// following conventions are used:
I would not make a direct reference to `MinMaxSumCount`:
To record Min and Max values, the following conventions are used:
// - The 100th percentile is equivalent to the maximum value observed.
// - The 0th percentile is equivalent to the minimum value observed.
Also, I couldn't find a good source confirming that the 0th and 100th percentiles are not mathematically correct (I found some sources saying that they may not be generally accepted), so I would not make such a strong statement here without good proof.
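For readers following the thread, the convention under discussion can be sketched in a few lines of Python. This is only an illustration: the `MinMaxSumCount` dataclass, its field names, and the `aggregate` and `to_percentile_values` helpers are hypothetical and are not part of OTLP or any OpenTelemetry SDK.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class MinMaxSumCount:
    """Hypothetical stand-in for the result of a MinMaxSumCount aggregation."""
    minimum: float
    maximum: float
    total: float
    count: int


def aggregate(values: Iterable[float]) -> MinMaxSumCount:
    """Fold raw observations into min, max, sum, and count."""
    observed = list(values)
    if not observed:
        raise ValueError("at least one observation is required")
    return MinMaxSumCount(min(observed), max(observed), sum(observed), len(observed))


def to_percentile_values(agg: MinMaxSumCount) -> List[Tuple[float, float]]:
    """Apply the convention from the proposed comment:
    0th percentile <- minimum, 100th percentile <- maximum."""
    return [(0.0, agg.minimum), (100.0, agg.maximum)]


agg = aggregate([3.0, 1.0, 2.0])
print(to_percentile_values(agg))  # [(0.0, 1.0), (100.0, 3.0)]
```

The sum and count would travel in the data point's own fields; only the minimum and maximum need the percentile convention.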
I'm not entirely sure why we wouldn't include a reference to the `MinMaxSumCount` aggregation, as OpenTelemetry implementations will expect guidance from OTLP as to what should be done with this native aggregation. It seems like I've misjudged the target audience for OTLP. Who do you envision using this?
As for the mathematical correctness, from the linked ticket in the comment:
the 0th percentile is the value below which 0% of events occurred, whereas the minimum is the minimal value where at least 1 event occurred
The incorrectness follows from the definition of a percentile.
Where is the source of the comment in the issue?
What I found so far:
- https://www.quora.com/How-do-you-calculate-the-0th-and-the-100th-percentile-of-a-group-of-set-of-numbers
- Even Wikipedia mentions the 0th and 100th percentiles: "Note that in theory the 0th percentile falls at negative infinity and the 100th percentile at positive infinity, although in many practical applications, such as test results, natural lower and/or upper limits are enforced." https://en.wikipedia.org/wiki/Percentile
So I am not sure where the sentence "0th percentile is mathematically incorrect" comes from.
@MrAlias I don't care that you include a reference to that, but what about DDSketch? Do we include a reference to that as well? Or any other algorithm that can produce percentiles?
I don't care that you include a reference to that, but what about DDSketch? Do we include a reference to that as well? Or any other algorithm that can produce percentiles?
Ah! Makes sense. I'll remove it.
With regard to the source of the comment in the issue, it was primarily based on my prior work with percentiles, but the Wikipedia article you linked also defines them the same way:
A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found. Similarly, 80% of the observations are found above the 20th percentile.
From that it follows that the 0th percentile is the value below which 0% of the observations may be found, meaning zero events.
Again, I was pulling the definition of a minimum from past work, but as defined by Wikipedia:
In statistics, the sample maximum and sample minimum, also called the largest observation and smallest observation, are the values of the greatest and least elements of a sample.
From this definition it follows that at least one event had this value.
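A small Python sketch may make the two quoted definitions concrete. It assumes the strict "value below which" reading of a percentile from the Wikipedia quote; the sample data and the helper name `fraction_strictly_below` are made up for illustration.

```python
from typing import Sequence


def fraction_strictly_below(observations: Sequence[float], value: float) -> float:
    """Fraction of observations that are strictly less than `value`."""
    return sum(1 for x in observations if x < value) / len(observations)


obs = [4.0, 7.0, 7.0, 10.0]

# 0% of observations fall below the minimum, yet the minimum itself was observed.
print(fraction_strictly_below(obs, min(obs)))  # 0.0
print(min(obs) in obs)                         # True

# Only 75% of observations fall strictly below the maximum in this sample.
print(fraction_strictly_below(obs, max(obs)))  # 0.75
```

Under this strict reading, any value at or below the minimum has 0% of observations beneath it, so the minimum is one such value that happens to have been observed. That is why treating it as the 0th percentile is a convention rather than an identity.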
I also just realized it might not be so much the correctness of the statement as it is the strength of it which is the main reason you wanted me to take it out. Sorry, I think I missed that on the first read through. Updated to the suggested language.
3f5e3a1 to 13c8eba
Please rebase so I can merge.
Relates to #125