Update SummaryDataPoint percentile comment #127

Merged
merged 4 commits into from
Mar 15, 2020

Contributor

@MrAlias MrAlias commented Mar 13, 2020

Include information on the way we are fitting the MinMaxSumCount aggregation into the Summary metric kind.

Relates to #125

Comment on lines 344 to 345
// To support the Min and Max values of a MinMaxSumCount aggregation the
// following conventions are used:
Member

I would not make a direct reference to MinMaxSumCount:

To record Min and Max values following conventions are used:
// - The 100th percentile is equivalent to the maximum value observed.
// - The 0th percentile is equivalent to the minimum value observed.

Also, I couldn't find a good source confirming that the 0th and 100th percentiles are mathematically incorrect (I found some sources saying that they may not be generally accepted), so I would not put a strong statement here without good proof.
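
As a minimal sketch of the convention suggested in this comment, the snippet below records a minimum and maximum as the 0th and 100th percentile entries. The names `ValueAtPercentile`, `MinMaxSumCount`, `aggregate`, and `to_percentile_values` are hypothetical stand-ins chosen for illustration; they are not the generated OTLP types or any OpenTelemetry SDK API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueAtPercentile:
    # Hypothetical (percentile, value) pair; not the real OTLP message.
    percentile: float
    value: float


@dataclass
class MinMaxSumCount:
    # Hypothetical aggregate holding the four statistics discussed in this PR.
    minimum: float
    maximum: float
    total: float
    count: int


def aggregate(observations: List[float]) -> MinMaxSumCount:
    """Compute min, max, sum, and count over a non-empty list of observations."""
    if not observations:
        raise ValueError("at least one observation is required")
    return MinMaxSumCount(
        minimum=min(observations),
        maximum=max(observations),
        total=sum(observations),
        count=len(observations),
    )


def to_percentile_values(agg: MinMaxSumCount) -> List[ValueAtPercentile]:
    """Apply the convention: 0th percentile = minimum, 100th percentile = maximum."""
    return [
        ValueAtPercentile(percentile=0.0, value=agg.minimum),
        ValueAtPercentile(percentile=100.0, value=agg.maximum),
    ]
```

Under this convention a consumer that only understands percentile entries can still recover the observed minimum and maximum by looking up the 0 and 100 entries, without knowing which aggregation produced them.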

Contributor Author

I'm not entirely sure why we wouldn't include a reference to the MinMaxSumCount aggregation as OpenTelemetry implementations will expect guidance from the OTLP as to what should be done with this native aggregation. It seems like I've missed the target audience of who will be using this OTLP. Who do you envision using this?

As for the mathematical correctness, from the linked ticket in the comment:

the 0th percentile is the value which 0% of events occurred, whereas the minimum is the minimal value where at least 1 event occurred

The incorrectness follows from the definition of a percentile.

Member

@bogdandrutu bogdandrutu Mar 14, 2020

Where is the source of the comment in the issue?

What I found so far:

So I am not sure where the sentence "0th percentile is mathematically incorrect" comes from.

Member

@MrAlias I don't care that you include a reference to that, but what about DDSketch? Do we include a reference to that as well? Or any other algorithm that can produce percentiles?

Contributor Author

I don't care that you include a reference to that, but what about DDSketch? Do we include a reference to that as well? Or any other algorithm that can produce percentiles?

Ah! Makes sense. I'll remove it.

With regard to the source of the comment in the issue, it was primarily based on my prior work with percentiles, but the Wikipedia article you linked also defines them the same way:

A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found. Similarly, 80% of the observations are found above the 20th percentile.

From that it follows that the 0th percentile is the value below which 0% of the observations may be found, meaning zero events.

Again, I was pulling the definition of a minimum from past work, but as defined by Wikipedia:

In statistics, the sample maximum and sample minimum, also called the largest observation and smallest observation, are the values of the greatest and least elements of a sample.

From this definition it follows that at least one event had this value.
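
To make the two quoted definitions concrete, here is a small sketch that counts observations strictly below a value and observations equal to it. It only illustrates the "value below which" reading quoted above and is not meant to settle which percentile definition is correct; the helper names and the sample data are made up for illustration.

```python
from typing import List


def fraction_strictly_below(observations: List[float], value: float) -> float:
    """Fraction of observations that are strictly less than value."""
    return sum(1 for x in observations if x < value) / len(observations)


def count_equal(observations: List[float], value: float) -> int:
    """Number of observations exactly equal to value."""
    return sum(1 for x in observations if x == value)


# Made-up sample used only for this illustration.
sample = [2.0, 5.0, 5.0, 9.0]
smallest = min(sample)
largest = max(sample)

# 0% of the observations fall strictly below the minimum ...
below_min = fraction_strictly_below(sample, smallest)  # 0.0
# ... yet at least one observation has exactly that value.
at_min = count_equal(sample, smallest)  # 1
# Fewer than 100% of the observations fall strictly below the maximum.
below_max = fraction_strictly_below(sample, largest)  # 0.75
```

With a strict "below" reading, the minimum has 0% of observations beneath it while still being an observed value, and the maximum never has 100% of observations beneath it. That gap is why treating the 0th and 100th percentiles as min and max is described here as a convention.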

Contributor Author

I also just realized that the main reason you wanted me to take it out might be the strength of the statement rather than its correctness. Sorry, I think I missed that on the first read through. Updated to the suggested language.

@MrAlias MrAlias force-pushed the update-metric-docs branch from 3f5e3a1 to 13c8eba March 14, 2020 00:49
@bogdandrutu
Member

Please rebase so I can merge.

@bogdandrutu bogdandrutu merged commit caed74b into open-telemetry:master Mar 15, 2020
@MrAlias MrAlias deleted the update-metric-docs branch March 15, 2020 19:40