-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add comments about StartTimeUnixNano #295
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but I'm requested two changes:
- Make it explicitly clear this algorithm is an outline. Possibly include a link to a specification section where we can go into more details and justification.
- We should be explicit with identity of metrics. At a minimum "resource" should be included in your key for the map.
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
@jsuereth and @tigrannajaryan please take another look at 03b2d36, I merged another commit clarifying how StartTimeUnixNano MAY be used with Gauges and Summaries for overlap detection. @bogdandrutu please review. |
@open-telemetry/specs-metrics-approvers PTAL. |
// and with it prescribes a solution for filling in StartTimeUnixNanos | ||
// with as much information as possible to assist consumers with | ||
// calculating rates and detecting overlap. This process can be | ||
// applied to Sum and Histogram points with cumulative aggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot apply that unless you know:
- Sum is monotonic
- Histogram Sum is monotonic (which is something that we don't know, and have issue Is knowing that a Histogram/Summary sum is monotonic important? #187), which I think was not resolved, because we need at least to add a comment that the sum is "counter".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT For histograms we should use the "count" as the counter and all other bins values are counters, so we are good on that.
But the question about #187 still stays, it says that is resolved in a manner but I don't see that in the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to adding documentation on how to handle things as we discussed in #187. Do you want that in a separate PR (I can do that) or we can update this PR with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate PR is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to separate the topic of "reset" from the topic of computing a rate. The notion of a reset I'm trying to develop has to do with eliminating overlapping series that may have been accidentally written as well as calculating correct rates.
The logic here applies a StartTime to all the kinds of points you might find in a Prometheus RemoteWrite output (or WAL), for example, and the outcome is an OTLP stream that is not overlapping with itself as well as encodes information about reset semantics.
This formalizes the Prometheus heuristic for resetting counters in a way that lets OTLP points be written out of order, but it also ensures that two Prometheus servers configured to write the same series, converted into OTLP this way would be detected as overlapping.
// Data points with/ the 0 value for TimeUnixNano SHOULD be rejected | ||
// by consumers. | ||
// | ||
// # StartTimeUnixNano |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of both getting this PR in, and ensuring information is easily found, Would It make sense to move the depiction of unbroken sequences and start time synthesis into the DataModel specificaiton (with a link from here) rather than in the proto file?
a. We may evolve/improve the description.
b. This isn't a great medium for delivering "why" content.
My mantra for this file is "What, not why", and for the DataModel specification is "Why then What" (i.e. Justification, then Specification semantics).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK; will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 @jsuereth also we make sure we don't duplicate specification and get one of them outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also remove from line 134 to 154 for the moment. We can add it later.
This follows the discussion from 4/20/21, notes here.
CC: @vishiy @jsuereth @open-telemetry/specs-metrics-approvers
Resolves #292
Resolves #289