-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add observable (async) counter to the metrics API #1590
Add observable (async) counter to the metrics API #1590
Conversation
1fa59f9
to
1264dcb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few non-blocking comments.
specification/metrics/new_api.md
Outdated
@@ -319,6 +421,14 @@ for the interaction between the API and SDK. | |||
* A value | |||
* [`Attributes`](../common/common.md#attributes) | |||
|
|||
## Compatibility | |||
|
|||
All the metrics components SHOULD allow new APIs to be added to existing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. We may need to provide guidance here during implementations, but glad to see this called out.
|
||
Example uses for `ObservableCounter`: | ||
|
||
* [CPU time](https://wikipedia.org/wiki/CPU_time), which could be reported for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think It''d be good to call out use cases for where "ObservableCounter" is not good for tracking CPU time.
E.g. in a multi-tenant scenario where you want to track CPU usage / user, you'd actually want some kind of sycnhronous instrument that can report on request-level usage with an identified user.
Observable / Async metrics are really only useful for context-less metrics (those that cannot interact with other telemetry). I'd like to call that out here for folks deciding between Async/Sync as I think Otel really shines in the Sync scenario w/ Context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, but this is tricky because you don't want users that report cpu_usage (not per request) to use the sync version and believe that is better :)
|
||
Example uses for `ObservableCounter`: | ||
|
||
* [CPU time](https://wikipedia.org/wiki/CPU_time), which could be reported for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, but this is tricky because you don't want users that report cpu_usage (not per request) to use the sync version and believe that is better :)
f69e4c4
to
b06a5d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are any of the open comments actually un-resolved at this point?
So far all the outstanding/blocking comments are resolved. There are some open discussions (e.g. nice to have some high level document describing why we want to distinguish monotonic vs. non-monotonic, and providing examples/suggestions on when to use sync vs. async instrumentation), these are non-blocking and can be addressed in separate PRs. |
### CounterFunc | ||
|
||
`CounterFunc` is an asynchronous Instrument which reports | ||
[monotonically](https://wikipedia.org/wiki/Monotonic_function) increasing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who do we believe has the onus to enforce that the values are monotonic? Is it undefined behavior if the callback returns a value that is smaller than the one it returned on the last invocation?
Alternatively, if the only purpose of saying the values are monotonic is to infer that the user prefers that the data is presented as a rate, I've yet to understand a problem with redefining the counter as a "uses rate presentation by default" counter. I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs.
(I'm fine accepting the PR as-is and resolving this as a follow up)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say something like the default SDK SHOULD enforce monotonicity, and that it is an error when a callback returns a value less than a previously returned value.
I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs
I take it you are considering whether we could drop the monotonicity idea. Why should we use separate instruments for monotonic and non-monotonic streams? There are legacy arguments that this information should be preserved, but I don't like to use those arguments. One good reason is simply to help the user. If you have a monotonic instrument, we can detect when you use it incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it you are considering whether we could drop the monotonicity idea
Yeah, but very open to the notion that I may not have the full picture. What I have read so far in the conversation suggests that users don't care about monotonicity, they care whether the data is shown as a rate or as an absolute value. If its true that rate vs. absolute value is all they care about then I think pros/cons of enforcing monotonicity anyways would be:
pros:
- some users may misuse the API and we correctly gave them error that helped them fix their code
cons:
- some users might want to report a negative rate and they will be frustrated if the SDK prevents that usage
- If we add more instruments to have separate monotonic and non-monotonic rate options this adds complexity to all users when choosing the appropriate instrument for their use case
- There is a (tiny) perf cost to verify every measurement, probably only relevant in the synchronous instrument case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've covered this topic during the SIG meeting that monotonic vs. non-monotonic is important for the user experience, and it is also widely adopted by many well established metrics systems/libraries.
So here goes my suggestion:
- Keep the current approach - have dedicated monotonic instruments (e.g. Counter/CounterFunc)
- Clarify in the SDK spec whether we want to the SDK to enforce monotonicity or not (I think I agree with @jmacd that the default SDK SHOULD enforce monotonicity).
- Use Editorial change - call out that Counter is monotonic #1593 to track the further clarification - for example, we might want to explain these concepts and put some examples in the README file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've covered this topic during the SIG meeting that monotonic vs. non-monotonic is important for the user experience
Perhaps I am being dense but that wasn't my understanding of the conversation. To me it felt like I put forth a question and an idea. Nobody responded saying they loved it nor did they respond saying its bad. Perhaps others thought it wasn't a good idea and were being polite, or wanted time to think it over, or I simply didn't understand the feedback : )
@jmacd, I know at the end of the SIG meeting you said you felt like you didn't have adequate opportunity to respond and mentioned some concern that the conversation was calling into question too much from the original spec. I wasn't clear whether you were refering to my comments or other comments/questions that were in the vicinity. Certainly my aim is not to push you (or anyone else) into a position you disagree with and I welcome your thoughts. So far I get the impression you aren't particularly excited about pivoting the definition of Counter to mean "displays as rate by default" with no enforcement of monotonicity. I don't know if anything I have said since was convincing or you still feel that being able to report negative measurements as errors is the most compelling concern at play so we should lay the issue to rest with that as the conclusion?
So here goes my suggestion:
To me the issue appeared unresolved. I don't want to hold up the PR and I am fine accepting that Counter defined to be monotonic is the presumptive outcome at this point. I am hoping there will be a little more discussion in #1593, ideally considering the alternatives, but if not that then at least to clarify and document the rationale. Assuming that Counter is monotonic, having the SDK be the enforcement makes sense to me. Thanks all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some suggestions, but I am also happy to accept the changes as-is and follow up on specific adjustments/questions in future targetted PRs.
### CounterFunc | ||
|
||
`CounterFunc` is an asynchronous Instrument which reports | ||
[monotonically](https://wikipedia.org/wiki/Monotonic_function) increasing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say something like the default SDK SHOULD enforce monotonicity, and that it is an error when a callback returns a value less than a previously returned value.
I can still make a mathematically well defined rate function that is sum_of_measurements / time and there is no requirement to only accept positive inputs
I take it you are considering whether we could drop the monotonicity idea. Why should we use separate instruments for monotonic and non-monotonic streams? There are legacy arguments that this information should be preserved, but I don't like to use those arguments. One good reason is simply to help the user. If you have a monotonic instrument, we can detect when you use it incorrectly.
b0f134b
to
138281d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* add observable counter * change the wording to allow language client to decide how to handle callback state * update based on discussion during the SIG meeting * add observer example * clarify that it is the instrument not meter being observed * clarify that duplicates are not allowed, and the sdk can decide how to handle it * address review feedback * fix typo * specify that the callback ensures a single timestamp * rename to CounterFunc * address review comments * clarify that the callback function should not take indefinite amount of time * update the wording for callback based on PR comments * add notes that CounterFunc callback returns absolute value instead of delta/rate * update the example based on feedback
Related to #1578.
Changes
During the 03/25/2021 Metrics API/SDK SIG Meeting, we've agreed to tackle an async instrument after
Counter
is done.After this PR, the only outstanding issues for the metrics API spec would be:
This will be discussed during the upcoming (4/1) Metrics API/SDK SIG meeting.
Related oteps OTEP146