-
Notifications
You must be signed in to change notification settings - Fork 164
Metrics API instrument foundation and refinements #88
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pro having the properties of monotonic/absolute embedded in the name/type of the Instrument instead of being properties of the instrument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, much simpler having optional instruments than instrument options.
Reading this, I realize I don't understand how observer measurements are supposed to be aggregated. Is the idea that a single observer instrument can capture multiple measurements, but each has to have a different labelset?
As posted in https://gitter.im/open-telemetry/metrics-specs?at=5e67b66ef046a30bbe55b092 In last week's Metrics SIG we had a conversation about the interpretation of the "instrument refinements" proposed in the OTEP, and we came to realize a problem with the set of refinements being proposed, which were "sum-only", "non-negative", and "monotonic". The test case was how to think about exporting a cumulative CPU-usage counter. The problem was that metric events are aggregated differently with respect to time than they are with respect to the spatial dimensions for cumulative instruments. I've updated the document with a significant amount of new text. I've introduced the term "Rate aggregation", which is defined differently based on whether the values being reported are deltas or cumulatives. Now the document describes 4 refinements, "sum-only", "non-negative" are still there from before, and the two new ones are "precomputed-sum" and "non-negative-rate". The "pre-computed sum" refinement says that a value is cumulative. The "non-negative-rate" refinement says that the sum is monotonic. There are still no functional changes in this proposal, just a revision of the mental model. |
Based on discussion in the metrics SIG, I have started a document to capture vendor and open-source backend metric types. Please add all relevant information: https://docs.google.com/document/d/1-Tut-w5lACJhrjGG1Jn9C41uDMquZpkd1_oXLpiYFK0/edit?usp=sharing |
@open-telemetry/specs-approvers please review this. |
@open-telemetry/specs-approvers We are experiencing delays. PLEASE REVIEW. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pulling all this together Josh.
Based on the Last Value relationship, we can ask and answer questions | ||
such as "what is the average last value of a metric at a point in | ||
time?". Observer instruments define the Last Value relationship | ||
without referring to the collection interval and without ambiguity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since "last value" is, by its very nature, a single value for a given reporting interval, doesn't the "average" need to be defined over some set of collection intervals? And, if this is a "temporal" average (rather than a "spatial" average), doesn't there need to be some concept of a time window over which the average is computed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think of average being well-defined for a single. interval. If there is no aggregation, there's a count of 1 and the average equals the single value. If there is spatial aggregation within the interval, the average is still well defined.
If there is temporal aggregation over multiple intervals, I think the average is still well defined. What we're not defining, however, is the average over a specific time range--the average is defined over a range of intervals.
I like the "last value" relationship defined here, since we can write queries that apply across machines, even though their collection intervals are not synchronized. The query is something like "read the last-value of metric X at time T across a set of machines and average the results". We are using the last value at the most-recent point in time as the input to a distributed spatial aggregation.
We could try to define an average across time and space using this data, but now some interpolation may be required, since the contributing points are spread across time and the query window is probably not aligned with the collection intervals of the target machines.
See also https://github.com/open-telemetry/oteps/pull/96/files#diff-e822e6cd6cb430ee3f62ff7fbeaed0daR130
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good. I was just confused about the question when the answer seemed very obvious. :)
_AbsoluteMeasure_ with the correct unit and a language-specific | ||
duration type for measuring time. | ||
|
||
If the above open question is decided in favor of treating the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the open question referenced disappeared in some edit. It's not there now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question was whether names like "Measure" and "Observer" should be reserved as explanatory, descriptive names, and whether we should find new, different names for the actual, concrete unrestricted/unrefined form of these instruments handled by users. This was discussed in the SIG call but no issue specifically was filed.
This has been referred to as "Abstract foundation instruments" vs "Non-abstract foundation instruments". I originally proposed Non-abstract, i.e., that the unrestricted Measure instrument should be called Measure. @bogdandrutu convinced me to go the other way, which I have done in this document's sample proposal and in OTEP 93.
However, I have changed that stance again in OTEP 96, which states that we should use "Recorder" instead of "Measure", and that the default kind of Recorder instance should be named "Recorder--this is the Non-abstract proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I know the open question...it just isn't in this doc any more that I could see. :)
* Work on new ontology * More * Typos * Rewording and typos * Updates * More table * Typo * Add an example * Add detail and translation guidance * About monotonicity * More on Last Value relationship * Clarify _aggregation_ * For Bogdan * Note about OTEP 93 and OTEP 96 * Formatting
* Work on new ontology * More * Typos * Rewording and typos * Updates * More table * Typo * Add an example * Add detail and translation guidance * About monotonicity * More on Last Value relationship * Clarify _aggregation_ * For Bogdan * Note about OTEP 93 and OTEP 96 * Formatting
* Work on new ontology * More * Typos * Rewording and typos * Updates * More table * Typo * Add an example * Add detail and translation guidance * About monotonicity * More on Last Value relationship * Clarify _aggregation_ * For Bogdan * Note about OTEP 93 and OTEP 96 * Formatting
…ps#88) * Work on new ontology * More * Typos * Rewording and typos * Updates * More table * Typo * Add an example * Add detail and translation guidance * About monotonicity * More on Last Value relationship * Clarify _aggregation_ * For Bogdan * Note about OTEP 93 and OTEP 96 * Formatting
This OTEP stems from discussion immediately following the v0.3 metrics API specification.
Partly resolves:
open-telemetry/opentelemetry-specification#467
open-telemetry/opentelemetry-specification#465
open-telemetry/opentelemetry-specification#464
open-telemetry/opentelemetry-specification#462