From 01afa0b4ba489dac204b46597f4b37b3b015cc87 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 21 Jan 2020 17:03:52 -0800 Subject: [PATCH 1/3] Remove the Metric Gauge instrument, recommend exclusive use of Observer instrument --- text/0080-remove-metric-gauge.md | 59 ++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 text/0080-remove-metric-gauge.md diff --git a/text/0080-remove-metric-gauge.md b/text/0080-remove-metric-gauge.md new file mode 100644 index 000000000..f41dda563 --- /dev/null +++ b/text/0080-remove-metric-gauge.md @@ -0,0 +1,59 @@ +# Remove the Metric API Gauge instrument + +The [Observer instrument](./0072-metric-observer.md) is semantically +identical to the metric Gauge instrument, only it is reported via a +callback instead of synchronous API calls. Implentation has shown +that Gauge instruments are difficult to reason about because the +semantics of a "last value" Aggregator have to address questions about +statefulness--the SDK's ability to recall old values. Observer +instruments avoid some of these concerns because they are reported +once per collection period, making it easier to reason about "all +values" in an aggregator. + +## Motivation + +Observer instruments improve on our ability to define certain +aggregations, compared with the existing Gauge instrument. Because of +callbacks, Observer instruments support reporting a complete snapshot +of the "last" values of an instrument, for all active label sets, for +a single collection period. This is accomplished requiring new state +to be managed by the SDK, simply because the Observer instrument is +expected to support a current value for all active label sets. + +## Explanation + +Knowing that each collection period contains a complete set of +instrument values, it is possible to define a sum aggregation and to +assign ratios to values. This aggregation is not possible for Gauge +instruments without imposing new state requirements on the SDK, +essentially requiring it to remember values from prior collection +periods. + +## Internal details + +The Gauge instrument will be removed from the specification at the +same time the Observer instrument is added. This will make the +transition easier because in many cases, Observer instruments simply +replace Gauge instruments in the text. + +## Trade-offs and mitigations + +Not much is lost to the user from removing Gauge instruments. There +may be situations where it is undesireable to be interrupted by the +Metric SDK in order to execute an Observer callback--situations where +Observer semantics are correct but a synchronous API is more +acceptable. These cases can be addressed by Observer instruments +paired with helpers to maintain last-value state and define the +"complete" set of values. This requires more of the user, but users +are likely to be able to optimize this more than the SDK could have, +and this forces the user to recognize that uses of Gauge-like +instruments generally should know the complete set of values when +reporting. + +## Prior art and alternatives + +Many existing Metric libraries support both synchronous and +asynchronous Gauge-like instruments. + +See the initial discussion in [Spec issue +412](https://github.com/open-telemetry/opentelemetry-specification/issues/412). From c8ba41887094a0b8e36f1a979e7b7c56f5b768db Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 21 Jan 2020 17:07:13 -0800 Subject: [PATCH 2/3] Typos --- text/0080-remove-metric-gauge.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0080-remove-metric-gauge.md b/text/0080-remove-metric-gauge.md index f41dda563..74bb6faf0 100644 --- a/text/0080-remove-metric-gauge.md +++ b/text/0080-remove-metric-gauge.md @@ -2,7 +2,7 @@ The [Observer instrument](./0072-metric-observer.md) is semantically identical to the metric Gauge instrument, only it is reported via a -callback instead of synchronous API calls. Implentation has shown +callback instead of synchronous API calls. Implementation has shown that Gauge instruments are difficult to reason about because the semantics of a "last value" Aggregator have to address questions about statefulness--the SDK's ability to recall old values. Observer @@ -39,7 +39,7 @@ replace Gauge instruments in the text. ## Trade-offs and mitigations Not much is lost to the user from removing Gauge instruments. There -may be situations where it is undesireable to be interrupted by the +may be situations where it is undesirable to be interrupted by the Metric SDK in order to execute an Observer callback--situations where Observer semantics are correct but a synchronous API is more acceptable. These cases can be addressed by Observer instruments From f5359f3d933b1dd1eacba22ba2b4b39c56744e9c Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 21 Jan 2020 23:46:54 -0800 Subject: [PATCH 3/3] More text --- text/0080-remove-metric-gauge.md | 119 ++++++++++++++++++++++++------- 1 file changed, 95 insertions(+), 24 deletions(-) diff --git a/text/0080-remove-metric-gauge.md b/text/0080-remove-metric-gauge.md index 74bb6faf0..59055d220 100644 --- a/text/0080-remove-metric-gauge.md +++ b/text/0080-remove-metric-gauge.md @@ -12,22 +12,85 @@ values" in an aggregator. ## Motivation -Observer instruments improve on our ability to define certain -aggregations, compared with the existing Gauge instrument. Because of -callbacks, Observer instruments support reporting a complete snapshot -of the "last" values of an instrument, for all active label sets, for -a single collection period. This is accomplished requiring new state -to be managed by the SDK, simply because the Observer instrument is -expected to support a current value for all active label sets. +Observer instruments improve on our ability to compute well-defined +sum and average-value aggregations over a set of last-value aggregated +data, compared with the existing Gauge instrument. Using data from an +Observer instrument, we are easily able to pose queries about the +current sum of all current values as well as the number of distinct +values, which together define the average value. + +To do the same with synchronous Gauge instruments, the SDK would +potentially be required to maintain state outside a single collection +window, which complicates memory management. The SDK is required to +maintain state about all distinct label sets over the query evaluation +interval. + +The question is: how long should the SDK remember a gauge value? +Observer instruments do not pose this complication, because +observations are synchronized with collection instead of with the +application. + +Unlike with Gauge instruments, Observer instruments naturally define +the current set of all values for a single collection period, making +sum and average-value aggregations possible without mention of the +query evaluation interval, and without the implied additional state +management. ## Explanation -Knowing that each collection period contains a complete set of -instrument values, it is possible to define a sum aggregation and to -assign ratios to values. This aggregation is not possible for Gauge -instruments without imposing new state requirements on the SDK, -essentially requiring it to remember values from prior collection -periods. +The Gauge instrument's most significant feature is that its +measurement interval is arbitrary -- controlled by the application +through explicit, synchronous calls to `Set()`. It is used to report +a current value in a synchronous context, meaning the metric event is +associated with a label set determined by some "request". + +This proposal recommends that synchronously reporting Gauge values can +always be accomplished using one of the three other kinds of +instrument. + +It was _already_ recommended in the specification that if the +instrument reports values you would naturally sum, you should have +used a Counter in the first place. These are not really "current" +values when reported, they are current contributions to the sum. We +still recommend Counters in this case. + +If the gauge reports values, where you would naturally average the +last value across distinct label sets, use a Measure instrument. +Configure the instrument for last-value aggregation. Since last-value +aggregation is not the default for Measure instruments, this will be +non-standard and require extra configuration. + +If the gauge reports values, where you would naturally sum the last +value across distinct label sets, use an Observer instrument. The +current set of entities (e.g., shards, active users, etc) constributes +a last value that should be summed. These are different from Counter +instruments because we are not interested in a sum across time, we are +interested in a sum across distinct instances. + +### Example: Reporting per-request CPU usage + +Use a counter to report a quantity that is naturally summed over time, +such as CPU usage. + +### Example: Reporting per-shard memory holdings + +There are a number of current shards holding variable amounts of +memory by a widely-used library. Observe the current allocation per +shard using an Observer instrument. These can be aggregated across +hosts to compute cluster-wide memory holdings by shard, for example. + +It does not make sense to compute a sum of memory holdings over +multiple periods, as these are not additive quantities. It does makes +sense to sum the last value across hosts. + +### Example: Reporting a per-request finishing account balance + +There's a number that rises and falls such as a bank account balance. +This was being `Set()` at the finish of all transactions. Replace it +with a Measure instrument and `Record()` the last value. + +Similar cases: report a cpu load, specific temperature, fan speed, or +altitude measurement associated with a request. ## Internal details @@ -38,17 +101,25 @@ replace Gauge instruments in the text. ## Trade-offs and mitigations -Not much is lost to the user from removing Gauge instruments. There -may be situations where it is undesirable to be interrupted by the -Metric SDK in order to execute an Observer callback--situations where -Observer semantics are correct but a synchronous API is more -acceptable. These cases can be addressed by Observer instruments -paired with helpers to maintain last-value state and define the -"complete" set of values. This requires more of the user, but users -are likely to be able to optimize this more than the SDK could have, -and this forces the user to recognize that uses of Gauge-like -instruments generally should know the complete set of values when -reporting. +Not much is lost to the user from removing Gauge instruments. + +There may be situations where an Observer instrument is the natural +choice but it is undesirable to be interrupted by the Metric SDK in +order to execute an Observer callback. Situations where Observer +semantics are correct (not Counter, not Measure) but a synchronous API +is more acceptable are expected to be very rare. + +To address such rare cases, here are two possibilities: + +1. Implement a Gauge Set instrument backed by an Observer instrument. +The Gauge Set's job is to maintain the current set of label sets +(e.g., explicitly managed or by time-limit) and their last value, to +be reported by the Observer at each collection interval. +2. Implement an application-specific metric collection API that would +allow the application to synchronize with the SDK on collection +intervals. For example, a transactional API allowing the application +to BEGIN and END synchronously reporting Observer instrument +observations. ## Prior art and alternatives