From 3e1705076a9fe7126183dcea155e48e63ad023e6 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Wed, 22 Jan 2020 22:35:05 -0800
Subject: [PATCH 01/54] Gauge/Observer rewrite pt 1

---
 specification/api-metrics.md | 140 ++++++++++++++++++++++++-----------
 1 file changed, 96 insertions(+), 44 deletions(-)

diff --git a/specification/api-metrics.md b/specification/api-metrics.md
index b7c217a94e3..248542e94f9 100644
--- a/specification/api-metrics.md
+++ b/specification/api-metrics.md
@@ -12,22 +12,22 @@ Table of Contents
 * [Metric kinds and inputs](#metric-kinds-and-inputs)
   * [Metric selection](#metric-selection)
   * [Counter](#counter)
-  * [Gauge](#gauge)
   * [Measure](#measure)
+  * [Observer](#observer)

 ## Overview

-The user-facing metrics API supports producing diagnostic measurements
-using three basic kinds of instrument. "Metrics" are the thing being
-produced--mathematical, statistical summaries of certain observable
-behavior in the program. "Instruments" are the devices used by the
-program to record observations about their behavior. Therefore, we
-use "metric instrument" to refer to a program object, allocated
-through the API, used for recording metrics. There are three distinct
-instruments in the Metrics API, commonly known as Counters, Gauges,
-and Measures.
+The OpenTelemetry Metrics API supports producing diagnostic
+measurements using three basic kinds of instrument. "Metrics" are the
+thing being produced--mathematical, statistical summaries of certain
+observable behaviors in the program. "Instruments" are the devices
+used by the program to record observations about their behavior.
+Therefore, we use "metric instrument" to refer to a programmatic
+interface, allocated through the API, used for capturing metric
+events. There are three kinds of instruments known as Counters,
+Measures, and Observers.

 Monitoring and alerting are the common use-case for the data provided
 through metric instruments, after various collection and aggregation
@@ -39,63 +39,115 @@ separation of the API from the SDK.

 ### User-facing API

-The user-facing OpenTelemetry API consists of an SDK-independent part
-for defining metric instruments. Review the [user-facing
-OpenTelementry API specification](api-metrics-user.md) for more detail
-about the variety of methods, options, and optimizations available for
-users of the instrumentation API.
-
-To capture measurements using an instrument, you need an SDK that
-implements the `Meter` API.
+The user-facing OpenTelemetry API for Metrics begins with a `Meter`
+interface, usually obtained through dependency injection or a global
+instance. The `Meter` API supports defining new metric instruments.
+Review the [user-facing OpenTelemetry API
+specification](api-metrics-user.md) for more detail about the variety
+of methods, options, and optimizations available for users of the
+instrumentation API and how to use the instruments defined here.

 ### Meter API

-`Meter` is an interface with the SDK used to capture measurements in
-various ways. Review the [SDK-facing OpenTelemetry API
-specification](api-metrics-meter.md) known as `Meter`.
+To capture measurements using an instrument, you need an SDK that
+supports the `Meter` API, which consists of a set of constructors, the
+`Labels` function for building label sets, and the `RecordBatch`
+function for batch reporting. Refer to the [sdk-facing OpenTelemetry
+API specification](api-metrics-user.md) for more implementation notes.

 Because of API-SDK separation, the `Meter` implementation ultimately
 determines how metrics events are handled. The specification's task
 is to define the semantics of the event and describe standard
 interpretation in high-level terms. How the `Meter` accomplishes its
-goals and the export capabilities it supports are not specified.
+goals and the export capabilities it supports are specified for the
+default SDK in the (Metric SDK
+specification WIP)[#WIP-spec-issue-347].

 The standard interpretation for `Meter` implementations to follow is
-specified so that users understand the intended use for each kind of
-metric. For example, a monotonic Counter instrument supports
-`Add()` events, so the standard interpretation is to compute a sum;
-the sum may be exported as an absolute value or as the change in
-value, but either way the purpose of using a Counter with `Add()` is
-to monitor a sum.
+given so that users understand the intended use for each kind of
+metric. For example, a Counter instrument supports `Add()` events,
+and the standard interpretation is to compute a sum. The sum may be
+exported as an absolute value or as the change in value, but
+regardless of the exporter and the implementation, the purpose of
+using a Counter with `Add()` is to monitor a sum. Counters were used
+in the example because they require almost no introduction. A
+detailed explanation for how to select metric instruments for common
+use-cases is given below, according to the semantics defined next.

 ### Purpose of this document

 This document gives an overview of the specification, introduces the
 the three kinds of instrument, and discusses how end-users should
-think about various options at a high-level, without getting into
-detail about specific method calls.
+think about various instruments and options at a high-level, without
+getting into detail about specific function calls. For details about
+specific function calls, refer to the detailed specifications linked
+above.

-## Metric kinds and inputs
+## Metric API / SDK separation

 The API distinguishes metric instruments by semantic meaning, not by
 the type of value produced in an exporter. This is a departure from
 convention, compared with a number of common metric libraries, and
 stems from the separation of the API and the SDK. The SDK ultimately
 determines how to handle metric events and could potentially implement
-non-standard behavior.
+non-standard behavior. All metric events can be represented as
+consisting of a timestamp, an instrument, a number (the value), and a
+label set. The semantics defined here are meant to assist both
+application and SDK implementors, and examples will be given below.
+
+The separation of API and SDK explains why the metric API does not
+have metric instrument explicitly tied to specific metric "exposition"
+types, such as "Histogram", "Summary", or "Last values" (also known,
+traditionally, as "Gauges"). In the case of Histogram and Summary
+value types, both are appropriate outputs for Measure instruments,
+because Measure instruments are meant to used for recording individual
+measurements synchronously.
+
+There is a common metric instrument known as a "Gauge" that is not
+included in this API, the term "Gauge" referring to an instrument,
+often mechanical, for reading the current (also "last") value of a
+measuring device (e.g., a speedometer on your car's dashboard). The
+problem with "Gauge" starts from the term itself, which is figurative
+in nature. Describing the instrument as a gauge implies how it will
+be used, but not its semantics.
+
+There are use-cases for traditional gauge instruments that fall into
+both Observer and Measure instrument use-cases under the semantics
+defined here, according to the intended use of number being reported.
+This will be discussed in detail after instruments have been
+introduced and the distinctions between them have been made clear.
+
+### Justification for three kinds of instrument
+
+We believe the three metric kinds Counter, Measure, and Observer form
+a sufficient basis for expressing nearly all metric data. But if the
+API and SDK are separated, and the SDK can handle metric events as it
+pleases, why not have just one kind of instrument? This section
+explains how the instruments are fundamentally different, despite all
+metric events having the same form (i.e., a timestamp, an instrument,
+a number, and a label set).
+
+Establishing three kinds of instrument is important because it allows
+the SDK to provide good functionality, without external configuration,
+in most cases by default.
+
+Factors that come up:
+
+- is zero meaningful? (i.e., sum important?)
+- are the number of measurements important? (is there an implied rate?)
+- is the measurement part of a current value set?
+- if so ^^^, is it natural to sum current values or average them.
+- (is it an interval or a ratio or a count)
+- is there a measurement "interval"? is the numnber of measurements meaningful.
+
+
+

-This explains why the metric API does not have metric instrument kinds
-for exporting "Histogram" and "Summary" distribution explicitly, for
-example. These are both semantically `Measure` instruments and an SDK
-can be configured to produce histograms or distribution summaries from
-Measure events. It is out of scope for the Metrics API to specify how
-these alternatives are configured in a particular SDK.
+Counter and Measure instruments offer synchronous APIs, .

-We believe the three metric kinds Counter, Gauge, and Measure form a
-sufficient basis for expression of a wide variety of metric data.
-Programmers write and read these as `Add()`, `Set()`, and `Record()`
-method calls, signifying the semantics and standard interpretation,
-and we believe these three methods are all that are needed.
+Programmers write and read these as `Add()` and `Record()` function
+calls , signifying the semantics and standard interpretation, and we
+believe these three methods are all that are needed.

 Nevertheless, it is common to apply restrictions on metric values, the
 inputs to `Add()`, `Set()`, and `Record()`, in order to refine their
@@ -109,7 +161,7 @@ provided to record positive or negative values, but it does not change
 the kind of instrument or the method name used, as the semantics are
 unchanged.

-### Metric selection
+### Metric instrument selection

 To guide the user in selecting the right kind of metric for an
 application, we'll consider the following questions about the primary

From 56e16061e8768031162efd4976d99c389e296ef6 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Mon, 27 Jan 2020 16:50:42 -0800
Subject: [PATCH 02/54] Checkpoint

---
 specification/api-metrics.md | 206 ++++++++++++++++++++---------------
 1 file changed, 119 insertions(+), 87 deletions(-)

diff --git a/specification/api-metrics.md b/specification/api-metrics.md
index 248542e94f9..5464912b5dc 100644
--- a/specification/api-metrics.md
+++ b/specification/api-metrics.md
@@ -1,21 +1,19 @@
 # Metrics API

-
-
-Table of Contents
-
+

-* [Overview](#overview)
+- [Overview](#overview)
   * [User-facing API](#user-facing-api)
   * [Meter API](#meter-api)
   * [Purpose of this document](#purpose-of-this-document)
-* [Metric kinds and inputs](#metric-kinds-and-inputs)
-  * [Metric selection](#metric-selection)
+- [Metric API / SDK separation](#metric-api--sdk-separation)
+  * [Justification for three kinds of instrument](#justification-for-three-kinds-of-instrument)
+  * [Metric instrument selection](#metric-instrument-selection)
   * [Counter](#counter)
+  * [Gauge](#gauge)
   * [Measure](#measure)
-  * [Observer](#observer)

-
+

 ## Overview

@@ -25,14 +23,14 @@ thing being produced--mathematical, statistical summaries of certain
 observable behaviors in the program. "Instruments" are the devices
 used by the program to record observations about their behavior.
 Therefore, we use "metric instrument" to refer to a programmatic
-interface, allocated through the API, used for capturing metric
-events. There are three kinds of instruments known as Counters,
-Measures, and Observers.
+interface, allocated through the API, used to produce metric events.
+There are three kinds of instruments known as Counters, Measures, and
+Observers.

 Monitoring and alerting are the common use-case for the data provided
 through metric instruments, after various collection and aggregation
 strategies are applied to the data. We find there are many other uses
-for the _metric events_ that stream into these instruments. We
+for the _metric events_ recorded through these instruments. We
 imagine metric data being aggregated and recorded as events in tracing
 and logging systems too, and for this reason OpenTelemetry requires a
 separation of the API from the SDK.
@@ -49,7 +47,7 @@ instrumentation API and how to use the instruments defined here.

 ### Meter API

-To capture measurements using an instrument, you need an SDK that
+To produce measurements using an instrument, you need an SDK that
 supports the `Meter` API, which consists of a set of constructors, the
 `Labels` function for building label sets, and the `RecordBatch`
 function for batch reporting. Refer to the [sdk-facing OpenTelemetry
@@ -57,22 +55,21 @@ API specification](api-metrics-user.md) for more implementation notes.

 Because of API-SDK separation, the `Meter` implementation ultimately
 determines how metrics events are handled. The specification's task
-is to define the semantics of the event and describe standard
-interpretation in high-level terms. How the `Meter` accomplishes its
-goals and the export capabilities it supports are specified for the
-default SDK in the (Metric SDK
-specification WIP)[#WIP-spec-issue-347].
+is to define the semantics of the events in high-level terms, so that
+users and implementors can agree on their meaning. How the `Meter`
+accomplishes its goals and the export capabilities it supports are
+specified for the default SDK in the (Metric SDK specification
+WIP)[#WIP-spec-issue-347].

 The standard interpretation for `Meter` implementations to follow is
 given so that users understand the intended use for each kind of
 metric. For example, a Counter instrument supports `Add()` events,
-and the standard interpretation is to compute a sum. The sum may be
+and the default implementation is to compute a sum. The sum may be
 exported as an absolute value or as the change in value, but
 regardless of the exporter and the implementation, the purpose of
-using a Counter with `Add()` is to monitor a sum. Counters were used
-in the example because they require almost no introduction. A
-detailed explanation for how to select metric instruments for common
-use-cases is given below, according to the semantics defined next.
+using a Counter with `Add()` is to monitor a sum. A detailed
+explanation for how to select metric instruments for common use-cases
+is given below, according to the semantics defined next.

 ### Purpose of this document

@@ -96,72 +93,107 @@ label set. The semantics defined here are meant to assist both
 application and SDK implementors, and examples will be given below.

 The separation of API and SDK explains why the metric API does not
-have metric instrument explicitly tied to specific metric "exposition"
-types, such as "Histogram", "Summary", or "Last values" (also known,
-traditionally, as "Gauges"). In the case of Histogram and Summary
-value types, both are appropriate outputs for Measure instruments,
-because Measure instruments are meant to used for recording individual
-measurements synchronously.
+have metric instruments that generate specific metric "exposition"
+values, for example "Histogram" and "Summary" values, which are
+different ways to expose a distribution of measurements. This API
+specifies the use of a Measure kind of instrument with `Record()` for
+recording individual measurements. The instruments are defined so
+that users and implementors understand what they mean, beacuse
+different SDKs will handle events differently.

 There is a common metric instrument known as a "Gauge" that is not
 included in this API, the term "Gauge" referring to an instrument,
-often mechanical, for reading the current (also "last") value of a
-measuring device (e.g., a speedometer on your car's dashboard). The
-problem with "Gauge" starts from the term itself, which is figurative
-in nature. Describing the instrument as a gauge implies how it will
-be used, but not its semantics.
-
-There are use-cases for traditional gauge instruments that fall into
-both Observer and Measure instrument use-cases under the semantics
-defined here, according to the intended use of number being reported.
-This will be discussed in detail after instruments have been
-introduced and the distinctions between them have been made clear.
-
-### Justification for three kinds of instrument
-
-We believe the three metric kinds Counter, Measure, and Observer form
-a sufficient basis for expressing nearly all metric data. But if the
-API and SDK are separated, and the SDK can handle metric events as it
-pleases, why not have just one kind of instrument? This section
-explains how the instruments are fundamentally different, despite all
-metric events having the same form (i.e., a timestamp, an instrument,
-a number, and a label set).
-
-Establishing three kinds of instrument is important because it allows
-the SDK to provide good functionality, without external configuration,
-in most cases by default.
-
-Factors that come up:
-
-- is zero meaningful? (i.e., sum important?)
-- are the number of measurements important? (is there an implied rate?)
-- is the measurement part of a current value set?
-- if so ^^^, is it natural to sum current values or average them.
-- (is it an interval or a ratio or a count)
-- is there a measurement "interval"? is the numnber of measurements meaningful.
-
-
-
-
-Counter and Measure instruments offer synchronous APIs, .
-
-Programmers write and read these as `Add()` and `Record()` function
-calls , signifying the semantics and standard interpretation, and we
-believe these three methods are all that are needed.
-
-Nevertheless, it is common to apply restrictions on metric values, the
-inputs to `Add()`, `Set()`, and `Record()`, in order to refine their
-standard interpretation. Generally, there is a question of whether
-the instrument can be used to compute a rate, because that is usually
-a desirable analysis. Each metric instrument offers an optional
-declaration, specifying restrictions on values input to the metric.
-For example, Measures are declared as non-negative by default,
-appropriate for reporting sizes and durations; a Measure option is
-provided to record positive or negative values, but it does not change
-the kind of instrument or the method name used, as the semantics are
-unchanged.
-
-### Metric instrument selection
+often mechanical, for reading the current "last" value of a measuring
+device (e.g., a speedometer on your car's dashboard). The problem
+with "Gauge" starts from the term itself, which is figurative in
+nature. Using the word "gauge" suggests a behavior, that the
+instrument will be used to expose the last value, not a semantic
+definition. Uses of traditional gauge instruments translate into an
+Observer or the Measure instrument in this API.
+
+### Brief: Three kinds of instrument
+
+Two of the three kinds of instrument have been introduced through
+examples above:
+
+1. Counter. Events contribute to a sum (i.e., a running total). The
+key property of Counter events is that they can be semantically
+combined without affecting their meaning. For example, two `Add(1)`
+events are semantically equivalent to one `Add(2)` event.
+2. Measure. Events are individual measurements. The key property of
+Measure events is that they correspond to real events, they cannot be
+semantically combined.
+
+The third kind of instrument is the Observer instrument, used for
+recording the current value from a measurement source. Each
+collection interval, Observer instruments contribute the current value
+for each distinct set of labels. When aggregating Observer instrument
+values across distinct label sets, the default implementation is to
+compute a sum of current values.
+
+Whereas Counter and Measure are both synchronous instruments, called
+by the program to report on itself, the Observer instrument is
+activated by the SDK through a callback, synchronized with collection.
+Two key properties of an Observer instrument are (1) that it allows
+the programmer to report a coherent set of values, (2) that it can
+lower collection cost by computing values on demand.
+
+### Interpretation
+
+We believe the three instrument kinds Counter, Measure, and Observer
+form a sufficient basis for expressing nearly all metric data. But if
+the API and SDK are separated, and the SDK can handle any metric event
+as it pleases, why not have just one kind of instrument? How are the
+instruments fundamentally different, despite all metric events having
+the same form (i.e., a timestamp, an instrument, a number, and a label
+set)?
+
+Establishing different kinds of instrument is important because in
+most cases it allows the SDK to provide default functionality without
+requiring alternative beahviors to be configured. The choice of
+instrument determines not only the meaning of the events but also the
+name of the function used to report data. The function names--`Add()`
+for Counter instruments, `Record()` for Measure instruments, and
+`Observe()` for Observer instruments--help convey and reinforce the
+semantics of the event.
+
+The standard implementation for the three instruments is defined as
+follows:
+
+1. Counter. Accumulate a total for each distinct label set. When
+aggregating distinct label sets for a Counter, combine using addition.
+Export as the computed sum.
+2. Measure. Compute summary statistics of the value distribution for
+each distinct label set. Which statistics are used is determined by
+the implementation, but they usually include at least the sum of
+values, the count of measurements, and the minimum and maximum values.
+When aggregating distinct label sets for a Measure, report summary
+statistics of the combined value distribution. Exposition formats for
+Measure data vary widely by backend service.
+
+3. Observer. Current values are provided by the Observer callback at
+the end of each Metric collection period. When aggregating values
+_for the same label set_, combine using the more-recent value. When
+aggregating values _for different label sets_, combine using the sum.
+Export as label set, calculated value pairs.
+
+### Optional restrictions
+
+It is common to apply restrictions on the input range of metric
+values passed to inputs `Add()`, `Record()`, and `Observe()`.
+
+@@@ HERE
+
+Generally, there is a question of whether the instrument can be used
+to compute a rate, because that is usually a desirable analysis. Each
+metric instrument offers an optional declaration, specifying
+restrictions on values input to the metric. For example, Measures are
+declared as non-negative by default, appropriate for reporting sizes
+and durations; a Measure option is provided to record positive or
+negative values, but it does not change the kind of instrument or the
+method name used, as the semantics are unchanged.
+
+## Metric instrument selection

 To guide the user in selecting the right kind of metric for an
 application, we'll consider the following questions about the primary

From 4f4f4f853f78b932492f1a4dd977a429f4a2b209 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Tue, 28 Jan 2020 00:33:00 -0800
Subject: [PATCH 03/54] Checkpoint

---
 specification/api-metrics.md | 153 ++++++++++++++++++++++++-----------
 1 file changed, 106 insertions(+), 47 deletions(-)

diff --git a/specification/api-metrics.md b/specification/api-metrics.md
index 5464912b5dc..b745b971227 100644
--- a/specification/api-metrics.md
+++ b/specification/api-metrics.md
@@ -113,30 +113,74 @@ Observer or the Measure instrument in this API.

 ### Brief: Three kinds of instrument

-Two of the three kinds of instrument have been introduced through
-examples above:
-
-1. Counter. Events contribute to a sum (i.e., a running total). The
-key property of Counter events is that they can be semantically
-combined without affecting their meaning. For example, two `Add(1)`
-events are semantically equivalent to one `Add(2)` event.
-2. Measure. Events are individual measurements. The key property of
-Measure events is that they correspond to real events, they cannot be
-semantically combined.
-
-The third kind of instrument is the Observer instrument, used for
-recording the current value from a measurement source. Each
-collection interval, Observer instruments contribute the current value
-for each distinct set of labels. When aggregating Observer instrument
-values across distinct label sets, the default implementation is to
-compute a sum of current values.
-
-Whereas Counter and Measure are both synchronous instruments, called
-by the program to report on itself, the Observer instrument is
-activated by the SDK through a callback, synchronized with collection.
-Two key properties of an Observer instrument are (1) that it allows
-the programmer to report a coherent set of values, (2) that it can
-lower collection cost by computing values on demand.
+Two of the three kinds of instrument have been used in the examples
+above. The three instruments are:
+
+#### Counter
+
+Counter instruments are used to report sums. These are sometimes used
+to monitor _rates_, and they are sometimes used to report _totals_.
+One key property of Counter instruments is that two `Add(1)` events
+are semantically equivalent to one `Add(2)` event, meaning that
+Counter events can be combined using addition by definition.
+
+Labels associated with Counter instrument events can be used to
+compute rates and totals over selected dimensions. When aggregating
+Counter events we naturally combine values using addition. Counter
+`Add(0)` events are no-ops by definition.
+
+Counters are monotonic by default, meaning `Add()` logically accepts
+only non-negative values. Monotonicity is useful for defining rates,
+especially; non-monotonic Counter instruments are an option to support
+sums that rise and fall.
+
+Examples: requests processed, bytes read or written, memory allocated.
+
+#### Measure
+
+Measure instruments are used to report individual measurements.
+Measure events are semantically independent and cannot be naturally
+combined like with Counters. Measure instruments are used to report
+many kinds of information, recommended for reporting measurements
+associated with real events in a computer program.
+
+As a synchronous API, Measure `Record()` events can be used to record
+information with associated labels and context. This is the more
+general of the two synchronous metric instruments.
+
+Measure instruments support non-negative values by default, also known
+as "absolute" in the sense that, mathematically, absolute values are
+never negative. As an option, Measure instruments can be defined as
+not absolute, supporting both postive and negative values.
+
+Examples: request latency, number of terms in a query, temperature.
+
+#### Observer
+
+Observer instruments are used to report a current set of values at the
+time of collection. Observer instruments report not only current
+values, but also which label sets are current at the moment of
+collection as a coherent set of values. These instruments reduce
+collection cost because they are computed and reported only once per
+collection interval, by definition.
+
+Unlike Counter and Measure instruments, Observer instruments are
+synchronized with collection, used to report values not based in
+events but periodically, on demand, by the program itself. There is
+no aggregation across time for Observer instruments by definition,
+only the current value is defined.
+
+Observer instruments support being declared as monotonic. A monotonic
+measure instrument supports reporting values that are not less than
+the value reported in the previous collection interval.
+
+When aggregating Observer instrument values across dimensions other
+than time, Observer instruments may be treated like Counters (to
+combine a rate or sum) or like Measures (to combine a distribution).
+Unless otherwise configured, Observers are aggregated as Counters
+would be.
+
+Examples: memory held per shard, queue size by name.

 ### Interpretation

@@ -149,40 +193,51 @@ the same form (i.e., a timestamp, an instrument, a number, and a label
 set)?

 Establishing different kinds of instrument is important because in
-most cases it allows the SDK to provide default functionality without
-requiring alternative beahviors to be configured. The choice of
-instrument determines not only the meaning of the events but also the
-name of the function used to report data. The function names--`Add()`
-for Counter instruments, `Record()` for Measure instruments, and
-`Observe()` for Observer instruments--help convey and reinforce the
-semantics of the event.
+most cases it allows the SDK to provide good default functionality
+without requiring alternative behaviors to be configured. The choice
+of instrument determines not only the meaning of the events but also
+the name of the function used to report data. The function
+names--`Add()` for Counter instruments, `Record()` for Measure
+instruments, and `Observe()` for Observer instruments--help convey and
+reinforce the semantics of the event.

 The standard implementation for the three instruments is defined as
 follows:

-1. Counter. Accumulate a total for each distinct label set. When
-aggregating distinct label sets for a Counter, combine using addition.
-Export as the computed sum.
-2. Measure. Compute summary statistics of the value distribution for
-each distinct label set. Which statistics are used is determined by
-the implementation, but they usually include at least the sum of
-values, the count of measurements, and the minimum and maximum values.
-When aggregating distinct label sets for a Measure, report summary
-statistics of the combined value distribution. Exposition formats for
-Measure data vary widely by backend service.
-
+1. Counter. The `Add()` function accumulates a total for each
+distinct label set. When aggregating over distinct label sets for a
+Counter, combine using addition. Export as the computed sum.
+2. Measure. Use the `Record()` function to report summary statistics
+about the distribution of values, for each distinct label set. Which
+statistics are used is determined by the implementation, but they
+usually include at least the sum of values, the count of measurements,
+and the minimum and maximum values. When aggregating distinct label
+sets for a Measure, report summary statistics of the combined value
+distribution. Exposition formats for Measure data vary widely by
+backend service.
 3. Observer. Current values are provided by the Observer callback at
 the end of each Metric collection period. When aggregating values
-_for the same label set_, combine using the more-recent value. When
+_for the same label set_, combine using the most-recent value. When
 aggregating values _for different label sets_, combine using the sum.
-Export as label set, calculated value pairs.
+Export as pairs of label set and calculated value.
+
+We recognize that the standard behavior of the three instruments does
+not cover all use-cases perfectly. There is a natural tension between
+offering dedicated metric instruments for every distinct metric
+application and combining use-cases, generalizing semantics to reduce
+the API surface area. We could have define more than or fewer than
+three kinds of instrument; we have three because we these seem like
+enough. Where a uncommon use-cases call for non-standard
+implementation (e.g., a Measure instrument configured to with
+last-value aggregation), we accept that users will be required to
+provide additional configuration for how to view certain metric data.

-### Optional restrictions
+### Optional semantic restrictions

 It is common to apply restrictions on the input range of metric
-values passed to inputs `Add()`, `Record()`, and `Observe()`.
+values passed to inputs `Add()`, `Record()`, and `Observe()`. As

-@@@ HERE
+@@@ HERE note that it's semantics, not required enforcement.

 Generally, there is a question of whether the instrument can be used
 to compute a rate, because that is usually a desirable analysis. Each
@@ -193,8 +248,12 @@ and durations; a Measure option is provided to record positive or
 negative values, but it does not change the kind of instrument or the
 method name used, as the semantics are unchanged.
+@@@ Optional special case for timer instrument: SHOULD + ## Metric instrument selection +@@@ HERE add many more examples. + To guide the user in selecting the right kind of metric for an application, we'll consider the following questions about the primary intent of reporting given data. We use "of primary interest" here to From 12339db1006a81fc96013615825eb364479b9f20 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 28 Jan 2020 14:37:22 -0800 Subject: [PATCH 04/54] Optional features restored --- specification/api-metrics.md | 127 +++++++++++++++++++++++++---------- 1 file changed, 93 insertions(+), 34 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index b745b971227..9c9562ca3b3 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -111,10 +111,7 @@ instrument will be used to expose the last value, not a semantic definition. Uses of traditional gauge instruments translate into an Observer or the Measure instrument in this API. -### Brief: Three kinds of instrument - -Two of the three kinds of instrument have been used in the examples -above. The three instruments are: +### Three kinds of instrument #### Counter @@ -134,7 +131,8 @@ only non-negative values. Monotonicity is useful for defining rates, especially; non-monotonic Counter instruments are an option to support sums that rise and fall. -Examples: requests processed, bytes read or written, memory allocated. +Examples: requests processed, bytes read or written, memory allocated +and deallocated. #### Measure @@ -153,24 +151,25 @@ as "absolute" in the sense that, mathematically, absolute values are never negative. As an option, Measure instruments can be defined as not absolute, supporting both postive and negative values. -Examples: request latency, number of terms in a query, temperature. +Examples: request latency, number of query terms, temperature, fan +speed, account balance, load average, screen width. 
#### Observer Observer instruments are used to report a current set of values at the time of collection. Observer instruments report not only current -values, but also which label sets are current at the moment of -collection as a coherent set of values. These instruments reduce +values, but also _which label sets are current_ at the moment of +collection, as a coherent set of values. These instruments reduce collection cost because they are computed and reported only once per collection interval, by definition. Unlike Counter and Measure instruments, Observer instruments are -synchronized with collection, used to report values not based in -events but periodically, on demand, by the program itself. There is -no aggregation across time for Observer instruments by definition, -only the current value is defined. +synchronized with collection, used to report values, on demand, +calculated by the program itself. There is no aggregation across time +for Observer instruments by definition, only the current value is +defined. -Observer instruments support being declared as monotonic. A monotonic +Observer instruments can be declared as monotonic. A monotonic measure instrument supports reporting values that are not less than the value reported in the previous collection interval. @@ -182,7 +181,7 @@ would be. Examples: memory held per shard, queue size by name. -### Interpretation +### Standard Interpretation We believe the three instrument kinds Counter, Measure, and Observer form a sufficient basis for expressing nearly all metric data. But if @@ -199,14 +198,14 @@ of instrument determines not only the meaning of the events but also the name of the function used to report data. The function names--`Add()` for Counter instruments, `Record()` for Measure instruments, and `Observe()` for Observer instruments--help convey and -reinforce the semantics of the event. +reinforce the standard interpretation of the event. 
The standard implementation for the three instruments is defined as follows: 1. Counter. The `Add()` function accumulates a total for each distinct label set. When aggregating over distinct label sets for a -Counter, combine using addition. Export as the computed sum. +Counter, combine using addition. Export as a set of calculated sums. 2. Measure. Use the `Record()` function to report summary statistics about the distribution of values, for each distinct label set. Which statistics are used is determined by the implementation, but they @@ -226,31 +225,91 @@ not cover all use-cases perfectly. There is a natural tension between offering dedicated metric instruments for every distinct metric application and combining use-cases, generalizing semantics to reduce the API surface area. We could have define more than or fewer than -three kinds of instrument; we have three because we these seem like -enough. Where a uncommon use-cases call for non-standard -implementation (e.g., a Measure instrument configured to with -last-value aggregation), we accept that users will be required to -provide additional configuration for how to view certain metric data. +three kinds of instrument; we have three because these seem like +enough. Where uncommon use-cases call for a non-standard +implementation configuration (e.g., a Measure instrument configured +with last-value aggregation), we accept that users will be required to +provide additional input on how to view certain metric data. ### Optional semantic restrictions -It is common to apply restrictions on the input range of metric -values passed to inputs `Add()`, `Record()`, and `Observe()`. As +The instruments support optional declarations that indicate +restrictions on the valid range of inputs. There are two options, one +to indicate whether the value is signed or not, the other to indicate +monotonicity. 
These options are meant to be used as a signal to the +observability system, since they impact the way these data are exposed +to users. + +In both cases, the optional restriction does not change the semantics +of the instrument. The options are independent, both can be +meaningfully set on any instrument kind. + +The specification describes enforcement of these options as "best +effort", not required. Users are expected to honor their own +declarations when using instruments, and the SDK is expected to +perform checking of these options only when it can be done +inexpensively. + +#### Absolute vs. Non-Absolute + +Absolute refers to whether an instrument accepts negative values. +Absolute instruments can be described as accepting non-negative +inputs, whereas non-absolute instruments can be described as accepting +signed inputs. + +When an instrument is absolute (i.e., accepts non-negative updates), +we know that the sum can be used to express a rate automatically. +This is true for all kinds of instrument. + +When exporting measure values as a histogram, for example, knowing the +instrument is absolute facilitates the use of logarithmic buckets +(which are difficult to use when the input range spans zero). + +Absolute behavior is the default for all instrument kinds. The +Non-Absolute option is supported for all instrument kinds. + +Because this is a simple property for the SDK to test, the +specification recommends that SDKs SHOULD reject metric events for +absolute instruments when negative values are used, and instead issue +a warning to the user. + +#### Monotonic vs. Non-Monotonic + +Monotonic refers to whether an instrument only accepts values that are +greater than or equal to the previously recorded value. Non-monotonic +instruments are those which accept any change in the value, positive +or negative. + +Absolute-valued counters are naturally monotonic, so that Absolute and +Monotonic have the same interpretation for Counter instruments. 
+ +Measure and Observer instruments may be declared as monotonic, however +since this property is expensive to test, the specification recommends +that SDKs SHOULD implement monotonicity checking only when computing a +last-value aggregation. The SDK SHOULD only perform this test against +the last known value when it holds the necessary information, it +should not go out of its way to save data simply to perform +monotonicity testing. + +### Option: Dedicated Measure for timer values + +As a language-optional feature, the API may support a dedicated +instrument for reporting timing measurements. This kind of +instrument, with recommended name `TimingMeasure` (and +`BoundTimingMeasure`), is semantically equivalent to a Measure +instrument, and like the Measure instrument supports a `Record()` +function, but the input value to this instrument is in the language's +conventional data type for timing measurements. + +For example, in Go the API will accept a `time.Duration`, and in C++ +the API will accept a `std::chrono::duration`. These advantage of +using these instruments is that they use the correct units +automatically, avoiding the potential for confusion over timing metrics. -@@@ HERE note that it's semantics, not required enforcement. +## Metric instrument selection -Generally, there is a question of whether the instrument can be used -to compute a rate, because that is usually a desirable analysis. Each -metric instrument offers an optional declaration, specifying -restrictions on values input to the metric. For example, Measures are -declared as non-negative by default, appropriate for reporting sizes -and durations; a Measure option is provided to record positive or -negative values, but it does not change the kind of instrument or the -method name used, as the semantics are unchanged. -@@@ Optional special case for timer instrument: SHOULD -## Metric instrument selection @@@ HERE add many more examples. 
From 5945fa3e33a970b3be70bf07fe46a97dfc9c8438 Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 29 Jan 2020 00:44:36 -0800 Subject: [PATCH 05/54] Update selection process --- specification/api-metrics.md | 159 ++++++++++++++--------------------- 1 file changed, 64 insertions(+), 95 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 9c9562ca3b3..aa3a81b4b07 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -140,7 +140,7 @@ Measure instruments are used to report individual measurements. Measure events are semantically independent and cannot be naturally combined like with Counters. Measure instruments are used to report many kinds of information, recommended for reporting measurements -associated with real events in a computer program. +associated with real events in the program. As a synchronous API, Measure `Record()` events can be used to record information with associated labels and context. This is the more @@ -291,99 +291,68 @@ the last known value when it holds the necessary information, it should not go out of its way to save data simply to perform monotonicity testing. -### Option: Dedicated Measure for timer values - -As a language-optional feature, the API may support a dedicated -instrument for reporting timing measurements. This kind of -instrument, with recommended name `TimingMeasure` (and -`BoundTimingMeasure`), is semantically equivalent to a Measure -instrument, and like the Measure instrument supports a `Record()` -function, but the input value to this instrument is in the language's -conventional data type for timing measurements. - -For example, in Go the API will accept a `time.Duration`, and in C++ -the API will accept a `std::chrono::duration`. These advantage of -using these instruments is that they use the correct units -automatically, avoiding the potential for confusion over timing metrics. - ## Metric instrument selection - - - -@@@ HERE add many more examples. 
- -To guide the user in selecting the right kind of metric for an -application, we'll consider the following questions about the primary -intent of reporting given data. We use "of primary interest" here to -mean information that is almost certainly useful in understanding -system behavior. Consider these questions: - -- Does the measurement represent a quantity of something? Is it also non-negative? -- Is the sum a matter of primary interest? -- Is the event count a matter of primary interest? -- Is the distribution (p50, p99, etc.) a matter of primary interest? - -With answers to these questions, a user should be able to select the -kind of metric instrument based on its primary purpose. - -### Counter - -Counters support `Add(value)`. Choose this kind of metric when the -value is a quantity, the sum is of primary interest, and the event -count and value distribution are not of primary interest. - -Counters are defined as `Monotonic = true` by default, meaning that -positive values are expected. `Monotonic = true` counters are -typically used because they can automatically be interpreted as a -rate. - -As an option, counters can be declared as `Monotonic = false`, in which -case they support positive and negative increments. `Monotonic = false` -counters are useful to report changes in an accounting scheme, such as -the number of bytes allocated and deallocated. - -### Gauge - -Gauges support `Set(value)`. Gauge metrics express a pre-calculated -value that is either Set() by explicit instrumentation or observed -through a callback. Generally, this kind of metric should be used -when the metric cannot be expressed as a sum or because the -measurement interval is arbitrary. Use this kind of metric when the -measurement is not a quantity, and the sum and event count are not of -interest. - -Gauges are defined as `Monotonic = false` by default, meaning that new -values are permitted to make positive or negative changes to the -gauge. 
There is no restriction on the sign of the input for gauges. - -As an option, gauges can be declared as `Monotonic = true`, in which case -successive values are expected to rise monotonically. `Monotonic = true` -gauges are useful in reporting computed cumulative sums, allowing an -application to compute a current value and report it, without -remembering the last-reported value in order to report an increment. - -A special case of gauge is supported, called an `Observer` metric -instrument, which is semantically equivalent to a gauge but uses a -callback to report the current value. Observer instruments are -defined by a callback, instead of supporting `Set()`, but the -semantics are the same. The only difference between `Observer` and -ordinary gauges is that their events do not have an associated -OpenTelemetry context. Observer instruments are `Monotonic = false` by -default and `Monotonic = true` as an option, like ordinary gauges. - -### Measure - -Measures support `Record(value)`, signifying that events report -individual measurements. This kind of metric should be used when the -count or rate of events is meaningful and either: - -- The sum is of interest in addition to the count (rate) -- Quantile information is of interest. - -Measures are defined as `Absolute = true` by default, meaning that -negative values are invalid. `Absolute = true` measures are typically -used to record absolute values such as durations and sizes. - -As an option, measures can be declared as `Absolute = false` to -indicate support for positive and negative values. +To guide the user in selecting the right kind of metric instrument for +an application, we'll consider several questions about the kind of +numbers being reported. Here are some ways to help choose. Examples +are provided in the following section. 
+ +### Counters and Measures compared + +Counters and Measures are both recommended for reporting measurements +taken during synchronous activity, driven by events in the program. +These measurements include an associated distributed context, the +effective span context (if any), the correlation context, and +user-provided LabelSet values. + +Start with an application for metrics data in mind. It is useful to +consider whether you are more likely to be interested in the sum of +values or the average value, as processed by the instrument. Counters +are useful when only the sum is interesting. Measures are useful when +the sum and any other kind of summary information about the individual +values are of interest. + +If only the sum is of interest, use a Counter instrument. If the +Counter instrument accepts non-negative `Add()` values, use a +(default) monotonic Counter which will typically be expressed as a +rate (i.e., change per unit time). If the Counter accepts both +positive and negative `Add()` values, use a non-monotonic Counter +which will typically be expressed as the total sum. + +If you are interested in any other kind of summary value or statistic, +such as mean, median and other quantiles, or minimum and maximum +value, use a Measure instrument. Measure instruments are used to +report any kind of measurement that is not typically expressed as a +rate or as a total sum. + +If the Measure instrument accepts only non-negative values, as is +typically the case for measuring physical quantities, use a (default) +absolute Measure. If the Measure instrument accepts both positive and +negative values, use a non-absolute Measure. Both of these are +typically expressed in terms of a distribution of values, independent +from and _in addition to_ the rate of these measurements. + +### Observer instruments + +Observer instruments are recommended for reporting measurements about +the state of the program at a moment in time. 
These expose current +information about the program itself, not related to individual events +taking place in the program. Observer instruments are reported +outside of a context, thus do not have an effective span context or +correlation context. + +Observer instruments are meant to be used when measured values report +on the current state of the program, as opposed to a change of state +in the program. When Observer instruments are used to report physical +quantities, use a (default) absolute Observer. When Observer +instruments are used to report measurements can be negative for any +reason, use a non-absolute Observer. + +If the Observer reports a current total sum, declare it as a monotonic +Observer. Monotonic values are typically expressed as a rate of +change. + +## Examples + +TODO(jmacd): Working on these 1/29/2020. \ No newline at end of file From da613042c59a50c868283b9a989c55b9b69cfc10 Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 29 Jan 2020 00:47:48 -0800 Subject: [PATCH 06/54] Update TOC --- specification/api-metrics.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index aa3a81b4b07..59d659b3e69 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -7,11 +7,18 @@ * [Meter API](#meter-api) * [Purpose of this document](#purpose-of-this-document) - [Metric API / SDK separation](#metric-api--sdk-separation) - * [Justification for three kinds of instrument](#justification-for-three-kinds-of-instrument) - * [Metric instrument selection](#metric-instrument-selection) - * [Counter](#counter) - * [Gauge](#gauge) - * [Measure](#measure) + * [Three kinds of instrument](#three-kinds-of-instrument) + + [Counter](#counter) + + [Measure](#measure) + + [Observer](#observer) + * [Standard Interpretation](#standard-interpretation) + * [Optional semantic restrictions](#optional-semantic-restrictions) + + [Absolute vs. 
Non-Absolute](#absolute-vs-non-absolute) + + [Monotonic vs. Non-Monotonic](#monotonic-vs-non-monotonic) +- [Metric instrument selection](#metric-instrument-selection) + * [Counters and Measures compared](#counters-and-measures-compared) + * [Observer instruments](#observer-instruments) +- [Examples](#examples) From cc9caef94728e97ed2fdb79d113746c44219f2fb Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 29 Jan 2020 10:22:03 -0800 Subject: [PATCH 07/54] Reword intro para on standard impl --- specification/api-metrics.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 59d659b3e69..4c1829ef0ce 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -68,15 +68,15 @@ accomplishes its goals and the export capabilities it supports are specified for the default SDK in the (Metric SDK specification WIP)[#WIP-spec-issue-347]. -The standard interpretation for `Meter` implementations to follow is -given so that users understand the intended use for each kind of -metric. For example, a Counter instrument supports `Add()` events, -and the default implementation is to compute a sum. The sum may be -exported as an absolute value or as the change in value, but -regardless of the exporter and the implementation, the purpose of -using a Counter with `Add()` is to monitor a sum. A detailed -explanation for how to select metric instruments for common use-cases -is given below, according to the semantics defined next. +The standard implementation for `Meter` implementations to follow is +given, to aid with understanding the different instrument use-cases. +For example, a Counter instrument supports `Add()` events, and the +standard implementation is to compute a sum. 
The sum may be exported +as a grand total or as the change in the grand total, but regardless +of the exporter and specific techniques, the purpose of using a +Counter with `Add()` is to monitor a sum. A detailed explanation for +how to select metric instruments for common use-cases is given below, +according to the semantics defined next. ### Purpose of this document From 210491e2b9b832cc26bad7f32061fba0a9ea304f Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 29 Jan 2020 10:34:44 -0800 Subject: [PATCH 08/54] Add detail on label set --- specification/api-metrics.md | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 4c1829ef0ce..1a61a930a21 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -58,7 +58,7 @@ To produce measurements using an instrument, you need an SDK that supports the `Meter` API, which consists of a set of constructors, the `Labels` function for building label sets, and the `RecordBatch` function for batch reporting. Refer to the [sdk-facing OpenTelemetry -API specification](api-metrics-user.md) for more implementation notes. +API specification](api-metrics-meter.md) for more implementation notes. Because of API-SDK separation, the `Meter` implementation ultimately determines how metrics events are handled. The specification's task @@ -95,8 +95,8 @@ convention, compared with a number of common metric libraries, and stems from the separation of the API and the SDK. The SDK ultimately determines how to handle metric events and could potentially implement non-standard behavior. All metric events can be represented as -consisting of a timestamp, an instrument, a number (the value), and a -label set. The semantics defined here are meant to assist both +consisting of a timestamp, an instrument, a number (the _value_), and +a label set. 
The semantics defined here are meant to assist both application and SDK implementors, and examples will be given below. The separation of API and SDK explains why the metric API does not @@ -118,6 +118,21 @@ instrument will be used to expose the last value, not a semantic definition. Uses of traditional gauge instruments translate into an Observer or the Measure instrument in this API. +### Label sets + +_Label_ is the term used to refer to a key:value attribute associated +with a metric event. Although they are fundamentally similar to [Span +attributes](api-tracing.md#span) in the tracing API, a set of labels +is given its own type in the Metric API, `LabelSet`. Label sets are a +feature of the API to facilitate re-use across the metric API. Users +are encouraged to re-use label sets whenever possible, as they may +contain a previously encoded representation of the data. + +Users obtain label sets by calling the `Meter` API function (e.g., +`Meter.GetLabels({ key, value }, ...)`). Each of the instrument +calling conventions detailed in the [user-facing API +specification](api-metrics-user.md) accepts a `LabelSet`. + ### Three kinds of instrument #### Counter From 891759e7ba18fbab4955dc97812cb588530be05c Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 30 Jan 2020 14:48:05 -0800 Subject: [PATCH 09/54] Add examples --- specification/api-metrics.md | 119 ++++++++++++++++++++++++++++++++--- 1 file changed, 111 insertions(+), 8 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 1a61a930a21..c52f9255734 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -115,8 +115,8 @@ device (e.g., a speedometer on your car's dashboard). The problem with "Gauge" starts from the term itself, which is figurative in nature. Using the word "gauge" suggests a behavior, that the instrument will be used to expose the last value, not a semantic -definition. 
Uses of traditional gauge instruments translate into an -Observer or the Measure instrument in this API. +definition. Uses of traditional gauge instruments translate into uses +of Observer or Measure instruments in this API. ### Label sets @@ -137,11 +137,12 @@ specification](api-metrics-user.md) accepts a `LabelSet`. #### Counter -Counter instruments are used to report sums. These are sometimes used -to monitor _rates_, and they are sometimes used to report _totals_. -One key property of Counter instruments is that two `Add(1)` events -are semantically equivalent to one `Add(2)` event, meaning that -Counter events can be combined using addition by definition. +Counter instruments are used to report sums. These are commonly used +to monitor rates, and they are sometimes used to report totals. One +key property of Counter instruments is that two `Add(1)` events are +semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` +is equivalent to `Add(m+n)`. This means that Counter events can be +combined using addition by definition. Labels associated with Counter instrument events can be used to compute rates and totals over selected dimensions. When aggregating @@ -377,4 +378,106 @@ change. ## Examples -TODO(jmacd): Working on these 1/29/2020. \ No newline at end of file +### Reporting bytes read and written + +You wish to monitor the number of bytes read and written from a +messaging server that supports several protocols. The number of bytes +read and written should be labeled with the protocol name and +aggregated in the process. + +This is a typical application for the Counter instrument. Use one +Counter for bytes read and one Counter for bytes written. When +handling a request, compute a LabelSet containing the name of the +protocol and potentially other useful labels, then call `Add()` twice +with the same label set and the number of bytes read and written. 
+ +To make lower the cost of this reporting, you can `Bind()` the +instrument with each of the supported protocols ahead of time and +avoid computing the label set for each request. + +### Reporting per-request CPU usage + +Suppose you have a way to measure the CPU usage of processing an +individual request. This is given to you in terms of cpu-seconds +consumed. You may wish to monitor total CPU usage, or you could be +interested in the peak rate of CPU usage. + +Use a Counter instrument to `Add()` this quantity to an instrument +named `cpu.seconds.used` after sending the response. A Counter is +called for, in this case, because a sum is requested, meaning a sum of +all `Add()` events for the instrument in the specified time range. + +### Reporting system call duration + +You wish to monitor the duration of a specific system call being made +frequently in your application, with a label to indicate a file name +associated with the operation. + +This is a typical application for the Measure instrument. Use a timer +to measure the duration of each call and `Record()` the measurement +with a label for the file name. + +### Reporting request size + +You wish to monitor a trend in request sizes, which means you are +interested in characterizing individual events, as opposed to a sum. +Label these with relevant information that may help explain variance +in request sizes, such as the type of the request. + +This is a typical application for a Measure instrument. The standard +aggregation for Measure instruments will compute a measurement sum and +the event count, which determines the mean request size, as well as +the minimum and maximum sizes. + +### Reporting a per-request finishing account balance + +There's a number that rises and falls such as a bank account balance. +You wish to monitor the average account balance at the end of +requests, broken down by transaction type (e.g., withdrawal, deposit). 
+ +Use a Measure instrument to report the current account balance at the +end of each request. Use a label for the transaction type. + +### Reporting process-wide CPU usage + +You are interested in reporting the CPU usage of the process as a +whole, which is computed via a (relatively expensive) system call +which returns two values, process-lifetime user and system +cpu-seconds. It is not necessary to update this measurement +frequently, because it is meant to be used only for accounting +purposes. + +A single Observer instrument is recommended for this case, with a +label value to distinguish user from system CPU time. Declare this as +a monotonic instrument, since CPU usage never falls. The Observer +callback will be called once per collection interval, which lowers the +cost of collecting this information. + +CPU usage is something that we naturally sum, which raises several +questions. + +- Why not use a Counter instrument? In order to use a Counter +instrument, we would need to convert total usage figures into +differences. Calculating differences from the previous measurement is +easy to do, but Counter instruments are not meant to be used from +callbacks. +- Why not report differences in the Observer callback? Observer +instruments are meant to be used to observe current values. Nothing +prevents reporting differences with an Observer, but the standard +aggregation for Observer instruments is to sum the current value +across distinct label sets. The standard behavior is useful for +determining the current rate of CPU usage, but special configuration +would be required for an Observer instrument to use Counter +aggregation. + +### Reporting per-shard memory holdings + +Suppose you have a widely-used library that acts as a client to a +sharded service. For each shard it maintains some client-side state, +holding a variable amount of memory per shard. + +Observe the current allocation per shard using an Observer instrument +with a shard label. 
These can be aggregated across hosts to compute +cluster-wide memory holdings by shard, for example, using the standard +aggregation for Observers, which sums the current value across +distinct label sets. From a68d28c4e60df2ad9a92e049e4b8630b56f6dd58 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 4 Feb 2020 12:53:03 -0800 Subject: [PATCH 10/54] Misspellings --- specification/api-metrics.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index c52f9255734..2e6f65e160a 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -105,7 +105,7 @@ values, for example "Histogram" and "Summary" values, which are different ways to expose a distribution of measurements. This API specifies the use of a Measure kind of instrument with `Record()` for recording individual measurements. The instruments are defined so -that users and implementors understand what they mean, beacuse +that users and implementors understand what they mean, because different SDKs will handle events differently. There is a common metric instrument known as a "Gauge" that is not @@ -369,7 +369,7 @@ Observer instruments are meant to be used when measured values report on the current state of the program, as opposed to a change of state in the program. When Observer instruments are used to report physical quantities, use a (default) absolute Observer. When Observer -instruments are used to report measurements can be negative for any +instruments are used to report measurements that can be negative for any reason, use a non-absolute Observer. If the Observer reports a current total sum, declare it as a monotonic @@ -391,7 +391,7 @@ handling a request, compute a LabelSet containing the name of the protocol and potentially other useful labels, then call `Add()` twice with the same label set and the number of bytes read and written. 
-To make lower the cost of this reporting, you can `Bind()` the +To lower the cost of this reporting, you can `Bind()` the instrument with each of the supported protocols ahead of time and avoid computing the label set for each request. From 610056bdfa406b07747bb1630d5cef184af91a0a Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 5 Feb 2020 00:16:41 -0800 Subject: [PATCH 11/54] Rewrite the introductory material --- specification/api-metrics.md | 239 +++++++++++++++++++---------------- 1 file changed, 131 insertions(+), 108 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 2e6f65e160a..5302c29ea88 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -24,118 +24,140 @@ ## Overview -The OpenTelemetry Metrics API supports producing diagnostic -measurements using three basic kinds of instrument. "Metrics" are the -thing being produced--mathematical, statistical summaries of certain -observable behaviors in the program. "Instruments" are the devices -used by the program to record observations about their behavior. -Therefore, we use "metric instrument" to refer to a programmatic -interface, allocated through the API, used to produce metric events. -There are three kinds of instruments known as Counters, Measures, and -Observers. - -Monitoring and alerting are the common use-case for the data provided -through metric instruments, after various collection and aggregation -strategies are applied to the data. We find there are many other uses -for the _metric events_ recorded through these instruments. We -imagine metric data being aggregated and recorded as events in tracing -and logging systems too, and for this reason OpenTelemetry requires a -separation of the API from the SDK. - -### User-facing API - -The user-facing OpenTelemetry API for Metrics begins with a `Meter` -interface, usually obtained through dependency injection or a global -instance. 
The `Meter` API supports defining new metric instruments. -Review the [user-facing OpenTelemetry API -specification](api-metrics-user.md) for more detail about the variety -of methods, options, and optimizations available for users of the -instrumentation API and how to use the instruments defined here. - -### Meter API - -To produce measurements using an instrument, you need an SDK that -supports the `Meter` API, which consists of a set of constructors, the -`Labels` function for building label sets, and the `RecordBatch` -function for batch reporting. Refer to the [sdk-facing OpenTelemetry -API specification](api-metrics-meter.md) for more implementation notes. - -Because of API-SDK separation, the `Meter` implementation ultimately -determines how metrics events are handled. The specification's task -is to define the semantics of the events in high-level terms, so that -users and implementors can agree on their meaning. How the `Meter` -accomplishes its goals and the export capabilities it supports are -specified for the default SDK in the (Metric SDK specification -WIP)[#WIP-spec-issue-347]. - -The standard implementation for `Meter` implementations to follow is -given, to aid with understanding the different instrument use-cases. -For example, a Counter instrument supports `Add()` events, and the -standard implementation is to compute a sum. The sum may be exported -as a grand total or as the change in the grand total, but regardless -of the exporter and specific techniques, the purpose of using a -Counter with `Add()` is to monitor a sum. A detailed explanation for -how to select metric instruments for common use-cases is given below, -according to the semantics defined next. 
- -### Purpose of this document - -This document gives an overview of the specification, introduces the -the three kinds of instrument, and discusses how end-users should -think about various instruments and options at a high-level, without -getting into detail about specific function calls. For details about -specific function calls, refer to the detailed specifications linked -above. - -## Metric API / SDK separation - -The API distinguishes metric instruments by semantic meaning, not by -the type of value produced in an exporter. This is a departure from -convention, compared with a number of common metric libraries, and -stems from the separation of the API and the SDK. The SDK ultimately -determines how to handle metric events and could potentially implement -non-standard behavior. All metric events can be represented as -consisting of a timestamp, an instrument, a number (the _value_), and -a label set. The semantics defined here are meant to assist both -application and SDK implementors, and examples will be given below. - -The separation of API and SDK explains why the metric API does not -have metric instruments that generate specific metric "exposition" -values, for example "Histogram" and "Summary" values, which are -different ways to expose a distribution of measurements. This API -specifies the use of a Measure kind of instrument with `Record()` for -recording individual measurements. The instruments are defined so -that users and implementors understand what they mean, because -different SDKs will handle events differently. - -There is a common metric instrument known as a "Gauge" that is not -included in this API, the term "Gauge" referring to an instrument, -often mechanical, for reading the current "last" value of a measuring -device (e.g., a speedometer on your car's dashboard). The problem -with "Gauge" starts from the term itself, which is figurative in -nature. 
Using the word "gauge" suggests a behavior, that the -instrument will be used to expose the last value, not a semantic -definition. Uses of traditional gauge instruments translate into uses -of Observer or Measure instruments in this API. +The OpenTelemetry Metrics API supports capturing measurements about +the execution of a computer program in real time. The Metrics API is +designed explicitly for processing raw measurements, generally with +the intent to produce continuous summaries those measurements, also in +real time. Hereafter, "the API" refers to the OpenTelemetry Metrics +API unless otherwise specified. + +The API provides functions for entering raw measurements, through +several [calling conventions](TODO: link to user doc) that offer +different levels of performance. Regardless of calling convention, we +define a _metric event_ as the logical thing that happens when a new +measurement is entered. + +Monitoring and alerting systems commonly use the data provided through +metric events, after applying various [aggregations](#aggregations) +and converting into various [exposition formats](#exposition-formats). +However, we find that there are many other uses for metric events, +such as to record the aggregated or raw data in tracing and logging +systems. For this reason, [OpenTelemetry requires a separation of the +API from the SDK](library-guidelines.md#requirements), so that +different SDKs can be configured at runtime. + +The word "semantic" or "semantics" as used here refers to _how we give +meaning_ to metric events, as they take place under the API. The term +is used extensively in this document to define and explain these API +functions and how we should interpret them. As far as possible, the +terminology used here tries to convey the intended semantics, and a +_standard implementation_ will be described below to help us +understand their meaning. 
The standard implementation defined here +corresponds to the behavior of the default [OpenTelemetry Metrics +Default SDK](TODO: after PR #347 merges). + +### Metric Instruments + +A _metric instrument_, of which there are three kinds, is a device for +entering raw measurements into the API. There are Counter, Measure, +and Observer instruments, each with different semantics and intended +uses, that will be specified here. All measurements that enter the +API are associated with an instrument, which gives the measurement its +properties. Instruments are created and defined through calls to a +`Meter` API, which is the user-facing entry point to the SDK. + +Each kind of metric instrument has its own semantics, briefly +described as: + +- Counter: metric events of this kind _Add_ to a value that you would sum over time +- Measure: metric events of this kind _Record_ a value that you would average over time +- Observer: metric events of this kind _Observe_ a coherent set of values at an instant in time. + +An _instrument definition_ describes several properties of the +instrument, including its name and its kind. The other properties of +a metric instrument are optional, including its description, the unit +of measurement, and several settings convey additional meaning, such +as indicating that it is appropriate to compute a rate over time. An +instrument definition is associated with the events that it produces. + +Details about calling conventions for each kind of instrument are +covered in the [user-level API specification](api-metrics-user.md). ### Label sets -_Label_ is the term used to refer to a key:value attribute associated +_Label_ is the term used to refer to a key-value attribute associated with a metric event. Although they are fundamentally similar to [Span -attributes](api-tracing.md#span) in the tracing API, a set of labels -is given its own type in the Metric API, `LabelSet`.
Label sets are a -feature of the API to facilitate re-use across the metric API. Users -are encouraged to re-use label sets whenever possible, as they may -contain a previously encoded representation of the data. +attributes](api-tracing.md#span) in the tracing API, a [label +set](TODO: link to user doc) is given its own type in the Metrics API +(generally: `LabelSet`). Label sets are a feature of the API meant to +facilitate re-use, to lower the cost of processing metric events. +Users are encouraged to re-use label sets whenever possible, as they +may contain a previously encoded representation of the labels. + +Users obtain label sets by calling a `Meter` API function. Each of +the instrument calling conventions detailed in the [user-level API +specification](api-metrics-user.md) accepts a label set. + +### Meter Interface + +To produce measurements using an instrument, you need an SDK that +supports the `Meter` API, which consists of a set of instrument +constructors, functionality related to label sets, and a facility for +reporting batches of measurements in a semantically atomic way. + +There is a global `Meter` instance available for use. Use of this +instance allows library code that uses it to be automatically enabled +whenever the main application configures an SDK at the global level. + +Details about installing an SDK and obtaining a `Meter` are covered in +the [SDK-level API specification](api-metrics-meter.md). + +### Aggregations + +_Aggregation_ refers to the process of combining a large number of +measurements into exact or estimated statistics about the metric +events that took place during a window of real time, during program +execution. Computing aggregations is mainly a subject of the SDK +specification. + +The semantic connection between the API and the SDK is established +here, through the interpretation given by the standard implementation.
+In the standard implementation: + +- Counter instruments use _Sum_ aggregation +- Measure instruments use _MinMaxSumCount_ aggregation +- Observer instruments use _LastValue_ aggregation. + +The default Metric SDK specification includes support for configuring +alternative aggregations, so that metric instruments can be repuposed +and their data can be examined in different ways. Using the default +SDK, or an alternate one, we are able to change the interpretation of +metric events at runtime. Other aggregations are available, +especially for Measure instruments, where we are generally interested +in a variety of forms of statistics, such as histogram and quantile +summaries. + +### Metric Event Format + +Metric events have the same logical representation, regardless of +kind. Whether a Counter, a Measure, or an Observer instrument, metric +events produced through an instrument consist of: + +- an implicit timestamp at the moment the API function is called +- the instrument definition +- a value (numeric) +- a label set +- a [Context](api-context.md) -Users obtain label sets by calling the `Meter` API function (e.g., -`Meter.GetLabels({ key, value }, ...)`). Each of the instrument -calling conventions detailed in the [user-facing API -specification](api-metrics-user.md) accepts a `LabelSet`. +Because metric events are implicitly timestamped, we could refer to a +series of metric events as a _time series_. However, we reserve the +use of this term for the SDK specification, to refer to parts of a +data format that express explicitly timestamped values, in sequence, +resulting from an aggregation over time. -### Three kinds of instrument +## Three kinds of instrument -#### Counter +### Counter Counter instruments are used to report sums. These are commonly used to monitor rates, and they are sometimes used to report totals. One @@ -157,7 +179,7 @@ sums that rise and fall. Examples: requests processed, bytes read or written, memory allocated and deallocated. 
-#### Measure +### Measure Measure instruments are used to report individual measurements. Measure events are semantically independent and cannot be naturally @@ -177,7 +199,7 @@ not absolute, supporting both postive and negative values. Examples: request latency, number of query terms, temperature, fan speed, account balance, load average, screen width. -#### Observer +### Observer Observer instruments are used to report a current set of values at the time of collection. Observer instruments report not only current @@ -204,7 +226,7 @@ would be. Examples: memory held per shard, queue size by name. -### Standard Interpretation +## Interpretation We believe the three instrument kinds Counter, Measure, and Observer form a sufficient basis for expressing nearly all metric data. But if @@ -362,6 +384,7 @@ Observer instruments are recommended for reporting measurements about the state of the program at a moment in time. These expose current information about the program itself, not related to individual events taking place in the program. Observer instruments are reported +b outside of a context, thus do not have an effective span context or correlation context. From 4f85830e613539cdd97b0dead2417735c0faedb8 Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 5 Feb 2020 16:50:50 -0800 Subject: [PATCH 12/54] Add a section on time --- specification/api-metrics.md | 94 ++++++++++++++++++++++++++---------- 1 file changed, 68 insertions(+), 26 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 5302c29ea88..666aa583190 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -1,4 +1,4 @@ -# Metrics API + # Metrics API @@ -27,9 +27,9 @@ The OpenTelemetry Metrics API supports capturing measurements about the execution of a computer program in real time. The Metrics API is designed explicitly for processing raw measurements, generally with -the intent to produce continuous summaries those measurements, also in -real time. 
Hereafter, "the API" refers to the OpenTelemetry Metrics -API unless otherwise specified. +the intent to produce continuous summaries of those measurements, also +in real time. Hereafter, "the API" refers to the OpenTelemetry +Metrics API. The API provides functions for entering raw measurements, through several [calling conventions](TODO: link to user doc) that offer @@ -41,10 +41,10 @@ Monitoring and alerting systems commonly use the data provided through metric events, after applying various [aggregations](#aggregations) and converting into various [exposition formats](#exposition-formats). However, we find that there are many other uses for metric events, -such as to record the aggregated or raw data in tracing and logging -systems. For this reason, [OpenTelemetry requires a separation of the -API from the SDK](library-guidelines.md#requirements), so that -different SDKs can be configured at runtime. +such as to record aggregated or raw measurements in tracing and +logging systems. For this reason, [OpenTelemetry requires a +separation of the API from the SDK](library-guidelines.md#requirements), +so that different SDKs can be configured at runtime. The word "semantic" or "semantics" as used here refers to _how we give meaning_ to metric events, as they take place under the API. The term @@ -75,10 +75,10 @@ described as: An _instrument definition_ describes several properties of the instrument, including its name and its kind. The other properties of -a metric instrument are optional, including its description, the unit -of measurement, and several settings convey additional meaning, such -as indicating that it is appropriate to compute a rate over time. An -instrument definition is associated with the events that it produces. +a metric instrument are optional, including a description, the unit of +measurement, and several settings that convey additional meaning +(e.g., monotonicity). An instrument definition is associated with the +events that it produces. 
Details about calling conventions for each kind of instrument are covered in the [user-level API specification](api-metrics-user.md). @@ -120,9 +120,10 @@ events that took place during a window of real time, during program execution. Computing aggregations is mainly a subject of the SDK specification. -The semantic connection between the API and the SDK is established -here, through the interpretation given by the standard implementation. -In the standard implementation: +Users do not have a facility in the API to select the aggregation they +want for particular instruments. The choice of instrument dictates +semantics and thus gives a default interpretation. For the standard +implementation: - Counter instruments use _Sum_ aggregation - Measure instruments use _MinMaxSumCount_ aggregation @@ -132,10 +133,48 @@ The default Metric SDK specification includes support for configuring alternative aggregations, so that metric instruments can be repuposed and their data can be examined in different ways. Using the default SDK, or an alternate one, we are able to change the interpretation of -metric events at runtime. Other aggregations are available, -especially for Measure instruments, where we are generally interested -in a variety of forms of statistics, such as histogram and quantile -summaries. +metric events at runtime. + +Other standard aggregations are available, especially for Measure +instruments, where we are generally interested in a variety of forms +of statistics, such as histogram and quantile summaries. + +### Time + +Time is a fundamental property of metric events, but not an explicit +one. Users do not provide explicit timestamps for metric events. The +SDK is welcome to, but not required to capture the current timestamp +for each event by reading from a clock. 
+ +This non-requirement stems from a common optimization in metrics +reporting, which is to configure metric data collection with a +relatively small period (e.g., 1 second) and use a single timestamp to +describe a batch of exported data, since the loss of precision is +insignificant when aggregating data across minutes or hours of data. + +Aggregations are commonly computed over a series of events that fall +into a contiguous region of time, known as the collection interval. +Since the SDK controls decision to start collection, it is possible to +collect aggregated metric data while only reading the clock once per +collection interval. + +Counter and Measure instruments offer synchronous APIs for entering +measurements. Metric events from Counter and Measure instruments +happen when they happen, the moment the SDK receives the function +call. + +The Observer instrument supports an asynchronous API, allowing the SDK +to collect metric data on demand, once per collection interval. A +single Observer instrument callback can enter multiple metric events +associated with different label sets. Semantically, by definition, +these observations are captured at a single instant in time, the +instant that they became the current set of last-measured values. + +Because metric events are implicitly timestamped, we could refer to a +series of metric events as a _time series_. However, we reserve the +use of this term for the SDK specification, to refer to parts of a +data format that express explicitly timestamped values, in a sequence, +resulting from an aggregation of raw measurements over time. ### Metric Event Format @@ -144,19 +183,22 @@ kind. 
Whether a Counter, a Measure, or an Observer instrument, metric events produced through an instrument consist of: - an implicit timestamp at the moment the API function is called -- the instrument definition +- the instrument definition (name, kind, and options) - a value (numeric) - a label set - a [Context](api-context.md) -Because metric events are implicitly timestamped, we could refer to a -series of metric events as a _time series_. However, we reserve the -use of this term for the SDK specification, to refer to parts of a -data format that express explicitly timestamped values, in sequence, -resulting from an aggregation over time. +This is the outcome of separating the API from the SDK--a common +representation for metric events, where the only semantic distinction +is the kind of instrument that was used. ## Three kinds of instrument +Because of API-SDK separation, the `Meter` implementation ultimately +determines how metrics events are handled. Therefore, the choice of +instrument should be guided by semantics and the intended +interpretation. Here we detail the three instruments. + ### Counter Counter instruments are used to report sums. These are commonly used @@ -164,7 +206,7 @@ to monitor rates, and they are sometimes used to report totals. One key property of Counter instruments is that two `Add(1)` events are semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` is equivalent to `Add(m+n)`. This means that Counter events can be -combined using addition by definition. +combined using addition, by definition. Labels associated with Counter instrument events can be used to compute rates and totals over selected dimensions. 
When aggregating From 23a3430f51cfefe3fb2804c96e1258965a2c1a51 Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 5 Feb 2020 16:58:43 -0800 Subject: [PATCH 13/54] Clarify 'standard implementation' and 'default interpretation' --- specification/api-metrics.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 666aa583190..81a0dfbf4d0 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -52,9 +52,9 @@ is used extensively in this document to define and explain these API functions and how we should interpret them. As far as possible, the terminology used here tries to convey the intended semantics, and a _standard implementation_ will be described below to help us -understand their meaning. The standard implementation defined here -corresponds to the behavior of the default [OpenTelemetry Metrics -Default SDK](TODO: after PR #347 merges). +understand their meaning. The standard implementation performs +aggregation corresponding to the default interpretation for each kind +of metric event. ### Metric Instruments @@ -185,8 +185,8 @@ events produced through an instrument consist of: - an implicit timestamp at the moment the API function is called - the instrument definition (name, kind, and options) - a value (numeric) +- a [Context](api-context.md) (span context, correlation context) - a label set -- a [Context](api-context.md) This is the outcome of separating the API from the SDK--a common representation for metric events, where the only semantic distinction @@ -194,10 +194,10 @@ is the kind of instrument that was used. ## Three kinds of instrument -Because of API-SDK separation, the `Meter` implementation ultimately -determines how metrics events are handled. Therefore, the choice of -instrument should be guided by semantics and the intended -interpretation. Here we detail the three instruments. 
+Because the API is separated from the SDK, the implementation +ultimately determines how metric events are handled. Therefore, the +choice of instrument should be guided by semantics and the intended +interpretation. Here we detail the three instruments and their semantics. ### Counter @@ -206,7 +206,8 @@ to monitor rates, and they are sometimes used to report totals. One key property of Counter instruments is that two `Add(1)` events are semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` is equivalent to `Add(m+n)`. This means that Counter events can be -combined using addition, by definition. +combined using addition, by definition, which makes them relatively +inexpensive Labels associated with Counter instrument events can be used to compute rates and totals over selected dimensions. When aggregating From c97fefda9c23b72d6e386307b667dd718c780ece Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 5 Feb 2020 21:22:05 -0800 Subject: [PATCH 14/54] Explain aggregations; Discourage timestamp use --- specification/api-metrics.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 81a0dfbf4d0..3d99275e7a2 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -118,7 +118,8 @@ _Aggregation_ refers to the process of combining a large number of measurements into exact or estimated statistics about the metric events that took place during a window of real time, during program execution. Computing aggregations is mainly a subject of the SDK -specification. +specification, with the goal of reducing the amount of data that must +be sent to the telemetry collection backend. Users do not have a facility in the API to select the aggregation they want for particular instruments. The choice of instrument dictates @@ -142,9 +143,10 @@ of statistics, such as histogram and quantile summaries. 
### Time Time is a fundamental property of metric events, but not an explicit -one. Users do not provide explicit timestamps for metric events. The -SDK is welcome to, but not required to capture the current timestamp -for each event by reading from a clock. +one. Users do not provide explicit timestamps for metric events. +SDKs are discouraged from capturing the current timestamp for each +event (by reading from a clock) unless there is a definite need for +high-precision timestamps. This non-requirement stems from a common optimization in metrics reporting, which is to configure metric data collection with a @@ -156,11 +158,11 @@ Aggregations are commonly computed over a series of events that fall into a contiguous region of time, known as the collection interval. Since the SDK controls decision to start collection, it is possible to collect aggregated metric data while only reading the clock once per -collection interval. +collection interval. The default SDK takes this approach. Counter and Measure instruments offer synchronous APIs for entering -measurements. Metric events from Counter and Measure instruments -happen when they happen, the moment the SDK receives the function +measurements. Metric events from Counter and Measure instruments are +captured when they happen, the moment the SDK receives the function call. The Observer instrument supports an asynchronous API, allowing the SDK From ab5088c0e68b1c7bea2bb859c0f7026563e9b720 Mon Sep 17 00:00:00 2001 From: jmacd Date: Wed, 5 Feb 2020 22:20:25 -0800 Subject: [PATCH 15/54] Reword the metric event format --- specification/api-metrics.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 3d99275e7a2..d84ba3ac5ae 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -162,8 +162,8 @@ collection interval. The default SDK takes this approach. 
Counter and Measure instruments offer synchronous APIs for entering measurements. Metric events from Counter and Measure instruments are -captured when they happen, the moment the SDK receives the function -call. +captured at the moment they happen, when the SDK receives the +corresponding function call. The Observer instrument supports an asynchronous API, allowing the SDK to collect metric data on demand, once per collection interval. A @@ -184,15 +184,15 @@ Metric events have the same logical representation, regardless of kind. Whether a Counter, a Measure, or an Observer instrument, metric events produced through an instrument consist of: -- an implicit timestamp at the moment the API function is called -- the instrument definition (name, kind, and options) -- a value (numeric) -- a [Context](api-context.md) (span context, correlation context) -- a label set +- [Context](context.md) (Span context, Correlation context) +- timestamp (implicit to the SDK) +- instrument definition (name, kind, and semantic options) +- label set (associated key-values) +- value (a number) -This is the outcome of separating the API from the SDK--a common +This format is the result of separating the API from the SDK--a common representation for metric events, where the only semantic distinction -is the kind of instrument that was used. +is the kind of instrument that was specified by the user. ## Three kinds of instrument From da8f18957a36f3886cd4772e8198c109a1909387 Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 6 Feb 2020 00:35:03 -0800 Subject: [PATCH 16/54] Update Counter and Measure --- specification/api-metrics.md | 98 +++++++++++++++++++++++------------- 1 file changed, 64 insertions(+), 34 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index d84ba3ac5ae..ad6a9a7cd9e 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -199,50 +199,80 @@ is the kind of instrument that was specified by the user. 
Because the API is separated from the SDK, the implementation ultimately determines how metric events are handled. Therefore, the choice of instrument should be guided by semantics and the intended -interpretation. Here we detail the three instruments and their semantics. +interpretation. Here we detail the three instruments and their +individual semantics. + +Metric instruments can be qualified in several ways, primarily related +to the range of values that are semantically meaningful for the +specific instrument. _Range options_, as they are called, indicate +additional semantics that are commonly useful to the SDK and systems +based on telemetry data. + +Normally, without setting any explicit options, instruments accept +non-negative values by default, called _Absolute_ instruments. An +explicit option to declare support for both positive and negative values +exists; these are called _Unrestricted_ instruments. + +Out-of-range values are considered semantically meaningless. SDKs +SHOULD implement data validation for Absolute (as opposed to +Unrestricted) instruments by dropping negative values. (The [default +SDK](TODO: link) MUST implement data validation for Absolute +instruments.) ### Counter -Counter instruments are used to report sums. These are commonly used -to monitor rates, and they are sometimes used to report totals. One -key property of Counter instruments is that two `Add(1)` events are -semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` -is equivalent to `Add(m+n)`. This means that Counter events can be -combined using addition, by definition, which makes them relatively -inexpensive +Counter instruments are used to enter changes in sums, synchronously. +These are commonly used to monitor rates, and they are sometimes used +to report totals that rise and fall. An essential property of Counter +instruments is that two `Add(1)` events are semantically equivalent to +one `Add(2)` event--`Add(m)` and `Add(n)` is equivalent to `Add(m+n)`.
+This property means that Counter events can be combined using +addition, by definition, which makes them relatively inexpensive. Labels associated with Counter instrument events can be used to -compute rates and totals over selected dimensions. When aggregating -Counter events we naturally combine values using addition. Counter -`Add(0)` events are no-ops by definition. +compute rates and totals from the instrument, over selected +dimensions. When aggregating Counter events, we naturally combine +values using arithmetic addition. Counter `Add(0)` events are no-ops, +by definition. -Counters are monotonic by default, meaning `Add()` logically accepts -only non-negative values. Monotonicity is useful for defining rates, -especially; non-monotonic Counter instruments are an option to support -sums that rise and fall. +#### Range options -Examples: requests processed, bytes read or written, memory allocated -and deallocated. +Counter instruments are Absolute by default, meaning `Add()` logically +accepts only non-negative values. Absolute counters are commonly +useful for defining rates. The sum of an absolute counter is +monotonic because it never adds a negative amount. Examples: requests +processed, bytes read or written. -### Measure - -Measure instruments are used to report individual measurements. -Measure events are semantically independent and cannot be naturally -combined like with Counters. Measure instruments are used to report -many kinds of information, recommended for reporting measurements -associated with real events in the program. +Unrestricted Counter instruments are an option to support sums that +rise and fall, useful in selected cases where the report value changes +to an sum that may be unknown to the observer. Examples: semaphore +up/down, channel enqueue/deque. -As a synchronous API, Measure `Record()` events can be used to record -information with associated labels and context. 
This is the more -general of the two synchronous metric instruments. - -Measure instruments support non-negative values by default, also known -as "absolute" in the sense that, mathematically, absolute values are -never negative. As an option, Measure instruments can be defined as -not absolute, supporting both postive and negative values. +### Measure -Examples: request latency, number of query terms, temperature, fan -speed, account balance, load average, screen width. +Semantically, metric events from Measure instruments are independent, +meaning they cannot be combined naturally, as with Counters. Measure +instruments are used to report many kinds of information, and are +recommended for all cases where the additive property of Counter +instruments does not apply. + +Labels associated with Measure instrument events can be used to +compute information about the distribution of values from the +instrument, over selected dimensions. When aggregating Measure +events, the output statistics are expected to reflect the combined +data set. + +#### Range options + +Measure instruments are Absolute by default, meaning `Record()` +logically accepts only non-negative values. Absolute measures are +commonly used for measuring things like latency and size that are +never negative. Examples: request latency, request size, screen +width. + +Unrestricted Measure instruments are an option to support +distributions that include negative values. Examples: fan speed, +velocity, longitude. 
### Observer From afa0bb18266f692596ed0fea9b2ff723bdfa8c31 Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 6 Feb 2020 01:00:37 -0800 Subject: [PATCH 17/54] Update Observer (partial) --- specification/api-metrics.md | 61 ++++++++++++++++++++++++------------ 1 file changed, 41 insertions(+), 20 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index ad6a9a7cd9e..b115ed57b19 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -103,7 +103,7 @@ specification](api-metrics-user.md) accepts a label set. To produce measurements using an instrument, you need an SDK that supports the `Meter` API, which consists of a set of instrument constructors, functionality related to label sets, and a facility for -reporting batches of measurements in a semantically atomic way. +entering batches of measurements in a semantically atomic way. There is a global `Meter` instance available for use. Use of the this instance allows library code that uses it to be automatically enabled @@ -223,7 +223,7 @@ instruments.) Counter instruments are used to enter changes in sums, synchronously. These are commonly used to monitor rates, and they are sometimes used -to report totals that rise and fall. An essential property of Counter +to enter totals that rise and fall. An essential property of Counter instruments is that two `Add(1)` events are semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` is equivalent to `Add(m+n)`. This property means that Counter events can be combined using @@ -244,7 +244,7 @@ monotonic because it never adds a negative amount. Examples: requests processed, bytes read or written. Unrestricted Counter instruments are an option to support sums that -rise and fall, useful in selected cases where the report value changes +rise and fall, useful in selected cases where the enter value changes to an sum that may be unknown to the observer. Examples: semaphore up/down, channel enqueue/deque. 
@@ -252,7 +252,7 @@ up/down, channel enqueue/deque. Semantically, metric events from Measure instruments are independent, meaning they cannot be combined naturally, as with Counters. Measure -instruments are used to report many kinds of information, and are +instruments are used to enter many kinds of information, and are recommended for all cases where the additive property of Counter instruments does not apply. @@ -271,27 +271,48 @@ never negative. Examples: request latency, request size, screen width. Unrestricted Measure instruments are an option to support -distributions that include negative values. Examples: fan speed, -velocity, longitude. +distributions that include negative values. Example: velocity, +longitude. ### Observer -Observer instruments are used to report a current set of values at the -time of collection. Observer instruments report not only current -values, but also _which label sets are current_ at the moment of -collection, as a coherent set of values. These instruments reduce -collection cost because they are computed and reported only once per -collection interval, by definition. +Observer instruments are used to enter a _current set of values_ at a +point in time. Observer instruments are asynchronous, with the use of +callbacks allowing the user to enter multiple values per collection +interval. + +Observer instruments enter not only current values, but also +effectively _which label sets are current_ at the moment of +collection. These instruments can be used to compute probabilities +and ratios, because values are part of a set. Unlike Counter and Measure instruments, Observer instruments are -synchronized with collection, used to report values, on demand, -calculated by the program itself. There is no aggregation across time -for Observer instruments by definition, only the current value is -defined. - -Observer instruments can be declared as monotonic. 
A monotonic -measure instrument supports reporting values that are not less than -the value reported in the previous collection interval. +synchronized with collection. There is no aggregation across time for +Observer instruments by definition, only the current set of values is +semantically defined. + +These values are considered coherent, because measurements from an +Observer instrument in a single collection interval are considered +simultaneous. The set of measurements entered through one callback +invocation implicitly share a timestamp. + +#### Range options + +Observer instruments are Absolute by default, meaning `Observe()` +logically accepts only non-negative values. Absolute observers are +commonly used for measuring things like cpu-load or queue sizes. + +Unrestricted Observer instruments can be used for quantities that can +be negative, such as account balance or degrees Celsius. + +Observer instruments can be declared as _Monotonic_, a special kind of +range option that applies only to Observer instruments. The range of +a Monotonic Observer logically must be greater than or equal to the +preceding value. 
+ +#### Aggregation options + +TODO: HERE When aggregating Observer instrument values across dimensions other than time, Observer instruments may be treated like Counters (to From a0730c6679b6bc34a4c3afe7fe915648c3a35d49 Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 6 Feb 2020 15:56:08 -0800 Subject: [PATCH 18/54] Remove options, discuss views API --- specification/api-metrics.md | 318 +++++++++++++---------------------- 1 file changed, 121 insertions(+), 197 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index b115ed57b19..0cd2a485d41 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -1,24 +1,35 @@ - # Metrics API +# Metrics API - [Overview](#overview) - * [User-facing API](#user-facing-api) - * [Meter API](#meter-api) - * [Purpose of this document](#purpose-of-this-document) -- [Metric API / SDK separation](#metric-api--sdk-separation) - * [Three kinds of instrument](#three-kinds-of-instrument) - + [Counter](#counter) - + [Measure](#measure) - + [Observer](#observer) - * [Standard Interpretation](#standard-interpretation) - * [Optional semantic restrictions](#optional-semantic-restrictions) - + [Absolute vs. Non-Absolute](#absolute-vs-non-absolute) - + [Monotonic vs. 
Non-Monotonic](#monotonic-vs-non-monotonic) + * [Metric Instruments](#metric-instruments) + * [Label sets](#label-sets) + * [Meter Interface](#meter-interface) + * [Aggregations](#aggregations) + * [Time](#time) + * [WithKeys declaration on metric instruments](#withkeys-declaration-on-metric-instruments) + * [Metric Event Format](#metric-event-format) +- [Three kinds of instrument](#three-kinds-of-instrument) + * [Counter](#counter) + * [Measure](#measure) + * [Observer](#observer) +- [Interpretation](#interpretation) + * [Standard implementation](#standard-implementation) + * [Option: Dedicated Measure instrument for timing measurements](#option-dedicated-measure-instrument-for-timing-measurements) + * [Future Work: Option Support](#future-work-option-support) + * [Future Work: Configurable Aggregations / View API](#future-work-configurable-aggregations--view-api) - [Metric instrument selection](#metric-instrument-selection) * [Counters and Measures compared](#counters-and-measures-compared) * [Observer instruments](#observer-instruments) - [Examples](#examples) + * [Reporting bytes read and written](#reporting-bytes-read-and-written) + * [Reporting per-request CPU usage](#reporting-per-request-cpu-usage) + * [Reporting system call duration](#reporting-system-call-duration) + * [Reporting request size](#reporting-request-size) + * [Reporting a per-request finishing account balance](#reporting-a-per-request-finishing-account-balance) + * [Reporting process-wide CPU usage](#reporting-process-wide-cpu-usage) + * [Reporting per-shard memory holdings](#reporting-per-shard-memory-holdings) @@ -178,6 +189,19 @@ use of this term for the SDK specification, to refer to parts of a data format that express explicitly timestamped values, in a sequence, resulting from an aggregation of raw measurements over time. 
+### WithKeys declaration on metric instruments + +A standard feature of metric SDKs is to pre-aggregate metric events +according to a specified set of label keys (i.e., dimensions). The +API provides a `WithKeys` option for the user to declare the +recommended aggregation keys. + +This feature is useful for configuring _pre-aggregation_ within the +SDK, prior to export, and generally helps lower the cost of exporting +metric data. This feature is included in the API to ensure that this +optimization is easily configured at the point where instruments are +declared. + ### Metric Event Format Metric events have the same logical representation, regardless of @@ -188,7 +212,7 @@ events produced through an instrument consist of: - timestamp (implicit to the SDK) - instrument definition (name, kind, and semantic options) - label set (associated key-values) -- value (a number) +- value (a signed integer or floating point number) This format is the result of separating the API from the SDK--a common representation for metric events, where the only semantic distinction @@ -202,23 +226,6 @@ choice of instrument should be guided by semantics and the intended interpretation. Here we detail the three instruments and their individual semantics. -Metric instruments can be qualified in several ways, primarily related -to the range of values that are semantically meaningful for the -specific instrument. _Range options_, as they are called, indicate -additional semantics that are commonly useful to the SDK and systems -based on telemetry data. - -Normally, without setting any explicit options, instruments accept -non-negative values by default, called _Absolute_ instruments. An -explicit to declare support for both positive and negative values -existed, these are called _Unrestricted_ instruments. - -Out-of-range values are considered semantically meaningless. SDKs -SHOULD implement data validation for Absolute (as opposed to -Unrestricted) instruments by dropping negative values. 
(The [default -SDK](TODO: link) MUST implement data validation for Absolute -instruments.) - ### Counter Counter instruments are used to enter changes in sums, synchronously. @@ -226,8 +233,8 @@ These are commonly used to monitor rates, and they are sometimes used to enter totals that rise and fall. An essential property of Counter instruments is that two `Add(1)` events are semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` is equivalent to `Add(m+n)`. -This property means that Counter events can be combined using -addition, by definition, which makes them relatively inexpensive. +This property means that Counter events can be combined inexpensively, +by definition. Labels associated with Counter instrument events can be used to compute rates and totals from the instrument, over selected @@ -235,19 +242,6 @@ dimensions. When aggregating Counter events, we naturally combine values using arithmetic addition. Counter `Add(0)` events are no-ops, by definition. -#### Range options - -Counter instruments are Absolute by default, meaning `Add()` logically -accepts only non-negative values. Absolute counters are commonly -useful for defining rates. The sum of an absolute counter is -monotonic because it never adds a negative amount. Examples: requests -processed, bytes read or written. - -Unrestricted Counter instruments are an option to support sums that -rise and fall, useful in selected cases where the enter value changes -to an sum that may be unknown to the observer. Examples: semaphore -up/down, channel enqueue/deque. - ### Measure Semantically, metric events from Measure instruments are independent, @@ -262,18 +256,6 @@ instrument, over selected dimensions. When aggregating Measure events, the output statistics are expected to reflect the combined data set. -#### Range options - -Measure instruments are Absolute by default, meaning `Record()` -logically accepts only non-negative values. 
Absolute measures are -commonly used for measuring things like latency and size that are -never negative. Examples: request latency, request size, screen -width. - -Unrestricted Measure instruments are an option to support -distributions that include negative values. Example: velocity, -longitude. - ### Observer Observer instruments are used to enter a _current set of values_ at a @@ -294,33 +276,7 @@ semantically defined. These values are considered coherent, because measurements from an Observer instrument in a single collection interval are considered simultaneous. The set of measurements entered through one callback -invocation implicitly share a timestamp. - -#### Range options - -Observer instruments are Absolute by default, meaning `Observe()` -logically accepts only non-negative values. Absolute observers are -commonly used for measuring things like cpu-load or queue sizes. - -Unrestricted Observer instruments can be used for quantities that can -be negative, such as account balance or degrees Celsius. - -Observer instruments can be declared as _Monotonic_, a special kind of -range option that applies only to Observer instruments. The range of -a Monotonic Observer logically must be greater than or equal to the -preceding value. - -#### Aggregation options - -TODO: HERE - -When aggregating Observer instrument values across dimensions other -than time, Observer instruments may be treated like Counters (to -combine a rate or sum) or like Measures (to combine a distribution). -Unless otherwise configured, Observers are aggregated as Counters -would be. - -Examples: memory held per shard, queue size by name. +invocation implicitly share a single timestamp. ## Interpretation @@ -328,109 +284,98 @@ We believe the three instrument kinds Counter, Measure, and Observer form a sufficient basis for expressing nearly all metric data. But if the API and SDK are separated, and the SDK can handle any metric event as it pleases, why not have just one kind of instrument? 
How are the -instruments fundamentally different, despite all metric events having -the same form (i.e., a timestamp, an instrument, a number, and a label -set)? +instruments fundamentally different, and why are there only three? Establishing different kinds of instrument is important because in -most cases it allows the SDK to provide good default functionality +most cases it allows the SDK to provide good default functionality, without requiring alternative behaviors to be configured. The choice of instrument determines not only the meaning of the events but also the name of the function used to report data. The function names--`Add()` for Counter instruments, `Record()` for Measure -instruments, and `Observe()` for Observer instruments--help convey and -reinforce the standard interpretation of the event. +instruments, and `Observe()` for Observer instruments--help convey the +meaning of these actions. + +### Standard implementation The standard implementation for the three instruments is defined as follows: 1. Counter. The `Add()` function accumulates a total for each distinct label set. When aggregating over distinct label sets for a -Counter, combine using addition. Export as a set of calculated sums. +Counter, combine using arithmetic addition and export as a sum. +Depending on the exposition format, sums are exported either as pairs +of label set and cumulative _difference_ or as pairs of label set and +cumulative _total. + 2. Measure. Use the `Record()` function to report summary statistics -about the distribution of values, for each distinct label set. Which -statistics are used is determined by the implementation, but they -usually include at least the sum of values, the count of measurements, -and the minimum and maximum values. When aggregating distinct label -sets for a Measure, report summary statistics of the combined value -distribution. Exposition formats for Measure data vary widely by -backend service. 
+about the distribution of values, for each distinct label set. The +summary statistics to use are determined by the implementation, but +they usually include at least the sum of values, the count of +measurements, and the minimum and maximum values. When aggregating +distinct Measure events, report summary statistics of the combined +value distribution. Exposition formats for summary statistics vary +widely, but typically include pairs of label set and (sum, count, +minimum and maximum value). + 3. Observer. Current values are provided by the Observer callback at the end of each Metric collection period. When aggregating values _for the same label set_, combine using the most-recent value. When -aggregating values _for different label sets_, combine using the sum. -Export as pairs of label set and calculated value. - -We recognize that the standard behavior of the three instruments does -not cover all use-cases perfectly. There is a natural tension between -offering dedicated metric instruments for every distinct metric -application and combining use-cases, generalizing semantics to reduce -the API surface area. We could have define more than or fewer than -three kinds of instrument; we have three because these seem like -enough. Where uncommon use-cases call for a non-standard -implementation configuration (e.g., a Measure instrument configured -with last-value aggregation), we accept that users will be required to -provide additional input on how to view certain metric data. - -### Optional semantic restrictions - -The instruments support optional declarations that indicate -restrictions on the valid range of inputs. There are two options, one -to indicate whether the value is signed or not, the other to indicate -monotonicity. These options are meant to be used as a signal to the -observability system, since they impact the way these data are exposed -to users. - -In both cases, the optional restriction does not change the semantics -of the instrument. 
The options are independent, both can be -meaningfully set on any instrument kind. - -The specification describes enforcement of these options as "best -effort", not required. Users are expected to honor their own -declarations when using instruments, and the SDK is expected to -perform checking of these options only when it can be done -inexpensively. - -#### Absolute vs. Non-Absolute - -Absolute refers to whether an instrument accepts negative values. -Absolute instruments can be described as accepting non-negative -inputs, whereas non-absolute instruments can be described as accepting -signed inputs. - -When an instrument is absolute (i.e., accepts non-negative updates), -we know that the sum can be used to express a rate automatically. -This is true for all kinds of instrument. - -When exporting measure values as a histogram, for example, knowing the -instrument is absolute facilitates the use of logarithmic buckets -(which are difficult to use when the input range spans zero). - -Absolute behavior is the default for all instrument kinds. The -Non-Absolute option is supported for all instrument kinds. - -Because this is a simple property for the SDK to test, the -specification recommends that SDKs SHOULD reject metric events for -absolute instruments when negative values are used, and instead issue -a warning to the user. - -#### Monotonic vs. Non-Monotonic - -Monotonic refers to whether an instrument only accepts values that are -greater than or equal to the previously recorded value. Non-monotonic -instruments are those which accept any change in the value, positive -or negative. - -Absolute-valued counters are naturally monotonic, so that Absolute and -Monotonic have the same interpretation for Counter instruments. - -Measure and Observer instruments may be declared as monotonic, however -since this property is expensive to test, the specification recommends -that SDKs SHOULD implement monotonicity checking only when computing a -last-value aggregation. 
The SDK SHOULD only perform this test against -the last known value when it holds the necessary information, it -should not go out of its way to save data simply to perform -monotonicity testing. +aggregating values _for different label sets_, combine the value +distribution as for Measure instruments. Export as pairs of label set +and (sum, count, minimum and maximum value). + +We believe that the standard behavior of one of these three +instruments covers nearly all use-cases for users of OpenTelemetry in +terms of the intended semantics. + +### Option: Dedicated Measure instrument for timing measurements + +As a language-optional feature, the API may support a dedicated +instrument for reporting timing measurements. This kind of +instrument, with recommended name `TimingMeasure` (and +`BoundTimingMeasure`), is semantically equivalent to a Measure +instrument, and like the Measure instrument supports a `Record()` +function, but the input value to this instrument is in the language's +conventional data type for timing measurements. + +For example, in Go the API will accept a `time.Duration`, and in C++ +the API will accept a `std::chrono::duration`. The advantage of +using these instruments is that they use the correct units +automatically, avoiding the potential for confusion over timing metrics. + +### Future Work: Option Support + +We are aware of a number of reasons to refine on these types, in order +to offer: + +1. Range restrictions on input data. Instruments accepting negative +values are rare in most applications, for example, and it is useful to +offer both a semantic declaration (e.g., "negative values are +meaningless") and a data validation step (e.g., "negative values +should be dropped"). +2. Monotonicity support. When a series of values is known to be +monotonic, it is useful to declare this; this allows us to detect +process resets. 
+ +For the most part, these behaviors are not necessary for correctness +within the local process or the SDK, but they are valuable in +down-stream services that use this data. We look to future work on +this subject. + +### Future Work: Configurable Aggregations / View API + +The API does not support configurable aggregations, in this +specification. This is a requirement for OpenTelemetry, but there are +two ways this has been requested. + +A _View API_ is defined as an interface to an SDK mechanism that +supports configuring aggregations, including which operator is applied +(sum, p99, last-value, etc.) and which dimensions are used. + +1. Should the API user be provided with options to configure specific +views, statically, in the source? +2. Should the View API be a stand-alone facility, able to install +configurable aggregations, at runtime? ## Metric instrument selection @@ -454,12 +399,7 @@ are useful when only the sum is interesting. Measures are useful when the sum and any other kind of summary information about the individual values are of interest. -If only the sum is of interest, use a Counter instrument. If the -Counter instrument accepts non-negative `Add()` values, use a -(default) monotonic Counter which will typically be expressed as a -rate (i.e., change per unit time). If the Counter accepts both -positive and negative `Add()` values, use a non-monotonic Counter -which will typically be expressed as the total sum. +If only the sum is of interest, use a Counter instrument. If you are interested in any other kind of summary value or statistic, such as mean, median and other quantiles, or minimum and maximum @@ -467,33 +407,18 @@ value, use a Measure instrument. Measure instruments are used to report any kind of measurement that is not typically expressed as a rate or as a total sum. -If the Measure instrument accepts only non-negative values, as is -typically the case for measuring physical quantities, use a (default) -absolute Measure. 
If the Measure instrument accepts both positive and -negative values, use a non-absolute Measure. Both of these are -typically expressed in terms of a distribution of values, independent -from and _in addition to_ the rate of these measurements. - ### Observer instruments Observer instruments are recommended for reporting measurements about the state of the program at a moment in time. These expose current information about the program itself, not related to individual events taking place in the program. Observer instruments are reported -b outside of a context, thus do not have an effective span context or correlation context. Observer instruments are meant to be used when measured values report -on the current state of the program, as opposed to a change of state -in the program. When Observer instruments are used to report physical -quantities, use a (default) absolute Observer. When Observer -instruments are used to report measurements that can be negative for any -reason, use a non-absolute Observer. - -If the Observer reports a current total sum, declare it as a monotonic -Observer. Monotonic values are typically expressed as a rate of -change. +on the current state of the program, as opposed to an event or a +change of state in the program. ## Examples @@ -567,8 +492,7 @@ frequently, because it is meant to be used only for accounting purposes. A single Observer instrument is recommended for this case, with a -label value to distinguish user from system CPU time. Declare this as -a monotonic instrument, since CPU usage never falls. The Observer +label value to distinguish user from system CPU time. The Observer callback will be called once per collection interval, which lowers the cost of collecting this information. 
From e1298eaa5ef260eaafcf22c11832134a9b3970e1 Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 6 Feb 2020 16:03:13 -0800 Subject: [PATCH 19/54] Fixes --- specification/api-metrics.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 0cd2a485d41..badd2a093e7 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -112,7 +112,7 @@ specification](api-metrics-user.md) accepts a label set. ### Meter Interface To produce measurements using an instrument, you need an SDK that -supports the `Meter` API, which consists of a set of instrument +implements the `Meter` API, which consists of a set of instrument constructors, functionality related to label sets, and a facility for entering batches of measurements in a semantically atomic way. @@ -309,7 +309,7 @@ cumulative _total. 2. Measure. Use the `Record()` function to report summary statistics about the distribution of values, for each distinct label set. The -summary statistics to use are determined by the implementation, but +summary statistics to use are determined by the aggregation, but they usually include at least the sum of values, the count of measurements, and the minimum and maximum values. When aggregating distinct Measure events, report summary statistics of the combined From cf896f38fef5e91dd43f9dd6a482a84df12ff221 Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 6 Feb 2020 16:05:44 -0800 Subject: [PATCH 20/54] Fixes --- specification/api-metrics.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index badd2a093e7..de5b213514d 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -238,9 +238,7 @@ by definition. Labels associated with Counter instrument events can be used to compute rates and totals from the instrument, over selected -dimensions. 
When aggregating Counter events, we naturally combine -values using arithmetic addition. Counter `Add(0)` events are no-ops, -by definition. +dimensions. Counter `Add(0)` events are no-ops, by definition. ### Measure From baba6e42b33ee82483dd6914dc05f63a18635795 Mon Sep 17 00:00:00 2001 From: jmacd Date: Thu, 6 Feb 2020 16:09:42 -0800 Subject: [PATCH 21/54] Fixes --- specification/api-metrics.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index de5b213514d..5ff45da4b39 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -80,8 +80,8 @@ properties. Instruments are created and defined through calls to a Each kinds of metric instrument has its own semantics, briefly described as: -- Counter: metric events of this kind _Add_ to a value that you would sum over time -- Measure: metric events of this kind _Record_ a value that you would average over time +- Counter: metric events of this kind _Add_ to a value that is summed over time +- Measure: metric events of this kind _Record_ a value that is aggregated over time - Observer: metric events of this kind _Observe_ a coherent set of values at an instant in time. An _instrument definition_ describes several properties of the @@ -305,15 +305,15 @@ Depending on the exposition format, sums are exported either as pairs of label set and cumulative _difference_ or as pairs of label set and cumulative _total. -2. Measure. Use the `Record()` function to report summary statistics -about the distribution of values, for each distinct label set. The -summary statistics to use are determined by the aggregation, but -they usually include at least the sum of values, the count of -measurements, and the minimum and maximum values. When aggregating -distinct Measure events, report summary statistics of the combined -value distribution. 
Exposition formats for summary statistics vary -widely, but typically include pairs of label set and (sum, count, -minimum and maximum value). +2. Measure. Use the `Record()` function to report events that for +which the SDK will compute summary statistics about the distribution +of values, for each distinct label set. The summary statistics to use +are determined by the aggregation, but they usually include at least +the sum of values, the count of measurements, and the minimum and +maximum values. When aggregating distinct Measure events, report +summary statistics of the combined value distribution. Exposition +formats for summary statistics vary widely, but typically include +pairs of label set and (sum, count, minimum and maximum value). 3. Observer. Current values are provided by the Observer callback at the end of each Metric collection period. When aggregating values @@ -392,10 +392,10 @@ user-provided LabelSet values. Start with an application for metrics data in mind. It is useful to consider whether you are more likely to be interested in the sum of -values or the average value, as processed by the instrument. Counters -are useful when only the sum is interesting. Measures are useful when -the sum and any other kind of summary information about the individual -values are of interest. +values or any other aggregate value (e.g., average, histogram), as +processed by the instrument. Counters are useful when only the sum is +interesting. Measures are useful when the sum and any other kind of +summary information about the individual values are of interest. If only the sum is of interest, use a Counter instrument. 
From 12753e03691156dd95a2ae15aebf105cc94d075b Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 20:06:49 -0800 Subject: [PATCH 22/54] Rename BoundTimer --- specification/api-metrics.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 5ff45da4b39..fcbbd5c5dd3 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -330,8 +330,8 @@ terms of the intended semantics. As a language-optional feature, the API may support a dedicated instrument for reporting timing measurements. This kind of -instrument, with recommended name `TimingMeasure` (and -`BoundTimingMeasure`), is semantically equivalent to a Measure +instrument, with recommended name `Timer` (and +`BoundTimer`), is semantically equivalent to a Measure instrument, and like the Measure instrument supports a `Record()` function, but the input value to this instrument is in the language's conventional data type for timing measurements. From b3da9d19425f256b6d167b88ba58172cc93b1c89 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 20:15:30 -0800 Subject: [PATCH 23/54] Typos and suggestions --- specification/api-metrics.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index fcbbd5c5dd3..c933454743f 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -77,11 +77,11 @@ API are associated with an instrument, which gives the measurement its properties. Instruments are created and defined through calls to a `Meter` API, which is the user-facing entry point to the SDK. 
-Each kinds of metric instrument has its own semantics, briefly +Each kind of metric instrument has its own semantics, briefly described as: -- Counter: metric events of this kind _Add_ to a value that is summed over time -- Measure: metric events of this kind _Record_ a value that is aggregated over time +- Counter: metric events of this kind _Add_ to a value that is summed over time. +- Measure: metric events of this kind _Record_ a value that is aggregated over time. - Observer: metric events of this kind _Observe_ a coherent set of values at an instant in time. An _instrument definition_ describes several properties of the @@ -101,7 +101,7 @@ with a metric event. Although they are fundamentally similar to [Span attributes](api-tracing.md#span) in the tracing API, a [label set](TODO: link to user doc) is given its own type in the Metrics API (generally: `LabelSet`). Label sets are a feature of the API meant to -facilitate re-use, to lower the cost of processing metric events. +facilitate re-use and thereby to lower the cost of processing metric events. Users are encouraged to re-use label sets whenever possible, as they may contain a previously encoded representation of the labels. @@ -112,12 +112,12 @@ specification](api-metrics-user.md) accepts a label set. ### Meter Interface To produce measurements using an instrument, you need an SDK that -implements the `Meter` API, which consists of a set of instrument +implements the `Meter` API. This interface consists of a set of instrument constructors, functionality related to label sets, and a facility for entering batches of measurements in a semantically atomic way. There is a global `Meter` instance available for use. Use of the this -instance allows library code that uses it to be automatically enabled +instance allows library code to operate in a no-op fashion until whenever the main application configures an SDK at the global level. 
Details about installing an SDK and obtaining a `Meter` are covered in @@ -167,7 +167,7 @@ insignificant when aggregating data across minutes or hours of data. Aggregations are commonly computed over a series of events that fall into a contiguous region of time, known as the collection interval. -Since the SDK controls decision to start collection, it is possible to +Since the SDK controls the decision to start collection, it is possible to collect aggregated metric data while only reading the clock once per collection interval. The default SDK takes this approach. @@ -303,7 +303,7 @@ distinct label set. When aggregating over distinct label sets for a Counter, combine using arithmetic addition and export as a sum. Depending on the exposition format, sums are exported either as pairs of label set and cumulative _difference_ or as pairs of label set and -cumulative _total. +cumulative _total_. 2. Measure. Use the `Record()` function to report events that for which the SDK will compute summary statistics about the distribution @@ -328,7 +328,7 @@ terms of the intended semantics. ### Option: Dedicated Measure instrument for timing measurements -As a language-optional feature, the API may support a dedicated +As a language-optional feature, the API MAY support a dedicated instrument for reporting timing measurements. This kind of instrument, with recommended name `Timer` (and `BoundTimer`), is semantically equivalent to a Measure From 445a8a85b16cbc4b64fa3cc5186c29e05907dfed Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 21:49:28 -0800 Subject: [PATCH 24/54] Reword global Meter --- specification/api-metrics.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index c933454743f..9e398d8afc4 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -116,9 +116,12 @@ implements the `Meter` API. 
This interface consists of a set of instrument constructors, functionality related to label sets, and a facility for entering batches of measurements in a semantically atomic way. -There is a global `Meter` instance available for use. Use of the this -instance allows library code to operate in a no-op fashion until -whenever the main application configures an SDK at the global level. +There is a global `Meter` instance available for use that facilitates +automatic instrumentation for third-party code. Use of this instance +allows code to statically initialize its metric instruments, without +explicit dependency injection. The global `Meter` instance acts as a +no-op implementation until the application explicitly initializes a +global `Meter` by installing an SDK. Details about installing an SDK and obtaining a `Meter` are covered in the [SDK-level API specification](api-metrics-meter.md). From 33255991c0d1edd2ef78697b55b896243a087f0c Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 22:59:14 -0800 Subject: [PATCH 25/54] Rearrange early paragraphs --- specification/api-metrics.md | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 9e398d8afc4..2cec1df16be 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -42,11 +42,24 @@ the intent to produce continuous summaries of those measurements, also in real time. Hereafter, "the API" refers to the OpenTelemetry Metrics API. +The word "semantic" or "semantics" as used here refers to _how we give +meaning_ to metric events, as they take place under the API. The term +is used extensively in this document to define and explain these API +functions and how we should interpret them. As far as possible, the +terminology used here tries to convey the intended semantics, and a +_standard implementation_ will be described below to help us +understand their meaning. 
The standard implementation performs +aggregation corresponding to the default interpretation for each kind +of metric event. + The API provides functions for entering raw measurements, through -several [calling conventions](TODO: link to user doc) that offer -different levels of performance. Regardless of calling convention, we -define a _metric event_ as the logical thing that happens when a new -measurement is entered. +several [calling +conventions](api-metrics-user.md#metric-calling-conventions) that +offer different levels of performance. Regardless of calling +convention, we define a _metric event_ as the logical thing that +happens when a new measurement is entered. The word "enter" as used +here refers to the logical creation of an event through one of the +calling conventions. Monitoring and alerting systems commonly use the data provided through metric events, after applying various [aggregations](#aggregations) @@ -57,16 +70,6 @@ logging systems. For this reason, [OpenTelemetry requires a separation of the API from the SDK](library-guidelines.md#requirements), so that different SDKs can be configured at runtime. -The word "semantic" or "semantics" as used here refers to _how we give -meaning_ to metric events, as they take place under the API. The term -is used extensively in this document to define and explain these API -functions and how we should interpret them. As far as possible, the -terminology used here tries to convey the intended semantics, and a -_standard implementation_ will be described below to help us -understand their meaning. The standard implementation performs -aggregation corresponding to the default interpretation for each kind -of metric event. 
- ### Metric Instruments A _metric instrument_, of which there are three kinds, is a device for From 21fd3e080485155e682f9a28d97e3f66194a15f0 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:02:48 -0800 Subject: [PATCH 26/54] Use _run time_ and _simultaneous_ --- specification/api-metrics.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 2cec1df16be..215c0f28256 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -36,10 +36,10 @@ ## Overview The OpenTelemetry Metrics API supports capturing measurements about -the execution of a computer program in real time. The Metrics API is +the execution of a computer program as it runs. The Metrics API is designed explicitly for processing raw measurements, generally with -the intent to produce continuous summaries of those measurements, also -in real time. Hereafter, "the API" refers to the OpenTelemetry +the intent to produce continuous summaries of those measurements +simultaneously. Hereafter, "the API" refers to the OpenTelemetry Metrics API. The word "semantic" or "semantics" as used here refers to _how we give From bb16febaaa27bc1118112c62845e7b4b040dd644 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:08:49 -0800 Subject: [PATCH 27/54] Rearrange early paragraphs (again) --- specification/api-metrics.md | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 215c0f28256..6873fcec84d 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -36,12 +36,23 @@ ## Overview The OpenTelemetry Metrics API supports capturing measurements about -the execution of a computer program as it runs. The Metrics API is +the execution of a computer program at run time. 
The Metrics API is designed explicitly for processing raw measurements, generally with the intent to produce continuous summaries of those measurements simultaneously. Hereafter, "the API" refers to the OpenTelemetry Metrics API. +The API provides functions for entering raw measurements, through +several [calling +conventions](api-metrics-user.md#metric-calling-conventions) that +offer different levels of performance. Regardless of calling +convention, we define a _metric event_ as the logical thing that +happens when a new measurement is entered. The word "enter" as used +here refers to the logical creation of an event through one of the +calling conventions. This moment of entry defines an implicit +timestamp, which is the wall time an SDK would read from a clock at +that moment. + The word "semantic" or "semantics" as used here refers to _how we give meaning_ to metric events, as they take place under the API. The term is used extensively in this document to define and explain these API @@ -52,15 +63,6 @@ understand their meaning. The standard implementation performs aggregation corresponding to the default interpretation for each kind of metric event. -The API provides functions for entering raw measurements, through -several [calling -conventions](api-metrics-user.md#metric-calling-conventions) that -offer different levels of performance. Regardless of calling -convention, we define a _metric event_ as the logical thing that -happens when a new measurement is entered. The word "enter" as used -here refers to the logical creation of an event through one of the -calling conventions. - Monitoring and alerting systems commonly use the data provided through metric events, after applying various [aggregations](#aggregations) and converting into various [exposition formats](#exposition-formats). @@ -68,7 +70,7 @@ However, we find that there are many other uses for metric events, such as to record aggregated or raw measurements in tracing and logging systems. 
For this reason, [OpenTelemetry requires a separation of the API from the SDK](library-guidelines.md#requirements), -so that different SDKs can be configured at runtime. +so that different SDKs can be configured at run time. ### Metric Instruments From 01ae0d5a317d0b6616ae3923bbb7cfb6382e470d Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:10:54 -0800 Subject: [PATCH 28/54] Define _run time_ --- specification/api-metrics.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 6873fcec84d..a98c264457d 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -49,9 +49,9 @@ offer different levels of performance. Regardless of calling convention, we define a _metric event_ as the logical thing that happens when a new measurement is entered. The word "enter" as used here refers to the logical creation of an event through one of the -calling conventions. This moment of entry defines an implicit -timestamp, which is the wall time an SDK would read from a clock at -that moment. +calling conventions. This moment of entry (at "run time") defines an +implicit timestamp, which is the wall time an SDK would read from a +clock at that moment. The word "semantic" or "semantics" as used here refers to _how we give meaning_ to metric events, as they take place under the API. The term From f92b42b1457fdc97a571a182b5c66f2fac8cd26f Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:15:11 -0800 Subject: [PATCH 29/54] These --- specification/api-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index a98c264457d..4b66dd55e66 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -345,7 +345,7 @@ function, but the input value to this instrument is in the language's conventional data type for timing measurements. 
For example, in Go the API will accept a `time.Duration`, and in C++ -the API will accept a `std::chrono::duration`. These advantage of +the API will accept a `std::chrono::duration`. The advantage of using these instruments is that they use the correct units automatically, avoiding the potential for confusion over timing metrics. From d651afa4707586288a37a57f7ec7820096df61b1 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:17:10 -0800 Subject: [PATCH 30/54] Iterate --- specification/api-metrics.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 4b66dd55e66..1904e443909 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -351,8 +351,8 @@ automatically, avoiding the potential for confusion over timing metrics. ### Future Work: Option Support -We are aware of a number of reasons to refine on these types, in order -to offer: +We are aware of a number of reasons to iterate on these +instrumentation kinds, in order to offer: 1. Range restrictions on input data. Instruments accepting negative values is rare in most applications, for example, and it is useful to From 49c19d4338460980accdf9ebe4dbb8ca8142f7e9 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:27:10 -0800 Subject: [PATCH 31/54] Remove a TODO --- specification/api-metrics.md | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 1904e443909..5965d049503 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -103,12 +103,12 @@ covered in the [user-level API specification](api-metrics-user.md). _Label_ is the term used to refer to a key-value attribute associated with a metric event. 
Although they are fundamentally similar to [Span -attributes](api-tracing.md#span) in the tracing API, a [label -set](TODO: link to user doc) is given its own type in the Metrics API -(generally: `LabelSet`). Label sets are a feature of the API meant to -facilitate re-use and thereby to lower the cost of processing metric events. -Users are encouraged to re-use label sets whenever possible, as they -may contain a previously encoded representation of the labels. +attributes](api-tracing.md#span) in the tracing API, a label set is +given its own type in the Metrics API (generally: `LabelSet`). Label +sets are a feature of the API meant to facilitate re-use and thereby +to lower the cost of processing metric events. Users are encouraged +to re-use label sets whenever possible, as they may contain a +previously encoded representation of the labels. Users obtain label sets by calling a `Meter` API function. Each of the instrument calling conventions detailed in the [user-level API @@ -117,9 +117,14 @@ specification](api-metrics-user.md) accepts a label set. ### Meter Interface To produce measurements using an instrument, you need an SDK that -implements the `Meter` API. This interface consists of a set of instrument -constructors, functionality related to label sets, and a facility for -entering batches of measurements in a semantically atomic way. +implements the `Meter` API. This interface consists of a set of +instrument constructors, functionality related to label sets, and a +facility for entering batches of measurements in a semantically atomic +way. As an obligatory step, the API requires the caller to provide +the name of the instrumenting library (optionally, the version), that +is meant to be used for identifying instrumentation produced from that +library for such purposes as disabling instrumentation, configuring +aggregation, and applying sampling policies. 
There is a global `Meter` instance available for use that facilitates automatic instrumentation for third-party code. Use of this instance From a53ad588c16f6960c1c60e6bc7ba3ce0e1f5b15c Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:37:24 -0800 Subject: [PATCH 32/54] More on global/named meters --- specification/api-metrics.md | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 5965d049503..6d14fc248cb 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -120,11 +120,7 @@ To produce measurements using an instrument, you need an SDK that implements the `Meter` API. This interface consists of a set of instrument constructors, functionality related to label sets, and a facility for entering batches of measurements in a semantically atomic -way. As an obligatory step, the API requires the caller to provide -the name of the instrumenting library (optionally, the version), that -is meant to be used for identifying instrumentation produced from that -library for such purposes as disabling instrumentation, configuring -aggregation, and applying sampling policies. +way. There is a global `Meter` instance available for use that facilitates automatic instrumentation for third-party code. Use of this instance @@ -133,8 +129,16 @@ explicit dependency injection. The global `Meter` instance acts as a no-op implementation until the application explicitly initializes a global `Meter` by installing an SDK. -Details about installing an SDK and obtaining a `Meter` are covered in -the [SDK-level API specification](api-metrics-meter.md). 
+As an obligatory step, the API requires the caller to provide the name +of the instrumenting library (optionally, the version) when obtaining +a `Meter` implementation, that is meant to be used for identifying +instrumentation produced from that library for such purposes as +disabling instrumentation, configuring aggregation, and applying +sampling policies. (TODO: refer to the semantic convention on the +Named Tracer/Meter). + +Details about installing an SDK and obtaining a named `Meter` are +covered in the [SDK-level API specification](api-metrics-meter.md). ### Aggregations From 313cd32d456f9158baf2770889379498624ef431 Mon Sep 17 00:00:00 2001 From: jmacd Date: Mon, 10 Feb 2020 23:44:47 -0800 Subject: [PATCH 33/54] Difference, Capture --- specification/api-metrics.md | 52 +++++++++++++++++------------------- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 6d14fc248cb..de0aacdbfc6 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -42,16 +42,14 @@ the intent to produce continuous summaries of those measurements simultaneously. Hereafter, "the API" refers to the OpenTelemetry Metrics API. -The API provides functions for entering raw measurements, through +The API provides functions for capturing raw measurements, through several [calling conventions](api-metrics-user.md#metric-calling-conventions) that offer different levels of performance. Regardless of calling convention, we define a _metric event_ as the logical thing that -happens when a new measurement is entered. The word "enter" as used -here refers to the logical creation of an event through one of the -calling conventions. This moment of entry (at "run time") defines an -implicit timestamp, which is the wall time an SDK would read from a -clock at that moment. +happens when a new measurement is captured. 
This moment of capture +(at "run time") defines an implicit timestamp, which is the wall time +an SDK would read from a clock at that moment. The word "semantic" or "semantics" as used here refers to _how we give meaning_ to metric events, as they take place under the API. The term @@ -75,9 +73,9 @@ so that different SDKs can be configured at run time. ### Metric Instruments A _metric instrument_, of which there are three kinds, is a device for -entering raw measurements into the API. There are Counter, Measure, +capturing raw measurements into the API. There are Counter, Measure, and Observer instruments, each with different semantics and intended -uses, that will be specified here. All measurements that enter the +uses, that will be specified here. All measurements captured by the API are associated with an instrument, which gives the measurement its properties. Instruments are created and defined through calls to a `Meter` API, which is the user-facing entry point to the SDK. @@ -119,7 +117,7 @@ specification](api-metrics-user.md) accepts a label set. To produce measurements using an instrument, you need an SDK that implements the `Meter` API. This interface consists of a set of instrument constructors, functionality related to label sets, and a -facility for entering batches of measurements in a semantically atomic +facility for capturing batches of measurements in a semantically atomic way. There is a global `Meter` instance available for use that facilitates @@ -188,14 +186,14 @@ Since the SDK controls the decision to start collection, it is possible to collect aggregated metric data while only reading the clock once per collection interval. The default SDK takes this approach. -Counter and Measure instruments offer synchronous APIs for entering +Counter and Measure instruments offer synchronous APIs for capturing measurements. 
Metric events from Counter and Measure instruments are captured at the moment they happen, when the SDK receives the corresponding function call. The Observer instrument supports an asynchronous API, allowing the SDK to collect metric data on demand, once per collection interval. A -single Observer instrument callback can enter multiple metric events +single Observer instrument callback can capture multiple metric events associated with different label sets. Semantically, by definition, these observations are captured at a single instant in time, the instant that they became the current set of last-measured values. @@ -245,13 +243,13 @@ individual semantics. ### Counter -Counter instruments are used to enter changes in sums, synchronously. -These are commonly used to monitor rates, and they are sometimes used -to enter totals that rise and fall. An essential property of Counter -instruments is that two `Add(1)` events are semantically equivalent to -one `Add(2)` event--`Add(m)` and `Add(n)` is equivalent to `Add(m+n)`. -This property means that Counter events can be combined inexpensively, -by definition. +Counter instruments are used to capture changes in running sums, +synchronously. These are commonly used to monitor rates, and they are +sometimes used to capture totals that rise and fall. An essential +property of Counter instruments is that two `Add(1)` events are +semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` +is equivalent to `Add(m+n)`. This property means that Counter events +can be combined inexpensively, by definition. Labels associated with Counter instrument events can be used to compute rates and totals from the instrument, over selected @@ -261,7 +259,7 @@ dimensions. Counter `Add(0)` events are no-ops, by definition. Semantically, metric events from Measure instruments are independent, meaning they cannot be combined naturally, as with Counters. 
Measure -instruments are used to enter many kinds of information, and are +instruments are used to capture many kinds of information, and are recommended for all cases where the additive property of Counter instruments does not apply. @@ -273,12 +271,12 @@ data set. ### Observer -Observer instruments are used to enter a _current set of values_ at a +Observer instruments are used to capture a _current set of values_ at a point in time. Observer instruments are asynchronous, with the use of -callbacks allowing the user to enter multiple values per collection +callbacks allowing the user to capture multiple values per collection interval. -Observer instruments enter not only current values, but also +Observer instruments capture not only current values, but also effectively _which label sets are current_ at the moment of collection. These instruments can be used to compute probabilities and ratios, because values are part of a set. @@ -290,7 +288,7 @@ semantically defined. These values are considered coherent, because measurements from an Observer instrument in a single collection interval are considered -simultaneous. The set of measurements entered through one callback +simultaneous. The set of measurements captured through one callback invocation implicitly share a single timestamp. ## Interpretation @@ -319,7 +317,7 @@ follows: distinct label set. When aggregating over distinct label sets for a Counter, combine using arithmetic addition and export as a sum. Depending on the exposition format, sums are exported either as pairs -of label set and cumulative _difference_ or as pairs of label set and +of label set and cumulative _delta_ or as pairs of label set and cumulative _total_. 2. Measure. Use the `Record()` function to report events that for @@ -516,12 +514,12 @@ questions. - Why not use a Counter instrument? In order to use a Counter instrument, we would need to convert total usage figures into -differences. 
Calculating differences from the previous measurement is +deltas. Calculating deltas from the previous measurement is easy to do, but Counter instruments are not meant to be used from callbacks. -- Why not report differences in the Observer callback? Observer +- Why not report deltas in the Observer callback? Observer instruments are meant to be used to observe current values. Nothing -prevents reporting differences with an Observer, but the standard +prevents reporting deltas with an Observer, but the standard aggregation for Observer instruments is to sum the current value across distinct label sets. The standard behavior is useful for determining the current rate of CPU usage, but special configuration From 968caffe5653a3636c91dcfb41ccac6f411d7c1c Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 00:10:35 -0800 Subject: [PATCH 34/54] Timer wording --- specification/api-metrics.md | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index de0aacdbfc6..fcfbe9684d5 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -345,16 +345,21 @@ terms of the intended semantics. As a language-optional feature, the API MAY support a dedicated instrument for reporting timing measurements. This kind of -instrument, with recommended name `Timer` (and -`BoundTimer`), is semantically equivalent to a Measure -instrument, and like the Measure instrument supports a `Record()` -function, but the input value to this instrument is in the language's -conventional data type for timing measurements. +instrument, with recommended name `Timer` (and `BoundTimer`), is +semantically equivalent to a Measure instrument. Like the Measure +instrument, Timers support a `Record()` function. The input value to +this instrument is in the language's conventional data type for timing +measurements.
+ +Timer instruments MUST capture only the magnitude of the input value +(i.e., an absolute value). When the user provides a negative value to +the Timer `Record()` function, the captured measurement is the number +stripped of its sign. For example, in Go the API will accept a `time.Duration`, and in C++ -the API will accept a `std::chrono::duration`. The advantage of -using these instruments is that they use the correct units -automatically, avoiding the potential for confusion over timing metrics. +the API will accept a `std::chrono::duration`. These instruments +apply the correct units automatically, reducing the potential for +confusion over timing metric events. ### Future Work: Option Support From 0faf4587c70de13a98ee1571019945964151307d Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 00:16:16 -0800 Subject: [PATCH 35/54] Explicit timestamp --- specification/api-metrics.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index fcfbe9684d5..9f9253198f5 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -289,7 +289,8 @@ semantically defined. These values are considered coherent, because measurements from an Observer instrument in a single collection interval are considered simultaneous. The set of measurements captured through one callback -invocation implicitly share a single timestamp. +invocation are considered independent metric events; explicitly, these +events share a single timestamp. 
## Interpretation From 0081967e46cad74946368b1a6e31b097a60075e2 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 00:30:43 -0800 Subject: [PATCH 36/54] Rewrite simultaneous --- specification/api-metrics.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 9f9253198f5..89b72eda876 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -287,10 +287,9 @@ Observer instruments by definition, only the current set of values is semantically defined. These values are considered coherent, because measurements from an -Observer instrument in a single collection interval are considered -simultaneous. The set of measurements captured through one callback -invocation are considered independent metric events; explicitly, these -events share a single timestamp. +Observer instrument in a single collection interval are captured at +the same logical time. A single callback invocation generates (zero +or more) simultaneous metric events, all sharing an implicit timestamp. ## Interpretation From bd1e06f9bd8e145d1fefdd31ca90b7bd4cba03a0 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 00:43:41 -0800 Subject: [PATCH 37/54] Example: active requests --- specification/api-metrics.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 89b72eda876..db6b136d205 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -133,7 +133,7 @@ a `Meter` implementation, that is meant to be used for identifying instrumentation produced from that library for such purposes as disabling instrumentation, configuring aggregation, and applying sampling policies. (TODO: refer to the semantic convention on the -Named Tracer/Meter). +reporting library name). 
Details about installing an SDK and obtaining a named `Meter` are covered in the [SDK-level API specification](api-metrics-meter.md). @@ -284,13 +284,15 @@ and ratios, because values are part of a set. Unlike Counter and Measure instruments, Observer instruments are synchronized with collection. There is no aggregation across time for Observer instruments by definition, only the current set of values is -semantically defined. +semantically defined. Because Observer instruments are activated by +the SDK, they can be effectively disabled at low cost. These values are considered coherent, because measurements from an Observer instrument in a single collection interval are captured at the same logical time. A single callback invocation generates (zero or more) simultaneous metric events, all sharing an implicit timestamp. + ## Interpretation We believe the three instrument kinds Counter, Measure, and Observer @@ -542,3 +544,12 @@ with a shard label. These can be aggregated across hosts to compute cluster-wide memory holdings by shard, for example, using the standard aggregation for Observers, which sums the current value across distinct label sets. + +### Reporting number of active requests + +Suppose your server maintains the count of active requests, which +rises and falls as new requests begin and end processing. + +Observer the number of active requests periodically with an Observer +instrument. Labels can be used to indicate which application-specific +properties are associated with these events. From b0ef3e462a5b5a37308004451bfbcd216ece84f9 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 00:45:17 -0800 Subject: [PATCH 38/54] Synchronously --- specification/api-metrics.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index db6b136d205..4e805a0195c 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -259,9 +259,10 @@ dimensions. 
Counter `Add(0)` events are no-ops, by definition. Semantically, metric events from Measure instruments are independent, meaning they cannot be combined naturally, as with Counters. Measure -instruments are used to capture many kinds of information, and are -recommended for all cases where the additive property of Counter -instruments does not apply. +instruments are used to capture many kinds of information, +synchronously, and are recommended for all cases that reflect an event +in the application where the additive property of Counter instruments +does not apply. Labels associated with Measure instrument events can be used to compute information about the distribution of values from the From cb3d36929555740d6ee99376e50bb1bca07778f2 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 00:47:57 -0800 Subject: [PATCH 39/54] Wording --- specification/api-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 4e805a0195c..8b772ade115 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -431,7 +431,7 @@ rate or as a total sum. ### Observer instruments Observer instruments are recommended for reporting measurements about -the state of the program at a moment in time. These expose current +the state of the program periodically. These expose current information about the program itself, not related to individual events taking place in the program. 
Observer instruments are reported outside of a context, thus do not have an effective span context or From dc61b4eb4b95dfcc9e4ab756d5cd6c8d0c21aa66 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 00:55:51 -0800 Subject: [PATCH 40/54] Example using correlation context --- specification/api-metrics.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 8b772ade115..422780b9eb9 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -554,3 +554,16 @@ rises and falls as new requests begin and end processing. Observer the number of active requests periodically with an Observer instrument. Labels can be used to indicate which application-specific properties are associated with these events. + +### Reporting bytes read and written correlated by end user + +An application uses storage servers to read and write from some +underlying media. These requests are made in the context of the end +user that made the request into the frontend system, with Correlation +Context passed from the frontend to the storage servers carrying these +properties. + +Use Counter instruments to report the number of bytes read and written +by the storage server. Configure the SDK to use a Correlation Context +label key (e.g., named "app.user") to aggregate events by all metric +instruments. \ No newline at end of file From 6dce940168650aa7b9ea3085148f3be1123282d4 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 11:46:03 -0800 Subject: [PATCH 41/54] Detail on WithKeys --- specification/api-metrics.md | 38 +++++++++++++++++++++++++++--------- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 422780b9eb9..eabd8590028 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -207,15 +207,35 @@ resulting from an aggregation of raw measurements over time. 
### WithKeys declaration on metric instruments A standard feature of metric SDKs is to pre-aggregate metric events -according to a specified set of label keys (i.e., dimensions). The -API provides a `WithKeys` option for the user to declare the -recommended aggregation keys. - -This feature is useful for configuring _pre-aggegation_ within the -SDK, prior to export, and generally helps lower the cost of exporting -metric data. This feature is included in the API to ensure that this -optimization is easily configured at the point where instruments are -declared. +according to a specified set of label keys (i.e., dimensions). To +perform this task, the SDK must aggregate metric events over the +collection interval: (1) across time, (2) across key dimensions in +_label space_. + +When aggregating across spatial dimensions, metric events for +different label sets are combined into an aggregated value for each +distinct "group" of values for the key dimensions. It means that +measurements are combined for all metric events having the same values +for selected keys, explicitly disregarding any additional labels with +keys not in the set of aggregation keys. Some exporters are known to +require pre-specifying the label keys used for aggregation (e.g., +Prometheus). + +For example, if `[ak1, ak2]` are the aggregation keys and `[ik1, +ik2]` are the ignored keys, then a metric event having labels +`{ak1=A, ak2=B, ik1=C, ik1=D}` will be combined with a metric +event having labels `{ak1=A, ak2=B, ik1=Y, ik1=Z}` because they +have identical label values for all of the aggregation keys. + +The API provides a `WithKeys` option for the user to declare the +recommended aggregation keys when declaring new metric instruments, +intended as the default way to configure an exporter for +pre-aggregation, if it is expected. 
Since this is only expected in +some exporters, it is regarded as an option relevant to the exporter, +whether keys configured through `WithKeys` are applied for aggregation +purposes or not. This allows the user influence the standard +implementation behavior, especially for exporters that require +pre-specified aggregation keys. ### Metric Event Format From ec8b3d5af275426c13ed502871ee91dd4149f2e3 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 12:13:39 -0800 Subject: [PATCH 42/54] Counter is a special case --- specification/api-metrics.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index eabd8590028..454f98783ef 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -271,6 +271,13 @@ semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` is equivalent to `Add(m+n)`. This property means that Counter events can be combined inexpensively, by definition. +Counter instruments can be seen as special cases of Measure +instruments with the additive property described above and a +more-specific verb to improve readability (i.e., "Add" instead of +"Record"). Counter instruments are special cases of Measure +instruments in that they only preserve a Sum, by default, and no other +summary statistics. + Labels associated with Counter instrument events can be used to compute rates and totals from the instrument, over selected dimensions. Counter `Add(0)` events are no-ops, by definition. 
From 775725fdb682d2f04de3f3b467f9c0cf2299f74e Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 12:17:45 -0800 Subject: [PATCH 43/54] Counter fixes; typo --- specification/api-metrics.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 454f98783ef..f7c35c472e8 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -266,10 +266,10 @@ individual semantics. Counter instruments are used to capture changes in running sums, synchronously. These are commonly used to monitor rates, and they are sometimes used to capture totals that rise and fall. An essential -property of Counter instruments is that two `Add(1)` events are -semantically equivalent to one `Add(2)` event--`Add(m)` and `Add(n)` -is equivalent to `Add(m+n)`. This property means that Counter events -can be combined inexpensively, by definition. +property of Counter instruments is that two events `Add(m)` and +`Add(n)` are semantically equivalent to one event `Add(m+n)`. This +property means that Counter events can be combined inexpensively, by +definition. Counter `Add(0)` events are no-ops, by definition. Counter instruments can be seen as special cases of Measure instruments with the additive property described above and a @@ -280,7 +280,7 @@ summary statistics. Labels associated with Counter instrument events can be used to compute rates and totals from the instrument, over selected -dimensions. Counter `Add(0)` events are no-ops, by definition. +dimensions. ### Measure @@ -350,9 +350,9 @@ Depending on the exposition format, sums are exported either as pairs of label set and cumulative _delta_ or as pairs of label set and cumulative _total_. -2. Measure. Use the `Record()` function to report events that for -which the SDK will compute summary statistics about the distribution -of values, for each distinct label set. The summary statistics to use +2. Measure. 
Use the `Record()` function to report events for which +the SDK will compute summary statistics about the distribution of +values, for each distinct label set. The summary statistics to use are determined by the aggregation, but they usually include at least the sum of values, the count of measurements, and the minimum and maximum values. When aggregating distinct Measure events, report From 245db5926912e1ccae67533894a71a0332531c95 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 12:18:46 -0800 Subject: [PATCH 44/54] Typo --- specification/api-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index f7c35c472e8..9492afd7798 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -578,7 +578,7 @@ distinct label sets. Suppose your server maintains the count of active requests, which rises and falls as new requests begin and end processing. -Observer the number of active requests periodically with an Observer +Observe the number of active requests periodically with an Observer instrument. Labels can be used to indicate which application-specific properties are associated with these events. From 81ac2057dad77365f0961f2e79d980972977a1f8 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 12:32:45 -0800 Subject: [PATCH 45/54] Split the leading example into Counter and Measure cases --- specification/api-metrics.md | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 9492afd7798..090425b6ff3 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -470,23 +470,40 @@ change of state in the program. ## Examples -### Reporting bytes read and written +### Reporting total bytes read -You wish to monitor the number of bytes read and written from a -messaging server that supports several protocols. 
The number of bytes -read and written should be labeled with the protocol name and -aggregated in the process. +You wish to monitor the total number of bytes read from a messaging +server that supports several protocols. The number of bytes read +should be labeled with the protocol name and aggregated in the +process. This is a typical application for the Counter instrument. Use one -Counter for bytes read and one Counter for bytes written. When -handling a request, compute a LabelSet containing the name of the -protocol and potentially other useful labels, then call `Add()` twice -with the same label set and the number of bytes read and written. +Counter for capturing the number of bytes read. When handling a request, +compute a LabelSet containing the name of the protocol and potentially +other useful labels, then call `Add()` with the same label set and the +number of bytes read. To lower the cost of this reporting, you can `Bind()` the instrument with each of the supported protocols ahead of time and avoid computing the label set for each request. +### Reporting total bytes read and bytes per request + +You wish to monitor the total number of bytes read as well as the +number of bytes read per request, to have observability into total +traffic as well as typical request size. As with the example above, +these metric events should be labeled with a protocol name. + +This is a typical application for the Measure instrument. Use one +Measure for capturing the number of bytes per request. A sum +aggregation applied to this data yields the total bytes read; other +aggregations allow you to export the minimum and maximum number of +bytes read, as well as the average value, and quantile estimates. + +In this case, the guidance is to create a single instrument. Do not +create a Counter instrument to export a sum when you want to export +other summary statistics using a Measure instrument. 
+ ### Reporting per-request CPU usage Suppose you have a way to measure the CPU usage of processing an From 7b569586cb3f135f2f8b850cec6cf5e71153067d Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 12:34:35 -0800 Subject: [PATCH 46/54] Remove poor example --- specification/api-metrics.md | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 090425b6ff3..89297f7566e 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -504,18 +504,6 @@ In this case, the guidance is to create a single instrument. Do not create a Counter instrument to export a sum when you want to export other summary statistics using a Measure instrument. -### Reporting per-request CPU usage - -Suppose you have a way to measure the CPU usage of processing an -individual request. This is given to you in terms of cpu-seconds -consumed. You may wish to monitor total CPU usage, or you could be -interested in the peak rate of CPU usage. - -Use a Counter instrument to `Add()` this quantity to an instrument -named `cpu.seconds.used` after sending the response. A Counter is -called for, in this case, because a sum is requested, meaning a sum of -all `Add()` events for the instrument in the specified time range. 
- ### Reporting system call duration You wish to monitor the duration of a specific system call being made From dfe04466566cab164bc7630eeb01b7ff9bc7934b Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 12:53:25 -0800 Subject: [PATCH 47/54] Update TOC --- specification/api-metrics.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 89297f7566e..0689fb2951d 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -23,13 +23,15 @@ * [Counters and Measures compared](#counters-and-measures-compared) * [Observer instruments](#observer-instruments) - [Examples](#examples) - * [Reporting bytes read and written](#reporting-bytes-read-and-written) - * [Reporting per-request CPU usage](#reporting-per-request-cpu-usage) + * [Reporting total bytes read](#reporting-total-bytes-read) + * [Reporting total bytes read and bytes per request](#reporting-total-bytes-read-and-bytes-per-request) * [Reporting system call duration](#reporting-system-call-duration) * [Reporting request size](#reporting-request-size) * [Reporting a per-request finishing account balance](#reporting-a-per-request-finishing-account-balance) * [Reporting process-wide CPU usage](#reporting-process-wide-cpu-usage) * [Reporting per-shard memory holdings](#reporting-per-shard-memory-holdings) + * [Reporting number of active requests](#reporting-number-of-active-requests) + * [Reporting bytes read and written correlated by end user](#reporting-bytes-read-and-written-correlated-by-end-user) From 641a39708dd458be6b52f9abf86c56d2f4719a6b Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 14:01:25 -0800 Subject: [PATCH 48/54] Add note about Observer and Context --- specification/api-metrics.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 0689fb2951d..36081c2889e 100644 --- 
a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -301,10 +301,12 @@ data set. ### Observer -Observer instruments are used to capture a _current set of values_ at a -point in time. Observer instruments are asynchronous, with the use of -callbacks allowing the user to capture multiple values per collection -interval. +Observer instruments are used to capture a _current set of values_ at +a point in time. Observer instruments are asynchronous, with the use +of callbacks allowing the user to capture multiple values per +collection interval. Observer instruments are not associated with a +Context, meaning it is impossible to associate Observer instruments +with Correlation Context. Observer instruments capture not only current values, but also effectively _which label sets are current_ at the moment of From c1b7c108ba843f8fc493f1fd373699e54e0ec2a5 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 14:05:23 -0800 Subject: [PATCH 49/54] Add note about context-freedom in Observer --- specification/api-metrics.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 36081c2889e..22b6ebfbfde 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -305,8 +305,9 @@ Observer instruments are used to capture a _current set of values_ at a point in time. Observer instruments are asynchronous, with the use of callbacks allowing the user to capture multiple values per collection interval. Observer instruments are not associated with a -Context, meaning it is impossible to associate Observer instruments -with Correlation Context. +Context, by definition. This means, for example, it is not possible +to associate Observer instrument events with Correlation or Span +context. 
Observer instruments capture not only current values, but also effectively _which label sets are current_ at the moment of From 6accb0559af44523ae1d8d81e4b1312a73b26736 Mon Sep 17 00:00:00 2001 From: jmacd Date: Tue, 11 Feb 2020 14:33:04 -0800 Subject: [PATCH 50/54] Run MDL --- specification/api-metrics.md | 62 +++++++----------------------------- 1 file changed, 12 insertions(+), 50 deletions(-) diff --git a/specification/api-metrics.md b/specification/api-metrics.md index 22b6ebfbfde..173e90ffb59 100644 --- a/specification/api-metrics.md +++ b/specification/api-metrics.md @@ -282,7 +282,7 @@ summary statistics. Labels associated with Counter instrument events can be used to compute rates and totals from the instrument, over selected -dimensions. +dimensions. ### Measure @@ -325,7 +325,6 @@ Observer instrument in a single collection interval are captured at the same logical time. A single callback invocation generates (zero or more) simultaneous metric events, all sharing an implicit timestamp. - ## Interpretation We believe the three instrument kinds Counter, Measure, and Observer @@ -348,29 +347,11 @@ meaning of these actions. The standard implementation for the three instruments is defined as follows: -1. Counter. The `Add()` function accumulates a total for each -distinct label set. When aggregating over distinct label sets for a -Counter, combine using arithmetic addition and export as a sum. -Depending on the exposition format, sums are exported either as pairs -of label set and cumulative _delta_ or as pairs of label set and -cumulative _total_. - -2. Measure. Use the `Record()` function to report events for which -the SDK will compute summary statistics about the distribution of -values, for each distinct label set. The summary statistics to use -are determined by the aggregation, but they usually include at least -the sum of values, the count of measurements, and the minimum and -maximum values. 
When aggregating distinct Measure events, report -summary statistics of the combined value distribution. Exposition -formats for summary statistics vary widely, but typically include -pairs of label set and (sum, count, minimum and maximum value). - -3. Observer. Current values are provided by the Observer callback at -the end of each Metric collection period. When aggregating values -_for the same label set_, combine using the most-recent value. When -aggregating values _for different label sets_, combine the value -distribution as for Measure instruments. Export as pairs of label set -and (sum, count, minimum and maximum value). +1. Counter. The `Add()` function accumulates a total for each distinct label set. When aggregating over distinct label sets for a Counter, combine using arithmetic addition and export as a sum. Depending on the exposition format, sums are exported either as pairs of label set and cumulative _delta_ or as pairs of label set and cumulative _total_. + +2. Measure. Use the `Record()` function to report events for which the SDK will compute summary statistics about the distribution of values, for each distinct label set. The summary statistics to use are determined by the aggregation, but they usually include at least the sum of values, the count of measurements, and the minimum and maximum values. When aggregating distinct Measure events, report summary statistics of the combined value distribution. Exposition formats for summary statistics vary widely, but typically include pairs of label set and (sum, count, minimum and maximum value). + +3. Observer. Current values are provided by the Observer callback at the end of each Metric collection period. When aggregating values _for the same label set_, combine using the most-recent value. When aggregating values _for different label sets_, combine the value distribution as for Measure instruments. Export as pairs of label set and (sum, count, minimum and maximum value). 
We believe that the standard behavior of one of these three instruments covers nearly all use-cases for users of OpenTelemetry in @@ -401,14 +382,8 @@ confusion over timing metric events. We are aware of a number of reasons to iterate on these instrumentation kinds, in order to offer: -1. Range restrictions on input data. Instruments accepting negative -values is rare in most applications, for example, and it is useful to -offer both a semantic declaration (e.g., "negative values are -meaningless") and a data validation step (e.g., "negative values -should be dropped"). -2. Monotonicity support. When a series of values is known to be -monotonic, it is useful to declare this; this allows us to detect -process resets. +1. Range restrictions on input data. Instruments accepting negative values is rare in most applications, for example, and it is useful to offer both a semantic declaration (e.g., "negative values are meaningless") and a data validation step (e.g., "negative values should be dropped"). +2. Monotonicity support. When a series of values is known to be monotonic, it is useful to declare this; this allows us to detect process resets. For the most part, these behaviors are not necessary for correctness within the local process or the SDK, but they are valuable in @@ -425,10 +400,8 @@ A _View API_ is defined as an interface to an SDK mechanism that supports configuring aggregations, including which operator is applied (sum, p99, last-value, etc.) and which dimensions are used. -1. Should the API user be provided with options to configure specific -views, statically, in the source? -2. Should the View API be a stand-alone facility, able to install -configurable aggregations, at runtime? +1. Should the API user be provided with options to configure specific views, statically, in the source? +2. Should the View API be a stand-alone facility, able to install configurable aggregations, at runtime? 
## Metric instrument selection @@ -557,19 +530,8 @@ cost of collecting this information. CPU usage is something that we naturally sum, which raises several questions. -- Why not use a Counter instrument? In order to use a Counter -instrument, we would need to convert total usage figures into -deltas. Calculating deltas from the previous measurement is -easy to do, but Counter instruments are not meant to be used from -callbacks. -- Why not report deltass in the Observer callback? Observer -instruments are meant to be used to observe current values. Nothing -prevents reporting deltas with an Observer, but the standard -aggregation for Observer instruments is to sum the current value -across distinct label sets. The standard behavior is useful for -determining the current rate of CPU usage, but special configuration -would be required for an Observer instrument to use Counter -aggregation. +- Why not use a Counter instrument? In order to use a Counter instrument, we would need to convert total usage figures into deltas. Calculating deltas from the previous measurement is easy to do, but Counter instruments are not meant to be used from callbacks. +- Why not report deltass in the Observer callback? Observer instruments are meant to be used to observe current values. Nothing prevents reporting deltas with an Observer, but the standard aggregation for Observer instruments is to sum the current value across distinct label sets. The standard behavior is useful for determining the current rate of CPU usage, but special configuration would be required for an Observer instrument to use Counter aggregation. 
 ### Reporting per-shard memory holdings

From 18bdaa8f478b4abba0a2cefdaa4cd170615367d7 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Tue, 11 Feb 2020 14:51:33 -0800
Subject: [PATCH 51/54] WithRecommendedKeys

---
 specification/api-metrics.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/specification/api-metrics.md b/specification/api-metrics.md
index 173e90ffb59..223d5e8fcc9 100644
--- a/specification/api-metrics.md
+++ b/specification/api-metrics.md
@@ -8,7 +8,7 @@
   * [Meter Interface](#meter-interface)
   * [Aggregations](#aggregations)
   * [Time](#time)
-  * [WithKeys declaration on metric instruments](#withkeys-declaration-on-metric-instruments)
+  * [WithRecommendedKeys declaration on metric instruments](#withrecommendedkeys-declaration-on-metric-instruments)
   * [Metric Event Format](#metric-event-format)
 - [Three kinds of instrument](#three-kinds-of-instrument)
   * [Counter](#counter)
@@ -206,7 +206,7 @@ use of this term for the SDK specification, to refer to parts of a
 data format that express explicitly timestamped values, in a sequence,
 resulting from an aggregation of raw measurements over time.
 
-### WithKeys declaration on metric instruments
+### WithRecommendedKeys declaration on metric instruments
 
 A standard feature of metric SDKs is to pre-aggregate metric events
 according to a specified set of label keys (i.e., dimensions). To
@@ -229,12 +229,12 @@ ik2]` are the ignored keys, then a metric event having labels
 event having labels `{ak1=A, ak2=B, ik1=Y, ik1=Z}` because they have
 identical label values for all of the aggregation keys.
 
-The API provides a `WithKeys` option for the user to declare the
+The API provides a `WithRecommendedKeys` option for the user to declare the
 recommended aggregation keys when declaring new metric instruments,
 intended as the default way to configure an exporter for
 pre-aggregation, if it is expected. Since this is only expected in
 some exporters, it is regarded as an option relevant to the exporter,
-whether keys configured through `WithKeys` are applied for aggregation
+whether keys configured through `WithRecommendedKeys` are applied for aggregation
 purposes or not. This allows the user influence the standard
 implementation behavior, especially for exporters that require
 pre-specified aggregation keys.

From e51023d9def21ff4890021c6bc89433189bdac92 Mon Sep 17 00:00:00 2001
From: jmacd
Date: Wed, 12 Feb 2020 11:14:18 -0800
Subject: [PATCH 52/54] Typo

---
 specification/api-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specification/api-metrics.md b/specification/api-metrics.md
index 223d5e8fcc9..ba925746835 100644
--- a/specification/api-metrics.md
+++ b/specification/api-metrics.md
@@ -235,7 +235,7 @@ intended as the default way to configure an exporter for
 pre-aggregation, if it is expected. Since this is only expected in
 some exporters, it is regarded as an option relevant to the exporter,
 whether keys configured through `WithRecommendedKeys` are applied for aggregation
-purposes or not. This allows the user influence the standard
+purposes or not. This allows the user to influence the standard
 implementation behavior, especially for exporters that require
 pre-specified aggregation keys.
 

From 6ebfb6efea57b7c7e12a0b383d848e5bc07bb76b Mon Sep 17 00:00:00 2001
From: jmacd
Date: Thu, 13 Feb 2020 14:03:34 -0800
Subject: [PATCH 53/54] Address most of Bogdan's feedback

---
 specification/api-metrics.md | 31 +++++++++----------------------
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/specification/api-metrics.md b/specification/api-metrics.md
index ba925746835..d235d6f2e0c 100644
--- a/specification/api-metrics.md
+++ b/specification/api-metrics.md
@@ -271,7 +271,12 @@ sometimes used to capture totals that rise and fall. An essential
 property of Counter instruments is that two events `Add(m)` and
 `Add(n)` are semantically equivalent to one event `Add(m+n)`. This
 property means that Counter events can be combined inexpensively, by
-definition. Counter `Add(0)` events are no-ops, by definition.
+definition.
+
+Note that `Add(0)` events are not considered a special case, despite
+contributing nothing to a sum. `Add(0)` events MUST be observed by
+the SDK in case non-default aggregations are configured for the
+instrument.
 
 Counter instruments can be seen as special cases of Measure
 instruments with the additive property described above and a
@@ -357,33 +362,13 @@ We believe that the standard behavior of one of these three
 instruments covers nearly all use-cases for users of OpenTelemetry in
 terms of the intended semantics.
 
-### Option: Dedicated Measure instrument for timing measurements
-
-As a language-optional feature, the API MAY support a dedicated
-instrument for reporting timing measurements. This kind of
-instrument, with recommended name `Timer` (and `BoundTimer`), is
-semantically equivalent to a Measure instrument. Like the Measure
-instrument, Timers support a `Record()` function. The input value to
-this instrument is in the language's conventional data type for timing
-measurements.
-
-Timer instruments MUST capture only the magnitude of the input value
-(i.e., an absolute value). When the user provides a negative value to
-the Timer `Record()` function, the captured measurement is the number
-stripped of its sign.
-
-For example, in Go the API will accept a `time.Duration`, and in C++
-the API will accept a `std::chrono::duration`. These instruments
-apply the correct units automatically, reducing the potential for
-confusion over timing metric events.
-
 ### Future Work: Option Support
 
 We are aware of a number of reasons to iterate on these
 instrumentation kinds, in order to offer:
 
 1. Range restrictions on input data. Instruments accepting negative values is rare in most applications, for example, and it is useful to offer both a semantic declaration (e.g., "negative values are meaningless") and a data validation step (e.g., "negative values should be dropped").
-2. Monotonicity support. When a series of values is known to be monotonic, it is useful to declare this; this allows us to detect process resets.
+2. Monotonicity support. When a series of values is known to be monotonic, it is useful to declare this..
 
 For the most part, these behaviors are not necessary for correctness
 within the local process or the SDK, but they are valuable in
@@ -403,6 +388,8 @@ supports configuring aggregations, including which operator is applied
 1. Should the API user be provided with options to configure specific views, statically, in the source?
 2. Should the View API be a stand-alone facility, able to install configurable aggregations, at runtime?
 
+See the [current issue on this topic](https://github.com/open-telemetry/opentelemetry-specification/issues/466).
+
 ## Metric instrument selection
 
 To guide the user in selecting the right kind of metric instrument for

From 9f186b50c08b3365b0170555df18380d6714a4ab Mon Sep 17 00:00:00 2001
From: jmacd
Date: Thu, 13 Feb 2020 15:00:47 -0800
Subject: [PATCH 54/54] Typo

---
 specification/api-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specification/api-metrics.md b/specification/api-metrics.md
index d235d6f2e0c..343ac548249 100644
--- a/specification/api-metrics.md
+++ b/specification/api-metrics.md
@@ -518,7 +518,7 @@ CPU usage is something that we naturally sum, which raises several questions.
 
 - Why not use a Counter instrument? In order to use a Counter instrument, we would need to convert total usage figures into deltas. Calculating deltas from the previous measurement is easy to do, but Counter instruments are not meant to be used from callbacks.
-- Why not report deltass in the Observer callback? Observer instruments are meant to be used to observe current values. Nothing prevents reporting deltas with an Observer, but the standard aggregation for Observer instruments is to sum the current value across distinct label sets. The standard behavior is useful for determining the current rate of CPU usage, but special configuration would be required for an Observer instrument to use Counter aggregation.
+- Why not report deltas in the Observer callback? Observer instruments are meant to be used to observe current values. Nothing prevents reporting deltas with an Observer, but the standard aggregation for Observer instruments is to sum the current value across distinct label sets. The standard behavior is useful for determining the current rate of CPU usage, but special configuration would be required for an Observer instrument to use Counter aggregation.
 
 ### Reporting per-shard memory holdings