Metrics SDK: Accumulator and Processor state requirements #1198

Closed
wants to merge 7 commits
118 changes: 108 additions & 10 deletions specification/metrics/sdk.md
@@ -88,6 +88,9 @@ These are the significant data types used in the model architecture:
- **ExportRecord**: consists of Instrument, Label Set, Resource, Timestamp(s), and Aggregation
- **ExportRecordSet**: a set of export records.

TODO(jmacd): rename ExportKind to AggregationTemporality,
ExportKindSelector to AggregationTemporalitySelector.

The term **SDK instrument** refers to the underlying implementation of
an instrument.

@@ -228,18 +231,33 @@ Accumulator, with detail shown for synchronous instruments.

![Metrics SDK Accumulator Detail Diagram](img/accumulator-detail.png)

The Accumulator's primary tasks are to aggregate synchronous metric
events over a collection interval and then, at the end of the
interval, to evaluate asynchronous callbacks and snapshot the current
Aggregators for passing to the Processor.

The Accumulator MUST be configured with an AggregatorSelector
interface that is used to assign new Aggregators to instruments as
they are needed.
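As a rough illustration of the AggregatorSelector role described above, the following sketch maps instrument kinds to Aggregator implementations. All class and method names here are assumptions for illustration, not the normative API.

```python
# Hypothetical sketch of an AggregatorSelector. The instrument kind names
# come from this document; the class shapes are assumed.

class SumAggregator:
    """Accumulates an additive sum."""
    def __init__(self):
        self.value = 0


class LastValueAggregator:
    """Keeps only the most recent measurement."""
    def __init__(self):
        self.value = None


class AggregatorSelector:
    """Assigns a new Aggregator to an instrument when one is needed."""

    def aggregator_for(self, instrument_kind):
        # Additive instruments aggregate into a Sum; grouping instruments
        # here default to last-value for brevity.
        if instrument_kind in ("Counter", "UpDownCounter",
                               "SumObserver", "UpDownSumObserver"):
            return SumAggregator()
        return LastValueAggregator()
```

The Accumulator would call `aggregator_for` once per new (instrument, label set) record, so the selector fully determines which aggregation each instrument receives.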

The Accumulator MUST ensure that metric events for a given instrument
and identical label set that occur within a single collection interval
are passed to the same Aggregator.
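A minimal sketch of this requirement, with all names assumed: the Accumulator keys its record map by (instrument, label set), so updates within one collection interval reach the same Aggregator.

```python
# Sketch only: record lookup keyed by (instrument name, frozen label set).

class SumAggregator:
    def __init__(self):
        self.value = 0

    def update(self, num):
        self.value += num


class Accumulator:
    def __init__(self):
        # (instrument name, frozenset of label pairs) -> Aggregator
        self.records = {}

    def record(self, instrument, labels, value):
        key = (instrument, frozenset(labels.items()))
        agg = self.records.get(key)
        if agg is None:
            # New record: here the AggregatorSelector would be consulted.
            agg = self.records[key] = SumAggregator()
        agg.update(value)


acc = Accumulator()
acc.record("requests", {"method": "GET"}, 1)
acc.record("requests", {"method": "GET"}, 1)   # same record as above
acc.record("requests", {"method": "POST"}, 1)  # distinct label set
```

Two distinct label sets yield two records, and both `GET` updates land on one Aggregator.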

The Accumulator MUST snapshot the current value and reset the state of
every Aggregator that was used during a collection interval. The
Aggregator snapshot, together with the instrument descriptor, label
set, and Resource, defines an Accumulation, which is passed to the
Processor.
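The snapshot-and-reset step can be sketched as follows. The method name `SynchronizedMove` appears in this document; the Python shapes around it are assumptions.

```python
# Sketch: SynchronizedMove copies the current value into a snapshot
# Aggregator and resets the current one, and the snapshot is handed to
# the Processor as part of an Accumulation.

class SumAggregator:
    def __init__(self):
        self.value = 0

    def update(self, num):
        self.value += num

    def synchronized_move(self, snapshot):
        snapshot.value = self.value  # (a) copy into the snapshot instance
        self.value = 0               # (b) reset current to the zero state


class RecordingProcessor:
    def __init__(self):
        self.accumulations = []

    def process(self, accumulation):
        self.accumulations.append(accumulation)


current, snapshot = SumAggregator(), SumAggregator()
current.update(5)
current.update(3)
current.synchronized_move(snapshot)

processor = RecordingProcessor()
resource = {"service.name": "example"}  # assumed Resource representation
processor.process(("requests", frozenset({("method", "GET")}),
                   resource, snapshot))
```

After the move, the snapshot carries the interval's total while the current Aggregator is back at zero, ready for the next interval.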

The Accumulator MUST provide the option to configure the
[`Resource`](../resource/sdk.md) associated with Accumulations that it
produces.

Aggregators that receive no events or observations during a collection
interval MUST NOT yield Accumulations for that collection interval.

Synchronous metric instruments are expected to be used concurrently.
Unless concurrency is not a feature of the source language, the SDK
Accumulator component SHOULD be designed with concurrent performance
@@ -250,6 +268,18 @@ synchronous instrument updates. The Accumulator SHOULD NOT hold an
exclusive lock while calling an Aggregator (see below), since some
Aggregators may have higher concurrency expectations.

The Accumulator MUST eliminate from memory (i.e., "forget") state
associated with label sets used in earlier collection intervals, after
they have not been used for a suitable amount of time. A "suitable
amount of time" is intentionally left unspecified, since
implementations may wish to optimize memory management and must
contend with concurrent access. This requirement ensures that export
pipelines constructed for stateless exporters (e.g., Statsd, or OTLP
with a stateless ExportKindSelector) are not forced to keep permanent
state in the Accumulator. This implies that the use of long-term state in a
Metrics export pipeline should be elective, and such state, if present,
should be managed in the Processor component.

> **Review comment (Contributor):** It would be great if the details of how the golang implementation manages this were included in the detailed description of the model implementation. [I think this is done via some clever reference counting, IIRC].
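One possible "forget" strategy is sketched below: each record tracks the last collection epoch in which it was updated, and records idle for more than a configured number of intervals are evicted during collection. Every name here is an assumption; reference counting, as used by some implementations, is an equally valid approach.

```python
# Sketch of idle-record eviction in an Accumulator (assumed names).

class Accumulator:
    def __init__(self, max_idle_intervals=2):
        self.epoch = 0
        self.max_idle = max_idle_intervals
        self.records = {}  # key -> [value, last_used_epoch]

    def record(self, key, value):
        rec = self.records.setdefault(key, [0, self.epoch])
        rec[0] += value
        rec[1] = self.epoch  # mark as used in the current interval

    def collect(self):
        self.epoch += 1
        stale = [k for k, (_, used) in self.records.items()
                 if self.epoch - used > self.max_idle]
        for k in stale:
            del self.records[k]  # forget label sets not recently used


acc = Accumulator(max_idle_intervals=1)
acc.record("a", 1)
for _ in range(3):
    acc.collect()  # "a" goes idle and is eventually evicted
```

Because eviction happens inside `collect()`, it naturally serializes with snapshotting and avoids racing concurrent updates to live records.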

#### Accumulator: Collect() function

The Accumulator MUST implement a Collect method that builds and
@@ -268,9 +298,77 @@ Label Set, Resource, and metric Descriptor.

TODO: _Are there more Accumulator functional requirements?_

### Processor: Component interface

The Processor component interface supports interaction from the
Controller and Accumulator. The Controller, responsible for
initiating a new round of collection, informs the Processor when a
collection interval starts and finishes. After finishing the
collection interval, the Controller gets the ExportRecordSet before
calling the Exporter to export data.

The Accumulator interacts with the Processor during the call to its
`Collect()` method, during which it calls `Process()` on the Processor
once per Accumulation.
> **Review comment (Contributor):** Is there a reason why the Controller doesn't mediate this? Couldn't the controller ask the accumulator for the data, then pass it to the Processor, rather than have the Accumulator know about the processor directly? What do we gain from the extra coupling between Accumulator and Processor?
>
> **Contributor (Author):** The Controller does mediate this--it's the exclusive "owner" of the pipeline and mediates everything, I'd say. Why ask the Accumulator to pass data to the Processor? The Processor has lots of flexibility this way, e.g., to simply ignore Accumulations.
>
> **Contributor:** My question is...why not have the controller pass this data to the processor, and decouple the accumulator from the processor?
>
> **Contributor (Author):** After thinking about this, I guess there's no reason to require the Accumulator to call the Processor directly. The Processor could instead implement an iterator pattern, to let the Processor scan through a collection representing the Accumulator state. Is that a better design? I'm not sure the code will be cleaner for me in Go--if I'm to avoid allocating an intermediate data structure--but it sounds logically cleaner.
>
> **Contributor:** I'll ponder in my nascent java implementation what the appropriate degree of coupling might be. @breedx-splk this is something we should think about.

The Processor component is meant to be used for managing long-term
state; it is also one of the locations in the Metrics export pipeline
where we can implement control over cardinality. There are two
reasons that long-term state is typically required in a Metric export
pipeline:

1. Because the Exporter requests Cumulative aggregation temporality for Sum and/or Histogram data points
2. Because the Exporter requests keeping Memory about all metric label sets, regardless of the requested aggregation temporality.

Note that both of these behaviors are typically required for a
Prometheus exporter, and that when neither of these behaviors is
configured, the Metrics export pipeline can be expected not to develop
long-term state.
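The first reason can be made concrete with a sketch of delta-to-cumulative conversion (names assumed): to serve an Exporter that requests Cumulative temporality, the Processor must merge each interval's delta into a running sum, per timeseries, indefinitely.

```python
# Sketch: a Processor holding long-term state for Cumulative export.

class CumulativeProcessor:
    def __init__(self):
        # (instrument, label set) -> running sum, kept across intervals
        self.cumulative = {}

    def process(self, key, delta):
        # Merge this interval's delta into the long-term cumulative value.
        self.cumulative[key] = self.cumulative.get(key, 0) + delta

    def export_value(self, key):
        return self.cumulative[key]


p = CumulativeProcessor()
key = ("requests", frozenset({("method", "GET")}))
p.process(key, 5)  # interval 1
p.process(key, 3)  # interval 2
```

After two intervals the exported cumulative value is the sum over both deltas, which is exactly the state the Accumulator's "forget" requirement pushes out of the Accumulator and into the Processor.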

#### Basic Processor

The basic Processor supports two standard ExportKindSelectors and the
independent Memory behavior described above. The default
OpenTelemetry Metrics SDK MUST provide a basic Processor meeting these
requirements.

##### Basic Processor: CumulativeExportKindSelector

CumulativeExportKindSelector is the default behavior: it requests
exporting Cumulative aggregation temporality for Sums and Histograms
and implies that label sets used with synchronous instruments will be
remembered indefinitely in the SDK. This ExportKindSelector is the
default in order to support downstream Prometheus exporters "out of
the box".

##### Basic Processor: StatelessExportKindSelector

The StatelessExportKindSelector configures a Metric export pipeline
with no long-term memory requirements. In this selector, the Counter,
UpDownCounter, ValueRecorder, and ValueObserver instruments are
configured for Delta aggregation temporality while SumObserver and
UpDownSumObserver instruments are configured for Cumulative
aggregation temporality. This basic Processor configuration has no
long-term memory requirements because the instrument temporality
matches the aggregation temporality, meaning Accumulations "pass
through" the Processor without requiring additional memory for
temporality conversion.
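The stateless mapping above can be written out as a table. The instrument names come from this document; representing the selector as a plain dictionary is an assumption for illustration.

```python
# Sketch of the StatelessExportKindSelector's instrument-to-temporality map.

STATELESS_TEMPORALITY = {
    # Synchronous instruments: export each interval's delta directly.
    "Counter": "Delta",
    "UpDownCounter": "Delta",
    "ValueRecorder": "Delta",
    # Asynchronous instruments:
    "ValueObserver": "Delta",
    "SumObserver": "Cumulative",        # observations are already cumulative
    "UpDownSumObserver": "Cumulative",  # so they pass through unchanged
}


def export_kind_for(instrument_kind):
    return STATELESS_TEMPORALITY[instrument_kind]
```

Because every instrument's native temporality matches its requested export temporality, Accumulations pass through without the Processor retaining per-timeseries state.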

##### Basic Processor: Memory

Some metrics exporter configurations request that the Metric export
pipeline maintain long-term state about historically reported Metric
timeseries. This option is a simple boolean that, when set, requires
the Processor to retain memory about all timeseries it has ever
exported. This option is only meaningful when reporting Cumulative
aggregation temporality.
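The Memory option's effect can be sketched as follows (all names assumed): with memory on, the Processor re-emits every timeseries it has ever exported, even in intervals that produced no new Accumulation for it.

```python
# Sketch of the Memory option in a cumulative Processor (assumed names).

class MemoryProcessor:
    def __init__(self, memory=True):
        self.memory = memory
        self.values = {}      # key -> cumulative value, for all keys ever seen
        self.updated = set()  # keys updated during the current interval

    def start_collection(self):
        self.updated = set()

    def process(self, key, delta):
        self.values[key] = self.values.get(key, 0) + delta
        self.updated.add(key)

    def export_record_set(self):
        # With memory on, export every timeseries ever seen; otherwise
        # export only the timeseries updated this interval.
        keys = self.values if self.memory else self.updated
        return {k: self.values[k] for k in keys}


p = MemoryProcessor(memory=True)
p.start_collection()
p.process("a", 1)
first = p.export_record_set()   # contains "a"
p.start_collection()            # no updates this interval
second = p.export_record_set()  # still contains "a", since memory is on
```

With `memory=False`, the second record set would be empty, which is why the option only makes sense alongside Cumulative temporality, where downstream consumers expect every known timeseries on every export.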

#### Reducing Processor

The reducing Processor is a Processor interface implementation used in
conjunction with another (e.g., basic) Processor to drop labels in a
Metric export pipeline. The default OpenTelemetry SDK SHOULD provide
a Reducing Processor implementation.
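A reducing Processor might look like the following sketch, where all names are assumptions: it drops the label keys not in a keep-list and forwards the reduced Accumulations to a downstream Processor, which merges records that now collide.

```python
# Sketch of a reducing Processor wrapping a downstream Processor.

class ReducingProcessor:
    def __init__(self, keep_keys, downstream):
        self.keep = set(keep_keys)
        self.downstream = downstream

    def process(self, instrument, labels, value):
        # Drop every label whose key is not in the keep-list.
        reduced = frozenset((k, v) for k, v in labels if k in self.keep)
        self.downstream.process(instrument, reduced, value)


class MergingProcessor:
    """Downstream stand-in that sums records sharing a reduced label set."""

    def __init__(self):
        self.sums = {}

    def process(self, instrument, labels, value):
        key = (instrument, labels)
        self.sums[key] = self.sums.get(key, 0) + value


down = MergingProcessor()
red = ReducingProcessor(keep_keys=["method"], downstream=down)
red.process("requests", frozenset({("method", "GET"), ("host", "h1")}), 1)
red.process("requests", frozenset({("method", "GET"), ("host", "h2")}), 1)
# Both records reduce to {method=GET} and merge into a single sum of 2.
```

Dropping labels this way is one of the cardinality controls mentioned earlier: two per-host timeseries collapse into one per-method timeseries before export.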

### Controller
