Metrics default SDK configuration CFP #382

jmacd · 2019-12-10T20:11:23Z

This issue is meant to tie together a number of loose threads about configuring a metrics SDK for some of the real-world demands that we know exist. Those are:

Configuring specific "views" of a metric instrument, which implies the ability to select which metric instruments are aggregated, which set(s) of label keys they are pre-aggregated by, and which aggregation(s) are applied. It should be possible to disable metric instruments and to configure the same instrument for multiple aggregations (or multiple sets of label keys). For push-based exporters (specifically OTLP), there is a desire to configure the collection interval on a per-instrument or per-view basis.
Configuring automatic extraction of correlation context for inclusion in metrics aggregation. In some ways distributed correlations are just the same as ordinary labels, insofar as configuring them goes, but we know that they are more expensive to implement, and therefore it would be desirable to configure distributed correlation aggregations explicitly. We've discussed use of a boolean to indicate when an aggregation key is non-local, implying that it should be retrieved from the context, not from the LabelSet.
We've discussed a desire to configure trace "exemplars", with two natural forms: (a) configure exporting of span context values for a given aggregation, (b) export a sample of additional labels (i.e., those in the LabelSet or the distributed correlation context that are not part of the aggregation) as exemplars. An example of part (b) would be to configure the top-K most frequent values of a label that was not used for aggregation; ideally the exemplar format would permit including both values and estimated frequencies. For example, when aggregating a sum of "request bytes" by service name, export the approximate top-10 "host" label values that contributed to each service name's sum.
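To make form (b) concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the function name `aggregate_with_exemplars`, the event shape, and the output layout are assumptions for illustration, not part of any SDK. It also counts exactly, whereas a real implementation would more likely use a bounded-memory approximation to estimate frequencies.

```python
from collections import Counter, defaultdict


def aggregate_with_exemplars(events, key, exemplar_label, k=10):
    """Sum values grouped by `key`, and track the k most frequent values of
    `exemplar_label` (a label that is NOT part of the aggregation key).

    Hypothetical helper: exact counting is used here for clarity only.
    """
    sums = defaultdict(int)
    seen = defaultdict(Counter)
    for labels, value in events:
        group = labels[key]
        sums[group] += value
        seen[group][labels[exemplar_label]] += 1
    return {
        group: {"sum": sums[group], "top": seen[group].most_common(k)}
        for group in sums
    }


# "request bytes" aggregated by service name, with "host" as the exemplar label.
events = [
    ({"service": "checkout", "host": "h1"}, 100),
    ({"service": "checkout", "host": "h2"}, 50),
    ({"service": "checkout", "host": "h1"}, 25),
    ({"service": "search", "host": "h3"}, 10),
]
result = aggregate_with_exemplars(events, key="service", exemplar_label="host", k=1)
```

Each exported group carries its sum plus a list of `(value, frequency)` pairs, which matches the "values and estimated frequencies" shape described above.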

Configuration should be specified in protobuf format, allowing us to reason about SDK configuration via plain code, via a configuration file, or via a network response. The configuration specification should think about whether these configurations can be changed dynamically or whether they are set once at startup.
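As a rough illustration of what "reasoning about SDK configuration via plain code" could look like, the sketch below models the view configuration as Python dataclasses. This is not a proposed schema: the names `LabelKey`, `View`, `MetricsConfig`, `views_for`, and `aggregation_key` are all assumptions, chosen to mirror the requirements listed above (multiple views per instrument, disabling instruments, per-view collection interval, and a boolean marking non-local keys).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LabelKey:
    name: str
    # True means the value is read from the distributed correlation
    # context rather than from the LabelSet (hypothetical field name).
    non_local: bool = False


@dataclass(frozen=True)
class View:
    instrument: str
    aggregation: str                           # e.g. "sum", "histogram"
    label_keys: tuple = ()
    interval_seconds: Optional[float] = None   # None: use the SDK default


@dataclass
class MetricsConfig:
    views: List[View] = field(default_factory=list)
    disabled_instruments: frozenset = frozenset()

    def views_for(self, instrument):
        """Return every view configured for an instrument (none if disabled)."""
        if instrument in self.disabled_instruments:
            return []
        return [v for v in self.views if v.instrument == instrument]


def aggregation_key(view, label_set, correlations):
    """Build the grouping key, taking non-local keys from the context."""
    return tuple(
        (correlations if k.non_local else label_set).get(k.name)
        for k in view.label_keys
    )


config = MetricsConfig(
    views=[
        View("request_bytes", "sum", (LabelKey("service"),)),
        View(
            "request_bytes",
            "histogram",
            (LabelKey("service"), LabelKey("tenant", non_local=True)),
            interval_seconds=60,
        ),
    ],
    disabled_instruments=frozenset({"debug_counter"}),
)
key = aggregation_key(
    config.views[1],
    label_set={"service": "checkout", "host": "h1"},
    correlations={"tenant": "acme"},
)
```

The same structure could be expressed as a protobuf message and loaded from a file or a network response; the dataclasses only stand in for that message here.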

There is a separate set of concerns related to configuring metrics export within a stream of trace data. While this is also tracing SDK configuration, it touches on metrics so I'm including it in this issue.

Can we use this specification to configure per-span export of metrics?
Can we configure per-span export of specific metrics? For example, I'd like to include the current (average) CPU load of my process with every span. The instrumentation should not have to be modified; simply use a stateful aggregation of the CPU load and export the last value within each span.
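A minimal sketch of the CPU-load idea, assuming a hypothetical hook that runs when a span ends. `LastValue`, `SpanMetricAttacher`, `on_span_end`, and the attribute name `process.cpu.load` are invented for illustration and do not correspond to any existing SDK API.

```python
class LastValue:
    """Stateful aggregation that remembers only the most recent value."""

    def __init__(self):
        self._value = None

    def update(self, value):
        self._value = value

    def current(self):
        return self._value


class SpanMetricAttacher:
    """Hypothetical hook that copies configured metric values onto each span."""

    def __init__(self):
        self._sources = {}

    def register(self, name, aggregator):
        self._sources[name] = aggregator

    def on_span_end(self, span_attributes):
        # Return a new attribute dict; the caller's dict is left unmodified.
        out = dict(span_attributes)
        for name, aggregator in self._sources.items():
            value = aggregator.current()
            if value is not None:
                out[name] = value
        return out


cpu_load = LastValue()
attacher = SpanMetricAttacher()
attacher.register("process.cpu.load", cpu_load)

# The existing instrumentation keeps reporting as before.
cpu_load.update(0.4)
cpu_load.update(0.7)

exported = attacher.on_span_end({"name": "GET /"})
```

The point of the sketch is that only configuration (the `register` call) changes; the code that records CPU load is untouched.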

Some related topics were discussed in #259. See related issue #381. See related discussion of exemplars in the OTLP proto open-telemetry/opentelemetry-proto#81.


jkwatson · 2019-12-10T22:15:44Z

Fantastic writeup, @jmacd. You beat me to it!

jmacd · 2019-12-13T06:13:28Z

The two issues linked above call for implementation support of the basic mechanisms described here, namely the ability to support multiple aggregations and aggregation by distributed correlations. It will be helpful to prototype the configuration mechanism to understand its feasibility.

tylerbenson · 2019-12-13T16:05:49Z

I don't want to pass judgement too early, but I have concerns about having all config defined across languages via protobuf. If we can agree that the protobuf defines a core set of configs, but each language will likely have additional config, then I think that would reduce my concern somewhat.

jkwatson · 2019-12-16T16:33:32Z

I think we should pull this into a single issue (or re-title this one), for general default SDK configuration considerations. Or, would you rather keep this one as-is, and have a separate issue that tracks specifying the general considerations?

That is, we can have a new issue that describes how the default SDK should be configurable, and separate issues for the details of what is configurable for various pieces of the SDK (metrics, traces, exporters, etc).

jmacd · 2019-12-16T20:13:42Z

I think I agree. There are already some Tracer-related items in this issue, to your point. OTOH, I've seen a desire to keep the metrics and tracing SDKs relatively separate, so they can be mixed and matched.

jkwatson · 2019-12-16T21:11:15Z

Having a common set of mechanisms for configuration across all the SDK pieces and parts seems very important to me for reducing developer (operator, etc.) cognitive load and surprise.

So, perhaps one issue to track the default-SDK(s) configuration mechanisms, and then separate issues for the details of what is configurable for each of the pieces feels like a good separation.

If you're ok with that, I'll go ahead and create the "general configuration mechanisms" issue separate from this one.

jmacd · 2019-12-16T21:32:38Z

Sounds good.

jkwatson · 2019-12-16T22:48:23Z

#390 for configuration mechanism

cijothomas · 2019-12-19T20:49:57Z

As asked in the Metrics SIG today, I'm sharing one concrete example of a scenario where a metric instrument should be able to be aggregated by multiple aggregators.

Microsoft Azure Monitor has a feature called "Live Metrics" which shows metrics in near real-time. It shows metrics like Requests/Sec and Request/Duration with 1 sec aggregation and with limited labels/dimensions (success, servername). The same metrics are also stored with 1 min aggregation and more labels/dimensions (response code, url, etc.) for other Metric UI experiences.

To continue providing the same feature with OT, we need the ability to associate Metric instruments with multiple Aggregators.

(In Azure Monitor, all metric update calls go through a chain of processors where every processor gets the metric update call. One of the processors in the chain does 1 sec aggregation with minimal dimensions and the next processor does 1 min aggregation with more dimensions.)
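The processor chain described here can be sketched roughly as follows. This is only an illustration of the pattern, not Azure Monitor's or OpenTelemetry's actual code: `WindowedSum`, `Chain`, and the label names are assumptions.

```python
from collections import defaultdict


class WindowedSum:
    """Sums updates into fixed time windows, grouped by a subset of labels."""

    def __init__(self, window_seconds, label_keys):
        self.window_seconds = window_seconds
        self.label_keys = label_keys
        self.buckets = defaultdict(float)

    def update(self, timestamp, value, labels):
        # Align the timestamp to the start of its window.
        window_start = int(timestamp // self.window_seconds) * self.window_seconds
        # Keep only the dimensions this processor cares about.
        key = tuple(labels.get(k) for k in self.label_keys)
        self.buckets[(window_start, key)] += value


class Chain:
    """Forwards every metric update to each processor in order."""

    def __init__(self, processors):
        self.processors = list(processors)

    def update(self, timestamp, value, labels):
        for processor in self.processors:
            processor.update(timestamp, value, labels)


# 1 sec aggregation with minimal dimensions, 1 min aggregation with more.
live = WindowedSum(1, ("success",))
stored = WindowedSum(60, ("success", "response_code", "url"))
requests = Chain([live, stored])

requests.update(0.2, 1, {"success": True, "response_code": 200, "url": "/a"})
requests.update(0.9, 1, {"success": True, "response_code": 200, "url": "/b"})
requests.update(1.5, 1, {"success": False, "response_code": 500, "url": "/a"})
```

A single update call from the instrument feeds both aggregations, which is the "one instrument, multiple aggregators" capability being requested.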

jmacd · 2020-01-22T16:58:32Z

#197 requests a way to configure metric reporting intervals independently.

jmacd · 2020-06-05T22:50:49Z

See open-telemetry/opentelemetry-proto#155

jmacd added the spec:metrics Related to the specification/metrics directory label Dec 12, 2019

This was referenced Dec 12, 2019

Metrics SDK: Support for aggregation by distributed correlation context (Spec) open-telemetry/opentelemetry-go#383

Closed

Metrics SDK: Support for configurable metrics Batcher/Aggregators (Spec) open-telemetry/opentelemetry-go#384

Closed

jmacd mentioned this issue Jan 22, 2020

Feature Request: configure reporting period separately for each metric #197

Closed

jmacd mentioned this issue Apr 8, 2020

Support Points aggregator marwan-at-work/otel-exporter-datadog#1

Closed

reyang added the area:sdk Related to the SDK label Jun 30, 2020

andrewhsu added priority:p2 Medium priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Jul 28, 2020

andrewhsu added release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs and removed release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Oct 20, 2020
