Metrics default SDK configuration CFP #382

Open
jmacd opened this issue Dec 10, 2019 · 11 comments
Labels: area:sdk (Related to the SDK), priority:p2 (Medium priority level), release:allowed-for-ga (Editorial changes that can still be added before GA since they don't require action by SIGs), spec:metrics (Related to the specification/metrics directory)

Comments

@jmacd
Contributor

jmacd commented Dec 10, 2019

This issue is meant to tie together a number of loose threads about configuring a metrics SDK for some of the real-world demands that we know exist. Those are:

  1. Configuring specific "views" of a metric instrument, which implies the ability to select which metric instruments are aggregated, which set(s) of label keys they are pre-aggregated by, and which aggregation(s) are applied. It should be possible to disable metric instruments and to configure the same instrument for multiple aggregations (or multiple sets of label keys). For push-based exporters (specifically OTLP), there is a desire to configure the collection interval on a per-instrument or per-view basis.
  2. Configuring automatic extraction of correlation context for inclusion in metrics aggregation. In some ways distributed correlations are just the same as ordinary labels, in so far as configuring them goes, but we know that they are more expensive to implement and therefore it would be desirable to configure distributed correlation aggregations explicitly. We've discussed use of a boolean to indicate when an aggregation key is non-local to imply that it should be retrieved from the context, not from the LabelSet.
  3. We've discussed a desire to configure trace "exemplars", with two natural forms: (a) configure exporting of span context values for a given aggregation, (b) export a sample of additional labels (i.e., those in the LabelSet or the distributed correlation context that are not part of the aggregation) as exemplars. An example of part (b) would be to configure the top-K most frequent values of a label that was not used for aggregation; ideally the exemplar format would permit including both values and estimated frequencies. For example, when aggregating a sum of "request bytes" by service name, export the approximate top-10 "host" label values that contributed to each service name's sum.
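
To make items 1 through 3(b) concrete, here is a minimal sketch in Python. It is not an existing SDK API: `LabelKey`, `View`, `record`, and every field name are hypothetical, the only aggregation shown is a sum, and the exemplar counts are exact (via `collections.Counter`) where a real SDK would likely use an approximate top-K sketch.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LabelKey:
    """Hypothetical aggregation key. non_local=True means the value is read
    from the distributed correlation context, not from the LabelSet."""
    name: str
    non_local: bool = False


@dataclass
class View:
    """Hypothetical view: one instrument, one set of keys, a sum aggregation,
    and an optional non-aggregated label tracked for top-K exemplars."""
    instrument: str
    keys: tuple
    exemplar_label: Optional[str] = None
    top_k: int = 10
    sums: dict = field(default_factory=lambda: defaultdict(float))
    exemplars: dict = field(default_factory=lambda: defaultdict(Counter))

    def record(self, value, labels, correlations):
        # Build the pre-aggregation group from local and non-local keys.
        group = tuple(
            (correlations if key.non_local else labels).get(key.name, "")
            for key in self.keys
        )
        self.sums[group] += value
        if self.exemplar_label is not None and self.exemplar_label in labels:
            self.exemplars[group][labels[self.exemplar_label]] += 1

    def export(self):
        # Each group maps to (sum, [(exemplar value, frequency), ...]).
        return {
            group: (total, self.exemplars[group].most_common(self.top_k))
            for group, total in self.sums.items()
        }


def record(views, instrument, value, labels, correlations=None):
    """Fan one update out to every view of the instrument. An instrument
    with no configured view is effectively disabled."""
    for view in views:
        if view.instrument == instrument:
            view.record(value, labels, correlations or {})
```

Configuring two `View` objects for the same instrument (say, one keyed by a local `service` label with `host` exemplars, one keyed by a non-local `tenant`) gives the "same instrument, multiple sets of label keys" case from item 1.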

Configuration should be specified in protobuf format, allowing us to reason about SDK configuration via plain code, via a configuration file, or via a network response. The configuration specification should think about whether these configurations can be changed dynamically or whether they are set once at startup.
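
As a rough illustration of "one schema, several sources", the sketch below builds the same configuration in plain code and from a serialized document. JSON stands in for protobuf here purely to keep the example self-contained; `ViewConfig`, `parse_config`, `ConfigHolder`, and the field names are all hypothetical, and the 60-second default interval is an arbitrary assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewConfig:
    """Hypothetical per-view configuration record."""
    instrument: str
    label_keys: tuple
    aggregation: str
    interval_seconds: float = 60.0  # assumed default, not from any spec


def parse_config(text):
    """Parse a JSON document (a stand-in for a serialized protobuf message,
    whether read from a file or received over the network)."""
    raw = json.loads(text)
    return [
        ViewConfig(
            instrument=v["instrument"],
            label_keys=tuple(v["label_keys"]),
            aggregation=v["aggregation"],
            interval_seconds=float(v.get("interval_seconds", 60.0)),
        )
        for v in raw["views"]
    ]


class ConfigHolder:
    """Holds the active views. With dynamic=False the configuration is set
    once at startup and later updates are rejected."""

    def __init__(self, views, dynamic=False):
        self.views = list(views)
        self.dynamic = dynamic

    def update(self, views):
        if not self.dynamic:
            raise RuntimeError("configuration is fixed at startup")
        self.views = list(views)
```

The `dynamic` flag is only there to surface the open question in the paragraph above: the same `update` path either works at runtime or is refused after startup.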

There is a separate set of concerns related to configuring metrics export within a stream of trace data. While this is also tracing SDK configuration, it touches on metrics so I'm including it in this issue.

  1. Can we use this specification to configure per-span export of metrics?
  2. Can we configure per-span export of specific metrics? For example, I'd like to include the current (average) CPU load of my process with every span. The instrumentation should not have to be modified; it should be enough to use a stateful aggregation of the CPU load and export the last value within each span.
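
A minimal sketch of the CPU-load case, assuming a last-value aggregation that the SDK copies onto each span when the span ends. `LastValue`, `Span`, and `end_span` are hypothetical names for illustration and do not correspond to any existing tracing API.

```python
class LastValue:
    """Stateful aggregation that keeps only the most recent measurement."""

    def __init__(self):
        self.value = None

    def update(self, value):
        self.value = value


class Span:
    """Stand-in for a tracing span with free-form attributes."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}


def end_span(span, per_span_metrics):
    """Copy the current value of each configured aggregation onto the span.
    The instrumentation that calls update() is unaware of this step."""
    for metric_name, aggregation in per_span_metrics.items():
        if aggregation.value is not None:
            span.attributes[metric_name] = aggregation.value
    return span
```

The instrumentation keeps calling `update()` on the CPU-load aggregation as usual; which aggregations appear in `per_span_metrics` would be a matter of SDK configuration only.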

Some related topics were discussed in #259. See related issue #381. See related discussion of exemplars in the OTLP proto open-telemetry/opentelemetry-proto#81.

@jkwatson
Contributor

Fantastic writeup, @jmacd. You beat me to it!

@jmacd
Contributor Author

jmacd commented Dec 13, 2019

The two issues linked above call for implementation support of the basic mechanisms described here, namely the ability to support multiple aggregations and aggregation by distributed correlations. It will be helpful to prototype the configuration mechanism to understand its feasibility.

@tylerbenson
Member

I don't want to pass judgement too early, but I have concerns about having all config defined across languages via protobuf. If we can agree that the protobuf defines a core set of configs, but each language will likely have additional config, then I think that would reduce my concern somewhat.

@jkwatson
Contributor

I think we should pull this into a single issue (or re-title this one), for general default SDK configuration considerations. Or, would you rather keep this one as-is, and have a separate issue that tracks specifying the general considerations?

That is, we can have a new issue that describes how the default SDK should be configurable, and separate issues for the details of what is configurable for various pieces of the SDK (metrics, traces, exporters, etc).

@jmacd
Contributor Author

jmacd commented Dec 16, 2019

I think I agree. There are already some Tracer-related items in this issue, to your point. OTOH, I've seen a desire to keep the metrics and tracing SDKs relatively separate, so they can be mixed and matched.

@jkwatson
Contributor

Having a common set of mechanisms for configuration across all the SDK pieces and parts seems very important to me for reducing developer (operator, etc.) cognitive load and surprise.

So perhaps one issue to track the default SDK configuration mechanisms, and then separate issues for the details of what is configurable for each of the pieces, would be a good separation.

If you're ok with that, I'll go ahead and create the "general configuration mechanisms" issue separate from this one.

@jmacd
Contributor Author

jmacd commented Dec 16, 2019

Sounds good.

@jkwatson
Contributor

#390 for configuration mechanism

@cijothomas
Member

As asked in the Metrics SIG today, I'm sharing one concrete example of a scenario where a metric instrument should be able to be aggregated by multiple aggregators.

Microsoft Azure Monitor has a feature called "Live Metrics" which shows metrics in near real time. It shows metrics like Requests/Sec and Request/Duration with 1 sec aggregation and with limited labels/dimensions (success, servername). The same metrics are also stored with 1 min aggregation and more labels/dimensions (response code, url, etc.) for other Metric UI experiences.

To continue providing the same feature with OT, we need the ability to associate metric instruments with multiple Aggregators.

(In Azure Monitor, all metric update calls go through a chain of processors where every processor gets the metric update call. One of the processors in the chain does 1 sec aggregation with minimal dimensions and the next processor does 1 min aggregation with more dimensions.)
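
A rough sketch of that processor chain, to show the shape of what is being asked for. This is not Azure Monitor code: `WindowedSum` and `update` are hypothetical names, only a sum is computed, and the dimension names are taken loosely from the description above.

```python
from collections import defaultdict


class WindowedSum:
    """One processor in the chain: sums values per time window and per
    combination of the dimensions this processor keeps."""

    def __init__(self, interval_seconds, dimensions):
        self.interval_seconds = interval_seconds
        self.dimensions = dimensions
        self.buckets = defaultdict(float)

    def process(self, timestamp, value, labels):
        # Dimensions not listed for this processor are dropped here.
        window = int(timestamp // self.interval_seconds)
        key = (window,) + tuple(labels.get(d) for d in self.dimensions)
        self.buckets[key] += value


def update(chain, timestamp, value, labels):
    """Every processor in the chain sees every metric update call."""
    for processor in chain:
        processor.process(timestamp, value, labels)


# 1 sec aggregation with minimal dimensions, then 1 min with more.
live = WindowedSum(1, ("success",))
stored = WindowedSum(60, ("success", "response_code", "url"))
chain = [live, stored]
```

Each update lands in both processors, so the same instrument ends up with two independent aggregations at different intervals and with different label sets.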

@jmacd
Contributor Author

jmacd commented Jan 22, 2020

#197 requests a way to configure metric reporting intervals independently.

@jmacd
Contributor Author

jmacd commented Jun 5, 2020

@reyang reyang added the area:sdk Related to the SDK label Jun 30, 2020
@andrewhsu andrewhsu added priority:p2 Medium priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Jul 28, 2020
@andrewhsu andrewhsu added release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs and removed release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Oct 20, 2020
6 participants