Specify the user-facing API for metric instruments #299

Merged: 13 commits into open-telemetry:master, Oct 17, 2019

Conversation

jmacd (Contributor) commented on Oct 10, 2019:

This is rewritten content that was removed from PR #250. There is only one substantial change here, but it is substantial. This specification now says that instruments are allocated by the Meter, meaning they are automatically registered and qualified by the named Meter's component namespace. This resolves questions about what kind of registry support is needed and how to export metrics that have not been updated.

See the new caveats about instrument storage. These changes will encourage users not to store instruments in static scope; otherwise, they will be forced to use the global Meter to allocate metric instruments.
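To make the change concrete, here is a minimal Go sketch of Meter-allocated instruments. The names (`global.MeterProvider`, `NewInt64Counter`, `key.String`) are assumptions for illustration only, not the API this PR specifies:

```golang
// Hypothetical names, for illustration only: the named Meter
// automatically registers the instrument and qualifies it with
// the component's namespace.
meter := global.MeterProvider().Meter("example.com/myserver")
requests := meter.NewInt64Counter("requests")
requests.Add(ctx, 1, meter.Labels(key.String("handler", "api")))
```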

jmacd (Contributor, Author) commented on Oct 10, 2019:

@lzchen

bogdandrutu (Member) left a review comment:

First half of the document, will review the rest tonight after some meetings.

From specification/api-metrics-user.md:

...required to delete handles when they are no longer in use.

```golang
func (s *server) processStream(ctx context.Context) {
	// (snippet truncated in the review view)
```
Member:

I think the Handle should mostly have the same lifetime as the Metric. Deleting a Handle should happen only when no other values will be recorded for that instrument/labelset.

I think Handle.Delete is confusing and should probably be removed for the moment. The intention of deleting is different from the one in this example.

jmacd (Contributor, Author), Oct 14, 2019:

If I can't delete a handle, it's only useful to me for objects that live indefinitely in my process. This seems like a significant drawback. Also, if handles can't be deleted, they can't be used to implement the direct-access calling convention. In my Go SDK I've implemented support for handle deletion; it is tricky, but it is the only way to make handles useful for definite-lifetime objects.

I would assume that a handle can be deleted and that the corresponding resources will be reclaimed once those updates are collected.
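A minimal Go sketch of the definite-lifetime pattern being argued for here. GetHandle and Delete are the names under discussion in this PR; the label construction and server fields are illustrative assumptions:

```golang
// Illustrative only: a handle bound for the lifetime of one stream,
// then deleted so the SDK can reclaim its resources after collection.
func (s *server) processStream(ctx context.Context) {
	labels := s.meter.Labels(key.String("stream_id", s.streamID))
	handle := s.bytesCounter.GetHandle(labels)
	defer handle.Delete()

	for msg := range s.incoming {
		handle.Add(ctx, int64(len(msg)))
	}
}
```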

Member:

Not sure I understand. I think Handles semantically mean a "binding" between an Instrument (which has a process lifetime) and a LabelSet (which, if I understand correctly, has the same lifetime), so a Handle does not extend any lifetime and does not have any lifetime concern.

If LabelSet does not have a process lifetime, then we need a delete there as well. Personally, I see no reason to have delete (I inherited that from Prometheus, I think, but have never seen a real use-case for it). So I am happy to just remove the Delete.

jmacd (Contributor, Author):

In my thinking, LabelSet objects do not have a lifetime managed by the SDK. They are values that the caller is allowed to refer to; they will be garbage-collected when no longer used, or released when the object is finalized. Think of a LabelSet as a value that can be used for a fast lookup, not as something registered by the SDK with a managed lifetime.

On the other hand, there is a presumption that handles take resources and should support delete. Prometheus clients do support delete as well, for example: https://godoc.org/github.com/prometheus/client_golang/prometheus#CounterVec.Delete
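For reference, a small self-contained example of the Prometheus client API cited above; the API calls are real, while the metric and label names are made up:

```golang
package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "requests_total", Help: "Requests served."},
		[]string{"handler"},
	)
	prometheus.MustRegister(requests)

	// Record some values for one label combination.
	requests.WithLabelValues("api").Inc()

	// Delete removes that timeseries entirely; it will no longer be
	// reported on subsequent scrapes.
	requests.DeleteLabelValues("api")
}
```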

Member:

The Delete in Prometheus deletes the "Metric" so that it is no longer reported, so it should be called when no more values will be recorded for that "Metric" (a "metric" in the description of the Delete function is what OpenMetrics calls a Timeseries).

To understand that, some internal details of the Prometheus implementation are required:

```
Metric {
    Map<Map<String, String> /* labels */, Value /* usually some atomic var */> timeseries
}
```

Delete will remove the entry from the timeseries map, so it will never be reported again. So deleting the handle should not be used as in the examples you gave.

jmacd (Contributor, Author):

It suggests that you can only delete handles after you stop using them and force a collection, somehow. This doesn't look well addressed in Prometheus, but the documentation for Delete is pretty clear about its purpose, and I believe it's the same purpose.

I'm asking for more than Prometheus offers. I intend for the implementation of the direct calling convention to be: (1) acquire handle, (2) operate, (3) delete handle.

Member:

I'm not sure we are on the same page about Prometheus's use of Delete, which is different from what you propose. I think the Prometheus usage has a lot of benefits. Also keep in mind that an implementation is free not to allocate anything in any map or anywhere, so you can still avoid holding memory if that is your goal.

jmacd (Contributor, Author):

It's not very clear how exactly Prometheus intends its Delete method to be used, but it does accomplish the same goal that I have, which is to free memory. I now understand that this discussion relates to Prometheus semantics, which require all memory to be pinned. The only way to release memory is to expunge it and reset the process start timestamp, i.e., force a reset.

It does appear that a Prometheus user can achieve the effect that I want if they: (1) issue final metric updates on a handle, (2) force a Collection, (3) Delete the timeseries to free resources. This allows the data to be collected and memory to be released, but it breaks the semantics of Prometheus, because now a metric that once existed has actually disappeared, which is unexpected.

Now consider a statsd exporter that pushes data out periodically--it benefits from the same aggregation as the prometheus client in this case, but it's able to free memory with correct semantics. After you send a gauge update, you're free to forget that gauge ever existed. This is a fundamental difference. If I have a process that's going to be up for days, I want it to free the bits of memory that it's using to store unused metrics. I want all the performance benefits of handles, but cannot afford for the memory not to be reclaimed.

I see this as a major drawback of the Prometheus model, which can be fixed by pushing data into a server that keeps state. This is where I expect OpenTelemetry clients to head -- pushing aggregated metrics to an opentelemetry service that can export Prometheus.

As you know, an implementation is expected to implement handles for efficiency. I've stated that my intention is to use handles as the underlying implementation for direct calls. I.e., Counter.Add() will be implemented as (1) get handle, (2) Add(), (3) release handle (i.e., delete the handle). I'm willing to stay away from the word "Delete" because it implies deletion of a timeseries. I'm looking to "release" the handle, in order to forget it and allow memory to be reclaimed in situations where Prometheus semantics do not require the memory to be kept. I don't consider an SDK that does "not allocate anything" to be an option--we expect handles to be fast; I just don't expect them to tie up memory for the lifetime of the process.

This proposal requires that we implement a method to release handles, and I'm flexible on naming. How do you like Forget() or Release()?
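A sketch of that implementation strategy in Go. Only the get/operate/release sequence comes from the comment above; the SDK-internal names (acquireHandle, release) are hypothetical:

```golang
// Illustrative only: the direct calling convention built on handles.
func (c *counter) Add(ctx context.Context, value int64, labels LabelSet) {
	h := c.acquireHandle(labels) // (1) get the handle
	h.add(ctx, value)            // (2) operate on it
	h.release()                  // (3) release it; the SDK reclaims
	// the memory once the update has been collected
}
```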

From specification/api-metrics-user.md:

Re-usable `LabelSet` objects provide a potential optimization for
scenarios where handles might not be effective. For example, if the
label set will be re-used but only used once per metric, handles do
not offer any optimization. It may be best to pre-compute a
Member:

If you can "pre-compute" the LabelSet then you can "pre-compute" the Handle, which is a bit more optimal, so I think this should say that if there is an opportunity to "pre-compute" the LabelSet, then the user should use a Handle.

jmacd (Contributor, Author):

You can have both. You can compute one label set and then use it to construct N handles.

(As a user I am still not interested in using handles. I do not wish to allocate one field per metric that I use regardless of the optimization win.)
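To make the point concrete, a Go sketch with hypothetical API names (meter.Labels, GetHandle): one LabelSet can construct N handles and still be re-used with the direct calling convention.

```golang
// Illustrative only: one pre-computed LabelSet, many uses.
labels := meter.Labels(key.String("service", "checkout"))

// Construct N handles from the same LabelSet...
requestsHandle := requestCounter.GetHandle(labels)
errorsHandle := errorCounter.GetHandle(labels)

// ...and re-use it with the direct calling convention as well.
latencyMeasure.Record(ctx, elapsedMs, labels)
```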

jmacd (Contributor, Author):

I've added a lot more language about the relative performance of handles and label sets. I also added a language-optional spec for ordered-LabelSet construction.

Member:

"It may be best to pre-compute a canonicalized LabelSet once and re-use it with the direct calling convention."

Does this refer to process level lifetime or request level lifetime?

jmacd (Contributor, Author):

I do not intend for LabelSets to have a lifetime--they need not tie up any resources inside the SDK. Only when you associate a LabelSet with an instrument (i.e., bind a handle) are resources consumed, and those should potentially be deleted when the handle lifetime ends.

Member:

Even then, I think it is up to the implementation: you can return a Bind/Handle that does not reference any internal memory, but the concept also allows us to reference a small amount of extra memory (which, to be honest, is far less than the LabelSet, because we are talking about an atomic number).

jmacd (Contributor, Author), Oct 18, 2019:

In the proposed Go SDK, I allocate these small amounts of extra memory the way you would expect. Handles are supported. These small amounts of memory have to be freed when they are no longer in use. Handles support a delete method so that, after they've been fully collected, the memory can be freed. How about Release() or Forget()?

From specification/api-metrics-user.md:

...`Add()`, `Set()`, and `Record()` method for submitting individual
metric events.

### Observer metric
Member:

Look at this example in Java for runtime metrics where I was able to use Handles and also probably win some performance:
https://github.com/open-telemetry/opentelemetry-java/blob/master/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/GarbageCollector.java#L59

I was able to pre-compute all the Handles.

Also, I like (OK, don't blame me for liking my own design there) the way I passed an object called "Result" or "Update" to the callback, which is used to construct the result.

jmacd (Contributor, Author):

Your example was a bit surprising. Why not let each Observer handle have its own callback? That seems more natural. Not saying it's bad, but letting handles use their own callback could allow parallelization.

Instead of Result.put, I would expect Result.observe, fwiw.

Member:

Parallelization can be implemented in the callback if really needed. Most of the cases where I had to use Observers were cases where multiple values (for more than one Handle) are obtained using one syscall (or some other mechanism). For example, here https://github.com/open-telemetry/opentelemetry-java/blob/master/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/MemoryPools.java#L94 you can see that I use the heapUsage result to update 3 handles; with one callback per Handle, I would need to get the heapUsage 3 times.

I also saw another interesting pattern, where a syscall (or any mechanism to get the metric) provides data that updates different Handles in different metrics (this is a bit rarer than the example I gave you).
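A Go sketch of that pattern under a hypothetical Observer API (RegisterObserver and ObserverResult are assumptions; the Observe name follows the suggestion earlier in this thread): one read feeds several label sets.

```golang
// Illustrative only: a single observer callback reads heap usage once
// and reports it under three different label sets.
meter.RegisterObserver(func(result metric.ObserverResult) {
	usage := readHeapUsage() // one "syscall" for all three values

	result.Observe(int64(usage.Used), usedLabels)
	result.Observe(int64(usage.Committed), committedLabels)
	result.Observe(int64(usage.Max), maxLabels)
})
```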

Member:

On put vs. observe, I have no problem changing that. Put is a Java-specific name, so I'm happy to use observe there.

jmacd (Contributor, Author):

I propose to return to OTEP 0008 on metric observers and pin down a few more details. I'm removing it from this draft.

bogdandrutu (Member) left a review comment:

Sorry, got trapped. I thought that the whole PR was just that small fix.

jmacd (Contributor, Author) commented on Oct 12, 2019:

I've run out of time to respond to these remarks today. Back at it next week.

From specification/api-metrics-user.md:

...metric events, but offer varying degrees of performance and
convenience.

#### Metric handle calling convention
yurishkuro (Member), Oct 12, 2019:

Consider the name "Bound Instruments", created via an Instrument.Bind(labels) method. A handle is essentially an instrument (a thing used to record measurements), so imo introducing a new concept of "handle" is more cognitive overhead than a "slightly different/optimized instrument".

jmacd (Contributor, Author):

@bogdandrutu Do you agree? I think it's a good idea.

Member:

If this is just a name change, I am usually very flexible with that; if it changes behavior, I would like to understand what changes.

jmacd (Contributor, Author):

Yes, it's just a name change. We would replace "Instrument handle" with "Bound instrument", "GetHandle" with "Bind", etc.

jmacd (Contributor, Author):

Shall I update the names in this PR?


From specification/api-metrics-user.md:

### Observer metric

Observer metrics do not support handles or direct calls. The
Member:

fwiw, I am very confused by the observers and what they are used for. Could use some examples / use cases.

Member:

[example link not captured in this view]

Member:

I checked that example; it didn't help. How is that different from having a timer thread that would bulk-publish metrics?

Member:

In general you can see these two as equivalent. There are some important differences:

1. With Observer, the implementation can better control which metrics to read. For example, some metrics can be disabled, and we don't have to read them at all (which is not possible with the timer, because the timer callback will get executed anyway).
2. With the Observer pattern, in a pull-based model like Prometheus, we execute the read of the metrics only when an external event asks for them (in the case of Prometheus, when the server makes a request to the client).
3. With Observer, some implementations can give users a uniform way to control the read interval for these metrics.
4. In some languages it is not that easy to start a timer (for example, in Java that would be around 50 (estimated) lines of code), so having this logic in one place is better. Also, a timer is not free, and the possibility of implementing all these callbacks with fewer timers may be something that some implementations consider.

These are some important advantages of having the implementation control the logic of reading these kinds of metrics.

Member:

The text describing Observer says nothing about these aspects: timers, interval control, what triggers the callback, etc. Can we remove it from this PR and do another one dedicated just to the Observer, with a more detailed description / specification?

Member:

I don't have a strong opinion about doing it in a different PR or in this one, but I do agree that the things I mentioned in the previous comment should be documented.

jmacd (Contributor, Author):

I will have to add more content; I'm happy to iterate in this PR, though there are many open questions that will delay it. The OTEP has details: https://github.com/open-telemetry/oteps/blob/master/text/0008-metric-observer.md. The argument is that it's most efficient when the exporter, or whoever is pulling metrics, determines the frequency at which metrics are captured. This is important in cases where the metrics are expensive to compute.

jmacd (Contributor, Author):

I propose to return to OTEP 0008 on metric observers and pin down a few more details. I'm removing it from this draft.

jmacd (Contributor, Author) left a review comment:

As there are a lot of open questions here, I will keep this PR open and push one update at the end of today. Meanwhile, please consider the questions raised by me and others here.

bogdandrutu (Member) commented:

Did you forget to push your change?

From specification/api-metrics-user.md:

| Unit | WithUnit(string) | Units specified according to the [UCUM](http://unitsofmeasure.org/ucum.html). |
| Recommended label keys | WithRecommendedKeys(list) | Recommended grouping keys for this instrument. |
| Monotonic | WithMonotonic(boolean) | Configure a counter or gauge that accepts only monotonic/non-monotonic updates. |
| Absolute | WithAbsolute(boolean) | Configure a measure that does or does not accept negative updates. |
jmacd (Contributor, Author), Oct 16, 2019:

I made this change considering your feedback on the previous PR. We've got two options here: WithMonotonic(boolean) and WithAbsolute(boolean). To construct a measure instrument for signed values, you would pass WithAbsolute(false) as the option. Sound good? I will update the other document accordingly in a parallel PR.
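A sketch of how those options might read in Go. The constructor names (NewFloat64Measure, NewFloat64Gauge) are illustrative assumptions; only WithAbsolute and WithMonotonic come from the table above:

```golang
// Illustrative constructor names only.
// A measure that accepts signed (negative and positive) updates:
delta := meter.NewFloat64Measure("queue.delta",
	metric.WithAbsolute(false),
)

// A gauge that accepts only monotonic updates:
uptime := meter.NewFloat64Gauge("process.uptime",
	metric.WithMonotonic(true),
)
```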


SergeyKanzhelev (Member) commented:

My understanding of this PR is that in principle it looks good. And now there are many questions about naming and behavior of specific methods.

@jmacd @bogdandrutu @yurishkuro @rghetia @lzchen: do you believe those issues may be filed as separate items and marked for v0.3 while this PR is merged? Or should the entire PR be postponed to the next milestone?

rghetia (Contributor) commented on Oct 16, 2019:

> My understanding of this PR is that in principle it looks good. And now there are many questions about naming and behavior of specific methods.
>
> @jmacd @bogdandrutu @yurishkuro @rghetia @lzchen: do you believe those issues may be filed as separate items and marked for v0.3 while this PR is merged? Or should the entire PR be postponed to the next milestone?

I prefer merging this PR and sorting out the remaining issues in v0.3, rather than moving it entirely to v0.3.

SergeyKanzhelev added this to the Alpha v0.2 milestone on Oct 16, 2019.
jmacd (Contributor, Author) commented on Oct 16, 2019:

There appears to be a disagreement about the handle Delete method, and I can't see a way around it. I recognize it means we have to implement garbage collection or add locking to achieve the semantics, which is why I'm also working on a (hopefully) convincing lock-free algorithm for maintaining state in the SDK (open-telemetry/opentelemetry-go#172).

I believe we could quickly modify "Instrument handle" to "Bound instrument" as a follow-on; I'd rather not continue modifying this PR. I also intend to supply a document detailing the Meter APIs, but it's a very mechanical document and I do not expect it to raise any disagreements.

SergeyKanzhelev (Member) commented:

@jmacd, can we ship without Delete for now? It will cripple the scenarios supported, but we can introduce it later.

jmacd (Contributor, Author) commented on Oct 16, 2019:

I think we are seeing that it is not trivial to add Delete to a system that was designed without it, which worries me. My position is that, in order to implement the direct calling convention, most reasonable implementations will (1) get a handle, (2) operate, (3) delete the handle. Deleting a handle means removing an external reference to some internal state. The internal state should be kept until it is collected, and then it can be garbage-collected if there are no active handles.

I would like to hear a convincing plan that implements the direct calling convention without using handles underneath before we remove Delete.

jmacd (Contributor, Author) commented on Oct 17, 2019:

@bogdandrutu @SergeyKanzhelev I am willing to take Delete out for now. I also want to keep something, so I wonder if we could call this Forget() instead, and specify that an implementation is not required to perform garbage collection, particularly when the data semantics require memory to be kept, as they do to compute a sum from a series of Add() calls. In the monotonic cases, the semantics call for Forget() to just collect the handle, not delete anything from the underlying memory.

SergeyKanzhelev (Member) commented:

@jmacd, I believe we will need a variant of Delete, so I only propose to limit the API for now to give us time to discuss how to put it back. I am not suggesting eliminating it altogether for 1.0.

With this in mind, it's easier if we simply remove the paragraph about it for now.

jmacd (Contributor, Author) commented on Oct 17, 2019:

Done @SergeyKanzhelev

SergeyKanzhelev (Member) commented:

@bogdandrutu @yurishkuro @rghetia @lzchen: please file issues for every outstanding conversation. Merging the rest.

bogdandrutu (Member) left a review comment:

In order to make progress, let's submit the PR as it is and fix outstanding comments in a future PR.

SergeyKanzhelev (Member) commented:

Merging, with the outstanding comments to be addressed.

SergeyKanzhelev merged commit 9385578 into open-telemetry:master on Oct 17, 2019.
jmacd deleted the jmacd/metrics-spec4 branch on October 17, 2019.
SergeyKanzhelev pushed a commit to SergeyKanzhelev/opentelemetry-specification that referenced this pull request on Feb 18, 2020:
* Edits, expansion

* Minor fix for AloisReitbauer's feedback on PR250

* Describe metrics event context; make use of global Meter recommended for allocating static metrics

* Feedback part 1

* Feedback part 2

* Feedback part 3

* Feedback part 4

* Revert api-metrics.md fix

* Comment on convenience overload to bypass LabelSet

* Change signed/non-negative to absolute

* Remove observer instrument from this spec for v0.2

* Remove Delete for 0.2
TuckTuckFloof pushed a commit to TuckTuckFloof/opentelemetry-specification that referenced this pull request on Oct 15, 2020.