Specify the user-facing API for metric instruments #299

Merged: 13 commits into open-telemetry:master, Oct 17, 2019

Conversation

jmacd (Contributor) commented on Oct 10, 2019:

This is rewritten content that was removed from PR #250. There is only one substantial change here, but it is substantial. This specification now says that instruments are allocated by the Meter, meaning they are automatically registered and qualified by the named Meter's component namespace. This resolves questions about what kind of registry support is needed and how to export metrics that have not been updated.

See the new caveats about instrument storage. These changes will encourage users not to store instruments in static scope; otherwise, they will be forced to use the global Meter to allocate metric instruments.
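To make the change concrete, here is a minimal Go sketch of Meter-allocated instruments. The names (`global.MeterProvider`, `NewInt64Counter`, `key.String`) are assumptions for illustration only, not the API this PR specifies:

```golang
// Hypothetical names, for illustration only: the named Meter
// automatically registers the instrument and qualifies it with
// the component's namespace.
meter := global.MeterProvider().Meter("example.com/myserver")
requests := meter.NewInt64Counter("requests")
requests.Add(ctx, 1, meter.Labels(key.String("handler", "api")))
```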

jmacd (Contributor, Author) commented on Oct 10, 2019:

@lzchen

bogdandrutu (Member) left a review comment:

First half of the document, will review the rest tonight after some meetings.

From specification/api-metrics-user.md:

...required to delete handles when they are no longer in use.

```golang
func (s *server) processStream(ctx context.Context) {
	// (snippet truncated in the review view)
```
Member:

I think the Handle should mostly have the same lifetime as the Metric. Deleting a Handle should happen only when no other values will be recorded for that instrument/labelset.

I think Handle.Delete is confusing and should probably be removed for the moment. The intention of deleting is different from the one in this example.

jmacd (Contributor, Author), Oct 14, 2019:

If I can't delete a handle, it's only useful to me for objects that live indefinitely in my process. This seems like a significant drawback. Also, if handles can't be deleted, they can't be used to implement the direct-access calling convention. In my Go SDK I've implemented support for handle deletion; it is tricky, but it is the only way to make handles useful for definite-lifetime objects.

I would assume that a handle can be deleted and that the corresponding resources will be reclaimed once those updates are collected.
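A minimal Go sketch of the definite-lifetime pattern being argued for here. GetHandle and Delete are the names under discussion in this PR; the label construction and server fields are illustrative assumptions:

```golang
// Illustrative only: a handle bound for the lifetime of one stream,
// then deleted so the SDK can reclaim its resources after collection.
func (s *server) processStream(ctx context.Context) {
	labels := s.meter.Labels(key.String("stream_id", s.streamID))
	handle := s.bytesCounter.GetHandle(labels)
	defer handle.Delete()

	for msg := range s.incoming {
		handle.Add(ctx, int64(len(msg)))
	}
}
```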

Member:

Not sure I understand. I think Handles semantically mean a "binding" between an Instrument (which has a process lifetime) and a LabelSet (which, if I understand correctly, has the same lifetime), so a Handle does not extend any lifetime and does not have any lifetime concern.

If LabelSet does not have a process lifetime, then we need a delete there as well. Personally, I see no reason to have delete (I inherited that from Prometheus, I think, but have never seen a real use-case for it). So I am happy to just remove the Delete.

jmacd (Contributor, Author):

In my thinking, LabelSet objects do not have a lifetime managed by the SDK. They are values that the caller is allowed to refer to; they will be garbage-collected when no longer used, or released when the object is finalized. Think of a LabelSet as a value that can be used for a fast lookup, not as something registered by the SDK with a managed lifetime.

On the other hand, there is a presumption that handles take resources and should support delete. Prometheus clients do support delete as well, for example: https://godoc.org/github.com/prometheus/client_golang/prometheus#CounterVec.Delete
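For reference, a small self-contained example of the Prometheus client API cited above; the API calls are real, while the metric and label names are made up:

```golang
package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	requests := prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "requests_total", Help: "Requests served."},
		[]string{"handler"},
	)
	prometheus.MustRegister(requests)

	// Record some values for one label combination.
	requests.WithLabelValues("api").Inc()

	// Delete removes that timeseries entirely; it will no longer be
	// reported on subsequent scrapes.
	requests.DeleteLabelValues("api")
}
```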

Member:

The Delete in Prometheus deletes the "Metric" so that it is no longer reported, so it should be called when no more values will be recorded for that "Metric" (a "metric" in the description of the Delete function is what OpenMetrics calls a Timeseries).

To understand that, some internal details of the Prometheus implementation are required:

```
Metric {
    Map<Map<String, String> /* labels */, Value /* usually some atomic var */> timeseries
}
```

Delete will remove the entry from the timeseries map, so it will never be reported again. So deleting the handle should not be used as in the examples you gave.

jmacd (Contributor, Author):

It suggests that you can only delete handles after you stop using them and force a collection, somehow. This doesn't look well addressed in Prometheus, but the documentation for Delete is pretty clear about its purpose, and I believe it's the same purpose.

I'm asking for more than Prometheus offers. I intend for the implementation of the direct calling convention to be: (1) acquire handle, (2) operate, (3) delete handle.

Member:

I'm not sure we are on the same page about Prometheus's use of Delete, which is different from what you propose. I think the Prometheus usage has a lot of benefits. Also keep in mind that an implementation is free not to allocate anything in any map or anywhere, so you can still avoid holding memory if that is your goal.

jmacd (Contributor, Author):

It's not very clear how exactly Prometheus intends its Delete method to be used, but it does accomplish the same goal that I have, which is to free memory. I now understand that this discussion relates to Prometheus semantics, which require all memory to be pinned. The only way to release memory is to expunge it and reset the process start timestamp, i.e., force a reset.

It does appear that a Prometheus user can achieve the effect that I want if they: (1) issue final metric updates on a handle, (2) force a Collection, (3) Delete the timeseries to free resources. This allows the data to be collected and memory to be released, but it breaks the semantics of Prometheus, because now a metric that once existed has actually disappeared, which is unexpected.

Now consider a statsd exporter that pushes data out periodically--it benefits from the same aggregation as the prometheus client in this case, but it's able to free memory with correct semantics. After you send a gauge update, you're free to forget that gauge ever existed. This is a fundamental difference. If I have a process that's going to be up for days, I want it to free the bits of memory that it's using to store unused metrics. I want all the performance benefits of handles, but cannot afford for the memory not to be reclaimed.

I see this as a major drawback of the Prometheus model, which can be fixed by pushing data into a server that keeps state. This is where I expect OpenTelemetry clients to head -- pushing aggregated metrics to an opentelemetry service that can export Prometheus.

As you know, an implementation is expected to implement handles for efficiency. I've stated that my intention is to use handles as the underlying implementation for direct calls. I.e., Counter.Add() will be implemented as (1) get handle, (2) Add(), (3) release handle (i.e., delete the handle). I'm willing to stay away from the word "Delete" because it implies deletion of a timeseries. I'm looking to "release" the handle, in order to forget it and allow memory to be reclaimed in situations where Prometheus semantics do not require the memory to be kept. I don't consider an SDK that does "not allocate anything" to be an option--we expect handles to be fast; I just don't expect them to tie up memory for the lifetime of the process.

This proposal requires that we implement a method to release handles, and I'm flexible on naming. How do you like Forget() or Release()?
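A sketch of that implementation strategy in Go. Only the get/operate/release sequence comes from the comment above; the SDK-internal names (acquireHandle, release) are hypothetical:

```golang
// Illustrative only: the direct calling convention built on handles.
func (c *counter) Add(ctx context.Context, value int64, labels LabelSet) {
	h := c.acquireHandle(labels) // (1) get the handle
	h.add(ctx, value)            // (2) operate on it
	h.release()                  // (3) release it; the SDK reclaims
	// the memory once the update has been collected
}
```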

From specification/api-metrics-user.md:

Re-usable `LabelSet` objects provide a potential optimization for
scenarios where handles might not be effective. For example, if the
label set will be re-used but only used once per metric, handles do
not offer any optimization. It may be best to pre-compute a
Member:

If you can "pre-compute" the LabelSet then you can "pre-compute" the Handle, which is a bit more optimal, so I think this should say that if there is an opportunity to "pre-compute" the LabelSet, then the user should use a Handle.

jmacd (Contributor, Author):

You can have both. You can compute one label set and then use it to construct N handles.

(As a user I am still not interested in using handles. I do not wish to allocate one field per metric that I use regardless of the optimization win.)
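To make the point concrete, a Go sketch with hypothetical API names (meter.Labels, GetHandle): one LabelSet can construct N handles and still be re-used with the direct calling convention.

```golang
// Illustrative only: one pre-computed LabelSet, many uses.
labels := meter.Labels(key.String("service", "checkout"))

// Construct N handles from the same LabelSet...
requestsHandle := requestCounter.GetHandle(labels)
errorsHandle := errorCounter.GetHandle(labels)

// ...and re-use it with the direct calling convention as well.
latencyMeasure.Record(ctx, elapsedMs, labels)
```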

jmacd (Contributor, Author):

I've added a lot more language about the relative performance of handles and label sets. I also added a language-optional spec for ordered-LabelSet construction.

Member:

"It may be best to pre-compute a canonicalized LabelSet once and re-use it with the direct calling convention."

Does this refer to process level lifetime or request level lifetime?

jmacd (Contributor, Author):

I do not intend for LabelSets to have a lifetime--they need not tie up any resources inside the SDK. Only when you associate a LabelSet with an instrument (i.e., bind a handle) are resources consumed, and those should potentially be deleted when the handle lifetime ends.

Member:

Even then, I think it is up to the implementation: you can return a Bind/Handle that does not reference any internal memory, but the concept also allows us to reference a small amount of extra memory (which, to be honest, is far less than the LabelSet, because we are talking about an atomic number).

jmacd (Contributor, Author), Oct 18, 2019:

In the proposed Go SDK, I allocate these small amounts of extra memory the way you would expect. Handles are supported. These small amounts of memory have to be freed when they are no longer in use. Handles support a delete method so that, after they've been fully collected, the memory can be freed. How about Release() or Forget()?

From specification/api-metrics-user.md:

...`Add()`, `Set()`, and `Record()` method for submitting individual
metric events.

### Observer metric
Member:

Look at this example in Java for runtime metrics where I was able to use Handles and also probably win some performance:
https://github.com/open-telemetry/opentelemetry-java/blob/master/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/GarbageCollector.java#L59

I was able to pre-compute all the Handles.

Also, I like (OK, don't blame me for liking my own design there) the way I passed an object called "Result" or "Update" to the callback, which is used to construct the result.

jmacd (Contributor, Author):

Your example was a bit surprising. Why not let each Observer handle have its own callback? That seems more natural. Not saying it's bad, but letting handles use their own callback could allow parallelization.

Instead of Result.put, I would expect Result.observe, fwiw.

Member:

Parallelization can be implemented in the callback if really needed. Most of the cases where I had to use Observers were cases where multiple values (for more than one Handle) are obtained using one syscall (or some other mechanism). For example, here https://github.com/open-telemetry/opentelemetry-java/blob/master/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/MemoryPools.java#L94 you can see that I use the heapUsage result to update 3 handles; with one callback per Handle, I would need to get the heapUsage 3 times.

I also saw another interesting pattern, where a syscall (or any mechanism to get the metric) provides data that updates different Handles in different metrics (this is a bit rarer than the example I gave you).
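A Go sketch of that pattern under a hypothetical Observer API (RegisterObserver and ObserverResult are assumptions; the Observe name follows the suggestion earlier in this thread): one read feeds several label sets.

```golang
// Illustrative only: a single observer callback reads heap usage once
// and reports it under three different label sets.
meter.RegisterObserver(func(result metric.ObserverResult) {
	usage := readHeapUsage() // one "syscall" for all three values

	result.Observe(int64(usage.Used), usedLabels)
	result.Observe(int64(usage.Committed), committedLabels)
	result.Observe(int64(usage.Max), maxLabels)
})
```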

Member:

On put vs. observe, I have no problem changing that. Put is a Java-specific name, so I'm happy to use observe there.

jmacd (Contributor, Author):

I propose to return to OTEP 0008 on metric observers and pin down a few more details. I'm removing it from this draft.

bogdandrutu (Member) left a review comment:

Sorry, got trapped. I thought that the whole PR was just that small fix.

jmacd (Contributor, Author) commented on Oct 12, 2019:

I've run out of time to respond to these remarks today. Back at it next week.

From specification/api-metrics-user.md:

...metric events, but offer varying degrees of performance and
convenience.

#### Metric handle calling convention
yurishkuro (Member), Oct 12, 2019:

Consider the name "Bound Instruments", created via an Instrument.Bind(labels) method. A handle is essentially an instrument (a thing used to record measurements), so imo introducing a new concept of "handle" is more cognitive overhead than a "slightly different/optimized instrument".

jmacd (Contributor, Author):

@bogdandrutu Do you agree? I think it's a good idea.

Member:

If this is just a name change, I am usually very flexible with that; if it changes behavior, I would like to understand what changes.

jmacd (Contributor, Author):

Yes, it's just a name change. We would replace "Instrument handle" with "Bound instrument", "GetHandle" with "Bind", etc.

jmacd (Contributor, Author):

Shall I update the names in this PR?


From specification/api-metrics-user.md:

### Observer metric

Observer metrics do not support handles or direct calls. The
Member:

fwiw, I am very confused by the observers and what they are used for. Could use some examples / use cases.

Member:

[example link not captured in this view]

Member:

I checked that example; it didn't help. How is that different from having a timer thread that would bulk-publish metrics?

Member:

In general you can see these two as equivalent. There are some important differences:

1. With Observer, the implementation can better control which metrics to read. For example, some metrics can be disabled, and we don't have to read them at all (which is not possible with the timer, because the timer callback will get executed anyway).
2. With the Observer pattern, in a pull-based model like Prometheus, we execute the read of the metrics only when an external event asks for them (in the case of Prometheus, when the server makes a request to the client).
3. With Observer, some implementations can give users a uniform way to control the read interval for these metrics.
4. In some languages it is not that easy to start a timer (for example, in Java that would be around 50 (estimated) lines of code), so having this logic in one place is better. Also, a timer is not free, and the possibility of implementing all these callbacks with fewer timers may be something that some implementations consider.

These are some important advantages of having the implementation control the logic of reading these kinds of metrics.

Member:

The text describing Observer says nothing about these aspects: timers, interval control, what triggers the callback, etc. Can we remove it from this PR and do another one dedicated just to the Observer, with a more detailed description / specification?

Member:

I don't have a strong opinion about doing it in a different PR or in this one, but I do agree that the things I mentioned in the previous comment should be documented.

jmacd (Contributor, Author):

I will have to add more content; I'm happy to iterate in this PR, though there are many open questions that will delay it. The OTEP has details: https://github.com/open-telemetry/oteps/blob/master/text/0008-metric-observer.md. The argument is that it's most efficient when the exporter, or whoever is pulling metrics, determines the frequency at which metrics are captured. This is important in cases where the metrics are expensive to compute.

jmacd (Contributor, Author):

I propose to return to OTEP 0008 on metric observers and pin down a few more details. I'm removing it from this draft.

jmacd (Contributor, Author) left a review comment:

As there are a lot of open questions here, I will keep this PR open and push one update at the end of today. Meanwhile, please consider the questions raised by me and others here.

bogdandrutu (Member) commented:

Did you forget to push your change?

From specification/api-metrics-user.md:

| Unit | WithUnit(string) | Units specified according to the [UCUM](http://unitsofmeasure.org/ucum.html). |
| Recommended label keys | WithRecommendedKeys(list) | Recommended grouping keys for this instrument. |
| Monotonic | WithMonotonic(boolean) | Configure a counter or gauge that accepts only monotonic/non-monotonic updates. |
| Absolute | WithAbsolute(boolean) | Configure a measure that does or does not accept negative updates. |
jmacd (Contributor, Author), Oct 16, 2019:

I made this change considering your feedback on the previous PR. We've got two options here: WithMonotonic(boolean) and WithAbsolute(boolean). To construct a measure instrument for signed values, you would pass WithAbsolute(false) as the option. Sound good? I will update the other document accordingly in a parallel PR.
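A sketch of how those options might read in Go. The constructor names (NewFloat64Measure, NewFloat64Gauge) are illustrative assumptions; only WithAbsolute and WithMonotonic come from the table above:

```golang
// Illustrative constructor names only.
// A measure that accepts signed (negative and positive) updates:
delta := meter.NewFloat64Measure("queue.delta",
	metric.WithAbsolute(false),
)

// A gauge that accepts only monotonic updates:
uptime := meter.NewFloat64Gauge("process.uptime",
	metric.WithMonotonic(true),
)
```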


SergeyKanzhelev (Member) commented:

My understanding of this PR is that in principle it looks good. And now there are many questions about naming and behavior of specific methods.

@jmacd @bogdandrutu @yurishkuro @rghetia @lzchen: do you believe those issues may be filed as separate items and marked for v0.3 while this PR is merged? Or should the entire PR be postponed to the next milestone?

rghetia (Contributor) commented on Oct 16, 2019:

> My understanding of this PR is that in principle it looks good. And now there are many questions about naming and behavior of specific methods.
>
> @jmacd @bogdandrutu @yurishkuro @rghetia @lzchen: do you believe those issues may be filed as separate items and marked for v0.3 while this PR is merged? Or should the entire PR be postponed to the next milestone?

I prefer merging this PR and sorting out the remaining issues in v0.3, rather than moving it entirely to v0.3.

SergeyKanzhelev added this to the Alpha v0.2 milestone on Oct 16, 2019.
jmacd (Contributor, Author) commented on Oct 16, 2019:

There appears to be a disagreement about the handle Delete method, and I can't see a way around it. I recognize it means we have to implement garbage collection or add locking to achieve the semantics, which is why I'm also working on a (hopefully) convincing lock-free algorithm for maintaining state in the SDK (open-telemetry/opentelemetry-go#172).

I believe we could quickly modify "Instrument handle" to "Bound instrument" as a follow-on; I'd rather not continue modifying this PR. I also intend to supply a document detailing the Meter APIs, but it's a very mechanical document and I do not expect it to raise any disagreements.

SergeyKanzhelev (Member) commented:

@jmacd, can we ship without Delete for now? It will cripple the scenarios supported, but we can introduce it later.

jmacd (Contributor, Author) commented on Oct 16, 2019:

I think we are seeing that it is not trivial to add Delete to a system that was designed without it, which worries me. My position is that, in order to implement the direct calling convention, most reasonable implementations will (1) get a handle, (2) operate, (3) delete the handle. Deleting a handle means removing an external reference to some internal state. The internal state should be kept until it is collected, and then it can be garbage-collected if there are no active handles.

I would like to hear a convincing plan that implements the direct calling convention without using handles underneath before we remove Delete.

jmacd (Contributor, Author) commented on Oct 17, 2019:

@bogdandrutu @SergeyKanzhelev I am willing to take Delete out for now. I also want to keep something, so I wonder if we could call this Forget() instead, and specify that an implementation is not required to perform garbage collection, particularly when the data semantics require memory to be kept, as they do to compute a sum from a series of Add() calls. In the monotonic cases, the semantics call for Forget() to just collect the handle, not delete anything from the underlying memory.

SergeyKanzhelev (Member) commented:

@jmacd, I believe we will need a variant of Delete, so I only propose to limit the API for now to give us time to discuss how to put it back. I am not suggesting eliminating it altogether for 1.0.

With this in mind, it's easier if we simply remove the paragraph about it for now.

jmacd (Contributor, Author) commented on Oct 17, 2019:

Done @SergeyKanzhelev

SergeyKanzhelev (Member) commented:

@bogdandrutu @yurishkuro @rghetia @lzchen: please file issues for every outstanding conversation. Merging the rest.

bogdandrutu (Member) left a review comment:

In order to make progress, let's submit the PR as it is and fix outstanding comments in a future PR.

SergeyKanzhelev (Member) commented:

Merging, with the outstanding comments to be addressed.

SergeyKanzhelev merged commit 9385578 into open-telemetry:master on Oct 17, 2019.
jmacd deleted the jmacd/metrics-spec4 branch on October 17, 2019.
SergeyKanzhelev pushed a commit to SergeyKanzhelev/opentelemetry-specification that referenced this pull request on Feb 18, 2020:
* Edits, expansion

* Minor fix for AloisReitbauer's feedback on PR250

* Describe metrics event context; make use of global Meter recommended for allocating static metrics

* Feedback part 1

* Feedback part 2

* Feedback part 3

* Feedback part 4

* Revert api-metrics.md fix

* Comment on convenience overload to bypass LabelSet

* Change signed/non-negative to absolute

* Remove observer instrument from this spec for v0.2

* Remove Delete for 0.2
TuckTuckFloof pushed a commit to TuckTuckFloof/opentelemetry-specification that referenced this pull request on Oct 15, 2020.