Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a oneof for Histogram buckets; add ExplicitBuckets; deprecate DoubleHistogram #272

Closed
wants to merge 12 commits into from

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Mar 4, 2021

This deprecates the existing DoubleHistogram and creates a new one with room for more histogram types to be added in the future.

The oneof proposed here was first described in the issue #259 (comment) and has support from contributors working on #226, e.g., #226 (comment).

Fixes #259.
Alternate to #255.

@bogdandrutu
Copy link
Member

@jmacd and some folks had an offline chat, and I want us to make a decision between multiple bucket types vs multiple histogram types:

  • Pros for this: Simplify the number of types we will have in the metrics.
  • Cons for this: Extra allocations (one extra allocation in every point); Have to deal with a corner case where points have different bucket types.

opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved
opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved
opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved
opentelemetry/proto/metrics/v1/metrics.proto Outdated Show resolved Hide resolved
@jmacd
Copy link
Contributor Author

jmacd commented Mar 5, 2021

Extra allocations (one extra allocation in every point)

These allocations will be real costs in the collector, but they're not a big deal in SDKs where the export frequency is relatively low.

A fancy protocol buffer implementation would help, for the collector case.

Have to deal with a corner case where points have different bucket types

Note that in #264 (comment) is a proposal to place parallel repeated fields into the Gauge, Sum, and Histogram points to reflect number-type variation. In #259, you propose to add per-bucket-style Histogram value types to the Metric oneof, where such variation currently resides. If we follow @victlu's proposal for histograms, to be consistent, we should place the parallel repeated fields inside a single Histogram. Note @victlu as you may be prototyping, we no longer desire to support number-type variation for histograms, but we do intend to support bucket-style variation. @bogdandrutu do you think it would reduce complexity if the (single) Histogram point had parallel fields, as follows?

message Histogram {
  AggregationTemporality aggregation_temporality = 2;

  repeated ExplicitHistogramDataPoint explicit_data_points = 1;
  repeated ExpontentialHistogramDataPoint exponential_data_points = 3;
}

In both cases, with number-type variation and with bucket-style variation, we do not see a semantic conflict but there is loss of information if we convert between them. By supporting parallel fields we reduce the number of overall oneofs, and we prevent users from having to observe parallel histograms or sums/gauges when there is only bucket-style or number-type variation. We avoid an allocation (however the points are correspondingly larger by 24 bytes in the collector case.).

@hdost
Copy link
Contributor

hdost commented Mar 6, 2021

Is the ExpontentialHistogramDataPoint the concept of the "HDR" histogram. As i believe the explicit bounds could technically hold a set of Exponential buckets as well.
Perhaps instead DynamicHistogramDataPoint, SparseHistogramDataPoint or something to that effect.

Co-authored-by: Aaron Abbott <aaronabbott@google.com>
@yzhuge
Copy link
Contributor

yzhuge commented Mar 8, 2021

@hdost Exponential buckets will handle high dynamic range. It is different from http://hdrhistogram.org/ in that http://hdrhistogram.org/ use "log linear" (linear subbuckets in log buckets), while Exponential will use log buckets only, for wider compatibility. We might add support for log linear in the future.

@yzhuge
Copy link
Contributor

yzhuge commented Mar 8, 2021

See open-telemetry/oteps#149 (Add exponential bucketing to histogram protobuf)
The new format is cleaner and simpler than the one proposed in #226, because the count array is no longer shared across bucket formats

@jmacd
Copy link
Contributor Author

jmacd commented Mar 15, 2021

@open-telemetry/specs-metrics-approvers note this PR fully incorporates #270.

@jmacd
Copy link
Contributor Author

jmacd commented Mar 19, 2021

We have decided to let each distinct form of histogram have its own data point kind.

@jmacd jmacd closed this Mar 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Histogram support multiple boundaries types vs different types of histogram aggregation
8 participants