Base-2 exponential histogram protocol support #322

Merged: 16 commits, Sep 17, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ Full list of differences found in [this compare.](https://github.com/open-teleme

### Added

* ExponentialHistogram is a base-2 exponential histogram described in [OTEP 149](https://github.com/open-telemetry/oteps/pull/149).
* Metrics data points add a `flags` field with one bit to represent explicitly missing data. (#316)

### Removed
114 changes: 114 additions & 0 deletions opentelemetry/proto/metrics/v1/metrics.proto
@@ -176,6 +176,7 @@ message Metric {
// This field will be removed in ~3 months, on July 1, 2021.
IntHistogram int_histogram = 8 [deprecated = true];
Histogram histogram = 9;
ExponentialHistogram exponential_histogram = 12;
Summary summary = 11;
}
}
@@ -216,6 +217,16 @@ message Histogram {
AggregationTemporality aggregation_temporality = 2;
}

// ExponentialHistogram represents the type of a metric that is calculated by aggregating
// as an ExponentialHistogram of all reported double measurements over a time interval.
message ExponentialHistogram {
repeated ExponentialHistogramDataPoint data_points = 1;

// aggregation_temporality describes if the aggregator reports delta changes
// since last report time, or cumulative changes since a fixed start time.
AggregationTemporality aggregation_temporality = 2;
}
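
For illustration (not part of this PR): the temporality field determines whether successive points carry per-interval counts or running totals since a fixed start time. A minimal sketch in Go, using plain values rather than the generated protobuf types:

```go
package main

import "fmt"

func main() {
	// Hypothetical counts observed in two consecutive collection intervals.
	interval1, interval2 := uint64(5), uint64(3)

	// DELTA temporality: each point reports only its own interval.
	fmt.Println("delta points:", interval1, interval2) // 5, then 3

	// CUMULATIVE temporality: each point restates the running total since
	// the fixed start time (start_time_unix_nano).
	fmt.Println("cumulative points:", interval1, interval1+interval2) // 5, then 8
}
```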

// Summary metric data are used to convey quantile summaries,
// a Prometheus (see: https://prometheus.io/docs/concepts/metric_types/#summary)
// and OpenMetrics (see: https://github.com/OpenObservability/OpenMetrics/blob/4dbf6075567ab43296eed941037c12951faafb92/protos/prometheus.proto#L45)
@@ -448,6 +459,109 @@ message HistogramDataPoint {
uint32 flags = 10;
}

// ExponentialHistogramDataPoint is a single data point in a timeseries that describes the
// time-varying values of an ExponentialHistogram of double values. An ExponentialHistogram contains
// summary statistics for a population of values; it may optionally contain the
// distribution of those values across a set of buckets.
//
message ExponentialHistogramDataPoint {
// The set of key/value pairs that uniquely identify the timeseries from
// where this point belongs. The list may be empty (may contain 0 elements).
repeated opentelemetry.proto.common.v1.KeyValue attributes = 1;

// StartTimeUnixNano is optional but strongly encouraged, see the
// detailed comments above Metric.
//
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January
// 1970.
fixed64 start_time_unix_nano = 2;

// TimeUnixNano is required, see the detailed comments above Metric.
//
// Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January
// 1970.
fixed64 time_unix_nano = 3;

// count is the number of values in the population. Must be non-negative. This
// value must be equal to the sum of the "count" fields in buckets if a
// histogram is provided.
fixed64 count = 4;

// sum of the values in the population. If count is zero then this field
// must be zero. This value must be equal to the sum of the "sum" fields in
// buckets if a histogram is provided.
//
// Note: Sum should only be filled out when measuring non-negative discrete
// events, and is assumed to be monotonic over the values of these events.
// Negative events *can* be recorded, but sum should not be filled out when
// doing so. This is specifically to enforce compatibility w/ OpenMetrics,
// see: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#histogram
double sum = 5;

// scale describes the resolution of the histogram. Boundaries are
// located at powers of the base, where:
//
// base = (2^(2^-scale))
//
// The histogram bucket identified by `index`, a signed integer,
// contains values that are less than or equal to (base^index) and
Contributor:
This defines the value at base^index as the upper bound of a bucket. I think lower bound is more intuitive, and computationally more convenient. "Round down" is generally more convenient in computing. For example, at positive scales, we can compute index by right shifting the exponent field in double representation.

Lower bound is more intuitive for humans. For example, bucket 0 will be [1, base) in lower bound, rather than (1/base, 1] in upper bound. In lower bound, we can also say 0 and above buckets map to 1 and above values; and negative buckets map to values less than 1. The mathematical neutral point of index space (0) and log space (1) coincide perfectly.

I know traditionally, Prometheus uses "le" upper bound buckets. But do we have to keep such compatibility? And from the current OM protocol (https://github.com/prometheus/client_model/blob/61b6c1aac0648ff14fae91ab976b8f2b30c5c5d3/io/prometheus/client/metrics.proto#L63-L73), it is not obvious whether it still sticks to the upper bound semantic. Maybe @beorn7 can clarify?
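
For illustration (not part of the thread): a sketch of the shift-based indexing mentioned above. Under this PR's sign convention, base = 2^(2^-scale), so the pure exponent shift applies when scale <= 0 and lower-bound (round-down) semantics are used; this sketch assumes normal, positive, finite inputs and ignores subnormals.

```go
package main

import (
	"fmt"
	"math"
)

// indexByShift maps a normal positive float64 to its bucket index by
// right-shifting the unbiased IEEE 754 exponent. Assumes lower-bound
// buckets [base^index, base^(index+1)) and scale <= 0, where bucket
// boundaries align with binary exponent boundaries.
func indexByShift(value float64, scale int32) int64 {
	bits := math.Float64bits(value)
	exponent := int64((bits>>52)&0x7ff) - 1023 // unbiased binary exponent
	return exponent >> uint(-scale)
}

func main() {
	// scale = -1, so base = 2^(2^1) = 4: buckets ..., [0.25,1), [1,4), [4,16), ...
	fmt.Println(indexByShift(5.0, -1)) // exponent 2 >> 1 = 1, i.e. [4,16)
	fmt.Println(indexByShift(0.5, -1)) // exponent -1 >> 1 = -1, i.e. [0.25,1)
}
```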

Reply:
I don't follow the argument of "more intuitive for humans". In fact, I see it the other way around.

The computational argument is minor and shouldn't tip the balance. Positive scales will be the exception, and even then, the actual difference in computational or algorithmic complexity doesn't really make a dent.

More generally (and along the lines I have explained before), this is at a level of detail that I don't feel qualified to discuss while we are still experimenting to get the basics right. Little knobs like this can easily be tweaked later in the process, based on the insights gained.

Having said that, if I had to take a guess, I would say it's likely Prometheus will stick to the "less or equal" upper bounds because it is consistent with Prometheus's existing histograms, so that creating compatibility mappings is easier. There would need to be stronger arguments than "I save a single `if frac == 0.5 { sparseKey-- }` line in the code in the rare case of very low resolution histograms" to tip the balance.

Finally, to avoid misunderstandings: Everything you see in a "sparsehistogram" branch in a Prometheus repo is part of our experiment. There is no direct link to OpenMetrics, i.e. this is not part of some kind of "OpenMetrics 2.0 draft". First these experiments have to be done and insights have to be gained. Only then can we think about the actual implementation in a future Prometheus release, which will in turn inform how OpenMetrics will embrace sparse high-resolution histograms.

Contributor Author:

@yzhuge I was on the fence about this issue -- whether the upper or lower bound should be inclusive -- until the comment from @oertl here: https://github.com/open-telemetry/opentelemetry-proto/pull/322/files/e680b5938be129ac8410f7ca66f0a50bcc4c4b52#r681484304

You have both recommended against the zero-tolerance field described here, and the remark by @oertl shows how it can be added later in a way that is backwards compatible with exclusive lower-bound buckets. I recommend that we avoid unnecessary conflict with the Prometheus group on this and keep the "less-than-or-equal" upper bound.

Contributor:

First: "inclusiveness/exclusiveness is irrelevant in practice". See previous discussion at open-telemetry/oteps#149 (review) (click on "outdated" to show the comment "CharlesMasson reviewed on Mar 17"). We should avoid inclusiveness/exclusiveness constraint in OTel standard.

I don't see why bucket index (base^index as upper vs lower bound) is relevant to the zero region width. The indexed buckets are "regular" buckets that meet the bound = base ^ i formula. The zero region bucket is a special bucket for zero and values near zero. https://github.com/open-telemetry/opentelemetry-proto/pull/322/files/e680b5938be129ac8410f7ca66f0a50bcc4c4b52#r681484304 seems to be introducing a third category of bucket [zero_tolerance, lowest_regular_bound]. This will add complexity. There was a proposal of linear buckets near zero, followed by exponential buckets further out. For now, I'd like to just stick to a single zero-region bucket and a sequence of regular exponential buckets.

Previous version says: "
// zero_tolerance may be optionally set to convey the width of the
// zero region. When this is not set, implementations may assume
// the zero tolerance equals the smallest representable positive
// IEEE 754 double-precision floating point number, which is (2^−1074)."

This only "suggests" that default is 2^-1074 (smallest positive number "double" can represent). It does not define the field as "the smallest nonnegative value that maps to a regular bucket". When it is not set, most producers mean "I don't know" or "I don't care". The consumer using 2^-1074 is assuming the narrowest zero region, which may not may not be what the producer meant. The key issue with default and backward compatibility is that Protobuf default value is 0, but zero_tolerance default is "unknown". If we remap 0 to unknown, we need another value for logical 0. There are plenty of choices, like -1. But all are confusing when physical 0 is not logical 0.

A better solution is to use an additional boolean field for "zero_tolerance_is_set". Booleans default to false in protobuf, so we know it defaults to "not set" (i.e. "unknown"). When it is set, we use the zero_tolerance value as is (including the protobuf default of 0) for the real zero_tolerance. Logically, this achieves a default of "Null".
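
A sketch of the presence-flag pattern this comment proposes, with hypothetical field names (zero_tolerance_is_set is not part of this PR), expressed as a Go struct for illustration:

```go
package main

import "fmt"

// zeroRegion mirrors the proposal: a boolean presence flag restores a
// logical "null" default on top of proto3's implicit zero default.
type zeroRegion struct {
	zeroTolerance      float64 // meaningful only when the flag below is set
	zeroToleranceIsSet bool    // proto3 bools default to false, i.e. "unknown"
}

// effectiveTolerance returns (tolerance, true) when the producer set it,
// and (0, false) when the width of the zero region is unknown.
func (z zeroRegion) effectiveTolerance() (float64, bool) {
	if !z.zeroToleranceIsSet {
		return 0, false
	}
	return z.zeroTolerance, true // a set value of 0 is a real, logical 0
}

func main() {
	var unknown zeroRegion // both fields at their proto3-style defaults
	if _, ok := unknown.effectiveTolerance(); !ok {
		fmt.Println("zero tolerance unknown: the consumer must not assume a width")
	}
}
```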

Contributor:

If we need to carry the explicit flag (see flags below) for "zero_tolerance_enabled", it's somewhat akin to what we proposed for handling monotonicity of sums in histograms. I'd suggest a similar approach if we need it.

From our (Google) perspective, we have an explicit zero bucket that ends where the first exponential bucket starts, so we don't really need an explicit tolerance here.

// greater than (base^(index-1)).
//
// The positive and negative ranges of the histogram are expressed
// separately. Negative values are mapped by their absolute value
// into the negative range using the same scale as the positive range.
//
// scale is not restricted by the protocol, as the permissible
// values depend on the range of the data.
sint32 scale = 6;
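
For illustration (not part of the diff): a minimal sketch of how a producer might compute the bucket index under the less-than-or-equal upper bound defined above, i.e. index = ceil(log_base(value)) with base = 2^(2^-scale). Note that floating-point log introduces boundary error near exact powers of base, which a real implementation would have to handle.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex maps a positive measurement to the bucket that holds it:
// bucket `index` contains values in (base^(index-1), base^index], where
// base = 2^(2^-scale). Negative values would be indexed by absolute value
// into the separate negative range.
func bucketIndex(value float64, scale int32) int64 {
	// log_base(value) = log2(value) * 2^scale
	l := math.Log2(value) * math.Ldexp(1, int(scale))
	return int64(math.Ceil(l))
}

func main() {
	// At scale 3, base = 2^(1/8), roughly 1.0905.
	fmt.Println(bucketIndex(10.0, 3)) // ceil(log2(10)*8) = ceil(26.57...) = 27
	fmt.Println(bucketIndex(1.0, 3))  // exactly base^0, so bucket 0
}
```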

// zero_count is the count of values that are either exactly zero or
// within the region considered zero by the instrumentation at the
// tolerated degree of precision. This bucket stores values that
// cannot be expressed using the standard exponential formula as
// well as values that have been rounded to zero. Users have the
// option to set zero_tolerance to convey additional information about

Comment:

Assuming zero_tolerance isn't being re-introduced anytime soon, it might be confusing to reference it from this comment.

// the width of the zero region.
//
// Implementations MAY consider the zero bucket to have probability
// mass equal to (zero_count / count).
fixed64 zero_count = 7;

// positive carries the positive range of exponential bucket counts.
Buckets positive = 8;

// negative carries the negative range of exponential bucket counts.
Buckets negative = 9;

// Buckets are a set of bucket counts, encoded in a contiguous array
// of counts.
message Buckets {
// Offset is the index of the first count expressed in this span.
Contributor:

"Offset is the index of the first count expressed in this span" ->
"Offset is the bucket index of the first entry in the bucket_counts array"

Contributor Author:

Done.

//
// Note: This uses a varint encoding as a simple form of compression.
sint64 offset = 1;
Contributor:

I don't have a strong opinion on this one. Is it necessary to use varint here even though it is a single integer? Elsewhere we seem to favor fixed64.


// Count is an array of counts, where count[i] carries the count
// of the bucket at index (offset+i), i.e., count[i] is the count
// of values less than or equal to base^(offset+i) and greater
// than base^(offset+i-1).
//
// Note: By contrast, the explicit HistogramDataPoint uses
// fixedt64. This field is expected to have many buckets,
Member:

s/fixedt64/fixed64

Contributor Author:

Fixed.

// especially zeros, so has been selected to ensure varint
Contributor:

Did you mean "so "uint64" has been selected to ensure varint encoding?"

Reply:

nit: looks like fixedt64 is a typo for fixed64?

Contributor Author:

Fixed.

// encoding.
repeated uint64 bucket_counts = 2;
}
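
For illustration (not part of the diff): a sketch of reading the dense encoding back out. bucket_counts[k] holds the count for bucket index offset+k, and indices outside the array implicitly have a count of zero.

```go
package main

import "fmt"

// countAt returns the count for absolute bucket index i from the dense
// encoding used by Buckets: counts[k] corresponds to index offset+k, and
// any index outside the array implicitly has a count of zero.
func countAt(offset int64, counts []uint64, i int64) uint64 {
	k := i - offset
	if k < 0 || k >= int64(len(counts)) {
		return 0
	}
	return counts[k]
}

func main() {
	// A Buckets value with offset -2 and bucket_counts [3, 0, 7, 1]
	// covers bucket indices -2 through 1.
	offset, counts := int64(-2), []uint64{3, 0, 7, 1}
	fmt.Println(countAt(offset, counts, 0))  // 7
	fmt.Println(countAt(offset, counts, 10)) // 0 (outside the span)
}
```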

// (Optional) List of exemplars collected from
// measurements that were used to form the data point
repeated Exemplar exemplars = 10;

// Flags that apply to this specific data point. See DataPointFlags
// for the available flags and their meaning.
uint32 flags = 11;
}

// SummaryDataPoint is a single data point in a timeseries that describes the
// time-varying values of a Summary metric.
message SummaryDataPoint {