-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Histogram support multiple boundaries types vs different types of histogram aggregation #259
Comments
I've started thinking about one more alternative to your suggestion, also motivated by the comments in the tail of #226, in particular The idea is to combine the Discretization and Compaction fields into a single type, a
These two fields determine both buckets (Discretization) and counts (Compaction). I've been thinking about one other equally explicit form (mentioned in today's Data Model meeting):
This type is identical to ExplicitCountBuckets, but counts are floating-point values. This has two uses:
In the first example, a statsd In the second example, a Prometheus summary for quantiles [0.25, 0.5, 0.75, 0.9, 0.95, 0.99] with total count C, total sum S. Let the calculated quantile values equal the calculated values P25, P50, P75, P90, P95, P99. The output histogram is as follows:
Note that the final count bucket is redundant, since it represents (P99,+Inf), equal to the total count minus the sum of all bucket counts. (By this logic we could shorten The suggestion in @oertl's comment, bullets 1 and 2, could be summarized as follows: Anyway, please also consider this approach. I'm looking forward to @yzhuge's continued work on this, #226 (comment), and want to thank @githomin for this reminder:
|
I like the idea of each histogram types having its own count encoding. I made the bounds share one common count array in #226 only because I wanted to minimize changes. Since we haven't actually released the protocol, minimizing changes or backward compatibility is not a big concern. Thus we should be more forward looking than backward looking. To be clear, future version could look like: oneof buckets { Each "Buckets" message will have its own bounds and counts. |
@yzhuge I am currently working on a PR that does this basic factoring. I am going to propose that we ignore the difference between ExplicitCountBuckets and ExplicitSampleBuckets at this point and let all counts be doubles, but I will do that in a separate PR. Also I will never again mention this idea about Summary metrics. 😀 |
Fixes: open-telemetry#259 Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Fixes: open-telemetry#259 Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Fixes: #259 Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
During the meeting on Thu, Feb 11, there was a discussion about our data model supporting new aggregation types, and the answer was clearly we do support that, but that comment triggered me a new idea.
Instead of deprecating the current explicit boundaries from our current definition of Histogram, what if we just add a new type ExponentialHistogram, and that combined with #256 and #257 will keep the number of types at a reasonable number.
Advantages of this solution is that #226 can be a simple addition to the current data model, and also can improve the way how counts are stored for large number of buckets (like using int64 instead of fixed64, or even using other encoding that does not require to have a value for 0 buckets, etc.).
Another advantage is that we don't have to deal with situation where points in the same histogram metric have different boundaries types.
I would like to propose this for discussion and make a decision about this. I am not 100% convinced on one or the other, but I try to propose this possibility.
/cc @jmacd @open-telemetry/specs-metrics-approvers
The text was updated successfully, but these errors were encountered: