-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adopting DDSketch as the default ValueRecorder aggregation #919
Comments
Is there a C or Rust implementation? I still wouldn't want it to be the default in Erlang/Elixir since it would require additional dependencies to build and a user may just want the SDK for tracing, but I'm not even sure there is an available implementation to use from Erlang/Elixir with native bindings. |
There is a Rust implementation derived from the DD-Go implementation: https://github.com/mheffner/rust-sketches-ddsketch The OTel-C++ repository has an implementation by @ankit-bhargava that (I believe) is written from the paper, not derived from one of the DD implementations. This may highlight the point I made above, that the pseudocode in the paper is not equal to the DD implementations and it's not clear what differences that implies. In the call today I mentioned that not all languages will have a DDSketch implementation. We could:
|
One of the paper's authors here. I'll add some comments to the discussion to hopefully help the decision. There are a few divergences between our implementations. The Most notably there are divergences in the mappings between values and bucket indexes. However, we can reconcile all of them by using the following mapping from the Java implementation. Only two parameters are necessary:
The other parameters mentioned above ( As for the bucket counts (which are basically an associative array that maps bucket indexes to the number of values that belong to the buckets), the encoding suggested above works (I believe a If data size matters, I'd suggest adding to the protocol an additional way of encoding the counts with an offset and an "array", given that in practice most non-empty buckets are contiguous. That would allow storing the counts of contiguous non-empty buckets more efficiently, while also using the bucket-indexed encoding for the sparse non-empty buckets.
As an example,
Instead of the more costly alternative:
For cases where a bucket would happen to be encoded both in the contiguous way ( The above version allows keeping track of positive values. We also need a specific counter for 0, as well as the same set of four fields for negative values, if we also want to track them (see the signed version of DDSketch in Java):
with:
As for the count type, if all what we need is computing quantiles, note that we don't actually need integer precision, neither do we need 64-bit float precision, so I believe we could in theory go for |
@jmacd Would it really make sense to have spec specify an aggregator as default for GA that has only been implemented in one of our SIGs so far (if I get the discussion right)? It looks like most SIGs would end up using the fallbacks you mentioned above rather than being aligned, right? |
@CharlesMasson thank you for the detailed explanation. To @arminru's point, in today's SIG I agreed to write up what it would take to make DDSketch the default before GA. We have to admit that this protocol is relatively complicated and that some SDKs will not be able to adopt it quickly. There is a lot of enthusiasm for DDSketch in general, which has us leaning toward adopting it if the necessary code can be produced quickly, a question for @mikezvi if DD will commit these resources. Using the protocol you described last for reference,
I have a question or two the non-contiguous buckets and what we may or may not know about the relative accuracy guarantee, but I think I'll reread the paper again before asking them 😀. I wonder about the zero count, which is a zero-width bucket and whether it will create trouble for some implementations. For your question about OTLP in general does not (yet) have a way to represent sampled data counts, though. Issue open-telemetry/opentelemetry-proto#73 discusses this topic (also open-telemetry/opentelemetry-proto#159). As for the list of steps required, here's what I came up with:
Note: these are loosely ordered. 1 and 2 are prerequisites for 6 and 7, and lot of other SDK work has to happen before steps 4 and 5. |
Note that there are independent questions about this data point:
|
@jmacd Note that there are also well-established histograms like HdrHistogram which comes with a lot of implementations in different languages, some of which are already used in production (for example in Elastic) and might be a better option in terms of GA-readiness. I understand, however, that there are certain weaknesses with HdrHistogram that DDSketch addressed. Circllhist (C impl, JS impl) and UDDSketch (apparently an advancement of DDSketch, with a C impl) are some other alternatives that I came across. |
@arminru Yes, it's starting to look like there are many variations on histograms that use variable boundaries and a compressible representation. This suggests we focus on more flexible ways to encode bucket boundaries, but that otherwise the proposed DDSketch data point is no different from a Histogram data point. |
I will propose I will open a new proposal along the lines noted above. |
Issue #636 describes a number of considerations surrounding the choice of a default aggregation for
ValueRecorder
instruments, where "Last Value", "Histogram", "MinMaxSumCount", "Sketch", and "Exact" are the choices, and prior draft specifications have stated MinMaxSumCount as the default, which corresponds with a Prometheus Summary (Sum, Count) containing the p0 and p100 percentiles (a.k.a. Min and Max)--this is a trivially mergeable summary, but not always the most desired format.A leading option that has wide support is the use of DDSketch, which has a number of DataDog open-source implementations (e.g., Golang, Python, Java).
See the protocol definition used in the DataDog agent. This is (unfortunately) not commented. This issue proposes that we settle the open question in #636 as proposed, by specifying DDSketch as the default aggregation. To make this happen, ideally a contributor from DataDog would make a pull request to https://github.com/open-telemetry/opentelemetry-proto with a documented protocol (AI: @mikezvi find us contributor(s)?) for DDSketch aggregations.
Reviewing the current agent payload protocol used by DataDog suggests that the protocol makes assumptions about the configuration of the algorithm. Compare and contrast the implementation used by the DD agent:
https://github.com/DataDog/datadog-agent/blob/master/pkg/quantile/config.go
and the "clean" implementation here:
https://github.com/DataDog/sketches-go/tree/master/ddsketch
We see different default configurations. In the agent:
and in the separate implementation:
The paper states, " As an example, for α = 0.01, a sketch of size 2048 can handle values from 80 microseconds to 1 year, and cover all quantiles." I studied this for a while and had trouble understanding how to get from the settings above to the value of either 80 microseconds or 1 year--particularly note that the paper does not discuss the
defaultMinValue
setting and that the code contains a concept of "key offset" that is also not included in the paper. DataDog (@mikezvi) I would like your help clarifying some of these aspects and, ideally, publishing more exact pseudocode descriptions of the actual algorithm used. Simply stated: your pseudocode in the paper does not closely match your real code. For example, Algorithm 3 in the paper is under 10 lines of pseudocode and uses a linear scan, whereas it is actually implemented by helpers namedgrowLeft
andgrowRight
(~80 lines of code). These helpers are not symmetric, not documented, and not described in the paper. I'd like to see more.Assuming we can resolve these issues (which I'm confident we can), the DDSketch algorithm looks like a winner. I believe that the DD agent-payload protocol linked above is based on an assumption that the default values are fixed, not variable, as they are not included in the protocol. Based on this, a rough draft for the documented protocol might look like:
As you can see, I added three fields to express the configuration parameters used in the sketch. I have concerns about representing these values directly, though, since they do not clearly map into the minimum and maximum values expressible by the sketch (i.e., the connection between these parameters and the 80 microseconds and 1 year figures are not clear, and it's those figures that users actually care about).
Resolves #636.
Closes open-telemetry/oteps#117.
Closes open-telemetry/oteps#118.
The text was updated successfully, but these errors were encountered: