Errors in Exponential Histogram Mapping #3630

Open
MadVikingGod opened this issue Jul 31, 2023 · 7 comments
@MadVikingGod (Contributor)

Problem

When the data model describes exponential histograms and how to do the mapping, all of the formulas have the property of monotonicity: for two points $a \lt b$, $index_a \le index_b$. This is true for the first MapToIndex function, but not for the second MapToIndex with the exact power-of-two case added. With that correction in place, any values around a power of two that are misrounded by floating-point error will violate the monotonicity property. For example, see the plot of the MapToIndex function around $2^8$ below. The same happens for values around $2^{-8}$, and it only gets worse at larger exponents.

[Figure: MapToIndex_8 — MapToIndex output for values around 2^8, showing a non-monotonic dip at the power-of-two boundary]
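To see the violation concretely, a small scan over adjacent float64 values around $2^8$ can be used. This is a hypothetical sketch built around the current recommended function (quoted in full later in this thread); whether and exactly where violations appear depends on the platform's math.Log:

package main

import (
    "fmt"
    "math"
)

const scale = 4 // example scale; the plot above may use a different one

// mapToIndex is the currently recommended function, including the
// power-of-two shortcut that causes the dip.
func mapToIndex(value float64) int {
    if frac, exp := math.Frexp(value); frac == 0.5 {
        return ((exp - 1) << scale) - 1
    }
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return int(math.Floor(math.Log(value) * scaleFactor))
}

func main() {
    v := math.Nextafter(256.0, 0) // start just below 2^8
    prev := mapToIndex(v)
    for i := 0; i < 1000; i++ {
        v = math.Nextafter(v, math.Inf(1)) // step to the next float64
        idx := mapToIndex(v)
        if idx < prev {
            fmt.Printf("monotonicity violated at %g: index %d -> %d\n", v, prev, idx)
        }
        prev = idx
    }
}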

Solutions

Use a different formula

The main source of error in the current formula is that Log() of large numbers is not accurate. We can eliminate this error by bounding the argument of the log to a smaller range. If we split the value into an exponent and a fraction using Frexp, we only need to take the log of the fraction. Because the fraction is bounded to [0.5, 1), the log has a bounded error.

The change would look like this:

// MapToIndex for any scale, exact for powers of two.
func MapToIndex(value float64, scale int) int {
    // Split value into a fraction in [0.5, 1) and a power-of-two
    // exponent, so the logarithm only ever sees the bounded fraction.
    frac, exp := math.Frexp(value)
    scaleFactor := math.Ldexp(math.Log2E, scale)
    // int() truncates toward zero, which matches the ceiling here
    // because math.Log(frac) <= 0 on [0.5, 1).
    return exp<<scale + int(math.Log(frac)*scaleFactor) - 1
}

Remove the "exact" optimization

By only recommending the first MapToIndex function, we can remove this error. This is the simplest solution, but the least optimized for any scale above 0.
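For reference, that first function (from the data model, with scale in scope as in its examples, and as @oertl quotes again below) is:

// MapToIndex for any scale.
func MapToIndex(value float64) int {
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return int(math.Ceil(math.Log(value)*scaleFactor)) - 1
}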

Appendix

Proof of New Formula

$$
\begin{gathered}
base = 2^{2^{-scale}}, \qquad scaleFactor = \frac{2^{scale}}{Log(2)} \\
base^{index} < value \le base^{index+1} \\
2^{index \cdot 2^{-scale}} < value \le 2^{(index+1) \cdot 2^{-scale}} \\
index \cdot 2^{-scale} < Log_2(value) \le (index+1) \cdot 2^{-scale} \\
index < Log_2(value) \cdot 2^{scale} \le index+1 \\
index < Log_2(2^{exp} \cdot frac) \cdot 2^{scale} \le index+1 \\
index < (exp + Log_2(frac)) \cdot 2^{scale} \le index+1 \\
index < exp \cdot 2^{scale} + Log_2(frac) \cdot 2^{scale} \le index+1 \\
index < exp \cdot 2^{scale} + Log(frac) \cdot scaleFactor \le index+1 \\
index \ge exp \cdot 2^{scale} + Log(frac) \cdot scaleFactor - 1
\end{gathered}
$$
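Since $index$ is the unique integer satisfying these bounds, and $exp \cdot 2^{scale}$ is itself an integer, the bound is attained by taking a ceiling:

$$
index = exp \cdot 2^{scale} + \left\lceil Log(frac) \cdot scaleFactor \right\rceil - 1
$$

This is what the proposed Go code computes: the int() conversion truncates toward zero, and truncation agrees with the ceiling here because $Log(frac) \cdot scaleFactor \le 0$ for $frac \in [0.5, 1)$.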
@MadVikingGod added the spec:metrics label on Jul 31, 2023
@oertl commented Aug 1, 2023

If I have understood this proposal correctly, it relies on the monotonicity of the implementation of the log function, which is not always guaranteed. Otherwise, non-monotonic transitions could still occur at bucket boundaries that are not powers of two.

Btw, I had proposed an exact, monotonic, and platform-independent mapping based on a lookup table, which is also faster (see implementation here). However, this proposal was ultimately rejected because it is not feasible for very large scales beyond 10, and restricting the scale was not desired.
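For positive scales, the shape of such a table-based mapping might look like the following sketch. This is hypothetical code, not the linked implementation, and exactness at bucket boundaries would additionally require correctly rounded table entries, which math.Pow alone does not guarantee:

package main

import (
    "math"
    "sort"
)

// buildBoundaries precomputes the 2^scale sub-bucket boundaries for the
// fraction returned by math.Frexp, which lies in [0.5, 1).
func buildBoundaries(scale int) []float64 {
    n := 1 << scale
    b := make([]float64, n)
    for j := 0; j < n; j++ {
        b[j] = math.Pow(2, float64(j)/float64(n)-1) // 2^(j/2^scale - 1)
    }
    return b
}

// mapToIndexTable finds the bucket by binary search over the table:
// O(scale) time and O(2^scale) space. A real implementation can reach
// O(1) time by indexing the table directly with the leading mantissa bits.
func mapToIndexTable(value float64, scale int, boundaries []float64) int {
    frac, exp := math.Frexp(value)
    // Largest j with boundaries[j] < frac: SearchFloat64s returns the
    // smallest i such that boundaries[i] >= frac.
    j := sort.SearchFloat64s(boundaries, frac) - 1
    return (exp-1)<<scale + j
}

Because every comparison is an exact float64 comparison against a precomputed, sorted table, the lookup itself is monotonic by construction; the only remaining question is whether each table entry is the correctly rounded boundary.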

@MadVikingGod (Contributor, Author)

This solution does rely on the monotonicity of Log, but only over the range [0.5, 1.0). This bounds the rounding error, and the result should be monotonic on all implementations.

The problem with the currently recommended code is that it assumes Log() is accurate near powers of two and uses a shortcut to calculate the bin using integer math. For large numbers this assumption doesn't hold, and it gets worse at higher powers of two.

Current code:

// MapToIndex for any scale, exact for powers of two.
func MapToIndex(value float64) int {

    // *** This is what causes the dip in the image above.
    // Special case for power-of-two values.
    if frac, exp := math.Frexp(value); frac == 0.5 {
        return ((exp - 1) << scale) - 1
    }

    scaleFactor := math.Ldexp(math.Log2E, scale)
    // Note: math.Floor(x) equals math.Ceil(x)-1 when x is not an exact
    // integer; math.Log(value)*scaleFactor is an integer only when value
    // is a power of two, which is handled above.
    return int(math.Floor(math.Log(value) * scaleFactor))
}

@oertl commented Aug 1, 2023

The specification does not require monotonicity, and the recommended implementation is not binding. It is probably a matter of taste which implementation should be recommended; there are different criteria by which one could recommend a particular implementation. Unless one can rely 100% on the monotonicity of the log function implementation, none of the suggestions guarantee monotonicity. I would personally prefer to recommend the simplest implementation, without the shortcut for powers of 2, which is very rarely hit in practice anyway:

// MapToIndex for any scale.
func MapToIndex(value float64) int {
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return int(math.Ceil(math.Log(value)*scaleFactor)) - 1
}

Perhaps it would be best not to recommend any particular implementation at all. That would make this issue obsolete.

@MadVikingGod (Contributor, Author)

I said I found this because the monotonicity was violated, but I would now reframe my argument: a slightly more complicated formula can avoid the error introduced by this method entirely. With the addition of two integer operations, we can eliminate all of the rounding errors at every scale.

I dove deeper into how many points are in error over the entire domain of the index and hacked together a program to calculate it. I found that some points are in error at almost every index, not just at powers of two, and the count only grows the further the index is from zero.
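One way such a check can be done exactly, for scale >= 0, is to raise value to the $2^{scale}$ power in arbitrary-precision arithmetic, which turns the bucket condition into comparisons against exact powers of two. The following is a sketch of that idea, not the actual program used here:

package main

import (
    "fmt"
    "math"
    "math/big"
)

// MapToIndex is the proposed mapping from above.
func MapToIndex(value float64, scale int) int {
    frac, exp := math.Frexp(value)
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return exp<<scale + int(math.Log(frac)*scaleFactor) - 1
}

// indexIsExact reports whether index satisfies
//   2^(index * 2^-scale) < value <= 2^((index+1) * 2^-scale)
// by raising all three terms to the 2^scale power (monotone for positive
// values), so the comparison uses only exact powers of two. Assumes
// scale >= 0 and value > 0.
func indexIsExact(value float64, scale, index int) bool {
    // Each squaring doubles the number of mantissa bits, so this
    // precision keeps every multiplication exact.
    prec := uint(53) << uint(scale)
    p := new(big.Float).SetPrec(prec).SetFloat64(value)
    for i := 0; i < scale; i++ {
        p.Mul(p, p) // p = value^(2^i) after i iterations, exactly
    }
    lo := new(big.Float).SetMantExp(big.NewFloat(1), index)   // 2^index
    hi := new(big.Float).SetMantExp(big.NewFloat(1), index+1) // 2^(index+1)
    return lo.Cmp(p) < 0 && p.Cmp(hi) <= 0
}

func main() {
    const scale = 8
    // Count misindexed float64 values just below and above 2^8.
    bad := 0
    v := math.Nextafter(256.0, 0)
    for i := 0; i < 2000; i++ {
        if !indexIsExact(v, scale, MapToIndex(v, scale)) {
            bad++
        }
        v = math.Nextafter(v, math.Inf(1))
    }
    fmt.Printf("misindexed points near 2^8 at scale %d: %d\n", scale, bad)
}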

[Figure: MapToIndex_Error_graph — number of points in error at each index for the current and proposed implementations]

But notice the blue line, which is my proposed solution: it is zero for all points. This is an error we can avoid, and the recommendation should point developers toward it.

As for removing any recommended implementation: I think developing this portion of an SDK without an example would make it very difficult for future languages to implement. I personally don't think I would have come up with any of this without good examples to support it.

@austinlparker (Member)

Josh, if you don't want this then feel free to unassign yourself, but it seemed up your alley.

@jmacd (Contributor) commented May 29, 2024

@MadVikingGod I agree with your analysis, and I also support @oertl's response. We are aware of the potential for errors when using relatively simple formulas. We tried to acknowledge this in:

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#all-scales-use-the-logarithm-function

> The use of math.Log() to calculate the bucket index is not guaranteed to be exactly correct near powers of two. Values near a boundary could be mapped into the incorrect bucket due to inaccuracy. Defining an exact mapping function is out of scope for this document.

When OTel worked out this design, as in open-telemetry/oteps#149, open-telemetry/opentelemetry-proto#226, and #1776, another prototype algorithm was explored.

@MadVikingGod I support your alternative implementation that reduces error. I assume it is modestly more expensive, but I don't think that matters much to users, given the absolute costs are still low. I appreciate @oertl's variation here, but we know it is incorrect at the extremes of the range. I am not sure whether it is best to add another recommendation to the specification, modify the existing recommendation, or remove all the recommendations.

Both @oertl and @yzhuge demonstrated exact/correct table-lookup-based algorithms with O(1) time complexity and O(2^scale) space complexity. I implemented this to make sure I understood it, and it could be used as the basis of a new implementation in Go if there were interest. For my taste, it is the complexity cost of the table-lookup approach that makes it undesirable, not the space cost.

@jmacd added the triaged-accepted label (deprecated: issue triaged and accepted by the OTel community, can proceed with creating a PR) and removed the triage:deciding:tc-inbox label on May 29, 2024
@danielgblanco added the triage:accepted:ready-with-sponsor label and removed the deprecated triaged-accepted label on Jun 3, 2024
@danielgblanco (Contributor)

Hey @jmacd, we're assuming you're the sponsor for this one. I've also applied the new triage labels (to avoid confusion).
