Errors in Exponential Histogram Mapping #3630

Open
MadVikingGod opened this issue Jul 31, 2023 · 7 comments
@MadVikingGod (Contributor)

Problem

When the data model describes exponential histograms and how to do the mapping, all of the formulas have the property of monotonicity: for two points $a \lt b$, $index_a \le index_b$. This is true for the first MapToIndex function, but not for the second MapToIndex with the exact power-of-two case added. With that correction in place, any values around a power of two that are misrounded by floating-point error will violate the monotonicity property. For example, see the plot of the MapToIndex function around $2^8$ below. The same happens for values around $2^{-8}$, and it only gets worse at larger exponents.

[Figure: MapToIndex_8 — MapToIndex output for values around 2^8, showing a non-monotonic dip at the power-of-two boundary]
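To see the violation concretely, a small scan over adjacent float64 values around $2^8$ can be used. This is a hypothetical sketch built around the current recommended function (quoted in full later in this thread); whether and exactly where violations appear depends on the platform's math.Log:

package main

import (
    "fmt"
    "math"
)

const scale = 4 // example scale; the plot above may use a different one

// mapToIndex is the currently recommended function, including the
// power-of-two shortcut that causes the dip.
func mapToIndex(value float64) int {
    if frac, exp := math.Frexp(value); frac == 0.5 {
        return ((exp - 1) << scale) - 1
    }
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return int(math.Floor(math.Log(value) * scaleFactor))
}

func main() {
    v := math.Nextafter(256.0, 0) // start just below 2^8
    prev := mapToIndex(v)
    for i := 0; i < 1000; i++ {
        v = math.Nextafter(v, math.Inf(1)) // step to the next float64
        idx := mapToIndex(v)
        if idx < prev {
            fmt.Printf("monotonicity violated at %g: index %d -> %d\n", v, prev, idx)
        }
        prev = idx
    }
}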

Solutions

Use a different formula

The main source of error in the current formula is that Log() of large numbers is not accurate. We can eliminate this error by bounding the argument of the log to a smaller range. If we split the value into an exponent and a fraction using Frexp, we only need to take the log of the fraction. Because the fraction is bounded to [0.5, 1), the log has a bounded error.

The change would look like this:

// MapToIndex for any scale, exact for powers of two.
func MapToIndex(value float64, scale int) int {
    // Split value into a fraction in [0.5, 1) and a power-of-two
    // exponent, so the logarithm only ever sees the bounded fraction.
    frac, exp := math.Frexp(value)
    scaleFactor := math.Ldexp(math.Log2E, scale)
    // int() truncates toward zero, which matches the ceiling here
    // because math.Log(frac) <= 0 on [0.5, 1).
    return exp<<scale + int(math.Log(frac)*scaleFactor) - 1
}

Remove the "exact" optimization

By only recommending the first MapToIndex function, we can remove this error. This is the simplest solution, but the least optimized for any scale above 0.
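For reference, that first function (from the data model, with scale in scope as in its examples, and as @oertl quotes again below) is:

// MapToIndex for any scale.
func MapToIndex(value float64) int {
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return int(math.Ceil(math.Log(value)*scaleFactor)) - 1
}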

Appendix

Proof of New Formula

$$
\begin{gathered}
base = 2^{2^{-scale}}, \qquad scaleFactor = \frac{2^{scale}}{Log(2)} \\
base^{index} < value \le base^{index+1} \\
2^{index \cdot 2^{-scale}} < value \le 2^{(index+1) \cdot 2^{-scale}} \\
index \cdot 2^{-scale} < Log_2(value) \le (index+1) \cdot 2^{-scale} \\
index < Log_2(value) \cdot 2^{scale} \le index+1 \\
index < Log_2(2^{exp} \cdot frac) \cdot 2^{scale} \le index+1 \\
index < (exp + Log_2(frac)) \cdot 2^{scale} \le index+1 \\
index < exp \cdot 2^{scale} + Log_2(frac) \cdot 2^{scale} \le index+1 \\
index < exp \cdot 2^{scale} + Log(frac) \cdot scaleFactor \le index+1 \\
index \ge exp \cdot 2^{scale} + Log(frac) \cdot scaleFactor - 1
\end{gathered}
$$
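Since $index$ is the unique integer satisfying these bounds, and $exp \cdot 2^{scale}$ is itself an integer, the bound is attained by taking a ceiling:

$$
index = exp \cdot 2^{scale} + \left\lceil Log(frac) \cdot scaleFactor \right\rceil - 1
$$

This is what the proposed Go code computes: the int() conversion truncates toward zero, and truncation agrees with the ceiling here because $Log(frac) \cdot scaleFactor \le 0$ for $frac \in [0.5, 1)$.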
@MadVikingGod added the spec:metrics label on Jul 31, 2023
@oertl commented Aug 1, 2023

If I have understood this proposal correctly, it relies on the monotonicity of the implementation of the log function, which is not always guaranteed. Otherwise, non-monotonic transitions could still occur at bucket boundaries that are not powers of two.

Btw, I had proposed an exact, monotonic, and platform-independent mapping based on a lookup table, which is also faster (see implementation here). However, this proposal was ultimately rejected because it is not feasible for very large scales beyond 10, and restricting the scale was not desired.
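For positive scales, the shape of such a table-based mapping might look like the following sketch. This is hypothetical code, not the linked implementation, and exactness at bucket boundaries would additionally require correctly rounded table entries, which math.Pow alone does not guarantee:

package main

import (
    "math"
    "sort"
)

// buildBoundaries precomputes the 2^scale sub-bucket boundaries for the
// fraction returned by math.Frexp, which lies in [0.5, 1).
func buildBoundaries(scale int) []float64 {
    n := 1 << scale
    b := make([]float64, n)
    for j := 0; j < n; j++ {
        b[j] = math.Pow(2, float64(j)/float64(n)-1) // 2^(j/2^scale - 1)
    }
    return b
}

// mapToIndexTable finds the bucket by binary search over the table:
// O(scale) time and O(2^scale) space. A real implementation can reach
// O(1) time by indexing the table directly with the leading mantissa bits.
func mapToIndexTable(value float64, scale int, boundaries []float64) int {
    frac, exp := math.Frexp(value)
    // Largest j with boundaries[j] < frac: SearchFloat64s returns the
    // smallest i such that boundaries[i] >= frac.
    j := sort.SearchFloat64s(boundaries, frac) - 1
    return (exp-1)<<scale + j
}

Because every comparison is an exact float64 comparison against a precomputed, sorted table, the lookup itself is monotonic by construction; the only remaining question is whether each table entry is the correctly rounded boundary.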

@MadVikingGod (Contributor, Author)

This solution does rely on the monotonicity of Log, but only over the range [0.5, 1.0). This bounds the rounding error, and the result should be monotonic on all implementations.

The problem with the currently recommended code is that it assumes Log() is accurate near powers of two and uses a shortcut to calculate the bin using integer math. For large numbers this assumption doesn't hold, and it gets worse at higher powers of two.

Current code:

// MapToIndex for any scale, exact for powers of two.
func MapToIndex(value float64) int {

    // *** This is what causes the dip in the image above.
    // Special case for power-of-two values.
    if frac, exp := math.Frexp(value); frac == 0.5 {
        return ((exp - 1) << scale) - 1
    }

    scaleFactor := math.Ldexp(math.Log2E, scale)
    // Note: math.Floor(x) equals math.Ceil(x)-1 when x is not an exact
    // integer; math.Log(value)*scaleFactor is an integer only when value
    // is a power of two, which is handled above.
    return int(math.Floor(math.Log(value) * scaleFactor))
}

@oertl commented Aug 1, 2023

The specification does not require monotonicity, and the recommended implementation is not binding. It is probably a matter of taste which implementation should be recommended; there are different criteria by which one could recommend a particular implementation. Unless one can rely 100% on the monotonicity of the log function implementation, none of the suggestions guarantee monotonicity. I would personally prefer to recommend the simplest implementation, without the shortcut for powers of 2, which is very rarely hit in practice anyway:

// MapToIndex for any scale.
func MapToIndex(value float64) int {
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return int(math.Ceil(math.Log(value)*scaleFactor)) - 1
}

Perhaps it would be best not to recommend any particular implementation at all. That would make this issue obsolete.

@MadVikingGod (Contributor, Author)

I said I found this because the monotonicity was violated, but I would now reframe my argument: a slightly more complicated formula can avoid the error introduced by this method entirely. With the addition of two integer operations, we can eliminate all of the rounding errors at every scale.

I dove deeper into how many points are in error over the entire domain of the index and hacked together a program to calculate it. I found that some points are in error at almost every index, not just at powers of two, and the count only grows the further the index is from zero.
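One way such a check can be done exactly, for scale >= 0, is to raise value to the $2^{scale}$ power in arbitrary-precision arithmetic, which turns the bucket condition into comparisons against exact powers of two. The following is a sketch of that idea, not the actual program used here:

package main

import (
    "fmt"
    "math"
    "math/big"
)

// MapToIndex is the proposed mapping from above.
func MapToIndex(value float64, scale int) int {
    frac, exp := math.Frexp(value)
    scaleFactor := math.Ldexp(math.Log2E, scale)
    return exp<<scale + int(math.Log(frac)*scaleFactor) - 1
}

// indexIsExact reports whether index satisfies
//   2^(index * 2^-scale) < value <= 2^((index+1) * 2^-scale)
// by raising all three terms to the 2^scale power (monotone for positive
// values), so the comparison uses only exact powers of two. Assumes
// scale >= 0 and value > 0.
func indexIsExact(value float64, scale, index int) bool {
    // Each squaring doubles the number of mantissa bits, so this
    // precision keeps every multiplication exact.
    prec := uint(53) << uint(scale)
    p := new(big.Float).SetPrec(prec).SetFloat64(value)
    for i := 0; i < scale; i++ {
        p.Mul(p, p) // p = value^(2^i) after i iterations, exactly
    }
    lo := new(big.Float).SetMantExp(big.NewFloat(1), index)   // 2^index
    hi := new(big.Float).SetMantExp(big.NewFloat(1), index+1) // 2^(index+1)
    return lo.Cmp(p) < 0 && p.Cmp(hi) <= 0
}

func main() {
    const scale = 8
    // Count misindexed float64 values just below and above 2^8.
    bad := 0
    v := math.Nextafter(256.0, 0)
    for i := 0; i < 2000; i++ {
        if !indexIsExact(v, scale, MapToIndex(v, scale)) {
            bad++
        }
        v = math.Nextafter(v, math.Inf(1))
    }
    fmt.Printf("misindexed points near 2^8 at scale %d: %d\n", scale, bad)
}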

[Figure: MapToIndex_Error_graph — number of points in error at each index for the current and proposed implementations]

But notice the blue line, which is my proposed solution: it is zero for all points. This is an error we can avoid, and the recommendation should point developers toward it.

As for removing any recommended implementation: I think developing this portion of an SDK without an example would make it very difficult for future languages to implement. I personally don't think I would have come up with any of this without good examples to support it.

@austinlparker (Member)

Josh, if you don't want this then feel free to unassign yourself, but it seemed up your alley.

@jmacd (Contributor) commented May 29, 2024

@MadVikingGod I agree with your analysis, and I also support @oertl's response. We are aware of the potential for errors when using relatively simple formulas. We tried to acknowledge this in:

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#all-scales-use-the-logarithm-function

> The use of math.Log() to calculate the bucket index is not guaranteed to be exactly correct near powers of two. Values near a boundary could be mapped into the incorrect bucket due to inaccuracy. Defining an exact mapping function is out of scope for this document.

When OTel worked out this design, as in open-telemetry/oteps#149, open-telemetry/opentelemetry-proto#226, and #1776, another prototype algorithm was explored.

@MadVikingGod I support your alternative implementation that reduces error. I assume it is modestly more expensive, but I don't think that matters much to users, given the absolute costs are still low. I appreciate @oertl's variation here, but we know it is incorrect at the extremes of the range. I am not sure whether it is best to add another recommendation to the specification, modify the existing recommendation, or remove all the recommendations.

Both @oertl and @yzhuge demonstrated exact/correct table-lookup-based algorithms with O(1) time complexity and O(2^scale) space complexity. I implemented this to make sure I understood it, and it could be used as the basis of a new implementation in Go if there were interest. For my taste, it is the complexity cost of the table-lookup approach that makes it undesirable, not the space cost.

@jmacd added the triaged-accepted label (deprecated: issue triaged and accepted by the OTel community, can proceed with creating a PR) and removed the triage:deciding:tc-inbox label on May 29, 2024
@danielgblanco added the triage:accepted:ready-with-sponsor label and removed the deprecated triaged-accepted label on Jun 3, 2024
@danielgblanco (Contributor)

Hey @jmacd, we're assuming you're the sponsor for this one. I've also applied the new triage labels (to avoid confusion).
