Optimizing the types used for intermediate calculations #477

kimikage · 2021-05-14T14:25:23Z

Currently, many Float64-based calculations are used for color space conversions and color difference calculations, regardless of the input/output type. However, Float32 is accurate enough for practical use in many color and image related applications.

I intend to speed up RGB{N0f8}-->XYZ{Float32}-->Lab{Float32} conversions and colordiff.

The likely breaking change is that colordiff for RGB{N0f8} will return Float32 instead of Float64.
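To give a rough feel for why Float32 may be enough for 8-bit inputs, here is a minimal sketch that compares Float32 and Float64 on a simplified gray-level sRGB to L* pipeline. It is not the Colors.jl implementation; the function names (`srgb_to_linear`, `lightness`, `gray_lightness`) are hypothetical, and the white point is assumed to be Yn = 1.

```julia
# Simplified sRGB gray level -> CIE L* pipeline, generic over the float type T.
# Illustrative sketch only; the names below are hypothetical, not Colors.jl API.

# sRGB transfer function (gamma expansion)
function srgb_to_linear(c::T) where {T<:AbstractFloat}
    c <= T(0.04045) ? c / T(12.92) : ((c + T(0.055)) / T(1.055))^T(2.4)
end

# CIE L* from relative luminance y (assumes white point Yn = 1)
function lightness(y::T) where {T<:AbstractFloat}
    ϵ = T(216 / 24389)
    κ = T(24389 / 27)
    f = y > ϵ ? cbrt(y) : (κ * y + 16) / 116
    return 116 * f - 16
end

# For a gray (R = G = B), the relative luminance equals the linearized channel value
gray_lightness(::Type{T}, v::Integer) where {T} = lightness(srgb_to_linear(T(v) / 255))

# Largest Float32-vs-Float64 discrepancy over all 8-bit gray levels
max_err = maximum(abs(Float64(gray_lightness(Float32, v)) - gray_lightness(Float64, v)) for v in 0:255)

# Smallest change in L* caused by a single 8-bit step
min_step = minimum(gray_lightness(Float64, v + 1) - gray_lightness(Float64, v) for v in 0:254)

println("max |L*(Float32) - L*(Float64)|: ", max_err)
println("min L* change per 8-bit step:    ", min_step)
```

If the Float32 discrepancy is orders of magnitude below the quantization step of the input, the extra precision of Float64 intermediates buys little for RGB{N0f8}.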


kimikage · 2021-06-20T16:06:44Z

BTW, I was wondering why the required accuracy of colordiff (DE_2000) was so low.

Colors.jl/test/colordiff.jl

Line 48 in ddfc19b

eps_cdiff = 0.01

And then I found two typos in the test data.

Colors.jl/test/colordiff.jl

Lines 36 to 37 in ddfc19b

    
           ((60.2574, -34.0099,  36.2577), (60.4626, -34.1751,  39.4387),  1.2644), 
        
           ((63.0109, -31.0961,  -5.8663), (62.8187, -29.7946,  -4.0864),  1.2530),

The correct values are as follows (the corrected digit is shown in quotes):

((60.2574, -34.0099, 36.2"6"77), (60.4626, -34.1751, 39.4387), 1.2644),
((63.0109, -31.0961, -5.8663), (62.8187, -29.7946, -4.0864), 1.2"6"30),

With these fixes, the tolerance can be tightened considerably (e.g. eps_cdiff = 0.00005).
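The corrected rows can be checked against an independent Float64 implementation of the CIEDE2000 formula. The sketch below is written from the published formula and is not the Colors.jl code; the function name `de2000` and the tuple-based interface are assumptions made for illustration.

```julia
# Reference CIEDE2000 in Float64 (kL = kC = kH = 1), written from the published formula.
# Illustrative only; `de2000` is a hypothetical name, not the Colors.jl implementation.
function de2000(lab1::NTuple{3,Float64}, lab2::NTuple{3,Float64})
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # chroma-dependent rescaling of a*
    cm7 = ((hypot(a1, b1) + hypot(a2, b2)) / 2)^7
    g = 0.5 * (1 - sqrt(cm7 / (cm7 + 25.0^7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1, c2 = hypot(a1p, b1), hypot(a2p, b2)
    h1 = mod(rad2deg(atan(b1, a1p)), 360)
    h2 = mod(rad2deg(atan(b2, a2p)), 360)
    # lightness, chroma and hue differences
    dl = L2 - L1
    dc = c2 - c1
    dh = h2 - h1
    if c1 * c2 == 0
        dh = 0.0
    elseif dh > 180
        dh -= 360
    elseif dh < -180
        dh += 360
    end
    dH = 2 * sqrt(c1 * c2) * sind(dh / 2)
    # mean lightness, chroma and hue
    ml = (L1 + L2) / 2
    mc = (c1 + c2) / 2
    mh = if c1 * c2 == 0
        h1 + h2
    elseif abs(h1 - h2) <= 180
        (h1 + h2) / 2
    elseif h1 + h2 < 360
        (h1 + h2 + 360) / 2
    else
        (h1 + h2 - 360) / 2
    end
    # weighting functions and rotation term
    t = 1 - 0.17 * cosd(mh - 30) + 0.24 * cosd(2mh) + 0.32 * cosd(3mh + 6) - 0.20 * cosd(4mh - 63)
    sl = 1 + 0.015 * (ml - 50)^2 / sqrt(20 + (ml - 50)^2)
    sc = 1 + 0.045 * mc
    sh = 1 + 0.015 * mc * t
    rc = 2 * sqrt(mc^7 / (mc^7 + 25.0^7))
    rt = -sind(60 * exp(-((mh - 275) / 25)^2)) * rc
    return sqrt((dl / sl)^2 + (dc / sc)^2 + (dH / sh)^2 + rt * (dc / sc) * (dH / sh))
end

# The two rows with the corrected digits
corrected = [
    ((60.2574, -34.0099, 36.2677), (60.4626, -34.1751, 39.4387), 1.2644),
    ((63.0109, -31.0961, -5.8663), (62.8187, -29.7946, -4.0864), 1.2630),
]
for (lab1, lab2, expected) in corrected
    println(round(de2000(lab1, lab2); digits=4), " (tabulated: ", expected, ")")
end
```

If the sketch is right, both rows should agree with the tabulated values to about four decimal places, which is what makes a much smaller eps_cdiff plausible.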

kimikage · 2021-06-22T04:43:23Z

One of the most costly parts of DE_2000 is the following calculation of t (T).

Colors.jl/src/differences.jl

Lines 201 to 206 in 9f27da0

    
           # hue weight 
        
           t = 1 - 0.17 * cosd(mh - 30) + 
        
                   0.24 * cosd(2mh) + 
        
                   0.32 * cosd(3mh + 6) - 
        
                   0.20 * cosd(4mh - 63) 
        
           sh = 1 + 0.015 * mc * t

When t is computed with Float32, the accuracy is only about 8 ULP (about 7 ULP on average), even though the calculation takes about 40 ns.
I think a polynomial approximation of t as a whole, rather than an optimized cosd(::Float32), would be more advantageous in terms of accuracy.
I don't have a good idea for the Float64 case yet.
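One way to measure this kind of error is to evaluate the same expression in Float32 and in Float64 and express the difference in Float32 ULPs. The following is a minimal sketch under that assumption; `hue_weight` and `ulp_error` are hypothetical helper names, and the figures it prints depend on the Julia version and are not claimed to match the numbers above.

```julia
# Hue weight T of CIEDE2000, generic over the float type (hypothetical helper)
function hue_weight(mh::T) where {T<:AbstractFloat}
    1 - T(0.17) * cosd(mh - 30) +
        T(0.24) * cosd(2mh) +
        T(0.32) * cosd(3mh + 6) -
        T(0.20) * cosd(4mh - 63)
end

# Error of the Float32 result in Float32 ULPs, using Float64 as the reference
function ulp_error(mh::Float32)
    ref = hue_weight(Float64(mh))
    return abs(Float64(hue_weight(mh)) - ref) / eps(Float32(ref))
end

# Sample the mean hue over [0, 360] degrees
hues = range(0.0f0, 360.0f0; length=100_001)
errs = [ulp_error(h) for h in hues]
println("max: ", maximum(errs), " ULP, mean: ", sum(errs) / length(errs), " ULP")
```

Part of the error comes from rounding the arguments such as 3mh + 6 before cosd is even called, which is one reason approximating t as a whole could do better than a faster cosd.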

kimikage · 2021-06-27T05:26:07Z

The accuracy bottleneck in calculating DE_2000 is in delta_h. delta_h involves the calculation of sin(Δh/2), where Δh is the angle between the (a*, b*) vectors of the two colors.

Prior to PR #476, this calculation was done in LCHab space. However, the conversion to LCHab space introduces numerical errors in the hue (atand). (PR #495 could also improve the accuracy of DE_2000, but it is not a solution for the delta_h accuracy.)

PR #476 changed the code to calculate delta_h without conversion to LCHab space. (FYI, the same expression is used in CIE94.) However, even with this method, cancellation of significant digits occurs when Δh is small. This is essentially because the method relies on 2 * sin(Δh/2)^2 = 1 - cos(Δh). It would probably be more accurate to calculate sin(Δh/2) from tan(Δh) without going through cos(Δh).
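The cancellation can be reproduced with a small Float32 experiment. The sketch below compares the 1 - cos(Δh) route with one possible alternative that obtains Δh from the cross and dot products via atan; this alternative is only an illustration of the idea and is not necessarily what any PR implements. Both function names are hypothetical.

```julia
# sin(Δh/2) for the angle Δh between the vectors (a1, b1) and (a2, b2), two ways.
# Both helpers are hypothetical illustrations, not Colors.jl code.

# Via cos(Δh), using 2sin(Δh/2)^2 = 1 - cos(Δh); cancels when Δh is small
function sin_half_via_cos(a1::T, b1::T, a2::T, b2::T) where {T<:AbstractFloat}
    cosdh = (a1 * a2 + b1 * b2) / (hypot(a1, b1) * hypot(a2, b2))
    return sqrt(max(zero(T), (1 - cosdh) / 2))
end

# Via the angle from the cross and dot products, avoiding 1 - cos(Δh)
function sin_half_via_atan(a1::T, b1::T, a2::T, b2::T) where {T<:AbstractFloat}
    dh = atan(a1 * b2 - a2 * b1, a1 * a2 + b1 * b2)
    return abs(sin(dh / 2))
end

# Two nearly parallel vectors: Δh is about 1e-4 rad
a1, b1, a2, b2 = 50.0f0, 0.0f0, 50.0f0, 0.005f0

# Float64 reference computed from the same (exactly converted) inputs
ref = sin_half_via_atan(Float64(a1), Float64(b1), Float64(a2), Float64(b2))

println("via cos  (Float32): ", sin_half_via_cos(a1, b1, a2, b2))
println("via atan (Float32): ", sin_half_via_atan(a1, b1, a2, b2))
println("reference (Float64): ", ref)
```

For such a small angle, cos(Δh) differs from 1 by less than the Float32 spacing near 1, so the first route loses essentially all significant digits while the second keeps them.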

Another problem related to delta_h is the rounding error in modifying the a* component (i.e. scaling by (1 + g)). The difference between the a* components of the two colors is important for the delta_h calculation, but currently the modified values are rounded independently for the two colors.

Colors.jl/src/differences.jl

Lines 181 to 182 in bf61512

    
           a_Lab_r = Lab(a_Lab.l, a_Lab.a * (1 + g), a_Lab.b) 
        
           b_Lab_r = Lab(b_Lab.l, b_Lab.a * (1 + g), b_Lab.b)
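The effect of rounding the two scaled a* values independently can be illustrated with a small Float32 experiment. This is a sketch only: the helper names are hypothetical, the a* values are taken from the test data quoted earlier in the thread, and the g values are arbitrary samples from the range 0 to 0.5.

```julia
# Difference of the rescaled a* components in Float32, two ways (hypothetical helpers)

# Scale each color, then subtract: each product is rounded at the magnitude of a*
scaled_then_diff(a1::Float32, a2::Float32, g::Float32) = a2 * (1 + g) - a1 * (1 + g)

# Subtract first, then scale: the product is rounded at the magnitude of the difference
diff_then_scaled(a1::Float32, a2::Float32, g::Float32) = (a2 - a1) * (1 + g)

# Float64 reference computed from the same Float32 inputs
reference(a1::Float32, a2::Float32, g::Float32) = (Float64(a2) - Float64(a1)) * (1 + Float64(g))

a1, a2 = -34.0099f0, -34.1751f0   # a* values from the quoted test data
gs = (0.0017f0, 0.05f0, 0.25f0, 0.4999f0)   # arbitrary sample values of g

for g in gs
    ref = reference(a1, a2, g)
    e1 = abs(scaled_then_diff(a1, a2, g) - ref)
    e2 = abs(diff_then_scaled(a1, a2, g) - ref)
    println("g = ", g, ": scale-then-subtract error ", e1, ", subtract-then-scale error ", e2)
end
```

The absolute error of the first form is bounded by the Float32 spacing near the scaled a* values (around 50 here), whereas the second is bounded by the spacing near the much smaller difference, so the first form can lose more of the digits that delta_h depends on.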

kimikage · 2021-07-01T16:10:14Z

The following is effective against the precompile problem in that it does not call exp, but it did not improve the speed, possibly due to stalling caused by subnormal numbers. 🤔

Edit: I added the workaround for subnormal numbers. The Float64 version will not be adopted because the speed improvement is not significant compared to the total DE_2000 costs.

const DE2000_SINEXP_F32 = [Float32(π/3 * exp(-i)) for i = 0.0:0.25:87.25]
@inline function _de2000_rot(mh::Float32)
    dh2 = ((mh - 275.0f0) * (1.0f0 / 25))^2
    di = reinterpret(UInt32, dh2 + Float32(0x3p20))
    i = di % UInt16 # round(UInt16, dh2 * 4.0)
    i >= UInt16(350) && return 0.0f0 # avoid subnormal numbers
    t = (reinterpret(Float32, di) - Float32(0x3p20)) - dh2 # |t| <= 0.125
    sinexp = @inbounds DE2000_SINEXP_F32[i + 1] # π/3 * exp(-dh2) = (π/3 * exp(-i/4)) * exp(t)
    em1 = @evalpoly(t, 1.0f0, 0.49999988f0, 0.16666684f0, 0.041693877f0, 0.008323605f0) * t
    ex = muladd(sinexp, em1, sinexp)
    ex < eps(0.5f0) && return ex
    sn = @evalpoly(ex^2, -0.16666667f0, 0.008333333f0, -0.00019841234f0, 2.7550889f-6, -2.4529042f-8)
    return muladd(sn * ex, ex^2, ex)
end
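As far as I can tell, the code above relies on two ideas: adding a large constant to round dh2 to a multiple of 0.25 while exposing the integer index in the low bits, and splitting exp(-dh2) into a table entry times a small correction. The sketch below isolates those two steps. It is a simplified reading, not a replacement: the names `MAGIC`, `EXP_TABLE`, `quantize_quarter`, and `exp_neg_split` are hypothetical, and the correction factor uses the built-in exp for clarity, which the original deliberately avoids.

```julia
# 3 * 2^20 lies in [2^21, 2^22), where adjacent Float32 values are 0.25 apart
const MAGIC = Float32(0x3p20)

# exp(-k/4) for k = 0, 1, ..., 349 (hypothetical table, analogous to DE2000_SINEXP_F32)
const EXP_TABLE = [Float32(exp(-k / 4)) for k in 0:349]

# Round x (0 <= x < 87.5) to the nearest multiple of 0.25.
# Returns the index k = round(4x) and the rounded value k/4.
function quantize_quarter(x::Float32)
    shifted = x + MAGIC                         # the addition itself performs the rounding
    k = reinterpret(UInt32, shifted) % UInt16   # low mantissa bits hold round(4x)
    xq = shifted - MAGIC                        # exact: recovers k/4
    return k, xq
end

# exp(-x) = exp(-xq) * exp(xq - x), with |xq - x| <= 0.125
function exp_neg_split(x::Float32)
    k, xq = quantize_quarter(x)
    t = xq - x
    return EXP_TABLE[k + 1] * exp(t)   # the original uses a polynomial instead of exp(t)
end

println(quantize_quarter(1.3f0))
println(exp_neg_split(1.3f0), " vs ", exp(-1.3f0))
```

Because |t| stays within 0.125, a short polynomial in t is enough for the correction factor, which is what lets the original avoid calling exp at all.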

kimikage · 2021-08-02T11:15:37Z

There is still room for this kind of optimization in many places. However, given the relative priorities, I would like to close this issue with PR #506.

This was referenced May 29, 2021

Clean up "src/differences.jl" #476

Merged

Consideration of workarounds for cbrt and exp precompilation problems #425

Closed

Source of the YIQ transform matrix #481

Closed

kimikage mentioned this issue Jun 16, 2021

Improve Lab<-->XYZ conversions #486

Merged

kimikage closed this as completed Jun 20, 2021

kimikage reopened this Jun 20, 2021

This was referenced Jun 20, 2021

Use Float32 to calculate DE_2000 for low precision colors #494

Merged

Use Float32 for the matrix coefficients for low precision colors #482

Merged

kimikage mentioned this issue Jul 19, 2021

Improve accuracy of delta_h for small differences #501

Merged

kimikage mentioned this issue Aug 2, 2021

Optimize distinguishable_colors #506

Merged

kimikage closed this as completed in #506 Aug 2, 2021
