Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make test-complex fails on 32-bit Linux (Heisenbug?) #10027

Closed
rickhg12hs opened this issue Feb 2, 2015 · 29 comments
Closed

make test-complex fails on 32-bit Linux (Heisenbug?) #10027

rickhg12hs opened this issue Feb 2, 2015 · 29 comments
Labels
system:32-bit Affects only 32-bit systems

Comments

@rickhg12hs
Copy link
Contributor

$ make test-complex
    JULIA test/complex
     * complex             exception on 1: ERROR: LoadError: test failed: (1.0 + NaN*im == 1.0 + 1.0im)
 in expression: round(Complex(1.5,0.5),RoundDown,RoundUp) == Complex(1.0,1.0)
 in error at error.jl:19
 in default_handler at test.jl:27
 in do_test at test.jl:50
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:78
 in anonymous at multi.jl:641
 in run_work_thunk at multi.jl:602
 in remotecall_fetch at multi.jl:675
 in remotecall_fetch at multi.jl:690
 in anonymous at task.jl:1644
while loading complex.jl, in expression starting on line 752
ERROR: LoadError: LoadError: test failed: (1.0 + NaN*im == 1.0 + 1.0im)
 in expression: round(Complex(1.5,0.5),RoundDown,RoundUp) == Complex(1.0,1.0)
 in error at error.jl:19
 in default_handler at test.jl:27
 in do_test at test.jl:50
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:78
 in anonymous at multi.jl:641
 in run_work_thunk at multi.jl:602
 in remotecall_fetch at multi.jl:675
 in remotecall_fetch at multi.jl:690
 in anonymous at task.jl:1644
while loading complex.jl, in expression starting on line 752
while loading /usr/local/src/julia/julia/test/runtests.jl, in expression starting on line 42

make[1]: *** [complex] Error 1
make: *** [test-complex] Error 2
$ ./julia -e 'versioninfo()'
Julia Version 0.4.0-dev+3062
Commit a6ae3f1* (2015-02-02 19:40 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Banias)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

I poked around a bit and noted some curious symptoms. Are some intermediary results getting clobbered somewhere?

julia> round(Complex(1.5,0.5),RoundDown,RoundUp)
1.0 + NaN*im

julia> round(Complex(1.5,0.5))
2.0 + NaN*im

julia> @which round(Complex(1.5,0.5))
round(z::Complex{T<:Real}) at complex.jl:701

julia> @less round(Complex(1.5,0.5))
round(z::Complex) = Complex(round(real(z)), round(imag(z)))
@vectorize_1arg Complex round

function round(z::Complex, digits::Integer, base::Integer=10)
    Complex(round(real(z), digits, base),
            round(imag(z), digits, base))
end

float{T<:FloatingPoint}(z::Complex{T}) = z
float(z::Complex) = Complex(float(real(z)), float(imag(z)))
@vectorize_1arg Complex float

~
~

julia> function my_round(z::Complex)
           rrz = round(real(z))
           riz = round(imag(z))
           Complex(rrz, riz)
       end
my_round (generic function with 1 method)

julia> my_round(Complex(1.5,0.5))
2.0 + NaN*im

julia> function my_round2(z::Complex)
           @show rrz = round(real(z))
           @show riz = round(imag(z))
           Complex(rrz, riz)
       end
my_round2 (generic function with 1 method)

julia> my_round2(Complex(1.5,0.5))
rrz = round(real(z)) = 2.0
riz = round(imag(z)) = 0.0
2.0 + 0.0im

julia> 
@jiahao
Copy link
Member

jiahao commented Feb 2, 2015

I have no idea why this error occurs. The tests were newly introduced by #8291; they appear to be deeper bugs that happen to be exposed by the complex rounding code.

@jiahao jiahao added the system:32-bit Affects only 32-bit systems label Feb 2, 2015
@jiahao
Copy link
Member

jiahao commented Feb 2, 2015

Out of curiosity, what happens if you dump the bits for the NaN value?

function my_round(z::Complex)
           rrz = round(real(z))
           riz = round(imag(z))
           @show bits(rrz), bits(riz)
           Complex(rrz, riz)
       end

@rickhg12hs
Copy link
Contributor Author

@jiahao : Heisenbugginess

julia> function my_round(z::Complex)
                  rrz = round(real(z))
                  riz = round(imag(z))
                  @show bits(rrz), bits(riz)
                  Complex(rrz, riz)
              end
my_round (generic function with 1 method)

julia> my_round(Complex(1.5,0.5))
(bits(rrz),bits(riz)) = ("0100000000000000000000000000000000000000000000000000000000000000","0000000000000000000000000000000000000000000000000000000000000000")
2.0 + 0.0im

@jiahao
Copy link
Member

jiahao commented Feb 2, 2015

Memory access error? stumped

@rickhg12hs
Copy link
Contributor Author

Maybe help?

julia> function my_round2(z::Complex)
                  rrz = round(real(z))
                  riz = round(imag(z))
                  Complex(rrz, riz)
             end
my_round2 (generic function with 1 method)

julia> my_round2(Complex(1.5,0.5))
2.0 + NaN*im

julia> bits(imag(my_round2(Complex(1.5,0.5))))
"1111111111111000000000000000000000000000000000000000000000000000"

@jiahao
Copy link
Member

jiahao commented Feb 2, 2015

I was wondering if the NaN payload could tell us anything, but it's all zeroed.

@rickhg12hs
Copy link
Contributor Author

Here's another look:

julia> function my_round5(z::Complex)
                  rrz = round(real(z))
                  riz = round(imag(z))
                  my_cmplx = zero(Complex)
                  my_cmplx += rrz
                  my_cmplx += riz*im
                  my_cmplx
       end
my_round5 (generic function with 1 method)

julia> my_round5(Complex(1.5,0.5))
2.0 + 0.0im

@rickhg12hs
Copy link
Contributor Author

Does this narrow it down to the constructor?

julia> function my_round6(z::Complex)
                  rrz = round(real(z))
                  riz = round(imag(z))
                  rrz + im*riz
       end
my_round6 (generic function with 1 method)

julia> my_round6(Complex(1.5,0.5))
2.0 + 0.0im

@jiahao
Copy link
Member

jiahao commented Feb 2, 2015

I believe so. @StefanKarpinski any ideas?

@jiahao
Copy link
Member

jiahao commented Feb 2, 2015

How about trying the inner constructor?

function my_round{T}(z::Complex{T})
                  rrz = round(real(z))
                  riz = round(imag(z))
                  Complex{T}(rrz, riz)
       end

@rickhg12hs
Copy link
Contributor Author

julia> function my_round{T}(z::Complex{T})
                         rrz = round(real(z))
                         riz = round(imag(z))
                         Complex{T}(rrz, riz)
              end
my_round (generic function with 1 method)

julia> my_round(Complex(1.5,0.5))
2.0 + NaN*im

tkelman referenced this issue Feb 3, 2015
@staticfloat
Copy link
Member

There seems to be something weird going on from the code_typed -> code_llvm step:

julia> @code_typed my_round(Complex(1.5,0.5))
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:z], Any[Any[:rrz,:riz],Any[Any[:z,Complex{Float64},0],Any[:rrz,Float64,18],Any[:riz,Float64,18]],Any[],Any[]], :(begin  # none, line 2:
        rrz = (top(box))(Float64,(top(rint_llvm))((top(getfield))(z::Complex{Float64},:re)::Float64)) # line 3:
        riz = (top(box))(Float64,(top(rint_llvm))((top(getfield))(z::Complex{Float64},:im)::Float64)) # line 4:
        return $(Expr(:new, Complex{Float64}, :(rrz::Float64), :(riz::Float64)))
    end::Complex{Float64}))))

julia> @code_llvm my_round(Complex(1.5,0.5))

define %Complex.5 @julia_my_round_43277(%Complex.5) {
top:
  %1 = extractvalue %Complex.5 %0, 0, !dbg !1032
  %2 = call double @llvm.rint.f64(double %1), !dbg !1032
  %3 = extractvalue %Complex.5 %0, 1, !dbg !1034
  %4 = call double @llvm.rint.f64(double %3), !dbg !1034
  %5 = insertvalue %Complex.5 undef, double %2, 0, !dbg !1035
  %6 = insertvalue %Complex.5 %5, double %4, 1, !dbg !1035, !julia_type !1036
  ret %Complex.5 %6, !dbg !1035
}

That undef looks awfully suspicious. @vtjnash, @Keno any idea on how I can debug why this undef is appearing? This is a pretty bad issue, as it essentially breaks the 32-bit build.

@mbauman
Copy link
Member

mbauman commented Feb 4, 2015

That's just fine — it's the standard idiom for initializing a new aggregate type in LLVM. See the reference for insertvalue, particularly the example. You start with an object full of undefs and iteratively fill its elements.

@vtjnash
Copy link
Member

vtjnash commented Feb 4, 2015

Complex numbers don't round trip correctly to C. That will be fixed very soon in my PR jn/ccall3

@staticfloat
Copy link
Member

We're not calling a C function with complex numbers though, are we?

@jiahao
Copy link
Member

jiahao commented Feb 4, 2015

I don't believe so. Any C function calls are on the real and imaginary parts separately.

@vtjnash
Copy link
Member

vtjnash commented Feb 5, 2015

i'm getting invalid native code if i run this test with a sysimg, but seem to be getting correct results if a sysimg is not involved.

however, i think this might be fixed in llvm3.5

@nalimilan
Copy link
Member

FWIW it happens with LLVM 3.3, 3.4 and 3.5 when building Fedora packages. E.g with 3.5:
https://copr-be.cloud.fedoraproject.org/results/nalimilan/julia-nightlies/fedora-rawhide-i386/julia-0.4.0-0.20150205.fc21/build.log

@tkelman
Copy link
Contributor

tkelman commented Feb 9, 2015

We haven't had new nightlies for a week because of this. I think we should revert #8291 until either we figure out how to do it without breaking 32 bit, or we've upgraded LLVM and hopefully it's fixed there.

@rickhg12hs
Copy link
Contributor Author

This is a bit slower and uses a bit more RAM, but works on my 32-bit Fedora.

Maybe a temporary workaround?

julia> @time round(Complex(1.5,0.5))
elapsed time: 2.3746e-5 seconds (100 bytes allocated)
2.0 + NaN*im

julia> @time round(Complex(1.5,0.5))
elapsed time: 2.3956e-5 seconds (100 bytes allocated)
2.0 + NaN*im

julia> @time round(Complex(1.5,0.5))
elapsed time: 3.0242e-5 seconds (100 bytes allocated)
2.0 + NaN*im

julia> my_round(z::Complex) = Complex(map(round,reim(z))...)   # <----- New Function Def
my_round (generic function with 1 method)

julia> @time my_round(Complex(1.5,0.5))
elapsed time: 0.006331542 seconds (19 kB allocated)
2.0 + 0.0im

julia> @time my_round(Complex(1.5,0.5))
elapsed time: 2.9893e-5 seconds (148 bytes allocated)
2.0 + 0.0im

julia> @time my_round(Complex(1.5,0.5))
elapsed time: 2.7448e-5 seconds (148 bytes allocated)
2.0 + 0.0im

julia> @time my_round(Complex(1.5,0.5))
elapsed time: 2.8426e-5 seconds (148 bytes allocated)
2.0 + 0.0im

julia> @time my_round(Complex(1.5,0.5))
elapsed time: 2.6959e-5 seconds (148 bytes allocated)
2.0 + 0.0im

@staticfloat
Copy link
Member

@rickhg12hs would you be able to put together a PR that I could run on my buildbots to see if it does indeed fix the problem?

@vtjnash
Copy link
Member

vtjnash commented Feb 9, 2015

That'll fix it, but only because it very effectively defeats llvm optimization. Simply preventing inlining of round would have the same effect

@rickhg12hs
Copy link
Contributor Author

@staticfloat : PR made. #10145

@vtjnash : Played with @noinline but clearly I don't know what it does and couldn't seem to use it (blind inclusion) to fix this issue.

@vtjnash
Copy link
Member

vtjnash commented Feb 10, 2015

would probably look something like:

@noinline noinline_round(x...) = round(x...)
round(c::Complex) = Complex(noinline_round(real(c)), noinline_round(imag(c)))

i realize that's a bit awkward, but annotating inline/noinline at the call site isn't actually supported yet

staticfloat added a commit that referenced this issue Feb 10, 2015
…ound

Update base/complex.jl for Complex round (Fix/Workaround #10027)
@JeffBezanson
Copy link
Member

+1 to @vtjnash 's suggestion. That would be a much better workaround. Even better if it is only done on 32-bit systems where the problem exists.

@rickhg12hs
Copy link
Contributor Author

Since this is "working" now, should it be closed?

Comment in code mentions it's a workaround. Will this prompt a revisit with new LLVM version?

Is there a "workaround" issue tag that could be applied to prompt future look.

@ViralBShah
Copy link
Member

Perhaps we should have a revisit label.

@tkelman
Copy link
Contributor

tkelman commented Dec 4, 2015

This probably works better now that we restrict to >= pentium4

@vtjnash
Copy link
Member

vtjnash commented May 20, 2016

this was fixed with the sret fixes some time back. I'll remove the workaround code in #16219

@vtjnash vtjnash closed this as completed May 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:32-bit Affects only 32-bit systems
Projects
None yet
Development

No branches or pull requests

9 participants