Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stegr! call segfault #42415

Closed
ianatol opened this issue Sep 28, 2021 · 11 comments · Fixed by #42546
Closed

stegr! call segfault #42415

ianatol opened this issue Sep 28, 2021 · 11 comments · Fixed by #42546
Labels
domain:ci Continuous integration domain:linear algebra Linear algebra kind:bug Indicates an unexpected problem or unintended behavior security System security concerns and vulnerabilities
Milestone

Comments

@ianatol
Copy link
Member

ianatol commented Sep 28, 2021

@DilumAluthge alerted us to segfaults that were occurring on some machines during CI builds while running the LinearAlgebra/tridiag test. Using rr, we were able to trace these segfaults back to a call to stegr! in lapack.jl

function stegr!(jobz::AbstractChar, range::AbstractChar, dv::AbstractVector{$elty}, ev::AbstractVector{$elty}, vl::Real, vu::Real, il::Integer, iu::Integer)

While we ran out of time to fully debug and fix this, we have a MWE that replicates the problem on AMD machines.

Here's the MWE:

using LinearAlgebra
dv = Float32[-0.854084671, 1.2590245, -1.31365335, 0.7283687, -0.87726289, 2.06376457, 1.65424788, 
         -0.640322149, -0.925165355, -0.857617259, -0.192993864, -0.79191643]
ev = Float32[0.215433806, 0.260803878, -1.3471483, -0.799203992, 1.31649506, 0.0412902869, 
         -0.639533341, 0.039149601, 0.485840291, 1.54775488, -0.503408968, 0]
LAPACK.stegr!('V', 'V', copy(dv), copy(ev), 0.0f0, 1.0f0, 0, 0)

CC: @Keno

@Keno
Copy link
Member

Keno commented Sep 30, 2021

Hmm, my local reproducer turned out to have been OpenBLAS 0.3.13, not 0.3.17, where I'm having trouble reproducing. The rr traces we have are with 0.3.17 though, so it may not be as simple as just this call on AMD.

@Keno
Copy link
Member

Keno commented Sep 30, 2021

Ok, looks like OpenBLAS is doing an uninitialized read from the work array and using that as an array index bound, which explains, why the reproduction here is sporadic (depends on what's in the work array). For better reproduction, fill the work array with deadbeef first.

julia> (stev, stebz, stegr, stein, elty) = (:sstev_,:sstebz_,:sstegr_,:sstein_,:Float32)
julia> @eval  LinearAlgebra.LAPACK function stegr!(jobz::AbstractChar, range::AbstractChar, dv::AbstractVector{$elty}, ev::AbstractVector{$elty}, vl::Real, vu::Real, il::Integer, iu::Integer)
                   require_one_based_indexing(dv, ev)
                   chkstride1(dv, ev)
                   n = length(dv)
                   ne = length(ev)
                   if ne == n - 1
                       eev = [ev; zero($elty)]
                   elseif ne == n
                       eev = copy(ev)
                       eev[n] = zero($elty)
                   else
                       throw(DimensionMismatch("ev has length $ne but needs one less than or equal to dv's length, $n)"))
                   end

                   abstol = Vector{$elty}(undef, 1)
                   m = Ref{BlasInt}()
                   w = similar(dv, $elty, n)
                   ldz = jobz == 'N' ? 1 : n
                   Z = similar(dv, $elty, ldz, range == 'I' ? iu-il+1 : n)
                   isuppz = similar(dv, BlasInt, 2*size(Z, 2))
                   work = Vector{$elty}(undef, 1)
                   lwork = BlasInt(-1)
                   iwork = Vector{BlasInt}(undef, 1)
                   liwork = BlasInt(-1)
                   info = Ref{BlasInt}()
                   for i = 1:2  # first call returns lwork as work[1] and liwork as iwork[1]
                       ccall((@blasfunc($stegr), libblastrampoline), Cvoid,
                           (Ref{UInt8}, Ref{UInt8}, Ref{BlasInt}, Ptr{$elty},
                           Ptr{$elty}, Ref{$elty}, Ref{$elty}, Ref{BlasInt},
                           Ref{BlasInt}, Ptr{$elty}, Ptr{BlasInt}, Ptr{$elty},
                           Ptr{$elty}, Ref{BlasInt}, Ptr{BlasInt}, Ptr{$elty},
                           Ref{BlasInt}, Ptr{BlasInt}, Ref{BlasInt}, Ptr{BlasInt},
                           Clong, Clong),
                           jobz, range, n, dv,
                           eev, vl, vu, il,
                           iu, abstol, m, w,
                           Z, ldz, isuppz, work,
                           lwork, iwork, liwork, info,
                           1, 1)
                       chklapackerror(info[])
                       if i == 1
                           lwork = BlasInt(work[1])
                           resize!(work, lwork); fill!(work, BlasInt(0xdeadbeef))
                           liwork = iwork[1]
                           resize!(iwork, liwork); fill!(iwork, BlasInt(0xdeadbeef))
                       end
                   end
                   m[] == length(w) ? w : w[1:m[]], m[] == size(Z, 2) ? Z : Z[:,1:m[]]
               end

Keno added a commit to JuliaComputing/lapack that referenced this issue Sep 30, 2021
This was originally reported as JuliaLang/julia#42415.
I've tracked this down to an our of bounds read on the following line:

https://github.com/Reference-LAPACK/lapack/blob/44ecb6a5ff821b1cbb39f8cc2166cb098e060b4d/SRC/slarrv.f#L423

In the crashing example, `M` is `0`, causing `slarrv` to read uninitialized
memory from the work array. I believe the `0` for `M` is correct and indeed,
the documentation above supports that `M` may be zero:

https://github.com/Reference-LAPACK/lapack/blob/44ecb6a5ff821b1cbb39f8cc2166cb098e060b4d/SRC/slarrv.f#L113-L116

I believe it may be sufficient to early-out this function as suggested
in this PR. However, I have limited context for the full routine here,
so I would appreciate a sanity check.
@Keno
Copy link
Member

Keno commented Sep 30, 2021

I have a proposed fix in Reference-LAPACK/lapack#625. @andreasnoack if you have more context for what this code does perhaps you could review upstream.

Keno added a commit to JuliaComputing/lapack that referenced this issue Sep 30, 2021
This was originally reported as JuliaLang/julia#42415.
I've tracked this down to an our of bounds read on the following line:

https://github.com/Reference-LAPACK/lapack/blob/44ecb6a5ff821b1cbb39f8cc2166cb098e060b4d/SRC/slarrv.f#L423

In the crashing example, `M` is `0`, causing `slarrv` to read uninitialized
memory from the work array. I believe the `0` for `M` is correct and indeed,
the documentation above supports that `M` may be zero:

https://github.com/Reference-LAPACK/lapack/blob/44ecb6a5ff821b1cbb39f8cc2166cb098e060b4d/SRC/slarrv.f#L113-L116

I believe it may be sufficient to early-out this function as suggested
in this PR. However, I have limited context for the full routine here,
so I would appreciate a sanity check.
@JeffBezanson JeffBezanson added domain:ci Continuous integration domain:linear algebra Linear algebra labels Sep 30, 2021
Keno added a commit to JuliaComputing/lapack that referenced this issue Sep 30, 2021
This was originally reported as JuliaLang/julia#42415.
I've tracked this down to an our of bounds read on the following line:

https://github.com/Reference-LAPACK/lapack/blob/44ecb6a5ff821b1cbb39f8cc2166cb098e060b4d/SRC/slarrv.f#L423

In the crashing example, `M` is `0`, causing `slarrv` to read uninitialized
memory from the work array. I believe the `0` for `M` is correct and indeed,
the documentation above supports that `M` may be zero:

https://github.com/Reference-LAPACK/lapack/blob/44ecb6a5ff821b1cbb39f8cc2166cb098e060b4d/SRC/slarrv.f#L113-L116

I believe it may be sufficient to early-out this function as suggested
in this PR. However, I have limited context for the full routine here,
so I would appreciate a sanity check.
@Keno
Copy link
Member

Keno commented Oct 1, 2021

This is fixed upstream now, if somebody wants to integrate the patch in our OpenBLAS build :)

@DilumAluthge DilumAluthge added the kind:bug Indicates an unexpected problem or unintended behavior label Oct 7, 2021
@DilumAluthge
Copy link
Member

@Keno Do we know how long this bug has been in the upstream? Once we've integrated the patch into our build, do we need to backport this to 1.6 and/or 1.7?

@Keno
Copy link
Member

Keno commented Oct 7, 2021

The particular utility function has had this bug since it's introduction, but I do not know whether changes upstream or to the tests would have caused this to show up more frequently. It should certainly be safe to backport.

@DilumAluthge
Copy link
Member

This is fixed upstream now, if somebody wants to integrate the patch in our OpenBLAS build :)

Okay, here is my attempt to add the patch to Ygg: JuliaPackaging/Yggdrasil#3708

@vchuravy
Copy link
Sponsor Member

vchuravy commented Oct 9, 2021

@nsslh
Copy link

nsslh commented Dec 15, 2021

Please consider adding the security label, for CVE-2021-4048.

@oscardssmith
Copy link
Member

the link is broken.

@oscardssmith oscardssmith added the security System security concerns and vulnerabilities label Dec 15, 2021
@KristofferC
Copy link
Sponsor Member

It's a GitHub auto-generated link.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:ci Continuous integration domain:linear algebra Linear algebra kind:bug Indicates an unexpected problem or unintended behavior security System security concerns and vulnerabilities
Projects
None yet
8 participants