World age error from LoopVectorization v0.10.0 with Tullio and MPI on Julia v1.5.3 #192

Closed
ranocha opened this issue Jan 26, 2021 · 9 comments · Fixed by JuliaArrays/ArrayInterface.jl#114


@ranocha
Member

ranocha commented Jan 26, 2021

We get a world age error from LoopVectorization v0.10.0 in combination with Tullio and MPI in Trixi.jl on Ubuntu and Windows; see trixi-framework/Trixi.jl#423. The relevant part of the error message (from https://github.com/trixi-framework/Trixi.jl/pull/423/checks?check_run_id=1765905078#step:6:333) is:

The applicable method may be too new: running in world age 31903, while current world is 56333.
Closest candidates are:
  register_size() at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:51 (method too new to be called from this world context.)
  register_size(!Matched::Type{T}) where T<:Union{Signed, Unsigned} at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:3 (method too new to be called from this world context.)
  register_size(!Matched::Type{T}) where T at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:2 (method too new to be called from this world context.)
Stacktrace:
  [1] dynamic_integer_register_size() at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:38
  [2] #s201#30 at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:49 [inlined]
  [3] #s201#30(::Any) at ./none:0
  [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
  [5] simd_integer_register_size() at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:53
  [6] __pick_vector_width(::Int64, ::Int64, ::Any) at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:39
  [7] __pick_vector_width(::Int64, ::Int64, ::Any, ::Any, ::Type{T} where T) at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:47
  [8] _pick_vector_width(::Type{T} where T, ::Vararg{Type{T} where T,N} where N) at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:53
  [9] #s201#33 at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:73 [inlined]
  [10] #s201#33(::Any, ::Any) at ./none:0
  [11] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
  [12] 𝒜𝒸𝓉! at ~/.julia/packages/Tullio/u3myB/src/macro.jl:1090 [inlined]
  [13] 𝒜𝒸𝓉! at ~/.julia/packages/Tullio/u3myB/src/macro.jl:1087 [inlined]
  [14] threader at ~/.julia/packages/Tullio/u3myB/src/threads.jl:48 [inlined]
  [15] macro expansion at ~/.julia/packages/Tullio/u3myB/src/macro.jl:1000 [inlined]

A minimal working example can be created as follows. Save

using Test
using MPI
using LoopVectorization, Tullio

@test !MPI.Initialized()
MPI.Init()
@test MPI.Initialized()

function foo!(C, A, B)
  @tullio C[i,j] = A[i,k] * B[k,j]
  return nothing
end

A = rand(10^2, 10^2);
B = rand(10^2, 10^2);
C = similar(A);
foo!(C, A, B)
successful = C ≈ A * B

all_successful = MPI.Allreduce(Int(successful), +, MPI.COMM_WORLD) == MPI.Comm_size(MPI.COMM_WORLD)
if MPI.Comm_rank(MPI.COMM_WORLD) == 0
  @show all_successful
end

@test !MPI.Finalized()
MPI.Finalize()
@test MPI.Finalized()

as example.jl. Set up a Julia project with

julia> using Pkg; Pkg.status()
Status `/tmp/tmp_test/Project.toml`
  [bdcacae8] LoopVectorization v0.10.0
  [da04e1cc] MPI v0.16.1
  [bc48ee85] Tullio v0.2.11

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_MPI_BINARY = system
  JULIA_NUM_THREADS = 6

and run

julia> using MPI: mpiexec

julia> mpiexec() do cmd
           run(`$cmd -n 2 $(Base.julia_cmd()) --compiled-modules=no --threads=1 example.jl`)
       end
ERROR: LoadError: MethodError: no method matching register_size()
The applicable method may be too new: running in world age 29617, while current world is 34251.
Closest candidates are:
  register_size() at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:51 (method too new to be called from this world context.)
  register_size(!Matched::Type{T}) where T<:Union{Signed, Unsigned} at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:3 (method too new to be called from this world context.)
  register_size(!Matched::Type{T}) where T at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:2 (method too new to be called from this world context.)
Stacktrace:
 [1] dynamic_integer_register_size() at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:38
 [2] #s453#30 at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:49 [inlined]
 [3] #s453#30(::Any) at ./none:0
 [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
 [5] simd_integer_register_size() at ~/.julia/packages/VectorizationBase/eE3lL/src/cpu_info.jl:53
 [6] __pick_vector_width(::Int64, ::Int64, ::Any) at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:39
 [7] __pick_vector_width(::Int64, ::Int64, ::Any, ::Any, ::Type{T} where T) at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:47
 [8] _pick_vector_width(::Type{T} where T, ::Vararg{Type{T} where T,N} where N) at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:53
 [9] #s453#33 at ~/.julia/packages/VectorizationBase/eE3lL/src/vector_width.jl:73 [inlined]
 [10] #s453#33(::Any, ::Any) at ./none:0
 [11] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
 [12] 𝒜𝒸𝓉! at ~/.julia/packages/Tullio/u3myB/src/macro.jl:1090 [inlined]
 [13] tile_halves(::var"#𝒜𝒸𝓉!#1", ::Type{Array{Float64,2}}, ::Tuple{Array{Float64,2},Array{Float64,2},Array{Float64,2}}, ::Tuple{UnitRange{Int64},UnitRange{Int64}}, ::Tuple{UnitRange{Int64}}, ::Nothing, ::Nothing) at ~/.julia/packages/Tullio/u3myB/src/threads.jl:139
 [14] tile_halves(::var"#𝒜𝒸𝓉!#1", ::Type{Array{Float64,2}}, ::Tuple{Array{Float64,2},Array{Float64,2},Array{Float64,2}}, ::Tuple{UnitRange{Int64},UnitRange{Int64}}, ::Tuple{UnitRange{Int64}}, ::Nothing, ::Nothing) at ~/.julia/packages/Tullio/u3myB/src/threads.jl:142 (repeats 2 times)
 [15] tile_halves(::var"#𝒜𝒸𝓉!#1", ::Type{Array{Float64,2}}, ::Tuple{Array{Float64,2},Array{Float64,2},Array{Float64,2}}, ::Tuple{UnitRange{Int64},UnitRange{Int64}}, ::Tuple{UnitRange{Int64}}, ::Nothing, ::Bool) at ~/.julia/packages/Tullio/u3myB/src/threads.jl:146
 [16] tile_halves at ~/.julia/packages/Tullio/u3myB/src/threads.jl:136 [inlined]
 [17] threader at ~/.julia/packages/Tullio/u3myB/src/threads.jl:65 [inlined]
 [18] macro expansion at ~/.julia/packages/Tullio/u3myB/src/macro.jl:1000 [inlined]
 [19] foo!(::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,2}) at /tmp/tmp_test/example.jl:11
 [20] top-level scope at /tmp/tmp_test/example.jl:18
 [21] include(::Function, ::Module, ::String) at ./Base.jl:380
 [22] include(::Module, ::String) at ./Base.jl:368
 [23] exec_options(::Base.JLOptions) at ./client.jl:296
 [24] _start() at ./client.jl:506
in expression starting at /tmp/tmp_test/example.jl:18
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[20616,1],1]
  Exit code:    1
--------------------------------------------------------------------------
ERROR: failed process: Process(`mpiexec -n 2 ~/Software/julia-1.5.3/bin/julia -Cnative -J~/Software/julia-1.5.3/lib/julia/sys.so -g1 --compiled-modules=no --threads=1 example.jl`, ProcessExited(1)) [1]

Stacktrace:
 [1] pipeline_error at ./process.jl:525 [inlined]
 [2] run(::Cmd; wait::Bool) at ./process.jl:440
 [3] run at ./process.jl:438 [inlined]
 [4] (::var"#1#2")(::Cmd) at ./REPL[4]:2
 [5] (::MPI.var"#26#27"{var"#1#2"})(::Cmd) at ~/.julia/packages/MPI/b7MVG/src/environment.jl:25
 [6] _mpiexec at ~/.julia/packages/MPI/b7MVG/deps/deps.jl:6 [inlined]
 [7] mpiexec(::var"#1#2") at ~/.julia/packages/MPI/b7MVG/src/environment.jl:25
 [8] top-level scope at REPL[4]:1
@ranocha ranocha changed the title World age error from LoopVectorization v0.10.0 with Tullio and MPI World age error from LoopVectorization v0.10.0 with Tullio and MPI on Julia v1.5.3 Jan 26, 2021
@ranocha
Member Author

ranocha commented Jan 26, 2021

Running the same steps with Julia v1.6.0-beta1 does not throw any error, so this will be fine for us in the future. Since this is a rather special use case and we will switch to Julia v1.6 once it's officially released, we can postpone upgrading LoopVectorization for a bit. Feel free to close this issue.

@chriselrod
Member

I get a similar error on 1.7.

@chriselrod
Member

chriselrod commented Jan 26, 2021

Do you need --compiled-modules=no?
Without it, it passes:

julia> using MPI: mpiexec

julia> mpiexec() do cmd
           run(`$cmd -n 2 $(Base.julia_cmd()) --threads=1 /home/chriselrod/Documents/progwork/julia/loopvectests/mpi.jl`)
       end
all_successful = true
Process(`/home/chriselrod/.julia/artifacts/3acc381f6eb6cae155dc415de8036910624a278c/bin/mpiexec -n 2 /home/chriselrod/Documents/languages/julia-polly/usr/bin/julia -Cnative -J/home/chriselrod/Documents/languages/julia-polly/usr/lib/julia/sys.so -O3 -g1 --threads=1 /home/chriselrod/Documents/progwork/julia/loopvectests/mpi.jl`, ProcessExited(0))

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
  JULIA_NUM_THREADS = auto

With it, a simpler reproducible example is just

using VectorizationBase
include(joinpath(pkgdir(VectorizationBase), "test", "runtests.jl"))

Full error:

# > julia -O3 -q --compiled-modules=no
julia> using VectorizationBase

julia> include(joinpath(pkgdir(VectorizationBase), "test", "runtests.jl"))
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  uname: Linux 5.10.9-1016.native #1 SMP Tue Jan 19 15:04:46 PST 2021 x86_64 unknown
  CPU: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz:
                 speed         user         nice          sys         idle          irq
       #1-20  3999 MHz    2152869 s       2420 s     452123 s  321536482 s      75879 s

  Memory: 31.043872833251953 GB (16618.921875 MB free)
  Uptime: 162194.0 sec
  Load Avg:  1.185546875  1.19091796875  1.26513671875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
  JULIA_NUM_THREADS = auto
  CFLAGS = -O3 -march=native -mprefer-vector-width=512 -feliminate-unused-debug-types  -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -m64  -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z -Wl,now -Wl,-z -Wl,relro -fno-semantic-interposition -ffat-lto-objects  -fno-signed-zeros -fno-trapping-math  -fassociative-math -Wl,-sort-common -Wl,--enable-new-dtags
  CLASSPATH = /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/mpi.jar:/opt/intel/compilers_and_libraries_2019.4.243/linux/daal/lib/daal.jar
  CPATH = /opt/intel/compilers_and_libraries_2019.4.243/linux/ipp/include:/opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/include:/opt/intel/compilers_and_libraries_2019.4.243/linux/pstl/include:/opt/intel/compilers_and_libraries_2019.4.243/linux/tbb/include:/opt/intel/compilers_and_libraries_2019.4.243/linux/tbb/include:/opt/intel/compilers_and_libraries_2019.4.243/linux/daal/include
  CXXFLAGS = -O3 -march=native -mprefer-vector-width=512 -feliminate-unused-debug-types  -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -m64  -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z -Wl,now -Wl,-z -Wl,relro -fno-semantic-interposition -ffat-lto-objects  -fno-signed-zeros -fno-trapping-math  -fassociative-math -Wl,-sort-common -Wl,--enable-new-dtags -fvisibility-inlines-hidden -Wl,--enable-new-dtags
  FCFLAGS = -Ofast -march=native -mprefer-vector-width=512 -feliminate-unused-debug-types  -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -m64  -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z -Wl,now -Wl,-z -Wl,relro -fno-semantic-interposition -ffat-lto-objects  -fno-signed-zeros -fno-trapping-math  -fassociative-math -Wl,-sort-common -Wl,--enable-new-dtags
  FFLAGS = -Ofast -march=native -mprefer-vector-width=512 -feliminate-unused-debug-types  -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -m64  -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z -Wl,now -Wl,-z -Wl,relro -fno-semantic-interposition -ffat-lto-objects  -fno-signed-zeros -fno-trapping-math  -fassociative-math -Wl,-sort-common -Wl,--enable-new-dtags
  FI_PROVIDER_PATH = /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib/prov
  HOME = /home/chriselrod
  LA_PATH = /usr/lib64/
  LD_LIBRARY_PATH = /opt/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib:/opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/release:/opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib:/opt/intel/compilers_and_libraries_2019.4.243/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin:/opt/intel/compilers_and_libraries_2019.4.243/linux/tbb/lib/intel64/gcc4.1:/opt/intel/compilers_and_libraries_2019.4.243/linux/tbb/lib/intel64/gcc4.1:/opt/intel/compilers_and_libraries_2019.4.243/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2019.4.243/linux/daal/../tbb/lib/intel64_lin/gcc4.4
  LIBRARY_PATH = /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib:/opt/intel/compilers_and_libraries_2019.4.243/linux/ipp/lib/intel64:/opt/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin:/opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin:/opt/intel/compilers_and_libraries_2019.4.243/linux/tbb/lib/intel64/gcc4.1:/opt/intel/compilers_and_libraries_2019.4.243/linux/tbb/lib/intel64/gcc4.1:/opt/intel/compilers_and_libraries_2019.4.243/linux/daal/lib/intel64_lin:/opt/intel/compilers_and_libraries_2019.4.243/linux/daal/../tbb/lib/intel64_lin/gcc4.4
  MANPATH = /opt/intel/man/common:/opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/man:/usr/local/share/man:/usr/share/man:/usr/man
  MPI_PATH = /usr/lib64/
  NLSPATH = /opt/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64/locale/%l_%t/%N:/opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin/locale/%l_%t/%N
  PATH = /home/chriselrod/miniconda3/bin:/home/chriselrod/miniconda3/condabin:/opt/intel/compilers_and_libraries_2019.4.243/linux/bin/intel64:/opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/bin:/opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin:/usr/bin/haswell/avx512_1:/usr/bin/haswell:/usr/local/bin:/usr/local/sbin:/usr/bin:/opt/3rd-party/bin
  PKG_CONFIG_PATH = /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/bin/pkgconfig
  TERM = screen
  THEANO_FLAGS = floatX=float32,openmp=true,gcc.cxxflags="-ftree-vectorize -mavx"
  WINDOWPATH = 2
  FONTCONFIG_PATH = /usr/share/defaults/fonts
  CMDSTAN_HOME = /home/chriselrod/Documents/languages/cmdstan
  R_HOME = /usr/lib64/R
ERROR: LoadError: LoadError: MethodError: no method matching register_size()
The applicable method may be too new: running in world age 31504, while current world is 34093.
Closest candidates are:
  register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:68 (method too new to be called from this world context.)
  register_size(::Type{T}) where T<:Union{Signed, Unsigned} at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:3 (method too new to be called from this world context.)
  register_size(::Type{T}) where T at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:2 (method too new to be called from this world context.)
Stacktrace:
 [1] dynamic_integer_register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:38
 [2] #s1160#30 at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:65 [inlined]
 [3] #s1160#30(::Any) at ./none:0
 [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
 [5] simd_integer_register_size() at /home/chriselrod/.julia/dev/VectorizationBase/src/cpu_info.jl:70
 [6] __pick_vector_width(::Int64, ::Int64, ::Any) at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:39
 [7] _pick_vector_width(::Type{T} where T) at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:53
 [8] #s1160#33 at /home/chriselrod/.julia/dev/VectorizationBase/src/vector_width.jl:73 [inlined]
 [9] #s1160#33(::Any, ::Any) at ./none:0
 [10] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
 [11] top-level scope at /home/chriselrod/.julia/dev/VectorizationBase/test/testsetup.jl:6
 [12] include(::String) at ./client.jl:457
 [13] top-level scope at /home/chriselrod/.julia/dev/VectorizationBase/test/runtests.jl:4
 [14] include(::String) at ./client.jl:457
 [15] top-level scope at REPL[2]:1
in expression starting at /home/chriselrod/.julia/dev/VectorizationBase/test/testsetup.jl:6
in expression starting at /home/chriselrod/.julia/dev/VectorizationBase/test/runtests.jl:4

Normally, the world age gets frozen when a module is defined (i.e. loaded from its precompiled cache), so that everything inside is at the same world age and you don't get world-age issues as a result. It seems that is not the case with --compiled-modules=no, so suddenly the order of definitions matters.
I can use the above reproducer to try to put everything into an order that works.
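The definition-order problem can be illustrated without MPI or Tullio. This is a minimal sketch, not VectorizationBase's code; helper_first, helper_late, width_ok and width_bad are hypothetical names. It relies on the rule from the Julia manual that a generated function's generator may only call methods defined before the generated function itself. Run as a plain script (each top-level definition bumps the world age, roughly what happens with --compiled-modules=no), width_ok should work, while width_bad is expected to fail with a world-age error like the ones above.

```julia
# Hypothetical names throughout; they only mimic the shape of the real code.

helper_first() = 32                  # defined BEFORE the generated function

@generated function width_ok()
    w = helper_first()               # generator calls an older method: fine
    return :($w)                     # the body becomes the literal 32
end

@generated function width_bad()
    w = helper_late()                # helper_late is defined only AFTER this
    return :($w)
end

helper_late() = 64                   # too new for width_bad's generator

ok = width_ok()

# Calling width_bad() is expected to fail (on Julia 1.5 with a
# "method too new" MethodError); catch and print the exception
# rather than relying on its exact type across Julia versions.
bad = try
    width_bad()
catch err
    err
end

println("width_ok()  = ", ok)
println("width_bad() -> ", bad)
```

Moving helper_late() above width_bad is the kind of reordering meant here: once every helper a generator calls is defined first, the generator can see it.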

As an aside, the reason for this change (const REGISTER_SIZE = ... -> @generated register_size() = ...) was to enable precompilation without baking the values of the compiling system into the compiled object. While const REGISTER_SIZE = ... would have had the same value on every system the compiled object was moved to, @generated register_size() shouldn't compile until it is called. As long as it isn't called during or before precompilation, the compiled object should therefore be distributable across different architectures while still yielding the correct results.

For example, previously, if you compiled a .ji file or a system image on a system with AVX512 and then transferred it to a system with only AVX2, REGISTER_SIZE would equal 64 on both, causing potential problems ranging from bad performance to crashes on the system without AVX512. Going the other way, REGISTER_SIZE would equal 32 on both; while less severe (bad performance would be the only result), this is still undesirable.

Now (and this admittedly needs more testing), you should be able to produce the .ji files or system images on either the AVX512 or the AVX2 system and transfer them to the other. register_size()'s compilation should be delayed so that it produces the correct value on each system (64 with AVX512, 32 with AVX2), while still being a compile-time constant.
That is, one of the goals of this change was to make --compiled-modules=no necessary less often.
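A rough sketch of the difference between the two approaches, assuming a hypothetical detect_register_size() in place of VectorizationBase's real CPU detection (which is not shown here). It only illustrates when each value is computed; it is not the package's actual implementation.

```julia
# detect_register_size is a hypothetical stand-in for real CPU feature
# detection; it returns a fixed 32 so the sketch runs anywhere.
detect_register_size() = 32

# Old approach: a global constant. It is evaluated once, when the code is
# (pre)compiled, and that value is stored in any .ji file or system image
# built from it, even if the image is later loaded on a different CPU.
const REGISTER_SIZE = detect_register_size()

# New approach: a generated function. Its generator runs when
# register_size() is first compiled for a call, i.e. on the machine that
# calls it (assuming it was not already compiled into the cached image).
# The result is spliced into the body as a literal, so callers still see
# a compile-time constant. detect_register_size is defined first because,
# as discussed above, the order of definitions matters for generators.
@generated function register_size()
    w = detect_register_size()
    return :($w)
end

println("REGISTER_SIZE   = ", REGISTER_SIZE)
println("register_size() = ", register_size())
```

On a single machine both print the same value; the difference would only show up when a precompiled image built on one CPU is loaded on another, which this snippet cannot demonstrate by itself.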

@ranocha
Member Author

ranocha commented Jan 26, 2021

> Do you need --compiled-modules=no?

That's our workaround since Julia does not support parallel precompilation in v1.5, which is necessary for our MPI runs. We want to disable the flag in Julia v1.6, where it's not needed anymore.

Thanks for the detailed explanation and your great work, @chriselrod!

@chriselrod
Member

Can you confirm this has been fixed for you?
Works for me now:

julia> using MPI: mpiexec

julia> mpiexec() do cmd
           run(`$cmd -n 2 $(Base.julia_cmd()) --compiled-modules=no --threads=1 /home/chriselrod/Documents/progwork/julia/loopvectests/mpi.jl`)
       end
all_successful = true
Process(`/home/chriselrod/.julia/artifacts/3acc381f6eb6cae155dc415de8036910624a278c/bin/mpiexec -n 2 /home/chriselrod/Documents/languages/julia/usr/bin/julia -Cnative,-prefer-256-bit -J/home/chriselrod/Documents/languages/julia/usr/lib/julia/sys.so -O3 -g1 --compiled-modules=no --threads=1 /home/chriselrod/Documents/progwork/julia/loopvectests/mpi.jl`, ProcessExited(0))

(@v1.7) pkg> st VectorizationBase LoopVectorization
      Status `~/.julia/environments/v1.7/Project.toml`
  [bdcacae8] LoopVectorization v0.11.2 `~/.julia/dev/LoopVectorization`
  [3d5dd08c] VectorizationBase v0.18.1 `~/.julia/dev/VectorizationBase`

shell> cat /home/chriselrod/Documents/progwork/julia/loopvectests/mpi.jl
using Test
using MPI
using LoopVectorization, Tullio

@test !MPI.Initialized()
MPI.Init()
@test MPI.Initialized()

function foo!(C, A, B)
  @tullio C[i,j] = A[i,k] * B[k,j]
  return nothing
end

A = rand(10^2, 10^2);
B = rand(10^2, 10^2);
C = similar(A);
foo!(C, A, B)
successful = C ≈ A * B

all_successful = MPI.Allreduce(Int(successful), +, MPI.COMM_WORLD) == MPI.Comm_size(MPI.COMM_WORLD)
if MPI.Comm_rank(MPI.COMM_WORLD) == 0
  @show all_successful
end

@test !MPI.Finalized()
MPI.Finalize()
@test MPI.Finalized()

@ranocha
Member Author

ranocha commented Feb 1, 2021

I'm AFK right now but will test it tomorrow 👍

@ranocha
Member Author

ranocha commented Feb 2, 2021

The error with register_size seems to be fixed but now we get a similar one for argdims:

LoadError: MethodError: no method matching argdims(::Type{Array{Float64,4}}, ::Type{Base.Slice{Base.OneTo{Int64}}})
  The applicable method may be too new: running in world age 28895, while current world is 57019.
  Closest candidates are:
    argdims(::Any, ::Type{T}) where T at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/indexing.jl:22 (method too new to be called from this world context.)
    argdims(::Any, ::Any) at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/indexing.jl:21 (method too new to be called from this world context.)
    argdims(!Matched::ArrayInterface.ArrayStyle, ::Type{T}) where {N, T<:(AbstractArray{var"#s270",N} where var"#s270")} at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/indexing.jl:30 (method too new to be called from this world context.)
    ...
  Stacktrace:
   [1] #s270#38 at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/dimensions.jl:33 [inlined]
   [2] #s270#38(::Any, ::Any, ::Any, ::Any, ::Type{T} where T, ::Any) at ./none:0
   [3] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:527
   [4] from_parent_dims at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/dimensions.jl:28 [inlined]
   [5] _contiguous_axis(::Type{SubArray{Float64,3,Array{Float64,4},Tuple{Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}},Int64},true}}, ::ArrayInterface.StaticInt{1}) at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/stridelayout.jl:69
   [6] contiguous_axis at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/stridelayout.jl:62 [inlined]
   [7] contiguous_axis at /home/runner/.julia/packages/ArrayInterface/WR9aE/src/stridelayout.jl:21 [inlined]
   [8] stridedpointer at /home/runner/.julia/packages/VectorizationBase/aNPHK/src/strided_pointers/stridedpointers.jl:71 [inlined]
   [9] 𝒜𝒸𝓉! at /home/runner/.julia/packages/Tullio/u3myB/src/macro.jl:1090 [inlined]
   [10] 𝒜𝒸𝓉! at /home/runner/.julia/packages/Tullio/u3myB/src/macro.jl:1087 [inlined]
   [11] threader at /home/runner/.julia/packages/Tullio/u3myB/src/threads.jl:48 [inlined]
   [12] macro expansion at /home/runner/.julia/packages/Tullio/u3myB/src/macro.jl:1000 [inlined]

This is with LoopVectorization v0.11.2 and VectorizationBase v0.18.1; see https://github.com/trixi-framework/Trixi.jl/pull/428/checks?check_run_id=1809136036#step:6:3923

@chriselrod
Member

chriselrod commented Feb 2, 2021

Trixi's tests passed for me locally after the above ArrayInterface PR, but would you mind upgrading to ArrayInterface 3.0.1 and confirming?

Test Summary: | Pass  Total
Parallel 2D   |   15     15
Test Summary: | Pass  Total
Parallel 2D   |   15     15
Test Summary: | Pass  Total
Parallel 2D   |  134    134
254.036318 seconds (1.39 M allocations: 69.111 MiB, 0.00% gc time)
  0.000005 seconds (4 allocations: 160 bytes)
  0.000002 seconds (4 allocations: 160 bytes)
  0.000002 seconds (4 allocations: 160 bytes)
Test Summary:  | Pass  Total
Trixi.jl tests |    1      1
254.551402 seconds (2.94 M allocations: 148.049 MiB, 0.01% gc time)
    Testing Trixi tests passed

@ranocha
Member Author

ranocha commented Feb 3, 2021

Great, it works for me, too! Thanks a lot for your great work and support, @chriselrod!
