On ARCHER2 (UK Tier-1 system), I'm observing that calling `MPI.Init` with the system's Cray MPICH (version string `"MPI VERSION : CRAY MPICH version 8.1.4.31 (ANL base 3.4a2)\nMPI BUILD INFO : Thu Mar 18 17:07 2021 (git hash 3e74f0c)\n"`) either segfaults or hangs most of the time when running multi-node jobs. Note: this happens only on `master` of MPI.jl, not with v0.19.2, so it would appear that something is wrong with MPI.jl#master.
I don't have much time to investigate this further at the moment, I'm opening this issue as a reminder to try and look into this at some point.
I'm utterly confused: the hangs/segfaults happen if I install MPI.jl with `]add MPI#4a87d7402ac3baba5cc97bfd8d5bd4cfbb825525`, but not if I `]dev MPI` and check out the same revision 😐 My attempt to `git bisect` the issue failed badly because dev'ing the package works. This makes extremely little sense to me.
I forgot about that one, thanks for pointing it out. I'm not sure, though: `]add MPI` works fine for me, and it installs v0.19.2, which seems to be the version used in #616 (at least, they mention the `JULIA_MPI_*` environment variables). It's only when I do `]add MPI#4a87d7402ac3baba5cc97bfd8d5bd4cfbb825525` (but not `]dev MPI`) that I get the segfaults or, more often, hangs. I know this sounds absurd; I'm also at a loss here.
giordano changed the title twice on Oct 10, 2022 (correcting the spelling of "ARCHER2" from "ACHER2" via "ArCHER2") to: MPI.Init on ARCHER2 either segfaults or hangs most of the time on multi-node jobs