Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Elemental_jll #54

Merged
merged 9 commits into from
May 12, 2020
Merged

Use Elemental_jll #54

merged 9 commits into from
May 12, 2020

Conversation

christopher-dG
Copy link
Contributor

See CI results: This does not work 😦

Lots of segfaults happening. Not sure if they're because of the MPI version being used to run the tests, or the way that Elemental_jll was built, or what...

I wonder if JuliaParallel/MPI.jl#367 would help, since then the MPI launching the tests would be the same as Elemental is linked against (I think).

@ViralBShah
Copy link
Member

I wonder if JuliaParallel/MPI.jl#367 would help, since then the MPI launching the tests would be the same as Elemental is linked against (I think).

Yes. That will certainly help.

cc @simonbyrne

@andreasnoack
Copy link
Member

It's still not completely clear to me how this would work with a custom HPC-MPI. Could you please elaborate on what would happen if you try to install Elemental on a cluster with a custom MPI version?

@ViralBShah
Copy link
Member

ViralBShah commented Apr 14, 2020

I think in that case we need to allow people to provide their own. My thought is that either you use BB all the way, or you compile your MPI dependencies on your own (say use spack), but it isn't a good idea for us to try and provide an in-between thing where we facilitate building your own Elemental using system libraries.

@simonbyrne
Copy link
Member

simonbyrne commented Apr 14, 2020

I agree. The problem is that there isn’t really an effective mechanism for specifying which library to use. The Overrides.toml file is too limited to be of much use.

@ViralBShah
Copy link
Member

ViralBShah commented Apr 14, 2020

We perhaps need @staticfloat or @Keno to sketch out a solution to this. We can then figure out how to go about doing it. Essentially in the BB build_tarballs we have the products, and it should be possible through a package level setting to look in a user provided path for those products as opposed to the BB path.

@christopher-dG
Copy link
Contributor Author

I'm not really sure what's up with CI but @ViralBShah you can try out this branch if you like, I think it should work identically to master branch.

I was probably overzealous in deleting a bunch of the CI setup stuff, since we still have to use MPI.jl in the tests.

@christopher-dG
Copy link
Contributor Author

I'll also note that despite the build failing here, it's not actually different from master -- master is setting the exit code improperly (with &).

@ViralBShah
Copy link
Member

If the same tests as master are failing, how about mark them as expected to fail or disable and get this all merged?

@andreasnoack
Copy link
Member

Let's not merge before tests pass. If there are issues with master CI then let's fix that.

@andreasnoack
Copy link
Member

andreasnoack commented May 1, 2020

I've fixed the tests in #62 so please rebase and hopefully tests will pass here as well.

Update. I've resolved the conflict so rebasing shouldn't be necessary.

Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
@andreasnoack
Copy link
Member

I've no idea why we are getting this error https://travis-ci.org/github/JuliaParallel/Elemental.jl/jobs/681894268#L1211-L1225. Any ideas?

ViralBShah and others added 2 commits May 1, 2020 09:09
Co-authored-by: Andreas Noack <andreas@noack.dk>
Co-authored-by: Andreas Noack <andreas@noack.dk>
@ViralBShah
Copy link
Member

How do I get Elemental to pick up Elemental_jll on the other Julia processes? I am using the released jll and the dev version of Elemental.jl on your branch. (edited)

julia> using Elemental
[ Info: Precompiling Elemental [902c3f28-d1ec-5e7e-8399-a24c3845ee38]
┌ Warning: Package Elemental does not have Elemental_jll in its dependencies:
│ - If you have Elemental checked out for development and have
│   added Elemental_jll as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with Elemental
└ Loading Elemental_jll into Elemental from project dependency, future warnings for Elemental are suppressed.
ERROR: LoadError: UndefVarError: libEl not defined
Stacktrace:
 [1] include(::Module, ::String) at ./Base.jl:377
 [2] top-level scope at none:2
 [3] eval at ./boot.jl:331 [inlined]
 [4] eval(::Expr) at ./client.jl:449
 [5] top-level scope at ./none:3
in expression starting at /home/viralbshah/.julia/dev/Elemental/src/Elemental.jl:10
ERROR: Failed to precompile Elemental [902c3f28-d1ec-5e7e-8399-a24c3845ee38] to /home/viralbshah/.julia/compiled/v1.4/Elemental/mJwGy_jmS8y.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1272
 [3] _require(::Base.PkgId) at ./loading.jl:1029
 [4] require(::Base.PkgId) at ./loading.jl:927
 [5] require(::Module, ::Symbol) at ./loading.jl:922

@giordano
Copy link
Member

giordano commented May 1, 2020

Did you run ] resolve?

@ViralBShah
Copy link
Member

As in should I do @everywhere Pkg.resolve()? I have it working reliably on a single process. It is the parallel that I am having trouble with.

@giordano
Copy link
Member

giordano commented May 1, 2020

Uh, sorry, misread the question. That's strange 😕

@christopher-dG
Copy link
Contributor Author

There's also still this MPI build error here that seems like a bug in the MPI build script.

@christopher-dG
Copy link
Contributor Author

I wonder if the other Julia processes are only using the package UUID, and so are getting the current released (without JLL) version of Elemental? Is the manifest used here? I'm not really sure.

@christopher-dG
Copy link
Contributor Author

Do we want to keep the extras/elpp.jl stuff? It should just need a tweak to update the libEl installation path.

@ViralBShah
Copy link
Member

Pinging @simonbyrne in here (since he may be interested to follow the progress and issues in here.)

@simonbyrne
Copy link
Member

Hmm, not sure what the problem is. I'll try to spend some time finishing up JuliaParallel/MPI.jl#367 next week.

@christopher-dG
Copy link
Contributor Author

The Mac issues are fixed, but I don't know what to make of this:

ERROR: LoadError: IOError: mkdir: file already exists (EEXIST)
Stacktrace:
 [1] uv_error at ./libuv.jl:97 [inlined]
 [2] mkdir(::String; mode::UInt16) at ./file.jl:177
 [3] mkpath(::String; mode::UInt16) at ./file.jl:227
 [4] mkpath at ./file.jl:222 [inlined]
 [5] compilecache_path(::Base.PkgId) at ./loading.jl:1210
 [6] compilecache(::Base.PkgId, ::String) at ./loading.jl:1240
 [7] _require(::Base.PkgId) at ./loading.jl:1029
 [8] require(::Base.PkgId) at ./loading.jl:927
 [9] require(::Module, ::Symbol) at ./loading.jl:922
 [10] include(::Module, ::String) at ./Base.jl:377
 [11] exec_options(::Base.JLOptions) at ./client.jl:288
 [12] _start() at ./client.jl:484
in expression starting at /Users/travis/build/JuliaParallel/Elemental.jl/test/tsvd.jl:1

@simonbyrne
Copy link
Member

Potentially a race condition in precompilation?

@christopher-dG
Copy link
Contributor Author

Hm, that's a good point -- it's importing a test-only dependency so the precompile in the before_script won't catch it.

@andreasnoack
Copy link
Member

We could also just disable precompilation in CI since it's also probably slowing down the run.

@christopher-dG
Copy link
Contributor Author

Looks like that part is working now at least. It's trying to run a bare mpiexec now, which is confusing because I think MPI and MPIClusterManagers should be picking up the MPICH_jll one now.

@andreasnoack andreasnoack mentioned this pull request May 12, 2020
@ViralBShah
Copy link
Member

ViralBShah commented May 12, 2020

In order to merge this, do we essentially need to fix this one?

JuliaParallel/MPIClusterManagers.jl#16

@christopher-dG
Copy link
Contributor Author

In order for tests to pass, I think so.

@christopher-dG
Copy link
Contributor Author

There still seems to be some race conditions in CI sometimes, but I think we can probably squash + merge this now?

@ViralBShah
Copy link
Member

Go for it!

@christopher-dG christopher-dG merged commit 38cbdb6 into master May 12, 2020
@christopher-dG christopher-dG deleted the cdg/jll branch May 12, 2020 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants