
add support for mpi4py #190

Merged 2 commits into master on May 7, 2019
Conversation

@basnijholt (Member) commented Apr 30, 2019

No description provided.

adaptive/runner.py (outdated diff):
@@ -693,6 +700,8 @@ def _get_ncores(ex):
return 1
elif with_distributed and isinstance(ex, distributed.cfexecutor.ClientExecutor):
return sum(n for n in ex._client.ncores().values())
elif with_mpi4py and isinstance(ex, mpi4py.futures.MPIPoolExecutor):
return mpi4py.MPI.COMM_WORLD.size - 1
A reviewer suggested:
        ex.bootup()  # wait until all workers are up and running
        return ex._pool.size  # not public API!

@basnijholt (Member, Author) replied:

That's better. Does ex._pool.size work before all the workers are up and running? Adaptive can handle scaling of the pool size.

@jbweston (Contributor) commented May 1, 2019

does this "just work"? Aren't there some extra bits needed for launching workers? We should probably document this somehow...

@basnijholt (Member, Author) commented May 1, 2019

@jbweston I'll add some more details to the docs later.

In a nutshell, it works when calling your Python script like:

mpiexec -n 16 python -m mpi4py.futures run_learner.py

or in a SLURM job

srun -n $SLURM_NTASKS --mpi=pmi2 python -m mpi4py.futures run_learner.py
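Both launch commands assume a script along these lines. This is only a sketch: the toy function, bounds, and loss goal are illustrative assumptions rather than code from this PR, and the import guard is my addition so the file still runs where adaptive/mpi4py are not installed.

```python
# Hypothetical run_learner.py sketch for the launch commands above.
import math


def f(x):
    # Toy objective for the learner to sample adaptively.
    return math.exp(-(x ** 2))


if __name__ == "__main__":
    try:
        import adaptive
        from mpi4py.futures import MPIPoolExecutor
    except ImportError as exc:
        # Lets the sketch run even without adaptive/mpi4py installed.
        print(f"skipping MPI demo: {exc}")
    else:
        learner = adaptive.Learner1D(f, bounds=(-2, 2))
        # BlockingRunner drives the learner until the goal is reached,
        # submitting evaluations of f to the MPI worker pool.
        adaptive.BlockingRunner(
            learner,
            goal=lambda l: l.loss() < 0.01,
            executor=MPIPoolExecutor(),
        )
```

Under `mpiexec -n 16 python -m mpi4py.futures run_learner.py`, one rank runs this script and the remaining ranks serve as pool workers.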

@dalcinl commented May 3, 2019

> In a nutshell, it works when calling your Python script like:
>
> mpiexec -n 16 python -m mpi4py.futures run_learner.py

In your desktop or laptop, it can also work like this:

export MPI4PY_MAX_WORKERS=15
mpiexec -n 1 python run_learner.py

Or you can pass max_workers=15 programmatically when creating the executor instance.

In this case, the 15 workers will be MPI-spawned at runtime. I consider this the preferred way of using mpi4py.futures; unfortunately, it is not always supported by batch systems or vendor MPI implementations on supercomputers.

If your code uses no more than one executor instance at a time, the two methods are practically equivalent. The difference appears when you create a second executor: in the first form, all executors share all the workers; in the second (spawn) form, each executor gets its own set of workers. BTW, this is explained in the docs! Folks, for once in my life I care to write docs, and you don't RTFM? Come on! 😉
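The programmatic variant could be sketched as follows. The worker count and the demo task are illustrative assumptions, and the `RUN_MPI_DEMO` environment-flag guard is my addition so the file is importable without an MPI setup.

```python
# Minimal sketch of passing max_workers programmatically instead of
# setting the MPI4PY_MAX_WORKERS environment variable.
import os


def task(x):
    # Trivial demo work item.
    return x * x


if __name__ == "__main__" and os.environ.get("RUN_MPI_DEMO"):
    # Launch with: mpiexec -n 1 python this_script.py (RUN_MPI_DEMO=1).
    # The 15 workers are MPI-spawned at runtime, so each executor
    # created this way gets its own set of workers.
    from mpi4py.futures import MPIPoolExecutor

    with MPIPoolExecutor(max_workers=15) as ex:
        print(list(ex.map(task, range(4))))
```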

@dalcinl commented May 3, 2019

> does this "just work"?

You are hurting my feelings 😉

@basnijholt (Member, Author) commented:

@dalcinl thanks for the comments!

I've updated the explanation for the docs.

@akhmerov or @jbweston merge if you are happy with it.

@jbweston (Contributor) commented May 7, 2019

LGTM

@jbweston jbweston merged commit abc0f0e into master May 7, 2019
@basnijholt basnijholt mentioned this pull request May 7, 2019
@basnijholt basnijholt deleted the mpi4py_support branch May 8, 2019 23:14
3 participants