Move --host ordering fix to v3.0.x, v3.1.x, v4.0.x? #6501
Comments
I think we noticed in PR #4327, but never got the cycles to get back to it. I'd like to verify this change also fixes that issue (I think it should).
Two good points were made on the webex:
With these arguments, it seems to make sense to back-port the commits from #6493 to v3.0.x, v3.1.x, and v4.0.x.
I attempted to make PRs for v3.0.x, v3.1.x, and v4.0.x. Unfortunately, master's
#6508 is the PR for v4.0.x. It's becoming a bit of a bear, both in terms of size and complexity. We're working the issue, with the intent that it'll catch whatever v4.0.x train it can. After that, we can hopefully apply a similar back-port to v3.1.x and v3.0.x, and trigger new releases there, too (i.e., with the goal that we can close out the v3.x.y series with this fix).
Removed the "Target v2.x" label -- the fix is not needed for the v2.x series.
I came across this issue today, in a case where I need mpirun to place the final rank on a specific node. On 3.1.x, the only way to achieve this currently seems to be writing out the whole rankfile; am I right about that? Please also note that it's not that easy to build OpenMPI >= 4.0 on some recent-ish distributions, such as Ubuntu 18.04, because 4.0 requires a newer version of hwloc than the one provided by the distribution. It's achievable, but upgrading the system libraries takes considerable effort.
@marmistrz I'm curious about your statement: why can't you build Open MPI >v4.0 on some recent Linux distributions? Open MPI comes with its own embedded hwloc that satisfies Open MPI's requirements -- meaning that even if the distro's hwloc is old, the embedded hwloc should be sufficient. Is that not working properly?
@jsquyres when I built OpenMPI with
FWIW: You can just not specify
Anyway, this is probably a bug in the m4 scripts, as the doc precisely says that
Ah, that might be a bug in the docs. FWIW: the use of hwloc has evolved over time in Open MPI:
Fixed in the 5.0.x release stream. There is no plan to back-port the fix to 4.1.x and older.
Per #6298, we had an accidental change in behavior of mpirun --host aaa,bbb between versions v2.1.x and v3.0.x. A fix just went in to master in #6493.
Here's what happened:
The question is: should we put this fix on any of v3.0.x, v3.1.x, and/or v4.0.x?
Summary of behavior change
Behavior X
The ordering of hosts in the --host list matters:
Behavior Y
The ordering of hosts in the --host list does not matter (note: this behavior was unintentional. It was always intended that we honor the ordering of hosts in the --host list):
Discussion points
We need to discuss this and decide what to do. Points (in no particular order):
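As a point of reference for the discussion, here is a toy Python sketch of the difference between Behavior X and Behavior Y. It is not Open MPI's mapper code. The function names are hypothetical. Placing one rank per host in round-robin fashion is an assumption, and so is using alphabetical order to stand in for "some order other than the one the user typed". The only thing taken from this issue is that under Behavior X the --host ordering is honored and under Behavior Y it is not.

```python
from typing import List


def map_ranks_behavior_x(hosts: List[str], nprocs: int) -> List[str]:
    """Toy model of Behavior X: ranks follow the order given on --host.

    ASSUMPTION: one rank per host, round-robin. The real mapper is more involved.
    """
    return [hosts[rank % len(hosts)] for rank in range(nprocs)]


def map_ranks_behavior_y(hosts: List[str], nprocs: int) -> List[str]:
    """Toy model of Behavior Y: the order given on --host is ignored.

    ASSUMPTION: sorted() is only a stand-in for an order that does not
    depend on what the user typed; it is not what Open MPI actually does.
    """
    canonical = sorted(hosts)
    return [canonical[rank % len(canonical)] for rank in range(nprocs)]


if __name__ == "__main__":
    # Compare "--host aaa,bbb" with "--host bbb,aaa" under both models.
    for host_list in (["aaa", "bbb"], ["bbb", "aaa"]):
        print(
            "--host", ",".join(host_list),
            "| X:", map_ranks_behavior_x(host_list, 2),
            "| Y:", map_ranks_behavior_y(host_list, 2),
        )
```

In this model, swapping the two hosts changes where rank 0 lands under Behavior X but changes nothing under Behavior Y. That is the user-visible difference being discussed here.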