mpi4py: Regression in MPI_Alltoallv() using MPI_IN_PLACE #9501

Closed
dalcinl opened this issue Oct 9, 2021 · 17 comments

@dalcinl
Contributor

dalcinl commented Oct 9, 2021

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

git master

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

See logs.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Fresh clone in a GitHub Actions runner. See logs.

Please describe the system on which you are running

  • Operating system/version: Linux
  • Computer hardware: x86_64
  • Network type:

Details of the problem

See logs: https://github.com/mpi4py/mpi4py-testing/runs/3847097248?check_suite_focus=true

This same test was working 6 days ago.
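For reference, the failing tests exercise the in-place variant of MPI_Alltoallv. The sketch below shows that usage pattern in mpi4py; the counts, displacements, and buffer contents are illustrative assumptions, not the exact test case from the logs above.

```python
# Hypothetical minimal in-place Alltoallv exchange (illustrative only,
# not the exact mpi4py test from the failing CI run).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

counts = [1] * size                    # one integer exchanged with every rank
displs = list(range(size))             # contiguous blocks, one per peer
buf = np.full(size, rank, dtype='i')   # every block initially holds this rank's id

# With MPI_IN_PLACE, the data sent to rank j is read from block j of the
# receive buffer, and the result is written back into the same buffer.
comm.Alltoallv(MPI.IN_PLACE, [buf, counts, displs, MPI.INT])

# After the exchange, block j should hold rank j's contribution.
assert list(buf) == list(range(size)), (rank, list(buf))
```

With distinct send and receive buffers the call shape is the same, except the first argument is a full send-buffer specification instead of MPI.IN_PLACE.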

@awlauria
Contributor

#9330 could possibly have introduced the regression on master. Is there a link to the source?

@jsquyres
Member

Note that the ALLTOALLV change was backported to all the current release branches.

@awlauria
Contributor

awlauria commented Oct 11, 2021

10/11 RM call - we think this is a master problem only. #9330 is currently master only.

@awlauria
Contributor

Unfortunately, I was wrong. This was merged into v5.0.x here: #9493

It will need to be fixed prior to a v5.0.0 release.

@gpaulsen
Member

@dalcinl Can you please retest with the latest v5.0.x now that #9652 has been merged?

@dalcinl
Contributor Author

dalcinl commented Nov 11, 2021

@gpaulsen Off-topic, but anyway... I'm doing a manual build, as my ompi/master CI infrastructure has been broken for a month now, waiting for a resolution to issue 9501. I'm running GCC 11.2.1 and configuring with --enable-debug --enable-mem-debug. I'm seeing tons of warnings (-Wformat, -Wpedantic, -Wcompare, etc.). Are you aware of that? Or do you just not test with --enable-debug or with such a recent GCC version?

@awlauria
Contributor

It's possible we need a warnings sweep in master. One day it would be nice to compile with -Werror...

@dalcinl
Contributor Author

dalcinl commented Nov 11, 2021

@gpaulsen This PR seems to fix the issue with current master. I used a local build. I have now fired up a CI build in GitHub Actions.

Edit: Sorry, the link above is a build for master. A build for v5.0.x is here.

@awlauria
Contributor

Cool, thanks @dalcinl for verifying. Closing this, as the master and v5 PRs have been merged.

@dalcinl
Contributor Author

dalcinl commented Nov 11, 2021

@awlauria Please note my edit above. I messed up. I was asked to run v5.0.x, but I ran master instead. If the fix is in both branches, then all is good.

@gpaulsen
Member

Great, thanks. Yes, this fix got into both master and v5.0.x.

@dalcinl
Contributor Author

dalcinl commented Nov 11, 2021

@gpaulsen @awlauria Sorry, guys, it looks like I messed up with my local testing. My CI is still broken; this run is for branch v5.0.x: https://github.com/mpi4py/mpi4py-testing/runs/4180762484?check_suite_focus=true. It looks like I tested with only one MPI process. With two MPI processes, the failures are still there.
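
(A hedged aside on why a single-rank run can pass while a two-rank run fails: with one process, an in-place Alltoallv only moves the rank's own block, so the inter-rank exchange path under suspicion is never exercised. A guard like the one below, illustrative and not part of the mpi4py suite, makes that explicit.)

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
if comm.Get_size() < 2:
    # With a single rank, an in-place Alltoallv only copies the rank's own
    # block, so the inter-rank exchange path never runs and the regression
    # cannot be observed.
    if comm.Get_rank() == 0:
        print("skipping: need at least 2 MPI processes to exercise the bug")
    raise SystemExit(0)
```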

awlauria reopened this Nov 11, 2021
@awlauria
Contributor

Thanks - reopened.

@bwbarrett
Member

I'm pretty sure the issue is 531735c, which was part of #9493. The rest of the patch series (which actually fixes #9329) seems OK w.r.t. alltoallv. Still trying to figure out the core issue.

Jeff mentioned an alltoallv fix that was backported to all the release branches. I do not believe that was the change in play here, based on testing.

@awlauria
Contributor

awlauria commented Nov 23, 2021

@dalcinl is this now passing with the latest fixes?

@dalcinl
Contributor Author

dalcinl commented Nov 23, 2021

@awlauria CI ran 4 days ago; all Open MPI tests passed.

@awlauria
Contributor

awlauria commented Dec 1, 2021

Thanks @dalcinl - closing
