[mpi4py] Regression with ompi@main #12195

Closed
dalcinl opened this issue Dec 22, 2023 · 22 comments

@dalcinl
Contributor

dalcinl commented Dec 22, 2023

https://github.com/mpi4py/mpi4py-testing/actions/runs/7294717237/job/19880980395

...
testConstructor (test_comm_inter.TestIntercomm.testConstructor) ... testConstructor (test_comm_inter.TestIntercomm.testConstructor) ... testConstructor (test_comm_inter.TestIntercomm.testConstructor) ... ok
ok
ok

***
An unrecoverable error occurred in the gds/shmem component.
Resolve this issue by disabling it. Set in your environment the following:
PMIX_MCA_gds=hash
***
: Success
/home/runner/work/_temp/a134b824-0923-4f27-82ae-d86582046489.sh: line 1: 144454 Aborted                 (core dumped) mpiexec -n 3 python test/main.py -v -f
Error: Process completed with exit code 134.
@dalcinl
Contributor Author

dalcinl commented Dec 26, 2023

@rhc54 I'm also getting this output at the beginning of my runs:

--------------------------------------------------------------------------
    This help section is empty because PRRTE was built without Sphinx.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
We attempted to remove a file, but were unable to do so:

  Path:  /tmp/prte.fv-az1543-731.1001/dvm.143695
  Error: Is a directory
--------------------------------------------------------------------------

@wenduwan
Contributor

I also see similar warnings in my local run on the main branch:

We attempted to remove a file, but were unable to do so:

  Path:  /tmp/prte.fv-az1543-731.1001/dvm.143695
  Error: Is a directory

@dalcinl
Contributor Author

dalcinl commented Dec 26, 2023

@wenduwan I believe the regression comes from #12179; the last good mpi4py run was from commit e6a294b.

@rhc54
Contributor

rhc54 commented Jan 1, 2024

I see how to disable that warning - Linux does something not POSIX compliant, but I can cover it. I'm afraid there isn't enough info about the primary error being reported here for me to do anything about it.

@rhc54
Contributor

rhc54 commented Jan 1, 2024

Here's the warning fix: openpmix/openpmix#3252

@dalcinl
Contributor Author

dalcinl commented Jan 2, 2024

Here's the warning fix: openpmix/openpmix#3252

Many thanks, Ralph.

I'm afraid there isn't enough info about the primary error being reported here for me to do anything about it.

Any chance that the error was actually related to the fix you pushed?
Any tips about how to get more verbose output that would help you?
If it is not much work, could you please set up a spare branch with updated submodule pointers in your personal ompi repository so that I can test your PMIx fix?

@rhc54
Contributor

rhc54 commented Jan 2, 2024

Can't say - you haven't yet told me what the error is 😉

@dalcinl
Contributor Author

dalcinl commented Jan 3, 2024

The error is in the description of this issue. "An unrecoverable error occurred in the gds/shmem component. ..." Looks like this happens while trying to construct an intercommunicator with MPI_Intercomm_create.

This happens when running with oversubscription (GitHub Actions runners have 2 virtual cores, and I'm running with 3 MPI processes). Maybe oversubscription has nothing to do with the error, and the issue triggers just because I'm using a small odd number of processes or some corner case like that. Unfortunately, I'm not able to run this locally right now.

@rhc54
Contributor

rhc54 commented Jan 3, 2024

Okay, I wasn't sure if that was the error you were talking about. Hard to figure what could cause that as the procs would all have been started before the intercomm create. What does your test actually do - have a proc that spawns more procs and then tries to create a communicator between them? Is the error coming from one of the spawned procs - or all of them?

Once I know what your test does and where the error comes from, I can try to replicate it here. I doubt it has anything to do with the oversubscription.

@rhc54
Contributor

rhc54 commented Jan 3, 2024

I see where the error is coming out and why - what I don't understand is what is triggering it. I'm not able to reproduce the problem here when running a simple test that does a spawn and then creates the intercommunicator.

Might help to also know what environment your CI is running.

I could disable the shmem component or revert the commit that added the code behind the error report. However, the person who supports that component may be out this week and I'd prefer to get his input on it. We have a meeting Thurs - I'll raise it with him if he attends.

@dalcinl
Contributor Author

dalcinl commented Jan 3, 2024

The failing test may actually be one involving MPI_Intercomm_create_from_groups from MPI-4. Sorry I cannot be more definite about this. As I said before, I'm not able to run things locally right now.

The MPI_Intercomm_create_from_groups API is usually used with sessions. However, this is not what my test does. The test steps are:

  1. MPI_Comm_split the WORLD comm in two halves to get intracomms with disjoint groups.
  2. Create a new intercomm1 connecting the two disjoint intracomms with MPI_Intercomm_create.
  3. Get the local and remote groups of intercomm1.
  4. Create a new intercomm2 from the local and remote groups with MPI_Intercomm_create_from_groups.
  5. Check that intercomm1 and intercomm2 are congruent with MPI_Comm_compare.

If you can read some slightly convoluted Python, the test code is split in two hunks here and here.
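
Roughly, the steps above boil down to something like the following standalone sketch. This is simplified and reconstructed rather than copied from the test, so treat the exact signatures as approximate; it assumes an mpi4py build from master (4.x) where Intercomm.Create_from_groups is available, and the stringtag below is just an arbitrary label for illustration.

# Rough standalone sketch of the failing test steps (not the actual test code).
# Run with at least 2 processes, e.g. "mpiexec -n 3 python repro.py".
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# 1. Split WORLD into two halves with disjoint groups.
color = 0 if rank < size // 2 else 1
intracomm = comm.Split(color, key=rank)

# 2. Connect the two halves into intercomm1 via MPI_Intercomm_create.
local_leader = 0                                # leader within each intracomm
remote_leader = size // 2 if color == 0 else 0  # peer leader's rank in WORLD
intercomm1 = intracomm.Create_intercomm(local_leader, comm, remote_leader, tag=0)

# 3. Get the local and remote groups of intercomm1.
lgroup = intercomm1.Get_group()
rgroup = intercomm1.Get_remote_group()

# 4. Build intercomm2 from those groups via MPI_Intercomm_create_from_groups.
intercomm2 = MPI.Intercomm.Create_from_groups(lgroup, 0, rgroup, 0,
                                              stringtag="repro-example")

# 5. The two intercomms should compare as congruent.
assert MPI.Comm.Compare(intercomm1, intercomm2) == MPI.CONGRUENT

for obj in (lgroup, rgroup, intercomm2, intercomm1, intracomm):
    obj.Free()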

PS: This particular test has had regressions at least two or three times in the past after PMIx submodule pointer updates. Looks like the Open MPI folks should add a test to their suite to prevent this issue from resurfacing over and over.

@dalcinl
Contributor Author

dalcinl commented Jan 3, 2024

Might help to also know what environment your CI is running.

GitHub Actions with GitHub-hosted runners running their Ubuntu 22.04 image.

... I'd prefer to get his input on it. ...

Yes, better to wait for the expert and get to the bottom of it.

@rhc54
Contributor

rhc54 commented Jan 3, 2024

@hppritcha @samuelkgutierrez This may be hard to reproduce as it sounds like quite a convoluted process. Since it might have something to do with PMIx (it isn't immediately clear to me how PMIx is involved here, nor why gds/shmem would impact it), it would be good if we could distill from the test the specific PMIx calls (and their order) being made so we could create a PMIx-only model of it.

@wenduwan
Contributor

wenduwan commented Jan 3, 2024

What worked for me in the past is to bisect on the suspect repo and locate the problematic commit first. The problem will be more obvious if we can narrow it down to a few lines.

@rhc54
Contributor

rhc54 commented Jan 3, 2024

I already know the commit that is involved here - sorry I wasn't complete in my answer. What isn't clear is why that commit is triggered by this collage of MPI calls in this CI. Is it the environment? Is it some combination of calls that triggers it? Or something else?

Just need to understand the parameters of the problem. Would help if we could extract the MPI from the test and run it elsewhere (i.e., not in GitHub Actions and/or Ubuntu 22.04) to see if it reproduces.

@hppritcha
Member

I'll take a look at this later today or tomorrow.

@hppritcha
Member

Reproduced on an aarch64 RHEL8 system:

[st-master:1841595] SWITCHYARD for prterun-st-master-1841595@1:0:19
[st-master:1841595] [prterun-st-master-1841595@0,0] recvd pmix cmd GROUP CONSTRUCT from [prterun-st-master-1841595@1,0] bytes 133
[st-master:1841595] SWITCHYARD for prterun-st-master-1841595@1:2:20
[st-master:1841595] [prterun-st-master-1841595@0,0] recvd pmix cmd GROUP CONSTRUCT from [prterun-st-master-1841595@1,2] bytes 133
[st-master:1841595] SWITCHYARD for prterun-st-master-1841595@1:3:23
[st-master:1841595] [prterun-st-master-1841595@0,0] recvd pmix cmd GROUP CONSTRUCT from [prterun-st-master-1841595@1,3] bytes 133
[st-master:1841595] SWITCHYARD for prterun-st-master-1841595@1:1:25
[st-master:1841595] [prterun-st-master-1841595@0,0] recvd pmix cmd GROUP CONSTRUCT from [prterun-st-master-1841595@1,1] bytes 133

***
An unrecoverable error occurred in the gds/shmem component.
Resolve this issue by disabling it. Set in your environment the following:
PMIX_MCA_gds=hash
***
: Success

@hppritcha
Member

Here's what the PMIx GDS has to say about this:

testCreateFromGroups (test_comm_inter.TestIntercomm) ... testCreateFromGroups (test_comm_inter.TestIntercomm) ... testCreateFromGroups (test_comm_inter.TestIntercomm) ... [st-master:1841986] [client/pmix_client_get.c:1004] GDS FETCH KV WITH shmem
[st-master:1841986] gds:shmem:HERE AT pmix_gds_shmem_fetch,544
[st-master:1841987] [client/pmix_client_get.c:1004] GDS FETCH KV WITH shmem
[st-master:1841987] gds:shmem:HERE AT pmix_gds_shmem_fetch,544
[st-master:1841987] gds:shmem:pmix_gds_shmem_fetch:[prterun-st-master-1841982@1,1] key=pmix.loc for proc=[prterun-st-master-1841982@1,0] on scope=UNDEFINED
[st-master:1841988] [client/pmix_client_get.c:1004] GDS FETCH KV WITH shmem
[st-master:1841988] gds:shmem:HERE AT pmix_gds_shmem_fetch,544
[st-master:1841988] gds:shmem:pmix_gds_shmem_fetch:[prterun-st-master-1841982@1,2] key=pmix.loc for proc=[prterun-st-master-1841982@1,0] on scope=UNDEFINED
[st-master:1841988] [prterun-st-master-1841982@1,2] HASH:FETCH table (null) id 0 key pmix.loc
[st-master:1841986] gds:shmem:pmix_gds_shmem_fetch:[prterun-st-master-1841982@1,0] key=pmix.loc for proc=[prterun-st-master-1841982@1,1] on scope=UNDEFINED
[st-master:1841986] [prterun-st-master-1841982@1,0] HASH:FETCH table (null) id 1 key pmix.loc
[st-master:1841987] [prterun-st-master-1841982@1,1] HASH:FETCH table (null) id 0 key pmix.loc
[st-master:1841987] HASH:FETCH data for key pmix.loc not found
[st-master:1841987] [client/pmix_client_get.c:1019] GDS FETCH KV WITH hash
[st-master:1841986] HASH:FETCH data for key pmix.loc not found
[st-master:1841986] [client/pmix_client_get.c:1019] GDS FETCH KV WITH hash
[st-master:1841986] [prterun-st-master-1841982@1,0] pmix:gds:hash fetch pmix.loc for proc [prterun-st-master-1841982@1,1] on scope UNDEFINED
[st-master:1841988] HASH:FETCH data for key pmix.loc not found
[st-master:1841988] [client/pmix_client_get.c:1019] GDS FETCH KV WITH hash
[st-master:1841988] [prterun-st-master-1841982@1,2] pmix:gds:hash fetch pmix.loc for proc [prterun-st-master-1841982@1,0] on scope UNDEFINED
[st-master:1841988] [prterun-st-master-1841982@1,2] HASH:FETCH table internal id 0 key pmix.loc
[st-master:1841987] [prterun-st-master-1841982@1,1] pmix:gds:hash fetch pmix.loc for proc [prterun-st-master-1841982@1,0] on scope UNDEFINED
[st-master:1841987] [prterun-st-master-1841982@1,1] HASH:FETCH table internal id 0 key pmix.loc
[st-master:1841986] [prterun-st-master-1841982@1,0] HASH:FETCH table internal id 1 key pmix.loc
[st-master:1841986] [client/pmix_client_get.c:1004] GDS FETCH KV WITH shmem
[st-master:1841986] gds:shmem:HERE AT pmix_gds_shmem_fetch,544
[st-master:1841988] [client/pmix_client_get.c:1004] GDS FETCH KV WITH shmem
[st-master:1841988] gds:shmem:HERE AT pmix_gds_shmem_fetch,544
[st-master:1841988] gds:shmem:pmix_gds_shmem_fetch:[prterun-st-master-1841982@1,2] key=pml.base.2.0 for proc=[prterun-st-master-1841982@1,0] on scope=UNDEFINED
[st-master:1841987] [client/pmix_client_get.c:1004] GDS FETCH KV WITH shmem
[st-master:1841987] gds:shmem:HERE AT pmix_gds_shmem_fetch,544
[st-master:1841987] gds:shmem:pmix_gds_shmem_fetch:[prterun-st-master-1841982@1,1] key=pml.base.2.0 for proc=[prterun-st-master-1841982@1,0] on scope=UNDEFINED
[st-master:1841987] [prterun-st-master-1841982@1,1] HASH:FETCH table (null) id 0 key pml.base.2.0
[st-master:1841986] gds:shmem:pmix_gds_shmem_fetch:[prterun-st-master-1841982@1,0] key=pmix.loc for proc=[prterun-st-master-1841982@1,2] on scope=UNDEFINED
[st-master:1841986] [prterun-st-master-1841982@1,0] HASH:FETCH table (null) id 2 key pmix.loc
[st-master:1841988] [prterun-st-master-1841982@1,2] HASH:FETCH table (null) id 0 key pml.base.2.0
[st-master:1841988] HASH:FETCH data for key pml.base.2.0 not found
[st-master:1841988] [client/pmix_client_get.c:1019] GDS FETCH KV WITH hash
[st-master:1841988] [prterun-st-master-1841982@1,2] pmix:gds:hash fetch pml.base.2.0 for proc [prterun-st-master-1841982@1,0] on scope UNDEFINED
[st-master:1841987] HASH:FETCH data for key pml.base.2.0 not found
[st-master:1841987] [client/pmix_client_get.c:1019] GDS FETCH KV WITH hash
[st-master:1841987] [prterun-st-master-1841982@1,1] pmix:gds:hash fetch pml.base.2.0 for proc [prterun-st-master-1841982@1,0] on scope UNDEFINED
[st-master:1841987] [prterun-st-master-1841982@1,1] HASH:FETCH table internal id 0 key pml.base.2.0
[st-master:1841986] HASH:FETCH data for key pmix.loc not found
[st-master:1841986] [client/pmix_client_get.c:1019] GDS FETCH KV WITH hash
[st-master:1841986] [prterun-st-master-1841982@1,0] pmix:gds:hash fetch pmix.loc for proc [prterun-st-master-1841982@1,2] on scope UNDEFINED
[st-master:1841988] [prterun-st-master-1841982@1,2] HASH:FETCH table internal id 0 key pml.base.2.0
[st-master:1841986] [prterun-st-master-1841982@1,0] HASH:FETCH table internal id 2 key pmix.loc
[st-master:1841982] SWITCHYARD for prterun-st-master-1841982@1:0:23
[st-master:1841982] [prterun-st-master-1841982@0,0] recvd pmix cmd GROUP CONSTRUCT from [prterun-st-master-1841982@1,0] bytes 102
[st-master:1841982] [server/pmix_server_ops.c:687] GDS FETCH KV WITH hash
[st-master:1841982] [prterun-st-master-1841982@0,0] pmix:gds:hash fetch NULL for proc [prterun-st-master-1841982@1,0] on scope SHARE ON REMOTE NODES ONLY
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:FETCH table remote id 0 key NULL
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT btl.tcp.5.1
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT pml.base.2.0
[st-master:1841982] [server/pmix_server_ops.c:780] GDS FETCH KV WITH hash
[st-master:1841982] [prterun-st-master-1841982@0,0] pmix:gds:hash fetch NULL for proc [prterun-st-master-1841982@1,0] on scope SHARE ON REMOTE NODES ONLY
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:FETCH table remote id 0 key NULL
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT btl.tcp.5.1
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT pml.base.2.0
[st-master:1841982] SWITCHYARD for prterun-st-master-1841982@1:1:20
[st-master:1841982] [prterun-st-master-1841982@0,0] recvd pmix cmd GROUP CONSTRUCT from [prterun-st-master-1841982@1,1] bytes 132
[st-master:1841982] SWITCHYARD for prterun-st-master-1841982@1:2:19
[st-master:1841982] [prterun-st-master-1841982@0,0] recvd pmix cmd GROUP CONSTRUCT from [prterun-st-master-1841982@1,2] bytes 132
[st-master:1841982] [server/pmix_server_ops.c:687] GDS FETCH KV WITH hash
[st-master:1841982] [prterun-st-master-1841982@0,0] pmix:gds:hash fetch NULL for proc [prterun-st-master-1841982@1,1] on scope SHARE ON REMOTE NODES ONLY
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:FETCH table remote id 1 key NULL
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT btl.tcp.5.1
[st-master:1841982] [server/pmix_server_ops.c:687] GDS FETCH KV WITH hash
[st-master:1841982] [prterun-st-master-1841982@0,0] pmix:gds:hash fetch NULL for proc [prterun-st-master-1841982@1,2] on scope SHARE ON REMOTE NODES ONLY
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:FETCH table remote id 2 key NULL
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT btl.tcp.5.1
[st-master:1841982] [server/pmix_server_ops.c:780] GDS FETCH KV WITH hash
[st-master:1841982] [prterun-st-master-1841982@0,0] pmix:gds:hash fetch NULL for proc [prterun-st-master-1841982@1,1] on scope SHARE ON REMOTE NODES ONLY
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:FETCH table remote id 1 key NULL
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT btl.tcp.5.1
[st-master:1841982] [server/pmix_server_ops.c:780] GDS FETCH KV WITH hash
[st-master:1841982] [prterun-st-master-1841982@0,0] pmix:gds:hash fetch NULL for proc [prterun-st-master-1841982@1,2] on scope SHARE ON REMOTE NODES ONLY
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:FETCH table remote id 2 key NULL
[st-master:1841982] [prterun-st-master-1841982@0,0] FETCH NULL LOOKING AT btl.tcp.5.1
[st-master:1841982] [server/pmix_server_ops.c:3843] GDS STORE MODEX WITH shmem
[st-master:1841982] gds:shmem:HERE AT server_store_modex,2161
[st-master:1841982] gds:shmem:server_store_modex:[prterun-st-master-1841982@0,0] for namespace=prterun-st-master-1841982@1 (nprocs=3, buff_size=155)
[st-master:1841982] gds:shmem:server_store_modex_cb:[prterun-st-master-1841982@0,0] for namespace=prterun-st-master-1841982@1
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:STORE:QUAL table (null) rank 0 key btl.tcp.5.1
[st-master:1841982] [prterun-st-master-1841982@0,0] PREEXISTING ENTRY FOR PROC 0 KEY btl.tcp.5.1:  PMIX_VALUE:  Data type: PMIX_BYTE_OBJECT	Size: 96
[st-master:1841982] EQUAL VALUE - IGNORING
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:STORE:QUAL table (null) rank 0 key pml.base.2.0
[st-master:1841982] [prterun-st-master-1841982@0,0] PREEXISTING ENTRY FOR PROC 0 KEY pml.base.2.0:  PMIX_VALUE:  Data type: PMIX_BYTE_OBJECT	Size: 4
[st-master:1841982] EQUAL VALUE - IGNORING
[st-master:1841982] [server/pmix_server_ops.c:3843] GDS STORE MODEX WITH shmem
[st-master:1841982] gds:shmem:HERE AT server_store_modex,2161
[st-master:1841982] gds:shmem:server_store_modex:[prterun-st-master-1841982@0,0] for namespace=prterun-st-master-1841982@1 (nprocs=3, buff_size=255)
[st-master:1841982] gds:shmem:server_store_modex_cb:[prterun-st-master-1841982@0,0] for namespace=prterun-st-master-1841982@1
[st-master:1841982] [prterun-st-master-1841982@0,0] HASH:STORE:QUAL table (null) rank 1 key btl.tcp.5.1
[st-master:1841982] [prterun-st-master-1841982@0,0] PREEXISTING ENTRY FOR PROC 1 KEY btl.tcp.5.1:  PMIX_VALUE:  Data type: PMIX_BYTE_OBJECT	Size: 96
[st-master:1841982] EQUAL VALUE - IGNORING

***
An unrecoverable error occurred in the gds/shmem component.
Resolve this issue by disabling it. Set in your environment the following:
PMIX_MCA_gds=hash
***

@samuelkgutierrez
Member

This one looks like it is mine. Can someone please assign it to me?

@hppritcha
Member

Okay, I'm writing up a GitHub Action to run the mpi4py tests with every PR.
@dalcinl
I don't want to use the mpi4py head of master, but rather mpi4py 3.1.5. Where's test/main.py in that tag? Do you have an alternative way to run the tests from a tag checkout?

hppritcha added a commit to hppritcha/ompi that referenced this issue Jan 4, 2024
seems like mpi4py finds a problem almost every time
we advance openpmix/prrte shas so catch it early here.

related to open-mpi#12195

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
hppritcha added a commit to hppritcha/ompi that referenced this issue Jan 6, 2024
seems like mpi4py finds a problem almost every time
we advance openpmix/prrte shas so catch it early here.

we test mpi4py master as it contains the mpi 4 stuff that so often breaks.

related to open-mpi#12195

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@samuelkgutierrez
Member

Thanks again for catching this, @dalcinl. This should be fixed in OpenPMIx now; once the submodule pointers are updated in Open MPI, the regression should go away.

@wenduwan
Contributor

Fixes all merged in. Closing.
