OMPI integration issues to be resolved #298

Closed
1 of 8 tasks
rhc54 opened this issue Jan 14, 2020 · 2 comments

Comments

Contributor

rhc54 commented Jan 14, 2020

There are a number of things that need to be done to complete the OMPI integration effort. I'm going to list them here for tracking purposes and in the hope that others might pick some of them up. If you do, please edit this comment and put your name at the beginning of the item you are working on so we avoid duplicate effort. Obviously, there will be some "ompi" items in this list. This is a "living" list, so expect more things to be added as they are identified.

  • [@rhc54] Revise command line setup/parsing. Need to expand it a bit to allow for multiple command line definitions. Need to handle different MCA params for OMPI vs PRRTE.

  • Singleton support. IIRC, I enabled PMIx_Init to support singletons - i.e., when the client is not launched by a daemon and thus has no contact information for a PMIx server. However, I didn't do anything about the case of singleton comm_spawn where the client needs to start a PMIx server and then connect back to it.

  • Resolve reported comm_spawn issues. There are multiple reports of comm_spawn problems on the OMPI mailing lists and in OMPI issues. These include missing support for various MPI_Info arguments such as "add_hostfile", which will likely require some updates to PRRTE.

  • Decide what to do about legacy ORTE MCA params. These probably need to be detected and converted to their PRRTE equivalents.

  • Update PRRTE frameworks to use MCA params solely for setting default behavior, overridden on a per-job basis by user specifications.

  • [@jsquyres] Come up with a way for "ompi_info" to include PRRTE information

  • Resolve multi-mpirun connect/accept issues - do we auto-detect the presence of another DVM and launch within it, or do we launch a 2nd DVM and "connect" between them, or...?

  • Devise support for user obtaining an MPI "port", printing it out, and then feeding it to another mpirun on the cmd line for connect/accept
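As a rough illustration of the two MCA-parameter items above (converting legacy ORTE params, and treating MCA params only as defaults that a per-job user specification can override), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `orte_` and `prte_` prefixes, the rename table, and the function names are hypothetical and are not taken from the PRRTE code base.

```python
# Hypothetical sketch only: prefixes, rename table, and function names are
# assumptions for illustration, not PRRTE's actual conversion logic.

LEGACY_PREFIX = "orte_"  # assumed prefix of legacy ORTE MCA params
NEW_PREFIX = "prte_"     # assumed prefix of the PRRTE equivalents

# Assumed table for params whose names changed beyond the prefix.
EXPLICIT_RENAMES = {
    "orte_example_old_name": "prte_example_new_name",  # made-up entry
}


def convert_legacy_params(params):
    """Return (converted, renamed).

    converted: dict with legacy names mapped to their assumed new names.
    renamed:   list of (old_name, new_name) pairs, so the caller can warn.
    A setting already given under the new name wins over a legacy one.
    """
    converted = {}
    renamed = []
    for name, value in params.items():
        if name in EXPLICIT_RENAMES:
            new_name = EXPLICIT_RENAMES[name]
        elif name.startswith(LEGACY_PREFIX):
            new_name = NEW_PREFIX + name[len(LEGACY_PREFIX):]
        else:
            new_name = name
        if new_name != name:
            renamed.append((name, new_name))
            if new_name in params:
                # The user also set the new-style name; keep that value.
                continue
        converted[new_name] = value
    return converted, renamed


def resolve(name, mca_defaults, job_spec):
    """Per-job user specification overrides the MCA-provided default."""
    if name in job_spec:
        return job_spec[name]
    return mca_defaults.get(name)


if __name__ == "__main__":
    # Parameter names below are placeholders, not real MCA params.
    defaults, renamed = convert_legacy_params({"orte_some_param": "a"})
    for old, new in renamed:
        print(f"legacy MCA param {old} converted to {new}")
    print(resolve("prte_some_param", defaults, {"prte_some_param": "b"}))
```

The conversion returns the list of renamed params rather than printing directly, so the caller can decide whether to warn, stay silent, or reject legacy names outright, which is still an open question in the item above.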

Member

jjhursey commented Jan 14, 2020

Testing infrastructure:

  • (@jjhursey) Set up CI for PRRTE PRs that runs a defined set of tests across multiple nodes.

  • (@jjhursey) Define a basic set of PRRTE-specific runtime tests that can be run individually and by CI.

  • (@jjhursey) Set up CI for pmix-tests to verify new tests being proposed against PRRTE master.

    • This should be the same CI infrastructure that we use for PRRTE PR testing for consistency.
  • Create a "PMIx acceptance test" for Open MPI. This is a PMIx unit test that exercises the PMIx API/Attributes that Open MPI uses.

    • @jjhursey is taking the lead on this, but help is more than welcome.
  • Extend CI to cover multiple versions of PMIx

    • Need to cover at least master and latest v3.1.x

Contributor Author

rhc54 commented Mar 30, 2020

Pulled out into individual issues

rhc54 closed this as completed Mar 30, 2020
2 participants