Future of prrte and openpmix at and beyond Open MPI 5.0.x

Howard Pritchard edited this page Jul 25, 2024 · 16 revisions

Two different paths:

  1. Open MPI main branch (i.e., what will presumably become v6.0.x)
  2. Open MPI v5.0.x branch

Below, we'll go through each separately.

Open MPI main branch (i.e., what will presumably become v6.0.x)

  • Point to Open MPI fork of PRRTE
  • Maintain a different version number (in https://github.com/open-mpi/ompi/pull/12449, just add ompi to the greek field in the VERSION file)
  • The goal of this fork will be stability -- not keeping up with all new features in upstream PRRTE.
    • I.e., if we take commits from upstream/community PRRTE, it should probably be with the goal of fixing bugs -- probably (but not necessarily always) not with the goal of adding new features.
  • We can cherry-pick commits from the upstream PRRTE repo if desired. As time goes on, taking individual commits from upstream PRRTE may turn more into back-ports than cherry-picks (i.e., it may involve some porting effort if/when upstream PRRTE diverges from the OMPI PRRTE).
  • The idea is that the OMPI-ized PRRTE becomes an implementation detail for Open MPI
    • We'll change our docs to discuss PRRTE differently -- the upstream PRRTE will be another external launcher (like SLURM) that can be used
    • There will be no --with-prrte=DIR configure CLI option. (https://github.com/open-mpi/ompi/issues/12684)
    • We'll never build against an external PRRTE
    • We'll always build our internal OMPI PRRTE
    • Open MPI's mpirun will always use our internal OMPI PRRTE
    • Any bugs that are found are viewed as bugs in mpirun -- the fact that it used a fork of the community PRRTE is just an implementation detail. It will be up to us / the OMPI community to fix the problem (not the PRRTE community). We have options on how to fix such things: e.g.,
      • fix them in the upstream/community and cherry-pick or back-port the change back to the OMPI PRRTE, or
      • fix them directly in the OMPI PRRTE
  • Over time, we can improve the integration of the OMPI PRRTE into OMPI, for example (but not limited to):
    • Move the code out of the open-mpi/prrte repo into the main open-mpi/ompi repo (i.e., eventually, there will be no need for a git submodule)
    • Move it out of the 3rd-party tree (https://github.com/open-mpi/prrte/issues/13)
    • No longer build unnecessary PRRTE executables and/or libraries, e.g., prun, psched, and prte_info (https://github.com/open-mpi/prrte/issues/8)
      • Might be a good idea to do this sooner rather than later (it's also pretty easy to do)
    • Rename all PRRTE executables and/or libraries so that we don't conflict with an actual PRRTE install. (https://github.com/open-mpi/prrte/issues/10)
    • Make it so mpirun doesn't have to exec a 2nd executable (https://github.com/open-mpi/ompi/issues/12712)
    • Integrate the PRRTE docs/man pages into OMPI docs/man pages
    • Remove schizo and only have an OMPI personality (https://github.com/open-mpi/prrte/issues/9)
    • Remove other parts of PRRTE that Open MPI is not using
    • Deprecate PRRTE MCA params
      • Probably need to alias them for at least a few versions so that they keep working for a while
      • But we should eventually get rid of them (v7.0.x?)
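
The "Deprecate PRRTE MCA params" item above describes keeping old parameter names working as aliases for a few versions. A minimal sketch of that idea, in Python purely for illustration: the parameter names and helper functions below are hypothetical and are not real MCA parameters or Open MPI APIs.

```python
import warnings

# Hypothetical mapping from an old PRRTE-style parameter name to a new name.
# Neither name is a real MCA parameter; both exist only for this sketch.
DEPRECATED_ALIASES = {
    "prte_example_timeout": "ompi_launcher_example_timeout",
}


def resolve_param(name, aliases=DEPRECATED_ALIASES):
    """Return the current name for a parameter, warning if an alias was used."""
    if name in aliases:
        new_name = aliases[name]
        warnings.warn(
            f"MCA parameter '{name}' is deprecated; use '{new_name}'",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    return name


def normalize_settings(settings, aliases=DEPRECATED_ALIASES):
    """Rewrite a dict of user settings so that only current names remain.

    If both the old and the new name are given, the new name wins.
    """
    result = {}
    for name, value in settings.items():
        current = resolve_param(name, aliases)
        if current in result and name != current:
            continue  # an explicit new-style setting is already present
        result[current] = value
    return result
```

In this sketch, old names keep working but emit a warning, so removing the alias table later (e.g., in v7.0.x, as suggested above) is the only step needed to drop them.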

Open MPI v5.0.x branch

  • Note: The PRRTE submodule in the OMPI v5.0.x branch is currently tracking the upstream PRRTE v3.0 branch
  • We can re-orient the Open MPI PRRTE submodule to point to the OMPI PRRTE, on the same v3.0 branch
  • We can change the version number on the OMPI PRRTE v3.0 branch (e.g., add ompi to the greek field in VERSION) to make it clear that this is the OMPI PRRTE
  • NOTE: There will never be a standalone OMPI PRRTE v3.0.x release. All releases will always be through the main Open MPI release.
  • At any time / as we see fit, we can cherry-pick any commits that end up on the upstream PRRTE v3.0 branch down to the OMPI fork
  • This actually frees us to release Open MPI v5.0.x at any time, because we control the version number of the PRRTE that it ships with -- i.e., that "PRRTE" is uniquely identified. We don't have to bug Ralph for a release.
  • We don't know exactly how the community PRRTE v3.0.x series is going to progress -- i.e., how many more commits and/or releases there will be on it.
  • It is unlikely that we will ever do anything unique on the OMPI PRRTE v3.0 branch: its purpose is to act as a staging ground and unique version identifier to be bundled in Open MPI v5.0.x releases
  • If activity stops on the upstream PRRTE v3.0 branch, we can put unique bug fix commits on the OMPI PRRTE v3.0 branch (i.e., that aren't on the upstream PRRTE v3.0 branch).
  • The hope is that the Open MPI v5.0.x series will be replaced by the v6.0.x series in the "near future" so that we move forward and the maintenance of the OMPI PRRTE v3.0 branch becomes less of an issue over time
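
The version-number idea above (adding ompi to the greek field in VERSION) can be checked mechanically. The following is a minimal sketch, assuming the VERSION file uses simple key=value lines with major, minor, release, and greek fields; the sample contents and helper names are hypothetical and only illustrate the check.

```python
def parse_version_fields(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_ompi_prrte(fields):
    """Return True when the greek field carries the 'ompi' marker."""
    return "ompi" in fields.get("greek", "")


def version_string(fields):
    """Join major.minor.release and append the greek suffix, if any."""
    base = ".".join(fields.get(k, "0") for k in ("major", "minor", "release"))
    return base + fields.get("greek", "")


# Hypothetical file contents, for illustration only (not real release numbers).
SAMPLE_FORK = "major=3\nminor=0\nrelease=7\ngreek=ompi\n"
SAMPLE_UPSTREAM = "major=3\nminor=0\nrelease=7\ngreek=\n"

if __name__ == "__main__":
    for label, text in (("fork", SAMPLE_FORK), ("upstream", SAMPLE_UPSTREAM)):
        fields = parse_version_fields(text)
        print(label, version_string(fields), is_ompi_prrte(fields))
```

A check like this could, for example, be used in CI to confirm that the submodule points at the OMPI PRRTE rather than the upstream one; that use is a suggestion, not something the plan above requires.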

Comments

  • Q: Would it be useful to explicitly change the package name for the forked "ompi prrte" version so that it never escapes and causes confusion? I'm thinking specifically about a change to AC_INIT that would result in something like "prrte-oe" (PRRTE Open MPI Edition) or the like. This could be in addition to the change to the greek field of the version.
    • This might also help for executables that we need to install for RTE info, e.g., prte_info would become prte-oe_info.

    • [Jeff] FWIW, I'm not too worried about this. As PRRTE becomes an internal implementation detail, I think it'll get harder to actually extract it and use it separately. As we fold the code back into Open MPI, it'll cease to be a separate entity, anyway.

      Meaning: I think that adding ompi to the greek in PRRTE's VERSION file is sufficient to differentiate this PRRTE from the upstream PRRTE. At least in the beginning, we'll really just be shipping a slightly patched version of PRRTE, and the ompi designation in the version number is sufficient, IMHO.

      That being said, I won't object if someone wants to do a more significant rename, but I'm not sure it's worth the effort.
