
WeeklyTelcon_20230530

Geoffrey Paulsen edited this page Jul 25, 2023 · 2 revisions

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Webex)

  • Tommy Janjusic (NVIDIA)
  • Jeff Squyres (Cisco)
  • Luke Robison (Amazon)
  • Thomas Huber
  • Todd Kordenbrock
  • Amir (ORNL)
  • Howard Pritchard (LANL)
  • Ralph Castain
  • David Bernholdt (ORNL)

New Issues:

  • #11722 - Cannot build and install with out-of-source (VPATH) builds

    • Possible blocker; the submodule pointers need to be updated.
  • #11726 -N bind ppr:X:node, map by package (socket), or core

v4.1

  • No updates

v5.0

Current issues:

  • PMIx v4.2 async modex issue: https://github.com/openpmix/openpmix/issues/3077

    • Workaround: -x PMIX_MCA_gds=hash, or enable opal_pmix_collect_all_data.
    • Need to raise the timeout: fix this in OMPI before the PMIx_Get call, scaling the timeout with job size and allowing a user override.
    • The original issue is likely caused by a missing additional async modex variable in ompi_pml_base_check_pml.
    • The new parameter for v5.0.x MUST be documented.
  • MCA parameter issues are now the biggest issues; no new updates.

  • Need to cherry-pick NIC selection (distances PR fixes) to v5.0.x

    • Several PRs will go into main, including Coverity fixes.
    • Amir will open a v5.0.x PR to track all main commits, and cherry-pick them to v5.0.x when finished.
    • Pending review.
    • Will create an initial v5.0.x PR as a pre-PR for the NIC selection; needs review.
  • UCX and --enable-mca-dso do not mix: https://github.com/open-mpi/ompi/issues/11632

  • Issue #11532: mca_base_param_files option is no longer read

    • PMIx command-line parsing fix: the first stage is complete, and the next stage should be fixed over the next few days.
  • PR #11681 (Propagate the error from callback): fixes a legitimate bug (fixed by George), but introduces a behavior change; needs community review.
