WeeklyTelcon_20201123

Austen Lauria edited this page Nov 24, 2020 · 12 revisions

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Webex)

  • Jeff Squyres (Cisco)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (NVIDIA)
  • Austen Lauria (IBM)
  • Howard Pritchard (LANL)
  • Ralph Castain (Intel)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)
  • Brendan Cunningham (Intel)
  • Raghu Raja (AWS)
  • Naughton III, Thomas (ORNL)
  • Michael Heinz (Intel)
  • Matthew Dosanjh (Sandia)
  • David Bernholdt (ORNL)

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)

  • Aurelien Bouteiller (UTK)

  • Christoph Niethammer (HLRS)

  • Edgar Gabriel (UH)

  • George Bosilca (UTK)

  • Joseph Schuchart (HLRS)

  • Josh Hursey (IBM)

  • Noah Evans (Sandia)

  • Geoffrey Paulsen (IBM)

  • Joshua Ladd (nVidia/Mellanox)

  • Artem Polyakov (nVidia/Mellanox)

  • Tomislav Janjusic (nVidia/Mellanox)

  • Brandon Yates (Intel)

  • Charles Shereda (LLNL)

  • David Bernholdt (ORNL)

  • Erik Zeiske (HPE)

  • Geoffroy Vallee (ARM)

  • Mark Allen (IBM)

  • Matias Cabral (Intel)

  • Nathan Hjelm (Google)

  • Scott Breyer (Sandia?)

  • Shintaro Iwasaki

  • Xin Zhao (nVidia/Mellanox)

  • Mohan (AWS)

Release Branches

Review v4.0.x Milestones v4.0.5

  • No v4.0 rc this week.

Issue #8246: ROMIO/Lustre

  • Thought 4.0.x was on track for an rc, but the RMs now want a better understanding of the Lustre problem. Need to refresh ROMIO to 3.3.2. Lots of changes between 3.3 and 3.3.2.
    • Pretty large delta for a release branch.
    • Want to get a better understanding of what's going on before another rc.
    • It may be that the right thing to do is put this on 4.1.x instead of 4.0.x.
  • Thinking of putting all unit tests for ROMIO into IBM folder. It might help catch this issue earlier.
  • This is the highest priority for the RMs. Howard will start testing the new ROMIO this week to see if it fixes the issue.

Issue #8217: Memory leaks

  • Do we have a PR on this?
    • Asked creator of ticket - we don't think he created a PR yet.
  • Would be easy for us to create the PR.
    • Howard/Geoff Paulsen will try to do this patch next week.
  • Almost all of this patch will apply to 4.1.x as well.

Issue #8252:

  • Thomas Naughton found an issue with UCX in the OSU benchmark. Issue opened.

Review v4.1.x Milestones v4.1.0

  • Was close to an rc. Had the tarballs ready, but #8246 is holding it up now.
    • If upgrading ROMIO is part of the solution, it is best to put it in now.
  • Other than that, the RMs believe they have everything ready for an rc.
  • Going to go ahead and release an rc anyway, so please test it!
    • Nothing is lost if the RMs do another rc with the new ROMIO.
  • Ralph is still getting a flood of warnings on v4.1.x.
    • Jeff Squyres will take a look again.

Review v5.0.0 Milestones v5.0.0

  • No updates from the RMs. Haven't met in a couple of weeks due to conflicting schedules.
  • Ralph has updated PMIx/PRRTE pointers.

Master

MTT master failures:

  • MTT compile failures with Clang.
  • Invalid window failures.
    • Jeff Squyres will ask Nathan Hjelm. These are happening because OSC pt2pt is gone.
  • Attribute tests reporting an invalid communicator.
  • Other than that, MTT looks pretty clean on master.

Other misc issues

  • Jeff: Docs issue
    • Sphinx / ReadTheDocs / RST going well. READMEs done. Working on FAQ. Man pages will come later (waiting for students to finish their part).
    • Doing some minor restructuring.
      • We could really use a definitive list in the README section (i.e., near the top of the docs) about:
        • What Operating Systems are supported
        • What Network stacks are supported
        • What versions of 3rd-party libraries are supported:
          • PMIx
          • PRRTE
          • hwloc
          • libevent
  • Jeff/George: State of the State Of the Union
  • Howard: ROMIO issue: some problem with UCX...?
  • Howard: some other smaller random issues