WeeklyTelcon_20190903

Geoffrey Paulsen edited this page Sep 3, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Brendan Cunningham (Intel)
  • Edgar Gabriel (UH)
  • Erik Zeiske
  • Geoffrey Paulsen (IBM)
  • Harumi Kuno
  • Jeff Squyres (Cisco)
  • Ralph Castain (Intel)
  • Todd Kordenbrock (Sandia)
  • Howard Pritchard (LANL)
  • Tom Naughton

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Harumi Kuno (HPE)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Intel)
  • Artem Polyakov (Mellanox)
  • Brandon Yates (Intel)
  • Josh Hursey (IBM)
  • Brian Barrett (AWS)
  • David Bernhold (ORNL)
  • George Bosilca (UTK)
  • Joshua Ladd (Mellanox)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Thomas Naughton (ORNL)
  • Xin Zhao (Mellanox)
  • mohan (AWS)

Agenda/New Business

Discussed xpmem / CMA

  • PR6844 - Want to test if this affects containers.
    • Worth asking the question, but no one sees a reason not to take this.
    • Jeff will review and add comments.
    • Howard will do some testing and talk to Charliecloud.

Infrastructure

Process enforcement bots

  • No update (Brian on vacation)

Submodule prototype

  • Merged --recurse-submodules update into ompi-scripts Jenkins script as first step. Let's see if that works.

Release Branches

Review v3.0.x Milestones v3.0.4

  • No new news

Review v3.1.x Milestones v3.1.4

  • PR6556 and PR6621 should go to the v3.x release branches.
  • No new news

Review v4.0.x Milestones v4.0.2

  • Still have some issues; we expect to still have to do an rc2, e.g., https://github.com/open-mpi/ompi/issues/6932.

  • Discuss Issue 6568 - large messages overwhelm put

    • This SHOULD stay as a blocker, since it ends in a hang.
    • We need to look for a workaround.
      • Could disable put completely.
      • Could use an opal_unlikely check of message-size, and only then kick it back if the message size is too large.
    • OB1 tries put / get, and if these don't work, it falls back to send/recv?
    • Possibly a flaw in put itself.
    • Jeff will ask George what a viable workaround would be, and identify one.
      • Not signing up to implement.
  • PR6942 - ready to merge.

  • https://github.com/open-mpi/ompi/issues/6949 - Geoff (and others please review)

  • MTT failures in Generic Simple unpack on v4.0.x - segfaults, assertions.

    • DDT-unpack assertion on v4.0.x
  • See older weekly notes for prior items.
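
Sketch of the message-size check floated for Issue 6568 above. This is only an illustration of the idea, not OB1 code: the names `sketch_choose_proto`, `sketch_proto_t`, `SKETCH_UNLIKELY`, and the `max_put_size` limit are all made up for this note, and the macro is a local stand-in for the opal_unlikely hint. The assumption is that oversized messages are rare, so the branch is marked unlikely and only then is the message kicked back to send/recv.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the branch hint mentioned in the notes (assumption:
 * not Open MPI's own macro). Falls back to a plain test on other compilers. */
#if defined(__GNUC__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

/* Hypothetical protocol choices for one message. */
typedef enum {
    SKETCH_PROTO_PUT,
    SKETCH_PROTO_SEND_RECV
} sketch_proto_t;

/* Hypothetical helper: use put unless the message exceeds max_put_size,
 * in which case kick it back to send/recv. */
static sketch_proto_t sketch_choose_proto(size_t msg_size, size_t max_put_size)
{
    assert(max_put_size > 0);

    /* Rare path: message too large for put. */
    if (SKETCH_UNLIKELY(msg_size > max_put_size)) {
        return SKETCH_PROTO_SEND_RECV;
    }
    return SKETCH_PROTO_PUT;
}
```

With a limit of 4096 bytes, for example, `sketch_choose_proto(1024, 4096)` picks put and `sketch_choose_proto(4097, 4096)` picks send/recv. What the real limit should be, and where such a check would live, is still the open question for George.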

Review Master Pull Requests

  • IBM's PGI test has NEVER worked. Is it a real issue or local to IBM?
  • NVIDIA bought PGI, perhaps someone there could take a look?
    • Akshay said he'd talk to a PGI person at NVIDIA to see.
  • Edgar mentioned that Mark Allen should rebase PR6756 and get that in to resolve an issue another customer is seeing.

CI status

  • Cray running into problems again. :frown:
    • Back on track.

v5.0.0


Dependencies

PMIx Update

ORTE/PRRTE


Next face to face

MTT

  • IBM has to triage some failures on master and v4.0.x and some test build issues. Josh Hursey thought they might be accidentally mixing XLC and PGI compilers. Will investigate.
  • Cisco has a build failure to investigate.

Back to 2019 WeeklyTelcon-2019
