
WeeklyTelcon_20170509

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Artem Polyakov
  • David Bernholdt
  • Edgar
  • Geoffroy Vallee
  • Howard
  • Joshua Ladd
  • Mohan (Amazon)
  • Nathan Hjelm
  • Ralph

Agenda

1.10.7

  • Ralph has all PRs in except PR2247. Once that is in, the 1.10.7 RC will roll today.
  • All set to go last week, but got two down-stream issues from packagers.
    • 3450 - opal_fifo test was stalling on SUSE. PR3468 resolved this in master, and it is important to bring the fix into v2.0.x, v2.1.x and v3.x.
  • Two other PRs on v2.1.1 filed in the last 24 hours. Seem relatively small corner cases, but want to cut it now.
  • v2.0.x is in same situation (a bunch is backed up). Main hold up is sequencing of 4 release branches.
  • Brian sent out email Friday, about branching - think we misnamed the branch as v3.x rather than v3.0.x
    • PRs that are outstanding Friday night against v3.x will need to be refiled against v3.0.x Monday morning.
    • Logistically this would involve creating the new v3.0.x branch from where v3.x is, and deleting the v3.x branch. This will invalidate any outstanding PR.
  • Thanks to all who've added testing of v3.x branch.
  • PMIx - configuring with PMIx v1 or v2 works, but an "out of the box" configuration does not work with SLURM's PMIx plugins.
    • PMIx shared memory is broken (regression from PMIx v1.2).
    • Artem - as of Friday, the fix is in testing, so it might be done by the end of this week.
  • MTTs look pretty good overall.
    • Two things are adding some noise.
      • Absoft tests.
      • Cisco's running some new tests, with big failure rate, but it's trying to run tests that weren't built.
    • And many failures are still OSHMEM related, so only 12 tests failing in Cisco results.
  • Some confusion about which DAY of the month to release on (1st, 30th)?
    • Last Face 2 Face discussed the 15th as the release date.
  • Would people be okay doing a release candidate without the fix for the PMIx regression?
    • With a restriction that launching under SLURM 16.05 configured with PMIx will fail.
    • Fallback would be to configure Open MPI with an external PMIx, using the PMIx that SLURM was configured with.
  • Shoot to have a release candidate on the 23rd with new PMIx changes in.
  • Should discuss at the developers' meeting whether we should merge instead of cherry-pick.

  • When a PR is under discussion, please file it ONLY against master; only after the discussion has completed, file PRs against the other branches. Multiple PRs for "the same" content confuse the discussion.
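The branch rename discussed above (create v3.0.x from where v3.x is, then delete v3.x) could be sketched as follows. This is a minimal illustration and not the procedure the release managers actually ran: the helper names `rename_release_branch_commands` and `run` are hypothetical, and the remote name `origin` is an assumption. The script only prints the git commands unless `dry_run` is turned off.

```python
import subprocess
from typing import List


def rename_release_branch_commands(
    old: str, new: str, remote: str = "origin"
) -> List[List[str]]:
    """Return git commands that recreate `old` as `new` on `remote`, then delete `old`.

    Hypothetical helper for illustration; `remote` defaults to "origin" by assumption.
    """
    return [
        # Refresh remote-tracking refs so the new branch starts at the current tip.
        ["git", "fetch", remote],
        # Create the new branch on the remote, pointing at the old branch's tip.
        ["git", "push", remote, f"refs/remotes/{remote}/{old}:refs/heads/{new}"],
        # Delete the old branch. Per the notes above, this invalidates
        # any PR still outstanding against it.
        ["git", "push", remote, "--delete", old],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: prints the three commands without touching any repository.
    run(rename_release_branch_commands("v3.x", "v3.0.x"))
```

Because the old branch disappears in the last step, any PR still open against v3.x has to be refiled against v3.0.x, which matches the Monday-morning refiling mentioned above.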

MTT Dev status:

  • Made a change to allow ONLY Open MPI community members to trigger build / test of PRs, to prevent possible malicious commits in PRs.

Exceptional topics

  • Security FYI.
    • When you create a PR from a fork to a repo, GitHub opens up the branch permissions for all folks with write access to the REPO.
      • Anyone can even rebase your private fork's branch.
    • GitHub assumes that projects have code-czar "maintainers" who may want to modify your private forks.
    • When you create a PR, there is a check-box off to the side "allow edits by maintainers to modify PR".
      • Most users assume that this check-box only refers to the PR title and comments, not the actual content.
    • As a policy, we agree that good community behavior is to NOT do this, and to talk to the person before changing anything in their fork.
    • Before this announcement, we thought that forks were always private... but not once you create a PR.
  • Request for folks to conclude discussion about PR2941
    • Jeff will refresh and comment on it.
  • Face2Face -
    • Date: July 11-13
      • Nathan can come to Chicago, but not Dallas. IBM can send more folks to Dallas, but not Chicago.
      • Consensus was for Chicago (book flights to O'Hare).
    • Cisco has space in Chicago (Amazon does too if that falls through).

Status Updates:

  • Geoffroy Vallee - Starting to run MTT on Open MPI v3.x and master on Oak Ridge systems.
    • Running into a few issues with PGI and LSF.
    • Josh Hursey (IBM) is helping.

Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM, Fujitsu

