WeeklyTelcon_20170221

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Artem Polyakov
  • Edgar Gabriel
  • Geoffroy Vallee
  • Howard
  • Josh Hursey
  • Josh Ladd
  • Nathan Hjelm
  • Ralph
  • Thomas Naughton
  • Todd Kordenbrock

Agenda

  • No plans for a v1.10.7
  • No plans for a v2.0.3
  • PMIx 1.2.1 will release today.
    • Nathan thinks he can test Open MPI master with both PMIx 1.2.1 and PMIx master.
    • Nathan is still concerned, still sees lots of scaling issues.
    • Josh: Release today. PR to Open MPI tomorrow, and have a few days for Nathan to test PMIx 1.2.1 and Master.
    • Ralph: We know Open MPI v2.1 won't scale that well; the memory scaling will be better, but scaling won't be as good as Open MPI v2.0.
      • Don't know how much better it will be with Direct Launch (still have memory scaling issue - doesn't use dstore, unless SLURM plugin uses it).
    • How do we message the PMIx 1.2.1 rev in the Open MPI v2.1 release? - Reduced memory footprint, but haven't fixed the launch time problem.
  • A bunch of PRs on 2.1.0 - Howard will merge in when he gets a chance.
  • Went through blocker list on v2.1.0
    • Nathan will try to get Issue 2106 in today - fail elegantly.
      • Probably not BSD specific and could show up elsewhere, so adding a graceful failure. Removed blocker.
    • Bcast Corruption in libnbc.
      • Hard to fix without packing. The problem is that it's picking a different algorithm on each side.
      • We have the workaround in, so not a blocker for v2.1. Removed blocker and moved to v3.0.
    • Missing a few F08 symbols in C mpi.h.
      • MPI says all constants are supposed to appear in all headers, regardless of language.
      • Not technically MPI 3.0 compliant without this. Removed blocker for v2.1.
    • So now the only two blockers are PMIx and the release checklist.
    • Do we have a code-complete date? No new features once PMIx is in, but bug fixes until we release?
      • We all want to get this out soon.
  • Proposal to accelerate: discussed skipping v2.2 and moving toward v3.0 as soon as v2.1.0 is out (soon!)
    • Proposal was to branch v3.0 "soon", and then release on June 15th.
    • The four month release cycle (off of master) may not be feasible until we have better CI.
      • CI only provides faster turnaround and prevents really bad code going into master.
    • Want to release what we test, but there are many features that we can't test.
      • There is value in guidelines, but it is dangerous to set this as a hard rule, because we don't want to kill things that people out there depend on us for.
      • We can ask the community to test release candidates.
    • Cutting a release branch enables vendors to begin their back-end testing. With the four month cycle, that's
    • We know there are things in Master we need, but probably don't want to back-port to v2.x
      • So probably want to branch v3.0 pretty soon.
      • MTT master doesn't look too bad, but some issues.
    • New proposal doesn't leave much time for new features wanted for v3.0:
    • Any new features needed for v3.0 from people on the call?
      • hooks framework, ugni btl
      • Put something out on the devel list.
    • New date-based approach allows us to ship v2.1 and push some non-regression type bugs back to the next release.
    • Write up an email for the devel email list announcing that we will branch v3.0 off of master on Feb 28th [Action: IBM]

MTT Dev status:


Exceptional topics

  • Renaming of external component symbols.
    • When we first started embedding things, they weren't available in distros (at least not at the levels we require).
    • PMIx - Not in RHEL5 or RHEL6; may be in an early adopter phase where we have to carry it with us.
    • libevent - Could use the alternative libev, but would HAVE to have a downstream fork.
    • hwloc - The configure test checks for hwloc 1.8 or newer because of a function introduced in 1.8, but it looks like we don't actually use that function. Without that requirement, most distros would already have a usable hwloc, and it wouldn't be an issue.
    • Would be nice to strip them out, and have the glue to make them work.
    • What would this look like?
      • If we get rid of internal component, we'd still have 1 or more external components to link against various external component libraries.
    • Can we really go back to an hwloc 1.7?
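To illustrate the kind of minimum-version check discussed for hwloc above, here is a minimal sketch in C. It is not the actual Open MPI configure test: `pack_version` and `version_at_least` are hypothetical helpers, and the packing `(major << 16) | (minor << 8)` is assumed to match the layout of hwloc's `HWLOC_API_VERSION` macro (for example, `0x00010800` for 1.8).

```c
#include <assert.h>

/* Hypothetical helper: pack a version as (major << 16) | (minor << 8).
 * Assumption: this mirrors the HWLOC_API_VERSION layout, so 1.8 -> 0x00010800. */
unsigned pack_version(unsigned major, unsigned minor)
{
    /* Each field must fit in one byte for the packing to be unambiguous. */
    assert(major < 256 && minor < 256);
    return (major << 16) | (minor << 8);
}

/* Hypothetical check: nonzero if the available version meets the minimum. */
int version_at_least(unsigned have, unsigned need)
{
    return have >= need;
}
```

Under these assumptions, lowering the minimum from 1.8 to 1.7 would only change the `need` argument, for example `version_at_least(have, pack_version(1, 7))`. Whether that is safe depends on whether any 1.8-only function is really unused.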

Status Updates:

Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM, Fujitsu

Back to 2017 WeeklyTelcon-2017
