
WeeklyTelcon_20210119


Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Akshay Venkatesh (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Edgar Gabriel (UH)
  • Geoffrey Paulsen (IBM)
  • George Bosilca (UTK)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Joshua Ladd (nVidia/Mellanox)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Thomas Naughton III (ORNL)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

  • Artem Polyakov (nVidia/Mellanox)
  • Barrett, Brian (AWS)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • Harumi Kuno (HPE)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Tomislav Janjusic
  • Xin Zhao (nVidia/Mellanox)

Web-Ex

  • link has changed for 2021. Please see email from Jeff Squyres to devel-core@lists.open-mpi.org on 12/15/2020 for the new link

4.0.x

  • Flux fix in master + UCX PML. Commit merged.
  • SLURM_WHOLE issue; want to stay in sync with OMPI v4.1.x.
    • Revert "v4.0.x: Update Slurm launch support"
    • Want consensus with the v4.1 branch.
    • Running into lots of problems trying to srun a job with SLURM (20.11.{0,1,2})
      • With or without this patch, still seeing some other issues.
      • Not just confined to OMPI.
      • Ralph has been advising users to either downgrade or upgrade.
    • Ralph is suggesting we revert this and advise users not to use those versions of SLURM (see the sketch after this list).
      • SLURM has posted a 20.11.3 tarball that reverts the offending changes.
    • Reverting in OMPI v4.0 and v4.1; Ralph is reverting in PRRTE.
  • For the v4.0 release, would like to take a one-off ROMIO fix rather than pulling in a whole new ROMIO:
    • https://github.com/open-mpi/ompi/pull/8370 - Fixes HDF5 on Lustre
    • Proposing to take this one-off for v4.0.6, as a whole new ROMIO is a big change.
    • Can ask Rob Latham (the ROMIO author) for his advice on taking this in the middle of a release train.
      • How disruptive would this be?
      • Case-by-case basis.
      • Geoff will email him and ask.
    • Waiting on v4.0.6rc2 until we get an answer.
  • Discussed https://github.com/open-mpi/ompi/issues/8321
    • Howard is trying to reproduce; another user is also having difficulty reproducing.
    • Could affect UCX in a VM - possible silent error.
    • Added blocker label.
    • Present in v4.0.x and master, though the root cause might be down in UCX.
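For reference on the Slurm item above: the launch problems stem from the step-allocation behavior Slurm changed in 20.11.{0,1,2}, and (as I understand it) the reverted patch simply set SLURM_WHOLE on the user's behalf. A hedged sketch of the equivalent user-side workaround, assuming that environment variable is honored by srun in those releases:

```sh
# Applies only to Slurm 20.11.0-20.11.2; 20.11.3 reverts the offending change.
# Either upgrade/downgrade Slurm (Ralph's advice), or ask srun for the old
# whole-allocation behavior explicitly:
export SLURM_WHOLE=1              # assumption: env var honored by srun in 20.11.x
srun --mpi=pmix -N 2 -n 80 ./my_mpi_app
```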

v4.1

  • Flux fix in master + UCX PML.
  • Issue 8367 (packaging) - Hessam and Josh will take it to the UCX community.
  • Issue 8334 - a performance regression with AVX-512 on Skylake. Still digging into it.
    • The next generation of processors doesn't hit this issue.
      • This shows up as an issue in LAMMPS.
    • A simple MCA parameter on v4.1 removes this code path (see the command sketch after this list).
    • George had a PR to not use these instructions by default until we can do something better.
    • Raghu tested; AVX-512 seems to make it slower.
    • Papers show that anything after AVX2 throttles down cores and has this effect.
    • The conservative approach is to disable the AVX enhancement by default.
      • PR 8176 disables this optimization by default.
      • Would like to merge to master (at least temporarily), then PR to v4.1.
  • Issue 8379 - UCT appears to be the default and not UCX
    • UCT one-sided issue.
      • Everyone on the call thought the UCT BTL was disabled by default.
    • Looks like a bug has crept in? Perhaps selection is not picking UCX correctly due to versioning?
      • He's using UCX 1.9, but UCT is specifically disallowed on anything > 1.8 (see the command sketch after this list).
  • Big performance regression in Open-MPI v4.1.0 in VAST
    • PR 8123
    • Brought in PMIx v3.2.1 as internal PMIx
      • from PMIx v3.1.5
      • Wasn't brought into master as normal, due to submodules.
      • Started bisecting v3.2.x
        • Properly support direct modex (dmodex) PR on PMIx.
    • What is the default for preconnect?
      • If we turn preconnect-all to true, this resolves the performance regression (see the command sketch after this list).
      • 32 nodes x 40 ppn - 80 seconds of wireup.
      • Is direct modex the default?
    • Is it the auto-selection if PML is not specified?
    • Looks like Default in OMPI v4.1.0 was changed to Direct-Modex from Full-Modex
      • Came in while bringing this up to the PMIx standard; it shouldn't have changed the default.
      • Not knowing the PMLs might have caused each node to do a direct modex with everyone else.
      • PML direct modex was fixed in master, but not sure if it was taken back to OMPI v4.1.
    • nVidia is proposing we revert the Direct-Modex default change
    • Ralph will make a default change in next few hours.
    • But will need an OMPI v4.1 fix soon.
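Command-line sketches for the three workarounds discussed above (hedged; component and parameter names are to the best of my recollection and not verified against the v4.1 branch):

```sh
# Issue 8334: take the AVX-512 code path out of the picture by excluding the
# "avx" op component (PR 8176 makes this the default instead).
mpirun --mca op ^avx -np 64 ./reduce_bench

# Issue 8379: explicitly exclude the uct BTL and force UCX for pt2pt/one-sided.
mpirun --mca btl ^uct --mca pml ucx --mca osc ucx -np 64 ./osc_test

# PR 8123 regression: forcing full wireup at MPI_Init hides the direct-modex
# slowdown (the 80 s at 32 nodes x 40 ppn reported above).
mpirun --mca mpi_preconnect_mpi 1 -np 1280 ./app
```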

Open-MPI v5.0

What's the state of ULFM (PR 7740) for v5.0?

  • Does the community want this ULFM PR 7740 for OMPI v5.0? If so, we need a PRRTE v3.0
    • Aurelien will rebase.
    • Works with the PRRTE referred to by the ompi master submodule pointer.
    • Currently used in a bunch of places.
    • Run normal regression tests. Should not see any performance regressions.
    • When this works, can provide other tests.
    • It is a configure flag. The default is to configure it in, but it is disabled at runtime (see the sketch after this list).
      • A number of things must be set to enable it.
      • Aurelien is working to get it down to a single parameter.
    • Let's get some code reviews done.
      • Look at intersections with the core, and ensure that the non-ULFM paths are "clean".
    • There is also a downstream effect on PMIx and PRRTE.
    • Let's put a deadline on reviews; say in 4 weeks we'll push the merge button.
      • Jan 26th we'll merge if no issues
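A rough sketch of the build/run model being described, assuming the --with-ft configure spelling (exact option and parameter names were still being settled at the time of this call):

```sh
# ULFM is compiled in by default but stays disabled at run time.
./configure --with-ft=ulfm --prefix=$HOME/ompi-ulfm   # assumed spelling of the FT flag
make -j 8 install

# Enabling fault tolerance at run time currently requires setting several MCA
# parameters; Aurelien is working to collapse these into a single one.
```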

Josh and George removed Checkpoint Restart

  • Modified ABI - removed one callback/member function, used for FT events, from some components (BTLs/PMLs).
    • This touches the structures for all of these components.
    • Pending the outcome of this discussion.
    • Going to version the frameworks that are affected.
    • Not that simple in practice, because usually we just return a pointer to a static object.
      • But that isn't possible anymore.
      • We don't support multiple versions.
  • Do we think we should allow Open-MPI v5.0 to run with MCA components from past versions?
    • Maybe good to protect against it?
    • Unless we know of someone who needs this, we shouldn't bend over backwards for it.
    • Josh thinks the container community is experimenting with this.
  • Josh has advised that Open-MPI doesn't guarantee this works.
  • v5.0 is advertised as an ABI break.
  • In this case, the framework doesn't exist anymore.
  • George will add a check to ensure we're not loading MCA components from an earlier version.

Jeff Squyres wants the v5.0 RMs to generate a list of the versions it'll support, to document.

  • Still need to coordinate on this. He'd like it this week.

  • PMIx v4.0 working on Tools, hopefully done soon.

    • The tool work goes through the PMIx Python bindings.
    • A new shmem component to replace the old one.
    • Still working on it.
  • Dave Wooten pushed up some PRRTE patches, and is making some progress there.

    • Slow but steady progress.
    • Once tool work is more stabilized on PMIx v4.0, will add some tool tests to CI.
    • Probably won't start until first of the year.
  • How are the submodule reference updates going on Open-MPI master? (See the sketch after this list.)

    • Probably be switching OMPI master to master PMIx in next few weeks.
      • PR 8319 - this failed. Should it be closed and a new one created?
    • Josh was still looking at adding some cross-checking CI.
    • When making a PRRTE PR, you could add a comment to the PR and it would trigger Open-MPI CI with that PR.
  • v4.0 PMIx and PRRTE master.

    • When PRRTE branches a v2.0 branch, we can switch to that then, but that'll
  • Two different drivers:

    • OFI MTL
    • HFI support
    • Interest in PRRTE in a release, and a few other things that are already in v4.1.x
    • HAN and ADAPT as default.
    • Amazon is helping with testing and other resources.
    • Amazon is also investing, contracting Ralph to help get PRRTE up to speed.
  • Other features in PMIx

    • can set GPU affinities, can query GPU info
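For reference on the submodule-pointer question above, a generic sketch of what a reference bump on ompi master looks like (the 3rd-party/openpmix path is an assumption about the tree layout):

```sh
# Bump the PMIx submodule pointer and open a PR (as was attempted in PR 8319).
git clone --recurse-submodules https://github.com/open-mpi/ompi.git && cd ompi
cd 3rd-party/openpmix                      # assumed submodule path
git fetch origin && git checkout <tested-pmix-sha>
cd ../..
git add 3rd-party/openpmix
git commit -m "Update PMIx submodule pointer"
```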

Longer Term discussions

ROMIO Long Term (12/8)

  • What do we want to do about ROMIO in general?
    • OMPIO is the default everywhere.
    • Gilles is saying the changes we made are integration changes.
      • There have been some OMPI specific changes put into ROMIO, meaning upstream maintainers refuse to help us with it.
      • We may be able to work with upstream to make a clear API between the two.
    • As a 3rd-party package, should we move it up to the 3rd-party packaging area, to be clear that we shouldn't make changes to this area?
  • Need to look at the treematch thing, an upstream package that is now inside of Open-MPI.
  • Might want a CI bot to watch a set of files, and flag PRs that violate principles like this (see the sketch after this list).
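A minimal sketch of the kind of check being suggested, run in CI against a PR's diff (the ROMIO path prefix is an assumption about where the imported package lives in the tree):

```sh
# Flag PRs that touch the imported ROMIO (or similarly vendored) sources.
if git diff --name-only origin/master...HEAD | grep -Eq '^ompi/mca/io/romio'; then
  echo "WARNING: this PR modifies the vendored ROMIO package; keep such changes upstreamable."
fi
```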

Doc update

  • PR 8329 - convert README, HACKING, and possibly Manpages to restructured text.
    • Uses https://www.sphinx-doc.org/en/master/ (a Python tool; can pip install it - see the sketch after this list).
    • Has a build from this PR, so we can see what it looks like.
    • Have a look. It's a different approach: one document that's the whole thing.
      • FAQ, README, HACKING.
  • Do people even use manpages anymore? Do we need/want them in our tarballs?
  • Putting new tests there
  • Very little there so far, but working on adding some more.
  • Should have some new Sessions tests
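A quick hedged sketch of trying the Sphinx toolchain mentioned above (source/output directory names are assumptions, not necessarily what PR 8329 uses):

```sh
pip install sphinx
# Render the reStructuredText sources to HTML...
sphinx-build -b html docs/ docs/_build/html
# ...and the same sources to man pages, if we decide to keep shipping them.
sphinx-build -b man docs/ docs/_build/man
```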

What's going to be the state of the SM Cuda BTL and CUDA support in v5.0?

  • What's the general state? Any known issues?

  • AWS would like to get an answer on this.

  • Josh Ladd will take it internally to see what they have to say.

  • From nVidia/Mellanox: CUDA support is through UCX; SM Cuda isn't tested that much (see the sketch at the end of this section).

  • Hessam Mirsadeghi - all CUDA awareness is through UCX.

  • May ask George Bosilca about this.

  • Don't want to remove a BTL if someone is interested in it.

  • UCX also supports TCP via CUDA

  • PRRTE CLI on v5.0 will have some GPU functionality that Ralph is working on

  • Update 11/17/2020

    • UTK is interested in this BTL, and maybe others.
    • There's still a gap in the MTL use case.
    • nVidia is not maintaining SMCuda anymore. All CUDA support will be through UCX
    • What's the state of the shared memory in the BTL?
      • This is the really old generation Shared Memory. Older than Vader.
    • Were told that after a certain point there would be no more development in SM Cuda.
    • One option might be to
    • Another option might be to bring the shared-memory support in SM Cuda over to Vader (now SM).
  • Discussion on:

    • Draft request to make static builds the default: https://github.com/open-mpi/ompi/pull/8132
    • One con is that many providers hard-link against their libraries, which would then make libmpi dependent on them.
    • Non-homogeneous clusters (GPUs on some nodes, and none on others).
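As context for the CUDA discussion above, a hedged sketch of running CUDA-aware traffic through UCX rather than the SM Cuda BTL (the UCX_TLS list is an assumption about a typical IB + GPU setup):

```sh
# Force the UCX PML and let UCX move GPU buffers (cuda_copy/cuda_ipc/gdr_copy).
export UCX_TLS=rc,sm,cuda_copy,cuda_ipc,gdr_copy
mpirun -np 2 --mca pml ucx -x UCX_TLS ./osu_latency D D   # OSU latency, device buffers
```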

Video Presentation

  • New: George and Jeff are leading.
  • One for Open-MPI and one for PMIx
  • In a month and a half or so. George will send a date to Jeff.