
WeeklyTelcon_20230207


Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoffrey Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • David Bernholdt
  • Edgar Gabriel (AMD)
  • Howard Pritchard (LANL)
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Luke Robison (Amazon)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

Not here today, but kept here for easy cut-n-paste in the future.

  • Joseph Schuchart (UTK)

New Items

  • New - 32-bit support (64-bit came out 20 years ago)

    • Debian noticed that Open MPI fails to configure for 32-bit builds.
      • This breaks a bunch of other packages that depend on Open MPI and now can't be built.
    • But are there real users? Or just inertia?
      • Looks like inertia; for example, the Boost library could just turn off MPI support for 32-bit builds.
    • They're sticking with Open MPI v4.1.x for their immediate need.
    • Let's check back in a week on an estimate for 32-bit scoping.
      • We do have 32-bit testing that's turned off, so if we decide to test, it's easy to re-enable.
  • Issue #11347 Versioning is wrong in v5.0.x

    • We agreed v4.0.x -> v4.1.x -> v5.0.x should be ABI compatible.
      • Compile an MPI application with v4.0.x, then rm -rf the Open MPI installation, install v5.0.0 into the same location, and the application should just work (see the sketch after this list).
      • Did we figure out the Fortran ABI break?
        • From memory: yes, we did break the Fortran ABI.
        • We broke ABI in a very narrow case: when Fortran is compiled with 8-byte integers while C uses 4-byte ints.
        • Two other things may or may not break ABI.
        • We made some changes to intents and asyncs, and went from named interfaces to unnamed.
          • Unsure if this affects ABI.
      • For ABI purposes we mostly just care about C and mpif.h.
      • Fortran library has different .so versioning.
    • Blocker for next v5.0.0rc - get library versioning correct.
    • When we talk about ABI, Fortran will be nuanced.
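
As a minimal sketch of the swap scenario above (the file name is hypothetical; all MPI calls are standard MPI-3): build against v4.0.x, replace the installation with v5.0.0, and re-run the same binary without recompiling. MPI_Get_library_version reports which library actually got loaded.

```c
/* abi_swap_check.c - hypothetical name; sketch of the ABI scenario above.
 * 1. mpicc abi_swap_check.c -o abi_swap_check   (built against v4.0.x)
 * 2. rm -rf the Open MPI install; install v5.0.0 into the same prefix
 * 3. mpirun -n 2 ./abi_swap_check               (no recompile!)
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Reports the library found at run time; after step 2 this should
     * name v5.0.0 even though the binary was built against v4.0.x. */
    MPI_Get_library_version(version, &len);
    if (rank == 0) {
        printf("Loaded: %s\n", version);
    }

    MPI_Finalize();
    return 0;
}
```

Note this only exercises the C ABI; the narrow Fortran case above (8-byte Fortran integers against 4-byte C ints) needs separate verification.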
  • Comm Size Issue.

    • A bunch of new data was added for MPI Sessions.
    • But to honor ABI compatibility, we'll need to move some of that data out of the communicator structure and reference it through a pointer instead, to ensure the communicator structure's size doesn't change between v4.1.x and v5.0.0 (see the sketch after this list).
    • Issue #11373.
    • You will see a warning at load time.
    • Need to do some testing for some file handles.
      • Might be useful to do an ABI test (one that deliberately breaks every type).
      • Some GNU tools might help.
      • Might be able to rerun some tests.
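
A minimal sketch of the indirection pattern being described, using hypothetical names rather than Open MPI's actual ompi_communicator_t layout: the new Sessions-era data moves behind a pointer, so the communicator's size stays identical across releases.

```c
/* Hypothetical names; not Open MPI's actual definitions. */

/* New data lives in a separately allocated struct; fields can be
 * appended here in future releases without affecting the ABI. */
typedef struct {
    int session_id;
    /* ... more Sessions-related data ... */
} comm_extra_t;

/* The public structure keeps its v4.1.x size: instead of embedding
 * the new fields directly, it carries a single pointer to them. */
typedef struct {
    int           ref_count;
    int           rank;
    int           size;
    comm_extra_t *extra;   /* NULL when the extra data is unused */
} comm_t;
```

For the ABI test idea, libabigail's abidiff tool can diff two builds of a shared library and flag struct-size and symbol changes; that may be the sort of GNU tooling meant above.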

v4.1.x

  • Fixing a CUDA issue, due to a bad cherry-pick from earlier.
    • Reworking a PR; in progress.
  • Made a minor change for another rc. Trying to get the rc built.

v5.0.x

  • The RC from last week got pushed to this week.
    • Still waiting on https://github.com/open-mpi/ompi/issues/11354
    • Maybe related to the --enable-mca-dso option?
      • The accelerator framework initially picks CUDA and then disqualifies it, but at teardown it still tries to tear down CUDA.
        • The reason it does this is that CUDA now uses delayed startup, so the component will still be enabled.
        • There's another variable tracking whether CUDA was actually initialized.
      • Should also be present on main (despite a comment saying otherwise).
    • Howard said after the call that this isn't a blocker for rc10
  • CUDA framework issue #11354 - Howard is working on it.
    • If you disable building SM-CUDA, the problem doesn't occur.
    • With --enable-mca-dso, we don't see this.
    • Don't see it if the app initializes CUDA before MPI_Init (maybe).
    • It takes a number of factors lining up to see this.
    • If the application is actually using CUDA, then everything works.
    • The problem is when the app doesn't use CUDA, but sm-cuda initializes it anyway (even though the application doesn't need CUDA).
      • It calls into the framework to initialize CUDA.
      • At Finalize it makes calls into the accelerator framework and gets CUDA runtime errors.
    • We think we want SM-CUDA when running on a single node.
    • Was it just the IPC, or also something else? We believe it was the IPC stuff.
      • There's no IPC support in the accelerator framework, just a direct dependency on CUDA.
    • The coll/cuda collective component never directly uses CUDA buffers; it just checks and then memcopies into host memory.
      • All of this does use the accelerator framework.
      • These three components added a direct CUDA dependency because they call CUDA directly, instead of calling through the framework (see the sketch after this list):
        • btl/smcuda
        • rcache/rgpusm
        • rcache/gpusm
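
A minimal sketch of the guard this implies, with hypothetical names (not the actual accelerator component's symbols): once startup is lazy, "selected" no longer implies "initialized", so finalize must check a separate flag before making any CUDA calls.

```c
#include <stdbool.h>

/* Hypothetical state flags; not the component's real variables. */
static bool cuda_selected    = false;  /* component chosen at startup */
static bool cuda_initialized = false;  /* runtime actually brought up */

static void accel_cuda_lazy_init(void)
{
    if (cuda_selected && !cuda_initialized) {
        /* ... cuInit() / context creation would go here ... */
        cuda_initialized = true;
    }
}

static void accel_cuda_finalize(void)
{
    /* The bug pattern: tearing down whenever cuda_selected is true.
     * With lazy init, CUDA may never have been brought up, and the
     * teardown calls then fail with runtime errors at MPI_Finalize. */
    if (!cuda_initialized) {
        return;  /* nothing to tear down */
    }
    /* ... context / handle cleanup would go here ... */
    cuda_initialized = false;
}
```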
  • ROMIO isn't included in packaging properly.
    • Issue #11364 - Austen is taking a look; we might have missed something.
  • Waiting on PMIx and PRRTE submodule update.
    • Ralph pestered us to please merge it; it was just merged on main.
    • Merged, will make rc10
  • Need documentation for v5.0.0
  • Manpages need an audit before release.
    • Double check --prefix behavior
    • Not the same behavior as v4.1.x
  • What is status of HAN?
    • The priority-bump PR for HAN, #11362, went to main; we need one for v5.0.x.
    • Joseph pushed a bunch of data, but he's not on the call. Go read it.
    • Joseph ran some more experiments. With the HAN collective component plus the shared memory PR, we were pretty good compared to tuned and another component.
      • Comparing HAN with the shared-memory component.
      • How many ppr? Between 2 ppr and 64 ppr.
    • Better numbers; it would be good to document this.
      • In the OSU benchmarks there's always a barrier before the operation. If the barrier and the operation match up well, you get lower latency.
      • We'd talked about supplying some docs about how HAN is great, and why we're enabling it for v5.0.0 by default.
        • We'd like to include instructions for users on how to reproduce the results as well.
        • document in ECP -
      • Our current resolution is to enable it as is, and fix current regressions in future releases.
      • What else is needed to enable it by default?
        • Just need to flip a switch (see the sketch after this list).
        • The module that Joseph has for shared memory in HAN at the moment would need some work to add additional collectives.
        • And it relies on XPMEM being available.
        • So for now, just enable HAN for the collectives we have, and later enable it for other collectives.
        • George would like to reuse what tuned does without reimplementing everything; a shared-memory component is the better choice, but needs more work.
        • If we don't enable HAN by default now, it's v5.1 (best case) before it's enabled.
          • The trade-offs lean toward turning it on and fixing whatever problems might be there.
        • There is a PR for tuned that increases the default segment size and changes tuned's algorithms for shared memory.
        • We need to start moving forward, rather than doing more analysis.
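
As a user-side sketch of "flipping the switch" (the release itself would change HAN's default priority in the source instead): Open MPI reads MCA parameters from OMPI_MCA_-prefixed environment variables, so raising the HAN collective component's priority before MPI_Init lets it win selection. The exact parameter name coll_han_priority and the value used here are assumptions.

```c
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    /* Assumed parameter name; must be set before MPI_Init so the
     * coll framework sees it when selecting components. */
    setenv("OMPI_MCA_coll_han_priority", "35", 1);

    MPI_Init(&argc, &argv);

    /* Collectives on multi-node communicators can now be served by
     * HAN instead of tuned, priority permitting. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```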

Main branch

Administration Topics
