WeeklyTelcon_20220927

Geoffrey Paulsen edited this page Oct 4, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Akshay Venkatesh (NVIDIA)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • Josh Fisher (Cornelis Networks)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tommy Janjusic (nVidia)
  • William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Erik Zeiske
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Howard Pritchard (LANL)
  • Jan (Sandia -ULT support in Open MPI)
  • Jeff Squyres (Cisco)
  • Jingyin Tang
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Xin Zhao (nVidia)

v4.1.x

  • Multiple weeks on CVE from nvidia.
  • libevent CVE checkers - If we patch out the CVE issues (in libevent code that we're not using)
    • One of the two warnings goes away.
    • The other is just looking at libevent version numbers, so CVE issue does NOT go away.
    • Updating libevent in v4.x is painful, not painful in v5.x
      • Not updating libevent version in v4.1.x
  • v4.1.5
    • Schedule: targeting ~6 months (targeting November, so first RC in the next week or two)

v5.0.x

  • Austen posted a PR to update PRRTE+PMIx 10858

    • Git commit checker is hung. Merged new updates he posted.
    • In prep for new RC tomorrow.
  • Going to meet with Ralph at 3pm Eastern today to discuss regarding

  • Thomas had a todo to look into mpirun passing DVM piece.

    • Still trying to figure out if we could do everything we want from prun.
    • Right thing might be to pass DVM_URI to mpirun.
    • Happy to join in, would simplify all stuff you get with OMPI personality.
  • Discuss mca_base_env_list https://github.com/open-mpi/ompi/pull/10788

  • Discuss Remaining PRRTE CLI issues (https://github.com/open-mpi/ompi/issues/10698)

    • -N: document that it is an error if it conflicts with --map-by.
    • --show-progress: used to print the little "..." on the terminal; now it doesn't do anything.
      • DOE may set this by default in MCA parameters (makes some users feel happy)
    • --display-topo Generally we've tried to be backwards compatible.
    • -v version
    • -V verbose
    • -s|--preload-binary <- functionally it works, but with -n gets messed up
    • rankfile <- NOT deprecating
    • --mca is Open MPI's framework
    • No gprtemca. Created by PRRTE, but do we continue to support --gpmixmca?
    • --test-suicide and others are all PRRTE daemon options, not exposed to users.
      • passed to prrte launcher
  • Posted Issue Open-MPI #10698 with about 13 issues that will need

  • No longer trust the verbiage here, based on Ralph's comment

    • Not recognized by mpirun, but cited in --help.
    • Some of these aren't possible??? and mpirun -> prterun (one shot thing)
  • Should mpirun be able to talk to an existing dvm???

    • Or is it always a 1 shot thing?
    • If we have it talk to an existing DVM,
    • prte to startup prteds, and pruns at that.
    • If you're using MPI front-end, and want to interact with DVM, how should we tell users to do that?
      • What should they do?
      • Go through mpirun, or go through prun (with ompi personality?)
    • Thomas can look and see if you can get everything you need.
    • There were some common things that were difficult when switching between the two.
    • Was there an option for this in v4.1?
      • Yes, but perhaps wasn't working much.
      • Are there legacy command line options that we should support or alias?
  • Are we dropping DVM support for v5?

    • How did this work in v4?
    • Howard thought you fired up an orte something, and that would provide a command line
    • Couldn't do all of this with mpirun, it was a two stage process.
    • Had to start DVM manually, and got back a URI
      • But thought if you sourced this schizo and gave it a URI, it would do all of the right things.
    • Could add support if the user fired up using PRTE the DVM, and got URI back.
      • Don't have ompi-dvm executable in v5, so this is already a deviation.
    • What do we do?
      1. Support the same CLI options (and executables, etc.) as documented for v4.x
      2. Don't support at all in v5, and if you want to do DVM things
      3. Maybe something in the middle.
    • Does anyone care about DVM?
    • Can we run ompi_schizo / personality with vanilla prun?
      • Some people on call DO care about DVM.
    • Early days of Sessions needed a DVM run (no longer needed in main/v5)
  • Usually if customers are interested in doing this, they're willing to do a bit more work.

    • But if we want to get v5.0.0 out in near future, it'd be more likely if we
    • Thomas gets a lot of use with mini-task, some are MPI parallel.
      • This is where DVM is useful, because they are slamming lots of serial and parallel jobs in a short time.
      • If they can do this via prun and still get ompi_schizo, the path doesn't matter.
      • Thomas will investigate proper options.
    • Could do a CLI interface for mpirun in a future version to have mpirun not call prterun
      • Don't want to rush this.
  • Schedule:

    • PMIx and PRRTE changes coming at end of August.
      • PMIx v3.2 released.
      • Try to have bugfixes PRed by end of August, to give time to iterate and merge.
    • Still using Critical v5.0.x Issues (https://github.com/open-mpi/ompi/projects/3) yesterday
  • Docs

    • mpirun --help is OUT OF DATE.
      • Have to do this relatively quickly, before PRRTE releases.
      • Austen, Geoff and Tomi will be
      • REASON for this is that the mpirun command line is in PRRTE.
  • mpirun manpage needs to be re-written.

    • Docs are online and can be updated asynchronously.
    • Jeff posted PR to document runpath vs rpath
      • Our configure checks some linker flags, but there might be default in linker or in system that really governs what happens.
  • Symbol Pollution - Need an issue posted.

    • OPAL_DECLSPEC - Do we have docs on this?
      • No. Intent is where do you want a symbol available?
        • Outside of your library, then use OPAL_DECLSPEC (like Windows DECLSPEC)
        • I want you to export this symbol.
    • need to clean up as much as possible.
    • From the Open MPI community's perspective, our ABI is just the MPI_ symbols.
    • Still unfortunate. We need to clean up as much as possible.
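
Since there are no docs on OPAL_DECLSPEC yet, here is a minimal sketch of the general export-marker pattern the notes describe (mark only the symbols you want visible outside your library). This is NOT Open MPI's actual definition: MYLIB_DECLSPEC, mylib_public_add, and mylib_internal_double are hypothetical names, and the sketch assumes a GCC/Clang-style compiler building the library with -fvisibility=hidden (or MSVC with dllexport).

```c
/* Hypothetical export marker, analogous in spirit to OPAL_DECLSPEC.
 * Assumption: the library is built with -fvisibility=hidden on GCC/Clang,
 * so only symbols carrying the marker are exported from the shared object. */
#if defined(_WIN32)
#  define MYLIB_DECLSPEC __declspec(dllexport)
#elif defined(__GNUC__)
#  define MYLIB_DECLSPEC __attribute__((visibility("default")))
#else
#  define MYLIB_DECLSPEC
#endif

/* Internal helper: no marker, so it stays hidden under -fvisibility=hidden
 * and does not pollute the library's exported symbol table. */
int mylib_internal_double(int x)
{
    return 2 * x;
}

/* Public entry point: the marker asks the toolchain to export this symbol. */
MYLIB_DECLSPEC int mylib_public_add(int a, int b)
{
    return mylib_internal_double(a) + b;
}
```

On Linux, something like `nm -D --defined-only` on the resulting shared library is one way to check which symbols are actually exported.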

Main branch

  • Community CI Jenkins had some errors last week.
    • Needed to upgrade to Java (11 or 17?) from Java 8, and that caused some subtle issues.
    • Cisco student to upgrade or replace some now-deprecated Jenkins plugins to improve the stability/performance of Jenkins.

Accelerator framework

  • Bulk of the work is merged. Some follow up patches, etc.
    • Then once this is done, will backport to v5.0.x

MTT

Administrative tasks

Face-to-face
