Meeting 2015 06

Tomislav Janjusic edited this page Jan 6, 2024 · 1 revision

June 2015 OMPI Developer's Meeting

This is a standalone meeting; it is not being held in conjunction with an MPI Forum meeting.

Logistics

Doodle for choosing the date: http://doodle.com/4arc4ciiby2ve222

  • Date: 9am Tuesday, June 23 through 3pm Thursday, June 25
  • Location:
    • Cisco Building 1, 3850 Zanker Road, San Jose, California 95134
    • Tuesday: Mt. Everest conference room
    • Wednesday: Mt. Everest conference room
    • Thursday: Mt. Aetna conference room

Link to January meeting wiki notes.

Attendees

Local attendees:

  1. Jeff Squyres - Cisco
  2. Ralph Castain - Intel
  3. Nathan Hjelm - LANL
  4. Dave Solt - IBM
  5. Dave Goodell - Cisco (may not be there for the whole meeting)
  6. Howard Pritchard - LANL
  7. Shinji Sumimoto - Fujitsu
  8. George Bosilca - UTK
  9. Devendar Bureddy - Mellanox
  10. Jithin Jose - Intel
  11. Yohann Burette - Intel
  12. Rolf vandeVaart - NVIDIA
  13. Andrew Friedley - Intel

Topics to discuss

  • ORTE process name change - replace jobid with namespace

Results

  • Version number scheme / roadmap announcement to users
    • Jeff's slides as pptx
    • Jeff's slides as PDF
    • Endless discussion; slides cleaned up. The major decision was to revise the definition of backward compatibility to be "binary compatible + CLI + MCA params"
    • v1.10.0 will not meet this definition relative to v1.8 series
    • NEWS will contain list of CLI and MCA param changes
  • Plan for v1.10.x
    • Feature-complete once PSM2 PR is committed
    • Mellanox may have some PowerPC contributions, more as bug-fixes/optimizations
  • MPI 3.1 - are we ready?
    • Jeff is blocker - Fortran changes
    • Edgar - need to know plans for non-blocking IO
    • Nathan - looking at MPICH Generalized request code to see if we can bring it into OMPI to enable ROMIO 3.1 support
    • Howard has wiki page tracking MPI-3 compliance - reviewed all the 3.0 errata and 3.1 outstanding tickets and assigned them to people
  • Open MPI legacy code: can we chop off support for some older systems?
    • E.g., can we force users to use compilers with <stdbool.h>? (has implications in opal_config/opal_config_bottom.h)
    • C99 requires that stdbool.h exist, so this check is stale and can be removed
    • Nathan points out that other headers and types are in this category, so we need to scrub the entire configure system to remove extraneous header and type checks (full list TBD)
  • Git / github usage. How's it going? What's going well / not well? What can we improve on?
    • One idea: should everything on master be a pull request (just to get smoke testing on a variety of systems)? NOTE: NOT advocating a code czar -- anyone can still push the merge button.
    • Jenkins usage on PRs.
    • LANL work on a Jenkins aggregator for Github.
    • Other Github webhooks that might be useful?
    • Anything else we want to tweak?
    • Cisco et al will work on completing the Jenkins aggregator project and improving both throughput and reliability of the service
    • We strongly encourage developers to use PRs to bring changes into master. Once the testing support has been improved, we may change this policy to a requirement
    • We will look at the possibility of pulling all outstanding master PRs into an integrated tarball and running it through MTT on a nightly basis
  • coll ml discussion - get rid of this completely?
  • Re-introducing Microsoft Windows support (http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/)
    • IBM to look at what would be required
    • Requested that IBM provide a Jenkins-like tester so we can know when we break it
    • Probably want a hand-coded .project file, plus a way to skip building components that cannot work under Windows (either by providing no CMake file or by some other mechanism)
    • Windows support removed at open-mpi/ompi@a4b6fb241fe0bdf082431e8a380c1a1ab8b25799
  • collectives and CID allocation
    • George will push a commit to improve CID allocation algorithm
    • George is going to look at reviving the hier coll component and compare its performance to coll/ml
  • Fujitsu Development Status and Some Topics towards Next MPI development
  • revisit mtl one-sided support
    • Decided that we will extend the MTL interface to add one-sided APIs
    • Nathan will provide an RFC of the revised APIs
  • libfabric support
    • getting used in multiple places within the code
    • will add opal/mca/common/libfabric to centralize some of the functions
  • Can we add new environment variable CUDA_AWARE_SUPPORT and also create info key on MPI_COMM_WORLD for runtime detection?
    • use MPI_T to access a read-only control variable
    • add the "macro" as an extension; need to work out the configure logic so it gets built whenever --enable-cuda is specified
    • Ralph volunteered to help Rolf out by creating the extension directory and creating the required configure logic
  • "Instant On" status and planning
    • Async add procs
      • Nathan reports BTL support is ready, waiting on a couple more BTLs to be updated
      • A cutoff param dictates whether or not the new path is used: smaller jobs will still do add_procs for comm_world, larger jobs will use the async add_procs mode.
      • MTL support needs to be further investigated. No showstoppers apparent, but CM may need some work. Mellanox is looking at the MXM/Yalla changes. Nathan will test the PSM and PSM2 components to see how they respond to multiple calls to add_procs.
    • Removal of the rte_barrier at the end of MPI_Init should be doable. Ralph to check with Jeff to verify - Jeff may require it. If he does, we will use the "disconnect barrier" flag to indicate that this barrier should be run. Fujitsu will check their BTL, but thinks it should be okay
    • Removal of rte_barrier in MPI disconnect is doable IF all procs are using same BTLs (create another MCA param to indicate homogeneity) AND all those BTLs don't need it (BTL to indicate in some way, or just use the MCA param and user beware)
    • Nathan proposes another BTL flag (BTL_FLAG_HOMOGENEOUS) to indicate that we can assume any active BTL can reach all other procs in the job. This lets us avoid some code paths and so reduce setup time for comm_world. We may need multiple flags to give finer granularity on what can and cannot be supported in this mode. This is a first step towards a more general use-case for other communicators. Definitely "user-beware", as there is no way to tell users they made a mistake.
    • Direct modex support - in PMIx, integration with ORTE underway
    • Distributed mapping - on Ralph's branch, waiting to stabilize/update once PMIx integration done and then will commit
  • Plan for v2.x
  • Coverity update
    • Nathan has reduced the OPAL defect level to a small number
    • Ralph will start work on the ORTE level
  • Thread multiple support status: waiting for 3 things
  • PMIx integration and API extension for PMIx v2.0
    • v1.0.0 released and being integrated into OMPI
    • request input and participation on 2.0 definition
  • 1.8.6 issues
    • situation wrt coll/ml vs coll/hcol
      • Devendar will work within Mellanox to tell us whether their fix to libhcol will adequately address the problem or whether we need an immediate 1.8.7 release
    • ompi_free_list segfaults
      • Ralph will fix the hetero mapping issue first, then we can see if this was causing the problem due to some memory corruption
  • Review action items from Jan meeting
    • Everything looks done, except for that slacker Ralph's AR
      • Error response propagation (e.g., BTL error propagation up from OPAL into ORTE and OMPI, particularly in the presence of async progress).
        • Create opal_errhandler registration, call that function with errcode and remote process involved (if applicable) when encountering error that cannot be propagated upward (e.g., async progress thread)
        • Ralph will move the orte_event_base + progress thread down to OPAL
        • Ralph will provide opal_errhandler registration and callback mechanism
        • Ralph will integrate the PMIx progress thread into the OPAL one
        • opal_event_base priority reservations:
          • error handler (top)
          • next 4 levels for BTLs
          • lowest 3 levels for ORTE/RTE
  • 2.x pruning:
    • Anything people want to delete from 2.x branch for 2.0.0 release?
    • Fault tolerance removal
    • ...?
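
The opal_event_base priority reservations recorded under the January action items (error handler on top, the next 4 levels for BTLs, the lowest 3 levels for ORTE/RTE) can be pictured with a small sketch. Everything named `sketch_*` / `SKETCH_*` is hypothetical and not taken from the OMPI tree. The sketch also assumes that the reservations cover all levels (1 + 4 + 3 = 8 in total) and that lower numbers mean higher priority, as in libevent.

```c
#include <assert.h>

/* Hypothetical layout: the minutes give only the ordering and the counts.
 * Assumption: priority 0 is the most urgent, following libevent's convention. */
enum {
    SKETCH_PRI_ERRHANDLER = 0, /* top level: error handler */
    SKETCH_PRI_BTL_FIRST  = 1, /* next 4 levels: BTLs */
    SKETCH_PRI_BTL_LAST   = 4,
    SKETCH_PRI_RTE_FIRST  = 5, /* lowest 3 levels: ORTE/RTE */
    SKETCH_PRI_RTE_LAST   = 7,
    SKETCH_NUM_PRIORITIES = 8  /* assumed total: 1 + 4 + 3 */
};

typedef enum {
    SKETCH_OWNER_ERRHANDLER,
    SKETCH_OWNER_BTL,
    SKETCH_OWNER_RTE
} sketch_owner_t;

/* Map a priority level to the subsystem it is reserved for. */
static sketch_owner_t sketch_priority_owner(int pri)
{
    assert(pri >= 0 && pri < SKETCH_NUM_PRIORITIES);
    if (pri == SKETCH_PRI_ERRHANDLER) {
        return SKETCH_OWNER_ERRHANDLER;
    }
    if (pri <= SKETCH_PRI_BTL_LAST) {
        return SKETCH_OWNER_BTL;
    }
    return SKETCH_OWNER_RTE;
}
```

With this layout, a call such as `sketch_priority_owner(3)` falls in the BTL range, while levels 5 through 7 belong to the runtime.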

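The two "Instant On" decisions discussed above (the cutoff that selects the async add_procs path, and the conditions under which the rte_barrier in MPI disconnect can be skipped) boil down to simple predicates. The sketch below is only an illustration: the function names are hypothetical, the `>=` comparison against the cutoff is an assumption (the minutes do not say whether the boundary is inclusive), and in practice the inputs would come from MCA params and BTL flags rather than function arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: larger jobs take the async add_procs path.
 * Assumption: a job exactly at the cutoff counts as "large". */
static bool sketch_use_async_add_procs(size_t job_size, size_t cutoff)
{
    return job_size >= cutoff;
}

/* Hypothetical helper: the disconnect barrier can be skipped only if all
 * procs use the same BTLs AND none of those BTLs needs the barrier. */
static bool sketch_disconnect_barrier_needed(bool homogeneous_btls,
                                             const bool *btl_needs_barrier,
                                             size_t num_btls)
{
    assert(btl_needs_barrier != NULL || num_btls == 0);
    if (!homogeneous_btls) {
        return true; /* mixed BTLs: keep the barrier */
    }
    for (size_t i = 0; i < num_btls; i++) {
        if (btl_needs_barrier[i]) {
            return true; /* at least one BTL still relies on it */
        }
    }
    return false;
}
```

As the notes stress, this is "user beware": if the homogeneity input is wrong, nothing in this logic can detect it.
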
Presentation Material

  • ...fill in content here