
Meeting 2018 03


Open MPI Developer's Face-to-Face Meeting

A PMIx meeting is also piggybacked onto this meeting to reduce travel.

Overall dates: March 20-23, 2018 (see the schedule below)

  • Open MPI
    • Tue, Mar 20: 9am-6pm
    • Wed, Mar 21: 9am-noon
    • Thu, Mar 22: 9am-noon
  • PMIx
    • Thu, Mar 22: noon-6pm
    • Fri, Mar 23: 9am-noon
  • ORTE
    • Wed, Mar 21: noon-6pm

Dinner

  • Thursday:
    • TBD

Location:

  • IBM Dallas Innovation Center web site
    • Same facility as previous Open MPI face-to-face IBM/Dallas meetings
    • Street address: 1177 South Beltline Road, Coppell, Texas 75019 USA
    • Enter through the East entrance (closest to Beltline Road)
      • An IBMer will escort you to Room A1010.
      • The receptionist should have name tags for everyone.
      • Foreign nationals are welcome.
  • Surrounding hotels with shuttle service (re-verified February 2018):
    • These three hotels offer shuttle service to/from both the IBM site and DFW International Airport:
      • Sheraton Grand Hotel (972-929-8400) (5 mile shuttle) - 4440 West John W Carpenter Freeway, Irving, TX 75063, USA
      • Holiday Inn Express (972-929-4499) (3 mile shuttle) - 4550 West John W Carpenter Freeway, Irving, TX 75063, USA
      • Hampton Inn (972-471-5000) (3 mile shuttle) - 1750 TX-121, Grapevine, TX 76051, USA
    • https://www.mapcustomizer.com/map/IBM%20IIC%20and%20Hotels%20-%20Map2?utm_source=share&utm_medium=email
    • NOTE: The hotels are listed in no particular order; we simply stopped calling once we found three with shuttle service to both DFW and the IBM site.

Attendees

Add your name below if you plan to attend:

  1. Ralph Castain (Intel)
  2. Jeff Squyres (Cisco)
  3. Geoff Paulsen (IBM)
  4. Josh Hursey (IBM)
  5. Mark Allen (IBM)
  6. George Bosilca (UTK)
  7. Brice Goglin (Inria)
  8. Howard Pritchard (LANL)
  9. Shinji Sumimoto (Fujitsu)
  10. Takahiro Kawashima (Fujitsu)
  11. Matthew Dosanjh (Sandia National Laboratories)
  12. Edgar Gabriel (UH)
  13. Arm Patinyasakdikul (UTK)
  14. Dong Zhong (UTK)
  15. Josh Ladd (Mellanox) [Phone]
  16. Brian Barrett (AWS)
  17. Geoffroy Vallee (ORNL)
  18. Xin Zhao (Mellanox)

Remote attendance

A Webex link for joining remotely will be posted on the day of the meetings.

Topics to discuss

PMIx things

  1. Deeper utilization of PMIx
    Our integration strategy for PMIx so far has been simple replacement - i.e., we made no code path or logic changes, but simply replaced RTE-related calls with their PMIx equivalents. Perhaps it is time to step back and take a fresh look at how we can exploit PMIx. For example, we currently engage in potentially multiple negotiation steps to determine a new communicator ID and ensure it is globally unique - could we instead utilize the PMIx_Connect function (which returns a globally unique nspace identifier)? A hedged sketch of the idea follows this list.
    See https://github.com/open-mpi/ompi/issues/4542 for some initial thoughts.
    See https://github.com/open-mpi/ompi/issues/4542 for some initial thoughts.
  2. Backward compatibility concerns
    • Need to start testing cross-version support in both PMIx and OMPI
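
A minimal sketch of the PMIx_Connect idea from item 1, assuming the PMIx 2.x client API (PMIx_Init, PMIx_Connect, and PMIx_Finalize exist with these signatures). The proc list shown and the use of the connect operation as a communicator-ID substitute are illustrative assumptions, not Open MPI's actual communicator-creation code.

```c
/* Hedged sketch only: shows the shape of a PMIx_Connect-based exchange.
 * The communicator-creation wiring is hypothetical. */
#include <stdio.h>
#include <string.h>
#include <pmix.h>

int main(void)
{
    pmix_proc_t myproc;
    pmix_status_t rc;

    if (PMIX_SUCCESS != (rc = PMIx_Init(&myproc, NULL, 0))) {
        fprintf(stderr, "PMIx_Init failed: %s\n", PMIx_Error_string(rc));
        return 1;
    }

    /* For illustration, connect all ranks in our own nspace.  In the
     * communicator-creation case, this array would instead list exactly
     * the procs that are members of the new communicator. */
    pmix_proc_t procs[1];
    PMIX_PROC_CONSTRUCT(&procs[0]);
    (void)strncpy(procs[0].nspace, myproc.nspace, PMIX_MAX_NSLEN);
    procs[0].rank = PMIX_RANK_WILDCARD;

    /* Collective across the listed procs; the PMIx server coordinates it,
     * so no extra negotiation rounds are needed on the OMPI side.  Per the
     * note above, the operation yields a globally unique nspace that could
     * stand in for today's negotiated communicator ID. */
    rc = PMIx_Connect(procs, 1, NULL, 0);
    if (PMIX_SUCCESS != rc) {
        fprintf(stderr, "PMIx_Connect failed: %s\n", PMIx_Error_string(rc));
    }

    PMIx_Finalize(NULL, 0);
    return 0;
}
```

The attraction is that the coordination happens once, server-side, inside PMIx_Connect, rather than in multiple OMPI-level negotiation rounds.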

Everything else

Done

  1. Trivial: add GitHub contributor guidelines
  2. https://www.open-mpi.org/papers/ page:
    • Listing of academic papers gets sparse after 2007.
    • One possibility:
      • Remove listing of all academic papers from the main page (except the one seminal paper that we ask everyone to cite)
      • Keep the actual pages of all those papers, just in case there are links to them elsewhere
      • Only list BOF slides / OMPI-project-specific pages
    • ...other suggestions?
  3. The memkind mpool component needs a maintainer - the APIs it calls have been deprecated.
    • LANL will maintain this. Howard is currently working on a simplified variant.
    • Are the APIs really deprecated? There's a January 2018 release listed at http://memkind.github.io/memkind/
  4. One sided / osc_rdma updates
  5. More Jenkins testing:
    • Absoft running in EC2
      • Have license -- will install when possible.
    • NAG running in EC2?
      • No reply since March 5; NAG is probably not interested.
  6. Webpage updates
    • What needs updates, and what is good enough? Let's take on some of the work.
  7. MPIR deprecation warning
    • Add to NEWS?
    • Output when attached?
    • Replaced
  8. What to do about unsupported platforms (e.g., in the context of POWER 7/BE)
    • Alastair Mc. made some good points about just letting "unsupported" platforms build while clearly telling people THIS IS UNSUPPORTED: https://github.com/open-mpi/ompi/issues/4349#issuecomment-364382688
    • E.g., should we re-enable POWER BE under this nomenclature? We don't know that it's broken - we just know that we disabled it in v2.0.x and v2.1.x... for some reason.
  9. HWLOC
    • Inventory collection - get GIDs, etc.?
    • Planning the upgrade to v2.0
    • Keep maintaining the embedded (internal) copy?
  10. libevent replacement
    • remove embedded?
  11. Software-based performance counters (touches a lot of code) (not Tuesday if possible)
  12. PMIx integration
    • remove embedded?
    • forward-version compatibility in components to support packagers
  13. Now that PMIx is a stable, standalone project, is it time to talk again about separating the MCA into a separate library? (This is a shade different from making OPAL a standalone library.) Yes, there are many challenges with this... but is it time to figure them out?
  14. Check padding on MPI predefined objects to ensure adequate room for the lifetime of the 4.0 series
  15. An envar version of --allow-run-as-root for the container folks who keep complaining about it?
  16. Default binding policy considering #4799.
  17. Open MPI papers page: converted to "slides and presentations"
  18. Improve Jenkins reliability
    • We have regular problems with the Jenkins testers yielding false positives (e.g., full disks). These failures sometimes occur during inconvenient times such as on weekends or USA holidays when people are not available to fix them. This leaves non-USA developers (and others working on their own time) with no recourse.
    • Could/should we provide a bot to repair identifiable problems?
    • Training / documentation could help bring in more people to help.
    • Other options?
  19. Status of coll/sm component
  20. old PR Round-up
  21. open Issue Round-up
  22. ORTE discussion
    • ThomasN wants to participate remotely via Webex (Wednesday afternoon before 6pm ET works for him, but he will follow the crowd for other times)
  23. Re-evaluate compiler support for Open MPI v4.0.0 (drop older than gcc 5 support, etc)
  24. When multiple threads call MPI_Comm_dup simultaneously, we barf.
  25. MPI_Init Connectivity Map (IBM)
  26. Spark-MPI-TensorFlow (not Tuesday if possible)
    • Ralph will provide a presentation describing what has been done, if there is interest
    • Initiate discussion on possible MPI Sessions role
  27. PMIx Stuff
    • Ralph will give a presentation about all the new PMIx functionality (e.g., PMIx debuggers, etc.)
    • Multiple ext components built against a common external installation
      • Looks feasible
      • Jeff must look at the libtool c:r:a (current:revision:age) versioning issue
    • reCAPTCHA for the Webex meeting info
      • Ralph, Jeff
  28. OSHMEM status. [Slides]
  29. Fujitsu's status [Slides]
    • Persistent collective operations
    • MTT run on SPARC
    • Other development status
  30. MTT update - status of the Python-based client, server, and viewer
    • Walkthrough of how to move from the Perl client to the Python client for Open MPI testing (Howard?)
  31. Dealing with long-standing issues
  32. Encourage people to use "unset" MCA params (vs. sentinel values).
  33. How do we expose MCA params in component packages such as PMIx?
    • Do we need some kind of "registration" API that ompi_info can call to harvest them?
  34. Endpoint management (Ralph, Jeff, Howard)
    • How do we handle multiple libraries/plugins creating libfabric endpoints when "instant on" provides a single endpoint?
    • Can we define a single rendezvous connection point for each proc, and then exchange endpoint-specific info via that? (A rough sketch follows this list.)
    • Does that require an endpoint-manager plugin for OFI?
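
To make the rendezvous idea in item 34 concrete, here is a rough sketch using PMIx 2.x calls that exist with these signatures (PMIx_Put, PMIx_Commit, PMIx_Get). The key name "ompi.rdvz.uri" and the URI format are purely hypothetical; the point is only that a single published connection point per proc could bootstrap all later endpoint-specific exchanges, instead of every library/plugin publishing its own endpoint.

```c
/* Hedged sketch: one published rendezvous URI per proc, fetched on demand.
 * Key name and URI format are hypothetical. */
#include <stdio.h>
#include <string.h>
#include <pmix.h>

/* Publish this proc's single rendezvous connection point. */
static pmix_status_t publish_rendezvous(const char *uri)
{
    pmix_value_t val;
    pmix_status_t rc;

    PMIX_VALUE_CONSTRUCT(&val);
    val.type = PMIX_STRING;
    val.data.string = (char *)uri;   /* e.g., "tcp://10.0.0.1:7777" (illustrative) */

    rc = PMIx_Put(PMIX_GLOBAL, "ompi.rdvz.uri", &val);
    if (PMIX_SUCCESS == rc) {
        rc = PMIx_Commit();          /* make it visible to peers */
    }
    return rc;
}

/* Fetch a peer's rendezvous point; per-plugin endpoint details would then
 * be negotiated in-band over that one connection. */
static pmix_status_t lookup_rendezvous(const pmix_proc_t *peer, char **uri)
{
    pmix_value_t *val = NULL;
    pmix_status_t rc;

    rc = PMIx_Get(peer, "ompi.rdvz.uri", NULL, 0, &val);
    if (PMIX_SUCCESS == rc && NULL != val && PMIX_STRING == val->type) {
        *uri = strdup(val->data.string);
        PMIX_VALUE_RELEASE(val);
    }
    return rc;
}
```

Under this model the "instant on" single endpoint is the only thing the runtime needs to provide; whether that still requires an endpoint-manager plugin for OFI is the open question above.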

Presentation Material
