
WeeklyTelcon_20160524

Jeff Squyres edited this page Nov 18, 2016 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres
  • Brad Benton
  • George
  • Howard
  • Joshua Ladd
  • Nathan Hjelm
  • Nysal
  • Ralph
  • Sylvain Jeaugey
  • Todd Kordenbrock

Agenda

Review 1.10

  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
  • 5 or 6 bug reports appeared over the weekend; most were knocked out, and we're down to 2:
    • 1175 - changes the way we compute space for datatype arguments. Needs someone to review it.
    • Everything failed on NVIDIA's cluster for 1.10.3. Something might have gone wrong on the cluster.
      • hwloc segfaulted. Known failure, but it surfaced after the last release.
      • There are about 30 commits on hwloc past 1.9.1, but Brice is not going to do a 1.9.2 release (too old).
      • Brice is advocating moving to the hwloc 1.11.1 series.
      • What should we do in the Open MPI 1.10.x stream? 2.x and master are already at hwloc 1.11.1.
      • hwloc 1.9.1 + a patch containing all of those commits could be doable.
      • There are significant changes in the hwloc 1.10 and 1.11 streams.
      • Apathy in the meeting (or is it consensus?) for moving Open MPI 1.10.x to use hwloc 1.9.1 + all of the commits.
      • 1182 - Brice is pretty confident that this will fix it.
    • Schedule: do an RC after Sylvain can do some testing. Perhaps release towards the end of next week (June 3rd?).

Review 2.0.x

  • Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
  • Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
    • Heterogeneous support is broken in the v2.x stream. Works on master, broken in the v2.x tarball.
    • A couple of old issues
    • Issue 1171 - multiple threads.
      • Nathan's patch worked very well: ran for 24 hours with no failures. Concern about performance, though.
      • Message injection rate: we're 10x slower than MPICH for applications that do heavy multi-threading.
        • We never saw this problem before, because we didn't run in multi-threaded mode anyway.
      • What Nathan did is working; done.
    • master PR 1492 - if people are using MPI_Init_thread with MPI_THREAD_MULTIPLE:
      • On v2.x you still need to configure with --enable-thread-multiple.
      • We need a rework of the request structure to get acceptable performance.
      • With the fix, we go from 10x slower to 5% faster than MPICH.
      • We expect this to improve a lot for 2.1.
      • No performance impact on the single-threaded case. Also covered MPI_Testsome and MPI_Testall; a little gain there.
    • 1174 - discussion about how frequently we look for new connections. If we only test every 128 loops, depending on the app we might never see a new connection.
      • Nathan says we need a new way for BTLs to register new connections.
    • 1183 - change

Review Master MTT testing (https://mtt.open-mpi.org/)

  • 23 pull requests open on master, some since last October. Not TODAY (since we want George's multithreaded work in first), but we should bring them in or kill them.

MTT Dev status:

  • Try to start using MTT issues and pull requests to track items, rather than the MTT devel list.
  • Howard's student will arrive next week. He'll open some issues, like Python 3 vs. 2.
  • Thread about a public OMPI testing repo. Some back and forth about this.
  • Some discussion about looking at the mtt-tests repo and deciding which tests we CAN redistribute, and putting those in an open repo. Some interest in using the MTT harness for other projects (OpenHPC and others), and it would be good to have tests available to them.

Status Updates:


Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM
  3. Cisco, ORNL, UTK, NVIDIA

Back to 2016 WeeklyTelcon-2016
