WeeklyTelcon_20190129
Geoffrey Paulsen edited this page Mar 12, 2019
·
2 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Akshay Venkatesh
- Brian Barrett
- David Bernholdt
- Edgar Gabriel
- Geoffroy Vallee
- Howard Pritchard
- Josh Hursey
- Matias Cabral
- Ralph Castain
- Todd Kordenbrock
- Xin Zhao
- Aravind Gopalakrishnan (Intel)
- Joshua Ladd
- Nathan Hjelm
- Dan Topa (LANL)
- Thomas Naughton
- Akshay Venkatesh (nVidia)
- Matthew Dosanjh
- Arm (UTK)
- George
- Peter Gottesman (Cisco)
- mohan
-
Doodle to select week for OMPI face-to-face
- week of April 22 won.
- Locations: Jeff will ping Brian, Dallas or San Jose
- https://docs.google.com/forms/d/e/1FAIpQLSdrJw7xfVNo3nAfoB4dsnMu7ihiZ0WCjglo2KBZqvY_3BZkkg/viewform
-
Future of the Openib BTL on master: https://github.com/open-mpi/ompi/pull/6270
- We won't have any verbs code at all in v5.0
- --with-verbs might totally go away.
- Several common components here.
- hwloc might still need ibverbs
-
OpenIB is the only verbs-based BTL that supports instant-on (no modex in MPI)
- Was it integrated? No, it has not been integrated.
- UCX also was not integrated.
- Component that needs to go into the pmix pnet framework.
Review All Open Blockers
Review v3.0.x Milestones v3.0.3
- Scheduled 3.1.4 May of 2019? Probably earlier
- No progress
- Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
Review v3.1.x Milestones v3.1.0
- Brian will put out an RC on Friday
- No progress
- Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
Review v4.0.x Milestones v4.0.1
- Schedule: Need a quick turn around for a v4.0.1
- v4.0.0
- Merged in PMIx update.
- Adding OSHMEM API - bugfix. Need to rev .so versions correctly
- Some Fixes in onesided datatype in past week or two, not sure if this went in.
- There have been other non-blocker fixes:
- hwloc macros, libfabric, ompi-io issues fixed in master
-
https://github.com/open-mpi/ompi/issues/6278
- The removed symbols and nice message on master and v4.0.x do not give a compile-time error. What do we want?
- Do we want compile time error? Or just removed symbol and linker error
- Could add a Check for C11, and use 'static assert' for nice message.
- For older compilers could just NOT declare the function.
- but that doesn't work for v4.0.x since the symbols will still be there in the library, and the compiler will only issue a warning about the missing prototype, but will succeed and link correctly.
- It was decided that this is okay, if the C11 static assert check is in mpi.h. Most users set 'no prototype' as an error.
- Tests on v4.0.x started passing, but possibly false positives. We will look at how the ibm tests are passing with #6278 issue on master and v4.0.x
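
The C11 static-assert idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual contents of mpi.h: the names `demo_address`, `demo_get_address`, and `DEMO_HAVE_STATIC_ASSERT` are hypothetical stand-ins for a removed function, its replacement, and a feature check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical replacement API that remains available. */
static int demo_get_address(const void *location, intptr_t *address)
{
    *address = (intptr_t) location;
    return 0;
}

#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
/* C11 or newer: any use of the removed name expands to a static
 * assertion that always fails, so the build stops at compile time
 * with a readable message instead of a later linker error. */
#define DEMO_HAVE_STATIC_ASSERT 1
#define demo_address(location, address) \
    _Static_assert(0, "demo_address was removed; use demo_get_address instead")
#else
/* Older compilers: the removed function is simply not declared, so a
 * call only triggers a "no prototype" diagnostic (an error for users
 * who promote that warning to an error). */
#define DEMO_HAVE_STATIC_ASSERT 0
#endif
```

With a C11 compiler, a statement such as `demo_address(&x, &a);` would then fail to compile with the message above, while calls to `demo_get_address` are unaffected.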
- Should resolve https://github.com/open-mpi/ompi/issues/6198 before releasing
- OOB TCP is ignoring virtual interfaces.
- What's the right fix? TCP btl allows virtual interfaces, but
- Want users to allow mpirun to work on node. But if we allow virtual interface, some providers don't support loopback.
- What do we do in the TCP BTL? Do we set a default value for the exclude list?
- Long term we should finish reachability functionality.
- for v4.0.x may need something in include/exclude default.
- Any fix for OOB TCP should be pushed up to PRTE/oob/tcp
- Will create an issue and solve over email with code, rather than solving on phone.
- PR6306 - RegEx - they want to push into v4.0.x.
The problem is that any RegEx we come up with breaks in some special case.
Worried about getting into a mode where fixing the RegEx for one node-name
convention breaks it for another.
PRTE threw this framework out and just uses a PMIx parser. This PR would
cause the PMIx parser to get out of sync, and we want the same answer out
of both parsers.
Need to open an issue on Open MPI to ensure we don't continue breaking patterns.
Some ideas:
- Don't try to do Reg-ex, and instead do compression.
- Use a 3rd party existing reg-ex generator (generate a reg-ex from a list of hostnames)
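
As a rough illustration of the "compression instead of RegEx" idea, the sketch below folds a sorted list of hostnames into a range form such as `node[1-3,7]`. It is a minimal sketch under loud assumptions, not the PMIx or PRTE parser: the function names are hypothetical, all hosts must share one prefix and end in an unpadded decimal number in strictly ascending order, and anything else is rejected so the caller can fall back to a plain list.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split "node12" into a prefix length (4) and a number (12).
 * Returns -1 if there is no numeric suffix, the suffix is zero padded,
 * or it is too long to parse safely (hypothetical helper). */
static int demo_split(const char *name, size_t *prefix_len, unsigned long *num)
{
    size_t len = strlen(name);
    size_t p = len;
    while (p > 0 && isdigit((unsigned char) name[p - 1])) {
        p--;
    }
    if (p == len || len - p > 9) return -1;
    if (name[p] == '0' && len - p > 1) return -1;
    *prefix_len = p;
    *num = strtoul(name + p, NULL, 10);
    return 0;
}

/* Append "lo" or "lo-hi" followed by tail; returns bytes written or -1. */
static int demo_emit(char *out, size_t out_len, unsigned long lo,
                     unsigned long hi, const char *tail)
{
    int w = (lo == hi) ? snprintf(out, out_len, "%lu%s", lo, tail)
                       : snprintf(out, out_len, "%lu-%lu%s", lo, hi, tail);
    return (w < 0 || (size_t) w >= out_len) ? -1 : w;
}

/* Compress hosts[0..n-1] into out. Returns 0 on success, -1 when the
 * names do not fit the single convention this sketch understands. */
static int demo_compress_hosts(const char *hosts[], size_t n,
                               char *out, size_t out_len)
{
    size_t plen0 = 0, used = 0;
    unsigned long start = 0, prev = 0;
    int w;

    if (n == 0 || out_len == 0) return -1;
    for (size_t i = 0; i < n; i++) {
        size_t plen;
        unsigned long num;
        if (demo_split(hosts[i], &plen, &num) != 0) return -1;
        if (i == 0) {
            plen0 = plen;
            w = snprintf(out, out_len, "%.*s[", (int) plen, hosts[0]);
            if (w < 0 || (size_t) w >= out_len) return -1;
            used = (size_t) w;
            start = prev = num;
            continue;
        }
        if (plen != plen0 || strncmp(hosts[i], hosts[0], plen0) != 0) return -1;
        if (num <= prev) return -1;          /* must be strictly ascending */
        if (num == prev + 1) {               /* extend the current range */
            prev = num;
            continue;
        }
        w = demo_emit(out + used, out_len - used, start, prev, ",");
        if (w < 0) return -1;
        used += (size_t) w;
        start = prev = num;
    }
    w = demo_emit(out + used, out_len - used, start, prev, "]");
    return (w < 0) ? -1 : 0;
}
```

Rejecting unfamiliar naming conventions, rather than guessing, is a deliberate choice in this sketch: it avoids the failure mode raised in the discussion, where handling one node-name convention silently breaks another.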
- Any Schedule for this yet? Summer of 2019
- Discussion of schedule depends on scope discussion
- if we want to separate ORTE out for that? Might delay a bit
- May want to open up release-manager elections.
- There was a problem with PMIx v3.1.0 - should post another today.
- Cisco showing build failure.
- IBM test configure should have caused that.
- Cisco has a one-sided info check that failed a hundred times.
- Cisco install fail looks like a legit compile fail (ipv6 master)
-
PMIX direct call / PRTE replacement for ORTE.
-
Ralph's Thinking about approach.
- Perhaps we don't worry about PMI1 and PMI2 calls, and let PMIx compat support clients that make older style PMI1 or PMI2 style calls.
- Ralph will discuss with Howard best way forward.
-
Howard has been changing OMPI or OPAL places that call the PMIx framework,
- to use PMIx data structures directly in the code.
- Doesn't look like Howard would step on Ralph's toes.
-
March 4th is next MPI Forum (then June)
-
We have a new open-mpi SLACK channel for Open MPI developers.
- Not for users, just developers...
- Email Jeff if you're interested in being added.
Review Master Pull Requests
- didn't discuss today.
Review Master MTT testing
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon,
- Cisco, ORNL, UTK, NVIDIA