
WeeklyTelcon_20230801

Geoffrey Paulsen edited this page Aug 8, 2023 · 1 revision

Small meeting today; the 4.x and 5.x release managers were absent. The primary discussion topics were brought up by Howard and Edgar:

Howard: Testing PR 11689 currently

Edgar: PR 11818 Error handler type (https://github.com/open-mpi/ompi/pull/11818)

  • Does this PR need a backport?
  • Wenduo confirmed the backport was done and added a comment.

Edgar: https://github.com/open-mpi/ompi/issues/11831

  • main and 5.0 are both affected.
  • We can probably undo the PR in 5.0, because col-cuda is always compiled in main, but not in 5.0.
  • Need to find a fix that causes cuda_delayed_init to properly get out of the way.

Edgar has a technical question: Can we have a new SM component for OFI?

  • Libfabric SM component supports CUDA/ROCm/Intel devices
  • Motivated by https://github.com/open-mpi/ompi/pull/10959
  • Howard sees that it could help Frontier users, and might be interested in supplying an intern to assist.
  • General agreement that having a libfabric SM component might be an efficient path to supporting the various SM paths.
  • This could be a 5.1 feature request, but it needs further technical investigation and discussion with corporate management.