Make sure opal_start_thread always spawns pthreads #9326

devreal · 2021-08-27T14:06:46Z

Users of opal_start_thread (btl/tcp, ft, smcuda, progress thread) may spawn threads that may block in functions unaware of argobots or qthreads (e.g., libevent or read(3)). If we spawn an argobot or qthread instead of a pthread the thread
executing the argobot or qthread (potentially the main thread) blocks, leading to a deadlock situation. Open MPI expects the semantics of a pthread so we should handle all internal threads as such.

This PR moves opal_start_thread and thread identification functions into threads/base and makes them operate on pthreads. As a side-effect, even if the main thread executes an argobot or qthread, it will still be considered the main thread, i.e., MPI_Is_thread_main will return flag = 1 (which is a change in behavior from the current implementation). The argobots and qthread integration will still be used to switch between ULTs to avoid blocking.

This should be backported to 5.x once merged.

Signed-off-by: Joseph Schuchart schuchart@icl.utk.edu

ibm-ompi · 2021-08-27T14:44:24Z

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/e86568ad1ec3570fb3ffbfcff19c0b75

ibm-ompi · 2021-08-27T14:46:13Z

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/5a6d67f030faae3c74bef8b0f00836cf

Users of `opal_start_thread` (btl/tcp, ft, smcuda, progress thread) may spawn threads that may block in functions unaware of argobots or qthreads (e.g., libevent or read(3)). If we spawn an argobot or qthread instead of a pthread the thread executing the argobot or qthread (potentially the main thread) blocks, leading to a deadlock situation. Open MPI expects the semantics of a pthread so we should handle all internal threads as such. Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>

ibm-ompi · 2021-08-27T15:11:18Z

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/88394f49f995536b02d9c90d2997c616

bosilca · 2021-08-27T21:11:06Z

With this change all OMPI internal threads will be pthreads, but all the synchronization variables (mutex, conditions and wait_sync) will use whatever thread package has been selected. We tried to understand how the synchronization would then work, and at least in the case of argobots things seem to end-up in a futex (so it might be working as expected). However, this is a major change and before it gets pulled into OMPI we need to understand its impact on correctness and performance.

devreal · 2021-08-30T13:39:23Z

The way I see it there are correctness and potential performance issues at play here:

Correctness 1: we should not block argobots/qthreads in external library calls (e.g., libevent) unless we know that they are aware of these models (we don't). This is addressed in this PR. Does MPICH ship its own libevent with argobots support? What about libevent with qthread support?
Correctness 2: can we safely synchronize threads that are not under the control of libabt/libqthread? For Argobots, the answer seems to be yes but I couldn't find similar information in the qthread docs. However, if that wasn't possible we have another problem at our hands: no third-party thread would be allowed to call into OMPI. I don't think that is reasonable.
Performance: yes, we revert back to using pthreads that wake up once there is activity on the fd (my understanding of the setup in btl/tcp). IMO, this is secondary unless our metric is time-to-deadlock. Did we have data that suggests that having progress ULTs has a significant benefit over pthreads?

If we want to preserve spawning ULTs instead of pthreads we have to distinguish between the pthread and ULT backends and prevent the ULT from blocking, instead making them polling for whatever they are waiting on. I'm not sure whether a blocking pthread is worse than a progress ULT that polls when scheduled even if there is nothing to progress. And the overhead of distinguishing between pthread and ULTs in the upper layers seems prohibitive.

devreal added the Target: main label Aug 27, 2021

devreal requested review from hjelmn and hppritcha August 27, 2021 14:06

hppritcha approved these changes Aug 27, 2021

View reviewed changes

devreal force-pushed the opal-thread-create-pthread branch from 286095f to e3ca132 Compare August 27, 2021 15:10

devreal merged commit a429f7c into open-mpi:master Sep 14, 2021

devreal mentioned this pull request Sep 20, 2021

Make sure opal_start_thread always spawns pthreads [v5.0.x] #9395

Merged

devreal deleted the opal-thread-create-pthread branch October 3, 2022 15:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sure opal_start_thread always spawns pthreads #9326

Make sure opal_start_thread always spawns pthreads #9326

devreal commented Aug 27, 2021

ibm-ompi commented Aug 27, 2021

ibm-ompi commented Aug 27, 2021

ibm-ompi commented Aug 27, 2021

bosilca commented Aug 27, 2021

devreal commented Aug 30, 2021

Make sure opal_start_thread always spawns pthreads #9326

Make sure opal_start_thread always spawns pthreads #9326

Conversation

devreal commented Aug 27, 2021

ibm-ompi commented Aug 27, 2021

ibm-ompi commented Aug 27, 2021

ibm-ompi commented Aug 27, 2021

bosilca commented Aug 27, 2021

devreal commented Aug 30, 2021