Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure opal_start_thread always spawns pthreads #9326

Merged
merged 1 commit into from
Sep 14, 2021

Conversation

devreal
Copy link
Contributor

@devreal devreal commented Aug 27, 2021

Users of opal_start_thread (btl/tcp, ft, smcuda, progress thread) may spawn threads that may block in functions unaware of argobots or qthreads (e.g., libevent or read(3)). If we spawn an argobot or qthread instead of a pthread the thread
executing the argobot or qthread (potentially the main thread) blocks, leading to a deadlock situation. Open MPI expects the semantics of a pthread so we should handle all internal threads as such.

This PR moves opal_start_thread and thread identification functions into threads/base and makes them operate on pthreads. As a side-effect, even if the main thread executes an argobot or qthread, it will still be considered the main thread, i.e., MPI_Is_thread_main will return flag = 1 (which is a change in behavior from the current implementation). The argobots and qthread integration will still be used to switch between ULTs to avoid blocking.

This should be backported to 5.x once merged.

Signed-off-by: Joseph Schuchart schuchart@icl.utk.edu

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/e86568ad1ec3570fb3ffbfcff19c0b75

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/5a6d67f030faae3c74bef8b0f00836cf

Users of `opal_start_thread` (btl/tcp, ft, smcuda, progress thread)
may spawn threads that may block in functions unaware of argobots
or qthreads (e.g., libevent or read(3)). If we spawn an argobot or
qthread instead of a pthread the thread executing the argobot or
qthread (potentially the main thread) blocks, leading to a deadlock
situation. Open MPI expects the semantics of a pthread so
we should handle all internal threads as such.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/88394f49f995536b02d9c90d2997c616

@bosilca
Copy link
Member

bosilca commented Aug 27, 2021

With this change all OMPI internal threads will be pthreads, but all the synchronization variables (mutex, conditions and wait_sync) will use whatever thread package has been selected. We tried to understand how the synchronization would then work, and at least in the case of argobots things seem to end-up in a futex (so it might be working as expected). However, this is a major change and before it gets pulled into OMPI we need to understand its impact on correctness and performance.

@devreal
Copy link
Contributor Author

devreal commented Aug 30, 2021

The way I see it there are correctness and potential performance issues at play here:

  1. Correctness 1: we should not block argobots/qthreads in external library calls (e.g., libevent) unless we know that they are aware of these models (we don't). This is addressed in this PR. Does MPICH ship its own libevent with argobots support? What about libevent with qthread support?
  2. Correctness 2: can we safely synchronize threads that are not under the control of libabt/libqthread? For Argobots, the answer seems to be yes but I couldn't find similar information in the qthread docs. However, if that wasn't possible we have another problem at our hands: no third-party thread would be allowed to call into OMPI. I don't think that is reasonable.
  3. Performance: yes, we revert back to using pthreads that wake up once there is activity on the fd (my understanding of the setup in btl/tcp). IMO, this is secondary unless our metric is time-to-deadlock. Did we have data that suggests that having progress ULTs has a significant benefit over pthreads?

If we want to preserve spawning ULTs instead of pthreads we have to distinguish between the pthread and ULT backends and prevent the ULT from blocking, instead making them polling for whatever they are waiting on. I'm not sure whether a blocking pthread is worse than a progress ULT that polls when scheduled even if there is nothing to progress. And the overhead of distinguishing between pthread and ULTs in the upper layers seems prohibitive.

@devreal devreal merged commit a429f7c into open-mpi:master Sep 14, 2021
@devreal devreal deleted the opal-thread-create-pthread branch October 3, 2022 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants