Warn if there are unstopped threads #304
Merged
One of the goals of UCXX is to spare the user, as much as possible, from manually cleaning up resources, which allows for a simpler programming model and prevents memory leaks. However, some resources are notably difficult to clean up reliably, threads for example.
With this PR, the user is now warned about running threads when a worker is being destroyed; the worker will nevertheless attempt to stop them. The expectation is that the warning tells the user how to resolve the problem manually, guaranteeing that no resources leak in the running thread.

What currently happens is that, when a code block causes a `ucxx::Worker` to go out of scope without first ensuring that, for example, the progress thread has finished processing all pending tasks (e.g., `ucxx::Request`s), the surviving thread ends up destroying the `ucxx::Worker`, and subsequently itself, from within the running progress thread. This is an invalid pattern and will cause deadlocks. It is currently observed intermittently in some C++ tests, where `std::system_error` may be raised.

The tests are also modified to call `ucxx::Worker::stopProgressThread()` at teardown, and thus prevent such errors from occurring.

It may still be possible to handle this issue more gracefully, but for the moment it's best to ensure the user takes care of it while a more resilient solution is worked on.
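The warn-then-stop behavior described above can be illustrated with a minimal, standalone sketch. `ToyWorker`, its `warnings()` counter, and `warningsAfterScopedUse()` are hypothetical names made up for this illustration; this is not the UCXX implementation, and only the method name `stopProgressThread()` is borrowed from the description.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>
#include <thread>

// Hypothetical stand-in for a worker that owns a progress thread.
// NOT the UCXX implementation; it only illustrates the pattern.
class ToyWorker {
 public:
  ~ToyWorker() {
    if (thread_.joinable()) {
      // The thread was not stopped explicitly: warn, then stop it anyway.
      std::cerr << "warning: progress thread still running at destruction; "
                   "call stopProgressThread() before releasing the worker\n";
      ++warnings();
      stopProgressThread();
    }
  }

  void startProgressThread() {
    if (thread_.joinable()) return;  // already running
    stop_ = false;
    thread_ = std::thread([this] {
      // Placeholder for real progress work: spin until asked to stop.
      while (!stop_.load()) std::this_thread::yield();
    });
  }

  void stopProgressThread() {
    if (!thread_.joinable()) return;  // nothing to stop
    // A thread must never join itself; that is the invalid pattern
    // in which the progress thread ends up tearing down its own owner.
    assert(std::this_thread::get_id() != thread_.get_id());
    stop_ = true;
    thread_.join();
  }

  // Counts how many destructors had to warn (for illustration only).
  static std::atomic<int>& warnings() {
    static std::atomic<int> count{0};
    return count;
  }

 private:
  std::atomic<bool> stop_{false};
  std::thread thread_;
};

// Returns the number of warnings emitted while a worker goes out of scope.
int warningsAfterScopedUse(bool stopExplicitly) {
  const int before = ToyWorker::warnings().load();
  {
    ToyWorker worker;
    worker.startProgressThread();
    if (stopExplicitly) worker.stopProgressThread();  // teardown pattern
  }  // worker destroyed here
  return ToyWorker::warnings().load() - before;
}
```

In this sketch, stopping the thread explicitly before the worker leaves scope produces no warning, while omitting the call produces one warning and the destructor still joins the thread.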