Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn if there are unstopped threads #304

Merged
merged 4 commits into from
Oct 23, 2024

Conversation

pentschev
Copy link
Member

One of the goals with UCXX is to, as much as possible, prevent the user from having to manually cleanup resources, which allows for a simpler programming model and prevents memory leaks. However, there are some resources that are notably difficult to ensure they will be cleaned up appropriately, threads for example.

With this PR the user will now be warned about running threads when a worker is being destroyed, but will nevertheless attempt to stop them. The expectation is that with this change the user can be told how to resolve such problems so that it can be done manually to guarantee there's no leakage of resources in the running thread. What currently happens is, when a certain codeblock causes a ucxx::Worker to go out-of-scope, if care is not taken to ensure, for example, the progress thread has already completed processing all pending tasks (e.g., ucxx::Requests), the surviving thread will end up destroying the ucxx::Worker, and subsequently itself, from the running progress thread which is an invalid pattern and will cause deadlocks. This is currently observed intermittently in some C++ tests, where std::system_error may be raised:

terminate called after throwing an instance of 'std::system_error'
  what():  Resource deadlock avoided
timeout: the monitored command dumped core

The tests are also modified to apply this pattern of calling ucxx::Worker::stopProgressThread() at teardown, and thus prevent errors like above from occurring.

It may still be possible to handle this issue more gracefully, but for the moment it's best to ensure the user takes care of it while a more resilient solution can be worked on.

@pentschev pentschev requested review from a team as code owners October 18, 2024 14:01
cpp/src/worker.cpp Outdated Show resolved Hide resolved
cpp/src/worker.cpp Outdated Show resolved Hide resolved
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
@pentschev
Copy link
Member Author

Thanks @wence- for the review and fixes.

@pentschev
Copy link
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 6697d0d into rapidsai:branch-0.41 Oct 23, 2024
68 checks passed
@pentschev pentschev deleted the warn-unstopped-threads branch October 25, 2024 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants