Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch gtest to terminate as soon as it detects that a ScopedTrace has migrated to another worker thread #1258

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

msimberg
Copy link
Collaborator

See first commit message for more details.

I've added a temporary commit which explicitly adds more yields while holding a ScopedTrace to check that it actually works in CI.

I've opted for a local patch which I think is easily updated if we update the gtest tag. An alternative strategy could be to apply the patch through the spack package and I don't have a strong preference either way, other than me not (yet) knowing how to apply the patch with spack to something that is pulled only at the CMake configure stage. One benefit of patching gtest like this is that it will always apply, even if one uses a spack build-env which doesn't run CMake through spack.

@msimberg msimberg requested a review from rasolca January 14, 2025 15:22
@msimberg msimberg self-assigned this Jan 14, 2025
Comment on lines +140 to +71
+ std::cerr << "ScopedTrace was created and destroyed on different "
+ "std::threads, terminating to avoid segfaults, hangs, or "
+ "silent corruption. Are you using any pika functionality that "
+ "may yield a task after creating the ScopedTrace?\n";
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is here. Suggestions for how we can make this as useful/informative as possible are very welcome!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -195,6 +195,7 @@ void testBackTransformationReductionToBand(SizeType m, SizeType n, SizeType mb,
mat_c_h.waitLocalTiles();
SCOPED_TRACE(::testing::Message() << "m = " << m << ", n = " << n << ", mb = " << mb << ", nb = " << nb
<< ", b = " << b);
pika::this_thread::yield();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do: remove these changes.

@msimberg msimberg mentioned this pull request Jan 14, 2025
@msimberg
Copy link
Collaborator Author

cscs-ci run

@msimberg msimberg force-pushed the gtest-terminate-on-thread-migrate branch from 78fdf9c to a15bb32 Compare January 16, 2025 16:42
@msimberg
Copy link
Collaborator Author

cscs-ci run

@msimberg msimberg force-pushed the gtest-terminate-on-thread-migrate branch from a15bb32 to 83a4712 Compare January 16, 2025 17:00
@msimberg
Copy link
Collaborator Author

cscs-ci run

Comment on lines 21 to 22
PATCH_COMMAND git cherry-pick -n 5cd81a7848203d3df6959c148eb91f1a4ff32c1d -m 1 && git apply
"${CMAKE_CURRENT_SOURCE_DIR}/scoped_trace_terminate_on_thread_migrate.patch"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this only works once. The PATCH_COMMAND seems to be run every time CMake is run, and the second time the git apply fails. I think we can git reset and re-apply, but I'm wondering if there are other options?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should work if you replace && by ;
If you cherry-pick a commit that was already cherry-picked, it would exit 1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cherry-pick was always there and works fine when repeated. The apply fails the second time because the patch doesn't apply anymore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now git reset every time before applying the patches. This seems to work, but is not particularly elegant.

Comment on lines +140 to +71
+ std::cerr << "ScopedTrace was created and destroyed on different "
+ "std::threads, terminating to avoid segfaults, hangs, or "
+ "silent corruption. Are you using any pika functionality that "
+ "may yield a task after creating the ScopedTrace?\n";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@msimberg msimberg force-pushed the gtest-terminate-on-thread-migrate branch from 83a4712 to 07cf5f1 Compare January 17, 2025 16:31
@msimberg
Copy link
Collaborator Author

cscs-ci run

…read different than where it was destroyed

gtest ScopedTrace/SCOPED_TRACE uses thread local stacks to keep track of traces. When combined with pika
tasks that may yield and be stolen onto other worker threads, destroying a ScopedTrace on a different
thread can segfault, cause hangs, or simply corrupt data. Instead of allowing that to happen every now
and then we instead patch gtest to check if this happens and terminate as soon as it's detected. Use of
ScopedTrace should be avoided in situations where tasks can yield. This patch does not fix the issue, but
helps recognizing when e.g. a segfault is the cause of "only" improper use of ScopedTrace or if it is a
result of other algorithmic or implementation issues.
@msimberg msimberg force-pushed the gtest-terminate-on-thread-migrate branch from 07cf5f1 to 64e970c Compare January 17, 2025 19:12
@msimberg
Copy link
Collaborator Author

cscs-ci run

@msimberg
Copy link
Collaborator Author

The patch seems to be correctly applied now, but can't be tested because CI needs to be moved to alps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants