-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patch gtest to terminate as soon as it detects that a ScopedTrace
has migrated to another worker thread
#1258
base: master
Are you sure you want to change the base?
Patch gtest to terminate as soon as it detects that a ScopedTrace
has migrated to another worker thread
#1258
Conversation
+ std::cerr << "ScopedTrace was created and destroyed on different " | ||
+ "std::threads, terminating to avoid segfaults, hangs, or " | ||
+ "silent corruption. Are you using any pika functionality that " | ||
+ "may yield a task after creating the ScopedTrace?\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is here. Suggestions for how we can make this as useful/informative as possible are very welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -195,6 +195,7 @@ void testBackTransformationReductionToBand(SizeType m, SizeType n, SizeType mb, | |||
mat_c_h.waitLocalTiles(); | |||
SCOPED_TRACE(::testing::Message() << "m = " << m << ", n = " << n << ", mb = " << mb << ", nb = " << nb | |||
<< ", b = " << b); | |||
pika::this_thread::yield(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do: remove these changes.
cscs-ci run |
78fdf9c
to
a15bb32
Compare
cscs-ci run |
a15bb32
to
83a4712
Compare
cscs-ci run |
external/CMakeLists.txt
Outdated
PATCH_COMMAND git cherry-pick -n 5cd81a7848203d3df6959c148eb91f1a4ff32c1d -m 1 && git apply | ||
"${CMAKE_CURRENT_SOURCE_DIR}/scoped_trace_terminate_on_thread_migrate.patch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this only works once. The PATCH_COMMAND
seems to be run every time CMake is run, and the second time the git apply
fails. I think we can git reset
and re-apply
, but I'm wondering if there are other options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should work if you replace && by ;
If you cherry-pick a commit that was already cherry-picked, it would exit 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cherry-pick was always there and works fine when repeated. The apply fails the second time because the patch doesn't apply anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now git reset
every time before applying the patches. This seems to work, but is not particularly elegant.
+ std::cerr << "ScopedTrace was created and destroyed on different " | ||
+ "std::threads, terminating to avoid segfaults, hangs, or " | ||
+ "silent corruption. Are you using any pika functionality that " | ||
+ "may yield a task after creating the ScopedTrace?\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
83a4712
to
07cf5f1
Compare
cscs-ci run |
…read different than where it was destroyed gtest ScopedTrace/SCOPED_TRACE uses thread local stacks to keep track of traces. When combined with pika tasks that may yield and be stolen onto other worker threads, destroying a ScopedTrace on a different thread can segfault, cause hangs, or simply corrupt data. Instead of allowing that to happen every now and then we instead patch gtest to check if this happens and terminate as soon as it's detected. Use of ScopedTrace should be avoided in situations where tasks can yield. This patch does not fix the issue, but helps recognizing when e.g. a segfault is the cause of "only" improper use of ScopedTrace or if it is a result of other algorithmic or implementation issues.
…scoped trace to test gtest patch
07cf5f1
to
64e970c
Compare
cscs-ci run |
The patch seems to be correctly applied now, but can't be tested because CI needs to be moved to alps. |
See first commit message for more details.
I've added a temporary commit which explicitly adds more yields while holding a
ScopedTrace
to check that it actually works in CI.I've opted for a local patch which I think is easily updated if we update the gtest tag. An alternative strategy could be to apply the patch through the spack package and I don't have a strong preference either way, other than me not (yet) knowing how to apply the patch with spack to something that is pulled only at the CMake configure stage. One benefit of patching gtest like this is that it will always apply, even if one uses a spack build-env which doesn't run CMake through spack.