Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSAN error triggered by std::string race in FakeRawConnection::write #944

Closed
htuch opened this issue May 11, 2017 · 4 comments
Closed

TSAN error triggered by std::string race in FakeRawConnection::write #944

htuch opened this issue May 11, 2017 · 4 comments
Labels

Comments

@htuch
Copy link
Member

htuch commented May 11, 2017

From debugging prints etc. it seems that memcpy from data.c_str() in the closure in FakeRawConnection::write() is racing with delete on std::string in the main thread execution action closure.

In gcc-4.9, the std::basic_string implementation is using reference counting and CoW. It's unclear exactly why the reference counting is not working correctly across threads, but there are some known issues (e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334).

The workaround is to ensure the underlying std::string that is shared across threads stays alive for the test.

This may just be an artifact of TSAN, or affect other parts of Envoy. This issue should stay open until we understand better what's going on.

WARNING: ThreadSanitizer: data race (pid=7)
  Write of size 8 at 0x7b080001fdd8 by main thread:
    #0 operator delete(void*) /proc/self/cwd/third_party/llvm/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_new_delete.cc:73 (integration_test+0x53b570)
    #1 operator() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/faa4a85d-81af-4127-a69d-f89c149adef2-1/execroot/envoy/test/integration/integration_t
est.cc:158 (integration_test+0x54725d)
    #2 std::_Function_handler<void (), IntegrationTest_TcpProxyDownstreamDisconnect_Test::TestBody()::$_17>::_M_invoke(std::_Any_data const&) /google/src/files/151379707/depot/google3/third_party/crosstoo
l/v18/llvm_unstable/wrappers/bin/../../toolchain/bin/../lib/gcc/x86_64-grtev4-linux-gnu/4.9.x-google/../../../../x86_64-grtev4-linux-gnu/include/c++/4.9.x-google/functional:2039 (integration_test+0x546f4a
)
    #3 std::function<void ()>::operator()() const /google/src/files/151379707/depot/google3/third_party/crosstool/v18/llvm_unstable/wrappers/bin/../../toolchain/bin/../lib/gcc/x86_64-grtev4-linux-gnu/4.9.
x-google/../../../../x86_64-grtev4-linux-gnu/include/c++/4.9.x-google/functional:2440 (integration_test+0x55747e)
    #4 BaseIntegrationTest::executeActions(std::list<std::function<void ()>, std::allocator<std::function<void ()> > >) /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a445
97f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-138/execroot/envoy/./test/integration/integration.h:157 (integration_test+0x54b535)
    #5 IntegrationTest_TcpProxyDownstreamDisconnect_Test::TestBody() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/faa4a85d-81af-4127-a69d-f89c149ad
ef2-1/execroot/envoy/test/integration/integration_test.cc:153 (integration_test+0x53ea44)
    #6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/external/envoy_deps_cache_55b3dedbe251305eb00b70c97827640c/googletest.dep.build/googletest-release-1.8.0/googletest/src/gtest.cc:2402 (integration_test+0x102c4c6)
    #7 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676
793239ac96d94294e6c7a44597f/external/envoy_deps_cache_55b3dedbe251305eb00b70c97827640c/googletest.dep.build/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (integration_test+0x102c4c6)
    #8 TestRunner::RunTests(int, char**) /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-107/execroot/envoy/./tes
t/test_runner.h:32 (integration_test+0xe60e79)
    #9 main /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-107/execroot/envoy/test/main.cc:32 (integration_test+
0xe5ff8e)

  Previous read of size 1 at 0x7b080001fddc by thread T1:
    #0 memcpy /proc/self/cwd/third_party/llvm/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:696 (integration_test+0x4e73c0)
    #1 evbuffer_add /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/external/envoy_deps_cache_55b3dedbe251305eb00b70c97827640c/libevent.dep.build/libevent-2.1.8-sta
ble/buffer.c:? (integration_test+0xf15f06)
    #2 OwnedImpl /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-73/execroot/envoy/source/common/buffer/buffer_im
pl.cc:106 (integration_test+0xf12ab7)
    #3 operator() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/faa4a85d-81af-4127-a69d-f89c149adef2-0/execroot/envoy/test/integration/fake_upstream
.cc:294 (integration_test+0x591811)
    #4 std::_Function_handler<void (), FakeRawConnection::write(std::string const&)::$_8>::_M_invoke(std::_Any_data const&) /google/src/files/151379707/depot/google3/third_party/crosstool/v18/llvm_unstabl
e/wrappers/bin/../../toolchain/bin/../lib/gcc/x86_64-grtev4-linux-gnu/4.9.x-google/../../../../x86_64-grtev4-linux-gnu/include/c++/4.9.x-google/functional:2039 (integration_test+0x59144a)
    #5 std::function<void ()>::operator()() const /google/src/files/151379707/depot/google3/third_party/crosstool/v18/llvm_unstable/wrappers/bin/../../toolchain/bin/../lib/gcc/x86_64-grtev4-linux-gnu/4.9.
x-google/../../../../x86_64-grtev4-linux-gnu/include/c++/4.9.x-google/functional:2440 (integration_test+0x55747e)
    #6 Event::DispatcherImpl::runPostCallbacks() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-96/execroot/envo
y/source/common/event/dispatcher_impl.cc:152 (integration_test+0x681dd7)
    #7 operator() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-96/execroot/envoy/source/common/event/dispatche
r_impl.cc:28 (integration_test+0x6832a8)
    #8 std::_Function_handler<void (), Event::DispatcherImpl::DispatcherImpl()::$_1>::_M_invoke(std::_Any_data const&) /google/src/files/151379707/depot/google3/third_party/crosstool/v18/llvm_unstable/wra
ppers/bin/../../toolchain/bin/../lib/gcc/x86_64-grtev4-linux-gnu/4.9.x-google/../../../../x86_64-grtev4-linux-gnu/include/c++/4.9.x-google/functional:2039 (integration_test+0x682fda)
    #9 std::function<void ()>::operator()() const /google/src/files/151379707/depot/google3/third_party/crosstool/v18/llvm_unstable/wrappers/bin/../../toolchain/bin/../lib/gcc/x86_64-grtev4-linux-gnu/4.9.
x-google/../../../../x86_64-grtev4-linux-gnu/include/c++/4.9.x-google/functional:2440 (integration_test+0x55747e)
    #10 operator() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-106/execroot/envoy/source/common/event/timer_i
mpl.cc:14 (integration_test+0x68b616)
    #11 __invoke /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/0d694ed6-27b8-4532-ae2e-779131fca1cd-106/execroot/envoy/source/common/event/timer_imp
l.cc:14 (integration_test+0x68b597)
    #12 event_process_active_single_queue /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/external/envoy_deps_cache_55b3dedbe251305eb00b70c97827640c/libevent.dep.bu
ild/libevent-2.1.8-stable/event.c:1646 (integration_test+0xf24432)
    #13 FakeUpstream::threadRoutine() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/faa4a85d-81af-4127-a69d-f89c149adef2-0/execroot/envoy/test/integ
ration/fake_upstream.cc:247 (integration_test+0x58c586)
    #14 operator() /usr/local/google/home/htuch/.cache/bazel/_bazel_htuch/89676793239ac96d94294e6c7a44597f/bazel-sandbox/faa4a85d-81af-4127-a69d-f89c149adef2-0/execroot/envoy/test/integration/fake_upstrea
m.cc:220 (integration_test+0x591118)
htuch added a commit to htuch/envoy that referenced this issue May 11, 2017
See envoyproxy#944. I don't think we properly understand the root cause
yet, but this at least unblocks CI.
mattklein123 pushed a commit that referenced this issue May 11, 2017
See #944. I don't think we properly understand the root cause
yet, but this at least unblocks CI.
@mattklein123
Copy link
Member

AFAIK this repros using GCC 5.4 stdlib on 16.04 in CI, so it's definitely not fixed by GCC 5 unless i'm missing something. I haven't looked into this in detail, but I would be utterly shocked if this bug exists even in 4.9 on normally compiled code. I strongly suspect that this is somehow related to TSAN clang compilation. We probably need to compare the generated unoptimized code from TSAN to both normal clang as well as GCC 5.4.

@htuch
Copy link
Member Author

htuch commented May 11, 2017

Did you set -D _GLIBCXX_USE_CXX11_ABI=1? This is needed to use the new std::string in GCC 5.0+ since it is ABI incompatible. We explicitly disable this in Clang in the build for this reason, since it breaks the prebuilts, which are compiled with the old ABI.

tschroed pushed a commit to tschroed/envoy that referenced this issue May 11, 2017
See envoyproxy#944. I don't think we properly understand the root cause
yet, but this at least unblocks CI.
@mattklein123
Copy link
Member

Did you set -D _GLIBCXX_USE_CXX11_ABI=1?

I see. Wow. Learned something new today. Kind of amazed that I never encountered this before. I did a bunch of reading and I concur this is busted before GCC 5.

My suggestion is that we drop GCC 4.9 and move to 5.4. On trusty we already make people install the PPA to get 4.9. 5.4 is available. On Xenial it's the default. RH is so far behind that people have to compile the toolchain anyway to get 4.9 so I think it doesn't matter.

Thoughts?

@dnoe
Copy link
Contributor

dnoe commented May 11, 2017

Additional data point: This issue did not reproduce in repeated runs of 1000 simultaneous tests with "-c opt" build mode. It only occurs in debug builds.

mattklein123 added a commit that referenced this issue May 12, 2017
We are moving to GCC > 5 because std::string in GCC < 5 is not
thread safe and we were seeing issues on TSAN builds.

Fixes #944
mattklein123 added a commit that referenced this issue May 12, 2017
We are moving to GCC > 5 because std::string in GCC < 5 is not
thread safe and we were seeing issues on TSAN builds.

Fixes #944
tschroed pushed a commit to tschroed/envoy that referenced this issue May 16, 2017
See envoyproxy#944. I don't think we properly understand the root cause
yet, but this at least unblocks CI.
jpsim pushed a commit that referenced this issue Nov 28, 2022
Description: Switches CI workflows to be based off our new default main branch.
Risk Level: Low
Testing: CI

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue Nov 29, 2022
Description: Switches CI workflows to be based off our new default main branch.
Risk Level: Low
Testing: CI

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
nezdolik pushed a commit to nezdolik/envoy that referenced this issue May 4, 2024
malloc_fast_path now receives oom function instead of full allocation
function and windows/patch_function.cc wasn't updated until now. It
caused assertion failures as reported in issue envoyproxy#944.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants