Detecting GLIBC version (DTLS SIGSEGV). #1322
Temporarily we disable DTLS with |
I was trying to update to the new sysrepo, and that included bumping buildroot. That, however, resulted in too new `fmt`, which was not compatible with the version of `spdlog` that we're bundling. Solve that by using the systemwide spdlog, and while we're at it, do that for pybind11 as well. I don't think we're ever changing it.

Also try to switch to the systemwide Boost. That's only 1.69 on Fedora 32 (which is older than we have today), but the good thing is that Fedora 33 has Boost 1.73. Let's see what breaks.

I tried to do this on legacy sysrepo, but there are some issues with libev behavior, so I gave up and I'm interested in getting that thing working with the current stack.

The old LSAN suppressions are not needed anymore, but we have a wonderful new crash instead, this time in sysrepo's test_modules, and *only* when running for the first time and with empty repository state. It segfaults at exit:

    Tracer caught signal 11: addr=0x0 pc=0x4e9218 sp=0x7f9ad95f6d10
    ==25238==LeakSanitizer has encountered a fatal error.
    ==25238==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
    ==25238==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

...and the backtrace (which is rather hard to obtain because LSAN doesn't really work under gdb, so the trick is to run it as `ASAN_OPTIONS=sleep_before_dying=666`, and then attach via `gdb -p $(pidof test_modules)`, yay!) looks like this:

    (gdb) bt
    #0 __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffc7d3e90a0, rem=rem@entry=0x7ffc7d3e90a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:79
    #1 0x00007fb5d5114157 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffc7d3e90a0, remaining=remaining@entry=0x7ffc7d3e90a0) at nanosleep.c:27
    #2 0x00007fb5d511408e in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
    #3 0x00000000004c587b in __asan::AsanDie() ()
    #4 0x00000000004dd56e in __sanitizer::Die() ()
    #5 0x00000000004ec1eb in __lsan::CheckForLeaks() ()
    #6 0x00000000004ec219 in __lsan::DoLeakCheck() ()
    #7 0x00007fb5d50853a7 in __run_exit_handlers (status=0, listp=0x7fb5d5209578 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
    #8 0x00007fb5d5085550 in __GI_exit (status=<optimized out>) at exit.c:139
    #9 0x00007fb5d506d049 in __libc_start_main (main=0x4f3650 <main>, argc=1, argv=0x7ffc7d3e92f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc7d3e92e8) at ../csu/libc-start.c:342
    #10 0x000000000041c6ce in _start ()

A TL;DR version is that this is *probably* related to how ASAN guesses glibc versions, and running it via `ASAN_OPTIONS=verbosity=2:intercept_tls_get_addr=1` prevents that segfault. That was very yummy to debug, of course.

Change-Id: Idf19d5f3ed4ed838dcb89a2dc624aa918900ac7b
Bug: google/sanitizers#1322
It appears that under particular circumstances, ASan will crash on exit due to a bug where it mis-detects the glibc version. See: google/sanitizers#1322
google/sanitizers#1322

According to that GitHub issue, lsan will mistakenly detect "2.19 for the real glibc of version 2.25 or higher." On our xenial test bots, /lib/x86_64-linux-gnu/libc.so.6 is 2.23. On our bionic test bots, libc is 2.27. So when we switched our asan/lsan tests to bionic, we started hitting strange lsan errors:

    Tracer caught signal 11: addr=0x330004f3 pc=0x55af5266644a sp=0x7f405e956d40
    ==28674==LeakSanitizer has encountered a fatal error.

e.g. https://chromium-swarm.appspot.com/task?id=53745e69a9525810

It appears that error is the same one reported in the GitHub issue, as disabling the DTLS check prevents those same LSan errors. Though this doesn't entirely explain why the failures are transient when nothing about the bots seemingly changes.

e.g. https://ci.chromium.org/p/chromium/builders/ci/Linux%20ASan%20LSan%20Tests%20%281%29

Why does components_unittests fail in builds 89913 - 89916, but is fine before and after?

Bug: 1200574
Change-Id: Id8ffed3bd50d648a1147de50bb4fd2f001c01d12
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2888418
Reviewed-by: Thomas Anderson <thomasanderson@chromium.org>
Reviewed-by: Garrett Beaty <gbeaty@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@google.com>
Owners-Override: Garrett Beaty <gbeaty@chromium.org>
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Cr-Commit-Position: refs/heads/master@{#883698}
Seeing that this is affecting multiple projects and that no fix seems to be forthcoming, perhaps this test should be disabled by default, at least if using GLIBC?
See google/sanitizers#1322. It might be the cause of the intermittent asan failures on CI.
PR-URL: #43085 Refs: google/sanitizers#1322 Refs: #43082 Reviewed-By: Jiawen Geng <technicalcute@gmail.com> Reviewed-By: LiviaMedeiros <livia@cirno.name> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
PR-URL: nodejs/node#43085 Refs: google/sanitizers#1322 Refs: nodejs/node#43082 Reviewed-By: Jiawen Geng <technicalcute@gmail.com> Reviewed-By: LiviaMedeiros <livia@cirno.name> Reviewed-By: Rich Trott <rtrott@gmail.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
As suggested here: google/sanitizers#1322.
It might be the cause of the intermittent ASan failures on CI. See: google/sanitizers#1322
Set `ASAN_OPTIONS=intercept_tls_get_addr=0` to avoid problems with sanitizers running in containers. see: google/sanitizers#1322
Set `ASAN_OPTIONS=intercept_tls_get_addr=0` to avoid problems with sanitizers running in containers. see: google/sanitizers#1322 (cherry picked from commit 30a263a)
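For projects that cannot easily control the environment of every test runner, the same flag can, as far as I know, also be compiled into the binary through ASan's `__asan_default_options` hook. The following is a minimal sketch under that assumption, not something taken from the commits above; options set in the `ASAN_OPTIONS` environment variable should still take precedence over these defaults.

```c
/*
 * Sketch: bake the workaround into an ASan-instrumented binary.
 * The sanitizer runtime calls this hook during startup, if it is defined,
 * and parses the returned string like ASAN_OPTIONS.
 * Build with -fsanitize=address for the hook to have any effect.
 */
const char *__asan_default_options(void) {
    /* Same flag as ASAN_OPTIONS=intercept_tls_get_addr=0 */
    return "intercept_tls_get_addr=0";
}
```

Without `-fsanitize=address` this is just an ordinary function that nothing calls, so it is harmless in non-sanitizer builds.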
"Fix tls_get_addr handling for glibc >=2.25" (https://reviews.llvm.org/D147459) landed in LLVM upstream last Friday, which should fix the issue.
### Briefly, what does this PR introduce?
Potentially resolves this, per google/sanitizers#1322...

### What kind of change does this PR introduce?
- [x] Bug fix (issue #1043)
- [ ] New feature (issue #__)
- [ ] Documentation update
- [ ] Other: __

### Please check if this PR fulfills the following:
- [ ] Tests for the changes have been added
- [ ] Documentation has been added / updated
- [ ] Changes have been communicated to collaborators

### Does this PR introduce breaking changes? What changes might users need to make to their code?
No.

### Does this PR change default behavior?
No.
The issue was reported before:
#914
#1267
#1170
But it seems that none of the proposed solutions was implemented.
The main problem here is that LeakSanitizer can misdetect the glibc version: it detects 2.19 when the real glibc is version 2.25 or higher.
That later causes a SIGSEGV in the ScanRangeForPointers function from lsan_common.cc, because the DTLS range is invalid.
The condition in the DTLS_on_tls_get_addr() function can yield a false positive: sometimes
((tls_beg % 4096) == sizeof(Glibc_2_19_tls_header))
evaluates to true for pointers from glibc 2.25.

I kindly ask you to do something to avoid these annoying crashes. The LeakSanitizer code itself suggests using gnu_get_libc_version() to limit the supported glibc versions :)
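To make the suggestion concrete, here is a rough sketch of gating the page-offset check behind a glibc version test. It is not the sanitizer runtime's actual code: `struct tls_header_2_19` is a stand-in whose two-word layout is my assumption about `Glibc_2_19_tls_header`, and the helper names are hypothetical. Only `gnu_get_libc_version()` is a real glibc call.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-in for Glibc_2_19_tls_header: two pointer-sized fields. */
struct tls_header_2_19 {
    uintptr_t size;
    uintptr_t start;
};

/* The check quoted above: it only looks at the offset within a 4096-byte
 * page, so any pointer with that offset matches, whatever glibc is in use. */
static bool matches_2_19_heuristic(uintptr_t tls_beg) {
    return (tls_beg % 4096) == sizeof(struct tls_header_2_19);
}

/* True if a "major.minor" version string is strictly older than the given
 * version. Unparseable strings are treated as "not older". */
static bool glibc_older_than(const char *version, int major, int minor) {
    int have_major = 0, have_minor = 0;
    assert(version != NULL);
    if (sscanf(version, "%d.%d", &have_major, &have_minor) != 2)
        return false;
    return have_major < major || (have_major == major && have_minor < minor);
}

/* Hypothetical guard: trust the heuristic only when the running glibc
 * reports a version below 2.25. */
static bool may_apply_2_19_heuristic(uintptr_t tls_beg) {
    return glibc_older_than(gnu_get_libc_version(), 2, 25) &&
           matches_2_19_heuristic(tls_beg);
}
```

With a guard like this, a pointer that happens to sit at the matching page offset on glibc 2.25 or newer would no longer be treated as a 2.19-style TLS block.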