Segmentation violation on testing with attempted backtrace using debug symbols #46439

Closed
actual-nh opened this issue Dec 31, 2020 · 15 comments · Fixed by #46498
Labels
Code: Tests Measurement, self-control, statistics, balancing. <Crash / Freeze> Fatal bug that results in hangs or crashes.

Comments

@actual-nh
Contributor

actual-nh commented Dec 31, 2020

When running all tests in randomized order (--order rand) with two different random seeds, vehicle_level_test fails (on beetle edge drop), the failure escalates to a fatal error, and Catch hits a segmentation violation while trying to repeat the stack trace with debug symbols.

Example:

0.412 s: beetle body drop
0.413 s: vehicle_level_test
-------------------------------------------------------------------------------
vehicle_level_test
  beetle edge drop
-------------------------------------------------------------------------------
../tests/vehicle_ramp_test.cpp:298
...............................................................................

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

../tests/vehicle_ramp_test.cpp:287: FAILED:
  CHECK( veh.global_part_pos3( *prt ).z == 0 )
with expansion:
  1 == 0

Stack trace at fatal error:


    Attempting to repeat stack trace using debug symbols…
../tests/vehicle_ramp_test.cpp:287: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGSEGV - Segmentation violation signal

Log messages during failed test:
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You're knocked to the floor!
12:00:00AM: You land on the Beetle.
12:00:00AM: Your Beetle's <color_c_light_green>||</color> frame rams into you!
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You land on the Beetle.
12:00:00AM: Your Beetle's <color_c_light_green>||</color> frame rams into you!
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: You are slammed against the Beetle.
12:00:00AM: Your Beetle's <color_c_light_green>||</color> frame rams into you!

An earlier test (from seed 1609442887) also had problems with the backtrace:

0.821 s: no ramp
0.832 s: vehicle_ramp_test_61
0.646 s: ramp up
0.646 s: vehicle_ramp_test_61
0.818 s: ramp down
0.818 s: vehicle_ramp_test_61
0.986 s: angled no ramp
0.986 s: vehicle_ramp_test_61
1.150 s: angled ramp down
1.150 s: vehicle_ramp_test_61
1.164 s: angled ramp up
1.164 s: vehicle_ramp_test_61
0.754 s: no ramp
0.754 s: vehicle_ramp_test_60
0.683 s: ramp up
0.683 s: vehicle_ramp_test_60
0.905 s: ramp down
0.905 s: vehicle_ramp_test_60
0.909 s: angled no ramp
0.909 s: vehicle_ramp_test_60
1.408 s: angled ramp down
1.408 s: vehicle_ramp_test_60
1.037 s: angled ramp up
1.037 s: vehicle_ramp_test_60

14:38:12.920 ERROR : (error message will follow backtrace)
    0   cata_test                           0x000000010d3f6bd6 _Z21debug_write_backtraceRNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE + 38
    1   cata_test                           0x000000010d3f4e9e _Z8DebugLog10DebugLevel10DebugClass + 462
    2   cata_test                           0x000000010d3f463b _Z12realDebugmsgPKcS0_S0_RKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE + 859
    3   cata_test                           0x000000010f21e1f2 _Z12realDebugmsgIJiiiEEvPKcS1_S1_S1_DpOT_ + 162
    4   cata_test                           0x000000010f21e13a _ZN3npc12place_on_mapEv + 1354
    5   cata_test                           0x000000010d86f1de _ZN4game9load_npcsEv + 2270
    6   cata_test                           0x000000010c113f18 _ZL18create_test_talkerv + 152
    7   cata_test                           0x000000010c11256a _ZL9prep_testR8dialogue + 602
    8   cata_test                           0x000000010c0edf5d _ZL30____C_A_T_C_H____T_E_S_T____17v + 45
    9   cata_test                           0x000000010c4f6293 _ZNK5Catch21TestInvokerAsFunction6invokeEv + 19
    10  cata_test                           0x000000010c4e4d37 _ZNK5Catch8TestCase6invokeEv + 39
    11  cata_test                           0x000000010c4e4c79 _ZN5Catch10RunContext20invokeActiveTestCaseEv + 41
    12  cata_test                           0x000000010c4e06c0 _ZN5Catch10RunContext14runCurrentTestERNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES8_ + 1952
    13  cata_test                           0x000000010c4de81a _ZN5Catch10RunContext7runTestERKNS_8TestCaseE + 1114
    14  cata_test                           0x000000010c4e9788 _ZN5Catch12_GLOBAL__N_19TestGroup7executeEv + 952
    15  cata_test                           0x000000010c4e82a7 _ZN5Catch7Session11runInternalEv + 551
    16  cata_test                           0x000000010c4e8015 _ZN5Catch7Session3runEv + 101
    17  cata_test                           0x000000010c528345 main + 3045
    18  libdyld.dylib                       0x00007fff945fd235 start + 1

    Attempting to repeat stack trace using debug symbols…
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
    backtrace: Could not extract binary name from line
Backtrace emission took 1 seconds.
(continued from above) ERROR : src/npc.cpp:772 [void npc::place_on_map()] Failed to place NPC in a valid location near (25,25,0)0.442 s: npc_talk_role

Steps To Reproduce

Run (using a non-release CDDA compiled with DEBUG_SYMBOLS=1):

  • tests/cata_test --min-duration 0.2 --rng-seed 1609446657 --order rand
  • tests/cata_test --min-duration 0.2 --rng-seed 1609442887 --order rand

Expected behavior

(Not failing the test would be good; I am not sure how to handle this one, since it already does clear_map() and attempts to get the player out of the way... I have opened a separate issue, #46441, for this.) When a test fails, a more usable stack trace would be nice; barring that, not having a segmentation violation would be helpful.

Versions and configuration

  • OS: OS X 10.12.6
      • Apple LLVM version 8.1.0 (clang-802.0.42)
      • Target: x86_64-apple-darwin16.7.0
      • Thread model: posix
      • InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
  • Game Version: 0.E-8560-ga3bf351bf5
    • Compiled CDDA using: make NATIVE=osx OSX_MIN=10.12 CLANG=1 MACPORTS=1 USE_HOME_DIR=1 DEBUG_SYMBOLS=1
  • Graphics version: Tiles Terminal
  • Ingame language: C locale
  • Mods loaded: None loaded directly, so dda is loaded as the default.

Additional context

test_user_dir.zip

I have had major problems in the past trying to get a core file out of crashes (of other programs), and this one was no exception.

Ping: @jbytheway, @Qrox, @wapcaplet (as people involved with testing overall)

@actual-nh actual-nh changed the title from "Segmentation violation on testing with attempted backtrace with debug symbols" to "Segmentation violation on testing with attempted backtrace using debug symbols" Dec 31, 2020
@BrettDong BrettDong added <Crash / Freeze> Fatal bug that results in hangs or crashes. Code: Tests Measurement, self-control, statistics, balancing. labels Dec 31, 2020
@Qrox
Contributor

Qrox commented Jan 1, 2021

Stack trace of the first test from gdb:

#0  0x00000000006fa562 in level_out(string_id<vehicle_prototype> const&, bool) ()
#1  0x00000000006faa70 in test_leveling(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#2  0x00000000006fab12 in ____C_A_T_C_H____T_E_S_T____21 ()
#3  0x00000000006c8f85 in Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#4  0x00000000006ca909 in Catch::RunContext::runTest(Catch::TestCase const&) ()
#5  0x00000000006cbda0 in Catch::Session::runInternal() ()
#6  0x00000000006cbfa8 in Catch::Session::run() ()
#7  0x00000000013a5ec7 in main ()

gdb also reported a second crash when I continued execution, which I guess was the crash in the stack trace code, but the crash stack trace was identical to the first one, so I don't know what caused the crash exactly. It might be because I didn't build with debug symbols though.

EDIT: well crap, with debug symbols gdb says Backtrace stopped: previous frame identical to this frame (corrupt stack?) instead. (Just thinking though, maybe that's why the first stack trace was empty?)

The second one in your OP actually printed a backtrace with mangled symbol names, but the addr2line calls to extract unmangled symbol names failed. I don't know much about addr2line though, @jbytheway might know more.
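
The mangled names in the OS X trace can be decoded without addr2line. A small sketch, assuming GCC or Clang (Itanium C++ ABI), where `abi::__cxa_demangle` from `<cxxabi.h>` is available; `demangle` is only an illustrative wrapper name, not something from the project. This recovers function names only, not file and line, which is the part addr2line is needed for.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang)
#include <memory>
#include <string>

// Illustrative wrapper: returns the demangled form of `mangled`, or the input
// unchanged if it does not look like an Itanium-ABI mangled name.
std::string demangle( const std::string &mangled )
{
    // Only names starting with "_Z" are mangled C++ symbols; leave "main" etc. alone.
    if( mangled.compare( 0, 2, "_Z" ) != 0 ) {
        return mangled;
    }
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer, so release it with std::free.
    std::unique_ptr<char, decltype( &std::free )> out(
        abi::__cxa_demangle( mangled.c_str(), nullptr, nullptr, &status ), &std::free );
    return ( status == 0 && out ) ? std::string( out.get() ) : mangled;
}

// Example check against a symbol from the OS X trace in the original post.
void demangle_example()
{
    assert( demangle( "_ZN4game9load_npcsEv" ) == "game::load_npcs()" );
}
```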

@actual-nh
Contributor Author

actual-nh commented Jan 1, 2021

@Qrox - so you're using g++? What OS? (I suspect Catch has an internal RNG for reordering, to make that OS-independent, but it might not.)
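
On the RNG point: whether Catch does this is not established in this thread, but an OS-independent reordering is possible in principle. A minimal sketch, assuming a Fisher-Yates shuffle driven directly by `std::mt19937` (whose output sequence is fixed by the standard) rather than `std::shuffle`, whose result can differ between standard library implementations; `portable_shuffle` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: shuffles `items` so that the same seed gives the same
// order on every platform and standard library.
std::vector<std::string> portable_shuffle( std::vector<std::string> items, std::uint32_t seed )
{
    std::mt19937 rng( seed );
    for( std::size_t i = items.size(); i > 1; --i ) {
        // Modulo adds a slight bias, which is acceptable for test ordering.
        const std::size_t j = static_cast<std::size_t>( rng() ) % i;
        std::swap( items[i - 1], items[j] );
    }
    return items;
}

// Example check: one seed from this issue yields a repeatable permutation.
void portable_shuffle_example()
{
    const std::vector<std::string> tests = { "a", "b", "c", "d", "e" };
    std::vector<std::string> first = portable_shuffle( tests, 1609446657u );
    const std::vector<std::string> second = portable_shuffle( tests, 1609446657u );
    assert( first == second );
    std::sort( first.begin(), first.end() );
    assert( first == tests );
}
```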

@Qrox
Contributor

Qrox commented Jan 2, 2021

I'm using MinGW-w64 on Windows 10.

actual-nh added a commit to actual-nh/Cataclysm-DDA that referenced this issue Jan 2, 2021
Randomize ordering of tests to check for problems such as seen in CleverRaven#46439.
@actual-nh
Contributor Author

Let's see what happens with using the PR test infrastructure.

From the above two crashes, I am suspecting something with the map code, BTW.

@actual-nh
Contributor Author

actual-nh commented Jan 2, 2021

Huh. Travis did manage to get debug symbols - at least for part of it (quite a bit is still blank), and also had a segmentation error (plus quite a few other test failures!):

Stack trace at fatal error:
    ./tests/cata_test(_Z21debug_write_backtraceRSo+0x2b) [0x156728f]
    ./tests/cata_test(_ZN12CataListener14assertionEndedERKN5Catch14AssertionStatsE+0x2c) [0x12fbf38]
    ./tests/cata_test(_ZN5Catch17ListeningReporter14assertionEndedERKNS_14AssertionStatsE+0x27) [0x12dbe6b]
    ./tests/cata_test(_ZN5Catch10RunContext14assertionEndedERKNS_15AssertionResultE+0x9a) [0x12cacb6]
    ./tests/cata_test(_ZN5Catch10RunContext25handleFatalErrorConditionENS_9StringRefE+0x109) [0x12cb431]
    ./tests/cata_test(_ZN5Catch21FatalConditionHandler12handleSignalEi+0xa1) [0x12c4c65]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7fdde00f7390]
    ./tests/cata_test() [0x1336813]
    ./tests/cata_test() [0x1332bd6]
    ./tests/cata_test(_ZN5Catch10RunContext20invokeActiveTestCaseEv+0x1c) [0x12cba54]
    ./tests/cata_test(_ZN5Catch10RunContext14runCurrentTestERNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_+0x148) [0x12ca8b2]
    ./tests/cata_test(_ZN5Catch10RunContext7runTestERKNS_8TestCaseE+0x1b0) [0x12ca242]
    ./tests/cata_test(_ZN5Catch7Session11runInternalEv+0x935) [0x12ce2dd]
    ./tests/cata_test(_ZN5Catch7Session3runEv+0x83) [0x12cd91d]
    ./tests/cata_test(main+0xef1) [0x12dfd07]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fdddf43a830]
    ./tests/cata_test(_start+0x29) [0xe349e9]
    Attempting to repeat stack trace using debug symbols…
    debug_write_backtrace(std::ostream&)
    ??:?
    CataListener::assertionEnded(Catch::AssertionStats const&)
    ??:?
    Catch::ListeningReporter::assertionEnded(Catch::AssertionStats const&)
    ??:?
    Catch::RunContext::assertionEnded(Catch::AssertionResult const&)
    ??:?
    Catch::RunContext::handleFatalErrorCondition(Catch::StringRef)
    ??:?
    Catch::FatalConditionHandler::handleSignal(int)
    ??:?
    ??
    ??:0
    level_out(string_id<vehicle_prototype> const&, bool)
    ../tests/vehicle_ramp_test.cpp:?
    ____C_A_T_C_H____T_E_S_T____21()
    ../tests/vehicle_ramp_test.cpp:?
    Catch::RunContext::invokeActiveTestCase()
    ??:?
    Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
    ??:?
    Catch::RunContext::runTest(Catch::TestCase const&)
    ??:?
    Catch::Session::runInternal()
    ??:?
    Catch::Session::run()
    ??:?
    main
    ??:?
    __libc_start_main
    ??:?
    _start
    ??:?
-------------------------------------------------------------------------------
vehicle_level_test
  beetle edge drop
-------------------------------------------------------------------------------
../tests/vehicle_ramp_test.cpp:298
...............................................................................
../tests/vehicle_ramp_test.cpp:287: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGSEGV - Segmentation violation signal
Log messages during failed test:
12:00:00AM: You land on the Beetle.
12:00:00AM: Your Beetle's <color_c_light_green>||</color> frame rams into you!
12:00:00AM: You land on the pavement.
12:00:00AM: Your Beetle's <color_c_light_green>||</color> frame rams into you!

It did get a core dump; will have to check if there's some way to access it. (Even if not, the github ones should give us something.)

@actual-nh
Contributor Author

Yes, the github one did. I've appealed for someone with a Linux box to take a look - #46476.

@actual-nh
Contributor Author

I've narrowed down which tests are needed to run to get the crash. See #46441 for info.

@actual-nh
Contributor Author

@anothersimulacrum suggested a small patch that converts it from a segmentation fault to a failure; will put in as PR to close this issue.

actual-nh added a commit to actual-nh/Cataclysm-DDA that referenced this issue Jan 2, 2021
Using a patch suggested by @anothersimulacrum, prevents a segmentation error (instead giving a much more useful failed test).
@actual-nh
Contributor Author

I am not sure why this isn't being linked as closed by said PR... sigh.

@actual-nh
Contributor Author

Finicky, isn't it?

ZhilkinSerg added a commit that referenced this issue Jan 3, 2021
@jbytheway
Contributor

FYI, the extraction of debug symbols doesn't work on OS X, which is why the backtrace was failing. The ideal solution for this is probably for someone to extend the libbacktrace-based stack trace code to other platforms (currently it's only used on Windows) and only use the addr2line-based solution on Linux, and only when libbacktrace is disabled at build time.
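
To illustrate why a parser written for Linux output would reject the OS X lines: the two platforms format their `backtrace_symbols()` output differently. A rough sketch, assuming the addr2line path expects glibc-style lines of the form `binary(symbol+offset) [address]`; `binary_from_glibc_line` is a made-up helper for illustration and not the project's actual parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the project's parser): extracts the binary path
// from a glibc-style backtrace_symbols() line, e.g.
//   ./tests/cata_test(_ZN5Catch7Session3runEv+0x83) [0x12cd91d]
// Returns false when the line does not have that shape.
bool binary_from_glibc_line( const std::string &line, std::string &binary )
{
    // glibc lines end in " [0xADDRESS]"; OS X lines do not.
    const std::size_t bracket = line.rfind( " [0x" );
    if( bracket == std::string::npos ) {
        return false;
    }
    // The binary path stops at '(' when a symbol is present, else at " [".
    const std::size_t paren = line.find( '(' );
    const std::size_t end = ( paren != std::string::npos && paren < bracket ) ? paren : bracket;
    const std::size_t start = line.find_first_not_of( " \t" );
    if( start == std::string::npos || start >= end ) {
        return false;
    }
    binary = line.substr( start, end - start );
    return true;
}

// Example checks using one line in each layout seen in this issue.
void binary_name_examples()
{
    std::string bin;
    const bool linux_ok = binary_from_glibc_line(
                              "./tests/cata_test(_ZN5Catch7Session3runEv+0x83) [0x12cd91d]", bin );
    assert( linux_ok && bin == "./tests/cata_test" );
    // OS X layout: index, image, address, symbol + offset.
    const bool osx_ok = binary_from_glibc_line(
                            "16  cata_test  0x000000010c4e8015 _ZN5Catch7Session3runEv + 101", bin );
    assert( !osx_ok );
    ( void )linux_ok;
    ( void )osx_ok;
}
```

An OS X line has no `[0x...]` suffix at all, so a parser along these lines has nothing to anchor on, which would be consistent with the repeated "Could not extract binary name from line" messages.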

@actual-nh
Contributor Author

@jbytheway: It was also failing to extract on MinGW-w64 on Windows 10, BTW. @Qrox, I'm thinking your build would have used libbacktrace?

@Qrox
Contributor

Qrox commented Jan 4, 2021

> It was also failing to extract on MinGW-w64 on Windows 10

I didn't say that... The test didn't report a backtrace on Windows when crashed, but Catch2 didn't report the crash either, so I think the signal handling code might have caused a second crash before the backtrace was printed. Other backtraces were correctly printed with mangled symbol names, but I didn't build with debug symbols so they didn't report the unmangled names. When I implemented backtrace on Windows it was able to report the unmangled names so I don't think it would be any different this time.

On a side note, now that I look at the first log in your OP, I think the code actually caught the signal and didn't cause a second crash on your end (Catch2 caught the signal, our custom signal handling code tried to print a backtrace, and Catch2 continued to report the same signal). It's just that the stack walking failed and the backtrace was not printed, probably due to a corrupted stack?
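
For context on the stack-walking step: on Linux and OS X, raw (mangled) trace lines like the ones in this issue typically come from `backtrace()` and `backtrace_symbols()` in `<execinfo.h>`. A minimal sketch under that assumption; `capture_backtrace` is a hypothetical name and the project's real handler may differ. If the walk finds no frames, the list is empty and there is nothing to print.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>     // std::free
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc and OS X)
#include <string>
#include <vector>

// Hypothetical helper: walks the current stack and returns one text line per
// frame, in the platform's own backtrace_symbols() format.
std::vector<std::string> capture_backtrace( int max_frames = 64 )
{
    std::vector<std::string> lines;
    if( max_frames <= 0 ) {
        return lines;
    }
    std::vector<void *> addrs( static_cast<std::size_t>( max_frames ) );
    // backtrace() fills in return addresses and reports how many it found.
    const int n = backtrace( addrs.data(), max_frames );
    // backtrace_symbols() returns a single malloc'ed array of strings.
    char **symbols = backtrace_symbols( addrs.data(), n );
    if( symbols == nullptr ) {
        return lines;
    }
    for( int i = 0; i < n; ++i ) {
        lines.emplace_back( symbols[i] );
    }
    std::free( symbols );
    return lines;
}

// Example check: a healthy stack yields at least one frame.
void capture_backtrace_example()
{
    const std::vector<std::string> frames = capture_backtrace();
    assert( !frames.empty() );
    ( void )frames;
}
```

Note that `backtrace_symbols()` allocates memory, so inside a signal handler `backtrace_symbols_fd()`, which writes straight to a file descriptor, is the safer variant.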

@actual-nh
Contributor Author

> I didn't say that... The test didn't report a backtrace on Windows when crashed, but Catch2 didn't report the crash either, so I think the signal handling code might have caused a second crash before the backtrace was printed. Other backtraces were correctly printed with mangled symbol names, but I didn't build with debug symbols so they didn't report the unmangled names. When I implemented backtrace on Windows it was able to report the unmangled names so I don't think it would be any different this time.

OK.

> On a side note, now that I look at the first log in your OP, I think the code actually caught the signal and didn't cause a second crash on your end (Catch2 caught the signal, our custom signal handling code tried to print a backtrace, and Catch2 continued to report the same signal). It's just that the stack walking failed and the backtrace was not printed, probably due to a corrupted stack?

From what @jbytheway was saying, I'm not sure whether the stack was corrupted or the backtracing simply didn't work (and did so in such a way that the stack appeared to be corrupted). What would be involved in implementing libbacktrace use on OS X, other than making sure a copy of libbacktrace was around and having the Makefile use it as it does on Windows?

@Qrox
Contributor

Qrox commented Jan 7, 2021

libbacktrace is not distributed as a library with GCC, so we might need to build it ourselves for OS X. Also, the Windows backtrace code currently uses a combination of dbghelp (the Windows-native backtrace library) and libbacktrace, so the code needs to be changed too.
