This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

BOLT-CoreDump: assert failed in long-jmp stage on aarch64 platform #206

Closed
HShan886 opened this issue Aug 26, 2021 · 16 comments

Comments

@HShan886

HShan886 commented Aug 26, 2021

Hi, and thanks in advance.

I am using llvm-bolt (latest main branch) to generate an optimized binary for aarch64:

llvm-bolt ./server_binary -o ./server_binary.bolt -data=./server_binary.fdata -reorder-blocks=cache+ -reorder-functions=hfsort -split-functions=2 -split-all-cold -split-eh -dyno-stats -v=2

Then I hit an assertion failure in the long-jmp pass.

BOLT/bolt/src/BinaryFunction.cpp:3375: llvm::MCSymbol* llvm::bolt::BinaryFunction::addEntryPoint(const llvm::bolt::BinaryBasicBlock&): Assertion `CurrentState == State::CFG && "basic block can be added as an entry only in a function with CFG"' failed.

 #9 0x00000000004af9de llvm::bolt::BinaryFunction::addEntryPoint(llvm::bolt::BinaryBasicBlock const&) /tmp/BOLT/bolt/src/BinaryFunction.cpp:3378:32
#10 0x0000000000f7c838 llvm::bolt::LongJmpPass::replaceTargetWithStub(llvm::bolt::BinaryBasicBlock&, llvm::MCInst&, unsigned long, unsigned long) /tmp/BOLT/bolt/src/Passes/LongJmp.cpp:254:19
#11 0x0000000000f7e5f6 llvm::bolt::LongJmpPass::relax(llvm::bolt::BinaryFunction&) /tmp/BOLT/bolt/src/Passes/LongJmp.cpp:579:30
#12 0x0000000000f7e964 llvm::bolt::LongJmpPass::runOnFunctions(llvm::bolt::BinaryContext&) /tmp/BOLT/bolt/src/Passes/LongJmp.cpp:618:7

This assertion failure shows that a new tail-call basic block cannot be inserted once the function is in the CFG_Finalized state.
To work around it, I updated the state of the function, and the assertion failure went away.
The new basic block looks like this:

.LStub1189:
    00000000:   b       SYMBOLat0x1a6bcf0 # TAILCALL

But in the later emit-and-link stage, I encountered another assertion failure.

resolveAArch64Relocation, LocalAddress: 0x7fdbc5b5d340 FinalAddress: 0xa1e4140 Value: 0x1a6bcf0 Type: 0x11b Addend: 0x0
llvm-bolt: /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:464: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<28>(BranchImm)' failed.
 ...
 #9 0x0000000002843939 llvm::RuntimeDyldELF::resolveAArch64Relocation(llvm::SectionEntry const&, unsigned long, unsigned long, unsigned int, long) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:465:48
#10 0x0000000002845d52 llvm::RuntimeDyldELF::resolveRelocation(llvm::SectionEntry const&, unsigned long, unsigned long, unsigned int, long, unsigned long, unsigned int) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1024:5
#11 0x0000000002845c6a llvm::RuntimeDyldELF::resolveRelocation(llvm::RelocationEntry const&, unsigned long) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1005:27
#12 0x0000000002826bdc llvm::RuntimeDyldImpl::resolveRelocationList(llvm::SmallVector<llvm::RelocationEntry, 64u> const&, unsigned long) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1087:22
#13 0x00000000028270a9 llvm::RuntimeDyldImpl::applyExternalSymbolRelocations(llvm::StringMap<llvm::JITEvaluatedSymbol, llvm::MallocAllocator>) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1093:24
#14 0x000000000282761b llvm::RuntimeDyldImpl::resolveExternalSymbols() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1193:33
#15 0x0000000002821c4b llvm::RuntimeDyldImpl::resolveRelocations() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:131:42
#16 0x00000000028287bc llvm::RuntimeDyld::resolveRelocations() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1370:70
#17 0x00000000028288a7 llvm::RuntimeDyld::finalizeWithMemoryManagerLocking() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1389:19
#18 0x0000000000605f2f llvm::bolt::RewriteInstance::emitAndLink() /tmp/BOLT/bolt/src/RewriteInstance.cpp:3118:23
#19 0x00000000005f8b64 llvm::bolt::RewriteInstance::run() /tmp/BOLT/bolt/src/RewriteInstance.cpp:889:17
#20 0x0000000000412bb2 main /tmp/BOLT/bolt/src/llvm-bolt.cpp:304:49
#21 0x00007fde046af193 __libc_start_main (/lib64/libc.so.6+0x27193)
#22 0x00000000004116be _start (../build/bin/llvm-bolt+0x4116be)
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.

This assertion failure shows that the branch immediate for the new basic block overflows: it does not fit in the signed 28-bit range that the relocation allows.

So how can a stub basic block be inserted correctly when llvm-bolt is given a large binary?
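The overflow can be checked by hand from the values printed in the log above (Value, Addend, FinalAddress). The following is a minimal sketch, assuming the relocation computes the displacement as Value + Addend - FinalAddress and requires it to fit in a signed 28-bit range, as the `isInt<28>(BranchImm)` assertion suggests. The helper names `fitsSigned`, `branchDisplacement`, `branchReachable`, and `encodeImm26` are hypothetical and are not part of LLVM or BOLT.

```cpp
#include <cassert>
#include <cstdint>

// Signed N-bit range check: true when -2^(N-1) <= X < 2^(N-1).
// (Hypothetical helper, written to behave like a signed-width check.)
template <unsigned N>
constexpr bool fitsSigned(int64_t X) {
  static_assert(N > 0 && N < 64, "N must be in 1..63");
  return X >= -(INT64_C(1) << (N - 1)) && X < (INT64_C(1) << (N - 1));
}

// Displacement of a PC-relative branch: target + addend - branch address.
constexpr int64_t branchDisplacement(uint64_t Value, int64_t Addend,
                                     uint64_t FinalAddress) {
  return static_cast<int64_t>(Value + static_cast<uint64_t>(Addend) -
                              FinalAddress);
}

// An AArch64 B/BL instruction holds a 26-bit word offset, i.e. a signed
// 28-bit byte displacement (roughly +/-128 MiB).
constexpr bool branchReachable(uint64_t Value, int64_t Addend,
                               uint64_t FinalAddress) {
  return fitsSigned<28>(branchDisplacement(Value, Addend, FinalAddress));
}

// Pack an in-range, 4-byte-aligned displacement into the imm26 field.
inline uint32_t encodeImm26(int64_t Disp) {
  assert(fitsSigned<28>(Disp) && Disp % 4 == 0 && "displacement out of range");
  return static_cast<uint32_t>((static_cast<uint64_t>(Disp) & 0x0FFFFFFCu) >> 2);
}
```

With the logged values, `branchDisplacement(0x1a6bcf0, 0, 0xa1e4140)` is -142050384, which is below the lower bound of -134217728, so `branchReachable` returns false for this stub.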

@yota9
Contributor

yota9 commented Oct 16, 2021

@Haishan312 Hello. Could you please check the issue on the latest BOLT? I believe it should be fixed. Thank you!

@HShan886
Author

HShan886 commented Oct 18, 2021

> @Haishan312 Hello. Could you please check the issue on the latest bolt? I believe it would be fixed. Thank you!

Not yet; I got another core dump. It may be related to the island PR (abf5b7e).

#0  0x0000000000464210 in std::_Rb_tree<llvm::bolt::BinaryFunction*, llvm::bolt::BinaryFunction*, std::_Identity<llvm::bolt::BinaryFunction*>, std::less<llvm::bolt::BinaryFunction*>, std::allocator<llvm::bolt::BinaryFunction*> >::_M_begin (this=0x1c8) at /usr/include/c++/9/bits/stl_tree.h:745
#1  0x0000000000467f09 in std::_Rb_tree<llvm::bolt::BinaryFunction*, llvm::bolt::BinaryFunction*, std::_Identity<llvm::bolt::BinaryFunction*>, std::less<llvm::bolt::BinaryFunction*>, std::allocator<llvm::bolt::BinaryFunction*> >::_M_get_insert_unique_pos (this=0x1c8, __k=@0x41bb6ae8: 0x24cc7b98)   at /usr/include/c++/9/bits/stl_tree.h:2089
#2  0x000000000045c6f9 in std::_Rb_tree<llvm::bolt::BinaryFunction*, llvm::bolt::BinaryFunction*, std::_Identity<llvm::bolt::BinaryFunction*>, std::less<llvm::bolt::BinaryFunction*>, std::allocator<llvm::bolt::BinaryFunction*> >::_M_insert_unique<llvm::bolt::BinaryFunction* const&> (this=0x1c8, __v=@0x41bb6ae8: 0x24cc7b98) at /usr/include/c++/9/bits/stl_tree.h:2147
#3  0x0000000000452127 in std::set<llvm::bolt::BinaryFunction*, std::less<llvm::bolt::BinaryFunction*>, std::allocator<llvm::bolt::BinaryFunction*> >::insert (this=0x1c8, __x=@0x41bb6ae8: 0x24cc7b98) at /usr/include/c++/9/bits/stl_set.h:511
#4  0x0000000000434b8f in llvm::bolt::BinaryContext::handleAddressRef (this=0x5aaec10, Address=34951168, BF=..., IsPCRel=true) at /tmp/BOLT/bolt/src/BinaryContext.cpp:448
#5  0x00000000004a664f in llvm::bolt::BinaryFunction::<lambda(llvm::MCInst&, uint64_t, uint64_t)>::operator()(llvm::MCInst &, uint64_t, uint64_t) const (__closure=0x7fffffffd090, Instruction=..., Address=34951564, Size=4) at /tmp/BOLT/bolt/src/BinaryFunction.cpp:1091
#6  0x00000000004a863f in llvm::bolt::BinaryFunction::disassemble (this=0x24cc84e8) at /tmp/BOLT/bolt/src/BinaryFunction.cpp:1430
#7  0x0000000000611848 in llvm::bolt::RewriteInstance::disassembleFunctions (this=0x7fffffffd850) at /tmp/BOLT/bolt/src/RewriteInstance.cpp:2862
#8  0x0000000000605c77 in llvm::bolt::RewriteInstance::run (this=0x7fffffffd850) at /tmp/BOLT/bolt/src/RewriteInstance.cpp:873
#9  0x0000000000412bc3 in main (argc=12, argv=0x7fffffffe078) at /tmp/BOLT/bolt/src/llvm-bolt.cpp:323

@yota9
Contributor

yota9 commented Oct 18, 2021

@Haishan312 The PR indeed has a problem. I have already fixed it in #228, which is currently under review.

@yota9
Contributor

yota9 commented Oct 19, 2021

Hello @Haishan312. The fix was merged; I think it should be OK now :)

@HShan886
Author

@yota9 It works, but at the instruction-emission stage I still get a core dump. Perhaps loading the large immediate into a register with mov instructions would fix it, because the immediate does not fit in int<28>.
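To illustrate the suggestion above, here is a minimal sketch of how a 64-bit target address could be loaded into a register in 16-bit pieces and then branched to, which avoids the 28-bit limit of a direct branch. It is an illustration only, not a description of what BOLT's long-jmp pass actually emits. The function names `splitAddress` and `absoluteJumpStub` and the choice of X16 as scratch register in the usage note are assumptions; the instruction encodings are the standard A64 ones for MOVZ, MOVK, and BR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Split a 64-bit absolute address into four 16-bit chunks, lowest chunk first.
inline std::array<uint16_t, 4> splitAddress(uint64_t Address) {
  std::array<uint16_t, 4> Chunks{};
  for (unsigned I = 0; I < 4; ++I)
    Chunks[I] = static_cast<uint16_t>(Address >> (16 * I));
  return Chunks;
}

// Build the five instruction words of an absolute jump through register Reg:
//   movz xReg, #chunk0
//   movk xReg, #chunk1, lsl #16
//   movk xReg, #chunk2, lsl #32
//   movk xReg, #chunk3, lsl #48
//   br   xReg
inline std::array<uint32_t, 5> absoluteJumpStub(uint64_t Target, unsigned Reg) {
  assert(Reg < 31 && "expected a general-purpose register X0..X30");
  const std::array<uint16_t, 4> Chunks = splitAddress(Target);
  std::array<uint32_t, 5> Stub{};
  // MOVZ (64-bit): 0xD2800000 | hw << 21 | imm16 << 5 | Rd, with hw = 0.
  Stub[0] = 0xD2800000u | (static_cast<uint32_t>(Chunks[0]) << 5) | Reg;
  // MOVK (64-bit): 0xF2800000 | hw << 21 | imm16 << 5 | Rd.
  for (unsigned I = 1; I < 4; ++I)
    Stub[I] = 0xF2800000u | (I << 21) |
              (static_cast<uint32_t>(Chunks[I]) << 5) | Reg;
  // BR Xn: 0xD61F0000 | Rn << 5.
  Stub[4] = 0xD61F0000u | (Reg << 5);
  return Stub;
}
```

For example, `absoluteJumpStub(0x1a6bcf0, 16)` yields a stub that materializes the target from the log in X16 and jumps to it. The cost is five instructions instead of one, so such a stub only makes sense where a direct branch cannot reach.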

@yota9
Contributor

yota9 commented Oct 19, 2021

@Haishan312 Interesting. Could you please share the log? How big is the binary? Is it exec or dyn? It is probably exec; we do not support all possible relocations yet, such as the AARCH64_MOVW_UABS_G* ones I added recently.

@HShan886
Author

HShan886 commented Oct 20, 2021

@yota9 Yes, it is an executable binary, and it is larger than 2 GB. If I set -split-functions=0, BOLT works fine.
Core dump stack trace:

llvm-bolt: /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:483: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<28>(BranchImm)' failed.
 #0 0x0000000001e4437d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /tmp/BOLT/llvm/lib/Support/Unix/Signals.inc:565:22
 #1 0x0000000001e44434 PrintStackTraceSignalHandler(void*) /tmp/BOLT/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x0000000001e42410 llvm::sys::RunSignalHandlers() /tmp/BOLT/llvm/lib/Support/Signals.cpp:97:20
 #3 0x0000000001e43dc4 SignalHandler(int) /tmp/BOLT/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f1c487499d0 __restore_rt sigaction.c:0:0
 #5 0x00007f1c47e0cf35 raise (/lib64/libc.so.6+0x3bf35)
 #6 0x00007f1c47df68d7 abort (/lib64/libc.so.6+0x258d7)
 #7 0x00007f1c47df67a7 _nl_load_domain.cold loadmsgcat.c:0:0
 #8 0x00007f1c47e05536 (/lib64/libc.so.6+0x34536)
 #9 0x0000000002a9fa1e llvm::RuntimeDyldELF::resolveAArch64Relocation(llvm::SectionEntry const&, unsigned long, unsigned long, unsigned int, long) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:484:48
#10 0x0000000002aa1e38 llvm::RuntimeDyldELF::resolveRelocation(llvm::SectionEntry const&, unsigned long, unsigned long, unsigned int, long, unsigned long, unsigned int) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1043:5
#11 0x0000000002aa1d50 llvm::RuntimeDyldELF::resolveRelocation(llvm::RelocationEntry const&, unsigned long) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1024:27
#12 0x0000000002a82a4e llvm::RuntimeDyldImpl::resolveRelocationList(llvm::SmallVector<llvm::RelocationEntry, 64u> const&, unsigned long) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1108:22
#13 0x0000000002a82f5d llvm::RuntimeDyldImpl::applyExternalSymbolRelocations(llvm::StringMap<llvm::JITEvaluatedSymbol, llvm::MallocAllocator>) /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1114:24
#14 0x0000000002a834cf llvm::RuntimeDyldImpl::resolveExternalSymbols() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1214:33
#15 0x0000000002a7d947 llvm::RuntimeDyldImpl::resolveRelocations() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:131:42
#16 0x0000000002a846a0 llvm::RuntimeDyld::resolveRelocations() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1399:70
#17 0x0000000002a8478b llvm::RuntimeDyld::finalizeWithMemoryManagerLocking() /tmp/BOLT/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1418:19
#18 0x0000000000612837 llvm::bolt::RewriteInstance::emitAndLink() /tmp/BOLT/bolt/src/RewriteInstance.cpp:3131:23
#19 0x00000000006051fc llvm::bolt::RewriteInstance::run() /tmp/BOLT/bolt/src/RewriteInstance.cpp:890:17
#20 0x0000000000412bc3 main /tmp/BOLT/bolt/src/llvm-bolt.cpp:304:49

@yota9
Contributor

yota9 commented Oct 20, 2021

@Haishan312 Hm, interesting; one of the branches seems not to have been relaxed. I will fix a couple of small issues soon, but I don't think they are to blame here. Is it possible to get the binary and the profile file that you are using?

@HShan886
Author

HShan886 commented Oct 21, 2021

@yota9 This instruction (b SYMBOLat0x1a6bcf0 # TAILCALL) is inserted by the long-jmp pass.

@yota9
Contributor

yota9 commented Oct 21, 2021

@Haishan312 Interesting, it seems the stub was not relaxed properly. I've created a new PR, #241, which also fixes a minor problem with function alignment. As far as I can see, the longjmp pass currently does not take basic block alignment into account. Do you use the AlignBlocks or PreserveBlocksAlignment option?
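Why ignoring block alignment matters can be seen from a toy layout estimate: if a pass predicts addresses by summing block sizes only, every alignment padding makes the real addresses larger than predicted, and a branch that looked in range may end up out of range. The sketch below is a simplified model under that assumption; `Block`, `alignTo`, and `layoutEnd` are hypothetical names and do not correspond to BOLT's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round Address up to the next multiple of Align (Align must be non-zero).
inline uint64_t alignTo(uint64_t Address, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Address + Align - 1) / Align * Align;
}

// Hypothetical model of a basic block: its size and required alignment.
struct Block {
  uint64_t Size;
  uint64_t Alignment;
};

// Address just past the last block when the blocks are laid out from Start.
// With HonorAlignment == false the estimate ignores padding, which is the
// situation described above for the longjmp pass.
inline uint64_t layoutEnd(uint64_t Start, const std::vector<Block> &Blocks,
                          bool HonorAlignment) {
  uint64_t Address = Start;
  for (const Block &B : Blocks) {
    if (HonorAlignment)
      Address = alignTo(Address, B.Alignment);
    Address += B.Size;
  }
  return Address;
}
```

For three blocks of sizes 12, 20, and 4 bytes, each aligned to 16, the padding-free estimate ends at 36 while the aligned layout ends at 52. Over many blocks the accumulated gap can be enough to invalidate a range check done on the estimate.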

@HShan886
Author

@yota9 That's great. I don't use --align-blocks.

@yota9
Contributor

yota9 commented Oct 21, 2021

@Haishan312

> That's great.

So does it work now? :)

@HShan886
Author

> @Haishan312
>
> > That's great.
>
> So does it work now? :)

Not yet. I get the same core dump when I add "--align-blocks".

@yota9
Contributor

yota9 commented Oct 21, 2021

> Not yet, I get same core when I add "--align-blocks"

That is expected. As I said, align-blocks does not work at the moment (I'm not sure whether I will fix it). If it works without align-blocks, I propose closing this issue; I will open a new one for the align-blocks problem.

@dongjianqiang2

@yota9 Hello, I also encountered the same issue when trying to instrument a big binary. It seems the stub was not relaxed properly.
resolveAArch64Relocation, LocalAddress: 0x7f393f3a82e4 FinalAddress: 0x6ce732e4 Value: 0x82b0000 Type: 0x11b Addend: 0x0
llvm-bolt: /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:483: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<28>(BranchImm)' failed.
#0 0x0000555772b0936e llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/djq/codehub/cpu-bolt/llvm/lib/Support/Unix/Signals.inc:565:0
#1 0x0000555772b09425 PrintStackTraceSignalHandler(void*) /home/djq/codehub/cpu-bolt/llvm/lib/Support/Unix/Signals.inc:632:0
#2 0x0000555772b070d4 llvm::sys::RunSignalHandlers() /home/djq/codehub/cpu-bolt/llvm/lib/Support/Signals.cpp:96:0
#3 0x0000555772b08cef SignalHandler(int) /home/djq/codehub/cpu-bolt/llvm/lib/Support/Unix/Signals.inc:407:0
#4 0x00007f3d1a4f3980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
#5 0x00007f3d191a4fb7 raise /build/glibc-S9d2JN/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
#6 0x00007f3d191a6921 abort /build/glibc-S9d2JN/glibc-2.27/stdlib/abort.c:81:0
#7 0x00007f3d1919648a __assert_fail_base /build/glibc-S9d2JN/glibc-2.27/assert/assert.c:89:0
#8 0x00007f3d19196502 (/lib/x86_64-linux-gnu/libc.so.6+0x30502)
#9 0x0000555772d418e3 llvm::RuntimeDyldELF::resolveAArch64Relocation(llvm::SectionEntry const&, unsigned long, unsigned long, unsigned int, long) /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:484:0
#10 0x0000555772d43eb7 llvm::RuntimeDyldELF::resolveRelocation(llvm::SectionEntry const&, unsigned long, unsigned long, unsigned int, long, unsigned long, unsigned int) /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1043:0
#11 0x0000555772d43dce llvm::RuntimeDyldELF::resolveRelocation(llvm::RelocationEntry const&, unsigned long) /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1024:0
#12 0x0000555772d22062 llvm::RuntimeDyldImpl::resolveRelocationList(llvm::SmallVector<llvm::RelocationEntry, 64u> const&, unsigned long) /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1112:0
#13 0x0000555772d2259d llvm::RuntimeDyldImpl::applyExternalSymbolRelocations(llvm::StringMap<llvm::JITEvaluatedSymbol, llvm::MallocAllocator>) /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1118:0
#14 0x0000555772d22b5e llvm::RuntimeDyldImpl::resolveExternalSymbols() /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1218:0
#15 0x0000555772d1cad4 llvm::RuntimeDyldImpl::resolveRelocations() /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:133:0
#16 0x0000555772d23e0c llvm::RuntimeDyld::resolveRelocations() /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1403:0
#17 0x0000555772d23ef7 llvm::RuntimeDyld::finalizeWithMemoryManagerLocking() /home/djq/codehub/cpu-bolt/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:1422:0
#18 0x00005557726848cc llvm::bolt::RewriteInstance::emitAndLink() /home/djq/codehub/cpu-bolt/bolt/lib/Rewrite/RewriteInstance.cpp:3019:0
#19 0x00005557726764ec llvm::bolt::RewriteInstance::run() /home/djq/codehub/cpu-bolt/bolt/lib/Rewrite/RewriteInstance.cpp:782:0
#20 0x0000555771cd22b3 main /home/djq/codehub/cpu-bolt/bolt/tools/driver/llvm-bolt.cpp:250:0

@yota9
Contributor

yota9 commented Mar 22, 2022

Hello @dongjianqiang2! I work on fixing this from time to time. Several new patches were merged recently, and one more is currently under review: https://reviews.llvm.org/D122039. Please try applying it to the latest BOLT version from the LLVM repo, https://github.com/llvm/llvm-project. If that doesn't help, please share the binary if possible. Thank you!
