Avoid conflict with issue 1 from original repo #1

arichardson · 2019-02-18T10:58:34Z

No description provided.

Currently, the type id for a derived type is computed incorrectly. For example,
type #1: int
type #2: ptr to #1
For a global variable "int *a", type #1 will be attributed to variable "a". This is due to a bug which assigns the type id of the base type of that derived type as the derived type's type id. This happens to "const", "volatile", "restrict", "typedef" and "pointer" types.
This patch fixed this bug, fixed existing test cases and added a new one focusing on pointers plus other derived types.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356727
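The bug described in this commit message can be illustrated with a small sketch. This is not the emitter's actual code: `TypeTable`, `add_base`, `add_derived` and the `buggy` flag are hypothetical names, introduced only to mimic the described behavior of handing out the base type's id for a derived type.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TypeTable:
    """Toy type table; ids start at 1, as in the example in the message."""
    entries: List[Tuple[str, Optional[int]]] = field(default_factory=list)

    def add_base(self, name: str) -> int:
        self.entries.append((name, None))
        return len(self.entries)

    def add_derived(self, kind: str, base_id: int, buggy: bool = False) -> int:
        self.entries.append((kind, base_id))
        own_id = len(self.entries)
        # Assumed shape of the bug: report the base type's id, not the new one.
        return base_id if buggy else own_id


def type_id_for_int_pointer(buggy: bool) -> int:
    """Return the type id attributed to a global `int *a`."""
    table = TypeTable()
    int_id = table.add_base("int")                         # type #1: int
    return table.add_derived("ptr", int_id, buggy=buggy)   # type #2: ptr to #1


print(type_id_for_int_pointer(buggy=True))   # id of "int" leaks through
print(type_id_for_int_pointer(buggy=False))  # the pointer gets its own id
```

With the flag set, the variable ends up with the id of `int`; without it, the pointer type keeps its own id, which is the behavior the patch is described as restoring.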

Looks like a MachinePipeliner algorithm problem found by sanitizer-x86_64-linux-fast. I will back out this test first while investigating the problem to unblock the buildbot.
==49637==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x614000002e08 at pc 0x000004364350 bp 0x7ffe228a3bd0 sp 0x7ffe228a3bc8
READ of size 4 at 0x614000002e08 thread T0
#0 0x436434f in llvm::SwingSchedulerDAG::checkValidNodeOrder(llvm::SmallVector<llvm::NodeSet, 8u> const&) const /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:3736:11
#1 0x4342cd0 in llvm::SwingSchedulerDAG::schedule() /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:486:3
#2 0x434042d in llvm::MachinePipeliner::swingModuloScheduler(llvm::MachineLoop&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:385:7
#3 0x433eb90 in llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:207:5
#4 0x428b7ea in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13
#5 0x4d1a913 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1648:27
#6 0x4d1b192 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1685:16
#7 0x4d1c06d in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1752:27
#8 0x4d1c06d in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1865
#9 0xa48ca3 in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:611:8
#10 0xa4270f in main /b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:365:22
#11 0x7fec902572e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
#12 0x971b69 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/llc+0x971b69)
llvm-svn: 363105

Summary: This reverts commit r372204.
This change causes build bot failures under msan: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/35236/steps/check-llvm%20msan/logs/stdio:
```
FAIL: LLVM :: DebugInfo/AArch64/asan-stack-vars.mir (19531 of 33579)
******************** TEST 'LLVM :: DebugInfo/AArch64/asan-stack-vars.mir' FAILED ********************
Script:
--
: 'RUN: at line 1'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc -O0 -start-before=livedebugvalues -filetype=obj -o - /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir | /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llvm-dwarfdump -v - | /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir
--
Exit Code: 2
Command Output (stderr):
--
==62894==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0xdfcafb in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3
#1 0xdfae8a in resolveFrameIndexReference /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1580:10
#2 0xdfae8a in llvm::AArch64FrameLowering::getFrameIndexReference(llvm::MachineFunction const&, int, unsigned int&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1536
#3 0x46642c1 in (anonymous namespace)::LiveDebugValues::extractSpillBaseRegAndOffset(llvm::MachineInstr const&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:582:21
#4 0x4647cb3 in transferSpillOrRestoreInst /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:883:11
#5 0x4647cb3 in process /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1079
#6 0x4647cb3 in (anonymous namespace)::LiveDebugValues::ExtendRanges(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1361
#7 0x463ac0e in (anonymous namespace)::LiveDebugValues::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1415:18
#8 0x4854ef0 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13
#9 0x53b0b01 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1648:27
#10 0x53b15f6 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1685:16
#11 0x53b298d in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1750:27
#12 0x53b298d in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1863
#13 0x905f21 in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:601:8
#14 0x8fdc4e in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:355:22
#15 0x7f67673632e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
#16 0x882369 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x882369)
MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const
Exiting
error: -: The file was not recognized as a valid object file
FileCheck error: '-' is empty.
FileCheck command line: /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir
```
Reviewers: bkramer
Reviewed By: bkramer
Subscribers: sdardis, aprantl, kristof.beyls, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67710
llvm-svn: 372228

Fixes a leak introduced in r372903, detected on the ASan bot.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/35430/steps/check-clang%20asan/logs/stdio
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x561d88 in operator new(unsigned long) /b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cc:105
#1 0x1a48779 in clang::ItaniumMangleContext::create(clang::ASTContext&, clang::DiagnosticsEngine&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/AST/ItaniumMangle.cpp:5134:10
#2 0xdff000 in Decl_AsmLabelAttr_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/unittests/AST/DeclTest.cpp:97:23
llvm-svn: 372925

Summary: If the .symtab section is stripped from the binary it might be that there's a .gnu_debugdata section which contains a smaller .symtab in order to provide enough information to create a backtrace with function names or to set and hit a breakpoint on a function name.
This change looks for a .gnu_debugdata section in the ELF object file. The .gnu_debugdata section contains an xz-compressed ELF file with a .symtab section inside. Symbols from that compressed .symtab section are merged with the main object file's .dynsym symbols (if any). In addition we always load the .dynsym even if there's a .symtab section.
For example, the Fedora and RHEL operating systems strip their binaries but keep a .gnu_debugdata section. While GDB can already read this section, LLDB could not until this patch. To test this patch on a Fedora or RHEL operating system, try to set a breakpoint on the "help" symbol in the "zip" binary. Before this patch, only GDB could set this breakpoint; now LLDB can also do so without installing extra debug symbols:
lldb /usr/bin/zip -b -o "b help" -o "r" -o "bt" -- -h
The above line runs LLDB in batch mode and on the "/usr/bin/zip -h" target:
(lldb) target create "/usr/bin/zip"
Current executable set to '/usr/bin/zip' (x86_64).
(lldb) settings set -- target.run-args "-h"
Before the program starts, we set a breakpoint on the "help" symbol:
(lldb) b help
Breakpoint 1: where = zip`help, address = 0x00000000004093b0
Once the program is run and has hit the breakpoint we ask for a backtrace:
(lldb) r
Process 10073 stopped
* thread #1, name = 'zip', stop reason = breakpoint 1.1
frame #0: 0x00000000004093b0 zip`help
zip`help:
-> 0x4093b0 <+0>: pushq %r12
0x4093b2 <+2>: movq 0x2af5f(%rip), %rsi ; + 4056
0x4093b9 <+9>: movl $0x1, %edi
0x4093be <+14>: xorl %eax, %eax
Process 10073 launched: '/usr/bin/zip' (x86_64)
(lldb) bt
* thread #1, name = 'zip', stop reason = breakpoint 1.1
* frame #0: 0x00000000004093b0 zip`help
  frame #1: 0x0000000000403970 zip`main + 3248
  frame #2: 0x00007ffff7d8bf33 libc.so.6`__libc_start_main + 243
  frame #3: 0x0000000000408cee zip`_start + 46
In order to support the .gnu_debugdata section, one has to have LZMA development headers installed. The CMake section that controls this part looks for the LZMA headers and enables .gnu_debugdata support by default if they are found; otherwise, or if explicitly requested, the minidebuginfo support is disabled.
GDB supports the "mini debuginfo" section .gnu_debugdata since v7.6 (2013).
Reviewers: espindola, labath, jankratochvil, alexshap
Reviewed By: labath
Subscribers: rnkovacs, wuzish, shafik, emaste, mgorny, arichardson, hiraditya, MaskRay, lldb-commits
Tags: #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D66791
llvm-svn: 373891
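The merge described in this message (decompress the mini symbol table, then combine it with .dynsym) can be sketched roughly as follows. This is not LLDB's implementation: `parse_symbols` and `load_symbols` are hypothetical helpers, the compressed payload is a plain "name address" listing standing in for a real xz-compressed ELF file, and keeping the .dynsym entry on a name clash is an assumption the message does not confirm.

```python
import lzma
from typing import Dict, Optional


def parse_symbols(blob: bytes) -> Dict[str, int]:
    """Parse 'name hex-address' lines; a stand-in for reading a real .symtab."""
    symbols: Dict[str, int] = {}
    for line in blob.decode().splitlines():
        name, addr = line.split()
        symbols[name] = int(addr, 16)
    return symbols


def load_symbols(dynsym: Dict[str, int],
                 gnu_debugdata: Optional[bytes]) -> Dict[str, int]:
    """Always start from .dynsym, then add symbols from the compressed section."""
    merged = dict(dynsym)
    if gnu_debugdata is not None:
        mini_symtab = parse_symbols(lzma.decompress(gnu_debugdata))
        for name, addr in mini_symtab.items():
            # Assumed policy: an existing .dynsym entry wins over the mini symtab.
            merged.setdefault(name, addr)
    return merged


# Hypothetical inputs: one dynamic symbol plus the "help" address from the session.
dynsym = {"exported_fn": 0x1000}
debugdata = lzma.compress(b"help 4093b0\n")
symbols = load_symbols(dynsym, debugdata)
print(hex(symbols["help"]))
```

Passing `None` for the section models a binary without .gnu_debugdata, in which case only the .dynsym symbols are returned.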

This test is not defined: it invokes undefined behavior, as the UBSan report shows.
FAIL: LLVM-Unit :: ADT/./ADTTests/ArrayRefTest.SizeTSizedOperations (178 of 33926)
******************** TEST 'LLVM-Unit :: ADT/./ADTTests/ArrayRefTest.SizeTSizedOperations' FAILED ********************
Note: Google Test filter = ArrayRefTest.SizeTSizedOperations
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ArrayRefTest
[ RUN      ] ArrayRefTest.SizeTSizedOperations
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:180:32: runtime error: applying non-zero offset 9223372036854775806 to null pointer
#0 0x5ae8dc in llvm::ArrayRef<char>::slice(unsigned long, unsigned long) const /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:180:32
#1 0x5ae44c in (anonymous namespace)::ArrayRefTest_SizeTSizedOperations_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/unittests/ADT/ArrayRefTest.cpp:85:3
#2 0x928a96 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2474:5
#3 0x929793 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
#4 0x92a152 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
#5 0x9319d2 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
#6 0x931416 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4257:10
#7 0x920ac3 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
#8 0x920ac3 in main /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50:10
#9 0x7f66135b72e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
#10 0x472c19 in _start (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/unittests/ADT/ADTTests+0x472c19)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:180:32 in
llvm-svn: 374327

…the branch where it's used
The existing code has undefined behavior: you are not allowed to produce a non-null pointer from a null pointer (F->FileSortedDecls here). That being said, I'm not really confident this is enough of a fix, but we'll see.
FAIL: Clang :: Modules/no-module-map.cpp (6879 of 16079)
******************** TEST 'Clang :: Modules/no-module-map.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-name=ab -x c++-header /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map/a.h /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map/b.h -emit-header-module -o /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm
: 'RUN: at line 2'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify
: 'RUN: at line 3'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify -DA
: 'RUN: at line 4'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify -DB
: 'RUN: at line 5'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -fmodules-ts -fmodule-file=/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp -I/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/Inputs/no-module-map -verify -DA -DB
: 'RUN: at line 7'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -E /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm -o - | /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp
: 'RUN: at line 8'; /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang -cc1 -internal-isystem /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/lib/clang/10.0.0/include -nostdsysteminc -frewrite-imports -E /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/tools/clang/test/Modules/Output/no-module-map.cpp.tmp.pcm -o - | /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/test/Modules/no-module-map.cpp
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:1526:50: runtime error: applying non-zero offset 8 to null pointer
#0 0x3a9bd0c in clang::ASTReader::ReadSLocEntry(int) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:1526:50
#1 0x328b6f8 in clang::SourceManager::loadSLocEntry(unsigned int, bool*) const /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Basic/SourceManager.cpp:461:28
#2 0x328b351 in clang::SourceManager::initializeForReplay(clang::SourceManager const&) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Basic/SourceManager.cpp:399:11
#3 0x3996c71 in clang::FrontendAction::BeginSourceFile(clang::CompilerInstance&, clang::FrontendInputFile const&) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Frontend/FrontendAction.cpp:581:27
#4 0x394f341 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:956:13
#5 0x3a8a92b in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:290:25
#6 0xaf8d62 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/tools/driver/cc1_main.cpp:250:15
#7 0xaf1602 in ExecuteCC1Tool /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/tools/driver/driver.cpp:309:12
#8 0xaf1602 in main /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/tools/driver/driver.cpp:382:12
#9 0x7f2c1eecc2e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
#10 0xad57f9 in _start (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/clang-10+0xad57f9)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Serialization/ASTReader.cpp:1526:50 in
llvm-svn: 374328

Currently, clang emits subprograms for declared functions when the target debugger or DWARF standard is known to support entry values (DW_OP_entry_value & the GNU equivalent).
Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU tail calls.
Pre-patch debug session with a cross-TU tail call:
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
Post-patch (note that the tail-calling frame, "helper", is visible):
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
  frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
rdar://46577651
Differential Revision: https://reviews.llvm.org/D69743

…_call is understood"
This caused Chromium builds to fail with "inlinable function call in a function with debug info must have a !dbg location" errors. See https://bugs.chromium.org/p/chromium/issues/detail?id=1022296#c1 for a reproducer.
> Currently, clang emits subprograms for declared functions when the
> target debugger or DWARF standard is known to support entry values
> (DW_OP_entry_value & the GNU equivalent).
>
> Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU
> tail calls.
>
> Pre-patch debug session with a cross-TU tail call:
>
> ```
> * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
>   frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> Post-patch (note that the tail-calling frame, "helper", is visible):
>
> ```
> * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
>   frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
>   frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> rdar://46577651
>
> Differential Revision: https://reviews.llvm.org/D69743

Currently, clang emits subprograms for declared functions when the target debugger or DWARF standard is known to support entry values (DW_OP_entry_value & the GNU equivalent).
Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU tail calls.
Pre-patch debug session with a cross-TU tail call:
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
Post-patch (note that the tail-calling frame, "helper", is visible):
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
  frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
rdar://46577651
Differential Revision: https://reviews.llvm.org/D69743

…_call is understood"
This caused Chromium builds to fail with "inlinable function call in a function with debug info must have a !dbg location" errors. See https://bugs.chromium.org/p/chromium/issues/detail?id=1022296#c1 for a reproducer.
> Currently, clang emits subprograms for declared functions when the
> target debugger or DWARF standard is known to support entry values
> (DW_OP_entry_value & the GNU equivalent).
>
> Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU
> tail calls.
>
> Pre-patch debug session with a cross-TU tail call:
>
> ```
> * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
>   frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> Post-patch (note that the tail-calling frame, "helper", is visible):
>
> ```
> * frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
>   frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
>   frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
> ```
>
> rdar://46577651
>
> Differential Revision: https://reviews.llvm.org/D69743

During register coalescing, we update the live-intervals on-the-fly. To do that we are in this strange mode where the live-intervals can be slightly out-of-sync (more precisely they are forward looking) compared to what the IR actually represents. This happens because the register coalescer only updates the IR when it is done with updating the live-intervals, and it has to do it this way because updating the IR on-the-fly would actually clobber some information on what the live-ranges that are being updated look like.
This is problematic for updates that rely on the IR to accurately represent the state of the live-ranges. Right now, we have only one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a new argument to LiveInterval::refineSubRanges that allows the code doing the live range updates to reason about what the code should look like after the coalescer has rewritten the registers. Essentially this captures how a subregister index will be offset to match its position in a new register class.
E.g., let's say we want to merge:
V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32> overlap, i.e., by choosing a class where we can find "offset + 1 == 3". Put differently we align V2's sub3 with V1's sub1:
V2: sub0 sub1 sub2 sub3
V1: <offset>  sub0 sub1
This offset will look like a composed subregidx in the class:
V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
=> V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
Now if we didn't rewrite the uses and defs of V1, all the checks for V1 need to account for this offset to match what the live intervals intend to capture.
Prior to this patch, we would fail to recognize the uses and defs of V1 and would end up with machine verifier errors: No live segment at def. This could lead to miscompiles as we would drop some live-ranges and thus miss some interferences.
For this problem to trigger, we need to reach stripValuesNotDefiningMask while having a mismatch between the IR and the live-ranges (i.e., we have to apply a subreg offset to the IR).
This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.
looking at the IR to decide what is alive and what is not, i.e., calling stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.
None of the targets that currently use subreg liveness (i.e., the targets that fulfill #3: Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and #2, so this patch also artificially enables subreg liveness for ARM, so that a nice test case can be attached.
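The "offset + 1 == 3" arithmetic from the merge example can be written out as a tiny sketch. It treats subregister indices as plain integers, which is a simplifying assumption: real subregister indices are target-defined and composed rather than added, and `subreg_offset` and `rewritten_index` are hypothetical helpers, not LLVM functions.

```python
def subreg_offset(dst_index: int, src_index: int) -> int:
    """Offset that aligns the destination lane with the source lane.

    Solves offset + dst_index == src_index, as in "offset + 1 == 3".
    """
    return src_index - dst_index


def rewritten_index(old_index: int, offset: int) -> int:
    """Lane index a narrow register's subregister has in the wider class."""
    return old_index + offset


# V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
offset = subreg_offset(dst_index=1, src_index=3)

# Where each lane of V1 lands once V1 is rewritten into the 4-lane class.
layout = {f"V1.sub{i}": f"sub{rewritten_index(i, offset)}" for i in range(2)}
print(offset, layout)
```

Under this simplified model, V1's sub0 and sub1 line up with sub2 and sub3 of the wider register, matching the alignment diagram in the message.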

…ood (reland with fixes)
Currently, clang emits subprograms for declared functions when the target debugger or DWARF standard is known to support entry values (DW_OP_entry_value & the GNU equivalent).
Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU tail calls.
Pre-patch debug session with a cross-TU tail call:
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
Post-patch (note that the tail-calling frame, "helper", is visible):
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
  frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
This was reverted in 5b9a072 because it attached declaration subprograms to inlinable builtin calls, which interacted badly with the MergeICmps pass. The fix is to not attach declarations to builtins.
rdar://46577651
Differential Revision: https://reviews.llvm.org/D69743


…iant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151"
Summary: Revert "[DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151"
This reverts commit 5f026b6.
We're (tensorflow.org/xla team) seeing some miscompiles with the new change, only at -O3, with fast math disabled. I'm still trying to come up with a useful/small/external example, but for now, the following IR:
```
; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\DB\0F\C9@" @1 = private unnamed_addr constant [4 x i8] c"\00\00\00?" ; Function Attrs: uwtable define void @jit_wrapped_fun.31(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %buffer_table, i64* noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.2 = alloca i64 %fusion.invar_address.dim.1 = alloca i64 %fusion.invar_address.dim.0 = alloca i64 %fusion.1.invar_address.dim.2 = alloca i64 %fusion.1.invar_address.dim.1 = alloca i64 %fusion.1.invar_address.dim.0 = alloca i64 %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = load i8*, i8** %0, !invariant.load !0, !dereferenceable !1, !align !2 %parameter.3 = bitcast i8* %1 to [2 x [1 x [4 x float]]]* %2 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %3 = load i8*, i8** %2, !invariant.load !0, !dereferenceable !1, !align !2 %fusion.1 = bitcast i8* %3 to [2 x [1 x [4 x float]]]* store i64 0, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_header.dim.0: ; preds = %fusion.1.loop_exit.dim.1, %entry %fusion.1.indvar.dim.0 = load i64, i64* %fusion.1.invar_address.dim.0 %4 = icmp uge i64 %fusion.1.indvar.dim.0, 2 br i1 %4, label %fusion.1.loop_exit.dim.0, 
label %fusion.1.loop_body.dim.0 fusion.1.loop_body.dim.0: ; preds = %fusion.1.loop_header.dim.0 store i64 0, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_header.dim.1: ; preds = %fusion.1.loop_exit.dim.2, %fusion.1.loop_body.dim.0 %fusion.1.indvar.dim.1 = load i64, i64* %fusion.1.invar_address.dim.1 %5 = icmp uge i64 %fusion.1.indvar.dim.1, 1 br i1 %5, label %fusion.1.loop_exit.dim.1, label %fusion.1.loop_body.dim.1 fusion.1.loop_body.dim.1: ; preds = %fusion.1.loop_header.dim.1 store i64 0, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_header.dim.2: ; preds = %fusion.1.loop_body.dim.2, %fusion.1.loop_body.dim.1 %fusion.1.indvar.dim.2 = load i64, i64* %fusion.1.invar_address.dim.2 %6 = icmp uge i64 %fusion.1.indvar.dim.2, 4 br i1 %6, label %fusion.1.loop_exit.dim.2, label %fusion.1.loop_body.dim.2 fusion.1.loop_body.dim.2: ; preds = %fusion.1.loop_header.dim.2 %7 = load float, float* bitcast ([4 x i8]* @0 to float*) %8 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %9 = load float, float* %8, !invariant.load !0, !noalias !3 %10 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %11 = load float, float* %10, !invariant.load !0, !noalias !3 %12 = fmul float %9, %11 %13 = fmul float %7, %12 %14 = call float @llvm.log.f32(float %13) %15 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 store float %14, float* %15, !alias.scope !7, !noalias !8 %invar.inc2 = add nuw nsw i64 %fusion.1.indvar.dim.2, 1 store i64 %invar.inc2, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_exit.dim.2: ; preds = %fusion.1.loop_header.dim.2 %invar.inc1 = add 
nuw nsw i64 %fusion.1.indvar.dim.1, 1 store i64 %invar.inc1, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_exit.dim.1: ; preds = %fusion.1.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.1.indvar.dim.0, 1 store i64 %invar.inc, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_exit.dim.0: ; preds = %fusion.1.loop_header.dim.0 %16 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %17 = load i8*, i8** %16, !invariant.load !0, !dereferenceable !9, !align !2 %parameter.1 = bitcast i8* %17 to float* %18 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %19 = load i8*, i8** %18, !invariant.load !0, !dereferenceable !10, !align !2 %parameter.2 = bitcast i8* %19 to [3 x [1 x float]]* %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 0 %21 = load i8*, i8** %20, !invariant.load !0, !dereferenceable !11, !align !2 %fusion = bitcast i8* %21 to [2 x [3 x [4 x float]]]* store i64 0, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %fusion.1.loop_exit.dim.0 %fusion.indvar.dim.0 = load i64, i64* %fusion.invar_address.dim.0 %22 = icmp uge i64 %fusion.indvar.dim.0, 2 br i1 %22, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_exit.dim.2, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, i64* %fusion.invar_address.dim.1 %23 = icmp uge i64 %fusion.indvar.dim.1, 3 br i1 %23, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 store i64 0, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_header.dim.2: ; preds = %fusion.loop_body.dim.2, %fusion.loop_body.dim.1 %fusion.indvar.dim.2 = load i64, i64* 
%fusion.invar_address.dim.2 %24 = icmp uge i64 %fusion.indvar.dim.2, 4 br i1 %24, label %fusion.loop_exit.dim.2, label %fusion.loop_body.dim.2 fusion.loop_body.dim.2: ; preds = %fusion.loop_header.dim.2 %25 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %26 = add nuw nsw i64 0, %25 %27 = udiv i64 %26, 4 %28 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %29 = add nuw nsw i64 0, %28 %30 = udiv i64 %29, 2 %31 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %29, i64 0, i64 %26 %32 = load float, float* %31, !alias.scope !7, !noalias !8 %33 = mul nuw nsw i64 %fusion.indvar.dim.1, 1 %34 = add nuw nsw i64 0, %33 %35 = udiv i64 %34, 3 %36 = load float, float* %parameter.1, !invariant.load !0, !noalias !3 %37 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %parameter.2, i64 0, i64 %34, i64 0 %38 = load float, float* %37, !invariant.load !0, !noalias !3 %39 = fsub float %36, %38 %40 = fmul float %39, %39 %41 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %42 = add nuw nsw i64 0, %41 %43 = udiv i64 %42, 4 %44 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %45 = add nuw nsw i64 0, %44 %46 = udiv i64 %45, 2 %47 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %48 = load float, float* %47, !invariant.load !0, !noalias !3 %49 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %50 = load float, float* %49, !invariant.load !0, !noalias !3 %51 = fmul float %48, %50 %52 = fdiv float %40, %51 %53 = fadd float %32, %52 %54 = fneg float %53 %55 = load float, float* bitcast ([4 x i8]* @1 to float*) %56 = fmul float %54, %55 %57 = getelementptr inbounds [2 x [3 x [4 x float]]], [2 x [3 x [4 x float]]]* %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 %fusion.indvar.dim.1, i64 %fusion.indvar.dim.2 store float %56, float* %57, !alias.scope !8, !noalias !12 %invar.inc5 = add nuw nsw i64 %fusion.indvar.dim.2, 1 
store i64 %invar.inc5, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_exit.dim.2: ; preds = %fusion.loop_header.dim.2 %invar.inc4 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc4, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1 %invar.inc3 = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc3, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 %58 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %59 = load i8*, i8** %58, !invariant.load !0, !dereferenceable !2, !align !2 %tuple.30 = bitcast i8* %59 to [1 x i8*]* %60 = bitcast [2 x [3 x [4 x float]]]* %fusion to i8* %61 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.30, i64 0, i64 0 store i8* %60, i8** %61, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare float @llvm.log.f32(float) #1 attributes #0 = { uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` gets (correctly) optimized to the one below without the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias 
nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, 
i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle30 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = fmul <4 x float> %7, %7 %shuffle31 = shufflevector <4 x float> %38, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %39 = fdiv <8 x float> %shuffle30, %shuffle31 %40 = fadd <8 x float> %shuffle, %39 %41 = fmul <8 x float> %40, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %42 = bitcast i8* %26 to <8 x float>* store <8 x float> %41, <8 x float>* %42, align 8, !alias.scope !8, !noalias !12 %43 = getelementptr inbounds i8, i8* %26, i64 32 %44 = fdiv <4 x float> %37, %38 %45 = fadd <4 x float> %10, %44 
%46 = fmul <4 x float> %45, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %47 = bitcast i8* %43 to <4 x float>* store <4 x float> %46, <4 x float>* %47, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %48 = bitcast float* %.phi.trans.insert to <4 x float>* %49 = load <4 x float>, <4 x float>* %48, align 8, !alias.scope !7, !noalias !8 %50 = bitcast float* %.phi.trans.insert12 to <4 x float>* %51 = load <4 x float>, <4 x float>* %50, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %49, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %52 = getelementptr inbounds i8, i8* %26, i64 48 %53 = fmul <4 x float> %51, %51 %shuffle31.1 = shufflevector <4 x float> %53, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %54 = fdiv <8 x float> %shuffle30, %shuffle31.1 %55 = fadd <8 x float> %shuffle.1, %54 %56 = fmul <8 x float> %55, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %57 = bitcast i8* %52 to <8 x float>* store <8 x float> %56, <8 x float>* %57, align 8, !alias.scope !8, !noalias !12 %58 = getelementptr inbounds i8, i8* %26, i64 80 %59 = fdiv <4 x float> %37, %53 %60 = fadd <4 x float> %49, %59 %61 = fmul <4 x float> %60, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %62 = bitcast i8* %58 to <4 x float>* store <4 x float> %61, <4 x float>* %62, align 8, !alias.scope !8, !noalias !12 %63 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %64 = bitcast i8** %63 to [1 x i8*]** %65 = load [1 x i8*]*, [1 x i8*]** %64, align 8, 
!invariant.load !0, !dereferenceable !2, !align !2 %66 = getelementptr inbounds [1 x i8*], [1 x i8*]* %65, i64 0, i64 0 store i8* %26, i8** %66, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` and (incorrectly) optimized to the one below with the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, 
!noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> 
undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle32 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 0, i64 0, i64 3 %39 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 0, i64 0, i64 3 %40 = fmul <4 x float> %7, %7 %41 = shufflevector <4 x float> %40, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> %42 = fdiv <8 x float> %shuffle32, %41 %43 = fadd <8 x float> %shuffle, %42 %44 = fmul <8 x float> %43, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %45 = bitcast i8* %26 to <8 x float>* store <8 x float> %44, <8 x float>* %45, align 8, !alias.scope !8, !noalias !12 %46 = extractelement <4 x float> %10, i32 0 %47 = getelementptr inbounds i8, i8* %26, i64 32 %48 = extractelement <4 x float> %10, i32 1 %49 = extractelement <4 x float> %10, i32 2 %50 = load float, float* %38, align 4, !alias.scope !7, !noalias !8 %51 = load float, float* %39, align 4, !invariant.load !0, !noalias !3 %52 = fmul float %51, %51 %53 = insertelement <4 x float> undef, float %52, i32 3 %54 = fdiv <4 x float> %37, %53 %55 = insertelement <4 x float> undef, float %46, i32 0 %56 = insertelement <4 x float> %55, float %48, i32 1 %57 = insertelement <4 x float> %56, float %49, i32 2 
%58 = insertelement <4 x float> %57, float %50, i32 3 %59 = fadd <4 x float> %58, %54 %60 = fmul <4 x float> %59, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %61 = bitcast i8* %47 to <4 x float>* store <4 x float> %60, <4 x float>* %61, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %62 = bitcast float* %.phi.trans.insert to <4 x float>* %63 = load <4 x float>, <4 x float>* %62, align 8, !alias.scope !7, !noalias !8 %64 = bitcast float* %.phi.trans.insert12 to <4 x float>* %65 = load <4 x float>, <4 x float>* %64, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %63, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %66 = getelementptr inbounds i8, i8* %26, i64 48 %67 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 3 %68 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 3 %69 = fmul <4 x float> %65, %65 %70 = shufflevector <4 x float> %69, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %71 = fdiv <8 x float> %shuffle32, %70 %72 = fadd <8 x float> %shuffle.1, %71 %73 = fmul <8 x float> %72, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %74 = bitcast i8* %66 to <8 x float>* store <8 x float> %73, <8 x float>* %74, align 8, !alias.scope !8, !noalias !12 %75 = extractelement <4 x float> %69, i32 0 %76 = extractelement <4 x float> %63, i32 0 %77 = getelementptr inbounds i8, i8* %26, i64 80 %78 = extractelement <4 x float> %69, i32 1 %79 = extractelement <4 x 
float> %63, i32 1 %80 = extractelement <4 x float> %69, i32 2 %81 = extractelement <4 x float> %63, i32 2 %82 = load float, float* %67, align 4, !alias.scope !7, !noalias !8 %83 = load float, float* %68, align 4, !invariant.load !0, !noalias !3 %84 = fmul float %83, %83 %85 = insertelement <4 x float> undef, float %75, i32 0 %86 = insertelement <4 x float> %85, float %78, i32 1 %87 = insertelement <4 x float> %86, float %80, i32 2 %88 = insertelement <4 x float> %87, float %84, i32 3 %89 = fdiv <4 x float> %37, %88 %90 = insertelement <4 x float> undef, float %76, i32 0 %91 = insertelement <4 x float> %90, float %79, i32 1 %92 = insertelement <4 x float> %91, float %81, i32 2 %93 = insertelement <4 x float> %92, float %82, i32 3 %94 = fadd <4 x float> %93, %89 %95 = fmul <4 x float> %94, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %96 = bitcast i8* %77 to <4 x float>* store <4 x float> %95, <4 x float>* %96, align 8, !alias.scope !8, !noalias !12 %97 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %98 = bitcast i8** %97 to [1 x i8*]** %99 = load [1 x i8*]*, [1 x i8*]** %98, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %100 = getelementptr inbounds [1 x i8*], [1 x i8*]* %99, i64 0, i64 0 store i8* %26, i8** %100, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` This results in bad numerical answers 
when used through XLA. Again, it's not that easy to give a small, fully reproducible example, but the miscompare is:
```
Expected literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -28.2577019, -inf }, { nan, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } )
Actual literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -inf, -inf }, { inf, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } )
```
Reviewers: sanjoy.google, sanjoy, ebrevnov, jdoerfert, reames, chandlerc
Subscribers: hiraditya, Charusso, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70516
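To see which elements actually differ, the two literals above can be compared element by element. The following is a minimal sketch and not part of the original report: the values are copied from the literals, the helper names `same` and `mismatches` are hypothetical, and NaN is deliberately treated as equal to NaN so that only genuine differences are listed.

```python
import math

nan, inf = math.nan, math.inf

# Values copied from the expected/actual f32[2,3,4] literals in the report.
expected = [
    [[nan, -inf, -3181.35, -inf],
     [nan, -inf, -28.2577019, -inf],
     [nan, -inf, -28.2577019, -inf]],
    [[-inf, -inf, -inf, -inf],
     [-6.60753046e+28, -1.47314833e+23, -inf, -inf],
     [-2.43504347e+30, -5.42892693e+24, -inf, -inf]],
]
actual = [
    [[nan, -inf, -3181.35, -inf],
     [nan, -inf, -inf, -inf],
     [inf, -inf, -28.2577019, -inf]],
    [[-inf, -inf, -inf, -inf],
     [-6.60753046e+28, -1.47314833e+23, -inf, -inf],
     [-2.43504347e+30, -5.42892693e+24, -inf, -inf]],
]


def same(a, b):
    """Equality in which NaN matches NaN (plain == would flag every NaN)."""
    return (math.isnan(a) and math.isnan(b)) or a == b


def mismatches(exp, act):
    """Return ((i, j, k), expected, actual) for every differing element."""
    out = []
    for i, (plane_e, plane_a) in enumerate(zip(exp, act)):
        for j, (row_e, row_a) in enumerate(zip(plane_e, plane_a)):
            for k, (e, a) in enumerate(zip(row_e, row_a)):
                if not same(e, a):
                    out.append(((i, j, k), e, a))
    return out


diffs = mismatches(expected, actual)
for index, e, a in diffs:
    print(index, "expected", e, "actual", a)
```

Under these assumptions only two elements differ, at indices (0, 1, 2) and (0, 2, 0); the second slice along dimension 0 is identical in both literals.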

This patch adds the following intrinsics for gather loads with 64-bit offsets:
* @llvm.aarch64.sve.ld1.gather (unscaled offset)
* @llvm.aarch64.sve.ld1.gather.index (scaled offset)
These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
* ld1h { z0.d }, p0/z, [x0, z0.d]
* ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]
Committing on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70542
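The difference between the unscaled and scaled forms is only in how each lane's address is formed. Below is a simplified scalar model of the two `ld1h` forms listed above, written as a sketch: it assumes a flat little-endian byte array for memory, and `gather_ld1h` is a hypothetical helper for illustration, not an LLVM or ACLE API.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def gather_ld1h(memory, base, lanes, predicate, scaled):
    """Scalar model of an SVE ld1h gather with 64-bit vector offsets.

    memory:    bytes-like object standing in for the address space
    base:      scalar base address (x0 in the examples)
    lanes:     per-lane byte offsets (unscaled) or element indices (scaled)
    predicate: per-lane booleans (p0 in the examples)
    scaled:    True models the 'lsl #1' form, i.e. index * 2 for half-words
    """
    result = []
    for value, active in zip(lanes, predicate):
        if not active:
            result.append(0)  # '/z': inactive lanes are set to zero
            continue
        byte_offset = (value << 1) if scaled else value
        address = (base + byte_offset) & MASK64
        (half,) = struct.unpack_from("<H", memory, address)
        result.append(half)  # 16-bit value zero-extended into the lane
    return result


# Eight half-words: 10, 11, ..., 17.
memory = struct.pack("<8H", 10, 11, 12, 13, 14, 15, 16, 17)
predicate = [True, False, True, True]

unscaled = gather_ld1h(memory, 0, [0, 2, 6, 14], predicate, scaled=False)
scaled = gather_ld1h(memory, 0, [0, 1, 3, 7], predicate, scaled=True)
print(unscaled, scaled)
```

Both calls read the same half-words: byte offsets 0, 6 and 14 in the unscaled form correspond to indices 0, 3 and 7 in the scaled form, and the second lane is zeroed because its predicate bit is clear.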

This patch adds intrinsics for SVE gather loads for which the offsets are 32 bits wide and are:
* unscaled
  * @llvm.aarch64.sve.ld1.gather.sxtw
  * @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
  * @llvm.aarch64.sve.ld1.gather.sxtw.index
  * @llvm.aarch64.sve.ld1.gather.uxtw.index
The offsets are either zero-extended (uxtw) or sign-extended (sxtw) to 64 bits.
These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]
Committed on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70782
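The choice between `sxtw` and `uxtw` matters as soon as a 32-bit offset has its top bit set. The sketch below models the per-lane address computation only; the helpers `sxtw`, `uxtw` and `lane_address` are hypothetical names chosen to mirror the assembly modifiers, and the base address is an arbitrary illustrative value.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sxtw(x):
    """Sign-extend the low 32 bits of x."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def uxtw(x):
    """Zero-extend the low 32 bits of x."""
    return x & MASK32


def lane_address(base, offset32, extend, shift=0):
    """Address for one lane: base + (extend(offset) << shift), modulo 2**64.

    shift=0 models the unscaled forms; shift=1 models 'sxtw #1' / 'uxtw #1'
    for half-words, where the offset is an element index.
    """
    return (base + (extend(offset32) << shift)) & MASK64


base = 0x1000
offset = 0xFFFFFFFE  # the 32-bit pattern for -2

addr_sxtw = lane_address(base, offset, sxtw)
addr_uxtw = lane_address(base, offset, uxtw)
addr_sxtw_scaled = lane_address(base, offset, sxtw, shift=1)
addr_uxtw_scaled = lane_address(base, offset, uxtw, shift=1)

for name, addr in [("sxtw", addr_sxtw), ("uxtw", addr_uxtw),
                   ("sxtw #1", addr_sxtw_scaled), ("uxtw #1", addr_uxtw_scaled)]:
    print(f"{name:8s} {addr:#x}")
```

With sign extension the lane addresses memory just below the base, while zero extension treats the same bit pattern as a large positive offset.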

…ood (reland with fixes)
Currently, clang emits subprograms for declared functions when the target debugger or DWARF standard is known to support entry values (DW_OP_entry_value & the GNU equivalent). Treat DW_AT_tail_call the same way to allow debuggers to follow cross-TU tail calls.
Pre-patch debug session with a cross-TU tail call:
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
Post-patch (note that the tail-calling frame, "helper", is visible):
```
* frame #0: 0x0000000100000fa4 main`target at b.c:4:3 [opt]
  frame #1: 0x0000000100000f80 main`helper [opt] [artificial]
  frame #2: 0x0000000100000f99 main`main at a.c:8:10 [opt]
```
This was reverted in 5b9a072 because it attached declaration subprograms to inlinable builtin calls, which interacted badly with the MergeICmps pass. The fix is to not attach declarations to builtins.
rdar://46577651
Differential Revision: https://reviews.llvm.org/D69743


…iant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" Summary: Revert "[DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" This reverts commit 5f026b6. We're (tensorflow.org/xla team) seeing some miscompiles with the new change, only at -O3, with fast math disabled. I'm still trying to come up with a useful/small/external example, but for now, the following IR: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\DB\0F\C9@" @1 = private unnamed_addr constant [4 x i8] c"\00\00\00?" ; Function Attrs: uwtable define void @jit_wrapped_fun.31(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %buffer_table, i64* noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.2 = alloca i64 %fusion.invar_address.dim.1 = alloca i64 %fusion.invar_address.dim.0 = alloca i64 %fusion.1.invar_address.dim.2 = alloca i64 %fusion.1.invar_address.dim.1 = alloca i64 %fusion.1.invar_address.dim.0 = alloca i64 %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = load i8*, i8** %0, !invariant.load !0, !dereferenceable !1, !align !2 %parameter.3 = bitcast i8* %1 to [2 x [1 x [4 x float]]]* %2 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %3 = load i8*, i8** %2, !invariant.load !0, !dereferenceable !1, !align !2 %fusion.1 = bitcast i8* %3 to [2 x [1 x [4 x float]]]* store i64 0, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_header.dim.0: ; preds = %fusion.1.loop_exit.dim.1, %entry %fusion.1.indvar.dim.0 = load i64, i64* %fusion.1.invar_address.dim.0 %4 = icmp uge i64 %fusion.1.indvar.dim.0, 2 br i1 %4, label %fusion.1.loop_exit.dim.0, 
label %fusion.1.loop_body.dim.0 fusion.1.loop_body.dim.0: ; preds = %fusion.1.loop_header.dim.0 store i64 0, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_header.dim.1: ; preds = %fusion.1.loop_exit.dim.2, %fusion.1.loop_body.dim.0 %fusion.1.indvar.dim.1 = load i64, i64* %fusion.1.invar_address.dim.1 %5 = icmp uge i64 %fusion.1.indvar.dim.1, 1 br i1 %5, label %fusion.1.loop_exit.dim.1, label %fusion.1.loop_body.dim.1 fusion.1.loop_body.dim.1: ; preds = %fusion.1.loop_header.dim.1 store i64 0, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_header.dim.2: ; preds = %fusion.1.loop_body.dim.2, %fusion.1.loop_body.dim.1 %fusion.1.indvar.dim.2 = load i64, i64* %fusion.1.invar_address.dim.2 %6 = icmp uge i64 %fusion.1.indvar.dim.2, 4 br i1 %6, label %fusion.1.loop_exit.dim.2, label %fusion.1.loop_body.dim.2 fusion.1.loop_body.dim.2: ; preds = %fusion.1.loop_header.dim.2 %7 = load float, float* bitcast ([4 x i8]* @0 to float*) %8 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %9 = load float, float* %8, !invariant.load !0, !noalias !3 %10 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %11 = load float, float* %10, !invariant.load !0, !noalias !3 %12 = fmul float %9, %11 %13 = fmul float %7, %12 %14 = call float @llvm.log.f32(float %13) %15 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 store float %14, float* %15, !alias.scope !7, !noalias !8 %invar.inc2 = add nuw nsw i64 %fusion.1.indvar.dim.2, 1 store i64 %invar.inc2, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_exit.dim.2: ; preds = %fusion.1.loop_header.dim.2 %invar.inc1 = add 
nuw nsw i64 %fusion.1.indvar.dim.1, 1 store i64 %invar.inc1, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_exit.dim.1: ; preds = %fusion.1.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.1.indvar.dim.0, 1 store i64 %invar.inc, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_exit.dim.0: ; preds = %fusion.1.loop_header.dim.0 %16 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %17 = load i8*, i8** %16, !invariant.load !0, !dereferenceable !9, !align !2 %parameter.1 = bitcast i8* %17 to float* %18 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %19 = load i8*, i8** %18, !invariant.load !0, !dereferenceable !10, !align !2 %parameter.2 = bitcast i8* %19 to [3 x [1 x float]]* %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 0 %21 = load i8*, i8** %20, !invariant.load !0, !dereferenceable !11, !align !2 %fusion = bitcast i8* %21 to [2 x [3 x [4 x float]]]* store i64 0, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %fusion.1.loop_exit.dim.0 %fusion.indvar.dim.0 = load i64, i64* %fusion.invar_address.dim.0 %22 = icmp uge i64 %fusion.indvar.dim.0, 2 br i1 %22, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_exit.dim.2, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, i64* %fusion.invar_address.dim.1 %23 = icmp uge i64 %fusion.indvar.dim.1, 3 br i1 %23, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 store i64 0, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_header.dim.2: ; preds = %fusion.loop_body.dim.2, %fusion.loop_body.dim.1 %fusion.indvar.dim.2 = load i64, i64* 
%fusion.invar_address.dim.2 %24 = icmp uge i64 %fusion.indvar.dim.2, 4 br i1 %24, label %fusion.loop_exit.dim.2, label %fusion.loop_body.dim.2 fusion.loop_body.dim.2: ; preds = %fusion.loop_header.dim.2 %25 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %26 = add nuw nsw i64 0, %25 %27 = udiv i64 %26, 4 %28 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %29 = add nuw nsw i64 0, %28 %30 = udiv i64 %29, 2 %31 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %29, i64 0, i64 %26 %32 = load float, float* %31, !alias.scope !7, !noalias !8 %33 = mul nuw nsw i64 %fusion.indvar.dim.1, 1 %34 = add nuw nsw i64 0, %33 %35 = udiv i64 %34, 3 %36 = load float, float* %parameter.1, !invariant.load !0, !noalias !3 %37 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %parameter.2, i64 0, i64 %34, i64 0 %38 = load float, float* %37, !invariant.load !0, !noalias !3 %39 = fsub float %36, %38 %40 = fmul float %39, %39 %41 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %42 = add nuw nsw i64 0, %41 %43 = udiv i64 %42, 4 %44 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %45 = add nuw nsw i64 0, %44 %46 = udiv i64 %45, 2 %47 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %48 = load float, float* %47, !invariant.load !0, !noalias !3 %49 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %50 = load float, float* %49, !invariant.load !0, !noalias !3 %51 = fmul float %48, %50 %52 = fdiv float %40, %51 %53 = fadd float %32, %52 %54 = fneg float %53 %55 = load float, float* bitcast ([4 x i8]* @1 to float*) %56 = fmul float %54, %55 %57 = getelementptr inbounds [2 x [3 x [4 x float]]], [2 x [3 x [4 x float]]]* %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 %fusion.indvar.dim.1, i64 %fusion.indvar.dim.2 store float %56, float* %57, !alias.scope !8, !noalias !12 %invar.inc5 = add nuw nsw i64 %fusion.indvar.dim.2, 1 
store i64 %invar.inc5, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_exit.dim.2: ; preds = %fusion.loop_header.dim.2 %invar.inc4 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc4, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1 %invar.inc3 = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc3, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 %58 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %59 = load i8*, i8** %58, !invariant.load !0, !dereferenceable !2, !align !2 %tuple.30 = bitcast i8* %59 to [1 x i8*]* %60 = bitcast [2 x [3 x [4 x float]]]* %fusion to i8* %61 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.30, i64 0, i64 0 store i8* %60, i8** %61, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare float @llvm.log.f32(float) #1 attributes #0 = { uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` gets (correctly) optimized to the one below without the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias 
nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, 
i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle30 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = fmul <4 x float> %7, %7 %shuffle31 = shufflevector <4 x float> %38, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %39 = fdiv <8 x float> %shuffle30, %shuffle31 %40 = fadd <8 x float> %shuffle, %39 %41 = fmul <8 x float> %40, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %42 = bitcast i8* %26 to <8 x float>* store <8 x float> %41, <8 x float>* %42, align 8, !alias.scope !8, !noalias !12 %43 = getelementptr inbounds i8, i8* %26, i64 32 %44 = fdiv <4 x float> %37, %38 %45 = fadd <4 x float> %10, %44 
%46 = fmul <4 x float> %45, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %47 = bitcast i8* %43 to <4 x float>* store <4 x float> %46, <4 x float>* %47, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %48 = bitcast float* %.phi.trans.insert to <4 x float>* %49 = load <4 x float>, <4 x float>* %48, align 8, !alias.scope !7, !noalias !8 %50 = bitcast float* %.phi.trans.insert12 to <4 x float>* %51 = load <4 x float>, <4 x float>* %50, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %49, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %52 = getelementptr inbounds i8, i8* %26, i64 48 %53 = fmul <4 x float> %51, %51 %shuffle31.1 = shufflevector <4 x float> %53, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %54 = fdiv <8 x float> %shuffle30, %shuffle31.1 %55 = fadd <8 x float> %shuffle.1, %54 %56 = fmul <8 x float> %55, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %57 = bitcast i8* %52 to <8 x float>* store <8 x float> %56, <8 x float>* %57, align 8, !alias.scope !8, !noalias !12 %58 = getelementptr inbounds i8, i8* %26, i64 80 %59 = fdiv <4 x float> %37, %53 %60 = fadd <4 x float> %49, %59 %61 = fmul <4 x float> %60, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %62 = bitcast i8* %58 to <4 x float>* store <4 x float> %61, <4 x float>* %62, align 8, !alias.scope !8, !noalias !12 %63 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %64 = bitcast i8** %63 to [1 x i8*]** %65 = load [1 x i8*]*, [1 x i8*]** %64, align 8, 
!invariant.load !0, !dereferenceable !2, !align !2 %66 = getelementptr inbounds [1 x i8*], [1 x i8*]* %65, i64 0, i64 0 store i8* %26, i8** %66, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) CTSRD-CHERI#1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes CTSRD-CHERI#1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` and (incorrectly) optimized to the one below with the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, 
!noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> 
undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle32 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 0, i64 0, i64 3 %39 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 0, i64 0, i64 3 %40 = fmul <4 x float> %7, %7 %41 = shufflevector <4 x float> %40, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> %42 = fdiv <8 x float> %shuffle32, %41 %43 = fadd <8 x float> %shuffle, %42 %44 = fmul <8 x float> %43, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %45 = bitcast i8* %26 to <8 x float>* store <8 x float> %44, <8 x float>* %45, align 8, !alias.scope !8, !noalias !12 %46 = extractelement <4 x float> %10, i32 0 %47 = getelementptr inbounds i8, i8* %26, i64 32 %48 = extractelement <4 x float> %10, i32 1 %49 = extractelement <4 x float> %10, i32 2 %50 = load float, float* %38, align 4, !alias.scope !7, !noalias !8 %51 = load float, float* %39, align 4, !invariant.load !0, !noalias !3 %52 = fmul float %51, %51 %53 = insertelement <4 x float> undef, float %52, i32 3 %54 = fdiv <4 x float> %37, %53 %55 = insertelement <4 x float> undef, float %46, i32 0 %56 = insertelement <4 x float> %55, float %48, i32 1 %57 = insertelement <4 x float> %56, float %49, i32 2 
%58 = insertelement <4 x float> %57, float %50, i32 3 %59 = fadd <4 x float> %58, %54 %60 = fmul <4 x float> %59, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %61 = bitcast i8* %47 to <4 x float>* store <4 x float> %60, <4 x float>* %61, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %62 = bitcast float* %.phi.trans.insert to <4 x float>* %63 = load <4 x float>, <4 x float>* %62, align 8, !alias.scope !7, !noalias !8 %64 = bitcast float* %.phi.trans.insert12 to <4 x float>* %65 = load <4 x float>, <4 x float>* %64, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %63, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %66 = getelementptr inbounds i8, i8* %26, i64 48 %67 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 3 %68 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 3 %69 = fmul <4 x float> %65, %65 %70 = shufflevector <4 x float> %69, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %71 = fdiv <8 x float> %shuffle32, %70 %72 = fadd <8 x float> %shuffle.1, %71 %73 = fmul <8 x float> %72, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %74 = bitcast i8* %66 to <8 x float>* store <8 x float> %73, <8 x float>* %74, align 8, !alias.scope !8, !noalias !12 %75 = extractelement <4 x float> %69, i32 0 %76 = extractelement <4 x float> %63, i32 0 %77 = getelementptr inbounds i8, i8* %26, i64 80 %78 = extractelement <4 x float> %69, i32 1 %79 = extractelement <4 x 
float> %63, i32 1 %80 = extractelement <4 x float> %69, i32 2 %81 = extractelement <4 x float> %63, i32 2 %82 = load float, float* %67, align 4, !alias.scope !7, !noalias !8 %83 = load float, float* %68, align 4, !invariant.load !0, !noalias !3 %84 = fmul float %83, %83 %85 = insertelement <4 x float> undef, float %75, i32 0 %86 = insertelement <4 x float> %85, float %78, i32 1 %87 = insertelement <4 x float> %86, float %80, i32 2 %88 = insertelement <4 x float> %87, float %84, i32 3 %89 = fdiv <4 x float> %37, %88 %90 = insertelement <4 x float> undef, float %76, i32 0 %91 = insertelement <4 x float> %90, float %79, i32 1 %92 = insertelement <4 x float> %91, float %81, i32 2 %93 = insertelement <4 x float> %92, float %82, i32 3 %94 = fadd <4 x float> %93, %89 %95 = fmul <4 x float> %94, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %96 = bitcast i8* %77 to <4 x float>* store <4 x float> %95, <4 x float>* %96, align 8, !alias.scope !8, !noalias !12 %97 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %98 = bitcast i8** %97 to [1 x i8*]** %99 = load [1 x i8*]*, [1 x i8*]** %98, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %100 = getelementptr inbounds [1 x i8*], [1 x i8*]* %99, i64 0, i64 0 store i8* %26, i8** %100, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) CTSRD-CHERI#1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes CTSRD-CHERI#1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` This results in bad numerical answers 
when used through XLA. Again, it's not that easy to give a small, fully reproducible example, but the miscompare is:
```
Expected literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -28.2577019, -inf }, { nan, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } )
Actual literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -inf, -inf }, { inf, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } )
```
Reviewers: sanjoy.google, sanjoy, ebrevnov, jdoerfert, reames, chandlerc
Subscribers: hiraditya, Charusso, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70516

This patch adds the following intrinsics for gather loads with 64-bit offsets:
* @llvm.aarch64.sve.ld1.gather (unscaled offset)
* @llvm.aarch64.sve.ld1.gather.index (scaled offset)
These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
* ld1h { z0.d }, p0/z, [x0, z0.d]
* ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]
Committing on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70542

This patch adds intrinsics for SVE gather loads for which the offsets are 32 bits wide and are:
* unscaled
  * @llvm.aarch64.sve.ld1.gather.sxtw
  * @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
  * @llvm.aarch64.sve.ld1.gather.sxtw.index
  * @llvm.aarch64.sve.ld1.gather.uxtw.index
The offsets are either zero-extended (uxtw) or sign-extended (sxtw) to 64 bits. These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
  * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
  * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]
Committed on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70782
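The addressing forms listed above can be modelled for a single active lane with a few lines of scalar C++. This is only a sketch of the arithmetic as described in the commit message: the names `Extend` and `gatherLaneAddress` are hypothetical and are not part of LLVM, and predication and the actual memory access are left out.

```cpp
#include <cstdint>

// How a 32-bit offset lane is widened to 64 bits before it is added to the
// base register (hypothetical enum, named after the instruction modifiers).
enum class Extend { Sxtw, Uxtw };

// Scalar model of the address computed for one lane.
// `shift` is 0 for the unscaled forms and log2(element size) for the scaled
// ".index" forms, i.e. 1 for the half-word examples above.
inline std::uint64_t gatherLaneAddress(std::uint64_t base, std::uint32_t lane,
                                       Extend ext, unsigned shift) {
  std::uint64_t wide =
      ext == Extend::Sxtw
          ? static_cast<std::uint64_t>(
                static_cast<std::int64_t>(static_cast<std::int32_t>(lane)))
          : static_cast<std::uint64_t>(lane);
  // Unsigned arithmetic wraps, which matches 64-bit address arithmetic.
  return base + (wide << shift);
}
```

With a lane holding `0xFFFFFFFF`, the `sxtw` form steps one byte (or one element, when scaled) backwards from the base, while the `uxtw` form steps roughly 4 GiB forwards, which is why the two variants need separate intrinsics.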

…t binding
This fixes a failing testcase on Fedora 30 x86_64 (regression Fedora 29->30):
PASS:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`__GI_raise + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`__GI_abort + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`__libc_start_main + 243
    frame #7: 0x000000000040106e a.out`_start + 46
vs.
FAIL - unrecognized abort() function:
./bin/lldb ./lldb-test-build.noindex/functionalities/unwind/noreturn/TestNoreturnUnwind.test_dwarf/a.out -o 'settings set symbols.enable-external-lookup false' -o r -o bt -o quit
  * frame #0: 0x00007ffff7aa6e75 libc.so.6`.annobin_raise.c + 325
    frame #1: 0x00007ffff7a91895 libc.so.6`.annobin_loadmsgcat.c_end.unlikely + 295
    frame #2: 0x0000000000401140 a.out`func_c at main.c:12:2
    frame #3: 0x000000000040113a a.out`func_b at main.c:18:2
    frame #4: 0x0000000000401134 a.out`func_a at main.c:26:2
    frame #5: 0x000000000040112e a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:32:2
    frame #6: 0x00007ffff7a92f33 libc.so.6`.annobin_libc_start.c + 243
    frame #7: 0x000000000040106e a.out`.annobin_init.c.hot + 46
The extra ELF symbols are there due to Annobin (I did not investigate why this problem has occurred only since F-30 and not since F-28). It is due to:
Symbol table '.dynsym' contains 2361 entries:
    Value             Size Type    Bind   Vis      Name
    0000000000022769     5 FUNC    LOCAL  DEFAULT  _nl_load_domain.cold
    000000000002276e     0 NOTYPE  LOCAL  HIDDEN   .annobin_abort.c.unlikely
    ...
    000000000002276e     0 NOTYPE  LOCAL  HIDDEN   .annobin_loadmsgcat.c_end.unlikely
    ...
    000000000002276e     0 NOTYPE  LOCAL  HIDDEN   .annobin_textdomain.c_end.unlikely
    000000000002276e   548 FUNC    GLOBAL DEFAULT  abort
    000000000002276e   548 FUNC    GLOBAL DEFAULT  abort@@GLIBC_2.2.5
    000000000002276e   548 FUNC    LOCAL  DEFAULT  __GI_abort
    0000000000022992     0 NOTYPE  LOCAL  HIDDEN   .annobin_abort.c_end.unlikely
GDB has some more complicated preferences between symbols that overlap and/or share an address; so far I have made only the simplest fix for this case.
Differential revision: https://reviews.llvm.org/D63540
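To make the ambiguity in the symbol table above concrete, here is a small C++ sketch of one possible way to choose among several symbols that share an address. The `SymbolEntry` record, the `preference` scoring and `pickSymbol` are all hypothetical; this is an illustrative rule (prefer sized function symbols over zero-size `NOTYPE` markers), not the rule this patch implements in lldb and not GDB's logic.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol record; this is not an lldb data structure.
struct SymbolEntry {
  std::string name;
  std::uint64_t value;
  std::uint64_t size;
  bool isFunc;  // true for FUNC, false for NOTYPE
};

// Higher is better: a non-zero size counts more than being a function.
inline int preference(const SymbolEntry &s) {
  return (s.size > 0 ? 2 : 0) + (s.isFunc ? 1 : 0);
}

// Returns the preferred symbol whose value equals `addr`, or nullptr if
// there is none. Ties keep the earliest entry in the table.
inline const SymbolEntry *pickSymbol(const std::vector<SymbolEntry> &symbols,
                                     std::uint64_t addr) {
  const SymbolEntry *best = nullptr;
  for (const SymbolEntry &s : symbols) {
    if (s.value != addr)
      continue;
    if (best == nullptr || preference(s) > preference(*best))
      best = &s;
  }
  return best;
}
```

Fed the rows at `0x2276e` from the table, this picks a sized `FUNC` entry such as `abort` rather than one of the `.annobin_*` markers, which is the kind of result the passing backtrace shows.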

@AndreyChurbanov

For any OpenMP application, TSan spuriously reports a race on the initialization of a runtime-internal mutex:
```
Atomic read of size 1 at 0x7b6800005940 by thread T4:
  #0 pthread_mutex_lock <null> (a.out+0x43f39e)
  #1 __kmp_resume_64 <null> (libomp.so.5+0x84db4)
Previous write of size 1 at 0x7b6800005940 by thread T7:
  #0 pthread_mutex_init <null> (a.out+0x424793)
  #1 __kmp_suspend_initialize_thread <null> (libomp.so.5+0x8422e)
```
According to @AndreyChurbanov this is a false positive report, as the control flow of the runtime guarantees the ordering of the mutex initialization and the lock: https://software.intel.com/en-us/forums/intel-open-source-openmp-runtime-library/topic/530363
To suppress this report, I suggest the use of TSAN_OPTIONS='ignore_uninstrumented_modules=1'. With this patch, a runtime warning is issued when an OpenMP application is built with TSan and executed without this TSan option.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D70412
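A check of the kind described above could look roughly like the following C++ sketch. The helper names are hypothetical and this is not the code added by the patch; it also assumes that options in `TSAN_OPTIONS` are separated by spaces, commas or colons.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns true if the option string contains the exact token
// "ignore_uninstrumented_modules=1". Tokens are assumed to be separated
// by ' ', ',' or ':'.
inline bool hasIgnoreUninstrumented(const std::string &options) {
  const std::string want = "ignore_uninstrumented_modules=1";
  std::size_t pos = 0;
  while (pos <= options.size()) {
    std::size_t end = options.find_first_of(" ,:", pos);
    if (end == std::string::npos)
      end = options.size();
    if (options.compare(pos, end - pos, want) == 0)
      return true;
    pos = end + 1;
  }
  return false;
}

// True when a warning should be printed: TSAN_OPTIONS is unset or does not
// enable the suppression.
inline bool shouldWarnAboutTsanOptions() {
  const char *env = std::getenv("TSAN_OPTIONS");
  return env == nullptr || !hasIgnoreUninstrumented(env);
}
```

Matching whole tokens rather than a substring avoids accepting values such as `ignore_uninstrumented_modules=10`.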

The test is currently failing on some systems with ASAN enabled due to:
```
==22898==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003da4 at pc 0x00010951c33d bp 0x7ffee6709e00 sp 0x7ffee67095c0
READ of size 5 at 0x603000003da4 thread T0
    #0 0x10951c33c in wrap_memmove+0x16c (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x1833c)
    #1 0x7fff4a327f57 in CFDataReplaceBytes+0x1ba (CoreFoundation:x86_64+0x13f57)
    #2 0x7fff4a415a44 in __CFDataInit+0x2db (CoreFoundation:x86_64+0x101a44)
    #3 0x1094f8490 in main main.m:424
    #4 0x7fff77482084 in start+0x0 (libdyld.dylib:x86_64+0x17084)
0x603000003da4 is located 0 bytes to the right of 20-byte region [0x603000003d90,0x603000003da4)
allocated by thread T0 here:
    #0 0x109547c02 in wrap_calloc+0xa2 (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x43c02)
    #1 0x7fff763ad3ef in class_createInstance+0x52 (libobjc.A.dylib:x86_64+0x73ef)
    #2 0x7fff4c6b2d73 in NSAllocateObject+0x12 (Foundation:x86_64+0x1d73)
    #3 0x7fff4c6b5e5f in -[_NSPlaceholderData initWithBytes:length:copy:deallocator:]+0x40 (Foundation:x86_64+0x4e5f)
    #4 0x7fff4c6d4cf1 in -[NSData(NSData) initWithBytes:length:]+0x24 (Foundation:x86_64+0x23cf1)
    #5 0x1094f8245 in main main.m:404
    #6 0x7fff77482084 in start+0x0 (libdyld.dylib:x86_64+0x17084)
```
The reason is that we create a string "HELLO" but get the size wrong (it's 5 bytes instead of 4). Later on we read the buffer and pretend it is 5 bytes long, causing an OOB read which ASAN detects. In general this test probably needs some cleanup, as it produces around 100 compiler warnings on macOS 10.15, but let's first get the bot green.
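The general pattern behind this kind of failure is a length that is hard-coded in two places and disagrees between them. The C++ sketch below shows one way to avoid that, deriving the length from the literal once and bounding every later read by the buffer's real size. It illustrates the idea only: the test itself is Objective-C, and the names here (`kGreeting`, `makeBuffer`, `readBytes`) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// The length is computed from the literal, so it cannot drift from it.
constexpr char kGreeting[] = "HELLO";
constexpr std::size_t kGreetingLen = sizeof(kGreeting) - 1;  // excludes the NUL

// Creates a buffer holding exactly the bytes of the literal.
inline std::vector<char> makeBuffer() {
  return std::vector<char>(kGreeting, kGreeting + kGreetingLen);
}

// Copies at most `wanted` bytes into `out`, never more than the buffer
// holds, and returns the number of bytes copied. `out` must have room for
// `wanted` bytes.
inline std::size_t readBytes(const std::vector<char> &buf, std::size_t wanted,
                             char *out) {
  const std::size_t n = std::min(wanted, buf.size());
  std::memcpy(out, buf.data(), n);
  return n;
}
```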

When building/testing ASan inside the GCC tree on Solaris while using GNU `ld` instead of Solaris `ld`, a large number of tests SEGV on both sparc and x86 like this:
    Thread 2 received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 1 (LWP 1)]
    0xfe014cfc in __sanitizer::atomic_load<__sanitizer::atomic_uintptr_t> (a=0xfc602a58, mo=__sanitizer::memory_order_acquire) at sanitizer_common/sanitizer_atomic_clang_x86.h:46
    46 v = a->val_dont_use;
    1: x/i $pc
    => 0xfe014cfc <_ZN11__sanitizer11atomic_loadINS_16atomic_uintptr_tEEENT_4TypeEPVKS2_NS_12memory_orderE+62>: mov (%eax),%eax
    (gdb) bt
    #0 0xfe014cfc in __sanitizer::atomic_load<__sanitizer::atomic_uintptr_t> (a=0xfc602a58, mo=__sanitizer::memory_order_acquire) at sanitizer_common/sanitizer_atomic_clang_x86.h:46
    #1 0xfe0bd1d7 in __sanitizer::DTLS_NextBlock (cur=0xfc602a58) at sanitizer_common/sanitizer_tls_get_addr.cpp:53
    #2 0xfe0bd319 in __sanitizer::DTLS_Find (id=1) at sanitizer_common/sanitizer_tls_get_addr.cpp:77
    #3 0xfe0bd466 in __sanitizer::DTLS_on_tls_get_addr (arg_void=0xfeffd068, res=0xfe602a18, static_tls_begin=0, static_tls_end=0) at sanitizer_common/sanitizer_tls_get_addr.cpp:116
    #4 0xfe063f81 in __interceptor___tls_get_addr (arg=0xfeffd068) at sanitizer_common/sanitizer_common_interceptors.inc:5501
    #5 0xfe0a3054 in __sanitizer::CollectStaticTlsBlocks (info=0xfeffd108, size=40, data=0xfeffd16c) at sanitizer_common/sanitizer_linux_libcdep.cpp:366
    #6 0xfe6ba9fa in dl_iterate_phdr () from /usr/lib/ld.so.1
    #7 0xfe0a3132 in __sanitizer::GetStaticTlsBoundary (addr=0xfe608020, size=0xfeffd244, align=0xfeffd1b0) at sanitizer_common/sanitizer_linux_libcdep.cpp:382
    #8 0xfe0a33f7 in __sanitizer::GetTls (addr=0xfe608020, size=0xfeffd244) at sanitizer_common/sanitizer_linux_libcdep.cpp:482
    #9 0xfe0a34b1 in __sanitizer::GetThreadStackAndTls (main=true, stk_addr=0xfe608010, stk_size=0xfeffd240, tls_addr=0xfe608020, tls_size=0xfeffd244) at sanitizer_common/sanitizer_linux_libcdep.cpp:565
The address being accessed is unmapped. However, even when the tests `PASS` with Solaris `ld`, `ASAN_OPTIONS=verbosity=2` shows
    ==6582==__tls_get_addr: Can't guess glibc version
Given that the code is strictly `glibc`-specific according to `sanitizer_tls_get_addr.h`, there seems little point in using the interceptor on non-`glibc` targets. This patch therefore disables the interceptor on those targets.
Tested on `i386-pc-solaris2.11` and `sparc-sun-solaris2.11` inside the GCC tree.
Differential Revision: https://reviews.llvm.org/D141385

Change https://reviews.llvm.org/D140059 exposed the following crash in Z3Solver, where bit widths were not checked consistently with that change. This change makes the check consistent, and fixes the crash.
```
clang: <root>/llvm/include/llvm/ADT/APSInt.h:99: int64_t llvm::APSInt::getExtValue() const: Assertion `isRepresentableByInt64() && "Too many bits for int64_t"' failed.
...
Stack dump:
0. Program arguments: clang -cc1 -internal-isystem <root>/lib/clang/16/include -nostdsysteminc -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-config crosscheck-with-z3=true -verify reproducer.c
#0 0x00000000045b3476 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) <root>/llvm/lib/Support/Unix/Signals.inc:567:22
#1 0x00000000045b3862 PrintStackTraceSignalHandler(void*) <root>/llvm/lib/Support/Unix/Signals.inc:641:1
#2 0x00000000045b14a5 llvm::sys::RunSignalHandlers() <root>/llvm/lib/Support/Signals.cpp:104:20
#3 0x00000000045b2eb4 SignalHandler(int) <root>/llvm/lib/Support/Unix/Signals.inc:412:1
...
#9 0x0000000004be2eb3 llvm::APSInt::getExtValue() const <root>/llvm/include/llvm/ADT/APSInt.h:99:5
<root>/llvm/lib/Support/Z3Solver.cpp:740:53
clang::ASTContext&, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&, bool) <root>/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h:552:61
```
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D142627
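The assertion in the trace fires when a value is converted to `int64_t` without first checking that it fits. The plain C++ sketch below models that precondition for values of at most 64 bits. `fitsInInt64` and `toInt64OrFallback` are hypothetical helpers written for illustration; they are not LLVM's `APSInt` API and not the code changed in Z3Solver.

```cpp
#include <cstdint>
#include <limits>

// Models "is this value representable as int64_t?" for a value stored in
// 64 raw bits together with its signedness. A signed 64-bit value always
// fits; an unsigned one fits only if its top bit is clear.
inline bool fitsInInt64(std::uint64_t bits, bool isUnsigned) {
  const std::uint64_t maxSigned =
      static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
  return !isUnsigned || bits <= maxSigned;
}

// Converts only when the check passes; otherwise returns `fallback`
// instead of tripping an assertion.
inline std::int64_t toInt64OrFallback(std::uint64_t bits, bool isUnsigned,
                                      std::int64_t fallback) {
  return fitsInInt64(bits, isUnsigned) ? static_cast<std::int64_t>(bits)
                                       : fallback;
}
```

The point of the sketch is that the same representability check has to guard every conversion site; a caller that tests a different condition than the one the conversion asserts on can still crash.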

…ak ordering
`std::sort` requires a comparison operator that abides by strict weak ordering. `operator<=` on pointers does not, and using it leads to undefined behaviour. Specifically, when we grow the `scratch_type_systems` vector slightly larger (and thus take `std::sort` down a slightly different codepath), we segfault. This happened while working on a patch that would in fact grow this vector. In such a case ASAN reports:
```
$ ./bin/lldb ./lldb-test-build.noindex/lang/cpp/complete-type-check/TestCppIsTypeComplete.test_builtin_types/a.out -o "script -- lldb.target.FindFirstType(\"void\")"
(lldb) script -- lldb.target.FindFirstType("void")
=================================================================
==59975==ERROR: AddressSanitizer: container-overflow on address 0x000108f6b510 at pc 0x000280177b4c bp 0x00016b7d7430 sp 0x00016b7d7428
READ of size 8 at 0x000108f6b510 thread T0
    #0 0x280177b48 in std::__1::shared_ptr<lldb_private::TypeSystem>::shared_ptr[abi:v15006](std::__1::shared_ptr<lldb_private::TypeSystem> const&)+0xb4 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x177b48) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #1 0x280dcc008 in void std::__1::__introsort<std::__1::_ClassicAlgPolicy, lldb_private::Target::GetScratchTypeSystems(bool)::$_3&, std::__1::shared_ptr<lldb_private::TypeSystem>*>(std::__1::shared_ptr<lldb_private::TypeSystem>*, std::__1::shared_ptr<lldb_private::TypeSystem>*, lldb_private::Target::GetScratchTypeSystems(bool)::$_3&, std::__1::iterator_traits<std::__1::shared_ptr<lldb_private::TypeSystem>*>::difference_type)+0x1050 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xdcc008) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #2 0x280d88788 in lldb_private::Target::GetScratchTypeSystems(bool)+0x5a4 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xd88788) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #3 0x28021f0b4 in lldb::SBTarget::FindFirstType(char const*)+0x624 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x21f0b4) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #4 0x2804e9590 in _wrap_SBTarget_FindFirstType(_object*, _object*)+0x26c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x4e9590) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #5 0x1062d3ad4 in cfunction_call+0x5c (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0xcfad4) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
<--- snipped --->
0x000108f6b510 is located 400 bytes inside of 512-byte region [0x000108f6b380,0x000108f6b580)
allocated by thread T0 here:
    #0 0x105209414 in wrap__Znwm+0x74 (/Applications/Xcode2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.3/lib/darwin/libclang_rt.asan_osx_dynamic.dylib:arm64e+0x51414) (BuildId: 0a44828ceb64337bbfff60b22cd838f032000000200000000100000000000b00)
    #1 0x280dca3b4 in std::__1::__split_buffer<std::__1::shared_ptr<lldb_private::TypeSystem>, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>&)+0x11c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xdca3b4) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #2 0x280dc978c in void std::__1::vector<std::__1::shared_ptr<lldb_private::TypeSystem>, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>>::__push_back_slow_path<std::__1::shared_ptr<lldb_private::TypeSystem> const&>(std::__1::shared_ptr<lldb_private::TypeSystem> const&)+0x13c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xdc978c) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #3 0x280d88dec in std::__1::vector<std::__1::shared_ptr<lldb_private::TypeSystem>, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>>::push_back[abi:v15006](std::__1::shared_ptr<lldb_private::TypeSystem> const&)+0x80 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xd88dec) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #4 0x280d8857c in lldb_private::Target::GetScratchTypeSystems(bool)+0x398 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xd8857c) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #5 0x28021f0b4 in lldb::SBTarget::FindFirstType(char const*)+0x624 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x21f0b4) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #6 0x2804e9590 in _wrap_SBTarget_FindFirstType(_object*, _object*)+0x26c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x4e9590) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
    #7 0x1062d3ad4 in cfunction_call+0x5c (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0xcfad4) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
    #8 0x10627fff0 in _PyObject_MakeTpCall+0x7c (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0x7bff0) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
    #9 0x106378a98 in _PyEval_EvalFrameDefault+0xbcf8 (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0x174a98) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
```
Differential Revision: https://reviews.llvm.org/D142709

Change https://reviews.llvm.org/D140059 exposed the following crash in Z3Solver, where bit widths were not checked consistently with that change. This change makes the check consistent, and fixes the crash.

```
clang: <root>/llvm/include/llvm/ADT/APSInt.h:99: int64_t llvm::APSInt::getExtValue() const: Assertion `isRepresentableByInt64() && "Too many bits for int64_t"' failed.
...
Stack dump:
0. Program arguments: clang -cc1 -internal-isystem <root>/lib/clang/16/include -nostdsysteminc -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-config crosscheck-with-z3=true -verify reproducer.c
#0 0x00000000045b3476 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) <root>/llvm/lib/Support/Unix/Signals.inc:567:22
#1 0x00000000045b3862 PrintStackTraceSignalHandler(void*) <root>/llvm/lib/Support/Unix/Signals.inc:641:1
#2 0x00000000045b14a5 llvm::sys::RunSignalHandlers() <root>/llvm/lib/Support/Signals.cpp:104:20
#3 0x00000000045b2eb4 SignalHandler(int) <root>/llvm/lib/Support/Unix/Signals.inc:412:1
...
#9 0x0000000004be2eb3 llvm::APSInt::getExtValue() const <root>/llvm/include/llvm/ADT/APSInt.h:99:5
<root>/llvm/lib/Support/Z3Solver.cpp:740:53
clang::ASTContext&, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&, bool) <root>/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h:552:61
```

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D142627
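The assertion in the trace fires when a value is narrowed to `int64_t` without first checking that it fits. The sketch below illustrates that kind of guard in plain C++; it is not the Z3Solver change itself. `SmallInt`, `isRepresentableByInt64` (as a free function) and `tryGetExtValue` are hypothetical names, and the value is assumed to fit in 64 bits, whereas `llvm::APSInt` can be wider.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical simplified integer: a 64-bit pattern plus a signedness flag.
// (Assumption: the real llvm::APSInt supports arbitrary widths.)
struct SmallInt {
  std::uint64_t bits;
  bool isUnsigned;
};

// A signed 64-bit pattern always fits in int64_t; an unsigned one fits only
// if it does not exceed INT64_MAX.
bool isRepresentableByInt64(const SmallInt &v) {
  if (!v.isUnsigned)
    return true;
  return v.bits <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
}

// Check first, then narrow, instead of asserting inside the accessor.
std::optional<std::int64_t> tryGetExtValue(const SmallInt &v) {
  if (!isRepresentableByInt64(v))
    return std::nullopt;
  // For signed values the bits are read as a two's complement pattern.
  return static_cast<std::int64_t>(v.bits);
}
```

A caller that receives `std::nullopt` can fall back to a wider representation rather than crash.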

…ak ordering

`std::sort` requires a comparison operator that abides by strict weak ordering. `operator<=` on pointers does not, and leads to undefined behaviour. Specifically, when we grow the `scratch_type_systems` vector slightly larger (and thus take `std::sort` down a slightly different codepath), we segfault. This happened while working on a patch that would in fact grow this vector. In such a case ASAN reports:

```
$ ./bin/lldb ./lldb-test-build.noindex/lang/cpp/complete-type-check/TestCppIsTypeComplete.test_builtin_types/a.out -o "script -- lldb.target.FindFirstType(\"void\")"
(lldb) script -- lldb.target.FindFirstType("void")
=================================================================
==59975==ERROR: AddressSanitizer: container-overflow on address 0x000108f6b510 at pc 0x000280177b4c bp 0x00016b7d7430 sp 0x00016b7d7428
READ of size 8 at 0x000108f6b510 thread T0
#0 0x280177b48 in std::__1::shared_ptr<lldb_private::TypeSystem>::shared_ptr[abi:v15006](std::__1::shared_ptr<lldb_private::TypeSystem> const&)+0xb4 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x177b48) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#1 0x280dcc008 in void std::__1::__introsort<std::__1::_ClassicAlgPolicy, lldb_private::Target::GetScratchTypeSystems(bool)::$_3&, std::__1::shared_ptr<lldb_private::TypeSystem>*>(std::__1::shared_ptr<lldb_private::TypeSystem>*, std::__1::shared_ptr<lldb_private::TypeSystem>*, lldb_private::Target::GetScratchTypeSystems(bool)::$_3&, std::__1::iterator_traits<std::__1::shared_ptr<lldb_private::TypeSystem>*>::difference_type)+0x1050 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xdcc008) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#2 0x280d88788 in lldb_private::Target::GetScratchTypeSystems(bool)+0x5a4 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xd88788) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#3 0x28021f0b4 in lldb::SBTarget::FindFirstType(char const*)+0x624 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x21f0b4) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#4 0x2804e9590 in _wrap_SBTarget_FindFirstType(_object*, _object*)+0x26c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x4e9590) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#5 0x1062d3ad4 in cfunction_call+0x5c (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0xcfad4) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
<--- snipped --->
0x000108f6b510 is located 400 bytes inside of 512-byte region [0x000108f6b380,0x000108f6b580)
allocated by thread T0 here:
#0 0x105209414 in wrap__Znwm+0x74 (/Applications/Xcode2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/14.0.3/lib/darwin/libclang_rt.asan_osx_dynamic.dylib:arm64e+0x51414) (BuildId: 0a44828ceb64337bbfff60b22cd838f032000000200000000100000000000b00)
#1 0x280dca3b4 in std::__1::__split_buffer<std::__1::shared_ptr<lldb_private::TypeSystem>, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>&)+0x11c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xdca3b4) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#2 0x280dc978c in void std::__1::vector<std::__1::shared_ptr<lldb_private::TypeSystem>, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>>::__push_back_slow_path<std::__1::shared_ptr<lldb_private::TypeSystem> const&>(std::__1::shared_ptr<lldb_private::TypeSystem> const&)+0x13c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xdc978c) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#3 0x280d88dec in std::__1::vector<std::__1::shared_ptr<lldb_private::TypeSystem>, std::__1::allocator<std::__1::shared_ptr<lldb_private::TypeSystem>>>::push_back[abi:v15006](std::__1::shared_ptr<lldb_private::TypeSystem> const&)+0x80 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xd88dec) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#4 0x280d8857c in lldb_private::Target::GetScratchTypeSystems(bool)+0x398 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0xd8857c) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#5 0x28021f0b4 in lldb::SBTarget::FindFirstType(char const*)+0x624 (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x21f0b4) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#6 0x2804e9590 in _wrap_SBTarget_FindFirstType(_object*, _object*)+0x26c (/Users/michaelbuch/Git/lldb-build-main-no-modules/lib/liblldb.17.0.0git.dylib:arm64+0x4e9590) (BuildId: ea963d2c0d47354fb647f5c5f32b76d932000000200000000100000000000d00)
#7 0x1062d3ad4 in cfunction_call+0x5c (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0xcfad4) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
#8 0x10627fff0 in _PyObject_MakeTpCall+0x7c (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0x7bff0) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
#9 0x106378a98 in _PyEval_EvalFrameDefault+0xbcf8 (/opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/Python:arm64+0x174a98) (BuildId: c9efc4bbb1943f9a9b7cc4e91fce477732000000200000000100000000000d00)
```

Differential Revision: https://reviews.llvm.org/D142709
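The strict weak ordering requirement can be shown in a few lines: a valid comparator must return false when an element is compared with itself, which `<=` does not. The following is a minimal sketch rather than the lldb patch. `TypeSystemStub`, `lessEqIsReflexive` and `sortByAddress` are hypothetical names, and the stub type stands in for `lldb_private::TypeSystem`.

```cpp
#include <algorithm>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the element type held in the vector.
struct TypeSystemStub {
  int id;
};
using StubSP = std::shared_ptr<TypeSystemStub>;

// Strict weak ordering requires comp(a, a) == false. A `<=` comparator
// returns true here, so passing it to std::sort is undefined behaviour.
bool lessEqIsReflexive(const StubSP &a) {
  auto lessEq = [](const StubSP &l, const StubSP &r) {
    return l.get() <= r.get();
  };
  return lessEq(a, a);
}

// Sort by address with a strict comparison. std::less gives a total order
// over pointers even when they point into unrelated allocations.
void sortByAddress(std::vector<StubSP> &v) {
  std::sort(v.begin(), v.end(), [](const StubSP &l, const StubSP &r) {
    return std::less<const TypeSystemStub *>()(l.get(), r.get());
  });
}
```

With the broken comparator the failure depends on the input size, since `std::sort` implementations typically switch algorithms as the range grows. This matches the report that the crash appeared only once the vector grew.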

… -analyzer-config

I am working on another patch that changes StringMap's hash function, which changes the iteration order here, and breaks some tests, specifically:

    clang/test/Analysis/NSString.m
    clang/test/Analysis/shallow-mode.m

with errors like:

    generated arguments do not match in round-trip
    generated arguments #1 in round-trip: <...> "-analyzer-config" "ipa=inlining" "-analyzer-config" "max-nodes=75000" <...>
    generated arguments #2 in round-trip: <...> "-analyzer-config" "max-nodes=75000" "-analyzer-config" "ipa=inlining" <...>

To avoid this, sort the options by key, instead of using the default map iteration order.

Reviewed By: jansvoboda11, MaskRay

Differential Revision: https://reviews.llvm.org/D142861
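Sorting by key before emitting makes the generated arguments independent of the hash function. The sketch below shows the idea with standard containers; `generateConfigArgs` is a hypothetical helper, and `std::unordered_map` stands in for the StringMap used in clang.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: emit "-analyzer-config" "key=value" pairs in key
// order so the output does not depend on hash-map iteration order.
std::vector<std::string> generateConfigArgs(
    const std::unordered_map<std::string, std::string> &config) {
  // Copy the entries out of the map, then sort them by key.
  std::vector<std::pair<std::string, std::string>> sorted(config.begin(),
                                                          config.end());
  std::sort(sorted.begin(), sorted.end(),
            [](const auto &l, const auto &r) { return l.first < r.first; });

  std::vector<std::string> args;
  for (const auto &kv : sorted) {
    args.push_back("-analyzer-config");
    args.push_back(kv.first + "=" + kv.second);
  }
  return args;
}
```

Two runs over maps with the same contents now produce identical argument lists, which is what the round-trip check compares.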

This reverts commit d768b97.

Causes sanitizer failure: https://lab.llvm.org/buildbot/#/builders/238/builds/1114

```
/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Support/xxhash.cpp:107:12: runtime error: applying non-zero offset 8 to null pointer
#0 0xaaaab28ec6c8 in llvm::xxHash64(llvm::StringRef) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Support/xxhash.cpp:107:12
#1 0xaaaab28cbd38 in llvm::StringMapImpl::LookupBucketFor(llvm::StringRef) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/Support/StringMap.cpp:87:28
```

Probably causes test failure in `warn-unsafe-buffer-usage-fixits-local-var-span.cpp`: https://lab.llvm.org/buildbot/#/builders/60/builds/10619

Probably causes reverse-iteration test failure in `test-output-format.ll`: https://lab.llvm.org/buildbot/#/builders/54/builds/3545

…est unittest

Need to finalize the DIBuilder to avoid leak sanitizer errors like this:

    Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x55c99ea1761d in operator new(unsigned long)
    #1 0x55c9a518ae49 in operator new
    #2 0x55c9a518ae49 in llvm::MDTuple::getImpl(...)
    #3 0x55c9a4f1b1ec in getTemporary
    #4 0x55c9a4f1b1ec in llvm::DIBuilder::createFunction(...)

The motivation for this change is a workload generated by the XLA compiler targeting nvidia GPUs.

This kernel has a few hundred i8 loads and stores. Merging is critical for performance.

The current LSV doesn't merge these well because it only considers instructions within a block of 64 loads+stores. This limit is necessary to contain the O(n^2) behavior of the pass. I'm hesitant to increase the limit, because this pass is already one of the slowest parts of compiling an XLA program.

So we rewrite basically the whole thing to use a new algorithm. Before, we compared every load/store to every other to see if they're consecutive. The insight (from tra@) is that this is redundant. If we know the offset from PtrA to PtrB, then we don't need to compare PtrC to both of them in order to tell whether C may be adjacent to A or B.

So that's what we do. When scanning a basic block, we maintain a list of chains, where we know the offset from every element in the chain to the first element in the chain. Each instruction gets compared only to the leaders of all the chains.

In the worst case, this is still O(n^2), because all chains might be of length 1. To prevent compile time blowup, we only consider the 64 most recently used chains. Thus we do no more comparisons than before, but we have the potential to make much longer chains.

This rewrite affects many tests. The changes to tests fall into two categories.

1. The old code had what appears to be a bug when deciding whether a misaligned vectorized load is fast. Suppose TTI reports that load <4 x i32> align 4 has relative speed 1, and suppose that load i32 align 4 has relative speed 32. The intent of the code seems to be that we prefer the scalar load, because it's faster. But the old code would choose the vectorized load. accessIsMisaligned would set RelativeSpeed to 0 for the scalar load (and not even call into TTI to get the relative speed), because the scalar load is aligned. After this patch, we will prefer the scalar load if it's faster.

2. This patch changes the logic for how we vectorize. Usually this results in vectorizing more.

Explanation of changes to tests:

- AMDGPU/adjust-alloca-alignment.ll: #1
- AMDGPU/flat_atomic.ll: #2, we vectorize more.
- AMDGPU/int_sideeffect.ll: #2, there are two possible locations for the call to @foo, and the pass is brittle to this. Before, we'd vectorize in case 1 and not case 2. Now we vectorize in case 2 and not case 1. So we just move the call.
- AMDGPU/adjust-alloca-alignment.ll: #2, we vectorize more
- AMDGPU/insertion-point.ll: #2 we vectorize more
- AMDGPU/merge-stores-private.ll: #1 (undoes changes from git rev 86f9117, which appear to have hit the bug from #1)
- AMDGPU/multiple_tails.ll: #1
- AMDGPU/vect-ptr-ptr-size-mismatch.ll: Fix alignment (I think related to #1 above).
- AMDGPU CodeGen: I have difficulty commenting on these changes, but many of them look like #2, we vectorize more.
- NVPTX/4x2xhalf.ll: Fix alignment (I think related to #1 above).
- NVPTX/vectorize_i8.ll: We don't generate <3 x i8> vectors on NVPTX because they're not legal (and eventually get split)
- X86/correct-order.ll: #2, we vectorize more, probably because of changes to the chain-splitting logic.
- X86/subchain-interleaved.ll: #2, we vectorize more
- X86/vector-scalar.ll: #2, we can now vectorize scalar float + <1 x float>
- X86/vectorize-i8-nested-add-inseltpoison.ll: Deleted the nuw test because it was nonsensical. It was doing `add nuw %v0, -1`, but this is equivalent to `add nuw %v0, 0xffff'ffff`, which is equivalent to asserting that %v0 == 0.
- X86/vectorize-i8-nested-add.ll: Same as nested-add-inseltpoison.ll

Differential Revision: https://reviews.llvm.org/D149893
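The chain-based scan described above can be sketched on a toy model. This is an illustration under stated assumptions, not the LoadStoreVectorizer code. `Access`, `Chain`, `offsetFrom` and `buildChains` are hypothetical names; an access is modelled as a symbolic base id plus a constant offset; and `offsetFrom` stands in for the pointer analysis the pass would run.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model of a load/store address: symbolic base + byte offset.
struct Access {
  int base;
  std::int64_t offset;
};

// A chain stores its leader and the offset of every member from the leader.
struct Chain {
  Access leader;
  std::vector<std::int64_t> offsets; // offsets[0] == 0 is the leader itself
};

// Stand-in for the expensive comparison: offset of `b` relative to `a`
// when it is known, otherwise nullopt.
std::optional<std::int64_t> offsetFrom(const Access &a, const Access &b) {
  if (a.base != b.base)
    return std::nullopt;
  return b.offset - a.offset;
}

// Compare each access only against chain leaders, and only against the
// `maxActive` most recently used chains.
std::vector<Chain> buildChains(const std::vector<Access> &accesses,
                               std::size_t maxActive = 64) {
  std::deque<Chain> active;    // front = most recently used
  std::vector<Chain> finished; // chains evicted from the active window
  for (const Access &acc : accesses) {
    bool placed = false;
    for (std::size_t i = 0; i < active.size(); ++i) {
      std::optional<std::int64_t> off = offsetFrom(active[i].leader, acc);
      if (!off)
        continue;
      // Join the chain and move it to the front of the window.
      Chain used = std::move(active[i]);
      used.offsets.push_back(*off);
      active.erase(active.begin() + static_cast<std::ptrdiff_t>(i));
      active.push_front(std::move(used));
      placed = true;
      break;
    }
    if (placed)
      continue;
    // No leader matched: start a new chain, evicting the oldest if needed.
    active.push_front(Chain{acc, {0}});
    if (active.size() > maxActive) {
      finished.push_back(std::move(active.back()));
      active.pop_back();
    }
  }
  for (Chain &c : active)
    finished.push_back(std::move(c));
  return finished;
}
```

Each access costs at most `maxActive` comparisons, so the bound on total work matches the old 64-instruction window, while a chain can keep growing for as long as it stays recently used.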

Use hlfir::loadTrivialScalars to dereference pointers and allocatables, and to load numerical and logical scalars.

This has a small fallout on tests:

- load is done on the HLFIR entity (#0 of hlfir.declare) and not the FIR one (#1). This makes no difference at the FIR level (#1 and #0 only differ to account for assumed and explicit shape lower bounds).
- loadTrivialScalars gets rid of allocatable fir.box for monomorphic scalars (it is not needed). This exposed a bug in lowering of MERGE with a polymorphic and a monomorphic argument: when the monomorphic is not a fir.box, the polymorphic fir.class should not be reboxed but its address should be read.

Reviewed By: tblah

Differential Revision: https://reviews.llvm.org/D153252

Allow specifying the 'nomerge' attribute for function pointers, e.g. like in the following C code:

    extern void (*foo)(void) __attribute__((nomerge));

    void bar(long i) {
      if (i)
        foo();
      else
        foo();
    }

With the goal to attach 'nomerge' to both calls done through 'foo':

    @foo = external local_unnamed_addr global ptr, align 8

    define dso_local void @bar(i64 noundef %i) local_unnamed_addr #0 {
      ; ...
      %0 = load ptr, ptr @foo, align 8, !tbaa !5
      ; ...
    if.then:
      tail call void %0() #1
      br label %if.end

    if.else:
      tail call void %0() #1
      br label %if.end

    if.end:
      ret void
    }

    ; ...
    attributes #1 = { nomerge ... }

Report a warning if 'nomerge' is specified for a variable that is not a function pointer, e.g.:

    t.c:2:22: warning: 'nomerge' attribute is ignored because 'j' is not a function pointer [-Wignored-attributes]
        2 | int j __attribute__((nomerge));
          |                      ^

The intended use case is the BPF backend. BPF provides a sort of "standard library" of functions that are called helpers. BPF also verifies usage of these helpers before program execution. Because of limitations of the verification / runtime model, it is important to keep calls to some of these helpers from merging. An example can be found at link [1], where the input C code:

    if (data_end - data > 1024) {
      bpf_for_each_map_elem(&map1, cb, &cb_data, 0);
    } else {
      bpf_for_each_map_elem(&map2, cb, &cb_data, 0);
    }

Is converted to bytecode equivalent to:

    if (data_end - data > 1024)
      tmp = &map1;
    else
      tmp = &map2;
    bpf_for_each_map_elem(tmp, cb, &cb_data, 0);

However, BPF verification/runtime requires using the same map address for each particular `bpf_for_each_map_elem()` call. The 'nomerge' attribute is a perfect match for this situation, but unfortunately BPF helpers are declared as pointers to functions:

    static long (*bpf_for_each_map_elem)(void *map, ...) = (void *) 164;

Hence this commit, which allows using 'nomerge' for function pointers.

[1] https://lore.kernel.org/bpf/03bdf90f-f374-1e67-69d6-76dd9c8318a4@meta.com/

Differential Revision: https://reviews.llvm.org/D152986

Running this on Amazon Ubuntu the final backtrace is:

```
(lldb) thread backtrace
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
  * frame #0: 0x0000aaaaaaaa07d0 a.out`func_c at main.c:10:3
    frame #1: 0x0000aaaaaaaa07c4 a.out`func_b at main.c:14:3
    frame #2: 0x0000aaaaaaaa07b4 a.out`func_a at main.c:18:3
    frame #3: 0x0000aaaaaaaa07a4 a.out`main(argc=<unavailable>, argv=<unavailable>) at main.c:22:3
    frame #4: 0x0000fffff7b373fc libc.so.6`___lldb_unnamed_symbol2962 + 108
    frame #5: 0x0000fffff7b374cc libc.so.6`__libc_start_main + 152
    frame #6: 0x0000aaaaaaaa06b0 a.out`_start + 48
```

This causes the test to fail because of the extra ___lldb_unnamed_symbol2962 frame (an inlined function?).

To fix this, strictly check all the frames in main.c then for the rest just check we find __libc_start_main and _start in that order regardless of other frames in between.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D154204
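The relaxed check described above is an exact prefix match followed by an ordered-subsequence match. A minimal sketch of that logic follows. `backtraceMatches` is a hypothetical helper operating on plain function names; the actual test is written against the lldb test suite, which is not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the first frames must equal `exactPrefix` one for
// one; the remaining frames must contain `orderedTail` in order, with any
// number of unrelated frames in between.
bool backtraceMatches(const std::vector<std::string> &frames,
                      const std::vector<std::string> &exactPrefix,
                      const std::vector<std::string> &orderedTail) {
  if (frames.size() < exactPrefix.size())
    return false;
  for (std::size_t i = 0; i < exactPrefix.size(); ++i)
    if (frames[i] != exactPrefix[i])
      return false;

  // Walk the rest once, advancing through the tail on each match.
  std::size_t next = 0;
  for (std::size_t i = exactPrefix.size();
       i < frames.size() && next < orderedTail.size(); ++i)
    if (frames[i] == orderedTail[next])
      ++next;
  return next == orderedTail.size();
}
```

With the backtrace above, the prefix would be `func_c`, `func_b`, `func_a`, `main` and the tail `__libc_start_main`, `_start`, so the extra `___lldb_unnamed_symbol2962` frame is skipped.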

The original MFS work (D85368) shows good performance improvement with Instrumented FDO (iFDO). However, AutoFDO and Flow-Sensitive AutoFDO (FSAFDO) do not show a performance gain, mainly because their profiles are less accurate than the iFDO profile. For the past few months, we have been working to improve FSAFDO quality, as in D145171. Taking advantage of this improvement, MFS now shows performance improvements over FSAFDO profiles. That said, two minor changes are needed:

1) An FS-AutoFDO profile generation pass needs to be added right before the MFS pass, and an FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the MFS flag is present.
2) MFS only applies to hot functions, because we believe (and experiments also show) that FS-AutoFDO is more accurate for functions that have plenty of samples than for those with few or no samples.

With this improvement, we see a 1.2% performance improvement in the clang benchmark, a 0.9% QPS improvement in our internal search benchmark, and a 3%-5% improvement in an internal storage benchmark.

This is #1 of the two patches that enable the improvement.

Reviewed By: wenlei, snehasish, xur

Differential Revision: https://reviews.llvm.org/D152399

…tput

The crash happens in clang::driver::tools::SplitDebugName when Output is InputInfo::Nothing. It doesn't happen with standalone clang driver because output is created in Driver::BuildJobsForActionNoCache.

Example backtrace:

```
* thread #1, name = 'clangd', stop reason = hit program assert
  * frame #0: 0x00007ffff5c4eacf libc.so.6`raise + 271
    frame #1: 0x00007ffff5c21ea5 libc.so.6`abort + 295
    frame #2: 0x00007ffff5c21d79 libc.so.6`__assert_fail_base.cold.0 + 15
    frame #3: 0x00007ffff5c47426 libc.so.6`__assert_fail + 70
    frame #4: 0x000055555dc0923c clangd`clang::driver::InputInfo::getFilename(this=0x00007fffffff9398) const at InputInfo.h:84:5
    frame #5: 0x000055555dcd0d8d clangd`clang::driver::tools::SplitDebugName(JA=0x000055555f6c6a50, Args=0x000055555f6d0b80, Input=0x00007fffffff9678, Output=0x00007fffffff9398) at CommonArgs.cpp:1275:40
    frame #6: 0x000055555dc955a5 clangd`clang::driver::tools::Clang::ConstructJob(this=0x000055555f6c69d0, C=0x000055555f6c64a0, JA=0x000055555f6c6a50, Output=0x00007fffffff9398, Inputs=0x00007fffffff9668, Args=0x000055555f6d0b80, LinkingOutput=0x0000000000000000) const at Clang.cpp:5690:33
    frame #7: 0x000055555dbf6b54 clangd`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5618:10
    frame #8: 0x000055555dbf4ef0 clangd`clang::driver::Driver::BuildJobsForAction(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5306:26
    frame #9: 0x000055555dbeb590 clangd`clang::driver::Driver::BuildJobs(this=0x00007fffffffb5e0, C=0x000055555f6c64a0) const at Driver.cpp:4844:5
    frame #10: 0x000055555dbe6b0f clangd`clang::driver::Driver::BuildCompilation(this=0x00007fffffffb5e0, ArgList=ArrayRef<const char *> @ 0x00007fffffffb268) at Driver.cpp:1496:3
    frame #11: 0x000055555b0cc0d9 clangd`clang::createInvocation(ArgList=ArrayRef<const char *> @ 0x00007fffffffbb38, Opts=CreateInvocationOptions @ 0x00007fffffffbb90) at CreateInvocationFromCommandLine.cpp:53:52
    frame #12: 0x000055555b378e7b clangd`clang::clangd::buildCompilerInvocation(Inputs=0x00007fffffffca58, D=0x00007fffffffc158, CC1Args=size=0) at Compiler.cpp:116:44
    frame #13: 0x000055555895a6c8 clangd`clang::clangd::(anonymous namespace)::Checker::buildInvocation(this=0x00007fffffffc760, TFS=0x00007fffffffe570, Contents= Has Value=false ) at Check.cpp:212:9
    frame #14: 0x0000555558959cec clangd`clang::clangd::check(File=(Data = "build/test.cpp", Length = 64), TFS=0x00007fffffffe570, Opts=0x00007fffffffe600) at Check.cpp:486:34
    frame #15: 0x000055555892164a clangd`main(argc=4, argv=0x00007fffffffecd8) at ClangdMain.cpp:993:12
    frame #16: 0x00007ffff5c3ad85 libc.so.6`__libc_start_main + 229
    frame #17: 0x00005555585bbe9e clangd`_start + 46
```

Test Plan: ninja ClangDriverTests && tools/clang/unittests/Driver/ClangDriverTests

Differential Revision: https://reviews.llvm.org/D154602


BlockDecl should be invalidated because of its invalid ParmVarDecl. Fixes #1 of llvm/llvm-project#64005 Differential Revision: https://reviews.llvm.org/D155984

arichardson closed this as completed Feb 18, 2019

heshamelmatary mentioned this issue Nov 27, 2019

Assertion failure RISC-V: Assertion `!CGF.CGM.getDataLayout().isFatPointer(Val->getType())' failed #354

Closed

bsdjhb mentioned this issue Jan 29, 2020

Failed assertion compiling machine/minit.c from bbl as purecap #373

Open

