Error during the asan CI job: "LLVM ERROR: inconsistency in registered CommandLine options" #42540

Closed
DilumAluthge opened this issue Oct 7, 2021 · 10 comments · Fixed by #44420
Labels
domain:ci Continuous integration external dependencies Involves LLVM, OpenBLAS, or other linked libraries kind:bug Indicates an unexpected problem or unintended behavior

Comments

@DilumAluthge
Member

This issue is to track the progress on fixing the "LLVM ERROR: inconsistency in registered CommandLine options" error seen in the asan CI job.

    LINK tmp/test-asan/asan/usr/lib/libjulia-codegen-debug.so.1
    LINK tmp/test-asan/asan/usr/lib/libjulia-codegen-debug.so
make[1]: Leaving directory '/cache/build/amdci8-0/julialang/julia-master/tmp/test-asan/asan/src'
make[1]: Entering directory '/cache/build/amdci8-0/julialang/julia-master/tmp/test-asan/asan'
    JULIA tmp/test-asan/asan/usr/lib/julia/corecompiler.ji
: CommandLine Error: Option 'help-list' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Aborted
make[1]: *** [/cache/build/amdci8-0/julialang/julia-master/sysimage.mk:61: /cache/build/amdci8-0/julialang/julia-master/tmp/test-asan/asan/usr/lib/julia/corecompiler.ji] Error 134
make[1]: Leaving directory '/cache/build/amdci8-0/julialang/julia-master/tmp/test-asan/asan'
make: *** [/cache/build/amdci8-0/julialang/julia-master/Makefile:82: julia-sysimg-ji] Error 2
make: Leaving directory '/cache/build/amdci8-0/julialang/julia-master/tmp/test-asan/asan'
🚨 Error: The command exited with status 2

Example log: https://buildkite.com/julialang/julia-master/builds/4455#3d8f2204-c0ae-43b8-a232-cfeb2a5be0c6

Introduced in #41936

@DilumAluthge DilumAluthge added kind:bug Indicates an unexpected problem or unintended behavior external dependencies Involves LLVM, OpenBLAS, or other linked libraries domain:ci Continuous integration labels Oct 7, 2021
@vchuravy
Sponsor Member

vchuravy commented Oct 8, 2021

This feels like we are loading two different LLVM libraries whose symbol versions are equal. When someone looks into it, I would be curious to see a backtrace for that error.

@tkf
Member

tkf commented Oct 14, 2021

I tried this with LLVM_SANITIZE=1 and LLVM_DEBUG=1. I get an ODR violation instead:

    JULIA usr/lib/julia/corecompiler.ji
=================================================================
==16275==ERROR: AddressSanitizer: odr-violation (0x7f53ceff4620):
  [1] size=4 'llvm::DisableABIBreakingChecks' /home/arakaki/repos/watch/_wt/julia/llvm-asan-debug/deps/srccache/llvm-julia-12.0.1-4/llvm/lib/Support/ABIBreak.cpp:20:5
  [2] size=4 'llvm::DisableABIBreakingChecks' /home/arakaki/repos/watch/_wt/julia/llvm-asan-debug/deps/srccache/llvm-julia-12.0.1-4/llvm/lib/Support/ABIBreak.cpp:20:5
These globals were registered at these points:
  [1]:
    #0 0x42bf87 in __asan_register_globals /home/arakaki/repos/watch/_wt/julia/toolchain-asan-debug/deps/srccache/llvm-julia-12.0.1-4/compiler-rt/lib/asan/asan_globals.cpp:360:3
    #1 0x7f53c03f3f5b in asan.module_ctor (/home/arakaki/repos/watch/_wt/julia/llvm-asan-debug/usr/bin/../lib/libLLVM-12jl.so+0x3324f5b)

  [2]:
    #0 0x42bf87 in __asan_register_globals /home/arakaki/repos/watch/_wt/julia/toolchain-asan-debug/deps/srccache/llvm-julia-12.0.1-4/compiler-rt/lib/asan/asan_globals.cpp:360:3
    #1 0x7f53ce65a30b in asan.module_ctor (/home/arakaki/repos/watch/_wt/julia/llvm-asan-debug/usr/bin/../lib/libjulia-internal-debug.so.1+0x42a30b)
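
The two registration sites in a report like this can be pulled out mechanically. The following is a minimal sketch, assuming the report has been saved as text; the helper name `registering_modules` and the shortened `/build/...` paths in the sample are made up for illustration and are not part of any ASan tooling.

```python
import os
import re

# Matches frames such as:
#   #1 0x7f53c03f3f5b in asan.module_ctor (/path/to/lib.so+0x3324f5b)
MODULE_CTOR = re.compile(r"in asan\.module_ctor \((?P<path>[^()]+)\+0x[0-9a-f]+\)")


def registering_modules(report):
    """Return the basenames of the shared objects whose asan.module_ctor
    frames appear in the report, in order of appearance."""
    return [os.path.basename(m.group("path")) for m in MODULE_CTOR.finditer(report)]


# Shortened excerpt of the report above (the directory part is a placeholder).
sample_report = """\
  [1]:
    #1 0x7f53c03f3f5b in asan.module_ctor (/build/usr/bin/../lib/libLLVM-12jl.so+0x3324f5b)
  [2]:
    #1 0x7f53ce65a30b in asan.module_ctor (/build/usr/bin/../lib/libjulia-internal-debug.so.1+0x42a30b)
"""

modules = registering_modules(sample_report)
print(modules)
```

In the report above, the same global `llvm::DisableABIBreakingChecks` is registered once from libLLVM-12jl.so and once from libjulia-internal-debug.so.1, which would be consistent with the same LLVM support code being present in both shared objects.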

If I disable the ODR violation detection with make -j32 ASAN_OPTIONS=detect_leaks=0:fast_unwind_on_malloc=0:allow_user_segv_handler=1:malloc_context_size=2:detect_odr_violation=0, I get

julia-debug: /home/arakaki/repos/watch/_wt/julia/llvm-asan-debug/deps/srccache/llvm-julia-12.0.1-4/llvm/lib/Support/CommandLine.cpp:361: void (anonymous namespace)::CommandLineParser::registerCategory(llvm::cl::OptionCategory *): Assertion `count_if(RegisteredOptionCategories, [cat](const OptionCategory *Category) { return cat->getName() == Category->getName(); }) == 0 && "Duplicate option categories"' failed.

It's not quite the same as the error in the OP but it looks like it's coming from the same place?

(Edit: hmm... or is it just telling me that there's something wrong in my build configuration?)
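
One way to see how the two messages could share a cause is a toy model in which the same static registrations run twice in one process. This is only a sketch under stated assumptions, not LLVM's code: it assumes the duplicate-category check is active only in assertion-enabled builds (the message above is an `Assertion ... failed`), that the duplicate-option check is always active (the OP's log comes from a build without that assertion output), and that categories are registered before options. The names `load_copy`, `first_failure`, and the category name are hypothetical.

```python
OPTION = "help-list"           # option name taken from the CI log
CATEGORY = "example-category"  # placeholder; the real category name is not shown above


class DuplicateRegistration(Exception):
    pass


def load_copy(state, assertions_enabled):
    """Simulate one copy of the support code running its static initializers."""
    # Assumed: the category check exists only in assertion-enabled builds.
    if assertions_enabled and CATEGORY in state["categories"]:
        raise DuplicateRegistration("Duplicate option categories")
    state["categories"].add(CATEGORY)
    # Assumed: the option check is always enforced.
    if OPTION in state["options"]:
        raise DuplicateRegistration(f"Option '{OPTION}' registered more than once!")
    state["options"].add(OPTION)


def first_failure(assertions_enabled, copies=2):
    """Load the same registrations `copies` times; return the first error message, or None."""
    state = {"categories": set(), "options": set()}
    try:
        for _ in range(copies):
            load_copy(state, assertions_enabled)
    except DuplicateRegistration as exc:
        return str(exc)
    return None


print(first_failure(assertions_enabled=True))
print(first_failure(assertions_enabled=False))
```

Under these assumptions, the same double registration surfaces as the category assertion in an assertion-enabled build and as the `help-list` error otherwise, while a single load reports nothing.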

@DilumAluthge DilumAluthge pinned this issue Oct 25, 2021
@DilumAluthge DilumAluthge changed the title from "Error in the asan CI job: "LLVM ERROR: inconsistency in registered CommandLine options"" to "Error during the asan CI job: "LLVM ERROR: inconsistency in registered CommandLine options"" Oct 25, 2021
@DilumAluthge DilumAluthge unpinned this issue Nov 4, 2021
@vtjnash
Sponsor Member

vtjnash commented Nov 8, 2021

@JeffBezanson I think we either need to fix this or revert the codegen-split PR. We've already gone more than a month with CI disabled due to this.

@DilumAluthge
Member Author

Just to clarify, it's only the asan CI job that has been marked as "allow failures". All of the other Buildkite CI jobs are running normally (with failures disallowed).

That being said, I do think it would be nice to fix this failure so that we can switch the asan CI job to run normally.

@tkf
Member

tkf commented Nov 16, 2021

this feels like we are loading two different LLVM libraries whose symbol versions are equal

To see whether this is true, I ran

LD_DEBUG=all ../usr/bin/julia-debug -C "native" --output-ji ../usr/lib/julia/corecompiler.ji.tmp --startup-file=no --warn-overwrite=yes -g0 -O0 compiler/compiler.jl 2>&1 | grep -v 'trying file=' | grep -Eo '/[^ ]*/libLLVM[^ ]*.so' | sort -u

It prints only a single path, /home/arakaki/repos/watch/_wt/julia/llvm-asan-debug/usr/bin/../lib/libLLVM-12jl.so, which would indicate that we load only one libLLVM?

Also, @vtjnash mentioned that LLVM 13 may fix it (#42602).

So maybe we are hitting an edge case in LLVM 12? I'll be happy if 13 fixes it anyway though :)
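
For anyone repeating this check on a saved log, the filtering done by the grep pipeline can be sketched in Python. This assumes the `LD_DEBUG=all` output has been captured as text; the function name `loaded_libllvm_paths` and the sample lines are made up (they only imitate the general shape of loader debug output and are not a real capture). Unlike the grep pattern, the dot in `.so` is escaped here.

```python
import re

# Same idea as: grep -Eo '/[^ ]*/libLLVM[^ ]*.so'
LIBLLVM_PATH = re.compile(r"/[^ ]*/libLLVM[^ ]*\.so")


def loaded_libllvm_paths(ld_debug_output):
    """Return the sorted, de-duplicated libLLVM paths mentioned in the log,
    skipping 'trying file=' lines (search probes, as in `grep -v`)."""
    paths = set()
    for line in ld_debug_output.splitlines():
        if "trying file=" in line:
            continue
        paths.update(LIBLLVM_PATH.findall(line))
    return sorted(paths)


# Made-up lines for illustration only.
sample = "\n".join([
    "     16275:\t  trying file=/opt/other/lib/libLLVM-12jl.so",
    "     16275:\tcalling init: /build/usr/bin/../lib/libLLVM-12jl.so",
    "     16275:\tbinding file /build/usr/bin/../lib/libjulia-codegen-debug.so.1 [0] "
    "to /build/usr/bin/../lib/libLLVM-12jl.so [0]",
])

paths = loaded_libllvm_paths(sample)
print(paths)
```

A single entry in the result corresponds to the "single path" observation above. Note that this compares path strings only, so the same file reached through differently spelled paths would show up as separate entries.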

@tkf
Member

tkf commented Jan 26, 2022

It seems #43685 improved ASAN but we still have

LoadError("sysimg.jl", 19, LoadError("/cache/build/default-amdci4-1/julialang/julia-master/tmp/test-asan/asan/usr/share/julia/stdlib/v1.8/Random/src/Random.jl", 3, LoadError("/cache/build/default-amdci4-1/julialang/julia-master/tmp/test-asan/asan/usr/share/julia/stdlib/v1.8/Random/src/DSFMT.jl", 3, InexactError(:trunc, Int64, Inf))))

which seems to be quite reproducible. I just checked four recent builds on master and all of them failed with the same error:

https://buildkite.com/julialang/julia-master/builds/8121#179ccdd4-0691-496c-97cd-26c1039be38c/148-942
https://buildkite.com/julialang/julia-master/builds/8113#6f1c55e4-9e3b-4206-9c33-217cf3683eca/146-940
https://buildkite.com/julialang/julia-master/builds/8112#30b7be5d-b949-4018-9d73-e7e97a595b31/157-951
https://buildkite.com/julialang/julia-master/builds/8111#8200fe2a-6891-4ed0-a87b-9422b5defa4d/160-954

I don't know why ASAN triggers this, though.

@vtjnash
Sponsor Member

vtjnash commented Jan 27, 2022

It is an LLVM assertion error, but you are not running with assertions enabled, so it is not reported with complete information.

@DilumAluthge
Member Author

DilumAluthge commented Jan 27, 2022

For the asan CI job, we should probably build with assertions enabled.

@tkf
Member

tkf commented Jan 27, 2022

Do you mean building (and caching) LLVM with assertions turned on?

@DilumAluthge
Member Author

IIUC, if you set LLVM_ASSERTIONS=1, we don't actually have to build LLVM from source; it will automatically download and use LLVM_assert_jll.
