libdxcompiler.so locks up when used in many threads at once #4792

Open
MarijnS95 opened this issue Nov 16, 2022 · 1 comment

@MarijnS95

We're parallelizing our shader (asset) builds and hit a snag: on Linux, the build process locks up fairly often. After some debugging in gdb we've pinpointed the issue to CALL_ONCE_INITIALIZATION, which spins on a static volatile flag:

#define CALL_ONCE_INITIALIZATION(function) \
  static volatile sys::cas_flag initialized = 0; \
  sys::cas_flag old_val = sys::CompareAndSwap(&initialized, 1, 0); \
  if (old_val == 0) { \
    function(Registry); \
    sys::MemoryFence(); \
    TsanIgnoreWritesBegin(); \
    TsanHappensBefore(&initialized); \
    initialized = 2; \
    TsanIgnoreWritesEnd(); \
  } else { \
    sys::cas_flag tmp = initialized; \
    sys::MemoryFence(); \
    while (tmp != 2) { \
      tmp = initialized; \
      sys::MemoryFence(); \
    } \
  } \
  TsanHappensAfter(&initialized);

All threads are stuck either in MemoryFence() or in initializeLoopSimplifyPass, the frame directly above it:

... non-worker threads
  23   Thread 0x7fb450bff6c0 (LWP 128163) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  24   Thread 0x7fb4509fe6c0 (LWP 128164) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  25   Thread 0x7fb4507fd6c0 (LWP 128165) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  26   Thread 0x7fb4505fc6c0 (LWP 128166) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  27   Thread 0x7fb4503fb6c0 (LWP 128167) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
  28   Thread 0x7fb433fff6c0 (LWP 128168) "ci-asset-builde" 0x00007fb31eb3b4db in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
  29   Thread 0x7fb433dfe6c0 (LWP 128169) "ci-asset-builde" 0x00007fb31eb3b4d6 in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
  30   Thread 0x7fb433bfd6c0 (LWP 128170) "ci-asset-builde" 0x00007fb31eb3b4db in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
  31   Thread 0x7fb4339fc6c0 (LWP 128171) "ci-asset-builde" 0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
... and many more threads
(gdb) thread 23
[Switching to thread 23 (Thread 0x7fb450bff6c0 (LWP 128163))]
#0  0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
(gdb) bt
#0  0x00007fb31eb16893 in llvm::sys::MemoryFence() () from libdxcompiler.so
#1  0x00007fb31eb3b4db in llvm::initializeLoopSimplifyPass(llvm::PassRegistry&) () from libdxcompiler.so
#2  0x00007fb31eb3bbaa in llvm::Pass* llvm::callDefaultCtor<LoopSimplify>() () from libdxcompiler.so
#3  0x00007fb31f5dc884 in llvm::PMTopLevelManager::schedulePass(llvm::Pass*) () from libdxcompiler.so
#4  0x00007fb31f4b44d5 in addHLSLPasses(bool, unsigned int, bool, bool, bool, hlsl::HLSLExtensionsCodegenHelper*, llvm::legacy::PassManagerBase&) () from libdxcompiler.so
#5  0x00007fb31f4b3958 in llvm::PassManagerBuilder::populateModulePassManager(llvm::legacy::PassManagerBase&) () from libdxcompiler.so
#6  0x00007fb31eb9295b in clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::raw_pwrite_stream*) ()
   from libdxcompiler.so
#7  0x00007fb31eb80189 in clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) () from libdxcompiler.so
#8  0x00007fb31f3e942c in clang::ParseAST(clang::Sema&, bool, bool) () from libdxcompiler.so
#9  0x00007fb31ecf0ba8 in clang::FrontendAction::Execute() () from libdxcompiler.so
#10 0x00007fb31e727649 in DxcCompiler::Compile(DxcBuffer const*, wchar_t const**, unsigned int, IDxcIncludeHandler*, _GUID const&, void**) () from libdxcompiler.so
#11 0x00007fb31e721272 in hlsl::DxcCompilerAdapter::WrapCompile(bool, IDxcBlob*, wchar_t const*, wchar_t const*, wchar_t const*, wchar_t const**, unsigned int, DxcDefine const*, unsigned int, IDxcIncludeHandler*, IDxcOperationResult**, wchar_t**, IDxcBlob**) ()
   from libdxcompiler.so
#12 0x00007fb31e72219f in hlsl::DxcCompilerAdapter::CompileWithDebug(IDxcBlob*, wchar_t const*, wchar_t const*, wchar_t const*, wchar_t const**, unsigned int, DxcDefine const*, unsigned int, IDxcIncludeHandler*, IDxcOperationResult**, wchar_t**, IDxcBlob**) ()
   from libdxcompiler.so
#13 0x00007fb31e722ec8 in hlsl::DxcCompilerAdapter::Compile(IDxcBlob*, wchar_t const*, wchar_t const*, wchar_t const*, wchar_t const**, unsigned int, DxcDefine const*, unsigned int, IDxcIncludeHandler*, IDxcOperationResult**) ()
   from libdxcompiler.so
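
For readers unfamiliar with the pattern, the protocol that CALL_ONCE_INITIALIZATION implements can be restated with std::atomic. This is only a sketch of the same three-state scheme (0 = not started, 1 = in progress, 2 = done), not the DXC or LLVM code: `initialize_pass`, `call_once_spin` and `run_spin_demo` are hypothetical names, and the sketch makes no claim about why the original locks up.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the work done by function(Registry) in the macro.
static std::atomic<int> g_init_calls{0};
static void initialize_pass() {
  g_init_calls.fetch_add(1, std::memory_order_relaxed);
}

// States mirror the macro: 0 = not started, 1 = in progress, 2 = done.
static std::atomic<int> g_state{0};

void call_once_spin() {
  int expected = 0;
  if (g_state.compare_exchange_strong(expected, 1, std::memory_order_acquire)) {
    // This thread won the race: run the initializer, then publish "done".
    initialize_pass();
    g_state.store(2, std::memory_order_release);
  } else {
    // Every other thread spins until the winner publishes state 2.
    while (g_state.load(std::memory_order_acquire) != 2) {
      std::this_thread::yield();
    }
  }
}

// Calls call_once_spin() from `threads` threads at once and returns how many
// times the initializer has run in total.
int run_spin_demo(int threads) {
  assert(threads > 0);
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i) pool.emplace_back(call_once_spin);
  for (auto& t : pool) t.join();
  return g_init_calls.load();
}
```

By construction, a thread that observes state 1 can only make progress once the winning thread stores 2, which is why every waiter sits in the spin loop until then.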

Without digging into whether this code is free of race conditions or guarantees forward progress, I forward-ported some of the upstream LLVM changes so that it at least uses std::call_once instead of this custom implementation, in https://github.com/MarijnS95/DirectXShaderCompiler/compare/import-llvm_once-upstream-changes. Unfortunately this also locks up:

...
  23   Thread 0x7f636bfff6c0 (LWP 146834) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  24   Thread 0x7f636bdfe6c0 (LWP 146835) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  25   Thread 0x7f636bbfd6c0 (LWP 146836) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  26   Thread 0x7f636b9fc6c0 (LWP 146837) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  27   Thread 0x7f636b7fb6c0 (LWP 146838) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  28   Thread 0x7f636b5fa6c0 (LWP 146839) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  29   Thread 0x7f636b3f96c0 (LWP 146840) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  30   Thread 0x7f636b1f86c0 (LWP 146841) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
  31   Thread 0x7f636aff76c0 (LWP 146842) "ci-asset-builde" 0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
...
(gdb) thread 23
[Switching to thread 23 (Thread 0x7f636bfff6c0 (LWP 146834))]
#0  0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
#1  0x00007f63a47069a3 in __gthread_once (__once=0x7f63a5821800 <InitializeLoopSimplifyPassFlag>, __func=0x80)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:700
#2  std::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (__once=..., __f=<optimized out>,
    __args=...) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/mutex:859
#3  llvm::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (flag=..., F=<optimized out>,
    ArgList=...) at include/llvm/Support/Threading.h:92
#4  llvm::initializeLoopSimplifyPass (Registry=...)
    at lib/Transforms/Utils/LoopSimplify.cpp:751
... same as above

(Note that this was a RelWithDebInfo build, whereas the trace above came from a Release build; the build type does not change the outcome.)
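
As a rough illustration of the shape the std::call_once variant takes (one flag per pass, plus a wrapper that forwards the registry by reference, as the backtrace suggests), here is a minimal sketch. `FakeRegistry` is a hypothetical stand-in for llvm::PassRegistry, and the function names are made up for the example; this is not the code from the branch linked above.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for llvm::PassRegistry; it only counts registrations.
struct FakeRegistry {
  int registered = 0;
};

// One flag per pass, analogous to InitializeLoopSimplifyPassFlag in the trace.
static std::once_flag g_loop_simplify_flag;

// The one-time body; std::call_once guarantees it runs at most once per flag.
static void initialize_loop_simplify_once(FakeRegistry& registry) {
  ++registry.registered;
}

void initialize_loop_simplify(FakeRegistry& registry) {
  std::call_once(g_loop_simplify_flag, initialize_loop_simplify_once,
                 std::ref(registry));
}

// Calls the initializer from `threads` threads and returns the registration
// count seen after all of them have joined.
int run_call_once_demo(int threads) {
  assert(threads > 0);
  FakeRegistry registry;
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i) {
    pool.emplace_back([&registry] { initialize_loop_simplify(registry); });
  }
  for (auto& t : pool) t.join();
  return registry.registered;
}
```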


Fortunately we also stumbled upon https://reviews.llvm.org/D19271. That patch has not been applied yet, but its design is much simpler: it replaces the entire call_once logic with a static initializer driven by a lambda expression. We are now running with this and have not yet observed any erratic behaviour.
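
The static-initializer idea can be sketched as follows. It relies on the C++11 guarantee that a function-local static is initialized exactly once, even when several threads reach it concurrently. `StubRegistry` and the function names are hypothetical; this is a sketch of the technique, not the contents of D19271.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for llvm::PassRegistry; it only counts registrations.
struct StubRegistry {
  int registered = 0;
};

static void register_loop_simplify(StubRegistry& registry) {
  ++registry.registered;
}

void initialize_loop_simplify_static(StubRegistry& registry) {
  // The lambda runs during the one-time initialization of `initialized`.
  // Concurrent callers block until that initialization has completed.
  static const bool initialized = [&registry] {
    register_loop_simplify(registry);
    return true;
  }();
  (void)initialized;  // Only the side effect of the initializer matters.
}

// Calls the initializer from `threads` threads and returns the registration
// count seen after all of them have joined.
int run_static_init_demo(int threads) {
  assert(threads > 0);
  StubRegistry registry;
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i) {
    pool.emplace_back([&registry] { initialize_loop_simplify_static(registry); });
  }
  for (auto& t : pool) t.join();
  return registry.registered;
}
```

Note that the lambda only ever sees the registry passed by the first caller, so the sketch assumes a single shared registry, as with the global pass registry.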

Additional context

DXC was tested on the latest commit as of yesterday: 24ca1f4

@MarijnS95

MarijnS95 commented Nov 24, 2022

For completeness, here's a small internal testing matrix. For each configuration I tried to reach 100 runs of our build system, with 64 jobs running in parallel on a Threadripper 3970X:

DXC built on Release RelWithDebInfo std::call_once + RelWithDebInfo static bool initializer + Release
Size ±27MB ±330MB ±330MB ±27MB
DXC CI Unavailable Unavailable Appveyor build from #4818 [1]
Our CI ✔️ ✔️
Marijn workstation ✔️ [2] ❌ Locked up on run 18! ❌ Locked up on run 17! ✔️

In failed attempts, the lockup typically happens well before reaching run 10.

Footnotes

  1. Unable to build many of our shaders due to an unrelated SPIR-V codegen issue.

  2. My workstation builds DXC against a GLIBC that is too new for the resulting shader compiler to run on our Continuous Integration, and I don't like pushing custom-built assets there.

llvm-beanz added the needs-triage (Awaiting triage) label on Jun 29, 2023
pow2clk added the bug (Bug, regression, crash) label and removed the needs-triage (Awaiting triage) label on Nov 14, 2023
llvm-beanz added the needs-triage (Awaiting triage) label and removed the bug (Bug, regression, crash) label on Nov 14, 2023
pow2clk moved this to Done in HLSL Triage on Nov 14, 2023
damyanp added this to the Dormant milestone on Sep 26, 2024
damyanp removed the needs-triage (Awaiting triage) label on Sep 26, 2024