rustc master has a flaky failure when compiled against LLVM main #99432

Closed
durin42 opened this issue Jul 18, 2022 · 3 comments · Fixed by #99512

durin42 (Contributor) commented Jul 18, 2022

LLVM commit ede600377cb6df1bef71f070130d8cfe734cc5b7 moved some state from ManagedStatic to plain thread-local statics, and it seems we're not correctly waiting for LLVM threads to finish before exiting. If I apply this patch:

diff --git a/compiler/rustc_driver/src/lib.rs b/compiler/rustc_driver/src/lib.rs
index b71cdad718a..96f34c9ec3b 100644
--- a/compiler/rustc_driver/src/lib.rs
+++ b/compiler/rustc_driver/src/lib.rs
@@ -1336,6 +1336,10 @@ pub fn main() -> ! {
         let end_rss = get_resident_set_size();
         print_time_passes_entry("total", start_time.elapsed(), start_rss, end_rss);
     }
-
-    process::exit(exit_code)
+    if 1 == 200 {
+        process::exit(exit_code)
+    }
+    unsafe {
+        libc::_exit(exit_code)
+    }
 }

then I no longer see the failure in 10,000 attempts. When it does happen, the failure always looks like this:

---- [ui] src/test/ui/cmse-nonsecure/cmse-nonsecure-call/params-on-stack.rs stdout ----
 
error: Error: expected failure status (Some(1)) but received status Some(101).
status: exit status: 101
command: "/var/lib/buildkite-agent/builds/rust-llvm-integrate/llvm-project/rust-llvm-integrate-prototype/build/x86_64-unknown-linux-gnu/stage1/bin/rustc" "/var/lib/buildkite-agent/builds/rust-llvm-integrate/llvm-project/rust-llvm-integrate-prototype/src/test/ui/cmse-nonsecure/cmse-nonsecure-call/params-on-stack.rs" "-Zthreads=1" "--error-format" "json" "--json" "future-incompat" "-Ccodegen-units=1" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Cstrip=debuginfo" "-C" "prefer-dynamic" "--out-dir" "/var/lib/buildkite-agent/builds/rust-llvm-integrate/llvm-project/rust-llvm-integrate-prototype/build/x86_64-unknown-linux-gnu/test/ui/cmse-nonsecure/cmse-nonsecure-call/params-on-stack" "-A" "unused" "-Crpath" "-Cdebuginfo=0" "-Lnative=/var/lib/buildkite-agent/builds/rust-llvm-integrate/llvm-project/rust-llvm-integrate-prototype/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Clinker=/usr/bin/clang-13" "--target" "thumbv8m.main-none-eabi" "--crate-type" "lib" "-L" "/var/lib/buildkite-agent/builds/rust-llvm-integrate/llvm-project/rust-llvm-integrate-prototype/build/x86_64-unknown-linux-gnu/test/ui/cmse-nonsecure/cmse-nonsecure-call/params-on-stack/auxiliary"
stdout: none
--- stderr -------------------------------
error: <unknown>:0:0: in function test i32 (i32, i32, i32, i32, i32): call to non-secure function would require passing arguments on stack
 
error: aborting due to previous error
 
LLVM ERROR: Do not know how to split the result of this operator!
------------------------------------------
 
 
 
failures:
[ui] src/test/ui/cmse-nonsecure/cmse-nonsecure-call/params-on-stack.rs

On our CI VM (which is fairly small) this happens roughly a third to half of the time. On my workstation it happens less than 1% of the time (my best guess is about 0.3%), so around 10,000 runs are needed to see the issue reliably.
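As a rough sanity check on the run counts above, here is a minimal Rust sketch. It assumes each run fails independently with a fixed probability p (an assumption; real flakiness may be load-dependent, as the CI-vs-workstation difference suggests). Under that model, at p = 0.3% a clean run of 10,000 attempts would be very unlikely if the bug were still present, which is why the 10,000-attempt result with the patch is meaningful. The function names are illustrative only.

```rust
/// Probability that `n` independent runs all pass when each run
/// fails with probability `p` (assumes independent, identically likely failures).
fn prob_no_failure(p: f64, n: u32) -> f64 {
    (1.0 - p).powi(n as i32)
}

/// Smallest number of runs needed so that at least one failure is observed
/// with probability `confidence`, under the same independence assumption.
fn runs_needed(p: f64, confidence: f64) -> u32 {
    ((1.0 - confidence).ln() / (1.0 - p).ln()).ceil() as u32
}

fn main() {
    // ~0.3% per-run failure rate, the workstation estimate from the comment.
    let p = 0.003;
    println!("P(no failure in 10,000 runs) = {:e}", prob_no_failure(p, 10_000));
    println!("runs for a 99% chance of >= 1 failure: {}", runs_needed(p, 0.99));
}
```

With p = 0.003 the first value comes out on the order of 1e-13, and roughly 1,500 runs already give a 99% chance of seeing at least one failure; 10,000 runs leave a wide margin if the true rate is lower than the estimate.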

I'll hopefully have time to fix this in the near future, but I wanted to record it now so we don't forget when the LLVM upgrade is due.

nikic (Contributor) commented Jul 19, 2022

I think the problem is that

just returns. We also have a different CodegenAborted message that does wait for other workers:
Message::CodegenAborted => {

nikic (Contributor) commented Jul 20, 2022

Okay, I was quite off track here. I believe the actual problem is that we emit a fatal error in the LLVM worker, which sends a SharedEmitterMessage::Fatal message, and then raise a fatal error in the main thread:

This will unwind and exit the process. However, there might still be other LLVM threads active, and more importantly, I believe the thread that emitted the diagnostic (if it is a codegen diagnostic) will also continue running.
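To make the hazard concrete, here is a minimal, self-contained Rust sketch of the general remedy pattern: the coordinating thread records a fatal error reported over a channel instead of unwinding immediately, drains the remaining messages, and joins every worker thread before returning, so no worker is still running when the caller exits the process. WorkerMessage and run_workers are hypothetical names loosely modeled on the SharedEmitterMessage::Fatal flow described above; this is not rustc's code and not necessarily the approach taken in #99512.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message type, loosely modeled on SharedEmitterMessage::Fatal.
enum WorkerMessage {
    Done,
    Fatal(String),
}

/// Runs `n` hypothetical workers; the worker with id `failing` (if any)
/// reports a fatal error. Returns the first fatal error, but only after
/// every worker thread has been joined.
fn run_workers(n: usize, failing: Option<usize>) -> Result<usize, String> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                let msg = if Some(id) == failing {
                    WorkerMessage::Fatal(format!("worker {id} failed"))
                } else {
                    WorkerMessage::Done
                };
                // Report the result; the thread then finishes normally.
                let _ = tx.send(msg);
            })
        })
        .collect();
    // Drop the coordinator's sender so the loop below ends once all workers exit.
    drop(tx);

    let mut completed = 0;
    let mut first_fatal = None;
    for msg in rx {
        match msg {
            WorkerMessage::Done => completed += 1,
            // Record the error instead of unwinding while workers may still run.
            WorkerMessage::Fatal(err) => {
                if first_fatal.is_none() {
                    first_fatal = Some(err);
                }
            }
        }
    }
    // Wait for every worker before returning to the caller.
    for h in handles {
        h.join().expect("worker panicked");
    }
    match first_fatal {
        Some(err) => Err(err),
        None => Ok(completed),
    }
}

fn main() {
    println!("{:?}", run_workers(4, None)); // Ok(4)
    println!("{:?}", run_workers(4, Some(2))); // Err("worker 2 failed")
}
```

The key design choice is ordering: the error is only surfaced after the join loop, so process teardown (and any static destructors it runs) cannot race with a still-running worker.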

nikic self-assigned this Jul 20, 2022
nikic (Contributor) commented Jul 20, 2022

I've included a fix for this in #99512. It turned out to be less straightforward than I had hoped.

bors closed this as completed in 9de7474 on Jul 29, 2022