doc test "Couldn't compile the test" on aarch64 + LTO #91671

Closed
pnkfelix opened this issue Dec 8, 2021 · 30 comments · Fixed by #93426
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-LTO Area: Link-time optimization (LTO) C-bug Category: This is a bug. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@pnkfelix
Member

pnkfelix commented Dec 8, 2021

Update 2021-12-27: More minimized MCVE at repo here (also linked from comment below.)


I tried this code (note that the crate name needs to match its usage in the doc test to reproduce the bug properly):

# Cargo.toml

[package]
name = "a64_doctestfail"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
tokio = { version = "0.2", features = ["full"] }

[profile.release]
lto = true

// lib.rs

pub fn bad() {
    let _ = tokio::runtime::Builder::new();
}

/// ```
/// use a64_doctestfail;
/// ```
pub struct X;

And then ran cargo test --release --doc.

I expected to see this happen:

(Working behavior from 1.56.1, witnessed via nightly-2021-10-13)

% cargo +nightly-2021-10-13 test --release --doc
   Compiling a64_doctestfail v0.1.0 (/Users/pnkfelix/Dev/Rust/a64_doctestfail)
    Finished release [optimized] target(s) in 0.09s
   Doc-tests a64_doctestfail

running 1 test
test src/lib.rs - X (line 7) ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.19s

Instead, this happened:

(Broken behavior from 1.57.0, witnessed via nightly-2021-10-14)

% cargo +nightly-2021-10-14 test --release --doc
    Finished release [optimized] target(s) in 0.01s
   Doc-tests a64_doctestfail

running 1 test
test src/lib.rs - X (line 7) ... FAILED

failures:

---- src/lib.rs - X (line 7) stdout ----
Couldn't compile the test.

failures:
    src/lib.rs - X (line 7)

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.68s

error: test failed, to rerun pass '--doc'

Meta

rustc +nightly-2021-10-14 --version --verbose:
rustc 1.57.0-nightly (dfc5add 2021-10-13)
binary: rustc
commit-hash: dfc5add
commit-date: 2021-10-13
host: aarch64-apple-darwin
release: 1.57.0-nightly
LLVM version: 13.0.0


rustc +stable --version --verbose:

rustc 1.57.0 (f1edd04 2021-11-29)
binary: rustc
commit-hash: f1edd04
commit-date: 2021-11-29
host: aarch64-apple-darwin
release: 1.57.0
LLVM version: 13.0.0

@pnkfelix pnkfelix added C-bug Category: This is a bug. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state A-LTO Area: Link-time optimization (LTO) labels Dec 8, 2021
@pnkfelix
Member Author

pnkfelix commented Dec 8, 2021

(by the way, the reason that I didn't include cargo-bisect-rustc output above is that the tool was not working for me today for this target; some kind of failure in its attempts to download the pre-built binaries per-commit. I haven't looked in more detail into it; for all I know, maybe we don't keep per-commit builds for aarch64...)

@pnkfelix
Member Author

pnkfelix commented Dec 8, 2021

Here are the relevant commits over the period where this bug was injected:

% git log --author=bors d7c97a02d..dfc5add91 --format=oneline:

@pnkfelix pnkfelix added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Dec 8, 2021
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Dec 8, 2021
@pnkfelix pnkfelix added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Dec 8, 2021
@pnkfelix
Member Author

pnkfelix commented Dec 8, 2021

(It would be good to further minimize this. If possible, I'd like to rewrite it as a normal test rather than a rustdoc one.)

@pnkfelix pnkfelix added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Dec 8, 2021
@jyn514
Member

jyn514 commented Dec 8, 2021

@pnkfelix can you run cargo test --release --doc -- --nocapture and post the output? I think rustdoc is hiding the actual compile error.

@jyn514
Member

jyn514 commented Dec 8, 2021

It seems strange this only happens with doctests - does it also happen if you add a second crate that depends on a64_doctestfail (without a doctest)?

@hkratz
Contributor

hkratz commented Dec 8, 2021

Bisecting worked fine for me. The regression was pinpointed to the cargo update in #89802.

bisected with cargo-bisect-rustc v0.6.1

searched toolchains d7c97a0 through dfc5add


Regression in a16f686


searched nightlies: from nightly-2021-10-13 to nightly-2021-12-08
regressed nightly: nightly-2021-10-14
searched commit range: d7c97a0...dfc5add
regressed commit: a16f686

Host triple: aarch64-apple-darwin
Reproduce with:

cargo bisect-rustc 2021-10-13 -- test --release --doc -- --nocapture
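
For readers unfamiliar with what a bisection like the one above automates: the core idea is a binary search over an ordered list of toolchains for the first one on which the test fails. Below is a minimal sketch, assuming the failure is monotonic (once a toolchain fails, every later one fails too). The function name and the generic toolchain list are illustrative and are not cargo-bisect-rustc's actual internals.

```rust
/// Returns the index of the first toolchain for which `fails` is true,
/// assuming every toolchain before it passes and every one after it fails.
/// Returns None if no toolchain in the range fails.
fn first_failing<T>(toolchains: &[T], fails: impl Fn(&T) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, toolchains.len());
    // Invariant: everything before `lo` passes; everything at or after `hi` fails.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if fails(&toolchains[mid]) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    if lo < toolchains.len() {
        Some(lo)
    } else {
        None
    }
}
```

In practice the `fails` predicate would install the toolchain and run the reproduce command, so each probe is expensive; the binary search keeps the number of probes logarithmic in the size of the range.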

@pnkfelix
Member Author

pnkfelix commented Dec 8, 2021

@pnkfelix can you run cargo test --release --doc -- --nocapture and post the output? I think rustdoc is hiding the actual compile error.

Adding --nocapture alone does not seem to help here. I think you are right that rustdoc is hiding the actual compiler error; but I also think that we need to go deeper to dig out the root cause. (I do wish that --nocapture did resolve it, or, better still, some set of --verbose flags...)

@pnkfelix
Member Author

pnkfelix commented Dec 8, 2021

I figured out how to use rustdoc --test-builder to feed in a wrapper script around rustc that also emits the arguments that were passed into rustc. That should hopefully be enough for me to narrow this down to a bug that doesn't rely on rustdoc for us to replicate it.
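
A wrapper of the kind described here can be very small: print the arguments it was given, then run the real compiler with those same arguments. The sketch below is an assumption about what such a wrapper might look like, not the script actually used in this thread. In particular, the REAL_RUSTC environment variable and both function names are hypothetical.

```rust
use std::env;
use std::ffi::OsString;
use std::io;
use std::process::{Command, ExitStatus};

/// Renders the invocation as one loggable line, quoting each argument so
/// that embedded spaces remain visible.
fn format_invocation(program: &str, args: &[OsString]) -> String {
    let mut line = String::from(program);
    for arg in args {
        line.push(' ');
        line.push_str(&format!("{:?}", arg));
    }
    line
}

/// Logs the invocation to stderr, then runs the real compiler with the same
/// arguments and returns its exit status.
/// REAL_RUSTC is a hypothetical environment variable chosen for this sketch;
/// when it is unset, plain `rustc` from PATH is used.
fn log_and_forward(args: &[OsString]) -> io::Result<ExitStatus> {
    let real = env::var("REAL_RUSTC").unwrap_or_else(|_| "rustc".to_string());
    eprintln!("{}", format_invocation(&real, args));
    Command::new(&real).args(args).status()
}
```

A `main` built on this would collect `env::args_os().skip(1)` into a `Vec<OsString>`, call `log_and_forward`, and exit with the returned status code so that rustdoc still sees the compiler's success or failure.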

@pnkfelix
Member Author

pnkfelix commented Dec 8, 2021

Also, since I'm staring at a bunch of verbose command output, and the problem was already bisected to a cargo update #89802 ... maybe the real culprit is a change to how rustdoc is being invoked by cargo here?

In 2021-10-13, the rustdoc invocation from cargo looks like this:

     Running `rustdoc --edition=2021 --crate-type lib --crate-name a64_doctestfail --test /Users/pnkfelix/Dev/Rust/a64_doctestfail/src/lib.rs -L dependency=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps -L dependency=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps --extern a64_doctestfail=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps/liba64_doctestfail-c008d086ffd3f1fc.rlib --extern tokio=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps/libtokio-9a16f25c00e3ffb9.rlib -C embed-bitcode=no --error-format human`

In 2021-10-14, the rustdoc invocation from cargo looks like this:

     Running `rustdoc --edition=2021 --crate-type lib --crate-name a64_doctestfail --test /Users/pnkfelix/Dev/Rust/a64_doctestfail/src/lib.rs -L dependency=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps -L dependency=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps --extern a64_doctestfail=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps/liba64_doctestfail-92a7e819a735c31f.rlib --extern tokio=/Users/pnkfelix/Dev/Rust/a64_doctestfail/target/release/deps/libtokio-72abe3f6a7246c4a.rlib -C lto --error-format human`

Leveraging Emacs ediff-mode, these invocations have only three differences:

  1. liba64_doctestfail-c008d086ffd3f1fc.rlib is rewritten to liba64_doctestfail-92a7e819a735c31f.rlib, and libtokio-9a16f25c00e3ffb9.rlib is rewritten to libtokio-72abe3f6a7246c4a.rlib. We can probably safely disregard these two differences, which are just artifacts of how we generate filenames.
  2. The old version has -C embed-bitcode=no
  3. The new version has -C lto

So... did cargo start passing along -C lto into rustdoc invocations and it is exposing some underlying bug from LTO?
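
The same comparison can be done without an editor. The sketch below is one possible approach, not the method used above: it splits both command lines on whitespace, replaces the filename hash in each .rlib path with a placeholder, and reports the tokens unique to each side. The helper names are made up for illustration, and because it compares sets it ignores argument order and repeated arguments.

```rust
use std::collections::BTreeSet;

/// Replaces the trailing hash in names like `libtokio-9a16f25c00e3ffb9.rlib`
/// with a placeholder so that two builds compare equal on that argument.
fn normalize(arg: &str) -> String {
    if let Some(stem) = arg.strip_suffix(".rlib") {
        if let Some((head, hash)) = stem.rsplit_once('-') {
            if !hash.is_empty() && hash.chars().all(|c| c.is_ascii_hexdigit()) {
                return format!("{}-HASH.rlib", head);
            }
        }
    }
    arg.to_string()
}

/// Returns (tokens only in `old`, tokens only in `new`) after normalization.
fn diff_args(old: &str, new: &str) -> (Vec<String>, Vec<String>) {
    let old_set: BTreeSet<String> = old.split_whitespace().map(normalize).collect();
    let new_set: BTreeSet<String> = new.split_whitespace().map(normalize).collect();
    let only_old = old_set.difference(&new_set).cloned().collect();
    let only_new = new_set.difference(&old_set).cloned().collect();
    (only_old, only_new)
}
```

Applied to the two invocations quoted above, this should leave only embed-bitcode=no on the old side and lto on the new side, since the shared -C token and the normalized .rlib paths cancel out.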

@pnkfelix
Member Author

pnkfelix commented Dec 8, 2021

So... did cargo start passing along -C lto into rustdoc invocations and it is exposing some underlying bug from LTO?

Aha, I bet it's fallout from this new cargo feature rust-lang/cargo#9943

@pnkfelix
Member Author

pnkfelix commented Dec 9, 2021

Running the 2021-10-14 rustdoc invocation manually without -C lto does indeed pass the test.

So this seems like it is probably exposing some LTO-related bug, one that may well predate nightly-2021-10-14.

It's non-trivial to try to test this thesis against the 2021-10-13 build, though, because it's not simply a matter of adding -C lto to the rustdoc invocation at the end in that case. If you do try that, then you get the error:

error: failed to get bitcode from object file for LTO (Bitcode section not found in object file)

So I'm guessing more parts of the build recipe would need to be revised to pass -C lto (or at least -C embed-bitcode=yes ?)

And I think before I put in that kind of effort, I'm first going to investigate whether we can reduce this to a smaller MCVE (preferably one that doesn't need to pull in tokio ...)

@pnkfelix
Member Author

pnkfelix commented Dec 9, 2021

The problem isn't tokio-specific; I was able to get the same result with an MCVE variant that used Rayon instead of Tokio, with code like this:

pub fn bad() {
    let _: rayon::Scope;
}

but it does not reproduce with just any old crate either. (I tried using an ancient demo crate of mine, add3, and that was not sufficient to witness the problem here.)

@jyn514
Member

jyn514 commented Dec 9, 2021

@pnkfelix did you make progress with your idea to use a wrapper script to get the rustc args? That seems like an easy way to avoid making this reliant on rustdoc (i.e. make it possible to reproduce with an older version of cargo).

@pnkfelix
Member Author

pnkfelix commented Dec 9, 2021

I was indeed able to get the wrapper script to print the rustc args.

But since then, I've decided to focus on first reducing the MCVE further.

E.g. at this point I've found that just an extern crate rayon; declaration suffices to reproduce the problem; no pub fn bad() { ... } necessary at all, nor any imports from the crate.

I want to try to minimize the set of crate dependencies first, and then I'll switch back to extracting the series of rustc invocations (which will hopefully be teeny tiny by that point...)

@pnkfelix pnkfelix self-assigned this Dec 16, 2021
@pnkfelix
Member Author

@rustbot label: +P-high -I-prioritize

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Dec 16, 2021
@pnkfelix
Member Author

pnkfelix commented Dec 17, 2021

Okay, I've minimized it to a tiny fragment of rayon with no further dependencies beyond.

I still need two distinct crates (and the rustdoc invocation), but that's mostly because I wanted to get to this point before I looked more into reducing that aspect of things. (Well, that, and the fact that it's an LTO bug pretty much implies that an MCVE is going to require multiple crates.)

Here's the repo where I've checkpointed my work: https://github.com/pnkfelix/issue-91671-a64-doctestfail

@pnkfelix
Member Author

@pnkfelix did you make progress with your idea to use a wrapper script to get the rustc args? That seems like an easy way to avoid making this reliant on rustdoc (i.e. make it possible to reproduce with an older version of cargo).

Okay, I was successful in getting rustdoc (and cargo) entirely out of the picture.

The result is that rustc itself is segfaulting during the invocation on this line:

https://github.com/pnkfelix/issue-91671-a64-doctestfail/blob/4f2e874fa404ca462e1993c20368ed1629bd2a2c/repro.sh#L9

I'm going to keep investigating. It seems very likely to be some sort of issue with LLVM.

@pnkfelix
Member Author

The segfault appears to have been injected between nightly-2021-08-21 and nightly-2021-08-22.

@pnkfelix
Member Author

% git log --author=bors a0035916e..d3e2578c3 --format=oneline
d3e2578c31688619ddc0a10ddf8543bf4ebcba5b Auto merge of #88135 - crlf0710:trait_upcasting_part_3, r=nikomatsakis
b1928aa3b4a8a2df462e408b67ad29737a3f8f31 Auto merge of #82776 - jyn514:extern-url-fallback, r=GuillaumeGomez
99b73e81b351d036449e76ad753160853625c5b6 Auto merge of #88134 - rylev:force-warn-improvements, r=nikomatsakis
b6e334d87349502766be70d649e6fe4a73573482 Auto merge of #88128 - cuviper:needs-asm-support, r=Mark-Simulacrum
db002a06ae9154a35d410550bc5132df883d7baa Auto merge of #87570 - nikic:llvm-13, r=nagisa
e7f7fe462a54b1caeb804a974cd43ba9fd7bee5c Auto merge of #88073 - lnicola:rust-analyzer-2021-08-16, r=lnicola
797095a686bdc821143e52ed1db2b98db9d0f3eb Auto merge of #88149 - Mark-Simulacrum:prep-never-type, r=jackh726
1e3d632f8f921d03ccc5b71d97decf980df7dbe4 Auto merge of #88087 - jesyspa:issue-87935-box, r=jackh726

Skimming over that, I'm betting this is connected to the LLVM 13 upgrade in #87570.

@jyn514 jyn514 added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Dec 18, 2021
@pnkfelix
Member Author

Sweet! I generalized my repro.sh script to support cross-compilation, and now I can reproduce the seg fault on Linux x86_64!

This means I get to use pernos.co to debug the segmentation fault!

@pnkfelix
Member Author

FYI. (I haven't made a local build with better debugging info yet. But the stack trace seems to solidly point to LLVM, as previously hypothesized.)

Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 167404.167414]
0x00007f34b5f640c9 in selectCopy(llvm::MachineInstr&, llvm::TargetInstrInfo const&, llvm::MachineRegisterInfo&, llvm::TargetRegisterInfo const&, llvm::RegisterBankInfo const&) () from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-13-rust-1.56.0-nightly.so
(rr) bt
#0  0x00007f34b5f640c9 in selectCopy(llvm::MachineInstr&, llvm::TargetInstrInfo const&, llvm::MachineRegisterInfo&, llvm::TargetRegisterInfo const&, llvm::RegisterBankInfo const&) () from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-13-rust-1.56.0-nightly.so
#1  0x00007f34b4c59dae in llvm::InstructionSelect::runOnMachineFunction(llvm::MachineFunction&) ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-13-rust-1.56.0-nightly.so
#2  0x00007f34b45d3fde in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-13-rust-1.56.0-nightly.so
#3  0x00007f34b4370429 in llvm::FPPassManager::runOnFunction(llvm::Function&) ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-13-rust-1.56.0-nightly.so
#4  0x00007f34b43773e3 in llvm::FPPassManager::runOnModule(llvm::Module&) ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-13-rust-1.56.0-nightly.so
#5  0x00007f34b4370c90 in llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-13-rust-1.56.0-nightly.so
#6  0x00007f34b9b8c268 in LLVMRustWriteOutputFile ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-f66b15f11dec651a.so
#7  0x00007f34b9b1304f in rustc_codegen_llvm::back::write::write_output_file ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-f66b15f11dec651a.so
#8  0x00007f34b9b15f70 in rustc_codegen_llvm::back::write::codegen ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-f66b15f11dec651a.so
#9  0x00007f34b9b2146b in rustc_codegen_ssa::back::write::finish_intra_module_work ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-f66b15f11dec651a.so
#10 0x00007f34b9b1ad84 in rustc_codegen_ssa::back::write::execute_work_item ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-f66b15f11dec651a.so
#11 0x00007f34b9b5cd3c in std::sys_common::backtrace::__rust_begin_short_backtrace ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-f66b15f11dec651a.so
#12 0x00007f34b9b7931c in core::ops::function::FnOnce::call_once{{vtable.shim}} ()
   from /home/pnkfelix/.rustup/toolchains/nightly-2021-08-22-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-f66b15f11dec651a.so
#13 0x00007f34b77a0fe3 in alloc::boxed::{impl#44}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> ()
    at /rustc/d3e2578c31688619ddc0a10ddf8543bf4ebcba5b/library/alloc/src/boxed.rs:1636
#14 alloc::boxed::{impl#44}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> ()
    at /rustc/d3e2578c31688619ddc0a10ddf8543bf4ebcba5b/library/alloc/src/boxed.rs:1636
#15 std::sys::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/unix/thread.rs:106
#16 0x00007f34b7539927 in start_thread (arg=<optimized out>) at pthread_create.c:435
#17 0x00007f34b75c99e4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

@jyn514 jyn514 added the I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. label Dec 22, 2021
@pnkfelix
Member Author

Rebuilt Rust with an LLVM Debug build (I always forget that it's not enough to just edit the [llvm] section of the config.toml; you need to explicitly force LLVM to be rebuilt).

I hit the assertion:

rustc: llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:381: void llvm::MachineRegisterInfo::replaceRegWith(llvm::Register, llvm::Register): Assertion `FromReg != ToReg && "Cannot replace a reg with itself"' failed.

Here is a pernos.co session for the bug.

I'm going to spend a little while poking around in the Pernos.co session.

If I don't find an obvious bug that way, then I will see if I can make a standalone LLVM test case.

@pnkfelix
Member Author

Rebuilt Rust with an LLVM Debug build (I always forget that it's not enough to just edit the [llvm] section of the config.toml; you need to explicitly force LLVM to be rebuilt).

I hit the assertion:

rustc: llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:381: void llvm::MachineRegisterInfo::replaceRegWith(llvm::Register, llvm::Register): Assertion `FromReg != ToReg && "Cannot replace a reg with itself"' failed.

Part of the problem is that I could not readily tell if that Debug assert was related to the root cause of the bug in play here, or is an artifact of some other unrelated issue.

So I made a quick hack and put this in:

diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 06d827de2e96..aef9ba02b000 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -161,7 +161,9 @@ void CombinerHelper::applyCombineCopy(MachineInstr &MI) {
   Register DstReg = MI.getOperand(0).getReg();
   Register SrcReg = MI.getOperand(1).getReg();
   MI.eraseFromParent();
-  replaceRegWith(MRI, DstReg, SrcReg);
+  if (DstReg != SrcReg) {
+      replaceRegWith(MRI, DstReg, SrcReg);
+  }
 }

 bool CombinerHelper::tryCombineConcatVectors(MachineInstr &MI) {

(this side-steps the assertion failure by not doing the replaceRegWith call on the instruction at all when the two registers are the same. In theory this should have no effect on the semantics of the program output.)
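
To make the "no effect on semantics" argument concrete, here is a toy model of the guard in Rust. It is not LLVM code: registers are plain integers, the "function" is just a slice of operands, and both function names are stand-ins chosen for this sketch. It only illustrates that replacing a register with itself would leave every operand unchanged, so skipping the call changes nothing except that the assertion is never reached.

```rust
/// Toy stand-in for rewriting every use of register `from` to `to`.
fn replace_reg_with(operands: &mut [u32], from: u32, to: u32) {
    // Mirrors the debug assertion quoted above.
    assert!(from != to, "Cannot replace a reg with itself");
    for op in operands.iter_mut() {
        if *op == from {
            *op = to;
        }
    }
}

/// Guarded caller in the spirit of the quick hack: skip the rewrite when the
/// destination and source are the same register.
fn apply_combine_copy(operands: &mut [u32], dst: u32, src: u32) {
    if dst != src {
        replace_reg_with(operands, dst, src);
    }
}
```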

After doing this, I again get a segfault from the resulting build of rustc itself.

(It's still possible that the Assert does have the same root cause, or a related one. But I'm hoping I will have an easier time identifying the actual root cause of the segfault itself this way.)

This is a link to a new pernos.co session (still being built as of this writing) that uses a binary with the LLVM diff above:

https://pernos.co/debug/WFSi29iBSkbBC0n-bHlqKA/index.html

@pnkfelix
Member Author

pnkfelix commented Dec 27, 2021

Looks like this invocation of RegisterBankInfo::getRegBank from LLVM is returning nullptr, and that line immediately dereferences it.

const RegisterBank &SrcRegBank = *RBI.getRegBank(SrcReg, MRI, TRI);

Pernosco link (if that link makes you log into github, do so, and then go back and follow the link again; in my experience, much of the specific context is lost during the github login process.)
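
The failure pattern here is a lookup that can return "nothing" followed by an unconditional dereference. The toy model below shows the same shape in Rust with the missing case handled explicitly. It is purely illustrative and is not the upstream fix: the `RegBank` enum, the map of registers to banks, and both function names are assumptions made for this sketch.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RegBank {
    Gpr,
    Fpr,
}

/// Toy lookup: returns None when the register has no bank assigned, the
/// analogue of `getRegBank` returning nullptr.
fn get_reg_bank(banks: &HashMap<u32, RegBank>, reg: u32) -> Option<RegBank> {
    banks.get(&reg).copied()
}

/// Checks the lookup result before using it and reports which register
/// lacks a bank, instead of dereferencing unconditionally.
fn select_copy(banks: &HashMap<u32, RegBank>, src: u32) -> Result<RegBank, String> {
    get_reg_bank(banks, src)
        .ok_or_else(|| format!("register %{} has no register bank", src))
}
```

Turning the missing bank into a reportable error names the offending register at the point of the lookup, which is the information the segfault alone did not provide.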

@pnkfelix
Member Author

pnkfelix commented Jan 4, 2022

Okay, newest update:

  • The pernosco links above are not as useful as I would like, because they do not have any logging output from LLVM, so it's hard to understand the intermediate states of things as you work through the code.
  • I tried doing runs with a debug-enabled LLVM and passing the -debug flag as an LLVM arg. This, on its face, caused my Rust-debug-noopt build time to go from 48 seconds to, um, five days or so. 😆
  • In parallel, I followed @jswrenn 's advice to try making a no_core version of the MCVE, in hopes that would reduce the build times sufficiently that LLVM -debug would be worthwhile.
  • Long story short: It took a day to make a no_core version. It takes less than 2 seconds to build! (An intermediate no_std version took 20 seconds to build.) Passing the -debug LLVM arg to this variant does not blow up the build times the way it did on the original example.
  • The new MCVE is at this repo; it is now two files, instead of three.
  • And, here is the newest pernosco link, now with LLVM debug output.
    • The above pernos.co trace is built with a slightly edited version of LLVM. Namely, I added some assertions that correspond to checks that we end up failing later on in the control flow.

The problematic instruction seems to be this:

Selecting: 
  $x1 = COPY %10:_(s64)

(I do not know what that syntax means, namely the _(s64) part. Continuing to dig. update: oh, the :_(s64) is probably a type annotation, where s64 signifies a 64-bit scalar (LLVM's low-level types do not record signedness) and the _ means no register class or bank has been assigned. I.e. I don't think the type itself is relevant to the core issue here.)
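
As an aid to reading the operand in that COPY, here is a small Rust sketch that splits a printed virtual-register operand such as %10:_(s64) into its parts. It reflects my reading of LLVM's printed MIR form: the number after % is the virtual register, the part after the colon is a register class or bank with _ meaning none assigned, and the parenthesized part is the low-level type. The struct and function names are hypothetical, and the parser handles only this one operand shape.

```rust
#[derive(Debug, PartialEq)]
struct VRegOperand {
    /// Virtual register number, e.g. 10 for `%10`.
    number: u32,
    /// Register class or bank, or None when printed as `_` (unassigned).
    class_or_bank: Option<String>,
    /// Low-level type such as `s64`; None if no type is printed.
    ty: Option<String>,
}

/// Parses operands of the form `%N`, `%N:class` or `%N:class(type)`.
/// Returns None for anything else (for example a physical register like `$x1`).
fn parse_vreg_operand(text: &str) -> Option<VRegOperand> {
    let rest = text.trim().strip_prefix('%')?;
    let (num, rest) = match rest.split_once(':') {
        Some((n, r)) => (n, r),
        None => (rest, ""),
    };
    let number = num.parse().ok()?;
    // Split an optional trailing "(type)" off the class/bank part.
    let (class, ty) = match rest.split_once('(') {
        Some((c, t)) => (c, Some(t.strip_suffix(')')?.to_string())),
        None => (rest, None),
    };
    let class_or_bank = match class {
        "" | "_" => None,
        other => Some(other.to_string()),
    };
    Some(VRegOperand { number, class_or_bank, ty })
}
```

Under that reading, the source operand of the problematic COPY has a type but no register bank, which would be consistent with the bank lookup described in the earlier comment coming back empty.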

@cuviper
Member

cuviper commented Jan 11, 2022

I was just debugging rayon-rs/rayon#911 that also bisected to the cargo update in #89802, which led me to this issue. This is another case of rustdoc + LTO failure, but instead of a compiler crash I've got bad codegen that crashes at runtime. I'm hoping it's the same root cause in LLVM that we're both looking for, so maybe that different angle will help.

@pnkfelix
Member Author

(I'm going to count the MCVE from my repo as good enough to remove the E-needs-mcve from this ticket.)

@rustbot label: -E-needs-mcve

@rustbot rustbot removed the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Jan 19, 2022
@pnkfelix
Member Author

Sweet, @nikic massively reduced the test case further (from LLVM's perspective) and has a candidate patch up; see progress over on llvm/llvm-project#53315

@pnkfelix
Member Author

pnkfelix commented Jan 24, 2022

On the LLVM side, this was fixed by llvm/llvm-project@0d1308a

I'm going to see about cherry-picking that into the rustc local fork of LLVM.

@pnkfelix
Member Author

(I'm not sure there's much value in encoding this specific reproducer as a regression test for rustc, since it's so tightly coupled to LLVM-IR details... so I didn't include a regression test in the PR #93426. But I'm open to counter-arguments...)

@bors bors closed this as completed in e0a55f4 Jan 28, 2022
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jun 10, 2022
…, r=oli-obk

refactor write_output_file to merge two invocation paths into one.

this is a trivial refactor I did while I was investigating issue rust-lang#91671.