-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc test "Couldn't compile the test" on aarch64 + LTO #91671
Comments
(by the way, the reason that I didn't include |
Here are the relevant commits over the period where this bug was injected:
|
(it would be good to further minimize this. if possible I'd like to rewrite it as a normal test rather than a rustdoc one, if possible.) |
@pnkfelix can you run |
It seems strange this only happens with doctests - does it also happen if you add a second crate that depends on |
Bisecting worked fine for me. The regression pinpointed to the cargo update #89802 bisected with cargo-bisect-rustc v0.6.1searched toolchains d7c97a0 through dfc5add Regression in a16f686 searched nightlies: from nightly-2021-10-13 to nightly-2021-12-08 Host triple: aarch64-apple-darwin cargo bisect-rustc 2021-10-13 -- test --release --doc -- --nocapture |
Adding |
I figured out how to use |
Also, since I'm staring at a bunch of verbose command output, and the problem was already bisected to a cargo update #89802 ... maybe the real culprit a change to how In 2021-10-13, the
In 2021-10-14, the
Leveraging Emacs ediff-mode, these invocations have only three differences:
So... did |
Aha, I bet its fallout from this new cargo feature rust-lang/cargo#9943 |
Running the 2021-10-14 So this seems like it is probably exposing some LTO-related bug, one that may well predate nightly-2021-10-14. Its non-trivial to try to test this thesis against the 2021-10-13 build, though, because its not simply a matter of adding
So I'm guessing more parts of the build recipe would need to be revised to pass And I think before I put in that kind of effort, I'm first going to investigate whether we can reduce this to a smaller MCVE (preferably one that doesn't need to pull in |
The problem isn't tokio-specific; I was able to get the same result with an MCVE variant that used Rayon instead of Tokio, with code like this:
but its does not duplicate with just any old crate either. (I tried using an ancient demo crate of mine, add3, and that was not sufficient to witness the problem here. |
@pnkfelix did you make progress with your idea to use a wrapper script to get the rustc args? That seems like an easy way to avoid making this reliant on rustdoc (i.e make it possible to reproduce with an older version of cargo). |
I was indeed able to get wrapper script to print the rustc args. But since then, I've decided to focus on first reducing the MCVE further. E.g. at this point I've found that just an I want to try to minimize the set of crate dependencies first, and then I'll switch back to extracting the series of |
@rustbot label: +P-high -I-prioritize |
Okay, I've minimized it to a tiny fragment of rayon with no further dependencies beyond. I still need two distinct crates (and the rustdoc invocation), but that's mostly because I wanted to get to this point before I looked more into reducing that aspect of things. (Well, that, and the fact that its an LTO bug pretty much implies that an MCVE is going to require multiple crates.) Here's the repo where I've checkpointed my work: https://github.com/pnkfelix/issue-91671-a64-doctestfail |
Okay, I was successful in getting The result is that I'm going to keep investigating. It seems very likely to be some sort of issue with LLVM. |
The segfault appears to have been injected between nightly-2021-08-21 nightly-2021-08-22. |
Skimming over that, I'm betting this is connected to the LLVM 13 upgrade in #87570. |
Sweet! I generalized my repro.sh script to support cross-compilation, and now I can reproduce the seg fault on Linux x86_64! This means I get to use pernos.co to debug the segmentation fault! |
FYI. (I haven't made a local build with better debugging info yet. But the stack trace seems to solidly point to LLVM, as previously hypothesized.)
|
Rebuilt Rust with an LLVM Debug build (I always forget that its not enough to just edit the I hit the assertion:
Here is a pernos.co session for the bug. I'm going to spend a little while poking around in the Pernos.co session. If I don't find an obvious bug that way, then I will see if I can make a standalone LLVM test case. |
Part of the problem is that I could not readily tell if that Debug assert was related to the root cause of the bug in play here, or is an artifact of some other unrelated issue. So I made a quick hack and put this in: diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 06d827de2e96..aef9ba02b000 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -161,7 +161,9 @@ void CombinerHelper::applyCombineCopy(MachineInstr &MI) {
Register DstReg = MI.getOperand(0).getReg();
Register SrcReg = MI.getOperand(1).getReg();
MI.eraseFromParent();
- replaceRegWith(MRI, DstReg, SrcReg);
+ if (DstReg != SrcReg) {
+ replaceRegWith(MRI, DstReg, SrcReg);
+ }
}
bool CombinerHelper::tryCombineConcatVectors(MachineInstr &MI) { (this side-steps the assertion failure by not doing the After doing this, I again get a segfault from the resulting build of (Its still possible that the Assert does have the same root cause, or a related one. But I'm hoping I will have an easier time identifying the actual root cause of the segfault itself this way.) This is a link to a new pernos.co session (still being built as of this writing) that uses a binary with the LLVM diff above: |
Looks like this invocation of RegisterBankInfo::getRegBank from LLVM is returning nullptr, and that line immediately dereferences it. const RegisterBank &SrcRegBank = *RBI.getRegBank(SrcReg, MRI, TRI); Pernosco link (if that link makes you log into github, do so, and then go back and follow the link again; from my experience, the much of specific context is lost during the github login process.) |
Okay, newest update:
The problematic instruction seems to be this:
(I do not know what that syntax means, namely the |
I was just debugging rayon-rs/rayon#911 that also bisected to the cargo update in #89802, which led me to this issue. This is another case of rustdoc + LTO failure, but instead of a compiler crash I've got bad codegen that crashes at runtime. I'm hoping it's the same root cause in LLVM that we're both looking for, so maybe that different angle will help. |
Sweet, @nikic massively reduced the test case further (from LLVM's perspective) and has a candidate patch up; see progress over on llvm/llvm-project#53315 |
On the LLVM side, this was fixed by llvm/llvm-project@0d1308a i'm going to see about cherry-picking that into the rustc local fork of LLVM. |
(I'm not sure there's much value in encoding this specific reproducer as a regression test for rustc, since its so tightly coupled to LLVM-IR details... so I didn't include a regression test in the PR #93426. But I'm open to counter-arguments...) |
…, r=oli-obk refactor write_output_file to merge two invocation paths into one. this is a trivial refactor I did while I was investigating issue rust-lang#91671.
Update 2021-12-27: More minimized MCVE at repo here (also linked from comment below.)
I tried this code (note that the crate name needs to match its usage in the doc test to reproduce the bug properly):
And then ran
cargo --release --doc
.I expected to see this happen:
(Working behavior from 1.56.1, witnessed via nightly-2021-10-13)
Instead, this happened:
(Broken behavior from 1.57.0, witnessed via nightly-2021-10-14)
Meta
rustc +nightly-2021-10-14 --version --verbose
:rustc 1.57.0-nightly (dfc5add 2021-10-13)
binary: rustc
commit-hash: dfc5add
commit-date: 2021-10-13
host: aarch64-apple-darwin
release: 1.57.0-nightly
LLVM version: 13.0.0
rustc 1.57.0 (f1edd04 2021-11-29)
binary: rustc
commit-hash: f1edd04
commit-date: 2021-11-29
host: aarch64-apple-darwin
release: 1.57.0
LLVM version: 13.0.0
The text was updated successfully, but these errors were encountered: