Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust 1.43 fails to compile itself #71502

Closed
thomasjfox opened this issue Apr 24, 2020 · 24 comments · Fixed by #71782
Closed

Rust 1.43 fails to compile itself #71502

thomasjfox opened this issue Apr 24, 2020 · 24 comments · Fixed by #71782
Labels
C-bug Category: This is a bug.

Comments

@thomasjfox
Copy link

Hi all,

my usual test: Can Rust compile itself after a compiler upgrade.
Rust 1.43 dies at this stage:

Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
running: "/datastore/dev/rust/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "32" "--release" "--frozen" "--features" " llvm" "--manifest-path" "/datastore/rpmbuild/BUILD/rustc-1.43.0-src/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
error: failed to run `rustc` to learn about target-specific information

Caused by:
  process didn't exit successfully: `/datastore/rpmbuild/BUILD/rustc-1.43.0-src/build/bootstrap/debug/rustc - --crate-name ___ --print=file-names --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit code: 1)

--- stderr
error: character literal may only contain one codepoint
 --> <anon>:3:35
  |
3 |          messages will be sent to 'stderr'.
  |                                   ^^^^^^^^
  |
help: if you meant to write a `str` literal, use double quotes
  |
3 |          messages will be sent to "stderr".
  |                                   ^^^^^^^^

error: character literal may only contain one codepoint
 --> <anon>:6:35
  |
6 |          messages will be sent to 'stderr'.
  |                                   ^^^^^^^^
  |
help: if you meant to write a `str` literal, use double quotes
  |
6 |          messages will be sent to "stderr".
  |                                   ^^^^^^^^

error: unterminated character literal
 --> <anon>:9:20
  |
9 | FATAL: Cannot open '/dev/null' for writing.
  |                    ^

error: aborting due to 3 previous errors


command did not execute successfully: "/datastore/dev/rust/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "32" "--release" "--frozen" "--features" " llvm" "--manifest-path" "/datastore/rpmbuild/BUILD/rustc-1.43.0-src/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
expected success, got: exit code: 101

I didn't identify the problematic code yet, I just wanted to create a tracking issue first.
Rust 1.43 can compile other crates fine, I quickly tested rust-zip or rust-askama.

@thomasjfox thomasjfox added the C-bug Category: This is a bug. label Apr 24, 2020
@jonas-schievink
Copy link
Contributor

Do you need this for anything? This is not a supported configuration

@thomasjfox
Copy link
Author

Do you need this for anything? This is not a supported configuration

yes, Linux distributions often need to recompile a (compiler) package with some patches applied or after a major LLVM / glibc / other major update.

See #69953 for the discussion on the Rust 1.42.0 release.

@infinity0
Copy link
Contributor

This is not a supported configuration

Really? I thought this has been an attempted guarantee since years ago. (Even though it was broken for various previous releases.) Debian certainly needs this guarantee and I will have to figure out how to fix this when packaging 1.43 for Debian.

This one seems like a new case not similar to the previous cases, gah.

@jonas-schievink
Copy link
Contributor

Well, we don't test this at all, so it's not surprising that it breaks almost every release. Why can't Debian just use the previous release to bootstrap?

@infinity0
Copy link
Contributor

We do, but the porters (for tier-2 architectures like riscv64) and other people occasionally need to rebuild stuff afterwards, when the old version is no longer available (has been moved to archives).

@Mark-Simulacrum
Copy link
Member

It's not really unsupported so much as not tested. I would ask that if Debian (or others) want this to work we find out before the stable release -- ideally when beta is cut, or earlier -- that way we can do much more to fix it.

I'll try to take a look at this specific failure soon and see if there's a simple patch.

@est31
Copy link
Member

est31 commented Apr 24, 2020

Bootstrapping rustc twice would make merging PRs much slower than it's today. However, maybe one could add it only to the beta branch's CI. Or, alternatively, add rustc to crater. The "old state" would serve as check whether it can bootstrap normally, the "new state" would check whether it can bootstrap itself. There's a proposal to add servo already: rust-lang/crater#133

@Mark-Simulacrum
Copy link
Member

I don't think there's been any proposal that we test this in PR CI? To be clear, I don't currently believe that's something we want to do, though we'd generally be willing to accept PRs that make self-bootstrapping possible, which is why I suggested testing on beta releases (instead of stable releases, where we can't do much so all patches must be downstream).

@est31
Copy link
Member

est31 commented Apr 24, 2020

I don't think there's been any proposal that we test this in PR CI?

It wasn't in response to any proposal or such, just about how to stop regressions like this from happening. It has been mentioned in the thread that it's not being tested. So naturally one would wonder how the testing would look like so I wrote down my thoughts.

Also, yes, optimally the users would test beta releases (question to @thomasjfox have you tested 1.44 beta yet?), and I've filed a pr for rav1e today to do precisely this, but the entire premise of CI as well as crater is to be proactive about breakage.

@infinity0
Copy link
Contributor

Understood, I will make some more effort to test the beta releases! We had done so a few times in the past but not really made it a regular part of the process.

@infinity0
Copy link
Contributor

I don't think there's been any proposal that we test this in PR CI?

It wasn't in response to any proposal or such, just about how to stop regressions like this from happening.

I agree it would be too onerous for regular PRs, but perhaps it could be done when anointing nightly as beta, and then on all PRs to beta, that sounds not too expensive.

@cuviper
Copy link
Member

cuviper commented Apr 27, 2020

@thomasjfox

I didn't identify the problematic code yet, I just wanted to create a tracking issue first.

Have you found this yet? Maybe something wrong with your /dev/null? I also don't see "messages will be sent to 'stderr'" anywhere in the Rust sources, but googling indicates this often comes from syslog. I don't know why that would be fed into your rustc stdin though...

FWIW, the automatic rebuild in Fedora rawhide was just fine:
https://koji.fedoraproject.org/koji/taskinfo?taskID=43860983

@thomasjfox
Copy link
Author

Yesterday evening I continued with the investigation. I managed to recompile the Fedora 31 source rpm on my Fedora 31 workstation. So it must be something triggered by my "custom distro" setup.

The plan for today is to limit the number of parallel build jobs to one and run "forkstat" in the background. I want to isolate the exact command invocation that triggers the error.

The /dev/null device should be fine, the build process is not running inside a container or chroot.

@thomasjfox
Copy link
Author

Have you found this yet? Maybe something wrong with your /dev/null?

I think we might be onto something here. After I had another failed Rust 1.43.0 re-compile attempt, I noticed /dev/null was broken:

# ll /dev/null 
-rw-r--r-- 1 root root 0 Apr 28 18:14 /dev/null

(tmux crashed on startup, that's how I noticed)

I've rebooted the devbox, /dev/null is fine again and I will now compile the previous Rust 1.42.0 twice. If /dev/null stays fine, I'll re-try with Rust 1.43.0 and see if /dev/null gets replaced.

@thomasjfox
Copy link
Author

the only remotely related change to /dev/null handling from 1.42.0 to 1.43.0 is this c2bbe33
in src/libstd/sys/unix/process/process_common.rs. It looks totally unsuspicious to me.

@cuviper
Copy link
Member

cuviper commented Apr 28, 2020

That kind of error usually occurs when one uses something like -o /dev/null, versus redirecting >/dev/null, since -o usually writes temp names first and then renames over the target. I did find this in one test, new in 1.42 via #67458:

// compile-flags: -o /dev/null

@cuviper
Copy link
Member

cuviper commented Apr 28, 2020

When I run that test manually as root, I do indeed get a clobbered /dev/null:

# rustc -o /dev/null src/test/ui/non-ice-error-on-worker-io-fail.rs
warning: unused import: `crate::task::Waker`
  --> src/test/ui/non-ice-error-on-worker-io-fail.rs:34:13
   |
34 |         use crate::task::Waker;
   |             ^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

# ls -l /dev/null
-rw-r--r--. 1 root root 3562 Apr 28 09:48 /dev/null

@Mark-Simulacrum
Copy link
Member

Running tests as root is indeed very trusting :)

But it seems like in this case we should probably write to e.g. /dev/rust-nonexistent-path instead of null to avoid the accidental breakage? Alternatively maybe we can move this test to a make test and chmod our permissions away...

@cuviper
Copy link
Member

cuviper commented Apr 28, 2020

I don't know if a nonexistent path will exercise the intended failure mode, but a simple chmod is insufficient to stop root.

@thomasjfox
Copy link
Author

There's the immutable xattr, even root can't delete those files. This is how Android malware persists across factory resets:
https://arstechnica.com/information-technology/2020/04/solved-how-android-backdoor-called-xhelper-survives-factory-resets/

But requiring filesystem xattrs just for one unit test is a no no.
Another option would be a read only bind mount, but again, the setup is overkill.

May be a read only place in proc like /proc/version would do:

[root@host ~]# rm -f /proc/version 
rm: cannot remove '/proc/version': Operation not permitted

[root@host ~]# echo "test" >>/proc/version 
-bash: echo: write error: Input/output error

I've started a build without the unit test in question for the night, results will be in tomorrow.

@thomasjfox
Copy link
Author

I've started a build without the unit test in question for the night, results will be in tomorrow.

with the mentioned unit test disabled, I can finally build Rust 1.43.0 with itself. Rustception all the way, thanks!

I've verified again it didn't blow up with Rust 1.42.0, even though the unit test was part of it already.
May be the way rustc handles output file creation changed in 1.43.0, so these two in combination led to failure.

I've tested the /proc/version approach:

[root@host]# rustc -o /proc/version non-ice-error-on-worker-io-fail.rs 
warning: unused import: `crate::task::Waker`
  --> non-ice-error-on-worker-io-fail.rs:34:13
   |
34 |         use crate::task::Waker;
   |             ^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

error: failed to write bytecode to /proc/version.non_ice_error_on_worker_io_fail.3a1fbbbh-cgu.0.rcgu.bc.z: No such file or directory (os error 2)

error: aborting due to previous error

Using an existing directory as output file gives a different error message:

[root@host]# rustc -o existing-dir-instead-of-file non-ice-error-on-worker-io-fail.rs 
error: the generated executable for the input file "non-ice-error-on-worker-io-fail.rs" conflicts with the existing directory "existing-dir-instead-of-file"

error: aborting due to previous error

@cuviper
Copy link
Member

cuviper commented Apr 29, 2020

I think -o /does-not-exist/output will be fine, even though it's a different IO error. It still triggers the old ICE on 1.41.0, and is caught as a proper fatal error on 1.42+, even as root.

@Mark-Simulacrum
Copy link
Member

I'd be happy to review a PR here, FWIW, it sounds like it should be a pretty simple update to the UI test.

@thomasjfox
Copy link
Author

Just wanted to say thank you for the quick fix and happy birthday to Rust!

Also Rust 1.43.1 compiled itself without problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants