Compiling large projects with over-heating CPUs causes weird crashes #88902

Closed
Gelox opened this issue Sep 13, 2021 · 10 comments
Labels
C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Gelox

Gelox commented Sep 13, 2021

When I compile a large project (500+ dependencies), I experience strange crashes, usually due to SIGSEGV (signal 11), sometimes due to other things.
This happens after the build has been running for a while and my CPU temperature starts getting high. I have other issues that are caused by the CPU overheating, so I find this to be the likely cause.

After compiling about 150-300 dependencies, the compiler crashes with some error message and I have to recompile. Recompiling works fine: the already compiled dependencies are in the cache, so I redo it a couple of times and it works. Another piece of evidence that this is due to the CPU overheating is that the longer I wait between recompiles, the longer the build is able to run without crashing.

Here is an error I got today caused by this issue.

Meta

rustc --version --verbose:

rustc 1.55.0 (c8dfcfe04 2021-09-06)
binary: rustc
commit-hash: c8dfcfe046a7680554bf4eb612bad840e7631c4b
commit-date: 2021-09-06
host: x86_64-unknown-linux-gnu
release: 1.55.0
LLVM version: 12.0.1

Error output

thread 'rustc' panicked at 'index out of bounds: the len is 971 but the index is 2132081228', compiler/rustc_span/src/hygiene.rs:389:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.55.0 (c8dfcfe04 2021-09-06) running on x86_64-unknown-linux-gnu

note: compiler flags: -C embed-bitcode=no -C debuginfo=2 --crate-type lib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
end of query stack
error: could not compile `object`
warning: build failed, waiting for other jobs to finish...
error: build failed
Backtrace

/home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-2d8919e595cbef4e.so(+0x52fca3)[0x7f1588c30ca3]
/usr/lib/libpthread.so.0(+0x13870)[0x7f1588353870]
error: could not compile `reqwest`

Caused by:
  process didn't exit successfully: `rustc --crate-name reqwest --edition=2018 /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.10.10/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 --cfg 'feature="__tls"' --cfg 'feature="blocking"' --cfg 'feature="default"' --cfg 'feature="default-tls"' --cfg 'feature="hyper-tls"' --cfg 'feature="json"' --cfg 'feature="native-tls-crate"' --cfg 'feature="serde_json"' --cfg 'feature="tokio-tls"' -C metadata=df5ad68783142d89 -C extra-filename=-df5ad68783142d89 --out-dir <--removed--> -L dependency=<--removed-personal-info-->-b47fd69f7fd43a4e.rmeta --cap-lints allow` (signal: 11, SIGSEGV: invalid memory reference)
warning: build failed, waiting for other jobs to finish...
error: build failed

@Gelox Gelox added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 13, 2021
@nagisa
Member

nagisa commented Sep 13, 2021

I'm not sure we can do anything about the CPU overheating on our end. That said, modern consumer CPUs should throttle and shut down well before heat causes invalid computations to occur, so it seems like there may be another stability problem in there as well (e.g. a memory overclock gone wrong?).

What CPU is it? Some, such as the 1st generation Ryzen, are known to have a hardware bug that affects compilation workloads, for instance.

@mohe2015
Contributor

Couldn't you also try to run with less parallelism?

@Gelox
Author

Gelox commented Sep 13, 2021

My CPU model is: Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz

I can run with less parallelism, and I do when the build can not cache for whatever reason. That works fine and I have never experienced an error when running with --jobs 1.

This isn't a critical issue for me, mostly just annoying. I'm reporting it mostly because I wasn't sure if rustc was supposed to fail in this way when overheating since the errors seem like something went wrong.
If this is not something that can be solved then that's fine, at least people will be able to search for this issue if it happens to them and know the current thoughts on it.
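For anyone who finds this issue later, the reduced-parallelism workaround can be scripted. The following is only a minimal sketch, assuming `cargo` is on the `PATH`. The helper names `cargo_build_args` and `job_schedule` are made up for illustration; only the `--jobs` flag comes from the discussion above.

```rust
use std::process::Command;

/// Arguments for `cargo build` limited to `jobs` parallel jobs.
fn cargo_build_args(jobs: u32) -> Vec<String> {
    vec!["build".to_string(), "--jobs".to_string(), jobs.to_string()]
}

/// Hypothetical retry policy: start at `start` jobs and halve after each
/// failed attempt, ending with a single job.
fn job_schedule(start: u32) -> Vec<u32> {
    let mut jobs = start.max(1);
    let mut schedule = vec![jobs];
    while jobs > 1 {
        jobs /= 2;
        schedule.push(jobs);
    }
    schedule
}

fn main() {
    for jobs in job_schedule(4) {
        // Already compiled dependencies stay cached between attempts.
        match Command::new("cargo").args(cargo_build_args(jobs)).status() {
            Ok(status) if status.success() => {
                println!("build succeeded with --jobs {jobs}");
                return;
            }
            Ok(status) => eprintln!("build failed with --jobs {jobs}: {status}"),
            Err(err) => {
                eprintln!("could not run cargo: {err}");
                return;
            }
        }
    }
}
```

This only works around the symptom. It does not make results from an unstable machine trustworthy.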

@mohe2015
Contributor

mohe2015 commented Sep 13, 2021

@Gelox Is your PC self-built or something? And does the fan spin? And how high temperatures are we talking about? You may also want to check for your distribution how/whether microcode patches are applied.

@the8472
Member

the8472 commented Sep 13, 2021

Does it always crash in the same place? Or does it crash faster if you start compilation after the system has already been under load for some time?

Stability issues under load are not always due to temperature. They can also be due to other things like insufficient power supply or too aggressive performance settings in the BIOS.

But assuming this is really a hardware issue then there's not much rust can do here.

@Gelox
Author

Gelox commented Sep 16, 2021

> @Gelox Is your PC self-built or something? And does the fan spin? And how high temperatures are we talking about? You may also want to check for your distribution how/whether microcode patches are applied.

My PC is self built, the fan spins and the temperature is around 90 degrees C when the crash happens.

> Does it always crash in the same place? Or does it crash faster if you start compilation after the system has already been under load for some time?
>
> Stability issues under load are not always due to temperature. They can also be due to other things like insufficient power supply or too aggressive performance settings in the BIOS.
>
> But assuming this is really a hardware issue then there's not much rust can do here.

It crashes sooner if it has crashed shortly before, so it seems closely tied to the CPU temperature. I don't know how to check the power requirement. Perhaps the temperature is only correlated with the real cause, but I find that unlikely since the temperature is quite high.
I realize there isn't anything rustc could do to prevent the compilation from faulting. I opened the issue in the hope of getting some form of graceful shutdown in this case, but I don't think it's very important and I don't know how easy it would be to implement.

I guess another concern I might have is if this could reasonably affect correctness in the compilation process or perhaps cause undefined behavior. I'm not sure what the implications of a non-graceful crash like this are.

However, if this does not alarm anyone then I think closing this issue is fine.

@mohe2015
Contributor

> > @Gelox Is your PC self-built or something? And does the fan spin? And how high temperatures are we talking about? You may also want to check for your distribution how/whether microcode patches are applied.
>
> My PC is self built, the fan spins and the temperature is around 90 degrees C when the crash happens.

I have literally no idea about this, but quick googling suggests that 90°C is probably too high for longer periods (although miscomputations can probably happen anytime then). Maybe the heat spreader or thermal paste is not perfectly applied. It could of course also be some other problem, like bad sensors. You are not overclocked, are you? Because that could definitely be a problem.

> > Does it always crash in the same place? Or does it crash faster if you start compilation after the system has already been under load for some time?
> > Stability issues under load are not always due to temperature. They can also be due to other things like insufficient power supply or too aggressive performance settings in the BIOS.
> > But assuming this is really a hardware issue then there's not much rust can do here.

I agree bad power supplies are a common issue with all kinds of strange behaviour.

> It crashes sooner if it has crashed shortly before, so it seems closely tied to the CPU temperature. I don't know how to check the power requirement. Perhaps the temperature is only correlated with the real cause, but I find that unlikely since the temperature is quite high.
> I realize there isn't anything rustc could do to prevent the compilation from faulting. I opened the issue in the hope of getting some form of graceful shutdown in this case, but I don't think it's very important and I don't know how easy it would be to implement.

Graceful shutdown should be more or less impossible. You could probably find out how to adapt thermal throttling for your operating system and set stricter values but I don't know if this is possible.

> I guess another concern I might have is if this could reasonably affect correctness in the compilation process or perhaps cause undefined behavior. I'm not sure what the implications of a non-graceful crash like this are.
>
> However, if this does not alarm anyone then I think closing this issue is fine.

I think there is nothing really to do here on rustc's side. If you would like more assurance in your compilation results (e.g. for a production build), reproducible builds on multiple machines may be a possibility, but this is out of scope.

@the8472
Member

the8472 commented Sep 16, 2021

> My PC is self built, the fan spins and the temperature is around 90 degrees C when the crash happens.

Assuming you're looking at the junction temperature (there are multiple temperature sensors in a CPU): for the i5-3570K Tjmax is 105°C. That means at that point it would start downclocking. Emergency poweroff would only happen above that temperature.

So either you're looking at the wrong temperature or it's not yet close to the thermal limits at that point. Anyway, finding hardware faults is tricky, you might want to take that to a forum where people have more experience.
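To see which sensors the system actually reports, a small program can list them. This is a rough sketch, assuming a Linux system that exposes `/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius. Which zone, if any, corresponds to the CPU junction temperature varies by machine, and the CPU core sensors may instead be exposed through the hwmon interface. The 105°C limit is the i5-3570K figure quoted above; `parse_millidegrees` is an illustrative helper name.

```rust
use std::fs;

/// Tjmax for the i5-3570K as quoted in this thread, in degrees Celsius.
const TJMAX_C: f64 = 105.0;

/// Parse a sysfs temperature reading (millidegrees Celsius) into degrees.
fn parse_millidegrees(raw: &str) -> Option<f64> {
    raw.trim().parse::<i64>().ok().map(|m| m as f64 / 1000.0)
}

fn main() {
    // Assumption: the kernel exposes thermal zones under this directory.
    let entries = match fs::read_dir("/sys/class/thermal") {
        Ok(entries) => entries,
        Err(err) => {
            eprintln!("no thermal sysfs directory: {err}");
            return;
        }
    };
    for entry in entries.flatten() {
        let name = entry.file_name().to_string_lossy().into_owned();
        if !name.starts_with("thermal_zone") {
            continue;
        }
        let path = entry.path();
        // `type` names the sensor; `temp` holds the current reading.
        let kind = fs::read_to_string(path.join("type")).unwrap_or_default();
        let temp = fs::read_to_string(path.join("temp"))
            .ok()
            .and_then(|raw| parse_millidegrees(&raw));
        if let Some(celsius) = temp {
            println!(
                "{name} ({}): {celsius:.1} C, {:.1} C below Tjmax",
                kind.trim(),
                TJMAX_C - celsius
            );
        }
    }
}
```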

> I realize there isn't anything rustc could do to prevent the compilation from faulting. I opened the issue in the hope of getting some form of graceful shutdown in this case, but I don't think it's very important and I don't know how easy it would be to implement.

Generally we assume that the CPU is reliable. If it's miscalculating things during compilation, there's no way to detect this until some invariant is violated, at which point it'll panic or crash. The error could happen much earlier and only be detected later, so it's not practical to handle this gracefully.
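As a toy illustration of that point (not the code in `hygiene.rs`, and not a claim about what actually went wrong in the reported build), the sketch below simulates a corrupted index. The table length 971 is borrowed from the panic message in the report; `flip_bit` and `lookup` are made-up helpers. A corruption that stays in bounds silently yields the wrong entry, while one that leaves the bounds is only noticed at the next bounds check.

```rust
/// Simulate a hardware fault by flipping one bit of an index.
fn flip_bit(index: usize, bit: u32) -> usize {
    index ^ (1usize << bit)
}

/// Checked lookup: `None` marks the spot where `table[index]` would panic
/// with "index out of bounds".
fn lookup(table: &[u32], index: usize) -> Option<u32> {
    table.get(index).copied()
}

fn main() {
    let table: Vec<u32> = (0..971).collect();
    let index = 500;

    // A low-bit flip stays in bounds: wrong data, no error raised.
    let silent = flip_bit(index, 0);
    println!("index {silent} -> {:?}", lookup(&table, silent));

    // A high-bit flip leaves the bounds: the invariant check finally fires.
    let loud = flip_bit(index, 30);
    println!("index {loud} -> {:?}", lookup(&table, loud));
}
```

The first case is the worrying one: nothing fails, so nothing can be handled gracefully.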

> I guess another concern I might have is if this could reasonably affect correctness in the compilation process or perhaps cause undefined behavior. I'm not sure what the implications of a non-graceful crash like this are.

That is possible if corruption makes it into the intermediate results stored on disk. Some parts of the output are hashed, but hashes are only computed at the end, so if an undetected error happens during computation and then gets persisted, that will lead to incorrect output. The only way to detect this is to run multiple independent compilations (without shared caches) and compare the output. This requires deterministic builds. Rust supports that under some circumstances (#34902), but it can take some effort to set up, especially when you have so many dependencies, some of which might use non-deterministic build scripts.
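The comparison step itself is simple once two independent builds exist. Here is a minimal sketch, assuming the two artifacts were produced by separate runs with separate target directories and that the build is deterministic. `artifacts_match` is a hypothetical helper and the paths are supplied by the caller.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;

/// Return `true` if both files can be read and contain identical bytes.
fn artifacts_match(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(fs::read(a)? == fs::read(b)?)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} <artifact-a> <artifact-b>", args[0]);
        return Ok(());
    }
    if artifacts_match(Path::new(&args[1]), Path::new(&args[2]))? {
        println!("outputs are byte-identical");
    } else {
        println!("outputs differ: non-determinism or corruption");
    }
    Ok(())
}
```

A mismatch alone does not say which build is wrong, or whether a non-deterministic build script is to blame, so a third run may be needed to break the tie.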

@Gelox
Author

Gelox commented Oct 6, 2021

It seems this is not an issue of rustc and as such I will close this issue, thank you everyone for your answers! :)

@Gelox Gelox closed this as completed Oct 6, 2021
@manjaroman2
This comment was marked as a violation of GitHub Acceptable Use Policies