Compiling large projects with over-heating CPUs causes weird crashes #88902
I'm not sure we can do anything about the CPU overheating on our end. That said, modern consumer CPUs should throttle and shut down well before heat causes invalid computations to occur, so it seems like there may be another stability problem in there as well (e.g. a memory overclock gone wrong?). What CPU is it? Some, such as the 1st generation Ryzen, are known to have a hardware bug that affects compilation workloads, for instance.
Couldn't you also try to run with less parallelism?
My CPU model is: Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz
I can run with less parallelism, and I do when the build cannot cache for whatever reason. That works fine and I have never experienced an error when running with
This isn't a critical issue for me, mostly just annoying. I'm reporting it mostly because I wasn't sure if rustc was supposed to fail in this way when overheating, since the errors seem like something went wrong.
@Gelox Is your PC self-built or something? And does the fan spin? And how high are the temperatures we are talking about? You may also want to check how/whether microcode patches are applied on your distribution.
Does it always crash in the same place? Or does it crash faster if you start compilation after the system has already been under load for some time? Stability issues under load are not always due to temperature. They can also be due to other things, like an insufficient power supply or overly aggressive performance settings in the BIOS. But assuming this is really a hardware issue, there's not much Rust can do here.
My PC is self built, the fan spins and the temperature is around 90 degrees C when the crash happens.
It crashes faster if it has crashed shortly before; it seems very tied to the CPU temperature. I don't know how to check the power requirement. Perhaps the temperature is merely correlated, but I find that unlikely since the temperature is quite high. I guess another concern I might have is whether this could reasonably affect correctness in the compilation process or perhaps cause undefined behavior. I'm not sure what the implications of a non-graceful crash like this are. However, if this does not alarm anyone then I think closing this issue is fine.
I have literally no idea about this, but quick googling suggests that 90°C is probably too high for longer periods (although miscomputations can probably happen anytime then). Maybe the heat spreader + thermal paste or so is not perfectly applied. It could of course also be some other problem, like bad sensors etc. You are not overclocked, are you? Because that could definitely be a problem.
I agree bad power supplies are a common issue with all kinds of strange behaviour.
Graceful shutdown should be more or less impossible. You could probably find out how to adapt thermal throttling for your operating system and set stricter values but I don't know if this is possible.
I think there is nothing really to do here on rustc's side. If you would like more assurance in your compilation results (e.g. for a production build), reproducible builds on multiple machines may be a possibility, but this is out of scope.
Assuming you're looking at the junction temperature (there are multiple temperature sensors in a CPU): for the i5-3570K Tjmax is 105°C. That means at that point it would start downclocking. Emergency poweroff would only happen above that temperature. So either you're looking at the wrong temperature or it's not yet close to the thermal limits at that point. Anyway, finding hardware faults is tricky, you might want to take that to a forum where people have more experience.
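To make the headroom arithmetic above concrete, here is a minimal sketch, assuming a Linux system that exposes temperatures in millidegrees Celsius through sysfs files such as `/sys/class/thermal/thermal_zone0/temp`. Which zone (if any) corresponds to the CPU junction sensor varies by machine, so the path passed in and the helper names `parse_millidegrees`, `headroom_c`, and `read_temp_c` are illustrative assumptions, not part of any rustc tooling.

```rust
use std::fs;

/// Parse a sysfs-style reading in millidegrees Celsius (e.g. "90000\n")
/// into degrees Celsius. Returns None if the text is not an integer.
fn parse_millidegrees(raw: &str) -> Option<f64> {
    raw.trim().parse::<i64>().ok().map(|m| m as f64 / 1000.0)
}

/// Degrees of headroom left before the throttle point (Tjmax).
/// A negative value means the reading is already above Tjmax.
fn headroom_c(current_c: f64, tjmax_c: f64) -> f64 {
    tjmax_c - current_c
}

/// Read one temperature file. The path is an assumption: it differs
/// between machines and may not point at the CPU junction sensor.
fn read_temp_c(path: &str) -> Option<f64> {
    let raw = fs::read_to_string(path).ok()?;
    parse_millidegrees(&raw)
}
```

With the numbers from this thread, `headroom_c(90.0, 105.0)` gives 15 degrees of margin before the i5-3570K would start downclocking, which is why a 90°C reading alone does not explain the crashes.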
Generally we assume that the CPU is reliable. If it's miscalculating things during compilation, there's no way to detect this until some invariants are violated, at which point it'll panic or crash. The error could happen much earlier and only be detected later. So it's not practical to handle this gracefully.
That is possible if corruption makes it into the intermediate results stored on disk. Some parts of the output are hashed, but hashes are only computed at the end, so if an undetected error happens during computation and then gets persisted, that'll lead to incorrect output. The only way to detect this is to have multiple independent compilation runs (without shared caches) and compare the output. This requires deterministic builds. Rust supports that under some circumstances (#34902), but it can take some effort to set up, especially when you have so many dependencies, some of which might use non-deterministic build scripts.
It seems this is not an issue of rustc and as such I will close this issue, thank you everyone for your answers! :)
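The comparison step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the two output directories come from independent, deterministic builds with no shared cache, and the helper names `collect_files` and `diff_build_outputs` are hypothetical. It reads every file fully into memory, so a real setup would more likely stream and hash the files.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect file contents, keyed by path relative to `root`.
fn collect_files(
    root: &Path,
    dir: &Path,
    out: &mut BTreeMap<PathBuf, Vec<u8>>,
) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(root, &path, out)?;
        } else {
            let rel = path
                .strip_prefix(root)
                .expect("entries are always under root")
                .to_path_buf();
            out.insert(rel, fs::read(&path)?);
        }
    }
    Ok(())
}

/// Return the relative paths that differ in content or exist on one side only.
/// An empty result means the two build outputs are byte-for-byte identical.
fn diff_build_outputs(a: &Path, b: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files_a = BTreeMap::new();
    let mut files_b = BTreeMap::new();
    collect_files(a, a, &mut files_a)?;
    collect_files(b, b, &mut files_b)?;

    let mut mismatches = Vec::new();
    // Files present in `a` that are missing from `b` or have different bytes.
    for (rel, bytes) in &files_a {
        if files_b.get(rel) != Some(bytes) {
            mismatches.push(rel.clone());
        }
    }
    // Files that exist only in `b`.
    for rel in files_b.keys() {
        if !files_a.contains_key(rel) {
            mismatches.push(rel.clone());
        }
    }
    Ok(mismatches)
}
```

Any mismatch only says that the two runs disagree, not which one is corrupted; a third independent run would be needed to break the tie.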
When I'm compiling a large project (500+ dependencies), I experience strange crashes, usually due to SIGSEGV 11, sometimes due to other things.
This happens after running for a while, when my CPU temperature starts getting high. I have other issues that are caused by an over-heating CPU, so I find this to be the likely cause.
After compiling about 150-300 dependencies, the compiler crashes with some error message and I have to recompile. Recompiling works fine: the already compiled dependencies are in the cache, so I redo it a couple of times and it gets through. Another piece of evidence that this is due to an over-heating CPU is that the longer I wait between recompiles, the longer it is able to run without crashing.
Here is an error I got today caused by this issue.
Meta
rustc --version --verbose:
Error output
Backtrace