-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition when trying to compile a lib multiple times for different purposes. #5444
Comments
Thanks for tracking this down! This I think is the cause of the errors I saw yesterday, right? I may be reaching a different conclusion with this though, when I use your above example I get compliations that look like:
It looks like the bug here is that the library is never compiled in a linkable format with panic=unwind so the binary, when build as a benchmark or test, doesn't have a suitable library to link against. Now I'd definitely imagine that there's also some stomping over hard links going on here! I'm curious though what you think of the above? |
It is either this or #5445 ( What you're describing is the old behavior. With a build that includes #5384,
And because 1 and 2 are built at the same time, and both want to hardlink at the same time, that's what causes this problem. Can you describe what the use cases are for hardlinking libs? I'm guessing it's for integrating with other build systems? Are there other scenarios? Why are only workspace libs elevated (and not dependencies)? |
Aha right sorry my mistake, it looks like that hasn't yet made its way into the nightlies which is why I was confused! Currently yeah it's primarily for integrating into other build systems, but we should only link up the "primary" dependency which in this case is the one with panic=abort. I also think it's fine to actually skip hardlinking rlibs entirely, I don't think any build system integration happens at the rlib layer, only at other staticlib/executable/dylib layers. I think that'd fix the issue, right? If we stopped hardlinking rlibs? |
I think this may have caused this failure: #5456 (comment) |
+1 for hard linking less stuff. Note that #5203 specifically went through several hoops to copy only relevant files to the |
So before #5458 I'm able to reproduce a spurious failure via:
(basically reproduce #5456 (comment)) After that PR, however, I cannot reproduce a failure. Writing a test without that PR as well using the OP here it also doesn't seem to fail. Put another way, I'm not sure if this is still observable after #5458 since panic=unwind and panic=abort have different filenames rather than being placed into the same bucket (and racing). Now I'd definitely believe that the hard link is still nodeterministically being stomped over, but the severity of this issue is probably far less decreased after #5458. Mind confirming that though @ehuss? |
In my tests, other output types have the same problem. (dylib is particularly broken since it doesn't have a hash in the filename.) Another idea I was thinking about is only hardlinking the top-level targets (essentially the targets created in EDIT: Just read your comments, I'm a little behind. |
@alexcrichton I'm still able to repro the example up top with #5458. 5458 should only address #5445. It maybe happens about 10% of the time on my particular system. |
@ehuss and to confirm, by reproduce you mean that it fails spuriously? |
@alexcrichton yes |
Hm, could this failure be related? I think hardlink stomping happens for binaries as well, because, IIRC, panic flag affects them as well? |
I don't think so. This issue would only appear when it attempts to compile the same lib at the same time with panic set. |
This changes it so that only top-level targets requested on the command-line will be included in the output directory. Dependencies are no longer included. Fixes rust-lang#5444.
Be more conservative about which files are linked to the output dir. This changes it so that only top-level targets requested on the command-line will be included in the output directory. Dependencies are no longer included. Fixes #5444.
Check for duplicate output filenames. This generates an error if it detects that different units would output the same file name. This can happen in a few situations: - Multiple targets in a workspace use the same name. - `--out-dir` in some cases (like `examples` because it does not include the directory). - Bugs in cargo (like #5524, #5444) This includes a few cleanups/consolidations: - `export_path` added to `OutputFile` so there is one place where the `--out-dir` filename logic is located. - `TargetKind::description()` added for a simple way to get a human-readable name of a target kind. - The `PartialOrd` changes are a slight performance improvement (overall things are now slightly faster even though it is doing more work). Closes #5524, closes #6293.
As part of #5384 I created some circumstances where the same lib may be built concurrently with different settings. I thought this would be safe since they have different hashes in the filename, but I neglected to address this case when the library is hardlinked. There is a race condition in
hardlink_or_copy
where one compilation will stomp on another via the hardlink.Details for the curious.
This can happen in a variety of situations (mainly with
panic
), but here is a small repro:I'm trying to think of a fix, but I haven't come up with anything, yet. Generally I'm thinking that these "non-panic" dependencies should not be hard linked at all, but there's no obvious way to detect when a
Unit
is one of these. We could add a flag toUnit
to detect this scenario, but that would require adding some extra de-duplication logic and generally seems a little too complex.I don't think it is worth it to try to fix the race condition itself since that is very tricky, and would end up with a random lib in the output directory anyway.
I'll keep trying to think of a solution, but if anyone has any thoughts I'd be happy to hear.
The text was updated successfully, but these errors were encountered: