Bit-for-bit deterministic / reproducible builds #34902
Comments
In general, being reproducible is something we're interested in; we try to tackle it bug by bug. |
This probably has some obvious cause:
If someone wants to look at the code generation differences, it's probably best to start with libcore. The reproducible-builds.org diff isn't that useful for that because it doesn't recognize .rlib files as archives. |
Thanks for the heads up. I just added AR support to diffoscope, and hopefully the diff output will reflect that in the next few weeks, when the website updates. |
If it's a |
I tried replacing HashMap with FnvHashMap in
perhaps i'm Doing It Wrong. |
As background, HashMap in Rust uses a randomly seeded hash function, so its iteration order is non-deterministic; this protects against certain types of DoS attack. You can switch to the deterministic FnvHashMap if you're sure your code will always be called in a safe manner. This ought to be true for rustc itself. (I notice some online "try-it-yourself" Rust web services let me run "ls /" and other shell commands, so I could also exploit this HashMap ordering issue, but I assume they're clever enough to set a ulimit and/or containerise the thing.) |
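To illustrate the point about deterministic hashing, here is a minimal sketch using only the standard library. The names `Fnv1a`, `DetHashMap`, `fnv1a` and `key_order` are made up for this example and are not rustc's own `FnvHashMap` type; the hasher is a hand-rolled 64-bit FNV-1a. It assumes that a `HashMap` with a fixed hasher and the same insertion sequence iterates in the same order on every run, which holds in practice but is not a documented guarantee.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hand-rolled 64-bit FNV-1a hasher (illustrative; not rustc's own type).
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // Fixed FNV offset basis: no per-process random seed.
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            // Multiply by the 64-bit FNV prime, wrapping on overflow.
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A `HashMap` whose hash function is the same in every process.
pub type DetHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hash a byte string directly with the hasher above.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hasher = Fnv1a::default();
    hasher.write(bytes);
    hasher.finish()
}

/// Insert the keys in the given order and return the map's iteration order.
pub fn key_order(keys: &[&str]) -> Vec<String> {
    let mut map: DetHashMap<String, usize> = DetHashMap::default();
    for (i, key) in keys.iter().enumerate() {
        map.insert((*key).to_string(), i);
    }
    map.keys().cloned().collect()
}
```

With the default `RandomState` hasher, the result of `key_order` can change from one process to the next; with the fixed hasher it should not. The trade-off is the one described above: a fixed hash function gives up the DoS protection.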
I wrote a small script to diff metadata (I couldn't really get the makefile to work). It looks like even more is changing between two compiles using the current master (or nightly): The crate hash and/or disambiguator, which are stored right after the target triple. @infinity0 can you confirm? EDIT: That might get fixed by #35854 |
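For anyone who wants to compare two build outputs by hand, a rough sketch of a byte-level comparison follows. This is not the script mentioned above, and it knows nothing about the metadata format; `first_difference` and `first_difference_in_files` are hypothetical helper names, and the caller supplies the file paths.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Offset of the first byte at which `a` and `b` differ.
/// If one input is a strict prefix of the other, the offset is the
/// length of the shorter input. Returns `None` for identical inputs.
pub fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b.iter()).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

/// Read two files fully into memory and compare them byte by byte.
pub fn first_difference_in_files(left: &Path, right: &Path) -> io::Result<Option<usize>> {
    let a = fs::read(left)?;
    let b = fs::read(right)?;
    Ok(first_difference(&a, &b))
}
```

A result of `None` means the two artifacts are bit-for-bit identical; otherwise the offset gives a starting point for inspecting the files with a hex viewer or diffoscope.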
Okay, apparently replacing |
The remaining issue is that (some?) predicates are encoded in non-deterministic order. Maybe we need to do this for all bounds? Not sure what would be the correct way to do that (or why these are nondeterministic in the first place). |
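One general way to make encoded output independent of hash-set iteration order is to sort the items before writing them. The sketch below shows that idea only; it is not how rustc encodes predicates, and `encode_stable` and the length-prefixed layout are invented for this example.

```rust
use std::collections::HashSet;

/// Encode a set of strings in sorted order so that the output bytes
/// do not depend on the set's (randomized) iteration order.
pub fn encode_stable(items: &HashSet<String>) -> Vec<u8> {
    // Collect references and sort them lexicographically.
    let mut sorted: Vec<&String> = items.iter().collect();
    sorted.sort();

    let mut out = Vec::new();
    for item in sorted {
        // Length prefix (little-endian u32) followed by the UTF-8 bytes.
        out.extend_from_slice(&(item.len() as u32).to_le_bytes());
        out.extend_from_slice(item.as_bytes());
    }
    out
}
```

Two sets with the same contents then encode to the same bytes regardless of insertion order or hash seed. Whether sorting is the right fix for bounds depends on whether their order carries meaning, which the comment above leaves open.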
Just for reference, there's this: #34805 |
@michaelwoerister Thanks! Too tired to think about it currently. Will have a look tomorrow :) |
#24473 is rustdoc subset of this. |
Use `FnvHashMap` in more places
* A step towards rust-lang#34902 (see my comments there)
* More stable error messages in some places related to crate loading
* Possible slight performance improvements since all `HashMap`s replaced had small keys where `FnvHashMap` should be faster (although I didn't measure) (likewise for `HashSet` -> `FnvHashSet`)
Steps towards reproducible builds
cc #34902
Running `make dist` twice will result in a rustc tarball where only `librustc_back.so`, `librustc_llvm.so` and `librustc_trans.so` differ. Building `libstd` and `libcore` twice with the same compiler and flags produces identical artifacts.
The third commit should close #24473
Potentially relevant, from IRC:
though I think that it's not a big issue unless the hashtable is truly random. |
Fix suggestion is changing that to be a |
In case this helps to motivate anyone, we got a successful reproduction on i386 on Debian testing! https://tests.reproducible-builds.org/debian/rb-pkg/testing/i386/rustc.html This may or may not be "an accident", we'll have to see what future tests show. Also on this arch/platform we are fixing the build path (just to see how it does). For other arch/platforms we vary the build path, and haven't seen rustc reproduce there yet. |
At least when building with debuginfo, the build path will show up in the resulting binaries as the |
In gcc (and clang), the |
@jmesmon No, we don't have an option like that. If you open an issue, I'd be happy to discuss it with the @rust-lang/tools team. |
@jgalenson awesome! I'm working on a project for BFT consensus-driven proofs of reproducible builds, and that'd make an amazing test case: https://github.com/iqlusioninc/synchronicity |
I was able to get reproducible builds of rustc with debuginfo by upgrading to a newer version of the compiler_builtins crate that contains rust-lang/compiler-builtins@ca423fe (so at least 0.1.20 should work, although I used 0.1.22). With that, I no longer needed my last remaining patch. I also had to use a newer C++ compiler that supports the That's a convoluted setup, but @infinity0, can you reproduce it? Your diff seemed to contain some other differences mine did not, so some things might remain. @tarcieri that seems pretty cool. Have you gotten it working yet? |
@jgalenson I'm already including that compiler-builtins. tests.r-b.org results for rustc 1.39 are out; the results are pretty similar to mine above: unreproducible, including build paths relating to the |
I should mention that the differences include more than just C++ compiler output, which is what you seem to be suggesting is the last remaining thing, from your previous comment. My own testing and tests.r-b.org's testing indicates otherwise. At least the syn crate is embedding build paths. |
I investigated the gcc bug I mentioned. If I take a trivial .c file and compile it with This comes up because compiler-builtins includes a few .S files. This miscompilation then causes a number of other crates to be non-reproducible, including rustc_{driver,llvm} and cargo. Those all go away once I use Clang instead of gcc. I don't, however, see a difference in syn. The tests.r-b.org link you posted no longer works for me because it seems to fail to build. But if you can send me the config.toml you're using, I'll see if I can reproduce it myself. In addition, you can try using my config and adding in |
Filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93371 for the GCC bug. As a short-term workaround, compiler-builtins could pass both |
I didn't get around to looking into the above issues in more detail, but I just noticed that tests.r-b.org for rustc has been showing "reproducible" for 1.44.1 including under varying build paths (unstable, experimental)! 1.45.0 is in-progress and we should have results in just a few more days. |
We should use individual issues tagged A-reproducibility going forward. Closing. |
OK, well, the results for 1.45.0 say "unreproducible" again, and the auto-generated diff did not complete after 2 hours. Shall I open an issue for CI integration tests on this? If I understand correctly, there are concerns about it taking up too many resources, or something. |
Yes, I think we should open a new issue for reproducible build CI. I also think it would be impractical for a while due to computational demand, but then I am not a member of the infra team. |
It would be good if rustc could generate bit-for-bit reproducible results, even in the presence of minor system environment differences. Currently we have quite a large diff: e.g. see txt diff for 1.9.0 or perhaps txt diff for 1.10.0 a few days after I'm posting this. (You might want to "save link as" instead of displaying it directly in the browser.)
Much of the diff output is due to build-id differences, which can be ignored since they are caused by other deeper issues and will go away once these deeper issues are fixed. One example of a deeper issue is this:
Here are the system variations that might be causing these issues. I myself am not that familiar with ELF, but perhaps someone else here would know why the section header is 4 bytes later in the first vs second builds.