-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use SIMD instructions on i686 #31110
Conversation
SIMD instructions are not available on all of i686 processors and cause programs to terminate on illegal instruction on older processors.
(rust_highfive has picked a reviewer for you, use r? to override) |
Clearly you weren't willing to understand the issue and went ahead with this blunt PR. Even though it was I who kept bugging the team about ISA compatibility, I never dreamed of getting everyone off an I'll address the purely technical part then:
|
While "i686" is the wrong name for the target currently so-called (32 bit but with reasonably modern instruction set extensions), that target is important and should be the default. SSE2 is very wildly available and has several benefits (smaller and more efficient copying of mid-sized structs, more predictable and generally more accurate float arithmetic, the ability to benefit from autovectorization, and possibly more that I'm forgetting). Certainly a pre-P4 target should exist, but I don't think it should be the default. Perhaps the current "i686" target should be renamed, but this is tricky and will require a long grace window before flipping the switch on the old ("i686") name. |
@brson In conjunction with adjustable |
@petevine I would be very surprised if LLVM changed the definition of i686. Let me state it again: LLVM/Clang does not assume that i686 is What is the problem with a non-SSE2 compiler/library? @rkruppe In #14441 I also suggested an alternative, that is splitting the 32-bits x86 targets in |
Hell, what fun arguing the opposite! There were no erroneous assumptions and no clang factor so you've created a strawman. Theirs was a conscious decision on the rust team's part, one that I personally didn't like for the sole reason of not being able to build from source on older machines (stage0 snapshots are P4 too). And the fact neither the downloads page, nor the rust build system warn you about it! |
@petevine I find it much more exhausting to read your rather heated replies than arguing this topic, and I don't like the topic very much. I have no moderation duties or powers, I'm just asking as another lowly contributor: Please be more charitable. @ranma42 As you point out, there is no serious evaluation of the performance (that I'm aware of). But I do see good reasons to assume there will be measurable performance impact on several kinds of code for reasons outlined above. The reason I am suggesting a long-ish grace window (by that I mean a release cycle or two) is twofold:
|
I don't feel qualified to review or not review this PR. @alexcrichton or @brson seem like more logical choices. Would any of you like to volunteer? |
@rkruppe The worry about possible performance regression is justified (most vectorisation opportunities would be lost). In general purpose code (like rustc) I would expect minor changes, but vectorisation-intensive libraries (something like BLAS) would probably show some changes. I will try to do some benchmarking on the rust build itself (compiler + libraries) and on the shootout (and I would be willing to try other benchmarks, if anybody can suggest some that would be particularly significant). I would expect that this change should not have a visible fallout except on tools relying on the specific opcodes emitted by rustc... but it's better safe than sorry, so this is definitely a change whose impact I would like to discuss, evaluate and test extensively before it is applied. |
Thanks for the PR @ranma42. This is a tricky question that keeps coming up. There are a few factors at play that I'm aware of.
One thing I'm not clear on: Distros use compatible i686 code for the binaries they distribute but does gcc emit better code when run by users? @anguslee @sylvestre how is Debian now getting around this problem of rustc i686-unknown-linux-gnu using the wrong defaults for your system? Regardless of whether we change this triple it seems the need to tweak cpu settings comes up often enough that there should be a more convenient way to do it from Cargo (cc @alexcrichton). cc @dotdash since you touched this last. |
@brson yeah I definitely agree that this sort of minor configuration tweak comes up quite often. I think that your I've never really known what |
Regarding |
I put some of the outputs of GCC and Clang (versions 4.8.4 and 3.4, from an Ubuntu LTS 14.04.3 VM) for different options in this gist (nb: the versions I tested are quite old, it might make sense to check the latest gcc and Clang compilers, but I had this VM ready for testing) Clang defaults to generating code for more modern processors, but with GCC seems more conservative about the instruction set used by default (no SIMD) and even with @brson "Does gcc emit better code when run by users?" |
AFAICT, rustc behaves just like clang. Using The observation in #14441 that the LLVM binaries don't use SSE instructions likely stems from the fact that LLVM was compiled with gcc, which does default to i686 compatibility. Building LLVM with clang results in code that does use SSE instructions. Also, on my Debian system, and and libc++ from the i386 pool contains SSE instructions, so a rustc built against that wouldn't work on a non-SSE machine either. Given all that, I think a step in the right direction here would be to have a less obscure way than using Whether or not we want to provide a "true" i686 rustc built, either as the default or in addition to what we have now, I don't know. If someone could do some benchmarks, that would be nice. If nobody volunteers, maybe I can do it sometime next week. I think it would be mostly interesting to have rustc bootstrapped for i686, but the benchmark built with a P4 target CPU, to actually see how much performance is lost by having the distribution target older CPUs as a baseline. @petevine you said that you have a solution ready, could you elaborate on that? I probably missed a number of details here. |
@dotdash I wasn't going to bother but yours is the first fully competent post in this thread so I must oblige! The solution, considering there are probably fewer than a dozen people still using non-SSE2 machines and Rust (myself included), should be a source-only one:
Apart from that, even if the status quo remains, the downloads page should make it clear SSE2 is required and in case someone starts a naive source build, the snapshot should get tested immediately and not after 5 hours of building LLVM. I tried Golang not long ago and that's how you bootstrap an i386 ( |
When @pnkfelix was calling you out for being rude, this is what he was talking about. I know that you are frustrated. Please stop taking it out on others. It's not appropriate here. |
You're probably right - the jab wasn't necessary. Apologies everyone! Once again, great post @dotdash! and @nikomatsakis, that's some wisdom straight from Pirkei Avot! |
@dotdash I confirm that rebuilding Rust (and its LLVM repo) on the same machine with Clang results in LLVM binaries which use SSE. We might want to ensure that bootstrapping from gcc and Clang results in equivalent binaries (from the point of view of target instruction set). As you mentioned, it would be very convenient if there was a way to pass the appropriate flags to all of the components. It would make it easier to support older targets, but it would also be useful if somebody wanted to sacrifice compatibility in order to make use of the newest operations available on its machine (basically by building everything with I installed Debian jessie i386 the libc++1 package includes SIMD instructions, but other binaries, including libstdc++ and Clang seem to use x87 instructions and no SIMD operation at all. I wonder if this is intentional or just a consequence of the fact that the libc++1 package is built with clang. |
Isn't this not just a problem of bootstrapping though? The compiled rustc is going to proceed to output unusable binaries when used afterward. |
Presumably this would actually permanently override the target spec for i686 targets, so that any time you compiled with one one would get no SSE2 instructions.
@dotdash I think there is a difference though, in that rustc understands target-triples (I'm guessing gcc/clang don't accept them directly but not sure), and when you pass an 'i686' triple to rustc you might reasonably expect it to behave like |
@brson Indeed, the idea is to leave SSE (1) on the table for the most usable cpus, namely P3's and Athlons, without having to resort to BTW, the fastest possible 32-bit code (even on a P4) would probably be achieved this way in LLVM:
|
The compiled rustc will produce usable binaries when used with an appropriate
Using I'm not saying that this is necessarily the right way to do things, but given how old those CPUs are, defaulting to a more "recent" set of features and requiring developers targetting old hardware to explicitly specify their target seems like a reasonable choice to me. |
I suggest we do this:
|
@alexcrichton Why just @brson I am afraid that such naming might be misleading. If I understood it correctly, you are suggesting that the I think the Clang defaults are not particularly good choices and I would rather use the more conservative (and compatible) defaults of the GNU compilers/toolchain. For the record, even Clang follows the gcc conventions in some cases, such as Android, while In addition to compatibility concerns, keeping If GNU conventions (use most generic CPU for target arch) are considered impractical, I would try to go for the ones proposed by Clang. If changing the existing targets is unfeasible, neither GNU nor Clang triples can be used and we are bound to define a new (and incompatible, In any case, I would at least try to ensure that a way to build a non-SSE version (or whatever is needed by distros) is well-known and tested. This is desirable anyway if distros start packaging Additionally, I would love if there was some more information about the behaviour of |
@alexcrichton @brson Once there, to have the ecosystem ready, a few additions like this one alexcrichton/curl-rust@a1e76ec will be necessary. From my experience with the new |
@ranma42 OK, good points. Is there another solution you like other than removing sse from the i686 targets that allows people to create compilers to target true i686es without patching the source?
@petevine How does mod.rs need to be updated, and which mod.rs? Right now @alexcrichton and I do not want to change the code generation of snapshots. Mostly because we're moving away from snapshots for bootstrapping and toward official releases. We're hoping that those that need these compilers, like distros, will be ok with building them themselves from a machine that can run sse. |
@brson There's definitely going to be no problem/slowdown using the |
@ranma42 I was thinking that for now we can probably just add From what you're saying, though, the difference of Also, with regards to mirroring clang and where all this came from, you're definitely more than welcome to add some documentation! I suspect the workflow for the initial integration of this change look like:
Which I think may help explain why our defaults may differ from Clang in a few places (but they probably shouldn't). Does that make sense? |
@petevine yes to officially support a new triple like this we would need to produce both nightly and snapshot compilers, but we currently don't produce nightlies beyond tier 1 platforms (which this wouldn't be initially), so we would probably support community-bulit snapshots/nightlies in the near future for any new target added. |
@alexcrichton |
Sure :) I like @dotdash suggestion of having a unified way (configure argument?) to pass the appropriate flags when building each component. Actually, it looks like a good idea independently from this issue.
Among those supported by rust,
The data for clang has been collected running
I will start by adding comments in the code which provide the same information I collected in the table above. Providing this information to users through something like "-###" would be awesome, but it certainly involves much more significant changes.
Yes, that looks like what happened. |
I'd be fine merging a PR to align all our targets with whatever the Clang default are (e.g. fix the discrepancies you've found here). Unfortunately that still doesn't quite solve the inital problem in this PR (there's no target for no-sse instructions), but I guess if we soup up the build system somehow we could fix that. |
Closing as the actionable items are now handled in other PRs:
|
Hi @dotdash - in reply to this:
I've attempted exporting Note: that was using LLVM during the build (you mention gcc, although I would hope that the stage1 rustc can be built with either gcc or llvm?). Could you double-check whether |
@jayaddison You probably also need to set CFLAGS/CXXFLAGS so that clang also targets the right architecture. Sorry, it's been a while since I worked on this and at the moment, I don't have the time to look into this in detail. |
@dotdash That's OK, and I'll check the compiler flags - thank you for the quick response. |
Possibly adding noise, but for the record: I've been able to produce the build results I was looking for by reducing the (Debian itself has previously patched that down to |
SIMD instructions are not available on all of i686 processors and
cause programs to terminate on illegal instruction on older
processors.
Clang defaults to compiling for
pentium4
when targeting a generic 32 bits x86 architecture. This was used as a precedent for rustc, but I believe some details were missed when doing so:I think we should ensure that it is actually possible to target plain i686 with rust.
I expect this to come up with distros which target i686.
This should fix #14441 (so far I only verified with objdump that SIMD instructions are not used).