-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stable SIMD in Rust #2325
Stable SIMD in Rust #2325
Conversation
The purpose of this RFC is to provide a framework for SIMD to be used on stable Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the scaffolding for other platforms as well as a future portable SIMD design (to be fleshed out in another RFC).
Awesome work, @alexcrichton! And thanks go to @BurntSushi and @gnzlbg for a ton of work as well. One minor question: it's implied, but not really stated (that I saw) that the |
Seconded, this is extremely exciting!
Please excuse the possibly extremely under-informed question here: are these intrinsics inherent to the platform, or to the company? That is, are there Intel intrinsics that AMD doesn't offer, and vice-versa? If so, then My current understanding is that this is vendor-specific, but I could be wrong. The above is how I'd think about it, though. |
Yes, there are. Some examples would be SSE4a, ABM, and TBM. These intrinsics are exposed by |
text/0000-stable-simd.md
Outdated
``` | ||
|
||
When [inspecting the assembly][asm1] you notice that rustc is making use of the | ||
`%xmmN` registers which you've read is related to SSE on your CPU. You know, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: xmm or ymm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xmm. ymm are 256 bit registers, and in this particular example, there are no target features enabled, so on x86_64 it will be limited to SSE2 instructions, which use 128 bit registers (xmm).
The example below enables the avx2
feature, which permits AVX2 instructions, which use 256 bit registers (ymm).
text/0000-stable-simd.md
Outdated
present on one platform may not be present on another. | ||
|
||
The contents of the `vendor` modules are defined by, well, vendors! For example | ||
Intel has an [intrinsics guide][intr-guide] which will serve as a guideline for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link target is missing here
text/0000-stable-simd.md
Outdated
|
||
For example most Intel intrinsics start with `_mm_` or `_mm256_` for 128 and | ||
256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel | ||
says. Note that all intrinsics will also be `unsafe`, according to [RFC 2045]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link target is missing here
text/0000-stable-simd.md
Outdated
|
||
There are a number of intrinsics on x86 (and other) platforms that require their | ||
arguments to be constants rather than decided at runtime. For example | ||
[`_mm_insert_pi16`] requires its third argument to be a constant value where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't linked, i'm not sure if it was intended to
text/0000-stable-simd.md
Outdated
|
||
Over the years quite a few iterations have happened for SIMD in Rust. This RFC | ||
draws from as many of those as it can and attempts to strike a balance between | ||
exposing functionality whiel still allowing us to implement everything in a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: while
text/0000-stable-simd.md
Outdated
SIMD arguments are passed across boundaries and whatnot. | ||
|
||
Again though, note that this section is largely an implementation detail of SIMD | ||
in Rust today, though it's enabling the usage Effortsfectively without a lot of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
@aturon and @steveklabnik excellent questions! I think @steveklabnik's point though hit the nail on the head in that I think we should rename to @steveklabnik you're also right though in that, for example, our x86 and x86_64 I think that pushes me towards |
|
text/0000-stable-simd.md
Outdated
[rfc2212]: https://github.com/rust-lang/rfcs/pull/2212 | ||
|
||
```rust | ||
pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this function supposed to be still unsafe?
Why would this function be "unsafe" (as marked)? pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
// Note that this `unsafe` block is safe because we're testing
// that the `avx2` feature is indeed available on our CPU.
if cfg_feature_enabled!("avx2") {
unsafe { foo_avx2(a, b, c) }
} else {
foo_fallback(a, b, c)
}
} Is it not the case that this function is "safe" because you've wrapped the call to an unsafe function in and "unsafe" block (with a check to ensure the contract is met) and that the "foo_fallback" would be a "safe" function (not relying on SIMD/intrinsics, but, a plain-old CPU implementation of the necessary functionality)? |
@gbutler69 gah oops! Looks like you and @fbstj found the same error at the same time, that was just a mistake on my part! In that section the |
Also, does: unsafe fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) {
for ((a, b), c) in a.iter().zip(b).zip(c) {
*c = *a + *b;
}
} need to be marked as "unsafe"? I would think not. |
@gbutler69 correct! That was also erroneously tagged as |
text/0000-stable-simd.md
Outdated
via: | ||
|
||
```rust | ||
#[cfg(target_feature = "avx)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "
text/0000-stable-simd.md
Outdated
// implementation that can use `avx` | ||
} | ||
|
||
#[cfg(not(target_feature = "avx))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcrichton - Good to know that I'm actually starting to understand Rust code from reading these RFC's. I really appreciate all the work that those like you put into these RFC's for those of us still learning Rust.
I think there may be a learnability problem by putting these into libstd. The issue is that users are not supposed to directly use those intrinsics since they are untyped (using __m128i instead of the proper one among u8x16, u16x8, u32x4, u64x2 and u128) and unnecessarily platform-dependent (you should not have to call an Intel-specific function just to get a vectorized reciprocal square root, for instance), which means they should instead use some crate from crates.io that provides a proper API. However, by putting them into std::arch, people may be led to believe that they are supposed to use them directly. So, I'd suggest adding a new library in the rustc repository called "intrinsics" instead of putting this into std::arch, documenting that the user is not supposed to use it directly, and ideally suggesting what crates to use instead. |
@bill-myers good points! It's definitely true that the main thrust of this RFC is not to empower all users of Rust to use SIMD, the apis are abysmal from that standpoint! Rather the motivation here is to empower anyone to use explicit SIMD on stable Rust. I'd expect that once we cross that threshold crates like With that mindset I think we'll definitely want to mention in the documentation quite thoroughly that these are very raw functions to use and you need to be quite careful and mindful when using them. The decision for location in the standard library, though, is done out of necessity rather than desire. All of these intrinsics are heavily coupled to a compiler backend (aka LLVM) and so there's really no way that they could be exported in a crate on crates.io where the guarantee is that such a crate would compile and work across many rustc versions. Inclusion in the standard library means that we have the freedom to continue to upgrade LLVM (and maybe even other backends one day) while providing a stable API surface area for all these intrinsics. In other words this all leads me to the conclusions of:
The final choice of module in the standard library I think is certainly fine to bikeshed. You're right in that something like Do you have thoughts though on what a better name for such a module would be? |
How about EDIT: bonus: |
text/0000-stable-simd.md
Outdated
Despite the CI infrastructure of the `stdsimd` crate it seems inevitable that | ||
we'll get an intrinsic wrong at some point. What do we do in a situation like | ||
that? This situation is somewhat analagous to the `libc` crate but there you can | ||
fix the problem downstream (just have a corrected type/definition) for for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double "for"
Compile time error, correct? |
Thinking now the current "portability speed bump" in the standard library is something like That may lead to either |
I think it'd be fine to deprecate but leave those couple of stable functions in place if we want to coopt the intrinsics module, though. |
While the bug is sad, in this case it doesn't really bother me. |
People keep linking to my post but they maybe don't seem to be using the same take-away that I intended, so I'll try to make that more clear:
|
Initial reaction is "please no", but at least having the version in there seems very sensible. |
I don't personally think we should have something like I also agree with @eddyb, let's not go crazy and expose LLVM intrinsics, this RFC is purely for SIMD/vendor intrinsics. |
The LLVM comment at the end was pure speculation, I'm not trying to double the scope of the RFC. However, when it comes to the stability conversation I think it's best to keep in mind not just the next 6 months, but the next 60 months, and hopefully the next 600 months even. @alexcrichton You say there's some sort of "organization onus" that's put on us if we break it up by target feature, but I don't understand what that onus would be. Every intrinsic that exists already exists in connection with a specific target feature. The entire RFC is predicated on the idea of these features. Every single intrinsic call is cfg-attribute gated so that it can't exist in a build targeting the wrong CPU and/or feature set. All of the organization is done for us by that alone. Further, if you check the Intrinsics Guide you will note that each intrinsic feature set actually is exported by a different header. The fact that there's also a header to "grab everything" means that we just need a re-export module for folks who really do want it all dumped on them at once. However, the functionality itself would still live in different sub-modules. I'm all for modules being "however big they need to be", but if every intrinsics now and forever lives in a single module it's just too much. The RFC itself already quotes the count at "thousands", and admits that there's so much code we might have to relax the normal stability guarantee and fix a bug here and there simply because there's so much code we don't fully trust ourselves to get it all correct the first time out. |
@Lokathor the organization onus is that we are the ones deciding to do so. Vendors are not designing intrinsics to be placed in modules (so it seems) but rather to get Again the intention of this RFC is to expose anything at all on stable, not make it a 100% high quality interface for general consumption. It's expected something like that will show up on crates.io, not in libstd itself. To that end we're looking for the lowest possible overhead to add these sorts of intrinsics to the standard library. Placing them in one module is not only easy for us to do but it also matches what vendors are expecting. This also should extend naturally to future architectures as well which aren't currently bound in stdsid. |
I don't like the separation in "feature" submodules either. For x86 it is already problematic due to some intrinsics requiring multiple features. Also, whether an intrinsic is available might depend on whether you are targeting a 32-bit or 64-bit x86 platform, and some other platform characteristics might apply as well (e.g. And x86 is the nice platform. For ARM we have "some" level of separation in @alexcrichton one thing we could do is just expose the same headers that C does in submodules, for example:
While we could do this, I find the reasons to do so weak at best. In a platform where an x86 intrinsic is not available, users will get a compiler error if they try to import the intrinsic. In a platform where the intrinsic is available, whether the intrinsic can or cannot be called depends on the machine where the binary will run on. Users that want to ensure portability should be using @alexcrichton it was mentioned in the internal threads that instead of using attributes + functions, const generics, etc. to declare intrinsics that take compile-time constants as arguments, we could just be using macros. With macros 2.0 users should be able to import these macros from |
@gnzlbg yeah I think organizing by header is possible but I don't think it buys us much. Sort of like It's true yeah we could use macros (that's what C does I believe), but I'd personally prefer to stick to the solution we've got so far which with enough const machinery I think could one day be stable. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
text/0000-stable-simd.md
Outdated
The `#[cfg]` attribute and `cfg!` macro statically resolve and **do not do | ||
runtime dispatch**. Tweaking these functions is currently done via the `-C | ||
target-feature` flag to the compiler. This flag to the compiler accepts a | ||
similar set of strings to the ones specified above and is already "stable". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about #[target_feature]
? Does this RFC define that it doesn't affect #[cfg]
or does it leave it undecided? CC rust-lang/rust#42515
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that's going to continue to be an open bug
This is a minor point, and I'm not even a Rust developer so this may be way off base, but I'm wondering whether you'd be better off having specific types for "Intel defined types". So |
The final comment period is now complete. |
Ok! Now that FCP has elapsed it looks like nothing major has come up. I think there's still an open question as to where to place this in the standard library, but I'm going to merge this and we can of course continue to bikeshed on the tracking issue! |
The purpose of this RFC is to provide a framework for SIMD to be used on stable
Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the
scaffolding for other platforms as well as a future portable SIMD design (to be
fleshed out in another RFC).
Rendered