
NVPTX: Add f16 SIMD intrinsics #1626

Merged · 1 commit · Aug 19, 2024

Conversation

kjetilkjeka (Contributor)

This implements parts of the f16x2 intrinsics as discussed in rust-lang/rust#125440 (comment)

It's unfortunately not possible to test that these intrinsics produce the correct instructions with the current test infrastructure. This was discussed on zulip

I have created assembly tests in my own branch of the rust-lang/rust repo that verify the correct instructions are used. I have added #[assert_instr] attributes nonetheless. Would it be desirable to follow up with a PR to rust-lang/rust that includes those tests?

@rustbot (Collaborator)

rustbot commented Aug 7, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) some time within the next two weeks.

Review comments on crates/core_arch/src/nvptx/packed.rs (outdated; resolved)
@kjetilkjeka force-pushed the nvptx_f16_simd_intrinsics branch 3 times, most recently from 5d937f5 to dd7e165, on August 12, 2024 13:24
@kjetilkjeka (Contributor, Author)

kjetilkjeka commented Aug 12, 2024

Thanks for reviewing!

I have addressed the two review comments from @Amanieu.

I have circumvented the naming issue around min/minimum brought up by @CryZe by instead implementing only the intrinsics whose names are uncontroversial as min/max. I can add minimum/min_nan and maximum/max_nan once the correct naming is confirmed (either here or as a follow-up).
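The naming question above comes down to two different NaN behaviours that PTX exposes for packed minimum (`min.f16x2` vs `min.NaN.f16x2`). A scalar sketch of the distinction, using `f32` as a stand-in for `f16` since the `f16` primitive is unstable; the function names here are illustrative, not the ones exported by `core::arch::nvptx`:

```rust
/// minNum-style min: if exactly one operand is NaN, the non-NaN
/// operand is returned (models PTX `min.f16x2`).
fn min_num(a: f32, b: f32) -> f32 {
    a.min(b) // f32::min already has these semantics
}

/// NaN-propagating min: any NaN input yields NaN
/// (models PTX `min.NaN.f16x2`, the "minimum"/"min_nan" candidate).
fn min_nan(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() { f32::NAN } else { a.min(b) }
}

fn main() {
    assert_eq!(min_num(f32::NAN, 1.0), 1.0); // NaN is ignored
    assert!(min_nan(f32::NAN, 1.0).is_nan()); // NaN propagates
    assert_eq!(min_nan(2.0, 1.0), 1.0); // agree when no NaN is present
    println!("ok");
}
```

Only the first flavour has an obvious name (`min`), which is why the second family was deferred until the naming is settled.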

I have also re-run the tests with my branch containing FileCheck-based tests that verify the correct instructions are produced.

I see that there might be yet another slight naming issue. I have named the functions as f16x2_<intrinsic> while the intrinsic (in PTX) is named <intrinsic>.f16x2. What is the desired naming convention here?

@Amanieu (Member)

Amanieu commented Aug 12, 2024

CUDA seems to have an API for f16 intrinsics, should we be mirroring that API instead?

@kjetilkjeka (Contributor, Author)

> CUDA seems to have an API for f16 intrinsics, should we be mirroring that API instead?

I see that the CUDA function names are consistent with the PTX ISA. I will change all the function names to match this naming (shared by both CUDA and the PTX ISA). I will refrain from adopting the leading double underscore __, as that is a C detail related to the lack of namespaces. I'm very happy with this solution.

The part I'm a bit more unsure about is naming the type half2. The CUDA C type is, in at least one respect, closer to something found in core::simd, since a big part of the problem it solves is allowing the same vector code to run on both x86_64 and nvptx64 targets. These core::arch::nvptx types are only usable on PTX, so if it is acceptable to keep f16x2 I think that is the better name (consistent with the PTX ISA and, as a bonus, familiar to Rust users).
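Whatever it ends up being called, the type under discussion is two 16-bit half-precision lanes packed into one 32-bit register, which is what lets a single PTX instruction operate on both lanes at once. A host-side sketch of that layout; the struct and method names are hypothetical, and `u16` holds raw f16 bit-patterns since the `f16` primitive is not yet stable:

```rust
/// Hypothetical model of a packed f16x2 value: two 16-bit lanes
/// in one 32-bit word, matching the PTX register layout.
#[derive(Copy, Clone, Debug, PartialEq)]
struct F16x2 {
    lo: u16, // lane 0 (bits 0..16 of the 32-bit register)
    hi: u16, // lane 1 (bits 16..32)
}

impl F16x2 {
    /// Pack both lanes into the single 32-bit register representation.
    fn to_bits(self) -> u32 {
        (self.hi as u32) << 16 | self.lo as u32
    }
    /// Split a 32-bit register back into its two f16 lanes.
    fn from_bits(bits: u32) -> Self {
        F16x2 { lo: bits as u16, hi: (bits >> 16) as u16 }
    }
}

fn main() {
    // f16 bit-patterns for 1.0 (0x3C00) and -2.0 (0xC000).
    let v = F16x2 { lo: 0x3C00, hi: 0xC000 };
    let bits = v.to_bits();
    assert_eq!(bits, 0xC000_3C00);
    assert_eq!(F16x2::from_bits(bits), v); // round-trips losslessly
    println!("ok");
}
```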

@kjetilkjeka (Contributor, Author)

I updated the PR a little while ago. I also re-ran the rustc assembly tests in my branch to verify that the correct instructions are still being produced, and I ported a heavily used function in our internal code at my workplace and verified that its tests pass. (There was also a notable performance boost on the relevant benchmarks.)

And with that, I think this one should be ready for merging.

@Amanieu Amanieu enabled auto-merge (rebase) August 19, 2024 18:53
@Amanieu Amanieu merged commit d9466ed into rust-lang:master Aug 19, 2024
31 checks passed