
Tracking Issue for Wasm floating point instructions not in core #133908

Open
1 of 3 tasks
daxpedda opened this issue Dec 5, 2024 · 12 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@daxpedda
Contributor

daxpedda commented Dec 5, 2024

Feature gate: #![feature(wasm_numeric_instr)]

This is a tracking issue for floating point instructions that aren't available in core but are available as simple instructions on Wasm directly.

Public API

mod arch {
    mod wasm32 {
        pub fn f32_ceil(a: f32) -> f32;
        pub fn f32_floor(a: f32) -> f32;
        pub fn f32_trunc(a: f32) -> f32;
        pub fn f32_nearest(a: f32) -> f32;
        pub fn f32_sqrt(a: f32) -> f32;
        pub fn f64_ceil(a: f64) -> f64;
        pub fn f64_floor(a: f64) -> f64;
        pub fn f64_trunc(a: f64) -> f64;
        pub fn f64_nearest(a: f64) -> f64;
        pub fn f64_sqrt(a: f64) -> f64;
    }
}
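As a sketch of how these intrinsics would be consumed: the wrapper below dispatches to the unstable `f32_nearest` on Wasm (nightly, `#![feature(wasm_numeric_instr)]`) and falls back to a portable round-half-to-even elsewhere. The `nearest` wrapper name and the software fallback are illustrative, not part of the proposed API.

```rust
// Illustrative wrapper around the proposed `f32_nearest` intrinsic.
// On wasm32 this would compile to a single `f32.nearest` instruction;
// elsewhere we emulate its round-half-to-even semantics in software.

#[cfg(target_arch = "wasm32")]
fn nearest(x: f32) -> f32 {
    core::arch::wasm32::f32_nearest(x)
}

#[cfg(not(target_arch = "wasm32"))]
fn nearest(x: f32) -> f32 {
    let r = x.round(); // `round` breaks ties away from zero
    if (r - x).abs() == 0.5 && r % 2.0 != 0.0 {
        r - (r - x).signum() // on an exact tie, step back to the even neighbor
    } else {
        r
    }
}

fn main() {
    // Wasm's `f32.nearest` rounds ties to even: 2.5 -> 2.0, 1.5 -> 2.0.
    println!("{} {}", nearest(2.5), nearest(1.5));
}
```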

Steps / History

Unresolved Questions

My impression from #50145 was that these methods (except sqrt()) are not intended to ever be added to core, but a recent discussion on Zulip paints a different picture.

If indeed these methods are intended to come to core, then these additions will probably never be stabilized.

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@daxpedda daxpedda added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Dec 5, 2024
@daxpedda daxpedda changed the title Tracking Issue for XXX Tracking Issue for Wasm floating point instructions not in core Dec 5, 2024
@hanna-kruppe
Contributor

I would like to have these available on stable as soon as reasonably possible. Of course having them available in no_std on all platforms would be even better, but if that stalls out again (it’s not a new topic) then I don’t see the harm in stabilizing some arch intrinsics that will become redundant at some point in the future. We also have _mm_sqrt_ss on x86 (explicitly used by libm today!) and a bunch of other arch intrinsics that are technically redundant even with current core capabilities (e.g., basic SIMD loads and stores can be done with ptr.{read,write}{,_unaligned}).
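The point about redundant intrinsics can be made concrete: an unaligned 128-bit load of the kind the SIMD load intrinsics provide is already expressible with stable pointer reads. A hedged, host-runnable sketch, with `[f32; 4]` standing in for a vector type and `load4` a made-up name:

```rust
// The claim above in code: a "SIMD load" of four f32s is just an
// unaligned pointer read of the 16-byte block.
fn load4(p: *const f32) -> [f32; 4] {
    // Safety: the caller must guarantee 16 readable bytes at `p`.
    unsafe { (p as *const [f32; 4]).read_unaligned() }
}

fn main() {
    let data = [1.0f32, 2.0, 3.0, 4.0, 5.0];
    // Load starting at offset 1, which is generally not 16-byte aligned.
    let v = load4(data[1..].as_ptr());
    println!("{:?}", v);
}
```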

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 24, 2024
Bump `stdarch`

This bumps `stdarch` to rust-lang/stdarch@684de0d to get in rust-lang/stdarch#1677 (tracked in rust-lang#133908).

From the [commit history](rust-lang/stdarch@e5e00aa...684de0d) I deduced that there shouldn't be any changes to Rust necessary.

From past PRs I'm assuming that bumping `stdarch` like this is fine, but please let me know if this is somehow inappropriate or requires something more to be done!
bors added a commit to rust-lang-ci/rust that referenced this issue Dec 24, 2024
Bump `stdarch` (try-job: arm-android)

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 24, 2024
Bump `stdarch` (try-jobs: arm-android, armhf-gnu)

lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Jan 7, 2025
Bump `stdarch` (try-jobs: arm-android, armhf-gnu)
@daxpedda
Contributor Author

Hereby requesting an FCP.

Stabilization PR: rust-lang/stdarch#1700.
Cc @Amanieu, you were against stabilizing this in favor of having these functions in core.
Cc @alexcrichton for being the target maintainer.

@rustbot label +I-libs-api-nominated

@rustbot rustbot added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jan 14, 2025
@daxpedda
Contributor Author

@tgross35 pointed out in Zulip that bringing these functions to core is already in progress, which would make the additions here redundant.

@hanna-kruppe
Contributor

Note that the implementation of these functions in libm now uses the core::arch::wasm32 intrinsics covered by this tracking issue. They could be kept unstable indefinitely, but that seems a slightly unusual location for perma-unstable implementation details of core. The alternative is to remove the core::arch versions and revert rust-lang/libm#418 - what do you think @tgross35? (I assume inline assembly is undesirable because it prevents some optimizations.)

@alexcrichton
Member

👍 from me, although I'm not in a position to appropriately weigh the concerns of duplicating float methods in core. That seems like it could be a reasonable reason to postpone this too. (although personally I would also consider it pretty reasonable to add these and then maybe deprecate them in the future if float-methods-in-core happens or something like that)

@Amanieu
Member

Amanieu commented Jan 14, 2025

We discussed this in the libs-api meeting. We would like to wait for float methods in core to make some progress before deciding whether this is worth stabilizing independently.

@Amanieu Amanieu removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jan 14, 2025
@tgross35
Contributor

Note that the implementation of these functions in libm now uses the core::arch::wasm32 intrinsics covered by this tracking issue. They could be kept unstable indefinitely, but that seems a slightly unusual location for perma-unstable implementation details of core. The alternative is to remove the core::arch versions and revert rust-lang/libm#418 - what do you think @tgross35?

Just to address this - as long as we keep these core::arch functions around until we have the associated impl f64 { ceil, floor, ...} functions, there isn't any problem here. Once those f64::* functions are available, we can use that to replace the calls to core::arch::wasm32::*, after which the functions in this issue can be removed.
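A minimal sketch of the arrangement just described: on Wasm the function is one native instruction, elsewhere a generic routine. The non-Wasm body here is a stand-in, not libm's actual implementation.

```rust
// Per-target dispatch sketch (not libm's real code): the wasm path stays
// a single `f64.ceil` instruction with no libcall and no recursion risk.

#[cfg(target_arch = "wasm32")]
pub fn ceil(x: f64) -> f64 {
    core::arch::wasm32::f64_ceil(x)
}

#[cfg(not(target_arch = "wasm32"))]
pub fn ceil(x: f64) -> f64 {
    // Stand-in for a generic implementation.
    let t = x.trunc();
    if x > t { t + 1.0 } else { t }
}

fn main() {
    println!("{} {}", ceil(1.2), ceil(-1.2));
}
```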

(I assume inline assembly is undesirable because it prevents some optimizations.)

I actually would rather have the inline assembly version, but am unsure about its stability. Using the intrinsics or the methods in core doesn't get us out of the possible recursion problem I mentioned in rust-lang/libm#214 (comment). That should not be a problem on Wasm, but it is still out of our control.

There shouldn't be an optimization problem either because for trivial routines (abs, copysign, platforms with sqrt, etc), LLVM lowers the intrinsic directly to its own assembly which can be optimized. For any other math ops that are intrinsics (log, cos, ...), these will be a libcall so the function is outlined anyway.

@hanna-kruppe
Contributor

Just to address this - as long as we keep these core::arch functions around until we have the associated impl f64 { ceil, floor, ...} functions, there isn't any problem here. Once those f64::* functions are available, we can use that to replace the calls to core::arch::wasm32::*, after which the functions in this issue can be removed.

Oh, I meant the other way around: I assumed the implementations in core would use libm (perhaps not as a crate but at least the same code). In any case, no matter where the code for the core impls comes from, it will face similar issues as libm currently does w.r.t. avoiding possible recursion:

  • Libm as a regular crate consumed from crates.io can and should use f32::sqrt internally.
  • f32::sqrt from core should lower to @llvm.sqrt.f32 to get full access to LLVM optimizations, but LLVM lowers that intrinsic to a sqrtf libcall sometimes.
  • Something (compiler_builtins?) provides the sqrtf functions, and this implementation should not mention f32::sqrt or the LLVM intrinsic because that's a recipe for recursion. However, there will be situations where the optimal implementation for sqrtf is "just use the native instruction" and for that you have to handle the recursion problem somehow (if you don't want to just use a generic soft-float implementation on all platforms).
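The recursion hazard in the last bullet, sketched in code. The function is renamed here so it does not collide with the real `sqrtf` symbol when run; the comment describes what would happen if it actually were the `sqrtf` that libcall lowering targets, on a target without a native sqrt instruction.

```rust
// Hypothetical shape of a `sqrtf` implementation that is NOT safe to ship
// from compiler_builtins (renamed to avoid clashing with the system libm
// at link time):
pub extern "C" fn hypothetical_sqrtf(x: f32) -> f32 {
    // `f32::sqrt` lowers to `@llvm.sqrt.f32`; on a target without a native
    // sqrt instruction, LLVM lowers that intrinsic back to a `sqrtf`
    // libcall -- i.e. a call to this very function.
    x.sqrt()
}

fn main() {
    println!("{}", hypothetical_sqrtf(4.0));
}
```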

There shouldn't be an optimization problem either because for trivial routines (abs, copysign, platforms with sqrt, etc), LLVM lowers the intrinsic directly to its own assembly which can be optimized. For any other math ops that are intrinsics (log, cos, ...), these will be a libcall so the function is outlined anyway.

Are you talking about the immediate implementation of e.g. f32::sqrt here? Inline assembly is an opaque string both in LLVM IR and throughout most of the backend (instruction selection, MachineIR optimizations). LLVM can't, and shouldn't be allowed to, "raise" such inline asm to the equivalent llvm.* intrinsics. This doesn't matter much for a symbol that's only called by LLVM libcall lowering and never available for inlining, but it would matter a lot for core's fN::* methods.

@daxpedda
Contributor Author

There shouldn't be an optimization problem either because for trivial routines (abs, copysign, platforms with sqrt, etc), LLVM lowers the intrinsic directly to its own assembly which can be optimized. For any other math ops that are intrinsics (log, cos, ...), these will be a libcall so the function is outlined anyway.

Are you talking about the immediate implementation of e.g. f32::sqrt here? Inline assembly is an opaque string both in LLVM IR and throughout most of the backend (instruction selection, MachineIR optimizations). LLVM can't, and shouldn't be allowed to, "raise" such inline asm to the equivalent llvm.* intrinsics. This doesn't matter much for a symbol that's only called by LLVM libcall lowering and never available for inlining, but it would matter a lot for core's fN::* methods.

E.g. https://rust.godbolt.org/z/az9b1acE6, you can see that the initial local.get and the local.set beforehand could be skipped entirely. Would be interesting to know if this is as intended or if this is something LLVM could/should be able to optimize.

@hanna-kruppe
Contributor

In general inline asm is intended to be a "black box":

The compiler cannot assume that the instructions in the asm are the ones that will actually end up executed.

  • This effectively means that the compiler must treat the asm! as a black box and only take the interface specification into account, not the instructions themselves.

For wasm in particular, runtime code patching is not possible and separate tools like Binaryen's wasm-opt don't know or care about which parts of the module originally came from inline assembly. But neither of these caveats applies to other targets like x86 or AArch64. Well, some tools like BOLT do work on the final post-linking machine code but these are well outside the domain of Rust-the-language and explicitly incompatible with hand-written assembly that's not sufficiently well-behaved.

@tgross35
Contributor

Oh, I meant the other way around: I assumed the implementations in core would use libm (perhaps not as a crate but at least the same code). In any case, no matter where the code for the core impls comes from, it will face similar issues as libm currently does w.r.t. avoiding possible recursion:

* Libm as a regular crate consumed from crates.io can and should use `f32::sqrt` internally.

Not quite; libm should not have:

pub fn sqrtf(x: f32) -> f32 {
    unsafe { core::intrinsics::sqrtf32(x) }
}

That is effectively what we have on wasm. It is currently okay there, but we generally don't want to have to think about how LLVM will lower intrinsics.

We should direct everyone using libm* directly to use the core versions instead once they are available (both for simplicity and for performance). And then libm's implementation can be softfloat, assembly, or some core::arch routine - just not intrinsics.

* `f32::sqrt` from core should lower to `@llvm.sqrt.f32` to get full access to LLVM optimizations, but LLVM lowers that intrinsic to a `sqrtf` libcall sometimes.

That is correct. On WASM it should always get an asm lowering, but we probably shouldn't rely on this.

* Something (compiler_builtins?) provides the `sqrtf` functions, and this implementation should not mention `f32::sqrt` or the LLVM intrinsic because that's a recipe for recursion. However, there will be situations where the optimal implementation for `sqrtf` is "just use the native instruction" and for that you have to handle the recursion problem somehow (if you don't want to just use a generic soft-float implementation on all platforms).

compiler_builtins vendors libm as a submodule and makes its symbols available, but only on specific targets (changing this is one of the steps toward getting these into core, very nearly unblocked). So if LLVM decides to lower llvm.sqrt.f32 to a libcall but that libm symbol just calls an intrinsic that does the same, this is the problem.

There shouldn't be an optimization problem either because for trivial routines (abs, copysign, platforms with sqrt, etc), LLVM lowers the intrinsic directly to its own assembly which can be optimized. For any other math ops that are intrinsics (log, cos, ...), these will be a libcall so the function is outlined anyway.

Are you talking about the immediate implementation of e.g. f32::sqrt here? Inline assembly is an opaque string both in LLVM IR and throughout most of the backend (instruction selection, MachineIR optimizations). LLVM can't, and shouldn't be allowed to, "raise" such inline asm to the equivalent llvm.* intrinsics. This doesn't matter much for a symbol that's only called by LLVM libcall lowering and never available for inlining, but it would matter a lot for core's fN::* methods.

f32::sqrt and similar methods will not call libm::* directly; libm's symbols will be available at link time (as part of compiler_builtins) but LLVM won't know their contents anyway. So either LLVM does its own lowering and libm is never touched, or it emits a black box call sqrtf that it couldn't optimize around anyway (LTO could, but most routines that LLVM can't lower to assembly are probably not worth inlining anyway).

@hanna-kruppe
Contributor

We should direct everyone using libm* directly to instead use core versions instead once they are available (both for simplicity and for performance). And then libm's implementation can be softfloat, assembly, or some core::arch routine - just not intrinsics.

I can't speak for others but the second biggest reason (after no_std) why I'm using libm from crates.io is for better cross-platform reproducibility of operations that aren't correctly rounded (e.g., exp and log). If the core versions resolve to system libm implementations when they're available, that reproducibility is lost (even when pinning a particular version of Rust, which is less feasible than pinning a crates.io dependency).

Of course I can't expect you or anyone else to maintain a library perfectly suited for what I need. But if "libm as included via compiler_builtins" could benefit from some core::arch function existing, I'd prefer to be able to use the same code path via the crates.io version of libm using a stable toolchain. Put differently: it would be pretty silly if I could get better performance or code size by merely forking libm and replacing the implementation of a few correctly-rounded functions with the inherent methods from core. That is why I see a use case for core::arch::wasm32::sqrt and friends even after f32::sqrt and friends exist in core.


6 participants