-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROTOTYPE] Adding a libm library for use by the JIT and VM #51439
Conversation
If it is accross all our platforms, it should include Mono. |
src/coreclr/libm/src/ref/acos.c
Outdated
|
||
if (xnan) | ||
{ | ||
#ifdef WINDOWS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many places that are ifdefed for Windows like this. Why is that? Is it a problem for consistency accross platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This namely represents a difference in what IEEE exception is raised and what errno
is set for failure cases (cases that return NaN
) as POSIX sometimes puts additional requirements on the methods that Windows doesn't conform to.
For managed code, these don't actually matter because we don't support (IEEE exceptions) or surface (errno) for these methods at all.
If we plan on using these with any native code, then we'd likely want to ensure UNIX platforms stay POSIX compliant in their error reporting; otherwise we can just pick one (I chose Windows for the purpose of the draft, since it does less) and it won't impact what is exposed via System.Math
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we do not care about errno or IEEE exceptions, can we just turn this error handling into no-op or almost no-op?
If we plan on using these with any native code,
I do not think we have any native code that would need this. And if we ever had native code like that, we can fix it up as necessary to not expect errno, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we do not care about errno or IEEE exceptions, can we just turn this error handling into no-op or almost no-op?
Yes, I have it commented out in the files that are part of this PR. I plan on submitting an issue/PR to amd/aocl-libm-ose to allow the same via an #ifdef
, so we don't need to maintain any custom logic on our end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, it's the logic that was in here: https://github.com/dotnet/runtime/pull/51439/files#diff-19a08ddd64deeddea26b16f7e96a1ce59bb6bf1e163ae53067bca6600ae73f04R32-R53
It was commented out because it triggered the exception via inline x86 assembly, which isn't compatible with ARM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I wasn't quite sure on where it should live or the correct mechanism to get it linked in and so this draft PR is more about showing a working minimal prototype on Windows to see if we believe this is feasible and a good direction. I expect there are a likely a few changes required to the build parameters, how this is linked or referenced by the various components, and to determine how this interacts with the PAL layer. If we do believe this is worthwhile and that it addresses the maintainability concerns (given changes would be contributed back to and pulled from |
I expect that we would delete all floating point wrappers from the CoreCLR PAL layer with this change. |
Yeah, it would be nice to sort these out. It would look odd to have this fancy logic for advanced functions, but missing it for basic functions such as cos. It looks reasonable to me otherwise. |
Tagging subscribers to this area: @tannergooding, @pgovind Issue DetailsThis adds a new libm library that can be used by the VM (for The implementations are copied nearly verbatim from https://github.com/amd/aocl-libm-ose (as per the license headers), with the only fixups being minor changes to get it building in CoreCLR and running on ARM. This change would allow us to provide performant and deterministic implementation of System.Math across all our platforms without taking on the burden of creating or maintaining our own implementations. CC. @jkotas, @jeffhandley, @danmoseley. I'd appreciate feedback here on if this approach (or a variation of it) is believed to be feasible. I did some minor initial profiling on x64 Windows (via our Math Benchmarks in dotnet/performance) and for the methods that have switched we are on par or faster than in all cases. The functions that aren't ported to in this PR (such as
|
Based on your suggestion of placing it in
For In practice, this roughly boils down to picking an "optmized" implementation on most platforms/setups: ALM_PROTO_OPT(sin) (which compiles down to the same as So, these ones should be fairly straightforward to use since the variant used on most setups is pure C and we don't need the micro-architecture specific optimizations.
There are notably a couple functions, such as I'm going to submit a PR to amd/aocl-libm-ose to fix the only blocker for consuming the library from ARM (which is how they call |
I think so.
Yes, it is small enough. No big deal with statically linking it multiple times.
@marek-safar Who should work with Tanner on this on Mono side? |
86bab41
to
e2d4446
Compare
I moved the files into
That leaves: |
(I fully expect the build will fail here and there will be iteration here to get it fully working on macOS, Unix, and ARM). |
e2d4446
to
79ef914
Compare
src/native/clrmath/src/fwd/cbrtf.c
Outdated
|
||
#include "clrmath.h" | ||
|
||
float clrmath_cbrtf(float x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the C/C++ compiler able to optimize out these forwarders in all situations, and to always pick the local implementation?
It feels fragile. It may be better if the C method names for the local implementations included a prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These forwarders only exist for the methods that are still coming from the C runtime (because they are implemented in assembly for aocl-libm-ose).
Since it's all statically linked, COMSingle::Cbrt
is compiled down to call cbrtf
directly (for release mode), rather than going to clrmath_cbrtf
and then jumping to cbrtf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It assumes good link-time code generation that is not given, especially on some platforms targeted by Mono.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update these to just be `#define clrmath_cbrtf cbrtf.
That being said, I would be very surprised if LLVM failed to do the basic inlining optimization here.
Does it matter that https://github.com/amd/aocl-libm-ose does not have its own tests? Would AMD be willing to add them? |
@vargaz could you please work with Tanner on MonoVM side? /cc @lambdageek for awareness |
The repo has a couple branches. This branch does contain many of the tests and more are being added: https://github.com/amd/aocl-libm-ose/tree/aocl-3.0/gtests |
Aren't these versions going to be slower than the platform default versions, i.e. those are probably optimized with assembly etc ? |
Not all of the platform versions are implemented in assembly. On x64 Windows, we use what is effectively https://github.com/amd/win-libm in MSVCRT (aocl-libm-ose is basically the newer cross-platform version of the same library). Most of the functions aren't overly complex and really just rely on the native compiler efficiently reinterpret casting between |
I'm worried about the size impact on WebAssembly and Android if we use this in Mono. (also... we don't use LTO on most platforms - i'm unclear under what circumstances this library falls back to using the system math functions and whether its going to inhibit optmizations by calling non-intrinsic versions of them). |
@tannergooding I'm not sure I understand the motivation for this PR. The goal is that .NET will have a consistent behavior across platforms, even if that behavior differs between .NET and the platform math library? Shouldn't that be a new API surface, and not |
This only falls back to the C Runtime if a corresponding portable implementation doesn't exist in amd/aocl-libm-ose.
Yes, the intent is for .NET to provide performant and deterministic math on all platforms to help avoid issues that frequently arise from them differing between Windows, MacOS, Linux, x64, x86, ARM32, ARM64 and all valid variations there-in. Today, these differences can result in significant issues for projects like ML.NET or other math heavy libraries. We provide no guarantee that
I don't believe so. The existing functions already are portable, this is just normalizing the cross-platform behavior to help improve program consistency across platforms.
This shouldn't inhibit any optimizations. Both The implementation being normalized on is that from AMD, which is already the basis for the x64 Windows CRT implementation. The implementation is more performant in many cases than the existing x86 Windows implementation. For Linux and Mac, they each have their own math libraries, with Linux coming from |
yea that's what I'm talking about. Let's take a look at three examples:
I don't see how this is a benefit to Android or WebAssembly users. |
It already may not have agreed with Additionally, for non-Windows, we already have an indirection due to the PAL layer today. Having consistent implementations across platforms actually allows us to remove that indirection and call the functions more directly.
Yes, there will be some additional size increase to effectively duplicate lookup tables that the C Runtime may already have declared. |
What are AMD's plans for this code? Are they planning to maintain it, have CI, take contributions, etc? Right now the glibc etc we depend on is maintained, of course. It would be good if there was a public roadmap there. |
I can provide more details offline. |
79ef914
to
c745607
Compare
c745607
to
095eef0
Compare
Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it. |
This adds a new libm library that can be used by the VM (for
Math
internal calls) and by the JIT (forValueNum
constant folding).The implementations are copied nearly verbatim from https://github.com/amd/aocl-libm-ose (as per the license headers), with the only fixups being minor changes to get it building in CoreCLR and running on ARM.
This means picking up fixes or perf improvements is effectively as straightforward as copying the files from
aocl-libm-ose
into the corresponding folder in the runtime.This change would allow us to provide performant and deterministic implementation of System.Math across all our platforms without taking on the burden of creating or maintaining our own implementations.
CC. @jkotas, @jeffhandley, @danmoseley. I'd appreciate feedback here on if this approach (or a variation of it) is believed to be feasible.
I did some minor initial profiling on x64 Windows (via our Math Benchmarks in dotnet/performance) and for the methods that have switched we are on par or faster than in all cases.
For reference, x64 Windows uses what is effectively https://github.com/amd/win-libm, https://github.com/amd/aocl-libm-ose is a modernized and cross-platform variant of the same and so has some perf improvements (such as to
Acosh
) that we don't currently have.The functions that aren't ported to in this PR (such as
cos
) either only exist asassembly
and would need a C reference port (which would be contributed toamd/aocl-libm-ose
before being taken on our end) or have multiple implementations available based on what CPU features are available and so aren't yet fully portable (the same issue of ensuring a stable reference implementation exists inamd/aocl-libm-ose
would apply before taking it in dotnet/runtime).