Ban use of Math.fma across the entire codebase #12014

Merged on Dec 17, 2022 (1 commit)

@rmuir
Member

rmuir commented Dec 13, 2022

When FMA is not supported by the hardware, these methods fall back to BigDecimal usage [1] which causes them to be 2500x slower [2].

While most hardware from the last 10 years may support it, out of the box both VirtualBox and QEMU don't pass through FMA support (for the latter, at least, you can fix this with e.g. -cpu host or similar).

This creates a terrible undocumented performance trap; see [3] for an example of a 30x slowdown of an entire application. In my experience, developers are often too far detached from production reality, and that reality is that we're not deploying to MacBook Pros in production. Almost all of us are using virtualization, and we can't afford such performance traps.

It would be a practical issue too: e.g. the Policeman Jenkins instance that runs our tests currently uses VirtualBox. It would be bad for vector-search tests to suddenly get 30x slower.

We can't safely use this method anywhere, as we don't have access to check CPUID or anything to see if it will be insanely slow or not. Let's ban it completely: I'm concerned it will sneak into our codebase otherwise... it almost happened before: #10718

[1] Math.java source code
[2] Comment on JIRA issue for x86 intrinsic mentioning 2500x speedup
[3] VirtualBox bug for lack of FMA support
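
For readers unfamiliar with what is being given up: Math.fma rounds once, while a plain multiply-add rounds twice. The sketch below is only an illustration (the class and method names are made up for this example and are not part of the change). It shows the one-rounding difference that the ban trades away in exchange for predictable speed.

```java
// Illustrative only: FmaRounding, fused and unfused are hypothetical names.
final class FmaRounding {

  // Fused: a * b + c is computed exactly and rounded once.
  // On CPUs without FMA this is the call that takes the slow software path.
  static double fused(double a, double b, double c) {
    return Math.fma(a, b, c);
  }

  // Plain: the product is rounded to double, then the sum is rounded again.
  static double unfused(double a, double b, double c) {
    return a * b + c;
  }

  public static void main(String[] args) {
    double a = 1.0 + 0x1p-30;
    double b = 1.0 - 0x1p-30;
    double c = -1.0;
    // The exact product is 1 - 2^-60, which rounds to 1.0 as a double.
    // fused keeps the -2^-60 term; unfused loses it and yields 0.0.
    System.out.println("fused   = " + fused(a, b, c));
    System.out.println("unfused = " + unfused(a, b, c));
  }
}
```

Replacing a call with the plain expression costs that extra rounding step, but it runs at the same speed whether or not the CPU exposes FMA.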

When FMA is not supported by the hardware, these methods fall back to
BigDecimal usage which causes them to be 2500x slower.

While most hardware in the last 10 years may have the support, out of
box both VirtualBox and QEMU don't pass thru FMA support (for the latter
at least you can tweak it with e.g. -cpu host or similar to fix this).

This creates a terrible undocumented performance trap. Prevent it from
sneaking into our codebase.
@gsmiller
Contributor

+1, seems reasonable to me. We can always remove this ban in the future if there's a good reason, but seems reasonable to put this in place to prevent it sneaking in for now.

@rmuir
Member Author

rmuir commented Dec 13, 2022

Yeah, I think if the fallback java code was 2x, 4x, or 8x slower (like you would expect from these intrinsics), we wouldn't be having this conversation :)

@benwtrent
Member

Holy crap, creating BigDecimal and then multiplying & adding is crazy. This is a completely unacceptable fallback calculation for this method.

+1 on banning its use in the code base.

@dweiss
Contributor

dweiss commented Dec 14, 2022

I honestly don't know who can use this method without any provided cpuid check... We actually use fma in our code but do so by detecting the performance difference between a naive implementation on primitive types and Math.fma (during bootstrap). It's ugly as hell but the difference is so vast that it works. I'm not sure who'd ever gain from using the BigDecimal-based implementation...
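
A bootstrap probe along those lines might look roughly like the sketch below. This is not the code referred to in the comment; the names (FmaProbe, MulAdd, choose, probe), the iteration counts and the 4x threshold are all assumptions made for illustration. The idea is to time both variants once and keep Math.fma only if it is not dramatically slower.

```java
// Hypothetical sketch of a startup probe; not taken from any real project.
final class FmaProbe {

  interface MulAdd {
    double apply(double a, double b, double c);
  }

  static final MulAdd NAIVE = (a, b, c) -> a * b + c;
  static final MulAdd FUSED = Math::fma;

  // Written to after each timing loop so the JIT cannot discard the work.
  static volatile double sink;

  // Times a dependent chain of multiply-adds using the given implementation.
  static long timeNanos(MulAdd op, int iterations) {
    double acc = 0.0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      acc = op.apply(acc, 0.999, i);
    }
    long elapsed = System.nanoTime() - start;
    sink = acc;
    return elapsed;
  }

  // Keeps the fused variant unless it is more than maxSlowdown times slower.
  static MulAdd choose(long naiveNanos, long fusedNanos, double maxSlowdown) {
    return fusedNanos <= naiveNanos * maxSlowdown ? FUSED : NAIVE;
  }

  // Warm both paths up, then measure. The counts and threshold are arbitrary.
  static MulAdd probe() {
    int warmup = 20_000;
    int measured = 200_000;
    timeNanos(NAIVE, warmup);
    timeNanos(FUSED, warmup);
    long naive = timeNanos(NAIVE, measured);
    long fused = timeNanos(FUSED, measured);
    return choose(naive, fused, 4.0);
  }
}
```

A crude timing like this is noisy, but with a gap on the order of the 2500x mentioned above, even a generous threshold separates the two cases.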

@rmuir
Copy link
Member Author

rmuir commented Dec 14, 2022

I looked at what e.g. glibc does here as a fallback out of curiosity: for floats it is very simple (using the Dekker algorithm), but it requires changing the FP rounding mode, which you can't do in Java. For doubles it is more complicated but still no BigDecimal.
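
For context, Dekker's algorithm is usually presented as an error-free product: it returns the rounded product together with the exact rounding error, using only ordinary floating-point operations. The sketch below shows that building block in Java. It is not glibc's code and not a complete fma (the rounding-mode step mentioned above is left out, and overflow and underflow are ignored); the class and method names are made up for this example.

```java
// Illustrative sketch of Dekker's error-free product; names are hypothetical.
final class DekkerProduct {

  // Veltkamp splitting constant for doubles: 2^27 + 1.
  private static final double SPLITTER = 134217729.0;

  // Returns {hi, lo} with hi + lo == a, where each half has a short mantissa
  // so that products of halves are exact.
  static double[] split(double a) {
    double t = SPLITTER * a;
    double hi = t - (t - a);
    return new double[] {hi, a - hi};
  }

  // Returns {p, e} where p is the rounded product a * b and p + e equals the
  // exact product (assuming no overflow or underflow).
  static double[] twoProduct(double a, double b) {
    double p = a * b;
    double[] as = split(a);
    double[] bs = split(b);
    double e = ((as[0] * bs[0] - p) + as[0] * bs[1] + as[1] * bs[0]) + as[1] * bs[1];
    return new double[] {p, e};
  }
}
```

The error term e is exactly what a plain multiply throws away and a fused multiply-add keeps, and recovering it needs no BigDecimal.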

@uschindler

This comment was marked as duplicate.

@rmuir rmuir merged commit 3ac71ad into apache:main Dec 17, 2022
asfgit pushed a commit that referenced this pull request Dec 17, 2022