-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-12343: [Rust] Support auto-vectorization for min/max #10002
Conversation
holy molly, 6 lines of code. Brilliant. How does it compare with our SIMD implementation? (evaluating whether they are needed at all :P) |
It's definitely not as good as when compiling with the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great idea @Dandandan
I ran the benchmark tests on my laptop and I saw similar improvement numbers (master is 30% slower than this branch).
I vote we
Given how effective this is, perhaps we can add a similar thing to other kernels (I could file a ticket if so)
FYI @andygrove and @nevi-me
I 100% agree with this sentiment. |
I think it can be useful to check which kernels / functions can benefit from some more simd instructions. We might also need to play a bit with things like As we are using packed_simd2, which also uses only the standard instructions set in |
@nevi-me / @jorgecarleitao any concerns about merging this PR? Only the new dependency is of potential concern to me, but it seems to me to be a pretty reasonable tradeoff We could probably also just inline the part of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Only the new dependency is of potential concern to me, but it seems to me to be a pretty reasonable tradeoff
multiversion is pretty small (not many dependencies) and looks like it could be quite useful in general.
From their README:
Function multiversioning is the practice of compiling multiple versions of a function with various features enabled and safely detecting which version to use at runtime.
My only concern would be on binary size if we really go crazy using it but we can monitor that over time.
The multiversion crate seems really handy! Are the tradeoffs just the expected bump in binary size to support the multiple implementations and presumably a branch when entering a multi-target code path? Is there any facility to remove the runtime checks, e.g. if I know I'm producing builds for say aarch64? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the change, I curiously tried this on aarch64, but because that arch isn't stable, it can't work on stable :(
It sounds to me like there is consensus on this approach. I'll plan to merge it in as soon as the 4.0 release is out the door (ETA early next week) so as not to potentially introduce instability into the release process at this stage |
AFAIK the architecture is static/compile time, so it will not be included in the checks. If the architecture is not the target architecture, it shouldn't include the branches for the features and in the generated code.
Agreed. Good to monitor this and try to keep size down 👍 |
The Apache Arrow Rust community is moving the Rust implementation into its own dedicated github repositories arrow-rs and arrow-datafusion. It is likely we will not merge this PR into this repository Please see the mailing-list thread for more details We expect the process to take a few days and will follow up with a migration plan for the in-flight PRs. |
Opened in new repo apache/arrow-rs#9 |
This is a PoC to get some more SIMD instructions in Arrow without requiring the nightly compiler or producing not-portable binaries, inspired by this article shared by @jorgecarleitao :
https://www.nickwilcox.com/blog/autovec2/
This allows dispatching on CPU features, letting the compiler auto-vectorize the code without losing portability or introducing unsafe blocks. Here we use the multiversion crate to easily add more compiled versions with different cpu features.
On my machine: