GH-43693: [C++][Acero] Support AVX2 swiss join decoding #43832
@github-actions crossbow submit -g cpp
@ursabot please benchmark
Benchmark runs are scheduled for commit e2af277. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Revision: e2af277 Submitted crossbow builds: ursacomputing/crossbow @ actions-1db8e52faf
Thanks for your patience. Conbench analyzed the 2 benchmarking runs that have been run so far on PR commit e2af277. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
@zanmato1984 might wanna take a second look at that merge/rebase
Apologies to anyone who was mistakenly pulled in by my careless merge/rebase. I'm cleaning up my branch now. Sorry :(
672d33b to afb3117
The merge/rebase has now been fixed.
Rationale for this change
You can find the background in #43693.
By looking at how the non-SIMD counterparts (Visit/VisitNulls) of Visit_avx2/VisitNulls_avx2 are used, I found that they are used solely for decoding rows from the build side of the join. So I added AVX2 versions of those decoding methods and wired up Visit_avx2/VisitNulls_avx2.
What changes are included in this PR?
- Visit*_avx2 functions to decode fixed-length/offsets/var-length/nulls of the row table.
- Wiring of the Visit*_avx2 functions.
Are these changes tested?
No new tests needed.
The benchmarking results are a bit complicated, so I put them in comment #43832 (comment).
Are there any user-facing changes?
No changes other than improved performance. Users can expect this improvement in hash-join-related workloads. However, the degree of improvement depends heavily on both the workload and the CPU model. On Intel CPUs from Skylake to Icelake/Tigerlake, where AVX2 gather performance is degraded by one of Intel's vulnerability mitigations (detailed in #43832 (comment)), the improvement is less significant: single-digit percent. Other models, e.g. AMD and the most recent Intel CPUs, can see improvements of up to 30%.
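For readers unfamiliar with what "decoding fixed-length rows with AVX2 gather" looks like, here is a minimal sketch. It is not the code from this PR: the function names (DecodeFixed32Scalar, DecodeFixed32Avx2, DecodeFixed32), the macro SKETCH_HAS_AVX2_TARGET, and the simplified row layout (a flat buffer of equal-width rows with one 32-bit field at a fixed byte offset) are all assumptions made for illustration. It also assumes GCC or Clang on x86-64 for the AVX2 path and falls back to the scalar loop elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAS_AVX2_TARGET 1  // hypothetical macro, only for this sketch
#endif

// Scalar reference: copy one 32-bit field out of each selected row.
// `rows` is a flat row-major buffer, `row_width` is the size of a row in
// bytes, and `field_offset` is the byte offset of the field inside a row.
void DecodeFixed32Scalar(const uint8_t* rows, uint32_t row_width,
                         uint32_t field_offset, const uint32_t* row_ids,
                         int num_rows, uint32_t* out) {
  for (int i = 0; i < num_rows; ++i) {
    const std::size_t pos =
        static_cast<std::size_t>(row_ids[i]) * row_width + field_offset;
    std::memcpy(&out[i], rows + pos, sizeof(uint32_t));
  }
}

#ifdef SKETCH_HAS_AVX2_TARGET
// AVX2 variant: handles 8 rows per iteration with one gather instruction.
// Returns the number of rows processed (a multiple of 8); the caller
// finishes the tail with the scalar loop.
// Assumption: every byte offset fits in a signed 32-bit integer.
__attribute__((target("avx2")))
int DecodeFixed32Avx2(const uint8_t* rows, uint32_t row_width,
                      uint32_t field_offset, const uint32_t* row_ids,
                      int num_rows, uint32_t* out) {
  const __m256i width = _mm256_set1_epi32(static_cast<int>(row_width));
  const __m256i base = _mm256_set1_epi32(static_cast<int>(field_offset));
  int i = 0;
  for (; i + 8 <= num_rows; i += 8) {
    const __m256i ids =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(row_ids + i));
    // byte offset of the field in each row: id * row_width + field_offset
    const __m256i byte_off =
        _mm256_add_epi32(_mm256_mullo_epi32(ids, width), base);
    // scale = 1, so the indices are interpreted as byte offsets
    const __m256i vals = _mm256_i32gather_epi32(
        reinterpret_cast<const int*>(rows), byte_off, 1);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), vals);
  }
  return i;
}
#endif

// Dispatcher: use the AVX2 path when the CPU supports it, then let the
// scalar loop handle whatever is left (or everything, without AVX2).
void DecodeFixed32(const uint8_t* rows, uint32_t row_width,
                   uint32_t field_offset, const uint32_t* row_ids,
                   int num_rows, uint32_t* out) {
  assert(num_rows >= 0);
  int done = 0;
#ifdef SKETCH_HAS_AVX2_TARGET
  if (__builtin_cpu_supports("avx2")) {
    done = DecodeFixed32Avx2(rows, row_width, field_offset, row_ids,
                             num_rows, out);
  }
#endif
  DecodeFixed32Scalar(rows, row_width, field_offset, row_ids + done,
                      num_rows - done, out + done);
}
```

The gather instruction is the part whose throughput varies across CPU models, which is one plausible reading of why the measured improvement differs so much between the Intel and AMD machines mentioned above.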