-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement RPO hash using AVX2 instructions #234
Conversation
Thank you! Great results! A couple of preliminary comments/questions:
|
|
Thank you! Re point 2: I agree with you - let's not introduce the extra feature (and I'll remove the |
I rebased this PR onto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you! I left a couple of comments inline - mostly about code organization.
still need a few days to dive into Hermez.
Yeah - curious what you think. My understanding is that they were able to get something like 3x - 4x speed-up for Poseidon. Though, they context for hashing is somewhat different, I believe.
I looked into Hermez and their AVX2 implementation looks almost identical to the implementation in this PR. Their drastic improvement compared to scalar is most likely due to a worse scalar/baseline implementation, but I didn't go as far as reimplementing Poseidon using our Rust primitives or reimplementing RPO using their C++ primitives. Hermez does score a ~20% win by using AVX512 but it is because Poseidon can make better use of huge registers. Their AVX512 interleaves 3 multiplications of 8-element u64 vectors. Since we only operate on 12 elements, throughput benefits are lost to higher latency of AVX512 operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! And thank you fro the explanation! I left one minor comment inline. After it is addressed, we can merge.
src/hash/rescue/arch/mod.rs
Outdated
#[cfg(target_feature = "avx2")] | ||
pub mod x86_64_avx2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does this need to be public?
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good! Thank you!
It improves performance of the RPO hash function by leveraging AVX2 instructions on compatible x86_64 hardware (code based on plonky2: https://github.com/0xPolygonZero/plonky2/blob/main/plonky2/src/hash/arch/x86_64/poseidon_goldilocks_avx2_bmi2.rs). On AWS' Zen4 instance (c7a), I'm measuring 40% improvement against the baseline commit c86bdc6.
To leverage AVX2 implementation, code needs to be compiled with
RUSTFLAGS="-C target-feature=+avx2"
. Despite availability of AVX512 with larger register sizes, I wasn't able to leverage it for any additional speedups.Below
RUSTFLAGS="-C target-feature=+avx2" cargo bench
improvement against the baselinecargo bench
.Checklist before requesting a review
next
according to naming convention.