feat: implement RPO hash using AVX2 instructions #234

gswirski · 2023-12-08T19:16:46Z

This PR improves the performance of the RPO hash function by leveraging AVX2 instructions on compatible x86_64 hardware (code based on plonky2: https://github.com/0xPolygonZero/plonky2/blob/main/plonky2/src/hash/arch/x86_64/poseidon_goldilocks_avx2_bmi2.rs). On AWS's Zen 4 instance (c7a), I'm measuring a 40% improvement against the baseline commit c86bdc6.

To leverage the AVX2 implementation, the code needs to be compiled with RUSTFLAGS="-C target-feature=+avx2". Despite the availability of AVX512 with its larger registers, I wasn't able to leverage it for any additional speedup.
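A minimal sketch (not part of this PR) of how to check which situation a given build is in: whether AVX2 was enabled at compile time, and whether the CPU running the binary reports AVX2 at run time. The function names `avx2_compiled_in` and `avx2_on_this_cpu` are hypothetical and chosen only for illustration.

```rust
/// True when the crate was compiled with the AVX2 target feature enabled,
/// e.g. via RUSTFLAGS="-C target-feature=+avx2" or, on an AVX2-capable
/// machine, RUSTFLAGS="-C target-cpu=native".
/// Hypothetical helper for illustration; not taken from the PR.
pub fn avx2_compiled_in() -> bool {
    cfg!(target_feature = "avx2")
}

/// True when the CPU executing this binary reports AVX2 support.
/// Run-time detection is only meaningful on x86 / x86_64 targets.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx2_on_this_cpu() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On every other architecture AVX2 cannot be present.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn avx2_on_this_cpu() -> bool {
    false
}
```

If `avx2_compiled_in()` returns false while `avx2_on_this_cpu()` returns true, the hardware supports AVX2 but the build did not enable it, which is the case the RUSTFLAGS setting above addresses.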

Below is the improvement of RUSTFLAGS="-C target-feature=+avx2" cargo bench over the baseline cargo bench.

Checklist before requesting a review

Repo forked and branch created from next according to naming convention.
Commit messages and codestyle follow conventions.
Relevant issues are linked in the PR description.
Tests added for new functionality.
Documentation/comments updated according to changes.

bobbinth · 2023-12-10T06:25:44Z

Thank you! Great results! A couple of preliminary comments/questions:

Could you apply these changes to the next branch? The structure of the code there has changed a bit - but I think it shouldn't cause any issues for this PR.
Would it make sense to add an avx512 feature (similar to how we did it with sve) so that we have the option to disable the optimizations on platforms with AVX512 support?
I'm curious, how does this approach compare to the one used by Polygon Hermez here?

gswirski · 2023-12-12T14:50:08Z

Sure thing! I’ll rebase onto next.
I defer to you, but my recommendation would be to get rid of these features (including sve). We already have a way of controlling optimizations:
- enable: RUSTFLAGS="-C target-feature=+avx2"
- disable: RUSTFLAGS="-C target-feature=-sve"
- get the best version possible: RUSTFLAGS="-C target-cpu=native"
Adding more flags can cause people to miss these important optimizations and even lead to bit rot (e.g. we accidentally stop executing tests for optimized versions).
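To illustrate the point about relying on target features rather than Cargo features, here is a minimal sketch of compile-time selection between a vectorized and a scalar code path. The module name `imp`, the function `add4`, and the constant `NAME` are hypothetical, and the operation is a plain wrapping u64 addition rather than the field arithmetic RPO actually needs.

```rust
// Compiled only when the whole build has AVX2 enabled on x86_64,
// e.g. with RUSTFLAGS="-C target-feature=+avx2".
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
mod imp {
    use core::arch::x86_64::{
        __m256i, _mm256_add_epi64, _mm256_loadu_si256, _mm256_storeu_si256,
    };

    pub const NAME: &str = "avx2";

    /// Lane-wise wrapping addition of four u64 values in one 256-bit register.
    pub fn add4(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
        let mut out = [0u64; 4];
        // SAFETY: this module only exists when AVX2 is enabled for the build,
        // and the unaligned load/store each touch exactly 32 bytes of an array.
        unsafe {
            let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
            let sum = _mm256_add_epi64(va, vb);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
        }
        out
    }
}

// Scalar fallback, selected automatically when AVX2 is not enabled.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
mod imp {
    pub const NAME: &str = "scalar";

    /// Lane-wise wrapping addition of four u64 values, one element at a time.
    pub fn add4(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
        core::array::from_fn(|i| a[i].wrapping_add(b[i]))
    }
}

// Callers see a single API regardless of which module was compiled in.
pub use imp::{add4, NAME};
```

Because both modules expose the same items, the same tests exercise whichever path the compiler flags select, with no extra Cargo feature to remember.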

As a side note, this PR only uses avx2 which has much better compatibility with chips on the market (compared to avx512).
I have not seen the Hermez implementation before. They do actually use AVX512 and its larger 512-bit registers - I’ll benchmark their code to see if they do better than this PR.

bobbinth · 2023-12-12T20:18:15Z

Thank you!

Re point 2: I agree with you - let's not introduce the extra feature (and I'll remove the sve feature later on in a different PR).

gswirski · 2023-12-17T21:01:08Z

I rebased this PR onto next, but still need a few days to dive into Hermez.

bobbinth

Looks good! Thank you! I left a couple of comments inline - mostly about code organization.

still need a few days to dive into Hermez.

Yeah - curious what you think. My understanding is that they were able to get something like a 3x-4x speed-up for Poseidon. Though the context for hashing is somewhat different, I believe.

src/hash/rescue/mod.rs

src/arch/x86_64_avx2/mod.rs

gswirski · 2023-12-20T21:37:34Z

I looked into Hermez and their AVX2 implementation looks almost identical to the implementation in this PR. Their drastic improvement compared to scalar is most likely due to a worse scalar/baseline implementation, but I didn't go as far as reimplementing Poseidon using our Rust primitives or reimplementing RPO using their C++ primitives.

Hermez does score a ~20% win by using AVX512 but it is because Poseidon can make better use of huge registers. Their AVX512 interleaves 3 multiplications of 8-element u64 vectors. Since we only operate on 12 elements, throughput benefits are lost to higher latency of AVX512 operations.
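The register-width argument can be made concrete with a little lane arithmetic. This is only a sketch: it counts how many 64-bit lanes a 12-element state occupies at each register width and says nothing about instruction latency or throughput. `STATE_WIDTH` is taken from the "12 elements" mentioned above; the helper names are hypothetical.

```rust
/// Number of field elements in the state, per the discussion above.
pub const STATE_WIDTH: usize = 12;

/// How many 64-bit lanes fit in a SIMD register of the given width.
pub const fn lanes(register_bits: usize) -> usize {
    register_bits / 64
}

/// Registers needed to hold the whole state (ceiling division).
pub const fn registers_needed(lanes_per_register: usize) -> usize {
    (STATE_WIDTH + lanes_per_register - 1) / lanes_per_register
}

/// Lanes left unused once the state is spread across those registers.
pub const fn idle_lanes(lanes_per_register: usize) -> usize {
    registers_needed(lanes_per_register) * lanes_per_register - STATE_WIDTH
}
```

With 256-bit AVX2 registers the state fills three registers exactly; with 512-bit AVX512 registers it needs two, and half of the second register is unused.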

bobbinth

Looks good! And thank you for the explanation! I left one minor comment inline. After it is addressed, we can merge.

bobbinth · 2023-12-20T21:30:11Z

src/hash/rescue/arch/mod.rs

+#[cfg(target_feature = "avx2")]
+pub mod x86_64_avx2;


nit: does this need to be public?

sonarcloud · 2024-01-04T18:09:15Z

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication


bobbinth

All looks good! Thank you!

gswirski force-pushed the avx branch from 798fc36 to d4de5a7 Compare December 8, 2023 20:00

gswirski force-pushed the avx branch from d4de5a7 to f5c871a Compare December 17, 2023 19:03

gswirski changed the base branch from main to next December 17, 2023 19:03

gswirski force-pushed the avx branch from f5c871a to 9034666 Compare December 17, 2023 20:33

bobbinth requested changes Dec 19, 2023


gswirski force-pushed the avx branch from 9034666 to 6cddde7 Compare December 20, 2023 21:22

bobbinth approved these changes Dec 20, 2023


bobbinth force-pushed the next branch from 0d1f28c to 499f970 Compare December 21, 2023 08:20

gswirski force-pushed the avx branch from 6cddde7 to 014d231 Compare January 4, 2024 17:50

feat: use AVX2 instructions whenever available

88bcdfd

gswirski force-pushed the avx branch from 014d231 to 88bcdfd Compare January 4, 2024 18:08

bobbinth approved these changes Jan 4, 2024


bobbinth merged commit 862ccf5 into 0xPolygonMiden:next Jan 4, 2024
10 checks passed

bobbinth mentioned this pull request Jan 4, 2024

Investigate AVX512 acceleration for RPO hash function #198

Closed
