AArch64: Remove literal pools from native code #667

Merged Jan 17, 2025 (1 commit)

hanno-becker (Contributor) commented:

This commit removes all literal pools from the native AArch64 assembly. Those literal pools are slightly easier to read, but impede verification using HOL-Light. Instead, constant vectors are prepared by loading immediates into GPRs and copying/broadcasting them into the target vector.
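To make the description concrete, here is a minimal Python sketch that models the two ways of preparing a constant vector. It is a toy model, not code from this PR: the helper names (`mov_imm16`, `dup_8h`, `ins_h`) are hypothetical stand-ins for the AArch64 instructions `mov wN, #imm`, `dup vD.8h, wN` and `mov vD.h[i], wN`, and the constants are illustrative only, since the registers and values actually used in the assembly are not shown here.

```python
# Toy model of building a constant vector without a literal pool.
# All helper names are hypothetical; they only mimic instruction semantics.
MASK16 = 0xFFFF
LANES = 8  # a 128-bit vector viewed as 8 x 16-bit lanes (.8h)


def mov_imm16(imm):
    """Model `mov wN, #imm`: place a 16-bit immediate in a GPR."""
    if not 0 <= imm <= MASK16:
        raise ValueError("immediate does not fit in 16 bits")
    return imm


def dup_8h(gpr):
    """Model `dup vD.8h, wN`: broadcast the low 16 bits to every lane."""
    return [gpr & MASK16] * LANES


def ins_h(vec, lane, gpr):
    """Model `mov vD.h[lane], wN`: copy the low 16 bits into one lane."""
    out = list(vec)
    out[lane] = gpr & MASK16
    return out


Q = 3329  # ML-KEM modulus, used here only as a familiar example value

# What a literal pool would hold in memory: eight copies of the constant.
literal_pool_entry = [Q] * LANES

# The pool-free route: immediate -> GPR -> broadcast into the vector.
broadcast = dup_8h(mov_imm16(Q))

# A vector with distinct lanes can be filled lane by lane
# (the lane values below are hypothetical).
mixed = ins_h(ins_h(dup_8h(mov_imm16(0)), 0, mov_imm16(Q)), 1, mov_imm16(1))
```

Both routes yield the same register contents; the difference is that the pool-free route keeps every constant in the instruction stream, so no data is loaded from a memory region embedded in the code.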

@hanno-becker marked this pull request as ready for review on January 17, 2025 07:18
@hanno-becker requested a review from a team on January 17, 2025 07:18
@hanno-becker added the benchmark label (this PR should be benchmarked in CI) on Jan 17, 2025

@oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29054 cycles | 28986 cycles | 1.00 |
| ML-KEM-512 encaps | 35419 cycles | 35399 cycles | 1.00 |
| ML-KEM-512 decaps | 45892 cycles | 45896 cycles | 1.00 |
| ML-KEM-768 keypair | 49360 cycles | 49364 cycles | 1.00 |
| ML-KEM-768 encaps | 55620 cycles | 55564 cycles | 1.00 |
| ML-KEM-768 decaps | 70402 cycles | 70315 cycles | 1.00 |
| ML-KEM-1024 keypair | 72108 cycles | 71989 cycles | 1.00 |
| ML-KEM-1024 encaps | 80825 cycles | 80746 cycles | 1.00 |
| ML-KEM-1024 decaps | 100687 cycles | 100615 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
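
The Ratio column in these benchmark comments appears consistent with current cycles divided by previous cycles, shown to two decimals, so values below 1.00 mean the new commit is faster. The following sketch recomputes it for two of the Cortex-A76 rows above; the function name `ratio` is hypothetical and the rounding rule is an assumption inferred from the reported numbers, not taken from the benchmark tool's documentation.

```python
def ratio(current_cycles, previous_cycles):
    """Return current/previous; below 1.0 means fewer cycles than before."""
    return current_cycles / previous_cycles


# (current, previous) cycle counts copied from the Cortex-A76 comment.
rows = {
    "ML-KEM-512 keypair": (29054, 28986),
    "ML-KEM-1024 decaps": (100687, 100615),
}

for name, (cur, prev) in rows.items():
    r = ratio(cur, prev)
    # Two-decimal ratio as in the table, plus the relative change in percent.
    print(f"{name}: ratio {r:.2f}, change {100 * (r - 1):+.2f}%")
```

At two decimals, differences below roughly half a percent all display as 1.00, which is why most rows look identical even though the raw cycle counts differ slightly.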

@oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18134 cycles | 18102 cycles | 1.00 |
| ML-KEM-512 encaps | 23045 cycles | 23015 cycles | 1.00 |
| ML-KEM-512 decaps | 30254 cycles | 30251 cycles | 1.00 |
| ML-KEM-768 keypair | 31159 cycles | 31124 cycles | 1.00 |
| ML-KEM-768 encaps | 33948 cycles | 33998 cycles | 1.00 |
| ML-KEM-768 decaps | 44584 cycles | 44518 cycles | 1.00 |
| ML-KEM-1024 keypair | 44727 cycles | 44599 cycles | 1.00 |
| ML-KEM-1024 encaps | 49953 cycles | 49893 cycles | 1.00 |
| ML-KEM-1024 decaps | 64420 cycles | 64399 cycles | 1.00 |

@oqs-bot left a comment

Graviton2

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 29051 cycles | 28986 cycles | 1.00 |
| ML-KEM-512 encaps | 35402 cycles | 35425 cycles | 1.00 |
| ML-KEM-512 decaps | 45900 cycles | 45888 cycles | 1.00 |
| ML-KEM-768 keypair | 49379 cycles | 49378 cycles | 1.00 |
| ML-KEM-768 encaps | 55642 cycles | 55565 cycles | 1.00 |
| ML-KEM-768 decaps | 70454 cycles | 70311 cycles | 1.00 |
| ML-KEM-1024 keypair | 72117 cycles | 71969 cycles | 1.00 |
| ML-KEM-1024 encaps | 80849 cycles | 80763 cycles | 1.00 |
| ML-KEM-1024 decaps | 100733 cycles | 100630 cycles | 1.00 |

@oqs-bot left a comment

Graviton3

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18954 cycles | 18959 cycles | 1.00 |
| ML-KEM-512 encaps | 23562 cycles | 23572 cycles | 1.00 |
| ML-KEM-512 decaps | 30704 cycles | 30660 cycles | 1.00 |
| ML-KEM-768 keypair | 32312 cycles | 32312 cycles | 1 |
| ML-KEM-768 encaps | 35873 cycles | 35886 cycles | 1.00 |
| ML-KEM-768 decaps | 46023 cycles | 46021 cycles | 1.00 |
| ML-KEM-1024 keypair | 46558 cycles | 46634 cycles | 1.00 |
| ML-KEM-1024 encaps | 52442 cycles | 52456 cycles | 1.00 |
| ML-KEM-1024 decaps | 66253 cycles | 66268 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 43354 cycles | 43336 cycles | 1.00 |
| ML-KEM-512 encaps | 51813 cycles | 51844 cycles | 1.00 |
| ML-KEM-512 decaps | 67069 cycles | 67031 cycles | 1.00 |
| ML-KEM-768 keypair | 71673 cycles | 71630 cycles | 1.00 |
| ML-KEM-768 encaps | 82778 cycles | 82693 cycles | 1.00 |
| ML-KEM-768 decaps | 102937 cycles | 103012 cycles | 1.00 |
| ML-KEM-1024 keypair | 106866 cycles | 106602 cycles | 1.00 |
| ML-KEM-1024 encaps | 121161 cycles | 121422 cycles | 1.00 |
| ML-KEM-1024 decaps | 147008 cycles | 146875 cycles | 1.00 |

@oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39345 cycles | 39365 cycles | 1.00 |
| ML-KEM-512 encaps | 45463 cycles | 45441 cycles | 1.00 |
| ML-KEM-512 decaps | 57383 cycles | 57491 cycles | 1.00 |
| ML-KEM-768 keypair | 65845 cycles | 65827 cycles | 1.00 |
| ML-KEM-768 encaps | 73795 cycles | 73817 cycles | 1.00 |
| ML-KEM-768 decaps | 89870 cycles | 89874 cycles | 1.00 |
| ML-KEM-1024 keypair | 98974 cycles | 98958 cycles | 1.00 |
| ML-KEM-1024 encaps | 110054 cycles | 110050 cycles | 1.00 |
| ML-KEM-1024 decaps | 130823 cycles | 130832 cycles | 1.00 |

@oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 60705 cycles | 60686 cycles | 1.00 |
| ML-KEM-512 encaps | 69882 cycles | 69803 cycles | 1.00 |
| ML-KEM-512 decaps | 88684 cycles | 88744 cycles | 1.00 |
| ML-KEM-768 keypair | 101829 cycles | 101761 cycles | 1.00 |
| ML-KEM-768 encaps | 113963 cycles | 113894 cycles | 1.00 |
| ML-KEM-768 decaps | 139384 cycles | 139325 cycles | 1.00 |
| ML-KEM-1024 keypair | 154338 cycles | 154153 cycles | 1.00 |
| ML-KEM-1024 encaps | 170098 cycles | 169847 cycles | 1.00 |
| ML-KEM-1024 decaps | 202520 cycles | 202209 cycles | 1.00 |

@mkannwischer added and removed the benchmark label (this PR should be benchmarked in CI) on Jan 17, 2025

@mkannwischer (Contributor) left a comment

Changes look good. I'd like to see one complete benchmarking run after your last force push just to be sure.

@oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 13526 cycles | 13975 cycles | 0.97 |
| ML-KEM-512 encaps | 17268 cycles | 17236 cycles | 1.00 |
| ML-KEM-512 decaps | 22793 cycles | 23052 cycles | 0.99 |
| ML-KEM-768 keypair | 22474 cycles | 22520 cycles | 1.00 |
| ML-KEM-768 encaps | 24503 cycles | 24524 cycles | 1.00 |
| ML-KEM-768 decaps | 32399 cycles | 32548 cycles | 1.00 |
| ML-KEM-1024 keypair | 31487 cycles | 31383 cycles | 1.00 |
| ML-KEM-1024 encaps | 35012 cycles | 34928 cycles | 1.00 |
| ML-KEM-1024 decaps | 45965 cycles | 45798 cycles | 1.00 |

@oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 33160 cycles | 33236 cycles | 1.00 |
| ML-KEM-512 encaps | 38697 cycles | 38554 cycles | 1.00 |
| ML-KEM-512 decaps | 50864 cycles | 50857 cycles | 1.00 |
| ML-KEM-768 keypair | 54850 cycles | 54878 cycles | 1.00 |
| ML-KEM-768 encaps | 60668 cycles | 60646 cycles | 1.00 |
| ML-KEM-768 decaps | 75867 cycles | 75857 cycles | 1.00 |
| ML-KEM-1024 keypair | 81837 cycles | 81938 cycles | 1.00 |
| ML-KEM-1024 encaps | 91696 cycles | 91771 cycles | 1.00 |
| ML-KEM-1024 decaps | 111498 cycles | 111446 cycles | 1.00 |

@oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 20348 cycles | 20350 cycles | 1.00 |
| ML-KEM-512 encaps | 26955 cycles | 26953 cycles | 1.00 |
| ML-KEM-512 decaps | 36009 cycles | 35746 cycles | 1.01 |
| ML-KEM-768 keypair | 34900 cycles | 34886 cycles | 1.00 |
| ML-KEM-768 encaps | 38181 cycles | 38182 cycles | 1.00 |
| ML-KEM-768 decaps | 50956 cycles | 50946 cycles | 1.00 |
| ML-KEM-1024 keypair | 47961 cycles | 47950 cycles | 1.00 |
| ML-KEM-1024 encaps | 54110 cycles | 54099 cycles | 1.00 |
| ML-KEM-1024 decaps | 71576 cycles | 71603 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 14920 cycles | 14915 cycles | 1.00 |
| ML-KEM-512 encaps | 19650 cycles | 19645 cycles | 1.00 |
| ML-KEM-512 decaps | 26298 cycles | 26297 cycles | 1.00 |
| ML-KEM-768 keypair | 25599 cycles | 25589 cycles | 1.00 |
| ML-KEM-768 encaps | 28060 cycles | 28078 cycles | 1.00 |
| ML-KEM-768 decaps | 37929 cycles | 37792 cycles | 1.00 |
| ML-KEM-1024 keypair | 35679 cycles | 35753 cycles | 1.00 |
| ML-KEM-1024 encaps | 40966 cycles | 40945 cycles | 1.00 |
| ML-KEM-1024 decaps | 54445 cycles | 54417 cycles | 1.00 |

@oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 51411 cycles | 51419 cycles | 1.00 |
| ML-KEM-512 encaps | 59575 cycles | 59547 cycles | 1.00 |
| ML-KEM-512 decaps | 76550 cycles | 76562 cycles | 1.00 |
| ML-KEM-768 keypair | 84259 cycles | 84278 cycles | 1.00 |
| ML-KEM-768 encaps | 94964 cycles | 94991 cycles | 1.00 |
| ML-KEM-768 decaps | 117113 cycles | 117180 cycles | 1.00 |
| ML-KEM-1024 keypair | 124532 cycles | 124782 cycles | 1.00 |
| ML-KEM-1024 encaps | 138737 cycles | 138756 cycles | 1.00 |
| ML-KEM-1024 decaps | 167364 cycles | 167416 cycles | 1.00 |

@oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39585 cycles | 39375 cycles | 1.01 |
| ML-KEM-512 encaps | 45641 cycles | 45586 cycles | 1.00 |
| ML-KEM-512 decaps | 59101 cycles | 59050 cycles | 1.00 |
| ML-KEM-768 keypair | 64510 cycles | 64596 cycles | 1.00 |
| ML-KEM-768 encaps | 73138 cycles | 72846 cycles | 1.00 |
| ML-KEM-768 decaps | 91091 cycles | 91403 cycles | 1.00 |
| ML-KEM-1024 keypair | 96044 cycles | 95969 cycles | 1.00 |
| ML-KEM-1024 encaps | 107178 cycles | 107130 cycles | 1.00 |
| ML-KEM-1024 decaps | 130762 cycles | 130669 cycles | 1.00 |

@oqs-bot left a comment

Graviton4

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18126 cycles | 18116 cycles | 1.00 |
| ML-KEM-512 encaps | 22175 cycles | 22178 cycles | 1.00 |
| ML-KEM-512 decaps | 28835 cycles | 28839 cycles | 1.00 |
| ML-KEM-768 keypair | 30559 cycles | 30560 cycles | 1.00 |
| ML-KEM-768 encaps | 33617 cycles | 33637 cycles | 1.00 |
| ML-KEM-768 decaps | 43168 cycles | 43158 cycles | 1.00 |
| ML-KEM-1024 keypair | 44169 cycles | 44163 cycles | 1.00 |
| ML-KEM-1024 encaps | 49653 cycles | 49653 cycles | 1 |
| ML-KEM-1024 decaps | 62606 cycles | 62642 cycles | 1.00 |

@oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 37985 cycles | 38042 cycles | 1.00 |
| ML-KEM-512 encaps | 43306 cycles | 43379 cycles | 1.00 |
| ML-KEM-512 decaps | 55588 cycles | 55553 cycles | 1.00 |
| ML-KEM-768 keypair | 63189 cycles | 63014 cycles | 1.00 |
| ML-KEM-768 encaps | 70437 cycles | 70323 cycles | 1.00 |
| ML-KEM-768 decaps | 86874 cycles | 86776 cycles | 1.00 |
| ML-KEM-1024 keypair | 94505 cycles | 94468 cycles | 1.00 |
| ML-KEM-1024 encaps | 105259 cycles | 105175 cycles | 1.00 |
| ML-KEM-1024 decaps | 126500 cycles | 126797 cycles | 1.00 |

This commit removes all literal pools from the native AArch64 assembly.
Those literal pools are slightly easier to read, but impede verification
using HOL-Light. Instead, constant vectors are prepared by loading immediates
into GPRs and copying/broadcasting them into the target vector.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker added and removed the benchmark label (this PR should be benchmarked in CI) on Jan 17, 2025

@oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 58346 cycles | 58327 cycles | 1.00 |
| ML-KEM-512 encaps | 65747 cycles | 65795 cycles | 1.00 |
| ML-KEM-512 decaps | 84554 cycles | 84608 cycles | 1.00 |
| ML-KEM-768 keypair | 98987 cycles | 99034 cycles | 1.00 |
| ML-KEM-768 encaps | 110142 cycles | 110325 cycles | 1.00 |
| ML-KEM-768 decaps | 137284 cycles | 137025 cycles | 1.00 |
| ML-KEM-1024 keypair | 150235 cycles | 150344 cycles | 1.00 |
| ML-KEM-1024 encaps | 166495 cycles | 166740 cycles | 1.00 |
| ML-KEM-1024 decaps | 202438 cycles | 202805 cycles | 1.00 |

@oqs-bot left a comment

Bananapi bpi-f3 benchmarks

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 334919 cycles | 334942 cycles | 1.00 |
| ML-KEM-512 encaps | 443623 cycles | 443716 cycles | 1.00 |
| ML-KEM-512 decaps | 591660 cycles | 591749 cycles | 1.00 |
| ML-KEM-768 keypair | 559257 cycles | 559270 cycles | 1.00 |
| ML-KEM-768 encaps | 697687 cycles | 697687 cycles | 1 |
| ML-KEM-768 decaps | 890117 cycles | 890201 cycles | 1.00 |
| ML-KEM-1024 keypair | 828054 cycles | 828159 cycles | 1.00 |
| ML-KEM-1024 encaps | 1000522 cycles | 999913 cycles | 1.00 |
| ML-KEM-1024 decaps | 1231962 cycles | 1232943 cycles | 1.00 |

@oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

| Benchmark suite | Current: 7d15fdb | Previous: e4ff720 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 51624 cycles | 51568 cycles | 1.00 |
| ML-KEM-512 encaps | 58057 cycles | 58005 cycles | 1.00 |
| ML-KEM-512 decaps | 74302 cycles | 74147 cycles | 1.00 |
| ML-KEM-768 keypair | 88316 cycles | 87902 cycles | 1.00 |
| ML-KEM-768 encaps | 97274 cycles | 96043 cycles | 1.01 |
| ML-KEM-768 decaps | 119245 cycles | 119330 cycles | 1.00 |
| ML-KEM-1024 keypair | 131103 cycles | 131867 cycles | 0.99 |
| ML-KEM-1024 encaps | 144551 cycles | 145139 cycles | 1.00 |
| ML-KEM-1024 decaps | 177206 cycles | 176050 cycles | 1.01 |

@hanno-becker merged commit c79b97a into main on Jan 17, 2025
156 checks passed
@hanno-becker deleted the asm_no_consts branch on January 17, 2025 13:25