Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce separate source files for compression, sampling, and arithmetic #674

Merged
merged 2 commits into from
Jan 22, 2025

Conversation

hanno-becker
Copy link
Contributor

@hanno-becker hanno-becker commented Jan 21, 2025

This PR merges cbd.* and rej_uniform.* into sampling.*, now containing all generic sampling code. It also moves ntt.* and reduce.h into poly.*, now containing all arithmetic. Finally, compression-related routines are moved from poly.* into newly created compress.*.

@hanno-becker
Copy link
Contributor Author

@mkannwischer Request-for-comment. If we do this, we should also find a suitable new name for polyvec.* which is a misnomer by now.

@hanno-becker hanno-becker changed the title Reorganize files Introduce separate source files for compression, sampling, and arithmetic Jan 21, 2025
@hanno-becker hanno-becker force-pushed the sampling branch 3 times, most recently from 87ccb79 to c489500 Compare January 21, 2025 06:18
@hanno-becker hanno-becker marked this pull request as ready for review January 21, 2025 06:48
@hanno-becker hanno-becker requested a review from a team January 21, 2025 06:48
@mkannwischer mkannwischer added the benchmark this PR should be benchmarked in CI label Jan 21, 2025
Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 29056 cycles 29082 cycles 1.00
ML-KEM-512 encaps 35420 cycles 35434 cycles 1.00
ML-KEM-512 decaps 45897 cycles 45904 cycles 1.00
ML-KEM-768 keypair 49310 cycles 49301 cycles 1.00
ML-KEM-768 encaps 55549 cycles 55593 cycles 1.00
ML-KEM-768 decaps 70331 cycles 70352 cycles 1.00
ML-KEM-1024 keypair 71998 cycles 72021 cycles 1.00
ML-KEM-1024 encaps 80725 cycles 80731 cycles 1.00
ML-KEM-1024 decaps 100641 cycles 100643 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 13809 cycles 13511 cycles 1.02
ML-KEM-512 encaps 17583 cycles 17292 cycles 1.02
ML-KEM-512 decaps 23426 cycles 22847 cycles 1.03
ML-KEM-768 keypair 23221 cycles 22509 cycles 1.03
ML-KEM-768 encaps 25143 cycles 24493 cycles 1.03
ML-KEM-768 decaps 33670 cycles 32433 cycles 1.04
ML-KEM-1024 keypair 32105 cycles 31544 cycles 1.02
ML-KEM-1024 encaps 35649 cycles 34858 cycles 1.02
ML-KEM-1024 decaps 47128 cycles 45807 cycles 1.03

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-768 keypair 23221 cycles 22509 cycles 1.03
ML-KEM-768 decaps 33670 cycles 32433 cycles 1.04

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 33084 cycles 33248 cycles 1.00
ML-KEM-512 encaps 38736 cycles 38603 cycles 1.00
ML-KEM-512 decaps 50271 cycles 49993 cycles 1.01
ML-KEM-768 keypair 55241 cycles 54116 cycles 1.02
ML-KEM-768 encaps 62290 cycles 61116 cycles 1.02
ML-KEM-768 decaps 77075 cycles 75823 cycles 1.02
ML-KEM-1024 keypair 82731 cycles 82438 cycles 1.00
ML-KEM-1024 encaps 93654 cycles 92939 cycles 1.01
ML-KEM-1024 decaps 114159 cycles 111511 cycles 1.02

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 58250 cycles 58329 cycles 1.00
ML-KEM-512 encaps 65708 cycles 65721 cycles 1.00
ML-KEM-512 decaps 84411 cycles 84520 cycles 1.00
ML-KEM-768 keypair 98917 cycles 98937 cycles 1.00
ML-KEM-768 encaps 110351 cycles 110308 cycles 1.00
ML-KEM-768 decaps 136550 cycles 136632 cycles 1.00
ML-KEM-1024 keypair 150066 cycles 150103 cycles 1.00
ML-KEM-1024 encaps 166118 cycles 166430 cycles 1.00
ML-KEM-1024 decaps 202011 cycles 202325 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 18048 cycles 18112 cycles 1.00
ML-KEM-512 encaps 22851 cycles 22987 cycles 0.99
ML-KEM-512 decaps 30130 cycles 30219 cycles 1.00
ML-KEM-768 keypair 30989 cycles 31173 cycles 0.99
ML-KEM-768 encaps 33749 cycles 33895 cycles 1.00
ML-KEM-768 decaps 44499 cycles 44517 cycles 1.00
ML-KEM-1024 keypair 44271 cycles 44528 cycles 0.99
ML-KEM-1024 encaps 49469 cycles 49789 cycles 0.99
ML-KEM-1024 decaps 63930 cycles 64472 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 20396 cycles 20361 cycles 1.00
ML-KEM-512 encaps 27056 cycles 27150 cycles 1.00
ML-KEM-512 decaps 35932 cycles 35746 cycles 1.01
ML-KEM-768 keypair 34908 cycles 34897 cycles 1.00
ML-KEM-768 encaps 38151 cycles 38179 cycles 1.00
ML-KEM-768 decaps 50946 cycles 50949 cycles 1.00
ML-KEM-1024 keypair 48045 cycles 48039 cycles 1.00
ML-KEM-1024 encaps 54135 cycles 53911 cycles 1.00
ML-KEM-1024 decaps 71562 cycles 71361 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 14982 cycles 14886 cycles 1.01
ML-KEM-512 encaps 19696 cycles 19693 cycles 1.00
ML-KEM-512 decaps 26366 cycles 26308 cycles 1.00
ML-KEM-768 keypair 25496 cycles 25598 cycles 1.00
ML-KEM-768 encaps 27992 cycles 28059 cycles 1.00
ML-KEM-768 decaps 37691 cycles 37926 cycles 0.99
ML-KEM-1024 keypair 35469 cycles 35896 cycles 0.99
ML-KEM-1024 encaps 40359 cycles 40642 cycles 0.99
ML-KEM-1024 decaps 53716 cycles 54329 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 18139 cycles 18116 cycles 1.00
ML-KEM-512 encaps 22173 cycles 22181 cycles 1.00
ML-KEM-512 decaps 28835 cycles 28846 cycles 1.00
ML-KEM-768 keypair 30538 cycles 30555 cycles 1.00
ML-KEM-768 encaps 33622 cycles 33624 cycles 1.00
ML-KEM-768 decaps 43167 cycles 43151 cycles 1.00
ML-KEM-1024 keypair 44192 cycles 44155 cycles 1.00
ML-KEM-1024 encaps 49646 cycles 49644 cycles 1.00
ML-KEM-1024 decaps 62633 cycles 62631 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 43530 cycles 43294 cycles 1.01
ML-KEM-512 encaps 52040 cycles 51438 cycles 1.01
ML-KEM-512 decaps 67329 cycles 66703 cycles 1.01
ML-KEM-768 keypair 71550 cycles 71830 cycles 1.00
ML-KEM-768 encaps 83170 cycles 83290 cycles 1.00
ML-KEM-768 decaps 103244 cycles 103534 cycles 1.00
ML-KEM-1024 keypair 106852 cycles 107123 cycles 1.00
ML-KEM-1024 encaps 121803 cycles 122111 cycles 1.00
ML-KEM-1024 decaps 147375 cycles 147944 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 39558 cycles 39338 cycles 1.01
ML-KEM-512 encaps 45516 cycles 45566 cycles 1.00
ML-KEM-512 decaps 58955 cycles 59072 cycles 1.00
ML-KEM-768 keypair 64908 cycles 64869 cycles 1.00
ML-KEM-768 encaps 73241 cycles 73172 cycles 1.00
ML-KEM-768 decaps 91578 cycles 91278 cycles 1.00
ML-KEM-1024 keypair 96762 cycles 96609 cycles 1.00
ML-KEM-1024 encaps 107944 cycles 107795 cycles 1.00
ML-KEM-1024 decaps 131298 cycles 131159 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 51758 cycles 51581 cycles 1.00
ML-KEM-512 encaps 59968 cycles 59624 cycles 1.01
ML-KEM-512 decaps 76904 cycles 76584 cycles 1.00
ML-KEM-768 keypair 84414 cycles 84341 cycles 1.00
ML-KEM-768 encaps 96057 cycles 95697 cycles 1.00
ML-KEM-768 decaps 118106 cycles 117794 cycles 1.00
ML-KEM-1024 keypair 125398 cycles 125581 cycles 1.00
ML-KEM-1024 encaps 139581 cycles 139615 cycles 1.00
ML-KEM-1024 decaps 168622 cycles 168458 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 37955 cycles 38050 cycles 1.00
ML-KEM-512 encaps 43336 cycles 43372 cycles 1.00
ML-KEM-512 decaps 55515 cycles 55552 cycles 1.00
ML-KEM-768 keypair 62989 cycles 63040 cycles 1.00
ML-KEM-768 encaps 70532 cycles 70438 cycles 1.00
ML-KEM-768 decaps 86996 cycles 86902 cycles 1.00
ML-KEM-1024 keypair 94535 cycles 94524 cycles 1.00
ML-KEM-1024 encaps 105327 cycles 105306 cycles 1.00
ML-KEM-1024 decaps 126647 cycles 126977 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 29058 cycles 29072 cycles 1.00
ML-KEM-512 encaps 35405 cycles 35448 cycles 1.00
ML-KEM-512 decaps 45902 cycles 45890 cycles 1.00
ML-KEM-768 keypair 49331 cycles 49312 cycles 1.00
ML-KEM-768 encaps 55575 cycles 55594 cycles 1.00
ML-KEM-768 decaps 70372 cycles 70386 cycles 1.00
ML-KEM-1024 keypair 72017 cycles 72023 cycles 1.00
ML-KEM-1024 encaps 80773 cycles 80753 cycles 1.00
ML-KEM-1024 decaps 100688 cycles 100669 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 18936 cycles 18948 cycles 1.00
ML-KEM-512 encaps 23542 cycles 23573 cycles 1.00
ML-KEM-512 decaps 30706 cycles 30676 cycles 1.00
ML-KEM-768 keypair 32291 cycles 32337 cycles 1.00
ML-KEM-768 encaps 35872 cycles 35895 cycles 1.00
ML-KEM-768 decaps 46002 cycles 46039 cycles 1.00
ML-KEM-1024 keypair 46585 cycles 46584 cycles 1.00
ML-KEM-1024 encaps 52433 cycles 52474 cycles 1.00
ML-KEM-1024 decaps 66255 cycles 66237 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 39326 cycles 39346 cycles 1.00
ML-KEM-512 encaps 45369 cycles 45340 cycles 1.00
ML-KEM-512 decaps 57249 cycles 57376 cycles 1.00
ML-KEM-768 keypair 65880 cycles 65864 cycles 1.00
ML-KEM-768 encaps 73740 cycles 73714 cycles 1.00
ML-KEM-768 decaps 89810 cycles 89699 cycles 1.00
ML-KEM-1024 keypair 99034 cycles 98985 cycles 1.00
ML-KEM-1024 encaps 109979 cycles 109973 cycles 1.00
ML-KEM-1024 decaps 130813 cycles 130732 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 60747 cycles 60695 cycles 1.00
ML-KEM-512 encaps 69799 cycles 69825 cycles 1.00
ML-KEM-512 decaps 88897 cycles 88750 cycles 1.00
ML-KEM-768 keypair 101962 cycles 101845 cycles 1.00
ML-KEM-768 encaps 114140 cycles 115088 cycles 0.99
ML-KEM-768 decaps 139426 cycles 140736 cycles 0.99
ML-KEM-1024 keypair 154221 cycles 154440 cycles 1.00
ML-KEM-1024 encaps 170004 cycles 171523 cycles 0.99
ML-KEM-1024 decaps 202508 cycles 204076 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do all restructuring in one go:
I'm okay with your separation by compression, sampling, and arithmetic.
For the k-dependent wrappers in polyvec.c, I suggest something like poly_k.c?
The polyvec type can be moved to poly.h?

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bananapi bpi-f3 benchmarks

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 332422 cycles 331511 cycles 1.00
ML-KEM-512 encaps 441824 cycles 440475 cycles 1.00
ML-KEM-512 decaps 590859 cycles 588796 cycles 1.00
ML-KEM-768 keypair 550706 cycles 548481 cycles 1.00
ML-KEM-768 encaps 690203 cycles 687539 cycles 1.00
ML-KEM-768 decaps 882664 cycles 880077 cycles 1.00
ML-KEM-1024 keypair 811843 cycles 809152 cycles 1.00
ML-KEM-1024 encaps 984536 cycles 981016 cycles 1.00
ML-KEM-1024 decaps 1219227 cycles 1214963 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Benchmark suite Current: 51f5adb Previous: 21c0c39 Ratio
ML-KEM-512 keypair 51782 cycles 51874 cycles 1.00
ML-KEM-512 encaps 58355 cycles 58261 cycles 1.00
ML-KEM-512 decaps 75242 cycles 75219 cycles 1.00
ML-KEM-768 keypair 87581 cycles 88284 cycles 0.99
ML-KEM-768 encaps 95869 cycles 95963 cycles 1.00
ML-KEM-768 decaps 119614 cycles 119280 cycles 1.00
ML-KEM-1024 keypair 132764 cycles 131943 cycles 1.01
ML-KEM-1024 encaps 145122 cycles 144532 cycles 1.00
ML-KEM-1024 decaps 177038 cycles 175238 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@hanno-becker
Copy link
Contributor Author

For the k-dependent wrappers in polyvec.c, I suggest something like poly_k.c?

Ok

The polyvec type can be moved to poly.h?

That doesn't work, poly.h is independent of k

This commit merges `cbd.*` and `rej_uniform.*` into `sampling.*`.
It also moves `ntt.*` and `reduce.h` into `poly.*`. Finally,
compression-related routines are moved into newly created `compress.*`.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker hanno-becker force-pushed the sampling branch 2 times, most recently from c7923f4 to 59b2ed6 Compare January 21, 2025 08:51
@hanno-becker
Copy link
Contributor Author

@mkannwischer Done

@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Jan 21, 2025
Originally, polyvec was exclusively home to functions dealing with
the `polyvec` structure. Recent changes have invalidated that:
In addition to functions operating on `struct polyvec`, polyvec.[ch]
hosts all poly-related functionality that depends on the security-level
(e.g. the `poly_cbd_eta[12]` wrappers).

This commit renames `polyvec.[ch]` to `poly_k.[ch]` to reflect
this change in meaning.

This change was obtained by
- Manual renaming of the files `polyvec.[ch]` -> `poly_k.[ch]`
- ```bash
  git ls-files         \
     | xargs -I {} sh  \
       -c "sed -i '' 's/polyvec\.\([ch]\)/poly_k.\1/g' {}"
  ```
- Manual updating of symlinks in
  * examples/bring_your_own_fips202
  * examples/custom_backend
- Bump CBMC OBJECT_BITS from 9 to 10 in invntt_layer,
  to counter a CBMC failure for MLKEM_K=4.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker hanno-becker added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Jan 21, 2025
@mkannwischer mkannwischer added benchmark this PR should be benchmarked in CI and removed benchmark this PR should be benchmarked in CI labels Jan 22, 2025
Copy link
Contributor

@mkannwischer mkannwischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a very tiny performance regression for the optimized backend on c7i. Seems to be reproducible (+4%), but very close to usual noise (+-3%).
I'm not sure how moving around code like this can result in this much performance difference, but this is very likely very dependent on the compiler (version).
I think we can merge this restructuring regardless.

@hanno-becker hanno-becker merged commit f1bd62f into main Jan 22, 2025
188 checks passed
@hanno-becker hanno-becker deleted the sampling branch January 22, 2025 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark this PR should be benchmarked in CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants