Add optimized NTT and invNTT in C #683
Code size metrics using gcc 14.2.0 on Apple M1: main is 4368 bytes, so this PR is an increase of 3952 bytes.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 28950 cycles | 28965 cycles | 1.00 |
ML-KEM-512 encaps | 34259 cycles | 34252 cycles | 1.00 |
ML-KEM-512 decaps | 44775 cycles | 44728 cycles | 1.00 |
ML-KEM-768 keypair | 49201 cycles | 49307 cycles | 1.00 |
ML-KEM-768 encaps | 54560 cycles | 54571 cycles | 1.00 |
ML-KEM-768 decaps | 69387 cycles | 69426 cycles | 1.00 |
ML-KEM-1024 keypair | 71916 cycles | 71915 cycles | 1.00 |
ML-KEM-1024 encaps | 80489 cycles | 80618 cycles | 1.00 |
ML-KEM-1024 decaps | 100492 cycles | 100358 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 9211 cycles | 9314 cycles | 0.99 |
ML-KEM-512 encaps | 10728 cycles | 10833 cycles | 0.99 |
ML-KEM-512 decaps | 14908 cycles | 14762 cycles | 1.01 |
ML-KEM-768 keypair | 16054 cycles | 16125 cycles | 1.00 |
ML-KEM-768 encaps | 17332 cycles | 17222 cycles | 1.01 |
ML-KEM-768 decaps | 23077 cycles | 23277 cycles | 0.99 |
ML-KEM-1024 keypair | 21364 cycles | 21371 cycles | 1.00 |
ML-KEM-1024 encaps | 23332 cycles | 23306 cycles | 1.00 |
ML-KEM-1024 decaps | 30953 cycles | 30896 cycles | 1.00 |
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 25837 cycles | 29371 cycles | 0.88 |
ML-KEM-512 encaps | 29332 cycles | 34937 cycles | 0.84 |
ML-KEM-512 decaps | 35699 cycles | 45404 cycles | 0.79 |
ML-KEM-768 keypair | 43851 cycles | 47083 cycles | 0.93 |
ML-KEM-768 encaps | 49019 cycles | 55467 cycles | 0.88 |
ML-KEM-768 decaps | 58058 cycles | 67463 cycles | 0.86 |
ML-KEM-1024 keypair | 66203 cycles | 71606 cycles | 0.92 |
ML-KEM-1024 encaps | 73160 cycles | 81980 cycles | 0.89 |
ML-KEM-1024 decaps | 85151 cycles | 99279 cycles | 0.86 |
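For reading these tables, the Ratio column appears to be the current cycle count divided by the previous one, so values below 1.00 mean the PR needs fewer cycles. The sketch below reproduces that arithmetic under this assumption; the function names `bench_ratio`, `bench_ratio_hundredths`, and `print_ratio` are hypothetical and not part of the benchmark workflow.

```c
#include <stdio.h>

/* Assumed definition of the Ratio column: current / previous. */
static double bench_ratio(unsigned long current, unsigned long previous)
{
  return (double)current / (double)previous;
}

/* Ratio rounded to the nearest hundredth, returned as an integer
 * number of hundredths (e.g. 79 stands for 0.79). */
int bench_ratio_hundredths(unsigned long current, unsigned long previous)
{
  return (int)(bench_ratio(current, previous) * 100.0 + 0.5);
}

/* Print one row's ratio with two decimals, as in the tables. */
void print_ratio(const char *name, unsigned long current,
                 unsigned long previous)
{
  printf("%s: %.2f\n", name, bench_ratio(current, previous));
}
```

With the ML-KEM-512 decaps row of the Intel Xeon 4th gen (c7i) (no-opt) table, 35699 / 45404 rounds to 0.79, which matches the reported ratio.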
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 16974 cycles | 16960 cycles | 1.00 |
ML-KEM-512 encaps | 18642 cycles | 18692 cycles | 1.00 |
ML-KEM-512 decaps | 24038 cycles | 24036 cycles | 1.00 |
ML-KEM-768 keypair | 28692 cycles | 28708 cycles | 1.00 |
ML-KEM-768 encaps | 29775 cycles | 29783 cycles | 1.00 |
ML-KEM-768 decaps | 37563 cycles | 37567 cycles | 1.00 |
ML-KEM-1024 keypair | 41423 cycles | 41708 cycles | 0.99 |
ML-KEM-1024 encaps | 43669 cycles | 43976 cycles | 0.99 |
ML-KEM-1024 decaps | 53971 cycles | 54287 cycles | 0.99 |
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 15924 cycles | 15930 cycles | 1.00 |
ML-KEM-512 encaps | 17996 cycles | 18001 cycles | 1.00 |
ML-KEM-512 decaps | 24553 cycles | 24514 cycles | 1.00 |
ML-KEM-768 keypair | 27362 cycles | 27359 cycles | 1.00 |
ML-KEM-768 encaps | 28912 cycles | 28918 cycles | 1.00 |
ML-KEM-768 decaps | 38335 cycles | 38352 cycles | 1.00 |
ML-KEM-1024 keypair | 36969 cycles | 36955 cycles | 1.00 |
ML-KEM-1024 encaps | 39911 cycles | 39871 cycles | 1.00 |
ML-KEM-1024 decaps | 52402 cycles | 52380 cycles | 1.00 |
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 11286 cycles | 11282 cycles | 1.00 |
ML-KEM-512 encaps | 12823 cycles | 12844 cycles | 1.00 |
ML-KEM-512 decaps | 17665 cycles | 17668 cycles | 1.00 |
ML-KEM-768 keypair | 19597 cycles | 19634 cycles | 1.00 |
ML-KEM-768 encaps | 20581 cycles | 20625 cycles | 1.00 |
ML-KEM-768 decaps | 27623 cycles | 27639 cycles | 1.00 |
ML-KEM-1024 keypair | 26273 cycles | 26296 cycles | 1.00 |
ML-KEM-1024 encaps | 28173 cycles | 28183 cycles | 1.00 |
ML-KEM-1024 decaps | 37627 cycles | 37611 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 32889 cycles | 39593 cycles | 0.83 |
ML-KEM-512 encaps | 37424 cycles | 47505 cycles | 0.79 |
ML-KEM-512 decaps | 45996 cycles | 61785 cycles | 0.74 |
ML-KEM-768 keypair | 54907 cycles | 63996 cycles | 0.86 |
ML-KEM-768 encaps | 61323 cycles | 75296 cycles | 0.81 |
ML-KEM-768 decaps | 72918 cycles | 93787 cycles | 0.78 |
ML-KEM-1024 keypair | 83219 cycles | 95491 cycles | 0.87 |
ML-KEM-1024 encaps | 91543 cycles | 108990 cycles | 0.84 |
ML-KEM-1024 decaps | 106377 cycles | 132483 cycles | 0.80 |
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 40288 cycles | 46437 cycles | 0.87 |
ML-KEM-512 encaps | 45413 cycles | 54623 cycles | 0.83 |
ML-KEM-512 decaps | 55654 cycles | 70298 cycles | 0.79 |
ML-KEM-768 keypair | 67249 cycles | 76666 cycles | 0.88 |
ML-KEM-768 encaps | 73541 cycles | 87243 cycles | 0.84 |
ML-KEM-768 decaps | 86943 cycles | 107715 cycles | 0.81 |
ML-KEM-1024 keypair | 98605 cycles | 112111 cycles | 0.88 |
ML-KEM-1024 encaps | 107754 cycles | 126070 cycles | 0.85 |
ML-KEM-1024 decaps | 125295 cycles | 152172 cycles | 0.82 |
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 27587 cycles | 36058 cycles | 0.77 |
ML-KEM-512 encaps | 30933 cycles | 42302 cycles | 0.73 |
ML-KEM-512 decaps | 37402 cycles | 55452 cycles | 0.67 |
ML-KEM-768 keypair | 46118 cycles | 58497 cycles | 0.79 |
ML-KEM-768 encaps | 51040 cycles | 66956 cycles | 0.76 |
ML-KEM-768 decaps | 59688 cycles | 84104 cycles | 0.71 |
ML-KEM-1024 keypair | 70178 cycles | 86469 cycles | 0.81 |
ML-KEM-1024 encaps | 77299 cycles | 97202 cycles | 0.80 |
ML-KEM-1024 decaps | 88097 cycles | 118782 cycles | 0.74 |
Graviton2
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 28965 cycles | 28970 cycles | 1.00 |
ML-KEM-512 encaps | 34274 cycles | 34254 cycles | 1.00 |
ML-KEM-512 decaps | 44811 cycles | 44742 cycles | 1.00 |
ML-KEM-768 keypair | 49251 cycles | 49315 cycles | 1.00 |
ML-KEM-768 encaps | 54566 cycles | 54582 cycles | 1.00 |
ML-KEM-768 decaps | 69410 cycles | 69448 cycles | 1.00 |
ML-KEM-1024 keypair | 71883 cycles | 71939 cycles | 1.00 |
ML-KEM-1024 encaps | 80532 cycles | 80628 cycles | 1.00 |
ML-KEM-1024 decaps | 100525 cycles | 100412 cycles | 1.00 |
Graviton3
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 18916 cycles | 18921 cycles | 1.00 |
ML-KEM-512 encaps | 22412 cycles | 22423 cycles | 1.00 |
ML-KEM-512 decaps | 29685 cycles | 29675 cycles | 1.00 |
ML-KEM-768 keypair | 32293 cycles | 32313 cycles | 1.00 |
ML-KEM-768 encaps | 35807 cycles | 35801 cycles | 1.00 |
ML-KEM-768 decaps | 46236 cycles | 46189 cycles | 1.00 |
ML-KEM-1024 keypair | 46634 cycles | 46631 cycles | 1.00 |
ML-KEM-1024 encaps | 52306 cycles | 52345 cycles | 1.00 |
ML-KEM-1024 decaps | 66377 cycles | 66378 cycles | 1.00 |
Graviton3 (no-opt)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 34586 cycles | 38694 cycles | 0.89 |
ML-KEM-512 encaps | 37809 cycles | 44315 cycles | 0.85 |
ML-KEM-512 decaps | 46108 cycles | 56145 cycles | 0.82 |
ML-KEM-768 keypair | 57990 cycles | 63849 cycles | 0.91 |
ML-KEM-768 encaps | 62149 cycles | 70986 cycles | 0.88 |
ML-KEM-768 decaps | 73596 cycles | 86944 cycles | 0.85 |
ML-KEM-1024 keypair | 87478 cycles | 95398 cycles | 0.92 |
ML-KEM-1024 encaps | 93780 cycles | 105306 cycles | 0.89 |
ML-KEM-1024 decaps | 108813 cycles | 125683 cycles | 0.87 |
Graviton2 (no-opt)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 52203 cycles | 58968 cycles | 0.89 |
ML-KEM-512 encaps | 57092 cycles | 67423 cycles | 0.85 |
ML-KEM-512 decaps | 68968 cycles | 85993 cycles | 0.80 |
ML-KEM-768 keypair | 87155 cycles | 98315 cycles | 0.89 |
ML-KEM-768 encaps | 93740 cycles | 109146 cycles | 0.86 |
ML-KEM-768 decaps | 110565 cycles | 133916 cycles | 0.83 |
ML-KEM-1024 keypair | 133004 cycles | 147104 cycles | 0.90 |
ML-KEM-1024 encaps | 143042 cycles | 162179 cycles | 0.88 |
ML-KEM-1024 decaps | 164927 cycles | 193746 cycles | 0.85 |
Graviton4
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 17732 cycles | 17730 cycles | 1.00 |
ML-KEM-512 encaps | 20979 cycles | 20979 cycles | 1 |
ML-KEM-512 decaps | 27652 cycles | 27658 cycles | 1.00 |
ML-KEM-768 keypair | 30494 cycles | 30519 cycles | 1.00 |
ML-KEM-768 encaps | 33416 cycles | 33426 cycles | 1.00 |
ML-KEM-768 decaps | 42983 cycles | 42976 cycles | 1.00 |
ML-KEM-1024 keypair | 44133 cycles | 44139 cycles | 1.00 |
ML-KEM-1024 encaps | 49418 cycles | 49439 cycles | 1.00 |
ML-KEM-1024 decaps | 62368 cycles | 62362 cycles | 1.00 |
Graviton4 (no-opt)
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 31879 cycles | 35503 cycles | 0.90 |
ML-KEM-512 encaps | 34748 cycles | 40647 cycles | 0.85 |
ML-KEM-512 decaps | 42589 cycles | 51644 cycles | 0.82 |
ML-KEM-768 keypair | 53039 cycles | 58480 cycles | 0.91 |
ML-KEM-768 encaps | 57058 cycles | 65246 cycles | 0.87 |
ML-KEM-768 decaps | 68178 cycles | 80462 cycles | 0.85 |
ML-KEM-1024 keypair | 80881 cycles | 88186 cycles | 0.92 |
ML-KEM-1024 encaps | 86428 cycles | 96964 cycles | 0.89 |
ML-KEM-1024 decaps | 101128 cycles | 116641 cycles | 0.87 |
Bananapi bpi-f3 benchmarks
Benchmark suite | Current: 9c2f6a3 | Previous: 143daca | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 255062 cycles | 309304 cycles | 0.82 |
ML-KEM-512 encaps | 284238 cycles | 413143 cycles | 0.69 |
ML-KEM-512 decaps | 365465 cycles | 555068 cycles | 0.66 |
ML-KEM-768 keypair | 425530 cycles | 506933 cycles | 0.84 |
ML-KEM-768 encaps | 460484 cycles | 637031 cycles | 0.72 |
ML-KEM-768 decaps | 568427 cycles | 819119 cycles | 0.69 |
ML-KEM-1024 keypair | 625346 cycles | 733246 cycles | 0.85 |
ML-KEM-1024 encaps | 670884 cycles | 893738 cycles | 0.75 |
ML-KEM-1024 decaps | 798212 cycles | 1109650 cycles | 0.72 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 51636 cycles | 51555 cycles | 1.00 |
ML-KEM-512 encaps | 59276 cycles | 59153 cycles | 1.00 |
ML-KEM-512 decaps | 75080 cycles | 75710 cycles | 0.99 |
ML-KEM-768 keypair | 87651 cycles | 88925 cycles | 0.99 |
ML-KEM-768 encaps | 95164 cycles | 95966 cycles | 0.99 |
ML-KEM-768 decaps | 119313 cycles | 120002 cycles | 0.99 |
ML-KEM-1024 keypair | 132944 cycles | 132553 cycles | 1.00 |
ML-KEM-1024 encaps | 144629 cycles | 144793 cycles | 1.00 |
ML-KEM-1024 decaps | 176707 cycles | 176528 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 58116 cycles | 58108 cycles | 1.00 |
ML-KEM-512 encaps | 64945 cycles | 64926 cycles | 1.00 |
ML-KEM-512 decaps | 83854 cycles | 83675 cycles | 1.00 |
ML-KEM-768 keypair | 98918 cycles | 98806 cycles | 1.00 |
ML-KEM-768 encaps | 109470 cycles | 109470 cycles | 1 |
ML-KEM-768 decaps | 136509 cycles | 136273 cycles | 1.00 |
ML-KEM-1024 keypair | 149678 cycles | 149673 cycles | 1.00 |
ML-KEM-1024 encaps | 166087 cycles | 165917 cycles | 1.00 |
ML-KEM-1024 decaps | 201802 cycles | 202149 cycles | 1.00 |
e60839c to eac7e91
a5b1265 to 0bc46b1
The benchmarks seem to be a bit better again following adoption of GCC 14...
81877e5 to af0f31d
af0f31d to 9c2f6a3
bac386a to 8583efb
8583efb to 676a80c
4429485 to f25ae98
…g source re-organization in PR#674. See comments on older PR#610. All tests, proofs, and lint OK.
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Correct list of called functions for this proof.
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Update autogenerated files after rebase
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Re-generate autogenerated files following rebase
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Update auto-generated files and copyright messages following rebase
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Update copyright notices for these new files
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Update one more copyright notice
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Update MLKEM_NAMESPACE to MLK_NAMESPACE for all new proofs
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Rename INLINE to MLK_INLINE for new functions here.
Signed-off-by: Rod Chapman <rodchap@amazon.com>

Rename NTT_BOUNDx macros to MLK_NTT_BOUNDx
Signed-off-by: Rod Chapman <rodchap@amazon.com>
SpacemiT K1 8 (Banana Pi F3) benchmarks
Benchmark suite | Current: c8c3692 | Previous: 94c8b47 | Ratio |
---|---|---|---|
ML-KEM-512 keypair | 195915 cycles | 225118 cycles | 0.87 |
ML-KEM-512 encaps | 210980 cycles | 269679 cycles | 0.78 |
ML-KEM-512 decaps | 255226 cycles | 343289 cycles | 0.74 |
ML-KEM-768 keypair | 327524 cycles | 371250 cycles | 0.88 |
ML-KEM-768 encaps | 349417 cycles | 430008 cycles | 0.81 |
ML-KEM-768 decaps | 410007 cycles | 527834 cycles | 0.78 |
ML-KEM-1024 keypair | 496262 cycles | 555712 cycles | 0.89 |
ML-KEM-1024 encaps | 527973 cycles | 631753 cycles | 0.84 |
ML-KEM-1024 decaps | 605394 cycles | 752320 cycles | 0.80 |
Re-introduces fast NTT C code, following source re-organization in #674.
See comments on older #610. This commit addresses comments in that PR.
All tests, proofs, and lint OK.
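For readers unfamiliar with the transform this PR optimizes, here is a minimal textbook sketch of the ML-KEM NTT and inverse NTT in C. It is not the code from this PR: it uses plain `%` reduction instead of Montgomery or Barrett arithmetic, processes one layer at a time, and is not meant to be constant-time or fast. All names (`ntt_ref`, `invntt_ref`, `init_zetas`, `ntt_roundtrip_ok`) are hypothetical; the constants q = 3329, root 17, and 128^-1 = 3303 mod q are the standard ML-KEM parameters.

```c
#include <stdint.h>

#define Q 3329 /* ML-KEM modulus */
#define N 256  /* number of coefficients */
#define ROOT 17 /* primitive 256th root of unity mod Q */

static int16_t zetas[128]; /* ROOT^bitrev7(i) mod Q, filled by init_zetas */

/* Reverse the low 7 bits of i. */
static unsigned bitrev7(unsigned i)
{
  unsigned r = 0;
  for (int b = 0; b < 7; b++)
    r = (r << 1) | ((i >> b) & 1u);
  return r;
}

/* base^e mod Q by square-and-multiply. */
static int32_t pow_mod(int32_t base, unsigned e)
{
  int32_t r = 1;
  base %= Q;
  while (e) {
    if (e & 1u) r = (r * base) % Q;
    base = (base * base) % Q;
    e >>= 1;
  }
  return r;
}

static void init_zetas(void)
{
  for (unsigned i = 0; i < 128; i++)
    zetas[i] = (int16_t)pow_mod(ROOT, bitrev7(i));
}

/* a*b mod Q, result in [0, Q); accepts negative inputs. */
static int16_t mulmod(int16_t a, int16_t b)
{
  int32_t t = ((int32_t)a * b) % Q;
  if (t < 0) t += Q;
  return (int16_t)t;
}

/* Forward NTT, in place; coefficients must be in [0, Q). */
void ntt_ref(int16_t r[N])
{
  unsigned k = 1;
  for (unsigned len = 128; len >= 2; len >>= 1) {
    for (unsigned start = 0; start < N; start += 2 * len) {
      int16_t zeta = zetas[k++];
      for (unsigned j = start; j < start + len; j++) {
        int16_t t = mulmod(zeta, r[j + len]); /* Cooley-Tukey butterfly */
        r[j + len] = (int16_t)((r[j] - t + Q) % Q);
        r[j] = (int16_t)((r[j] + t) % Q);
      }
    }
  }
}

/* Inverse NTT, in place, including the final scaling by 128^-1. */
void invntt_ref(int16_t r[N])
{
  unsigned k = 127;
  for (unsigned len = 2; len <= 128; len <<= 1) {
    for (unsigned start = 0; start < N; start += 2 * len) {
      int16_t zeta = zetas[k--];
      for (unsigned j = start; j < start + len; j++) {
        int16_t t = r[j]; /* Gentleman-Sande butterfly */
        r[j] = (int16_t)((t + r[j + len]) % Q);
        r[j + len] = mulmod(zeta, (int16_t)(r[j + len] - t));
      }
    }
  }
  for (unsigned j = 0; j < N; j++)
    r[j] = mulmod(r[j], 3303); /* 3303 = 128^-1 mod Q */
}

/* Returns 1 if invntt_ref(ntt_ref(x)) == x for a sample input. */
int ntt_roundtrip_ok(void)
{
  int16_t a[N], b[N];
  init_zetas();
  for (unsigned i = 0; i < N; i++)
    a[i] = b[i] = (int16_t)((i * 7u + 3u) % Q);
  ntt_ref(a);
  invntt_ref(a);
  for (unsigned i = 0; i < N; i++)
    if (a[i] != b[i]) return 0;
  return 1;
}
```

An optimized C version would typically replace `%` with Montgomery or Barrett reduction and track coefficient bounds between layers; which of those choices this PR makes is described in the PR's own source, not in this sketch.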