254-bit Prime Field Arithmetic Benchmark on GPU with CUDA Backend
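
For orientation, the six operations timed below are arithmetic modulo the 254-bit prime printed in each `F(...)` header. The following is a minimal CPU-side reference sketch in Python, not the benchmarked GPU kernels. The function names (`f_add`, `f_inv`, and so on) are hypothetical, inversion via Fermat's little theorem is an assumed choice, and this output does not state how the benchmark treats a zero divisor or which exponents it uses.

```python
# Reference semantics only: Python's arbitrary-precision integers stand in
# for the multi-limb representation a GPU kernel would need.

# Modulus copied from the F(...) headers in the benchmark output.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def f_add(a: int, b: int) -> int:
    """Addition modulo P."""
    return (a + b) % P


def f_sub(a: int, b: int) -> int:
    """Subtraction modulo P; Python's % always returns a value in [0, P)."""
    return (a - b) % P


def f_mul(a: int, b: int) -> int:
    """Multiplication modulo P."""
    return (a * b) % P


def f_exp(a: int, e: int) -> int:
    """Exponentiation modulo P for a non-negative exponent e."""
    return pow(a, e, P)


def f_inv(a: int) -> int:
    """Multiplicative inverse via Fermat's little theorem: a^(P-2) mod P.

    Raising on zero is a choice made for this sketch, not taken from the source.
    """
    if a % P == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    return pow(a, P - 2, P)


def f_div(a: int, b: int) -> int:
    """Division modulo P, defined as a * b^(-1)."""
    return f_mul(a, f_inv(b))
```

Written this way, division costs one inversion plus one multiplication, and inversion is itself a modular exponentiation.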

$ DEVICE=gpu make cuda && ./run
Benchmark running on Tesla V100-SXM2-16GB

Addition on F(21888242871839275222246405745257275088548364400416034343698204186575808495617)

  dimension		iterations		          total		                  per op		            ops/ sec
128  x  128		    1024		        2234855 ns		       0.133208 ns		           7.50707e+09
256  x  256		    1024		         613577 ns		     0.00914301 ns		           1.09373e+11
512  x  512		    1024		        2278511 ns		     0.00848811 ns		           1.17812e+11
1024 x 1024		    1024		        8674787 ns		     0.00807902 ns		           1.23777e+11

Subtraction on F(21888242871839275222246405745257275088548364400416034343698204186575808495617)

  dimension		iterations		          total		                  per op		            ops/ sec
128  x  128		    1024		         229056 ns		      0.0136528 ns		            7.3245e+10
256  x  256		    1024		         691766 ns		      0.0103081 ns		           9.70109e+10
512  x  512		    1024		        2566303 ns		     0.00956022 ns		             1.046e+11
1024 x 1024		    1024		       10098982 ns		     0.00940541 ns		           1.06322e+11

Multiplication on F(21888242871839275222246405745257275088548364400416034343698204186575808495617)

  dimension		iterations		          total		                  per op		            ops/ sec
128  x  128		    1024		       35480343 ns		        2.11479 ns		           4.72859e+08
256  x  256		    1024		      168584024 ns		         2.5121 ns		           3.98074e+08
512  x  512		    1024		     1002255496 ns		        3.73369 ns		           2.67831e+08
1024 x 1024		    1024		     4120332867 ns		        3.83736 ns		           2.60596e+08

Division on F(21888242871839275222246405745257275088548364400416034343698204186575808495617)

  dimension		iterations		          total		                  per op		            ops/ sec
128  x  128		    1024		      197345977 ns		        11.7627 ns		           8.50142e+07
256  x  256		    1024		      703453235 ns		        10.4823 ns		           9.53992e+07
512  x  512		    1024		     2065334020 ns		        7.69397 ns		           1.29972e+08
1024 x 1024		    1024		     7218084706 ns		        6.72237 ns		           1.48757e+08

Inversion on F(21888242871839275222246405745257275088548364400416034343698204186575808495617)

  dimension		iterations		          total		                  per op		            ops/ sec
128  x  128		    1024		      174981181 ns		        10.4297 ns		           9.58801e+07
256  x  256		    1024		      451331865 ns		        6.72537 ns		           1.48691e+08
512  x  512		    1024		     1801308585 ns		         6.7104 ns		           1.49022e+08
1024 x 1024		    1024		     6754218367 ns		        6.29036 ns		           1.58974e+08

Exponentiation on F(21888242871839275222246405745257275088548364400416034343698204186575808495617)

  dimension		iterations		          total		                  per op		            ops/ sec
128  x  128		    1024		      704050844 ns		        41.9647 ns		           2.38296e+07
256  x  256		    1024		     2545503329 ns		         37.931 ns		           2.63637e+07
512  x  512		    1024		     8911049141 ns		        33.1962 ns		           3.01239e+07
1024 x 1024		    1024		    33079934464 ns		        30.8081 ns		            3.2459e+07
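
The `per op` and `ops/ sec` columns appear to follow from `total` by treating each `N x N` run as N·N independent operations repeated `iterations` times. That reading is an assumption, but it reproduces the printed figures. A small Python sketch (the helper name `per_op_stats` is hypothetical):

```python
def per_op_stats(dim: int, iterations: int, total_ns: int) -> tuple[float, float]:
    """Derive (nanoseconds per operation, operations per second) from one table row.

    Assumes dim * dim operations per iteration, which is inferred from the
    numbers in the log and not stated by the benchmark itself.
    """
    ops = dim * dim * iterations      # total field operations in the run
    per_op_ns = total_ns / ops        # average wall time per operation
    ops_per_sec = ops * 1e9 / total_ns  # 1e9 ns in one second
    return per_op_ns, ops_per_sec


if __name__ == "__main__":
    # First Addition row: 128 x 128, 1024 iterations, 2234855 ns total.
    per_op, rate = per_op_stats(128, 1024, 2234855)
    print(f"{per_op:.6g} ns per op, {rate:.6g} ops/sec")
```

For example, `per_op_stats(128, 1024, 2234855)` gives about 0.1332 ns per operation, matching the first Addition row. That row is more than ten times slower per operation than the larger Addition grids, and the check shows the outlier lies in the measured total itself and is not a formatting slip. The log does not say what caused it.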