Koren/v3 msm #581

Koren-Brand · 2024-08-19T11:32:38Z

Describe the changes

MSM multithreaded implementation on CPU

Linked Issues

Resolves #


icicle_v3/backend/cpu/src/curve/cpu_msm.hpp

icicle_v3/tests/test_curve_api.cpp

wrappers/rust_v3/icicle-core/src/msm/tests.rs

yshekel

Overall it looks good, but I have some comments.
Please ask Miki/Hadar to review the algorithm part. I did not really review it.

…remainder and the tests now reflect that

…y modifies c if it divides scalar width

HadarIngonyama · 2024-08-25T08:16:23Z

icicle_v3/include/icicle/curves/projective.h

@@ -22,7 +22,6 @@ class Projective
  FF x;
  FF y;
  FF z;
-


HadarIngonyama · 2024-08-25T08:26:52Z

icicle_v3/backend/cpu/src/curve/cpu_msm.hpp

+  const Device& device, const scalar_t* scalars, const A* bases, int msm_size, const MSMConfig& config, P* results)
+{
+  int c = config.c;
+  if (c < 1) { c = std::max((int)std::log2(msm_size) - 1, 8); }


why is this optimal?

It's an approximation based on the derivative of the number of additions (for the single-threaded, no-precompute solution).
It can be optimized; I would like to know how you determined the optimal value.
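For comparison, here is a minimal sketch of one way to pick c numerically. It assumes the textbook single-threaded, no-precompute Pippenger estimate of roughly ceil(b/c) · (N + 2·2^c) additions, where b is the scalar width and N the MSM size. `estimated_additions` and `best_c` are hypothetical names, and the model ignores doublings, precompute, threading, and memory effects, so it is not necessarily how the PR derived its formula.

```cpp
#include <cmath>

// Hypothetical cost model (an assumption, not the derivation used in the PR):
// each window costs ~N additions to fill buckets plus ~2 * 2^c additions for the
// running-sum bucket reduction, and there are ceil(b / c) windows.
double estimated_additions(int scalar_bits, double msm_size, int c)
{
  const int windows = (scalar_bits + c - 1) / c; // ceil(b / c)
  return windows * (msm_size + 2.0 * std::ldexp(1.0, c));
}

// Brute-force the window size c in [1, max_c] that minimizes the estimate above.
int best_c(int scalar_bits, double msm_size, int max_c = 24)
{
  int best = 1;
  double best_cost = estimated_additions(scalar_bits, msm_size, 1);
  for (int c = 2; c <= max_c; ++c) {
    const double cost = estimated_additions(scalar_bits, msm_size, c);
    if (cost < best_cost) {
      best_cost = cost;
      best = c;
    }
  }
  return best;
}
```

Under this simplified model, a 254-bit scalar with N = 2^20 gives c = 16, whereas the heuristic in the diff gives max(20 - 1, 8) = 19. Whether that gap matters in practice depends on the factors the model leaves out.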

HadarIngonyama · 2024-08-25T08:27:32Z

icicle_v3/backend/cpu/src/curve/cpu_msm.hpp

+  int c = config.c;
+  if (c < 1) { c = std::max((int)std::log2(msm_size) - 1, 8); }
+  if (scalar_t::NBITS % c == 0) {
+    std::cerr << "Currerntly c (" << c << ") mustn't divide scalar width (" << scalar_t::NBITS


doesn't Niall's trick solve this?

It does; I currently have a bug related to this.
Fixing it is on the to-do list.
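As background on why c dividing the scalar width is a special case, here is a minimal sketch of a common signed-digit window decomposition. It is not Niall's trick and not the PR's code: `signed_digits` and `recombine` are hypothetical helpers, and 64-bit integers stand in for the field's scalar type. Each full window is kept in [-2^(c-1), 2^(c-1)) by borrowing from the next window. When the top window has spare bits, the final carry is absorbed there; when c divides the width, the top window is full and the carry needs an extra window.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split s (assumed < 2^nbits, with 1 <= c and nbits <= 62)
// into signed c-bit digits d_i such that sum d_i * 2^(i*c) == s.
// Every digit satisfies |d_i| <= 2^(c-1), so 2^(c-1) buckets per window suffice.
std::vector<int64_t> signed_digits(uint64_t s, int c, int nbits)
{
  const int windows = (nbits + c - 1) / c; // ceil(nbits / c)
  const int64_t full = int64_t(1) << c;
  const int64_t half = full >> 1;
  const bool top_has_spare_bits = (nbits % c) != 0;
  std::vector<int64_t> digits;
  int64_t carry = 0;
  for (int i = 0; i < windows; ++i) {
    int64_t d = int64_t((s >> (i * c)) & uint64_t(full - 1)) + carry;
    const bool last = (i == windows - 1);
    if (last && top_has_spare_bits) {
      // The top window holds only nbits % c bits, so d <= 2^(nbits % c) <= 2^(c-1):
      // the carry is absorbed without leaving the bucket range.
      carry = 0;
    } else if (d >= half) {
      d -= full; // borrow 2^c from the next window
      carry = 1;
    } else {
      carry = 0;
    }
    digits.push_back(d);
  }
  // When c divides nbits the top window is full and may still carry out,
  // which costs one extra window.
  if (carry) digits.push_back(carry);
  return digits;
}

// Recombine the digits (Horner's rule in base 2^c) to check the decomposition.
int64_t recombine(const std::vector<int64_t>& digits, int c)
{
  int64_t acc = 0;
  for (std::size_t i = digits.size(); i-- > 0;) acc = acc * (int64_t(1) << c) + digits[i];
  return acc;
}
```

For example, with an 8-bit width the scalar 0xFF becomes {-1, 0, 1} for c = 4 (two windows plus a carry window) and {-1, 0, 4} for c = 3, where the carry fits into the partially filled top window.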

icicle_v3/backend/cpu/src/curve/cpu_msm.hpp

HadarIngonyama · 2024-08-25T14:20:23Z

icicle_v3/backend/cpu/src/curve/cpu_msm.hpp

+
+template <typename A, typename P>
+void Msm<A, P>::run_msm(
+  const scalar_t* scalars, const A* bases, const unsigned int msm_size, const unsigned int batch_idx, P* results)


Let's stay consistent and call them points, not bases. The MSM has always used the points notation, but somehow the bases notation slipped through.

@yshekel is that OK (given the existing API)?

HadarIngonyama · 2024-08-26T10:14:40Z

icicle/backend/cpu/src/curve/cpu_msm.hpp

+template <typename A, typename P>
+void Msm<A, P>::phase2_bm_sum(std::vector<BmSumSegment>& segments)
+{
+  phase2_setup(segments);


As we discussed, this is probably extraneous.

This will be dealt with in a future version.
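For context on what phase 2 computes: each window's buckets B_1, ..., B_m have to be reduced to the sum of j·B_j, and the usual way to do that with additions only is a running sum over the buckets from the top down, about 2m additions per window. Below is a minimal sketch of that generic technique, not the PR's `phase2_bm_sum`; `bucket_module_sum` is a hypothetical name, and `int64_t` stands in for a curve point, with `+` playing the role of point addition.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// buckets[j] holds B_{j+1}, the sum of the points whose window digit is j+1.
// Returns sum_{k>=1} k * B_k using only additions (running-sum trick).
int64_t bucket_module_sum(const std::vector<int64_t>& buckets)
{
  int64_t running = 0; // B_k + B_{k+1} + ... + B_m
  int64_t total = 0;   // sum of all running sums == sum k * B_k
  for (std::size_t j = buckets.size(); j-- > 0;) {
    running += buckets[j];
    total += running;
  }
  return total;
}
```

Each B_k is added into `running` once and then counted in `total` k times (once for every step from k down to 1), which is exactly the weight it needs.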

icicle/backend/cpu/src/curve/cpu_msm.hpp

HadarIngonyama · 2024-08-26T10:16:44Z

icicle/backend/cpu/src/curve/cpu_msm.hpp

+            int return_idx = task->m_return_idx;
+            // Due to the choice of num segments being less than half of total tasks there ought to be an idle task for
+            // the line sum
+            task = manager.get_idle_task();


As we discussed, either use `get_idle_task()` for both tasks or assign a thread to each segment in advance.
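The second option in the comment above, assigning a thread to each segment in advance, could look roughly like this. It is a minimal illustration under stated assumptions, not the PR's task manager: `segmented_bucket_sum` and `SegmentResult` are hypothetical, `int64_t` stands in for a curve point, and each thread runs the running-sum trick over its own contiguous range of buckets. A bucket with global index j in a segment starting at `lo` has weight (j - lo + 1) + lo, so segments are combined by adding `lo` times each segment's plain bucket sum; on a curve that term would be a short scalar multiplication.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct SegmentResult {
  int64_t total = 0;   // sum of (local index + 1) * B_j over the segment
  int64_t running = 0; // plain sum of the segment's buckets
};

// buckets[j] holds B_{j+1}. Each segment is assigned to its own thread up front.
int64_t segmented_bucket_sum(const std::vector<int64_t>& buckets, std::size_t num_segments)
{
  const std::size_t n = buckets.size();
  if (n == 0 || num_segments == 0) return 0;
  num_segments = std::min(num_segments, n);
  const std::size_t seg_len = (n + num_segments - 1) / num_segments;

  std::vector<SegmentResult> results(num_segments); // one slot per thread, no sharing
  std::vector<std::thread> threads;
  for (std::size_t s = 0; s < num_segments; ++s) {
    threads.emplace_back([&, s] {
      const std::size_t lo = s * seg_len;
      const std::size_t hi = std::min(n, lo + seg_len);
      SegmentResult r;
      for (std::size_t j = hi; j-- > lo;) { // running-sum trick inside the segment
        r.running += buckets[j];
        r.total += r.running;
      }
      results[s] = r;
    });
  }
  for (auto& t : threads) t.join();

  // Combine on the calling thread: add the offset lo times each segment's bucket sum.
  int64_t total = 0;
  for (std::size_t s = 0; s < num_segments; ++s) {
    const int64_t lo = int64_t(std::min(n, s * seg_len));
    total += results[s].total + lo * results[s].running;
  }
  return total;
}
```

Pre-assigning threads avoids any shared idle-task lookup, at the cost of load imbalance if segments take different amounts of time; pulling tasks from an idle pool trades that imbalance for coordination overhead.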

yshekel

Note that the Rust MSM is not right. You need to handle the TODOs in the Rust MSM test and remove the test in the v3 dir, since it was renamed.

Koren-Brand and others added 30 commits July 29, 2024 16:07

basic pipenger cpu works

80ed037

works with precompute factor

f178ad6

Signed MSM works

d86b02f

partial cmake commit

236c120

reverted changes in test_curve_api from rebase with yuval's branch

8982196

montgomery works

4db229b

Multithreaded p1 almost works but fails when prints are removed

09bec30

multithreaded phase 1 works plus removed unnecessary initialization of buckets array

5c75b6d

Phase 1 is multithreaded and works. Started writing multithreaded version of phase 2.

5c8cbfc

fix formatting

9a179e9

commit before switching to work on one of the servers to compare cpu with gpu

3de5508

added framework for generic small tasks thread pool that has yet to be verified against another model (i.e. gpu)

fbd0b4e

added framework for generic small tasks thread pool that has yet to be verified against another model (i.e. gpu)

e8cb7e5

added framework for generic small tasks thread pool that has yet to be verified against another model (i.e. gpu)

cec0c38

Added pre-compute factor to cpu-msm test

1d37343

fix: bug where wrong polynomial factory is used to construct polynomial from wrong field

8c13382

rename template files to not format them

4679729

fix rust examples calling load_backend() with removed param

72bfcdc

minor update to rust poly example

67afbd0

add runtime api to load backend from default installdir and use it everywhere instead of specifying the install dir in the code

94facf8

add ntt benchmark for rust

7e463e1

simplify C++ examples by loading backend from default install dir

83d05e6

fix rust bw761 curve missing ICICLE backend install path and fix examples

7377a69

add rust benchmark for msm

42a5f99

add rust ntt benchmark for fields too

1064f27

add rust ecntt benchmark

0b02d7e

update ntt api to accept config by const

8bbeae4

move backend-specific-config to open part to avoid installing it

8db133b

rename template files to not format them

bded390

fix rust examples calling load_backend() with removed param

57976e0