Update Rust apis #262

DmytroTym · 2023-11-11T22:09:50Z

This PR aims to bring the Rust APIs up to a minimal usable standard.

ChickenLover · 2023-11-13T09:58:20Z

icicle/appUtils/lde/lde.cuh

- * evaluation domains of polynomials. It also contains methods for element-wise manipulation of vectors, which is useful
- * for working with polynomials in evaluation domain.
+ * LDE (stands for low degree extension) contains [NTT](@ref ntt)-based methods for translating between coefficient and evaluation domains of polynomials.
+ * It also contains methods for element-wise manipulation of vectors, which is useful for working with polynomials in evaluation domain.


But they are just general-purpose arithmetic. Wouldn't it be better to move these into some other file?

I think you're right...
My initial logic was this: arkworks has an Evaluations struct that implements all the operations one might perform on polynomials in evaluation form, including element-wise add, sub and mul, and I wanted to do something similar in lde. But we might also need vector operations elsewhere (e.g. in tree builders, right?), so it's best to factor these out into a more general place.
I think I'm going to remove lde for now then, and will (try to) implement higher-level polynomial functionality in the next PR. Would that make sense?

ChickenLover · 2023-11-13T09:58:58Z

icicle/appUtils/lde/lde.cuh

- * LDE (stands for low degree extension) contains [NTT](@ref ntt)-based methods for translating between coefficient and
- * evaluation domains of polynomials. It also contains methods for element-wise manipulation of vectors, which is useful
- * for working with polynomials in evaluation domain.
+ * LDE (stands for low degree extension) contains [NTT](@ref ntt)-based methods for translating between coefficient and evaluation domains of polynomials.


Isn't LDE a part of NTT, and not vice versa?

Partially answered in the previous comment: I imagine a future LDE as something like arkworks' Evaluations, or a mix of Evaluations and DensePolynomial. So it's a higher-level wrapper over NTT. For example, to evaluate a polynomial of degree N on a domain of size 2N you need two size-N NTTs, and to multiply two polynomials you need some NTTs for translating between evaluation and coefficient forms.
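To make the "wrapper over NTT" point concrete, here is a minimal sketch of polynomial multiplication built from forward and inverse transforms. It is an illustration under stated assumptions, not icicle's implementation: the toy field F_17, the constants `P`, `OMEGA`, `N`, and the functions `pow_mod`, `transform` and `poly_mul` are all hypothetical, and a naive O(N^2) transform stands in for a real NTT.

```rust
// Hypothetical toy parameters: F_17, where 2 is a primitive 8th root of
// unity because 2^4 = 16 = -1 (mod 17) and therefore 2^8 = 1.
const P: u64 = 17;
const OMEGA: u64 = 2;
const N: usize = 8;

/// Modular exponentiation by squaring in F_P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc: u64 = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Naive O(N^2) transform standing in for an NTT (assumption: a real
/// implementation would use butterflies and precomputed twiddles).
/// Forward: coefficients -> evaluations at powers of OMEGA.
/// Inverse: evaluations -> coefficients.
fn transform(data: &[u64; N], is_inverse: bool) -> [u64; N] {
    // The inverse uses OMEGA^-1, computed via Fermat's little theorem.
    let root = if is_inverse { pow_mod(OMEGA, P - 2) } else { OMEGA };
    let mut out = [0u64; N];
    for (i, o) in out.iter_mut().enumerate() {
        for (j, &x) in data.iter().enumerate() {
            *o = (*o + x * pow_mod(root, (i * j) as u64)) % P;
        }
    }
    if is_inverse {
        // Scale by N^-1 to complete the inverse transform.
        let n_inv = pow_mod(N as u64, P - 2);
        for o in out.iter_mut() {
            *o = *o * n_inv % P;
        }
    }
    out
}

/// Multiply two polynomials whose degrees sum to less than N:
/// two forward transforms, an element-wise product, one inverse transform.
fn poly_mul(a: &[u64; N], b: &[u64; N]) -> [u64; N] {
    let ea = transform(a, false);
    let eb = transform(b, false);
    let mut prod = [0u64; N];
    for i in 0..N {
        prod[i] = ea[i] * eb[i] % P;
    }
    transform(&prod, true)
}
```

The element-wise product in the middle is exactly the kind of general-purpose vector operation discussed in the previous thread, which is why it can live outside lde while the polynomial-level function sits on top of both.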

DmytroTym · 2023-11-20T21:48:08Z

@vhnatyk a couple of questions I have about the current NTT design in this branch:

I don't understand the purpose of Domain currently: it's just a wrapper around Config. Arkworks-style domain is (roughly) size, twiddles and coset. So no inout or batch_size. I think for us Domain might be a useful abstraction too, but IMO we should either add it later on after this PR or spend some time designing it.
Here are my other general preferences though I'm not insisting and will accept whatever consensus the team develops:

I would take inout, size and is_inverse out from config and put these fields into the NTT function separately. My intuition is that fields that have a natural default should be inside cfg but something that you almost definitely need to set manually shouldn't. Also imo ntt(data, true, cfg) is more readable than ntt(cfg).
Maybe it's better to have just one on_device flag instead of two (inputs_on_device and outputs_on_device) in NTTConfig? It just feels safer and more intuitive to me not to change where the pointer inout is pointing (which will inevitably happen if, say, inputs are on device and outputs on host) in a function which is supposedly "in-place". From my experience, in real-life NTTs it's almost always either both inputs and outputs on device, or both on host.
I completely agree that we need convenience wrappers for NTT functions, just thought maybe we could leave a minimal number of expressive calls for now and later on grow the wrappers organically. I just don't really believe in our ability to thoughtfully design good quality wrappers right now without delaying the PR.
We currently have three similar fields in NTTConfig: Ordering, Decimation and Butterfly. They are not independent; for example, Decimation::kDIT and Butterfly::kCooleyTukey are basically the same thing. I think there's an issue when, for example, Decimation::kDIF and Butterfly::kCooleyTukey are passed together, in which case one has to be ignored. The reason we have Decimation and Butterfly in the first place is compatibility with other codebases (e.g. gnark uses DIT and DIF in their codebase), but imo it'd be best to just document which decimation/butterfly options correspond to which Ordering options and remove the Decimation and Butterfly fields altogether.
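A minimal sketch of the call shape proposed above, written in Rust for illustration. Everything here is hypothetical (the `NttConfig` fields, the `Ordering` variant names, and the `ntt` signature are assumptions, not the API in this branch): arguments that almost always need setting are passed explicitly, while fields with a natural default live in the config. The body only validates sizes and reports the per-polynomial size; the transform itself is out of scope.

```rust
/// Hypothetical ordering options (N = natural, R = bit-reversed).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Ordering {
    NN,
    NR,
    RN,
    RR,
}

/// Hypothetical config: only fields with a natural default.
#[derive(Clone, Debug, PartialEq)]
pub struct NttConfig {
    pub ordering: Ordering,
    pub is_coset: bool,
    pub batch_size: usize,
    /// Single flag: inputs and outputs live in the same memory space.
    pub on_device: bool,
}

impl Default for NttConfig {
    fn default() -> Self {
        NttConfig {
            ordering: Ordering::NN,
            is_coset: false,
            batch_size: 1,
            on_device: false,
        }
    }
}

/// Sketch of `ntt(data, is_inverse, cfg)`. The data and the direction are
/// explicit arguments; the size is derived from the slice length.
/// Returns the size of each polynomial in the batch, or an error.
pub fn ntt(data: &mut [u64], is_inverse: bool, cfg: &NttConfig) -> Result<usize, String> {
    let _ = is_inverse; // the direction does not affect validation in this sketch
    if cfg.batch_size == 0 || data.len() % cfg.batch_size != 0 {
        return Err("data length must be a non-zero multiple of batch_size".to_string());
    }
    let size = data.len() / cfg.batch_size;
    if !size.is_power_of_two() {
        return Err("each polynomial must have a power-of-two size".to_string());
    }
    Ok(size)
}
```

With this shape a typical call reads `ntt(&mut data, true, &NttConfig::default())`, and a caller overrides only what differs from the defaults, e.g. `NttConfig { batch_size: 2, ..NttConfig::default() }`.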

@jeremyfelder @ChickenLover please feel free to comment

jeremyfelder

I don't understand the purpose of Domain ......

I agree

Here are my other general preferences....:

I would take inout, size and is_inverse out from config and put these fields into the NTT function separately. My intuition is that fields that have a natural default should be inside cfg but something that you almost definitely need to set manually shouldn't. Also imo ntt(data, true, cfg) is more readable than ntt(cfg).

I agree about inout and size. I'm not sure why is_inverse would be different than is_coset though?

Maybe it's better to have just one on_device flag instead of two (inputs_on_device and outputs_on_device) in NTTConfig? It just feels safer and more intuitive to me not to change where the pointer inout is pointing (which will inevitably happen if, say, inputs are on device and outputs on host) in a function which is supposedly "in-place". From my experience, in real-life NTTs it's almost always either both inputs and outputs on device, or both on host.

I agree that it will be less error-prone, but it will force users to manually move the initial data and the resulting data to and from the device if they want to use on_device=true, which might also be error-prone.

I completely agree that we need convenience wrappers for NTT functions, just thought maybe we could leave a minimal number of expressive calls for now and later on grow the wrappers organically. I just don't really believe in our ability to thoughtfully design good quality wrappers right now without delaying the PR.

👍🏻

We currently have three similar fields in NTTConfig: Ordering, Decimation and Butterfly. They are not independent; for example, Decimation::kDIT and Butterfly::kCooleyTukey are basically the same thing. I think there's an issue when, for example, Decimation::kDIF and Butterfly::kCooleyTukey are passed together, in which case one has to be ignored. The reason we have Decimation and Butterfly in the first place is compatibility with other codebases (e.g. gnark uses DIT and DIF in their codebase), but imo it'd be best to just document which decimation/butterfly options correspond to which Ordering options and remove the Decimation and Butterfly fields altogether.

Yes, if multiple options produce essentially the same outcome under different names, we should provide one option and document the alternative names for it.
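One way to "give one option and document the alternative names" is sketched below. The names `Ordering::NR`, `Ordering::RN` and `ordering_from_alias` are hypothetical. The thread only states that DIT and Cooley-Tukey are basically the same thing; pairing DIF with Gentleman-Sande, and the input/output orders attached to each variant, follow one common in-place convention and are assumptions that would need checking against the actual kernels.

```rust
/// Hypothetical single source of truth for the transform variant.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Ordering {
    /// Natural-order input, bit-reversed output.
    /// Assumed aliases: DIF (decimation in frequency), Gentleman-Sande butterfly.
    NR,
    /// Bit-reversed input, natural-order output.
    /// Assumed aliases: DIT (decimation in time), Cooley-Tukey butterfly.
    RN,
}

/// Resolve a name used by another codebase to the single `Ordering` option.
/// Contradictory pairs such as "DIF + Cooley-Tukey" cannot be expressed,
/// because callers end up holding exactly one `Ordering` value.
pub fn ordering_from_alias(name: &str) -> Option<Ordering> {
    match name.to_ascii_lowercase().as_str() {
        "nr" | "dif" | "gentleman-sande" => Some(Ordering::NR),
        "rn" | "dit" | "cooley-tukey" => Some(Ordering::RN),
        _ => None,
    }
}
```

Keeping the aliases in doc comments and in one lookup function means compatibility with DIT/DIF terminology costs no extra config fields.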

jeremyfelder · 2023-11-29T08:26:32Z

icicle/appUtils/msm/msm.cu

+    template <typename S>
+    int get_optimal_c(int bitsize)
+    {
+      return ceil(log2(bitsize)) - 4;


Should this be able to be negative?
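For reference, a small Rust sketch of the same formula shows where it goes non-positive. The function names are hypothetical, an integer ceil-log2 replaces the floating-point `ceil(log2(...))` from the diff, and the clamp to 1 is only one possible fix, not something decided in this PR.

```rust
/// ceil(log2(x)) for x >= 1, computed with integer arithmetic.
fn ceil_log2(x: u32) -> i32 {
    assert!(x >= 1, "log2 is undefined for 0");
    (32 - (x - 1).leading_zeros()) as i32
}

/// Mirrors the formula in the diff: ceil(log2(bitsize)) - 4.
/// The result is zero or negative for any bitsize <= 16.
fn get_optimal_c_unclamped(bitsize: u32) -> i32 {
    ceil_log2(bitsize) - 4
}

/// One possible guard (an assumption): never return a window size below 1.
fn get_optimal_c_clamped(bitsize: u32) -> i32 {
    get_optimal_c_unclamped(bitsize).max(1)
}
```

For example, a bitsize of 8 gives -1 and a bitsize of 16 gives 0 without the clamp, while a bitsize of 254 gives 4 either way.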

jeremyfelder · 2023-11-29T08:31:12Z

icicle/appUtils/ntt/ntt.cu

@@ -61,7 +58,7 @@ namespace ntt {
      int number_of_threads = MAX_THREADS_BATCH;
      int number_of_blocks = (n * batch_size + number_of_threads - 1) / number_of_threads;
      reverse_order_kernel<<<number_of_blocks, number_of_threads, 0, stream>>>(arr, arr_reversed, n, logn, batch_size);
-      cudaMemcpyAsync(arr, arr_reversed, n * batch_size * sizeof(E), cudaMemcpyDeviceToDevice, stream);
+      cudaMemcpyAsync(arr, arr_reversed, n * batch_size * sizeof(E), cudaMemcpyDefault, stream);


cudaMemcpyDefault is only supported on devices with unified virtual addressing. Which devices are these? We should update the docs for supported devices.

@vhnatyk I also don't understand the purpose of this tbh

jeremyfelder · 2023-11-29T08:40:36Z

wrappers/rust/icicle-curves/icicle-bn254/build.rs

+                .define("BUILD_TESTS", "OFF") //TODO: feature
+                // .define("CURVE", "bls12_381")
+                .define("CURVE", "bn254")
+                // .define("ECNTT_DEFINED", "") //TODO: feature


Suggested change

- // .define("ECNTT_DEFINED", "") //TODO: feature
+ // .define("ECNTT_DEFINED", "") //TODO: feature
+ // .define("G2_DEFINED", "")

jeremyfelder · 2023-11-29T08:46:08Z

icicle/appUtils/msm/msm.cu

    if (config.batch_size == 1)
      bucket_method_msm(
-        config.bitsize, config.c, scalars, points, msm_size, results, config.are_scalars_on_device, config.big_triangle,
-        config.large_bucket_factor, config.ctx.stream);
+        bitsize, 16, scalars, points, msm_size, results, config.are_scalars_on_device,
+        config.are_scalars_montgomery_form, config.are_points_on_device, config.are_points_montgomery_form,
+        config.are_results_on_device, config.is_big_triangle, config.large_bucket_factor, config.is_async,
+        config.ctx.stream);
    else
      batched_bucket_method_msm(
-        config.bitsize, config.c, scalars, points, config.batch_size, msm_size, results, config.are_scalars_on_device,
-        config.ctx.stream);
+        bitsize, (config.c == 0) ? get_optimal_c<S>(bitsize) : config.c, scalars, points, config.batch_size, msm_size,
+        results, config.are_scalars_on_device, config.ctx.stream);
    return cudaSuccess;


I don't see us using config->point_size even though the comments on MSMConfig say it is used instead of msm_size in certain cases. This parameter in the config is a bit confusing, can't we always just use msm_size and batch_size to get the correct values?

You're right, it's currently unused; I tried to mark this with the following comment: https://github.com/ingonyama-zk/icicle/pull/262/files#diff-cdd3b67200a359e20797b18b99ca7176f36d4c9d48d9843d22500c5424fe1c7dR106. The idea of this parameter is to control whether you want to reuse the same set of bases in all MSMs or have custom bases for each one. The current default is the latter, although it would be really easy to make this parameter functional. From my understanding, this will help you in halo2, where you could reduce the memory footprint of the bases by sharing them between MSMs, right?
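A minimal sketch of the two modes described above, under loud assumptions: `share_bases`, `base_index` and `batched_msm` are hypothetical names (the config field discussed here is point_size), and plain `i64` addition stands in for the curve group so the indexing can be checked by hand.

```rust
/// Index of the base used by element `elem_idx` of MSM number `msm_idx`.
/// Shared mode: every MSM reads the same `msm_size` bases.
/// Custom mode: each MSM has its own contiguous block of bases.
fn base_index(msm_idx: usize, elem_idx: usize, msm_size: usize, share_bases: bool) -> usize {
    if share_bases {
        elem_idx
    } else {
        msm_idx * msm_size + elem_idx
    }
}

/// Naive batched MSM over integers (a stand-in for curve points).
/// `scalars` always holds `batch_size * msm_size` entries; `bases` holds
/// `msm_size` entries when shared and `batch_size * msm_size` otherwise.
fn batched_msm(
    scalars: &[i64],
    bases: &[i64],
    msm_size: usize,
    batch_size: usize,
    share_bases: bool,
) -> Vec<i64> {
    assert_eq!(scalars.len(), msm_size * batch_size);
    let needed = if share_bases { msm_size } else { msm_size * batch_size };
    assert_eq!(bases.len(), needed);
    (0..batch_size)
        .map(|i| {
            (0..msm_size)
                .map(|j| scalars[i * msm_size + j] * bases[base_index(i, j, msm_size, share_bases)])
                .sum::<i64>()
        })
        .collect()
}
```

In shared mode the bases buffer is `batch_size` times smaller, which is the memory saving mentioned above.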

ImmanuelSegol and others added 30 commits October 3, 2023 15:22

fix memory error in single_stage_multi_reduction_kernel (#235)

9114ecb

* refactor * refactor * revert * refactor: clang format * Update icicle/appUtils/msm/msm.cu

Added separate device context struct, returned lde

33d1583

Merge branch 'main' into cuda_refactoring

1c14466

wip - msm and eq

3de0e79

added lde to cmake

df1fc7e

wip draft with msm correctness test for bls12_381 + Merge commit 'df1…

3e9c0e7

…fc7ed6068e8eab9d81a8d6002949524597a19' into develop/vhnat/feat-233-update-api-fix-rust-build

Montgomery param added in lde.cu mul function

d8e26e8

Merge remote-tracking branch 'upstream/feat/233-update-api' into deve…

f1819d9

…lop/vhnat/feat-233-update-api-fix-rust-build

fixed on_device for ntt and lde

7d4e266

CamelCase

ec67196

fixed msm_test, int unification, google guilde

d6dfd59

Merge branch 'feat/233-update-api' into cuda_refactoring

09ef4fd

Merge commit '09ef4fdba19e6c4e4745c80062bb02a9bcc70d87' into develop/…

6f44433

…vhnat/feat-233-update-api-fix-rust-build

wip - ntt crash debugging

fda01fa

Merge branch 'feat/233-update-api' into develop/dima/feat-233-update-…

1d892b2

…rust-apis

async MSM with a rust wrapper

3e7d7b9

wip ntt tests with corretness

141b1c8

hotfix for correctness > 2^9

a199bc8

wip on device inout mixing with correctness

ea00037

cleanup

b8a06d5

preserving twiddles after first call

13f75a2

fixed twiddles preserving

6bcc56b

formatting

bb36b1b

removed some printing

466b669

disable ecntt temporarily

ac5b850

format

bb98347

rust fmt

434a010

exclude target from format

70eeada

passing ntt after merge

286811f

hotfix for linking issue

1b2c03b

Vitalii and others added 17 commits October 24, 2023 23:32

format

20ca1e7

Merge commit '133a1b28bc04900272e7121692c2e5bc2d8dcbcf' into develop/…

0b2bd78

…vhnat/feat-233-update-api

format

27d2c8f

draft of pr comments + correctness restored

ec86b44

wip refactor + format

9556ecf

domain wip

b0d0215

rust format

d0569b4

Merged feature branch in and Rust MSM correctness

dd628ed

rust build for correct curve

36442ae

Slowdown fixed by passing release flag to cmake

ac6593d

Merge remote-tracking branch 'vitaliy/develop/vhnat/feat-233-update-a…

32eaef1

…pi' into develop/dima/feat-233-update-rust-apis

WIP field and curve

eb543f0

still wip field and curve

318c6f4

field and curve in rust 1.0

4ad37ae

Refactored rust into several crates

7cc105d

Arkworks is now an option, bn254 crate created

e13bf8b

Rust msm and ntt wip

45c9b00

jeremyfelder mentioned this pull request Nov 12, 2023

update api #251

Closed

ChickenLover reviewed Nov 13, 2023

DmytroTym added 4 commits November 17, 2023 16:53

A version of rust msm done, cuda runtime wrapped

1623f01

refactor rust by creating a curve folder

69dc6c1

vec_ops instead of lde for now

e34e936

format

5a96065

jeremyfelder reviewed Nov 29, 2023

omershlo marked this pull request as ready for review December 3, 2023 11:00

omershlo merged commit dfa5b10 into feat/233-update-api Dec 3, 2023

omershlo deleted the develop/dima/feat-233-update-rust-apis branch December 3, 2023 11:32

DmytroTym mentioned this pull request Dec 12, 2023

New API #293

Merged
