Montgomery multiplication improvements #203
Merged
- Do the final reduction in `montgomery_reduction()` with `sub_mod_with_hi()`. Speeds it up by ~20% (for `U256`), I guess due to better vectorization?
- Remove `muladdcarry()` from `montgomery_reduction()`.
- Simplify `Uint::sub_mod()`, `sub_mod_special()`, `add_mod()`, `add_mod_special()` by reusing existing methods.
- Make `DynResidueParams::new()` a `const fn`.
- Faster `mod_neg_inv` calculation in `DynResidueParams::new()`, which speeds it up by ~10% (for `U256`).

Originally I wanted to implement Montgomery multiplication by simultaneous multiplication + reduction instead of
`mul_wide`, but it showed exactly the same performance, with the exception of the last reduction part; that's how I discovered the performance improvement. Still not quite sure what causes it. (By the way, I measured the performance on arm64; it would be interesting to see the results for x64.)