Uint - allow compile-time evaluation for all procs #54
This PR allows compile-time evaluation for Uint for all procs (including +, *, div, mod, or, and, xor, ...).
Ints are not covered.
Implementation
What didn't work
Changing the Stint backend to C arrays, which I tried yesterday in another branch, led to far too many issues (nim-lang/Nim#8052, nim-lang/Nim#8053), until I got stuck on how to slice half of the array to do a simple `+=`: `foo.lo += bar.lo`
nim-stint/stint/private/datatypes.nim
Lines 49 to 155 in fec2190
What worked
I implemented a macro that gets all the "leaves" of a uint or int in big endian order, so for a uint256 I get [foo.hi.hi, foo.hi.lo, foo.lo.hi, foo.lo.lo].
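To make the "leaves in big endian order" idea concrete, here is a minimal C sketch of a recursive hi/lo layout. The `U128` and `U256` types and the `leaves_be` helper are hypothetical stand-ins for illustration, not Stint's actual Nim types or macro.

```c
#include <stdint.h>

/* Hypothetical recursive layout: each level is a (hi, lo) pair. */
typedef struct { uint64_t hi, lo; } U128;
typedef struct { U128 hi, lo; } U256;

/* Write the four 64-bit leaves of x into out, most significant first. */
static void leaves_be(const U256 *x, uint64_t out[4]) {
    out[0] = x->hi.hi;
    out[1] = x->hi.lo;
    out[2] = x->lo.hi;
    out[3] = x->lo.lo;
}
```

The order of the leaves follows the value's significance, not the in-memory order of the fields.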
I replaced all the `asWords`, `asWordsZip`, and `m_asWordsZip` macros with a single `asWords` macro using the new `ForLoopStmt`, which has the benefit of the clearer `for foo, bar in x, y` syntax and saves 160 lines of code.
Internally, the new `asWords` just replaces the iteration variables with foo.hi.hi, foo.hi.lo, ... in sequence.
Reserves
Because of how it works, the implementation always hands GCC fully unrolled loops. This may lead to code bloat. For Uint256, there is probably no difference between the unrolled form and an explicit loop, but the Uint2048 used in the Ethereum bloom filter might be a bit heavy. GCC has an option to reroll loops if needed.
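The unrolled-versus-looped trade-off can be pictured in C, the language the Nim compiler emits by default. In this sketch, `FOR_EACH_LEAF` is a hypothetical X-macro that substitutes leaf names into a statement, roughly analogous to what `asWords` is described as doing. It is an assumption for illustration, not the code Nim actually generates.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } U128;
typedef struct { U128 hi, lo; } U256;

/* Expand OP once per leaf, most significant first. */
#define FOR_EACH_LEAF(OP) OP(hi.hi) OP(hi.lo) OP(lo.hi) OP(lo.lo)

/* Unrolled: four straight-line statements, no loop counter. */
static U256 xor_unrolled(U256 a, U256 b) {
    U256 r;
#define XOR_LEAF(leaf) r.leaf = a.leaf ^ b.leaf;
    FOR_EACH_LEAF(XOR_LEAF)
#undef XOR_LEAF
    return r;
}

/* Looped: the same operation over a flat array of words. */
static void xor_looped(const uint64_t a[4], const uint64_t b[4],
                       uint64_t r[4]) {
    for (int i = 0; i < 4; ++i)
        r[i] = a[i] ^ b[i];
}
```

With four words the two forms are about the same size. With 32 words (a 2048-bit value), the unrolled form grows to 32 statements per operation, which is where the code-bloat concern comes from.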
Iteration is always done from most significant word to least significant word, i.e. there is no `ignoreEndianness` parameter. On little-endian machines, this would iterate backward and may leave performance on the table for procs with no loop order dependency. For example, on the old Core architecture, the prefetcher could work with up to 12 forward streams but only 4 backward streams; on Sandy Bridge, the prefetcher has no preference.
I am not sure about ARM. Note that the compiler is probably able to reorder operations or use vectorized operations (ARM Neon, AVX2) if there is no loop order dependency, so it may not matter at all.
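What "loop order dependency" means can be sketched in C. The `Words256` type and the three functions below are hypothetical illustrations, not Stint code. Comparison must start at the most significant word and addition must start at the least significant word, while a bitwise operation can visit words in any order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Words stored most significant first: w[0] is the top word. */
typedef struct { uint64_t w[4]; } Words256;

/* Order-dependent, most significant first: the first differing word decides. */
static bool less_than(const Words256 *a, const Words256 *b) {
    for (int i = 0; i < 4; ++i) {
        if (a->w[i] != b->w[i])
            return a->w[i] < b->w[i];
    }
    return false; /* all words equal */
}

/* Order-dependent, least significant first: the carry flows upward. */
static Words256 add256(const Words256 *a, const Words256 *b) {
    Words256 r;
    uint64_t carry = 0;
    for (int i = 3; i >= 0; --i) {
        uint64_t s = a->w[i] + b->w[i];
        uint64_t c1 = s < a->w[i];   /* overflow from a + b */
        r.w[i] = s + carry;
        uint64_t c2 = r.w[i] < s;    /* overflow from adding the carry */
        carry = c1 | c2;
    }
    return r;
}

/* Order-independent: any traversal direction gives the same result. */
static Words256 and256(const Words256 *a, const Words256 *b) {
    Words256 r;
    for (int i = 0; i < 4; ++i)
        r.w[i] = a->w[i] & b->w[i];
    return r;
}
```

Only functions like `and256` are candidates for a free choice of direction, reordering, or vectorization by the compiler.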
Benchmark
I updated the benchmark to make sure we didn't have a perf regression; the change brings about a 15% perf improvement on my machine.
Javascript
Compile-time evaluation works, but display on the JavaScript target, at least, does not: uint256 values are printed with hundreds of leading zeros.
Nim issues
Working with types in macros is a pain, especially because of this bug: nim-lang/Nim#7737. There are also many tricks to know to get proper type resolution, like storing idents in an `nnkArgList` and not an `nnkBracket` or `nnkPar` if we want to pass them to another macro for further processing.