-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new mp_todec_fast... #330
base: develop
Are you sure you want to change the base?
Conversation
See #328 |
e3ba30e
to
f0f5bb7
Compare
👍 Regarding the API I think it would be best if we replace the mp_todecimal macros by specialised functions. These could either branch to a fast variant if available or branch to the generic toradix. The mp_todecimal_fast function from this PR should then be internal and named s_mp_todecimal_fast |
Now that I’m thinking about it, a lot of functions have the name of the algorithm in them. Maybe make it internal and rename to s_mp_todecimal_barrett?
…Sent from my iPhone
On Aug 24, 2019, at 10:44 AM, Daniel Mendler ***@***.***> wrote:
👍
Regarding the API I think it would be best if we replace the mp_todecimal macros by specialised functions. These could either branch to a fast variant if available or branch to the generic toradix. The mp_todecimal_fast function from this PR should then be internal and named s_mp_todecimal_fast
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
What do you mean, "if available"? Should I change this PR to also change the One difficulty is that |
If available - via the macro configuration, see #262 |
I would prefer if we use a radix size function and don't grow the buffer. |
Why is it called Is there a But after a quick first glimpse: looks good.
It is in a larger PR but independent of the rest of that PR. Needs to get updated to the new state of LibTomMath (more or less just the small_int_2_big_int parts, nothing significant) but no other hurdles as far as I can see. |
I can switch to assuming the passed in buffer is the correct size, but it'll take me a little bit of time to re-work.
I tried to follow the style and it passes
Yep I can work on that, but I wanted confirmation I was on the right path first, so thanks. |
A style/correctness question. Should I move the 3 arrays of
instead of the current
|
Timings with the new version that expects to be passed a pre-allocated buffer (i.e., similar API to
|
Thanks for this PR! Sorry for blocking you guys, I will have more time from next week on Yes, please rebase, squash together into logical commits and use MP_HAS |
... squash already happened ... 👍 |
fc55954
to
b2ca59c
Compare
69d57ff
to
f84b69c
Compare
Close in favor of #331 |
Re-opened as #331 has its base to a non-existant branch now |
IIUC that's not an option as this implementation is only faster for big numbers (@MasterDuke17 mentioned something like 500 digits on an older machine and 1500 digits on a newer one, I don't know whether it's decimal- or ltm-digits) |
I wasn't very clear, those are numbers I was raising two to the power of (e.g., 2^512 as you can see in some of my benchmarks in earlier commits). On my faster machine it was somewhere around @czurnieden has some ways to speed up this implementation (and re-writing to not be recursive may help, also trading memory for speed and keeping around the arrays of calculated values), but the constant factor may always put it slower than the current |
@sjaeckel You are right. We should do something similar like karatsuba. However I don't think it makes sense to merge a specialized version for base 10 if we can get a generalized version which works as well. |
Cool then let's do that!
That's also a good idea
You mean something like a "registry by radix" with the pre-calculated values of
👍
Fine by me! I thought we could merge this version soon, so we have at least a faster |
I believe @czurnieden's suggestions are straightforward, but will require me to understand more of the actual math than I do now, so it may take a while.
I'll work on this, but same re may take a while.
Is there anything similar in the codebase I could take a look at for inspiration?
If a fully generalized version would cause a delay until the release after next I'd advocate for including the base-10 version. But that's only so it be more likely to get into MoarVM faster (since my completely unmeasured opinion is that 99% of the uses of mp_to_radix are for base-10). |
@MasterDuke17 I don't think we will add this to the next release. It needs more time to mature. In particular since there are already many changes and lots of deprecations which need to be released. But I am interested in getting this in soon, I would use this faster to_radix myself. However my current impression of this library is that it is not moving particularly fast, but rather taking it's time to figure out how things should be. With regards to moarvm, it is not yet up to date with the latest release. And it would also probably help if moar gets ported to the tommath develop branch. This would help both projects I guess (advancing moar and feedback for tommath). |
@MasterDuke17 It got adapted it to the current LTM so it can run (tests, including timing are in I've put it in my fork; the two functions are It is a quite straightforward textbook implementation of the D&C algorithm described by many and should be easy to follow. |
@czurnieden Ooo, those do look helpful indeed, thanks! I'll have a go at plugging in my Barrett division. |
Couldn't sleep, so did the main cleanup in But a fast string2bigint isn't very urgent, a fast bigint2string on the other side is. Any progress made with it? Can I help? |
Nice. Looking at your code has helped me to understand what’s going on more (not yet completely though). I’ve made a *tiny* bit of progress, but I would have absolutely no objection if you carried on ahead (and if you do, feel free to take all or nothing from what I have committed so far). If you don’t, I’ll keep slowly plugging away.
…Sent from my iPhone
On Sep 17, 2019, at 1:34 AM, Christoph Zurnieden ***@***.***> wrote:
@MasterDuke17
Couldn't sleep, so did the main cleanup in bn_mp_set_str.c (string2bigint) and made it stand-alone (with option, see comments there). It is twice as large as bn_mp_read_radix.o but O(log(n) * M(n)) (In the current LTM M(n) = n^1.46 with Toom-Cook 3-way) instead of O(n^2).
But a fast string2bigint isn't very urgent, a fast bigint2string on the other side is. Any progress made with it? Can I help?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Did most of the chores in For further reading try Brent et al.: "Modern Computer Arithmetic" section 2.4.1. and Kalles, pardon, Karl Hasselström's master thesis "Fast Division of Large Integers" (shows correction if the error is bigger than unity allowing for smaller Another hint: Barrett division is not as complicated as it seems, so if you think you need to do something complicated you are probably on the wrong path (optimizing Example for Input
|
@czurnieden I'm making progress. I now have a Frankenstein combination of my bn_s_mp_to_decimal_fast.c and your bn_mp_get_str.c that sort of works. However, I currently have two problems. The first is that my new combined code is adding some extra leading |
Yeah, that part is a bit "how you doin'" and may need some extra massage.
It is more or less a "working sketch" to give you one way to tackle your problem as an example. It makes me wonder that valgrind didn't crash&burn completely ;-) I uploaded a slightly more "ready to go" version (none of the functions have their |
Hm, at this point I'm not sure what's missing from your new version (other than the error handling and such)? Is there actually something that should be ported in from my code? |
You still have the base-10 optimized part the computing of the reciprocal) which should find it's way into it because the whole thing would work much better with fast division in general but as the most used bases are probably 16, 10, and 64, and maybe base 2, too. The power-of-two bases are quickly computed, only base 10 is more work. There is even a discussion here to reduce the functionality in a later LTM version to just those power-of-two bases and base 10. So I would say that you should concentrate on what you started with and implement a (preprocessor/ Soooo: no cheap excuses here, go on! :-) |
Please rebase from time to time. |
@MasterDuke17 took the liberty to take the algorithm apart and wrote down what I found. It seems as if all of the odd radices get dropped in version 2.0.0 (who actully needs more than the base 2, 10, 16 and for saving space 64?) so I think you can drop the rest and concentrate on refining and optimizing what you started with: base 10. I think Lars Hellström made some suggestions in his post archived at Sourceforge like dropping recursion and adding some additional caching. Thank you for your patience! |
I extended my description and hope it is complete now (sans the iterative version(s)). If not feel free to ask me, I know it is still a bit of a mess. Please rebase. I wanted to give Github's web editor a try and do it for you but it seems as if I'm probably not the guy for web editors ;-) You can take the develop branch but take care that your stuff doesn't get lost ( |
Cool. If I understand it correctly we would need |
The names used in that text should be The exact type of the sizes is still in limbo ( I wouldn't go with static arrays in the first place but use the heap instead. You can calculate the number of reciprocals in advance and check the result (and repair if necessary), so I would just Oh, and although I added the necessary information to extend it to all of the other bases you can stay with base 10 for now. Did I add the "bigger chunks" stuff? No, I didn't, I was lying when I said it is complete ;-) |
@MasterDuke17 |
@czurnieden Great, thanks! I've been sidetracked by getting MoarVM upgraded to libtommath 1.2.0. We just recently turned on -Wall and -Wextra when building MoarVM and that threw up a ton of warnings in our code that uses libtommath (e.g., we were almost never handling return values and that now warns). But I should have that done soon and next thing on the list is working on this. |
@MasterDuke17 Good to hear that you are doing the update! The last time I looked, the MoarVM codebase had some math functions which are now available in upstream ltm, e.g., floating point conversion and bitwise operations. If something else is missing here let us know! |
Looks like @MasterDuke17 won't finish this, as MoarVM has decided to go for gmp instead of ltm. |
that uses Barrett reduction to speed up stringifying large integers.
Some benchmarking, done with a lightly modified
demo/timing.c
: