Produce multiple random numbers efficiently #66
base: master
Essentially it's about avoiding two stores into the array and then reading them back immediately. On the surface it looks really similar to deforestation, but I don't know whether it's possible to implement it via rewrite rules. At the very least it seems hard. On a completely unrelated note, I thought about a microoptimization: replace unboxed arrays with arrays from primitive. Indexing should be faster there, since those don't support slicing.
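A minimal sketch of where the slicing overhead comes from, using two hypothetical types ('SlicedVec', 'PlainArr') built only on base's ForeignPtr. As far as I know, primitive-backed vectors keep an offset and a length next to the buffer so that slicing is O(1), while a PrimArray is just the underlying array; these toy types only model that difference and are not the definitions from the vector or primitive packages. The point is that every sliced index pays an extra addition and keeps the offset live.

```haskell
import Data.Word (Word32)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Hypothetical sliceable vector: offset, length, shared buffer.
-- Models the offset/length pair of a sliceable vector; not the real type.
data SlicedVec = SlicedVec !Int !Int !(ForeignPtr Word32)

-- Hypothetical non-sliceable array: length and buffer only.
data PlainArr = PlainArr !Int !(ForeignPtr Word32)

-- Allocate a buffer and fill it from a list.
newBuffer :: [Word32] -> IO (ForeignPtr Word32)
newBuffer xs = do
  fp <- mallocForeignPtrArray (length xs)
  withForeignPtr fp $ \p -> mapM_ (uncurry (pokeElemOff p)) (zip [0 ..] xs)
  pure fp

fromListSliced :: [Word32] -> IO SlicedVec
fromListSliced xs = SlicedVec 0 (length xs) <$> newBuffer xs

fromListPlain :: [Word32] -> IO PlainArr
fromListPlain xs = PlainArr (length xs) <$> newBuffer xs

-- O(1) slice: shares the buffer and only adjusts offset and length.
slice :: Int -> Int -> SlicedVec -> SlicedVec
slice i m (SlicedVec off _ fp) = SlicedVec (off + i) m fp

-- Every index adds the offset first (bounds checks omitted in both versions).
indexSliced :: SlicedVec -> Int -> IO Word32
indexSliced (SlicedVec off _ fp) i = withForeignPtr fp $ \p -> peekElemOff p (off + i)

-- Without slicing support the index goes straight to the buffer.
indexPlain :: PlainArr -> Int -> IO Word32
indexPlain (PlainArr _ fp) i = withForeignPtr fp $ \p -> peekElemOff p i
```

Whether the extra addition matters in practice depends on whether GHC can hoist the offset out of the loop, so this only shows the source of the cost, not its size.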
Thanks, I wasn't aware of the indexing overhead. I have never used primitive arrays, but I'll try them out when I have some time! Deforestation is also a new subject for me, but I feel it would be complicated (and lead to complicated code?) to explain to the compiler how to fuse two operations. I might be wrong though, and would definitely like to see how this could be done :)
I see a 2.5% performance boost when using a Storable vector. Interesting!
Indexing of unboxed arrays has an inherent slowdown because of slicing support. It's done as … Another approach for reducing the number of reads/writes to the state vector is to turn the generator into a monad.
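A minimal sketch of the "generator as a monad" idea, assuming a hypothetical 'Rand' type and a toy xorshift32 step standing in for the real MWC generator. The hot state is threaded through pure code, so the mutable storage is read once and written once per run, no matter how many numbers the computation draws; 'drawIO' shows the per-call style for comparison.

```haskell
import Data.Bits (shiftL, shiftR, xor)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word32)

-- Toy generator step (xorshift32), an assumption standing in for MWC.
xorshift32 :: Word32 -> Word32
xorshift32 x0 =
  let x1 = x0 `xor` (x0 `shiftL` 13)
      x2 = x1 `xor` (x1 `shiftR` 17)
  in x2 `xor` (x2 `shiftL` 5)

-- Hypothetical generator monad: a plain state-passing function.
newtype Rand a = Rand { runRand :: Word32 -> (a, Word32) }

instance Functor Rand where
  fmap f (Rand g) = Rand $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Rand where
  pure a = Rand $ \s -> (a, s)
  Rand gf <*> Rand ga = Rand $ \s ->
    let (f, s1) = gf s
        (a, s2) = ga s1
    in (f a, s2)

instance Monad Rand where
  Rand g >>= k = Rand $ \s -> let (a, s1) = g s in runRand (k a) s1

-- Draw one number inside the monad; no memory traffic here.
draw :: Rand Word32
draw = Rand $ \s -> let s' = xorshift32 s in (s', s')

-- Run a computation against mutable storage: one read and one write
-- in total, however many numbers the computation draws.
runWithRef :: IORef Word32 -> Rand a -> IO a
runWithRef ref m = do
  s <- readIORef ref
  let (a, s') = runRand m s
  s' `seq` writeIORef ref s'
  pure a

-- Per-call style for comparison: one read and one write per number.
drawIO :: IORef Word32 -> IO Word32
drawIO ref = do
  s <- readIORef ref
  let s' = xorshift32 s
  s' `seq` writeIORef ref s'
  pure s'
```

For example, 'runWithRef ref ((,,) <$> draw <*> draw <*> draw)' produces the same three numbers as three 'drawIO ref' calls, but touches 'ref' twice instead of six times.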
I tried arrays from primitive and see a slight speedup, but it's unclear whether it's noise or not... I kept the …
I finally got time to work on PRs. I cherry-picked changes for … I also checked the time of generation of …
I agree. Another thing to consider (and maybe document for the user) is that with the …
I finally got around to measuring the impact of different vector variants. Here is the distribution of run times: Unboxed, Primitive, and PrimArray from primitive perform identically, and Storable is about 5% slower, probably because of the extra pointer chase in ForeignPtr. I thought that a primitive vector would probably lead to a faster build, but unboxed turned out to be slightly faster: 15.9s vs 16.1s. So the current vector backend is likely optimal. I'll get to the …
I rebased the PR over current master. First of all, benchmarks: it does provide a nice ~25% speedup over … What isn't very good: it provides a very specific primitive, iterating a function N times. Could it be generalized? One thing that comes to mind is unfolds. Maybe there's something that could usefully generalize both? There's an obvious thing: turn passing of …
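One possible shape for such a generalization, sketched with pure code and a toy xorshift32 step rather than the real mutable state vector: a hypothetical 'unfoldDrawsN' threads a user state through N draws and emits one output per draw. The fold shape ("iterate a function N times") and plain generation of N numbers then fall out as special cases. All names here are illustrative, not mwc-random API.

```haskell
import Data.Bits (shiftL, shiftR, xor)
import Data.Word (Word32)

-- Toy pure generator (xorshift32), standing in for the real MWC generator.
next :: Word32 -> (Word32, Word32)
next x0 =
  let x1 = x0 `xor` (x0 `shiftL` 13)
      x2 = x1 `xor` (x1 `shiftR` 17)
      x3 = x2 `xor` (x2 `shiftL` 5)
  in (x3, x3)

-- Hypothetical general primitive: an unfold over n draws that threads a
-- user state u alongside the generator state and emits one output per draw.
unfoldDrawsN :: Int -> (u -> Word32 -> (a, u)) -> u -> Word32 -> ([a], u, Word32)
unfoldDrawsN n f = go n
  where
    go k u s
      | k <= 0 = ([], u, s)
      | otherwise =
          let (x, s1) = next s
              (a, u1) = f u x
              (as, u2, s2) = go (k - 1) u1 s1
          in (a : as, u2, s2)

-- "Iterate a function N times" (a fold) is the case that discards outputs.
foldDrawsN :: Int -> (acc -> Word32 -> acc) -> acc -> Word32 -> (acc, Word32)
foldDrawsN n f acc s =
  let (_, acc', s') = unfoldDrawsN n (\a x -> ((), f a x)) acc s
  in (acc', s')

-- Producing N numbers is the case with a trivial user state.
drawsN :: Int -> Word32 -> ([Word32], Word32)
drawsN n s =
  let (xs, (), s') = unfoldDrawsN n (\() x -> (x, ())) () s
  in (xs, s')
```

A real version would need to be strict and work in the monad over the mutable state vector so that index and carry stay in registers; the sketch only shows that one unfold-like primitive can cover both uses.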
Hello @Shimuuar, reading this MR brings back memories from when I was developing my little console game, which was a lot of fun! In the meantime I have moved on to other personal projects (music, convolution reverbs, etc.), so I won't pursue the initial goal of merging this MR, but I hope it will be useful, or at least some parts of it!
Well, a 25% speedup is nothing to sneer at :). I think I'll release 0.15 without this PR and start updating statistics. Then I'll revisit this PR. Maybe I'll get some idea in the meantime.
This is probably not mergeable as-is, because it's not in the spirit of the current API, but 'foldMUniforms' could be used to implement a new fold-like function in the class 'Variate', to allow creating N random values that are consumed by a monadic accumulating function. Note that creating N numbers with 'foldMUniforms' requires n+2 reads and writes to the state vector, whereas creating N numbers with 'uniform' requires 3*n reads and writes. Benchmarks on my application show a speed-up, because random number generation is a bottleneck for me.