Use `randn!` for stochastic forcing implementations #351
Conversation
Is it possible to perform long-running simulations on the GPU when there are allocations? Can GPU garbage collection keep up?
I'm not sure about that.
Calling …
Oh yeah... that was the "allocating" version I suggested in the issue. The PR doesn't have that version, I just put it here for comparison. But still using …
Did you look into the code for …?
I actually didn't :(
omg, I figured it out!
If we have arrays whose length is a power of 2, then there are no allocations:

```julia
julia> using BenchmarkTools, CUDA, Random

julia> A = CUDA.zeros(1024, 1024);

julia> @btime Random.randn!($A);
  2.417 μs (0 allocations: 0 bytes)

julia> A = CUDA.zeros(1024, 1025);

julia> @btime Random.randn!($A);
  14.119 μs (10 allocations: 352 bytes)
```
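The same in-place pattern can be sanity-checked on the CPU with `@allocated`; this is a minimal sketch for illustration, not code from the PR:

```julia
using Random

# CPU analogue of the benchmark above (a sketch, not from the PR):
# Random.randn! fills a preallocated array in place, so after the
# first (compiling) call it should not allocate.
function fill_noise!(A)
    Random.randn!(A)
    return nothing
end

A = zeros(1024, 1024)
fill_noise!(A)                      # warm-up / compilation
println(@allocated fill_noise!(A))  # typically 0 on the CPU
```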
You were right. In my head this was like an impossible task, but it actually took me less than 10 minutes.
Nice work 🕵️♂️
This forcing implementation ensures non-allocating `calcF!` methods on both CPU and GPU.

Closes #350
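For context, a non-allocating forcing update built on `randn!` might look like the following. This is a hedged sketch under stated assumptions, not the PR's actual code: the signature of `calcF!`, the `amplitude` parameter, and the `sqrt(dt)` normalization are illustrative choices.

```julia
using Random

# Hedged sketch of a non-allocating stochastic-forcing update:
# the forcing array F is preallocated once and refilled in place
# with randn! every time step, so no temporaries are created.
# With CUDA.jl loaded and F a CuArray, Random.randn! dispatches
# to the GPU's in-place RNG, so the same code runs on the GPU.
# The signature and the amplitude / sqrt(dt) normalization are
# illustrative assumptions, not the PR's exact implementation.
function calcF!(F, amplitude, dt)
    Random.randn!(F)              # in-place normal samples
    @. F *= amplitude / sqrt(dt)  # scale for white-in-time forcing (assumed)
    return nothing
end

F = zeros(64, 64)  # preallocated once, outside the time-stepping loop
calcF!(F, 0.1, 1e-2)
```

Because `F` is allocated once up front and only mutated afterwards, the per-step cost is just the RNG fill and the broadcasted scaling.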
A few benchmarks:

This PR is thus 1.5-2x faster than the solution originally proposed in #350, with fewer allocations.