Remove event system from KA and update to CUDA v4.1 #2924

Merged (99 commits, Apr 18, 2023)

Conversation

vchuravy
Collaborator

This likely removes some of the single-device parallelism,
but it decreases the risk of user mistakes (everything is stream-ordered).

This will cause severe merge conflicts with open PRs and depends on currently unreleased
versions of CUDA & KernelAbstractions. So I might regret doing it now, but I was curious
to see what the damage would be.
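For context, a minimal sketch of what the change looks like at a KernelAbstractions call site, assuming the KA 0.9 / CUDA.jl 4.1 stream-ordered API (the kernel `scale!`, the array, and the workgroup size are illustrative, not code from this PR):

```julia
using KernelAbstractions, CUDA

@kernel function scale!(a, s)
    i = @index(Global)
    @inbounds a[i] *= s
end

a = CUDA.ones(1024)

# Old (KA 0.8 with CUDAKernels.jl, event-based): kernel launches returned
# events that had to be chained via `dependencies` and waited on, e.g.
#   event = scale!(CUDADevice(), 256)(a, 2; ndrange=length(a), dependencies=Event(CUDADevice()))
#   wait(event)

# New (KA 0.9, stream-ordered): launches are ordered on the backend's
# task-local stream, so no events are needed.
backend = CUDABackend()
scale!(backend, 256)(a, 2; ndrange=length(a))
KernelAbstractions.synchronize(backend)   # only when the host needs the result
```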

@navidcy added the GPU 👾 Where Oceananigans gets its powers from label on Feb 13, 2023
@simone-silvestri
Collaborator

Nice, how do you go about launching overlapping asynchronous kernels this way?

@vchuravy
Collaborator Author

> Nice, how do you go about launching overlapping asynchronous kernels this way?

That's yet to be determined. I was talking to @maleadt about this. Maybe I will add a foreach variant to KernelAbstractions that implements this without requiring the use of tasks.

@maleadt
Collaborator

maleadt commented Feb 14, 2023

Are the current options (@async blocks) not sufficient? I'd rather not have a parallel mechanism and have to maintain the APIs (explicitly passing streams/queues) to support it. Even right now that mechanism is relatively broken, supporting only kernel launches and memory copies (e.g., BLAS APIs do not take explicit stream arguments).
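For readers unfamiliar with the task-based option: a minimal sketch, assuming CUDA.jl's task-local streams (the arrays and broadcast operations here are placeholders):

```julia
using CUDA

a = CUDA.rand(Float32, 1 << 20)
b = CUDA.rand(Float32, 1 << 20)

# CUDA.jl gives each Julia task its own stream, so kernels launched from
# different tasks can overlap on the device without passing streams explicitly.
@sync begin
    @async begin
        a .= sqrt.(a)      # broadcast kernel, ordered on this task's stream
        CUDA.synchronize() # waits only for work on this task's stream
    end
    @async begin
        b .= b .^ 2        # runs concurrently on a different task-local stream
        CUDA.synchronize()
    end
end
```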

@vchuravy
Collaborator Author

My goal right now is to do all of the changes without introducing new constructs and then evaluate performance from there.

@glwagner
Member

@vchuravy I recall that long ago we discussed introducing a "stream" abstraction as an alternative to events for asynchronous computation. This would parallel CUDA's streams, but I guess it would have to be something entirely new for the CPU?

@vchuravy
Collaborator Author

In discussion with Tim I decided against it. I tried implementing streams on the CPU, and that is basically what required events everywhere.

See https://github.com/JuliaGPU/KernelAbstractions.jl/blob/main/examples/mpi.jl; Tim and I are bullish on using Julia's native concurrency mechanisms to express what streams normally do.
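A hedged sketch of that pattern, where tasks stand in for streams to overlap a "communication" leg with a "computation" leg (buffer names, sizes, and the stand-in device-to-host copy are illustrative; a real halo exchange, as in the mpi.jl example, would use MPI and pinned host buffers to actually overlap the copy):

```julia
using CUDA

u         = CUDA.rand(Float32, 1 << 22)  # "interior" data
halo_dev  = CUDA.zeros(Float32, 1024)    # device-side halo buffer
halo_host = zeros(Float32, 1024)         # host-side staging buffer

# The copy and the interior update are issued from different tasks,
# so they are ordered on different task-local streams.
@sync begin
    @async begin
        copyto!(halo_host, halo_dev)  # "communication" leg (device-to-host copy)
        CUDA.synchronize()
    end
    @async begin
        u .= u .* 0.5f0 .+ 1.0f0      # "computation" leg proceeds concurrently
        CUDA.synchronize()
    end
end
```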

Member

@glwagner left a comment

Epic work! It's not just an upgrade; the code is much simpler and I feel relieved. My comments are all cosmetic.

src/BoundaryConditions/fill_halo_regions_periodic.jl (outdated, resolved)
src/BoundaryConditions/fill_halo_regions_periodic.jl (outdated, resolved)
src/BoundaryConditions/fill_halo_regions_periodic.jl (outdated, resolved)
src/ImmersedBoundaries/mask_immersed_field.jl (outdated, resolved)
simone-silvestri and others added 14 commits on April 18, 2023 (co-authored by Gregory L. Wagner <wagner.greg@gmail.com>)
@glwagner
Member

We'll have to generate some new benchmarks after this, especially for smaller / 1D / 2D problems.

@navidcy
Collaborator

navidcy commented Apr 19, 2023 via email

Crayons = "4"
CubedSphere = "0.1, 0.2"
DocStringExtensions = "0.8, 0.9"
FFTW = "1"
Glob = "1.3"
IncompleteLU = "0.2"
IterativeSolvers = "0.9"
JLD2 = "0.4"
KernelAbstractions = "0.8"
JLD2 = "^0.4"
Collaborator

What's the difference between "0.4" and "^0.4"? I thought they were the same.

Labels
GPU 👾 Where Oceananigans gets its powers from
5 participants