Adding target option #62
base: main
Conversation
@michel2323 thanks for adding this. I think we need to discuss offline as the changes remove JACC's public API portability across vendors for the same code. Am I seeing this right?
I wouldn't say so? In what way? At some point you have to pick a backend, but the same is true for OpenMP. In Julia you can do this at runtime.
If your code only uses one backend, say CUDA, you could have a fixed backend set once at the top of your code.
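A minimal sketch of that idea (the `const` alias and the fallback are my illustration; the backend-as-first-argument call form follows this PR):

```julia
using JACC, CUDA

# pick the backend once; everything below stays vendor-agnostic
const backend = CUDA.functional() ? CUDABackend() : ThreadsBackend()

# axpy, N, alpha, x, y as in the example further below
JACC.parallel_for(backend, N, axpy, alpha, x, y)
```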
Ah, I see what you mean, maybe: the array types. So in the case where backend = CUDABackend(), the arrays live on the device. For @PhilipFackler this would also make it easier if there's a struct with mixed host and device types. He would only have to define a single Adapt rule for it.
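For illustration, a minimal sketch of such a rule (the struct and field names are hypothetical):

```julia
using Adapt

# a struct mixing plain host data and device arrays (hypothetical example)
struct Fields{T,A}
    dt::T   # scalar host data, passed through unchanged
    u::A    # arrays that should follow the chosen backend
    v::A
end

# a single rule is enough: adapt each array field to the target
Adapt.adapt_structure(to, f::Fields) =
    Fields(f.dt, adapt(to, f.u), adapt(to, f.v))

# then e.g. fields_d = adapt(CUDABackend(), fields) moves u and v to the GPU
```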
This is mostly a corner case that very rarely comes up, so we should focus on portable code across different vendors. I agree it's nice to have, but enforcing a specific back end in the public API should be optional (maybe via a macro? see the sketch below) for corner cases, not the rule. The back end selection follows Preferences.jl just like MPIPreferences.jl, so user code calling JACC (like the code in the tests) doesn't need to be touched.
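If it were a macro, one could imagine something like this (entirely hypothetical, just to make the idea concrete):

```julia
# hypothetical: rewrite `@on_backend b f(args...)` into `f(b, args...)`,
# so the backend argument stays out of the portable call site
macro on_backend(backend, call)
    call isa Expr && call.head === :call ||
        error("@on_backend expects a function call")
    return esc(Expr(:call, call.args[1], backend, call.args[2:end]...))
end

# usage: @on_backend CUDABackend() JACC.parallel_for(N, axpy, alpha, x, y)
```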
The argument would be a variable, so the back end can still be selected at runtime.
For those cases, the user should rely on regular Julia Arrays and the CPU (host) if it's not worth porting; JACC is targeted specifically at performance-portable code pieces.
That's what JACCPreferences sets, but via LocalPreferences; see this line. The less vendor/system information is exposed to the targeted users (domain scientists), the better.
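For reference, a sketch of the user-facing side of that (assuming the current JACCPreferences interface):

```julia
# run once per project; writes `backend = "cuda"` to LocalPreferences.toml,
# analogous to MPIPreferences.use_system_binary()
import JACC.JACCPreferences
JACCPreferences.set_backend("cuda")
```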
Let me add an example code:

```julia
# code in a setup.jl, or run by the user before running their code
using CUDA

if CUDA.functional()
    backend = CUDABackend()
else
    backend = ThreadsBackend()
end

# application code using JACC, which is the same across all vendors
using JACC
using Adapt  # for `adapt` (import added here for completeness)

function axpy(i, alpha, x, y)
    if i <= length(x)
        @inbounds x[i] += alpha * y[i]
    end
end

# illustrative data, not part of the original snippet
N = 1_000_000
alpha = 2.0
x = ones(Float64, N)
y = ones(Float64, N)

x = adapt(backend, x)
y = adapt(backend, y)
for i in 1:11
    @time JACC.parallel_for(backend, N, axpy, alpha, x, y)
end

# copy back to the host
x = adapt(ThreadsBackend(), x)
y = adapt(ThreadsBackend(), y)
```

So the difference is whether to set preferences or to set the backend in a setup script like this.
The difference between MPI and the GPU backends is that MPI has the same API across all implementations and the same array types are passed in. For the GPUs that's different.
Yeah, that's the goal of JACC. Users should not interact with back ends (at most minimally, like it's done today with Preferences). "JACC-aware" MPI would be a noble goal, though.
Another stab at it. This defines a default backend that switches when a GPU package is loaded:

```julia
using JACC
# prints "Default backend is ThreadsBackend()"
println_default_backend()

using CUDA
# prints "Default backend is CUDABackend()"
println_default_backend()
```

And then there are parallel methods that pass this down. Of course, if multiple GPU packages are loaded by the user, this will pick whatever extension was compiled last. Sorry, I really don't know how else to resolve the precompilation issue.
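A rough sketch of how such an extension mechanism could look (module and variable names are made up):

```julia
# in JACC itself: a runtime-mutable default backend
const DEFAULT_BACKEND = Ref{Any}(ThreadsBackend())
default_backend() = DEFAULT_BACKEND[]
println_default_backend() = println("Default backend is ", default_backend())

# in a weak-dependency extension, compiled only when CUDA.jl is loaded
module JACCCUDAExt
using JACC, CUDA
__init__() = JACC.DEFAULT_BACKEND[] = CUDABackend()
end
```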
And now with Preferences support too. So the breaking change is that there is no JACC.Array type anymore.
@michel2323 thanks, see discussion in #53. I am asking @PhilipFackler how to reproduce the error as it's not showing in the current CI. I'd rather keep the public API as simple as possible, since back ends can be handled internally and weak dependencies should provide the desired separation.
Ideally users should not deal with any detail in the code other than memory allocation and the parallel function calls.
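In other words, the portable version would stay as terse as something like this (a sketch using JACC's existing API; axpy, alpha, and N as in the example above):

```julia
using JACC  # back end chosen via preferences, not in the code

x = JACC.Array(ones(Float64, N))  # allocation is the only device-aware step
y = JACC.Array(ones(Float64, N))
JACC.parallel_for(N, axpy, alpha, x, y)
```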
This adds a target option to the parallel function calls. For CUDA:
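```julia
# sketch of the new call form with an explicit CUDA target,
# reusing the axpy example from the discussion above
using JACC, CUDA
JACC.parallel_for(CUDABackend(), N, axpy, alpha, x, y)
```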
The GPU packages provide these backends. JACC then defines ThreadsBackend() in addition to those. Doing it this way should resolve the precompilation error, while also resolving #56. In addition, there is no need to set preferences anymore, and the various backends can be used concurrently in one code. There is also no need for a JACC.Array type. This tries to imitate the target offload pragma of OpenMP.

@PhilipFackler Let me know if there are any further issues with this solution.
Edit: These backends are also used by KernelAbstractions (except ThreadsBackend(), of course), so it would now be easy to write, for example, some GPU kernels in KA that don't require backend-specific functionality.
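For example, a small KA kernel sharing the same backend object might look like this (a sketch, not code from this PR):

```julia
using KernelAbstractions

@kernel function axpy_kernel!(x, alpha, @Const(y))
    i = @index(Global)
    @inbounds x[i] += alpha * y[i]
end

# `backend` is the same CUDABackend()/ROCBackend()/... used for the JACC calls
axpy_kernel!(backend, 256)(x, alpha, y; ndrange = length(x))
KernelAbstractions.synchronize(backend)
```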