"Free" (or at least cheap) hardware acceleration via openMP #1719

Open
smartalecH opened this issue Aug 4, 2021 · 1 comment
Comments

@smartalecH
Collaborator

The hybrid OpenMP/MPI branch (#1628) uses various OpenMP directives to parallelize the computation (e.g. `#pragma omp parallel for`).

More recent versions of OpenMP (circa 2018) support offloading the same computation onto hardware accelerators (e.g. GPUs) with very little modification to the same compiler directives. We would just have to make sure that data meant to live on the accelerator actually stays there long enough to amortize the cost of host-device communication.
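To make the "very little modification" point concrete, here is a minimal sketch on a toy 1D update loop. The function names and the update itself are hypothetical and are not taken from the branch in #1628; the only real difference between the two versions is the directive and its `map` clauses. Without an offload-capable compiler or device, the second loop should simply fall back to the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host version: the kind of loop that "#pragma omp parallel for" spreads
// across CPU threads. h must hold n + 1 values, e must hold n values.
void update_host(double *e, const double *h, std::size_t n, double c) {
  #pragma omp parallel for
  for (std::size_t i = 0; i < n; ++i)
    e[i] += c * (h[i + 1] - h[i]);
}

// Offload version: same loop body, but the directive asks OpenMP to run it
// on a target device. The map clauses describe what is copied to and from
// the device for this single kernel launch.
void update_offload(double *e, const double *h, std::size_t n, double c) {
  #pragma omp target teams distribute parallel for \
      map(tofrom: e[0:n]) map(to: h[0:n + 1])
  for (std::size_t i = 0; i < n; ++i)
    e[i] += c * (h[i + 1] - h[i]);
}
```

Note that in this form the arrays are copied on every call, which is exactly the communication hit that would need to be avoided by keeping the data resident on the device.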

For example, we could create a function called `run_until(n)` that timesteps continuously for `n` steps without any interrupts (currently `run(until=n)` calls back into Python on every iteration). All of the timestepping, DFT accumulation, etc. can be performed on the accelerator. Even convergence checks can be performed on the accelerator. The main benefit of using an accelerator for FDTD, of course, would be the extremely high memory bandwidth (FDTD is generally memory-bound, not compute-bound).
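A rough sketch of what such a `run_until` could look like, again on a toy 1D leapfrog update rather than the real field-update code. The signature, the update equations, and the "energy" check are all assumptions made for illustration. The idea is that a `target data` region keeps both arrays on the device for the whole run, so the kernels inside it reuse the device copies instead of transferring them each step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical run_until: advance a toy 1D leapfrog scheme n_steps times
// and return sum(e[i]^2) as a stand-in for a convergence quantity.
// e holds n values, h holds n + 1 values (h[0] and h[n] stay fixed).
double run_until(double *e, double *h, std::size_t n, int n_steps, double c) {
  assert(n_steps >= 0);
  double energy = 0.0;

  // Copy the fields to the device once on entry and back once on exit.
  #pragma omp target data map(tofrom: e[0:n], h[0:n + 1])
  {
    for (int step = 0; step < n_steps; ++step) {
      // Both kernels find e and h already present on the device,
      // so no per-step host-device transfer of the arrays is needed.
      #pragma omp target teams distribute parallel for
      for (std::size_t i = 0; i < n; ++i)
        e[i] += c * (h[i + 1] - h[i]);

      #pragma omp target teams distribute parallel for
      for (std::size_t i = 1; i < n; ++i)
        h[i] += c * (e[i] - e[i - 1]);
    }

    // The check also runs on the device; only one scalar comes back.
    #pragma omp target teams distribute parallel for \
        reduction(+ : energy) map(tofrom: energy)
    for (std::size_t i = 0; i < n; ++i)
      energy += e[i] * e[i];
  }
  return energy;
}
```

The step loop itself still runs on the host and launches kernels, but nothing in it calls back into Python, and the only data crossing the bus after setup is the final scalar.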

In the past, pursuing hardware acceleration was rather undesirable because it required a custom kernel written against a proprietary API. While some directive-level shortcuts have existed for a long time (e.g. OpenACC), there wasn't enough motivation to justify the time sink. However, since we are already playing with OpenMP, it might be worth extending (or at least exploring) the functionality to also support basic accelerators.

@smartalecH
Collaborator Author

An alternative approach is to use a framework like Kokkos, which supports many different backends but is both data- and compute-explicit. This would potentially work much better than relying on the single OpenMP library shipped with a particular compiler.
