Use native ("USM") pointers for backing buffer allocations #162
Conversation
clang-tidy made some suggestions
Great work!
I especially like that this gets rid of some silly workarounds/hacks that were previously necessary.
clang-tidy made some suggestions
clang-tidy made some suggestions
Some nitpicks from my side, otherwise I would say this is tried and tested by now!
Re the ComputeCpp CI failure, which I'm also seeing in #163: The backend library doesn't set the
clang-tidy made some suggestions
Compare: 4502e6f to 6de986d
clang-tidy made some suggestions
Looks good to me now!
This introduces the new SKIP macro for skipping tests at runtime.
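A runtime skip macro might look roughly like the following. This is a hypothetical sketch, not the actual implementation: the real macro presumably integrates with the test framework's reporting, and the names `SKIP`/`test_cuda_copy` are used here purely for illustration.

```cpp
#include <cstdio>

// Hypothetical sketch: when a prerequisite (e.g. a required backend or
// device) is unavailable at runtime, print a message and return from the
// test body early instead of failing.
#define SKIP(msg)                                        \
	do {                                                 \
		std::printf("skipping test: %s\n", (msg));       \
		return;                                          \
	} while(0)

int g_body_executed = 0;

// Example test body that skips when no CUDA device is present.
void test_cuda_copy(bool have_cuda_device) {
	if(!have_cuda_device) SKIP("no CUDA device available");
	g_body_executed = 1; // only reached when the test was not skipped
}
```

The point of skipping at runtime (rather than compiling tests out) is that the same test binary can run on machines with and without the relevant hardware.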
Use native pointers (allocated using `sycl::malloc_device`) instead of relying on SYCL buffers for backing Celerity virtual buffers. This greatly simplifies various aspects of accessor and buffer management while enabling future optimizations. Furthermore, by using native pointers we completely circumvent any dataflow analysis performed by the SYCL runtime.
Since SYCL 2020 does not support multi-dimensional (rectangular) copies for USM pointers, we have to either do them in a loop (slow) or fall back to vendor-specific APIs. This introduces a new "backend" system that does the latter. Currently only "generic" (= SYCL, slow) and CUDA (when using OpenSYCL or DPC++) backends are supported. Since backends are configured at compile time, this additionally introduces a new integration testing mechanism for testing them. Because this requires Celerity to be built with different CMake options, the test is implemented as a Python script.
...instead of range and offset.
This transitions Celerity away from using SYCL buffers for backing buffer allocations to essentially device-native memory allocations through `sycl::malloc_device` and `sycl::free`, which are part of SYCL 2020's USM APIs. This has several advantages.
Unfortunately, there is also a major downside: SYCL's USM capabilities are very much one-dimensional at the moment.
This is reflected across all USM APIs; particularly problematic for us is that there is no way of doing 2D/3D rectangular (strided) copies. Dispatching a series of 1D copies in a loop is not a viable solution, as it is extremely slow. Instead, for now (until SYCL Next, hopefully), we have to fall back to doing copies manually using the underlying vendor APIs. This means we need specialized code paths for each backend that we want to support efficiently.
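For illustration, the loop-of-1D-copies fallback amounts to something like the following plain C++ sketch. `std::memcpy` stands in for a USM copy here, and the function name and signature are made up for illustration; with real USM this would be one `queue.memcpy()` per row, which is exactly why it is so slow for regions with many rows.

```cpp
#include <cstddef>
#include <cstring>

// Sketch of the generic fallback: copy a 2D rectangular region between two
// linearized buffers by issuing one contiguous (1D) copy per row. A
// specialized backend replaces this whole loop with a single vendor call
// that copies the rectangle in one operation.
void copy_region_2d(char* dst, std::size_t dst_pitch,       // bytes per dst row
                    const char* src, std::size_t src_pitch, // bytes per src row
                    std::size_t width_bytes,                // bytes per copied row
                    std::size_t height)                     // number of rows
{
	for(std::size_t row = 0; row < height; ++row) {
		std::memcpy(dst + row * dst_pitch, src + row * src_pitch, width_bytes);
	}
}
```

For CUDA, the equivalent single call would be `cudaMemcpy2D` (or `cudaMemcpy3D` for 3D regions), which takes the same pitch/width/height parameters.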
Currently I have only implemented a specialized backend for CUDA as well as a generic (and slow) fallback; however, adding new backends should be relatively straightforward. Backends are selected dynamically based on the device being used. Since SYCL does not yet officially support CUDA as a backend, we also need an implementation-specific mechanism for detecting whether a device is a CUDA device; this is currently supported for OpenSYCL and DPC++.
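The selection logic could be sketched as below. This is a hypothetical illustration, not Celerity's actual API: `backend_kind` and `select_backend` are made-up names, and the real mechanism queries the SYCL device through implementation-specific APIs rather than matching a vendor string.

```cpp
#include <string>

// Hypothetical sketch of dynamic backend selection: the specialized CUDA
// path is only compiled in when the corresponding CMake option is set, and
// at runtime the matching backend is picked per device.
enum class backend_kind { generic, cuda };

backend_kind select_backend(const std::string& device_vendor) {
#if defined(CELERITY_ENABLE_CUDA_BACKEND)
	// Stand-in for the implementation-specific CUDA device check.
	if(device_vendor.find("NVIDIA") != std::string::npos) return backend_kind::cuda;
#endif
	return backend_kind::generic; // always-available, slow fallback
}
```

The generic backend acts as the universal default, so a build without any specialized backend enabled still works on every device, just with slower strided copies.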
I wasn't quite sure what the best way of structuring the backend system would be. Loading specialized libraries dynamically at runtime (similar to what OpenSYCL does) seemed like overkill, so I've instead opted to make enabling/disabling backends a compile-time option (`-DCELERITY_ENABLE_CUDA_BACKEND`). This makes testing somewhat more difficult, though, as the behavior of test cases is now affected by how they are compiled, which SYCL implementation is used, and which hardware is available at runtime. I've added some unit tests that opportunistically test whatever is available; to cover more of these combinations in our CI setup, I've also created our first true integration test (we previously used our examples as integration tests, which was always suboptimal) in the form of a Python script that compiles and runs different backend configurations.