Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use native ("USM") pointers for backing buffer allocations #162

Merged
merged 6 commits into from
Mar 28, 2023

Conversation

psalz
Copy link
Member

@psalz psalz commented Mar 8, 2023

This transitions Celerity away from using SYCL buffers for backing buffer allocations to essentially device-native memory allocations through sycl::malloc_device and sycl::free, which are part of SYCL 2020's USM APIs.

This has several advantages:

  • It gives us much more control over device memory usage, as we don't have to guess what any given SYCL implementation will do (this is particularly important for multi-GPU support).
  • It allows us to greatly simplify our accessor implementation, as we no longer need to wrap SYCL accessors internally.
  • This in turn means that we completely bypass any dataflow tracking within the SYCL runtime. This was always an unnecessary overhead as we have much more precise tracking information anyway.
  • It opens up the possibility for enabling special vendor-specific optimizations such as CUDA-aware MPI later down the road.

Unfortunately there is also a major downside, which is that SYCL's USM capabilities are very much 1-dimensional at the moment.
This is reflected across all USM APIs, and particularly problematic for us is that there is no way of doing 2D/3D rectangular (strided) copies. Dispatching a series of 1D copies in a loop is not a viable solution, as it is extremely slow. Instead, for now (until SYCL Next, hopefully), we have to resort back to doing copies manually using the underlying vendor APIs. This means we need specialized code paths for each of the backends that we want to support efficiently.

Currently I have only implemented a specialized backend for CUDA as well as a generic (and slow) fallback, however adding new backends should be relatively straightforward. Backends are selected dynamically based on the device used. Since SYCL does not yet officially support CUDA as a backend, we also need an implementation-specific mechanism for detecting whether a device is a CUDA device; this is currently supported for OpenSYCL and DPC++.

I wasn't quite sure what the best way of structuring the backend system would be. Loading specialized libraries dynamically at runtime (similar to what OpenSYCL does) seemed overkill, so I've instead opted to make enabling/disabling backends a compile-time option (-DCELERITY_ENABLE_CUDA_BACKEND).

This makes testing somewhat more difficult though, as the behavior of test cases is now affected by how they are compiled, which SYCL implementation is used, and on the hardware available during runtime. While I've added some unit tests that just opportunistically test whatever is available, to cover more of these combinations in our CI setup, I've also created our first true integration test (we previously used our examples as integration tests, which was always suboptimal) in the form of a Python script that compiles and runs different backend configurations.

@psalz psalz requested review from PeterTh, fknorr and facuMH March 8, 2023 09:25
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

include/accessor.h Outdated Show resolved Hide resolved
include/backend/operations.h Show resolved Hide resolved
include/buffer_storage.h Show resolved Hide resolved
include/buffer_storage.h Show resolved Hide resolved
include/buffer_storage.h Show resolved Hide resolved
test/backend_tests.cc Show resolved Hide resolved
test/backend_tests.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
Copy link
Contributor

@PeterTh PeterTh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!
I especially like that this gets rid of some silly workarounds/hacks that were previously necessary.

include/backend/backend.h Show resolved Hide resolved
include/buffer_manager.h Outdated Show resolved Hide resolved
include/device_queue.h Outdated Show resolved Hide resolved
test/backend_tests.cc Show resolved Hide resolved
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

test/integration/backend.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
@celerity celerity deleted a comment from github-actions bot Mar 8, 2023
Copy link
Contributor

@fknorr fknorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nitpicks from my side, otherwise I would say this is true and tested by now!

include/accessor.h Show resolved Hide resolved
include/buffer_manager.h Outdated Show resolved Hide resolved
include/handler.h Show resolved Hide resolved
include/reduction_manager.h Outdated Show resolved Hide resolved
src/backend/cuda_backend.cc Outdated Show resolved Hide resolved
src/buffer_storage.cc Outdated Show resolved Hide resolved
test/buffer_manager_test_utils.h Outdated Show resolved Hide resolved
test/buffer_manager_test_utils.h Outdated Show resolved Hide resolved
test/integration/backend.cc Outdated Show resolved Hide resolved
test/integration/backend.cc Outdated Show resolved Hide resolved
@fknorr
Copy link
Contributor

fknorr commented Mar 16, 2023

Re the ComputeCpp CI failure which I'm also seeing in #163 : The backend library doesn't set the CELERITY_CXX_FLAGS, which leaves CCPP in pre-2020 SYCL mode and causes missing symbol declarations.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

test/backend_tests.cc Show resolved Hide resolved
test/device_selection_tests.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

test/backend_tests.cc Show resolved Hide resolved
test/device_selection_tests.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
test/integration/backend.cc Show resolved Hide resolved
Copy link
Contributor

@fknorr fknorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now!

include/accessor.h Outdated Show resolved Hide resolved
include/accessor.h Outdated Show resolved Hide resolved
psalz added 6 commits March 28, 2023 13:48
This introduces the new SKIP macro for skipping tests at runtime.
Use native pointers (allocated using `sycl::malloc_device`) instead of
relying on SYCL buffers for backing Celerity virtual buffers.

This greatly simplifies various aspects of accessors and buffer
management while enabling future optimizations. Futhermore, by using
native pointers we completely circumvent any dataflow analysis performed
by the SYCL runtime.
Since SYCL 2020 does not support multi-dimensional (rectangular) copies
for USM pointers, we have to either do it in a loop (slow) or fall back
to vendor-specific APIs.

This introduces a new "backend" system that does the latter.
Currently only "generic" (= SYCL, slow) and CUDA (when using OpenSYCL or
DPC++) are supported.

Since backends are configuration during compile time, this additionally
introduces a new integration testing mechanism for testing backends.
This requires Celerity to be built with different CMake options, so the
test is implemented as a Python script.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants