
feat: add CUDA support #44

Merged · 9 commits · Sep 22, 2021

Conversation

@vmx (Collaborator) commented on Jul 8, 2021

This library is now an abstraction for OpenCL and CUDA. It can be compiled
with support for either or both, via the feature flags opencl and cuda.
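Enabling one backend or the other is then a Cargo feature choice. A hypothetical dependency declaration might look like this (the version number and default-features setup are assumptions; only the `opencl`/`cuda` feature names come from this PR):

```toml
# Build with only the CUDA backend; add "opencl" to the list for both.
rust-gpu-tools = { version = "*", default-features = false, features = ["cuda"] }
```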

There is now a Device, which can point to an OpenCL and/or CUDA device.

You should be able to execute OpenCL and CUDA kernels with the same code,
without modification. You pass two closures with the same function
body into Program::run().

To create two closures with the correct signature but the same body, you
can use the define_closures() helper macro.

So your code would look like this:

use rust_gpu_tools::{cuda, define_closures, opencl, GPUError, Program};

let closures = define_closures!(|program: &opencl::Program | &cuda::Program| -> Result<Vec<u8>, GPUError> {
    let input = program.create_buffer::<u8>(128)?;
    let output = program.create_buffer::<u8>(128)?;
    let kernel = program.create_kernel(…)?;
    kernel.arg(&input).arg(&output).run()?;
    let mut out = vec![0u8; 128];
    program.read_into_buffer(&output, 0, &mut out)?;
    Ok(out)
});

let program = Program::Cuda(cuda::Program::from_cuda(…)?);
let results = program.run(closures)?;
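The macro exists because a Rust closure literal is a single anonymous type: one closure cannot accept either an `opencl::Program` or a `cuda::Program`. A minimal standalone sketch of the same trick, independent of this library (the `OpenclProgram`/`CudaProgram` types and `name()` method here are stand-ins, not the real API):

```rust
// Two stand-in "program" types, mimicking opencl::Program and cuda::Program.
struct OpenclProgram;
struct CudaProgram;

impl OpenclProgram {
    fn name(&self) -> &'static str { "opencl" }
}
impl CudaProgram {
    fn name(&self) -> &'static str { "cuda" }
}

// Expand one closure body into two closures, one per program type,
// so the same body is compiled twice with different argument types.
macro_rules! define_closures {
    (|$program:ident| $body:block) => {
        (
            |$program: &OpenclProgram| $body,
            |$program: &CudaProgram| $body,
        )
    };
}

fn main() {
    let (for_opencl, for_cuda) = define_closures!(|program| {
        format!("running on {}", program.name())
    });
    assert_eq!(for_opencl(&OpenclProgram), "running on opencl");
    assert_eq!(for_cuda(&CudaProgram), "running on cuda");
    println!("ok");
}
```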

@vmx (Collaborator, author) commented on Jul 8, 2021

This PR looks bigger than it is. For easier review:

  • src/device.rs is similar to opencl/mod.rs. Diffing the current HEAD of opencl/mod.rs against this src/device.rs gives a better idea of the changes.
  • the code in the cuda directory is very similar to the code in the opencl directory, you might also want to diff between those two locally.

@vmx requested a review from dignifiedquire on July 8, 2021
@vmx (Collaborator, author) commented on Jul 8, 2021

It might also help to see how it will be used. Here's the WIP in bellperson that uses this commit: filecoin-project/bellperson@6958c61?w=1

Cargo.toml (resolved review thread, outdated)
@dignifiedquire (Contributor) left a comment

some preliminary thoughts, not through yet

src/cuda/mod.rs (multiple resolved review threads)
src/cuda/mod.rs Outdated
E: From<GPUError>,
{
rustacuda::context::CurrentContext::set_current(&self.context).map_err(Into::into)?;
let result = fun(self);
Contributor comment:

  • what happens to the context if fun panics?
  • what happens if stream.synchronize returns an error, so the context pop doesn't happen?
  • what happens when pop errors?
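One common answer to all three questions is an RAII drop guard, so the pop runs even when the closure returns early or panics (drops run during unwinding). A standalone sketch with stand-in push/pop functions, not the rustacuda context API:

```rust
use std::cell::RefCell;

// Stand-in context stack depth; the real code would push/pop a CUDA context.
thread_local! {
    static CONTEXT_DEPTH: RefCell<u32> = RefCell::new(0);
}

fn push_context() {
    CONTEXT_DEPTH.with(|d| *d.borrow_mut() += 1);
}

fn pop_context() {
    CONTEXT_DEPTH.with(|d| *d.borrow_mut() -= 1);
}

/// Guard that pops the context when dropped, including during panic unwinding.
struct ContextGuard;

impl ContextGuard {
    fn push() -> Self {
        push_context();
        ContextGuard
    }
}

impl Drop for ContextGuard {
    fn drop(&mut self) {
        pop_context();
    }
}

fn run_with_context<F: FnOnce() -> Result<(), String>>(fun: F) -> Result<(), String> {
    let _guard = ContextGuard::push();
    // Even if `fun` returns an error (or panics), `_guard` is dropped
    // on the way out, so the context is always popped.
    fun()
}

fn main() {
    // The early error return does not skip the pop.
    let _ = run_with_context(|| Err("synchronize failed".to_string()));
    let depth = CONTEXT_DEPTH.with(|d| *d.borrow());
    assert_eq!(depth, 0);
    println!("depth after error: {}", depth);
}
```

A fallible pop is harder: `Drop::drop` cannot return an error, which is one reason the real code might choose to panic if the pop fails.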

Ok(u64::try_from(memory).expect("Platform must be <= 64-bit"))
}

/// Get a lost of all devices.
Contributor comment:

typo: "lost" should be "list"

src/lib.rs (resolved review thread)
@vmx (Collaborator, author) commented on Jul 27, 2021

Current state of this PR: it needs more work. The current API is unsafe, but not marked as such. rust-gpu-tools should be a safe wrapper, which will need changes to the overall API.

@vmx (Collaborator, author) commented on Aug 12, 2021

This PR is ready for another round of reviews. Things that changed since last time:

  • Operations are mostly sync now; this makes the code simpler and doesn't need any hacks, while making no performance difference. I ran the multiexp::gpu_multiexp_consistency test (also with a higher number of elements) and the difference was within the range of run-to-run variation of the async code.
  • create_buffer is now unsafe, as it really is. There is a new method called create_buffer_from_slice(), which is a safe alternative for most cases.
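The split mirrors a general Rust pattern: handing out uninitialized memory must be `unsafe`, because reading it before a kernel writes it is undefined behavior, while copying from a caller-supplied slice is always initialized. A standalone sketch of the two signatures (the `Buffer` type here is a stand-in, not the real rust-gpu-tools API, and this sketch zero-fills where the real unsafe method would leave memory uninitialized):

```rust
// Stand-in device buffer: host-side storage standing in for device memory.
struct Buffer<T> {
    data: Vec<T>,
}

/// Safe: the contents come from `slice`, so the buffer is fully initialized.
fn create_buffer_from_slice<T: Copy>(slice: &[T]) -> Buffer<T> {
    Buffer { data: slice.to_vec() }
}

/// Unsafe in the real API: the returned memory is uninitialized and must be
/// written (e.g. by a kernel) before it is read. This sketch zero-fills
/// instead, which is why it can remain sound here.
unsafe fn create_buffer<T: Copy + Default>(length: usize) -> Buffer<T> {
    Buffer { data: vec![T::default(); length] }
}

fn main() {
    let input = create_buffer_from_slice(&[5u8; 128]);
    let output = unsafe { create_buffer::<u8>(128) };
    assert_eq!(input.data.len(), 128);
    assert_eq!(output.data.len(), 128);
    println!("ok");
}
```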

Things missing before this can be merged:

This library is now an abstraction for OpenCL and CUDA. It can be compiled
with support for either or both, via the feature flags `opencl` and `cuda`.

There is now a `Device`, which can point to an OpenCL and/or CUDA device.

You should be able to execute OpenCL and CUDA kernels with the same code
without modifications. You would pass in two closures with the same function
body into `Program::run()`.

To create two closures with the correct signature but the same body, you
can use the `program_closures()` helper macro.

So your code would look like this:

    use rust_gpu_tools::{cuda, program_closures, Device, GPUError, Program};

    pub fn main() {
        let closures = program_closures!(|program, data: &[u8]| -> Result<Vec<u8>, GPUError> {
            let input = program.create_buffer_from_slice(data)?;
            let output = unsafe { program.create_buffer::<u8>(128)? };
            let kernel = program.create_kernel("foo", 24, 4)?;
            kernel.arg(&input).arg(&output).run()?;
            let mut out = vec![0u8; 128];
            program.read_into_buffer(&output, 0, &mut out)?;
            Ok(out)
        });

        let cuda_device = Device::all().first().unwrap().cuda_device().unwrap();
        let cuda_kernel_path = std::ffi::CString::new("/some/path").unwrap();
        let cuda_program = cuda::Program::from_binary(cuda_device, &cuda_kernel_path).unwrap();
        let program = Program::Cuda(cuda_program);
        let data = vec![5u8; 128];
        let results = program.run(closures, &data).unwrap();
        println!("results: {:?}", results);
    }
  • CUDA doesn't support an offset, hence it's removed from rust-gpu-tools
  • Use the latest CUDA image and clean up the CI a bit.
@vmx (Collaborator, author) commented on Sep 20, 2021

This one is ready for another round of review. I'd squash it into a single commit once the review is done.

src/cuda/mod.rs Outdated
@@ -0,0 +1,437 @@
//! The CUDA specific implementation of a [`Buffer`], [`Device`], [`Program`] and [`Kernel`].
//!
//! The currenty operation mode is synchronuous, in order to have higher safety gurarantees. All
Contributor comment:

typos: "currenty", "synchronuous", "gurarantees"

src/cuda/mod.rs (resolved review thread)
src/cuda/mod.rs Outdated

/// Pop the current context.
///
/// It panics it it cannot as it's an unrecoverable error.
Contributor comment:

typo: "it it" should be "if it"

src/cuda/mod.rs (resolved review threads)
@vmx merged commit 705641f into master on Sep 22, 2021
@vmx deleted the cuda branch on September 22, 2021