
feat: add CUDA support #44

Merged · 9 commits · Sep 22, 2021

Conversation

@vmx (Collaborator) commented on Jul 8, 2021

This library is now an abstraction for OpenCL and CUDA. It can be compiled
with support for either or both, via the feature flags opencl and cuda.
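Enabling one backend or the other is then a Cargo feature choice. A hypothetical dependency declaration might look like this (the version number and default-features setup are assumptions; only the `opencl`/`cuda` feature names come from this PR):

```toml
# Build with only the CUDA backend; add "opencl" to the list for both.
rust-gpu-tools = { version = "*", default-features = false, features = ["cuda"] }
```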

There is now a Device, which can point to an OpenCL and/or CUDA device.

You should be able to execute OpenCL and CUDA kernels with the same code,
without modification. You pass two closures with the same function
body into Program::run().

To create two closures with the correct signature but the same body, you
can use the define_closures() helper macro.

So your code would look like this:

use rust_gpu_tools::{cuda, define_closures, opencl, GPUError, Program};

let closures = define_closures!(|program: &opencl::Program | &cuda::Program| -> Result<Vec<u8>, GPUError> {
    let input = program.create_buffer::<u8>(128)?;
    let output = program.create_buffer::<u8>(128)?;
    let kernel = program.create_kernel(…)?;
    kernel.arg(&input).arg(&output).run()?;
    let mut out = vec![0u8; 128];
    program.read_into_buffer(&output, 0, &mut out)?;
    Ok(out)
});

let program = Program::Cuda(cuda::Program::from_cuda(…)?);
let results = program.run(closures)?;
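The macro exists because a Rust closure literal is a single anonymous type: one closure cannot accept either an `opencl::Program` or a `cuda::Program`. A minimal standalone sketch of the same trick, independent of this library (the `OpenclProgram`/`CudaProgram` types and `name()` method here are stand-ins, not the real API):

```rust
// Two stand-in "program" types, mimicking opencl::Program and cuda::Program.
struct OpenclProgram;
struct CudaProgram;

impl OpenclProgram {
    fn name(&self) -> &'static str { "opencl" }
}
impl CudaProgram {
    fn name(&self) -> &'static str { "cuda" }
}

// Expand one closure body into two closures, one per program type,
// so the same body is compiled twice with different argument types.
macro_rules! define_closures {
    (|$program:ident| $body:block) => {
        (
            |$program: &OpenclProgram| $body,
            |$program: &CudaProgram| $body,
        )
    };
}

fn main() {
    let (for_opencl, for_cuda) = define_closures!(|program| {
        format!("running on {}", program.name())
    });
    assert_eq!(for_opencl(&OpenclProgram), "running on opencl");
    assert_eq!(for_cuda(&CudaProgram), "running on cuda");
    println!("ok");
}
```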

@vmx (Collaborator, author) commented on Jul 8, 2021

This PR looks bigger than it is. For easier review:

  • src/device.rs is similar to opencl/mod.rs. Diffing the current HEAD of opencl/mod.rs against this src/device.rs gives a better idea of the changes.
  • the code in the cuda directory is very similar to the code in the opencl directory, you might also want to diff between those two locally.

@vmx requested a review from dignifiedquire on July 8, 2021
@vmx (Collaborator, author) commented on Jul 8, 2021

It might also help to see how it will be used. Here's the WIP in bellperson that uses this commit: filecoin-project/bellperson@6958c61?w=1

Cargo.toml (resolved review thread, outdated)
@dignifiedquire (Contributor) left a comment

some preliminary thoughts, not through yet

src/cuda/mod.rs (multiple resolved review threads)
src/cuda/mod.rs Outdated
E: From<GPUError>,
{
rustacuda::context::CurrentContext::set_current(&self.context).map_err(Into::into)?;
let result = fun(self);
Contributor comment:

  • what happens to the context if fun panics?
  • what happens if stream.synchronize returns an error, so the context pop doesn't happen?
  • what happens when pop errors?
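One common answer to all three questions is an RAII drop guard, so the pop runs even when the closure returns early or panics (drops run during unwinding). A standalone sketch with stand-in push/pop functions, not the rustacuda context API:

```rust
use std::cell::RefCell;

// Stand-in context stack depth; the real code would push/pop a CUDA context.
thread_local! {
    static CONTEXT_DEPTH: RefCell<u32> = RefCell::new(0);
}

fn push_context() {
    CONTEXT_DEPTH.with(|d| *d.borrow_mut() += 1);
}

fn pop_context() {
    CONTEXT_DEPTH.with(|d| *d.borrow_mut() -= 1);
}

/// Guard that pops the context when dropped, including during panic unwinding.
struct ContextGuard;

impl ContextGuard {
    fn push() -> Self {
        push_context();
        ContextGuard
    }
}

impl Drop for ContextGuard {
    fn drop(&mut self) {
        pop_context();
    }
}

fn run_with_context<F: FnOnce() -> Result<(), String>>(fun: F) -> Result<(), String> {
    let _guard = ContextGuard::push();
    // Even if `fun` returns an error (or panics), `_guard` is dropped
    // on the way out, so the context is always popped.
    fun()
}

fn main() {
    // The early error return does not skip the pop.
    let _ = run_with_context(|| Err("synchronize failed".to_string()));
    let depth = CONTEXT_DEPTH.with(|d| *d.borrow());
    assert_eq!(depth, 0);
    println!("depth after error: {}", depth);
}
```

A fallible pop is harder: `Drop::drop` cannot return an error, which is one reason the real code might choose to panic if the pop fails.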

Ok(u64::try_from(memory).expect("Platform must be <= 64-bit"))
}

/// Get a lost of all devices.
Contributor comment:

typo: "lost" should be "list"

src/lib.rs (resolved review thread)
@vmx (Collaborator, author) commented on Jul 27, 2021

Current state of this PR: it needs more work. The current API is unsafe, but not marked as such. rust-gpu-tools should be a safe wrapper, which will need changes to the overall API.

@vmx (Collaborator, author) commented on Aug 12, 2021

This PR is ready for another round of reviews. Things that changed since last time:

  • Operations are mostly sync now; this makes the code simpler and doesn't need any hacks, while making no performance difference. I ran the multiexp::gpu_multiexp_consistency test (also with a higher number of elements) and the difference was within the range of run-to-run variation of the async code.
  • create_buffer is now unsafe, as it really is. There is a new method called create_buffer_from_slice(), which is a safe alternative for most cases.
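The split mirrors a general Rust pattern: handing out uninitialized memory must be `unsafe`, because reading it before a kernel writes it is undefined behavior, while copying from a caller-supplied slice is always initialized. A standalone sketch of the two signatures (the `Buffer` type here is a stand-in, not the real rust-gpu-tools API, and this sketch zero-fills where the real unsafe method would leave memory uninitialized):

```rust
// Stand-in device buffer: host-side storage standing in for device memory.
struct Buffer<T> {
    data: Vec<T>,
}

/// Safe: the contents come from `slice`, so the buffer is fully initialized.
fn create_buffer_from_slice<T: Copy>(slice: &[T]) -> Buffer<T> {
    Buffer { data: slice.to_vec() }
}

/// Unsafe in the real API: the returned memory is uninitialized and must be
/// written (e.g. by a kernel) before it is read. This sketch zero-fills
/// instead, which is why it can remain sound here.
unsafe fn create_buffer<T: Copy + Default>(length: usize) -> Buffer<T> {
    Buffer { data: vec![T::default(); length] }
}

fn main() {
    let input = create_buffer_from_slice(&[5u8; 128]);
    let output = unsafe { create_buffer::<u8>(128) };
    assert_eq!(input.data.len(), 128);
    assert_eq!(output.data.len(), 128);
    println!("ok");
}
```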

Things missing before this can be merged:

This library is now an abstraction for OpenCL and CUDA. It can be compiled
with support for either or both, via the feature flags `opencl` and `cuda`.

There is now a `Device`, which can point to an OpenCL and/or CUDA device.

You should be able to execute OpenCL and CUDA kernels with the same code
without modifications. You would pass in two closures with the same function
body into `Program::run()`.

To create two closures with the correct signature but the same body, you
can use the `program_closures()` helper macro.

So your code would look like this:

    use rust_gpu_tools::{cuda, program_closures, Device, GPUError, Program};

    pub fn main() {
        let closures = program_closures!(|program, data: &[u8]| -> Result<Vec<u8>, GPUError> {
            let input = program.create_buffer_from_slice(data)?;
            let output = unsafe { program.create_buffer::<u8>(128)? };
            let kernel = program.create_kernel("foo", 24, 4)?;
            kernel.arg(&input).arg(&output).run()?;
            let mut out = vec![0u8; 128];
            program.read_into_buffer(&output, 0, &mut out)?;
            Ok(out)
        });

        let cuda_device = Device::all().first().unwrap().cuda_device().unwrap();
        let cuda_kernel_path = std::ffi::CString::new("/some/path").unwrap();
        let cuda_program = cuda::Program::from_binary(cuda_device, &cuda_kernel_path).unwrap();
        let program = Program::Cuda(cuda_program);
        let data = vec![5u8; 128];
        let results = program.run(closures, &data).unwrap();
        println!("results: {:?}", results);
    }
  • CUDA doesn't support an offset, hence it's removed from rust-gpu-tools
  • Use the latest CUDA image and clean up the CI a bit.
@vmx (Collaborator, author) commented on Sep 20, 2021

This one is ready for another round of review. I'd squash it into a single commit once the review is done.

src/cuda/mod.rs Outdated
@@ -0,0 +1,437 @@
//! The CUDA specific implementation of a [`Buffer`], [`Device`], [`Program`] and [`Kernel`].
//!
//! The currenty operation mode is synchronuous, in order to have higher safety gurarantees. All
Contributor comment:

typos: "currenty", "synchronuous", "gurarantees"

src/cuda/mod.rs (resolved review thread)
src/cuda/mod.rs Outdated

/// Pop the current context.
///
/// It panics it it cannot as it's an unrecoverable error.
Contributor comment:

typo: "it it" should be "if it"

src/cuda/mod.rs (resolved review threads)
@vmx merged commit 705641f into master on Sep 22, 2021
@vmx deleted the cuda branch on September 22, 2021