Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from ocl to opencl3 #34

Merged
merged 9 commits into from
Jul 6, 2021
Merged

Switch from ocl to opencl3 #34

merged 9 commits into from
Jul 6, 2021

Conversation

vmx
Copy link
Collaborator

@vmx vmx commented May 18, 2021

This PR makes the switch from ocl to opencl3. This is mostly what the first commit. On top of that there are other improvements which make sense in the light of future CUDA integration.

For easier review you might want to review this PR commit per commit (also see the commit messages for more information).

This PR also needs changes to the upstream dependants, which can be found here:

@vmx vmx requested a review from dignifiedquire May 18, 2021 14:02
Cargo.toml Outdated Show resolved Hide resolved
src/opencl/mod.rs Outdated Show resolved Hide resolved
let kernels = opencl3::kernel::create_program_kernels(&program)?;
let kernels_by_name = kernels
.into_iter()
.map(|kernel| (kernel.function_name().unwrap(), kernel))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it safe to unwrap here?

src/opencl/mod.rs Outdated Show resolved Hide resolved
@vmx vmx marked this pull request as draft May 20, 2021 16:56
@vmx
Copy link
Collaborator Author

vmx commented May 20, 2021

I've converted this to a draft PR. It turns out that this work is not needed for the the UUID work at #30, hence I get more time to polish it more, like removing unwraps() and getting features like Debug implementations implemented upstrea. Thanks for the early review @dignifiedquire.

@vmx vmx force-pushed the opencl3 branch 2 times, most recently from 82b6b1d to 9e0746b Compare May 31, 2021 12:53
@vmx vmx marked this pull request as ready for review May 31, 2021 14:01
@vmx
Copy link
Collaborator Author

vmx commented May 31, 2021

I have addressed previous comments and also cleaned-up the API a bit. It's ready for review :)

src/opencl/mod.rs Outdated Show resolved Hide resolved
src/opencl/mod.rs Outdated Show resolved Hide resolved
F: FnOnce() -> Result<R, E>,
E: From<GPUError>,
{
fun()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so much fun

Copy link
Contributor

@dignifiedquire dignifiedquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple small things, but otherwise lgtm

@vmx vmx force-pushed the opencl3 branch 3 times, most recently from f625fa6 to b549f7b Compare June 7, 2021 14:00
@vmx
Copy link
Collaborator Author

vmx commented Jun 7, 2021

I've removed the last commit (I'm not sure if that change is possible, once there is CUDA as well). Better be safe than sorry.

It can be merged if the corresponding neptune and bellperson changes are also merged.

@vmx
Copy link
Collaborator Author

vmx commented Jun 14, 2021

I see failures in neptune. Not sure if it's really related, I'll have another look before merging.

@vmx
Copy link
Collaborator Author

vmx commented Jul 5, 2021

@dignifiedquire I think nothing changed since the last review, is it OK if I merge it?

Copy link
Contributor

@dignifiedquire dignifiedquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make sure to add doc comments please to all new functions

src/opencl/error.rs Outdated Show resolved Hide resolved
Ok(())
}
buffer: opencl3::memory::Buffer<T>,
length: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment if the length is in in bytes or in T elements

let binaries = program
.get_binaries()
.map_err(|_| GPUError::ProgramInfoNotAvailable(CL_PROGRAM_BINARIES))?;
std::fs::write(cached, binaries[0].clone())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if there are multiple or no binary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. This code was just ported over from: https://github.com/filecoin-project/rust-gpu-tools/pull/34/files/e212d9d1af6d46c0030c2794dc0607939c031ff2#diff-f16a498f7eb37b8af9a85fcbd3bba8f972198602f61b6f919bf15aeb939da2c2L256

Should this be addressed? If yes, then I'd do this in a separate PR after all these reviews.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets open an issue, to at least investigate this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#43

let buff = opencl3::memory::Buffer::create(
&self.context,
CL_MEM_READ_WRITE,
length,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is length now in numbers of T?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, length is the number of Ts. Reason is that we don't use as underlying buffer an OpenCL <u8> buffer anymore, but a <T> one (see https://github.com/filecoin-project/rust-gpu-tools/pull/34/files/e212d9d1af6d46c0030c2794dc0607939c031ff2#diff-f16a498f7eb37b8af9a85fcbd3bba8f972198602f61b6f919bf15aeb939da2c2R56). This simplifies the code quite, we need to do less size calculations and memory transformations. The tests still pass.

pub fn write_from_buffer<T>(
&self,
buffer: &Buffer<T>,
offset: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what unit does offset have?

offset: usize,
data: &[T],
) -> GPUResult<()> {
assert!(offset + data.len() <= buffer.length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A useful error message would be nice

.create_sub_buffer(CL_MEM_READ_WRITE, offset, data.len())?;

self.queue
.enqueue_write_buffer(&buff, CL_BLOCKING, 0, data, &[])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment why passing an empty slice is the right thing to do here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to add // We are not waiting for events, hence the last parameter is an empty slice, but that didn't really make things clearer. Also it would need to be added to all 3 occurrences. I'm in favour of not having a comment, but if you think I should that comment I can add it.

vmx added 2 commits July 6, 2021 13:56
Also refactor the code. Instead of operating on the `Buffer`s, operate on
the `Program` and pass in the `Buffer`s. This way there is no need to expose
the queue to the outside.

Kernel creation might now return an error if the kernel with the given
name doesn't exist.
This change is needed to make the API the same that is used by the
CUDA implementation.
vmx added 7 commits July 6, 2021 13:56
This method is needed by the CUDA implementation, on OpenCL it doesn't
really do anything other then executing the closure.
There is no need to keep a copy of a device, we only need the device name.
In CUDA it is non-optional, hence make it also mandatory for OpenCL.
There is no need to own it.
Remove `unwrap()` with proper error propagation.
There are functions/macros that are not used by `bellperson` or `neptune`.
Remove the unused ones or make them private if they are used internally.
Instead of using a generic type in our `Buffer` type and do manual
size calculations pass it on directly into opencl3.
Copy link
Collaborator Author

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be OK if I add documentation to all public functions once we've merged everything (before the release)?

let binaries = program
.get_binaries()
.map_err(|_| GPUError::ProgramInfoNotAvailable(CL_PROGRAM_BINARIES))?;
std::fs::write(cached, binaries[0].clone())?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. This code was just ported over from: https://github.com/filecoin-project/rust-gpu-tools/pull/34/files/e212d9d1af6d46c0030c2794dc0607939c031ff2#diff-f16a498f7eb37b8af9a85fcbd3bba8f972198602f61b6f919bf15aeb939da2c2L256

Should this be addressed? If yes, then I'd do this in a separate PR after all these reviews.

let buff = opencl3::memory::Buffer::create(
&self.context,
CL_MEM_READ_WRITE,
length,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, length is the number of Ts. Reason is that we don't use as underlying buffer an OpenCL <u8> buffer anymore, but a <T> one (see https://github.com/filecoin-project/rust-gpu-tools/pull/34/files/e212d9d1af6d46c0030c2794dc0607939c031ff2#diff-f16a498f7eb37b8af9a85fcbd3bba8f972198602f61b6f919bf15aeb939da2c2R56). This simplifies the code quite, we need to do less size calculations and memory transformations. The tests still pass.

.create_sub_buffer(CL_MEM_READ_WRITE, offset, data.len())?;

self.queue
.enqueue_write_buffer(&buff, CL_BLOCKING, 0, data, &[])?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was about to add // We are not waiting for events, hence the last parameter is an empty slice, but that didn't really make things clearer. Also it would need to be added to all 3 occurrences. I'm in favour of not having a comment, but if you think I should that comment I can add it.

@dignifiedquire
Copy link
Contributor

Would it be OK if I add documentation to all public functions once we've merged everything (before the release)?

Sure, just open an issue to ensure that things are documented.

@vmx
Copy link
Collaborator Author

vmx commented Jul 6, 2021

I opened an issue: #41 Everything else should be addressed (which unit offset is will be addressed when adding documentation).

@vmx vmx merged commit 2827a11 into master Jul 6, 2021
@vmx vmx deleted the opencl3 branch July 6, 2021 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants