Support running multiple instances of the same kernel simultaneously #9
Comments
tl;dr: This comment is about devices and platforms. Both don't need reference counting, so I propose implementing @kenba: In the This means that if you create two contexts from the same device, you would need a copy of that device. For me, this is what

With

    let mut context = Context::from_device(device.clone()).unwrap();
    let mut context_second = Context::from_device(device).unwrap();

Without

    let device_clone = Device::new(device.id());
    // Create a Context and a queue on the device
    let mut context = Context::from_device(device).unwrap();
    let mut context_second = Context::from_device(device_clone).unwrap();

which I find less ergonomic. In my case, we create a list of devices once at startup. So every call that needs to own a device would need that extra The same applies to platforms.
tl;dr: This comment is about implementing As mentioned in the previous comment, in the code I'm working on, we initialize a list of devices and platforms once at startup. I think that's a pattern worth supporting. But in order to be able to use things in lazy_static/once_cell, they need to be I could create my own newtype wrappers around devices and platforms, but I'd hope that
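A minimal sketch of the "initialise once at startup" pattern described above, assuming the handle type is both `Send` and `Sync`. `DeviceHandle` and `enumerate_devices` are hypothetical stand-ins, not opencl3 items, and the sketch uses `std::sync::OnceLock` from the standard library (Rust 1.70 or later) in place of lazy_static/once_cell.

```rust
use std::sync::OnceLock;
use std::thread;

/// Hypothetical stand-in for a device handle that is `Send + Sync`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct DeviceHandle(isize);

/// Hypothetical enumeration; a real program would query the OpenCL platforms here.
fn enumerate_devices() -> Vec<DeviceHandle> {
    vec![DeviceHandle(1), DeviceHandle(2)]
}

/// Global device list, filled on first access.
/// A `static` must be `Sync`, which is why the handle type has to be `Sync` too.
static DEVICES: OnceLock<Vec<DeviceHandle>> = OnceLock::new();

/// Returns the shared device list, initialising it on the first call.
pub fn devices() -> &'static [DeviceHandle] {
    DEVICES.get_or_init(enumerate_devices)
}

fn main() {
    let first = devices()[0];
    // Another thread reads the same global list.
    let handle = thread::spawn(|| devices()[0]);
    assert_eq!(first, handle.join().expect("thread panicked"));
}
```

If the handle type wrapped a raw pointer without being marked `Sync`, the `static` declaration would be rejected by the compiler, which is the situation the comment describes.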
tl;dr: This comment is about implementing
I don't have much experience with multi-threaded Rust code, so I read about things and tried out what it might look like. "immutable Rust references" sounds like an

    use cl3::device::CL_DEVICE_TYPE_ALL;
    use opencl3::command_queue::CL_QUEUE_PROFILING_ENABLE;
    use opencl3::context::Context;
    use opencl3::device::Device;
    use opencl3::platform::get_platforms;
    use std::thread;
    use std::sync::Arc;

    fn main() {
        let platforms = get_platforms().unwrap();
        let devices = platforms[0]
            .get_devices(CL_DEVICE_TYPE_ALL)
            .expect("Platform::get_devices failed");
        let device = Device::new(devices[0]);
        let mut context = Arc::new(Context::from_device(device).expect("Context::from_device failed"));
        let context_clone = Arc::clone(&context);
        let handle = thread::spawn(move || {
            context_clone
                .create_command_queues(CL_QUEUE_PROFILING_ENABLE);
        });
        handle.join();
    }

In this case I would want to run some context on some other thread. This program fails to compile due to (I shortened the error message):

The problem is that Implementing
@vmx you raise a very good point about With regards to your views on In my understanding an OpenCL application should only create one context per platform, whether running multi or single threaded. After a However, it would be better for other threads to be able to take their own copies of The one object that you may want to copy and send between threads is an I don't have time to change this now, I'll have a look at it at the weekend. Please let me know your thoughts, Volker.
I agree that as much immutability should be used as possible. Those traits are not about mutability. They are about ownership. Let's take The Rust API Guidelines mention in the interoperability section, specifically that you should implement
As I mentioned in my previous comment, I don't understand how you would do that without have
Take your time. Thanks for communicating the timeline so clearly, that's helpful. I have the luxury of being able to comment during my work hours.
By storing an `intptr_t` in `Platform` and `Device` instead of `cl_platform_id` and `cl_device_id`, the structs can derive the Copy, Clone and Debug traits and implement the Send and Sync traits.
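A minimal sketch of the idea in the commit message above, not the actual opencl3 code. `DeviceHandle` and `round_trip_through_thread` are hypothetical names, and the raw pointer is a dummy value rather than a real `cl_device_id`. Storing the handle as an integer (`isize`, the Rust counterpart of `intptr_t`) lets the struct derive `Copy`, `Clone` and `Debug`; in this sketch the compiler also treats it as `Send` and `Sync` automatically, because `isize` is both.

```rust
use std::ffi::c_void;
use std::thread;

/// Hypothetical stand-in for `opencl3::device::Device`.
/// The raw handle is stored as an integer instead of a pointer.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct DeviceHandle {
    id: isize,
}

impl DeviceHandle {
    /// Wraps a raw, pointer-sized handle (for example a `cl_device_id`).
    pub fn new(raw: *mut c_void) -> Self {
        DeviceHandle { id: raw as isize }
    }

    /// Returns the raw handle for passing back to the C API.
    pub fn id(&self) -> *mut c_void {
        self.id as *mut c_void
    }
}

/// Copies the handle into another thread and returns what that thread saw.
pub fn round_trip_through_thread(device: DeviceHandle) -> DeviceHandle {
    thread::spawn(move || device).join().expect("thread panicked")
}

fn main() {
    // Dummy address standing in for a handle returned by the OpenCL runtime.
    let device = DeviceHandle::new(0x1000 as *mut c_void);
    let copy = device; // `Copy`: `device` stays usable afterwards
    let seen = round_trip_through_thread(copy);
    assert_eq!(device, seen);
    println!("{:?}", device);
}
```

Copying such a handle does not touch any OpenCL reference count; it only duplicates the integer, which matches the earlier point that devices and platforms do not need reference counting.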
@vmx I've been able to implement
I've thoroughly considered sharing OpenCL objects across threads and come to the conclusion that there is a fundamental issue with OpenCL object lifetimes which makes sharing OpenCL objects derived from a Object lifetimes are one of Rust's great strengths but they are rarely, if ever, considered when using other programming languages. Unfortunately, the OpenCL API does not explicitly state the lifetimes of OpenCL objects. It's fairly clear that I consider that all items created from an OpenCL @vmx If an OpenCL expert can explain to me how and why OpenCL objects such as
@vmx has created a couple of PRs to add `Clone` traits: #4 and #7 and another PR (#6) to add `Send` and `Sync` traits.

I'm not happy with these PRs; I don't think that they take the correct approach.
Cloning

The first issue is what should `Clone` do?

@vmx has implemented `Clone` using the appropriate OpenCL `clRetain*` functions.

According to the OpenCL spec, the `clRetain*` functions increment an object's reference count, i.e. they perform a shallow copy. This use of `Clone` is similar to using shared pointers in C++ and nowhere near as powerful as Rust references.

Most `opencl3` objects are immutable after they have been constructed. Normally, only the `Drop` trait is mutable to implement RAII by calling the relevant `clRelease*` function. Therefore, the `opencl3` objects can (and should) be accessed by immutable Rust references wherever possible. The exception is `Context`, where sub-devices, command queues, and programs are added to a `Context` in the Initialisation phase:

Figure 1 OpenCL Application Lifecycle

However, after Initialisation, `Context`s can (and should) also be accessed by immutable Rust references.

Multi-threading
The Query and Initialisation phases of an application should be performed before threads are started. Once the relevant context(s) have been created they can be shared with the threads using immutable Rust references, or new command queues and kernels can be created and moved to the new threads.
Each thread should operate independently; i.e. each should have its own command queues, memory buffers and especially kernels.
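One way the layout described above could look, as a sketch only: `Context`, `Kernel`, `build_kernel` and `run_workers` are hypothetical stand-ins rather than the opencl3 types, and no OpenCL calls are made. The context is shared by immutable reference through `std::thread::scope`, while each thread creates and owns its own kernel instance.

```rust
use std::thread;

/// Hypothetical stand-in for an OpenCL context; read-only after initialisation.
pub struct Context {
    name: String,
}

/// Hypothetical stand-in for a kernel instance owned by a single thread.
pub struct Kernel {
    label: String,
}

impl Context {
    /// Builds a fresh kernel instance; takes `&self`, so no `mut` access is needed.
    pub fn build_kernel(&self, worker: usize) -> Kernel {
        Kernel { label: format!("{}-kernel-{}", self.name, worker) }
    }
}

/// Runs `workers` threads; each borrows the context and owns its own kernel.
pub fn run_workers(context: &Context, workers: usize) -> Vec<String> {
    thread::scope(|scope| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = (0..workers)
            .map(|worker| {
                scope.spawn(move || {
                    let kernel = context.build_kernel(worker); // per-thread instance
                    kernel.label
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    let context = Context { name: String::from("ctx") };
    let labels = run_workers(&context, 2);
    assert_eq!(labels, vec!["ctx-kernel-0", "ctx-kernel-1"]);
}
```

Scoped threads are used here so that a plain `&Context` can cross the thread boundary without `Arc`; this still requires the context type to be `Sync`, which holds for the stand-in because it only contains a `String`.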
If an OpenCL application wishes to run multiple instances of a `Kernel` simultaneously on different threads, then it should use different instances of the `Kernel` on the threads, not references to the same `Kernel` instance, see: OpenCL multiple host threads.

Therefore, there is definitely a need to provide multiple deep copies of a `Kernel` to support multi-threading.

I cannot see a need to `Send` or `Sync` OpenCL objects with the possible exception of `Events`. `Events` can be used to synchronise OpenCL execution on multiple command queues in the same context. However, it should be possible to use immutable references or pass them from a thread after it has finished execution.