-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][DOC] Add initial spec draft for new default devices and queues #15382
base: sycl
Are you sure you want to change the base?
Conversation
Signed-off-by: James Brodman <james.brodman@intel.com>
sycl/doc/extensions/proposed/sycl_ext_oneapi_global_defaults.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_global_defaults.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_global_defaults.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/proposed/sycl_ext_oneapi_global_defaults.asciidoc
Outdated
Show resolved
Hide resolved
_Returns:_ The current default device. The initial device returned is the device | ||
returned by the default device selector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the meaning of "initial device" could be clarified here. Maybe say something like:
_Returns:_ The current default device. The initial device returned is the device | |
returned by the default device selector. | |
_Returns:_ The current default device, as set by `set_current_device()`. | |
If `set_current_device()` has not been called, returns the device | |
returned by the default device selector. |
Also, a little bit of bikeshedding: it seems a little weird to me that the functions are called get_current_device()
but the text describes it as the "current default device". I think we should consistently use one or two words to describe this concept. I think I'm leaning towards get_default_device()
and set_default_device()
. That naming is more aligned with get_default_context()
and get_default_queue()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I had some reason I went sour on "default" but forgot what it is.
|
||
_Returns:_ Nothing | ||
|
||
_Remarks:_ Sets the current default device to `dev`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_Remarks:_ Sets the current default device to `dev`. | |
_Effects:_ Sets the current default device to `dev`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions:
- Does this set the default for this thread, or all threads?
- Does this change the device returned by later calls to the default selector? (Should it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioned as an open at the bottom.
sycl::malloc_device(numBytes, | ||
get_current_device(), | ||
get_current_device().get_platform().ext_oneapi_get_default_context(), | ||
propList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was initially a little surprised by this, because I usually use the overloads accepting a queue
.
Is the device's default queue guaranteed to be associated with the default context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the intent is that a device's default queue will use the device's platform's default context.
sycl/doc/extensions/proposed/sycl_ext_oneapi_global_defaults.asciidoc
Outdated
Show resolved
Hide resolved
---- | ||
!==== | ||
|
||
_Returns:_ The default queue for the device. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we clarify here that the default queue must be associated with the default context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
. [UNRESOLVED] Should the currrent device be global or should we also support a per-thread | ||
device? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the CUDA versions are always per-thread: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=cudaSetDevice#device-selection
I can see this behavior being useful for something like an OpenMP+SYCL application -- you could call set_default_device()
on each OpenMP thread to round-robin assign your devices to threads, then not worry about it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts here that are relevant if we decided that each thread has its own default device:
-
We should clarify the behavior when a new thread is created. Does the new thread inherit the parent thread's default device? If not, is the initial default device for the new thread guaranteed to be the device selected by the default selector? Or, is the initial default device for a new thread unspecified?
-
We have discussed in the past the possibility of expanding the set of APIs that can be called from a host task. It is possible that we will allow a host task to call some of the APIs in this extension. If that happens, what guarantees does the application have about the default device in a host task?
…sciidoc Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc Co-authored-by: John Pennycook <john.pennycook@intel.com>
[source,c++] | ||
---- | ||
namespace sycl { | ||
class queue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class queue { | |
class device { | |
|
||
[source, c++] | ||
---- | ||
submit_with_event(get_current_device().ext_oneapi_get_default_queue(), cgf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submit_with_event(get_current_device().ext_oneapi_get_default_queue(), cgf) | |
submit(get_current_device().ext_oneapi_get_default_queue(), cgf) | |
---- | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
void memcpy(void* dest, const void* src, size_t numBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I predict that many users will be confused, thinking this is a synchronous operation. For example, syclex::malloc_shared
and syclex::memcpy
are both memory operations. How are users supposed to know that the first is synchronous while the second is asynchronous?
Maybe we should add the suffix "_async" to all the functions that are asynchronous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about the idea of an "_async" suffix at this point. I'm a little worried that explicitly calling out that some functions are asynchronous would introduce the opposite problem to what you're trying to solve: because functions like submit
and parallel_for
aren't called submit_async
and parallel_for_async
, a new developer might conclude they're synchronous.
(Aside: I would be okay with introducing an "_async" suffix if that's the direction C++ takes for its asynchronous STL functions, but then we'd need to roll out the suffix everywhere and deprecate the previous function names.)
Another potential fix would be to say that all these default queue shorthands are blocking...? I guess then there's a confusion that sycl::memcpy(dest, src, numBytes)
is synchronous but sycl::memcpy(q, dest, src, numBytes)
is asynchronous...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried that explicitly calling out that some functions are asynchronous would introduce the opposite problem to what you're trying to solve: because functions like
submit
andparallel_for
aren't calledsubmit_async
andparallel_for_async
, a new developer might conclude they're synchronous.
I was thinking that we could add "_async" to all of the asynchronous functions in the sycl_ext_oneapi_enqueue_functions extension, but we would leave the functions in SYCL 2020 named as they are now.
---- | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
(1) void* sycl::malloc_device(size_t numBytes, const property_list& propList = {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our style is to put these numeric callouts to the right of the API synopsis. In addition, insert extra spaces to line them up vertically within a synopsis code block. See here for an example.
. [UNRESOLVED] Should the currrent device be global or should we also support a per-thread | ||
device? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts here that are relevant if we decided that each thread has its own default device:
-
We should clarify the behavior when a new thread is created. Does the new thread inherit the parent thread's default device? If not, is the initial default device for the new thread guaranteed to be the device selected by the default selector? Or, is the initial default device for a new thread unspecified?
-
We have discussed in the past the possibility of expanding the set of APIs that can be called from a host task. It is possible that we will allow a host task to call some of the APIs in this extension. If that happens, what guarantees does the application have about the default device in a host task?
Signed-off-by: James Brodman <james.brodman@intel.com>
Signed-off-by: James Brodman <james.brodman@intel.com>
Signed-off-by: James Brodman <james.brodman@intel.com>
This is a draft for a new extension to add the notion of a default device and default queues for devices.
It also extends the new launch mechanisms with queue-less forms that utilize the default device's default queue.