Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow platforms to be empty #468

Merged
merged 3 commits into from
Nov 9, 2023
Merged

Conversation

illuhad
Copy link
Contributor

@illuhad illuhad commented Oct 5, 2023

As discussed, this PR allows platforms to be empty. The motivating example was that of a CUDA backend where CUDA_VISIBILE_DEVICES has been set to expose no devices.

Most info queries like platform::vendor or platform::name can still be answered in this case. For the list of supported extensions, this PR assumes that an empty platform will not advertise any extension.

Another option might be to say that if an empty platform chooses to still advertise some extensions, they must be supported for all devices potentially supported by the platform, even if those devices are unavailable at the moment. In the example of the CUDA backend, we would e.g. know that any device that could be exposed once CUDA_VISIBLE_DEVICES is unset would also support USM.
I don't have a strong opinion on whether this additional rule is worth the effort.

@gmlueck
Copy link
Contributor

gmlueck commented Oct 6, 2023

One of the changes we have discussed for the next version of the SYCL specification is to add the concept of a platform "default context" and provide a guarantee that queue constructors that do not specifically request a context use the platform's default context. This is an important addition to the spec because it provides a guarantee for most simple applications that queues for devices on the same platform use the same context, and this provides guarantees about the ability to share USM memory amongst kernels submitted to those devices.

This PR interacts with that proposed addition because it would provide a way for the user to get a context that has no devices. (The proposal adds a way to get the default context for a platform, so a platform with no devices would necessarily have a default context with no devices.)

Therefore, if we decide to allow platforms to have no devices, I think we also need to clarify the semantics of a context that has no devices.

It's also not clear to me what requirements this PR imposes on an implementation. Must an implementation expose platforms with no devices if they could theoretically support some platform? It's not clear to me what this would even mean. Just for the sake of argument, let's suppose a vendor has a SYCL implementation that can support backend B, but that support must be enabled when the SYCL compiler itself is compiled (e.g. via a build option when building the compiler). Must this implementation expose an empty platform for backend B when it is built without that support?

As another scenario, let's suppose a SYCL implementation could potentially support the OpenCL backend, but only when the OpenCL runtime is installed. If that runtime is not installed, must the implementation expose an empty platform for OpenCL?

The motivation listed above for this PR is the CUDA_VISIBLE_DEVICES environment variable. The DPC++ implementation has a similar environment variable ONEAPI_DEVICE_SELECTOR which allows the user to select which devices and backends are exposed to the SYCL application. Our current behavior is to not expose a platform if this environment variable deselects the associated backend (or deselects all the devices in the backend). We have had an environment variable like this for a long time and no one has complained about that behavior. Would this PR require that behavior to change?

@TApplencourt
Copy link
Contributor

TApplencourt commented Oct 6, 2023

I think the implementation is free to decide whether it wants to show an empty platform.

(The proposal adds a way to get the default context for a platform, so a platform with no devices would necessarily have a default context with no devices.)

Can't it just throw if the platform has no device?

@illuhad
Copy link
Contributor Author

illuhad commented Oct 7, 2023

One of the changes we have discussed for the next version of the SYCL specification is to add the concept of a platform "default context" and provide a guarantee that queue constructors that do not specifically request a context use the platform's default context. This is an important addition to the spec because it provides a guarantee for most simple applications that queues for devices on the same platform use the same context, and this provides guarantees about the ability to share USM memory amongst kernels submitted to those devices.
This PR interacts with that proposed addition because it would provide a way for the user to get a context that has no devices. (The proposal adds a way to get the default context for a platform, so a platform with no devices would necessarily have a default context with no devices.)
Therefore, if we decide to allow platforms to have no devices, I think we also need to clarify the semantics of a context that has no devices.

It's a bit difficult to discuss the impact on a change that so far has only been vaguely discussed. I don't recall that any actual API or presentation on details was made on the topic of default context.
With that said, I don't think that there's a fundamental issue. In similar fashion to what the PR already states, we could also say that the atomic capability info queries for context return empty list if there are no devices.

We also have had default-context semantics in our implementation since the very beginning (and indeed, the context class is largely meaningless for us), so I don't think that a default context is fundamentally at odds with this PR.

It's also not clear to me what requirements this PR imposes on an implementation. Must an implementation expose platforms with no devices if they could theoretically support some platform? It's not clear to me what this would even mean. Just for the sake of argument, let's suppose a vendor has a SYCL implementation that can support backend B, but that support must be enabled when the SYCL compiler itself is compiled (e.g. via a build option when building the compiler). Must this implementation expose an empty platform for backend B when it is built without that support?

The PR makes no such requirements. It just says that an implementation is allowed to return empty platforms. When that happens - if at all - is up to the implementation. It gives implementations more freedom.

As another scenario, let's suppose a SYCL implementation could potentially support the OpenCL backend, but only when the OpenCL runtime is installed. If that runtime is not installed, must the implementation expose an empty platform for OpenCL?

Again, this PR makes no such requirements. It just gives implementations the freedom to return empty platforms if they wish to do so.

The motivation listed above for this PR is the CUDA_VISIBLE_DEVICES environment variable. The DPC++ implementation has a similar environment variable ONEAPI_DEVICE_SELECTOR which allows the user to select which devices and backends are exposed to the SYCL application. Our current behavior is to not expose a platform if this environment variable deselects the associated backend (or deselects all the devices in the backend). We have had an environment variable like this for a long time and no one has complained about that behavior. Would this PR require that behavior to change?

No, this PR would not require any change to your implementation. Implementations can continue to choose to not return empty platforms. It just allows an alternative behavior.
Let me know if you have suggestions how this can perhaps be expressed more clearly in the PR.

@AerialMantis
Copy link
Collaborator

The changes looks good to me, I have suggested some minor edits for consistency. I also noticed that there are some references to the requirement that a platform always contain at least on device is also described in the architecture section of the specification, so I think we need to change the wording there too.

§3.5. The SYCL platform model

The SYCL platform model is based on the OpenCL platform model. The model consists of a host connected to one or more heterogeneous devices, called devices.

§3.6.1. Platform mixed version support

The SYCL generic programming model exposes a number of platforms, each of them exposing a number of devices.

§3.7.1.1. SYCL backend resources managed by the SYCL application

Devices: platforms provide one or more devices for executing SYCL kernels.

We may also want to drop the "based on the OpenCL platform model" wording.

Therefore, if we decide to allow platforms to have no devices, I think we also need to clarify the semantics of a context that has no devices.

I think we could handle this by having the context and queue default constructors throw if there is no device found.

adoc/chapters/programming_interface.adoc Outdated Show resolved Hide resolved
adoc/chapters/programming_interface.adoc Outdated Show resolved Hide resolved
adoc/chapters/programming_interface.adoc Outdated Show resolved Hide resolved
adoc/chapters/programming_interface.adoc Outdated Show resolved Hide resolved
adoc/chapters/programming_interface.adoc Outdated Show resolved Hide resolved
Co-authored-by: Gordon Brown <gordon@codeplay.com>
@illuhad
Copy link
Contributor Author

illuhad commented Oct 18, 2023

@AerialMantis I've changed the places you have highlighted to make clear that empty platforms are allowed too. I've also removed the reference to the OpenCL platform model.

Copy link
Collaborator

@AerialMantis AerialMantis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@gmlueck gmlueck merged commit 35c52d6 into KhronosGroup:SYCL-2020/master Nov 9, 2023
1 check passed
@illuhad illuhad deleted the patch-1 branch November 9, 2023 17:20
keryell pushed a commit that referenced this pull request Sep 10, 2024
Allow platforms to be empty
gmlueck added a commit that referenced this pull request Nov 7, 2024
Allow platforms to be empty

(cherry picked from commit 35c52d6)
gmlueck added a commit that referenced this pull request Nov 7, 2024
Allow platforms to be empty

(cherry picked from commit 35c52d6)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants