hostRequirements: gpu: optional is broken on Windows 11 and 10 #9385
Comments
What do you get for running docker info -f '{{.Runtimes.nvidia}}'?
The team member who experienced the issues on Windows 11 + WSL2 is currently on leave. However, I found a Windows 10 machine with a GPU that has never had anything Docker- or NVIDIA-container-related installed on it. I installed Docker Desktop with WSL2 support, and oddly enough GPU passthrough appears to be supported by default, so I did nothing further. Anyway, I ran your command and it gave:
> docker info -f '{{.Runtimes.nvidia}}'
'<no value>'
I guess your suspicion from the previous issue was correct. To ensure that this machine was also affected by the bug, I created a folder with the following contents. Note that I just took some existing files and started deleting things, so there are probably some unrelated lines in what follows:
{
"name": "Dockerfile devcontainer gpu",
"build": {
"context": "..",
"dockerfile": "Dockerfile"
},
"workspaceFolder": "/workspace",
"workspaceMount": "source=.,target=/workspace,type=bind",
"hostRequirements": {
"gpu": "optional"
},
"runArgs": [
"--shm-size=4gb",
"--gpus=all"
]
}
# Setup environment basics
FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime
# Install packages
RUN apt update -y \
&& apt install -y sudo \
&& apt clean
# Set up user
ARG USERNAME=user
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN groupadd --gid $USER_GID $USERNAME \
&& useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
&& echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME
USER $USERNAME
# Set up working directory
WORKDIR /workspace
# Set up environment variables
ENV PYTHONUNBUFFERED=True
Then I rebuilt and reopened the folder in a dev container via VS Code, and ran the following command to confirm I had access to a GPU (I also separately ensured PyTorch had access to CUDA acceleration):
> nvidia-smi
Everything worked perfectly. Afterwards, I commented out the "--gpus=all" entry in runArgs and confirmed that the rebuilt container no longer had GPU access, so this machine is affected as well.
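For anyone reproducing this, a minimal in-container check might look like the following (a sketch assuming the PyTorch base image above, so python and torch are already present):

```bash
# Run inside the dev container's terminal.
# If GPU passthrough works, nvidia-smi lists the device.
nvidia-smi

# The base image ships PyTorch, so CUDA visibility can be checked directly.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```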
Great, what do you get for |
I'm assuming you meant I also checked |
We could add a machine-scoped setting to tell us if a GPU is available.
I am running into the same issue on my Windows machine. Shall we use nvidia-smi to detect the NVIDIA GPU instead?
If we only used nvidia-smi, this could fail on Linux, where you may have the NVIDIA drivers installed (nvidia-smi works) but not the NVIDIA Container Runtime (so no GPU inside containers).
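A rough sketch of the distinction being drawn here; both commands are standard Docker/NVIDIA tooling, and on a machine with drivers but no container runtime only the first one succeeds:

```bash
# Driver check: succeeds whenever the NVIDIA driver is installed on the host.
nvidia-smi

# Container-runtime checks: only succeed if Docker can hand the GPU to a container.
docker info -f '{{.Runtimes.nvidia}}'
docker run --rm --gpus all ubuntu nvidia-smi

# Note: per the comments above, Docker Desktop + WSL2 can report '<no value>'
# for the runtime entry even though --gpus all works.
```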
@chrmarti |
Stumbled upon this again in the last few days, after having had a solution in #9220 in January. I am working on a Windows workstation now and cannot get a Dev Container running via WSL with GPU support. What about the intermediate solution of a machine-specific configuration, which @chrmarti mentioned above?
I agree that whether or not an SSH server machine can use its GPU in a Docker container should be a setting on the SSH server machine; it doesn't belong to the local machine. One difficulty with a machine setting is that when connecting through an SSH server (or Tunnel), we can't access its machine settings through VS Code's API, because that only knows the local settings and the dev container settings (which it calls "machine settings"). We can check for and read the machine settings.json in the extension though. /cc @sandy081
Here is my hacky fix for docker compose in the meantime :) #10124 (comment)
@chrmarti Hello, it seems that this feature is still broken (v0.386.0). If I create a remote machine (GCP) with a GPU and a fully installed NVIDIA stack, I can build and run the devcontainer using "hostRequirements": { "gpu": "optional" }.
But if I remove the GPU from my remote machine, I can't start the Docker container anymore: it claims to have detected a GPU even though none is attached. Judging from the devcontainer console output, and from running the command you use in your ts-scripts on the (now GPU-less) machine, I think you are only checking whether the nvidia-container-runtime is available, not whether an actual GPU is attached. So --gpus 'all' gets added whenever the runtime is available, even if no GPU is attached, and unfortunately the container won't start if --gpus all is given but no GPU is present on the computer. Am I missing something here?
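Not the extension's actual logic, just a hedged sketch of the kind of two-step check being suggested here: only pass --gpus when a probe container can actually see a device, rather than whenever the runtime alone is present.

```bash
# Sketch only: decide whether to pass --gpus all by probing for a real GPU.
# A probe container that can run nvidia-smi implies both the NVIDIA runtime
# and an attached GPU; the runtime alone is not enough.
if docker run --rm --gpus all ubuntu nvidia-smi > /dev/null 2>&1; then
  GPU_ARGS="--gpus all"
else
  GPU_ARGS=""
fi
# "my-image:latest" is a placeholder for the dev container image.
docker run --rm $GPU_ARGS my-image:latest
```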
@maro-otto Good catch, I'll open a new issue for this. Thanks.
Hello! Are users able to verify that this works (minus the new bug caught by @maro-otto)?
Also, @chrmarti, if no user is able to, could you clarify the steps? The original issue is comprehensive, but I was wondering whether this can be tested without CUDA containers per NVIDIA's standards (since it seems like the setting would apply in other dev container scenarios). Thanks!
Without a GPU, I suggest to set […]. With a GPU, you could set […].
My laptop has a GPU. When GPU Availability is set to none, the dev container with optional gpu host requirements still gets a GPU. Host: Windows.
@rzhao271 Could you rebuild the container and append the log from that?
Closing this issue. GPU Availability had to be set to none within the WSL settings, not the User settings.
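For reference, a sketch of where that machine-scoped value lives when connected to WSL. The settings file path is VS Code's standard machine-settings location inside the distro; the setting id shown is an assumption inferred from the "GPU Availability" label, so confirm it in the Settings UI before relying on it:

```bash
# Inside the WSL distro: machine-scoped VS Code settings live here,
# separate from the Windows-side User settings.json.
cat ~/.vscode-server/data/Machine/settings.json

# Hypothetical content -- the setting id is assumed from the
# "GPU Availability" label and should be verified in the Settings UI:
# { "dev.containers.gpuAvailability": "none" }
```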
Does this issue occur when you try this locally?: Yes
Does this issue occur when you try this locally and all extensions are disabled?: Yes
This issue is a continuation of #9220, which appears to have regressed recently. Read the previous issue for more context.
Steps to Reproduce:
1. Create a devcontainer.json with "hostRequirements": { "gpu": "optional" }.
2. Open the folder in a dev container and run nvidia-smi inside it (see the CLI sketch below).
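If it helps, the same steps can be driven from the command line with the Dev Containers CLI (a sketch assuming @devcontainers/cli is installed, e.g. via npm i -g @devcontainers/cli):

```bash
# Build and start the dev container for the current folder.
devcontainer up --workspace-folder .

# Run nvidia-smi inside it; with "gpu": "optional" on a GPU machine
# this is expected to list the device.
devcontainer exec --workspace-folder . nvidia-smi
```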
On Linux Fedora 38 the above works - the container has access to the GPU.
On Windows 11 + WSL2 the above does not work. Troubleshooting steps have been described in #9220.
Adding "runArgs": [ "--gpus", "all" ] to devcontainer.json makes Windows 11 + WSL2 work. However, using the runArgs trick breaks the devcontainer for machines without GPUs (confirmed on Windows 11, macOS, and Linux Fedora). As a temporary workaround, we are therefore currently maintaining two files: .devcontainer/gpu/devcontainer.json and .devcontainer/cpu/devcontainer.json.