node-odm:gpu doesn't start on Ubuntu 22.04 due to lack of cgroups v2 support #185

Closed
jekhor opened this issue Aug 25, 2022 · 3 comments

Comments

@jekhor

jekhor commented Aug 25, 2022

When trying to start WebODM with GPU support, I get an error:


jek@dave:~/WebODM$ ./webodm.sh start --gpu
02:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
82:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
82:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
83:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
GPU_NVIDIA has been found
Checking for docker...   OK
Checking for docker-compose...   OK
Starting WebODM...

Using the following environment:
================================
Host: localhost
Port: 8000
Media directory: appmedia
SSL: NO
SSL key: 
SSL certificate: 
SSL insecure port redirect: 80
Celery Broker: redis://broker
Default Nodes: 1
================================
Make sure to issue a ./webodm.sh down if you decide to change the environment.

docker-compose -f docker-compose.yml -f docker-compose.nodeodm.gpu.nvidia.yml up --scale node-odm=1
Starting broker ... 
Starting db     ... 
Recreating webodm_node-odm_1 ... error

Starting broker              ... done
Starting db                  ... done
ontainer error: cgroup subsystem devices not found: unknown
Starting worker              ... done

ERROR: for node-odm  Cannot start service node-odm: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown
ERROR: Encountered errors while bringing up the project.

This seems to be an issue in the NVIDIA Docker images, and a workaround exists: NVIDIA/nvidia-container-runtime#47 (comment)

It seems to be fixed already: NVIDIA/libnvidia-container#111 (comment)
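
For anyone hitting the same `cgroup subsystem devices not found` error, it may help to confirm first which cgroup hierarchy the host is running. The following is a minimal sketch, assuming a Linux host with GNU coreutils `stat`; the helper name `cgroup_version` is hypothetical and not part of WebODM or the NVIDIA tooling. It relies on the fact that a cgroup v2 (unified) host mounts `/sys/fs/cgroup` as `cgroup2fs`.

```shell
#!/bin/sh
# Hypothetical helper: map a filesystem type name to a cgroup version label.
# On a cgroup v2 (unified) host, /sys/fs/cgroup is mounted as cgroup2fs;
# on cgroup v1 it is usually a tmpfs holding per-controller mounts.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Query the type of the filesystem mounted at /sys/fs/cgroup (GNU stat).
# Fall back to "none" if the path does not exist or stat fails.
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "none")
echo "cgroup hierarchy: $(cgroup_version "$fs_type")"
```

If this reports `v2` and the error above still appears, an outdated NVIDIA container package is a likely candidate, which matches how this issue was eventually resolved.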

@pierotofy
Member

Thanks for the report @jekhor 🙏

I wonder if this is something we can fix (perhaps improve the webodm.sh script) or if this is something that requires upgrading to the latest docker/NVIDIA libs.

@jekhor
Author

jekhor commented Aug 25, 2022

I don't understand it either, because I don't know the full inheritance tree of these nvidia-docker-related projects.

@jekhor
Author

jekhor commented Aug 27, 2022

I am sorry, this was a false alarm on my side. I hadn't upgraded the nvidia-docker2 package (its repository was commented out during the OS upgrade). After upgrading it, WebODM works as expected with the --gpu parameter. You already have links for nvidia-docker installation in the README, so this issue can be closed.
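
A repository left commented out after an OS upgrade is easy to miss, so a quick scan of the APT source files can help. This is a minimal sketch, assuming the classic one-line `.list` format under `/etc/apt/sources.list.d`; the helper name `list_disabled_sources` is hypothetical, and the exact file that holds the nvidia-docker entry on a given machine is not known from this thread.

```shell
#!/bin/sh
# Hypothetical helper: list APT source entries that are commented out.
# $1 is the directory to scan (assumed default: /etc/apt/sources.list.d).
list_disabled_sources() {
    dir=${1:-/etc/apt/sources.list.d}
    # Match commented-out entries such as "# deb https://... stable main".
    # -H prints the file name, -n the line number of each match.
    grep -HnE '^[[:space:]]*#[[:space:]]*deb(-src)?[[:space:]]' "$dir"/*.list 2>/dev/null
}

# Print any commented-out entries, or a note when none are found.
list_disabled_sources || echo "no commented-out sources found"
```

Any NVIDIA-related entry that shows up here would need to be re-enabled before the package manager can offer a newer nvidia-docker2 version.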

@jekhor jekhor closed this as completed Aug 27, 2022