debian11 #1549
I tried to install on Debian 11 but I got this error message:

…

I checked and see that Debian 11 is not supported yet. When can we expect it?

`Linux 5.10.0-9-amd64 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux`
The main blocker at the moment for Debian 11 support is that libnvidia-container does not yet support cgroup v2, which Debian 11 enables by default.
If you need nvidia-docker to work on debian11 now you can:

1. Install the Debian 10 packages (they install fine on Debian 11).
2. Set `no-cgroups = true` in `/etc/nvidia-container-runtime/config.toml` and restart docker (a sketch follows this list).
3. Manually inject the NVIDIA devices into your containers with `--device` flags.
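A minimal sketch of step 2, assuming the stock config path and that the key currently reads `no-cgroups = false`:

```
# Flip no-cgroups in the nvidia-container-cli section, then restart docker
$ sudo sed -i 's/^no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml
$ sudo systemctl restart docker
```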
Note, if you are running with Kubernetes, the equivalent of …
Are you saying that setting `no-cgroups = true` doesn't work? If the container fails to start, then that's a bug and I'd like to know what the error message is. If it starts, but the devices are not present, then that is by design, and you will need to do manual injection of the devices as outlined in the final step of my previous comment. It's obviously not an ideal solution, but it's a way to make things work until proper cgroup v2 support is released.
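For reference, a sketch of that manual injection (the device paths are the usual single-GPU defaults and the image tag is borrowed from a later comment in this thread; adjust both to your system):

```
# With no-cgroups = true the runtime no longer whitelists the devices,
# so pass them to the container explicitly
$ docker run --rm --gpus all \
    --device /dev/nvidia0 \
    --device /dev/nvidiactl \
    --device /dev/nvidia-uvm \
    --device /dev/nvidia-uvm-tools \
    nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
```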
Sounds good, worth a try. I'll try it tomorrow. Thanks!
Hi, I got this error:

```
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
```

Any idea?
Yes, this is due to Debian 11 defaulting to cgroup v2, which libnvidia-container does not support yet.
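You can check which cgroup version a host is running; `cgroup2fs` means cgroup v2, while `tmpfs` indicates the legacy v1 hierarchy:

```
$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
```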
Thanks for the info.
How have you set up your repo list? I have the non-free list, but …
@Lecrapouille you should be able to download / install the Debian 10 packages from the repository.
Hello, I did the install based on this guide: …
Thanks! I finally get GPU info when applying: …

PS: @redskinhu your link is not working.
Corrected, thanks!
Installing the debian10 nvidia-docker2 packages AND editing the config with `no-cgroups = true` (and restarting the docker service) did it for me.
@frederico-klein Yeah, this works, but it is not an ideal solution. We need a better one.
Not sure about this with `--device /dev/nvidiactl --device /dev/nvidia0` …
Are you sure you have installed CUDA or the NVIDIA driver?
NVIDIA-SMI 495.44 is running; what am I missing?
Running my container in privileged mode got it working.
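For reference, a sketch of that (image tag assumed); `--privileged` sidesteps the device-cgroup restriction entirely, at the cost of giving the container full access to the host:

```
$ docker run --rm --privileged --gpus all \
    nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
```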
ok
Today my system was updated with …
As a side note -- we plan to have an RC out with cgroup v2 support. Here is the MR chain: …
Have to mention here, I use cgroup v2 :)

```
$ ctr run --rm --runtime=io.containerd.runtime.v1.linux --env NVIDIA_VISIBLE_DEVICES=0 nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04 test nvidia-smi -L
ctr: cgroups: cgroup mountpoint does not exist: unknown
```
Good to know. Maybe I can help test this.
Yes, I realize this, which is why you would want to set …
@klueska The following workaround does not work because the …
You're right, I didn't test my suggestion (since it really is a pretty brutal hack). You would need to do something more sophisticated like:

…

(which I did test this time).
We now have an RC of libnvidia-container out that adds support for cgroup v2. If you would like to try it out, make sure to add the `experimental` repo to your package sources.

For DEBs:
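A sketch, using the `libnvidia-container.list` file name that comes up later in this thread (yours may be named differently; see below):

```
$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/libnvidia-container.list
$ sudo apt-get update
```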
For RPMs:
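A sketch, matching the `dnf` command another user ran further down:

```
$ sudo dnf config-manager --set-enabled libnvidia-container-experimental
$ sudo dnf makecache
```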
CentOS 8 + Cgroup v2 + containerd:

```
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
$ dnf config-manager --set-enabled libnvidia-container-experimental
$ dnf install -y libnvidia-container-tools libnvidia-container1 nvidia-container-runtime
```

The following command now works, as sketched below:
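Presumably the GPU test is along the lines of the `ctr` command quoted earlier in the thread; a sketch (default runtime shim, image tag assumed):

```
$ ctr run --rm --env NVIDIA_VISIBLE_DEVICES=0 \
    nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04 test nvidia-smi -L
```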
Cheers! 💯
So does this not work on Debian 11?
This RC was tested almost exclusively on a Debian 11 system, so I'd be surprised if it's not working there. That said, we don't yet officially support Debian 11, so you will need to add the apt repo for Debian 10:
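A sketch of that, assuming the standard libnvidia-container repo layout (and matching the list-file name that appears below):

```
$ distribution=debian10
$ curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list \
    | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
```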
And (as mentioned above) to get access to the RC package, you will need to enable the `experimental` repo with the `sed` command shown earlier.
And install it:
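A sketch, using the package names from the CentOS report above:

```
$ sudo apt-get update
$ sudo apt-get install -y libnvidia-container-tools libnvidia-container1 nvidia-container-runtime
```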
You should then see the following versions installed: …
When will there be a GA release of this new libnvidia-container?
Sometime in the new year after it has been thoroughly tested and certified.
This is very good news; looking forward to GA. I also hope you will consider supporting Debian on ARM.
In general this should also work on ARM, though you'll likely need to set the …
From my testing yesterday on a …
Note: The initial setup tests against …

Setup: …

Baseline test with latest stable: …
Once I have the ARM device, I'll try again.
```
sed: can't read /etc/apt/sources.list.d/libnvidia-container.list: No such file or directory
```

What am I doing wrong?
You might have an nvidia-docker.list or an nvidia-container-runtime.list file instead of a libnvidia-container.list file. The command will be the same; just swap out the file name at the end, as in the sketch below.
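For example, assuming it is the `nvidia-docker.list` file that exists on your system:

```
$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-docker.list
```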
Yup. Thank you!
If I use the experimental versions, do I still need to set the …
FYI, on CentOS 8, no extra configs are needed. Everything works fine.
Hello, do you guys happen to know how to install NVIDIA drivers (version 418) on CentOS 8? The installer (from http://download.nvidia.com/XFree86/Linux-x86_64/418.113/NVIDIA-Linux-x86_64-418.113-no-compat32.run ) failed with:

```
$ ./NVIDIA-Linux-x86_64-418.113-no-compat32.run --ui=none --disable-nouveau --no-install-libglvnd --dkms -s
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 418.113........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
WARNING: One or more modprobe configuration files to disable Nouveau are already present at:
/usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf.
Please be sure you have rebooted your system since these files were written. If you have rebooted, then Nouveau
may be enabled for other reasons, such as being included in the system initial ramdisk or in your X configuration
file. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to
correctly disable the Nouveau kernel driver.
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules';
these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install
the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 418.113 -k 4.18.0-348.2.1.el8_5.x86_64`:
Building module:
cleaning build area...
'make' -j20 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=4.18.0-348.2.1.el8_5.x86_64 IGNORE_CC_MISMATCH=''
modules....(bad exit status: 2)
Error! Bad return status for module build on kernel: 4.18.0-348.2.1.el8_5.x86_64 (x86_64)
Consult /var/lib/dkms/nvidia/418.113/build/make.log for more information.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again
without DKMS, or check the DKMS logs for more information.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find
suggestions on fixing installation problems in the README available on the Linux driver download page at
           www.nvidia.com.
```

DKMS make.log for nvidia-418.113 for kernel 4.18.0-348.2.1.el8_5.x86_64 (x86_64), Tue Dec 14 19:54:24 CST 2021:

```
make[1]: Entering directory '/usr/src/kernels/4.18.0-348.2.1.el8_5.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-348.2.1.el8_5.x86_64'
  SYMLINK /var/lib/dkms/nvidia/418.113/build/nvidia/nv-kernel.o
  SYMLINK /var/lib/dkms/nvidia/418.113/build/nvidia-modeset/nv-modeset-kernel.o
[... CONFTEST feature checks and CC [M] compile lines omitted ...]
/var/lib/dkms/nvidia/418.113/build/nvidia/nv.c: In function 'nvidia_probe':
/var/lib/dkms/nvidia/418.113/build/nvidia/nv.c:4129:5: error: implicit declaration of function 'vga_tryget'; did you mean 'vga_get'? [-Werror=implicit-function-declaration]
     vga_tryget(VGA_DEFAULT_DEVICE, VGA_RSRC_LEGACY_MASK);
     ^~~~~~~~~~
     vga_get
cc1: some warnings being treated as errors
make[3]: *** [/usr/src/kernels/4.18.0-348.2.1.el8_5.x86_64/scripts/Makefile.build:315: /var/lib/dkms/nvidia/418.113/build/nvidia/nv.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/usr/src/kernels/4.18.0-348.2.1.el8_5.x86_64/Makefile:1571: _module_/var/lib/dkms/nvidia/418.113/build] Error 2
make[2]: Leaving directory '/usr/src/kernels/4.18.0-348.2.1.el8_5.x86_64'
make[1]: *** [Makefile:157: sub-make] Error 2
make[1]: Leaving directory '/usr/src/kernels/4.18.0-348.2.1.el8_5.x86_64'
make: *** [Makefile:81: modules] Error 2
```
Please see NVIDIA/libnvidia-container#111 (comment) for instructions on how to get access to this RC (or wait for the full release at the end of next week).

Note: This does not directly add …
Release notes here: …
Debian 11 support has now been added such that running the following should now work as expected:
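A sketch of the standard flow on a fresh Debian 11 box, modeled on the repo-setup commands earlier in the thread (image tag assumed):

```
$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)   # debian11
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list \
    | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
$ docker run --rm --gpus all nvcr.io/nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi -L
```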
The newest version of …
Specifically, this change in …
The latest release packages for the full …