Nvidia passthrough broken #4

Closed · Ixian opened this issue Feb 27, 2023 · 105 comments

@Ixian commented Feb 27, 2023

Getting this error:


-- WARNING, the following logs are for debugging purposes only --

I0227 16:30:43.055366 3314 nvc.c:376] initializing library context (version=1.12.0, build=7678e1af094d865441d0bc1b97c3e72d15fcab50)
I0227 16:30:43.055432 3314 nvc.c:350] using root /
I0227 16:30:43.055437 3314 nvc.c:351] using ldcache /etc/ld.so.cache
I0227 16:30:43.055442 3314 nvc.c:352] using unprivileged user 65534:65534
I0227 16:30:43.055460 3314 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0227 16:30:43.055577 3314 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
I0227 16:30:43.057645 3315 nvc.c:278] loading kernel module nvidia
I0227 16:30:43.057787 3315 nvc.c:282] running mknod for /dev/nvidiactl
I0227 16:30:43.057820 3315 nvc.c:286] running mknod for /dev/nvidia0
I0227 16:30:43.057840 3315 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0227 16:30:43.063197 3315 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0227 16:30:43.063256 3315 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0227 16:30:43.064371 3315 nvc.c:296] loading kernel module nvidia_uvm
I0227 16:30:43.064395 3315 nvc.c:300] running mknod for /dev/nvidia-uvm
I0227 16:30:43.064434 3315 nvc.c:305] loading kernel module nvidia_modeset
I0227 16:30:43.064464 3315 nvc.c:309] running mknod for /dev/nvidia-modeset
I0227 16:30:43.064644 3316 rpc.c:71] starting driver rpc service
I0227 16:30:43.064985 3314 rpc.c:135] driver rpc service terminated with signal 15
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0227 16:30:43.065009 3314 nvc.c:434] shutting down library context

Looks like everything might not be getting passed through.

@Jip-Hop (Owner) commented Feb 27, 2023

What does the log say after "Starting jail with the following command:" when you start the jail?

Also what is the output of nvidia-container-cli list on the host?

Thanks for testing and reporting!

@Ixian (Author) commented Feb 27, 2023

 sudo ./jlmkr.py start dockerjail
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind=/mnt/ssd-storage/appdata/ --bind=/mnt/Slimz/

Starting jail with name: dockerjail

Running as unit: jlmkr-dockerjail.service

Check logging:
journalctl -u jlmkr-dockerjail

Check status:
systemctl status jlmkr-dockerjail

Stop the jail:
machinectl stop dockerjail

Get a shell:
machinectl shell dockerjail

and

$ nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/usr/lib/nvidia/current/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01

@Jip-Hop (Owner) commented Feb 27, 2023

Thanks! I have just updated the python script. Could you try again please?

@Ixian (Author) commented Feb 27, 2023

Deleted the old jail and started over with a fresh one; now I get this error when trying to start it:

Do you want to start the jail? [Y/n] Y
Config loaded!
Traceback (most recent call last):
  File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 666, in <module>
    main()
  File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 651, in main
    create_jail(args.name)
  File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 613, in create_jail
    start_jail(jail_name)
  File "/mnt/ssd-storage/jailmaker/./jlmkr.py", line 108, in start_jail
    if subprocess.run(['modprobe', 'br_netfilter']).returncode == 0:
  File "/usr/lib/python3.9/subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'modprobe'

@Ixian (Author) commented Feb 27, 2023

More error detail: it appears there's an extraneous `--bind-ro==` in the generated command line now?

Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-gtjail --working-directory=./jails/gtjail '--description=My nspawn jail gtjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=gtjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind=/dev/nvidiactl --bind=/dev/nvidia-uvm --bind=/dev/nvidia-uvm-tools --bind=/dev/nvidia-modeset --bind=/dev/nvidia0 --bind-ro==/usr/lib/nvidia/current/nvidia-smi --bind-ro==/usr/bin/nvidia-persistenced --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro==/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind-ro== --bind=/mnt/ssd-storage/ --bind=/mnt/Slimz/

Starting jail with name: gtjail

Job for jlmkr-gtjail.service failed.
See "systemctl status jlmkr-gtjail.service" and "journalctl -xe" for details.

Failed to start the jail...
In case of a config error, you may fix it with:
nano jails/gtjail/config

@Jip-Hop (Owner) commented Feb 27, 2023

Ah, you're right, that doesn't look good. If you replace the double == with single ones and run the command directly to start the jail, does the Nvidia driver work inside the jail?

If so, we know this approach will work and I should fix the double == in the code.

Thanks for helping. Since I don't have an Nvidia GPU I couldn't test this part :)

@Jip-Hop (Owner) commented Feb 27, 2023

Should be fixed now.

@Ixian (Author) commented Feb 27, 2023

Still same problem - I notice you changed this:

systemd_nspawn_additional_args.append(
                        f"--bind-ro={file_path}")

However, it still isn't filling in {file_path}; it just outputs a bare "--bind-ro=", and that is what stops the jail from starting.

If I remove the blank entry I can start the jail; however, the Nvidia drivers still don't appear to work inside it.

Something in the routine you use to mount the directories (the subroutine that decides whether a path is under /dev) seems to be broken, but I can't see what.
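For illustration, a minimal sketch of how a flag-building loop could guard against that blank entry (assuming the script splits the output of nvidia-container-cli list on newlines; the function and variable names are made up for this example, not taken from jlmkr.py):

import subprocess

def nvidia_passthrough_flags():
    # Ask the host which driver files and devices need to be passed through.
    output = subprocess.run(
        ["nvidia-container-cli", "list"],
        capture_output=True, text=True, check=True,
    ).stdout

    flags = []
    for file_path in output.splitlines():
        file_path = file_path.strip()
        if not file_path:
            # Skip blank lines so we never emit a bare "--bind-ro=".
            continue
        if file_path.startswith("/dev/"):
            flags.append(f"--bind={file_path}")
        else:
            flags.append(f"--bind-ro={file_path}")
    return flags

The key point is the guard against empty strings before the flag is formatted.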

@Ixian (Author) commented Feb 27, 2023

More info:

The problem I outlined above, the extraneous blank `--bind-ro=` appended to the launch string, will prevent the machine from starting. However, you can edit around it, since all the other directories do get bound; it's just adding that blank entry at the end. I am not familiar enough with Python and how it handles loops (other than that foreach is implicit), but that is likely simple to fix.

The bigger issue is that it's still not passing through everything needed from the host, as the following error still happens even when I modify the startup command to get the jail running:

root@dockjail:~# nvidia-container-cli list
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory

@Jip-Hop (Owner) commented Feb 28, 2023

The empty --bind-ro flag should now be fixed. Thanks! What happens when you run nvidia-smi -a directly inside the jail?

@Jip-Hop (Owner) commented Feb 28, 2023

Also please try these steps inside a fresh jail: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/nvidia-docker.html

@Jip-Hop (Owner) commented Feb 28, 2023

I think running ldconfig once inside the jail may cause the mounted drivers to be detected. NVIDIA/nvidia-docker#854

@Ixian (Author) commented Feb 28, 2023

Thanks Jip-Hop, the empty --bind-ro flag is indeed fixed (and I learned something about Python today reading your commit); however, the Nvidia problems remain. Even running ldconfig in the jail, or in an Nvidia container, i.e.:

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 /bin/bash -c "ldconfig && nvidia-smi"

Still fails with the same error:

nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

@Ixian (Author) commented Feb 28, 2023

Also, nvidia-container-runtime list produces blank output:

nvidia-container-runtime list
ID          PID         STATUS      BUNDLE      CREATED     OWNER

and nvidia-smi isn't available inside the jail at all.

As a sanity check: it does all work outside the jail; I double-checked to make sure I hadn't opened a shell on the wrong machine :)

@Jip-Hop (Owner) commented Feb 28, 2023

Could you try /usr/lib/nvidia/current/nvidia-smi -a inside the jail? Perhaps after running ldconfig once inside the jail. The nvidia-smi binary should be available inside the jail as far as I can tell from the bind mount flags you've posted. It's probably not on the PATH, so you need to use the absolute path.

@Ixian (Author) commented Feb 28, 2023

# /usr/lib/nvidia/current/nvidia-smi -a
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

Edit: Also ran ldconfig

@Jip-Hop (Owner) commented Feb 28, 2023

I suppose there may still be a (config) file missing in the list of files to bind mount.

This shows the approach should work:
https://wiki.archlinux.org/title/systemd-nspawn#Nvidia_GPUs

Maybe something is missing from our list?

@Jip-Hop (Owner) commented Feb 28, 2023

Aha!

Please also try adding directory that contains libnvidia-ml.so to your system PATH.

@Ixian (Author) commented Feb 28, 2023

libnvidia-ml.so isn't being passed to the jail; find / -name libnvidia-ml.so returns nothing there.
On the Scale host itself it returns:

# find / -name libnvidia-ml.so
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so

which doesn't appear to be bound by your script, judging from how the directories are enumerated, unless I am missing a piece?

@Jip-Hop (Owner) commented Feb 28, 2023

And if so, can you search with a wildcard at the end? It is being bind mounted, but with a different suffix...

@Jip-Hop (Owner) commented Feb 28, 2023

Maybe I need to do something similar to this:

NVIDIA/nvidia-docker#1163 (comment)

Too bad this needs additional investigation...

@Ixian (Author) commented Feb 28, 2023

find / -name 'libnvidia-ml.so*'

finds this, yes: /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01

@Jip-Hop (Owner) commented Feb 28, 2023

O.k., so I have now hard-coded mounting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 as well, since it seems this is not listed by nvidia-container-cli but is required for it to work.

I now no longer get the error related to libnvidia-ml.so.1 inside the jail. Now I get this (which I also get on the host, so that's probably because I don't have an Nvidia GPU):

/usr/lib/nvidia/current/nvidia-smi -a
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Has this fixed it for you?

@Ixian (Author) commented Feb 28, 2023

Ah, progress :) Yes, nvidia-smi now picks it up in the jail itself; however, it fails inside containers running in the jail. Looks like /usr/lib/nvidia/current needs to be in the system PATH; I imagine that would be better done by the script?

@Ixian (Author) commented Feb 28, 2023

Actually, the problem is a little weirder.

I run this (standard test, from the Nvidia site, done it dozens of times):

docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

And I get this error:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown.

But even adding the correct directory to my PATH in .bashrc etc. doesn't fix it. Something else strange is going on here.

@Jip-Hop (Owner) commented Feb 28, 2023

It probably needs to be on the PATH system-wide, not just for the current user?

But nice, progress!

@Ixian (Author) commented Feb 28, 2023

Something is off with it and it has to be due to how the drivers are pulled in from the host. We still might be missing something.

@Jip-Hop (Owner) commented Mar 1, 2023

When you get to the point that it works inside the jail, but not in a docker container, can you try (after having installed nvidia docker):

docker run --rm --gpus all nvidia/cuda:11.0-base bash -c "ldconfig && nvidia-smi"

@Talung commented Mar 1, 2023

Just tried the latest update to test the nvidia part, and am also getting errors starting it. Config file looks fine.

Mar 01 20:51:05 truenas systemd-nspawn[1986823]: Failed to stat /dev/nvidia-modeset: No such file or directory
Mar 01 20:51:05 truenas systemd[1]: jlmkr-dockerjail.service: Main process exited, code=exited, status=1/FAILURE

I can run nvidia-smi

root@truenas[~]# nvidia-smi
Wed Mar  1 20:53:07 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 29%   38C    P5    20W / 180W |      0MiB /  8192MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

and

root@truenas[/mnt/pond/jailmaker]# nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-modeset
/dev/nvidia0
/usr/lib/nvidia/current/nvidia-smi
/usr/bin/nvidia-persistenced
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01
/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01

So it is there, just not being picked up? I can't see whether anything is passed through to the jail itself, as I can't get it running.

@Jip-Hop (Owner) commented Mar 1, 2023

Inside the jail please follow the official steps to get nvidia working with Docker: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/nvidia-docker.html

That will also set up the daemon.json file with the nvidia settings.

Then please run ldconfig inside the jail once and then try:

docker run --rm --gpus all nvidia/cuda:11.0-base bash -c "ldconfig && nvidia-smi"

Looking forward to hearing how that goes.

@Jip-Hop (Owner) commented Mar 4, 2023

Any chance you could post the logs of the jailmaker script when it is starting dockerjail after a reboot?

You may need to redirect the output somewhere with > or mail them by piping the output like so: ./jlmkr.py start dockerjail | mail -s "Jailmaker" "youremail@example.com"

Or you could temporarily disable the startup script and run jlmkr manually after the reboot.

I'm tempted to just call nvidia-smi once before nvidia-container-cli list, just to be done with it.

@Talung commented Mar 4, 2023

Sure, no problem. Here is the loadlog

root@truenas[/mnt/pond/jailmaker]# cat loadlog.log
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind=/mnt/pond/dockerset --bind=/mnt/pond/appdata/ --bind=/mnt/lake/media/ --bind=/mnt/lake/cloud/

Starting jail with name: dockerjail


Check logging:
journalctl -u jlmkr-dockerjail

Check status:
systemctl status jlmkr-dockerjail

Stop the jail:
machinectl stop dockerjail

Get a shell:
machinectl shell dockerjail

There is no nvidia stuff in there. And here is the log after stopping and starting; in between, I ran nvidia-smi and nvidia-container-cli list.

root@truenas[/mnt/pond/jailmaker]# machinectl stop dockerjail
root@truenas[/mnt/pond/jailmaker]# ./jlmkr.py start dockerjail
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind=/dev/nvidia-uvm-tools --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind=/dev/nvidiactl --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind=/dev/nvidia-uvm --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind=/dev/nvidia-caps --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/bin/nvidia-smi --bind=/mnt/pond/dockerset --bind=/mnt/pond/appdata/ --bind=/mnt/lake/media/ --bind=/mnt/lake/cloud/

Starting jail with name: dockerjail

Running as unit: jlmkr-dockerjail.service

Check logging:
journalctl -u jlmkr-dockerjail

Check status:
systemctl status jlmkr-dockerjail

Stop the jail:
machinectl stop dockerjail

Get a shell:
machinectl shell dockerjail

My post-init command is as follows:

/mnt/pond/jailmaker/jlmkr.py start dockerjail > /mnt/pond/jailmaker/loadlog.log

Unfortunately I won't be able to do a lot more testing for the next week, as I'm packing up the PCs soon and moving. Hopefully by Thursday next week I'll have most of the stuff up and running so I can do more testing.

@Jip-Hop (Owner) commented Mar 4, 2023

Thanks @Talung. Was that with the latest script? I was expecting to see "No nvidia GPU seems to be present... Skip passthrough of nvidia GPU." in the first case.

But I think it's clear that the /dev/nvidia* devices don't exist yet that soon after boot, so I can't rely on them to detect whether an nvidia GPU is installed.

@Talung commented Mar 4, 2023

Yes. This morning I read all the other posts, then ran the update and did a reboot. So unless the script has changed in the last 7 hours, that should be the latest script. Maybe add a little version number to the output so we can confirm that sort of thing. Also, a "run log" stored next to the config would be good for debugging.

Just suggestions. :)

@Jip-Hop (Owner) commented Mar 4, 2023

Thanks @Talung. Versioning has started. We're at v0.0.1.

If anyone could test the following sequence:

  • Reboot (don't run jailmaker on post-init, also don't run any nvidia modprobe scripts, don't run nvidia-smi manually, don't run any apps which would 'trigger' the GPU)
  • Create a new jail with nvidia GPU passthrough (did it say "Detected the presence of an nvidia GPU." and allow you to enable passthrough?)
  • Please test if the GPU works properly in the jail you just created (and started)
  • Then set jailmaker as post-init script to start the jail you just made (make it save the log somewhere)
  • Reboot

Was the jail started with nvidia gpu passthrough working (without manually running nvidia-smi or modprobe)?

@Talung commented Mar 4, 2023

Did you change anything else in the script besides the versioning? I was going through the tests you suggested: got the latest script (with version numbers), disabled the post-init run (except I actually didn't, because I didn't hit the save button) and rebooted.

Did the whole setup:

root@truenas[~]# uptime
 18:29:49 up 1 min,  1 user,  load average: 7.09, 1.97, 0.68
root@truenas[~]# cd /mnt/pond/jailmaker
root@truenas[/mnt/pond/jailmaker]# ./jlmkr.py create testjail
USE THIS SCRIPT AT YOUR OWN RISK!
IT COMES WITHOUT WARRANTY AND IS NOT SUPPORTED BY IXSYSTEMS.

Install the recommended distro (Debian 11)? [Y/n]

Enter jail name: testjail

Docker won't be installed by jlmkr.py.
But it can setup the jail with the capabilities required to run docker.
You can turn DOCKER_COMPATIBLE mode on/off post-install.

Make jail docker compatible right now? [y/N] y

Detected the presence of an intel GPU.
Passthrough the intel GPU? [y/N] y
Detected the presence of an nvidia GPU.
Passthrough the nvidia GPU? [y/N] y

WARNING: CHECK SYNTAX

You may pass additional flags to systemd-nspawn.
With incorrect flags the jail may not start.
It is possible to correct/add/remove flags post-install.

Show the man page for systemd-nspawn? [y/N]

You may read the systemd-nspawn manual online:
https://manpages.debian.org/bullseye/systemd-container/systemd-nspawn.1.en.html

For example to mount directories inside the jail you may add:
--bind='/mnt/data/a writable directory/' --bind-ro='/mnt/data/a readonly directory/'

Additional flags:

Using image from local cache
Unpacking the rootfs

---
You just created a Debian bullseye amd64 (20230303_05:25) container.

To enable SSH, run: apt install openssh-server
No default root or user password are set by LXC.

Do you want to start the jail? [Y/n] y
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-testjail --working-directory=./jails/testjail '--description=My nspawn jail testjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=testjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind-ro=/usr/bin/nvidia-smi --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind=/dev/nvidia-caps --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind=/dev/nvidiactl

Starting jail with name: testjail

Running as unit: jlmkr-testjail.service

Check logging:
journalctl -u jlmkr-testjail

Check status:
systemctl status jlmkr-testjail

Stop the jail:
machinectl stop testjail

Get a shell:
machinectl shell testjail

And then I noticed I had an email from Watchtower, which is when I realised I hadn't saved the "disabled" change. However, this time the GPU initialised on boot. Here is the log:

root@truenas[/mnt/pond/jailmaker]# cat loadlog.log
Config loaded!

Starting jail with the following command:

systemd-run --property=KillMode=mixed --property=Type=notify --property=RestartForceExitStatus=133 --property=SuccessExitStatus=133 --property=Delegate=yes --property=TasksMax=infinity --collect --setenv=SYSTEMD_NSPAWN_LOCK=0 --unit=jlmkr-dockerjail --working-directory=./jails/dockerjail '--description=My nspawn jail dockerjail [created with jailmaker]' --setenv=SYSTEMD_SECCOMP=0 --property=DevicePolicy=auto -- systemd-nspawn --keep-unit --quiet --boot --machine=dockerjail --directory=rootfs --capability=all '--system-call-filter=add_key keyctl bpf' '--property=DeviceAllow=char-drm rw' --bind=/dev/dri --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-cfg.so.515.65.01 --bind-ro=/usr/lib/nvidia/current/nvidia-smi --bind-ro=/usr/bin/nvidia-smi --bind=/dev/nvidia0 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ptxjitcompiler.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libcuda.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libGLX_nvidia.so.515.65.01 --bind=/dev/nvidiactl --bind-ro=/usr/bin/nvidia-persistenced --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so.515.65.01 --bind=/dev/nvidia-caps --bind-ro=/usr/lib/x86_64-linux-gnu/nvidia/current/libnvcuvid.so.515.65.01 --bind-ro=/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.515.65.01 --bind=/mnt/pond/dockerset --bind=/mnt/pond/appdata/ --bind=/mnt/lake/media/ --bind=/mnt/lake/cloud/

Starting jail with name: dockerjail

Looking at the commit history, I see some other changes were made, and whatever it was, it seems to have worked.

EDIT: for funsies I did another reboot and guess what... the GPU was in the jail again!

@Jip-Hop (Owner) commented Mar 4, 2023

Sounds good! Thanks @Talung

Yes, I did more than increment the version number hehe ^^

Detected the presence of an nvidia GPU.
Passthrough the nvidia GPU? [y/N] y

This looks good, as it detected the nvidia GPU straight after reboot thanks to nvidia-smi. It no longer depends on the /dev/nvidia* devices existing.

And then you did another reboot and it ran nvidia-container-cli list successfully, because the script now runs nvidia-smi beforehand (good idea @Ixian).

So it seems to be working now!?
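For reference, a minimal sketch of that detection idea (assuming the check simply shells out to nvidia-smi; the function name is illustrative, not the script's actual code):

import shutil
import subprocess

def nvidia_gpu_detected() -> bool:
    # The /dev/nvidia* nodes may not exist yet right after boot, so don't
    # test for them. Running nvidia-smi both detects the GPU and triggers
    # creation of /dev/nvidia0 and /dev/nvidiactl.
    if shutil.which("nvidia-smi") is None:
        return False
    result = subprocess.run(
        ["nvidia-smi"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0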

@Talung commented Mar 4, 2023

Well, if working means that I did 2 reboots and the GPU came up both times without issue in a jail with GPU passthrough, then I would say: "Yes, it is working!"

Well done!

@CompyENG (Contributor) commented Mar 4, 2023

Grabbed the latest script and just tried a reboot myself, and I'm definitely running into the linked issue.

Everything 'seemed' to be working (nvidia-smi ran successfully in host, jail, and container), but Plex refused to do HW transcoding. I also tried a tensorflow docker container and my GPU wasn't listed.

After poking around a while, I discovered that I didn't have /dev/nvidia-uvm. The module was loaded, and I even tried unloading and reloading it. I also tried starting nvidia-persistenced, but nothing seemed to work.

I stopped the jail and ran the mknod commands for /dev/nvidia-uvm and /dev/nvidia-uvm-tools:

  # Find the major device number dynamically assigned to nvidia-uvm
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  # Create the missing device nodes
  mknod -m 666 /dev/nvidia-uvm c $D 0
  mknod -m 666 /dev/nvidia-uvm-tools c $D 0

Then re-started the jail, and transcoding in Plex worked! Tried the tensorflow container again and it listed my GPU.

So it seems like 'something' is still missing to get the nvidia-uvm device created.

@CompyENG (Contributor) commented Mar 4, 2023

Probably worth noting that I'm on TrueNAS SCALE 22.12.1.

It seems that nvidia-modprobe doesn't work because the modules are named nvidia-current-*.ko instead of just nvidia-*.ko:

root@freenas:~# find /lib/modules -name nvidia\*
/lib/modules/5.15.79+truenas/kernel/drivers/net/ethernet/nvidia
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-drm.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-peermem.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-modeset.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-uvm.ko
/lib/modules/5.15.79+truenas/updates/dkms/nvidia-current.ko

But nvidia-modprobe is hard-coded to use nvidia-uvm as the module name.

I did get nvidia-modprobe to do the right thing by creating a symbolic link and running depmod:

# alias the TrueNAS module name to the name nvidia-modprobe expects
ln -s /lib/modules/5.15.79+truenas/updates/dkms/nvidia-current-uvm.ko /lib/modules/5.15.79+truenas/updates/dkms/nvidia-uvm.ko
# rebuild module dependencies, then create the uvm device nodes
depmod
nvidia-modprobe -c0 -u

After that, /dev/nvidia-uvm exists.

Since the mknod commands are documented by Nvidia, that solution feels a bit less 'hacky'.

@Ixian (Author) commented Mar 4, 2023

@TrueJournals I've had to have the following running as a pre-init command for at least the past two Scale releases:

[ ! -f /dev/nvidia-uvm ] && modprobe nvidia-current-uvm && /usr/bin/nvidia-modprobe -c0 -u

That keeps the situation you are seeing from happening. It was needed even when I was running docker on the Scale host itself; I've had it in there ever since, and I haven't had the problem you are seeing.

I think @Jip-Hop added it to the script as well, but I believe it is something that needs to happen pre-init if you want your Nvidia GPU to reliably show up in Scale. It has something to do with how iXsystems won't load the module unless it is called upon, to avoid boot logging errors. The K3s-backed app system handles it behind the scenes when it is used; we need to do it manually.

@CompyENG (Contributor) commented Mar 4, 2023

Thanks for that tip @Ixian! Looks like that will do it. Quick log from boot (without any special init):

root@freenas:~# ls /dev/nvid*
ls: cannot access '/dev/nvid*': No such file or directory
root@freenas:~# lsmod | grep nvid
nvidia_drm             73728  0
nvidia_modeset       1150976  1 nvidia_drm
nvidia              40853504  1 nvidia_modeset
drm_kms_helper        315392  1 nvidia_drm
drm                   643072  4 drm_kms_helper,nvidia,nvidia_drm
root@freenas:~# modprobe nvidia-current-uvm
root@freenas:~# ls /dev/nvid*
ls: cannot access '/dev/nvid*': No such file or directory
root@freenas:~# lsmod | grep nvid
nvidia_uvm           1302528  0
nvidia_drm             73728  0
nvidia_modeset       1150976  1 nvidia_drm
nvidia              40853504  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        315392  1 nvidia_drm
drm                   643072  4 drm_kms_helper,nvidia,nvidia_drm
root@freenas:~# nvidia-modprobe -c0 -u
root@freenas:~# ls /dev/nvid*
/dev/nvidia-uvm  /dev/nvidia-uvm-tools
root@freenas:~# lsmod | grep nvid
nvidia_uvm           1302528  0
nvidia_drm             73728  0
nvidia_modeset       1150976  1 nvidia_drm
nvidia              40853504  2 nvidia_uvm,nvidia_modeset
drm_kms_helper        315392  1 nvidia_drm
drm                   643072  4 drm_kms_helper,nvidia,nvidia_drm

Running nvidia-smi will then create the /dev/nvidia0 and /dev/nvidiactl devices.

Looks like the most recent commit removed the modprobe in favor of just running nvidia-smi.

So, I guess this is the answer for the TODO @Jip-Hop -- nvidia-smi is necessary, but not sufficient :) The modprobe and nvidia-modprobe must be run as well.
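A rough Python sketch of that combined sequence, for anyone following along (it must run as root on the host; the module and tool names come from the comments above, and the function name is made up):

import shutil
import subprocess

def prepare_nvidia_devices():
    # 1. Load the UVM kernel module. On TrueNAS SCALE 22.12 it is named
    #    nvidia-current-uvm, so try that first and fall back to nvidia-uvm.
    for module in ("nvidia-current-uvm", "nvidia-uvm"):
        if subprocess.run(["modprobe", module]).returncode == 0:
            break

    # 2. Create the /dev/nvidia-uvm and /dev/nvidia-uvm-tools device nodes.
    if shutil.which("nvidia-modprobe"):
        subprocess.run(["nvidia-modprobe", "-c0", "-u"])

    # 3. Running nvidia-smi once creates /dev/nvidia0 and /dev/nvidiactl.
    subprocess.run(["nvidia-smi"], stdout=subprocess.DEVNULL)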

@Ixian (Author) commented Mar 4, 2023

@Jip-Hop I just went through the latest script (0.0.1, and thanks for adding versioning) and I think it's really coming together. I like the changes, and I learned a few new things about Python too, so thanks :)

I'm using 0.0.1 now and so far so good: I've gone through multiple reboot tests, everything launches clean and my GPU works, and I'm able to use HW transcoding in Plex & Tdarr (tested both after each reboot). Haven't seen any other problems (performance, etc.) yet but will keep an eye on things. I think I'm ready to switch over to this full time vs. running docker directly on the host. Famous last words, but: fingers crossed :)

@Ixian (Author) commented Mar 4, 2023

Thanks for that tip @Ixian ! Looks like that will do it. Quick log from boot (without any special init):

Running nvidia-smi will then create the /dev/nvidia0 and /dev/nvidiactl devices.

Looks like the most recent commit removed the modprobe in favor of just running nvidia-smi

So, I guess this is the answer for the TODO @Jip-Hop -- nvidia-smi is necessary, but not sufficient :) The modprobe and nvidia-modprobe must be run as well.

Yep, I just saw he removed it as well, BUT I think that's fine. I am pretty certain the correct place to load the modules during boot is pre-init, so probably just an instruction to add it as a pre-init command is enough. That's what we did when we first started running DIY docker on Scale.

Here's a screenshot @Jip-Hop if you want to add it to the readme:
[screenshot: nvidia-modules]

@CompyENG (Contributor) commented Mar 4, 2023

With the pre-init script, things are working -- but it looks like nvidia-container-cli doesn't work. It seems that 'something' still isn't initialized without running nvidia-smi, but the latest script checks for /dev/nvidia-uvm to decide whether to run nvidia-smi. I ended up with this error on jlmkr.py start:

nvidia-container-cli: initialization error: nvml error: driver not loaded

Unable to detect which nvidia driver files to mount.
Falling back to hard-coded list of nvidia files...

I decided to just add nvidia-smi to my pre-init command. I also thought it might be a good idea to run nvidia-modprobe regardless of whether the modprobe nvidia-current-uvm works (in case the module name changes to just nvidia-uvm in the future...).

I also changed it to detect the path to modprobe (via /proc/sys/kernel/modprobe) instead of relying on PATH or a hard-coded path. Probably not necessary, but I found it interesting.

So, my final pre-init command is:

[ ! -f /dev/nvidia-uvm ] && ( $(cat /proc/sys/kernel/modprobe) nvidia-current-uvm; /usr/bin/nvidia-modprobe -c0 -u; nvidia-smi -f /dev/null )

@Jip-Hop (Owner) commented Mar 4, 2023

I had no idea it would take 5 days and about 100 comments to get nvidia passthrough working >.<

Updated the script to v0.0.2. I removed some code I think we no longer need, as long as the pre-init command is scheduled (this one or the one above this comment).

Would be great if you could run through the testing sequence again (and run whatever additional tests you think are relevant).

If this works I'll add documentation regarding the pre-init command.

P.S. @TrueJournals if you have an idea how to run ldconfig inside the jail without having to resort to hardcoding /usr/lib/x86_64-linux-gnu/nvidia/current and writing a new .conf file, that would be great. I tried some different things without success, and I'm not too thrilled about the current solution.

@CompyENG (Contributor) commented Mar 4, 2023

P.S. @TrueJournals if you have an idea how to run ldconfig inside the jail without having to resort to hardcoding /usr/lib/x86_64-linux-gnu/nvidia/current and writing a new .conf file, that would be great. I tried some different things, without success and I'm not to thrilled about the current solution.

Alright, you got me curious ;) I dug into how nvidia handles this, going through libnvidia-container and container-toolkit. Here's what I can tell...

TLDR: They find all unique folders from nvidia-container-cli list, and create a file in /etc/ld.so.conf.d based on that.

nvidia has a hard-coded list of libraries in libnvidia-container. Actually, this is multiple lists depending on what capabilities you want in the container. In order to find the full path to these libraries, they parse the ldcache file directly to turn the short library names into full paths. You can see that also in find_library_paths.

Over in container-toolkit (which contains the 'hooks' for when containers are created or whatever), there's code to get a list of libraries from "mounts" (a little unclear what these mounts are -- assuming mounts on the container?) by matching paths against lib?*.so* (syntax for Match). In this same file, they have a function that generates a list of unique folders for this list of files.

Finally, they can create a file in /etc/ld.so.conf.d with a random name that lists all these folders and run ldconfig. It looks like this happens outside the container itself by using the -r option on ldconfig.

Now, what I'm still a little confused by is that I don't actually see this happening in my docker container. What's also weird is that libraries show up like this:

root@f7ca5192b700:/# ls -al /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
lrwxrwxrwx 1 root root 29 Mar  2 17:14 /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 -> libnvidia-encode.so.515.65.01

Even though that library is located at /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-encode.so.515.65.01 on the "host" (the 'jail' in this case). So it seems like there's another level of remapping and some additional optimization but I'm not quite sure how that works.

Anyway, the logic of 'discover the paths based on the list of libraries' seems reasonable enough. You could even run nvidia-container-cli list --libraries to get the list of libraries (without binaries and other files) if you didn't want to filter down based on filename patterns.
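A hedged sketch of what that could look like on the jailmaker side (the nvidia.conf file name and the function name are illustrative, and it assumes the jail's rootfs path is known):

import subprocess
from pathlib import Path

def write_nvidia_ld_conf(jail_rootfs: Path) -> None:
    # Collect the libraries nvidia-container-cli wants to expose.
    output = subprocess.run(
        ["nvidia-container-cli", "list", "--libraries"],
        capture_output=True, text=True, check=True,
    ).stdout

    # Reduce the file list to its set of unique parent directories.
    lib_dirs = sorted({str(Path(p).parent) for p in output.splitlines() if p.strip()})

    # Write an ld.so.conf.d entry inside the jail's rootfs...
    conf_file = jail_rootfs / "etc/ld.so.conf.d/nvidia.conf"
    conf_file.write_text("\n".join(lib_dirs) + "\n")

    # ...and rebuild the linker cache against that rootfs, the same way
    # container-toolkit uses ldconfig -r.
    subprocess.run(["ldconfig", "-r", str(jail_rootfs)], check=True)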

@Jip-Hop (Owner) commented Mar 4, 2023

Thanks for digging into this :)

TLDR: They find all unique folders from nvidia-container-cli list, and create a file in /etc/ld.so.conf.d based on that.

Well, then I will no longer feel bad for writing that file :')

Using the output of nvidia-container-cli list --libraries to determine the content of our .conf file sounds like a nice improvement.

By the way, how is v0.0.2 for you? :)

@CompyENG (Contributor) commented Mar 5, 2023

Just tried v0.0.2 and it seems to work fine (I can only reboot my server so many times in a day 😆 )

Also sent you a PR to implement the above suggestion of discovering library paths based on the output of nvidia-container-cli list --libraries. Tested with a new and an existing jail locally and it seems to behave fine.

@Jip-Hop (Owner) commented Mar 5, 2023

We're now on v0.0.3 thanks to @TrueJournals :)

I've added the pre-init command instructions to the readme.

Looking forward to hearing from @Ixian and @Talung one last time if all is working properly. Hopefully we can soon close this issue.

@Ixian (Author) commented Mar 5, 2023

Updated to 0.0.3, rebooted, all working, Plex HW transcoding included.

Question: do we need to regenerate the jail with each new version, i.e. has the CLI launch command in the config file changed? I'm still testing with the jail I created with 0.0.1.

@Jip-Hop (Owner) commented Mar 5, 2023

Nice!

The debugging we did with the script may have left some residual files (symlinks, empty folders), so recreating may not be a bad idea.

But in general my intention is that there should not be a need to regenerate a jail when using a newer version of the script.

@Ixian (Author) commented Mar 5, 2023

I'm happy to close this now if you want; I think we've gotten it.

@Jip-Hop (Owner) commented Mar 5, 2023

100 comments and closed! 🎉

Jip-Hop closed this as completed Mar 5, 2023
@Talung commented Mar 9, 2023

Can confirm this is working for me, and Emby is HW transcoding now. Thanks!

@Jip-Hop (Owner) commented Aug 14, 2023

I've done some refactoring. Would be interested in knowing if Nvidia passthrough still works with the latest version. Anyone care to test?

@Talung commented Aug 14, 2023

Sorry @Jip-Hop, I have moved my entire system back over to Proxmox and am running ZFS and docker natively on that. TrueNAS became too much of a pain in the arse to get around their crap.

Apologies for not being able to help.

@patrickmichalina commented

I've done some refactoring. Would be interested in knowing if Nvidia passthrough still works with the latest version. Anyone care to test?

It works for me, thanks.
