fix: glibc search paths for nvidia #421
cafe267
to
24c76e0
Compare
@@ -1,6 +1 @@
# libc default configuration
This makes sure glibc doesn't try to load non-glibc libs.
@@ -51,7 +51,7 @@ steps:
cd libnvidia-container

# LDLIBS=-L/usr/local/glibc/lib is set so that libnvidia-container-cli libs which are hardcoded as -llibname and not using pkg-config
- CPPFLAGS="-I/usr/local/glibc/include/tirpc" LDLIBS="-L/usr/local/glibc/lib -ltirpc -lelf -lseccomp" make
+ CPPFLAGS="-I/usr/local/glibc/include/tirpc" LDLIBS="-L/usr/local/glibc/lib -ltirpc -lelf -lseccomp" LDFLAGS='-Wl,--rpath=\$$ORIGIN/../glibc/\$$LIB' make
This is the actual fix needed
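A note on the `\$$ORIGIN` escaping in that `LDFLAGS` value, as a rough sketch rather than a statement of how this build behaves: the assumption is the usual GNU make idiom, where make collapses `$$` to `$` when it expands the variable, and the shell that runs the link command then turns `\$` into a literal `$`. The function name `simulate_rpath_escaping` is hypothetical and only models those two passes with `sed`; it does not invoke make or the linker.

```bash
#!/usr/bin/env bash
# Hypothetical helper: models the two unescaping passes that are ASSUMED to
# apply to the LDFLAGS value (GNU make first, then the recipe shell).
simulate_rpath_escaping() {
  local value="$1"
  printf '%s\n' "$value" |
    sed -e 's/\$\$/$/g' `# pass 1: make collapses "$$" to "$"` \
        -e 's/\\\$/$/g'  # pass 2: the shell turns "\$" into a literal "$"
}

simulate_rpath_escaping '-Wl,--rpath=\$$ORIGIN/../glibc/\$$LIB'
```

If the assumption holds, the linker receives `$ORIGIN` and `$LIB` unexpanded, which is consistent with the `scanelf` output quoted in the commit message.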
cd NVIDIA-Linux-*

./nvidia-installer --silent \
--opengl-prefix=/rootfs/usr/local \
--utility-prefix=/rootfs/usr/local \
--utility-libdir=glibc/lib \
This only copies some of the libraries, not all of them, hence the manual move below at https://github.com/siderolabs/extensions/pull/421/files#diff-d91dc320bde5625cb12535516b84cc3cd36e7ae24b320a40852b8f22a38299beR61
For some reason `nvidia-container-cli` can't find libs stored under `/usr/local/lib`, so we explicitly keep them under the custom location at `/usr/local/glibc/lib`.
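To see which library names exist in both locations (the kind of overlap that makes search order matter), something like the following could be run against the built rootfs. This is a minimal sketch: `list_colliding_libs` is a hypothetical helper, and the two directory paths in the usage line are assumptions based on the `_out/rootfs/rootfs` path shown in the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print shared-library file names that are present in
# both directories, i.e. names where the loader's search order decides
# which copy gets used.
list_colliding_libs() {
  local first_dir="$1" second_dir="$2"
  local path name
  for path in "$first_dir"/*.so*; do
    [ -e "$path" ] || continue          # glob matched nothing
    name="$(basename "$path")"
    if [ -e "$second_dir/$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example invocation (paths are assumptions, adjust to the actual rootfs):
# list_colliding_libs _out/rootfs/rootfs/usr/local/lib _out/rootfs/rootfs/usr/local/glibc/lib
```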
Set `glibc/lib` as the first `rpath` for `nvidia-container-cli`. Also install nvidia libraries to `/usr/local/glibc/lib` so that any musl libraries live separately.

`nvidia-container-cli` explicitly sets an `RPATH` of `$ORIGIN/../$LIB` here: https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/blob/v1.14.6/Makefile?ref_type=tags#L183. This means `/usr/local/lib` would be searched first. Since `zfs` and nvidia each ship their own `libtirpc`, `nvidia-container-cli` first tries to use the `libtirpc` shipped with `zfs` at `/usr/local/lib` instead of the one at `/usr/local/glibc/lib`.

Fix this by setting an additional `RPATH` of `$ORIGIN/../glibc/$LIB`, so that libraries in `/usr/local/glibc/lib` take precedence.

```bash
❯ scanelf -r _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
 TYPE   RPATH FILE
ET_DYN $ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
```

Properly fixes: siderolabs#380

Fixes from siderolabs#401 and siderolabs#410 were not complete.

Manually tested by spinning up an NVIDIA worker in AWS.

Signed-off-by: Noel Georgi <git@frezbo.dev>
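The search order described above can be illustrated with a small string-level sketch. It is not how the dynamic loader is implemented: `expand_rpath` is a hypothetical helper that only substitutes `$ORIGIN` (the directory of the binary) and `$LIB` into each `RPATH` entry, and the default of `lib` for `$LIB` is an assumption that matches the `/usr/local/glibc/lib` layout used in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand $ORIGIN and $LIB in a colon-separated RPATH
# for a given binary path and print one directory per line, in search order.
# Assumption: $LIB resolves to "lib" unless a third argument says otherwise.
expand_rpath() {
  local binary="$1" rpath="$2" lib="${3:-lib}"
  local origin entry
  origin="$(dirname "$binary")"
  local IFS=':'                         # split the RPATH on colons
  for entry in $rpath; do
    entry="${entry//\$ORIGIN/$origin}"  # $ORIGIN -> directory of the binary
    entry="${entry//\$LIB/$lib}"        # $LIB    -> library directory name
    printf '%s\n' "$entry"
  done
}

expand_rpath /usr/local/bin/nvidia-container-cli '$ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB'
# prints:
#   /usr/local/bin/../glibc/lib
#   /usr/local/bin/../lib
```

With the additional entry in front, the first directory printed corresponds to `/usr/local/glibc/lib`, so a `libtirpc` there would be found before the one under `/usr/local/lib`.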
24c76e0
to
5334e89
Compare
🆒
/m