fix: zfs extensions with nvidia #410

Merged: 1 commit, Jun 12, 2024

@frezbo (Member) commented Jun 12, 2024

Introduce a proper fix for #401: keep the musl paths as is, and use /usr/local/glibc as the install path for all glibc-related files, so that any new libraries shipped by both musl- and glibc-based extensions will not cause conflicts in the future.

@frezbo frezbo force-pushed the fix/zfs-extensions branch from 3e9b4e8 to c9b9df1, June 12, 2024 07:58
@frezbo frezbo changed the title from "fix: zfs extensions" to "fix: zfs extensions with nvidia", Jun 12, 2024
Introduce a proper fix for siderolabs#401: keep the musl paths as is, and use
`/usr/local/glibc` as the install path for all glibc-related files so that
any new common libraries will not cause an issue in the future.

Signed-off-by: Noel Georgi <git@frezbo.dev>
@frezbo frezbo force-pushed the fix/zfs-extensions branch from c9b9df1 to 3526f45, June 12, 2024 08:00
@frezbo (Member, Author) commented Jun 12, 2024

/m

@talos-bot talos-bot merged commit 3526f45 into siderolabs:main Jun 12, 2024
14 checks passed
@frezbo frezbo deleted the fix/zfs-extensions branch June 12, 2024 11:27
frezbo added a commit to frezbo/extensions that referenced this pull request Jun 24, 2024
Set `glibc/lib` as the first `rpath` entry for `nvidia-container-cli`. Also
install the NVIDIA libraries to `/usr/local/glibc/lib` so that musl
libraries live separately.

Properly fixes: siderolabs#380

Fixes from siderolabs#401 and siderolabs#410 were not complete.

Manually tested by spinning up an NVIDIA worker in AWS.

Signed-off-by: Noel Georgi <git@frezbo.dev>
frezbo added a commit to frezbo/extensions that referenced this pull request Jun 24, 2024
Set `glibc/lib` as the first `rpath` entry for `nvidia-container-cli`. Also
install the NVIDIA libraries to `/usr/local/glibc/lib` so that musl
libraries live separately.

`nvidia-container-cli` explicitly sets an `RPATH` of `$ORIGIN/../$LIB` here:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/blob/v1.14.6/Makefile?ref_type=tags#L183
This means `/usr/local/lib` is searched first. Since both `zfs` and
NVIDIA ship their own `libtirpc`, `nvidia-container-cli` first tries to
use the `libtirpc` shipped with `zfs` at `/usr/local/lib` instead of
the one at `/usr/local/glibc/lib`. Fix this by prepending an additional
`RPATH` entry, `$ORIGIN/../glibc/$LIB`, so that libraries in
`/usr/local/glibc/lib` take precedence.
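A minimal bash sketch of the lookup order described above. It assumes `$LIB` expands to `lib` (as the `/usr/local/lib` path implies) and ignores everything else the real dynamic loader consults (`LD_LIBRARY_PATH`, `RUNPATH` semantics, `ld.so.cache`, default paths). `resolve_soname` is a hypothetical helper, the directory tree is a throwaway mock rather than a real rootfs, and `libtirpc.so.3` is used as the soname for illustration. The point is that the first `RPATH` directory containing the library wins, so prepending `$ORIGIN/../glibc/$LIB` changes which `libtirpc` is picked:

```bash
#!/usr/bin/env bash
# Hypothetical helper: walk a colon-separated RPATH in order, expanding
# $ORIGIN and $LIB (assumed to be "lib"), and print the first match.
resolve_soname() {
  local origin=$1 rpath=$2 soname=$3 entry dir
  local -a entries
  IFS=: read -r -a entries <<< "$rpath"
  for entry in "${entries[@]}"; do
    dir=${entry//'$ORIGIN'/$origin}   # literal $ORIGIN -> executable's dir
    dir=${dir//'$LIB'/lib}            # assumption: $LIB expands to "lib"
    if [[ -e $dir/$soname ]]; then
      printf '%s\n' "$dir/$soname"
      return 0
    fi
  done
  return 1
}

# Mock layout: one libtirpc from zfs (musl), one from NVIDIA (glibc).
root=$(mktemp -d)
mkdir -p "$root/usr/local/bin" "$root/usr/local/lib" "$root/usr/local/glibc/lib"
touch "$root/usr/local/lib/libtirpc.so.3"
touch "$root/usr/local/glibc/lib/libtirpc.so.3"
origin=$root/usr/local/bin   # where nvidia-container-cli would live

old=$(resolve_soname "$origin" '$ORIGIN/../$LIB' libtirpc.so.3)
new=$(resolve_soname "$origin" '$ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB' libtirpc.so.3)
echo "old RPATH picks: $old"   # the zfs copy under usr/local/lib
echo "new RPATH picks: $new"   # the glibc copy under usr/local/glibc/lib

rm -rf "$root"
```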

Properly fixes: siderolabs#380

Fixes from siderolabs#401 and siderolabs#410 were not complete.

Manually tested by spinning up an NVIDIA worker in AWS.

Signed-off-by: Noel Georgi <git@frezbo.dev>
frezbo added a commit to frezbo/extensions that referenced this pull request Jun 24, 2024
Set `glibc/lib` as the first `rpath` entry for `nvidia-container-cli`. Also
install the NVIDIA libraries to `/usr/local/glibc/lib` so that musl
libraries live separately.

`nvidia-container-cli` explicitly sets an `RPATH` of `$ORIGIN/../$LIB` here:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/blob/v1.14.6/Makefile?ref_type=tags#L183
This means `/usr/local/lib` is searched first. Since both `zfs` and
NVIDIA ship their own `libtirpc`, `nvidia-container-cli` first tries to
use the `libtirpc` shipped with `zfs` at `/usr/local/lib` instead of
the one at `/usr/local/glibc/lib`. Fix this by prepending an additional
`RPATH` entry, `$ORIGIN/../glibc/$LIB`, so that libraries in
`/usr/local/glibc/lib` take precedence.

```bash
❯ scanelf -r _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
 TYPE   RPATH FILE
ET_DYN $ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
```
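Because the fix depends only on entry order, a build could guard against regressions by checking that the first `RPATH` entry in the `scanelf -r` output is the glibc one. The sketch below parses a row in the format shown above; `first_rpath_is_glibc` is a hypothetical helper, and the row is copied from that output rather than produced by running `scanelf`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the first RPATH entry in a
# `scanelf -r` row ("TYPE RPATH FILE") is the glibc library directory.
first_rpath_is_glibc() {
  local row=$1 elf_type rpath file first
  read -r elf_type rpath file <<< "$row"
  first=${rpath%%:*}   # everything before the first ':'
  [[ $first == '$ORIGIN/../glibc/$LIB' ]]
}

row='ET_DYN $ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli'
if first_rpath_is_glibc "$row"; then
  echo "ok: glibc/lib is searched first"
else
  echo "fail: RPATH order is wrong" >&2
fi
```

Against a real build, the row could come from something like `scanelf -r <binary> | tail -n 1`, assuming the header-plus-row layout shown above.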

Properly fixes: siderolabs#380

Fixes from siderolabs#401 and siderolabs#410 were not complete.

Manually tested by spinning up an NVIDIA worker in AWS.

Signed-off-by: Noel Georgi <git@frezbo.dev>
jfroy pushed a commit to jfroy/siderolabs-extensions that referenced this pull request Sep 5, 2024