nvidia-container-toolkit: resolve nvidia-ctk static linking workaround #1525
…seccomp and libtirpc
* Switch to libtirpc from tirpc126, to match the libnvidia-container dependency on libtirpc
* Leverage pkg-config for automated build flag retrieval in 0002-OE-cross-build-fixups.patch: use pkg-config in 0002-OE-cross-build-fixups.patch to streamline the retrieval of build flags within the package build system, improving automation and maintainability
* Refresh patches index
Signed-off-by: Daniel Chaves <dchvs11@gmail.com>
…mp and libtirpc
* Leverage pkg-config for automated build flag retrieval: use pkg-config in 0001-OE-cross-build-fixups.patch to streamline the retrieval of build flags within the package build system, improving automation and maintainability
* Refresh patches
Signed-off-by: Daniel Chaves <dchvs11@gmail.com>
… uses Poky's libtirpc version Signed-off-by: Daniel Chaves <dchvs11@gmail.com>
… configuration As the libnvidia-container is compiled with the flag WITH_NVCGO=no Signed-off-by: Daniel Chaves <dchvs11@gmail.com>
…dependencies
* Add CUDA libraries (tegra-libraries-cuda) and GoRuntime dependencies to fix the static linking workaround that causes panic on nvidia-ctk startup. For more details refer to commit: OE4T@971f014
Signed-off-by: Daniel Chaves <dchvs11@gmail.com>
There appears to be more to this than just trying to switch away from static linking.

For the static linking workaround, upstream OE-Core has disabled dynamic linking for all Go packages on all target architectures now anyway, so I'm not sure it's worth trying to resolve that particular problem.

Disabling cgroupsv2 support doesn't sound like a good idea, based on what I see in the issue that you linked to, so I'm not sure why we'd want to do this. Can you provide more info on why you made that particular change?

There was a reason why we used the older libtirpc for libnvidia-container-jetson: there were changes in libtirpc that the old NVIDIA code wasn't compatible with, causing exceptions. If that's been resolved somehow, great, but really what I'd rather see is an adaptation of the patches 01ba56b for the 1.11 version of the toolkit, so we can do away with using "legacy" mode and drop the libnvidia-container-jetson recipe completely.
Hi @dchvs
Hi, guys Disabling
This issue might have another solution or explanation besides disabling

Regarding the static linking workaround, I came across this "goarch: disable dynamic linking globally" patch: OpenEmbedded-Core patch link. However, it appears that this change was reverted last week: Revert patch link. Is this the same issue within OE-Core that you mentioned?

In regard to the statement about using the older version of
On the cgroupsv2 problem, I saw those nvidia-docker issues, but it wasn't clear to me how they related. If you have more detail on how to reproduce the error message you are seeing, I'd like to see it, so we can get to the root cause. I've never run into that particular error myself.

Yep, I see that the shared-runtime changes got put back in OE-Core, heaven help us. It has never played well with cgo, though, in my experience.

As for the libtirpc, it was #760 that triggered the changes for using the older version. It sounds, though, like the underlying issue was a bug in the NVIDIA code that subsequently got fixed, so maybe we'll be OK with dropping them.
OK, I can reproduce the cgroups problem. Looks like it's a combination of the libnvidia-container library (maybe both the main one and the legacy Jetson-specific one, I'm not sure) expecting to see the legacy cgroupsv1 setup in the sysfs, and systemd having deprecated that setup and turned it off by default as of version 252. Adding
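For anyone trying to confirm which cgroup layout a target is actually running, the check can be sketched roughly as below. This is a minimal illustration and not part of the PR: it assumes the conventional `/sys/fs/cgroup` mount point and relies on the kernel exposing a `cgroup.controllers` file at the root of a unified (v2) hierarchy. The function name `cgroup_layout` and its return strings are hypothetical.

```python
from pathlib import Path


def cgroup_layout(root="/sys/fs/cgroup"):
    """Guess the cgroup layout mounted at `root` (hypothetical helper).

    Returns "unified" when a cgroup v2 hierarchy is mounted at `root`,
    "legacy-or-hybrid" when the directory exists without the v2 marker
    file, and "unknown" when nothing is there at all.
    """
    base = Path(root)
    # A cgroup v2 root directory contains a cgroup.controllers file.
    if (base / "cgroup.controllers").is_file():
        return "unified"
    # Assumption: any other existing directory is a v1 or hybrid setup.
    if base.is_dir():
        return "legacy-or-hybrid"
    return "unknown"


if __name__ == "__main__":
    print(cgroup_layout())
```

Running this on the target before and after changing the systemd cgroup configuration should show whether the legacy hierarchy that the library looks for is actually present.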
@dchvs Could you take a look at #1541, which eliminates the libnvidia-container-jetson library completely and allows containers to run with cgroupsv2, so you don't have to disable cgroup support (or the unified cgroup hierarchy in systemd)? The old version of libtirpc is also dropped there, since it was used only for the libnvidia-container-jetson stuff. The PR still links the go binaries statically, but otherwise should solve the problems you were seeing.
Sure, will do!
See #1541
Include CUDA libraries (`tegra-libraries-cuda`) and GoRuntime dependencies into NVIDIA Container Toolkit to resolve the static linking workaround that addresses the panic issue during the startup of `nvidia-ctk`. For further information, please refer to the commit: 971f014

Details about the `nvidia-ctk` and the linking of libraries after these changes: