fix env var extraction #2043

Merged
merged 1 commit into from
Nov 14, 2024

winglian
Collaborator

Description

On RunPod, connecting over direct SSH often leaves the environment in an incorrect state. This PR fixes which env vars we capture and sets them correctly in the environment on login.

@@ -2,7 +2,7 @@

 # Export specific ENV variables to /etc/rp_environment
 echo "Exporting environment variables..."
-printenv | grep -E '^RUNPOD_|^PATH=|^_=' | sed 's/^\(.*\)=\(.*\)$/export \1="\2"/' >> /etc/rp_environment
+printenv | grep -E '^HF_|^BNB_|^CUDA_|^NCCL_|^NV|^RUNPOD_|^PATH=|^_=' | sed 's/^\([^=]*\)=\(.*\)$/export \1="\2"/' | grep -v 'printenv' >> /etc/rp_environment
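
Besides the longer prefix list, the sed capture group changes from `\(.*\)` to `\([^=]*\)`. A minimal sketch of why that matters, using one of the values quoted later in this thread as sample input (the variable names `line`, `old`, and `new` are only for illustration): the greedy pattern splits at the last `=` on the line, while the new one splits at the first.

```shell
# Sample line whose value itself contains '=' characters.
line='NV_LIBNCCL_PACKAGE=libnccl2=2.21.5-1+cuda12.4'

# Old pattern: greedy \(.*\) runs up to the LAST '=', so part of the
# value ends up outside the quotes, attached to the name.
old=$(printf '%s\n' "$line" | sed 's/^\(.*\)=\(.*\)$/export \1="\2"/')

# New pattern: \([^=]*\) stops at the FIRST '=', so the whole value
# stays inside the quotes.
new=$(printf '%s\n' "$line" | sed 's/^\([^=]*\)=\(.*\)$/export \1="\2"/')

printf 'old: %s\nnew: %s\n' "$old" "$new"
# old: export NV_LIBNCCL_PACKAGE=libnccl2="2.21.5-1+cuda12.4"
# new: export NV_LIBNCCL_PACKAGE="libnccl2=2.21.5-1+cuda12.4"
```

The old pattern only misbehaves on values containing `=`, which is presumably why it went unnoticed while the filter exported far fewer variables.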
Collaborator

What is NV trying to catch here? Could it be a bit more specific (longer)? I'm concerned it may catch something we don't want.

Collaborator Author

NV_LIBCUBLAS_VERSION=12.4.5.8-1
NVIDIA_VISIBLE_DEVICES=all
NV_NVML_DEV_VERSION=12.4.127-1
NV_CUDNN_PACKAGE_NAME=libcudnn9-cuda-12
NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.21.5-1+cuda12.4
NV_LIBNCCL_DEV_PACKAGE_VERSION=2.21.5-1
NVIDIA_REQUIRE_CUDA=cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536
NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-12-4=12.4.5.8-1
NV_NVTX_VERSION=12.4.127-1
NV_CUDA_CUDART_DEV_VERSION=12.4.127-1
NV_LIBCUSPARSE_VERSION=12.3.1.170-1
NV_LIBNPP_VERSION=12.2.5.30-1
NV_CUDNN_PACKAGE=libcudnn9-cuda-12=9.1.0.70-1
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_NVPROF_DEV_PACKAGE=cuda-nvprof-12-4=12.4.127-1
NV_LIBNPP_PACKAGE=libnpp-12-4=12.2.5.30-1
NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
NV_LIBCUBLAS_DEV_VERSION=12.4.5.8-1
NVIDIA_PRODUCT_NAME=CUDA
NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-12-4
NV_CUDA_CUDART_VERSION=12.4.127-1
NV_LIBCUBLAS_PACKAGE=libcublas-12-4=12.4.5.8-1
NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE=cuda-nsight-compute-12-4=12.4.1-1
NV_LIBNPP_DEV_PACKAGE=libnpp-dev-12-4=12.2.5.30-1
NV_LIBCUBLAS_PACKAGE_NAME=libcublas-12-4
NV_LIBNPP_DEV_VERSION=12.2.5.30-1
NV_LIBCUSPARSE_DEV_VERSION=12.3.1.170-1
NV_CUDNN_VERSION=9.1.0.70-1
NV_CUDA_LIB_VERSION=12.4.1-1
NVARCH=x86_64
NV_CUDNN_PACKAGE_DEV=libcudnn9-dev-cuda-12=9.1.0.70-1
NV_CUDA_COMPAT_PACKAGE=cuda-compat-12-4
NV_LIBNCCL_PACKAGE=libnccl2=2.21.5-1+cuda12.4
NV_CUDA_NSIGHT_COMPUTE_VERSION=12.4.1-1
NV_NVPROF_VERSION=12.4.127-1
NV_LIBNCCL_PACKAGE_NAME=libnccl2
NV_LIBNCCL_PACKAGE_VERSION=2.21.5-1

Collaborator Author

We could alternatively be more explicit with ^NV_|^NVIDIA_
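
To make the trade-off concrete, here is a small sketch comparing the two filters on three names taken from the listing above (the variable names `sample`, `broad`, and `explicit` are only for illustration). The one practical difference in that listing is `NVARCH`, which has no underscore after `NV`; whether it is actually needed on login is not established in this thread.

```shell
# Three entries copied from the printenv listing above.
sample='NV_CUDNN_VERSION=9.1.0.70-1
NVIDIA_VISIBLE_DEVICES=all
NVARCH=x86_64'

# Broad prefix as merged: anything starting with NV.
broad=$(printf '%s\n' "$sample" | grep -E '^NV' | cut -d= -f1)

# More explicit alternative: only NV_ and NVIDIA_ prefixes.
explicit=$(printf '%s\n' "$sample" | grep -E '^NV_|^NVIDIA_' | cut -d= -f1)

printf 'broad:\n%s\n\nexplicit:\n%s\n' "$broad" "$explicit"
# broad matches all three names; explicit drops NVARCH.
```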

Collaborator

Do we set any of these? I don't recall seeing them.

Collaborator Author

They don't seem to be set when logging in via direct SSH, but they are needed.

@winglian winglian merged commit f3a5d11 into main Nov 14, 2024
bursteratom pushed a commit that referenced this pull request Nov 18, 2024
djsaunde pushed a commit that referenced this pull request Dec 17, 2024