GPU is not detected in R, but appears in python. #1456

Closed
evanliu3594 opened this issue Jun 11, 2024 · 9 comments

Comments

@evanliu3594

Hi there,

I recently started moving my training environment to WSL2 to keep pace with keras3.

After following the installation guide, I successfully installed TensorFlow into my conda environment with the command:

keras3::install_keras(envname = "~/pyEnv/keras", backend = "tensorflow",  gpu = T)

However, when I checked tf$config in R, I found that the GPU was not detected.

> tf$config$list_physical_devices()
2024-06-12 02:16:24.128849: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-12 02:16:24.668747: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-12 02:16:25.456112: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[[1]]
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')

I tested some code and Keras worked just fine on the CPU.

Then I turned to Python to get more details. Surprisingly, the GPU just showed up there.

evan@DESKTOP-KGBNUBC:~$ conda activate keras
(/home/evan/pyEnv/keras) evan@DESKTOP-KGBNUBC:~$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-06-12 02:21:15.036500: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-12 02:21:15.538230: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-12 02:21:16.242746: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-12 02:21:16.271831: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-06-12 02:21:16.271904: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

I googled for a while and found nothing similar to this. Is it that I shouldn't install TF into a conda environment?

Thanks in advance for any advice.

Session info is here:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8   
 [6] LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tensorflow_2.16.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.12       lattice_0.20-45   png_0.1-8         withr_3.0.0       zeallot_0.1.0     rappdirs_0.3.3   
 [7] R6_2.5.1          grid_4.1.2        lifecycle_1.0.4   jsonlite_1.8.8    magrittr_2.0.3    tfruns_1.5.3     
[13] rlang_1.1.4       cli_3.6.2         fs_1.6.4          rstudioapi_0.16.0 whisker_0.4.1     keras3_1.0.0     
[19] Matrix_1.4-0      reticulate_1.37.0 generics_0.1.3    keras_2.15.0      tools_4.1.2       glue_1.7.0       
[25] compiler_4.1.2    base64enc_0.1-3  
@t-kalinowski
Member

Can you confirm that the R session is indeed finding the correct python env? What is the output of reticulate::py_config()?

@evanliu3594
Author

> Can you confirm that the R session is indeed finding the correct python env? What is the output of reticulate::py_config()?

Yes, I only created one conda env, called keras.

> reticulate::py_config()
python:         /home/evan/pyEnv/keras/bin/python
libpython:      /home/evan/pyEnv/keras/lib/libpython3.11.so
pythonhome:     /home/evan/pyEnv/keras:/home/evan/pyEnv/keras
version:        3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
numpy:          /home/evan/pyEnv/keras/lib/python3.11/site-packages/numpy
numpy_version:  1.26.4
keras:          /home/evan/pyEnv/keras/lib/python3.11/site-packages/keras

NOTE: Python version was forced by use_python() function
> tf$config$list_physical_devices()
[[1]]
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')

@t-kalinowski
Member

What a curious bug, thanks for reporting.

Just to rule some things out:

  • Do you have any startup code in .Rprofile or .Renviron that might be interfering with GPU visibility? What is the output from Sys.getenv("CUDA_VISIBLE_DEVICES") in R?
  • Does the same happen outside conda? Can you try with a venv and see if things work that way?
R -q -e 'keras3::install_keras()'
R -q -e 'library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()'

@evanliu3594
Author

evanliu3594 commented Jun 11, 2024

> What a curious bug, thanks for reporting.
>
> Just to rule some things out:
>
>   • Do you have any startup code in .Rprofile or .Renviron that might be interfering with GPU visibility? What is the output from Sys.getenv("CUDA_VISIBLE_DEVICES") in R?
>   • Does the same happen outside conda? Can you try with a venv and see if things work that way?
> R -q -e 'keras3::install_keras()'
> R -q -e 'library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()'

Thanks for the reply.
I only use .Rprofile to set the CRAN repo to a nearer mirror to speed up downloads, so it is quite clean.

> Sys.getenv("CUDA_VISIBLE_DEVICES")
[1] ""

I tried the shell commands to install keras, and it ended up the same.

evan@DESKTOP-KGBNUBC:~$ R -q -e 'library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()'
> library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()
2024-06-12 03:25:20.236712: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-12 03:25:20.782594: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-06-12 03:25:21.546573: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[[1]]
PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')

One difference is that in this run, install_keras() with default params did not install the CUDA packages or the GPU build of TensorFlow (tensorflow-cpu was installed instead).
It seems R truly did not detect my GPU.

evan@DESKTOP-KGBNUBC:~$ source .virtualenvs/r-keras/bin/activate
(r-keras) evan@DESKTOP-KGBNUBC:~$ pip list | grep tensor
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorflow-cpu               2.16.1
tensorflow-datasets          4.9.6
tensorflow-io-gcs-filesystem 0.37.0
tensorflow-metadata          1.15.0
(r-keras) evan@DESKTOP-KGBNUBC:~$ pip list | grep cuda
(r-keras) evan@DESKTOP-KGBNUBC:~$
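
(As an aside, not something run in this thread: a GPU-enabled pip install of TF 2.16 is normally requested with the "and-cuda" extra, which pulls in the nvidia-* wheels; shown here only to contrast with the tensorflow-cpu install above.)

pip install 'tensorflow[and-cuda]'
pip list | grep -E 'tensorflow|nvidia'   # would then list nvidia-cudnn-cu12, nvidia-cublas-cu12, etc.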

I dug a little and found that lspci can't see the GPU in the WSL2 Ubuntu 22.04, but nvidia-smi works.

evan@DESKTOP-KGBNUBC:~$ lspci
4d66:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio console (rev 01)
6e30:00:00.0 System peripheral: Red Hat, Inc. Virtio file system (rev 01)
d98b:00:00.0 3D controller: Microsoft Corporation Device 008e
evan@DESKTOP-KGBNUBC:~$ nvidia-smi
Wed Jun 12 03:43:34 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     On  |   00000000:01:00.0  On |                  N/A |
| 39%   37C    P8             10W /  160W |    1657MiB /   8188MiB |     20%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        66      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Then I installed keras again with gpu = TRUE, which unsurprisingly resulted in the same problem as before: the GPU is missing in R, but appears in Python. 🤦‍♂️

@t-kalinowski
Member

I'll try to get on a Windows machine tomorrow and see if I can reproduce.

@evanliu3594
Author

evanliu3594 commented Jun 12, 2024

Just an update on what I've tried.

After a whole system reinstall (including the WSL Ubuntu), I found that I couldn't see the GPU in Python either.
Sorry for leaving this out earlier, but only then did I recall that before using the R function install_keras(), I had used pip to install the tensorflow package and had added some lines to the conda activate.d bash script to add the NVIDIA libraries to the environment.

NVIDIA_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)")))

for dir in $NVIDIA_DIR/*; do
    if [ -d "$dir/lib" ]; then
        export LD_LIBRARY_PATH="$dir/lib:$LD_LIBRARY_PATH"
    fi
done
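
(For context: conda sources every *.sh file under the env's etc/conda/activate.d/ directory on activation, so these lines presumably live in a file like the one below; the exact path and file name are assumptions based on the env shown earlier, not something stated above.)

# Hypothetical location for the snippet above; conda runs it on "conda activate keras"
mkdir -p ~/pyEnv/keras/etc/conda/activate.d
nano ~/pyEnv/keras/etc/conda/activate.d/nvidia_paths.sh   # paste the lines above into this file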

I'm not sure whether this is what makes Python able to see the GPU, but it apparently does not affect R.

@t-kalinowski
Member

Thanks, I can reproduce. This seems to be specific to TF 2.16; the GPU is visible with an identical setup using TF 2.15.

It seems that we need to do some more work on WSL to help TensorFlow discover the NVIDIA shared libraries. (Note: we already work around some deficiencies by creating symlinks to the NVIDIA shared libraries in the TensorFlow virtualenv. This works on Linux, but is apparently not sufficient on WSL.)

For now, you can fix it by running this in WSL before starting the R session (or by setting the env vars in the R session before reticulate has initialized Python).

#!/bin/sh

# Store original LD_LIBRARY_PATH 
export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" 

# Get the CUDNN directory 
CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))

# Set LD_LIBRARY_PATH to include CUDNN directory
export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Get the ptxas directory  
PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))

# Set PATH to include the directory containing ptxas
export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH:+:${PATH}}

from: https://discuss.tensorflow.org/t/what-versions-of-cuda-and-cudnn-are-required-for-tensorflow-2-16/24711/3
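
For example (a minimal sketch, assuming the script above is saved as ~/tf-gpu-env.sh; the nvidia.* imports in it need a Python on PATH that has those wheels installed, hence activating the virtualenv first):

source ~/.virtualenvs/r-keras/bin/activate
source ~/tf-gpu-env.sh
R -q -e 'library(reticulate); use_virtualenv("r-keras"); import("tensorflow")$config$list_physical_devices()'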

Note, there is nothing specific to conda here. We still recommend using a virtualenv if possible.

I'll push an update soon making sure that the R package does this work so users don't have to.

@evanliu3594
Author

> #!/bin/sh
>
> # Store original LD_LIBRARY_PATH
> export ORIGINAL_LD_LIBRARY_PATH="${LD_LIBRARY_PATH}"
>
> # Get the CUDNN directory
> CUDNN_DIR=$(dirname $(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)")))
>
> # Set LD_LIBRARY_PATH to include CUDNN directory
> export LD_LIBRARY_PATH=$(find ${CUDNN_DIR}/*/lib/ -type d -printf "%p:")${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
>
> # Get the ptxas directory
> PTXAS_DIR=$(dirname $(dirname $(python -c "import nvidia.cuda_nvcc; print(nvidia.cuda_nvcc.__file__)")))
>
> # Set PATH to include the directory containing ptxas
> export PATH=$(find ${PTXAS_DIR}/*/bin/ -type d -printf "%p:")${PATH:+:${PATH}}

Thanks a lot! That saves me from learning python again...😂

@t-kalinowski
Member

This is fixed on main now; the workaround should no longer be necessary. Please install the development version and reinstall keras + tensorflow to test it out.

remotes::install_github("rstudio/keras3")
keras3::install_keras()
# new R session
library(keras3) # load hook hints to reticulate to use_virtualenv("r-keras")
tensorflow::tf$config$list_physical_devices()
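
After the reinstall, the same one-liner pattern used earlier in this thread should now list a GPU device in addition to the CPU, for example:

R -q -e 'library(keras3); tensorflow::tf$config$list_physical_devices()'
# expected to include, by analogy with the Python output above:
# PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')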
