This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

ParlAI 1.7.0 on WSL 2 (Python 3.8.10): CUDA_HOME environment variable not set #4778

Closed
ArEnSc opened this issue Aug 31, 2022 · 11 comments

Comments

ArEnSc commented Aug 31, 2022

Bug description
Running parlai from the shell on WSL 2 fails with a "CUDA_HOME environment variable not set" error.

Reproduction steps
Install ParlAI and its dependencies from a Jupyter notebook in VS Code:

!pip3 install parlai==1.7.0
!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
!pip3 install transformers

[screenshot: "CUDA_HOME environment variable not set" error]

Notice that CUDA still evaluates as available here; it seems ParlAI is looking for it in the wrong place.
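
For reference, this is roughly what the screenshot showed (a minimal sketch using the standard torch and os APIs; the exact notebook cell contents are an assumption):

```python
import os
import torch

# CUDA itself is perfectly usable from PyTorch in this environment...
print(torch.cuda.is_available())      # True
print(torch.cuda.get_device_name(0))  # "NVIDIA GeForce RTX 3080"

# ...but the variable the C++ extension build looks for may be unset,
# or may resolve to the Windows toolkit mounted under /mnt/c
print(os.environ.get("CUDA_HOME"))
```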

Expected behavior
Installing parlai 1.6.0 does not produce the error; in 1.6.0 everything works, and CUDA likewise evaluates as available.
[screenshot: ParlAI 1.6.0 working]
Logs

No command-line output was pasted here; the full build traceback appears in the comments below.

Additional context
This probably blocks ParlAI usage on WSL 2.
cc @klshuster

klshuster (Contributor) commented:

CC @dexterju27, this seems related to the ngram blocking code -- any thoughts on why this is occurring?

dexterju27 (Contributor) commented:

ParlAI 1.7.0 uses PyTorch's C++ extension feature to allow custom CUDA kernels. Could you look at pytorch/extension-cpp#26 and see if it fixes your problem?
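
For context, the load in question (the module and source names appear in the traceback later in this thread) follows PyTorch's JIT-compilation pattern; a minimal sketch, with build flags omitted:

```python
from torch.utils.cpp_extension import load

# Compiled with nvcc via ninja the first time it is imported; this is the
# step that needs a working CUDA toolkit, located through CUDA_HOME.
ngram_repeat_block_cuda = load(
    name="ngram_repeat_block_cuda",
    sources=["parlai/clib/cuda/ngram_repeat_block_cuda_kernel.cu"],
    verbose=True,
)
```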

ArEnSc (Author) commented Sep 1, 2022

> ParlAI 1.7.0 uses PyTorch's C++ extension feature to allow custom CUDA kernels. Could you look at pytorch/extension-cpp#26 and see if it fixes your problem?

I'll give this a shot. Looking through os.environ, I believe I found the paths to CUDA_HOME.
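
(For anyone following along: one way to steer the lookup is to export the variable before the extension is first built. A sketch; /usr/local/cuda is the conventional Linux install location and is an assumption here:)

```python
import os

# Must point at a native Linux toolkit, not the Windows one under /mnt/c.
os.environ["CUDA_HOME"] = "/usr/local/cuda"  # assumed install location

# torch.utils.cpp_extension reads CUDA_HOME when the kernel is JIT-built,
# so this has to run before ParlAI triggers the build.
```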

ArEnSc (Author) commented Sep 1, 2022

@dexterju27 @klshuster
OK, so I did this and it now finds the required path; however, I think it's picking up the wrong binary.
I am on WSL 2, and in my case bin/nvcc is a Windows .exe.
So I'm wondering whether I need to install a separate CUDA toolkit inside the WSL 2 Ubuntu container?

/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning: 
NVIDIA GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/bin/parlai", line 8, in <module>
    sys.exit(main())
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/__main__.py", line 14, in main
    superscript_main()
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/core/script.py", line 247, in superscript_main
    setup_script_registry()
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/core/script.py", line 37, in setup_script_registry
    importlib.import_module(module.name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/scripts/detect_offensive_language.py", line 19, in <module>
    from parlai.utils.safety import OffensiveStringMatcher, OffensiveLanguageClassifier
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/utils/safety.py", line 10, in <module>
    from parlai.agents.transformer.transformer import TransformerClassifierAgent
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/agents/transformer/transformer.py", line 15, in <module>
    from parlai.core.torch_generator_agent import TorchGeneratorAgent
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/core/torch_generator_agent.py", line 48, in <module>
    from parlai.ops.ngram_repeat_block import NGramRepeatBlock
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/ops/ngram_repeat_block.py", line 23, in <module>
    ngram_repeat_block_cuda = load(
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'ngram_repeat_block_cuda': [1/2] /mnt/c/Program\ Files/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.4/bin/nvcc  -DTORCH_EXTENSION_NAME=ngram_repeat_block_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include/TH -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include/THC -isystem /mnt/c/Program\ Files/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.4/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/clib/cuda/ngram_repeat_block_cuda_kernel.cu -o ngram_repeat_block_cuda_kernel.cuda.o 
FAILED: ngram_repeat_block_cuda_kernel.cuda.o 
/mnt/c/Program\ Files/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.4/bin/nvcc  -DTORCH_EXTENSION_NAME=ngram_repeat_block_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include/TH -isystem /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/torch/include/THC -isystem /mnt/c/Program\ Files/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.4/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/tensor/code/WESpeechSynthesisProductionOnnx/eng-env/lib/python3.8/site-packages/parlai/clib/cuda/ngram_repeat_block_cuda_kernel.cu -o ngram_repeat_block_cuda_kernel.cuda.o 
/bin/sh: 1: /mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4/bin/nvcc: not found
ninja: build stopped: subcommand failed.
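
(A quick way to confirm which toolkit the build is picking up, using only the standard library; the /mnt/c paths match the failing command above:)

```python
import os
import shutil

# Which nvcc is first on PATH, and what does CUDA_HOME resolve to?
print(shutil.which("nvcc"))
print(os.environ.get("CUDA_HOME"))

# Under WSL 2 the Windows toolkit leaks in through /mnt/c PATH entries;
# its nvcc.exe cannot be executed by /bin/sh, hence "nvcc: not found".
for p in os.environ["PATH"].split(os.pathsep):
    if p.startswith("/mnt/c"):
        print("inherited from Windows:", p)
```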

ArEnSc (Author) commented Sep 6, 2022

@dexterju27 just following up on this, thanks!

dexterju27 (Contributor) commented:

I don't have access to a WSL 2 Ubuntu machine, i.e. I can't reproduce the issue you are having.

I'm deeply confused by what you are saying here: if you are using WSL 2 Ubuntu, why would your bin/nvcc be a Windows .exe? Isn't that supposed to be a Linux executable?

Could you try installing CUDA properly in Ubuntu? For example: https://askubuntu.com/questions/1280205/problem-while-installing-cuda-toolkit-in-ubuntu-18-04/1315116#1315116?newreg=ec85792ef03b446297a665e21fff5735
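
(After a native install, a quick sanity check; a sketch assuming the default /usr/local/cuda location:)

```python
import subprocess

# A native Linux nvcc should print its version here; the Windows nvcc.exe
# under /mnt/c is what /bin/sh fails to execute in the log above.
subprocess.run(["/usr/local/cuda/bin/nvcc", "--version"], check=True)
```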

ArEnSc (Author) commented Sep 6, 2022

> I don't have access to a WSL 2 Ubuntu machine, i.e. I can't reproduce the issue you are having.
>
> I'm deeply confused by what you are saying here: if you are using WSL 2 Ubuntu, why would your bin/nvcc be a Windows .exe? Isn't that supposed to be a Linux executable?
>
> Could you try installing CUDA properly in Ubuntu? For example: https://askubuntu.com/questions/1280205/problem-while-installing-cuda-toolkit-in-ubuntu-18-04/1315116#1315116?newreg=ec85792ef03b446297a665e21fff5735

That is what confuses me as well; WSL seems to pick up the Windows DLLs and binaries. I'll give your suggestion a shot.

dexterju27 (Contributor) commented:

Notice that in the error messages you posted, PyTorch also complained:

> NVIDIA GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
> The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
> If you want to use the NVIDIA GeForce RTX 3080 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

It seems there are issues with your PyTorch installation as well.

I'm closing this issue for now, since I don't think it is ParlAI-related.
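
(The capability mismatch in that warning can be checked directly; a short sketch using standard torch APIs:)

```python
import torch

# Architectures the installed wheel was built for vs. the GPU's own.
print(torch.cuda.get_arch_list())           # e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70']
print(torch.cuda.get_device_capability(0))  # (8, 6) for an RTX 3080, i.e. sm_86
```

A wheel from the cu116 index (as in the reproduction steps) would include sm_86, so the warning suggests a different torch build ended up in this environment.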

ArEnSc (Author) commented Sep 6, 2022

@dexterju27 That is a red herring: I have no problem using CUDA at all. It evaluates as available, I can use it in the notebook, and I have trained a voice-to-voice model with it. I can assure you this is definitely related to WSL 2, but it's fine if you want to close the issue. I'll give your suggestion a shot, and if it doesn't work I'll just not use the framework.

klshuster (Contributor) commented:

@ArEnSc We pushed a fix in #4779 that should guard this import. We have not made an official release yet, but if you want to pull that change into your installed package and see if it works, it may be of use.
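
(Illustratively, a guard of this kind usually wraps the JIT build so a failure degrades gracefully instead of crashing at import time; this is a sketch of the idea, not the actual #4779 diff:)

```python
try:
    from torch.utils.cpp_extension import load
    ngram_repeat_block_cuda = load(
        name="ngram_repeat_block_cuda",
        sources=["parlai/clib/cuda/ngram_repeat_block_cuda_kernel.cu"],
    )
except (ImportError, OSError, RuntimeError):
    # nvcc/ninja unavailable (e.g. this WSL 2 setup): fall back to a
    # slower pure-Python n-gram blocking path rather than failing hard.
    ngram_repeat_block_cuda = None
```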

ArEnSc (Author) commented Sep 13, 2022

@klshuster it worked, thanks! =D
