CUDA12.6, gcc9.4, but "There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found" #272

Open
Huilin-Li opened this issue Nov 3, 2024 · 5 comments


@Huilin-Li

Huilin-Li commented Nov 3, 2024

Caution: Please only report your issue related to the installation on your local PC or macOS. If you can get the help message by colabfold_batch --help or run a test prediction successfully, your installation is successful. Requests or questions regarding ColabFold features should be directed to the ColabFold repository's issues.


What is your installation issue?
I first ran `colabfold_batch myfa0.fa myfa0_out --msa-only`, which worked. However, the full prediction then failed:

(/storage/shenhuaizhongLab/lihuilin/mycolabfold/localcolabfold/colabfold-conda) [lihuilin@ga40q08 apgfastas]$ colabfold_batch myfa0.fa myfa0_out
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1730616792.448210 2934203 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1730616792.452606 2934203 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-03 14:53:24,193 Running colabfold 1.5.5 (c21e1768d18e3608e6e6d99c97134317e7e41c75)

WARNING: You are welcome to use the default MSA server, however keep in mind that it's a
limited shared resource only capable of processing a few thousand MSAs per day. Please
submit jobs only from a single IP address. We reserve the right to limit access to the
server case-by-case when usage exceeds fair use. If you require more MSAs: You can
precompute all MSAs with `colabfold_search` or host your own API and pass it to `--host-url`

2024-11-03 14:53:24,471 Running on GPU
2024-11-03 14:53:26,236 Found 5 citations for tools or databases
2024-11-03 14:53:26,236 Query 1/30: aaalA (length 309)
2024-11-03 14:53:26,262 Loaded myfa0_out/aaalA.pickle
E1103 14:53:29.987652 2934203 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
E1103 14:53:29.988284 2934203 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
2024-11-03 14:53:30,205 Could not predict aaalA. Not Enough GPU memory? FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
2024-11-03 14:53:30,206 Query 2/30: aaavA (length 418)
2024-11-03 14:53:30,642 Loaded myfa0_out/aaavA.pickle
^Z
[2]+  Stopped                 colabfold_batch myfa0.fa myfa0_out
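Because colabfold_batch logs a failed query and moves on to the next one (as with query 2/30 above), it is easy to miss how many predictions in a long batch actually failed. A minimal sketch for pulling the failed query names out of a saved log, assuming the `Could not predict <name>.` line format shown above; `failed_queries` is a hypothetical helper, not part of ColabFold.

```python
import re
from typing import List

# Matches the failure line colabfold_batch printed above, e.g.
# "2024-11-03 14:53:30,205 Could not predict aaalA. Not Enough GPU memory? ..."
# Assumption: query names contain no whitespace (true for the logs in this thread).
FAILED_RE = re.compile(r"Could not predict (\S+?)\. ")


def failed_queries(log_text: str) -> List[str]:
    """Return the query names colabfold_batch reported as failed, in log order."""
    return FAILED_RE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "2024-11-03 14:53:26,236 Query 1/30: aaalA (length 309)\n"
        "2024-11-03 14:53:30,205 Could not predict aaalA. Not Enough GPU memory? "
        "FAILED_PRECONDITION: DNN library initialization failed.\n"
        "2024-11-03 14:53:30,206 Query 2/30: aaavA (length 418)\n"
    )
    print(failed_queries(sample))  # ['aaalA']
```

Saving the run's stderr to a file (e.g. with `2> run.log`) and passing its contents to `failed_queries` gives the list of queries to rerun once the cuDNN problem is fixed.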

Computational environment

  • OS: not stated (HPC cluster)
  • CUDA version: 12.6, with gcc 9.4 (from the issue title; the /usr/local/cuda/bin/nvcc --version output was not included)


@Huilin-Li Huilin-Li changed the title Question:There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found CUDA12.6, gcc9.4, but "There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found" Nov 5, 2024
@punit-jha123

I am getting the same error! Did you manage to fix this?

My `nvcc -V` reports 12.1, and gcc is 10.2.1-6 (Debian).

@Huilin-Li
Author

As I recall, the problem in my case was that I had not activated the CUDA environment on my HPC system when I installed localcolabfold.
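On HPC systems the CUDA toolkit is typically made visible by loading an environment module, so it can help to confirm it is actually active in the shell before installing or running. A minimal stdlib sketch; the three checks (nvcc on PATH, CUDA_HOME or CUDA_PATH set, a CUDA directory in LD_LIBRARY_PATH) are heuristic assumptions, not documented localcolabfold requirements, and `cuda_env_report` is a hypothetical helper.

```python
import os
import shutil
from typing import Callable, Dict, List, Mapping, Optional


def cuda_env_report(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, object]:
    """Summarize whether a CUDA toolkit looks active in the current environment.

    Heuristics only (assumptions, not ColabFold requirements):
    - nvcc is found on PATH,
    - CUDA_HOME or CUDA_PATH is set,
    - some LD_LIBRARY_PATH entry mentions "cuda".
    """
    ld_entries: List[str] = [
        p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p
    ]
    return {
        "nvcc": which("nvcc"),
        "cuda_home": env.get("CUDA_HOME") or env.get("CUDA_PATH"),
        "cuda_lib_dirs": [p for p in ld_entries if "cuda" in p.lower()],
    }


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

The `env` and `which` parameters are injectable only so the checks can be exercised without a real CUDA install; in normal use, call `cuda_env_report()` with no arguments after loading your cluster's CUDA module.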

@Guillem-Roche

Hi, I'm working on an HPC cluster and have the same problem. Using a conda installation, I got to the point where
colabfold_batch A0A023I7F4.fasta test_smallprot --msa-only
works, but without the --msa-only argument it gives the same error:

colabfold_batch A0A023I7F4.fasta test_smallprot
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1736909380.198550 2049982 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1736909380.203537 2049982 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-14 21:49:42,354 Running colabfold 1.5.5 (16536057f3041920fa7150439182c0affcc4c947)

WARNING: You are welcome to use the default MSA server, however keep in mind that it's a
limited shared resource only capable of processing a few thousand MSAs per day. Please
submit jobs only from a single IP address. We reserve the right to limit access to the
server case-by-case when usage exceeds fair use. If you require more MSAs: You can
precompute all MSAs with `colabfold_search` or host your own API and pass it to `--host-url`

2025-01-14 21:49:42,577 Running on GPU
2025-01-14 21:49:44,647 Found 5 citations for tools or databases
2025-01-14 21:49:44,647 Query 1/1: tr_A0A023I7F4_A0A023I7F4_HUMAN_Cytochrome_b_OS_Homo_sapiens_OX_9606_GN_CYTB_PE_3_SV_1 (length 380)
2025-01-14 21:49:44,660 Loaded test_smallprot/tr_A0A023I7F4_A0A023I7F4_HUMAN_Cytochrome_b_OS_Homo_sapiens_OX_9606_GN_CYTB_PE_3_SV_1.pickle
2025-01-14 21:49:48,196 Setting max_seq=512, max_extra_seq=5120
E0114 21:49:48.373238 2049982 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
E0114 21:49:48.373853 2049982 cuda_dnn.cc:502] There was an error before creating cudnn handle (500): cudaErrorSymbolNotFound : named symbol not found
2025-01-14 21:49:48,382 Could not predict tr_A0A023I7F4_A0A023I7F4_HUMAN_Cytochrome_b_OS_Homo_sapiens_OX_9606_GN_CYTB_PE_3_SV_1. Not Enough GPU memory? FAILED_PRECONDITION: DNN library initialization failed. Look at the errors above for more details.
2025-01-14 21:49:48,382 Done

I tried loading a CUDA module first, but it didn't help. I also tried matching the cuDNN and CUDA versions to the TensorFlow version following this table, but I still get the same error.
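Matching versions against a compatibility table is easy to get wrong by hand, because cuDNN and TensorFlow report full versions (e.g. 8.9.7) while such tables usually list only major.minor. A minimal sketch of that comparison; the table here is hypothetical and user-maintained, and its single entry is only the combination a later commenter in this thread reported as working (TensorFlow 2.16.1 with cuDNN 8.9), not an official mapping. `check_cudnn` and `EXPECTED_CUDNN` are illustrative names.

```python
from typing import Dict, Optional


def major_minor(version: str) -> str:
    """Reduce a version string to major.minor, e.g. '2.16.1' -> '2.16'."""
    return ".".join(version.split(".")[:2])


# Hypothetical table: TensorFlow major.minor -> expected cuDNN major.minor.
# Only entry: the pair reported as working later in this thread. Extend it
# from the official tested-build table for your own versions.
EXPECTED_CUDNN: Dict[str, str] = {"2.16": "8.9"}


def check_cudnn(
    tf_version: str, cudnn_version: str, table: Dict[str, str] = EXPECTED_CUDNN
) -> Optional[bool]:
    """True/False if the pair matches the table; None if TF is not listed."""
    expected = table.get(major_minor(tf_version))
    if expected is None:
        return None
    return major_minor(cudnn_version) == expected


if __name__ == "__main__":
    print(check_cudnn("2.16.1", "8.9.7"))  # True
    print(check_cudnn("2.16.1", "9.1.0"))  # False
    print(check_cudnn("2.15.0", "8.9.7"))  # None (not in the table)
```

Returning `None` for an unlisted TensorFlow version keeps "unknown" distinct from "known mismatch", so a sparse table does not produce false alarms.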

@Huilin-Li
Author

Hi, I'm unclear about your HPC environment. Maybe you should make sure the environment has a GPU, CUDA, and a good internet connection when you install the tool.
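On many clusters the login node, where installs often happen, has no GPU, so it is worth checking what the current shell can actually see. A minimal sketch using `nvidia-smi`'s CSV query mode; it assumes `nvidia-smi` is on PATH wherever a GPU is available, and `visible_gpus` is a hypothetical helper. Printing the driver version alongside the GPU name also makes it easier to compare against the CUDA version you installed.

```python
import shutil
import subprocess
from typing import List


def visible_gpus() -> List[str]:
    """Return 'name, driver_version' lines from nvidia-smi, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # no nvidia-smi here: likely a node without a GPU or driver
    try:
        out = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []  # nvidia-smi exists but failed or hung
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("\n".join(gpus) if gpus else "No GPU visible to nvidia-smi from this shell")
```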

@AdamInTokyo

We encountered a similar issue: the first errors were "Attempting to register factory for plugin cuDNN when one has already been registered", followed by code 500 cudaErrorSymbolNotFound errors. After consulting this TensorFlow issues thread, we found that TensorFlow 2.16.1 with cuDNN 8.9 worked for us with our 12.1 drivers and nvcc.
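To compare your own environment with the combination above, it helps to print what is actually installed in the ColabFold environment rather than what the cluster modules advertise. A minimal stdlib sketch using `importlib.metadata`; the package list is an assumption (notably `nvidia-cudnn-cu12`, the pip wheel that ships cuDNN for CUDA 12), and a cuDNN provided by conda or a system module will not show up this way. `installed_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distributions to look up (assumption: cuDNN came from the pip wheel
# "nvidia-cudnn-cu12"; a conda or module-provided cuDNN is not a Python
# distribution and will be reported as missing here).
PACKAGES = ("tensorflow", "jax", "jaxlib", "nvidia-cudnn-cu12")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the Python interpreter from the colabfold-conda environment, since each environment has its own set of installed packages.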
