
immigrating to rtx 3080 error #2427

Closed
palram-vcr opened this issue Nov 19, 2020 · 19 comments

@palram-vcr

System setup:
GPU: RTX 3080
Python: 3.7.4
OS: Windows 10
TensorFlow: 2.5.0-dev20201118 (nightly build)
Keras: 2.4.3
CUDA: 11.0
cuDNN: 8.0.4.30

Background:
The model was running fine on the old GPU (RTX 2060). After switching to the RTX 3080, TensorFlow took a long time on some operations, even ones unrelated to the model (for example, the line `tf.constant([[1.0,2.0,3.0],[4.0,5.0,6.0]])`), and the loss values were displayed as NaN (all except the classification loss, which showed a single constant value).
The long waiting times turned out to be a TensorFlow version issue, which was fixed by upgrading to the nightly build (2.5.0-dev20201118) together with the other requirements listed in the system setup above. However, the upgrade resulted in errors when running the model.
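To tell whether a slowdown like the one described above is a one-off startup cost or happens on every call, it can help to time the first call separately from later ones. A minimal sketch, assuming only the standard library plus an optional TensorFlow install; `time_first_and_repeat` is a hypothetical helper, not part of TensorFlow:

```python
import statistics
import time
from typing import Callable, Tuple


def time_first_and_repeat(fn: Callable[[], object], repeats: int = 5) -> Tuple[float, float]:
    """Return (first_call_seconds, median_of_later_calls_seconds) for fn."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    fn()  # first call: includes any one-off setup cost
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, statistics.median(later)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed; nothing to time.")
    else:
        print("GPUs:", tf.config.list_physical_devices("GPU"))
        first, later = time_first_and_repeat(
            lambda: tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        )
        print(f"first call: {first:.3f}s, later calls (median): {later:.6f}s")
```

If only the first call is slow, the cost is one-off (for example device or kernel initialisation); if every call is slow, something else is likely going on.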

Issue description:
When running training, I came across the following error:

Exception has occurred: TypeError
Could not build a TypeSpec for <KerasTensor: shape=(None, None, 4) dtype=float32 (created by layer 'tf.math.truediv')> with type KerasTensor
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\framework\type_spec.py", line 554, in type_spec_from_value
(value, type(value).__name__))
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\keras_tensor.py", line 205, in from_tensor
type_spec = type_spec_module.type_spec_from_value(tensor)
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\keras_tensor.py", line 606, in keras_tensor_from_tensor
out = keras_tensor_cls.from_tensor(tensor)
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\util\nest.py", line 672, in
structure[0], [func(*x) for x in entries],
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\util\nest.py", line 672, in map_structure
structure[0], [func(*x) for x in entries],
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\base_layer.py", line 871, in _infer_output_signature
keras_tensor.keras_tensor_from_tensor, outputs)
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\base_layer.py", line 824, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1093, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "C:\Users\ashaf102\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\engine\base_layer.py", line 954, in __call__
input_list)
File "C:\AsafProj\DefectDetector\defect_detection\mask\mrcnn\model.py", line 1880, in build
x, K.shape(input_image)[1:3]))(input_gt_boxes)
File "C:\AsafProj\DefectDetector\defect_detection\mask\mrcnn\model.py", line 1841, in __init__
self.keras_model = self.build(mode=mode, config=config)
File "C:\AsafProj\DefectDetector\defect_detection\defectTrain.py", line 41, in __init__
self.model = modellib.MaskRCNN(mode="training", config=self.config,model_dir=self.logDir)
File "C:\AsafProj\DefectDetector\asaf_defect.py", line 28, in
trainer = Train.defectTrainer()
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\Lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)

Thanks in advance

@palram-vcr
Author

Hi all.

After much research, I found the following working setup:

  1. Update the code from the leekunhee fork of the matterport repo to get compatibility with TF > 2.0:
    https://github.com/leekunhee/Mask_RCNN

  2. Upgrade Python to version 3.8.

  3. Install TensorFlow from this repo: https://github.com/fo40225/tensorflow-windows-wheel/tree/master/2.3.0/py38/CPU%2BGPU/cuda110cudnn8avx2

  4. CUDA: 11.1

  5. cuDNN: 8.0.5.39

  6. Mask R-CNN requirements after the above installation:
    keras 2.4.3 (latest stable)
    scikit-image
    imgaug
    IPython
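Before training, the pip-installed parts of the setup above can be checked from the interpreter itself. A minimal sketch using `importlib.metadata` (standard library from Python 3.8, matching step 2). It assumes the wheel from step 3 registers under the distribution name `tensorflow`, which may not hold for every custom build, and it cannot see CUDA or cuDNN, since those are system installs rather than pip packages. `check_setup` is a hypothetical helper:

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onwards
from typing import Callable, Dict, List, Tuple

# Versions reported as working in this thread. The distribution name
# "tensorflow" is an assumption: a custom wheel may register under another
# name, e.g. "tensorflow-gpu".
EXPECTED: Dict[str, str] = {
    "tensorflow": "2.3.0",
    "keras": "2.4.3",
}


def check_setup(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
    python_version: Tuple[int, int] = tuple(sys.version_info[:2]),
) -> List[str]:
    """Return human-readable mismatches; an empty list means everything matched."""
    problems: List[str] = []
    if python_version != (3, 8):
        problems.append(
            f"python: expected 3.8, found {python_version[0]}.{python_version[1]}"
        )
    for package, wanted in expected.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        if found != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    for line in check_setup(EXPECTED) or ["setup matches the versions above"]:
        print(line)
```

Calling `check_setup(EXPECTED)` with no other arguments checks the real environment; the extra parameters only exist so the check can be exercised with fake data.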

@palram-vcr
Author

solved

@BasemE

BasemE commented Nov 25, 2020

@palram-vcr What command did you use to install TensorFlow?

@palram-vcr
Author

`pip install "path to wheel file"`

Good luck

@BasemE

BasemE commented Nov 25, 2020

Thanks. I installed everything exactly as you did, but I get this error when I use TensorFlow:
`>>> import tensorflow as tf
2020-11-25 07:00:12.666991: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_110.dll

physical_devices = tf.config.list_physical_devices('GPU')
2020-11-25 07:00:17.026770: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-11-25 07:00:17.059313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2020-11-25 07:00:17.059502: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_110.dll
2020-11-25 07:00:17.067645: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_11.dll
2020-11-25 07:00:17.071521: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-25 07:00:17.073055: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-25 07:00:17.074451: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2020-11-25 07:00:17.078856: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_11.dll
2020-11-25 07:00:17.079548: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_8.dll
`
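TensorFlow logs each CUDA library it fails to open on a warning line like the `W ... dso_loader.cc:59` one above, while successful loads are `I` lines. A minimal sketch for pulling the missing names out of a saved log, assuming the message keeps exactly the wording shown in this log; `missing_libraries` is a hypothetical helper:

```python
import re
from typing import List

# Matches TensorFlow's dso_loader warning, e.g.
# "Could not load dynamic library 'cusolver64_10.dll'; dlerror: ..."
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text: str) -> List[str]:
    """Return each library TensorFlow failed to load, in order, without duplicates."""
    seen: List[str] = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

On the log above this returns `['cusolver64_10.dll']`, the only library reported as missing.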

@BasemE

BasemE commented Nov 25, 2020

@palram-vcr Have you run into this error?

`dlerror: cusolver64_10.dll not found`

@palram-vcr
Author

Try getting this version of the DLL from an earlier CUDA version and putting it in the bin directory of the CUDA 11.1 install.
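The workaround above can be scripted. A minimal sketch, assuming the default Windows install root `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA` and that an older toolkit (v10.2 or v10.1 here, both assumptions) is still installed alongside 11.1 and ships `cusolver64_10.dll`. `copy_missing_dll` is a hypothetical helper, and writing into `Program Files` normally needs an elevated prompt:

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def copy_missing_dll(dll_name: str, target_bin: Path, older_bins: Iterable[Path]) -> Optional[Path]:
    """Copy dll_name into target_bin from the first older CUDA bin dir that has it.

    Returns the destination path, or None if no older install has the DLL.
    If target_bin already contains the DLL, nothing is copied.
    """
    destination = target_bin / dll_name
    if destination.exists():
        return destination
    for bin_dir in older_bins:
        source = bin_dir / dll_name
        if source.is_file():
            shutil.copy2(source, destination)  # keeps file metadata
            return destination
    return None


if __name__ == "__main__":
    # Assumed default install locations; adjust to your machine.
    cuda_root = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA")
    result = copy_missing_dll(
        "cusolver64_10.dll",
        target_bin=cuda_root / "v11.1" / "bin",
        older_bins=[cuda_root / "v10.2" / "bin", cuda_root / "v10.1" / "bin"],
    )
    print(result or "cusolver64_10.dll not found in any older CUDA install")
```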

@BasemE

BasemE commented Nov 27, 2020

@palram-vcr Thanks a lot. Your solution worked perfectly.

@EvgeneKuklin

EvgeneKuklin commented Jan 20, 2021

Thank you!

Also works with:
GPU: Nvidia RTX 3090
Driver: 460.89-desktop-win10-64bit-international-dch-whql
CUDA: cuda_11.0.2_451.48_win10
cuDNN: cudnn-11.0-windows-x64-v8.0.4.30

@javoweb

javoweb commented Jan 25, 2021

@palram-vcr Hello! Maybe you have a modified version of TF for Linux that is working with your setup?

@palram-vcr
Author

@javoweb I'm not currently running on a Linux machine, but you can check out NVIDIA's release of TensorFlow; that way you can leave the Mask R-CNN files unchanged. Link:
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-wheel-release-notes/tf-wheel-rel.html

@javoweb

javoweb commented Feb 4, 2021

@palram-vcr Great, thanks!

@DimChatz

DimChatz commented Feb 4, 2021

@palram-vcr I am trying to install the TensorFlow wheel using `pip3 install path/to/folder`, but I get this error:
Defaulting to user installation because normal site-packages is not writeable
ERROR: Directory '/home/tzikos/2.3.0/py38/CPU+GPU/cuda110cudnn8avx2' is not installable. Neither 'setup.py' nor 'pyproject.toml' found
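The error above comes from passing a folder to pip: given a directory, pip treats it as a source project and looks for `setup.py` or `pyproject.toml`, which is exactly what the message reports. `pip install` needs the path to the `.whl` file itself. A minimal sketch that finds the wheel in a downloaded folder and builds the install command, assuming the folder holds exactly one already-extracted `.whl` (helper names are hypothetical). Also note that, as the repository name `tensorflow-windows-wheel` suggests, those builds target Windows, so pip would likely refuse them on a Linux machine even when given the file:

```python
import sys
from pathlib import Path
from typing import List


def find_wheel(folder: Path) -> Path:
    """Return the single .whl file inside folder (pip needs the file, not the folder)."""
    wheels = sorted(folder.glob("*.whl"))
    if len(wheels) != 1:
        raise FileNotFoundError(
            f"expected exactly one .whl file in {folder}, found {len(wheels)}"
        )
    return wheels[0]


def pip_install_command(wheel: Path) -> List[str]:
    """Build a pip command bound to the interpreter running this script."""
    # "python -m pip" avoids installing into a different Python than the one
    # that will later import tensorflow.
    return [sys.executable, "-m", "pip", "install", str(wheel)]


# Example (the folder path is hypothetical):
#   import subprocess
#   wheel = find_wheel(Path("cuda110cudnn8avx2"))
#   subprocess.run(pip_install_command(wheel), check=True)
```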

@alcarazolabs

Thanks, it worked with TensorFlow 2.4.1. Training is faster than with TensorFlow 1.5.0, which was used with matterport:
https://github.com/leekunhee/Mask_RCNN

@eladmeir

Hey all,

Glad to hear that some of you got it working; this issue comes up all over this repo.

My question to those who have succeeded in making this work: it is well known that using tensorflow.keras side by side with standalone keras is not recommended, yet the solution requires both. Are you actually combining tensorflow.keras functions with keras functions, or have you changed the code in some way?
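One way to answer this for a given checkout is to look at which Keras each file actually imports. A minimal sketch using Python's `ast` module that reports the line numbers importing from standalone `keras` and from `tensorflow.keras`; the function names are hypothetical, relative imports are ignored, and it only sees static import statements, not modules loaded dynamically:

```python
import ast
from typing import Dict, List


def keras_import_sources(source: str) -> Dict[str, List[int]]:
    """Map "keras" and "tensorflow.keras" to the line numbers importing from each."""
    found: Dict[str, List[int]] = {"keras": [], "tensorflow.keras": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from tensorflow import keras" is recorded as "tensorflow.keras"
            modules = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import
        for module in modules:
            if module == "keras" or module.startswith("keras."):
                found["keras"].append(node.lineno)
            elif module == "tensorflow.keras" or module.startswith("tensorflow.keras."):
                found["tensorflow.keras"].append(node.lineno)
    return {key: sorted(set(lines)) for key, lines in found.items()}


def mixes_keras(source: str) -> bool:
    """True if the source imports from both standalone keras and tensorflow.keras."""
    found = keras_import_sources(source)
    return bool(found["keras"]) and bool(found["tensorflow.keras"])
```

For example, `mixes_keras(Path("mrcnn/model.py").read_text())`, run from the root of a Mask_RCNN checkout, tells you whether that file pulls in both.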

Special thanks to @palram-vcr for finding the solution, and to @alcarazolabs for the update on TF 2.4.1.

@AndySung320

@EvgeneKuklin
@palram-vcr
Thank you very much!!
Also works with:
RTX 3060
cuda_11.0.2_451.48_win10
cudnn-11.0-windows-x64-v8.0.4.30
TF: 2.3.0
Keras: 2.4.3

@changbinlu

Thank you very much!!

@bigeyesung

Also works with:
RTX 3060
Python 3.8
cuda_11.1 Linux
cudnn-8.0.5
TF: 2.4.1
Keras: 2.4.3

@mahaairshad

The combination that worked for me with an RTX 3060 Ti:

Python 3.8
CUDA Toolkit 11.1.1
CUDNN 8.0.5.39
tensorflow 2.4
keras 2.4.3
