RuntimeError: CUDA error: an illegal instruction was encountered #4

lifengheng opened this issue Oct 18, 2023 · 1 comment

@lifengheng

Hello, I'm very interested in your work. I trained boxnet successfully with Python 3.8, torch 2.0.1, and CUDA 11.7. I then wanted to fine-tune the UNet, so I set '--train_unet' to True and trained on the same devices, but I got "RuntimeError: CUDA error: an illegal instruction was encountered". How can I train the UNet? Thank you.

Traceback (most recent call last):
  File "train_boxnet.py", line 619, in <module>
    trainer.fit(model, datamoule, ckpt_path=args.load_ckpt_path)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 63, in _call_and_handle_interrupt
    trainer._teardown()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _teardown
    self.strategy.teardown()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 490, in teardown
    super().teardown()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/parallel.py", line 125, in teardown
    super().teardown()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 492, in teardown
    _optimizers_to_device(self.optimizers, torch.device("cpu"))
  File "/opt/conda/lib/python3.8/site-packages/lightning_fabric/utilities/optimizer.py", line 28, in _optimizers_to_device
    _optimizer_to_device(opt, device)
  File "/opt/conda/lib/python3.8/site-packages/lightning_fabric/utilities/optimizer.py", line 34, in _optimizer_to_device
    optimizer.state[p] = apply_to_collection(v, Tensor, move_data_to_device, device)
  File "/opt/conda/lib/python3.8/site-packages/lightning_utilities/core/apply_func.py", line 59, in apply_to_collection
    v = apply_to_collection(
  File "/opt/conda/lib/python3.8/site-packages/lightning_utilities/core/apply_func.py", line 51, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/lightning_fabric/utilities/apply_func.py", line 101, in move_data_to_device
    return apply_to_collection(batch, dtype=_TransferableDataType, function=batch_to)
  File "/opt/conda/lib/python3.8/site-packages/lightning_utilities/core/apply_func.py", line 51, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/lightning_fabric/utilities/apply_func.py", line 95, in batch_to
    data_output = data.to(device, **kwargs)
RuntimeError: CUDA error: an illegal instruction was encountered
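
CUDA kernels run asynchronously, so an error reported while Lightning moves the optimizer state to the CPU during teardown may actually have been raised by an earlier kernel in the training step. The following is a minimal isolation sketch, assuming plain PyTorch outside Lightning. It sets CUDA_LAUNCH_BLOCKING=1 so that each kernel launch reports its own error, runs one optimizer step on a hypothetical stand-in model (an nn.Linear, not the project's UNet), synchronizes, and then moves the optimizer state to the CPU, roughly as _optimizer_to_device does in the traceback. The helper names optimizer_state_to and run_one_step are hypothetical.

```python
import os

# Must be set before CUDA is initialized, so errors are raised at the failing launch.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch
from torch import nn


def optimizer_state_to(optimizer: torch.optim.Optimizer, device: torch.device) -> None:
    """Hypothetical helper: move tensors in the optimizer state to `device`.

    Simplified compared with Lightning, which recurses into nested collections;
    here only tensor values stored directly in each per-parameter state dict are moved.
    """
    for state in optimizer.state.values():
        for key, value in list(state.items()):
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)


def run_one_step() -> torch.optim.Optimizer:
    """Hypothetical repro: one training step, then the teardown-style state move."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = nn.Linear(8, 2).to(device)  # stand-in for the UNet being fine-tuned
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    loss = model(torch.randn(4, 8, device=device)).sum()
    loss.backward()
    optimizer.step()  # populates Adam's per-parameter state tensors

    if device.type == "cuda":
        # Surface any pending asynchronous kernel error here rather than in teardown.
        torch.cuda.synchronize()

    optimizer_state_to(optimizer, torch.device("cpu"))
    return optimizer


if __name__ == "__main__":
    run_one_step()
```

If the same "illegal instruction" error appears at the synchronize call with the real UNet substituted in, the failing kernel is in the training step itself rather than in Lightning's teardown.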

@nALiPukR1r

Hello, I am very interested in this project and would like to test it. If it's not too much trouble, would you be willing to share the boxnet weights with me? I'm also curious how long it takes to train the boxnet. Thank you.
