Hello, I am facing the following error when I try to train the model on a custom dataset that has 5 classes. I understand the error below is most likely caused by a mismatch in num_classes, but I am not sure what the correct fix is:
Traceback (most recent call last):
File "main.py", line 326, in
main(args)
File "main.py", line 275, in main
train_stats = train_one_epoch(
File "/home/vsrikar/engine.py", line 43, in train_one_epoch
loss_dict = criterion(outputs, targets)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vsrikar/models/deformable_detr.py", line 342, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vsrikar/models/matcher.py", line 87, in forward
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox),
File "/home/vsrikar/util/box_ops.py", line 59, in generalized_box_iou
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1387 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f89f1df01ee in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x26e61 (0x7f89f1e6ae61 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x257 (0x7f89f1e6fdb7 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x466858 (0x7f89f641c858 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f89f1dd77a5 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x362735 (0x7f89f6318735 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x67c6c8 (0x7f89f66326c8 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f89f6632a95 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #8: python() [0x5d1908]
frame #9: python() [0x5a978d]
frame #10: python() [0x5ecd90]
frame #11: python() [0x5447b8]
frame #12: python() [0x54480a]
frame #13: python() [0x54480a]
frame #19: __libc_start_main + 0xf3 (0x7f89fa857083 in /usr/lib/x86_64-linux-gnu/libc.so.6)
./configs/r50_deformable_detr.sh: line 10: 997 Aborted (core dumped) python -u main.py --output_dir ${EXP_DIR} ${PY_ARGS}
Traceback (most recent call last):
File "./tools/launch.py", line 192, in
main()
File "./tools/launch.py", line 187, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['./configs/r50_deformable_detr.sh']' returned non-zero exit status 134.
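A note on reading this log: PyTorch itself warns above that CUDA errors are reported asynchronously, so the Python traceback (the generalized_box_iou assert in the matcher) may not be the op that actually failed. As the message suggests, one way to localize the real failure is to force synchronous kernel launches with CUDA_LAUNCH_BLOCKING=1. A minimal sketch, assuming the variable is set before any CUDA work runs (for example at the very top of main.py, or exported in the shell before running ./configs/r50_deformable_detr.sh):

# Sketch only: make CUDA launches synchronous so the traceback points at the
# kernel that actually triggered the device-side assert. This must take effect
# before the first CUDA call; placing it at the top of main.py is one option.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

With this set, the failing kernel is usually reported at its real call site, which in cases like this is often the classification-loss indexing rather than the bbox GIoU code.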
I changed the following code snippet in deformable_detr.py:
def build(args):
    num_classes = 5 if args.dataset_file != 'coco' else 91
When I change num_classes back to 20, training works fine. Please suggest how to handle this issue.
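For context, in DETR-style builds num_classes is effectively max label id + 1, not the number of categories: target label ids are used to index (scatter into) the classification logits, and the id equal to num_classes is reserved for the no-object slot, so any annotation with a label id >= num_classes indexes out of bounds inside a CUDA kernel and surfaces as the device-side assert above. A quick sanity check, assuming COCO-format annotations and that the dataset loader keeps the raw category ids as the repo's COCO loader does (the path below is a placeholder):

import json

# Placeholder path -- point this at the custom dataset's training annotation file.
ann_file = "data/custom/annotations/instances_train.json"

with open(ann_file) as f:
    coco = json.load(f)

# Collect the category ids actually used by the dataset.
cat_ids = sorted(c["id"] for c in coco["categories"])
print("category ids:", cat_ids)
# num_classes must make every label id a valid index into the class logits,
# i.e. it needs to be at least max(id) + 1.
print("num_classes should be at least:", max(cat_ids) + 1)

So if the 5 classes use COCO-style ids 1..5, num_classes needs to be 6 rather than 5 (which would also explain why leaving it at 20 happens to work); if the ids were remapped to 0..4, num_classes = 5 would be fine.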