Error when changing num_classes #185

Open
varagantis opened this issue Feb 3, 2023 · 2 comments

Hello, I am facing the following error when I try to train the model on a custom dataset that has 5 classes. I know the error below is most likely caused by the change in num_classes, but I am not sure what the effective fix is:
Traceback (most recent call last):
File "main.py", line 326, in
main(args)
File "main.py", line 275, in main
train_stats = train_one_epoch(
File "/home/vsrikar/engine.py", line 43, in train_one_epoch
loss_dict = criterion(outputs, targets)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vsrikar/models/deformable_detr.py", line 342, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vsrikar/models/matcher.py", line 87, in forward
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox),
File "/home/vsrikar/util/box_ops.py", line 59, in generalized_box_iou
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1387 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f89f1df01ee in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: + 0x26e61 (0x7f89f1e6ae61 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x257 (0x7f89f1e6fdb7 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x466858 (0x7f89f641c858 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f89f1dd77a5 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #5: + 0x362735 (0x7f89f6318735 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #6: + 0x67c6c8 (0x7f89f66326c8 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f89f6632a95 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #8: python() [0x5d1908]
frame #9: python() [0x5a978d]
frame #10: python() [0x5ecd90]
frame #11: python() [0x5447b8]
frame #12: python() [0x54480a]
frame #13: python() [0x54480a]

frame #19: __libc_start_main + 0xf3 (0x7f89fa857083 in /usr/lib/x86_64-linux-gnu/libc.so.6)

./configs/r50_deformable_detr.sh: line 10: 997 Aborted (core dumped) python -u main.py --output_dir ${EXP_DIR} ${PY_ARGS}
Traceback (most recent call last):
File "./tools/launch.py", line 192, in
main()
File "./tools/launch.py", line 187, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['./configs/r50_deformable_detr.sh']' returned non-zero exit status 134.

I changed the following code snippet in deformable_detr.py:

def build(args):
    num_classes = 5 if args.dataset_file != 'coco' else 91

When I change num_classes back to 20, training works fine. Please suggest how to handle this issue.
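
(A quick way to confirm the cause: the device-side assert most likely comes from the matcher indexing the classification cost by tgt_ids when a label is >= num_classes, and only surfaces later at the GIoU assert. The category IDs in the annotation file can be checked directly with pycocotools; the path below is a placeholder.)

# Sanity check on the annotation file (placeholder path): if the category IDs
# run from 1 to 5 rather than 0 to 4, a label can equal num_classes and trigger
# the device-side assert inside the matcher.
from pycocotools.coco import COCO

coco = COCO("path/to/custom_train_annotations.json")  # placeholder path
cat_ids = sorted(coco.getCatIds())
print("category IDs:", cat_ids)
print("num_classes must be greater than", max(cat_ids))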


ilmaster commented Feb 17, 2023

It's a problem with the COCO annotations loaded through pycocotools.

COCO assigns class IDs from 1 to N.

Deformable DETR, on the other hand, expects labels from 0 to N-1, so with num_classes = 5 a label of 5 indexes out of range.
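
One way to fix this (a rough sketch, not the repo's actual code; the names coco, anno and classes follow the converter in datasets/coco.py) is to remap the 1-based, possibly non-contiguous COCO category IDs to contiguous labels 0..N-1 when the targets are built, and then set num_classes = 5 in build():

# Sketch: build the remapping once from the loaded COCO annotations
# (coco is the pycocotools COCO object the dataset already holds).
cat_ids = sorted(coco.getCatIds())                           # e.g. [1, 2, 3, 4, 5]
cat2label = {cat_id: i for i, cat_id in enumerate(cat_ids)}  # {1: 0, 2: 1, ...}

# Sketch: apply it where the per-image targets are assembled
# (anno / classes are the names used in ConvertCocoPolysToMask).
classes = [cat2label[obj["category_id"]] for obj in anno]
classes = torch.tensor(classes, dtype=torch.int64)

With the labels remapped at the dataset level, neither the matcher nor the loss needs any special handling.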


mc-lgt commented Oct 10, 2023

You can try adding "tgt_ids = torch.sub(tgt_ids, 1, alpha=1, out=None)" in matcher.py.
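
For reference, torch.sub(tgt_ids, 1, alpha=1, out=None) is just tgt_ids - 1. Roughly where it would go in models/matcher.py (HungarianMatcher.forward), assuming the target labels are still the 1-based COCO IDs:

# Sketch: shift the 1-based target class IDs down to 0..N-1 before they are
# used to index the classification cost matrix.
tgt_ids = torch.cat([v["labels"] for v in targets])
tgt_ids = tgt_ids - 1   # equivalent to torch.sub(tgt_ids, 1, alpha=1)
tgt_bbox = torch.cat([v["boxes"] for v in targets])

Note that this only shifts the labels seen by the matcher; the labels used by SetCriterion.loss_labels in deformable_detr.py would still be 1-based, so the same offset (or, more cleanly, a remap in the dataset as sketched above) would also be needed there for the classification targets to stay consistent.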
