Error when changing num_classes #185

Open
varagantis opened this issue Feb 3, 2023 · 2 comments

Hello, I am facing the following error when I try to train the model on a custom dataset that has 5 classes. I know the error below is most likely caused by the change in num_classes, but I am not sure what the effective fix is:
Traceback (most recent call last):
File "main.py", line 326, in
main(args)
File "main.py", line 275, in main
train_stats = train_one_epoch(
File "/home/vsrikar/engine.py", line 43, in train_one_epoch
loss_dict = criterion(outputs, targets)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vsrikar/models/deformable_detr.py", line 342, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/vsrikar/models/matcher.py", line 87, in forward
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox),
File "/home/vsrikar/util/box_ops.py", line 59, in generalized_box_iou
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:1387 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f89f1df01ee in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: + 0x26e61 (0x7f89f1e6ae61 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x257 (0x7f89f1e6fdb7 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10_cuda.so)
frame #3: + 0x466858 (0x7f89f641c858 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f89f1dd77a5 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #5: + 0x362735 (0x7f89f6318735 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #6: + 0x67c6c8 (0x7f89f66326c8 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f89f6632a95 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so)
frame #8: python() [0x5d1908]
frame #9: python() [0x5a978d]
frame #10: python() [0x5ecd90]
frame #11: python() [0x5447b8]
frame #12: python() [0x54480a]
frame #13: python() [0x54480a]

frame #19: __libc_start_main + 0xf3 (0x7f89fa857083 in /usr/lib/x86_64-linux-gnu/libc.so.6)

./configs/r50_deformable_detr.sh: line 10: 997 Aborted (core dumped) python -u main.py --output_dir ${EXP_DIR} ${PY_ARGS}
Traceback (most recent call last):
File "./tools/launch.py", line 192, in
main()
File "./tools/launch.py", line 187, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['./configs/r50_deformable_detr.sh']' returned non-zero exit status 134.

I changed the following code snippet in deformable_detr.py:

def build(args):
    num_classes = 5 if args.dataset_file != 'coco' else 91

When I change num_classes back to 20, training works fine. Please suggest how to handle this issue.
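
(A quick way to confirm the cause: the device-side assert most likely comes from the matcher indexing the classification cost by tgt_ids when a label is >= num_classes, and only surfaces later at the GIoU assert. The category IDs in the annotation file can be checked directly with pycocotools; the path below is a placeholder.)

# Sanity check on the annotation file (placeholder path): if the category IDs
# run from 1 to 5 rather than 0 to 4, a label can equal num_classes and trigger
# the device-side assert inside the matcher.
from pycocotools.coco import COCO

coco = COCO("path/to/custom_train_annotations.json")  # placeholder path
cat_ids = sorted(coco.getCatIds())
print("category IDs:", cat_ids)
print("num_classes must be greater than", max(cat_ids))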


ilmaster commented Feb 17, 2023

It's a problem with the COCO annotations loaded through pycocotools.

COCO assigns class IDs from 1 to N.

Deformable DETR, on the other hand, expects labels from 0 to N-1, so with num_classes = 5 a label of 5 indexes out of range.
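
One way to fix this (a rough sketch, not the repo's actual code; the names coco, anno and classes follow the converter in datasets/coco.py) is to remap the 1-based, possibly non-contiguous COCO category IDs to contiguous labels 0..N-1 when the targets are built, and then set num_classes = 5 in build():

# Sketch: build the remapping once from the loaded COCO annotations
# (coco is the pycocotools COCO object the dataset already holds).
cat_ids = sorted(coco.getCatIds())                           # e.g. [1, 2, 3, 4, 5]
cat2label = {cat_id: i for i, cat_id in enumerate(cat_ids)}  # {1: 0, 2: 1, ...}

# Sketch: apply it where the per-image targets are assembled
# (anno / classes are the names used in ConvertCocoPolysToMask).
classes = [cat2label[obj["category_id"]] for obj in anno]
classes = torch.tensor(classes, dtype=torch.int64)

With the labels remapped at the dataset level, neither the matcher nor the loss needs any special handling.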


mc-lgt commented Oct 10, 2023

You can try adding "tgt_ids = torch.sub(tgt_ids, 1, alpha=1, out=None)" in matcher.py.
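
For reference, torch.sub(tgt_ids, 1, alpha=1, out=None) is just tgt_ids - 1. Roughly where it would go in models/matcher.py (HungarianMatcher.forward), assuming the target labels are still the 1-based COCO IDs:

# Sketch: shift the 1-based target class IDs down to 0..N-1 before they are
# used to index the classification cost matrix.
tgt_ids = torch.cat([v["labels"] for v in targets])
tgt_ids = tgt_ids - 1   # equivalent to torch.sub(tgt_ids, 1, alpha=1)
tgt_bbox = torch.cat([v["boxes"] for v in targets])

Note that this only shifts the labels seen by the matcher; the labels used by SetCriterion.loss_labels in deformable_detr.py would still be 1-based, so the same offset (or, more cleanly, a remap in the dataset as sketched above) would also be needed there for the classification targets to stay consistent.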
