Mobilenet V3 Large - Classification training with batch size=1 is not working #2559

harimkang · 2023-10-16T06:36:18Z

Describe the bug

When training a MobileNet-V3-Large classification model with a batch size of 1, the following error occurs:

Traceback (most recent call last):
  File "/home/harimkan/workspace/otx-v1/venv/bin/otx", line 8, in <module>
    sys.exit(main())
  File "/home/harimkan/workspace/otx-v1/src/otx/cli/tools/cli.py", line 77, in main
    results = globals()[f"otx_{name}"]()
  File "/home/harimkan/workspace/otx-v1/src/otx/cli/tools/train.py", line 192, in main
    return train(exit_stack)
  File "/home/harimkan/workspace/otx-v1/src/otx/cli/tools/train.py", line 290, in train
    task.train(
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/task.py", line 216, in train
    results = self._train_model(dataset)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/task.py", line 420, in _train_model
    train_model(
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcls/apis/train.py", line 233, in train_model
    runner.run(data_loaders, cfg.workflow)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/common/adapters/mmcv/runner.py", line 81, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/classifiers/mixin.py", line 29, in train_step
    return super().train_step(data, optimizer, **kwargs)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/classifiers/mixin.py", line 105, in train_step
    return super().train_step(data, optimizer, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcls/models/classifiers/base.py", line 139, in train_step
    losses = self(**data)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcls/models/classifiers/base.py", line 83, in forward
    return self.forward_train(img, **kwargs)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/classifiers/custom_image_classifier.py", line 83, in forward_train
    loss = self.head.forward_train(x, gt_label)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/heads/custom_cls_head.py", line 45, in forward_train
    logit = self.classifier(cls_score)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2448, in batch_norm
    _verify_batch_size(input.size())
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2416, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 1280])

For reference, EfficientNet-B0 works with the same setting.
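The last frames of the traceback show the failure inside a BatchNorm layer of the head's classifier (an nn.Sequential). In training mode, PyTorch computes batch statistics and requires more than one value per channel. It counts N × H × W values per channel, so convolutional BatchNorm layers usually pass with a single image, whereas a 1-D BatchNorm on pooled [1, C] features cannot. A minimal sketch that reproduces the same ValueError with a standalone torch.nn.BatchNorm1d, assuming only PyTorch; the width 1280 is taken from the input size in the error message:

```python
import torch
import torch.nn as nn


def forward_error(module: nn.Module, x: torch.Tensor):
    """Run module(x) and return the ValueError message, or None if it succeeds."""
    try:
        module(x)
    except ValueError as err:
        return str(err)
    return None


# 1280 matches the feature width in the reported error: torch.Size([1, 1280]).
bn = nn.BatchNorm1d(1280)

bn.train()
err_single = forward_error(bn, torch.randn(1, 1280))  # batch of 1: raises ValueError
err_pair = forward_error(bn, torch.randn(2, 1280))    # batch of 2: works

bn.eval()
err_eval = forward_error(bn, torch.randn(1, 1280))    # eval mode uses running stats: works

print(err_single)
```

The printed message should match the last line of the traceback. Only the training-mode forward pass on a single sample fails. Setting the batch size to 1 therefore makes every iteration fail, not just a trailing partial batch.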

Steps to Reproduce

otx build --task CLASSIFICATION --model MobileNet-V3-large-1x
cd otx-workspace-CLASSIFICATION
change the default value of batch_size to 1 in template.yaml
otx train --train-data-roots tests/assets/classification_dataset

Environment:

OS: Linux Ubuntu 20.04 (WSL)
Framework version: torch 1.13.1 / mmcv 1.7.0
Python version: 3.10
OpenVINO version: 2023.0.0
CUDA/cuDNN version: X
GPU model and memory: X

sungmanc · 2023-10-27T00:21:50Z

Fixed

sungmanc self-assigned this Oct 24, 2023

sungmanc mentioned this issue Oct 24, 2023

Fix the CustomNonLinearClsHead when the batch_size is set to 1 #2571

Merged
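The PR title indicates that the fix targets CustomNonLinearClsHead, but this thread does not describe the actual change. The sketch below shows one possible workaround under stated assumptions. A 1-D BatchNorm falls back to its running statistics when a training batch contains a single sample. SingletonSafeBatchNorm1d and HypotheticalNonLinearHead are hypothetical names. The layer layout (Linear, BatchNorm1d, ReLU, Linear) and the sizes 960/1280/10 are illustrative assumptions, not OTX's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingletonSafeBatchNorm1d(nn.BatchNorm1d):
    """Hypothetical BatchNorm1d that tolerates a training batch of one sample.

    Assumes track_running_stats=True (the default), so running_mean/var exist.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and x.size(0) == 1:
            # Batch statistics need more than one value per channel, so
            # normalize with the running estimates and leave them unchanged.
            return F.batch_norm(
                x,
                self.running_mean,
                self.running_var,
                self.weight,
                self.bias,
                training=False,
                eps=self.eps,
            )
        return super().forward(x)


class HypotheticalNonLinearHead(nn.Module):
    """Illustrative non-linear classification head; layer sizes are assumptions."""

    def __init__(self, in_channels: int = 960, hid_channels: int = 1280, num_classes: int = 10):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(in_channels, hid_channels),
            SingletonSafeBatchNorm1d(hid_channels),
            nn.ReLU(inplace=True),
            nn.Linear(hid_channels, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(x)


head = HypotheticalNonLinearHead()
head.train()
bn = head.classifier[1]
mean_before = bn.running_mean.clone()

logits_single = head(torch.randn(1, 960))    # no ValueError with a batch of 1
mean_after_single = bn.running_mean.clone()  # running stats are not updated
logits_batch = head(torch.randn(4, 960))     # a regular batch uses batch statistics
```

Note the trade-off. If every batch has size 1, the running statistics are never updated, so the layer reduces to a fixed affine transform around its initial statistics (mean 0, variance 1). Alternatives include removing the BatchNorm from the head or replacing it with a normalization that does not depend on the batch dimension, such as nn.LayerNorm.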

sungmanc closed this as completed Oct 27, 2023

harimkang mentioned this issue Nov 1, 2023

Refactor Dataset API structure #2593

Merged