Mobilenet V3 Large - Classification training with batch size=1 is not working #2559

Closed
harimkang opened this issue Oct 16, 2023 · 1 comment

@harimkang
Contributor

Describe the bug

Training the MobileNet-V3-Large classification model fails with the following error when the batch size is 1.

Traceback (most recent call last):
  File "/home/harimkan/workspace/otx-v1/venv/bin/otx", line 8, in <module>
    sys.exit(main())
  File "/home/harimkan/workspace/otx-v1/src/otx/cli/tools/cli.py", line 77, in main
    results = globals()[f"otx_{name}"]()
  File "/home/harimkan/workspace/otx-v1/src/otx/cli/tools/train.py", line 192, in main
    return train(exit_stack)
  File "/home/harimkan/workspace/otx-v1/src/otx/cli/tools/train.py", line 290, in train
    task.train(
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/task.py", line 216, in train
    results = self._train_model(dataset)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/task.py", line 420, in _train_model
    train_model(
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcls/apis/train.py", line 233, in train_model
    runner.run(data_loaders, cfg.workflow)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/common/adapters/mmcv/runner.py", line 81, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/classifiers/mixin.py", line 29, in train_step
    return super().train_step(data, optimizer, **kwargs)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/classifiers/mixin.py", line 105, in train_step
    return super().train_step(data, optimizer, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcls/models/classifiers/base.py", line 139, in train_step
    losses = self(**data)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/mmcls/models/classifiers/base.py", line 83, in forward
    return self.forward_train(img, **kwargs)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/classifiers/custom_image_classifier.py", line 83, in forward_train
    loss = self.head.forward_train(x, gt_label)
  File "/home/harimkan/workspace/otx-v1/src/otx/algorithms/classification/adapters/mmcls/models/heads/custom_cls_head.py", line 45, in forward_train
    logit = self.classifier(cls_score)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
    return F.batch_norm(
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2448, in batch_norm
    _verify_batch_size(input.size())
  File "/home/harimkan/workspace/otx-v1/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2416, in _verify_batch_size
    raise ValueError("Expected more than 1 value per channel when training, got input size {}".format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 1280])

For reference: EfficientNet-B0 trains without this error.
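
The last frames of the traceback show the failure coming from a batch norm layer inside the head's classifier, which receives an input of size [1, 1280]. In training mode, batch norm needs more than one value per channel to estimate batch statistics. The sketch below is a pure-Python paraphrase of that condition, not the torch source; the function names `values_per_channel` and `can_train_batchnorm` are made up for illustration. It suggests why a pooled feature vector of shape [1, 1280] is rejected while a 4D feature map with batch size 1 would not be.

```python
from math import prod


def values_per_channel(size):
    """Count the values each channel sees in one batch.

    Assumes the usual layout (batch, channels, *spatial): the count is the
    batch dimension times every dimension after the channel dimension.
    """
    return size[0] * prod(size[2:])


def can_train_batchnorm(size):
    """Return True if batch statistics can be estimated for this input size.

    Hypothetical helper: it mirrors the condition behind the ValueError in the
    traceback (exactly one value per channel while training).
    """
    return values_per_channel(size) > 1


# Input reported in the traceback: one sample, 1280 features, no spatial dims.
print(can_train_batchnorm((1, 1280)))

# Same features with two samples in the batch.
print(can_train_batchnorm((2, 1280)))

# A 4D feature map with batch size 1 still has 7 * 7 values per channel.
print(can_train_batchnorm((1, 1280, 7, 7)))
```

Under these assumptions, only the first case is rejected, which matches the traceback: the error is raised in the head after pooling, not in the convolutional backbone.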

Steps to Reproduce

  1. otx build --task CLASSIFICATION --model MobileNet-V3-large-1x
  2. cd otx-workspace-CLASSIFICATION
  3. Change the default value of batch_size to 1 in template.yaml
  4. otx train --train-data-roots tests/assets/classification_dataset
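
The steps above set the batch size to 1 directly, but a batch of one sample can also show up with larger batch sizes when the dataset size leaves a remainder of one and the loader keeps the final partial batch. The sketch below is a plain-Python illustration of that arithmetic; `batch_sizes` and `has_single_sample_batch` are hypothetical helpers, not part of otx, mmcls, or torch, and the `drop_last` argument only imitates the meaning of the data loader option of the same name.

```python
from typing import List


def batch_sizes(num_samples: int, batch_size: int, drop_last: bool = False) -> List[int]:
    """List the size of every batch in one epoch.

    Assumes samples are split into consecutive batches, with an optional
    smaller final batch when num_samples is not a multiple of batch_size.
    """
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


def has_single_sample_batch(num_samples: int, batch_size: int, drop_last: bool = False) -> bool:
    """Return True if any batch in the epoch would contain exactly one sample."""
    return 1 in batch_sizes(num_samples, batch_size, drop_last)


# Batch size 1, as in the reproduction steps: every batch has one sample.
print(has_single_sample_batch(10, 1))

# Nine samples with batch size 4: batches of 4, 4 and 1.
print(has_single_sample_batch(9, 4))

# Dropping the final partial batch avoids the single-sample case here.
print(has_single_sample_batch(9, 4, drop_last=True))
```

A check like this could run before training to warn about configurations that would hit the same batch norm error, assuming the model still contains a batch norm layer applied to pooled features.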

Environment:

  • OS: Linux Ubuntu 20.04 (WSL)
  • Framework version: torch 1.13.1 / mmcv 1.7.0
  • Python version: 3.10
  • OpenVINO version: 2023.0.0
  • CUDA/cuDNN version: X
  • GPU model and memory: X

@sungmanc
Contributor

Fixed
