Apex.amp.initialize: lr_scheduler.LambdaLR AttributeError: 'function' object has no attribute '__self__' #574

Closed
Zepyhrus opened this issue Jul 31, 2020 · 5 comments
Labels
bug Something isn't working Stale Stale and schedule for closing soon

Comments

@Zepyhrus

🐛 Bug

During mixed-precision training, scheduler initialization fails with AttributeError: 'function' object has no attribute '__self__'.

To Reproduce (REQUIRED)

Input:

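import math
from apex import amp
from torch.optim import lr_scheduler

# model, optimizer, epochs and mixed_precision are defined earlier in train.py
if mixed_precision:
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)

lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.9 + 0.1  # cosine
scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)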
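
For readers unfamiliar with the cosine lambda above, here is a small standalone sketch of the multiplier it produces. The function name cosine_lr_factor and the value epochs = 100 are assumptions chosen for illustration; the `** 1.0` exponent from the report is dropped because it has no effect.

```python
import math


def cosine_lr_factor(epoch, epochs):
    """Multiplier applied to the base learning rate at a given epoch.

    Same formula as the lambda in the report, minus the no-op `** 1.0`.
    """
    return ((1 + math.cos(epoch * math.pi / epochs)) / 2) * 0.9 + 0.1


epochs = 100  # assumed value, not taken from the report

# Sample the multiplier at the start, middle and end of training.
factors = [cosine_lr_factor(e, epochs) for e in (0, epochs // 2, epochs)]
print(factors)
```

The multiplier starts at 1.0 and decays along a half cosine to 0.1, so the learning rate ends at one tenth of its initial value.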

Output:

Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Traceback (most recent call last):
  File "train.py", line 477, in <module>
    train(hyp, tb_writer, opt, device)
  File "train.py", line 167, in train
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
  File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 189, in __init__
    super(LambdaLR, self).__init__(optimizer, last_epoch)
  File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 74, in __init__
    self.optimizer.step = with_counter(self.optimizer.step)
  File "/home/ubuntu/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 56, in with_counter
    instance_ref = weakref.ref(method.__self__)
AttributeError: 'function' object has no attribute '__self__'
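
The last frame of the traceback reads method.__self__, which only exists on bound methods. A plausible explanation, not confirmed in this thread, is that something replaced optimizer.step with a plain function before the scheduler was constructed. The sketch below reproduces that failure mode in pure Python; ToyOptimizer and patched_step are hypothetical stand-ins, not torch or apex code.

```python
import types
import weakref


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; not the real torch class."""

    def step(self):
        return "stepped"


opt = ToyOptimizer()

# A normal bound method carries a reference to its instance in __self__.
bound_ok = hasattr(opt.step, "__self__")

# Keep the original bound method, then overwrite the attribute on the
# instance with a plain function (this is the assumed monkey-patch).
original_step = opt.step


def patched_step(*args, **kwargs):
    return original_step(*args, **kwargs)


opt.step = patched_step

# The same access pattern as in the traceback now fails.
try:
    weakref.ref(opt.step.__self__)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Re-binding the wrapper with types.MethodType restores __self__.
opt.step = types.MethodType(
    lambda self, *args, **kwargs: original_step(*args, **kwargs), opt
)
rebound_ok = hasattr(opt.step, "__self__")
step_result = opt.step()

print(bound_ok, error_message, rebound_ok, step_result)
```

If this is what happens here, a wrapper that is bound to the optimizer instance (for example via types.MethodType) would keep the scheduler's check working, whereas a bare function assigned to optimizer.step would not.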

Expected behavior

Scheduler initialized successfully.

Environment

  • OS: ubuntu 18.04
  • GPU: Tesla P2000;
  • CUDA: 10.2;
  • torch: 1.5.1 and 1.6.0 (both show the same problem);
  • apex: 0.1;

Additional context

Following this report of exactly the same problem in yolov3, I reinstalled PyTorch and apex cleanly, but the problem persists.

Interestingly, when I change the opt_level option in model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0) to O2 or O0, the error goes away, but the training loss becomes infinite within a few steps.

If I disable mixed_precision, the problem disappears entirely. I hope this provides some clues for debugging.

@Zepyhrus Zepyhrus added the bug Something isn't working label Jul 31, 2020
@github-actions
Contributor

github-actions bot commented Jul 31, 2020

Hello @Zepyhrus, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@glenn-jocher
Member

There is a PR open for PyTorch 1.6 native AMP. You can use this branch or simply wait a few days for it to get merged with origin/master. See #573

@Zepyhrus
Author

There is a PR open for PyTorch 1.6 native AMP. You can use this branch or simply wait a few days for it to get merged with origin/master. See #573

Hi glenn, thanks for replying. The problem is that this issue persists when I switch back to PyTorch 1.5.1.

@glenn-jocher
Member

glenn-jocher commented Aug 12, 2020

It appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.8 environment, clone the latest repo (code changes daily), and pip install -r requirements.txt again. We also highly recommend using one of our verified environments below.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Sep 14, 2020