External Model Definition Functionality Potentially Broken in PyTorch 1.9 #1271

Closed
jamesmcclain opened this issue Sep 11, 2021 · 3 comments · Fixed by #1281
@jamesmcclain
Contributor

🐛 Bug

It currently does not seem possible to load external model definitions from inside training pipelines. When running locally, in both local and inprocess mode, I get an "HTTP Error 403: rate limit exceeded" error when trying to use the external model definition functionality (in particular, a resnet18-fpn from 'AdeelH/pytorch-fpn:0.1') in a training pipeline.

This is apparently a known issue, described here:
pytorch/vision#4156 (comment)

To Reproduce

Steps to reproduce the behavior:

  1. Attempt to use an external model definition in a training pipeline. I can confirm this locally; I do not know whether it also occurs on batch.

(I can provide the exact context and command in a different context.)

Expected behavior

I expected the external model definition to load without an error.

Environment

Running in a docker container with the most recent

  • How you installed and are running Raster Vision (pip install on local vs. inside Docker image): docker
  • Raster Vision version or commit: current master of 09/10/2021
  • OS (e.g., Linux): Linux
  • Python version: From docker image
  • CUDA/cuDNN version if running on GPU: 11.2
  • Any other relevant information:

Additional context

@AdeelH AdeelH added the bug label Sep 11, 2021
@AdeelH
Collaborator

AdeelH commented Sep 11, 2021

This was also reported in #1221.

Looks like a problem with PyTorch 1.9 that won't be fixed until the next release. Maybe we should revert the dependency to 1.8?

As a workaround, you can throw a zip of the repo on S3 and link to that in the external module config.

@jamesmcclain
Contributor Author

> This was also reported in #1221.
>
> Looks like a problem with PyTorch 1.9 that won't be fixed until the next release. Maybe we should revert the dependency to 1.8?
>
> As a workaround, you can throw a zip of the repo on S3 and link to that in the external module config.

Ah, I see, I missed that earlier issue. I was able to work around the problem by patching PyTorch within the image: pytorch/vision#4156 (comment).

@AdeelH AdeelH changed the title from "External Model Definition Functionality Potentially Broken" to "External Model Definition Functionality Potentially Broken in PyTorch 1.9" Sep 28, 2021
@AdeelH
Collaborator

AdeelH commented Sep 28, 2021

This is a workaround that doesn't require changing PyTorch source code: pytorch/pytorch#61755 (comment).

Might be worth adding to RV until next PyTorch release.
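
For readers who cannot follow the link, here is a minimal sketch of what a workaround of this kind could look like. It assumes the linked comment refers to the commonly circulated approach of replacing the private `torch.hub._validate_not_a_forked_repo` function with a no-op before calling `torch.hub.load()`. The helper name `disable_hub_fork_validation` is hypothetical, and because the patched function is private it may not exist in other PyTorch releases, so the sketch checks for it first.

```python
def disable_hub_fork_validation(hub_module):
    """Replace the hub module's private fork check with a no-op.

    Assumption: the 403 comes from the fork-validation step, which calls
    the GitHub API. `_validate_not_a_forked_repo` is a private name and may
    be missing in other PyTorch versions, hence the hasattr guard.
    Returns True if the patch was applied, False otherwise.
    """
    if hasattr(hub_module, "_validate_not_a_forked_repo"):
        # Accept any arguments so the no-op works regardless of the exact
        # signature used by the installed PyTorch version.
        hub_module._validate_not_a_forked_repo = lambda *args, **kwargs: True
        return True
    return False


# Intended usage (requires PyTorch, not executed here):
#   import torch
#   disable_hub_fork_validation(torch.hub)
#   model = torch.hub.load('AdeelH/pytorch-fpn:0.1', ...)
```

Passing the module in as an argument keeps the helper free of a hard PyTorch import, which also makes it easy to check against a stand-in object.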

AdeelH added a commit to AdeelH/raster-vision that referenced this issue Sep 28, 2021
AdeelH added a commit to AdeelH/raster-vision that referenced this issue Sep 12, 2022
- Set skip_validation=True in torch.hub.load() to avoid the validation step that was the cause of the bug in the first place, since it's of dubious usefulness and still contains an infinite loop.
- update unit tests
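
As a rough illustration of the change described in this commit message, the sketch below wraps `torch.hub.load()` so that `skip_validation=True` is always passed. It assumes a PyTorch release whose `torch.hub.load()` accepts the `skip_validation` argument (it was not available in 1.9, as far as I know). The wrapper name `load_hub_model` and the `hub_load` parameter are hypothetical and are not taken from the Raster Vision code base.

```python
def load_hub_model(repo, entrypoint, hub_load=None, **entrypoint_kwargs):
    """Load a model through torch.hub with the validation step skipped.

    `hub_load` defaults to torch.hub.load; it is injectable only so the
    wrapper can be exercised without PyTorch or network access.
    """
    if hub_load is None:
        import torch  # imported lazily so the module imports without PyTorch

        hub_load = torch.hub.load
    # skip_validation=True bypasses the check that caused the
    # "rate limit exceeded" error described in this issue.
    return hub_load(repo, entrypoint, skip_validation=True, **entrypoint_kwargs)


# Intended usage (requires PyTorch and network access, not executed here);
# the entrypoint name and its arguments depend on the repo's hubconf.py:
#   model = load_hub_model('AdeelH/pytorch-fpn:0.1', '<entrypoint>', ...)
```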