External Model Definition Functionality Potentially Broken in PyTorch 1.9 #1271

jamesmcclain · 2021-09-11T01:38:40Z

🐛 Bug

It currently does not seem possible to load external model definitions from inside training pipelines. When running locally, in both local and inprocess mode, I get an HTTP Error 403: rate limit exceeded error when trying to use the external model definition functionality (in particular, a resnet18-fpn from 'AdeelH/pytorch-fpn:0.1') in a training pipeline.

This is apparently a known issue, and is discussed here:
pytorch/vision#4156 (comment)

To Reproduce

Steps to reproduce the behavior:

Attempt to use an external model definition in a training pipeline. I can confirm that this happens locally; I do not know whether it also happens on batch.

(I can provide the exact setup and command separately.)

Expected behavior

I expected the external model definition to load without an HTTP error.

Environment

Running in a docker container with the most recent

How you installed and are running Raster Vision (pip install on local vs. inside Docker image): docker
Raster Vision version or commit: current master of 09/10/2021
OS (e.g., Linux): Linux
Python version: From docker image
CUDA/cuDNN version if running on GPU: 11.2
Any other relevant information:

Additional context

AdeelH · 2021-09-11T07:35:04Z

This was also reported in #1221.

Looks like a problem with PyTorch 1.9 that won't be fixed until the next release. Maybe we should revert the dependency to 1.8?

As a workaround, you can throw a zip of the repo on S3 and link to that in the external module config.
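The zip-based workaround could be prepared roughly as follows. This is a minimal sketch, not Raster Vision's own tooling: `zip_repo` is a hypothetical helper name, it assumes you already have a local clone of the model repo with `hubconf.py` at its top level, and the upload to S3 is left to whatever tool you normally use. Whether the loader expects `hubconf.py` at the archive root is an assumption worth checking against your Raster Vision version.

```python
import shutil
from pathlib import Path


def zip_repo(repo_dir, out_base):
    """Zip a local clone of a torch.hub repo and return the archive path.

    repo_dir: directory that contains hubconf.py (assumed layout).
    out_base: output path without the ".zip" suffix; keep it outside
              repo_dir so the archive does not end up inside itself.
    """
    repo_dir = Path(repo_dir)
    if not (repo_dir / "hubconf.py").is_file():
        raise FileNotFoundError(f"no hubconf.py found in {repo_dir}")
    # make_archive stores the contents of root_dir at the archive root
    # and returns the full path of the file it wrote.
    return shutil.make_archive(str(out_base), "zip", root_dir=str(repo_dir))
```

After uploading the resulting file, its S3 URI would go into the external module config in place of the GitHub repo reference (the exact field name depends on your Raster Vision version and is not shown here).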

jamesmcclain · 2021-09-11T13:29:50Z

Ah, I see, I missed that earlier issue. I was able to work around the issue by patching PyTorch within the image: pytorch/vision#4156 (comment).

AdeelH · 2021-09-28T06:35:21Z

This is a workaround that doesn't require changing PyTorch source code: pytorch/pytorch#61755 (comment).

Might be worth adding to RV until next PyTorch release.

- Set skip_validation=True in torch.hub.load() to avoid the validation step that was the cause of the bug in the first place, since it's of dubious usefulness and still contains an infinite loop.
- Update unit tests.
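For anyone hitting this outside of RV, skipping the validation step might look like the sketch below. It is not the actual RV patch. `hub_load_kwargs` and `load_external_model` are hypothetical helper names. The sketch assumes that newer PyTorch releases declare `skip_validation` as an explicit parameter of `torch.hub.load()`, and that PyTorch 1.9 performs the check in a private helper named `torch.hub._validate_not_a_forked_repo`; the latter is an assumption about PyTorch internals and may not hold for other versions.

```python
import inspect


def hub_load_kwargs(load_fn):
    """Return kwargs that disable repo validation, if load_fn declares them."""
    params = inspect.signature(load_fn).parameters
    # Only an explicitly declared parameter counts: anything swallowed by a
    # bare **kwargs would be forwarded to the model entrypoint instead.
    if "skip_validation" in params:
        return {"skip_validation": True}
    return {}


def load_external_model(repo, entrypoint, **entrypoint_kwargs):
    """Load a torch.hub model without the GitHub fork-validation request."""
    import torch  # imported lazily so hub_load_kwargs is usable without torch

    extra = hub_load_kwargs(torch.hub.load)
    if not extra and hasattr(torch.hub, "_validate_not_a_forked_repo"):
        # ASSUMPTION: private helper name as of PyTorch 1.9. Replacing it
        # with a no-op avoids the API call that hits the rate limit.
        torch.hub._validate_not_a_forked_repo = lambda *args, **kwargs: True
    return torch.hub.load(repo, entrypoint, **extra, **entrypoint_kwargs)
```

Usage would be along the lines of `load_external_model('AdeelH/pytorch-fpn:0.1', '<entrypoint>')`, with the entrypoint name and its arguments taken from the repo's hubconf.py.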

AdeelH added the bug label Sep 11, 2021

AdeelH changed the title ~~External Model Definition Functionality Potentially Broken~~ External Model Definition Functionality Potentially Broken in PyTorch 1.9 Sep 28, 2021

AdeelH added a commit to AdeelH/raster-vision that referenced this issue Sep 28, 2021

add patch for azavea#1271

32fa37c

AdeelH mentioned this issue Sep 28, 2021

Multiple fixes and improvements #1281

Merged

4 tasks

AdeelH added a commit to AdeelH/raster-vision that referenced this issue Oct 4, 2021

fix: add patch for azavea#1271

47719f2

AdeelH added a commit to AdeelH/raster-vision that referenced this issue Oct 4, 2021

fix: add patch for azavea#1271

1873256

AdeelH added a commit to AdeelH/raster-vision that referenced this issue Oct 4, 2021

fix: add patch for azavea#1271

bbc8840

AdeelH added a commit to AdeelH/raster-vision that referenced this issue Oct 5, 2021

fix: add patch for azavea#1271

58b9ebd

AdeelH closed this as completed in #1281 Oct 7, 2021

AdeelH mentioned this issue Sep 13, 2022

Add example notebooks + multiple major changes #1470

Merged

3 tasks
