External Model Definition Functionality Potentially Broken in PyTorch 1.9 #1271
Comments
This was also reported in #1221. Looks like a problem with PyTorch 1.9 that won't be fixed until the next release. Maybe we should revert the dependency to 1.8? As a workaround, you can throw a zip of the repo on S3 and link to that in the external module config.
Ah, I see, I missed that earlier issue. I was able to work around the issue by patching PyTorch within the image: pytorch/vision#4156 (comment).
This is a workaround that doesn't require changing PyTorch source code: pytorch/pytorch#61755 (comment). Might be worth adding to RV until next PyTorch release.
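For readers who cannot follow the link, a workaround of this kind can be sketched as replacing the fork-validation helper in torch.hub with a no-op before loading. This is only a sketch under assumptions: it assumes the helper is the private function `_validate_not_a_forked_repo`, taking three positional arguments as in PyTorch 1.9, and `disable_fork_validation` is a hypothetical name. Because it patches a private function, it may not work on other PyTorch versions.

```python
def disable_fork_validation(hub_module):
    """Replace the fork check on a torch.hub-like module with a no-op.

    Assumption: the module exposes a private function
    `_validate_not_a_forked_repo(owner, name, branch)`, as torch.hub is
    believed to in PyTorch 1.9. The original function (or None if it is
    absent) is returned so the caller can restore it later.
    """
    original = getattr(hub_module, "_validate_not_a_forked_repo", None)
    # The no-op accepts the same three arguments and always reports success,
    # so no request is made to the rate-limited GitHub API.
    hub_module._validate_not_a_forked_repo = lambda owner, name, branch: True
    return original
```

Intended usage would be to call `disable_fork_validation(torch.hub)` once, before the first `torch.hub.load()` call in the process.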
- Set skip_validation=True in torch.hub.load() to avoid the validation step that was the cause of the bug in the first place, since it's of dubious usefulness and still contains an infinite loop.
- Update unit tests.
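The change described in this commit could look roughly like the wrapper below. It is a minimal sketch, not the project's actual code: `load_external_model` and its `hub_load` parameter are hypothetical names, and it assumes a PyTorch version whose `torch.hub.load()` accepts the `skip_validation` keyword.

```python
def load_external_model(repo, entrypoint, hub_load=None, **kwargs):
    """Load a torch.hub entrypoint with the validation step skipped.

    `hub_load` defaults to torch.hub.load. It is injectable (a hypothetical
    convenience) so the wrapper can be exercised without PyTorch installed
    and without network access.
    """
    if hub_load is None:
        import torch  # deferred so importing this module does not require PyTorch
        hub_load = torch.hub.load
    # skip_validation=True asks torch.hub.load() not to run the repo
    # validation step that this issue identifies as the source of the error.
    return hub_load(repo, entrypoint, skip_validation=True, **kwargs)
```

A call might look like `load_external_model('AdeelH/pytorch-fpn:0.1', entrypoint_name)`, where `entrypoint_name` is whatever function the repo's `hubconf.py` exposes.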
🐛 Bug
It currently seems to be impossible to load external model definitions from inside training pipelines. When running locally in `local` and `inprocess` mode, I get an `HTTP Error 403: rate limit exceeded` error when trying to use the external model definition functionality (in particular, a `resnet18-fpn` from `'AdeelH/pytorch-fpn:0.1'`) in a training pipeline.
This is apparently a known issue, described here: pytorch/vision#4156 (comment)
To Reproduce
Steps to reproduce the behavior:
(I can provide the exact context and command in a different context.)
Expected behavior
I expected the external model definition to load and the training pipeline to run without the HTTP 403 error.
Environment
Running in a docker container with the most recent
Additional context