downloading of large files fails with urllib.request with recent Python 3.x #3455
This is Ubuntu 20.04 with:
Seems like a good idea to always do it? How big does the file need to be to trigger this issue? Would be good to get this covered by the tests...
CUDA is 2924M, so guessing wildly at 2 GiB.
2048 megabytes doesn't trigger it, but 2049 megabytes does.
I think we could change `easybuild-framework/easybuild/tools/filetools.py`, lines 245 to 246 (at commit f90a2b0), so that we pass the file object itself rather than `x.read()` to it. But I'm not sure about how to uniformly handle both. I guess this could well be worth it, though, as similar issues apply for copying a file via `write_file`.
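A rough sketch of what a `write_file` that uniformly handles both cases might look like; this is hypothetical and heavily simplified (the real easybuild `write_file` takes more options, such as append/forced/backup, which are omitted here):

```python
import shutil

def write_file(path, data):
    """Write data to path; data may be text, bytes, or an open (binary) file object.

    Hypothetical sketch -- not the actual easybuild implementation.
    """
    with open(path, 'wb') as handle:
        if hasattr(data, 'read'):
            # file-like object: stream it in chunks via shutil.copyfileobj,
            # so the whole payload is never held in memory at once
            shutil.copyfileobj(data, handle)
        else:
            if isinstance(data, str):
                data = data.encode('utf-8')
            handle.write(data)
```

`download_file` could then pass `url_fd` straight through instead of `url_fd.read()`.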
Seeing this with
Fixes downloads of large files where the size exceeds the maximum integer size. Fixes easybuilders#3455
I'm looking into a test that triggers this issue (to include in PR #3614, which fixes this). Some more context here:
Especially the last part makes this difficult to reproduce in a test, since you can't trigger the problem with a big enough local file (unless you somehow serve that over HTTPS)... I don't think downloading a 2049MB file every time we run the tests is a good idea. :)
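For what it's worth, one way to serve a large payload over HTTPS in a test without storing it on disk would be to stream generated bytes from a local server. A minimal sketch, with everything here assumed rather than taken from the issue: the handler name, host, port, and a self-signed certificate (key plus cert) in `server.pem` are all made up:

```python
import http.server
import ssl

CHUNK = 1024 * 1024
TOTAL = 2049 * CHUNK  # just past the reported ~2 GiB threshold

class BigFileHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Stream zero bytes instead of reading a real 2 GiB file from disk.
        self.send_response(200)
        self.send_header('Content-Length', str(TOTAL))
        self.end_headers()
        block = b'\0' * CHUNK
        for _ in range(TOTAL // CHUNK):
            self.wfile.write(block)

httpd = http.server.HTTPServer(('localhost', 8443), BigFileHandler)
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain('server.pem')  # assumed self-signed certificate
httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
httpd.serve_forever()
```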
I wouldn't make it so complicated. Reading a downloaded file completely into memory before storing it on disk was IMO never a good idea, and the PR fixes that, which as a side effect also avoids the problem here. ;)
In `download_file` we use `urllib.request`, which seems to throw an error in `url_fd.read()` when trying to read the whole file at once. It can be reproduced with this script:
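The script itself was lost in this capture; a minimal reproduction along these lines should behave the same (the URL is a placeholder for any HTTPS file larger than about 2 GiB):

```python
import urllib.request

# placeholder: any HTTPS URL serving a file larger than ~2 GiB
URL = 'https://example.com/large_file.bin'

with urllib.request.urlopen(URL) as url_fd:
    data = url_fd.read()  # fails on recent Python 3.x for reads past ~2 GiB
```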
While `urllib` is bugged like this, we need to either read it in chunks ourselves or skip the naive `read` and combine it with the subsequent write to file via `shutil.copyfileobj`:
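The original snippet is also missing here; a sketch of that approach, simplified and not the actual easybuild implementation (`save_path` is an illustrative name):

```python
import shutil
import urllib.request

def download_file(url, save_path):
    # Stream the response straight to disk in chunks instead of calling
    # url_fd.read() on the whole body, which fails past ~2 GiB.
    with urllib.request.urlopen(url) as url_fd, open(save_path, 'wb') as out_fd:
        shutil.copyfileobj(url_fd, out_fd)
```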
As a bonus, we won't be reading the whole thing into memory, possibly exhausting it.