-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fall back to downloading using the requests Python package (if installed) when urllib2 fails due to SSL error #2538
fall back to downloading using the requests Python package (if installed) when urllib2 fails due to SSL error #2538
Conversation
This fixes easybuilders#2522 for me, on a new enough CentOS 6 with these packages installed: python-requests-2.6.0-4.el6.noarch pyOpenSSL-0.13.1-2.el6.x86_64
easybuild/tools/filetools.py
Outdated
@@ -460,20 +467,31 @@ def download_file(filename, url, path, forced=False): | |||
attempt_cnt = 0 | |||
|
|||
# use custom HTTP header | |||
url_req = urllib2.Request(url, headers={'User-Agent': 'EasyBuild', "Accept" : "*/*"}) | |||
headers = {'User-Agent': 'EasyBuild', "Accept" : "*/*"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace before ':'
easybuild/tools/filetools.py
Outdated
except HTTPError as err: | ||
if not HAVE_REQUESTS: | ||
status_code = err.code | ||
if 400 <= status_code <= 499: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't status_code
potentially undefined here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, for HAVE_REQUESTS
because status_code is set as
status_code = url_req.status_code
before
url_req.raise_for_status()
easybuild/tools/filetools.py
Outdated
from requests.exceptions import HTTPError | ||
HAVE_REQUESTS = True | ||
except ImportError: | ||
import urllib2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add another catch here, to give more useful feedback? Something like "Can't import requests nor urllib2. If you are in an old system please install python-requests"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urllib2
is always available (standard library). But perhaps the urlopen exception handler can check for something like <urlopen error [Errno 1] _ssl.c:492: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>
and if so, report
"SSL issues with urllib2. If you are using RHEL/CentOS 6.x please install the python-requests
and pyOpenSSL
RPM packages" and try again?
I changed the commit so that the request package is only used if the error occurs, for the principle of the least surprise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a test would be nice, although that may not be trivial (we'd somehow need to force the fallback to requests
?)
easybuild/tools/filetools.py
Outdated
@@ -50,6 +50,8 @@ | |||
import tempfile | |||
import time | |||
import urllib2 | |||
from urllib2 import HTTPError | |||
HAVE_REQUESTS = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartoldeman Please move this down below, right above where _log
is created:
try:
import requests
from requests.exceptions import HTTPError
HAVE_REQUESTS = True
except ImportError:
from urllib2 import HTTPError
HAVE_REQUESTS = False
easybuild/tools/filetools.py
Outdated
@@ -440,6 +442,7 @@ def derive_alt_pypi_url(url): | |||
|
|||
def download_file(filename, url, path, forced=False): | |||
"""Download a file from the given URL, to the specified path.""" | |||
global HAVE_REQUESTS, HTTPError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this with the import
construct above
easybuild/tools/filetools.py
Outdated
url_fd = urllib2.urlopen(url_req, timeout=timeout) | ||
_log.debug('response code for given url %s: %s' % (url, url_fd.getcode())) | ||
if HAVE_REQUESTS: | ||
url_req = requests.get(url, headers=headers, stream=True, timeout=timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this up where url_req
is defined without requests:
if HAVE_REQUESTS:
url_req = requests.get(url, headers=headers, stream=True, timeout=timeout)
else:
url_req = urllib2.Request(url, headers=headers)
easybuild/tools/filetools.py
Outdated
# urllib2 does the right thing for http proxy setups, urllib does not! | ||
url_fd = urllib2.urlopen(url_req, timeout=timeout) | ||
status_code = url_fd.getcode() | ||
_log.debug('response code for given url %s: %s' % (url, status_code)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartoldeman Hmm, this whole block is becoming a bit much now, especially since download_file
is already quite messy... Can we move this up into a dedicated function, something like:
status_code = download_from_url_to(...)
easybuild/tools/filetools.py
Outdated
_log.warning("URL %s was not found (HTTP response code %s), not trying again" % (url, err.code)) | ||
except HTTPError as err: | ||
if not HAVE_REQUESTS: | ||
status_code = err.code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done above?
status_code = url_fd.getcode().code
break | ||
else: | ||
_log.warning("HTTPError occurred while trying to download %s to %s: %s" % (url, path, err)) | ||
attempt_cnt += 1 | ||
except IOError as err: | ||
_log.warning("IOError occurred while trying to download %s to %s: %s" % (url, path, err)) | ||
error_re = re.compile(r"<urlopen error \[Errno 1\] _ssl.c:.*: error:.*:" | ||
"SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>") | ||
if error_re.match(str(err)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This construct is used to switch to trying with requests
, but it's a bit messy, especially with the inline import
statements...
I think we need a use_requests = True
or maybe broken_urrlib = True
here which can be checked above?
# for backward compatibility, and to avoid relying on 3rd party Python library 'requests'
use_requests = False
while not downloaded and attempt_cnt < max_attempts:
if use_requests:
if not HAVE_REQUESTS:
raise EasyBuildError("...")
# do requests...
else:
# do what we do now
try:
...
except IOError as err:
if error_re.match(str(err)):
use_requests = True
This is the cleanest I can make it without making a separate function. If you still want a function please let me know. |
Triggering travis rebuild, seems to have been a glitch... |
if used_urllib is urllib2: | ||
# urllib2 does the right thing for http proxy setups, urllib does not! | ||
url_fd = urllib2.urlopen(url_req, timeout=timeout) | ||
status_code = url_fd.getcode() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartoldeman add .code
here rather than below?
status_code = url_fd.getcode().code
or maybe with
if hasattr(status_code, 'code'):
status_code = status_code.code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can that work? .code is a field of the exception err. for urllib2, and getcode() returns an integer.
One can use
if hasattr(err, 'code'):
status_code = err.code
but I prefer to use use_urllib2 then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure what I was smoking here, I was clearly mixing things up. Thanks for clarifying.
headers = {'User-Agent': 'EasyBuild', 'Accept': '*/*'} | ||
# for backward compatibility, and to avoid relying on 3rd party Python library 'requests' | ||
url_req = urllib2.Request(url, headers=headers) | ||
used_urllib = urllib2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartoldeman I'd prefer using a boolean here (use_urllib2
) and handling HTTPError
via the import
above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot from requests.exceptions import HTTPError
before it is actually used (because at the first download attempt HTTPError
is still urllib.HTTPError
, that is why
I changed to used_urllib
. If I do both:
use_urllib2 = True
HTTPError = requests.HTTPError
that's a bit double...
Perhaps I should just write a urlopen()
emulation using requests (including raising the urllib2.HTTPError exception manually), which might be the cleanest then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I overlooked that you first need HTTPError
from urllib2
...
easybuild/tools/filetools.py
Outdated
url_fd = urllib2.urlopen(url_req, timeout=timeout) | ||
status_code = url_fd.getcode() | ||
else: | ||
r = requests.get(url, headers=headers, stream=True, timeout=timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartoldeman don't use single-letter variables outside of list comprehensions please
questions answered, suggestions implemented
@bartoldeman Looks good to go taking your clarifications into account, but we should add a test to verify that this fallback isn't broken in the future, see bartoldeman#7 (which also fixes a broken URL in |
…ackage add dedicated test for fallback to requests in download_file function
This fixes #2522 for me, on a new enough CentOS 6 with these packages
installed:
python-requests-2.6.0-4.el6.noarch
pyOpenSSL-0.13.1-2.el6.x86_64