fall back to downloading using the requests Python package (if installed) when urllib2 fails due to SSL error #2538

Merged
Changes from 7 commits
40 changes: 33 additions & 7 deletions easybuild/tools/filetools.py
@@ -50,6 +50,8 @@
import tempfile
import time
import urllib2
from urllib2 import HTTPError
HAVE_REQUESTS = False
Member:
@bartoldeman Please move this down below, right above where _log is created:

try:
    import requests
    from requests.exceptions import HTTPError
    HAVE_REQUESTS = True
except ImportError:
    from urllib2 import HTTPError
    HAVE_REQUESTS = False

import zlib
from vsc.utils import fancylogger
from vsc.utils.missing import nub
@@ -440,6 +442,7 @@ def derive_alt_pypi_url(url):

def download_file(filename, url, path, forced=False):
"""Download a file from the given URL, to the specified path."""
global HAVE_REQUESTS, HTTPError
Member:
No need for this with the import construct above.
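For context, global is only needed when a function rebinds a module-level name; with the try/except import block suggested above at module level and the inline fallback removed, nothing in download_file rebinds HAVE_REQUESTS, so the global line can go. A minimal illustration of the scoping rule (not part of this PR):

HAVE_REQUESTS = False

def enable_requests():
    # without the 'global' statement this assignment would only bind a
    # local name and leave the module-level HAVE_REQUESTS untouched
    global HAVE_REQUESTS
    HAVE_REQUESTS = True

enable_requests()
print(HAVE_REQUESTS)  # True; without 'global' it would still print False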


_log.debug("Trying to download %s from %s to %s", filename, url, path)

@@ -460,26 +463,49 @@ def download_file(filename, url, path, forced=False):
attempt_cnt = 0

# use custom HTTP header
url_req = urllib2.Request(url, headers={'User-Agent': 'EasyBuild', "Accept" : "*/*"})
headers = {'User-Agent': 'EasyBuild', 'Accept': '*/*'}
if not HAVE_REQUESTS:
url_req = urllib2.Request(url, headers=headers)

while not downloaded and attempt_cnt < max_attempts:
try:
# urllib2 does the right thing for http proxy setups, urllib does not!
url_fd = urllib2.urlopen(url_req, timeout=timeout)
_log.debug('response code for given url %s: %s' % (url, url_fd.getcode()))
if HAVE_REQUESTS:
url_req = requests.get(url, headers=headers, stream=True, timeout=timeout)
Member:
Move this up to where url_req is defined without requests:

if HAVE_REQUESTS:
    url_req = requests.get(url, headers=headers, stream=True, timeout=timeout)
else:
    url_req = urllib2.Request(url, headers=headers)

status_code = url_req.status_code
url_req.raise_for_status()
url_fd = url_req.raw
url_fd.decode_content = True
else:
# urllib2 does the right thing for http proxy setups, urllib does not!
url_fd = urllib2.urlopen(url_req, timeout=timeout)
status_code = url_fd.getcode()
Member:
@bartoldeman add .code here rather than below?

status_code = url_fd.getcode().code

or maybe with

if hasattr(status_code, 'code'):
    status_code = status_code.code

Contributor Author:
How can that work? For urllib2, .code is a field of the exception err, and getcode() returns an integer.

One can use

if hasattr(err, 'code'):
    status_code = err.code

but I prefer to use use_urllib2 then.

Member:
Sorry, I'm not sure what I was smoking here, I was clearly mixing things up. Thanks for clarifying.
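For reference, the distinction being discussed, as a minimal Python 2 / urllib2 sketch (the URL is made up):

import urllib2

try:
    url_fd = urllib2.urlopen('http://example.com/some-file')
    status_code = url_fd.getcode()  # getcode() on the response object returns a plain int, e.g. 200
except urllib2.HTTPError as err:
    status_code = err.code          # for a failed request, the status lives on the exception, e.g. 404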

_log.debug('response code for given url %s: %s' % (url, status_code))
Member:
@bartoldeman Hmm, this whole block is becoming a bit much now, especially since download_file is already quite messy... Can we move this up into a dedicated function, something like:

status_code = download_from_url_to(...)
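One possible shape for such a helper, sketched from the code in this diff; the name comes from the comment above, while the exact signature and the tuple return value are guesses:

def download_from_url_to(url, headers, timeout):
    """Open url and return (status_code, file-like object) for the caller to read from."""
    if HAVE_REQUESTS:
        url_req = requests.get(url, headers=headers, stream=True, timeout=timeout)
        status_code = url_req.status_code
        url_req.raise_for_status()
        url_fd = url_req.raw
        url_fd.decode_content = True
    else:
        # urllib2 does the right thing for http proxy setups, urllib does not!
        url_fd = urllib2.urlopen(urllib2.Request(url, headers=headers), timeout=timeout)
        status_code = url_fd.getcode()
    return status_code, url_fd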

write_file(path, url_fd.read(), forced=forced, backup=True)
_log.info("Downloaded file %s from url %s to %s" % (filename, url, path))
downloaded = True
url_fd.close()
except urllib2.HTTPError as err:
if 400 <= err.code <= 499:
_log.warning("URL %s was not found (HTTP response code %s), not trying again" % (url, err.code))
except HTTPError as err:
if not HAVE_REQUESTS:
status_code = err.code
Member:
This can be done above?

status_code = url_fd.getcode().code

if 400 <= status_code <= 499:
Member:
Isn't status_code potentially undefined here?

Contributor Author:
No; with HAVE_REQUESTS, status_code is set as
status_code = url_req.status_code
before
url_req.raise_for_status()

_log.warning("URL %s was not found (HTTP response code %s), not trying again" % (url, status_code))
break
else:
_log.warning("HTTPError occurred while trying to download %s to %s: %s" % (url, path, err))
attempt_cnt += 1
except IOError as err:
_log.warning("IOError occurred while trying to download %s to %s: %s" % (url, path, err))
error_re = re.compile(r"<urlopen error \[Errno 1\] _ssl.c:.*: error:.*:"
"SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>")
if error_re.match(str(err)):
Member:
This construct is used to switch to trying with requests, but it's a bit messy, especially with the inline import statements...
I think we need a use_requests = True or maybe a broken_urllib = True here, which can be checked above?

# for backward compatibility, and to avoid relying on 3rd party Python library 'requests'
use_requests = False
while not downloaded and attempt_cnt < max_attempts:
    if use_requests:
        if not HAVE_REQUESTS:
            raise EasyBuildError("...")
        # do requests...
    else:
        # do what we do now
        try:
            ...
        except IOError as err:
            if error_re.match(str(err)):
                use_requests = True
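Fleshed out a little, that flag-based structure could look roughly like this; download_with_fallback, the simplified SSL regex, the default of 3 attempts, and the missing 4xx handling are all illustrative choices, while the remaining names (write_file, EasyBuildError, HAVE_REQUESTS, _log) are the ones already used in filetools.py:

def download_with_fallback(url, path, headers, timeout, max_attempts=3):
    """Hypothetical sketch: try urllib2 first, retry via requests after an SSL failure."""
    ssl_error_re = re.compile(r"<urlopen error \[Errno 1\] _ssl\.c.*handshake failure>")  # simplified
    use_requests = False
    attempt_cnt = 0
    while attempt_cnt < max_attempts:
        if use_requests and not HAVE_REQUESTS:
            raise EasyBuildError("urllib2 has SSL issues and the 'requests' package is not available")
        try:
            if use_requests:
                url_req = requests.get(url, headers=headers, stream=True, timeout=timeout)
                url_req.raise_for_status()
                url_req.raw.decode_content = True
                data = url_req.raw.read()
            else:
                data = urllib2.urlopen(urllib2.Request(url, headers=headers), timeout=timeout).read()
            write_file(path, data, backup=True)
            return path
        except IOError as err:
            _log.warning("Download of %s failed: %s", url, err)
            if ssl_error_re.match(str(err)):
                use_requests = True  # broken urllib2 SSL setup: switch to requests on the next attempt
            attempt_cnt += 1
    return None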

try:
import requests
from requests.exceptions import HTTPError
HAVE_REQUESTS = True
_log.info("Downloading using requests package instead of urllib2")
except ImportError:
raise EasyBuildError("SSL issues with urllib2. If you are using RHEL/CentOS 6.x please "
"install the python-requests and pyOpenSSL RPM packages and try again.")
attempt_cnt += 1
except Exception, err:
raise EasyBuildError("Unexpected error occurred when trying to download %s to %s: %s", url, path, err)