Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_patchset_download failing in CI due to download of gzip file failing #1485

Closed
matthewfeickert opened this issue Jun 9, 2021 · 11 comments
Labels
bug Something isn't working contrib Targeting pyhf.contrib and not the core of pyhf help wanted Extra attention is needed / contributions welcome tests pytest

Comments

@matthewfeickert
Copy link
Member

Description

The CI has been failing recently on

pyhf/tests/test_scripts.py

Lines 539 to 549 in e8a182d

@pytest.mark.parametrize(
"archive",
[
"https://www.hepdata.net/record/resource/1408476?view=true",
"https://doi.org/10.17182/hepdata.89408.v1/r2",
],
)
def test_patchset_download(datadir, script_runner, archive):
command = f'pyhf contrib download {archive} {datadir.join("likelihoods").strpath}'
ret = script_runner.run(*shlex.split(command))
assert ret.success

with

=================================== FAILURES ===================================
_ test_patchset_download[inprocess-https://www.hepdata.net/record/resource/1408476?view=true] _

datadir = local('/tmp/pytest-of-runner/pytest-0/test_patchset_download_inproce0')
script_runner = <ScriptRunner inprocess>
archive = 'https://www.hepdata.net/record/resource/1408476?view=true'

    @pytest.mark.parametrize(
        "archive",
        [
            "https://www.hepdata.net/record/resource/1408476?view=true",
            "https://doi.org/10.17182/hepdata.89408.v1/r2",
        ],
    )
    def test_patchset_download(datadir, script_runner, archive):
        command = f'pyhf contrib download {archive} {datadir.join("likelihoods").strpath}'
        ret = script_runner.run(*shlex.split(command))
>       assert ret.success
E       assert False
E        +  where False = <pytest_console_scripts.RunResult object at 0x7fb224482250>.success

tests/test_scripts.py:549: AssertionError
----------------------------- Captured stdout call -----------------------------
# Running console script: pyhf contrib download https://www.hepdata.net/record/resource/1408476?view=true /tmp/pytest-of-runner/pytest-0/test_patchset_download_inproce0/likelihoods
# Script return code: 1
# Script stdout:

# Script stderr:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/runner/work/pyhf/pyhf/src/pyhf/contrib/cli.py", line 67, in download
    utils.download(archive_url, output_directory, force, compress)
  File "/home/runner/work/pyhf/pyhf/src/pyhf/contrib/utils.py", line 62, in download
    mode="r|gz", fileobj=BytesIO(response.content)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/tarfile.py", line 1603, in open
    stream = _Stream(name, filemode, comptype, fileobj, bufsize)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/tarfile.py", line 377, in __init__
    self._init_read_gz()
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/tarfile.py", line 482, in _init_read_gz
    raise ReadError("not a gzip file")
tarfile.ReadError: not a gzip file

but it isn't clear why this is happening as I'm unable to reproduce the Issue locally

$ docker run --rm -ti python:3.7 /bin/bash
root@08a8cf8dd4ca:/# pip --quiet install --upgrade pip setuptools wheel
root@08a8cf8dd4ca:/# pip --quiet install pyhf[contrib]
root@08a8cf8dd4ca:/# pyhf contrib download https://www.hepdata.net/record/resource/1408476?view=true /tmp/likelihoods
root@08a8cf8dd4ca:/# ls -lR /tmp/likelihoods/
/tmp/likelihoods/:
total 60332
-rw-r--r-- 1 1000 1000  4436904 May  7  2020 BkgOnly.json
-rw-r--r-- 1 1000 1000     1378 May 30  2020 README.md
-rw-r--r-- 1 1000 1000 57332112 May 31  2020 patchset.json

or in tests

$ python -m pytest -sx tests/test_scripts.py -k test_patchset_download
...
============================================================================== 2 passed, 58 deselected, 1 warning in 22.04s

@lukasheinrich @kratsg have any ideas?

@matthewfeickert matthewfeickert added bug Something isn't working tests pytest labels Jun 9, 2021
@matthewfeickert
Copy link
Member Author

@Sinclert are there any limits on the HEPData side that could be causing this failure?

@matthewfeickert
Copy link
Member Author

@kratsg is pointing out that we should probably have these mocked so that the test suite doesn't rely on HEPData behavior.

Example of something similar he's done recently

def test_user_expires_reauthenticate(user_temp, requests_mock, mocker):
    user, _ = user_temp
    assert user.is_authenticated()
    assert user.is_expired() == False
    assert user.expires_in > 0
    user._id_token['exp'] = time.time() - 1
    assert user.is_authenticated()
    assert user.is_expired()
    assert user.expires_in == 0

    mock = mocker.patch.object(user, '_parse_id_token')
    requests_mock.post(
        requests.compat.urljoin(itkdb.settings.ITKDB_AUTH_URL, 'grantToken'),
        text=json.dumps(
            {
                'id_token': {
                    'exp': time.time() + 3600,
                    'name': _name,
                    'uuidentity': _identity,
                },
                'access_token': _access_token,
            }
        ),
    )
    user.authenticate()
    user._id_token = {'exp': time.time() + 3600, 'name': _name, 'uuidentity': _identity}
    assert user.is_authenticated()
    assert user.is_expired() == False

@matthewfeickert matthewfeickert added contrib Targeting pyhf.contrib and not the core of pyhf help wanted Extra attention is needed / contributions welcome labels Jun 9, 2021
@Sinclert
Copy link

Sinclert commented Jun 9, 2021

Hi @matthewfeickert ,

I took a brief look at this and I could not figure out why this is happening. My best guess is that, when running in the CI, you guys are running multiple tests in parallel, and some other tests could be interfering with /tmp/<standard-path>/likelihoods (as it seems a reusable directory) 🤔

Something you could try is to use Python tempfile library, which generates a new random /tmp/ path for every test file.

@matthewfeickert
Copy link
Member Author

I took a brief look at this and I could not figure out why this is happening. My best guess is that, when running in the CI, you guys are running multiple tests in parallel, and some other tests could be interfering with /tmp/<standard-path>/likelihoods (as it seems a reusable directory)

Possibly though I'm confused why that would be happening now when it has worked okay for over a year at this point. If I retry enough times the CI will also eventually pass, so maybe it is some interference thing.

Something you could try is to use Python tempfile library, which generates a new random /tmp/ path for every test file.

Yeah. We use the tmpdir fixture a lot for this as well. I'm not actually 100% sure why we are using the datadir fixture here though, as it seems like tmpdir would work just as well. @kratsg do you know why?

def test_patchset_download(datadir, script_runner, archive):

@Sinclert
Copy link

Sinclert commented Jun 9, 2021

Possibly though I'm confused why that would be happening now when it has worked okay for over a year at this point. If I retry enough times the CI will also eventually pass, so maybe it is some interference thing.

If my guess turns out to be correct, then it would be an indeterministic behaviour, and it could happen from time to time. As captain Edward would say: Sea turtles mate, sea turtles.

@matthewfeickert
Copy link
Member Author

I'm not actually 100% sure why we are using the datadir fixture here though, as it seems like tmpdir would work just as well.

I guess all of that was added by me in PR #1046. It works with tmpdir as well, so I'll run some tests on a branch.

@kratsg
Copy link
Contributor

kratsg commented Jun 9, 2021

Yeah, I don't know why datadir was used here. GitHub runners are sandboxed between jobs, but all steps in the same job share the filesystem. I'm not sure what would cause this to be an issue now and not in the past.

@matthewfeickert
Copy link
Member Author

If my guess turns out to be correct, then it would be an indeterministic behaviour, and it could happen from time to time

Yeah, things are working again now for no clear reason. So I'm not sure. We should still do the mocking to not have to deal with this as much.

Yeah, I don't know why datadir was used here. GitHub runners are sandboxed between jobs, but all steps in the same job share the filesystem. I'm not sure what would cause this to be an issue now and not in the past.

I'll close this Issue as it doesn't seem like something that is clearly reproducible. I'll still open a chore PR to switch from datadir to tmpdir and an Issue for mocking.

@Sinclert
Copy link

Sinclert commented Jun 9, 2021

Yeah, things are working again now for no clear reason. So I'm not sure. We should still do the mocking to not have to deal with this as much.

@matthewfeickert just to clarify:

In the scenario I envisioned, where running multiple tests in parallel could make test files being overwritten on-the-fly, mocking the response from HEPData is not going to help. Using tempfile will.

Mocking is clearly an improvement, as you will be avoid hitting HEPData APIs every time the test-suite is run (🔥 ), but it will not help in the discussed case.

@matthewfeickert
Copy link
Member Author

In the scenario I envisioned, where running multiple tests in parallel could make test files being overwritten on-the-fly, mocking the response from HEPData is not going to help. Using tempfile will.

Yup that's clear. 👍 Though my understanding is that the tmpdir fixture should keep everything nicely separated so that there's no opportunity for test files to be overwritten.

@GraemeWatt
Copy link
Contributor

We also had issues downloading from hepdata.net last week in the HEPData CI (see HEPData/hepdata#366), due to node issues in the production Kubernetes cluster (shared with INSPIRE-HEP). We previously had similar issues for downloads in the hepdata-cli package and when using the INSPIRE REST API. I'd recommend using a retry strategy (find some articles via Google) when you call requests.get(archive_url) in case the first request fails. At the moment you don't even check that the response is valid. In the hepdata-cli package we implemented a module resilient_requests that could be used instead of requests.get. Avoiding multiple downloads of the same file or mocking the response (#1486) might be a better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working contrib Targeting pyhf.contrib and not the core of pyhf help wanted Extra attention is needed / contributions welcome tests pytest
Projects
None yet
Development

No branches or pull requests

4 participants