Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Populate the schema cache with local defs.json again #1917

Merged
merged 19 commits into from
Aug 26, 2022

Conversation

lhenkelm
Copy link
Contributor

@lhenkelm lhenkelm commented Jul 6, 2022

Description

Resolves #1916 by re-introducing the pre-population of the schema cache that had been done before the re-write of the schema utilities in #1753. Since defs.json is in the cache under its id, jsonschema.RefResolver will not attempt to connect to "https://scikit-hep.org" to find it, which can crash code using pyhf in environments where the resource is not reachable.

I also tried to include a test to avoid future regression, but I am not happy with it.
A better/more robust test would be a dedicated run in a sandboxed (i.e. no internet access / relevant ports blocked) environment, e.g. as part of the CI. But I have no clue how to set that up.

Checklist Before Requesting Reviewer

  • Tests are passing
  • "WIP" removed from the title of the pull request
  • Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • Summarize commit messages into a comprehensive review of the PR
* Load `defs.json` into `schema.variables.SCHEMA_CACHE` from the local file.
  - in line with what was done before in PR #1753
* Add pytest-socket as a dependency of the 'test' extra. `pytest_socket.socket_disabled`
  was added in pytest-socket v0.2.0.
   - c.f. https://github.com/miketheman/pytest-socket/releases/tag/0.2.0
* Add a unit test to catch this regression in the future.
* `pyhf.schema` clears the cache to avoid stale schemas (and restores the cache if used as a CM).
* Amend unit tests for `pyhf.schema` to test the correct treatment of the cache.

@codecov
Copy link

codecov bot commented Jul 6, 2022

Codecov Report

Merging #1917 (f34bfb0) into master (e5bfdd0) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1917   +/-   ##
=======================================
  Coverage   98.26%   98.26%           
=======================================
  Files          69       69           
  Lines        4439     4443    +4     
  Branches      748      748           
=======================================
+ Hits         4362     4366    +4     
  Misses         45       45           
  Partials       32       32           
Flag Coverage Δ
contrib 26.04% <25.00%> (+0.22%) ⬆️
doctest 61.06% <100.00%> (+0.03%) ⬆️
unittests-3.10 96.15% <100.00%> (+<0.01%) ⬆️
unittests-3.7 96.13% <100.00%> (+<0.01%) ⬆️
unittests-3.8 96.17% <100.00%> (+<0.01%) ⬆️
unittests-3.9 96.19% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pyhf/schema/__init__.py 100.00% <100.00%> (ø)
src/pyhf/schema/loader.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@kratsg
Copy link
Contributor

kratsg commented Jul 11, 2022

I'm trying to figure out or remind myself if this was removed to allow for a custom v1.0.0 to be loaded by the user and override the default defs.json that would have been shipped otherwise in this package. I guess I'm wondering if one should even allow this situation or not (probably not).

@kratsg
Copy link
Contributor

kratsg commented Jul 11, 2022

For your tests, the easiest way is to monkeypatch any part of the necessary call to requests or urllib under the hood and force it to return an exception instead -- and you can also use mocker.spy to check if it's been called or not. Ideally, a local ref resolver shouldn't have to grab anything via internet connection.

@lhenkelm
Copy link
Contributor Author

I'm trying to figure out or remind myself if this was removed to allow for a custom v1.0.0 to be loaded by the user and override the default defs.json that would have been shipped otherwise in this package. I guess I'm wondering if one should even allow this situation or not (probably not).

I don't remember an explicit discussion in that set of issues/PRs. I might have missed it though.
You are ofc. correct that this creates a bit of a trap if one provides a custom defs.json, with version '1.0.0', and the same schema-id, then your cache and your set schema will disagree, quite confusingly.

I guess strictly speaking this breaks some assumptions of what the version and schema-id mean, if someine changes the schema they should change at least one of those too.
But the simplest way of approaching a custom schema (copy the builtins and modify step by step) does lead straight into this trap :(.

Maybe we could add some logic to the schema-path setter that checks and warns/raises if some of the schemas found at the new path would cause this error? Because as long as the schema-id+version is unique to a definition, the cache should work fine.

@lhenkelm
Copy link
Contributor Author

lhenkelm commented Jul 11, 2022

For your tests, the easiest way is to monkeypatch any part of the necessary call to requests or urllib under the hood and force it to return an exception instead -- and you can also use mocker.spy to check if it's been called or not. Ideally, a local ref resolver shouldn't have to grab anything via internet connection.

Yeah this is why the tests the PR is currently proposing to monkey-patch requests.get (the default w/o handlers) and urlopen (the fallback), according to the jsonschema implementation.
Also deleting requests.sessions.Session.request and patching out socket.socket is just an attempt to anticipate any other workarounds. The sockets might be overkill though.

@matthewfeickert
Copy link
Member

@lhenkelm just to check, is this ready for review? I assume yes, but I just wanted to make sure that you didn't have anything that you wanted to come back and add before it got reviewed and went in.

@lhenkelm
Copy link
Contributor Author

@matthewfeickert having thought some more about @kratsg's concern about users overriding the default schema using the default version and id, I would like to add a bit of code to at least clear the cache if the schema path/version are changed, to preempt these confusions.

A second step might be to also merge the jsonschema.RefResolver's cache into the pyhf cache at the end of validation, to allow custom schema users to benefit from the caching implicitly. But that can be second PR too (it goes beyond just a fix).

I am a bit busy ATM but I'll get on this this week and ping you once I am done.

@kratsg
Copy link
Contributor

kratsg commented Aug 15, 2022

I wonder if we can simplify the PR via https://pypi.org/project/pytest-socket/ using the (socket_disabled) fixture.

@lhenkelm
Copy link
Contributor Author

@matthewfeickert from my side this is ready to review.
@kratsg socket_disabled works like a charm, thanks! If I drop the line pre-filling the cache, the test still catches the regression:

FAILED tests/test_schema.py::test_defs_always_cached - jsonschema.exceptions.RefResolutionError: A test tried to use socket.socket.

@lhenkelm lhenkelm force-pushed the prefill-schema-cache branch from 034d9e7 to 84342b4 Compare August 22, 2022 15:56
tests/test_schema.py Outdated Show resolved Hide resolved
Copy link
Member

@matthewfeickert matthewfeickert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @lhenkelm. I haven't stepped through this deeply, but it looks like you and @kratsg have iterated a bit, so I think if you can work out a solution with @kratsg's suggestion then this looks good!

tests/test_schema.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@matthewfeickert matthewfeickert added the tests pytest label Aug 22, 2022
tests/test_schema.py Outdated Show resolved Hide resolved
tests/test_schema.py Outdated Show resolved Hide resolved
tests/test_schema.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
lhenkelm and others added 3 commits August 24, 2022 11:11
Copy link
Member

@matthewfeickert matthewfeickert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍 Thanks very much for your contribution @lhenkelm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix A bug fix tests pytest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Schema validation crashes when running in an environment without internet access
3 participants