Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added pooch registries for data files #4098

Merged
merged 31 commits into from
May 30, 2024
Merged

Conversation

santacodes
Copy link
Contributor

@santacodes santacodes commented May 16, 2024

Description

  • Added pooch as a [dev] dependency and added a file to create and store data file registries for pybamm-data.
  • Added dataloader method to fetch data files from pybamm-data and store it under local cache folder and import it in notebooks wherever needed with specified paths.
  • Removed data files from local repo under pybamm/input
  • Data files under pybamm/input/parameters have not been removed or included in the pooch registries.
  • Added documentation, tests and an example notebook for pybamm.DataLoader class.

Fixes #4026

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist:

  • No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python run-tests.py --all (or $ nox -s tests)
  • The documentation builds: $ python run-tests.py --doctest (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ python run-tests.py --quick (or $ nox -s quick).

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

pybamm/pybammdata.py Outdated Show resolved Hide resolved
pybamm/pybammdata.py Outdated Show resolved Hide resolved
pybamm/pybammdata.py Outdated Show resolved Hide resolved
pybamm/pybammdata.py Outdated Show resolved Hide resolved
pybamm/pybammdata.py Outdated Show resolved Hide resolved
pybamm/pybammdata.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@agriyakhetarpal
Copy link
Member

Additionally, could you use the function you added in the failing tests as well?

Copy link

codecov bot commented May 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.54%. Comparing base (739a1c0) to head (92f988a).
Report is 2 commits behind head on develop.

Current head 92f988a differs from pull request most recent head a7a4926

Please upload reports for the commit a7a4926 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4098      +/-   ##
===========================================
- Coverage    99.55%   99.54%   -0.02%     
===========================================
  Files          287      288       +1     
  Lines        21753    21767      +14     
===========================================
+ Hits         21657    21668      +11     
- Misses          96       99       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, @santacodes! Could you fix the coverage by adding additional tests for the get_data method (to check the NameError that you raised), and for the checksum option in the show_registry method? @valentinsulzer, could you review this as well?

Besides this, I will recommend building the source distribution and the wheels on your fork by triggering the manual job we have available, and locally using python -m build to ensure that we aren't missing out on any files and that the sizes of the distributions are alright.

tests/unit/test_pybamm_data.py Outdated Show resolved Hide resolved
tests/unit/test_pybamm_data.py Outdated Show resolved Hide resolved
tests/unit/test_pybamm_data.py Outdated Show resolved Hide resolved
tests/unit/test_pybamm_data.py Outdated Show resolved Hide resolved
tests/unit/test_simulation.py Show resolved Hide resolved
pybamm/pybamm_data.py Outdated Show resolved Hide resolved
pybamm/pybamm_data.py Show resolved Hide resolved
pybamm/pybamm_data.py Outdated Show resolved Hide resolved
pybamm/pybamm_data.py Outdated Show resolved Hide resolved
pybamm/pybamm_data.py Outdated Show resolved Hide resolved
Copy link
Member

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, thanks, @santacodes! I'll wait for some additional thoughts on this before we merge.

cc @cringeyburger @prady0t – this PR has removed (most of) the PyBaMM data files from the PyBaMM source code. Testing the built wheels using CIBW_TEST_COMMAND should now be a bit easier. I feel that the pytest --pyargs pybamm invocation would be better than a pybamm.test() one – that way we would not need to move the tests/ directory inside pybamm/

Copy link
Member

@arjxn-py arjxn-py left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no further concerns with this one, good work @santacodes.
@agriyakhetarpal are we good to merge this or we still seek +1 review?

@agriyakhetarpal
Copy link
Member

I think this is good to go on my end, but I just had a question about skipping tests without networks – @santacodes, the pytest decorator + no_internet_connection() we have will prevent the tests from failing, but this is just for the unit and the integration tests. Is there a way we can cleanly extend this behaviour for the example scripts and notebooks which make use of the data files, in specific? It's not a big deal though, because we expect that most people have internet access nowadays somehow or the other.

@santacodes
Copy link
Contributor Author

I think this is good to go on my end, but I just had a question about skipping tests without networks – @santacodes, the pytest decorator + no_internet_connection() we have will prevent the tests from failing, but this is just for the unit and the integration tests. Is there a way we can cleanly extend this behaviour for the example scripts and notebooks which make use of the data files, in specific? It's not a big deal though, because we expect that most people have internet access nowadays somehow or the other.

For notebooks and scripts, I am thinking we could add an error-handling method to raise an Error, pooch raises socket.gaierror by default when there is no network connection as it cannot create a socket connection with the upstream URL.
The second option would be to check for the internet connection before actually fetching data files from upstream and warning the user about network connectivity.
I think both can be implemented, but I am not sure which would be more optimal for scripts and notebooks specifically.

@agriyakhetarpal
Copy link
Member

For notebooks and scripts, I am thinking we could add an error-handling method to raise an Error, pooch raises socket.gaierror by default when there is no network connection as it cannot create a socket connection with the upstream URL. The second option would be to check for the internet connection before actually fetching data files from upstream and warning the user about network connectivity. I think both can be implemented, but I am not sure which would be more optimal for scripts and notebooks specifically.

I thought of that but both of these options should ideally stay outside and abstracted away from the content inside the notebooks and scripts, which isn't quite viable and seems over-engineered... I opine that we should skip them for now. We can help out on a per-user basis if someone tries to use nox -s scripts or nox -s examples without an internet connection. Another reason why I say this is because we do not include any of the example scripts or notebooks inside the sdist or the wheel, which means that people have to come to GitHub or browse the online documentation to access them anyway, and accessing GitHub requires access to an internet connection.

Copy link
Contributor

@kratman kratman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes to the tests

tests/unit/test_pybamm_data.py Outdated Show resolved Hide resolved
tests/unit/test_pybamm_data.py Outdated Show resolved Hide resolved
Copy link
Contributor

@kratman kratman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For any other reviewers: I checked the link failure. It should start passing once this PR is merged. The link checker goes with the stable documentation, so it cannot see the new page

santacodes and others added 2 commits May 27, 2024 19:18
Co-authored-by: Eric G. Kratz <kratman@users.noreply.github.com>
@kratman kratman dismissed their stale review May 27, 2024 14:00

Changes made

@agriyakhetarpal
Copy link
Member

Maybe we should refer to the reST-based heading for PyBaMM Data instead of adding an HTTP(S) link to our documentation? Otherwise, this failing link will keep failing for all the release candidates until we have the v24.5 release out – the link is pointing to stable, not latest. @santacodes, could you make this change in the notebook?

@agriyakhetarpal
Copy link
Member

Changing to ../api/pybamm_data.rst won't help – could you add a heading under the PyBaMM Data page and use reST syntax to link to it? Something under this SO post should help

Copy link
Contributor

@kratman kratman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking @agriyakhetarpal's last comment as a change request to keep this from being merged until it is fixed:

Changing to ../api/pybamm_data.rst won't help – could you add a heading under the PyBaMM Data page and use reST syntax to link to it? Something under this SO post should help

@santacodes
Copy link
Contributor Author

Changing to ../api/pybamm_data.rst won't help – could you add a heading under the PyBaMM Data page and use reST syntax to link to it? Something under this SO post should help

I tried referencing headers, but sphinx doesn't seem to render it as a link no matter what, even when I reference the pybamm_data.rst to reference the documentation header as they are not under the same directory. It only works by referencing the notebook's path like in my last commit, or by referencing docs/source/api/pybamm_data.rst:#pybamm.DataLoader. They both mean the same thing, so I will leave it up to you to decide further.

@agriyakhetarpal
Copy link
Member

I think this should be okay, then, thanks for the explanation. Not sure if there is a better way to link them unless we switch to MyST completely for the codebase, which is something for a later time

@agriyakhetarpal agriyakhetarpal dismissed kratman’s stale review May 30, 2024 01:31

Based on the reasoning provided we can go forward with this. The documentation is able to link to the requested page without issues

@kratman kratman merged commit a553fe6 into pybamm-team:develop May 30, 2024
24 checks passed
@santacodes santacodes deleted the data branch June 16, 2024 08:39
js1tr3 pushed a commit to js1tr3/PyBaMM that referenced this pull request Aug 12, 2024
* added pooch

* removed getpath()

* added data_loader to notebooks and scripts

* removed data files in the registry

* added docs and implemented code review suggestions

* added example notebook, unit tests and edited changelog

* fixed pouch cell notebook

* Apply suggestions from code review

Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>

* style: pre-commit fixes

* added more test coverage, fixed notebook and code review suggestions

* added bib references

* fixed doctest and script test errors

* fixed codacy warnings

* using sockets for checking network

* code review suggestions along with notebook update

* Apply suggestions from code review

Co-authored-by: Eric G. Kratz <kratman@users.noreply.github.com>

* fixed RUF015

* changed link referenced and fixed lychee warning

* changed the data files link under input/ to pybamm-data repo

---------

Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ferran Brosa Planella <Ferran.Brosa-Planella@warwick.ac.uk>
Co-authored-by: Arjun Verma <arjunverma.oc@gmail.com>
Co-authored-by: Eric G. Kratz <kratman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Decide how to bundle .pickle files from COMSOL solution results data
5 participants