Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test notebooks #294

Closed
1 of 6 tasks
alejandrosame opened this issue Sep 13, 2020 · 6 comments
Closed
1 of 6 tasks

Test notebooks #294

alejandrosame opened this issue Sep 13, 2020 · 6 comments
Labels
Good first issue 🎓 Perfect for beginners, welcome to OpenMined! Status: In Progress 🌟 This is actively being worked on Type: Testing 🧪 Add testing or improving existing testing of a file, feature, or codebase

Comments

@alejandrosame
Copy link
Member

Description

When developing the codebase, it is easy for notebooks to become outdated. We should add test to our CI that warn us when notebooks stop working due to API changes. This is crucial, since notebooks are important pieces of documentation.

Type of Test

  • Unit test (e.g. checking a loop, method, or function is working as intended)
  • Integration test (e.g. checking if a certain group or set of functionality is working as intended)
  • Regression test (e.g. checking if by adding or removing a module of code allows other systems to continue to function as intended)
  • Stress test (e.g. checking to see how well a system performs under various situations, including heavy usage)
  • Performance test (e.g. checking to see how efficient a system is as performing the intended task)
  • Other...

Expected Behavior

When a contributor makes a change in PyDP, I want the CI to avoid merging the change until notebooks are up to date with the changes.

Additional Context

@alejandrosame alejandrosame added Good first issue 🎓 Perfect for beginners, welcome to OpenMined! Type: Testing 🧪 Add testing or improving existing testing of a file, feature, or codebase labels Sep 13, 2020
@alejandrosame alejandrosame mentioned this issue Sep 13, 2020
6 tasks
@chinmayshah99 chinmayshah99 added the Status: Available 👋 Available for assignment, who wants it? label Sep 14, 2020
@edwardelson
Copy link
Contributor

Hi @alejandrosame I wonder if you're working on this? Or are you open for newbies like me to try and tackle this?
Where do you want to park the notebook tests? Under tests/algorithms? or create another folder?

I created a file in tests/notebooks/test_carrots.py containing this:

import subprocess
import tempfile
def _exec_notebook(path):
    with tempfile.NamedTemporaryFile(suffix=".ipynb") as fout:
        args = ["jupyter", "nbconvert", "--to", "notebook", "--execute",
                "--ExecutePreprocessor.timeout=1000",
                "--output", fout.name, path]
        subprocess.check_call(args)
def test():
    _exec_notebook('../../examples/carrots_demo/carrots_demo.ipynb')

basically just to run carrots_demo.ipynb when doing py.test. Although maybe need to add assert certain values in the notebook itself to check for correctness? Not sure if this is aligned with your thoughts.

@alejandrosame
Copy link
Member Author

@edwardelson you are more than welcomed to work on the task!

As far as I can tell, all notebooks are under ./examples and every future notebook should also en up somewhere under that folder (@chinmayshah99, just to double check).

What I had envisioned was a generic version of that script that:

  1. Takes as an env parameter paths to search for notebooks.
  2. Searches for the notebook files (those with extension .ipynb) and confirms that run without raising errors.

That way the configuration can be maintained in the repository .env file and the script can easily be called from a make target and a dedicated workflow file. The script and the new workflow YAML will already suffice to fulfill the goal of this task (to help keeping track of outdated notebooks).

@alejandrosame
Copy link
Member Author

On another hand, you raise an important point with:

basically just to run carrots_demo.ipynb when doing py.test. Although maybe need to add assert certain values in the notebook itself to check for correctness? Not sure if this is aligned with your thoughts.

Since notebooks are prepared with an expected outcome in mind, they can serve as additional regression/validation tests. However, I find it worth creating a separate issue due to workload (it will require changing current notebooks and documenting in guidelines so future contributors can apply it to new notebooks).

Whatever the possible solutions are, we need to keep in mind that they are primarily meant as demos to learn how to use the library, so we shouldn't sacrifice readability when we already have tests in place.

So yeah, if you also are interested in working on that, I would suggest that we make a new issue to properly tackle it!

@edwardelson edwardelson mentioned this issue Sep 15, 2020
4 tasks
@edwardelson
Copy link
Contributor

edwardelson commented Sep 15, 2020

I've submitted a PR @alejandrosame. Not sure how to encode the path in env though (still hardcoded the example/ path), need your advice.

Yes I agree that notebook validation test would probably be another interesting issue. Maybe we can use papermill to inject values and check the output of these notebooks without adding plenty of assertions in the notebook? (I've never used papermill yet though but been reading about it)

@chinmayshah99
Copy link
Member

Maybe we can use papermill to inject values and check the output of these notebooks without adding plenty of assertions in the notebook?

Is this similar to hypothesis?

@chinmayshah99 chinmayshah99 added Status: In Progress 🌟 This is actively being worked on and removed Status: Available 👋 Available for assignment, who wants it? labels Sep 20, 2020
@alejandrosame
Copy link
Member Author

Not sure how to encode the path in env though (still hardcoded the example/ path), need your advice.

I mean reading the PATH from an environment variable like shown here.

If you find yourself too lost with that part, simply leave the path defined in a variable (instead of directly hard-coded in the function call) and a new task to integrate it with the .env file can be filed. However, I encourage you to ask for more guidance, if you have time for it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Good first issue 🎓 Perfect for beginners, welcome to OpenMined! Status: In Progress 🌟 This is actively being worked on Type: Testing 🧪 Add testing or improving existing testing of a file, feature, or codebase
Projects
None yet
Development

No branches or pull requests

3 participants