We love contributors, so first and foremost, thank you! We're actively working on our contributing guidelines, so this document is subject to change. First things first: we adhere to the Contributor Covenant Code of Conduct, so please read through it before contributing.
- Types of Tutorials
- Dev Setup
- Making Changes to an Existing Tutorial
- Adding a New Tutorial
- Testing Changes Locally
- Previewing Changes to the Website
Currently, we have notebook-based tutorials and script-based tutorials.
Both types are referenced in this guide.
Notebook-based tutorials act as walkthroughs of concepts (the spam tutorial is a good example).
Script-based tutorials act more as examples of how to deploy certain Snorkel functionality (the drybell tutorial is a good example).
We default to using notebook-based tutorials.
For dev setup, you will need to install tox and set up a virtualenv with all the requirements.
For example, if you use pip and want to work on the spam tutorial:
python3 -m pip install -U 'tox>=3.13.0,<4.0.0'
python3 -m pip install --upgrade virtualenv
virtualenv -p python3 .env
source .env/bin/activate
python3 -m pip install -r requirements.txt
python3 -m pip install -r spam/requirements.txt # Change based on tutorial.
Start Jupyter from the virtualenv to make sure the kernel has all the required dependencies.
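If you are unsure whether an installed tox falls inside the range pinned above (at least 3.13.0 and below 4.0.0), a small check like the following can help. This is only a sketch: the helper names parse_version and tox_version_ok are made up for illustration, and it assumes plain numeric version strings such as 3.14.0.

```python
def parse_version(text):
    """Turn '3.14.0' into (3, 14, 0).

    Assumes purely numeric dotted parts; shorter versions such as '3.13'
    are padded with zeros so they compare as '3.13.0'.
    """
    parts = tuple(int(part) for part in text.strip().split("."))
    return parts + (0,) * (3 - len(parts))


def tox_version_ok(text, minimum=(3, 13, 0), upper=(4, 0, 0)):
    """Return True when minimum <= version < upper, as in 'tox>=3.13.0,<4.0.0'."""
    return minimum <= parse_version(text) < upper


for candidate in ("3.12.9", "3.13.0", "3.28.0", "4.0.0"):
    print(candidate, tox_version_ok(candidate))
```

On Python 3.8 or newer, the installed version string can be read with importlib.metadata.version("tox") and passed to tox_version_ok.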
First, we recommend posting an issue describing the improvement or fix you want to make. Once you've worked out details with the maintainers, follow these general steps:
- Make your changes to the source files
  - For notebook-based tutorials, we recommend making changes to the .py version then syncing changes with tox -e my_tutorial_dir -- sync. Alternatively, if you have already run all the cells in your browser, you can select File → Jupytext → Pair Notebook with percent Script to save the outputs directly to the notebook version. After saving, unpair the notebook with File → Jupytext → Unpair notebook so Jupyter does not keep updating the notebook when all cells haven't been run.
  - For script-based tutorials, just make the changes as you normally would.
- Test your changes locally
- Submit a PR!
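For orientation, the .py version of a notebook-based tutorial is a Jupytext percent-format script, where comment headers mark the cell boundaries. The sketch below shows what such a file might look like; the cell contents and variable names are invented for illustration and are not taken from any existing tutorial.

```python
# %% [markdown]
# # My Tutorial
# Markdown cells start with a `# %% [markdown]` header, and every line of
# their text is written as a Python comment.

# %%
# A plain `# %%` header starts a code cell.
numbers = [1, 2, 3]
total = sum(numbers)

# %%
# Each new `# %%` header starts another cell in the synced notebook.
print(total)
```

Because the file is ordinary Python, it can be edited and reviewed like any other script before syncing it to the notebook version.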
Before adding a new tutorial, we recommend posting a proposal to the Snorkel community forum on Spectrum. Once you've worked out details with the maintainers, follow these general steps:
- Create a new top-level directory (e.g. my_tutorial_dir)
- Add a tutorial configuration file to your tutorial directory
  - For notebook-based tutorials, add a file called .notebooks to your tutorial directory and add the base name of each tutorial script/notebook pair (e.g. my_tutorial) as a separate line in .notebooks. See spam for an example.
  - For script-based tutorials, add a file called .scripts to your tutorial directory and add the file name of each tutorial script (e.g. my_tutorial.py) as a separate line in .scripts. See drybell for an example.
- Add a requirements.txt to your directory if additional requirements are needed
- Add a command to [testenv] in tox.ini by copying the one for spam, and add the requirements.txt file if necessary. Also add the command name to envlist.
- Write your tutorial!
  - For notebook-based tutorials, write your tutorial either as a Python script (e.g. my_tutorial_dir/my_tutorial.py) in Jupytext percent format or a Jupyter notebook.
    - Run tox -e my_tutorial_dir -- sync to generate a notebook version from the Python script version (or if you have run all cells, you can select File → Jupytext → Pair Notebook with percent Script to save the outputs directly to the notebook version, and then unpair it with File → Jupytext → Unpair notebook so Jupyter does not keep updating the notebook when all cells haven't been run). Do this to update the notebook whenever changes are made to the tutorial script.
    - Run tox -e my_tutorial_dir -- sync --py to generate a Python script version from the notebook version. Run this command to update when changes are made to the tutorial notebook.
  - For script-based tutorials, write your tutorial as a Python script.
- Test your changes locally
- Submit a PR! Make sure to include a reference to the Spectrum planning thread.
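Before opening the PR, it can be handy to confirm that every entry in .notebooks and .scripts points at files that actually exist. The following is a minimal sketch under the layout described above (one base name or file name per line); the helper names read_config, expected_files, and missing_files are hypothetical and not part of the repository.

```python
from pathlib import Path


def read_config(tutorial_dir, config_name):
    """Return the non-empty lines of a config file, or [] if it is missing."""
    path = Path(tutorial_dir) / config_name
    if not path.exists():
        return []
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def expected_files(tutorial_dir):
    """List the files that the tutorial configuration implies should exist."""
    root = Path(tutorial_dir)
    files = []
    # Each .notebooks entry is a base name shared by a script/notebook pair.
    for base in read_config(root, ".notebooks"):
        files.append(root / f"{base}.py")
        files.append(root / f"{base}.ipynb")
    # Each .scripts entry is already a full file name.
    for script in read_config(root, ".scripts"):
        files.append(root / script)
    return files


def missing_files(tutorial_dir):
    """Return the expected files that are not present on disk."""
    return [path for path in expected_files(tutorial_dir) if not path.exists()]
```

Calling missing_files("my_tutorial_dir") returns an empty list when the configuration and the directory contents agree.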
You can test changes to a specific tutorial by running tox -e my_tutorial, where my_tutorial is replaced by the corresponding environment name in tox.ini.
For scripts and notebooks, this will check that they execute without erroring.
For notebooks only, this will also check that any URLs in Markdown cells are reachable and that the .ipynb versions match the .py versions.
Travis will also always run tox -e style to check code style and formatting, so you should always run this locally as well.
Running tox on its own will test all tutorials, which can be extremely slow since some tutorials (like recsys) take a long time to run with a full dataset.
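To see which links the URL check is likely to touch, you can list the URLs that appear in the Markdown cells of a percent-format script before running tox. This is an illustrative sketch only, not the check the test suite performs: the function name markdown_urls and the simple regular expression are assumptions, and nothing here contacts the network.

```python
import re

# Deliberately simple pattern: stops at whitespace and common closing brackets.
# Trailing punctuation such as a final period would still be captured.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def markdown_urls(script_text):
    """Collect URLs from the Markdown cells of a percent-format script."""
    urls = []
    in_markdown = False
    for line in script_text.splitlines():
        if line.startswith("# %%"):
            # A new cell starts; remember whether it is a Markdown cell.
            in_markdown = "[markdown]" in line
            continue
        if in_markdown and line.startswith("#"):
            urls.extend(URL_PATTERN.findall(line))
    return urls
```

URLs inside code cells are ignored, which mirrors the statement above that only Markdown cells are checked.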
- To fix code formatting issues, run tox -e fix.
- You might need to update packages for an environment (for example, if you update requirements.txt dependencies). Use the -r flag for this. For example, you can run tox -e spam -r -- sync to rebuild the spam environment and then run the sync script.
- Travis uses the get_tox_envs.py script to figure out which tox environments it needs to run to test a PR. Once you've committed your changes, you can preview the environments that Travis will execute by running python3 scripts/get_tox_envs.py --plan.
All of the tutorials listed in .web.yml are rendered on our website.
Details on configuring with .web.yml are in that file.
In order to display tutorials as webpages, we convert the .ipynb versions to Markdown.
To generate all Markdown files listed in .web.yml, use tox -e markdown.
This will generate files in the build directory.
You can prevent cells from being rendered in Markdown by adding {"tag": ["md-exclude"]} to the cell header in the .py file.
This is useful for confusing setup cells or cells with difficult-to-render outputs.
For example:
# %% {"tag": ["md-exclude"]}
command.do_not_show()
this_line.will_not_appear()
You can also prevent cells from rendering output in Markdown by adding {"tag": ["md-exclude-output"]} to the cell header in the .py file.
This is useful for cells that display warning messages or other distracting output.
For example:
# %% {"tag": ["md-exclude-output"]}
command.will_show()
this_line.will_appear()
print(my_object) # The output will not show
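To make the effect of the md-exclude tag concrete, here is a rough sketch of how a filter could drop tagged cells from a percent-format script. It is not the conversion code used by tox -e markdown; the function names split_cells, cell_tags, and drop_excluded are hypothetical, and the sketch only assumes the header format shown in the examples above.

```python
import json

HEADER = "# %%"


def split_cells(script_text):
    """Split a percent-format script into (header, body_lines) pairs.

    Lines that appear before the first cell header are ignored.
    """
    cells = []
    for line in script_text.splitlines():
        if line.startswith(HEADER):
            cells.append((line, []))
        elif cells:
            cells[-1][1].append(line)
    return cells


def cell_tags(header):
    """Read the "tag" list from a cell header's JSON metadata, if present."""
    start = header.find("{")
    if start == -1:
        return []
    try:
        metadata = json.loads(header[start:])
    except json.JSONDecodeError:
        return []
    return metadata.get("tag", [])


def drop_excluded(script_text):
    """Return the script without cells tagged md-exclude."""
    kept = []
    for header, body in split_cells(script_text):
        if "md-exclude" in cell_tags(header):
            continue
        kept.append("\n".join([header] + body))
    return "\n".join(kept)
```

Cells tagged md-exclude-output pass through this filter unchanged, since that tag hides only the output and not the cell itself.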