Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to clean/tests notebooks? #432

Closed
mwouts opened this issue Feb 12, 2020 · 12 comments
Closed

How to clean/tests notebooks? #432

mwouts opened this issue Feb 12, 2020 · 12 comments
Milestone

Comments

@mwouts
Copy link
Owner

mwouts commented Feb 12, 2020

This was initially a question by @lwasser on Twitter: Is anyone using a tool that "cleans" or tests @ProjectJupyter notebooks for things like extra imports, PEP 8, etc etc?

It would be interesting to describe what we can do with jupytext at the command line, and also, what can be done with pre-commit hooks.

Regarding the command line, Jupytext can pipe the Python representation of a notebook to another program, and either test it (e.g. flake8) or pipe it (e.g. black)

jupytext notebook.ipynb --check flake8 # test the notebook with flake8
jupytext notebook.ipynb --pipe black  # apply black to the notebook

That works wells for the programs that read the standard input. For the others, one can use {} for a placeholder for a temporary file name. A good reference on what is know to work is this test file:

https://github.com/mwouts/jupytext/blob/master/tests/test_black.py

I also think that one can use the pre-commit package to do this on every commit. I remember chatting about this with @phaustin at #142, where he mentioned that he was using Jupytext (through a custom Python script), together with black / reorder_python_imports and flake8.

Maybe we could write a short blog post (or a bash notebook with examples...) to cover this?

@lwasser
Copy link

lwasser commented Feb 14, 2020

@mwouts looking at the above, when u call 'jupytext notebookname.ipynb ' does it actually know to run the py version of the file? This is great!

@mwouts
Copy link
Owner Author

mwouts commented Feb 16, 2020

Yes, when you do jupytext --pipe or jupytext --check, the notebook is piped into the other program in the format --pipe-fmt, which defaults to auto:percent (so, yes, Python scripts with # %% cell markers in the case of Python notebooks).

I am interested in reviewing what can really help the users here... And learn how I can improve my own workflow, which is super basic at the moment 😄 :

  • I use paired notebooks, i.e. the .py file is updated automatically when I edit my notebook in Jupyter
  • Only the .py files are shared on the repo. The .ipynb files are ignored.
  • A pre-commit hook tests these .py file, and ensures that they follow PEP8.
  • When the check fails, I open the .py file in PyCharm, and reformat it using Shift+Ctrl+L (and reload the notebook in Jupyter).

Now tell me a bit more about what you are looking for:

  • which representation of the notebook do you want to see on the repo: .ipynb, .py, .md, or a combination of those?
  • what tests do you want to run on the notebook? Is the combination flake8 (test), and reorder-python-imports + black a good start?
  • do you want to write your own pre-commit hook, or instead, do you want to use the pre-commit package?

@mwouts
Copy link
Owner Author

mwouts commented Feb 19, 2020

In the (last paragraphs of the) TrackingJupyter Newletter #26, by Tony Hirst @psychemedia, I learn of isort, and also of pyforest. Have a read, it's interesting!

@psychemedia
Copy link

psychemedia commented Feb 24, 2020

Re: workflows, I'm exploring using ipynb paired to py to create py package files.

The ipynb version includes lots of active-ipynb cells that are used to essentially mask the following from the py file when loaded into other things as a module:

  • tests;
  • "workings-up": eg single lines of code tested, or groups of lines of code in a cell, prior to making a function to wrap them.

The notebook runs in all its horrible glory showing fragments of working, and the output from each line of code in a cell of its own.

If things break, when they break, I can rerun all the fragmentary pieces in the notebook and compare outputs, by eye, to the committed version of the notebook, so see what's different.

By the by, a couple of things that would improve my workflow not strictly Jupytext related but that complicate how I use Jupytext:

  • a notebook extension that lets me set default tags to be applied to each code cell (eg active-ipynb);
  • a notebook extension that lets me set the default for which sort of cell is created, by default, as a new cell. (In some notebooks, it'd be useful to have a markdown cell as a the default.)

@lwasser
Copy link

lwasser commented Mar 4, 2020

Very cool. I'm testing this out now but so far @mwouts it's actually working well!! so what i want to do is update our CI to test things. flake8 is awesome because it catches things like unused imports. i'm really excited to implement this in our workflow!

I wanted to show you the output of the above on one of my notebooks. im curious - is this a shell setting? it's different from when i just run it on the .py file from jupytext. it's more readable when i do it that way!

(earth-analytics-python) cu-biot-2-10:spring-2020-course-notebooks leahwasser$ jupytext in-class-demos/wk08-landsat-cloud-masks.ipynb  --check flake8
[jupytext] Reading in-class-demos/wk08-landsat-cloud-masks.ipynb
Command 'flake8 -' exited with code 1: b"stdin:18:80: E501 line too long (80 > 79 characters)\nstdin:21:80: E501 line too long (101 > 79 characters)\nstdin:22:80: E501 line too long (90 > 79 characters)\nstdin:26:80: E501 line too long (82 > 79 characters)\nstdin:31:80: E501 line too long (105 > 79 characters)\nstdin:36:80: E501 line too long (80 > 79 characters)\nstdin:37:80: E501 line too long (84 > 79 characters)\nstdin:38:80: E501 line too long (83 > 79 characters)\nstdin:41:80: E501 line too long (80 > 79 characters)\nstdin:47:80: E501 line too long (80 > 79 characters)\nstdin:48:80: E501 line too long (82 > 79 characters)\nstdin:49:80: E501 line too long (83 > 79 characters)\nstdin:53:80: E501 line too long (82 > 79 characters)\nstdin:61:1: F401 'matplotlib.patches as mpatches' imported but unused\nstdin:66:1: F401 'shapely.geometry.mapping' imported but unused\nstdin:81:80: E501 line too long (152 > 79 characters)\nstdin:87:80: E501 line too long (80 > 79 characters)\nstdin:108:80: E501 line too long (109 > 79 characters)\nstdin:113:80: E501 line too long (208 > 79 characters)\nstdin:115:80: E501 line too long (108 > 79 characters)\nstdin:115:109: W291 trailing whitespace\nstdin:116:80: E501 line too long (237 > 79 characters)\nstdin:116:238: W291 trailing whitespace\nstdin:120:80: E501 line too long (82 > 79 characters)\nstdin:121:80: E501 line too long (89 > 79 characters)\nstdin:124:80: E501 line too long (87 > 79 characters)\nstdin:125:80: E501 line too long (81 > 79 characters)\nstdin:126:80: E501 line too long (85 > 79 characters)\nstdin:130:80: E501 line too long (101 > 79 characters)\nstdin:131:80: E501 line too long (121 > 79 characters)\nstdin:135:80: E501 line too long (535 > 79 characters)\nstdin:139:80: E501 line too long (82 > 79 characters)\nstdin:142:31: W291 trailing whitespace\nstdin:145:80: E501 line too long (82 > 79 characters)\nstdin:145:83: W291 trailing whitespace\nstdin:146:80: E501 line too long (85 > 79 characters)\nstdin:146:86: W291 trailing whitespace\nstdin:147:80: E501 line too long (96 > 79 characters)\nstdin:154:79: W291 trailing whitespace\nstdin:155:80: E501 line too long (81 > 79 characters)\nstdin:163:54: W291 trailing whitespace\nstdin:164:46: E251 unexpected spaces around keyword / parameter equals\nstdin:164:48: E251 unexpected spaces around keyword / parameter equals\nstdin:164:60: W291 trailing whitespace\nstdin:169:80: E501 line too long (80 > 79 characters)\nstdin:170:74: W291 trailing whitespace\nstdin:174:80: E501 line too long (83 > 79 characters)\nstdin:174:84: W291 trailing whitespace\nstdin:175:80: E501 line too long (84 > 79 characters)\nstdin:176:80: E501 line too long (87 > 79 characters)\nstdin:176:88: W291 trailing whitespace\nstdin:177:80: E501 line too long (86 > 79 characters)\nstdin:177:87: W291 trailing whitespace\nstdin:179:80: E501 line too long (84 > 79 characters)\nstdin:179:85: W291 trailing whitespace\nstdin:184:80: E501 line too long (86 > 79 characters)\nstdin:189:64: W291 trailing whitespace\nstdin:190:54: W291 trailing whitespace\nstdin:191:78: W291 trailing whitespace\nstdin:192:43: W291 trailing whitespace\nstdin:193:80: E501 line too long (96 > 79 characters)\nstdin:203:80: E501 line too long (85 > 79 characters)\nstdin:204:80: E501 line too long (86 > 79 characters)\nstdin:210:25: W291 trailing whitespace\nstdin:224:26: W291 trailing whitespace\nstdin:232:80: E501 line too long (97 > 79 characters)\nstdin:233:80: E501 line too long (91 > 79 characters)\nstdin:234:80: E501 line too long (80 > 79 characters)\nstdin:235:69: W291 trailing whitespace\nstdin:238:80: E501 line too long (106 > 79 characters)\nstdin:244:15: W291 trailing whitespace\nstdin:247:80: E501 line too long (112 > 79 characters)\nstdin:247:113: W291 trailing whitespace\nstdin:248:80: E501 line too long (97 > 79 characters)\nstdin:249:33: W291 trailing whitespace\nstdin:252:80: E501 line too long (96 > 79 characters)\nstdin:252:97: W291 trailing whitespace\nstdin:253:97: W291 trailing whitespace\nstdin:254:80: E501 line too long (96 > 79 characters)\nstdin:254:97: W291 trailing whitespace\nstdin:255:80: E501 line too long (96 > 79 characters)\nstdin:255:97: W291 trailing whitespace\nstdin:256:80: E501 line too long (96 > 79 characters)\nstdin:256:97: W291 trailing whitespace\nstdin:257:80: E501 line too long (96 > 79 characters)\nstdin:257:97: W291 trailing whitespace\nstdin:258:80: E501 line too long (96 > 79 characters)\nstdin:258:97: W291 trailing whitespace\nstdin:259:80: E501 line too long (96 > 79 characters)\nstdin:259:97: W291 trailing whitespace\nstdin:260:80: E501 line too long (96 > 79 characters)\nstdin:260:97: W291 trailing whitespace\nstdin:261:80: E501 line too long (96 > 79 characters)\nstdin:261:97: W291 trailing whitespace\nstdin:262:80: E501 line too long (96 > 79 characters)\nstdin:262:97: W291 trailing whitespace\nstdin:263:80: E501 line too long (96 > 79 characters)\nstdin:263:97: W291 trailing whitespace\nstdin:264:80: E501 line too long (96 > 79 characters)\nstdin:264:97: W291 trailing whitespace\nstdin:265:80: E501 line too long (96 > 79 characters)\nstdin:265:97: W291 trailing whitespace\nstdin:268:80: E501 line too long (94 > 79 characters)\nstdin:270:80: E501 line too long (116 > 79 characters)\nstdin:273:80: E501 line too long (118 > 79 characters)\nstdin:274:80: E501 line too long (90 > 79 characters)\nstdin:274:91: W291 trailing whitespace\nstdin:278:80: E501 line too long (329 > 79 characters)\nstdin:281:80: E501 line too long (196 > 79 characters)\nstdin:281:197: W291 trailing whitespace\nstdin:285:80: E501 line too long (81 > 79 characters)\nstdin:293:80: E501 line too long (83 > 79 characters)\nstdin:294:80: E501 line too long (84 > 79 characters)\nstdin:300:80: E501 line too long (100 > 79 characters)\nstdin:302:80: E501 line too long (116 > 79 characters)\nstdin:320:80: E501 line too long (83 > 79 characters)\nstdin:325:80: E501 line too long (85 > 79 characters)\nstdin:328:80: E501 line too long (82 > 79 characters)\nstdin:336:80: E501 line too long (240 > 79 characters)\nstdin:337:80: E501 line too long (111 > 79 characters)\nstdin:338:80: E501 line too long (280 > 79 characters)\nstdin:341:80: E501 line too long (139 > 79 characters)\nstdin:341:140: W291 trailing whitespace\nstdin:348:80: E501 line too long (155 > 79 characters)\nstdin:355:80: E501 line too long (130 > 79 characters)\nstdin:360:80: E501 line too long (104 > 79 characters)\nstdin:364:80: E501 line too long (122 > 79 characters)\nstdin:369:80: E501 line too long (102 > 79 charact(earth-analytics-python) cu-biot-2-10:spring-2020-course-notebo(earth-analytics-python) cu-biot-2(earth-analytics-python) cu-biot-2-10:spring-2020-course-notebooks leahwasser(earth-analytics-python) cu-biot-2-10:spring-2020-course-no(earth-analytics-python) cu-biot-2-10:spring-20(earth-analytics-python) cu-biot-2-(earth-analytics-python) cu-biot-2-10:spring-(earth-analytics-p(earth-analytics-python) cu-biot-2-10:(earth-analytics-python) cu-biot-2-10:spring-2020-course-note(earth-anal(earth-analytic((earth-analytics-python) cu-biot-2-10:spring-2020-c(earth-analytics-python) cu-biot-2-10:spring-2020-course-noteb(earth-analytics-python) cu-biot-2-10:spring-2020-course-notebooks leahwasse(earth-analytics-python) cu-biot-2-10:spring-2020-c(earth-analytics-python) cu-biot-2-10:spring-2020(earth-analytics-python) cu-biot-2-10:spring-20(earth-analytics-python) cu-biot-2-10:spring-(earth-analytics-python) cu-biot-2-10:sprin(earth-analytics-python) cu-biot-2-10:spring-2020-course-notebooks leahwasser$ 

i have a lot of markdown text in my notebook which is why there are so many line to long errors. ruler to identify 79 line width doesn't work in markdown cells it seems.

@mwouts
Copy link
Owner Author

mwouts commented Mar 8, 2020

Hello @lwasser , thanks for your input.

  • Regarding Markdown cells being mapped to long python comments that are not pep8 compliant, actually I do have the same issue... typically when we write a Markdown cell in Jupyter, we don't insert line breaks. The Markdown renderer will wrap the text automatically, but in the internal representation there is no wrapping. Until now I addressed that issue by manually wrapping the comments in the py representation... I can't find a tool that would be able to wrap single line comments in Python, do you think it would be worth adding an option in Jupytext to do that?
  • I have noted your comment on the output of the flake8 command being printed in binary mode (i.e. without the line breaks), I can reproduce that and I will soon contribute a fix for this.

@mwouts
Copy link
Owner Author

mwouts commented Mar 9, 2020

At #450 I learnt of yapf, we could give it a try and see if it can automatically wrap the long lines in the Markdown cells.

@mwouts
Copy link
Owner Author

mwouts commented Mar 9, 2020

The upcoming version 1.4.0 will slightly improve the output of jupytext --exec (but, 1) unfortunately, yapf does not solve the issue of long comments... and 2) upgrade with care, because the extension for JupyterLab is not compatible with JLab 1.x any more):

> jupytext --pipe "yapf -i {}" --check flake8 .\Untitled.ipynb
[jupytext] Reading .\Untitled.ipynb
[jupytext] Executing yapf -i C:\Users\Marc\AppData\Local\Temp\Untitled_lw8hmygz.py
[jupytext] Executing flake8 -
stdin:17:128: E501 line too long (371 > 127 characters)
[jupytext] Error: The command 'flake8 -' exited with code 1

@lwasser
Copy link

lwasser commented Mar 9, 2020

ahhhh @mwouts being able to lint and handle long lines has been an ongoing challenge in notebooks. I've been searching for a way to handle that. if jupytext could help with it, that would be wonderful. Would it be able to do it in a way that wouldn't impact markdown rendering? things like multi line "quotes > and such? this would be a wonderful feature!

I downloaded a ruler add on for labs last week and it didn't support version 2.0 so i couldn't play with it. but for notebook i can attest to the fact that you can not use ruler in markdown cells which makes flake8 unhappy. i'd love to be able to manage this more easily and to test for it too!!

@mwouts
Copy link
Owner Author

mwouts commented Mar 13, 2020

being able to lint and handle long lines has been an ongoing challenge in notebooks.

I've not had time to test this throughfully, but I think that pandoc may be able to do that. Could you give a try to this?

jupytext --sync notebook.ipynb --pipe-fmt ipynb --pipe 'pandoc --from ipynb --to ipynb --atx-headers'

In my quick test, it was able to

  • wrap paragraphs
  • wrap items
  • and quotes

Also, I see that I introduced a regression in the CLI, as the output of the piped command in Jupytext 1.4.0 is now displayed on the shell... I'll fix that soon.

@Skylion007
Copy link
Contributor

I use nbstripout pre-commit hook to clean the output of notebooks currently for habitat-sim.

@mwouts
Copy link
Owner Author

mwouts commented Dec 9, 2021

Now we can do this with the pre-commit hook. The documentation is not very verbose at the moment (we have plans to improve this), but we do have (tested) examples at e.g. https://github.com/mwouts/jupytext/blob/main/tests/test_pre_commit_5_reformat_markdown.py

@mwouts mwouts closed this as completed Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants