Disable IPython History in executing preprocessor #1017
Conversation
IPython saves history in an SQLite DB. This can break in unfortunate ways when running multiple IPython kernels simultaneously. NBConvert will now default to disabling history saving in IPython kernels.
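Why `:memory:` sidesteps the contention: in SQLite, every connection to the special path `:memory:` opens its own private database, so two kernels can never end up writing to the same history file. A minimal stdlib illustration of that isolation (not nbconvert code):

```python
import sqlite3

# Each connection to ":memory:" gets a private, per-process database,
# so concurrent kernels can never contend for (or lock) a shared file.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
a.execute("CREATE TABLE history (line TEXT)")
a.execute("INSERT INTO history VALUES ('print(1)')")

# b does not see a's table: the two databases are fully independent.
tables = b.execute("SELECT name FROM sqlite_master").fetchall()
print(tables)  # []
```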
Thanks for the fix! Looking forward to the final PR for the zeromq issues around multiprocessing. Will wait for @mpacer to review before merging.
    The two notebooks spawned here use the filesystem to check that the other notebook
    wrote to the filesystem."""

    opts = dict(kernel_name="python")
Did you end up needing to overwrite the kernel?
I'm not sure what part of this you are thinking about here?
Kernels (and zmq.Context, the underlying culprit) are thread-safe, but not process-safe – I am working on some other tests which demonstrate and then fix that for nbconvert, and thinking about how best to demonstrate it to upstream jupyter/jupyter_client.
I meant that this line is replacing the kernel from the notebook, and was curious if the line was actually needed.
Ahh, yes, I remember discussing. It is probably unnecessary, I’ll double check.
Indeed, it was redundant. To better match the other tests here, I removed the kernel spec from the notebook file. Is that okay with you? It might make it run on python2, but I think this should work on both python2 and python3. Though my understanding of kernel names is admittedly weak!
   "metadata": {},
   "outputs": [],
   "source": [
    "other_notebook = {'A':'B', 'B':'A'}[this_notebook]\n",
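The handshake the two test notebooks perform can be sketched roughly as follows. `this_notebook` is injected by the test harness; the file names and the helper here are hypothetical, and both sides are simulated in a single process for illustration:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()

def signal_and_check(this_notebook):
    """Write this notebook's marker file, then check for the other's.

    Sketch of the real test, where each kernel polls the filesystem
    until the other notebook's marker appears.
    """
    other_notebook = {'A': 'B', 'B': 'A'}[this_notebook]
    open(os.path.join(workdir, this_notebook + '.done'), 'w').close()
    return os.path.exists(os.path.join(workdir, other_notebook + '.done'))

first = signal_and_check('A')   # B has not signalled yet
second = signal_and_check('B')  # A's marker already exists
print(first, second)  # False True
```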
Might be good to add a comment about how this works, with A waiting for B's file and vice versa.
Also, you probably want a comment noting that this_notebook is undefined here and is injected by the test.
👍 I've added some comments to the notebook to explain a bit about how it works.
Thanks
@@ -220,6 +220,20 @@ class ExecutePreprocessor(Preprocessor):
            )
        ).tag(config=True)

    ipython_hist_file = Unicode(
        default_value=':memory:',
👍 We should have probably had this default a long time ago regardless of parallelism concerns.
Bumped priority for getting other reviewers, but I'll merge this next week if we don't get more eyes on it.
Great!
@@ -268,6 +282,8 @@ def start_new_kernel(self, **kwargs):
                'kernelspec', {}).get('name', 'python')
        km = self.kernel_manager_class(kernel_name=self.kernel_name,
                                       config=self.config)
        if km.ipykernel and self.ipython_hist_file:
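What the guarded branch presumably does is build an IPython traitlet flag from `ipython_hist_file` and append it to the kernel's extra launch arguments. A hedged stand-alone sketch, with a plain list standing in for the preprocessor's state:

```python
# Sketch only: mirrors the names in the diff hunk, but the real code
# lives inside ExecutePreprocessor. A plain list stands in for the
# extra arguments passed when the kernel is started.
ipython_hist_file = ':memory:'
km_is_ipykernel = True  # stand-in for km.ipykernel
extra_arguments = []

if km_is_ipykernel and ipython_hist_file:
    # IPython reads this traitlet option and uses it as its history DB path.
    extra_arguments.append(
        '--HistoryManager.hist_file={}'.format(ipython_hist_file))

print(extra_arguments)  # ['--HistoryManager.hist_file=:memory:']
```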
Is ipykernel an attribute of a KernelManager? It doesn't seem to be: https://jupyter-client.readthedocs.io/en/stable/api/manager.html#jupyter_client.KernelManager
If not, I'd prefer to know where this is being assigned.
It is a property of the KernelManager, but it isn't documented. I used it to encapsulate the logic about using an IPython-specific argument, rather than doing my own check against the kernel name before passing this argument in.
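For illustration only, that undocumented property amounts to a kernel-name check; a hand-rolled equivalent might look like the sketch below (the exact set of names is an assumption, not a documented contract):

```python
# Hypothetical stand-in for jupyter_client's undocumented
# KernelManager.ipykernel property: treat the common IPython
# kernel names as "this is an ipykernel".
def looks_like_ipykernel(kernel_name):
    return kernel_name in {'python', 'python2', 'python3'}

print(looks_like_ipykernel('python3'))  # True
print(looks_like_ipykernel('ir'))      # False
```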
@@ -12,7 +12,10 @@
 import glob
 import io
 import os
 import logging
Why are we adding logging? I don't see it being used, and the traitlets LoggingConfigurable should have logging functionality that will do a better job of tracing where the logged messages come from.
My mistake – will remove.
@@ -218,6 +224,12 @@ def assert_notebooks_equal(expected, actual):
        actual_execution_count = actual_cell.get('execution_count', None)
        assert expected_execution_count == actual_execution_count


def notebook_resources():
    res = ResourcesDict()
    res['metadata'] = ResourcesDict()
Does this need to be a nested ResourcesDict()? I feel like we should have a convenience class for creating a ResourcesDict that is prepopulated with a value for metadata, rather than doing this.
The pattern is fine for right now, but it would be useful to have a docstring describing the purpose of this new function.
🤔 I took this snippet from test_run_all_notebooks and extracted it. I'm not actually sure what is required in the metadata, but I didn't want to copy/paste this around the tests in this file.
I'll add a quick docstring, to the effect of "prepares a notebook resources dictionary".
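The extracted helper with its promised docstring could read roughly as below. ResourcesDict comes from nbconvert in the real tests; a plain dict subclass stands in here so the sketch is self-contained:

```python
class ResourcesDict(dict):
    """Stand-in for nbconvert's ResourcesDict, for illustration only."""
    pass

def notebook_resources():
    """Prepare a notebook resources dictionary with an empty metadata entry."""
    res = ResourcesDict()
    res['metadata'] = ResourcesDict()
    return res

res = notebook_resources()
print(isinstance(res['metadata'], ResourcesDict))  # True
```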
I'm fine with the test doing what it does and not blocking a merge on test helpers. It's not critical path code and it's readable for anyone who wants to tweak it in the future imo.
    assert_notebooks_equal(input_nb, output_nb)


def label_parallel_notebook(nb, label):
I think this would more idiomatically be treated as a custom preprocessor. You would pass in the label value via resources and then override the preprocess method:

nbconvert/nbconvert/preprocessors/base.py, lines 51 to 70 in 8af6aac:

    def preprocess(self, nb, resources):
        """
        Preprocessing to apply on each notebook.
        Must return modified nb, resources.
        If you wish to apply your preprocessing to each cell, you might want
        to override preprocess_cell method instead.
        Parameters
        ----------
        nb : NotebookNode
            Notebook being converted
        resources : dictionary
            Additional resources used in the conversion process. Allows
            preprocessors to pass variables into the Jinja engine.
        """
        for index, cell in enumerate(nb.cells):
            nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
        return nb, resources
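A hedged sketch of what that suggestion could look like, with a stand-in Preprocessor base class so it runs without nbconvert installed; LabelPreprocessor and the 'label' resource key are hypothetical names:

```python
class Preprocessor:
    """Stand-in for nbconvert.preprocessors.Preprocessor (illustration only)."""
    def preprocess(self, nb, resources):
        return nb, resources

class LabelPreprocessor(Preprocessor):
    """Hypothetical preprocessor that prepends a labelling code cell,
    reading the label out of resources as the reviewer suggests."""
    def preprocess(self, nb, resources):
        label = resources.get('label', 'A')
        cell = {'cell_type': 'code', 'metadata': {}, 'outputs': [],
                'execution_count': None,
                'source': 'this_notebook = %r' % label}
        nb['cells'].insert(0, cell)
        return nb, resources

nb = {'cells': []}
nb, _ = LabelPreprocessor().preprocess(nb, {'label': 'B'})
print(nb['cells'][0]['source'])  # this_notebook = 'B'
```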
That makes sense – do you think that is worthwhile for this set of tests?
I was hesitant to re-implement papermill's templating here, but happy to do so if that makes it more maintainable.
I share that hesitation just for testing purposes.
    Used for parallel testing to label two notebooks which are run simultaneously.
    """
    label_cell = nbformat.NotebookNode(
I think this would be cleaner if it used nbformat.v4's new_code_cell function:
https://github.com/jupyter/nbformat/blob/11903688167d21af96c92f7f5bf0634ab51819f1/nbformat/v4/nbbase.py#L98-L110
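For reference, new_code_cell fills in the required code-cell fields, which is less error-prone than hand-constructing a NotebookNode. A small sketch (assumes nbformat is installed):

```python
import nbformat

# Build a schema-valid code cell instead of assembling the
# NotebookNode (metadata, outputs, execution_count) by hand.
cell = nbformat.v4.new_code_cell(source="this_notebook = 'A'")
print(cell.cell_type)  # code
print(cell.source)     # this_notebook = 'A'
```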
Follow up! I think the following are the outstanding questions from the review of this PR:
We've started the discussion for each of these questions above – I would love more feedback about all of them so that we can merge this PR (rather than having it get stale).
Sorry, we've been swamped catching up after PyCon. I agree, let's not let this go stale and get some merges. On that note, I'll follow up on getting a jupyter_client release for the other fix you made. I checked the first box; I think that's fine, and I've got some more permissions on that project now to help maintain it. The other two I find maintainable enough as is. I'm going to merge now, and if anyone feels strongly enough they can open an issue or PR to address it differently. Thanks for the improvement!
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/nbconvert-5-6-0-release/1867/1
IPython has a feature where command history is written to a SQLite database. When running multiple kernels in parallel, this feature runs into problems: the SQLite database will be opened by several processes at once, and sometimes hits a locking error.
To fix this, we have disabled SQLite history in IPython kernels launched from the execute preprocessor, as recommended in ipython/ipython#11460.
This issue was originally brought up in the context of multiprocessing (see nteract/papermill#239). This PR partially solves that problem, and adds a test proving that two notebook kernels, started from the same process, can be run simultaneously.
I am working on a separate PR to test and maintain multiprocessing support, as there is at least one more issue to be solved.
Thanks @MSeal for all of the help and guidance during PyCon sprints to figure this out.