Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass config to kernel from ExecutePreprocessor for configurable kernels #816

Merged
merged 5 commits into from
Aug 2, 2018

Conversation

mpacer
Copy link
Member

@mpacer mpacer commented May 17, 2018

This became a much larger PR due to my efforts to write tests for the original PR.

Most notably, this does the refactor that I proposed in @MSeal's #811 to make it easier to overwrite the preprocess method without allowing people to override the reliance on the superclass preprocess method on its own. This way all the overriding can occur without relying on the implementer recreating all the complex internal state logic (as was the case before).

It creates 2 new public methods, and adds a dynamic default to an existing traitlet:

  • new dynamic default for kernel_name:
    Now if no kernel_name has been supplied as a traitlet value, it internally checks for whether self.nb has been set, and raises an AttributeError if self.nb has not been set. The logic is that if you don't set it directly, then there is no meaningful way to describe what an ExecutePreprocessor's kernel should be other than if a notebook is already being processed. Previously this logic was inside the preprocess method. It gives the same priority to the traitlet defined value.
  • start_new_kernel: previously a callback inside preprocess; it has been reworked to rely on the dynamic default of kernel_name to decide which kernel to create. It also passes through any configuration from ExecutePreprocessor.
  • setup_preprocessor: a context manager that sets up the self.nb, self.km, and self.kc values and ensures that they are cleaned up after the execution is done. It also resets the self._display_id_map value which was previously being done in preprocess.

This also changes the value of the default kernel_manager_class function to match the older convention where its name should be _default_kernel_manager_class.

Smaller original PR:

Pass config to kernel from ExecutePreprocessor for configurable kernels

Right now, the way that we're instantiating the kernels means any configuration that can be passed to those kernels stops at the point where we actually create the Kernel.

If you want any custom Kernels to be configurable (e.g., to use a different kernel_spec_manager) then you need to be able to pass this through at instantiation.

@rgbkrk @takluyver @Carreau

setup_preprocessor is a contextmanager that handles setting and cleaning
up attributes needed by the rest of the execution logic (specifically
self.nb, self.km, self.kc).

start_new_kernel was previously a callback function and is now a
separate method used inside setup_preprocessor.

This also introduces a dynamic default for the `kernel_name` traitlet
that checks if self.nb is defined to see if it has a kernel name, and
otherwise will raise an AttributeError (since that means it is being
relied upon without any way to know which kernel is to be used.
moved _display_id_map help inside the declaration instead of comment

unified named imports from traitlets
@mpacer
Copy link
Member Author

mpacer commented May 19, 2018

I still need to add tests to make sure that configuration is passed through more directly. This is partially why I wanted the start_new_kernel to be an isolated function so that that can be tested without needing to do the entirety of executing a notebook.

@mpacer
Copy link
Member Author

mpacer commented May 19, 2018

Note: I think the correct pattern might be to pass through parent=self instead of config=self.config but I'm not sure where I can verify which pattern is preferred.

@takluyver
Copy link
Member

This looks sensible for now, but if we do adopt my WIP new kernel machinery (see #809), the ability to pick a kernel manager class will go away, because the kernel manager is something that the kernel provider gets to decide. I.e. you ask it to start a kernel of type spec/python3, and it gives you back a subprocess kernel manager.

The good news is that there should be less reason for the frontend to set the kernel manager class, because it affects less. In particular, it sounds like you want to configure it to affect finding and starting a kernel, which is all moved out of kernel manager in the new machinery.

For passing parent/config: it looks like there's some long forgotten magic in traitlets so if we use parent, it's possible to use config names like ExecutePreprocessor.KernelManager.autorestart=False. I could see that being useful in some situations, but... it's yet another level of complexity in our config system, and it further exposes the internals of our code to config. I'd be in favour of quietly getting rid of that option entirely and hoping no-one has noticed it's possible yet. ;-)

@rgbkrk
Copy link
Member

rgbkrk commented May 21, 2018

@takluyver I'd love to make a tradeoff here where we allow for the current machinery to work by adopting this while also moving ahead with your kernel provider work as the long term solution.

@takluyver
Copy link
Member

I'm fine with that, I just want to make sure that the motivation for adding this configurability is not something that we're going to irrevocably break if/when we do adopt the new machinery. If the motivation is to replace kernel_spec_manager, that's probably OK - the customisation will need to be redone for the new machinery, but it should still be possible to control how a kernel is found.

try:
nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
yield
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we yield the context variables nb, km, kc for more easier understanding and variable consumption downstream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can yield the context variables… but the problem is that we still need all three (notebook (nb), kernel client (kc) and kernel mangager (km)) to be accessible via self later on:

https://github.com/mpacer/nbconvert/blob/0de8f30404a99e63d5cc22e31cb4a984cc201beb/nbconvert/preprocessors/execute.py#L374-L375

        for cell_idx, output_indices in self._display_id_map[display_id].items():
            cell = self.nb['cells'][cell_idx]

https://github.com/mpacer/nbconvert/blob/0de8f30404a99e63d5cc22e31cb4a984cc201beb/nbconvert/preprocessors/execute.py#L392

                msg = self.kc.shell_channel.get_msg(timeout=timeout)

https://github.com/mpacer/nbconvert/blob/0de8f30404a99e63d5cc22e31cb4a984cc201beb/nbconvert/preprocessors/execute.py#L413-L427

    def run_cell(self, cell, cell_index=0):
        msg_id = self.kc.execute(cell.source)
        self.log.debug("Executing cell:\n%s", cell.source)
        exec_reply = self._wait_for_reply(msg_id, cell)

        outs = cell.outputs = []

        while True:
            try:
                # We've already waited for execute_reply, so all output
                # should already be waiting. However, on slow networks, like
                # in certain CI systems, waiting < 1 second might miss messages.
                # So long as the kernel sends a status:idle message when it
                # finishes, we won't actually have to wait this long, anyway.
                msg = self.kc.iopub_channel.get_msg(timeout=self.iopub_timeout)

https://github.com/mpacer/nbconvert/blob/0de8f30404a99e63d5cc22e31cb4a984cc201beb/nbconvert/preprocessors/execute.py#L393-L399

            except Empty:
                self.log.error(
                    "Timeout waiting for execute reply (%is)." % self.timeout)
                if self.interrupt_on_timeout:
                    self.log.error("Interrupting kernel")
                    self.km.interrupt_kernel()
                    break

So I don't know how helpful it will be to yield them only to then ignore them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm up for making a larger refactor to remove this dependence on instance attributes, but that should be in another PR.

In the meantime do you still want me to yield these for better downstream consumption and easier understanding even if they will immediately be assigned to self? (also, because of the way that the dynamic default works for kernel_name we need to make sure that self.nb is assigned before we start up the kernel (or we need to manually extract the kernel name and pass it through).


The input argument `nb` is modified in-place.
This creates The input argument `nb` is modified in-place.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and is?
Also is it creating nb? Seems like it's assigning nb to self not creating it.

finally:
self.kc.stop_channels()
self.km.shutdown_kernel(now=self.shutdown_kernel == 'immediate')

for attr in ['nb', 'km', 'kc']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in the finally block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

@@ -242,45 +290,46 @@ def preprocess(self, nb, resources):
path = resources.get('metadata', {}).get('path', '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove the if statement by adding or None to the end of this line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

@mpacer
Copy link
Member Author

mpacer commented Jun 5, 2018

Note: I'd still like to test this better, but unfortunately, the way we've approached testing the execute preprocessor has made it much more difficult than I at first thought.

@mpacer
Copy link
Member Author

mpacer commented Jun 5, 2018

@MSeal thanks greatly for the review — I tried to address all of your concerns let me know if I misinterpreted any of them.

Copy link
Contributor

@MSeal MSeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants