Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Recreate old behaviour of ExecutePreprocessor(kernel_name="") #886

Closed
wants to merge 6 commits into from

Conversation

mpacer
Copy link
Member

@mpacer mpacer commented Sep 11, 2018

This fixes #878 and spatialaudio/nbsphinx#202.

We had an API before that allowed passing an empty string as your kernel name, which it would then fall back on getting the kernel name from the notebook metadata.

This is now the actual default behaviour, and before this PR setting kernel_name="" would overwrite this default. This reintroduces the fall-back as well as a warning that this behaviour shouldn't be used (you can pass no value to kernel_name and you will get the same behaviour).

To ensure the same behaviour in both cases, it introduces the find_kernel_name API, which is then used in both the default and fallback mechanisms.

This also introduces 2 tests and a new notebook file to be tested. In order to ensure that the functionality works I'm specifying a python3 kernel, which cannot be guaranteed to exist, so we have an expected fail in the case where there is no python3 kernel.

This is being introduced because we previously allowed users to pass ""
to indicate that the kernel name should be exclusively inferred from the
notebook itself. Doing so now overrides the default behaviour (which now
defaults to infering the kernel name from the notebook itself if nothing
is set). This is not a great pattern since it involves setting a
parameter to a magic value to keep the default behaviour.

This also issues a warning if you use the old way to do this.
In order to not use the default "python" behvaiour, I set the kernel to 
be "python3". However, that means this can only work if we have a 
python3 kernel available.
@mpacer
Copy link
Member Author

mpacer commented Sep 11, 2018

Note: spatialaudio/nbsphinx#208 updates nbsphinx to use the preferred means of getting a kernel from notebook metadata.

@mpacer
Copy link
Member Author

mpacer commented Sep 11, 2018

@mgeier @akhmerov @Juanlu001 This should address your concerns & test to avoid further regressions.

# Because of an old API, an empty kernel string should be interpreted as indicating
# that you want to use the notebook's metadata specified kernel.
# We use find_kernel_name to do this.
if self.kernel_name == u"":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to do if not self.kernel_name:? As a python developer I would expect that if "" is treated as falsy for a condition that None would be too. Or does the kernel_manager_class already handle None separately?

if not self.kernel_name:
  self.kernel_name = find_kernel_name(self.nb)
  if isinstance(self.kernel_name, string_types):
    warnings.warn(...)

Copy link
Member Author

@mpacer mpacer Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at it, I'm pretty sure None would wouldn't have worked previously… but I still don't want to encourage passing people to do this regardless. So we'll want to warn no matter what type the value was.

The reason is that it's specified as a Unicode traitlet so passing in None would cause a validation error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would checking for false kernel_name and declaring it deprecated in all cases make more sense from a function perspective then? I'm ok with merging if you feel strongly to just check for unicode string, it just had some code smell for something that might fail under similar situations -- like python version differences

Copy link
Member Author

@mpacer mpacer Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python version difference was why I made it specifically a unicode string (since it's a Unicode() traitlet). I pushed a more general check and added a test to make sure that None raises a TraitError.

@mgeier
Copy link
Contributor

mgeier commented Sep 12, 2018

This is all really confusing!

I don't see the necessity for dynamically changing the default value for kernel_name.

Why don't you just completely get rid of _kernel_name_default()?
The helper function find_kernel_name() isn't (currently) necessary either.

Then, in start_new_kernel() you could simply do something like this (untested!):

        if self.kernel_name:
            kernel_name = self.kernel_name
        else:
            kernel_name = self.nb.metadata.get('kernelspec', {}).get('name', 'python')

@MSeal
Copy link
Contributor

MSeal commented Sep 12, 2018

@mgeier I think I follow the suggestion. I'm still learning the nbconvert config system, so I don't know if there's side-effects that are important in assigning the default in the config and raising there for the AttributeError. But if there's not a secondary reason for this pattern, and nothing else would reuse this config then it would probably be simpler to move all the checks into start_new_kernel.

@minrk
Copy link
Member

minrk commented Sep 14, 2018

I would recommend one of two approaches:

  • remove dynamic default for kernel_name, so that the if not self.kernel_name logic in this PR is the behavior for both the default
  • go "full traitlets" and add, in addition to the dynamic default, a @validate('kernel_name') that coerces kernel_name = '' into find_kernel_name(). That way self.kernel_name is always defined as the correct kernel name, whether config specified it as empty==auto or didn't specify it at all.

I don't have a strong preference, but the advantage of the first is that it relies less on traitlets magic, while the advantage of the second uses traitlets to ensure that self.kernel_name is always available as the kernel name being used. An argument for case 1 is if it makes sense to re-use a Preprocessor instance across multiple exports, since the dynamic default will be constructed only on the first notebook it sees, not on each call. I don't recall if this use case makes sense, though.

Because we cannot guarantee that self.nb exists when kernel_name is set,
this requires some… finagling to handle the empty string correctly.

It would be better if we could lazily query the value rather than need
to specify at the time of setting.
@mpacer
Copy link
Member Author

mpacer commented Sep 15, 2018

Thanks for weighing in!

I went with the second suggestion and ran into an issue. Because we set the kernel_name at the time of creating the Preprocessor, we run into an issue where we cannot set it to the correct value until the notebook is present to be able to query.

If we could lazily load the value, that would work. I think we can have a blend of the two approaches, where the empty string coercion is done at the time of evaluation and we still include validation to check that something is either an empty string, 'python' or in the list of available kernel specs (the python comes from the standard KernelManager behaviour).

That's going to be my next attempt at doing this. In talking with @MSeal about this we arrived at the idea that it could be helpful to have a method that explicitly defines these conventions so that they could be easily overwritten by subclasses without needing to change other parts of the machinery.

In any case I'll get back to this tomorrow.

@mgeier
Copy link
Contributor

mgeier commented Sep 15, 2018

Using @validate('kernel_name') sounds slightly more reasonable than using @default('kernel_name'). But still, it doesn't really make sense. The notebook is passed to the ExecutePreprocessor in the method preprocess(self, nb, resources, km=None). Validating the kernel_name before this method is called doesn't make sense, because at that point the notebook isn't known yet.

It makes much more sense to keep kernel_name='' and then validate it within the preprocess() method.

Or am I missing something?

@MSeal
Copy link
Contributor

MSeal commented Sep 15, 2018

FYI telling someone their design doesn't make sense comes off a bit aggressive, even when the suggested changes are valid suggestions.

I would argue passing an empty string as a signal/default coercion behavior is not usually a good practice. None or absence from kwargs entirely would be more Pythonic, but for compatibility to capture u"" falsey is a reasonable fallback. Having a validate that checks that you passed something we can cooerce into a kernel, or is a valid kernel name as @mpacer described sounds reasonable to me. Having a common function to convert placeholders (or empty strings) to actual kernel names JIT or as a function wrapper also seems the most robust and reusable since there is more than one place that would likely want this behavior.

Given this is just a backwards compatibility item, I wouldn't stress too much on exact details as we I believe we want to drop explicit u"" matching in the future anyway.

@mgeier
Copy link
Contributor

mgeier commented Sep 17, 2018

I would argue passing an empty string as a signal/default coercion behavior is not usually a good practice.

There are certainly cases where it is not a good practice, but this case is not one of them.

There is no reasonable interpretation of different intent whether a user passes an empty string or nothing at all for kernel_name. Those two cases are for all intents and purposes equal.

Sure, the original author could have chosen None as default value, that would have been fine, too.

Having a common function to convert placeholders (or empty strings) to actual kernel names JIT or as a function wrapper also seems the most robust and reusable since there is more than one place that would likely want this behavior.

That would be totally fine and probably useful, as long as this function takes the notebook in question as an argument or if this function is a method of an object that represents a notebook.

I wouldn't stress too much on exact details as we I believe we want to drop explicit u"" matching in the future anyway.

Please don't.

FYI telling someone their design doesn't make sense comes off a bit aggressive, even when the suggested changes are valid suggestions.

I'm sorry if I sounded aggressive, I didn't intend to.

I'm not a native speaker, I didn't know of another way to say it and I didn't know that somebody could interpret this as aggressive.
I just wanted to state the fact that the discussed change would be logically inconsistent and simply bad software engineering, I thought colloquially using "doesn't make sense" would transport this sentiment quite succinctly.

I've looked a bit more into this and found that the logical error was committed much earlier than this pull request or the commit it tries to fix.
The real problem is that the notebook should not be assigned to self.nb.
The ExecutePreprocessor should only hold state based on its initial configuration. It shouldn't hold state about the currently processed notebook nor about the currently running kernel.
The current notebook should only be known (and the kernel should only be running) during the runtime of the preprocess() method. If some runtime information is needed in other methods, it should be passed there via function arguments.

It is also quite confusing that there is a method named preprocess_cell() and a very similarly named method run_cell() as well.

The method preprocess_cell() is normally (in the other preprocessors) run repeatedly from within the method preprocess(). So here it should either be removed (or deactivated somehow since it would also be inherited from the base class) or it should be combined with run_cell().

[Update: it would probably be easier to understand if run_cell() would be incorporated into preprocess_cell().]

preprocess_cell() has a resources argument, so the obvious place to put the runtime state (running kernel and reference to the notebook if needed) would be this resources argument.
But it's probably simpler to disable preprocess_cell() and pass all the necessary information to run_cell() as arguments.

Once this problem is resolved, our discussion here will become moot. But in the meantime, can we please keep kernel_name='' as a default that can also be passed by a user?

@mgeier
Copy link
Contributor

mgeier commented Sep 17, 2018

Update: I just realized that preprocess_cell() is still called by the base class, as it is supposed to be.

In this case, all necessary state about the current notebook and the currently running kernel should be transported in the resources argument.

@akhmerov
Copy link
Member

Once this problem is resolved, our discussion here will become moot. But in the meantime, can we please keep kernel_name='' as a default that can also be passed by a user?

What is your motivation for this preference? None is a much more common none-value. While I imagine that there wouldn't be kernels that have an empty string for a name, having None for a default is a more standard interface.

@mgeier
Copy link
Contributor

mgeier commented Sep 17, 2018

@akhmerov

What is your motivation for this preference? None is a much more common none-value.

Sorry I wasn't clear enough.

In general I don't have a strong preference. As I said, I think both '' and None would be fine as default values in this case.

The reason why I mentioned kernel_name='' is for backwards compatibility. The default was an empty string before and I don't see a reason to change this.

If the default were changed to None (and the empty string were disallowed), this would break my nbsphinx module (again).

@akhmerov
Copy link
Member

@mgeier doesn't kernel_name=None already work with all versions of nbconvert and has the same effect as kernel_name=''? If that is correct, then you could consider changing it right away in nbsphinx. This wouldn't break any compatibility and ensure nbsphinx is future-proof and also has a more conventional interface.

@mgeier
Copy link
Contributor

mgeier commented Sep 18, 2018

@akhmerov No, using kernel_name=None doesn't currently work, as @mpacer said in #886 (comment). Trying to pass None leads to

traitlets.traitlets.TraitError: The 'kernel_name' trait of an ExecutePreprocessor instance must be a unicode string, but a value of None <class 'NoneType'> was specified.

@MSeal
Copy link
Contributor

MSeal commented Sep 18, 2018

I do agree with @akhmerov that None is the better pattern here and that from a python developer perspective "" is not expected to have special behavior independent of None. But given history and all the discussions here, treating anything pythonic falsey solves the issue either way and is a reasonable interface choice.

Let's get M's change finalized to respect the pattern (even if it's not perfect for everyone) and redirect some of this energy into improving Preprocessor design overall. I think that's a more valuable use of time.

@mpacer mpacer changed the title Recreate old behaviour of ExecutePreprocessor(kernel_name="") WIP Recreate old behaviour of ExecutePreprocessor(kernel_name="") Sep 18, 2018
@mgeier
Copy link
Contributor

mgeier commented Nov 13, 2018

It looks like #809 would also solve the issue that this PR is solving. However, #809 has milestone 6.0, and it would be great if we could solve this issue (namely #878) in release 5.5!

Can someone please tag this with milestone 5.5?

@mgeier
Copy link
Contributor

mgeier commented Dec 4, 2018

Since there doesn't seem to be any progress here, I've created an alternative (simpler) PR here: #924.

I'm fine with either solution, as long as the problem is fixed before the next nbconvert release.

@MSeal
Copy link
Contributor

MSeal commented Dec 14, 2018

Closing as request has been satisfied by related PRs

@MSeal MSeal closed this Dec 14, 2018
andportnoy added a commit to kutaslab/fitgrid that referenced this pull request Feb 4, 2019
This causes nbsphinx failures, for more details see:

spatialaudio/nbsphinx#202
jupyter/nbconvert#878
jupyter/nbconvert#886

Should be fixed by the next nbconvert release (5.5).
@MSeal MSeal added this to the no action milestone Sep 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'NoneType' object has no attribute 'argv' with nbconvert 5.4 and nbsphinx
5 participants