-
Notifications
You must be signed in to change notification settings - Fork 567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored execute preprocessor to have a process_message function #905
Conversation
Hmmm seeings this test fail against master so I don't think it's related to the PR |
I see the same in #900 |
@Carreau I think you mentioned something about changing the error formatting in IPython a while back — shall I assume that the errors that are cropping up are related to a recent release of IPython? |
nbconvert/preprocessors/execute.py
Outdated
|
||
outs.append(out) | ||
cell.outputs.append(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like the idea of having an API that works primarily via side-effects — that's a difficult API to document and test for.
Can you think of a way that we could do this more as a pure function? E.g., could we adhere to the pattern for we use with preprocessors? I think that would involve returning and reassigning the modified cell object rather than returning the result itself while treating the cell.outputs as a location where we can append changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can take a look. Mostly I tried to keep the logic the same and not change the design, just break it into smaller parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, this is a really exciting idea as it should make it much easier for people to handle messages in custom ways, which I know we're already doing in papermill in order to log messages as they come in.
However, given how this is a piece of core functionality to the app, I'd prefer to see a new test for handling the new API in the standard case explicitly. Because it's such core functionality it would be a testing failure if we weren't already covering this.
Ideally, I think we should also add a new test specifically to add a mocked replacement method, in part because that would surface some of the details of how the API would work when it is overridden. I think the side-effects would show up as an anti-pattern in that case.
Overall, I'd like if if we could find a way to make the new API work with fewer side-effects and if we could make the logic of how messages are being handled a little more explicit.
nbconvert/preprocessors/execute.py
Outdated
if cell_index in cell_map: | ||
cell_map[cell_index] = [] | ||
# Check for remaining messages we don't process | ||
elif not (msg_type in ['execute_input', 'update_display_data'] or msg_type.startswith('comm')): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like really different logic from what we had before with msg_type.startsith('comm')
; previously it was a continue
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually identical logically, by not returning None as a function it will effectively continue for the parent function loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might have missed the not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. If the msg_type.startswith('comm')
is true, then it does not fall into here and is not handled with another else
, instead it will return the msg as the response (since that's what it defaults to and it is never reassigned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally now if something is an update_display_data
message, it will return something (specifically the content of the msg).
I'm starting to wonder if this all might be fixed if we default to returning None
…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was working fine post-rebase, but I changed this to a raise control flow pattern which I think is a lot easier to read.
nbconvert/preprocessors/execute.py
Outdated
try: | ||
out = output_from_msg(msg) | ||
# Assign output as our processed "result" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm reading this incorrectly, I think this may have in effect added nesting and only applied The cell_map logic in the case where it is not an execute_input
or a comm
? if that's what this is intended to accomplish we should talk about that, but I was reviewing this originally as a refactor not a change in functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
! (A || B)
== (!A and !B)
So it should be equivalent to before I believe. I could change it to msg_type not in ['execute_input', 'update_display_data'] and not msg_type.startswith('comm')
if you think it reads easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that display_id
should have been dealt with in the cell_map
logic regardless of the message type previously, except in the cases where a continue was followed. That try
is not going to be hit by all message types and so the else
also won't be hit by all message types.
|
||
display_id = content.get('transient', {}).get('display_id', None) | ||
if display_id and msg_type in {'execute_result', 'display_data', 'update_display_data'}: | ||
self._update_display_id(display_id, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously there was a continue
in the case where there was an update_display_data
message with a statement that # update_display_data doesn't get recorded
am I correct that that logic is now implicit if it's present at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MSeal could you respond to this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late response (post rebase), but yes it's implicit by not raising CellExecutionComplete here
I made a release of IPython on Sept 27 and one on Oct 27. There are likely
errors due to that.
…On Tue, Oct 30, 2018, 16:04 M Pacer ***@***.***> wrote:
***@***.**** requested changes on this pull request.
I'd prefer to see a new test for handling this functionality in the
standard case as well as a mocked replacement method as a way to make sure
we understand how the API would work.
Overall, I'd like if if we could find a way to make the new API work with
fewer side-effects and if we could make the logic of how messages are being
handled a little more explicit.
------------------------------
In nbconvert/preprocessors/execute.py
<#905 (comment)>:
> + # set the prompt number for the input and the output
+ if 'execution_count' in content:
+ cell['execution_count'] = content['execution_count']
+
+ if msg_type == 'status':
+ if content['execution_state'] == 'idle':
+ # Set result to None to halt execution
+ result = None
+ elif msg_type == 'clear_output':
+ cell.outputs[:] = []
+ # clear display_id mapping for this cell
+ for display_id, cell_map in self._display_id_map.items():
+ if cell_index in cell_map:
+ cell_map[cell_index] = []
+ # Check for remaining messages we don't process
+ elif not (msg_type in ['execute_input', 'update_display_data'] or msg_type.startswith('comm')):
This seems like really different logic from what we had before with
msg_type.startsith('comm'); previously it was a continue.
------------------------------
In nbconvert/preprocessors/execute.py
<#905 (comment)>:
> try:
- out = output_from_msg(msg)
+ # Assign output as our processed "result"
Unless I'm reading this incorrectly, I think this may have in effect added
nesting and only applied The cell_map logic in the case where it is not an
execute_input or a comm? if that's what this is intended to accomplish we
should talk about that, but I was reviewing this originally as a refactor
not a change in functionality.
------------------------------
In nbconvert/preprocessors/execute.py
<#905 (comment)>:
> +
+ def process_message(self, msg, cell, cell_index):
+ '''
+ Returns None if execution should be halted.
+ '''
+ msg_type = msg['msg_type']
+ self.log.debug("msg_type: %s", msg_type)
+ content = msg['content']
+ self.log.debug("content: %s", content)
+
+ # Default to our input as the "result" of processing the message
+ result = msg
+
+ display_id = content.get('transient', {}).get('display_id', None)
+ if display_id and msg_type in {'execute_result', 'display_data', 'update_display_data'}:
+ self._update_display_id(display_id, msg)
Previously there was a continue in the case where there was an
update_display_data message with a statement that # update_display_data
doesn't get recorded am I correct that that logic is now implicit if it's
present at all?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#905 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAUez8A5iUQVDLSd8gyzJxopCU4TrHcPks5uqNsagaJpZM4X-Q1I>
.
|
I can add a test for the inheritance case. It did seem when I tried moving things to the wrong places on purpose that tests would usually fail. I'll do a more thorough check. Do you want me to do functional rewrite to avoid the side_effect patterns already there? Might be higher risk of changing execution logic but I can do the more extensive changes if you think it's worth it. |
@mpacer I added a new test. From reading the code I think we'd need to do a semi-significant update to add in the message handler we talked about. Because of that I'd prefer to split the implementation into two parts and do that refactor post-merge of this PR. |
|
||
original = copy.deepcopy(input_nb) | ||
wpp = WrappedPreProc() | ||
executed = wpp.preprocess(input_nb, {})[0] |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
original = copy.deepcopy(input_nb) | ||
wpp = WrappedPreProc() | ||
executed = wpp.preprocess(input_nb, {})[0] | ||
self.assertEqual(outputs, [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere in the library we tend to use direct asserts as we are using pytest to run the tests. In addition to stylistic consistency, I think it's a lot easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, will fix
def test_process_message_wrapper(self): | ||
outputs = [] | ||
|
||
class WrappedPreProc(ExecutePreprocessor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we directly created the context needed so that process_message
can be tested functionally, iterating over the cell, msg and cell_index as directly as possible.
I'd prefer if we weren't concatenating all of the outputs into a single object for the whole notebook, I'm thinking it should follow the cell structure.
I'd also like to make sure that message types that are supposed to return None are returning None, which means keeping track of at least the msg type in relation to the output if not some of the more predictive fields (e.g., a display_id that we know is set to a particular value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done -- this is rebased off #982 now
@mpacer PR should now be ready for re-review |
…r each kernel message
nbconvert/preprocessors/execute.py
Outdated
@@ -25,6 +25,8 @@ | |||
from ..utils.exceptions import ConversionException | |||
|
|||
|
|||
class CellExecutionComplete(Exception): pass # Used as a control signal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this an actual docstring? Specifically one that describes what this is going to be used for?
@@ -185,18 +185,18 @@ def normalize_output(output): | |||
def assert_notebooks_equal(self, expected, actual): | |||
expected_cells = expected['cells'] | |||
actual_cells = actual['cells'] | |||
self.assertEqual(len(expected_cells), len(actual_cells)) | |||
assert len(expected_cells) == len(actual_cells) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huzzah! Many thanks for switching to using the pytest assert
idiom.
def process_message(self, msg, cell, cell_index): | ||
result = super(WrappedPreProc, self).process_message(msg, cell, cell_index) | ||
if result: | ||
outputs.append(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this still need special logic distinct from the standard process_message
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's purely to test that one could wrap an retrieve outputs to process on it's own. It has no real purpose for doing so beyond showing how to wrap the function in inherited classes. Namely to prove we can wrap the function safely in papermill cleanly.
This looks great! Thanks for working through all the issues! |
Built ontop of #982 which added additional requested testing. We can merge that PR first if it's desired.
Broke apart the large run_cell function to enable intercepting messages easier. This will enable deleting code from papermill where we have to reimplement the entire
run_cell
method.