Execution dependency extension #1213

Merged 3 commits, Feb 3, 2018
Changes from 2 commits
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -50,6 +50,9 @@ Repo-level stuff:

New features and bugfixes:

- `execution_dependencies` __new nbextension added!__
  [#1213](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/pull/1213)
  [@benelot](https://github.com/benelot)
- `livemdpreview` __new nbextension added!__
  [#1155](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/pull/1155)
  [@jcb91](https://github.com/jcb91)
@@ -0,0 +1,43 @@
execution_dependencies
======================

Extensive notebooks can become complicated because many cells act as stepping stones that produce intermediate results for later cells. It then becomes tedious to
keep track of which cells have to be run before a given cell can run. This extension simplifies handling these execution dependencies by introducing tag annotations that
identify each cell and indicate its dependencies on others. This improves on the current state, which requires remembering all dependencies by heart or annotating the cells in comments.
Member

might be worth noting that dependencies are definitely executed, rather than say only being executed once per kernel session. This may be important for cells which take a long time to execute...

Contributor Author

True, I should add that. Thanks for reviewing my code, I am very grateful for that! I really like jupyter notebooks and I am very interested in contributing more extensions in the future.

Member

No problem, happy to help 😄 And of course, we'd be happy to have anything you think might be useful to include here 😉


If a cell with dependencies is run, the extension recursively collects all dependencies of the cell, executes them, and then executes the cell itself once all of its dependencies have finished.
Dependencies are re-executed every time a dependent cell is run, not just once per kernel session. Keep this in mind for dependencies that take a long time to execute.

The two annotations are added to the tags of a cell and are as follows:

* add a hash mark (#) followed by an identifier to the tags to identify a cell (e.g. #initializer-cell). Identifiers must be unique among all cells.
* add an arrow (=>) followed by an identifier to the tags to add a dependency on the cell with that identifier (e.g. =>initializer-cell).
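
The two annotations above can be sketched as a small parsing step. This is a minimal illustration, not the extension's code: `parseTags` is a hypothetical helper, and it assumes a tag only counts when it starts with `#` or `=>`.

```javascript
// Hypothetical helper (not part of the extension): split a cell's tag list
// into the identifiers it declares and the identifiers it depends on.
// Assumption: only tags that START with "#" or "=>" are annotations.
function parseTags(tags) {
    var ids = [];
    var deps = [];
    (tags || []).forEach(function (tag) {
        if (tag.indexOf('=>') === 0) {
            deps.push(tag.substring(2)); // drop the leading "=>"
        } else if (tag.indexOf('#') === 0) {
            ids.push(tag.substring(1)); // drop the leading "#"
        }
    });
    return { ids: ids, deps: deps };
}

var parsed = parseTags(['#B', '=>A', 'unrelated-tag']);
console.log(parsed.ids, parsed.deps); // [ 'B' ] [ 'A' ]
```

Tags without either prefix are ignored, so ordinary notebook tags can coexist with the annotations.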

Based on these annotations, the dependencies of a cell are executed before the cell that depends on them. If those dependencies have dependencies of their own, these are in turn
executed first. In short, the extension walks the tree of dependencies of the cell run by the user, executes the dependencies in their appropriate order,
and then executes the cell.

A more extensive example is described below:

A cell A has the identifier #A.

| Cell A [tags: #A] |
| ------------- |
| Content Cell |
| Content Cell |


A cell B has the identifier #B and depends on A (=>A).


| Cell B [tags: #B, =>A] |
| ------------- |
| Content Cell |
| Content Cell |

If the user runs A, only A is executed, since it has no dependencies. If the user runs B, on the other hand, the extension finds the dependency on A, so A is run first and then B.

If the user runs a cell C that depends on both B and A, A is run first, then B, and then C, so cell A is not run twice.
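
The A, B, C example can be reproduced with a short ordering sketch. It uses a depth-first, post-order walk, which is one possible way to get this ordering and not necessarily how the extension does it. `executionOrder` and the `graph` literal are hypothetical names for illustration.

```javascript
// Hypothetical sketch: order dependencies with a depth-first, post-order walk.
// depGraph maps each identifier to the identifiers it depends on.
function executionOrder(depGraph, root) {
    var order = [];
    var done = Object.create(null);     // identifiers already placed in the order
    var visiting = Object.create(null); // identifiers on the current path

    function visit(id) {
        if (done[id]) { return; } // already scheduled, so it is not run twice
        if (visiting[id]) {
            throw new Error('circular dependency involving ' + id);
        }
        visiting[id] = true;
        (depGraph[id] || []).forEach(function (dep) { visit(dep); });
        visiting[id] = false;
        done[id] = true;
        order.push(id); // a cell is appended only after all its dependencies
    }

    visit(root);
    return order;
}

var graph = { A: [], B: ['A'], C: ['B', 'A'] };
console.log(executionOrder(graph, 'C')); // [ 'A', 'B', 'C' ]
```

Running B alone yields A followed by B, and running A alone yields only A, matching the description above.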


If you are missing anything, open an issue at the repository, prepending [execution_dependencies] to the title.
@@ -0,0 +1,138 @@
/**
 * execution_dependencies.js
 * Introduce tag annotations to identify each cell and indicate a dependency on others.
 * Upon running a cell, its dependencies are run first to prepare all dependencies.
 * Then the cell triggered by the user is run as soon as all its dependencies are met.
 *
 *
 * @version 0.1.0
 * @author Benjamin Ellenberger, https://github.com/benelot
 * @updated 2018-01-31
 *
 *
 */
define([
    'jquery',
    'base/js/dialog',
    'base/js/namespace',
    'notebook/js/codecell'
], function (
    $,
    dialog,
    Jupyter,
    codecell
) {
    "use strict";

    var CodeCell = codecell.CodeCell;

    return {
        load_ipython_extension: function () {
            console.log('[execution_dependencies] patching CodeCell.execute');
            var orig_execute = codecell.CodeCell.prototype.execute; // keep original cell execute function
            CodeCell.prototype.execute = function (stop_on_error) {
                var root_tags = this.metadata.tags || []; // get tags of the cell executed by the user (root cell)
                if(root_tags.some(tag => /=>.*/.test(tag))) { // if the root cell contains any dependencies, resolve dependency tree
                    var root_cell = this;
                    var root_cell_id = root_cell.cell_id;
                    var cells_with_id = Jupyter.notebook.get_cells().filter(function (cell, idx, cells) { // ...get all cells which have at least one id (these are the only ones we could have in deps)
                        var tags = cell.metadata.tags || [];
                        return (cell === root_cell || tags.some(tag => /#.*/.test(tag)));
                    });

                    console.log('[execution_dependencies] collecting ids and dependencies...');
                    var cell_map = {}
                    var dep_graph = {}
                    cells_with_id.forEach(function (cell) { // ...get all identified cells (the ones that have at least one #tag)
                        var tags = cell.metadata.tags || [];
                        var cell_ids = tags.filter(tag => /#.*/.test(tag)).map(tag => tag.substring(1)); // ...get all identifiers of the current cell and drop the #
                        if(cell === root_cell){
                            if(cell_ids.length < 1) {
                                cell_ids.push(root_cell.cell_id); // ...use internal root cell id for internal usage
                            }
                            else {
                                root_cell_id = cell_ids[0]; // get any of the root cell ids
                            }
                        }

                        var dep_ids = tags.filter(tag => /=>.*/.test(tag)).map(tag => tag.substring(2)); // ...get all dependencies and drop the =>

                        cell_ids.forEach(function (id) {
                            //console.log('ID:', id, 'deps: ', dep_ids.toString())
                            cell_map[id] = cell;
Member

this line implicitly means only a single cell can have each #some_id tag. Is this what you want? It could make sense to have a few cells with the same tag, so do something like

cell_map[id] = cell_map[id] || [];
cell_map[id].push(cell);

Alternatively, if you don't intend to support this, at least let the user know that adding multiple cells with the same identifier tag won't work correctly in the readme.

Contributor Author

Yeah, as far as I thought about this, I wanted to work with unique ids, so one id for one cell. In the one id for multiple cells case, I would not be sure what is the right order to execute them and even if they are all independent from each other, they might have different dependencies, so I would have to account for each of them separately. But you gave me a nice idea. Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id. Maybe I will get to this in a later version.

Member

> Is the cell_id you mentioned above unique within the notebook's context? Then I could use that for uniqueness and allow multiple cells to have the same hashtag id.

Yes, it should be unique for the current session. See notebook/static/notebook/js/cell.js#L101 for where it gets assigned, and notebook/static/base/js/utils.js#L206-L220 for the implementation

Member

> I would not be sure what is the right order to execute them

I think it would be reasonable to assume they ought to be executed in the order in which they appear in the notebook?

> they might have different dependencies, so I would have to account for each of them separately

sort of, or they can be treated as essentially an aggregate (dependencies of this tag are simply the collected dependencies of all its constituent cells).

Contributor Author

Both sound reasonable. I will look into this as soon as the rest works.
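
For reference, the aggregate idea discussed in this thread could look roughly like the sketch below. It is not part of this PR: `collectByTag` is a hypothetical name, and cells are modeled as plain `{ name, tags }` objects, whereas real notebook cells keep their tags in `cell.metadata.tags`.

```javascript
// Hypothetical sketch of the aggregate approach: several cells may share an
// identifier, and the identifier's dependencies are the union of theirs.
// Assumption: cells are plain { name, tags } objects listed in notebook order.
function collectByTag(cells) {
    var cellMap = {};  // identifier -> cells carrying it, in notebook order
    var depGraph = {}; // identifier -> collected dependencies of those cells
    cells.forEach(function (cell) {
        var tags = cell.tags || [];
        var ids = tags
            .filter(function (t) { return t.indexOf('#') === 0; })
            .map(function (t) { return t.substring(1); });
        var deps = tags
            .filter(function (t) { return t.indexOf('=>') === 0; })
            .map(function (t) { return t.substring(2); });
        ids.forEach(function (id) {
            cellMap[id] = cellMap[id] || [];
            cellMap[id].push(cell);
            depGraph[id] = depGraph[id] || [];
            deps.forEach(function (dep) {
                if (depGraph[id].indexOf(dep) === -1) {
                    depGraph[id].push(dep); // keep each dependency once
                }
            });
        });
    });
    return { cellMap: cellMap, depGraph: depGraph };
}

var result = collectByTag([
    { name: 'load-1', tags: ['#setup'] },
    { name: 'load-2', tags: ['#setup', '=>config'] },
    { name: 'cfg', tags: ['#config'] }
]);
console.log(result.cellMap.setup.map(function (c) { return c.name; })); // [ 'load-1', 'load-2' ]
console.log(result.depGraph.setup); // [ 'config' ]
```

Because `forEach` visits the cells in list order, cells sharing an identifier stay in the order in which they appear in the notebook.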

                            dep_graph[id] = dep_ids;

                        });
                    });

                    if(dep_graph[root_cell_id].length > 0) {
                        console.log('[execution_dependencies] collecting dependency graph in-degrees...');
                        var processing_queue = [root_cell_id];
                        var processed_nodes = 0;
                        var in_degree = {}; // ...collect in-degrees of nodes
                        while(processing_queue.length > 0 && processed_nodes < Object.keys(dep_graph).length) {// ...stay processing deps while the queue contains nodes and the processed nodes are below total node quantity
Member

this loop never increments processed_nodes, so gets stuck for circular dependencies...

Contributor Author

Added it to the code. Ready for merging.
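
As a point of comparison for the circular-dependency concern raised here, the sketch below shows a standalone version of the in-degree (Kahn) approach that terminates on cycles. It is a hedged illustration and not the code merged in this PR: `kahnOrder` is a hypothetical name, and it signals a cycle by returning `null` when some reachable node is never processed.

```javascript
// Hypothetical sketch: Kahn's algorithm on the subgraph reachable from root.
// depGraph maps an identifier to the identifiers it depends on.
// Returns the execution order (dependencies first), or null on a cycle.
function kahnOrder(depGraph, root) {
    // 1. Collect reachable nodes; the seen set guarantees termination.
    var reachable = [];
    var seen = Object.create(null);
    var stack = [root];
    while (stack.length > 0) {
        var node = stack.pop();
        if (seen[node]) { continue; }
        seen[node] = true;
        reachable.push(node);
        (depGraph[node] || []).forEach(function (dep) { stack.push(dep); });
    }

    // 2. Count, for each node, how many reachable nodes depend on it.
    var inDegree = Object.create(null);
    reachable.forEach(function (n) { inDegree[n] = 0; });
    reachable.forEach(function (n) {
        (depGraph[n] || []).forEach(function (dep) { inDegree[dep] += 1; });
    });

    // 3. Repeatedly take nodes that no unprocessed node depends on.
    var queue = reachable.filter(function (n) { return inDegree[n] === 0; });
    var order = [];
    while (queue.length > 0) {
        var current = queue.shift();
        order.unshift(current); // dependents come out first, so prepend
        (depGraph[current] || []).forEach(function (dep) {
            inDegree[dep] -= 1;
            if (inDegree[dep] === 0) { queue.push(dep); }
        });
    }

    // 4. Any node left unprocessed must be part of a cycle.
    return order.length === reachable.length ? order : null;
}

console.log(kahnOrder({ A: [], B: ['A'], C: ['B', 'A'] }, 'C')); // [ 'A', 'B', 'C' ]
console.log(kahnOrder({ A: ['B'], B: ['A'] }, 'A'));             // null
```

Each node is queued at most once, so the loops end even when the dependency tags form a cycle.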

                            var id = processing_queue.shift(); // .....pop front of queue and front-push it to the processing order
                            //console.log("ID: ", id);
                            for(var i=0, dep_qty=dep_graph[id].length; i < dep_qty; i++) {
                                var dep = dep_graph[id][i];
                                // console.log(' dep: ', dep);
                                in_degree[id] = in_degree[id] || 0;
                                in_degree[dep] = in_degree[dep] === undefined ? 1 : ++in_degree[dep];
                                processing_queue.unshift(dep);
                            }
                        }

                        console.log('[execution_dependencies] starting topological sort...');
                        processing_queue = [root_cell_id]; // ...add root node with in-degree 0 to queue (this excludes all disconnected subgraphs)
                        processed_nodes = 0; // ...number of processed nodes (to detect circular dependencies)
                        var processing_order = [];
                        while(processing_queue.length > 0 && processed_nodes < Object.keys(dep_graph).length) {// ...stay processing deps while the queue contains nodes and the processed nodes are below total node quantity
                            var id = processing_queue.shift(); // .....pop front of queue and front-push it to the processing order
                            processing_order.unshift(id);
                            //console.log("ID: ", id);
                            for(var i=0, dep_qty=dep_graph[id].length; i < dep_qty; i++) { // ......iterate over dependent nodes of current id and decrease their in-degree by 1
                                var dep = dep_graph[id][i];
                                // console.log(' dep: ', dep);
                                in_degree[dep]--;
                                if(in_degree[dep] == 0) { // ......queue dependency if in-degree is 0
                                    processing_queue.unshift(dep);
                                }
                            }
                            processed_nodes++;
                        }

                        console.log('[execution_dependencies] checking for circular dependencies...');
                        if(processed_nodes > Object.keys(dep_graph).length) { // ...if more nodes were processed than the number of graph nodes, there is a circular dependency
                            dialog.modal({
                                title : 'Circular dependencies in the execute dependencies of this cell',
                                body : 'There is a circular dependency in this cell\'s execute dependencies. The cell will be run without dependencies. If this does not work, fix the dependencies and rerun the cell.',
                                buttons: {'OK': {'class' : 'btn-primary'}},
                                notebook: Jupyter.notebook,
                                keyboard_manager: Jupyter.keyboard_manager,
                            });
                        }
                        else if(!Jupyter.notebook.trusted) { // ...if the notebook is not trusted, we do not execute dependencies, but only print them out to the user
                            dialog.modal({
                                title : 'Execute dependencies in untrusted notebook',
                                body : 'This notebook is not trusted, so execute dependencies will not be automatically run. You can still run them manually, though. Run in order (the last one is the cell you wanted to execute): ' + processing_order,
                                buttons: {'OK': {'class' : 'btn-primary'}},
                                notebook: Jupyter.notebook,
                                keyboard_manager: Jupyter.keyboard_manager,
                            });
                        }
                        else{
                            processing_order.pop()
                            console.log('[execution_dependencies] executing dependency cells in order ', processing_order ,'...');
                            var dependency_cells = processing_order.map(id =>cell_map[id]); // ...get dependent cells by their id
                            //console.log("Execute cells..", dependency_cells)
                            dependency_cells.forEach(cell => orig_execute.call(cell, stop_on_error)); // ...execute all dependent cells in sequence using the original execute method
                        }
                    }
                }
                console.log('[execution_dependencies] executing requested cell...');
                orig_execute.call(this, stop_on_error); // execute original cell execute function
            };
            console.log('[execution_dependencies] loaded');
        }
    };
});
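
The patching pattern used above (keep a reference to the original `execute`, install a wrapper, call the original through `.call`) can be tried outside a notebook with a stand-in class. `FakeCell` and its `dependencies` property are assumptions made for this sketch only; they are not part of Jupyter or of this extension.

```javascript
// Stand-in for CodeCell so the pattern runs outside a notebook (assumption).
function FakeCell(name) { this.name = name; }
FakeCell.log = []; // records the order in which cells were executed
FakeCell.prototype.execute = function (stopOnError) {
    FakeCell.log.push(this.name);
};

// Keep a reference to the original method, then install a wrapper around it.
var origExecute = FakeCell.prototype.execute;
FakeCell.prototype.execute = function (stopOnError) {
    (this.dependencies || []).forEach(function (dep) {
        // Call the original directly so dependencies bypass the wrapper.
        origExecute.call(dep, stopOnError);
    });
    return origExecute.call(this, stopOnError);
};

var a = new FakeCell('A');
var b = new FakeCell('B');
b.dependencies = [a]; // hypothetical property, for this sketch only
b.execute(true);
console.log(FakeCell.log); // [ 'A', 'B' ]
```

Calling the saved original for the dependencies, rather than the patched method, avoids re-entering the dependency resolution for each dependency cell.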
@@ -0,0 +1,9 @@
Type: Jupyter Notebook Extension
Compatibility: 4.x, 5.x
Name: Execution Dependencies
Main: execution_dependencies.js
Link: README.md
Description: |
  Introduce tag annotations to identify each cell and indicate a dependency on others.
  Upon running a cell, its dependencies are run first to prepare all dependencies.
  Then the cell triggered by the user is run as soon as all its dependencies are met.