Per notebook MANIFESTS #673

Closed
Keno opened this issue Jul 7, 2018 · 74 comments

Comments

@Keno
Member

Keno commented Jul 7, 2018

Now that 0.7 is getting closer, it may make sense to start thinking about how notebooks interact with the new package manager. I had discussed with @StefanKarpinski and @KristofferC that it would be great if notebooks could embed a MANIFEST, so that if you send somebody a notebook they could automatically load everything with the correct versions. Doing something like this would require figuring out where to store the information and how to hook it up to Pkg3, and would probably require some UI work as well.

@stevengj
Member

stevengj commented Jul 8, 2018

maybe with the contents API

@StefanKarpinski
Member

StefanKarpinski commented Jul 8, 2018

I believe that what's needed is an "environment protocol": i.e. instead of needing to actually have a project file and/or manifest file present, or a package directory, or load path array, one just needs to implement the environment protocol. Then the IJulia package can implement the protocol for notebooks that have environment information stored in them and voila, each notebook has its own environment. However, I think that work is a 1.x kind of thing: we now generally understand what the protocol needs to look like; the next step is to factor out the protocol part in such a way that the three kinds of environments that we already support are implementations of this protocol; after that we allow a notebook to implement the environment protocol as well.

The main thing to consider at this point is how to allow for extension in the future. Where is the hook? Do we have a Base.PACKAGE_ENVIRONMENT variable, which, if set, overrides the LOAD_PATH lookup? Or do we have some special name which can be put into the LOAD_PATH that causes loading to talk to the notebook instead?
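For context on the second option: LOAD_PATH already gives special meaning to entries that begin with `@`, so a notebook hook could plausibly be one more such name. The sketch below only classifies the existing kinds of entries; `describe_entry` is a hypothetical helper written for illustration and is not part of Base.

```julia
# Hypothetical helper: classify a LOAD_PATH entry following the conventions
# documented for Julia's LOAD_PATH. Not part of Base.
function describe_entry(entry::AbstractString)
    if entry == "@"
        return :active_project      # the environment selected by Pkg.activate
    elseif entry == "@."
        return :current_project     # project found in the working directory or a parent
    elseif entry == "@stdlib"
        return :standard_library
    elseif startswith(entry, "@")
        return :named_environment   # e.g. "@v#.#" or another shared environment
    else
        return :path                # a directory or project file on disk
    end
end

# Show how the entries of the running session are classified.
for entry in LOAD_PATH
    println(rpad(entry, 12), " => ", describe_entry(entry))
end
```

A notebook-backed environment would need either a new branch of this kind or a separate override variable, which is the design question raised above.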

The contents API seems like it may be a good way to stash the manifest information, but we don't really need something that emulates a file system—using a JSON store would actually be easier.

@jlperla

jlperla commented Jul 13, 2018

A long-run solution is great to automatically embed the manifest/etc. But is it possible to have a short-term patch requiring a manual call to load something in the notebook itself? That is, something along the lines of

Pkg.setmanifest("Manifest.toml") #i.e., local to the notebook
using MyLib #i.e., the kernel is using the `Manifest.toml` now

Or maybe this is already possible with some of the Pkg3 commands in Jupyter?

@StefanKarpinski
Member

🤷‍♂️ maybe?

@KristofferC
Member

KristofferC commented Jul 13, 2018

Can't you just activate the dir of the notebook? Then that notebook will use a separate environment that will be stored next to the notebook.

@jlperla

jlperla commented Jul 13, 2018

To make sure I understand this, you think I may just be able to put a Manifest.toml in the notebook directory, then I should just need to run:

Pkg.activate(".")
using MyLib

If that is correct, I can try to have someone test it when IJulia is sufficiently stable with 0.7

@KristofferC
Member

You need a Project file as well. But yes if you do

Pkg.activate(".")

and then go wild with adding packages, those will be recorded in Project.toml and Manifest.toml alongside the notebook, and if you send these files to someone else, they can do

Pkg.activate(".")
Pkg.instantiate()

to install all the packages at the version you used them.

@tkf
Member

tkf commented Aug 18, 2018

If opening a notebook can instantiate arbitrary Manifest.toml (which may contain arbitrary repo-url), isn't it a security hole? Isn't it also incompatible with the security model of Jupyter notebook (= trust if you execute it)?

How about adding a simple function that uploads Project.toml and Manifest.toml to gist and then call IJulia.load_string to inject something like

Pkg.activate("https://gist.github.com/.../...")

to a notebook cell? Of course, Pkg.activate then has to support downloading *.toml when a URL is specified. Pkg.activate can also check if those packages are from the known registries and prompt user if not.

Alternatively, I guess you can use cell attachments to bundle *.toml into the notebook file but it would require the kernel and the server to be in the same machine. For example, it won't work if you launch a Julia kernel on a HPC cluster via a Jupyter notebook server running on your laptop.
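The registry check mentioned above could start from something like the following sketch, which lists the manifest entries that carry a `repo-url` key (packages tracked directly from a repository). It assumes Julia 1.6 or later for the `TOML` standard library; `unregistered_sources` is a hypothetical helper, and the manifest text, package name `MyLib`, and URL are made up for the demo.

```julia
using TOML

# Hypothetical helper: return `name => url` pairs for every manifest entry
# that is tracked directly from a repository URL.
function unregistered_sources(manifest_text::AbstractString)
    manifest = TOML.parse(manifest_text)
    # Newer manifests nest packages under "deps"; older ones list them at top level.
    deps = get(manifest, "deps", manifest)
    found = Pair{String,String}[]
    for (name, entries) in deps
        entries isa AbstractVector || continue
        for entry in entries
            entry isa AbstractDict || continue
            if haskey(entry, "repo-url")
                push!(found, name => entry["repo-url"])
            end
        end
    end
    return sort!(found)
end

# Illustrative manifest; the second entry is invented.
example_manifest = """
manifest_format = "2.0"

[[deps.Example]]
uuid = "7876af07-990d-54b4-ab0e-23690620f79a"
version = "0.5.3"

[[deps.MyLib]]
repo-rev = "master"
repo-url = "https://example.com/MyLib.jl.git"
uuid = "00000000-0000-0000-0000-000000000001"
"""

for (name, url) in unregistered_sources(example_manifest)
    println(name, " is tracked directly from ", url)
end
```

A front end or `Pkg.activate` wrapper could show this list in a prompt before instantiating anything.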

@jlperla

jlperla commented Aug 18, 2018

I don't know jupyter all that well, but isn't the security controlled by how it is contained? You can load local files, run shell stuff, etc if it lets you?

Certainly being able to instantiate a local manifest is not the long-run solution, and will not work for all scenarios, but I don't think it is a security hole.

@vchuravy
Member

vchuravy commented Aug 18, 2018

I now have this in my notebooks

using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()
pkg"precompile"

Pkg.activate(".") doesn't work well since you can start your jupyter notebook from any working directory.

@KristofferC
Member

Storing the Manifest + Project inside the notebook and having a button that instantiates them would go a long way. There shouldn't be any security problems with that; it is just a convenience layer?

@tkf
Member

tkf commented Aug 19, 2018

Their security model is:

  • Untrusted HTML is always sanitized
  • Untrusted Javascript is never executed
  • HTML and Javascript in Markdown cells are never trusted
  • Outputs generated by the user are trusted
  • Any other HTML or Javascript (in Markdown cells, output generated by others) is never trusted
  • The central question of trust is “Did the current user do this?”

--- https://jupyter-notebook.readthedocs.io/en/stable/security.html#our-security-model

So I don't think you can register any UI elements like a button to instantiate a project from the notebooks. Though I guess that's possible via a front-end extension.

I just thought using IJulia.load_string is a very simple and generic solution since it does not require writing any front-end extension. It is also useful outside Jupyter/IJulia.

@Keno
Member Author

Keno commented Aug 19, 2018

So I don't think you can register any UI elements like a button to instantiate a project from the notebooks. Though I guess that's possible via a front-end extension.

This is a pretty fundamental feature, so integrating it nicely into the frontend for every julia notebook seems like the right way to do it.

@tkf
Member

tkf commented Aug 19, 2018

If you are willing to write a front-end extension I think that's great! I have no intention of stopping it.

@simonbyrne
Contributor

The notebook does include a certain amount of notebook-wide metadata, detailing the language and kernel. e.g.

 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.0.0",
   "language": "julia",
   "name": "julia-1.0"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.0.0"
  }

It may be possible to insert and read the manifest information from there.

As far as a security model goes, one solution could be a confirmation dialog before installing any new package versions via activate.

@simonbyrne
Contributor

Well, I asked on the Jupyter gitter: it seems like this is not possible via the current protocol, so if we wanted something along those lines we would need to do it via a jupyter extension.

@simonbyrne
Contributor

simonbyrne commented Sep 25, 2018

What if we added a function to IJulia which did something along the lines of what @vchuravy suggested, e.g.

using Pkg
function useproject(path=pwd())
    Pkg.activate(path)
    Pkg.instantiate()
    pkg"precompile"
end

Then, at the top of the notebook you could just do

IJulia.useproject()

@tkf
Member

tkf commented Sep 25, 2018

It does not work when IJulia kernel and Jupyter server run in different machines.
https://bitbucket.org/tdaff/remote_ikernel/src/default/
https://github.com/ipython/ipython/wiki/Cookbook:-Connecting-to-a-remote-kernel-via-ssh

@StefanKarpinski
Member

At JupyterCon I spoke with a few Jupyter folks and their take was that trying to put this kind of metadata into notebooks was not the right direction to go—they've tried this with images and other things in the past and have come to feel that the "unit of distribution" should be a git repo, not a single notebook file. So it seems like the way to go here might be to have IJulia automatically activate the project in the git repo that it's in. After all, you are running the code in the notebook, so presumably you trust it. (As compared to just starting a Julia process in a directory, which may or may not mean that you trust the content of the directory enough to execute it.)
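As a rough sketch of what "activate the project in the repo it's in" might involve, the helper below walks upward from a starting directory until it finds a `Project.toml`. `find_project` is a hypothetical name, and starting from `pwd()` is an assumption about where the kernel is running.

```julia
# Hypothetical helper: walk upward from `start` until a Project.toml is found.
# Returns the path to the project file, or `nothing` if none exists.
function find_project(start::AbstractString)
    dir = abspath(start)
    while true
        candidate = joinpath(dir, "Project.toml")
        isfile(candidate) && return candidate
        parent = dirname(dir)
        parent == dir && return nothing   # reached the filesystem root
        dir = parent
    end
end

project = find_project(pwd())
if project === nothing
    println("no project found above ", pwd())
else
    # A kernel could call Pkg.activate(dirname(project)) at this point.
    println("would activate ", dirname(project))
end
```

This sketch only looks for `Project.toml`; Julia also accepts `JuliaProject.toml` as a project file name, which a fuller version would check as well.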

@stevengj
Member

IJulia doesn't know what notebook file (if any) it is executing — that information is not provided to the kernel.

@Keno
Member Author

Keno commented Sep 25, 2018

At JupyterCon I spoke with a few Jupyter folks and their take was that trying to put this kind of metadata into notebooks was not the right direction to go—they've tried this with images and other things in the past and have come to feel that the "unit of distribution" should be a git repo, not a single notebook file. So it seems like the way to go here might be to have IJulia automatically activate the project in the git repo that it's in. After all, you are running the code in the notebook, so presumably you trust it. (As compared to just starting a Julia process in a directory, which may or may not mean that you trust the content of the directory enough to execute it.)

If we go this way, I'd still like a way to package everything into a single file that you can email to somebody or share on JuliaBox (and also have separate environments for every notebook on JuliaBox). If I just want to share some code with somebody, I don't think we can expect the workflow to be "Go clone this git repo".

@jlperla

jlperla commented Sep 25, 2018

I don't think we can expect the workflow to be "Go clone this git repo".

I agree. Jupyter notebooks need to be usable as self-contained units in some sense. Even the Jupyter interface is often built around the "Upload" notebook workflow.

What about the ability to activate from a URL? You could give it the project file and/or manifest, and it would enable copying jupyter around. And if someone wanted to run the notebook in whatever global project they had in their current jupyter, they wouldn't need to use those cells?

@StefanKarpinski
Member

I'm just reporting what the Jupyter people (@Carreau if I recall correctly) told me, which is that they are moving away from trying to make notebooks self-contained because it has not worked out as hoped. The simplest solution would seem to be serving a zip or tar file containing a set of notebooks, resources used by the notebooks and, in our case, project and manifest files.

@Carreau
Contributor

Carreau commented Sep 25, 2018

Yes, we tend to think of 1 unit == 1 repository. The notebook as a unit does not make much sense, especially since you can now connect many notebooks to the same kernel.

We haven't really figured out how to make all of this work completely, but generally trying to shove more into a notebook does not work.

As said before, a repository does not always work, but I don't think we can get a "one size fits all". There is always this tension between being able to manipulate things on the filesystem, and having everything be opaque and managed by Jupyter.

You could of course have an extension for Jupyter that shows "bundles" as an actual tree of files, but then you can't cd into it.

Maybe something along the lines of a FUSE driver that exposes a single file at one path and the repo structure at another?

@Carreau
Contributor

Carreau commented Sep 25, 2018

@fperez would be interested in this discussion BTW, and I think we had pictures of a whiteboard with all the different axes of what people want from notebook files.

@simonbyrne
Contributor

My experience is that embedding data in notebooks is a lost cause. e.g. the attachments feature is basically useless, since:

@jlperla

jlperla commented Sep 26, 2018

Is there a reason not to enable URL-based Project and Manifest files? In a GitHub-based implementation, you could point it to the raw file, or a local URL. And notebooks copied around would then work.

Does that break the Jupyter security model?
Since the user would actively choose to run the script and trust the notebook, it doesn't seem like it?

@arnavs

arnavs commented Oct 12, 2018

FYI, we've tagged a release of QuantEcon/InstantiateFromURL.jl, which implements the idea from @vchuravy above.

@jlperla

jlperla commented Oct 13, 2018

To be clear, this first implementation is for a light repo with package and manifest. Which provides a solution for tightly controlled lecture notes /etc. The gist approach, which would be better for less formal setups, could be added as well if anyone is interested

@tkoolen

tkoolen commented Oct 22, 2018

I think #673 (comment) is missing a Pkg.build() step in order for things to be guaranteed to work starting from a clean slate. Would be nice not to have to do that every time you run the notebook though.

@KristofferC
Member

Instantiate builds the packages that got downloaded, so I don't think that is required.

@tkoolen

tkoolen commented Oct 22, 2018

I seem to recall cases when that didn't happen, but maybe that was just because the build had failed during an earlier instantiate call.

@arnavs

arnavs commented Oct 22, 2018

@tkoolen FYI, the way we avoid rebuilding every time is to either (a) precompile the resources, for git refs that point to moving targets like master, or (b) version the resources using git tags, so something like activate_github("arnavs/InstantiationTest", tag = "v0.1.0") will never be updated.

tkoolen added a commit to JuliaRobotics/RigidBodyDynamics.jl that referenced this issue Oct 22, 2018
* Reenable all notebook tests now that required packages all support
Julia 1.0.
* Put each notebook in a separate directory, with its own Project.toml
and Manifest.toml, to make running the notebooks more straightforward
(see JuliaLang/IJulia.jl#673).
* Separate Project.toml and Manifest.toml files for optional
visualization parts of the notebooks (not tested by CI, since this would
introduce a cyclic test dependency).
* Fix #501, Symbolic Double Pendulum not working (work around
JuliaPy/SymPy.jl#245 and
JuliaPy/SymPy.jl#244).
tkoolen added a commit to JuliaRobotics/RigidBodyDynamics.jl that referenced this issue Oct 22, 2018
Notebook fixes.

* Reenable all notebook tests now that required packages all support
Julia 1.0.
* Put each notebook in a separate directory, with its own Project.toml
and Manifest.toml, to make running the notebooks more straightforward
(see JuliaLang/IJulia.jl#673).
* Separate Project.toml and Manifest.toml files for optional
visualization parts of the notebooks (not tested by CI, since this would
introduce a cyclic test dependency).
* Fix #501, Symbolic Double Pendulum not working (work around
JuliaPy/SymPy.jl#245 and
JuliaPy/SymPy.jl#244).

Update doc links, readme in notebooks directory.

Just rely on notebook-specific manifests for test dependencies other than ForwardDiff and NBInclude.

Add RigidBodySim pointer.

Better way to handle URDF links with the name 'world'

Makes it so that root_frame(mechanism) is no longer named "".

Fix copyto! performance for SegmentedVector.

Fixes performance momentum_matrix! in ForwardDiff notebook.
tkoolen added a commit to JuliaRobotics/RigidBodyDynamics.jl that referenced this issue Oct 23, 2018
Notebook fixes.

* Reenable all notebook tests now that required packages all support
Julia 1.0.
* Put each notebook in a separate directory, with its own Project.toml
and Manifest.toml, to make running the notebooks more straightforward
(see JuliaLang/IJulia.jl#673).
* Separate Project.toml and Manifest.toml files for optional
visualization parts of the notebooks (not tested by CI, since this would
introduce a cyclic test dependency).
* Fix #501, Symbolic Double Pendulum not working (work around
JuliaPy/SymPy.jl#245 and
JuliaPy/SymPy.jl#244).

Update doc links, readme in notebooks directory.

Just rely on notebook-specific manifests for test dependencies other than ForwardDiff and NBInclude.

Add RigidBodySim pointer.

Better way to handle URDF links with the name 'world'

Makes it so that root_frame(mechanism) is no longer named "".

Fix copyto! performance for SegmentedVector.

Fixes performance momentum_matrix! in ForwardDiff notebook.
@c42f
Member

c42f commented Feb 22, 2019

Perhaps there's a very simple solution to this problem: treat the desired embedded environment metadata as code in the first executable cell. The question then becomes how to make it unobtrusive in the standard jupyter UI. It appears the UI doesn't do line wrapping, so there might also be a simple answer to that as well: base64 encode the toml files into a single line each.

The nice thing about this is that it's a solution for scripts which need to "come with their environment" just as much as jupyter notebooks. Then we'd just need a package ProjectEnvironments (or something) with a very simple and forward/backward compatible API which people could add manually, and which acts as the springboard into the well defined environment for the notebook.

Would this work or have I missed something?
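A minimal sketch of the single-line encoding idea, using only the `Base64` standard library. The names `encode_env` and `decode_env` are hypothetical, and the project text is an illustrative placeholder.

```julia
using Base64

# Pack a TOML file's text into one line that can be pasted into a code cell.
encode_env(toml_text::AbstractString) = base64encode(toml_text)

# Recover the original text from the single-line form.
decode_env(line::AbstractString) = String(base64decode(line))

# Illustrative project file contents.
project_toml = """
[deps]
Example = "7876af07-990d-54b4-ab0e-23690620f79a"
"""

line = encode_env(project_toml)
println(line)                                # one line, however long the file is
println(decode_env(line) == project_toml)    # round trip preserves the text
```

The first cell would then hold one such string per TOML file, plus a call into whatever package writes them back to disk and activates the result.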

@c42f
Member

c42f commented Feb 22, 2019

I tried implementing this; there are a few gotchas but it looks like it will work. Gotchas include:

  • We need to load CodeEnvironments (my working name for the package) from some default environment. Its API would need to be very forward and backward compatible.
  • Mutable environments are somewhat problematic; you want the jupyter user to be able to add easily to the environment, but this conflicts with a desire to make them immutable and content addressed for the purposes of activating them from jupyter code. I think this can be managed in practice with some clear warnings but it's a bit of a nasty wrinkle.

Generally there seems to be some impedance mismatch with Pkg, which is probably not a surprise given that I don't know a lot about Pkg ;-) It does, however, offer a way to have per-notebook embedded manifests and project files.
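To illustrate the tension between mutability and content addressing, here is a sketch that derives an identifier from the text of the two TOML files using the `SHA` standard library. `environment_id` is a hypothetical helper and the file contents are placeholders; this is not how CodeEnvironments is implemented.

```julia
using SHA

# Hypothetical helper: derive a content address for an environment from the
# text of its Project.toml and Manifest.toml.
function environment_id(project_text::AbstractString, manifest_text::AbstractString)
    # Hash each file on its own first, then hash the two digests together,
    # so that moving text from one file to the other changes the result.
    combined = bytes2hex(sha256(project_text)) * bytes2hex(sha256(manifest_text))
    return bytes2hex(sha256(combined))
end

project = "[deps]\nPlots = \"91a5bcdd-55d7-5caf-9e0b-520d859cae80\"\n"
manifest = "# placeholder manifest text\n"

before = environment_id(project, manifest)
after = environment_id(project * "# user added a package\n", manifest)
println(before == after)   # any edit to the environment yields a new identifier
```

Any package operation in the notebook therefore invalidates the identifier recorded in the first cell, which is why the user needs a warning or an automatic rewrite of that cell.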

@Keno
Member Author

Keno commented Feb 26, 2019

Another datapoint: I'd like to be able to send people links to colab notebooks with in-built environments, but the unit they use is a file :)

@c42f
Member

c42f commented Feb 26, 2019

I think my proposed solution/workaround would be ok for that. Would you be interested in it becoming a registered package? I'd need to think a bit more about the workflow and API, and probably involve Pkg people to know whether it's going to work out, or is fundamentally broken in some way. But I'm not sure whether to do that extra work yet.

@jlperla

jlperla commented Feb 26, 2019

@Keno @c42f If you have been using a solution like this, there is another use case to consider: getting notebook users able to update the project and manifest when necessary. This has proven to be very important for our set of lecture notes, otherwise people effectively start copying around notebooks and using copies of them to edit for assignments.

After doing this for the last 8 months, my gut says that metadata in a notebook could become hellish to maintain and lead to all sorts of user issues wondering why they have the wrong versions of packages. I used to be of the opinion that hidden metadata was the right way to go, but have reversed my stand completely. On the other hand, I will never reverse my stand that notebooks have to execute self-contained from a single file and that copying around toml files is a terrible idea.

For what it is worth, the approach we implemented from people's (i.e. @vchuravy's) suggestion in https://github.com/QuantEcon/InstantiateFromURL.jl/ has been very successful. Basically,

  1. Put a manifest and project file on github
  2. At the top of the notebook you call the package to "activate" a particular version of it (e.g. a tagged release, a commit, or master). We have been tagging versions, but you could do any of them including just commits
  3. When that line is run, it looks in a local temporary .project file to see if the version of that package has been downloaded. If not, it downloads, activates, and instantiates. Otherwise it just activates. The instantiation has been a very helpful step for ensuring people are using the right versions of the packages and it makes installation a joke.
  4. If we ever need to update the set of packages, we can just tell people to bump a tag version (or hash tag) at the top of the notebook.
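Steps 2 and 3 above amount to a "download once, then only activate" check. The sketch below shows that control flow with a pluggable fetch function so that it runs without network access. `ensure_environment`, `fake_fetch`, and the cache layout are hypothetical; this is not InstantiateFromURL's actual code.

```julia
# Hypothetical sketch of the cache check. `fetch!` is any function that fills
# `dir` with Project.toml and Manifest.toml for the requested version.
function ensure_environment(fetch!, cache_root::AbstractString,
                            repo::AbstractString, tag::AbstractString)
    dir = joinpath(cache_root, replace(repo, "/" => "_") * "-" * tag)
    if isfile(joinpath(dir, "Project.toml"))
        return dir, false      # already cached: the caller only activates
    end
    mkpath(dir)
    fetch!(dir)
    return dir, true           # freshly fetched: the caller activates and instantiates
end

# Stand-in fetcher for the demo; a real one would download from GitHub.
fake_fetch(dir) = write(joinpath(dir, "Project.toml"), "[deps]\n")

cache = mktempdir()
dir, fresh = ensure_environment(fake_fetch, cache, "QuantEcon/QuantEconLecturePackages", "v0.9.6")
println(fresh)   # first call fetches
dir, fresh = ensure_environment(fake_fetch, cache, "QuantEcon/QuantEconLecturePackages", "v0.9.6")
println(fresh)   # second call finds the cached copy
```

With the returned directory, the caller would run `Pkg.activate(dir)` and, when `fresh` is true, `Pkg.instantiate()`. Bumping the tag in the notebook changes the directory name, so the new version is fetched on the next run.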

Take a look at https://github.com/QuantEcon/quantecon-notebooks-jl/blob/master/kalman.ipynb as an example, but basically all that is needed is the

using InstantiateFromURL
activate_github("QuantEcon/QuantEconLecturePackages", tag = "v0.9.6");

at the top of the page. The project and manifest are versioned in https://github.com/QuantEcon/QuantEconLecturePackages

Now, for those who don't need to have a mini repository, @vchuravy had the initial idea that this sort of package could have a simple utility to set up a gist instead. QuantEcon/InstantiateFromURL.jl#18

We didn't need it ourselves and couldn't put in the development time, but I think it is exactly the sort of thing that is needed for more lightweight package management.

... all of that is to say: before starting on any new solution, please see if the workflow in this package is solid and feel free to submit PRs for new features. If enough people vet this solution, a variation on it might make sense in Pkg.jl or at least a more formally maintained package.

@c42f
Member

c42f commented Feb 27, 2019

@jlperla That sounds like a great workflow for your use case. My reservation is that it's not self contained and requires supporting infrastructure which can't easily be updated by the end users. This is probably a good feature in your case where you're running a class with homogeneous package requirements.

On the other hand, I'm helping a group of somewhat nontechnical PhD students with heterogeneous data management and analysis tasks. My thought is that I should be able to give them jupyter notebooks (and normal scripts!) which have embedded self-contained environments. I'd also like them to be able to update package requirements as their needs change. But at the same time, have them well defined and embedded within the notebooks so that package requirements are somewhat resistant to user error (eg emailing a script and forgetting to add the Project and Manifest files).

@jlperla

jlperla commented Feb 27, 2019

(eg emailing a script and forgetting to add the Project and Manifest files).

Emailing project and manifest files around simply does not work. I am completely with you. And they may put the wrong ones with the wrong files.

My thought is that I should be able to give them jupyter notebooks (and normal scripts!) which have embedded self-contained environments. I'd also like them to be able to update package requirements as their needs change.

I understand the goal of a "self-contained environment", but I would decouple that from a self-contained file. Here are some usage scenarios:

But at the same time, have them well defined and embedded within the notebooks so that package requirements are somewhat resistant to user error

  • What if you want to bump the versions of the notebooks that your students were using (which happens all the time with julia since the packages change frequently and are often broken)? You can't have them just do manual package operations into the metadata because it is easy to get versions out of sync or to make mistakes
  • What if they do package operations in the notebook and mess up versions? How do they reset the notebook?
  • What if they copy the notebook, start making their own changes... then you give them new ones with different and fixed packages but they get out of sync and don't realize they are working with an old set of hidden metadata.

These are the tip of the iceberg.... As I said, I used to think that this stuff belonged in the notebook but changed my tune completely after seeing usage scenarios.

I'd also like them to be able to update package requirements as their needs change.

Having these things centrally managed is extremely helpful. But I understand that having a full repo for the set of project/toml is a little heavy for most uses.

This is exactly why @vchuravy had originally suggested using a gist with some tools (which I will try to summarize below). For us, having a consistent set of versions to bump was very nice but things don't need to have a full and controlled repository.

Basically, I think he had in mind QuantEcon/InstantiateFromURL.jl#18 as a formalization of #673 (comment)

  • There would be a utility for users to easily create the gist on their github account for a given
using InstantiateFromURL
hash = publish_gist(".") # by default, gets the local `Project.toml` and `Manifest.toml` from the given directory
# Could optionally pass in the github username, or use the github config to see it.
# e.g. hash = 2e4ebf0df689f4409d4341d366c89f15 
  • Then, in a notebook you would have the hardcoded hash and put at the top:
using InstantiateFromURL
activate_gist("2e4ebf0df689f4409d4341d366c89f15") # optionally have a tag?
  • If anyone wanted to update their gist, they could just call publish_gist(".", hash) to commit and push changes

.... or something along those lines.

@simonbyrne
Contributor

simonbyrne commented Feb 27, 2019

I've been using notebooks + toml in gists for a while, and while it works, there are some hassles

  1. setting it up is a bit of a pain: you have to create the gist, then clone it back to the directory. Could be addressed by a script (though you would require a GitHub API key), but would be nicer if it could be done via Jupyter itself. Once set up though, pushing updates is easy.

  2. all my gists end up being called "simonbyrne/Manifest.toml" (I assume because this is the file that appears first when sorted by ASCII?). GitHub doesn't seem to provide a mechanism to rename them (you can change the comment that appears below, but not the name).

@arnavs

arnavs commented Feb 27, 2019

Not sure if this helps, but the InstantiateFromURL package grabs repo tarballs (which don’t require an API key), and we store them (names salted with SHA hash) in a hidden directory from where the script is run.

Could be different on the gist side, though.
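As an illustration of hash-salted cache names in a hidden directory, here is a sketch using the `SHA` standard library. The `.projects` directory name, the `repo@ref` hashing input, and the helper `cache_path` are all assumptions made for the example; the package's actual naming scheme may differ.

```julia
using SHA

# Hypothetical naming scheme: a readable prefix plus a short hash of the full
# "repo@ref" string, inside a hidden directory under `base`.
function cache_path(base::AbstractString, repo::AbstractString, ref::AbstractString)
    digest = bytes2hex(sha256(repo * "@" * ref))
    name = basename(repo) * "-" * digest[1:8]
    return joinpath(base, ".projects", name)
end

println(cache_path(pwd(), "arnavs/InstantiationTest", "v0.1.0"))
```

Because the hash depends on both the repository and the ref, two tags of the same repository land in different directories and never overwrite each other.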

@jlperla

jlperla commented Feb 27, 2019

Setting it up is a bit of a pain: you have to create the gist, then clone it back to the directory. Could be addressed by a script (though you would require a GitHub API key), but would be nicer if it could be done via Jupyter itself. Once set up though, pushing updates is easy.

I agree, and those sorts of scripts built into a package seem to be what Valentin was getting at. I think it is a perfect case for a light package (which could ultimately become a feature of Pkg3 itself). I am hesitant to say that we should have it in "jupyter" or IJulia since this is a more general problem than just jupyter notebooks.

If anyone wants to work on gist features, @arnavs and I would be happy to merge them into the InstantiateFromURL.jl as a testbed

@c42f
Member

c42f commented Feb 28, 2019

These are the tip of the iceberg.... As I said, I used to think that this stuff belonged in the notebook but changed my tune completely after seeing usage scenarios.

These are all good points but come with strong assumptions that:

  • You are teaching a large set of students, who all have to do more or less the same thing (ie, class assignments).
  • You as the instructor are available to set up the version information in a centrally hosted location.

Consider instead that you are helping a group of nontechnical colleagues (students and lab staff) with their individual projects, each of which has different package requirements. This situation is a very different use case and I don't see how InstantiateFromURL can help with it.

@jlperla

jlperla commented Feb 28, 2019

This situation is a very different use case and I don't see how InstantiateFromURL can help with it.

Hence the suggestion to have gist based workflows from some people with simple publishing tools. We didn't build it because of lack of time and not knowing requirements since we didn't need it.

My points are primarily about the difficulty of having relatively non-technical people manage project and manifests within the jupyter files themselves and all the things that can go wrong.

The other thing to consider is that the students can use a base set of packages and then install additional ones with package commands at the top of their own notebooks.

But I could be wrong... Maybe there is some sort of technology that could make managing embedded package information within a notebook seamless and manageable. But it is hard to imagine without deep integration of both IJulia and Pkg3 (which there seems to be little appetite for).

@tkf
Member

tkf commented Feb 28, 2019

@c42f FYI IJulia.load_string seems to be a better option than clipboard when you are using it in Jupyter notebooks.

  • Mutable environments are somewhat problematic; you want the jupyter user to be able to add easily to the environment, but this conflicts with a desire to make them immutable and content addressed for the purposes of activating them from jupyter code.

I thought about how to address it. Here is an idea: put the following code with a hypothetical function use_packages in a hypothetical package IJuliaPkg at the top of the notebook:

using IJuliaPkg
use_packages(
    [
        "Plots",
        "DifferentialEquations",
    ],
)

which adds the packages in a plain environment, encodes Project.toml and Manifest.toml in base64 or uploads them to a gist (hereafter I call the Julia object for it $ENCODED_PROJECT), and then replaces the current cell with

using IJuliaPkg
use_packages(
    [
        "Plots",
        "DifferentialEquations",
    ],
    project = $ENCODED_PROJECT,
)

using IJulia.load_string(..., true). It should be easy to make use_packages idempotent; i.e., don't do anything other than instantiate+activate when the set of packages to be installed is identical to the one recorded in Project.toml in $ENCODED_PROJECT. I think this lets you change the requirements of the notebook as you go. That is to say, if you want to import PyPlot, go to the top of the notebook and edit it to

using IJuliaPkg
use_packages(
    [
        "Plots",
        "DifferentialEquations",
        "PyPlot",
    ],
    project = $ENCODED_PROJECT,
)

and then hit shift+enter which updates $ENCODED_PROJECT.
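The idempotency check described above can be sketched as a comparison between the requested package names and the `[deps]` table of the recorded Project.toml. This assumes Julia 1.6 or later for the `TOML` standard library; `needs_update` is a hypothetical helper, and the recorded project text is illustrative.

```julia
using TOML

# Hypothetical check behind an idempotent `use_packages`: does the requested
# package list differ from the [deps] table of the recorded Project.toml?
function needs_update(requested::AbstractVector{<:AbstractString},
                      project_text::AbstractString)
    recorded = keys(get(TOML.parse(project_text), "deps", Dict{String,Any}()))
    return Set(requested) != Set(recorded)
end

# Illustrative decoded contents of $ENCODED_PROJECT.
recorded_project = """
[deps]
DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
"""

# Same set of packages: only activate and instantiate.
println(needs_update(["Plots", "DifferentialEquations"], recorded_project))
# PyPlot was added: resolve again and rewrite the cell.
println(needs_update(["Plots", "DifferentialEquations", "PyPlot"], recorded_project))
```

Comparing sets rather than lists means reordering the names in the cell does not trigger a rewrite.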

@c42f
Member

c42f commented Mar 2, 2019

@tkf thanks, that's an excellent point. I had just assumed overwriting a code cell from the kernel was impossible! With this in mind I think it's possible to have a self contained solution.

@stevengj
Member

Closed by #820.
