Hello from a similar project #420
Hello, thanks for the message! I had already read about MyST-NB a few days ago in some other issue, and I've promptly added a link: #418. I'm also aware of jupinx and Jupyter Book, but I'm probably not up-to-date with the latest developments. I don't quite understand what jupyter-cache is all about; is this something that could be relevant to nbsphinx? I've also thought about syntax extensions, e.g. "note"/"warning" boxes in jupyter/notebook#1292. I think a syntax extension would only really make sense if it will potentially be implemented in JupyterLab (or the Classic Notebook). Are you planning to propose your syntax extensions for JupyterLab?
That's great, I'm looking forward to PRs!
Well, as I mentioned, I'd be a bit hesitant with syntax extensions. Another example is the upcoming "gallery" feature (#392). This obviously doesn't generate a gallery in JupyterLab, but at least it will show valid links to the notebooks. I have the feeling that with syntax extensions much of this is lost, and people are less motivated to actually open their notebooks in JupyterLab.
Heya,
Well, MyST is now an official format in jupytext: https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown. So in that respect it's easy to open the text-based document in JupyterLab to do work on the code.
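For readers unfamiliar with the format, a text-based MyST notebook looks roughly like this (a sketch; the exact header fields depend on the jupytext version):

````markdown
---
kernelspec:
  name: python3
  display_name: Python 3
---

# A text-based notebook

Some narrative Markdown, stored as plain text and diffable in git.

```{code-cell} python
1 + 1
```
````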
I kind of disagree here: in terms of writing the documentation, in particular the "richer" documentation required for writing proper scientific books (citations, references, figure captions, ...), you would probably be less likely to actually want to do this in JupyterLab; for example, there is now a prototype VS Code extension with deeper language support. The idea is that, with jupytext, you can just switch between the two: JupyterLab and text editors. Note, this kind of extension can/will probably also be created for JupyterLab at some point.
See the very newly merged https://myst-nb.readthedocs.io/en/latest/use/execute.html
re: syntax extension, I think it's just a question of the goals of the project. As you say, nbsphinx is trying to play it conservatively when it comes to introducing new syntax etc. I think that makes sense. MyST-NB is intentionally trying to expand what is possible with notebooks, and so that project will get a bit more experimental. It's good to have both kinds of projects (it's also why MyST-NB is a separate repo, since I assume nbsphinx doesn't want to suddenly support a brand new flavor of markdown). I think in the medium-long term, we should extend the syntax that Jupyter Notebooks support, because as @chrisjsewell mentions there are just a ton of things people want to do with notebooks that aren't supported by CommonMark. The plan had always been to do this whenever CommonMark extended itself to more complex syntax, but... that hasn't happened yet, and so I think we will need projects like MyST-NB to blaze a trail and see what works, and then when the time comes it will be easier to decide if and how to extend the "core Jupyter markdown syntax" with these use-cases in mind.
👍
Now I'm confused: are you talking about using MyST as an alternative storage format (instead of the JSON-based `.ipynb` format)? Or are you talking about using MyST as an alternative format for Markdown cells within JSON-based notebooks? Or both?
But in this case the Markdown cells would still use extensions that are not supported by JupyterLab, right? It might be easy to open the notebooks, but the Markdown cells will contain unrecognized markup code, right?
That's totally fine, and it's important to acknowledge that we disagree here. We should be aware of those different use cases.
I guess that's a possibility. Would this happen through a "Jupyter Enhancement Proposal" (JEP)?
Exactly. However the Markdown parser could theoretically be factored out in order to play around with some experimental parser that supports new syntax extensions.
I agree.
Indeed!
How would that happen?
re: your points about JEPs - yep, I'd guess that this is where any broad markdown flavor extension would happen. It'll be easier to get community buy-in if there is prior art, and even better if there are users that have tested out various options first. I think that's part of what we're trying to accomplish with myst-nb. And as @chrisjsewell mentions, we can also build in functionality in jupyter via extensions before it needs to be an "official" part of the core markdown flavor. (also worth noting that the major extra syntax pieces in myst-markdown are designed to degrade gracefully in a markdown renderer that doesn't understand MyST. Things like roles and directives will mostly just become "literal" blocks, and will still display)
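To illustrate the "degrade gracefully" point: a MyST directive is just a fenced block, so a renderer that doesn't know MyST still shows its content (a sketch assuming the `{note}` directive; exact styling varies by renderer):

````markdown
```{note}
This renders as a styled admonition under MyST, but a plain CommonMark
renderer shows it as a literal code block, so the text is still visible.
```
````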
Both:
Note that JupyterLab may (hopefully) eventually move over to using markdown-it as the core renderer (jupyterlab/jupyterlab#272). At that point, given that MyST-Parser also now uses markdown-it, sharing syntax extensions between the two would become much easier.
Also note that these files are now integrated with execution and caching, so you have the option of never actually converting them to a notebook; when building the Sphinx documentation, code-cell outputs are pulled straight from the cache, and the files will only be re-executed when any of the code (not Markdown) content changes.
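The "re-execute only when code changes" behaviour can be sketched with a content hash over code cells only. This is a simplified illustration of the idea, not jupyter-cache's actual implementation (the helper name and the dict-based notebook structure are made up for the example):

```python
import hashlib
import json

def code_cache_key(notebook):
    """Hash only the code cells of an nbformat-like notebook dict.

    Markdown edits leave the key unchanged, so only code changes
    would trigger re-execution.
    """
    code = [cell["source"] for cell in notebook["cells"]
            if cell["cell_type"] == "code"]
    return hashlib.sha256(json.dumps(code).encode("utf-8")).hexdigest()

nb = {"cells": [
    {"cell_type": "markdown", "source": "# Some prose"},
    {"cell_type": "code", "source": "print(1 + 1)"},
]}
key = code_cache_key(nb)

# Editing the markdown cell does not change the key:
nb["cells"][0]["source"] = "# Revised prose"
assert code_cache_key(nb) == key

# Editing the code cell does:
nb["cells"][1]["source"] = "print(2 + 2)"
assert code_cache_key(nb) != key
```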
Thanks for clarifying the different use cases; the situation is clearer to me now, but still not completely clear: is this conversion part of MyST-Parser or of MyST-NB? I guess the latter, because I could only find it in https://github.com/ExecutableBookProject/MyST-NB/blob/master/myst_nb/converter.py, and not in https://github.com/ExecutableBookProject/MyST-Parser. So as I understand it now, there are actually two different things:
Is this correct? If yes, that means there are these possible use cases:
It looks like there are two orthogonal things mixed together in MyST-NB. That's of course totally fine if you like to organize the project this way, but it would make it less confusing for outsiders (like me) if you could make this clearer in the documentation.
OK, that sounds reasonable. But wouldn't it be even better for users to test out the options in JupyterLab?
What mention are you referring to? Are you talking about JupyterLab extensions?
I disagree that this is "graceful"! It might be useful for some ad-hoc experiments, but I think it's a very bad idea to abuse code blocks for arbitrary (non-code!) uses. But anyway, "degrading gracefully" shouldn't be such an important aspect when coming up with syntax extensions.
I'm aware of this issue, but I decided not to hold my breath. In fact, I already suggested (a few months earlier, nearly 4 years ago) switching the Classic Notebook to CommonMark.
That sounds great, regardless of whether the source file is an actual Jupyter notebook or some other storage format. Are there plans to make this caching mechanism available for general use? More specifically, I think it would be interesting to use this in `nbsphinx`. But for that, the most important question for me would be: does the cache only consider the notebook itself, or also its (transitive) dependencies? If the latter, this would be really interesting, and it would probably also take care of #87.
Absolutely, this is all just good discourse, even if I do come off a bit argumentative 😁
I think it's important to note that, from my perspective, our project is the Executable Book Project, not the "Documentable Notebook Project". JupyterLab and nbsphinx are great for creating some quick documentation from notebooks, but for anything more than simple documentation you need richer markup. The way I would use MyST-NB would be to exclusively write in the text-based format. In this respect, personally, I really don't care how JupyterLab displays the Markdown cells.
I disagree; from the CommonMark spec:
i.e. there is nothing to say that a fenced block must contain code per se, and that is exactly what Sphinx directives are: "interpreted literal text".
I'd be very interested if you could point me towards this. In terms of CommonMark though, I am going to go out on a limb and say there will never be any changes to the core syntax now. The only thing the CommonMark forum is full of is 7-year-old discussions about adding different syntax that never actually happens lol.
Well, it is already its own separate package: jupyter-cache.
Maybe you could clarify what (transitive) dependencies you had in mind? But it does tentatively have the hooks in place (if not yet implemented) to handle assets (external files required by the notebook to execute) and artefacts (external files output by the notebook execution).
(and to your broader points about the use-cases) I think you've generally got it right 👍 and we appreciate the feedback on documentation. We've been more in coding mode than documenting mode lately, but I think it's time to take another pass through to improve explanations etc. because it has been a while. Technically, right now MyST-NB doesn't define the specification for a "Jupyter notebook written in markdown with MyST markdown", it just knows how to use Jupytext to convert MyST notebooks into regular Jupyter notebooks.
Yes for sure - there are only so many hours in the day though :-) we need the core build system to be there first, and then we can start building out an ecosystem of plugins etc around this project.
I'm not sure about JupyterLab, but there were certainly extensions in the Jupyter Notebook that did this, in particular for things like adding references and citations. I believe Matthias also once had a plugin that would do variable injection into the markdown at rendering time.
This may be of relevance: https://github.com/jupyterlab/jupyter-renderers
OK, that's good to know.
I guess it depends on what you mean by "proper scientific articles". Do you have an example (probably a mock-up?) of an Executable Book that already reaches your goal of "proper scientific article"? What's the most polished example that's currently existing?
OK, so you force people to use two different tools for Markdown and for code. I guess the plan is to further enhance VS Code so that at some point JupyterLab isn't needed anymore?
I guess we have a different understanding what "literal text" is. For me, "literal text" means that it contains literal characters that are not interpreted as markup. One of the most important features of "literal text" is that "newline" characters are displayed as actual new lines. There are typically no automatic line breaks. All in all, I think this is a bad (i.e. not very graceful) fallback.
No, it's not. There is only one Sphinx directive which fits this description: https://docutils.sourceforge.io/docs/ref/rst/directives.html#parsed-literal IMHO "interpreted non-literal text" would be more appropriate as a fallback.
Sure, there are a lot of discussions.
I have the feeling that the colon (`:`) would be a more appropriate fence character for content that is supposed to be parsed further. I don't yet have an opinion on what exactly is the most appropriate syntax, but I do know that backtick fences are not appropriate and are about the worst possible option.
I think so, too; I guess there will only be minor changes. But that doesn't mean that there will never be additional syntax, e.g. for generic blocks.
Yeah, it's funny. But I can't complain, because I haven't yet done anything to solve the situation.
OK, that's cool. But I still haven't quite understood: is this supposed to be used together with `nbsphinx`, too?
Well all of them! Imagine you have a notebook containing code which imports a local Python package which in turn imports, say, Matplotlib. My question was whether all used source files will be considered for caching. For example: I execute a notebook, then update my local Matplotlib installation, then execute the notebook again. Will the second run use the cache or will it re-run the notebook (assuming some relevant files in the Matplotlib source code have changed)?
OK, that sounds good.
I don't see any Markdown renderers there?
Literally the only thing you need for these blocks is to tell the parser: do not automatically parse the content as Markdown; store the content verbatim, and it will be interpreted at a later time. That is exactly what backticks do, perfectly fine. The only directives where non-literal/container blocks are appropriate are admonitions like "note". Also, the most widely used text-based notebook format, RMarkdown, uses backticks. So are you saying that it is a bad format?
I'm following the discussion quietly and just wanted to chime in to comment about this:
Sphinx already has a mechanism to determine this (based on the source files' modification times). My 2¢
I think you are confusing what Sphinx does; it's only checking if the source file itself has changed, not any of the files it depends on.
FYI this is already what MyST-NB does, but using `jupyter-cache`.
That's what I tried to say, but maybe I didn't express myself properly. I'd follow the philosophy that Sphinx follows, not exactly the same mechanism. That is, if Sphinx doesn't consider the Python environment itself (all package versions) as part of its caching mechanism, I wouldn't consider it for the notebooks either.
I agree with you on this. I think this is hard to implement in a reliable manner, and the benefit is probably not that big. I think you will end up calling Sphinx with `-E` anyway.
I read this in its docs and I think it's smart to follow this distinction 😄
Yeah, absolutely; it's just about getting the balance right between automating the re-build logic and having manual control to force the rebuild of aspects of the build.
I guess there are two kinds of directives: ones which want to keep the literal text (e.g. code blocks) and ones whose contents are supposed to be further parsed. I think the fenced code block syntax (three backticks) is fine for the former, but not for the latter.
I don't understand what would become incompatible by what happening, can you please elaborate?
Exactly, for those kinds of directives that syntax is indeed perfectly fine. The other kind is the problem. For example, from https://myst-parser.readthedocs.io/en/latest/using/syntax.html#directives-a-block-level-extension-point:
Oh, I should probably have read this before answering above ...
What makes one kind more general than the other? I think you will need support for both kinds.
I'm not familiar with RMarkdown, so I don't know. If they use backtick fences for non-literal text, that part would certainly be bad, but I don't know whether they do.
Yes, this mostly works well. But when including executed Jupyter notebooks, whoever executes them (e.g. `nbsphinx`) doesn't know which files the notebook actually depends on. This is what issue #87 is about. And that's why I was asking about (transitive) dependencies, because such a feature would be really great. The Sphinx re-build mechanism has a big flaw, though: whenever an uncaught exception happens during the Sphinx build, this seems to invalidate the whole "environment" and the next build will re-build everything. That's why an additional caching mechanism for notebooks may actually make sense.
Well, the local Python environment also consists of files which have a modification time. The "notebook executor" would just need to watch all file accesses and store a list of dependent files. See also #87 for a related discussion.
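As a toy illustration of that idea (not jupyter-cache's or Sphinx's actual mechanism, and the function name is made up): one could fingerprint every file-backed module in the current process via its modification time, and treat any change as invalidating the cache:

```python
import os
import sys

def environment_fingerprint():
    """Collect (path, mtime) for every file-backed module currently imported.

    A sketch of mtime-based dependency tracking: if any entry differs
    between two runs, the cached notebook outputs would be considered stale.
    """
    deps = {}
    for mod in list(sys.modules.values()):
        path = getattr(mod, "__file__", None)
        if path and os.path.isfile(path):
            deps[path] = os.stat(path).st_mtime
    return deps

fingerprint = environment_fingerprint()
# Re-running after e.g. upgrading Matplotlib would change the mtimes of its
# files, so the fingerprints would differ and the notebook would re-execute.
```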
I agree. This doesn't seem like a feasible option.
I think that's a great feature. And I think this cannot be handled by the Sphinx mechanism, right? I have the feeling that both mechanisms will be needed to get the "full" experience.
There will always be some remaining cases where you'll have to force a rebuild manually.
The interplay between the Markdown parser and the parsing of the content of a docutils directive. With backticks you are telling the Markdown parser to do nothing to the text within, just store it as is; then that text can be passed directly to the docutils directive class for further processing. With colons, the Markdown parser would parse the content itself first, which conflicts with handing the directive the verbatim text it expects.
I think the word "just" here is a bit generous lol. Firstly, notebooks are not constrained to Python; the approach would need to be generic to any programming language. Secondly, this seems to imply that you would need to scan all the many 1000s of files related to the environment. I imagine that would have a significant impact on performance.
Exactly.
I don't understand. You parse the contents of, say, an admonition directive as Markdown instead of reStructuredText, don't you? So you'll have to have special logic for that already.
Yeah, it would be best if it could be generic. If it needs support from the kernel, this might become quite a bit more complicated.
Well, when the Python interpreter runs the code, it reads all those files anyway; does that really take so long? The caching mechanism wouldn't even have to read the files, only some of their metadata. And 1000s of files doesn't sound like an impossibility. I wouldn't want to check them manually, but that's what we have computers for, isn't it?
The one thing I was thinking of is https://github.com/GaretJax/sphinx-autobuild. I hadn't looked into this previously, but I just had a quick look and it seems to use https://github.com/gorakhargosh/watchdog for watching files. Of course we wouldn't need to "watch" the files in this sense, we "just" would need to get the list of accessed files. But that information must be in there somewhere...
Not exactly, no; the content gets passed directly (as unparsed text) to the docutils Admonition directive, the same as with any other directive. It's just that the directives get initiated with a Markdown-specific state machine.
Ah, OK, that sounds magical! So you can customize the way Sphinx parses a directive without changing the directive itself? Just out of curiosity, could you please point me to the MyST-Parser code where this happens? This will explain how the implementation works, but my criticism still stands: it doesn't make logical sense to pass literal text to an "admonition" block. An admonition (as I understand it) contains formatted text, not literal text.
https://github.com/executablebooks/MyST-Parser/blob/a084975f02c0b4a9141f75878f78b48afa9f9b5a/myst_parser/docutils_renderer.py#L670. This code (as with docutils) does not discriminate between "admonition"-type directives and any other type of directive; they are all just passed a literal block of text. To change this you would have to add special cases to docutils or MyST-Parser.
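A toy model of the mechanism being described, with made-up function names (docutils' real classes are far more involved): every directive receives its body verbatim, and only the directive itself decides whether to hand that text back to the (Markdown-aware) parser:

```python
def parse_inline(text):
    """Stand-in for a Markdown-aware state machine: parse *emphasis*."""
    return text.replace("*", "<em>", 1).replace("*", "</em>", 1)

def code_directive(content):
    """Literal-style directive: keep the verbatim content as-is."""
    return f"<pre>{content}</pre>"

def note_directive(content):
    """Admonition-style directive: hand the verbatim content back to the parser."""
    return f"<div class='note'>{parse_inline(content)}</div>"

# Both directives receive identical raw text; the result differs only in
# what each one chooses to do with it.
raw = "some *important* text"
print(code_directive(raw))  # <pre>some *important* text</pre>
print(note_directive(raw))  # <div class='note'>some <em>important</em> text</div>
```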
Cool, thanks! I've seen the name "state machine" a few times in the Sphinx source code, but I've never really looked into it. It seems to be a very powerful extension mechanism. I guess with this you get correct line numbers in case of parsing errors?
Yes, it would be good to do that in order to achieve consistent syntax.
A "state machine" is just the theoretical framework on which the parser is built; well, technically: https://en.wikipedia.org/wiki/Pushdown_automaton
When using text-based notebooks (i.e. https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown) as the input source, yes, the line numbers relate directly to the correct lines. When using notebooks, there wasn't a "simple" way to incorporate the cell number as a separate thing, so my solution for now is to use
That's a nice work-around! In the long run though, I think it would be great to support line numbers for all formats. But I guess for this to work for "normal" `.ipynb` notebooks, some more support from the underlying tools would be needed.
Hi folks 👋🏽 I know this is an old & long conversation but I have been hesitating a long time to ask the question: what are nowadays the differences between myst-nb and nbsphinx in terms of functionality? In my head, they are largely equivalent, but I might be missing something. I looked at https://nbsphinx.readthedocs.io/en/0.8.0/links.html and https://myst-nb.readthedocs.io/en/latest/examples/custom-formats.html?highlight=nbsphinx and it's still not clear to me. In poliastro, a personal project, I am using nbsphinx + jupytext to include MyST notebooks in Sphinx documentation, and it's working really well.
Well, one of the key differences is that nbsphinx uses Pandoc to first convert Markdown text to RST text, then runs that through the RST parser (to convert to docutils AST), whereas myst-nb uses myst-parser to directly convert the (MyST) Markdown text to docutils AST. This means that the Markdown you use for nbsphinx is mainly Pandoc-flavoured Markdown (https://pandoc.org/MANUAL.html), plus the syntax extensions detailed in the documentation, whereas for myst-nb you use MyST-flavoured Markdown (https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html), including roles, directives, etc. (Note also that any of the configurations/extensions for myst-parser are also applied to notebook files, since myst-nb just builds on top of myst-parser.) There are also differences in the execution engines; nbsphinx executes each notebook during the parsing phase whereas, depending on execution mode, myst-nb executes all notebooks up front and caches them with jupyter-cache. (Obviously @mgeier can correct me if I'm wrong in any of this.)
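For comparison, enabling either pipeline in a Sphinx `conf.py` looks roughly like this; the option names below are from memory and have changed across releases, so check each project's documentation rather than trusting them:

```python
# conf.py sketch -- hypothetical, verify option names against current docs.

# Option A: nbsphinx (Pandoc-based Markdown conversion, execute while parsing)
extensions = ["nbsphinx"]
nbsphinx_execute = "auto"  # only run notebooks that have no stored outputs

# Option B: myst-nb (myst-parser based, execution cached via jupyter-cache)
# extensions = ["myst_nb"]
# jupyter_execute_notebooks = "cache"  # execute up front and cache outputs
```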
@astrojuanlu I'm not sure if you need to use the custom-formats feature at all. The MyST project is a bit confusing because there are two different things called "MyST", see my comment above: #420 (comment). AFAICT, you are using MyST only as a serialization format for "normal" Jupyter notebooks, i.e. notebooks with "normal" Markdown in their Markdown cells. Since Jupytext can handle those, you can use them with nbsphinx. There would be a different possible use case though: you could use MyST syntax within the Markdown cells of your notebooks. This is currently not supported by nbsphinx.
Coming back to @astrojuanlu's question:
Another difference is the visual appearance of code cells and their outputs, but this can be tuned with custom CSS, if desired. Other than that, there are for sure many minor differences, but I'm not aware of any bigger differences that haven't yet been mentioned in this issue.
Thanks @chrisjsewell and @mgeier for your inputs! Yes, we are using both MySTs :) The MyST format for notebooks, and MyST for our narrative documentation. As you saw, we leverage the nbsphinx gallery feature, and we wrote it in MyST too. I asked this not only because I was mildly confused myself, but because I'm in the process of writing some documentation about the whole "Jupyter in Sphinx" story and wanted to convey a coherent message. Your replies and experiments have been very useful.
For reference, that documentation I was writing is this: readthedocs/readthedocs.org#8283 |
Hey there - I wanted to reach out and mention a project that we have recently started, and that has a lot of overlapping functionality with nbsphinx. It is called "MyST-NB" and it is also a notebook parser for the Sphinx ecosystem.
This repository is part of a recent project to extend a few publishing-based tools in the Python ecosystem (jupinx and jupyter book) for the Sphinx ecosystem.
We created a new project to parse notebooks, instead of just upstreaming things to nbsphinx, because we are heavily depending on a brand new markdown parser in Sphinx (myst-parser) and also need to build a fair bit of complex functionality for executing and caching notebook outputs (using a project called jupyter-cache). Especially since these pieces were part of a broader publishing toolchain, it seemed too complex to try and fit in with the pre-existing ecosystem.
I don't have a specific goal in opening this issue, other than to just alert the `nbsphinx` devs of the existence of this tool and to say hello. Over time we are trying to upstream as much as we can (e.g. we have a few PRs open in `jupyter-sphinx`)... I'm not sure what exactly that means in the context of `nbsphinx`, and `myst-nb` is still only about a month old, but I wanted to reach out. Obviously we'd also love to hear what folks think about the MyST parser and MyST-NB... we'll also certainly mention `nbsphinx` in our docs as another core tool for "notebooks in Sphinx" 👍