-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Extract code blocks by ID #303
base: master
Are you sure you want to change the base?
Conversation
There's some facility like this that got built for the Real World OCaml book, isn't there? |
So is the idea that Maybe you could outline how you see this working to give a clearer picture of how this part will be used? |
I don't know what @rizo had in mind but following the link he provided leads to this |
Sure, and I would expect that a tool to run the examples in them would need to perform such pre-processing. Certainly similar tools like toplevel_expect_test do. Since we already need to know the various compilation options in order to run the examples that does not seem unreasonable. Also I would expect such a tool to work like other "expect test" style tools, so to produce an updated version of the ml file, so that we can use dune's
There are a variety of suggestions on the thread @rizo linked so I wasn't sure which suggestion was being referred to. It does look like he has the comment you point to in mind. That comment though does not constitute a full proposal. It is not clear to me how you would use those commands to build a useful workflow. |
I should say though, that I think any proposal is going to require the syntax to allow code blocks to be labelled in some way -- so that part seems useful regardless of how we build the surrounding tooling. |
I'm not sure what you are refering to. Basically with the proposal you can see To give an example say you have this
Then
Your build system can then pickup these file names to invoke:
At which point you may instruct your build system to do something with the generated |
Sure, but how do I map the results of using |
One can add an option to
I don't know exactly what this is but wouldn't the above suffice ? |
Perhaps I should clarify what I mean by an "expect test" workflow. I think that one of the most useful aspects of being able to run examples from documentation is to be able to write things like: {[
# let foo x y = y;;
val foo : 'a -> 'b -> 'b = <fun>
# foo 1 3;;
val - : int = 3
]} and have a tool which checks that the results of running that code in the toplevel match what I've written there. In an "expect test" style workflow this is achdieved by having the tool print a new version of the |
As examples this is how the mdx and toplevel_expect_tests tools work. |
Well I don't expect |
Me neither. I don't think |
Maybe but OTOH |
I can see how this proposal would be better than what I'm suggesting for literate programming, is that what you have in mind? In which case, would only having the support for doing it for My other main concern is the semantics of the "id"s. Most systems that have quoting of code fragments like this (markdown, org-mode, etc.) have support for specifying the language of the code fragment along with other data about how the code should be extracted. Specifying the language seems particularly useful for odoc to allow us to correctly do code highlighting in the resulting documentation. I'm worried that we won't easily be able to do that if we merge this patch as is because we'll already have specified a different semantics for the text that appears between |
That's one of the worfklow along with
Agreed I wouldn't be against more structure here. FWIW in CommonMark, the |
The expect test style approach wouldn't give you anything extra, but it would give you what you want in these cases. So if we assume that such support is on the way then I think you could support the |
Not sure I understand the deep reason for objecting doing this on |
I just want to discourage people from trying to use it to implement the execution of code in comments in mli files since I don't think it is the best way to do it. I'd prefer it if odoc only had commands for things that have a clear use case and where the command is the best approach for addressing that use case. Whereas the mld version has a clear use case in literate programming and, as you said, odoc is the main tool that understands mld files so it is the logical place to put the feature. |
I have pushed an early attempt to document this feature (see draft version here). It mostly focuses on user experience and integration with dune. I understand this is might be significantly out of odoc's scope but, as I mentioned previously, it is not my intention to exclusively focus on low-level odoc-specific interfaces. It is not clear (to me) how all the discussed details (extraction, execution, line numbers, test promotion, etc) connect together into something users can use. And that is my focus. Let me know if it is preferred to move this discussion to dune. CC @rgrinberg: you might have an opinion about this.
I think both options could be supported. odoc's parser library is already exposed and it can be used to do the extraction.
I covered this in the proposal. Essentially all code blocks without an associated id can be extracted as a common "anonymous" group.
I do not at this stage have a final plan for the fully integrated solution. One of the reasons I submitted this PR as WIP was to start this discussion. And, as you mentioned, annotating code blocks is needed in any case. I started experimenting with mdx and it seems like it could be used to build a tool (I'm calling it
This is something I mention in my proposal. Essentially using filenames gives us both: the ability to group code blocks and the language information. It does require odoc to interpret code block annotations as file names, but I think it is a reasonable limitation.
By that do you mean "other data about how the code should be extracted"? For OCaml, at least the execution options could be passed as
Mdx actually relies on this. I think we could adopt this model. How would multiple code blocks be grouped for extraction into the same file though? Maybe something like the following could work? The extraction command would match on the {ocaml file=hello.ml[ ... ]}
I'm not sure I understand what this means. Does this only apply to the CLI? Wouldn't the extraction from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much @rizo for the design doc. These things help a lot to clarify things and of course provide the basis for the documentation of the feature.
I have left a few comments but basically it quite matches what I think I would like to have. The only thing I'm a little bit sceptical about is the extraction of anonymous blocks, it will likely often be made of garbage so I wouldn't bother.
Other than that the syntactic bit needs to be clarified, I made a proposal that I think blends well with the current ocamldoc
language conventions (think of {:uri}
).
|
||
The file names used to annotated code blocks are also used by odoc to decide | ||
what language should be used for syntax highlighting in the generated HTML. The | ||
language is decided based on the file name’s extension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As already said that may not be entirely sufficient (I guess clashes do exist). See below for more on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clashes aside, I think a more structured annotation is needed indeed.
notes/testable_examples.md
Outdated
|
||
**Warning:** code blocks without a file name will not have syntax highlighting. | ||
Once this feature is implemented, the currently used automatic language | ||
inference should be disabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good idea. Historically code blocks meant OCaml. If you didn't want syntax highlighting you would use a verbatim block ({v v}
). For small anonymous snippets it's tedious to always have to specify the language. It's better to simply default to OCaml, that's what the ocamldoc language was designed for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally dislike implicit promotions, but let's respect the history in this case. I updated the proposal to match your suggestion.
Input cmti, cmt, cmi, mli or mld file. | ||
|
||
|
||
Odoc 11VERSION11 odoc-extract-code(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing is a way to extract the list of names and their language. Following the current "odoc
way" this could well be odoc extract-code-targets
which would list each on their line the set of names and their tokens (see below) existing in the object file.
filename token*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in my plans but I forgot to include it. Thanks.
notes/testable_examples.md
Outdated
|
||
-o PATH, --output=PATH | ||
Output file path. If omitted, the provided NAME will be used. | ||
Required for extraction of anonymous code blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For convenience, one could add an --all
option which would extracts everything in one go and -o
would then denote the destination directory. That way you can distribute a tutorial as an .mld
file and simply instruct someone to odoc extract-code --all -o . tutorial.mld
to be setup.
For example this gist and associated blog post could well be distributed as a single .mld
file (see for example the "# Generate the html page
" in the toplevel comment...).
| **Anonymous code block** | **Named code block** | | ||
| ------------------------ | ----------------------------------- | | ||
| `"{[" <content> "]}"` | `"{" <filename> "[" <content> "]}"` | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing this proposal doesn't mention but must be clarified is what happens to named code block which are within stop comments (**/**) ... (**/**)
.
I think these should be included in the file aswell (they allow to hide setup minutiae you may not want to have in the rendered document).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But… do we get these in .mld
files ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected the proposal, let me know if this is what you had in mind.
But… do we get these in
.mld
files ?
Maybe the extended code block annotation could have an option for that...
|
||
- Should odoc require code block annotations to be filenames with extension? | ||
The extension could be used to identify the language and correctly do code | ||
highlighting. On the other hand the code blocks could be annotated only with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one way would be to simply have these two forms
{[:filename token* [ ... ]}
{[ token* [ ... ]}
without more constraints than that.
With filename
a syntax that allows a relative file path (that way an .mld file can specify a source file hierarchy) and token
arbitrary identifiers that allow the :
character.
That way authors and processors are free to collaborate on the meaning they want to give to tokens. Of course odoc
should standardize a few behaviour like:
- On
{[:f.$EXT [...]}
, odoc HTML gen will try to highlight the block using the most likely language associated to the extension$EXT
(if present). - On
{[:f.$EXT lang:$L [...]}
, odoc HTML gen will try to highlight using language$L
. - On
{[ lang:$L [...]}
, odoc HTML gen will try to highlight using language$L
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a reasonable proposal. I also think this should be the focus for the actual work in the current PR. I'll elaborate on this a bit later.
|
||
--anonymous | ||
Extract code blocks without name. Cannot be used with the `--name' | ||
option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not. Note however that my guess is that unless the author pays particular attention to her anonymous code block the result will likely be garbage most of the time (you may have code showing errors, or even get different languages...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Echoing @jsomers desire for certain fileless senarios (though formally you could always simply specify a filename to use by convention). This could be --anonymous=TOKEN
, and would extract anonymous code blocks that have the token TOKEN
(e.g. ocamlx
).
It would also prevent the likely garbage problem I mentioned, if completely unconstrained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it anonymous if it has a name (ocamlx
in your example)?
But I do understand your concerns. I will try to address this in the new proposal for code block annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still regarding anonymous blocks:
The only thing I'm a little bit sceptical about is the extraction of
anonymous blocks, it will likely often be made of garbage so I wouldn't
bother.
It was one of the @lpw25's concerns, I believe. I imagine for large code bases
it would be useful to operate on existing unmarked code blocks in some way. So
if people do want to execute them they might get an opportunity to fix the
"garbage".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it anonymous if it has a name (
ocamlx
in your example)?
In the sense it has no filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically in the two forms here. The first one is a named code block and the second one is an anonymous one.
I quite like the idea of being able to extract by token, this leaves a lot of flexibity to authors to attribute meaning to what they want to extract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like the idea of token annotations. Without wanting to commit to a concrete syntax, something like {ocaml id=BLOCK_ID[...]}
could work, where BLOCK_ID
could be anything including a file name. This of course would be used during extraction. I'd make the --output
option required though, to clarify that the id
is not necessarily a file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it's preferable to have to direct support for filenames.
Thanks @rizo for putting together this design. It jibes with what I've been thinking about, though I think there is one key difference.
I don't think so. For an mli file, the workflow I imagined was:
# 1 + 1
- : int = 2 then you go through the diff-and-promote workflow. (That is, if the output is incorrect, you generate a .corrected file and allow the user to easily diff against and accept this correction.) There shouldn't be any need to add line number directives -- the tool should figure out where in the mli file this code lives, and should be able to patch in the correction appropriately. (Toplevel_expect_test and mdx both do this.) Otherwise, if you're not dealing with toplevel-style code, you simply throw a compiler error if the code block doesn't compile. Note that this workflow is almost exactly the same as processing a markdown file with mdx, except that instead of executing the code in triple-backtick blocks, you're executing the code in For mld files, the flow would be basically identical to mdx, except that instead of the base markup language being Markdown, it would be the markup language used by odoc. By doing things this way, without filenames, you do give up the ability to write code in one block that references input from files extracted from other code blocks -- but I don't think that's nearly as important as being able to simply run small self-contained examples that use the library you're documenting. As the user of the tool, I don't want to have to think about the code in my doc comments being extracted out into a separate file where it's then executed; I just want the code executed. If the tool needs to extract my code to separate files under the hood, that's fine, but I don't think that detail should be exposed to the user. The user (at least this user) just wants to say "execute this block" on some blocks. |
@jsomers I don't think what is described here prevents from implementing a system that does exactly what you want. What is described here is rather the plumbing and a more general mecanism that allows you to implement the system you would like as a special case.
Extracting with line directives is precisely what will allow the underlying tool to figure out where in the mli/mld file the code you are processing lived and, for example if you compile the extraction, to actually report any compilation error in the original file and location rather than in the extracted file. |
Ah, I see -- my worry is that we would require the end user to add line-number directives by adding a special notation. But it sounds like you're saying we wouldn't. |
No, that would be absolutely terrible :-) Basically the lineno stuff would work as as follows. Suppose you have your Bla bla
{ocamlx[
let f x = x
]}
Ho Ho
{ocamlx[
let () = print_endliiine "bla.ml"
]} Invoking #3 "file.mld"
let f x = x
#7 "file.mld"
let () = print_endliiine "bla.ml" Now you can try for yourself to compile that with > ocamlc ocamlx.ml
File "file.mld", line 7, characters 9-24:
Error: Unbound value print_endliiine
Hint: Did you mean print_endline? |
notes/testable_examples.md
Outdated
|
||
Library authors are encouraged to include examples and short snippets of code | ||
in documentation to demonstrate how to effectively use their library. Such code | ||
snippets are included in docstrings as code blocks and therefor cannot be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/therefor/therefore
notes/testable_examples.md
Outdated
snippets are included in docstrings as code blocks and therefor cannot be | ||
executed and tested in the same way regular source files are. This leads to | ||
code duplication for library authors who want to make sure their examples can | ||
be correctly executed, and to out of date examples when they forget to update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"out-of-date" for the adjective.
notes/testable_examples.md
Outdated
|
||
To address this problem odoc implements the ability to extract code blocks from | ||
documented interfaces and documentation pages (`mli` and `mld` files | ||
respectively) into source code files. With this build systems can implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/With this/With this,
notes/testable_examples.md
Outdated
|
||
## Named code blocks | ||
|
||
In the new version of odoc code blocks can be annotated with a file name. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/odoc/odoc,
notes/testable_examples.md
Outdated
Both named and anonymous code blocks can be extracted by odoc via the | ||
command-line interface. Code blocks with the same file name in a given | ||
documentation file will be concatenated and written into a file with that name. | ||
Optionally a different output file name for a given group can be provided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Optionally/Optionally,
number directives](https://caml.inria.fr/pub/docs/manual-ocaml/lex.html#sec86) | ||
in the OCaml manual). | ||
|
||
**Note**: unrelated code blocks do not need to have a unique file name, it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean they will all go into one file example.ml
? If so, this should read something like "unrelated code blocks can be put into the same file, and so don't need unique file names." However, why would it be good to put unrelated examples into one file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, why would it be good to put unrelated examples into one file?
If an interface has hundreds of functions each one of them with examples, will users want to provide a unique name to each code block to get code execution and syntax highlighting?
It is clear that filenames are not a satisfactory way to annotate code blocks, so this needs to be changed in the proposal anyway.
Thanks for the design doc @rizo that makes your plan much clearer. I think the aim is a good one but in general I think that the proposed The need to talk about external files is an unnecessary burden for users. In the common case they simply want to execute all the comments in a single environment. This behaviour should be the default. If they wish to keep some comments separate there should be an optional argument like the {ocaml env=foo[
...
]} If they wish not to run a particular comment there should be an optional argument for that to: {ocaml executed=false[
...
]} The dune support should simply be something like The The
This is I think a better approach than
{ocaml file=hello.ml[ ... ]} I think that would be better for implementing |
I'm not exactly sure how something generic like this gets in the way of
|
Thanks that makes things clearer. So you have some literate programming within the |
I don't think this concerns this actual proposal but I would rather not do this and rely on a special token for that (e.g. You still need a bit of ordering care when you start executing code blocks. When I don't want to care about this I prefer not having to care... by which I mean that having to specify an For all it's awesomeness and shininess I still think that executing code blocks in API docs remains quite a limited tool as soon as you escape the realm of documenting basic data structures. |
I added this to the requirements in the proposal. To be clear the default behavior in my current proposal is to execute all code blocks that have a file name with the I do agree that annotating code blocks with the language and allowing arguments like Non of this has to be implemented for the initial version, of course. We can start with a simple mdx-style tool that executes I'll try to clarify this in my updated proposal. |
I agree. As long as the code blocks are properly annotated with language and |
I personally don't like the But yeah, executing everything unconditionally is also quite extreme. I'll have this in mind and will make a proposal for code block annotations that allow this kind of flexibility. |
I suspect that the best thing to do here is to let the default be specified. I would expect both |
Then it's not really a default, in that case ;-). But I agree that this could be configured on a level higher than a code block. The question is more about what would be the default in case users don't provide this option. |
I suspect this is really not a good idea. It's always better to keep documents self-describing to the readers. If you do this then I must hunt in the build system what default you choose and then remember it in my head when I read a code block. I would really go for not executable by default and maybe simply |
There are going to be plenty of people who want to test all their code in comments with "odocx" and they're going to be pretty annoyed with having to write |
That's the reason why I proposed |
(even |
There are conflicts now, so should be rebased. |
@@ -88,6 +88,14 @@ | |||
ignore foo | |||
]} | |||
|
|||
Code blocks can have an identifier attached to them: | |||
|
|||
{ocaml[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misleading example? Using ocaml
as the identifier makes it looks like an annotation specifying the language, rather than an identifier to enable references to this specific block from elsewhere.
{ocaml[ | |
{print-two-plus-two[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current definition of annotations needs to be changed in my opinion as it has some limitations. Unfortunately I haven't been available to work on this in the last few months.
To clarify the misunderstanding here: the code block annotation is not used to reference a specific block from elsewhere (it is not a unique identifier). It annotates a class of blocks to be extracted. It could be a language or a filename, or any other user-defined "class".
Not sure if this is helpful, but let me know if you have any other questions or suggestions.
44dd495
to
cc1f437
Compare
Could we move on this ? |
This is a work in progress attempt to implement testable code examples for odoc. Or to be more precise, the extraction of annotated code blocks from
mli
andmld
files. See #130 for a proposed design and discussion. I might work on tools complementary to odoc to facilitate the execution of the examples and the integration with dune.