-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retrieve the code block content from the odoc-parser locations #397
Retrieve the code block content from the odoc-parser locations #397
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Leonidas-from-XIV for the PR!
I left some minor comments in the code.
Having the comments formatted the way they came is very useful for compatibility with code formatters. (There are still requirements on the format for toplevel blocks and cram blocks, so the code formatter should not do too invasive changes to keep compatibility with mdx).
Note that currently it is not exactly formatted the way it comes from the user: there are cases related to the formatting of last and first lines which changes this input. However, they are strange formatting and I think it is safe to modify them (if you manage to get a true preservation of input that's even better!). See #397 (comment)
It's very nice to see all those match syntax with ...
be removed!
- Fix parsing of multiline toplevel phrases in .mli files (#394, #397, | ||
@Leonidas-from-XIV) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could be merged with the #395 changelog entry? They feel related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They sort of are, but this PR does a lot of "under the table reshuffling" so the description in the changelog is essentially just what made me start this endevour. It sort of also preserves the indentation, like #395, but refactors the way this is done.
Another way to describe this could be:
- Preserve indentation in multiline OCaml blocks in .mli files and fix multiline cram blocks (#395, #397, @panglesd, @Leonidas-from-XIV)
But I feel like one of them is a change and one is pretty much a bugfix, so joining them feels a bit unfortunate.
The problem after working with a PR for too long of a time one looses the user perspective and sees the technical change, but the Changelog is specifically written for users. If you have a suggestion, I'll gladly incorporate it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm convinced: one PR is about multiline OCaml block in mli
, the other about multiline toplevel block in .mli
. And the fact that the second one change the way the first one work is not of interest for the user.
What you could do is just to add your PR in the changelog entry of #395. It is sometimes useful when there is an issue with a new feature to find the feature in the changelog and go directly to the discussions!
lib/cram.ml
Outdated
let rec hpad_of_lines = function | ||
| [] -> 0 | ||
| h :: _ -> | ||
let i = ref 0 in | ||
while !i < String.length h && h.[!i] = ' ' do | ||
incr i | ||
done; | ||
!i | ||
| h :: hs -> ( | ||
match String.equal String.empty h with | ||
| true -> hpad_of_lines hs | ||
| false -> | ||
let i = ref 0 in | ||
while !i < String.length h && h.[!i] = ' ' do | ||
incr i | ||
done; | ||
!i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using hpad_of_lines
from misc.ml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because of the String.trim
of the latter one? It feels a bit duplicated code.
lib/misc.ml
Outdated
let rec hpad_of_lines = function | ||
| [] -> 0 | ||
| h :: _ -> | ||
let i = ref 0 in | ||
while !i < String.length h && h.[!i] = ' ' do | ||
incr i | ||
done; | ||
!i | ||
| h :: hs -> ( | ||
match String.is_empty @@ String.trim h with | ||
| true -> hpad_of_lines hs | ||
| false -> | ||
let i = ref 0 in | ||
while !i < String.length h && h.[!i] = ' ' do | ||
incr i | ||
done; | ||
!i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this function is not used anymore in the codebase.
A variant of it is redefined in cram.ml
. We should either replace it by the variant and remove the variant from cram.ml
, or remove it if it is not useful to anything apart from cram.ml
.
lib/block.ml
Outdated
let pp_lines syntax _t ppf lines = | ||
let print_line ppf first last line = | ||
(match syntax with | ||
| Some Syntax.Cram -> Fmt.string ppf line | ||
| Some Syntax.Mli -> ( | ||
match (first, last) with | ||
| true, _ | _, true -> Fmt.string ppf (String.trim line) | ||
| false, false -> Fmt.string ppf line) | ||
| Some Syntax.Normal | None -> Fmt.string ppf line); | ||
match last with false -> Fmt.pf ppf "\n" | true -> () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why this trimming on first and last lines in mli files. in particular, with this:
{[
"11" ]}
is tranformed like this (there are removed whitespace on the first line):
-{[
- "11" ]}
+{[
+"11"]}
I suggest:
let pp_lines syntax _t ppf lines = | |
let print_line ppf first last line = | |
(match syntax with | |
| Some Syntax.Cram -> Fmt.string ppf line | |
| Some Syntax.Mli -> ( | |
match (first, last) with | |
| true, _ | _, true -> Fmt.string ppf (String.trim line) | |
| false, false -> Fmt.string ppf line) | |
| Some Syntax.Normal | None -> Fmt.string ppf line); | |
match last with false -> Fmt.pf ppf "\n" | true -> () | |
let print_line ppf first last line = | |
Fmt.string ppf line; | |
match last with false -> Fmt.pf ppf "\n" | true -> () |
which however require changes elsewhere as it breaks the tests: something must be adding padding for last line!
@panglesd Can you take a look again at the PR? I've decided to refactor the way tests are handled by not removing the padding which triggered a lot of changes down the line, since most formats were printing their beginning/end-markers except for MLI so in my quest to unify the way it they get printed I had to revisit a significant amount of code since it needed to be made consistent. This took a quite significant amount of time because there was a lot of subtlety how everything is printed, but in the end I thing the result makes sense and should be easier to maintain. I've also refactored some odd side-effectful code and split the Unfortunately, that's quite a bit of changes added on top, but once I started digging I wanted to finish what I started. |
Sure, I'll have a look ASAP! (ASAP might be next week) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the hard refactoring!
It is also not easy to review such changes... So I cannot say I have verified every modification. I have some comments, but it looks very nice!
First, from my testing, everything work well and makes it working well with automatic formatter. About empty line preservation, there are some non-important "inconsistencies", that I report here:
- In toplevel blocks, empty lines are preserved when positioned first, in between blocks, but not last. For instance:
{@ocaml[
# print_endline "foo";;
foo
- : unit = ()
# print_endline "bar";;
bar
- : unit = ()
]}
will be rewritten to:
{@ocaml[
# print_endline "foo";;
foo
- : unit = ()
# print_endline "bar";;
bar
- : unit = ()
]}
- In shell, lines are preserved at first, but not inbetween, or last:
{@sh[
$ echo foo
foo
$ echo bar
bar
]}
will be rewritten to:
{@sh[
$ echo foo
foo
$ echo bar
bar
]}
- Normal fragments work as expected: nothing is changed from the input.
About the code, it looks much better without all those ?syntax
everywhere. I have some comments:
- I think it would be better not to have a
Language_tag
variant in thelabel
type. It does not enforce the fact that there can be only one language tag (while it is enforced by the syntax). - There are more
Log.debug
than before. I don't know if it's intended, or a debug trace that should be removed... - Mli syntax is much less tested than markdown's one. For instance, labels parsing and reconstructing should be tested (and made less constrained). Although I don't know how much this is related to this PR (which was I think started to improve mli syntax parsing and reconstruction).
The reason for this is due to us wanting to have the spaces available in MDX, so we know all the whitespace, so we can format it the way it came in from the user. This also addresses the parsing issue for multiline toplevel blocks. Closes #394
This way the line offsets are only calculated once for every docstring and the code doesn't repeatedly scan through the string for every new line it tries to find.
Turns out the combinators can be called directly, and get `ppf` as an argument. The codebase already does this in some places, so let's be consistent.
This adds back the `vpad` functionality which adds newlines in front of cram blocks. It also removes the special-casing of not printing the indentation in some cases, according to the cram spec[0] the output has to start with two spaces (the padding) and then the output (which is empty in that case). [0]: https://bitheap.org/cram/
The cursor mutating made it a bit hard to follow, also the fold allows to avoid a `List.concat` in the end.
Co-authored-by: panglesd <peada@free.fr>
Now cram tests only have a start padding at the beginning and the vpad is only part of the whole block, not individual tests.
This was probably due to the markdown parser leaving out the initial newline initiating a cram block as well as the final newline finishing the block if it is not the final part of the file. But it works for now, so that refactoring is better left for a future PR.
I'll merge this now since I think it is a decent point and further improvements can be added in future (hopefully shorter) PRs. Thanks for the review @panglesd! |
CHANGES: #### Added - Report all parsing errors in Markdown files (realworldocaml/mdx#389, @NathanReb) #### Changed - Preserve indentation in multiline OCaml blocks in .mli files (realworldocaml/mdx#395, @panglesd) #### Fixed - Fixed compatibility with Cmdliner 1.1.0 (realworldocaml/mdx#371, @Leonidas-from-XIV) - Report errors and exit codes of toplevel directives (realworldocaml/mdx#382, @talex5, @Leonidas-from-XIV) - Fix block locations in error reporting (realworldocaml/mdx#389, @NathanReb) - Include the content of the line that features the `part-end` MDX directive in the output, before that line would've been dropped (realworldocaml/mdx#374, realworldocaml/mdx#387, @Leonidas-from-XIV) - Handle EINTR signal on waitpid call by restarting the syscall. (realworldocaml/mdx#409, @tmcgilchrist) - Fix parsing of multiline toplevel phrases in .mli files (realworldocaml/mdx#394, realworldocaml/mdx#397, @Leonidas-from-XIV) #### Removed - Removed warning about missing semicolons added in MDX 1.11.0 and the automatic insertion of semicolons in the corrected files introduced in MDX 2.0.0. (realworldocaml/mdx#398, @Leonidas-from-XIV)
The reason for this is due to us wanting to have the spaces available in MDX, so we know all the whitespace, so we can format it the way it came in from the user.
This also addresses the parsing issue for multiline toplevel blocks.
Closes #394