Retrieve the code block content from the odoc-parser locations #397

Leonidas-from-XIV · 2022-09-27T16:21:51Z

The reason for this is due to us wanting to have the spaces available in MDX, so we know all the whitespace, so we can format it the way it came in from the user.

This also addresses the parsing issue for multiline toplevel blocks.

Closes #394

panglesd

Thanks @Leonidas-from-XIV for the PR!

I left some minor comments in the code.

Having the comments formatted the way they came is very useful for compatibility with code formatters. (There are still requirements on the format for toplevel blocks and cram blocks, so the code formatter should not do too invasive changes to keep compatibility with mdx).

Note that currently it is not exactly formatted the way it comes from the user: there are cases related to the formatting of last and first lines which changes this input. However, they are strange formatting and I think it is safe to modify them (if you manage to get a true preservation of input that's even better!). See #397 (comment)

It's very nice to see all those match syntax with ... be removed!

panglesd · 2022-10-05T06:37:27Z

CHANGES.md

+- Fix parsing of multiline toplevel phrases in .mli files (#394, #397,
+  @Leonidas-from-XIV)


Maybe this could be merged with the #395 changelog entry? They feel related.

They sort of are, but this PR does a lot of "under the table reshuffling" so the description in the changelog is essentially just what made me start this endevour. It sort of also preserves the indentation, like #395, but refactors the way this is done.

Another way to describe this could be:

- Preserve indentation in multiline OCaml blocks in .mli files and fix multiline cram blocks (#395, #397, @panglesd, @Leonidas-from-XIV)

But I feel like one of them is a change and one is pretty much a bugfix, so joining them feels a bit unfortunate.

The problem after working with a PR for too long of a time one looses the user perspective and sees the technical change, but the Changelog is specifically written for users. If you have a suggestion, I'll gladly incorporate it!

I'm convinced: one PR is about multiline OCaml block in mli, the other about multiline toplevel block in .mli. And the fact that the second one change the way the first one work is not of interest for the user.

What you could do is just to add your PR in the changelog entry of #395. It is sometimes useful when there is an issue with a new feature to find the feature in the changelog and go directly to the discussions!

lib/cram.ml

test/bin/mdx-test/expect/multilines-mli/test-case.mli

panglesd · 2022-10-05T07:35:29Z

lib/cram.ml

+let rec hpad_of_lines = function
  | [] -> 0
-  | h :: _ ->
-      let i = ref 0 in
-      while !i < String.length h && h.[!i] = ' ' do
-        incr i
-      done;
-      !i
+  | h :: hs -> (
+      match String.equal String.empty h with
+      | true -> hpad_of_lines hs
+      | false ->
+          let i = ref 0 in
+          while !i < String.length h && h.[!i] = ' ' do
+            incr i
+          done;
+          !i)


Why not using hpad_of_lines from misc.ml?

Is it because of the String.trim of the latter one? It feels a bit duplicated code.

panglesd · 2022-10-05T07:43:59Z

lib/misc.ml

+let rec hpad_of_lines = function
  | [] -> 0
-  | h :: _ ->
-      let i = ref 0 in
-      while !i < String.length h && h.[!i] = ' ' do
-        incr i
-      done;
-      !i
+  | h :: hs -> (
+      match String.is_empty @@ String.trim h with
+      | true -> hpad_of_lines hs
+      | false ->
+          let i = ref 0 in
+          while !i < String.length h && h.[!i] = ' ' do
+            incr i
+          done;
+          !i)


It seems this function is not used anymore in the codebase.
A variant of it is redefined in cram.ml. We should either replace it by the variant and remove the variant from cram.ml, or remove it if it is not useful to anything apart from cram.ml.

lib/lexer_mdx.mll

panglesd · 2022-10-05T08:04:34Z

lib/block.ml

+let pp_lines syntax _t ppf lines =
+  let print_line ppf first last line =
+    (match syntax with
+    | Some Syntax.Cram -> Fmt.string ppf line
+    | Some Syntax.Mli -> (
+        match (first, last) with
+        | true, _ | _, true -> Fmt.string ppf (String.trim line)
+        | false, false -> Fmt.string ppf line)
+    | Some Syntax.Normal | None -> Fmt.string ppf line);
+    match last with false -> Fmt.pf ppf "\n" | true -> ()


I am not sure why this trimming on first and last lines in mli files. in particular, with this:

{[ "11" ]}

is tranformed like this (there are removed whitespace on the first line):

-{[ - "11" ]} +{[ +"11"]}

I suggest:

Suggested change

let pp_lines syntax _t ppf lines =

let print_line ppf first last line =

(match syntax with

| Some Syntax.Cram -> Fmt.string ppf line

| Some Syntax.Mli -> (

match (first, last) with

| true, _ | _, true -> Fmt.string ppf (String.trim line)

| false, false -> Fmt.string ppf line)

| Some Syntax.Normal | None -> Fmt.string ppf line);

match last with false -> Fmt.pf ppf "\n" | true -> ()

let print_line ppf first last line =

Fmt.string ppf line;

match last with false -> Fmt.pf ppf "\n" | true -> ()

which however require changes elsewhere as it breaks the tests: something must be adding padding for last line!

lib/mli_parser.ml

Leonidas-from-XIV · 2022-12-05T16:50:43Z

@panglesd Can you take a look again at the PR? I've decided to refactor the way tests are handled by not removing the padding which triggered a lot of changes down the line, since most formats were printing their beginning/end-markers except for MLI so in my quest to unify the way it they get printed I had to revisit a significant amount of code since it needed to be made consistent.

This took a quite significant amount of time because there was a lot of subtlety how everything is printed, but in the end I thing the result makes sense and should be easier to maintain.

I've also refactored some odd side-effectful code and split the Misc module up a bit to be less of a grab-bag of functions and have a little bit more focus.

Unfortunately, that's quite a bit of changes added on top, but once I started digging I wanted to finish what I started.

panglesd · 2022-12-06T08:26:57Z

Sure, I'll have a look ASAP! (ASAP might be next week)

panglesd

Thanks a lot for the hard refactoring!

It is also not easy to review such changes... So I cannot say I have verified every modification. I have some comments, but it looks very nice!

First, from my testing, everything work well and makes it working well with automatic formatter. About empty line preservation, there are some non-important "inconsistencies", that I report here:

In toplevel blocks, empty lines are preserved when positioned first, in between blocks, but not last. For instance:

  {@ocaml[

    # print_endline "foo";;
    foo
    - : unit = ()

    # print_endline "bar";;
    bar
    - : unit = ()

  ]}

will be rewritten to:

  {@ocaml[

    # print_endline "foo";;
    foo
    - : unit = ()

    # print_endline "bar";;
    bar
    - : unit = ()
  ]}

In shell, lines are preserved at first, but not inbetween, or last:

  {@sh[

    $ echo foo
    foo

    $ echo bar
    bar

  ]}

will be rewritten to:

  {@sh[

    $ echo foo
    foo
    $ echo bar
    bar
  ]}

Normal fragments work as expected: nothing is changed from the input.

About the code, it looks much better without all those ?syntax everywhere. I have some comments:

I think it would be better not to have a Language_tag variant in the label type. It does not enforce the fact that there can be only one language tag (while it is enforced by the syntax).
There are more Log.debug than before. I don't know if it's intended, or a debug trace that should be removed...
Mli syntax is much less tested than markdown's one. For instance, labels parsing and reconstructing should be tested (and made less constrained). Although I don't know how much this is related to this PR (which was I think started to improve mli syntax parsing and reconstruction).

lib/block.ml

lib/cram.mli

lib/test/mdx_test.ml

lib/cram.mli

lib/test/mdx_test.ml

lib/toplevel.ml

lib/label.ml

The reason for this is due to us wanting to have the spaces available in MDX, so we know all the whitespace, so we can format it the way it came in from the user. This also addresses the parsing issue for multiline toplevel blocks. Closes #394

This way the line offsets are only calculated once for every docstring and the code doesn't repeatedly scan through the string for every new line it tries to find.

Turns out the combinators can be called directly, and get `ppf` as an argument. The codebase already does this in some places, so let's be consistent.

This adds back the `vpad` functionality which adds newlines in front of cram blocks. It also removes the special-casing of not printing the indentation in some cases, according to the cram spec[0] the output has to start with two spaces (the padding) and then the output (which is empty in that case). [0]: https://bitheap.org/cram/

The cursor mutating made it a bit hard to follow, also the fold allows to avoid a `List.concat` in the end.

Co-authored-by: panglesd <peada@free.fr>

Now cram tests only have a start padding at the beginning and the vpad is only part of the whole block, not individual tests.

This was probably due to the markdown parser leaving out the initial newline initiating a cram block as well as the final newline finishing the block if it is not the final part of the file. But it works for now, so that refactoring is better left for a future PR.

Leonidas-from-XIV · 2023-01-05T11:02:59Z

I'll merge this now since I think it is a decent point and further improvements can be added in future (hopefully shorter) PRs. Thanks for the review @panglesd!

@NathanReb

CHANGES: #### Added - Report all parsing errors in Markdown files (realworldocaml/mdx#389, @NathanReb) #### Changed - Preserve indentation in multiline OCaml blocks in .mli files (realworldocaml/mdx#395, @panglesd) #### Fixed - Fixed compatibility with Cmdliner 1.1.0 (realworldocaml/mdx#371, @Leonidas-from-XIV) - Report errors and exit codes of toplevel directives (realworldocaml/mdx#382, @talex5, @Leonidas-from-XIV) - Fix block locations in error reporting (realworldocaml/mdx#389, @NathanReb) - Include the content of the line that features the `part-end` MDX directive in the output, before that line would've been dropped (realworldocaml/mdx#374, realworldocaml/mdx#387, @Leonidas-from-XIV) - Handle EINTR signal on waitpid call by restarting the syscall. (realworldocaml/mdx#409, @tmcgilchrist) - Fix parsing of multiline toplevel phrases in .mli files (realworldocaml/mdx#394, realworldocaml/mdx#397, @Leonidas-from-XIV) #### Removed - Removed warning about missing semicolons added in MDX 1.11.0 and the automatic insertion of semicolons in the corrected files introduced in MDX 2.0.0. (realworldocaml/mdx#398, @Leonidas-from-XIV)

Leonidas-from-XIV requested a review from panglesd September 29, 2022 12:33

panglesd approved these changes Oct 5, 2022

View reviewed changes

Leonidas-from-XIV requested a review from panglesd December 5, 2022 16:50

Leonidas-from-XIV mentioned this pull request Dec 12, 2022

Restart waitpid call if we get interrupted. #409

Merged

panglesd approved these changes Jan 2, 2023

View reviewed changes

Leonidas-from-XIV and others added 23 commits January 4, 2023 17:34

Add entry to changelog

89d2f44

Remove unused vpad code

4de3b30

Remove unused arguments

7f66174

Link lexed whitespace with generated whitespace

a968a63

Cache line start offsets via partial application

b3b9b54

This way the line offsets are only calculated once for every docstring and the code doesn't repeatedly scan through the string for every new line it tries to find.

Reduce Fmt.pf calls when no string interpolation is necessary

475c2f5

Turns out the combinators can be called directly, and get `ppf` as an argument. The codebase already does this in some places, so let's be consistent.

Update the text of the cram test, it now preserves whatever is passed

b927d7c

Fix typo in test

266f31f

Refactor the way blocks are printed

78f89f5

Remove mutable cursor for readability

05f7f50

The cursor mutating made it a bit hard to follow, also the fold allows to avoid a `List.concat` in the end.

Simplify hpad_of_lines again to its original version

ae07846

Add interface to Misc, removing unused code

27b786d

Clarify Misc.init into Misc.load_file

aecb17a

Split Misc and Pp

a0bb231

Move hpad_of_lines implementation to its only usage

f15e6e9

Use stdlib instead of Astring

cd65a7f

Remove left-over debug prints

f67e599

Update documentation

cbbf207

Co-authored-by: panglesd <peada@free.fr>

Remove vpad for cram tests

6e17dd6

Now cram tests only have a start padding at the beginning and the vpad is only part of the whole block, not individual tests.

Document the purpose of Language_tag

1e58742

Leonidas-from-XIV added 2 commits January 4, 2023 17:38

Add explanation for off-by-one fix

c681392

Add expect test for labels syntax in MLI files

9369d91

Leonidas-from-XIV merged commit aa5551a into realworldocaml:main Jan 5, 2023

Leonidas-from-XIV deleted the multiline-mli-fix branch January 5, 2023 11:03

Leonidas-from-XIV mentioned this pull request Jan 6, 2023

[new release] mdx (2.2.0) ocaml/opam-repository#22867

Merged

thpani mentioned this pull request May 23, 2023

Regression with indented special characters in shell blocks #428

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retrieve the code block content from the odoc-parser locations #397

Retrieve the code block content from the odoc-parser locations #397

Leonidas-from-XIV commented Sep 27, 2022

panglesd left a comment •

edited

Loading

panglesd Oct 5, 2022

Leonidas-from-XIV Oct 5, 2022

panglesd Oct 5, 2022

panglesd Oct 5, 2022

panglesd Oct 5, 2022

panglesd Oct 5, 2022

panglesd Oct 5, 2022

Leonidas-from-XIV commented Dec 5, 2022

panglesd commented Dec 6, 2022

panglesd left a comment •

edited

Loading

Leonidas-from-XIV commented Jan 5, 2023

		- Fix parsing of multiline toplevel phrases in .mli files (#394, #397,
		@Leonidas-from-XIV)

Retrieve the code block content from the odoc-parser locations #397

Retrieve the code block content from the odoc-parser locations #397

Conversation

Leonidas-from-XIV commented Sep 27, 2022

panglesd left a comment • edited Loading

Choose a reason for hiding this comment

panglesd Oct 5, 2022

Choose a reason for hiding this comment

Leonidas-from-XIV Oct 5, 2022

Choose a reason for hiding this comment

panglesd Oct 5, 2022

Choose a reason for hiding this comment

panglesd Oct 5, 2022

Choose a reason for hiding this comment

panglesd Oct 5, 2022

Choose a reason for hiding this comment

panglesd Oct 5, 2022

Choose a reason for hiding this comment

panglesd Oct 5, 2022

Choose a reason for hiding this comment

Leonidas-from-XIV commented Dec 5, 2022

panglesd commented Dec 6, 2022

panglesd left a comment • edited Loading

Choose a reason for hiding this comment

Leonidas-from-XIV commented Jan 5, 2023

panglesd left a comment •

edited

Loading

panglesd left a comment •

edited

Loading