Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieve the code block content from the odoc-parser locations #397

Merged
merged 25 commits into from
Jan 5, 2023
Merged

Retrieve the code block content from the odoc-parser locations #397

merged 25 commits into from
Jan 5, 2023

Conversation

Leonidas-from-XIV
Copy link
Collaborator

The reason for this is due to us wanting to have the spaces available in MDX, so we know all the whitespace, so we can format it the way it came in from the user.

This also addresses the parsing issue for multiline toplevel blocks.

Closes #394

Copy link
Collaborator

@panglesd panglesd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Leonidas-from-XIV for the PR!

I left some minor comments in the code.

Having the comments formatted the way they came is very useful for compatibility with code formatters. (There are still requirements on the format for toplevel blocks and cram blocks, so the code formatter should not do too invasive changes to keep compatibility with mdx).

Note that currently it is not exactly formatted the way it comes from the user: there are cases related to the formatting of last and first lines which changes this input. However, they are strange formatting and I think it is safe to modify them (if you manage to get a true preservation of input that's even better!). See #397 (comment)

It's very nice to see all those match syntax with ... be removed!

Comment on lines +22 to +24
- Fix parsing of multiline toplevel phrases in .mli files (#394, #397,
@Leonidas-from-XIV)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be merged with the #395 changelog entry? They feel related.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They sort of are, but this PR does a lot of "under the table reshuffling" so the description in the changelog is essentially just what made me start this endevour. It sort of also preserves the indentation, like #395, but refactors the way this is done.

Another way to describe this could be:

- Preserve indentation in multiline OCaml blocks in .mli files and fix multiline cram blocks (#395, #397, @panglesd, @Leonidas-from-XIV)

But I feel like one of them is a change and one is pretty much a bugfix, so joining them feels a bit unfortunate.

The problem after working with a PR for too long of a time one looses the user perspective and sees the technical change, but the Changelog is specifically written for users. If you have a suggestion, I'll gladly incorporate it!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm convinced: one PR is about multiline OCaml block in mli, the other about multiline toplevel block in .mli. And the fact that the second one change the way the first one work is not of interest for the user.

What you could do is just to add your PR in the changelog entry of #395. It is sometimes useful when there is an issue with a new feature to find the feature in the changelog and go directly to the discussions!

lib/cram.ml Outdated Show resolved Hide resolved
lib/cram.ml Outdated
Comment on lines 55 to 88
let rec hpad_of_lines = function
| [] -> 0
| h :: _ ->
let i = ref 0 in
while !i < String.length h && h.[!i] = ' ' do
incr i
done;
!i
| h :: hs -> (
match String.equal String.empty h with
| true -> hpad_of_lines hs
| false ->
let i = ref 0 in
while !i < String.length h && h.[!i] = ' ' do
incr i
done;
!i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using hpad_of_lines from misc.ml?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because of the String.trim of the latter one? It feels a bit duplicated code.

lib/misc.ml Outdated
Comment on lines 19 to 29
let rec hpad_of_lines = function
| [] -> 0
| h :: _ ->
let i = ref 0 in
while !i < String.length h && h.[!i] = ' ' do
incr i
done;
!i
| h :: hs -> (
match String.is_empty @@ String.trim h with
| true -> hpad_of_lines hs
| false ->
let i = ref 0 in
while !i < String.length h && h.[!i] = ' ' do
incr i
done;
!i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this function is not used anymore in the codebase.
A variant of it is redefined in cram.ml. We should either replace it by the variant and remove the variant from cram.ml, or remove it if it is not useful to anything apart from cram.ml.

lib/lexer_mdx.mll Outdated Show resolved Hide resolved
lib/block.ml Outdated
Comment on lines 150 to 159
let pp_lines syntax _t ppf lines =
let print_line ppf first last line =
(match syntax with
| Some Syntax.Cram -> Fmt.string ppf line
| Some Syntax.Mli -> (
match (first, last) with
| true, _ | _, true -> Fmt.string ppf (String.trim line)
| false, false -> Fmt.string ppf line)
| Some Syntax.Normal | None -> Fmt.string ppf line);
match last with false -> Fmt.pf ppf "\n" | true -> ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why this trimming on first and last lines in mli files. in particular, with this:

{[             
   "11" ]}

is tranformed like this (there are removed whitespace on the first line):

-{[             
-   "11" ]}
+{[
+"11"]}

I suggest:

Suggested change
let pp_lines syntax _t ppf lines =
let print_line ppf first last line =
(match syntax with
| Some Syntax.Cram -> Fmt.string ppf line
| Some Syntax.Mli -> (
match (first, last) with
| true, _ | _, true -> Fmt.string ppf (String.trim line)
| false, false -> Fmt.string ppf line)
| Some Syntax.Normal | None -> Fmt.string ppf line);
match last with false -> Fmt.pf ppf "\n" | true -> ()
let print_line ppf first last line =
Fmt.string ppf line;
match last with false -> Fmt.pf ppf "\n" | true -> ()

which however require changes elsewhere as it breaks the tests: something must be adding padding for last line!

lib/mli_parser.ml Show resolved Hide resolved
@Leonidas-from-XIV
Copy link
Collaborator Author

@panglesd Can you take a look again at the PR? I've decided to refactor the way tests are handled by not removing the padding which triggered a lot of changes down the line, since most formats were printing their beginning/end-markers except for MLI so in my quest to unify the way it they get printed I had to revisit a significant amount of code since it needed to be made consistent.

This took a quite significant amount of time because there was a lot of subtlety how everything is printed, but in the end I thing the result makes sense and should be easier to maintain.

I've also refactored some odd side-effectful code and split the Misc module up a bit to be less of a grab-bag of functions and have a little bit more focus.

Unfortunately, that's quite a bit of changes added on top, but once I started digging I wanted to finish what I started.

@panglesd
Copy link
Collaborator

panglesd commented Dec 6, 2022

Sure, I'll have a look ASAP! (ASAP might be next week)

Copy link
Collaborator

@panglesd panglesd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the hard refactoring!

It is also not easy to review such changes... So I cannot say I have verified every modification. I have some comments, but it looks very nice!

First, from my testing, everything work well and makes it working well with automatic formatter. About empty line preservation, there are some non-important "inconsistencies", that I report here:

  • In toplevel blocks, empty lines are preserved when positioned first, in between blocks, but not last. For instance:
  {@ocaml[

    # print_endline "foo";;
    foo
    - : unit = ()

    # print_endline "bar";;
    bar
    - : unit = ()

  ]}

will be rewritten to:

  {@ocaml[

    # print_endline "foo";;
    foo
    - : unit = ()

    # print_endline "bar";;
    bar
    - : unit = ()
  ]}
  • In shell, lines are preserved at first, but not inbetween, or last:
  {@sh[

    $ echo foo
    foo

    $ echo bar
    bar

  ]}

will be rewritten to:

  {@sh[

    $ echo foo
    foo
    $ echo bar
    bar
  ]}
  • Normal fragments work as expected: nothing is changed from the input.

About the code, it looks much better without all those ?syntax everywhere. I have some comments:

  • I think it would be better not to have a Language_tag variant in the label type. It does not enforce the fact that there can be only one language tag (while it is enforced by the syntax).
  • There are more Log.debug than before. I don't know if it's intended, or a debug trace that should be removed...
  • Mli syntax is much less tested than markdown's one. For instance, labels parsing and reconstructing should be tested (and made less constrained). Although I don't know how much this is related to this PR (which was I think started to improve mli syntax parsing and reconstruction).

lib/block.ml Show resolved Hide resolved
lib/block.ml Outdated Show resolved Hide resolved
lib/block.ml Outdated Show resolved Hide resolved
lib/cram.mli Outdated Show resolved Hide resolved
lib/test/mdx_test.ml Show resolved Hide resolved
lib/cram.mli Outdated Show resolved Hide resolved
lib/test/mdx_test.ml Outdated Show resolved Hide resolved
lib/toplevel.ml Outdated Show resolved Hide resolved
lib/toplevel.ml Show resolved Hide resolved
lib/label.ml Show resolved Hide resolved
Leonidas-from-XIV and others added 23 commits January 4, 2023 17:34
The reason for this is due to us wanting to have the spaces available in
MDX, so we know all the whitespace, so we can format it the way it came
in from the user.

This also addresses the parsing issue for multiline toplevel blocks.

Closes #394
This way the line offsets are only calculated once for every docstring
and the code doesn't repeatedly scan through the string for every new
line it tries to find.
Turns out the combinators can be called directly, and get `ppf` as an
argument. The codebase already does this in some places, so let's be
consistent.
This adds back the `vpad` functionality which adds newlines in front of
cram blocks.

It also removes the special-casing of not printing the indentation in
some cases, according to the cram spec[0] the output has to start with two
spaces (the padding) and then the output (which is empty in that case).

[0]: https://bitheap.org/cram/
The cursor mutating made it a bit hard to follow, also the fold allows
to avoid a `List.concat` in the end.
Co-authored-by: panglesd <peada@free.fr>
Now cram tests only have a start padding at the beginning and the vpad
is only part of the whole block, not individual tests.
This was probably due to the markdown parser leaving out the initial
newline initiating a cram block as well as the final newline finishing
the block if it is not the final part of the file.

But it works for now, so that refactoring is better left for a future
PR.
@Leonidas-from-XIV Leonidas-from-XIV merged commit aa5551a into realworldocaml:main Jan 5, 2023
@Leonidas-from-XIV
Copy link
Collaborator Author

I'll merge this now since I think it is a decent point and further improvements can be added in future (hopefully shorter) PRs. Thanks for the review @panglesd!

@Leonidas-from-XIV Leonidas-from-XIV deleted the multiline-mli-fix branch January 5, 2023 11:03
Leonidas-from-XIV added a commit to Leonidas-from-XIV/opam-repository that referenced this pull request Jan 6, 2023
CHANGES:

#### Added

- Report all parsing errors in Markdown files (realworldocaml/mdx#389, @NathanReb)

#### Changed

- Preserve indentation in multiline OCaml blocks in .mli files (realworldocaml/mdx#395, @panglesd)

#### Fixed

- Fixed compatibility with Cmdliner 1.1.0 (realworldocaml/mdx#371, @Leonidas-from-XIV)
- Report errors and exit codes of toplevel directives (realworldocaml/mdx#382, @talex5,
  @Leonidas-from-XIV)
- Fix block locations in error reporting (realworldocaml/mdx#389, @NathanReb)
- Include the content of the line that features the `part-end` MDX directive in
  the output, before that line would've been dropped (realworldocaml/mdx#374, realworldocaml/mdx#387,
  @Leonidas-from-XIV)
- Handle EINTR signal on waitpid call by restarting the syscall. (realworldocaml/mdx#409, @tmcgilchrist)
- Fix parsing of multiline toplevel phrases in .mli files (realworldocaml/mdx#394, realworldocaml/mdx#397,
  @Leonidas-from-XIV)

#### Removed

- Removed warning about missing semicolons added in MDX 1.11.0 and the
  automatic insertion of semicolons in the corrected files introduced in MDX
  2.0.0. (realworldocaml/mdx#398, @Leonidas-from-XIV)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiline toplevel block do not work in .mli files
2 participants