Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix end of line parsing on Windows #294

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Julow
Copy link
Collaborator

@Julow Julow commented Oct 20, 2020

There is no way to check this easily, a lot of work is still needed to run the testsuite on Windows. This build fails for a different reason, which is enough for me: https://github.com/Julow/odoc/runs/1281945575?check_suite_focus=true

@Julow Julow mentioned this pull request Oct 20, 2020
@Julow
Copy link
Collaborator Author

Julow commented Oct 21, 2020

Odoc's build on Windows using this patch (and other fixes on Odoc itself) is passing: https://github.com/Julow/odoc/runs/1285922231?check_suite_focus=true

Copy link
Contributor

@NathanReb NathanReb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Could you also add a changelog entry about this please?

Did you try to add a small unit test for this? I think it would be a nice improvement to this PR, to make sure we don't break parsing of markdown files on windows in the future by mistake!

lib/lexer_mdx.mll Show resolved Hide resolved
lib/lexer_mdx.mll Outdated Show resolved Hide resolved
lib/lexer_mdx.mll Outdated Show resolved Hide resolved
Co-authored-by: Nathan Rebours <nathan.p.rebours@gmail.com>
@Julow
Copy link
Collaborator Author

Julow commented Nov 16, 2020

I think it would be better to test this in CI where every text files would be converted to Windows eol (Git does that). Otherwise, such a test would be easy to break (for example, Git's rules can be configured locally and there is no clear default) and incomplete (spaghetti code).

@gpetiot
Copy link
Contributor

gpetiot commented Dec 15, 2020

It looks good to me, I guess we need to do the same for lexer_top, lexer_cram and mli_parser

@NathanReb
Copy link
Contributor

NathanReb commented Dec 15, 2020

@emillon I seem to recall we discussed this and that the cause for this failure was unclear as you couldn't reproduce the bug on windows. Do we know more about this?

@emillon
Copy link
Contributor

emillon commented Dec 22, 2020

The general idea is that yes, we do have to handle newlines. Some git installations are configured to translate newlines, some do not. If they are translated, the LFs in this repository will be converted to CR LF and the tests will fail. Conversely this translation can also hide problems in the other direction by making CRs disappear.

Two distinct bits are necessary to properly support different kind of newlines:

The first one is about accepting various types of newlines - generally accepting \r*\n in places we would accept \n.

But mdx also writes .corrected files, and those need to have the right newlines. Otherwise a shared repository will end up with mixed newlines, depending on who edited the file last.

Several strategies are possible for that second part:

  • the first one is to always use LF endings. This is what you get if you use open_out_bin. This can cause a situation where somebody writes a markdown file, uses mdx, and then cannot use the corrected file.
  • the second one is to emit the host's native endings: CR LF on Windows, LF on Unix. This is what you get if you use open_out. Using this strategy, mdx always produces a file that makes sense locally. This is a good property for things like log files that are only written, but corrected files are meant to be copied back in place by dune promote. So this will create a situation where the last one to promote will change newlines.
  • a better strategy is to emit the same kind of newlines that was found in the input file. It will avoid both problems: assuming somebody can open the input file, they can open the output file; and promoting will never clutter the git history by adding or removing CRs from the whole file.

The translation layer that exists in ocaml (or really, in the libc) is a bit lacking for this because it can only select between "binary" (1 above) and "text" (2 above), not "always CR" or "always CR LF" that would be required by 3. So to implement the "preserve" strategy one has to open the file in binary mode and do the translation manually.

One final remark is that dune runtest or various git commands will display a diff that can hide some newline changes, but when calling dune promote, the file will be copied and this is where all the newlines will be quietly replaced.

@NathanReb NathanReb changed the base branch from master to main July 2, 2021 09:45
@patricoferris
Copy link

I was testing some MDX + Windows things, and found some files using CRLF would simply hang in MDX. It would be great to see full support for line-endings land in MDX -- this might get picked up as part of ocaml-multicore/eio#758

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants