Translate `\r\n` -> `\n` when reading files #62865

matklad · 2019-07-22T07:27:38Z

Currently, we handle \r\n explicitly in the lexer. We should do this at the file read time instead.

Motivation:

Line endings should not affect semantics of the language. For example, git on windows by default checkouts with \r\n line endings, and it would be bad if compiling code on windows led to observably different results. This could be handled on a case-by-case basis in the lexer (the current approach), but it's easy to miss something (for example Raw Byte strings do not handle \r #60604). Additionally, proc macros currently see original tokens, and so can observe different line endings. The simplest way to make compiler lineending-invariant is to normalize line endings as soon as possible.
For IDE features, which work close to the source code, and especially for code generation during refactorings, the surface are where you need to handle different line endings is much larger. It would be easier for IDE to assume \n line endings and convert at one place, at the boundary.

i propose that we convert \r\n once, in SourceFile::new constructor. Note that we already do BOM-removal there, so, in theory, all code should already be prepared to mismatch between in-memory and on-disk file content. The replacement shortens the string, so it can be a pretty fast in-place transformation.

One technical question is what to do with isolated \r? I see two options:

leave them as is. This makes the transform non-idempotent though: \r\r\n -> \r\n, and requires other code to explicitely not treat \r\n as line ending.
error at file load time if there's a lone \r in the source text. That is, after normalization, \r can not be found in the source code at all.

I propose to go with the second option it's slightly more annoying to implement, but seems more robust.

The text was updated successfully, but these errors were encountered:

matklad · 2019-07-22T07:28:09Z

cc @petrochenkov

matklad · 2019-07-24T14:19:18Z

Found a bug: here we load doc comment via a file, without translate_crlf step.

see rust-lang/rust#62865

To eliminate parsing differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF, like [in Rust](rust-lang/rust#62865). Effects: - This makes Lake hashes be faithful to what Lean sees (Lake already normalizes line endings before computing hashes). - Docstrings now have normalized line endings. In particular, this fixes `#guard_msgs` failing multiline tests for Windows users using CRLF. - Now strings don't have different lengths depending on the platform. Before this PR, the following theorem is true for LF and false for CRLF files. ```lean example : " ".length = 1 := rfl ``` Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In the elaborator, we reject loose `\r`'s that appear in whitespace. Rust instead takes the approach of making the normalization routine fail. They do this so that there's no downstream confusion about any `\r\n` that appears. Implementation note: the LSP maintains its own copy of a source file that it updates when edit operations are applied. We are assuming that edit operations never split or join CRLFs. If this assumption is not correct, then the LSP copy of a source file can become slightly out of sync. If this is an issue, there is some discussion [here](#3903 (comment)).

matklad changed the title ~~Translate \r\n -> \n when reading file~~ Translate \r\n -> \n when reading files Jul 22, 2019

jonas-schievink added A-frontend Area: frontend (errors, parsing and HIR) C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 22, 2019

ehuss mentioned this issue Jul 22, 2019

Character and string token definitions need updating. rust-lang/reference#626

Open

6 tasks

matklad mentioned this issue Jul 24, 2019

Normalize newlines when loading files #62948

Merged

bors closed this as completed in ef1ecbe Aug 18, 2019

sbrugman added a commit to sbrugman/ruff that referenced this issue Jan 21, 2023

fix: use platform-specific literal newlines

cf39830

see rust-lang/rust#62865

sbrugman added a commit to sbrugman/ruff that referenced this issue Jan 23, 2023

fix: use platform-specific literal newlines

59b7f9c

see rust-lang/rust#62865

sbrugman added a commit to sbrugman/ruff that referenced this issue Jan 24, 2023

fix: use platform-specific literal newlines

36fc06d

see rust-lang/rust#62865

ehuss mentioned this issue Feb 20, 2023

Add \r\n to string_continue grammar rust-lang/reference#1332

Closed

vkleen mentioned this issue Sep 1, 2023

Normalize line endings in string literals during parsing tweag/nickel#1562

Merged

kmill mentioned this issue Nov 19, 2023

RFC: string gaps for continuing string literals across multiple lines leanprover/lean4#2838

Closed

chenyukang mentioned this issue Jan 12, 2024

Multi-line diagnostic suggestions use LF in files that use CRLF line endings #119482

Open

EliahKagan mentioned this issue Apr 4, 2024

building on windows fails due to symlinks in working tree Byron/gitoxide#1310

Closed

kmill mentioned this issue Apr 14, 2024

feat: make frontend normalize line endings to LF leanprover/lean4#3903

Merged

ollpu mentioned this issue Jun 6, 2024

fix(test): fix generating spec tests doesn't work on Windows due to line-endings pulldown-cmark/pulldown-cmark#903

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Translate `\r\n` -> `\n` when reading files #62865

Translate `\r\n` -> `\n` when reading files #62865

matklad commented Jul 22, 2019 •

edited

Loading

matklad commented Jul 22, 2019

matklad commented Jul 24, 2019

Translate \r\n -> \n when reading files #62865

Translate \r\n -> \n when reading files #62865

Comments

matklad commented Jul 22, 2019 • edited Loading

matklad commented Jul 22, 2019

matklad commented Jul 24, 2019

Translate `\r\n` -> `\n` when reading files #62865

Translate `\r\n` -> `\n` when reading files #62865

matklad commented Jul 22, 2019 •

edited

Loading