rustfix doesn't work on files with CRLF line endings #176

Rantanen · 2019-10-01T18:07:27Z

I had trouble earlier where cargo fix --edition resulted in a lot of mangled code, such as use ::foo::bar; -> usecrate::foo::barr; with the first space being omitted and last character duplicated.

I tracked this into the way span offsets and replacements work. As far as I could gather, the span offsets count both LF and CRLF as a single character, this means each CRLF makes span offsets off by one.

However I'm a bit confused on this for two reasons:

"Does not work with CRLF" would seem like a problem that most people on Windows would encounter, yet I couldn't find any references to this issue anywhere. The 2018 edition has been out for almost a year so I would have expected to find reports of this elsewhere.
The tests/parse_and_replace.rs has a not(windows) cfg with a comment to fix these tests on Windows; So at some point a problem with the tests has been recognized, but no further issue has been raised - so I'm assuming the breakage has only affected tests and the fixes themselves have been working just fine on Windows.

Is the span change a recent breakage by rustc/cargo or is this something specific to my environment? I'm testing this with a recent nightly (2019-09-30).

The text was updated successfully, but these errors were encountered:

ehuss · 2019-10-01T18:23:32Z

This looks to be a relatively recent breakage in 1.39, most likely caused by rust-lang/rust#62948. It is now reporting post-normalized byte offsets. That seems like a bug in rustc to me.

cc @matklad, were you aware that JSON diagnostic spans are now reporting incorrect byte offsets? Is that something that can be easily fixed? To me, this seems like a pretty big regression for cargo fix to no longer work for windows users, and something that would be good to get fixed for beta.

matklad · 2019-10-01T18:30:27Z

No, I wasn't aware of that. Note, however, that that PR didn't modify any tests. That I think is reasonable, IIRC, we use (line, offset) coordinates, and those are invariant with respect to line endings.

So maybe something interprets those offsets wrong?

Additionally, I should note that Rust since-forever did some normalization (namely, BOM-stripping), so if some code relied on the fact that offsets are non-normalized, that code was technically not correct (albeit in a pretty narrow edge case).

matklad · 2019-10-01T18:38:29Z

Ah, I see now that JSON reports both (line, col) and byte offsets coordinates. We definitely should fix the latter, both for \r\n and for BOM. Presumably, rust-lang/rust#62948 didn't change any tests because there were no tests with BOM / \r\n?

Not sure how annoying this would be to fix: computing the table of "adjustments" and storing it alongside SourceFile shouldn't be hard. Threading this info to the place where we produce byte_start, byte_end might be tricker.

Additionally, I am traveling this week, so won't be able to look into this immediately.

cc @petrochenkov

Rantanen · 2019-10-01T18:40:38Z

Oh, and I guess the reason for the #[cfg(not(windows))] in rustfix parse_and_replace.rs tests is that those tests depend on recorded rustc output that has been recorded in LF-only environment and thus the byte offsets there are incompatible with CRLF line endings in Windows.

ehuss · 2019-10-01T18:48:31Z

Ah, yea, it looks like rustfix uses the byte_start/end values (not the line/column values). I think it could be switched to line/column, but I don't know how hard that would be. But the byte_start/end values should be fixed, otherwise they are misleading/wrong.

I'm not surprised there aren't any BOM/CRLF tests for the JSON output. I see maybe 4 tests specifically for JSON, so it doesn't really have much explicit test coverage at all.

Rantanen · 2019-10-02T02:11:32Z

I had a quick look at this with the intention of taking a stab at implementing the fix in rust-lang/rust. I'll try to file an issue over at rust-lang/rust at some point, but given it's late, I'm lazy and such issue doesn't exist yet, I'd love to get a comment on the following here:

My current plan is as follows:

Add normalized_chars: Vec<NormalizedChar> to libsyntax_pos::SourceFile
In SourceFile::new, add &mut Vec<NormalizedChar> parameter to both the remove_bom and normalize_newlines functions and fill that Vec as those functions strip characters from the original source string.
Assume the only bit that is interested in the diagnostics reporting is the DiagnosticSpan in libsyntax::json. Fortunately the from_span_full there seems to have access to the SourceFile itself (Receiving libsyntax_pos::Loc from the JsonEmitter -> SourceMapper and the Loc has a Lrc<SourceFile> as one of the fields).

@matklad, does this make any sense or do you foresee any problems with this approach?

matklad · 2019-10-02T06:22:19Z

Makes sense to me! I am not entirely sure how Vec<NormaziedChar> should look like, but I think something like Vec<(usize, usize)> should work, where first usize is byte index of normalized source, and the second usize is the adjustment you need to add to get original index. So, given a normalized pos p, you binary-search it in the sorted array by the first component, and than add the second component of the matched index.

…

On Wednesday, 2 October 2019, Mikko Rantanen ***@***.***> wrote: I had a quick look at this with the intention of taking a stab at implementing the fix in rust-lang/rust. I'll try to file an issue over at rust-lang/rust at some point, but given it's late, I'm lazy and such issue doesn't exist yet, I'd love to get a comment on the following here: My current plan is as follows: - Add normalized_chars: Vec<NormalizedChar> to libsyntax_pos::SourceFile - In SourceFile::new, add &mut Vec<NormalizedChar> parameter to both the remove_bom and normalize_newlines functions and fill that Vec as those functions strip characters from the original source string. - Assume the only bit that is interested in the diagnostics reporting is the DiagnosticSpan in libsyntax::json. Fortunately the from_span_full there seems to have access to the SourceFile itself (Receiving libsyntax_pos::Loc from the JsonEmitter -> SourceMapper and the Loc has a Lrc<SourceFile> as one of the fields). @matklad <https://github.com/matklad>, does this make any sense or do you foresee any problems with this approach? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#176>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AANB3M75TETM6P3CZYXAIHLQMP7NNANCNFSM4I4M6WCA> .

Rantanen · 2019-10-02T11:43:22Z

Yeah that makes sense, although I planned on using i32 as the offset to allow negative offsets - not currently needed, but given they shouldn't complicate the actual implementation either, I figured we could just as well support the hypothetical scenario where normalization adds bytes in the future anyway.

Also I feel a signed value would make it (very slightly) clearer that the data is relative offset instead of absolute position.

I was also playing with the idea of using (usize, usize) to mark the last normalized character with the final calculation being final = result.1 + (original - result.0) - this would make things even more awesome and flexible as then we could support non-linear normalizations(!!!), but I don't think I want to contribute anything towards such insanity. :)

@matklad

Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029

@matklad

Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029

…, r=petrochenkov" This reverts commit ef1ecbe, reversing changes made to fc8765d. That changed unfortunately broke rustfix on windows: rust-lang/rustfix#176 Specifically, what ef1ecbe did was to enforce normalization of \r\n to \n at file loading time, similarly to how we deal with Byte Order Mark. Normalization changes raw offsets in files, which are exposed via `--error-format=json`, and used by rusfix. The proper solution here (which also handles the latent case with BOM) is rust-lang#65074 However, since it's somewhat involved, and we are time sensitive, we prefer to revert the original change on beta.

@matklad

Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029

@matklad

Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029

…, r=petrochenkov" This reverts commit ef1ecbe, reversing changes made to fc8765d. That changed unfortunately broke rustfix on windows: rust-lang/rustfix#176 Specifically, what ef1ecbe did was to enforce normalization of \r\n to \n at file loading time, similarly to how we deal with Byte Order Mark. Normalization changes raw offsets in files, which are exposed via `--error-format=json`, and used by rusfix. The proper solution here (which also handles the latent case with BOM) is rust-lang#65074 However, since it's somewhat involved, and we are time sensitive, we prefer to revert the original change on beta.

ehuss · 2019-10-28T16:15:45Z

This appears to be fixed in the latest beta and nightly. Should this issue be closed?
And thanks for working on the fix @Rantanen!

Rantanen · 2019-10-28T16:18:04Z

Oh, probably! Forgot this one got left open.

Rantanen mentioned this issue Oct 1, 2019

Support CRLF linebreaks during replace #177

Closed

This was referenced Oct 2, 2019

libsyntax JSON output bytes counts ignore CRLF normalization rust-lang/rust#65029

Closed

Fix the start/end byte positions in the compiler JSON output rust-lang/rust#65074

Merged

matklad mentioned this issue Oct 10, 2019

[beta] Revert "Auto merge of #62948 - matklad:failable-file-loading, r=petro… rust-lang/rust#65273

Closed

Rantanen closed this as completed Oct 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustfix doesn't work on files with CRLF line endings #176

rustfix doesn't work on files with CRLF line endings #176

Rantanen commented Oct 1, 2019

ehuss commented Oct 1, 2019

matklad commented Oct 1, 2019

matklad commented Oct 1, 2019

Rantanen commented Oct 1, 2019

ehuss commented Oct 1, 2019

Rantanen commented Oct 2, 2019

matklad commented Oct 2, 2019 via email

Rantanen commented Oct 2, 2019

ehuss commented Oct 28, 2019

Rantanen commented Oct 28, 2019

rustfix doesn't work on files with CRLF line endings #176

rustfix doesn't work on files with CRLF line endings #176

Comments

Rantanen commented Oct 1, 2019

ehuss commented Oct 1, 2019

matklad commented Oct 1, 2019

matklad commented Oct 1, 2019

Rantanen commented Oct 1, 2019

ehuss commented Oct 1, 2019

Rantanen commented Oct 2, 2019

matklad commented Oct 2, 2019 via email

Rantanen commented Oct 2, 2019

ehuss commented Oct 28, 2019

Rantanen commented Oct 28, 2019