-
Notifications
You must be signed in to change notification settings - Fork 62
rustfix doesn't work on files with CRLF line endings #176
Comments
This looks to be a relatively recent breakage in 1.39, most likely caused by rust-lang/rust#62948. It is now reporting post-normalized byte offsets. That seems like a bug in rustc to me. cc @matklad, were you aware that JSON diagnostic spans are now reporting incorrect byte offsets? Is that something that can be easily fixed? To me, this seems like a pretty big regression for |
No, I wasn't aware of that. Note, however, that that PR didn't modify any tests. That I think is reasonable, IIRC, we use So maybe something interprets those offsets wrong? Additionally, I should note that Rust since-forever did some normalization (namely, BOM-stripping), so if some code relied on the fact that offsets are non-normalized, that code was technically not correct (albeit in a pretty narrow edge case). |
Ah, I see now that JSON reports both (line, col) and byte offsets coordinates. We definitely should fix the latter, both for Not sure how annoying this would be to fix: computing the table of "adjustments" and storing it alongside Additionally, I am traveling this week, so won't be able to look into this immediately. |
Oh, and I guess the reason for the |
Ah, yea, it looks like rustfix uses the byte_start/end values (not the line/column values). I think it could be switched to line/column, but I don't know how hard that would be. But the byte_start/end values should be fixed, otherwise they are misleading/wrong. I'm not surprised there aren't any BOM/CRLF tests for the JSON output. I see maybe 4 tests specifically for JSON, so it doesn't really have much explicit test coverage at all. |
I had a quick look at this with the intention of taking a stab at implementing the fix in rust-lang/rust. I'll try to file an issue over at rust-lang/rust at some point, but given it's late, I'm lazy and such issue doesn't exist yet, I'd love to get a comment on the following here: My current plan is as follows:
@matklad, does this make any sense or do you foresee any problems with this approach? |
Makes sense to me! I am not entirely sure how Vec<NormaziedChar> should
look like, but I think something like Vec<(usize, usize)> should work,
where first usize is byte index of normalized source, and the second usize
is the adjustment you need to add to get original index. So, given a
normalized pos p, you binary-search it in the sorted array by the first
component, and than add the second component of the matched index.
…On Wednesday, 2 October 2019, Mikko Rantanen ***@***.***> wrote:
I had a quick look at this with the intention of taking a stab at
implementing the fix in rust-lang/rust. I'll try to file an issue over at
rust-lang/rust at some point, but given it's late, I'm lazy and such issue
doesn't exist yet, I'd love to get a comment on the following here:
My current plan is as follows:
- Add normalized_chars: Vec<NormalizedChar> to
libsyntax_pos::SourceFile
- In SourceFile::new, add &mut Vec<NormalizedChar> parameter to both
the remove_bom and normalize_newlines functions and fill that Vec as
those functions strip characters from the original source string.
- Assume the only bit that is interested in the diagnostics reporting
is the DiagnosticSpan in libsyntax::json. Fortunately the
from_span_full there seems to have access to the SourceFile itself
(Receiving libsyntax_pos::Loc from the JsonEmitter -> SourceMapper and
the Loc has a Lrc<SourceFile> as one of the fields).
@matklad <https://github.com/matklad>, does this make any sense or do you
foresee any problems with this approach?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#176>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AANB3M75TETM6P3CZYXAIHLQMP7NNANCNFSM4I4M6WCA>
.
|
Yeah that makes sense, although I planned on using Also I feel a signed value would make it (very slightly) clearer that the data is relative offset instead of absolute position. I was also playing with the idea of using (usize, usize) to mark the last normalized character with the final calculation being |
Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029
Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029
…, r=petrochenkov" This reverts commit ef1ecbe, reversing changes made to fc8765d. That changed unfortunately broke rustfix on windows: rust-lang/rustfix#176 Specifically, what ef1ecbe did was to enforce normalization of \r\n to \n at file loading time, similarly to how we deal with Byte Order Mark. Normalization changes raw offsets in files, which are exposed via `--error-format=json`, and used by rusfix. The proper solution here (which also handles the latent case with BOM) is rust-lang#65074 However, since it's somewhat involved, and we are time sensitive, we prefer to revert the original change on beta.
Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029
Fix the start/end byte positions in the compiler JSON output Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output. This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one. The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions. Fixes rust-lang#65029
…, r=petrochenkov" This reverts commit ef1ecbe, reversing changes made to fc8765d. That changed unfortunately broke rustfix on windows: rust-lang/rustfix#176 Specifically, what ef1ecbe did was to enforce normalization of \r\n to \n at file loading time, similarly to how we deal with Byte Order Mark. Normalization changes raw offsets in files, which are exposed via `--error-format=json`, and used by rusfix. The proper solution here (which also handles the latent case with BOM) is rust-lang#65074 However, since it's somewhat involved, and we are time sensitive, we prefer to revert the original change on beta.
This appears to be fixed in the latest beta and nightly. Should this issue be closed? |
Oh, probably! Forgot this one got left open. |
I had trouble earlier where
cargo fix --edition
resulted in a lot of mangled code, such asuse ::foo::bar;
->usecrate::foo::barr;
with the first space being omitted and last character duplicated.I tracked this into the way span offsets and replacements work. As far as I could gather, the span offsets count both LF and CRLF as a single character, this means each CRLF makes span offsets off by one.
However I'm a bit confused on this for two reasons:
tests/parse_and_replace.rs
has anot(windows)
cfg with a comment to fix these tests on Windows; So at some point a problem with the tests has been recognized, but no further issue has been raised - so I'm assuming the breakage has only affected tests and the fixes themselves have been working just fine on Windows.Is the span change a recent breakage by rustc/cargo or is this something specific to my environment? I'm testing this with a recent nightly (2019-09-30).
The text was updated successfully, but these errors were encountered: