-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix LF line-ending auto format bug #10802
Conversation
…server from sending /r to LF line ending docs
src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs
Outdated
Show resolved
Hide resolved
src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs
Outdated
Show resolved
Hide resolved
I don't have any real concerns with this, but I don't think I can approve without tests. It's very hard to understand the real world impact otherwise. Tests would also help prove if the simplified approach I commented will actually work. |
…d keep everything else intact in the text edit. Added test cases
For some reason it totally slipped my mind to add in some tests. I added some by hard coding the two different line endings in the string. I confirmed that the tests fail before these commits and pass after the changes made so far in this PR. Please let me know if I should add more tests. I figured since we have very comprehensive tests on all sorts of razor-related formatting and the fact that these changes didn't fail any of the previous tests, we should be fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome job. My comments are really just small tweaks, but hopefully we can massively increase the test coverage.
src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs
Outdated
Show resolved
Hide resolved
src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs
Outdated
Show resolved
Hide resolved
@@ -37,6 +37,44 @@ public interface Bar | |||
"""); | |||
} | |||
|
|||
[Fact] | |||
public async Task FormatsCodeBlockDirectiveWithCRLFLineEnding() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are great, and I was going to ask you to create a version of the Formats_CodeBlockDirectiveWithMarkup_NonBraced
test as well (because it adds a newline with trailing whitespace, which these don't cover), but then I thought about it more, and I think it would be more interesting if FormattingTestBase
was modified to replace \r\n
with \n
, and we run every test with both line endings.
ie, duplicate these two lines again, but with input
and expected
having had their line endings replaced with LF
https://github.com/dotnet/razor/blob/main/src/Razor/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs#L54
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense! That would be a better way to widen the test coverage
return minimalEdits; | ||
} | ||
|
||
private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This method could just return a bool and be called HasLFLineEndings
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a couple of thoughts/questions about this method:
- Has
SourceText.Lines
already been populated at this point? I'd be surprised if it isn't, since this occurs after all of the formatting passes. Given that, would it be more efficient to just walk through theLines
collection and check the line ending? That would give you the exact span of each line ending, so you wouldn't need to walk every character in theSourceText
. - Should we be concerned about other legal "new line" characters? Here are the others that I know of:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this method does not access instance data and could be static. In fact, it's probably useful enough as a utility function to move it to SourceTextExtensions
and make it an extension method on SourceText
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, though I ask that you address @davidwengier's feedback to use a simple foreach loop rather than LINQ to avoid unnecessary allocations.
I added several suggestions to improve efficiency, facilitate code reuse, and improve testing. I added a lot of explanation and samples in case you find it useful.
return minimalEdits; | ||
} | ||
|
||
private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a couple of thoughts/questions about this method:
- Has
SourceText.Lines
already been populated at this point? I'd be surprised if it isn't, since this occurs after all of the formatting passes. Given that, would it be more efficient to just walk through theLines
collection and check the line ending? That would give you the exact span of each line ending, so you wouldn't need to walk every character in theSourceText
. - Should we be concerned about other legal "new line" characters? Here are the others that I know of:
var crlfCount = 0; | ||
var lfCount = 0; | ||
|
||
for (var i = 0; i < sourceText.Length; i++) | ||
{ | ||
if (sourceText[i] == '\r') | ||
{ | ||
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n') | ||
{ | ||
crlfCount++; | ||
i++; // Skip the next character as it's part of the CRLF sequence | ||
} | ||
} | ||
else if (sourceText[i] == '\n') | ||
{ | ||
lfCount++; | ||
} | ||
} | ||
|
||
return (crlfCount, lfCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an algorithm like this, I might do a few things differently to improve the efficiency. You can take these suggestions or leave them! Or, you can take an entirely different approach! I just wanted to share a couple of opportunities:
- I would extract SourceText.Length to a local variable to avoid accessing it each iteration of the loop. SourceText.Length is an abstract property and is overridden by every SourceText implementation. So, I wouldn't expect the JIT to do any smart inlining here.
- I would capture the first character before the loop and then start the loop at index 1. That way, I could avoid the extra length check inside the loop for
\r
. - I would likely extract
sourceText[i]
to a local variable or use a switch statement to avoid indexing extra times. This allows you to walk characters in pairs -- the previous and current character. - This doesn't help with efficiency so much, but C# tuples are mutable structs. At the method body level, it can be useful to think of them as little packs of variables. So, you could just declare a single local at the top of the method for the result tuple and just update the fields. This won't affect efficiency and is largely a stylistic choice in this case, but I thought it might be useful to mention.
var crlfCount = 0; | |
var lfCount = 0; | |
for (var i = 0; i < sourceText.Length; i++) | |
{ | |
if (sourceText[i] == '\r') | |
{ | |
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n') | |
{ | |
crlfCount++; | |
i++; // Skip the next character as it's part of the CRLF sequence | |
} | |
} | |
else if (sourceText[i] == '\n') | |
{ | |
lfCount++; | |
} | |
} | |
return (crlfCount, lfCount); | |
var result = (crlfCount: 0, lfCount: 0); | |
var length = sourceText.Length; | |
if (length == 0) | |
{ | |
return result; | |
} | |
var previous = sourceText[0]; | |
if (previous == '\n') | |
{ | |
result.lfCount++; | |
} | |
for (var i = 1; i < length; i++) | |
{ | |
var current = sourceText[i]; | |
if (current == '\n') | |
{ | |
if (previous == '\r') | |
{ | |
result.crlfCount++; | |
// Skip ahead to avoid counting the '\n' again during the next iteration. | |
// However, we need to be careful not to index past the end of the SourceText! | |
if (++i < length) | |
{ | |
// Set previous to the character *after* current. And, since we've already | |
// set previous, we can continue the loop. Otherwise, previous will get set | |
// to the wrong value below. | |
previous = sourceText[i]; | |
continue; | |
} | |
} | |
else | |
{ | |
result.lfCount++; | |
} | |
} | |
previous = current; | |
} | |
return result; |
As I mentioned above, it might just be simpler (and more efficient) to walk through SourceText.Lines
. However, I wanted to provide some suggestions in case you keep this algorithm. Also, please note that I have not tested this code! I wrote it directly into the GitHub comment field. So, please take the accuracy with a grain of salt. I've only had one cup of coffee this morning. 😄
return minimalEdits; | ||
} | ||
|
||
private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this method does not access instance data and could be static. In fact, it's probably useful enough as a utility function to move it to SourceTextExtensions
and make it an extension method on SourceText
.
src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs
Outdated
Show resolved
Hide resolved
await RunFormattingTestAsync( | ||
input: | ||
"@code {\r\n" + | ||
" public class Foo{}\r\n" + | ||
" public interface Bar {\r\n" + | ||
"}\r\n" + | ||
"}", | ||
expected: | ||
"@code {\r\n" + | ||
" public class Foo { }\r\n" + | ||
" public interface Bar\r\n" + | ||
" {\r\n" + | ||
" }\r\n" + | ||
"}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding test helpers to add the line breaks for you. Something like these would make it a little less error prone to write new tests.
public static string CodeFromLines(string lineEnding, params string[] lines)
{
return string.Join(lineEnding, lines);
}
Then, the tests can be written by passing in the line text without line endings like so:
await RunFormattingTestAsync( | |
input: | |
"@code {\r\n" + | |
" public class Foo{}\r\n" + | |
" public interface Bar {\r\n" + | |
"}\r\n" + | |
"}", | |
expected: | |
"@code {\r\n" + | |
" public class Foo { }\r\n" + | |
" public interface Bar\r\n" + | |
" {\r\n" + | |
" }\r\n" + | |
"}"); | |
await RunFormattingTestAsync( | |
input: CodeFromLines("\r\n", | |
"@code {", | |
" public class Foo{}", | |
" public interface Bar {", | |
"}", | |
"}"), | |
expected: CodeFromLines("\r\n", | |
"@code {", | |
" public class Foo { }", | |
" public interface Bar", | |
" {", | |
" }", | |
"}"); |
Even better, since this is test code, efficiency is less worrisome than in production code. So, you could write a helper that uses a SourceText
to parse the lines and use that to build up a new string with different line endings. Something like the following would do the trick (Note: untested code!).
public static string NormalizeLineEndings(string lineEnding, string code)
{
using var _ = StringBuilderPool.GetPooledObject(out var builder);
var text = SourceText.From(code);
foreach (var line in text.Lines)
{
builder.Append(text.GetSubTextString(line.Span));
if (line.End < text.Length)
{
builder.Append(lineEnding);
}
}
return builder.ToString();
}
Then, the tests could continue using raw string literals like so:
await RunFormattingTestAsync( | |
input: | |
"@code {\r\n" + | |
" public class Foo{}\r\n" + | |
" public interface Bar {\r\n" + | |
"}\r\n" + | |
"}", | |
expected: | |
"@code {\r\n" + | |
" public class Foo { }\r\n" + | |
" public interface Bar\r\n" + | |
" {\r\n" + | |
" }\r\n" + | |
"}"); | |
await RunFormattingTestAsync( | |
input: NormalizeLineEndings("\r\n", """ | |
@code { | |
public class Foo{} | |
public interface Bar { | |
} | |
} | |
""", | |
expected: NormalizeLineEndings("\r\n", """ | |
@code { | |
public class Foo { } | |
public interface Bar | |
{ | |
} | |
} | |
"""); |
Even, even better, the test helper could be extension method on string.
public static string ChangeLineEndingsTo(this string code, string lineEnding);
src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs
Outdated
Show resolved
Hide resolved
foreach (var line in text.Lines) | ||
{ | ||
var lineBreakSpan = TextSpan.FromBounds(line.End, line.EndIncludingLineBreak); | ||
var lineBreak = line.Text?.ToString(lineBreakSpan) ?? string.Empty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line.Text
is just returning you the text
you already have, and already know is not null. But also, can we just do this without creating any strings at all, and just using if (line.EndIncludingLineBreak - line.End == 2)
?
Having said that, I have a wacky idea and I don't think need this method at all, but I can't be sure until I've debugged through a test and see for myself what things actually look like, so will leave that one for my TODO list later :)
Dear reviewers, as I am expanding the test coverage I came across a new bug. The indentation space from the razor formatting service is not working for the
Thanks! Screen.Recording.2024-08-29.at.4.58.42.PM.mov |
If the fix is quick and easy, then fixing it would be great. If not, the best thing to do is to add the test, but skip it and create an issue for the bug. That way when it comes time to fix, half of the work has already been done. |
src/Razor/src/Microsoft.CodeAnalysis.Razor.Workspaces/Formatting/RazorFormattingService.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a couple of small nits from me.
...r/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs
Outdated
Show resolved
Hide resolved
...r/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs
Show resolved
Hide resolved
...r/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs
Outdated
Show resolved
Hide resolved
...r/test/Microsoft.AspNetCore.Razor.LanguageServer.Test/Formatting_NetFx/FormattingTestBase.cs
Outdated
Show resolved
Hide resolved
Late to this, but I love the fact that you only skipped the LF part of the failing tests, and not the whole thing <3 |
This PR fixes #5589 #4349 #10466, the bug that adds extra lines when VS code users have LF line ending enabled for their Razor files and they format the document.
Root Cause
This bug happens for CSharp code in a razor file when VS coder users use LF line ending for a razor file. Based on my best understanding, the problem lies that the razor server is formatting the
csharpSourceText
(see here), which is generated with the default line ending of the operating system (LF for Unix and CRLF for Windows, hence this doesn't repro on Mac). Once thecsharpSourceText
is formatted it gets compared against CSharp portion of the original document which could have the LF line ending, and if so, the formatting service adds a\r
before every\n
.Summary of the changes
RazorFormattingService.cs
. The normalizing method implemented counts the occurrences of CRLF and LF line endings in the original text. If LF line endings are more prevalent, it removes any CR characters (\r
) from the text edits to ensure consistency with the LF style.The TextEdit normalization technique introduced in this PR can be inverted if a better solution is thought of. There are two main issues:
csharpSourceText
with the correct line ending or convert it to the correct one upon loading it.