Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve codegen for concatenation of string and char (attempt 2) #71793

Merged
merged 74 commits into from
Mar 19, 2024
Merged

Improve codegen for concatenation of string and char (attempt 2) #71793

merged 74 commits into from
Mar 19, 2024

Conversation

DoctorKrolic
Copy link
Contributor

@DoctorKrolic DoctorKrolic commented Jan 24, 2024

Closes #66827

This PR takes approach suggested in #70971 (comment) by @AlekseyTs with some implementation details chaged. E.g. it turned out is's easier to handle constantChar.ToString() pattern in ConvertConcatExprToString rather than in TryFoldTwoConcatOperands and so on. Also this PR implements merging user-provided span-based string.Concat with additional concatenation arguments, which was not described in the original proposal. And @AlekseyTs I have a question for you regarding item 1 in your proposal: the suggestion looks reasonable but it seems to me that it is related not to our case of concatenation of strings and chars, but rather can be treated as a general optimization. If I am wrong, can you please provide an example of a unit test, which is related to the theme of the PR and where codegen can be improved by that change? I am of course open for implementing new optimizations, but this PR is quite big on its own, so if that optimization is not directly related to the main theme, maybe it's better to implement in a later PR rather than in this one

@DoctorKrolic DoctorKrolic requested a review from a team as a code owner January 24, 2024 20:02
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Jan 24, 2024
@ghost ghost added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label Jan 24, 2024
@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 26, 2024

public void ConcatTwo_ConstantCharToString(int? missingUnimportantWellKnownMember)

Is it important to try them one by one, or could we simply mark them all missing in one go? #Closed


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:312 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 26, 2024

          IL_0001:  ldstr      "c"

Is it really better to not go with the span approach, use extra space in the string blob of the assembly image, and then allocate the string at runtime? #Closed


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:348 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

public void ConcatTwo_ConstantCharToString(int? missingUnimportantWellKnownMember)

I guess trying one by one doesn't hurt, but I would also add a flavor when all of them are missing together.


In reply to: 1912258587


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:312 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 26, 2024

          IL_0001:  ldstr      "c"

Looking at ConcatThree_ReadOnlySpan_OperandGroupingAndUserInputOfStringBasedConcats, it appears that if we have a constant, we allocate a string, but, if we have a parameter, we use spans. This behavior feels suspicious to me. #Closed


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:1134 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 26, 2024

          IL_0008:  call       "string string.Concat(string, string, string, string)"

I am surprised the span overload is not used when it is available along with necessary conversion helpers. For other methods as well. Probably there is a problem with decomposing/composing the arguments. I haven't looked at the implementation yet. #Closed


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:2503 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

          IL_0008:  call       "string string.Concat(string, string, string, string)"

Perhaps this is the same issue with "premature char to string conversion", which could probably be mitigated by going from a single character string constant back to char when we look at the decomposed arguments.


In reply to: 1912464476


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:2503 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 26, 2024

    // but in the second case we leave things untouched. Both lowerings produce correct output in the end, so this isn't much to worry about

~~I am not comfortable with this inconsistency. I think we should have a simpler approach, we either unwrap things, or we don't. The order of processing shouldn't matter. ~~ #Closed


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:4480 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

    // but in the second case we leave things untouched. Both lowerings produce correct output in the end, so this isn't much to worry about

After a closer look, I changed my mind.


In reply to: 1912530749


Refers to: src/Compilers/CSharp/Test/Emit2/CodeGen/CodeGenSpanBasedStringConcatTests.cs:4480 in e4e26f1. [](commit_id = e4e26f1, deletion_comment = False)

@DoctorKrolic
Copy link
Contributor Author

Is it important to try them one by one, or could we simply mark them all missing in one go?

I guess trying one by one doesn't hurt, but I would also add a flavor when all of them are missing together.

Are you ok if I replace existing tests, where we test positive scenarios with missing unimportant members 1 by 1 with equivalent ones, but where all unimportant members are missing? I agree that tests where all members are missing are better (since this is basically a real-life scenario on old target frameworks like .NET Framework), but new test file is already 4500+ lines long, so adding such tests on top of what we have will make things even harder to follow.

Is it really better to not go with the span approach, use extra space in the string blob of the assembly image, and then allocate the string at runtime?

Looking at ConcatThree_ReadOnlySpan_OperandGroupingAndUserInputOfStringBasedConcats, it appears that if we have a constant, we allocate a string, but, if we have a parameter, we use spans. This behavior feels suspicious to me.

Perhaps this is the same issue with "premature char to string conversion", which could probably be mitigated by going from a single character string constant back to char when we look at the decomposed arguments.

Compiler can take advantage of the fact, that constantChar.ToString() will always become a constant string when evaluated, so it lowers it to a constant string at compile time. I am not sure what is your concern here: it would take far more bytes in IL to use span-based lowering than just directly emitting a string and using string-based concat. In fact, performance of string-based approach is slightly better than using spans.

Benchmark
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22621.3007/22H2/2022Update/SunValley2)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.200-preview.23624.5
  [Host]     : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2


| Method         | Length | Mean      | Error     | StdDev    | Gen0   | Allocated |
|--------------- |------- |----------:|----------:|----------:|-------:|----------:|
| UseConstString | 1      |  6.761 ns | 0.0462 ns | 0.0432 ns | 0.0019 |      32 B |
| UseSpan        | 1      |  7.667 ns | 0.0188 ns | 0.0157 ns | 0.0019 |      32 B |
| UseConstString | 1000   | 61.349 ns | 0.7080 ns | 0.5912 ns | 0.1209 |    2024 B |
| UseSpan        | 1000   | 62.136 ns | 0.8843 ns | 0.7839 ns | 0.1209 |    2024 B |
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmarks>();

[MemoryDiagnoser]
public class Benchmarks
{
    private string _s;

    [Params(1, 1000)]
    public int Length { get; set; }

    [GlobalSetup]
    public void Setup() => _s = new string('s', Length);

    [Benchmark]
    public string UseConstString() => M1(_s);

    [Benchmark]
    public string UseSpan() => M2(_s);

    private static string M1(string s) => string.Concat("a", s);

    private static string M2(string s)
    {
        char ch = 'a';
        return string.Concat(new ReadOnlySpan<char>(in ch), s);
    }
}

If you worry that we take space in string blob, than I don't think it matters. Just think about it: it took a decade of development and a project filled to the brim with IL baselines to reach string blob size limitation in only 1 project in the whole roslyn. An avarage developer would probably never face this error in his professional life at all. We save far more space in IL bytes by lowering that char to string than we can potentially save from a string blob.

@AlekseyTs
Copy link
Contributor

Done with review pass (commit 70)

Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 71)

@AlekseyTs
Copy link
Contributor

@333fred, @dotnet/roslyn-compiler For the second review

src/Compilers/CSharp/Portable/CodeGen/EmitExpression.cs Outdated Show resolved Hide resolved
}
else if (arg.ConstantValueOpt is { IsString: true, StringValue: [char c] })
{
preparedArgs.Add(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not set needsSpanRefParamConstructor or the other tracking locals here? Is it just not worth it for the case where it's a bunch of one-character string additions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of Aleksey's unification suggestion on handling constants. This operation might be harmful to both codegen size and runtime perf, so in the end a compromise was made to enable this code path only when we already have other non-constant char, so we would take span-based approach anyway. There are a bunch of tests verifying that we don't take span-based path when there are only constant char.ToStrings in concatenation arguments: Concat{Two|Three|Four}_ConstantCharToString, Concat{Two|Three|Four}_AllConstantCharToStrings

@AlekseyTs
Copy link
Contributor

@DoctorKrolic I suggest to not act on the resolution from #72102 in context of this PR. I will follow up on that separately after this PR is merged, unless, of course, you really want to do that work.

Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. I suspect my refactoring comment will become obviated when we adapt to the public api decision today; feel free to resolve it if it's not necessary.

@AlekseyTs
Copy link
Contributor

I suspect my refactoring comment will become obviated when we adapt to the public api decision today; feel free to resolve it if it's not necessary.

I think the adjustment around of usage of SpecialType.None are good regardless, and we should keep them.

@DoctorKrolic
Copy link
Contributor Author

I suggest to not act on the resolution from #72102 in context of this PR

Agreed. Just slightly worried, that there will be a time gap, when the codebase contains unwanted APIs, but I guess, you'll handle this pretty quickly, so it won't be dangerously large

@DoctorKrolic DoctorKrolic requested a review from 333fred March 17, 2024 10:10
@333fred
Copy link
Member

333fred commented Mar 18, 2024

Looks like there are compilation errors? You may need to merge main into the branch @DoctorKrolic.

@DoctorKrolic
Copy link
Contributor Author

I don't know how, but the method was actually somehow missing. Replaced with Update overload. Should be fixed now

Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 74). @AlekseyTs for the last couple of changes since your approval. Thanks for your persistence here DoctorKrolic!

Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 74)

@AlekseyTs AlekseyTs merged commit 05f015c into dotnet:main Mar 19, 2024
24 checks passed
@AlekseyTs
Copy link
Contributor

@DoctorKrolic Thank you for the contribution.

@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Mar 19, 2024
@DoctorKrolic DoctorKrolic deleted the concat-string-char-v2 branch March 19, 2024 19:42
@DoctorKrolic
Copy link
Contributor Author

@AlekseyTs Special thank you for such patient and consistent review!

333fred added a commit that referenced this pull request Mar 20, 2024
* upstream/main: (1045 commits)
  Fix excessive compilation times due to speculative parsing after an incomplete string (#72565)
  Convert official pipeline to 1ES template (#72430)
  Fix #69628 Analyzer summary should show suppressor ID (#72569)
  Fix event hookup even when in a projection buffer
  Drop win32-ia32 language server support
  Remove workaround for .net7 r2r assembly loading issue
  remove unused usings
  Share compilation when generators don't produce any generated files
  Fix potential exception in AssetProvider.SynchronizeAssetsAsync (#72597)
  Fix
  Update __arglist.md (#72523)
  Improve code gen for concatenation of `string` and `char` (#71793)
  Reduce File I/O under the AnalyzerAssemblyLoader folder (#72412)
  Reduce allocations in AbstractTypeMap (#72588)
  disable diagnostics when solution crawler option is disabled
  disable diagnostics when solution crawler option is disabled
  better thread transitions
  Fix
  Add check
  Fix
  ...
@RikkiGibson RikkiGibson modified the milestones: Next, 17.10 P3 Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee. untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve string concatenation with non-const chars to use spans
4 participants