Intrinsify Encoding.UTF8.GetByteCount for constant UTF-16 input #102246

PaulusParssinen · 2024-05-15T08:47:26Z

I have recently been staring at a lot of UTF-8 <-> UTF-16 manipulation code (in Garnet and the ILCompiler name mangling 😆) and thought: if UTF8EncodingSealed.ReadUTF8 gets a VN-based expansion when the JIT can see that the input is a constant (CNS), maybe the same could be done for Encoding.UTF8.GetByteCount. It is most commonly used to size the buffer for the following (Try)GetBytes call, or to check whether the input fits in an existing buffer.

We do of course have GetMaxByteCount for quickly computing an upper bound, but since that is roughly str.Length * 3, the buffer lengths exceed any stackalloc threshold very quickly.
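To make the gap between the exact count and the upper bound concrete, here is a minimal C++ sketch of what a constant-folded GetByteCount would have to compute. It is not the JIT's or the BCL's implementation: `utf8_byte_count` and `utf8_upper_bound` are hypothetical names, and counting an unpaired surrogate as 3 bytes assumes replacement with U+FFFD.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a runtime API): exact number of UTF-8 bytes
// needed to encode a UTF-16 sequence. Being constexpr, it can be evaluated
// at compile time when the input is a constant.
constexpr std::size_t utf8_byte_count(std::u16string_view s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;                       // ASCII
        } else if (c < 0x800) {
            bytes += 2;                       // U+0080..U+07FF
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;                       // valid surrogate pair
            ++i;                              // skip the low surrogate
        } else {
            // Remaining BMP characters take 3 bytes. Assumption: an unpaired
            // surrogate is replaced with U+FFFD, which is also 3 bytes.
            bytes += 3;
        }
    }
    return bytes;
}

// The cheap upper bound mentioned above: three bytes per UTF-16 code unit.
constexpr std::size_t utf8_upper_bound(std::size_t utf16_length) {
    return utf16_length * 3;
}

// For constant input the exact count folds to a compile-time constant.
static_assert(utf8_byte_count(u"hello") == 5);
static_assert(utf8_upper_bound(5) == 15);
```

For ASCII-only constants the exact count is a third of the upper bound, which is the difference between staying under a stackalloc threshold and going over it.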

This is very much inspired by, and in the same theme as, #85328 (after which I painfully hit #93501 😢).

dotnet-policy-service · 2024-05-15T08:47:52Z

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

PaulusParssinen · 2024-05-15T08:48:07Z

The correct area is probably CodeGen-coreclr 😆

huoyaoyuan · 2024-05-15T09:06:34Z

For constant UTF-8 strings, is a u8 string literal an option for you? If not, what's preventing it?

PaulusParssinen · 2024-05-15T09:26:37Z

u8 is always the better choice if available.

The ReadUTF8 expansion seems to be able to handle any object handle that is IsKnownImmutable, e.g. frozen objects such as static readonly string fields. Also, in my original plan to simplify a UTF-8 string builder with a small custom interpolated string handler, this is a problem because string literals aren't passed as UTF-8 RVA spans but as normal strings (we don't have dotnet/csharplang#7072 yet).

runtime/src/coreclr/jit/helperexpansion.cpp, lines 1616 to 1617 in 426edd0:

    // We mostly expect string literal objects here, but let's be more agile just in case
    if (!info.compCompHnd->isObjectImmutable(strObj))

dotnet-policy-service · 2024-05-15T09:26:39Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo · 2024-05-15T09:27:46Z

I think it should be fairly trivial to intrinsify GetByteCount indeed

dotnet-issue-labeler bot added the area-System.Text.Encoding label May 15, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 15, 2024

EgorBo added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner area-System.Text.Encoding labels May 15, 2024

EgorBo self-assigned this May 15, 2024

EgorBo added this to the Future milestone May 15, 2024
