Intrinsify Encoding.UTF8.GetByteCount for constant UTF-16 input #102246

Open
PaulusParssinen opened this issue May 15, 2024 · 6 comments

@PaulusParssinen
Contributor

PaulusParssinen commented May 15, 2024

I have recently been staring at a lot of UTF-8 <-> UTF-16 manipulation code (in Garnet and ILCompiler name mangling 😆) and thought that if UTF8EncodingSealed.ReadUTF8 gets VN expansion when the JIT can see that the input is a constant (CNS), maybe the same could be done for Encoding.UTF8.GetByteCount too. It is most commonly used to calculate the buffer size for the subsequent (Try)GetBytes call (or to check whether the input fits into an existing buffer).

We of course have GetMaxByteCount for a fast upper-bound calculation, but since it is roughly str.Length * 3, the buffer lengths go over any stackalloc threshold very quickly.

This is very much inspired by, and in the same spirit as, #85328 (and then painfully hitting #93501 😢)
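
As a rough illustration of the gap described above, the following C++ sketch computes the exact UTF-8 byte count of a UTF-16 buffer and compares it with an upper bound of the form (charCount + 1) * 3. This is not the runtime's implementation; `utf8_byte_count` and `utf8_max_byte_count` are hypothetical names. It assumes an unpaired surrogate is replaced with U+FFFD, which takes 3 bytes in UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: exact UTF-8 byte count for a UTF-16 buffer.
// Assumption: an unpaired surrogate is replaced by U+FFFD (3 bytes).
constexpr std::size_t utf8_byte_count(std::u16string_view s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;  // ASCII
        } else if (c < 0x800) {
            bytes += 2;  // two-byte sequence
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;  // valid surrogate pair encodes one 4-byte scalar
            ++i;         // consume the low surrogate as well
        } else {
            bytes += 3;  // other BMP char, or lone surrogate -> U+FFFD
        }
    }
    return bytes;
}

// Hypothetical helper: upper bound of the form (charCount + 1) * 3.
constexpr std::size_t utf8_max_byte_count(std::size_t char_count) {
    return (char_count + 1) * 3;
}

// For a constant input, both values are known at compile time.
static_assert(utf8_byte_count(u"hello") == 5, "exact count");
static_assert(utf8_max_byte_count(5) == 18, "upper bound");
```

For the 5-character constant the exact count is 5 bytes while the bound is 18, which is why the bound pushes even short inputs past small stackalloc limits.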

@dotnet-policy-service bot added the untriaged label (New issue has not been triaged by the area owner) on May 15, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

@PaulusParssinen
Contributor Author

The correct area is probably CodeGen-coreclr 😆

@huoyaoyuan
Member

For constant UTF-8 strings, is a u8 string literal an option for you? If not, what's preventing it?

@EgorBo added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and removed the untriaged and area-System.Text.Encoding labels on May 15, 2024
@EgorBo self-assigned this on May 15, 2024
@EgorBo added this to the Future milestone on May 15, 2024
@PaulusParssinen
Contributor Author

PaulusParssinen commented May 15, 2024

u8 is always the better choice if available.

The ReadUTF8 expansion seems to be able to handle any object handle that is IsKnownImmutable, e.g. frozen objects such as static readonly string fields. u8 literals also don't help in my original plan of simplifying a UTF-8 string builder with a small custom interpolated string handler, because there the string literals aren't passed as UTF-8 RVA spans but as normal strings (we don't have dotnet/csharplang#7072 yet).

// We mostly expect string literal objects here, but let's be more agile just in case
if (!info.compCompHnd->isObjectImmutable(strObj))
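
The guard quoted above can be modeled in isolation. The following C++ sketch is only an illustration under stated assumptions, not JIT code: `StringObject`, `known_immutable` and `try_fold_utf8_byte_count` are hypothetical names. It folds the byte count to a constant only when the object is known to be immutable and otherwise signals that the runtime call should be kept. Unpaired surrogates are assumed to count as U+FFFD (3 bytes).

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for what a compiler knows about a string object.
struct StringObject {
    std::u16string_view chars;
    bool known_immutable;  // e.g. a literal or a frozen static readonly string
};

// Returns the folded constant, or std::nullopt to keep the runtime call.
std::optional<std::size_t> try_fold_utf8_byte_count(const StringObject& obj) {
    if (!obj.known_immutable) {
        return std::nullopt;  // contents may change, so nothing can be folded
    }
    const std::u16string_view s = obj.chars;
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        const bool pair = c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                          s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (pair) {
            bytes += 4;  // surrogate pair -> one 4-byte scalar
            ++i;         // skip the low surrogate
        } else {
            // 1 byte for ASCII, 2 below U+0800, otherwise 3
            // (lone surrogates are assumed to become U+FFFD).
            bytes += c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);
        }
    }
    return bytes;
}
```

A caller would use the returned value as a constant when it is present and emit the ordinary call when it is `std::nullopt`.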

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member

EgorBo commented May 15, 2024

I think it should indeed be fairly trivial to intrinsify GetByteCount.
