[wasm] Fast-track ASCII/UTF8 conversion in wasm #51310

Closed
@@ -19,8 +19,10 @@ public static OperationStatus TranscodeToUtf16(byte* pInputBuffer, int inputLeng
Debug.Assert(outputCharsRemaining >= 0, "Destination length must not be negative.");
Debug.Assert(pOutputBuffer != null || outputCharsRemaining == 0, "Destination length must be zero if destination buffer pointer is null.");

-var input = new ReadOnlySpan<byte>(pInputBuffer, inputLength);
-var output = new Span<char>(pOutputBuffer, outputCharsRemaining);
+// try fast-tracking ASCII first before falling back to the standard loop

@jkotas (Member) commented on Apr 15, 2021:

This means that this is going to fully fix the regression for ASCII-only payloads. The regression is still going to be there, to varying degrees, once the payload contains non-ASCII characters. What do the perf numbers look like for non-ASCII payloads?

Member:

What if we marked Rune.DecodeFromUtf8 and Rune.DecodeFromUtf16 as aggressive-inlining, at least on wasm? Each of those methods has some error-handling logic that tries to figure out how to recover from invalid data. But these immediate callers don't care about recovering from invalid data; they simply want to halt when it is encountered. If we inline those methods, in theory the error-recovery logic should be elided, because a failure just triggers the if (not success) { break; } block. That might result in a perf improvement, but I don't know offhand what such an improvement might look like.
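
To make the inlining suggestion concrete, here is a rough sketch of the pattern being described. It does not touch the real Rune methods: DecodeScalar is a hypothetical, deliberately simplified decoder (it only understands 1- and 2-byte sequences and reports everything else as invalid), and CountDecodableScalars is a hypothetical caller written in the halt-on-failure style of the transcoding loops. Whether a particular JIT, interpreter, or AOT compiler actually drops the recovery work after inlining is an assumption that would need measuring.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

static class InliningSketch
{
    // Hypothetical, simplified decoder (NOT Rune.DecodeFromUtf8): handles only
    // 1- and 2-byte UTF-8 sequences and reports anything else as InvalidData.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static OperationStatus DecodeScalar(ReadOnlySpan<byte> source, out int value, out int bytesConsumed)
    {
        if (!source.IsEmpty)
        {
            byte first = source[0];
            if (first < 0x80)
            {
                value = first;
                bytesConsumed = 1;
                return OperationStatus.Done;
            }
            if (first >= 0xC2 && first <= 0xDF && source.Length >= 2 && (source[1] & 0xC0) == 0x80)
            {
                value = ((first & 0x1F) << 6) | (source[1] & 0x3F);
                bytesConsumed = 2;
                return OperationStatus.Done;
            }
        }

        // "Error recovery": work out how many bytes a lenient caller should skip.
        // A caller that simply stops on failure never reads this result.
        value = 0xFFFD;
        bytesConsumed = 0;
        while (bytesConsumed < source.Length
            && (bytesConsumed == 0 || (source[bytesConsumed] & 0xC0) == 0x80))
        {
            bytesConsumed++;
        }
        return OperationStatus.InvalidData;
    }

    // Caller in the style of the transcoding loops: halt at the first failure.
    public static int CountDecodableScalars(ReadOnlySpan<byte> input)
    {
        int count = 0;
        while (!input.IsEmpty)
        {
            if (DecodeScalar(input, out _, out int consumed) != OperationStatus.Done)
            {
                break; // 'consumed' is unused on this path, so the recovery loop is dead once inlined
            }
            input = input.Slice(consumed);
            count++;
        }
        return count;
    }
}
```

The point of the sketch is only that, on the failure path, the caller discards everything the recovery loop computed, which is what would allow an optimizer to remove it after inlining.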

+int numAsciiBytesTranscoded = (int)ASCIIUtility.WidenAsciiToUtf16(pInputBuffer, pOutputBuffer, (uint)Math.Min(inputLength, outputCharsRemaining));
+var input = new ReadOnlySpan<byte>(pInputBuffer, inputLength).Slice(numAsciiBytesTranscoded);
+var output = new Span<char>(pOutputBuffer, outputCharsRemaining).Slice(numAsciiBytesTranscoded);

OperationStatus opStatus = OperationStatus.Done;
while (!input.IsEmpty)
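
For readers without access to the internal ASCIIUtility helpers, the shape of this change can be approximated with public APIs only. The sketch below is an illustration under assumptions, not the code in this PR: WidenLeadingAscii is a hypothetical scalar stand-in for ASCIIUtility.WidenAsciiToUtf16, and the body of the fallback loop is assumed, since the diff only shows its first line.

```csharp
using System;
using System.Buffers;
using System.Text;

static class AsciiFastPathSketch
{
    // Hypothetical stand-in for the internal ASCIIUtility.WidenAsciiToUtf16:
    // widens leading ASCII bytes to chars and returns how many were copied.
    static int WidenLeadingAscii(ReadOnlySpan<byte> source, Span<char> destination)
    {
        int limit = Math.Min(source.Length, destination.Length);
        int i = 0;
        while (i < limit && source[i] < 0x80)
        {
            destination[i] = (char)source[i];
            i++;
        }
        return i;
    }

    public static OperationStatus TranscodeToUtf16(
        ReadOnlySpan<byte> source, Span<char> destination,
        out int bytesRead, out int charsWritten)
    {
        // Fast path: consume the ASCII prefix in one tight loop.
        int asciiCount = WidenLeadingAscii(source, destination);
        ReadOnlySpan<byte> input = source.Slice(asciiCount);
        Span<char> output = destination.Slice(asciiCount);

        // Fallback (assumed shape): decode one scalar at a time and stop at the first problem.
        OperationStatus status = OperationStatus.Done;
        while (!input.IsEmpty)
        {
            status = Rune.DecodeFromUtf8(input, out Rune rune, out int consumed);
            if (status != OperationStatus.Done)
            {
                break;
            }
            if (!rune.TryEncodeToUtf16(output, out int written))
            {
                status = OperationStatus.DestinationTooSmall;
                break;
            }
            input = input.Slice(consumed);
            output = output.Slice(written);
        }

        bytesRead = source.Length - input.Length;
        charsWritten = destination.Length - output.Length;
        return status;
    }
}
```

Note that the fast path only helps up to the first non-ASCII byte; everything after that still goes through the per-scalar loop, which is the concern raised in the review comment above.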
@@ -49,9 +51,10 @@ public static OperationStatus TranscodeToUtf8(char* pInputBuffer, int inputLengt
Debug.Assert(outputBytesRemaining >= 0, "Destination length must not be negative.");
Debug.Assert(pOutputBuffer != null || outputBytesRemaining == 0, "Destination length must be zero if destination buffer pointer is null.");


-var input = new ReadOnlySpan<char>(pInputBuffer, inputLength);
-var output = new Span<byte>(pOutputBuffer, outputBytesRemaining);
+// try fast-tracking ASCII first before falling back to the standard loop
+int numAsciiCharsTranscoded = (int)ASCIIUtility.NarrowUtf16ToAscii(pInputBuffer, pOutputBuffer, (uint)Math.Min(inputLength, outputBytesRemaining));
+var input = new ReadOnlySpan<char>(pInputBuffer, inputLength).Slice(numAsciiCharsTranscoded);
+var output = new Span<byte>(pOutputBuffer, outputBytesRemaining).Slice(numAsciiCharsTranscoded);

OperationStatus opStatus = OperationStatus.Done;
while (!input.IsEmpty)
@@ -86,9 +89,12 @@ public static OperationStatus TranscodeToUtf8(char* pInputBuffer, int inputLengt
Debug.Assert(inputLength >= 0, "Input length must not be negative.");
Debug.Assert(pInputBuffer != null || inputLength == 0, "Input length must be zero if input buffer pointer is null.");

-var input = new ReadOnlySpan<byte>(pInputBuffer, inputLength);
-int cumulativeUtf16CodeUnitCount = 0;
-int cumulativeScalarValueCount = 0;
+// try fast-tracking ASCII first before falling back to the standard loop
+int numLeadingAsciiBytes = (int)ASCIIUtility.GetIndexOfFirstNonAsciiByte(pInputBuffer, (uint)inputLength);
+var input = new ReadOnlySpan<byte>(pInputBuffer, inputLength).Slice(numLeadingAsciiBytes);
+int cumulativeUtf16CodeUnitCount = numLeadingAsciiBytes;
+int cumulativeScalarValueCount = numLeadingAsciiBytes;

while (!input.IsEmpty)
{
if (Rune.DecodeFromUtf8(input, out Rune rune, out int bytesConsumed) != OperationStatus.Done)
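
The reason both counters can be seeded with numLeadingAsciiBytes is that every ASCII byte is exactly one scalar value and exactly one UTF-16 code unit. A minimal sketch of that counting logic with public APIs follows. IndexOfFirstNonAsciiByte is a hypothetical scalar stand-in for the internal ASCIIUtility.GetIndexOfFirstNonAsciiByte, and the rest of the loop body is assumed, because the hunk is cut off after the first line of the loop.

```csharp
using System;
using System.Buffers;
using System.Text;

static class CountingSketch
{
    // Hypothetical stand-in for the internal ASCIIUtility.GetIndexOfFirstNonAsciiByte.
    static int IndexOfFirstNonAsciiByte(ReadOnlySpan<byte> source)
    {
        int i = 0;
        while (i < source.Length && source[i] < 0x80)
        {
            i++;
        }
        return i;
    }

    // Returns false if the input contains invalid or incomplete UTF-8.
    public static bool TryCount(ReadOnlySpan<byte> utf8, out int utf16CodeUnits, out int scalars)
    {
        // Each leading ASCII byte contributes one scalar and one UTF-16 code unit.
        int leadingAscii = IndexOfFirstNonAsciiByte(utf8);
        ReadOnlySpan<byte> input = utf8.Slice(leadingAscii);
        utf16CodeUnits = leadingAscii;
        scalars = leadingAscii;

        while (!input.IsEmpty)
        {
            if (Rune.DecodeFromUtf8(input, out Rune rune, out int consumed) != OperationStatus.Done)
            {
                return false;
            }
            input = input.Slice(consumed);
            utf16CodeUnits += rune.Utf16SequenceLength; // 2 for supplementary-plane scalars
            scalars++;
        }
        return true;
    }
}
```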