-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: improve btoa
performance
#52427
Conversation
@anonrig I recommend trying something like this... We have three cases: one external one-byte string (easy), one non-external one byte string (we use WriteOneByte) and the general case where we go to 16-bit words, we try converting to latin1 and then we see what happens. In this scenario, there might be an error condition, currently in the code I suggest, it returns the empty string. static void Btoa(const FunctionCallbackInfo<Value>& args) {
CHECK_EQ(args.Length(), 1);
Environment* env = Environment::GetCurrent(args);
THROW_AND_RETURN_IF_NOT_STRING(env, args[0], "argument");
Local<String> input = args[0].As<String>();
MaybeStackBuffer<char> buffer;
size_t written;
if (input->IsExternalOneByte()) { // 8-bit case
auto ext = input->GetExternalOneByteStringResource();
size_t expected_length = simdutf::base64_length_from_binary(ext->length());
buffer.AllocateSufficientStorage(expected_length + 1);
buffer.SetLengthAndZeroTerminate(expected_length);
written = simdutf::binary_to_base64(
ext->data(), ext->length(), buffer.out(), simdutf::base64_default);
} if(input->IsOneByte()) {
MaybeStackBuffer<uint8_t> stack_buf(input->Length());
input->WriteOneByte(env->isolate(),
stack_buf.out(),
0,
input->Length(),
String::NO_NULL_TERMINATION);
size_t expected_length = simdutf::base64_length_from_binary(input->Length());
buffer.AllocateSufficientStorage(expected_length + 1);
buffer.SetLengthAndZeroTerminate(expected_length);
written = simdutf::binary_to_base64(
reinterpret_cast<const char*>(*stack_buf), input->Length(), buffer.out());
} else {
String::Value value(env->isolate(), input);
MaybeStackBuffer<char> stack_buf(value.length());
size_t out_len = simdutf::convert_utf16_to_latin1(reinterpret_cast<const char16_t *>(*value), value.length(), stack_buf.out());
if(out_len == 0) { // error
return args.GetReturnValue().SetEmptyString();
}
size_t expected_length = simdutf::base64_length_from_binary(out_len);
buffer.AllocateSufficientStorage(expected_length + 1);
buffer.SetLengthAndZeroTerminate(expected_length);
written = simdutf::binary_to_base64(
*stack_buf, out_len, buffer.out());
}
auto value =
String::NewFromOneByte(env->isolate(),
reinterpret_cast<const uint8_t*>(buffer.out()),
NewStringType::kNormal,
written)
.ToLocalChecked();
return args.GetReturnValue().Set(value);
} This should be considered ChatGTP-quality code: I did not even try to test it... It is only as an idea. |
@anonrig Lint. Lint. Lint. |
@lemire I've updated the benchmark results as well :-) |
@anonrig My proposed code would return the empty string on error, but we need to throw, I pushed a commit that does that. (Very minor change.) |
Commit Queue failed- Loading data for nodejs/node/pull/52427 ✔ Done loading data for nodejs/node/pull/52427 ----------------------------------- PR info ------------------------------------ Title buffer: improve `btoa` performance (#52427) Author Yagiz Nizipli (@anonrig) Branch anonrig:improve-btoa -> nodejs:main Labels buffer, c++, needs-ci Commits 2 - buffer: improve `btoa` performance - fix: return exception Committers 2 - Yagiz Nizipli - Daniel Lemire PR-URL: https://github.com/nodejs/node/pull/52427 Reviewed-By: Daniel Lemire Reviewed-By: James M Snell ------------------------------ Generated metadata ------------------------------ PR-URL: https://github.com/nodejs/node/pull/52427 Reviewed-By: Daniel Lemire Reviewed-By: James M Snell -------------------------------------------------------------------------------- ℹ This PR was created on Mon, 08 Apr 2024 17:24:56 GMT ✔ Approvals: 2 ✔ - Daniel Lemire (@lemire): https://github.com/nodejs/node/pull/52427#pullrequestreview-1992265694 ✔ - James M Snell (@jasnell) (TSC): https://github.com/nodejs/node/pull/52427#pullrequestreview-1993089038 ⚠ GitHub cannot link the author of 'fix: return exception' to their GitHub account. ⚠ Please suggest them to take a look at https://github.com/nodejs/node/blob/99b1ada/doc/guides/contributing/pull-requests.md#step-1-fork ✔ Last GitHub CI successful ℹ Last Full PR CI on 2024-04-11T02:25:23Z: https://ci.nodejs.org/job/node-test-pull-request/58258/ - Querying data for job/node-test-pull-request/58258/ ✔ Last Jenkins CI successful -------------------------------------------------------------------------------- ✔ No git cherry-pick in progress ✔ No git am in progress ✔ No git rebase in progress -------------------------------------------------------------------------------- - Bringing origin/main up to date... From https://github.com/nodejs/node * branch main -> FETCH_HEAD ✔ origin/main is now up-to-date - Downloading patch for 52427 From https://github.com/nodejs/node * branch refs/pull/52427/merge -> FETCH_HEAD ✔ Fetched commits as ee4fa77624f1..6900b0dce7cf -------------------------------------------------------------------------------- [main 50c914c521] buffer: improve `btoa` performance Author: Yagiz Nizipli Date: Mon Apr 8 13:23:56 2024 -0400 3 files changed, 81 insertions(+), 7 deletions(-) create mode 100644 benchmark/buffers/buffer-btoa.js [main de7c8dcb6a] fix: return exception Author: Daniel Lemire Date: Wed Apr 10 09:44:11 2024 -0400 2 files changed, 6 insertions(+), 2 deletions(-) ✔ Patches applied There are 2 commits in the PR. Attempting autorebase. Rebasing (2/4)https://github.com/nodejs/node/actions/runs/8645651485 |
Landed in 21211a3 |
PR-URL: #52427 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
PR-URL: #52427 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
Improves the performance of the legacy
btoa
function. It's not recommended to be used in production.Disclaimer
"btoa" is a legacy functionality, and not recommended to be used in production applications.
Thanks Matthieu Riegler for the poster