-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Base64::Decode performance #11467
Conversation
b91b899
to
91d81d4
Compare
VERIFY_ARE_EQUAL(true, success); | ||
VERIFY_ARE_EQUAL(L"foo", result); | ||
// NOTE: Modify testRounds to get the feeling of running a fuzz test on Base64::Decode. | ||
static constexpr auto testRounds = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've run this test 10 times with 1M rounds each.
It tests the decode function with valid base64 strings of random length between 0 and 128 bytes, with and without trailing =
.
What's missing are tests for invalid base64. However given the pointer arithmetic and lack of if conditions based on input in my loop I'm confident that it behaves correctly in such situations.
91d81d4
to
273f870
Compare
@@ -3255,7 +3255,7 @@ class StateMachineExternalTest final | |||
|
|||
pDispatch->_copyContent = L"UNCHANGED"; | |||
// Passing a non-base64 `Pd` param is illegal, won't change the content. | |||
mach.ProcessString(L"\x1b]52;;foo\x07"); | |||
mach.ProcessString(L"\x1b]52;;???\x07"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"foo"
is actually a valid base64 string according to RFC 4648.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever.
@@ -228,6 +228,8 @@ cfae | |||
Cfg | |||
cfie | |||
cfiex | |||
Cfj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting -- I wonder if we should not add base64 snippets into the expect list and instead pattern or file exclude them?
src/terminal/parser/base64.cpp
Outdated
do \ | ||
{ \ | ||
const auto n = decodeTable[ch & 0x7f]; \ | ||
error |= (ch | n) & 0xff80; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wat
how does this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay so error is effectively, "is the loaded value 255 (the invalid sentinel in the table) or the character larger than 7 bits (clearly invalid ASCII)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments explaining this.
} | ||
|
||
switch (state) | ||
switch (ri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ri
=> remainder index
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I called r
because it's the accumulation "register" (since on a CPU level r
will almost certainly live inside a register most of the time).
And since i
is usually a counter for something I called it ri
, the "register index-counter-thingy".
"remainder index" however is much better. I'll steal that. 😄
This comment has been minimized.
This comment has been minimized.
// Decodes an UTF8 string encoded with RFC 4648 (Base64) and returns it as UTF16 in dst. | ||
// It supports both variants of the RFC (base64 and base64url), but | ||
// throws an error for non-alphabet characters, including newlines. | ||
// * Throws an exception for all invalid base64 inputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well,
// It supports both variants of the RFC (base64 and base64url), but | ||
// throws an error for non-alphabet characters, including newlines. | ||
// * Throws an exception for all invalid base64 inputs. | ||
// * Doesn't support whitespace and will throw an exception for such strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i bet that this will come bite us later, but i am willing to take that risk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it’s "return ERROR_INVALID_DATA" now. I‘ll update the comment.
Is there anything else you’re worried about?
for (auto& i : randomData) | ||
{ | ||
i = rng(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This almost felt like a std::generate
moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could‘ve used std::generate
but that would force me to add calls to std::begin
, std::end
and std::ref
as well just to generate worse assembly (which really doesn’t matter here).
So I left it as a simple loop which I found to be much more readable. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're right. But it's fun to dream.
Hello @zadjii-msft! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
🎉 Handy links: |
This commit renames
Base64::s_Decode
intoBase64::Decode
and improves itsaverage performance on short strings of less than 200 characters by 4.5x.
This is achieved by implementing a classic base64 decoder that reads 4
characters at a time and produces 3 output bytes. Furthermore a small
128 byte lookup table is used to quickly map characters to values.
PR Checklist
Validation Steps Performed
printf "\033]52;c;aHR0cHM6Ly9naXRodWIuY29tL21pY3Jvc29mdC90ZXJtaW5hbC9wdWxsLzExNDY3\a"
https://github.com/microsoft/terminal/pull/11467
✔️