-
Notifications
You must be signed in to change notification settings - Fork 642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(encoding): improve base32 encode/decode performance #6479
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6479 +/- ##
==========================================
- Coverage 95.32% 95.29% -0.03%
==========================================
Files 577 576 -1
Lines 43417 43256 -161
Branches 6485 6467 -18
==========================================
- Hits 41386 41222 -164
- Misses 1992 1995 +3
Partials 39 39 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@@ -37,9 +37,9 @@ Deno.test({ | |||
name: "decodeBase32() throws on bad length", | |||
fn() { | |||
assertThrows( | |||
() => decodeBase32("OOOO=="), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of this input seems changed by this PR. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The way the new code detects errors is different where the old input would be valid. The old input had the wrong length based off the encoding + padding but the new code doesn't care about the padding as long as it isn't too much padding. Four encoded characters is valid here while three is invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new code doesn't care about the padding as long as it isn't too much padding.
I'm not sure if we should do that change. I checked the standard libraries of some other languages, but they seem checking padding length. Probably we should keep that check?
$ cat enc.py
import base64
base64.b32decode("OOOO==")
$ python3 enc.py
Traceback (most recent call last):
File "/Users/kt3k/denoland/std/enc.py", line 2, in <module>
base64.b32decode("OOOO==")
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/base64.py", line 254, in b32decode
return _b32decode(_b32alphabet, s, casefold, map01)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/base64.py", line 210, in _b32decode
raise binascii.Error('Incorrect padding')
binascii.Error: Incorrect padding
$ cat main.go
package main
import (
"encoding/base32"
"fmt"
)
func main() {
enc := base32.NewEncoding("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")
res, e := enc.DecodeString("OOOO==")
if e != nil {
fmt.Printf("%s", e)
} else {
fmt.Printf("%s", res)
}
}
$ go run main.go
illegal base32 data at input byte 6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was more written this way for convince. The code for it is here: https://github.com/denoland/std/blob/main/encoding/_common32.ts#L107-L119 where it really is just seeing how much padding it needs to slice off before doing the actual decoding.
I can understand wanting to be in align with other std's but I also don't see a benefit in being strict here with the padding when decoding when you know what the missing bytes are.
The change to base64 operates in the same manner here. Where base64url may allow omitting padding, but base64 requires it, the decoder doesn't care if padding is short
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this kind of change needs more broader consensus from the community and core team, and I'm not in favor of landing this as part of perf
change. Can you add back the simple check of length like the below and create an issue for discussing that?:
if (len % 8 > 0) {
throw new Error(
`Cannot decode base32 string as the length must be a multiple of 8: received length ${len}`,
);
}
The change to base64 operates in the same manner here. Where base64url may allow omitting padding, but base64 requires it, the decoder doesn't care if padding is short
Base64 decoder seems ignoring missing padding before the recent rewrite (#6461). That might be a good reason to remove padding length check from decodeBase32
. Anyway I think we need to discuss this topic independently somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work!
This pull request brings the performance added in #6471 to the stable base32 functions, but in a non-breaking manner. The only difference a user would find is a difference in error message if providing an invalid encoded string.