-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore spacing chars for base64decode #32422
Conversation
stdlib/Base64/src/decode.jl
Outdated
@@ -209,6 +209,7 @@ julia> String(b) | |||
``` | |||
""" | |||
function base64decode(s) | |||
s = replace(s, r"\s" => "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is not ideal since it copies the data. Would be best to handle it during decoding. I also believe it is only necessary to handle ASCII space and newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe do a || (space) || (newline)
somwhere in here?
julia/stdlib/Base64/src/decode.jl
Line 67 in b56a9f0
if b1 < 0x40 && b2 < 0x40 && b3 < 0x40 && b4 < 0x40 && p + 2 < p_end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I wrote the original code of this implementation to improve the performance based on my package (https://github.com/bicycle1885/CodecBase.jl). The implementation in this package ignores whitespace characters so it may help here. Otherwise, I can fix the issue instead of you if you are not in a rush.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think I can pull this one off, Of course it would be easier if you can give me some hint on which part of the code is used to ignore invalid characters when decoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the code well, but I think you can modify the decoding behavior just by modifying its look-up table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that was the case in your CodecBase but not in Julia's code base.
Will need a test of course. |
Fun fact, python does this to handle invalid characters: |
8694ea5
to
05de117
Compare
05de117
to
a9ee6a7
Compare
@JeffBezanson I have the test covered, and I don't think it's worth to do in the decoding first because there are copying and allocation anyways (if we need to check if a char is valid or not), second because I don't think people pass ~GB of data into base64decode (bonus, regular expression is okally fast?) |
I disagree, and I also realized another problem which is that a user can directly call |
fixes #32397