Ignore spacing chars for base64decode #32422

Moelf · 2019-06-26T15:52:53Z

JeffBezanson · 2019-06-26T15:58:02Z

stdlib/Base64/src/decode.jl

@@ -209,6 +209,7 @@ julia> String(b)
 ```
 """
 function base64decode(s)
+    s = replace(s, r"\s" => "")
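
For illustration, here is a minimal sketch of what the one-line change does, written as a standalone wrapper. It assumes Julia 1.x with the Base64 stdlib. `strip_then_decode` is a hypothetical name and is not part of Base64. Note that `replace` builds a complete whitespace-free copy of the input before any decoding starts, which is the copying cost raised in the review.

```julia
using Base64

# Hypothetical wrapper mirroring the PR's approach: remove every regex
# whitespace match (\s), then pass the cleaned string to the stock decoder.
# replace allocates a new String roughly the size of the input first.
strip_then_decode(s::AbstractString) = base64decode(replace(s, r"\s" => ""))

# "Hello, world!" encoded and wrapped across lines, as MIME-style base64 often is.
encoded = "SGVsbG8s\nIHdvcmxk\nIQ=="
text = String(strip_then_decode(encoded))
println(text)
```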


This implementation is not ideal since it copies the data. It would be best to handle whitespace during decoding. I also believe it is only necessary to handle ASCII space and newline.

Moelf · 2019-06-26

So maybe add a || (space) || (newline) check somewhere in here?

julia/stdlib/Base64/src/decode.jl

Line 67 in b56a9f0

if b1 < 0x40 && b2 < 0x40 && b3 < 0x40 && b4 < 0x40 && p + 2 < p_end
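
To make the suggestion concrete without touching the stdlib, the following is a minimal, self-contained sketch of a decoder that skips ASCII space and newline inside its loop, so no whitespace-free copy of the input is made. It is not the fast path in decode.jl. The names `b64value` and `decode_skip_ws!` are hypothetical. The sketch decodes one character at a time, writes into a caller-supplied buffer, and stops at the first '=' without validating padding.

```julia
# Hypothetical helper: 6-bit value of a base64 alphabet byte, or 0xff if the
# byte is not in the standard alphabet (A-Z, a-z, 0-9, '+', '/').
function b64value(c::UInt8)
    if UInt8('A') <= c <= UInt8('Z')
        return c - UInt8('A')
    elseif UInt8('a') <= c <= UInt8('z')
        return c - UInt8('a') + 0x1a
    elseif UInt8('0') <= c <= UInt8('9')
        return c - UInt8('0') + 0x34
    elseif c == UInt8('+')
        return 0x3e
    elseif c == UInt8('/')
        return 0x3f
    else
        return 0xff
    end
end

# Hypothetical decoder: returns the number of bytes written to `out`.
# ASCII space and newline are skipped in place, so `input` is never copied.
function decode_skip_ws!(out::Vector{UInt8}, input::AbstractVector{UInt8})
    acc = UInt32(0)  # bit accumulator; old high bits fall off as it shifts
    nbits = 0        # number of not-yet-emitted bits at the bottom of acc
    n = 0            # bytes written so far
    for c in input
        if c == UInt8(' ') || c == UInt8('\n')
            continue  # the (space) || (newline) check suggested above
        elseif c == UInt8('=')
            break     # padding reached: stop decoding
        end
        v = b64value(c)
        v == 0xff && throw(ArgumentError("invalid base64 character: $(repr(Char(c)))"))
        acc = (acc << 6) | v
        nbits += 6
        if nbits >= 8
            nbits -= 8
            n += 1
            n <= length(out) || throw(ArgumentError("output buffer too small"))
            out[n] = UInt8((acc >> nbits) & 0xff)
        end
    end
    return n
end

out = Vector{UInt8}(undef, 16)
n = decode_skip_ws!(out, codeunits("SGVsbG8s\nIHdvcmxk IQ=="))
text = String(out[1:n])
println(text)
```

The extra comparisons cost a branch per input byte, which is the trade-off against the copy made by the regex approach.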

bicycle1885 · 2019-06-26

Hi, I wrote the original code of this implementation to improve performance, based on my package (https://github.com/bicycle1885/CodecBase.jl). The implementation in that package ignores whitespace characters, so it may help here. Otherwise, I can fix the issue myself if you are not in a rush.

Moelf · 2019-06-26

Thanks, I think I can pull this one off. Of course, it would be easier if you could give me a hint about which part of the code ignores invalid characters when decoding.

bicycle1885 · 2019-06-26

I don't remember the code well, but I think you can modify the decoding behavior just by modifying its lookup table.

Moelf · 2019-06-26

I think that was the case in your CodecBase, but not in Julia's code base.
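
The lookup-table idea can be sketched as follows: give whitespace its own sentinel code in a 256-entry table, so the decoding loop does one table lookup per byte and never compares against specific characters. This is a hypothetical, self-contained illustration. `DECODE_TABLE`, `CODE_SKIP`, `CODE_PAD`, `CODE_INVALID`, and `decode_table!` are made-up names, and the sketch does not reproduce CodecBase.jl or Julia's decode.jl, whose table may be organized differently.

```julia
# Hypothetical sentinel codes stored next to the 6-bit values 0x00-0x3f.
const CODE_INVALID = 0xff
const CODE_SKIP    = 0xfe  # whitespace: silently ignored
const CODE_PAD     = 0xfd  # '='

# 256-entry table indexed by (byte value + 1). Only ASCII space and newline are
# marked as skippable here; '\r' or '\t' would be one more entry each.
const DECODE_TABLE = let t = fill(CODE_INVALID, 256)
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    for (i, c) in enumerate(codeunits(alphabet))
        t[c + 1] = UInt8(i - 1)
    end
    t[UInt8(' ') + 1] = CODE_SKIP
    t[UInt8('\n') + 1] = CODE_SKIP
    t[UInt8('=') + 1] = CODE_PAD
    t
end

# Hypothetical decoder: one table lookup per input byte; the loop branches only
# on the looked-up code, never on particular characters.
function decode_table!(out::Vector{UInt8}, input::AbstractVector{UInt8})
    acc = UInt32(0)  # bit accumulator
    nbits = 0        # not-yet-emitted bits at the bottom of acc
    n = 0            # bytes written so far
    for c in input
        v = DECODE_TABLE[c + 1]
        if v == CODE_SKIP
            continue
        elseif v == CODE_PAD
            break
        elseif v == CODE_INVALID
            throw(ArgumentError("invalid base64 character: $(repr(Char(c)))"))
        end
        acc = (acc << 6) | v
        nbits += 6
        if nbits >= 8
            nbits -= 8
            n += 1
            n <= length(out) || throw(ArgumentError("output buffer too small"))
            out[n] = UInt8((acc >> nbits) & 0xff)
        end
    end
    return n
end

out = Vector{UInt8}(undef, 16)
n = decode_table!(out, codeunits("SGVsbG8s\nIHdvcmxk IQ=="))
text = String(out[1:n])
println(text)
```

With this layout, changing which characters are ignored only touches table construction, not the decoding loop.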

JeffBezanson · 2019-06-26T16:08:00Z

This will need a test, of course.

Moelf · 2019-06-26T17:13:50Z

Fun fact: Python does this to handle invalid characters:
https://github.com/python/cpython/blob/5150d327924959639215ed0a78feffc0d88258da/Lib/base64.py#L85

Moelf · 2019-06-26T20:26:24Z

@JeffBezanson I have the test covered. I don't think it's worth doing this during decoding: first, there is copying and allocation anyway (if we need to check whether each char is valid); second, I don't think people pass ~GB of data into base64decode (bonus: regular expressions are reasonably fast?).

JeffBezanson · 2019-06-26T21:48:16Z

I disagree. I also realized another problem: a user can call readbytes!(Base64DecodePipe(data), vector) directly, where the same issue needs to be handled. Plus, vector is pre-allocated, so no allocation necessarily occurs.
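
The entry point mentioned here can be exercised directly. In this sketch, which assumes Julia 1.x with the Base64 stdlib and uses a well-formed input, the caller streams through Base64DecodePipe into a vector it allocated itself. base64decode is never called, so a whitespace-stripping replace placed inside base64decode would never see this input.

```julia
using Base64

# The caller owns `data` (any IO) and pre-allocates `vector`.
data = IOBuffer("SGVsbG8sIHdvcmxkIQ==")   # base64 of "Hello, world!"
vector = Vector{UInt8}(undef, 13)
# Decode straight from the stream into the caller's buffer.
nread = readbytes!(Base64DecodePipe(data), vector)
println(String(vector[1:nread]))
```

Handling whitespace inside the decoder itself would cover both this path and base64decode.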

JeffBezanson reviewed Jun 26, 2019


Moelf force-pushed the base64decode_space branch from 8694ea5 to 05de117 June 26, 2019 20:23

Ignore spacing chars for base64decode and test

a9ee6a7

Moelf force-pushed the base64decode_space branch from 05de117 to a9ee6a7 June 26, 2019 20:24

Moelf closed this Jul 4, 2019

Moelf deleted the base64decode_space branch July 4, 2019 06:45


Moelf commented Jun 26, 2019

JeffBezanson Jun 26, 2019

Moelf Jun 26, 2019 (edited)

bicycle1885 Jun 26, 2019

Moelf Jun 26, 2019

bicycle1885 Jun 26, 2019

Moelf Jun 26, 2019

JeffBezanson commented Jun 26, 2019

Moelf commented Jun 26, 2019

Moelf commented Jun 26, 2019

JeffBezanson commented Jun 26, 2019
