Unable to decode base 64 string #959

Closed
ManuISEN opened this issue Jul 26, 2024 · 5 comments · Fixed by #960
Labels
type: bug (A code related bug)
vrl: stdlib (Changes to the standard library)

Comments

@ManuISEN

I'm unable to use the VRL decode_base64 function with the following string

$ decode_base64!("eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy91bnN0cnVjdF9ldmVudC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9saW5rX2NsaWNrL2pzb25zY2hlbWEvMS0wLTEiLCJkYXRhIjp7InRhcmdldFVybCI6Imh0dHBzOi8vaWRwLWF1dGguZ2FyLmVkdWNhdGlvbi5mci9kb21haW5lR2FyP2lkRU5UPVNqQT0maWRTcmM9WVhKck9pODBPRFUyTmk5d2RERTRNREF3TVE9PSIsImVsZW1lbnRJZCI6IiIsImVsZW1lbnRDbGFzc2VzIjpbImxpbmstYnV0dG9uIiwidHJhY2tlZCJdLCJlbGVtZW50VGFyZ2V0IjoiX2JsYW5rIn19fQ")
function call error for "decode_base64" at (0:500): unable to decode value to base64

I can decode it outside VRL. It seems related to the URL in it.

Here is the string as decoded with Notepad++:

{"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1","data":{"targetUrl":"https://idp-auth.gar.education.fr/domaineGar?idENT=SjA=&idSrc=YXJrOi80ODU2Ni9wdDE4MDAwMQ==","elementId":"","elementClasses":["link-button","tracked"],"elementTarget":"_blank"}}}

@jszwedko
Member

Interesting, it seems like the issue is lack of padding. For example:

decode_base64!("eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy91bnN0cnVjdF9ldmVudC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9saW5rX2NsaWNrL2pzb25zY2hlbWEvMS0wLTEiLCJkYXRhIjp7InRhcmdldFVybCI6Imh0dHBzOi8vaWRwLWF1dGguZ2FyLmVkdWNhdGlvbi5mci9kb21haW5lR2FyP2lkRU5UPVNqQT0maWRTcmM9WVhKck9pODBPRFUyTmk5d2RERTRNREF3TVE9PSIsImVsZW1lbnRJZCI6IiIsImVsZW1lbnRDbGFzc2VzIjpbImxpbmstYnV0dG9uIiwidHJhY2tlZCJdLCJlbGVtZW50VGFyZ2V0IjoiX2JsYW5rIn19fQ==") 

Works. I think decode_base64 should work regardless of padding though so this seems like a bug.
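As an illustration of the padding rule discussed above, here is a minimal sketch of a helper that restores the missing `=` characters before decoding. The name `pad_base64` is hypothetical: it is not part of VRL or of the Rust base64 crate, and it only shows why appending `==` made the call succeed.

```rust
/// Hypothetical helper: append `=` so the length becomes a multiple of 4.
/// Returns `None` when the unpadded length is 1 mod 4, because no valid
/// base64 encoding can produce such a length.
fn pad_base64(input: &str) -> Option<String> {
    // Drop any padding that is already present so we can recompute it.
    let trimmed = input.trim_end_matches('=');
    match trimmed.len() % 4 {
        0 => Some(trimmed.to_string()),
        1 => None,
        // A remainder of 2 needs "==", a remainder of 3 needs "=".
        rem => Some(format!("{}{}", trimmed, "=".repeat(4 - rem))),
    }
}
```

The string in this issue ends in `fQ`, and the version that decodes ends in `fQ==`, which is what the remainder-of-2 case would produce for an unpadded input of that shape.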

jszwedko added the type: bug and vrl: stdlib labels Jul 26, 2024
jszwedko added a commit that referenced this issue Jul 26, 2024
Fixes: #959

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
github-merge-queue bot pushed a commit that referenced this issue Jul 26, 2024
* fix(stdlib): `decode_base64` shouldn't require padding

Fixes: #959

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* rename changelog fragment

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
@ManuISEN
Author

Hello @jszwedko,
Thanks for your quick reply.
Do you plan to push the fix in a new release?

@jszwedko
Member

Hi @ManuISEN,

Yes, it'll be included in the next Vector release (v0.41.0). It was just a bit too late to make it into v0.40.0.

@shaeqahmed

@jszwedko I've copied the function from the latest code, and although decode_base64 can now parse the input from this issue regardless of padding, it still fails on this input:

decode_base64!("VmVjdG9yIFNjb3JlOiAxMCwgREVOWSB0aHJlc2hvbGQ6IDksIEFsZX")

Modifying the base64::engine::general_purpose::GeneralPurposeConfig to add .with_decode_allow_trailing_bits(true) fixes it, and online decoders seem to decode this base64 string just fine.

From docs: [...] decode base64 produced by a buggy encoder that has bits set in the unused space on the last base64 character as per forgiving-base64 decode. If invalid trailing bits are present and this is true, those bits will be silently ignored, else DecodeError::InvalidLastSymbol will be emitted.

.with_decode_allow_trailing_bits(true)

I'm wondering whether this should be an optional parameter in VRL's stdlib decode_base64, or simply always enabled to make the parsing more tolerant. This example is taken from Akamai logs, which contain base64-encoded strings. If you are open to such a change, I'm happy to make a PR. Thanks.
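To make the "trailing bits" case concrete, here is a small hand-rolled sketch of a padding-optional decoder with a strict and a forgiving mode. It is not the base64 crate's implementation, and the function names `sextet` and `decode` are assumptions chosen for illustration. The input above ends in `ZX`: those two characters carry 12 bits, of which 8 form the byte `e` and the remaining 4 are `0111` rather than zero, which is what a strict decoder rejects.

```rust
/// Map one standard-alphabet base64 character to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Hypothetical decoder: padding is optional. When `allow_trailing_bits`
/// is false, unused bits in the final character must be zero.
fn decode(input: &str, allow_trailing_bits: bool) -> Option<Vec<u8>> {
    let data = input.trim_end_matches('=').as_bytes();
    let mut out = Vec::with_capacity(data.len() * 3 / 4);
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of not-yet-emitted bits in `acc`
    for &c in data {
        acc = (acc << 6) | sextet(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    // A lone final character (6 bits) cannot encode a whole byte.
    if bits == 6 {
        return None;
    }
    // Non-zero leftover bits are the "invalid trailing bits" case.
    if acc != 0 && !allow_trailing_bits {
        return None;
    }
    Some(out)
}
```

Under these assumptions, `decode(input, false)` returns `None` for the Akamai string while `decode(input, true)` yields the bytes of `Vector Score: 10, DENY threshold: 9, Ale`, which suggests the value was truncated mid-group rather than produced by a buggy encoder. Note that this sketch also tolerates excess `=` characters, which a stricter decoder would reject.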

@jszwedko
Member

jszwedko commented Aug 9, 2024

Thanks for testing this out @shaeqahmed !

I would be ok with adding an option to allow trailing bits and defaulting it to true. A PR would be welcome :)
