Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base64 encode message before sending to WebView #2854

Closed
wants to merge 2 commits into from

Conversation

borisyankov
Copy link
Contributor

@borisyankov borisyankov commented Aug 1, 2018

This fixes the infamous red errors in the WebView with the very misleading error numbers.

Messages sent on Android are URL decoded so a message containing %22 (or maybe several other similar sequences) are interpreted and produce invalid data.

See the commits for more details.

Once I've figured out the issue I managed to search for and identify multiple bug reports to React Native that relate to this. (our JS code not being able to contain // comments is related)

There were multiple bug fix PR attempts, but none got merged.

Related reported issues:
facebook/react-native#20069
facebook/react-native#19611

Related bug-fix PRs:
facebook/react-native#17394
facebook/react-native#19655
facebook/react-native#20366

@gnprice gnprice added the upstream: RN Issues related to an issue in React Native label Aug 1, 2018
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! 🎉 Thanks for the diagnosis -- this will be really great to have fixed.

Would you link to one or more of those upstream bugs? That'd help for cross-reference.

One comment below on the internal API.

@@ -32,6 +32,9 @@ export const base64ToHex = (bytes: string) => asciiToHex(base64.decode(bytes));

export const hexToBase64 = (hex: string) => base64.encode(hexToAscii(hex));

export const objToBase64 = (obj: Object | Array<*>): string =>
base64.encode(unescape(encodeURIComponent(JSON.stringify(obj))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty funny-looking sequence (encode, then unescape, then different encode?). It should have a comment pointing to an explanation. :-)

This one is pretty good:
https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding

Also, I'd rather separate this into two pieces:

  • JSON.stringify -- the caller can do this, it knows its business
  • this funny dance to turn what's already a perfectly good string, into something that will be inert under percent-decoding and basically any other reinterpretation.

I'd also want the name to be a bit more specific, because there's more going on to what this encoding means than "convert to base64". It's converting to base64 in a very particular way, which is that it's turning the string into UTF-8 (through this hack of unescape(encodeURIComponent(...))), and base64-encoding that bytestream. If the code interpreting this only knew "it's base64", that wouldn't be enough to interpret it correctly with confidence.

So I think the primitive I'd want here is something like this:

export const base64Utf8Encode = (text: string): string => base64.encode(unescape(encodeURIComponent(text)))

@gnprice
Copy link
Member

gnprice commented Aug 1, 2018

Also, the commit message lists five of our issues:

#2538 #2505 #2538 #2558 #2751

Those are actually four issues, as there's a duplicate :). And #2751 says it happens on iOS too -- so perhaps that one isn't fixed by this, and is something else?

@borisyankov borisyankov force-pushed the encode-webview-data branch 2 times, most recently from 5221609 to 4a5df0d Compare August 2, 2018 07:15
The function encodes a JavaScript string to a string in Base64 format.

The way to reliably encode strings to Base64 is described in
many places, including:
https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/btoa#Unicode_strings

Instead of the `btoa` function we use `base64.encode` from our
external base64 library to prevent the error:
'InvalidCharacterError: The string to be encoded contains
characters outside of the Latin1 range.'

Note: JavaScript strings are UCS-2 or UTF-16 (not UTF-8)
http://es5.github.io/x2.html
Fixes zulip#2505 zulip#2538 zulip#2558

When using `postMessage` to communicate with the WebView, depending
on the content of the message we send, an exception might be thrown.

A minimal but reliable way to reproduce the issue is to send
the string `'%22'`. Other character combinations might also cause issues.

Messages are sent to the WebView in different ways for iOS/Android.
This explains why the issue this is fixing is Android-specific.

In iOS, a custom navigation scheme is used to pass the data into
the webview (the RCTJSNavigationScheme constant one can grep for)

More interesting source code reading on that can be done if one
looks at `webViewDidFinishLoad` in `RTCWebView.m`.

The Android messaging happens in `receiveCommand` in `ReactWebViewManager.java`
It is passed by navigating to a URL of the type `javascript:(.....)`
which is a standard way of injecting JavaScript into webpages.

The issue comes from the fact that `loadUrl` since Android 4.4 does
an URL decode on the string passed to it:

https://developer.android.com/reference/android/webkit/WebView#loadUrl(java.lang.String)

`evaluateJavascript` does not:

https://developer.android.com/reference/android/webkit/WebView.html#evaluateJavascript(java.lang.String,%20android.webkit.ValueCallback%3Cjava.lang.String%3E)
@borisyankov
Copy link
Contributor Author

I did add several links to issues and bugs.
Also removed the duplicate issue and the one that is iOS-related (and I can confirm is not fixed by this)

Also renamed the encoding function. I did not call it base64Utf8Encode as the JS string format is not UTF-8 but one of the two UCS-2 or UTF-16 thus called it strToBase64; added the link to MDN and the spec for string encoding.

@gnprice
Copy link
Member

gnprice commented Aug 2, 2018

Thanks!

I did not call it base64Utf8Encode as the JS string format is not UTF-8 but one of the two UCS-2 or UTF-16 thus called it strToBase64; added the link to MDN and the spec for string encoding.

Right, so, this is why what it's doing is kind of odd and it's good to be explicit:

What text => unescape(encodeURIComponent(text)) does is precisely to take the Unicode string that we had that is a nice normal JS string (with UTF-16 or UCS-2 internally, but that doesn't matter for this particular API)... and encode it as UTF-8. That is, anywhere that the string has a codepoint with a value greater than 255, it replaces that by a sequence of characters each no more than 255, and it uses UTF-8 to do it.

The reason that's what happens is that encodeURIComponent encodes things as UTF-8, and follows up by encoding those bytes with percent-encoding. The job of the unescape is to undo the percent-encoding; but it doesn't undo the UTF-8, it just leaves us with a sequence of characters that each represent one of those bytes of UTF-8.

Here's an example, which I did just now in the Chrome devtools console:

> '😈';
"😈"
> encodeURIComponent('😈');
"%F0%9F%98%88"
> unescape(encodeURIComponent('😈'));
"�"
> unescape(encodeURIComponent('😈')) === '\xf0\x9f\x98\x88';
true

And in fact the four bytes f0 9f 98 88 are the UTF-8 encoding of U+1f608, which is...

$ python3 -c 'print("\U0001f608")'
😈

@gnprice
Copy link
Member

gnprice commented Aug 2, 2018

And perhaps the other half of that explanation: the reason that that's what we want to happen, or anyway what we choose to have happen, is that unfortunate error "The string to be encoded contains characters outside of the Latin1 range." from either btoa or the equivalent from base64.encode.

In other words, what they really want to be encoding is bytes. They take their input in the form of a JS string, because that's the type that's available... but then they insist that all the characters in the input actually have values no more than 255, i.e. they can interpret their values as bytes. And one way of saying that is that they need to be "in the Latin1 range", because iso-8859-1 aka latin-1 is the character set that consists of the characters in the low 256 codepoints of Unicode (i.e. 0 to 255), each represented as the byte with the same value.


test('supports unicode characters', () => {
const obj = { key: '😇😈' };
const expected = 'W29iamVjdCBPYmplY3Rd';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does not test what you want it to test: :-p

$ echo W29iamVjdCBPYmplY3Rd | base64 -d ; echo '$'
[object Object]$

gnprice pushed a commit that referenced this pull request Aug 2, 2018
Fixes #2505, #2538, #2558.

When using `postMessage` to communicate with the WebView, depending
on the content of the message we send, an exception might be thrown.

A minimal but reliable way to reproduce the issue is to send the
string `'%22'`.  Other character combinations might also cause issues.

To work around the issue, we encode our data in the nice boring
characters of base64 so that it doesn't get reinterpreted.

---

More details on how React Native sends messages to the WebView:

The logic is different on iOS vs. Android.  This explains why the
issue this is fixing is Android-specific.

On iOS, a custom navigation scheme is used to pass the data into the
webview; search the RN source for `RCTJSNavigationScheme`, and see the
implementation of `webViewDidFinishLoad` in `RTCWebView.m`.

The Android messaging happens in `receiveCommand` in
`ReactWebViewManager.java`, by navigating to a URL of the form
`javascript:(.....)`, which is a standard way of injecting JavaScript
into web pages.

The issue comes from the fact that since Android 4.4, `loadUrl` does
a URL decode on the string passed to it:
  https://issuetracker.google.com/issues/36995865

See #2854 for links to upstream RN bug reports and PRs.

[greg: lightly revised commit message; found better reference for the
 `loadUrl` issue]
@gnprice
Copy link
Member

gnprice commented Aug 2, 2018

Merged as 787c4a5 -- thanks again!

@gnprice gnprice closed this Aug 2, 2018
gnprice pushed a commit that referenced this pull request Aug 8, 2018
Fixes #2505, #2538, #2558.

When using `postMessage` to communicate with the WebView, depending
on the content of the message we send, an exception might be thrown.

A minimal but reliable way to reproduce the issue is to send the
string `'%22'`.  Other character combinations might also cause issues.

To work around the issue, we encode our data in the nice boring
characters of base64 so that it doesn't get reinterpreted.

---

More details on how React Native sends messages to the WebView:

The logic is different on iOS vs. Android.  This explains why the
issue this is fixing is Android-specific.

On iOS, a custom navigation scheme is used to pass the data into the
webview; search the RN source for `RCTJSNavigationScheme`, and see the
implementation of `webViewDidFinishLoad` in `RTCWebView.m`.

The Android messaging happens in `receiveCommand` in
`ReactWebViewManager.java`, by navigating to a URL of the form
`javascript:(.....)`, which is a standard way of injecting JavaScript
into web pages.

The issue comes from the fact that since Android 4.4, `loadUrl` does
a URL decode on the string passed to it:
  https://issuetracker.google.com/issues/36995865

See #2854 for links to upstream RN bug reports and PRs.

[greg: lightly revised commit message; found better reference for the
 `loadUrl` issue]
@gnprice gnprice mentioned this pull request Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-Android bug upstream: RN Issues related to an issue in React Native
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants