Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Router fix #2061

Closed
wants to merge 2 commits into from
Closed

Router fix #2061

wants to merge 2 commits into from

Conversation

dead-claudia
Copy link
Member

@dead-claudia dead-claudia commented Dec 26, 2017

Description

Motivation and Context

Fixes part of #2060, and needs backported.

The original bug seems to make a case for security with this, but I doubt exploiting the bug could do any more than just play a practical joke on people.

How Has This Been Tested?

Added a few new tests on both the API and router service

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I have updated docs/change-log.md

@dead-claudia dead-claudia requested a review from pygy December 26, 2017 20:17
@dead-claudia dead-claudia requested a review from tivac as a code owner December 26, 2017 20:17
@dead-claudia
Copy link
Member Author

@pygy This doesn't address the part of browsers returning inconsistent results for routes. Just the case of invalid routes.

@pygy
Copy link
Member

pygy commented Dec 26, 2017

@isiahmeadows I'll have to ponder over this...

Now Chrome also uses percent encoding for unicode characters, and none encodes % or plain spaces.

https://flems.io/#0=N4IgZglgNgpgziAXAbVAOwIYFsZJAOgAsAXLKEAGhAGMB7NYmBvEAXwvW10QICsEqdBk2J4wAVzTViEegAJGcYgAoAHgEo5wADpo5cgO4Q0AE1oH8UWtQwz6RDHEJyAvHNW79QuLViXaAOZqFIbGZhZWNnZoDk7quqy6uooq2iAApGnxwkrKaRgARtTphdRZyfCpIBhyRekATAAMZSDZKXkgAG9ZbBwgmDh4+NRwAjT0jMw8bAC6rEA

Also, I think that @ FremyCompany wants to have % characters in the route.

@dead-claudia
Copy link
Member Author

dead-claudia commented Dec 26, 2017

@pygy To clarify, this only concerns itself with routes like /abc%def, which are obviously going to get rejected.

IMHO the easiest workaround if @ FremyCompany wants to just have raw % in the route is to just use %25, the encoded equivalent of that route.

Oddly enough, our support for such percent-encoded routes appears completely undocumented somehow... (search for either encode or %) Probably something we need to fix at some point.

@pygy
Copy link
Member

pygy commented Dec 26, 2017

@isiahmeadows do we need to gracefully fall back if it happens? I don't think that raw % are supported in URLs... Likewise, percent-encoding a non-utf-8 byte sequence sounds like asking for trouble.

We could perhaps make the regexp smarter: /%[ab89][a-f0-9](?:%[c-f][a-f0-9])+/gim , or even

/%[0-7][0-9a-f]|%[cd][0-9a-f]%[89ab][0-9a-f]|%e[0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]|%f[0-3]%[89ab][0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]|%f4%8[0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]/gi

(not sure why the original has a m flag). That would only decode things that JS understands (including unpaired UTF-16 surrogates, since they are valid UCS2).

@dead-claudia
Copy link
Member Author

@pygy My only thought is that we not crash the app ourselves over something that not even the developers can realistically control.

Of course we could also update the regexp, but I'd rather still have the safeguard in place.

@pygy
Copy link
Member

pygy commented Dec 27, 2017

@isiahmeadows having the check is prudent. The regexp above is too broad, /%[0-7][0-9a-f]|%[cd][0-9a-f]%[89ab][0-9a-f]|%e[0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]/ is sufficient to go up to U+10FFFF.

However I couldn't test it on the whole range of code points because encodeURI refuses to encode not only unpaired UTF-16 surrogate code points, but also their astral plane counterparts (which AFAIK are valid code points). See here...

Edit, actually, when using the correct function (String.fromCodePoint(), not .fromCharCode()) the regexp works with the whole range of valid code points.

@dead-claudia
Copy link
Member Author

@pygy Thought I'd clarify: I'm not against the idea of also fixing the regexp (I'm actually kind of with you on it). It's just not the subject of this PR. This is purely just defining the fallback mechanism in case we ever miss an edge case and fail to catch the grammar perfectly, for a bit of fault tolerance.

@pygy
Copy link
Member

pygy commented Dec 27, 2017

@isiahmeadows understood.

In that case, we might as well do

var data = $window.location[fragment]
try {data = data.replace($regexp, decodeURIComponent)} catch(e){}

in normalize, it will preserve more infomation than setting the route to null.

Note that the correct RegExp is actually

/
%[0-7][0-9a-f] |
% [cd][0-9a-f] %[89ab][0-9a-f] |
%   e (?!d%[ab])
      [0-9a-f] %[89ab][0-9a-f] %[89ab][0-9a-f] |
%   f    [0-3] %[89ab][0-9a-f] %[89ab][0-9a-f] %[89ab][0-9a-f] |
%   f       4  %    8 [0-9a-f] %[89ab][0-9a-f] %[89ab][0-9a-f]
/gi

, after stripping white space (I shouldn't code that late, but past a certain point in tiredness I lose the ability to judge my own sleepiness...). decodeURI doesn't support doubly-encoded UTF-8 (WTF-8).

The %[89ab][0-9a-f] parts may be repeated with {2} and {3} depending on what compresses better.

Another thing I wonder: what happens when the page's encoding isn't UTF-8? Is the page encoding used to extract percent-encoded characters?

@dead-claudia
Copy link
Member Author

@pygy

In that case, we might as well do [...]

I'm okay with that, if you prefer that over the existing "go to default".

Another thing I wonder: what happens when the page's encoding isn't UTF-8? Is the page encoding used to extract percent-encoded characters?

Should have exactly zero impact on the issue. It might affect how the page itself is interpreted, but not the URL.

@dead-claudia
Copy link
Member Author

@pygy Regarding the serialization issue:

  1. It appears IE has no support for URL (and its href is erroneously unescaped), and the rest have had it for years. So we can special-case IE's brokenness based on that knowledge alone.
  2. If you can repro Chrome's bad hash value with new URL("https://example.com/ö/o?ö/o#ö/o").hash (and maybe @tivac with Edge), we can use URL-based feature testing for the rest, so we can avoid double-unescaping URLs. (This could come up with say, /abc%25123, in which we don't want to decode that to "/abc\u{12}3".) I don't have Chrome easy-access ATM, but if you could check, that would be really nice.

@pygy
Copy link
Member

pygy commented Dec 27, 2017

@isiahmeadows Chrome now works like Firefox and Safari.

Edge still behaves like it did two years ago when I wrote the #881 (comment) table.

For IE, we can emulate URL by creating a dummy <a> element and playing with its href attribute.

@dead-claudia
Copy link
Member Author

@pygy Does the new URL(location.href).* feature test work identically to location.* setting in Edge?

@pygy
Copy link
Member

pygy commented Dec 27, 2017

I only tested new URL() today, and it behaves as location did two years ago (unicode chars are not encoded by the getters).

Edit: do you specifically want to see how new URL(location.href) behaves?

@dead-claudia
Copy link
Member Author

@pygy I need you to test both, so I know if they deviate or not. If they do, then we can't use it for feature-testing double-escaping.

@pygy
Copy link
Member

pygy commented Dec 27, 2017

So, I just tried this.

In Edge, nothing is percent encoded.

In all other browsers, the a.search part is percent-encoded in Latin 1 where ö is %f6.
In Firefox, location.search is also percent-encoded in Latin 1.
All other parts are percent encoded in UTF-8 (%C3%B6).

The Latin 1 thing is new, I wonder if someone, somewhere is trolling me.

Edit: another one with new URL() added.

new URL() has everything percent-encoded in UTF-8, even in Firefox (except in Edge where nothing is encoded).

FFFFFFFUUUUUUUUUUUU, and I'm mincing my words.

Edit2: Thankfully the Latin 1 thing seems to be limited to Flems. @porsager any idea of what could cause this?

Edit3: chrome using percent encoding in the hash is quite recent. I have an old alt-Chromium installed (v58) that still has the hash non-encoded. In the new version the hash part of the location bar ends up percent encoded whereas the pathname and the search parts don't.

@porsager
Copy link
Contributor

@pygy it appears to be because the flems runtime html doesn't have <meta charset="utf-8">. I've added it to next.flems.io where you can see the correct results now..

I suppose it would be a good idea adding it to production as well? (just want to be sure what the most correct behavior/expectation is)

@pygy
Copy link
Member

pygy commented Dec 28, 2017

So, the HTML page encoding affects the way percent encoding works in odd ways, even though it shouldn't according to the spec. Thanks @porsager for the diagnostic and the quick fix :-)

@porsager
Copy link
Contributor

You're welcome... I've been burned by this before, so was the first thing I tried out :P I've added the fix to flems.io as well now..

@dead-claudia
Copy link
Member Author

@pygy Does the above discussion block this PR from getting merged?

@pygy
Copy link
Member

pygy commented Dec 28, 2017

@isiahmeadows the Latin-1 stuff, no, but I'd rather have the try/catch block in normalize(), it will preserve more info in some cases (bad characers in param values) and go to the default route otherwise.

@dead-claudia
Copy link
Member Author

Okay. I'll see about updating it once I have a chance.

@dead-claudia
Copy link
Member Author

@pygy Updated.

@dead-claudia
Copy link
Member Author

@pygy Ping?

@pygy
Copy link
Member

pygy commented Jan 7, 2018

@isiahmeadows I think that the try/catch block in getPath() is now redundant (likewise for the null check in defineRoutes().

I'll be mostly offline for the next few days (back on Thursday at worse).

@dead-claudia
Copy link
Member Author

Done.

@StephanHoyer
Copy link
Member

closed in favour of #2743

@dead-claudia dead-claudia deleted the router-fix branch February 18, 2022 05:26
@JAForbes JAForbes mentioned this pull request Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug For bugs and any other unexpected breakage
Projects
Status: Closed
Development

Successfully merging this pull request may close these issues.

4 participants