Router fix #2061

dead-claudia · 2017-12-26T20:17:16Z

Description

Motivation and Context

Fixes part of #2060, and needs to be backported.

The original bug seems to make a case for security with this, but I doubt exploiting the bug could do any more than just play a practical joke on people.

How Has This Been Tested?

Added a few new tests on both the API and the router service.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.
I have updated docs/change-log.md

dead-claudia · 2017-12-26T20:19:57Z

@pygy This doesn't address the part of browsers returning inconsistent results for routes. Just the case of invalid routes.

pygy · 2017-12-26T20:47:02Z

@isiahmeadows I'll have to ponder over this...

Now Chrome also uses percent encoding for unicode characters, and none encodes % or plain spaces.

https://flems.io/#0=N4IgZglgNgpgziAXAbVAOwIYFsZJAOgAsAXLKEAGhAGMB7NYmBvEAXwvW10QICsEqdBk2J4wAVzTViEegAJGcYgAoAHgEo5wADpo5cgO4Q0AE1oH8UWtQwz6RDHEJyAvHNW79QuLViXaAOZqFIbGZhZWNnZoDk7quqy6uooq2iAApGnxwkrKaRgARtTphdRZyfCpIBhyRekATAAMZSDZKXkgAG9ZbBwgmDh4+NRwAjT0jMw8bAC6rEA

Also, I think that @ FremyCompany wants to have % characters in the route.

dead-claudia · 2017-12-26T20:55:31Z

@pygy To clarify, this only concerns itself with routes like /abc%def, which are obviously going to get rejected.

IMHO the easiest workaround if @ FremyCompany wants to just have raw % in the route is to just use %25, the encoded equivalent of that route.
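To illustrate the difference between a raw `%` and `%25`, here is a minimal sketch using only the standard `decodeURIComponent`. The helper name `tryDecode` is hypothetical and is not part of Mithril's API.

```javascript
// Hypothetical helper (not Mithril API): returns the decoded string,
// or null when the input contains a malformed percent escape.
function tryDecode(route) {
	try {
		return decodeURIComponent(route)
	} catch (e) {
		// decodeURIComponent throws a URIError on malformed escapes
		return null
	}
}

// "%de" is a UTF-8 lead byte with no continuation byte after it
var invalid = tryDecode("/abc%def")
// "%25" is the encoded form of a literal "%"
var escaped = tryDecode("/abc%25def")
```

Here `invalid` ends up as `null`, while `escaped` decodes to the route containing a literal `%`.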

Oddly enough, our support for such percent-encoded routes appears completely undocumented somehow... (search for either encode or %) Probably something we need to fix at some point.

pygy · 2017-12-26T22:16:37Z

@isiahmeadows do we need to gracefully fall back if it happens? I don't think that raw % are supported in URLs... Likewise, percent-encoding a non-utf-8 byte sequence sounds like asking for trouble.

We could perhaps make the regexp smarter: /%[ab89][a-f0-9](?:%[c-f][a-f0-9])+/gim , or even

/%[0-7][0-9a-f]|%[cd][0-9a-f]%[89ab][0-9a-f]|%e[0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]|%f[0-3]%[89ab][0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]|%f4%8[0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]/gi

(not sure why the original has a m flag). That would only decode things that JS understands (including unpaired UTF-16 surrogates, since they are valid UCS2).

dead-claudia · 2017-12-26T23:36:02Z

@pygy My only thought is that we not crash the app ourselves over something that not even the developers can realistically control.

Of course we could also update the regexp, but I'd rather still have the safeguard in place.

pygy · 2017-12-27T00:45:13Z

@isiahmeadows having the check is prudent. The regexp above is too broad, /%[0-7][0-9a-f]|%[cd][0-9a-f]%[89ab][0-9a-f]|%e[0-9a-f]%[89ab][0-9a-f]%[89ab][0-9a-f]/ is sufficient to go up to U+10FFFF.

However I couldn't test it on the whole range of code points because encodeURI refuses to encode not only unpaired UTF-16 surrogate code points, but also their astral plane counterparts (which AFAIK are valid code points). See here...

Edit, actually, when using the correct function (String.fromCodePoint(), not .fromCharCode()) the regexp works with the whole range of valid code points.

dead-claudia · 2017-12-27T02:06:01Z

@pygy Thought I'd clarify: I'm not against the idea of also fixing the regexp (I'm actually kind of with you on it). It's just not the subject of this PR. This is purely just defining the fallback mechanism in case we ever miss an edge case and fail to catch the grammar perfectly, for a bit of fault tolerance.

pygy · 2017-12-27T11:55:06Z

@isiahmeadows understood.

In that case, we might as well do

var data = $window.location[fragment]
try {data = data.replace($regexp, decodeURIComponent)} catch(e){}

in normalize; it will preserve more information than setting the route to null.
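A self-contained sketch of that idea, for illustration only: `normalizeSketch` and the `escapes` pattern are assumptions standing in for the router's actual `normalize` and `$regexp`, which are not shown in this thread.

```javascript
// Assumed stand-in for $regexp: any run of percent escapes.
var escapes = /(?:%[a-f0-9]{2})+/gi

// Hypothetical stand-in for the router's normalize step.
function normalizeSketch(raw) {
	var data = raw
	try {
		// replace() passes each match as the first argument to the callback
		data = data.replace(escapes, decodeURIComponent)
	} catch (e) {
		// A malformed escape makes replace() throw before the assignment,
		// so data keeps the raw, undecoded string.
	}
	return data
}

var decoded = normalizeSketch("/caf%C3%A9")
var preserved = normalizeSketch("/abc%def")
```

With this shape, a route with a bad escape is passed through undecoded rather than being replaced by null.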

Note that the correct RegExp is actually

/
%[0-7][0-9a-f] |
% [cd][0-9a-f] %[89ab][0-9a-f] |
%   e (?!d%[ab])
      [0-9a-f] %[89ab][0-9a-f] %[89ab][0-9a-f] |
%   f    [0-3] %[89ab][0-9a-f] %[89ab][0-9a-f] %[89ab][0-9a-f] |
%   f       4  %    8 [0-9a-f] %[89ab][0-9a-f] %[89ab][0-9a-f]
/gi

, after stripping white space (I shouldn't code that late, but past a certain point in tiredness I lose the ability to judge my own sleepiness...). decodeURI doesn't support doubly-encoded UTF-8 (WTF-8).

The %[89ab][0-9a-f] parts may be repeated with {2} and {3} depending on what compresses better.
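As a rough check, the pattern above can be written with white space stripped and the continuation bytes folded into `{2}` / `{3}`, then run against every encodable non-ASCII code point. The variable names and the `^(?:...)$` anchoring are assumptions added here so that one escape sequence can be tested at a time.

```javascript
// The pattern from above, white space stripped, continuation bytes repeated
// with {2} / {3}. Anchored (an addition for this sketch) to test one sequence.
var utf8Escape = new RegExp(
	"^(?:" +
	"%[0-7][0-9a-f]|" +
	"%[cd][0-9a-f]%[89ab][0-9a-f]|" +
	"%e(?!d%[ab])[0-9a-f](?:%[89ab][0-9a-f]){2}|" +
	"%f[0-3](?:%[89ab][0-9a-f]){3}|" +
	"%f4%8[0-9a-f](?:%[89ab][0-9a-f]){2}" +
	")$",
	"i"
)

// Count non-ASCII code points whose UTF-8 percent-encoding fails to match.
var mismatches = 0
for (var cp = 0x80; cp <= 0x10FFFF; cp++) {
	// encodeURIComponent throws on lone surrogates, so skip them
	if (cp >= 0xD800 && cp <= 0xDFFF) continue
	if (!utf8Escape.test(encodeURIComponent(String.fromCodePoint(cp)))) mismatches++
}

var matchesAscii = utf8Escape.test("%20")
// An encoded surrogate (U+D800) is excluded by the (?!d%[ab]) lookahead
var matchesSurrogate = utf8Escape.test("%ED%A0%80")
```

Note that the pattern still appears to accept some overlong forms (for example `%C0%80`), so it complements the try/catch safeguard rather than replacing it.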

Another thing I wonder: what happens when the page's encoding isn't UTF-8? Is the page encoding used to extract percent-encoded characters?

dead-claudia · 2017-12-27T19:36:01Z

@pygy

In that case, we might as well do [...]

I'm okay with that, if you prefer that over the existing "go to default".

Another thing I wonder: what happens when the page's encoding isn't UTF-8? Is the page encoding used to extract percent-encoded characters?

Should have exactly zero impact on the issue. It might affect how the page itself is interpreted, but not the URL.

dead-claudia · 2017-12-27T19:54:28Z

@pygy Regarding the serialization issue:

It appears IE has no support for URL (and its href is erroneously unescaped), and the rest have had it for years. So we can special-case IE's brokenness based on that knowledge alone.
If you can repro Chrome's bad hash value with new URL("https://example.com/ö/o?ö/o#ö/o").hash (and maybe @tivac with Edge), we can use URL-based feature testing for the rest, so we can avoid double-unescaping URLs. (This could come up with say, /abc%25123, in which we don't want to decode that to "/abc\u{12}3".) I don't have Chrome easy-access ATM, but if you could check, that would be really nice.
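The double-unescaping hazard mentioned above can be reproduced with plain `decodeURIComponent`; this is only an illustration of the problem, not the router's code.

```javascript
// Decoding once turns the escaped "%25" into a literal "%".
var once = decodeURIComponent("/abc%25123")

// Decoding again treats the resulting "%12" as another escape (U+0012).
var twice = decodeURIComponent(once)
```

`once` is the intended route with a literal `%`, whereas `twice` contains a control character where `%12` used to be.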

pygy · 2017-12-27T21:57:11Z

@isiahmeadows Chrome now works like Firefox and Safari.

Edge still behaves like it did two years ago when I wrote the #881 (comment) table.

For IE, we can emulate URL by creating a dummy <a> element and playing with its href attribute.

dead-claudia · 2017-12-27T22:06:47Z

@pygy Does the new URL(location.href).* feature test work identically to location.* setting in Edge?

pygy · 2017-12-27T22:59:23Z

I only tested new URL() today, and it behaves as location did two years ago (unicode chars are not encoded by the getters).

Edit: do you specifically want to see how new URL(location.href) behaves?

dead-claudia · 2017-12-27T23:00:57Z

@pygy I need you to test both, so I know if they deviate or not. If they do, then we can't use it for feature-testing double-escaping.

pygy · 2017-12-27T23:19:46Z

So, I just tried this.

In Edge, nothing is percent encoded.

In all other browsers, the a.search part is percent-encoded in Latin 1 where ö is %f6.
In Firefox, location.search is also percent-encoded in Latin 1.
All other parts are percent encoded in UTF-8 (%C3%B6).

The Latin 1 thing is new, I wonder if someone, somewhere is trolling me.

Edit: another one with new URL() added.

new URL() has everything percent-encoded in UTF-8, even in Firefox (except in Edge where nothing is encoded).

FFFFFFFUUUUUUUUUUUU, and I'm mincing my words.

Edit2: Thankfully the Latin 1 thing seems to be limited to Flems. @porsager any idea of what could cause this?

Edit3: Chrome using percent encoding in the hash is quite recent. I have an old alt-Chromium installed (v58) that still has the hash non-encoded. In the new version the hash part of the location bar ends up percent-encoded, whereas the pathname and the search parts don't.

porsager · 2017-12-27T23:59:54Z

@pygy it appears to be because the flems runtime html doesn't have <meta charset="utf-8">. I've added it to next.flems.io where you can see the correct results now..

I suppose it would be a good idea adding it to production as well? (just want to be sure what the most correct behavior/expectation is)

pygy · 2017-12-28T00:07:35Z

So, the HTML page encoding affects the way percent encoding works in odd ways, even though it shouldn't according to the spec. Thanks @porsager for the diagnostic and the quick fix :-)

porsager · 2017-12-28T00:10:22Z

You're welcome... I've been burned by this before, so was the first thing I tried out :P I've added the fix to flems.io as well now..

dead-claudia · 2017-12-28T02:10:25Z

@pygy Does the above discussion block this PR from getting merged?

pygy · 2017-12-28T21:15:56Z

@isiahmeadows the Latin-1 stuff, no, but I'd rather have the try/catch block in normalize(). It will preserve more info in some cases (bad characters in param values) and go to the default route otherwise.

dead-claudia · 2017-12-29T19:24:17Z

Okay. I'll see about updating it once I have a chance.

dead-claudia · 2017-12-31T06:26:05Z

@pygy Updated.

dead-claudia · 2018-01-06T23:39:11Z

@pygy Ping?

pygy · 2018-01-07T20:51:24Z

@isiahmeadows I think that the try/catch block in getPath() is now redundant (likewise for the null check in defineRoutes()).

I'll be mostly offline for the next few days (back on Thursday at worst).

dead-claudia · 2018-01-08T09:59:01Z

Done.

StephanHoyer · 2022-02-17T09:25:40Z

closed in favour of #2743

Merge MithrilJS/next into next

816178a

dead-claudia requested a review from pygy December 26, 2017 20:17

dead-claudia requested a review from tivac as a code owner December 26, 2017 20:17

dead-claudia force-pushed the router-fix branch from c97be14 to 0fe0fd1 Compare December 31, 2017 06:25

Correctly handle invalid escapes in routes

0a5ead3

dead-claudia force-pushed the router-fix branch from 0fe0fd1 to 0a5ead3 Compare January 8, 2018 09:58

dead-claudia added the Type: Bug For bugs and any other unexpected breakage label Oct 28, 2018

dead-claudia mentioned this pull request Feb 1, 2019

Streamline route/request path handling and split params + body in requests #2361

Merged

11 tasks

dead-claudia mentioned this pull request Feb 26, 2019

Avoid breaking pages that use the URL fragment for routing/state WICG/scroll-to-text-fragment#15

Closed

dead-claudia force-pushed the next branch from 292a667 to 51e0aee Compare May 29, 2019 14:11

StephanHoyer mentioned this pull request Feb 17, 2022

Correctly handle invalid escapes in routes based on 0a5ead31c9fbd7b153c521c7f9d3df7bf826ce6c #2743

Merged

StephanHoyer closed this Feb 17, 2022

dead-claudia deleted the router-fix branch February 18, 2022 05:26

JAForbes mentioned this pull request Apr 28, 2022

Release v2.1.0 #2766

Merged
