-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL Normalization #23
Comments
I've thought briefly about the encoding question. My current plan was to automatically URL encode illegal characters passed to the URLPattern constructor like the URL constructor. If the pattern or input URL has URL encoded characters that are normally legal we wouldn't do any special logic to try to match them. It sounds like we'll have to give this more though, though. Thanks for filing! |
@domenic and @annevk, do you have any opinions here? Currently I plan to:
I'm not sure what to do about repeated |
Note, it doesn't appear the URL spec does anything special for repeated |
cc @jeffposnick and @jakearchibald for developer experience input. |
I suspect some of this might have to be configurable, e.g., whether |
Right. For input I'm planning to convert to UTF-8 and percent encode automatically like URL does. |
I guess I could have URLPattern.match() and URLPattern.test() automatically normalize the input by default. This would call String.normalize(), perform UTF8 and percent encoding, and then remove any duplicate |
I don't think |
I guess I'm more thinking of how can we remove footguns for people. If there are ways to have multiple encodings that produce the same output, it seems matching those consistently with a single pattern would be a nice default. As someone who mainly works in English, though, I don't know how large of a concern this is. |
I would prefer starting without any normalization (other than what the URL parser does) as that would reduce the expressiveness of URLs. It seems that any such thing should be opt-in and probably be added in later versions once it's more clear what the demand is and what the demand is for. |
I guess my concern is that some of these issues may occur outside the developers control. If a user types duplicate slashes in the URL bar, then a FetchEvent.request.url will also include those duplicate slashes. A fetch handler using URLPattern that did not test against this may break unexpectedly. In these cases, what would we tell the developer to do? Remove duplicate slashes I think. And this advice would probably apply to all fetch handlers since any user could do this. If we would offer such blanket guidance it seems it might make sense to be the default. |
It really depends on the server what any given path+query will yield. If slashes would always be de-duplicated, the URL parser should do it as that would be more indicative of them not being significant. (But it doesn't because they are.) |
@jeffposnick does workbox do anything special to handle FetchEvents with duplicate slashes in the url? |
As a convenience in |
The following Node.js server exhibits how servers can respond differently to one versus two slashes: 'use strict';
const http = require('http');
const server = http.createServer((req, res) => {
res.statusCode = 200;
res.setHeader('Content-Type', 'text/plain');
if (req.url === '/test/') {
res.end('One slash');
} else if (req.url === '/test//') {
res.end('Two slashes');
} else {
res.end('Another page');
}
});
server.listen(8000, 'localhost'); Locally I can trigger different results by going to However, I tried to put this code into a Glitch, and I cannot get it to exhibit the two-slash result. My guess is that there's some load-balancer or other proxy in front of Glitch servers which normalizes away the two slashes before it reaches the Node.js layer. (I also did a variant with Overall this feels similar to the trailing slash question. Probably some portion of servers normalize I'm also not sure where that option would be best located: in the URL parser, or in the URL pattern matcher. Or even in a new "URL normalizer" API. |
Some touchy edge-cases with pathname matching that I've come up against that are worth considering are:
/
in URLs/你好
and have it work instead of writing/%E4%BD%A0%E5%A5%BD
.Some of the solutions to this is covered in https://github.com/pillarjs/path-to-regexp#process-pathname, but a lot of implementations ignore the edge cases which result in inconsistent handling across users. Both 1 and 3 likely depend on how "exact" matches want to be.
The text was updated successfully, but these errors were encountered: