Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use "inherit left, wildcard right" behavior in base URL application and constructor string parsing. #198

Merged
merged 10 commits into from
Nov 7, 2023

Conversation

jeremyroman
Copy link
Collaborator

@jeremyroman jeremyroman commented Oct 27, 2023

The following changes apply to patterns which are constructed using a base URL, the constructor string syntax, or both -- but not any pattern which explicitly specifies components separately without a base URL.

  • Components are not inherited from a base URL if an "earlier" component is explicitly specified.
  • In the string format, unspecified "later" components are implicitly wildcarded, rather than required to be empty (with the exception of the port, which is always taken to be specified when the hostname is).
  • Username and password are never implicitly specified or inherited.

For example:

  1. "https://example.com/*" also matches with any username, password, search, and hash. Previously this would be written "https://*:*@example.com/*\\?*#*".
  2. new URLPattern({ pathname: "/login" }, "https://example.com/?q=hello") accepts any query string and hash, not only "?q=hello" and "".
  3. "https://*:*" or {protocol: "https"} means "any HTTPS URL, on any port", and "https://*" means "any HTTPS URL on the default port (443)". These have the same meaning whether or not a base URL is provided, since specifying the protocol prohibits inheritance of other components.

This makes patterns more expansive than before, in cases where wildcards are likely to be desirable.

The logic of inheriting components from a base URL dictionary is also similarly changed in a way that may make it not match where it did before, but more consistently with the above and with how relative URL strings are resolved. For example, new URLPattern("https://example.com/foo?q=1#hello").test({pathname: "/foo", hash: "hello", baseURL: "https://example.com/bar?q=1"}) previously returned true but will now be false, since the search component is not inherited when the pathname is specified. This is analogous to how new URL("/foo#hello", "https://example.com/bar?q=1") works. The reverse is also possible; in both cases this is quite niche.

Fixes #179.


(See WHATWG Working Mode: Changes for more details.)


Preview | Diff

@jeremyroman
Copy link
Collaborator Author

@sisidovski @domenic FYI

@jeremyroman jeremyroman added impacts documentation Used by documentation communities, such as MDN, to track changes that impact documentation needs implementer interest Moving the issue forward requires implementers to express interest labels Nov 1, 2023
@jeremyroman
Copy link
Collaborator Author

Updated commit message in the opening comment (hopefully it's reasonably clear).

I assume Gecko and WebKit implementation bugs are not desired since they don't yet (to my knowledge) implement URLPattern; is that right? I can file Deno and urlpattern-polyfill ones shortly before merging if appropriate. Ditto for MDN.

For implementer interest, it sounded like Anne was on board with this direction; what level of assent is required to add WebKit to the implementer interest list?

@domenic
Copy link
Member

domenic commented Nov 1, 2023

Updated commit message in the opening comment (hopefully it's reasonably clear).

LGTM! You might want to add something about the compatibility implications, and how we've measured them to be low.

I assume Gecko and WebKit implementation bugs are not desired since they don't yet (to my knowledge) implement URLPattern; is that right?

Right, we can treat them as being under the umbrella "implement URLPattern" bugs.

For implementer interest, it sounded like Anne was on board with this direction; what level of assent is required to add WebKit to the implementer interest list?

What I would do is check the box for WebKit, linking to #179 (comment) , but also tag @annevk to confirm, and give a few days before merging in case he has objections or wants to take a closer look.

@jeremyroman
Copy link
Collaborator Author

What I would do is check the box for WebKit, linking to #179 (comment) , but also tag @annevk to confirm, and give a few days before merging in case he has objections or wants to take a closer look.

Alright, will do unless @annevk objects.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normatively LGTM, just editorial suggestions

spec.bs Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
spec.bs Show resolved Hide resolved
@annevk
Copy link
Member

annevk commented Nov 6, 2023

I didn't do an in-depth review, but happy to agree on behalf of WebKit with the behavior we're aiming for here.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with more nits, not all of which I'm sure about. @sisidovski did you want to take a look?

spec.bs Outdated Show resolved Hide resolved
spec.bs Show resolved Hide resolved
spec.bs Show resolved Hide resolved
spec.bs Outdated
* protocol, hostname, port, pathname, search, hash
* protocol, hostname, port, username, password

Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.)
Username and password are also never inherited from a base URL when constructing a {{URLPattern}}. (They are, however, inherited from the base URL when parsing a URL from the string supplied as an argument to {{URLPattern/test()}} or {{URLPattern/exec()}}.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "when parsing a URL". You can actually supply a dictionary rather than a string if you want, and this behavior also applies there. e.g. urlPattern.test({pathname: "/world"}, "https://user:pass@example.com/") is a thing you can do.

spec.bs Show resolved Hide resolved
spec.bs Outdated Show resolved Hide resolved
@@ -510,17 +601,14 @@ A [=constructor string parser=] has an associated <dfn export for="constructor s
<p>First, the URLPattern constructor string parser operates on [=tokens=] generated using the "`lenient`" [=tokenize policy=]. In constrast, [=basic URL parser=] operates on code points. Operating on [=tokens=] allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like "`:hmm`" in "`https://a.c:hmm.example.com:8080`" without getting confused with the port number.
<p>Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like [=basic URL parser=] does. Instead we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string.
<p>Finally, the URLPattern constructor string parser does not handle some parts of the [=basic URL parser=] state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser might not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the {{URLPatternInit}} constructor.
<p>In the constructor string algorithm, the pathname, search, and hash are wildcarded if earlier components are specified but later ones are not. For example, "`https://example.com/foo`" matches any search and any hash. Similarly, "`https://example.com`" matches any URL on that origin. This is analogous to the notion of a more specific component in the notes about [=process a URLPatternInit=] (e.g., a search is more specific than a pathname), but the constructor syntax only has a few cases where it is possible to specify a more specific component without also specifying the less specific components.
<p>The username and password components are always wildcard unless they are explicitly specified.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a mention the exception of port here as other components are explicitly mentioned? It also makes sense not having it since we have a note in L723 though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a mention here.

1. If |init|["{{URLPatternInit/port}}"] is not null then set |result|["{{URLPatternInit/port}}"] to the result of [=process port for init=] given |init|["{{URLPatternInit/port}}"], |result|["{{URLPatternInit/protocol}}"], and |type|.
1. If |init|["{{URLPatternInit/pathname}}"] is not null:
1. If |init|["{{URLPatternInit/protocol}}"] does not [=map/exist=], then set |result|["{{URLPatternInit/protocol}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/scheme=] and |type|.
1. If |type| is not "`pattern`" and |init| [=map/contains=] none of "{{URLPatternInit/protocol}}", "{{URLPatternInit/hostname}}", "{{URLPatternInit/port}}" and "{{URLPatternInit/username}}", then set |result|["{{URLPatternInit/username}}"] to the result of [=processing a base URL string=] given |baseURL|'s [=url/username=] and |type|.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, |type| will be pattern only when constructing, right? match() and exec() also process URLPatternInit, but |type| is "url" in that case so I believe this is correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

@sisidovski
Copy link
Collaborator

Apologize for the delay, LGTM with a few questions.

@jeremyroman
Copy link
Collaborator Author

Chromium CL has also landed, and should ship in M121. 🤞

@jeremyroman jeremyroman merged commit ed205a4 into whatwg:main Nov 7, 2023
2 checks passed
@jeremyroman jeremyroman deleted the syntax-179 branch November 7, 2023 21:53
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Nov 21, 2023
This CL changes the behavior of the router rule registration in the
ServiceWorker Static Routing API, especially when the |urlPattern|
condition accepts the URLPatternInit object.

Before this CL, the URLPatternInit input is accepted as it is, that
means any unspecified fields are resulted in the wildcards (*). This
behavior is inconsistent with the case when |urlPattern| accepts a
string. When a string is passed, missing fields are complemented by
baseURL, which the SW script URL is internally used when adding the new
router rule. So some missing components will use the values inherited
from the baseURL in that case.

After this CL, the URLPatternInit input also internally uses the SW
script URL as a baseURL if not provided. It typically happens when the
input is an object that complies with the URLPatterInit interface. On
the other hand, it wouldn't affect the case when URLPatternInit is the
output of `new URLPattern()`. Thank to the recent change on the
URLPattern[1], the constructed components in URLPatternInit will inherit
or wildcarded. baseURL won't override components if those are not empty.

This behavior change is based on the discussion in
whatwg/urlpattern#182.

[1] whatwg/urlpattern#198

Bug: 1371756
Change-Id: I5cce80fde05cf18237c8b6412b00e017ff5aad5b
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Nov 27, 2023
This CL changes the behavior of the router rule registration in the
ServiceWorker Static Routing API, especially when the |urlPattern|
condition accepts the URLPatternInit object.

Before this CL, the URLPatternInit input is accepted as it is, that
means any unspecified fields are resulted in the wildcards (*). This
behavior is inconsistent with the case when |urlPattern| accepts a
string. When a string is passed, missing fields are complemented by
baseURL, which the SW script URL is internally used when adding the new
router rule. So some missing components will use the values inherited
from the baseURL in that case.

After this CL, the URLPatternInit input also internally uses the SW
script URL as a baseURL if not provided. It typically happens when the
input is an object that complies with the URLPatterInit interface. On
the other hand, it wouldn't affect the case when URLPatternInit is the
output of `new URLPattern()`. Thank to the recent change on the
URLPattern[1], the constructed components in URLPatternInit will inherit
or wildcarded. baseURL won't override components if those are not empty.

This behavior change is based on the discussion in
whatwg/urlpattern#182.

[1] whatwg/urlpattern#198

Bug: 1371756
Change-Id: I5cce80fde05cf18237c8b6412b00e017ff5aad5b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impacts documentation Used by documentation communities, such as MDN, to track changes that impact documentation needs implementer interest Moving the issue forward requires implementers to express interest
Development

Successfully merging this pull request may close these issues.

Base URL inheritance gives unintuitive results
4 participants