Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All our examples using /*\\?* URL pattern syntax break for fragment links #259

Closed
domenic opened this issue Mar 24, 2023 · 10 comments · Fixed by #286
Closed

All our examples using /*\\?* URL pattern syntax break for fragment links #259

domenic opened this issue Mar 24, 2023 · 10 comments · Fixed by #286

Comments

@domenic
Copy link
Collaborator

domenic commented Mar 24, 2023

E.g.

(new URLPattern("https://*/*\\?*", { baseURL: document.baseURI })).test("https://example.com/#test")
// => false

We should, I guess, be recommending /*\\?*#* ??

cc @wanderview in case there's anything better we can do at the URLPattern layer... Maybe some sort of syntax like /** would be helpful?

@jeremyroman
Copy link
Collaborator

We could (and maybe we do?) skip the fragment part since it clearly has no effect for prefetching at least. (I suppose it can for prerendering, though, now that I think about it.) That said, even"/*\\?*" looks like Perl code. :)

@wanderview
Copy link

What is the goal of the pattern? To match URLs like the current document's URL, but with any path, query, or fragment?

In that case I find it easier to read:

new URLPattern({ pathname: '*', search: '*', hash: '*', base: document.baseURI })

@domenic
Copy link
Collaborator Author

domenic commented Mar 27, 2023

Yep, that's the goal. The issue, I think, is that it's non-obvious in our experience, that /*, or even { pathname: '*' }, doesn't work.

I think there's a sort of intuition that if you don't care about the pathname, then of course you don't want to match the current base URL's search and hash. Nobody would want to carry over the search and hash from there. And you expect the API to understand this, somehow.

Basically, I think the implicit defaulting of base to document.baseURI in declarative APIs has unintuitive consequences. If you're doing it to yourself, with { base: document.baseURI }, then it's easier to understand what's happening. But in contexts like speculation rules or a hypothetical usage of URLPattern for service worker scopes or CSP rules, I think it would be confusing.

@wanderview
Copy link

Right, I think giving more control/visibility of the application of the base URL would help.

I think there's a sort of intuition that if you don't care about the pathname, then of course you don't want to match the current base URL's search and hash.

I think some of this comes from doing new URL('/a', 'https://example.com/b?c') ignores the base's pathname and search. It looks like this is just because you specified a pathname and nothing else. But in fact, `/a' does exactly specify an empty string search and hash.

And if we changed URLPattern to auto-wildcard search/hash if you wildcard pathname, then that would prevent any way to override the pathname but continue to inherit the search/hash. It feels like that should be possible, even if its a niche or rare use case.

@jeremyroman
Copy link
Collaborator

The reason I think implicit base URL is correct is that:

  1. it is very well-established as the default for URLs in any such context, and it would be strange for URL patterns to not match that
  2. cases like "you can prefetch /bar" probably should mean "you can prefetch /bar on this origin" and not "you can prefetch /bar anywhere"

Implicit base URL only for relative URLs would be another option, but it kinda adds a third behavior to the two that URLPattern already defines, which is also a little awkward.

@domenic
Copy link
Collaborator Author

domenic commented Apr 14, 2023

Today @KenjiBaheux and I were working on some slide decks explaining how to set up document rules for your site. We were continually caught out by this. We had to update the presentation in the following ways:

  • /wp-login.php -> /wp-login.php\\?*#*
  • /wp-admin/* -> /wp-admin/*\\?*#*
  • *.(pdf|doc|docx|xls) -> /*.(pdf|doc|docx|xls)\\?*#*. (Adding the / seems unavoidable; I don't have any good solutions to offer here.)

Here is my best proposal:

When constructing a URLPattern from a base URL, by default, if there is any pattern for a given component, the components "after" it in the URL become wildcards. So for example:

  • new URLPattern("/wp-login.php", "https://example.com/foo") will match /wp-login.php?bar#baz, not just /wp-login.php.
  • new URLPattern("https://*", "https://example.com/foo?bar#baz") will match https://qux.com/a?b#c, not just https://qux.com/

If you want to inherit those components from the base URL instead, then you can specify an option, e.g. new URLPattern("/wp-login.php", "https://example.com/foo", { baseURLInheritance: "strict" }). This addresses

that would prevent any way to override the pathname but continue to inherit the search/hash. It feels like that should be possible, even if its a niche or rare use case.

WDYT?

@wanderview
Copy link

Is there any reason your integration of base URL and the URLPattern constructor have to match? For example, when I was looking at integration with service workers I was going to use URLPattern syntax for the pathname, but apply the baseURL to it separately aligning with service worker rules.

That being said, I don't have strong objections if you want to pursue this. I don't personally have time to make the changes to URLPattern, though.

@domenic
Copy link
Collaborator Author

domenic commented Apr 24, 2023

Is there any reason your integration of base URL and the URLPattern constructor have to match? For example, when I was looking at integration with service workers I was going to use URLPattern syntax for the pathname, but apply the baseURL to it separately aligning with service worker rules.

I'd love to learn more about alternate models here. I can see how this would work if you had a single-component (i.e. pathname-only) URLPattern: you could extract out the pathname and match it against the URLPattern's pathname.

But we need something more generic for our case. We need to support patterns that restrict on query params (e.g., prefetch /product?id=123 but not /product?id=456); we need to support cross-origin patterns (e.g., prefetch https://example.com/*); etc. Given that we're so generic, I'd be hesitant to invent a new method of combining a generic URL pattern with a generic base URL, that didn't match the constructor.

If you think inventing such a combination method and using it only in this spec is reasonable though, let us know, and we can put that option on the table.

That being said, I don't have strong objections if you want to pursue this. I don't personally have time to make the changes to URLPattern, though.

Understood! I'm currently trying to figure out where fixing this annoyance falls in our teams' priorities. If it's high enough, then we'd be happy to do the work of updating the spec, tests, and Chromium implementation, including engaging with the URLPattern repo and its community to make sure we're building the right thing. In either case, thanks very much to you for helping us understand the problem space over in this repo, before we take that step.

@jeremyroman
Copy link
Collaborator

Looking at this again, I think this would have to decompose into two possible changes:

  1. In the string form, search and hash default to "*" rather than "" if not present. (This can still be explicitly emptied with ?#.) This itself fixes the examples @domenic gave above.
  2. When the base URL is combined with the URLPatternInit, absent later components (modulo handwaving around username and password) default to a wildcard, possibly with an option to inherit anyway. I think this is only necessary if we want to change the meaning of new URLPattern({pathname: "/wp-login.php", baseURL: ...}) and similar patterns, where it is actually possible for the later components to be absent as opposed to empty or wildcard.

The second isn't sufficient because the first actually emits empty strings, not absent components.

The latter is the one where I think @wanderview is asking for an opt-out, along the lines of a baseURLInheritance: "all" option. I admit I don't see the use case for it, though, and authors in script can always copy the components by hand if necessary. But not strongly opposed if we agree that that's the way forward.

I've agreed to add a UseCounter to look at what this would affect, but want to make sure we're on the same page on the particulars of what I'm to measure.

@domenic
Copy link
Collaborator Author

domenic commented Nov 14, 2023

We've fixed the underlying issue. We need to update all the docs to use the simpler, more-intuitive syntax now :)

domenic added a commit that referenced this issue Nov 14, 2023
Now that URL patterns are better, as of whatwg/urlpattern@ed205a4, we no longer need the ?* or ?*#* syntax.

Fixes #259.
domenic added a commit that referenced this issue Nov 14, 2023
Now that URL patterns are better, as of whatwg/urlpattern@ed205a4, we no longer need the ?* or ?*#* syntax.

Fixes #259.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants