Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern to exclude any URLs with query strings. #329

Closed
aryadhiratara opened this issue Aug 23, 2024 · 13 comments
Closed

Pattern to exclude any URLs with query strings. #329

aryadhiratara opened this issue Aug 23, 2024 · 13 comments

Comments

@aryadhiratara
Copy link

aryadhiratara commented Aug 23, 2024

Context:

I'm looking for a regular expression pattern to exclude any URLs containing query strings using the Speculation Rules API.

I tested various patterns, but none worked as expected.

This simple pattern works:

{ "not": {"href_matches": "?*"}}

But only if the url with query strings pointed to the same page.

For example,

  • if we have this url in homepage https://domain.ext/?test-query, the exclusion works.
  • But if in the homepage, we have URLs with query string that pointed to another page, for example https://domain.ext/about/?test-query, the exclusion doesn't work.
  • However, If we're on the About page, the exclusion for https://domain.ext/about/?test-query works.

 
Following the example I found in MDN and Chrome’s documentation:

  • /*\\?*(^|&)add-to-cart=*
  • /*\\?*(^|&)category=*

I tests various exclusion pattern based from the example above. None of them are working as expected.

Any idea for the correct regex to exclude any URLs with query strings that will work?
 

Found Unexpected Behavior:

I accidentally found that when I use this/*\\?*(^|&) in the not href_match value (it should matches URLs that contain a ?, and optionally, those followed by an & )

{ "not": {"href_matches": "/*\\?*(^|&)"}}

Full code:

<script type="speculationrules">
{
  "prefetch": [{
    "source": "document",
    "where": {
      "and": [
        { "href_matches": "/*" },
        { "not": {"href_matches": "/*\\?*(^|&)"}}
      ]
    },
    "eagerness": "moderate"
  }]
}
</script>

The pattern inadvertently excludes normal URLs without query strings, preventing them from being prefetched while allowing URLs with query strings to be prefetched.

So I decided to test by emptying the 'not' value, and placed the URL in the (include) href_match

{ "href_matches": "/*\\?*(^|&)" }

Full code:

<script type="speculationrules">
{
  "prefetch": [{
    "source": "document",
    "where": {
      "and": [
        { "href_matches": "/*\\?*(^|&)" },
        { "not": {"href_matches": ""}}
      ]
    },
    "eagerness": "moderate"
  }]
}
</script>

It effectively excludes any URL with query strings while allowing normal URLs to be prefetched as intended.

Could someone provide clarification on why this behavior occurs and how to correctly exclude all URLs containing query strings using the Speculation Rules API?

@tunetheweb
Copy link

tunetheweb commented Aug 23, 2024

This seems to work for me for query strings with an = sign (e.g. https://example.com?a=b):

<script type="speculationrules">
        {
          "prerender": [
            {
              "where": {
                "and": [
                  { "href_matches": "/*" },
                  { "not": { "href_matches": "*\\?*=*" } }
                ]
              }
            }
          ]
        }
</script>

Example here: https://prerender-demos.glitch.me/no-query-params.html

However it doesn't exclude query strings without a value (e.g. https://example.com?a). Not sure if it's possible to do that?

@aryadhiratara
Copy link
Author

aryadhiratara commented Aug 23, 2024

Hi @tunetheweb,

Thank you for the quick respond and suggestion.

Unfortunately, I have tried that pattern. It makes all links, even the normal links (without query string present), it's not prefetched.

In the example that you provided https://prerender-demos.glitch.me/no-query-params.html, this one <a href="[next.html](https://prerender-demos.glitch.me/next.html)">This link should be prerendered</a> is not getting prefetched when hovered.

In my tests, if we add * before the ?, it always make the prefetch not working to all kind of URLs.

Any other suggestion?

Again, thank you :)

@domenic
Copy link
Collaborator

domenic commented Aug 23, 2024

This is really a question for https://github.com/whatwg/urlpattern.

One thing you may not be aware of is that you don't have to use the string form of URL patterns. You can use the dictionary form too. So I think something like this (not tested) would work:

{ "not": { "href_matches": { "search": "(.+)" } } }

I suggest playing around with the URL Pattern API directly which will make your testing go much faster. Just do

(new URLPattern(whateverYouWouldPutForHrefMatches, { baseURL: document.baseURI })).test(targetURL)

@aryadhiratara
Copy link
Author

Hi @domenic,

Thank you. I'll check out the URL Pattern API after this.

@tunetheweb
Copy link

In the example that you provided https://prerender-demos.glitch.me/no-query-params.html, this one This link should be prerendered is not getting prefetched when hovered.

@aryadhiratara I didn't have eagerness set to it defaulted to pointerdown. It's now set and I can see it recognising the URLs and prerendering them on hover.

One thing you may not be aware of is that you don't have to use the string form of URL patterns.

@domenic I did not know that! I will add that to [our docs]https://developer.chrome.com/docs/web-platform/prerender-pages), however I cannot get it to work.

This works:

const pattern = new URLPattern({ search: '(.+)' }, { baseURL: "https://prerender-demos.glitch.me" });
console.log(pattern.test("https://prerender-demos.glitch.me/next.html")); // false
console.log(pattern.test("https://prerender-demos.glitch.me/next.html?query")); // true
console.log(pattern.test("https://prerender-demos.glitch.me/next.html?query=123")); // true

But I tried several variants of that in Speculation Rules and not of it is accepted by Chrome:

"href_matches": { search: "(.+)" }
"href_matches": { "search": "(.+)" }
"href_matches": "{ search: '(.+)' }"

@jeremyroman
Copy link
Collaborator

The second of those is syntactically closest, but { "search": "(.+)" } is inheriting the pathname from the base URL (the same way that a link to "?foo" would inherit the pathname, so unless you're on next.html it doesn't match.

For example, for all same-origin links with non-empty query strings:

{"href_matches": { "pathname": "/*", "search": "(.+)" }}
{"href_matches": "/*\\?(.+)"}

(The backslash is necessary here because *? has a special meaning in the pattern syntax. Specifically, /foo/*? matches /foo while /foo/* does not and requires a forward slash after foo. So it's necessary to tell the parser to treat it as a ? that divides URL components instead.)

Or if you want absolutely everything with a non-empty query string, specifying the protocol explicitly wipes out inheritance (you can also specify the other components if you like, and in the string form you must):

{"href_matches": { "protocol": "*", "search": "(.+)" }}
{"href_matches": "*:*:*\\?(.+)" }

The reason your comparison didn't work correctly is because baseURL should be in the same object as the other URL components when using the URLPattern constructor; the second object is an "options" dictionary that doesn't have a baseURL field. @domenic should have written:

(new URLPattern({baseURL: document.baseURI, ...whateverYouWouldPutForHrefMatches})).test(targetURL)

At some point I should write a URL pattern explorer tool; despite our efforts to make as many things as possible natural, quirks persist.

@tunetheweb
Copy link

Thanks! Updated the demo.

I think the key thing I was missing is that it seems like you need parentheses to access full regex functionality (i.e. to get access to the + operator needed in this example to match 1 or more times, rather than 0 or more times). I don't see that explained anywhere on the MDN docs nor the spec. It's kinda there in examples but the parentheses could easily be ignored by those less familiar with this syntax (i.e. me!) as being part of the regex. Raised spec issue and MDN PR.

Anyway, that is definitely a URL Pattern issue and not for here. Confirmed it is all working for Speculation Rules once you know the syntax! So think this issue can be closed now.

@jeremyroman
Copy link
Collaborator

jeremyroman commented Aug 23, 2024

Just a quick additional note -- if you just want "has no query params", then instead of negating a "has query params" pattern, you could also use "search": "" (or similar in the string syntax) to accomplish that. For example, in your demo:

                "and": [
                  { "href_matches": "/*" },
                  { "not": { "href_matches": "*\\?(.+)" } }

could also be written

{ "href_matches": "/*\\?" }

or

{ "href_matches": { "pathname": "/*", "search": "" } }

@aryadhiratara
Copy link
Author

Hi @jeremyroman,

Need clarification. So if I need to exclude any URLs with query string, I should use this?

<script type="speculationrules">
{
  "prefetch": [{
    "source": "document",
    "where": {
      "and": [
        { "href_matches": "/*" },
        { "not": { "href_matches": { "pathname": "/*", "search": "" } } }
      ]
    },
    "eagerness": "moderate"
  }]
}
</script>

@tunetheweb
Copy link

This works too as given in my demo:

  <script type="speculationrules">
        {
          "prefetch": [
            {
              "where": {
                "href_matches": "/*\\?"
              },
              "eagerness": "moderate"
            }
          ]
        }
  </script>

@aryadhiratara
Copy link
Author

aryadhiratara commented Aug 24, 2024

Hi @tunetheweb,

Yes, that's what I'm trying to explain :)
It's similar with what I use before

<script type="speculationrules">
{
  "prefetch": [{
    "source": "document",
    "where": {
      "and": [
        { "href_matches": "/*\\?*(^|&)" },
        { "not": {"href_matches": ""}}
      ]
    },
    "eagerness": "moderate"
  }]
}
</script>

It effectively excludes any URL with query strings while allowing normal URLs to be prefetched as intended.

Logically, shouldn't this /*\\? be in the "not href_matches" section?

 
If I use this

<script type="speculationrules">
{
  "prefetch": [{
    "source": "document",
    "where": {
      "and": [
        { "href_matches": "/*" },
        { "not": { "href_matches": { "pathname": "/*", "search": "" } } }
      ]
    },
    "eagerness": "moderate"
  }]
}
</script>

the results is the opposite; normal urls are excluded while urls with query strings able to be prefetch.

However if use this

<script type="speculationrules">
{
  "prefetch": [{
    "source": "document",
    "where": {
      "and": [
        {"href_matches": { "pathname": "/*", "search": "" }}
      ]
    },
    "eagerness": "moderate"
  }]
}
</script>

It effectively excludes any URL with query strings while allowing normal URLs to be prefetched as intended.

@tunetheweb
Copy link

As /*\\? ends on the Query Param (without a * after it) it effectively says it only matches URLs with an empty query param so it;'s the same as: {"href_matches": { "pathname": "/*", "search": "" }}. Try it!

@aryadhiratara
Copy link
Author

ah, I finally got it. this explanation

it effectively says it only matches URLs with an empty query param

is what missing from my understanding.

Thank you very much @tunetheweb 🙏🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants