Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add $path modifier of Adguard or any equivalents of this #1690

Closed
8 tasks done
Yuki2718 opened this issue Aug 21, 2021 · 23 comments
Closed
8 tasks done

Add $path modifier of Adguard or any equivalents of this #1690

Yuki2718 opened this issue Aug 21, 2021 · 23 comments
Labels
enhancement New feature or request fixed issue has been addressed

Comments

@Yuki2718
Copy link

Yuki2718 commented Aug 21, 2021

Prerequisites

I tried to reproduce the issue when...

  • uBO is the only extension
  • uBO with default lists/settings
  • using a new, unmodified browser profile

Description

AdguardTeam/CoreLibs#124 (comment)
The modifier is useful at least on Japanese sites and blog.livedoor.jp is the best example where thousands of different structure of pages share the same domain. This resulted in tons of verbose filters in my personal lists to avoid false positive, otherwise a filter working for a page breaks another page. An actual case reported to me (https://jbbs.shitaraba.net/bbs/read.cgi/internet/25463/1598352715/206) is that blog.livedoor.jp###containerWrap > #container > .blog-title-outer + #content.hfeed made for http://blog.livedoor.jp/sexykpopidol/ breaks http://blog.livedoor.jp/kknao81/. Currently for this very reason I ignore some ad-placeholder on blog.livedoor.jp in AdGuard Japanese, which can fully be resolved if the modifier becomes available.

(Not worth your time but in https://blog.livedoor.com/ranking/ you can pick many blog.livedoor.jp sites with different structure each.)

A specific URL where the issue occurs

blog.livedoor.jp

Steps to Reproduce

NA

Expected behavior

NA

Actual behavior

NA

uBlock Origin version

1.37.2

Browser name and version

Irrelevant

Operating System and version

Irrelevant

@uBlock-user uBlock-user added the enhancement New feature or request label Aug 21, 2021
@uBlock-user
Copy link
Contributor

uBlock-user commented Aug 21, 2021

We wouldn't need to use $shide to partially disable cosmetic filtering on google shopping pages(uBlockOrigin/uAssets#9642, uBlockOrigin/uAssets#9694) if we could use $path to narrow blocking google product ads to google search engine pages only.

Likewise for bing ?

@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

An actual case reported to me

This works now without having to further complicate the code base and neither have to wait:

blog.livedoor.jp##link[rel="alternate"][href*="sexykpopidol"]:upward(html) #containerWrap > #container > .blog-title-outer + #content.hfeed

@Yuki2718
Copy link
Author

@gorhill I already fixed the issue and the point is not that I can't address them within the current arsenal. I can fix each case, but every time the rules become more complex and verbose.

@Yuki2718
Copy link
Author

But okay it seems link[rel="alternate"][href*=...]:upward(html) can always be used in case of blog.livedoor.jp. Since in the current uBO procedural engine will only be triggered if the first matching node exists, it can hardly be a matter. How about another point made by @uBlock-user ?

@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

I could add a procedural operator like so:

blog.livedoor.jp##:match-path(/sexykpopidol) #containerWrap > #container > .blog-title-outer + #content.hfeed

This way implementation becomes quite trivial without having to further complicate the code base for all cosmetic filters. The disadvantage is that this is a procedural cosmetic filters, but it's opened to optimisation, i.e. a filtering engine could have special treatment for such filter in order to have them injected early along with other plain CSS selector-based filters.

Also an advantage is that we just keep reusing the same now familiar operator-based approach, which was the entire point of these operators, that is easy to extend with little to no impact on codebase complexity.


To calrify "code complexity", the suggested [path=...] approach by AdGuard adds quite more complexity to the parsing/storing/lookup of cosmetic filters, while a mere new procedural operator requires no refactoring, procedural operators were designed to naturally be extensible with no added widespread complication.

@uBlock-user
Copy link
Contributor

Would it be possible to have whitelist approach with :match-path ? I would like simply skip a path such as /shop at google/bing.

@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

If argument to :match-path() can be a regex, than you can just use a negative lookahead?


Or even :not(:match-path(...)) ... should work I gather?

@Yuki2718
Copy link
Author

Sounds great! The reason AG use [path=...] is that they already use a similar [app=...] but probably they can transform [path=...] to :match-path() just like they do for other filters, so I have no problem.

@uBlock-user
Copy link
Contributor

:not(:match-path(/shop)) would do.

@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

Trying to come out with a name I won't feel like changing in the future, please advise. Should it be:

  • :matches-path(...)
  • :match-path(...)
  • :path-has-text(...)
  • ?

Keep in mind this is the list of supported operators for now: https://github.com/gorhill/uBlock/blob/b44d9219c321cf42379b1b7e8711728321ef0a16/src/js/static-filtering-parser.js#L1818-L1838. Consistency is important.

@uBlock-user
Copy link
Contributor

Third can be changed to :has-path() as :path-has-text() sounds un-necessarilly longer.

@Yuki2718
Copy link
Author

Trying to come out with a name I won't feel like changing in the future, please advise. Should it be:

  • :matches-path(...)
  • :match-path(...)
  • :path-has-text(...)
  • ?

Consistency is important.

I'm not English-native but it appears match-path is inconsistent with matches-css.

@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

I am also not English-native, and I wonder whether :matches-css() was the right choice, so I wouldn't want to repeat the same mistake, and if this was a mistake, I can always deprecate :matches-css() and instead make :match-css() official. I often make mistakes with those trailing s in English, so I am not a reference on when to properly use these.

@uBlock-user
Copy link
Contributor

Both are fine, no need to change anything regarding :matches-css()

gorhill added a commit to gorhill/uBlock that referenced this issue Aug 21, 2021
Related issue:
- uBlockOrigin/uBlock-issues#1690

New procedural operator: `:matches-path(...)`

Description: this is a all-or-nothing passthrough operator, which
on/off behavior is dictated by whether the argument match the
path of the current location. The argument can be either plain
text to be found at any position in the path, or a literal regex
against which the path is tested.

Whereas cosmetic filters can be made specific to whole domain,
the new `:matches-path()` operator allows to further narrow
the specificity according to the path of the current document
lcoation.

Typically this procedural operator is used as first operator in
a procedural cosmetic filter, so as to ensure that no further
matching work is performed should there be no match against the
current path of the current document location.

Example of usage:

    example.com##:matches-path(/shop) p

Will hide all `p` elements when visiting `https://example.com/shop/stuff`,
but not when visiting `https://example.com/` or any other page
on `example.com` which has no instance of `/shop` in the path part
of the URL.
@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

:matches-path() will be in next dev build, 1.37.3b12.

If you look at the commit, you can see the required changes to the code were very little (even with an opportunistic unrelated change in there), meaning risk of regression is virtually zero -- so I far prefer this approach than supporting a new spec which would require a whole lot of work and assuming I would be fine with all the specs, I would work on this only in a far, undetermined future, with a lot of potential regression (including in performance) as a result.

gorhill added a commit to gorhill/uBlock that referenced this issue Aug 21, 2021
@uBlock-user
Copy link
Contributor

uBlock-user commented Aug 21, 2021

@gorhill on https://www.google.com/search?q=Under+Armour+Sportstyle+Logo+Tank&newwindow=1&sxsrf=ALeKk012vqIPWbQDpDef2IwbreyIoADXPw:1627344963115&source=lnms&tbm=shop&sa=X&ved=2ahUKEwin-s-z_IHyAhVPUxUIHW_cCCQQ_AUoAXoECBMQAw&biw=2560&bih=1297

Filter www.google.*##:not(:matches-path(/\/search\?q\=\*tbm=shop|\/shopping/)) [href^="/aclk"][href$="adurl="]:not(> *:upward([class^="sh-sr__shop-result-group"])):upward(3) gets applied even though we're excluding /search?q=*tbm=shop from the path via :not.

To repro add the following filters --

www.google.*##:not(:matches-path(/\/search\?q\=\*tbm=shop|\/shopping/)) [href^="/aclk"][href$="adurl="]:not(>  *:upward([class^="sh-sr__shop-result-group"])):upward(3)
www.google.*#@#[href^="/aclk"][href$="adurl="]:not(:scope > :upward([class^="sh-sr__shop-result-group"])):upward(3)
@@||www.google.*/search?q=*tbm=shop$shide,badfilter

and then visit https://www.google.com/search?q=Under+Armour+Sportstyle+Logo+Tank&newwindow=1&sxsrf=ALeKk012vqIPWbQDpDef2IwbreyIoADXPw:1627344963115&source=lnms&tbm=shop&sa=X&ved=2ahUKEwin-s-z_IHyAhVPUxUIHW_cCCQQ_AUoAXoECBMQAw&biw=2560&bih=1297

Am I falling short somewhere ?

@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

I missed this comment:

I guess we'd better cover path+query right away and not wait for it to be requested from us.

So I went with path only, but I will modify to match against path + query.

gorhill added a commit to gorhill/uBlock that referenced this issue Aug 21, 2021
@uBlock-user
Copy link
Contributor

Still happens on 1.37.3b13 build as per the above repro...

@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

You regex is:

/\/search\?q\=\*tbm=shop|\/shopping/

You are escaping the *, meaning you are trying to match a literal asterisk in the URL, while there is no such character. Is this more what you intended?

/^/search\?q=.*?tbm=shop|^/shopping/

@uBlock-user
Copy link
Contributor

uBlock-user commented Aug 21, 2021

You are escaping the *, meaning you are trying to match a literal asterisk in the URL,

Yeah, need to correct that.

tbm=shop is not always accompanied with ? sometimes it's & btw...

Thanks, works now as expected.

@uBlock-user uBlock-user added the fixed issue has been addressed label Aug 21, 2021
@gorhill
Copy link
Member

gorhill commented Aug 21, 2021

tbm=shop is not always accompanied with ? sometimes it's & btw...

The ? in q=.*?tbm=shop is not a literal ?, .*? means non-greedy match to make regex faster.

@uBlock-user
Copy link
Contributor

Yeah, I realised it after I commented, not used to crafting regexes everyday, thanks for helping out.

@gorhill
Copy link
Member

gorhill commented Jun 3, 2022

this won't work

You are using invalid syntax.

subdomain.example.com should be unaffected

Exclude subdomain.example.com:

example.com,~subdomain.example.com##...

Yuki2718 referenced this issue in AdguardTeam/AdguardFilters Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request fixed issue has been addressed
Projects
None yet
Development

No branches or pull requests

4 participants
@gorhill @uBlock-user @Yuki2718 and others