Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve/refine syntax highlighter #1134

Closed
13 of 17 tasks
gorhill opened this issue Jul 1, 2020 · 37 comments
Closed
13 of 17 tasks

Improve/refine syntax highlighter #1134

gorhill opened this issue Jul 1, 2020 · 37 comments
Labels
enhancement New feature or request fixed issue has been addressed

Comments

@gorhill
Copy link
Member

gorhill commented Jul 1, 2020

Instead of having countless issues opened for every single improvements/refining that could be brought to the syntax highlighter, here is one meta issue to handle them all. I will list all the actionable items in the top comment. Feel free to report un-handled cases or suggest more refinements.

To do:

  • Add a filter list which purpose is to validate the static filtering parser
    • Done:
      • https://raw.githubusercontent.com/gorhill/uBlock/master/docs/tests/static-filtering-parser-checklist.txt
      • Subscribe to it and disable it after the list is imported, then use asset viewer to see the result.
  • *$xhr,empty should be reported as valid
  • The following redirect rules should be reported as invalid:
    • *$beacon,redirect-rule=empty
    • *$ping,redirect-rule=empty
    • *$websocket,redirect-rule=empty
  • Detect invalid !# directives
    • Support auto-completion for valid directives
  • Detect unmatched !#if-!#endif directives
  • Handle line continuation -- I need to read and wrap my mind around CodeMirror's state object
  • Detect and fix regex-based filters starting with /.*: they are inefficient and the leading .* accomplishes nothing.
  • Ambiguous option separator, see Improve/refine syntax highlighter #1134 (comment).

Nice to have:

  • Buttons to navigate to next/prev problematic filter in asset viewer and "My filters
  • Warn in gutter about untokenizable static network filters
  • Create clickable links for URLs present in comments
    • Implemented as follow: double clicking the URL will select whole URL, at which point you can use your browser's "Open in new tab" feature on the selection.
  • Code folding ability:
    • !#if ... / !#endif blocks
    • !#include blocks
@uBlock-user uBlock-user added the TODO todo label Jul 1, 2020
@Yuki2718
Copy link

Yuki2718 commented Jul 2, 2020

Warn in gutter about untokenizable static network filters
This will be huge plus!

gorhill added a commit to gorhill/uBlock that referenced this issue Jul 5, 2020
Related issue:
- uBlockOrigin/uBlock-issues#1134

Specifically;

- `beacon`, `ping`, and `websocket` cannot be redirected;
- it's ok to not specify a type when redirecting to `empty`
  resource;
- `csp=` option can't be mixed with other types, redirec
  directives, and more `csp=` options.
gorhill added a commit to gorhill/uBlock that referenced this issue Jul 8, 2020
Related issue:
- uBlockOrigin/uBlock-issues#1134

Invalid values for `!#if ...` will be highlighted as errors.

Auto completion is now supported for both the directives
themselves and the valid values for `!#if ...`.

For examples, when pressing ctrl-space:

- `!#e` will auto-complete to `!#endif`
- `!#i` will offer to choose between `!#if ` or `!#include `
- `!#if fir` will auto-complete to `!#if env_firefox`

Additionally, support for some of AdGuard preparsing
directives, i.e. `!#if adguard` is now a valid and will be
honoured -- it always evaluate to `false` in uBO.
gorhill added a commit to gorhill/uBlock that referenced this issue Jul 10, 2020
Related issue:
- uBlockOrigin/uBlock-issues#1134

CodeMirror's code folding reference:
- https://codemirror.net/doc/manual.html#addon_foldcode

This commit adds support for code-folding to the filter
list editor/viewer.

The following blocks of code are foldable by clicking the
corresponding marker in the gutter:

- !#if/#endif blocks
- !#include blocks

Addtionally, the following changes:

- The `!#include` line is now preserved when importing a
  sublist
- The `!#if` directives will be syntax-colored according
  to whether they evaluate to true or false on the current
  platform
- Double-clicking on a foldable line in the gutter will
  select the content of the foldable block
- Minor visual improvement to matching brackets
@gwarser
Copy link

gwarser commented Jul 11, 2020

This is invalid now:

$all,~document

easylist/easylist#5621 (comment)

gorhill/uBlock#2385

@gorhill
Copy link
Member Author

gorhill commented Jul 12, 2020

I will fix this asap. However I would like to understand the purpose of that /adguard. filter in EasyList. Someone knows?

gorhill added a commit to gorhill/uBlock that referenced this issue Jul 12, 2020
@krystian3w
Copy link

Maybe some ad engine have name "adguard...".

@Yuki2718
Copy link

I will fix this asap. However I would like to understand the purpose of that /adguard. filter in EasyList. Someone knows?

Apparently came from the Fanboy's list so @ryanbr would know. Clearly it has nothing to do with AdGuard.
easylist/easylist@648b822

@gwarser
Copy link

gwarser commented Jul 12, 2020

This look like first occurrence: ryanbr/fanboy-adblock@d51ac92

@gorhill
Copy link
Member Author

gorhill commented Jul 12, 2020

If I am getting this right, that /adguard. filter has always been rejected by uBO, since it does not use the all property, ~document can't be used on its own.

I am trying to assess whether I should publish an emergency fix for the all,~document regression.

@gwarser
Copy link

gwarser commented Jul 12, 2020

~document was added recently easylist/easylist@24784f0

This is puzzling: easylist/easylist#5286


[rdk@on assets.json_resources]$ grep '~document' *
DEU-0_DEU EasyList Germany.txt:@@||adonline.anonza.de^$~document,~third-party
easylist_EasyList.txt:/adguard.$~document,domain=~adguard-com.cdn.ampproject.org|~adguard.com|~adguard.mobi|~adguard.oneskyapp.com|~greinr.com|~marketplace.visualstudio.com
easylist_EasyList.txt:@@||lemon-ads.com^$~document,~third-party
[rdk@on assets.json_resources]$ 

@gorhill
Copy link
Member Author

gorhill commented Jul 12, 2020

This is puzzling: easylist/easylist#5286

Yeah, /adguard. on its own would never cause strict-blocking to kick in, so it's unclear what the original issue was.

The other cases seem to be about not blocking 1st-party secondary resources without wholly disabling filtering -- so probably they are targeting ABP specifically.

@gwarser
Copy link

gwarser commented Jul 12, 2020

Yeah, /adguard. on its own would never cause strict-blocking to kick in, so it's unclear what the original issue was.

This was fixed after gorhill/uBlock@41636c5 v1.21 and I think it was fixed very recently in legacy builds. In 1.16.4.22

@gwarser
Copy link

gwarser commented Jul 18, 2020

Are filters with errors actually dropped? Because:

My filters 1 used out of 1

||testwa-blokada.eu.example^$script,domain=gameplanet.onet.pl/gry-online/

image

And nothing is logged in the logger.

v1.28.5b1 in Chome.

MajkiIT/polish-ads-filter#16850 (comment)

@gorhill
Copy link
Member Author

gorhill commented Jul 18, 2020

That one is not dropped but on the other hand it will never match, since no hostname will ever match gameplanet.onet.pl/gry-online/.

I don't think uBO should drop whole filter because one domain= value is invalid, for example:

||testwa-blokada.eu.example^$script,domain=gameplanet.onet.pl/gry-online/|example.com

Surely we still want to enforce the filter on example.com despite that gameplanet.onet.pl/gry-online/ is not a valid domain.

This is why I chose to treat invalid domain= values as soft error, i.e. the filter will not be dropped, and neither the value (or else that would cause the filter to become broader). It gets complicated with negated invalid hostname though...

@gwarser
Copy link

gwarser commented Jul 26, 2020

Turns out some people imported uMatrix ruleset recipes as filter list in uBO https://www.reddit.com/r/uBlockOrigin/comments/hy3nsy/rulesets_for_english_websites_breaking_youtube/

Maybe ask for proper header in filter lists and treat host files more strict?


Another

randomwebsite.org/* $inline-script

https://www.reddit.com/r/uBlockOrigin/comments/hxorev/ublock_origin_page_loading_issue/fzaguh5/

@gorhill
Copy link
Member Author

gorhill commented Jul 26, 2020

randomwebsite.org/* $inline-script

That will be taken care with #1118.

cqx931 pushed a commit to cqx931/AdNauseam that referenced this issue Jul 28, 2020
Related commit:
- gorhill@2eec285

Related feedback:
- uBlockOrigin/uBlock-issues#1134 (comment)

Source commit was imported manually because cherry-pick
didn't work properly dur to earlier changes in target
file.
@jspenguin2017
Copy link

jspenguin2017 commented Aug 24, 2020

Here's a suggestion for the filter viewer/editor, although not exactly related to syntax highlighting:

When double-clicking on a domain name, the whole domain should be selected, instead of just the part that was clicked on.

Use cases:

  • Quickly select and copy domain
  • Quickly select and search domain (double-click, then Ctrl+F)

Current behavior:
image

Suggested behavior:
image


Also when double-clicking on the suffix of a cosmetic filter, the - character should not break selection.

Current behavior:
image

Suggested behavior:
image

@jspenguin2017

This comment has been minimized.

gorhill added a commit to gorhill/uBlock that referenced this issue Oct 11, 2020
gorhill added a commit to gorhill/uBlock that referenced this issue Oct 14, 2020
This allows to bring in all the benefits of
syntax highlighting and enhanced editing
features in the element picker, like auto-
completion, etc.

This is also a necessary step to possibly solve
the following issue:

- #2035

Additionally, incrementally improved the behavior
of uBO's custom CodeMirror static filtering syntax
mode when double-clicking somewhere in a static
extended filter:

- on a class/id string will cause the whole
  class/id string to be   selected, including the
  prepending `.`/`#`.

- somewhere in a hostname/entity will cause all
  the labels from the cursor position to the
  right-most label to be selected (subject to
  change/fine-tune as per feedback of filter
  list maintainers).

Related feedback:
- uBlockOrigin/uBlock-issues#1134 (comment)
gorhill added a commit to gorhill/uBlock that referenced this issue Oct 16, 2020
Related issue:
- uBlockOrigin/uBlock-issues#1134

Double-clicking on...

... a filter option will cause the option to be
wholly selected, including `=[value]` if present;

... a value assigned to a filter option will cause
the value to be wholly selected, except when the
value is a hostname/entity, in which case all the
labels from the cursor position to the right-most
label will be selected.
@gwarser
Copy link

gwarser commented Oct 28, 2020

image

||video-weaver..hls.ttvnw.net/v/playlist/.m3u8$replace=/#EXT-X-DATERANGE:ID="stitched-ad[\s\S](#EXT-X-PROGRAM-DATE-TIME)(?![\s\S]?#EXTINF:\d.\d+\,live)[\s\S]|(#EXT-X-DATERANGE:ID="stitched-ad[\s\S]?(#EXT-X-PROGRAM-DATE-TIME)([\s\S]?(#EXTINF:\d.\d+\,live)))/\$5/

gorhill added a commit to gorhill/uBlock that referenced this issue Dec 2, 2020
@rusty-snake
Copy link

Pipes in regexp of removeparam rules are blue. As there is no regexp highlighting they shouldn't be either.

$removeparam=/^(foo|bar)=/,domain=foo.com|bar.com

Screenshot_2021-10-26_11-41-45-fs8

@u-RraaLL
Copy link
Contributor

I'm entirely not sure if this is within the purview of this issue, but uBO's highlighter doesn't detect errors within the :is() selector (or :where()).

@gorhill
Copy link
Member Author

gorhill commented Oct 26, 2021

uBO uses the browser itself to validate CSS selector. If the browser reports as valid, uBO reports as valid.


From console, Chromium says document.querySelector('div:is(1)'); is valid -- no exception is triggered, so it's reported as valid.

@uBlock-user
Copy link
Contributor

uBlock-user commented Oct 26, 2021

https://developer.mozilla.org/en-US/docs/Web/CSS/:where
https://developer.mozilla.org/en-US/docs/Web/CSS/:is

Both are fully supported by the browser vendors as part of the Selectors-4 spec.

@Yuki2718
Copy link

Before I forget: DandelionSprout/adfilt#163 (reply in thread)

@krystian3w
Copy link

krystian3w commented Oct 26, 2021

but uBO's highlighter doesn't detect errors

uses the browser itself to validate CSS selector

I suppose RaL try said about bad colorize elements inside ( ) for :is / :where and legacy:

:-moz-any (2011)
:-webkit-any (2011)
:matches() - Note: :matches() was renamed to :is() in CSSWG issue #3258 (2018).

@gwarser
Copy link

gwarser commented Oct 26, 2021

Anything you put in :is() or :where() will be valid? https://developer.mozilla.org/en-US/docs/Web/CSS/:is#forgiving_selector_parsing


https://drafts.csswg.org/selectors-4/#typedef-forgiving-selector-list

To parse as a forgiving selector list given an input input:
...
2. Remove all failure items from selector list, and all items that are invalid selectors, then return a representing the remaining items in selector list. (This might be empty.)

@gwarser
Copy link

gwarser commented Oct 27, 2021

Can errors be marked by text-decoration: red wavy underline;? It will make them more distinguishable from tokenization issues and make it more in line with how text editors mark incorrect spelling.

image

@gorhill
Copy link
Member Author

gorhill commented Oct 27, 2021

I tried to use wavy originally, but this not work well in CodeMirror.

Screenshot from 2021-10-27 10-20-47

Eventually I plan to add a gutter markers to better convey extra information.

@gwarser
Copy link

gwarser commented Oct 27, 2021

Not pretty on Chrome, but it's not broken so badly on my side:

image

Nightly 95 uBO dev vs Chrome 95 uBO stable

Different font?

@krystian3w
Copy link

krystian3w commented Oct 27, 2021

Or this is ugly https://developer.mozilla.org/en-US/docs/Web/CSS/text-decoration-skip-ink

@krystian3w
Copy link

@uBlock-user uBlock-user added fixed issue has been addressed and removed TODO todo labels Nov 20, 2022
@MasterKia
Copy link
Member

*$1p,strict3p,script,header=via:1.1 google is highlighted correctly in "My filters"
Screenshot_20230308_214232

But the header=via:1.1 google part is red in assets viewer.

Screenshot_20230308_214206

filterOnHeaders is set to true.

STR:

  • Import https://raw.githubusercontent.com/MasterKia/PersianBlocker/main/PersonalBlocker.txt
  • Click on the "View content" button.

gorhill added a commit to gorhill/uBlock that referenced this issue Mar 8, 2023
gorhill added a commit to gorhill/uBlock that referenced this issue Apr 1, 2023
@MasterKia
Copy link
Member

##+js and :matches-path highlighting problem. Not sure if it's worth fixing though.

example.com##+js(noeval):matches-path(/some-path)

example.com##+js(noeval, x):matches-path(/some-path)

Screenshot_20230705_174103

@gorhill
Copy link
Member Author

gorhill commented Jul 5, 2023

It's not a highlighting issue, since the scriptlet is interpreted as:

example.com##+js(noeval, "x):matches-path(/some-path")

Which is valid.

matches-path is a procedural operator for cosmetic filtering purpose, it's completely unrelated to scriptlet injection.

@uBlock-user
Copy link
Contributor

There are still many tasks unfinished in #1134 (comment), are they abandoned since you closed this issue ?

@gwarser

This comment was marked as resolved.

@gwarser

This comment was marked as resolved.

@MasterKia

This comment was marked as resolved.

@garry-ut99
Copy link

Improve extraction of tokens from regex-based filters - worth to mention also about this commit, as a part of it is related to untokenizable regex syntax highlighting (and it doesn't contain any reference to this thread, nor any key words in the title, like: syntax, highlight, linter).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request fixed issue has been addressed
Projects
None yet
Development

No branches or pull requests

10 participants