Error: control/filtering/set_url | scanning filter contents: bufio.Scanner: token too long | 400 on blocklist update #6003

Closed
4 tasks done
ppfeufer opened this issue Jul 12, 2023 · 26 comments

@ppfeufer

Prerequisites

Platform (OS and CPU architecture)

Linux/ARM64

Installation

GitHub releases or script from README

Setup

On one machine

AdGuard Home version

v0.107.34

Action

Trying to update my blocklist via the UI.

Expected result

Blocklist updating successfully.

Actual result

Error: control/filtering/set_url | scanning filter contents: bufio.Scanner: token too long | 400


Additional information and/or screenshots

This is a blocklist I have been using for a long time. After today's update, I noticed that it is listed with 0 entries.

So I tried to update it manually by editing and saving, which resulted in this error message.

Blocklist URL: https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist

@ppfeufer
Author

Verbose log:

2023/07/12 20:19:55.609026 4179874#71 [debug] started POST 138.201.77.133:8100 /control/filtering/set_url
2023/07/12 20:19:55.609319 4179874#71 [debug] filtering: set name to "[GitHub] ppfeufer/adguard-filter-list", url to https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist, enabled to true for filter https://github.com/ppfeufer/adguard-filter-list/blob/master/blocklist?raw=true
2023/07/12 20:19:55.609540 4179874#71 [debug] filtering: downloading update for filter 1642338271 from "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist"
2023/07/12 20:19:55.609749 4179874#64 [debug] home: customdial: dialing addr "raw.githubusercontent.com:443" for network tcp
2023/07/12 20:19:55.609877 4179874#82 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:19:55.609980 4179874#81 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:19:55.610162 4179874#64 [debug] dnsServer.Resolve: "raw.githubusercontent.com": [{185.199.108.133 } {185.199.109.133 } {185.199.110.133 } {185.199.111.133 } {2606:50c0:8000::154 } {2606:50c0:8001::154 } {2606:50c0:8002::154 } {2606:50c0:8003::154 }]
2023/07/12 20:19:55.640904 4179874#71 [debug] filtering: filter 1642338271 from url "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist" has no changes, skipping
2023/07/12 20:19:55.641207 4179874#71 [error] POST 138.201.77.133:8100 /control/filtering/set_url: scanning filter contents: bufio.Scanner: token too long
2023/07/12 20:19:55.641338 4179874#71 [debug] finished POST 138.201.77.133:8100 /control/filtering/set_url in 32.290566ms

@ppfeufer
Author

And when trying to add it as new blocklist:

2023/07/12 20:22:53.763555 4179874#132 [debug] started POST 138.201.77.133:8100 /control/filtering/add_url
2023/07/12 20:22:53.763801 4179874#132 [debug] filtering: downloading update for filter 1689185952 from "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist"
2023/07/12 20:22:53.764040 4179874#134 [debug] home: customdial: dialing addr "raw.githubusercontent.com:443" for network tcp
2023/07/12 20:22:53.764157 4179874#135 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:22:53.764252 4179874#136 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:22:53.764315 4179874#134 [debug] dnsServer.Resolve: "raw.githubusercontent.com": [{185.199.108.133 } {185.199.109.133 } {185.199.110.133 } {185.199.111.133 } {2606:50c0:8000::154 } {2606:50c0:8001::154 } {2606:50c0:8002::154 } {2606:50c0:8003::154 }]
2023/07/12 20:22:53.796294 4179874#132 [debug] filtering: filter 1689185952 from url "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist" has no changes, skipping
2023/07/12 20:22:53.796391 4179874#132 [error] filtering: os.Chtimes(): chtimes /opt/AdGuardHome/data/filters/1689185952.txt: no such file or directory
2023/07/12 20:22:53.796585 4179874#132 [error] POST 138.201.77.133:8100 /control/filtering/add_url: Couldn't fetch filter from URL "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist": scanning filter contents: bufio.Scanner: token too long
2023/07/12 20:22:53.796629 4179874#132 [debug] finished POST 138.201.77.133:8100 /control/filtering/add_url in 33.089971ms

@ainar-g
Contributor

ainar-g commented Jul 12, 2023

Thanks for the report. We've introduced an optimization that limits the RAM consumed by the update check by limiting the length of a single rule to 1024 bytes, and it seems like your list has 66 rules longer than that:

grep -e '^.\{1024,\}' -- ./blocklist | wc

Moreover, none of these rules seem to be DNS rules; most are content-blocking rules. You can filter them out with a script like:

sed '/^.\{1024,\}/d' ./blocklist > ./blocklist_dns
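
For list maintainers who would rather do this filtering in a build script, the same idea can be sketched in Python. This is only an illustration, not part of AdGuard Home: the function name `drop_long_lines` is hypothetical, and it measures length in UTF-8 bytes, whereas the `sed` pattern above counts characters in the current locale.

```python
def drop_long_lines(lines, limit=1024):
    """Keep only the lines shorter than `limit` bytes when encoded as UTF-8.

    Mirrors the sed one-liner above, which deletes lines of 1024 or more
    characters; counting bytes instead of characters is an assumption.
    """
    return [line for line in lines if len(line.encode("utf-8")) < limit]


if __name__ == "__main__":
    # Hypothetical sample rules; the middle one is exactly 1024 bytes long.
    rules = ["||example.org^", "x" * 1024, "||ads.example.net^"]
    for rule in drop_long_lines(rules):
        print(rule)
```

Counting bytes is the stricter choice for lists with non-ASCII rules, since a multi-byte character brings a line closer to a byte limit than a character count suggests.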

@ppfeufer
Author

Ah, I see. I'll try that.

ppfeufer added a commit to ppfeufer/adguard-filter-list that referenced this issue Jul 12, 2023
@ainar-g ainar-g added the waiting for data Waiting for users to provide more data. label Jul 12, 2023
@ppfeufer
Author

Success!

After tweaking the transformation option in my hostlist-compiler settings, it's all working again. Thanks for the quick answer and the hint!

@mphin

mphin commented Jul 13, 2023

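Thanks for the report. We've introduced an optimization that limits the RAM consumed by the update check by limiting the length of a single rule to 1024 bytes, and it seems like your list has 66 rules longer than that: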

grep -e '^.\{1024,\}' -- ./blocklist | wc

Moreover, none of these rules seem to be DNS rules; most are content-blocking rules. You can filter them out with a script like:

sed '/^.\{1024,\}/d' ./blocklist > ./blocklist_dns

Since updating to v0.107.34, I have encountered this error. I subscribed to someone else's rules, so what should I do?

@monsm

monsm commented Jul 13, 2023

Error: control/filtering/add_url | Couldn't fetch filter from URL "https://raw.gitmirror.com/monsm/XXKiller/main/x.txt": line at index 44290: character at index 91: non-printable character | 400
@ainar-g, what should I do?

@ppfeufer
Author

Ask the maintainer of that list to use HostlistCompiler and apply the Validate transformation. That is the easiest way to generate compatible lists, and it is what fixed my issue.

Example: https://github.com/ppfeufer/adguard-filter-list/blob/master/hostlist-compiler-config.json

@ppfeufer
Author

Since quite a number of filter lists are used with both AdGuard Home and ad-blocker extensions for browsers (µBlock, AdGuard, etc.), I guess we'll see this issue popping up for a number of these lists.

@mphin

mphin commented Jul 13, 2023

Thank you. It seems that only the rule maintainer can make the changes.

@monsm

monsm commented Jul 13, 2023

@ppfeufer, could you help me see how to implement it with HostlistCompiler? https://github.com/monsm/XXKiller/blob/mae/RMaker/make.cmd

@ppfeufer
Author

All can be found here » https://github.com/ppfeufer/adguard-filter-list

@monsm

monsm commented Jul 13, 2023

@ppfeufer, please check my revision to see if there are any mistakes. Thanks.
https://raw.githubusercontent.com/monsm/XXKiller/mae/.github/workflows/xxkiller.yml
https://raw.githubusercontent.com/monsm/XXKiller/mae/RMaker/make.cmd

@ppfeufer
Author

This is beyond the scope and topic of this issue.

How to use the HostListCompiler is well explained in their repository (https://github.com/AdguardTeam/HostlistCompiler). Please have a look there.

@ainar-g
Contributor

ainar-g commented Jul 13, 2023

Upon reinspecting the code, I think we can actually allow larger lines without losing the optimization for the most common case. We can also improve the error message. I'm going to reopen the issue now and commit a fix soon.

@ainar-g ainar-g reopened this Jul 13, 2023
@ainar-g ainar-g self-assigned this Jul 13, 2023
@ainar-g ainar-g added this to the v0.107.35 milestone Jul 13, 2023
adguard pushed a commit that referenced this issue Jul 13, 2023
Updates #6003.

Squashed commit of the following:

commit 1cc4230
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Thu Jul 13 13:47:41 2023 +0300

    all: fix chlog

commit e835084
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Thu Jul 13 13:40:45 2023 +0300

    rulelist: imp longer line handling
@ainar-g
Contributor

ainar-g commented Jul 13, 2023

The line-length limit has been relaxed, and the error message now includes the character in question:

line 66499: character 92: non-printable character '\u200c'
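
A list maintainer can look for such characters before publishing a list. The sketch below is a rough Python approximation, not AdGuard Home's actual validation: the function name `find_non_printable` is hypothetical, it relies on Python's `str.isprintable()` (which may flag more or fewer characters than AdGuard Home does, tabs for instance), and it reports 0-based indices, which is an assumption about how the numbers in the message above are counted.

```python
def find_non_printable(text):
    """Return (line_index, char_index, char) for each non-printable character.

    Indices are 0-based. Lines are split on "\n" only, so a stray "\r"
    from CRLF line endings is reported as non-printable as well.
    """
    hits = []
    for line_idx, line in enumerate(text.split("\n")):
        for char_idx, ch in enumerate(line):
            if not ch.isprintable():
                hits.append((line_idx, char_idx, ch))
    return hits


if __name__ == "__main__":
    # Hypothetical sample list with a zero-width non-joiner in the second rule.
    sample = "||example.org^\n||bad\u200cdomain.example^\n"
    for line_idx, char_idx, ch in find_non_printable(sample):
        print(f"line {line_idx}: character {char_idx}: non-printable character {ch!r}")
```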

@ainar-g ainar-g closed this as completed Jul 13, 2023
@monsm

monsm commented Jul 13, 2023

The line-length limit has been relaxed, and the error message now includes the character in question:

line 66499: character 92: non-printable character '\u200c'

Could AdGuard Home fix the error automatically, for example by deleting the offending line?

@ainar-g
Contributor

ainar-g commented Jul 13, 2023

@monsm, from what I understand, the error is there to prevent users from putting e.g. binary files instead of text ones. There is a similar check against HTML text too. What kind of error are you getting? Perhaps the check could be relaxed.

@monsm

monsm commented Jul 13, 2023

@monsm, from what I understand, the error is there to prevent users from putting e.g. binary files instead of text ones. There is a similar check against HTML text too. What kind of error are you getting? Perhaps the check could be relaxed.

The errors come from ZWNJ and ZWSP characters in the rules, but I don't know how to remove the unsupported lines from the rules.
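
For anyone in the same position who maintains the list themselves, here is a minimal Python sketch of two possible clean-ups: stripping the zero-width characters, or dropping the affected lines entirely. The function names are hypothetical, and only ZWSP (U+200B) and ZWNJ (U+200C) are handled, since those are the two characters named in this thread.

```python
# Code points named in this thread: zero-width space and zero-width non-joiner.
ZERO_WIDTH_CHARS = ("\u200b", "\u200c")

# Translation table that maps each of those code points to "delete".
_DELETE_TABLE = {ord(ch): None for ch in ZERO_WIDTH_CHARS}


def strip_zero_width(text):
    """Remove ZWSP and ZWNJ characters but keep every line."""
    return text.translate(_DELETE_TABLE)


def drop_lines_with_zero_width(text):
    """Drop every line that contains a ZWSP or ZWNJ character."""
    kept = [
        line
        for line in text.split("\n")
        if not any(ch in line for ch in ZERO_WIDTH_CHARS)
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    # Hypothetical sample list; the second rule contains a ZWNJ.
    sample = "||example.org^\n||bad\u200cdomain.example^\n||ads.example.net^"
    print(strip_zero_width(sample))
    print(drop_lines_with_zero_width(sample))
```

Stripping keeps the rule but may change what it matches, while dropping the line is the safer choice when the character's origin is unclear.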

adguard pushed a commit that referenced this issue Jul 13, 2023
Updates #6003.

Squashed commit of the following:

commit 1874860
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Thu Jul 13 19:36:26 2023 +0300

    filtering/rulelist: imp test

commit 871a41a
Author: Ainar Garipov <A.Garipov@AdGuard.COM>
Date:   Thu Jul 13 19:10:35 2023 +0300

    filtering/rulelist: relax validation
@monsm

monsm commented Jul 14, 2023

@ainar-g, does this commit (2adc862) relax the check for ZWNJ, ZWSP, and other special characters? And what has the 1024-byte line-length limit been changed to?

@ainar-g
Contributor

ainar-g commented Jul 14, 2023

@monsm, yes, and we have added test cases for that to make sure that they keep working. The hard line-length limit has been returned to 64 KiB.
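
To illustrate what a hard per-line limit means for memory use, here is a minimal Python sketch of a bounded line scanner that fails on an over-long line instead of buffering it. It is only an analogy for the "token too long" error in this issue, not AdGuard Home's Go code: `scan_lines`, `LineTooLongError`, and `MAX_LINE` are hypothetical names, and the 64 KiB value is taken from the comment above.

```python
import io

MAX_LINE = 64 * 1024  # 64 KiB, the hard limit mentioned above


class LineTooLongError(ValueError):
    """Raised when a single line exceeds the allowed length."""


def scan_lines(stream, max_line=MAX_LINE):
    """Yield lines (without the trailing b"\\n") from a binary stream.

    At most max_line + 1 bytes are read per call, so one huge line
    cannot make the scanner allocate an unbounded buffer.
    """
    while True:
        chunk = stream.readline(max_line + 1)
        if not chunk:
            return  # end of stream
        line = chunk[:-1] if chunk.endswith(b"\n") else chunk
        if len(line) > max_line:
            raise LineTooLongError(f"line longer than {max_line} bytes")
        yield line


if __name__ == "__main__":
    # Hypothetical data: one short rule followed by a rule over the limit.
    data = io.BytesIO(b"||example.org^\n" + b"x" * (MAX_LINE + 1) + b"\n")
    try:
        for rule in scan_lines(data):
            print(rule.decode("utf-8"))
    except LineTooLongError as err:
        print("error:", err)
```

The trade-off is the one discussed in this thread: a small limit saves memory but rejects lists with long content-blocking rules, while a larger limit accepts them and uses a bigger buffer.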

@fbaijnauth

Hello
Will there be a fix for this issue? I am receiving "Error: control/filtering/set_url | scanning filter contents: bufio.Scanner: token too long | 400" when trying to access the following filter:
https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/specific_app.txt

@ainar-g
Contributor

ainar-g commented Jul 14, 2023

@fbaijnauth, please read above. The fix is already on the Edge channel. The README has instructions on testing the Edge and Beta versions. (Do not forget to back up your configuration.)

@fbaijnauth

thank you

@Jefffish09

@ainar-g May I ask, when will the stable version of v0.107.35 be released?

@ainar-g ainar-g modified the milestones: v0.107.36, v0.107.35 Jul 26, 2023
@ainar-g
Contributor

ainar-g commented Jul 26, 2023

@Jefffish09, about 15 minutes ago, heh.

@ainar-g ainar-g unpinned this issue Jul 26, 2023