$removeparam doesn't work well with UrlEncoded gb2312 Chinese word #1717

Closed
8 tasks done
MkQtS opened this issue Sep 12, 2021 · 7 comments
Labels: bug (Something isn't working), fixed (issue has been addressed)

Comments

@MkQtS

MkQtS commented Sep 12, 2021

Prerequisites

I tried to reproduce the issue when...

  • uBO is the only extension
  • uBO with default lists/settings
  • using a new, unmodified browser profile

Description

The URL is https://www.baidu.com/s?wd=%D6%D0%CE%C4&oq=test, where %D6%D0%CE%C4 is the URL-encoded form of the Chinese word 中文 in the gb2312 encoding. [Screenshot: before using the filter]

After I add a filter like this: ||baidu.com/s^$removeparam=oq, the URL becomes https://www.baidu.com/s?wd=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD, which makes no sense. [Screenshot: after using the filter]

However, it works well for https://www.baidu.com/s?wd=%E4%B8%AD%E6%96%87&oq=test, where %E4%B8%AD%E6%96%87 is the URL-encoded form of the same word 中文 in UTF-8. After applying the filter, the URL becomes https://www.baidu.com/s?wd=%E4%B8%AD%E6%96%87.

I am not sure whether it's related to the system or the browser. When I type exactly https://www.baidu.com/s?wd=%E4%B8%AD%E6%96%87&oq=test, the Omnibox shows https://www.baidu.com/s?wd=中文&oq=test. When I type https://www.baidu.com/s?wd=%D6%D0%CE%C4&oq=test, the Omnibox shows it unchanged: https://www.baidu.com/s?wd=%D6%D0%CE%C4&oq=test
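The difference between the two forms can be checked directly. The following is a minimal sketch, not taken from the thread: `percentDecodeToBytes` and `isValidUtf8` are hypothetical helper names, and the explanation that the Omnibox only prettifies sequences that are valid UTF-8 is an assumption. It shows that the UTF-8 form decodes cleanly while the gb2312 bytes are not a valid UTF-8 sequence.

```javascript
// Hypothetical helper: turn a percent-encoded string into raw bytes.
// Characters that are not part of a %XX escape are assumed to be ASCII.
function percentDecodeToBytes(s) {
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    const hex = s.slice(i + 1, i + 3);
    if (s[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      bytes.push(s.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: true if the bytes form a valid UTF-8 sequence.
// A TextDecoder created with { fatal: true } throws on malformed input.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

const utf8Form = '%E4%B8%AD%E6%96%87'; // 中文 encoded as UTF-8
const gb2312Form = '%D6%D0%CE%C4';     // 中文 encoded as gb2312 (per the report)

console.log(encodeURIComponent('中文') === utf8Form);       // true
console.log(isValidUtf8(percentDecodeToBytes(utf8Form)));   // true
console.log(isValidUtf8(percentDecodeToBytes(gb2312Form))); // false
```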

A specific URL where the issue occurs

https://www.baidu.com/s?wd=%D6%D0%CE%C4&oq=test

Steps to Reproduce

1. Open https://www.baidu.com/s?wd=%D6%D0%CE%C4&oq=test
2. Add a filter: ||baidu.com/s^$removeparam=oq
3. Refresh

Expected behavior

The url becomes https://www.baidu.com/s?wd=%D6%D0%CE%C4

Actual behavior

The URL becomes https://www.baidu.com/s?wd=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD

uBlock Origin version

1.37.2

Browser name and version

Chrome 93.0.4577.63

Operating System and version

Windows 10, 21H1

@gorhill
Member

gorhill commented Sep 12, 2021

I can reproduce. It seems the issue is when parsing the query parameters using URLSearchParams -- this is surprising, I would expect this API to properly handle encoded query values.
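The behavior can be reproduced outside uBO. This is a small sketch rather than uBO's code, and `removeParamViaSearchParams` is a hypothetical name: URLSearchParams decodes percent-escapes as UTF-8, replaces each invalid byte sequence with U+FFFD, and then serializes U+FFFD back as %EF%BF%BD.

```javascript
// Hypothetical helper: remove a parameter by round-tripping the query
// string through URLSearchParams (decode, delete, re-serialize).
function removeParamViaSearchParams(query, name) {
  const params = new URLSearchParams(query);
  params.delete(name);
  return params.toString();
}

// gb2312-encoded value: the bytes are not valid UTF-8, so they are lost.
console.log(removeParamViaSearchParams('wd=%D6%D0%CE%C4&oq=test', 'oq'));
// → wd=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD

// UTF-8-encoded value: survives the round trip unchanged.
console.log(removeParamViaSearchParams('wd=%E4%B8%AD%E6%96%87&oq=test', 'oq'));
// → wd=%E4%B8%AD%E6%96%87
```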


So if I understand correctly, the URL is encoded using the page encoding.

gwarser added the bug label Sep 12, 2021
gorhill added a commit to gorhill/uBlock that referenced this issue Sep 12, 2021
@gorhill
Member

gorhill commented Sep 12, 2021

Fix is in 1.37.3rc2, but rc1 dev build in Chrome store is pending review, so I don't know when rc2 will be available in Chrome store.
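The thread does not show the change itself. One way to avoid the re-encoding, sketched here as an assumption and not as uBO's actual fix, is to drop the parameter from the raw query string without ever decoding the values. `removeParamRaw` is a hypothetical name, and this version only matches parameter names that appear unencoded in the URL.

```javascript
// Hypothetical helper: remove a query parameter while leaving every other
// name=value pair byte-for-byte untouched (no decode/re-encode round trip).
function removeParamRaw(url, name) {
  // Set the fragment aside so a '?' inside it is not mistaken for the query.
  const hashPos = url.indexOf('#');
  const beforeHash = hashPos === -1 ? url : url.slice(0, hashPos);
  const hash = hashPos === -1 ? '' : url.slice(hashPos);

  const queryPos = beforeHash.indexOf('?');
  if (queryPos === -1) { return url; }

  const kept = beforeHash.slice(queryPos + 1).split('&').filter(pair => {
    const eqPos = pair.indexOf('=');
    const rawKey = eqPos === -1 ? pair : pair.slice(0, eqPos);
    return rawKey !== name; // raw comparison: encoded key names are not matched
  });

  const query = kept.length !== 0 ? '?' + kept.join('&') : '';
  return beforeHash.slice(0, queryPos) + query + hash;
}

console.log(removeParamRaw('https://www.baidu.com/s?wd=%D6%D0%CE%C4&oq=test', 'oq'));
// → https://www.baidu.com/s?wd=%D6%D0%CE%C4
```

With this approach the gb2312-encoded value reaches the server exactly as the page produced it, which matches the expected behavior described in the report.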

uBlock-user added the fixed label Sep 12, 2021
@gwarser

gwarser commented Sep 13, 2021

@MkQtS from where did you originally get this gb2312-encoded "中文"?

@vtriolet

It looks like a similar URLSearchParams issue will cause encoded parameter values to be displayed incorrectly on the strict-blocking page:

https://github.com/gorhill/uBlock/blob/89064478dda8c32b467e95cbd7dfbd3f8ecdde07/src/js/document-blocked.js#L155-L164

[Screenshot: ubo_encoded_params_strict_blocking, from 1.37.3rc2]

(I'm not sure if the protocol is to file a new ticket, but I thought I'd at least mention it here first for confirmation.)

@gorhill
Member

gorhill commented Sep 13, 2021

I am pretty sure this is a site issue, since the site appears to custom-encode some parts of the URL which should be encoded using encodeURIComponent(), but uBO will have to be ready for decodeURIComponent() failing on invalid input.
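Being "ready" here could look like the following sketch. It is an illustration under stated assumptions, not uBO's code, and `safeDecodeURIComponent` is a hypothetical name: decodeURIComponent() throws a URIError on byte sequences that are not valid UTF-8, so the wrapper falls back to the raw string for display.

```javascript
// Hypothetical helper: decode for display, but never throw. If the input
// is not valid percent-encoded UTF-8, return it unchanged.
function safeDecodeURIComponent(s) {
  try {
    return decodeURIComponent(s);
  } catch (e) {
    return s;
  }
}

console.log(safeDecodeURIComponent('%E4%B8%AD%E6%96%87')); // → 中文
console.log(safeDecodeURIComponent('%D6%D0%CE%C4'));       // → %D6%D0%CE%C4
```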

@MkQtS
Author

MkQtS commented Sep 13, 2021

> @MkQtS from where did you originally get this gb2312-encoded "中文"?

@gwarser I am Chinese. A long time ago, I noticed that Chinese characters were sometimes shown as %XX%XX in the address bar, and then I learned that this is URL encoding. Some websites provide tools for this kind of encoding.
When I found this problem, the URL was really long, so I wrote a shorter URL to better illustrate the problem.

@gwarser

gwarser commented Sep 13, 2021

@MkQtS I ask because I tried various ways and I always get correct utf-8 encoding. There is even an ie=utf-8 parameter in the search URL.
