GNU grep can't match foreign language characters and outputs everything #3010
Comments
@vovcacik We don't have locale support.
I don't want to make this issue about the … It seems that grep sees the UTF-8 bytes of č (0xC4 0x8D) correctly:
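To confirm which bytes a character occupies without involving grep, you can dump its encoding with od. This is a minimal sketch. It assumes the shell and terminal use UTF-8 and that a POSIX od is available. The helper name show_bytes is hypothetical:

```bash
# Hypothetical helper: print the bytes of a string in hex.
show_bytes() {
	# printf '%s' avoids a trailing newline; -An drops the offset column,
	# -tx1 prints each byte as two hex digits.
	printf '%s' "$1" | od -An -tx1
}

show_bytes 'č'   # U+010D encodes to c4 8d in UTF-8
show_bytes 'c'   # plain ASCII letter: 63
```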
UTF-8 support and locales are different things.
No doubt about that. I am afraid that … made you think this is also a locale problem. I am not saying it is not, but notice that the …
Example of a regex operation that requires a locale would be …
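Here is one illustration of a locale-dependent regex operation. It is not necessarily the example meant in this comment. What `.` matches depends on the locale. In the C locale, grep works on bytes, so the two-byte č needs two dots. This is a minimal sketch, and the helper count_matches is hypothetical:

```bash
# Hypothetical helper: count how many lines of text $2 match regex $1
# when grep runs under locale $3.
count_matches() {
	# grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
	# the printed count while ignoring that exit status.
	printf '%s\n' "$2" | LC_ALL="$3" grep -c -- "$1" || true
}

# In the C locale "." matches a single byte, and č is two bytes.
count_matches '^.$' 'č' C    # prints 0
count_matches '^..$' 'č' C   # prints 1
# In a working UTF-8 locale (e.g. C.UTF-8, where installed) '^.$' would
# match č instead, because "." then matches one character.
```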
More findings:
I guess … So I've got my workaround; feel free to close this if you don't consider it a bug.
@vovcacik Thanks a lot for reporting! I'm unable to reproduce it on a device I tested with just now. The transcript below indicates that I cannot reproduce your problem, right?
As seen, both busybox grep and GNU grep correctly find only the matching line. This is regardless of me setting … Some things to try:
Does that make a difference? If not, could you paste the output from running …?
Yes, it appears to work on your device. You could double-check that you are running GNU grep from … I'll try the suggestions as soon as possible and get back to you.
I tried the suggestion to …
I can confirm the problem on ARM with Android 7.1, but busybox grep works as intended.
@vovcacik Just out of curiosity, what do … give you?
@tomty89 Interesting. grep switches to binary mode and stops printing the rest of the line when it hits … But I can't say whether this is expected or not.
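For context, GNU grep treats input as binary when it contains NUL bytes. In a multibyte locale such as UTF-8, it also does so when the input has byte sequences that are not valid in that locale. Instead of printing the matching lines, it then reports that the binary file matches. Suppose the C library wrongly judged valid UTF-8 as invalid (the thread suspects an old bionic bug). A character like č could then plausibly trigger this. The sketch below uses a NUL byte instead, which triggers binary mode in any locale. It shows how `-a` (`--text`) forces text mode. The helper name sample is hypothetical:

```bash
# Hypothetical input: the first line contains a NUL byte.
sample() { printf 'abc\0def\nxyz\n'; }

# Default: GNU grep detects binary input and reports that it matches
# instead of printing the line (the message wording, and whether it goes
# to stdout or stderr, differ between grep versions).
sample | grep abc

# -a (--text) forces text mode, so the whole matching line is printed;
# tr shows the NUL byte as '@'.
sample | grep -a abc | tr '\0' '@'   # prints abc@def
```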
Hmm, it looks even more messed up than I thought (I had assumed the newlines were being ignored for some reason, for example because multiple characters were being treated as a single character). Now I wonder if …
I know. Most likely it's some old bionic bug.
That actually makes the problem look even more irrational. It seems grep ignores newlines, but only in a peculiar manner? (Ignoring them partially in the final output but not when matching?) Not sure if it's relevant, but I can't make grep in Termux do what's in your second post. In Arch (proot) I can make that happen by unsetting LANG or setting it to …
When the libandroid-support dependency is set, the script does:

```bash
if [ "$TERMUX_PKG_DEPENDS" != "${TERMUX_PKG_DEPENDS/libandroid-support/}" ]; then
	# If using the android support library, link to it and include its headers as system headers:
	CPPFLAGS+=" -isystem $TERMUX_PREFIX/include/libandroid-support"
	LDFLAGS+=" -landroid-support"
fi
```
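The test in that snippet relies on bash pattern substitution. `${TERMUX_PKG_DEPENDS/libandroid-support/}` is the dependency string with the first occurrence of libandroid-support removed. It therefore differs from the original string exactly when that substring is present. Below is a minimal sketch of the same check as a standalone function. The function name and the sample dependency strings are hypothetical and do not come from a real package:

```bash
# Hypothetical helper mirroring the check above: succeeds when the
# dependency list mentions libandroid-support.
uses_android_support() {
	local depends=$1
	# ${depends/libandroid-support/} deletes the first match; if the result
	# differs from the original string, the substring was present.
	[ "$depends" != "${depends/libandroid-support/}" ]
}

uses_android_support "libandroid-support, pcre" && echo "linking -landroid-support"
uses_android_support "pcre" || echo "no support library"
```

Because this is a plain substring test, a dependency whose name merely contains libandroid-support would also match.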
grep packages for testing:
I know, which is silly. I don't see any reason we should have a symlink for one of the headers but not the other (and explicitly depend on libandroid-support package by package whenever we notice a problem). In fact, I'm not sure there's a good reason for not putting them directly under …
The updated …
@tomty89 Agreed, this whack-a-mole of adding libandroid-support whenever a problem pops up is a bit silly.
It's fixed, thank you!
See #5171.
Hi, I noticed that GNU grep has trouble matching Czech characters, so it outputs more lines than it should.
Reproducible example:
Shortened output
I think the LANG variable should make this work, but I do set LC_ALL along with it. Unfortunately that fails, but that is a problem I reported separately in #3009.
tl;dr
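Here is a minimal check of the expected behavior. It uses hypothetical sample lines rather than the original reproducible example. Searching for č should print only the line that contains it. That holds whether grep compares raw bytes (C locale) or decoded characters (UTF-8 locale). The helper name sample_lines is hypothetical:

```bash
# Hypothetical sample text: only the first line contains "č".
sample_lines() {
	printf '%s\n' 'čaj' 'ahoj' 'dobrý den'
}

# A working grep prints only "čaj"; the buggy behavior described in this
# issue would print extra lines.
sample_lines | grep 'č'
sample_lines | LC_ALL=C grep -c 'č'   # prints 1
```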