Accept unicode letters in keyword search terms #4745

Closed
philrz opened this issue Aug 10, 2023 · 2 comments · Fixed by #4799 or #4796

philrz commented Aug 10, 2023

At the time of the filing of this issue, Zed is at commit b8e1b18.

A user said in a community Slack thread:

I've started to use simplified search expressions. It annoys me that when I search for a person with a non-ASCII name, I have to quote the name. For instance, zq 'bjorndal | {id,name}' is allowed, but if I run zq 'bjørndal | {id, name}' I get this error:

zq: error parsing Zed at column 3:
bjørndal | {id, name}
= ^ ===

and I have to go back and edit this expression to zq '"bjørndal" | {id, name}'. It all seems to depend on the definition of https://zed.brimdata.io/docs/language/search-expressions#keyword-search-term which says that only a very limited set of ASCII letters are part of keywords. I wish you would extend this. Most modern languages just follow the Unicode classifications of letters to define stuff like this.

We discussed this one as a team, and the consensus was that it should not be too difficult to support this by extending the parser.
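To make the request concrete, here is a minimal Python sketch contrasting an ASCII-only keyword rule with one based on Unicode letter classification. It is not the Zed parser and does not reproduce the exact keyword grammar from the docs. The pattern `ASCII_KEYWORD`, the helpers `is_ascii_keyword` and `is_unicode_keyword`, and the choice of `_` and `$` as extra allowed characters are all assumptions made for illustration.

```python
import re

# Assumed ASCII-only rule: a letter, '_' or '$', then letters, digits, '_' or '$'.
# This approximates a restrictive keyword definition; it is not Zed's grammar.
ASCII_KEYWORD = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_ascii_keyword(term: str) -> bool:
    """Return True if the whole term matches the assumed ASCII-only rule."""
    return ASCII_KEYWORD.fullmatch(term) is not None


def is_unicode_keyword(term: str) -> bool:
    """Hypothetical Unicode-aware rule.

    The first character must be a Unicode letter, '_' or '$'; the remaining
    characters may also be Unicode decimal digits.
    """
    if not term:
        return False

    def is_start(ch: str) -> bool:
        # str.isalpha() is True for characters in the Unicode letter categories.
        return ch.isalpha() or ch in "_$"

    def is_rest(ch: str) -> bool:
        return is_start(ch) or ch.isdecimal()

    return is_start(term[0]) and all(is_rest(ch) for ch in term[1:])


if __name__ == "__main__":
    for term in ["bjorndal", "bjørndal", "тест", "9lives"]:
        print(term, is_ascii_keyword(term), is_unicode_keyword(term))
```

Under these assumptions, `bjørndal` fails the ASCII-only rule but passes the Unicode-aware one, which is the behavior the user is asking for with unquoted search terms.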

Here's a repro using the data in input.csv.

$ cat input.csv 
id,name
1,bjorndal
2,bjørndal

$ zq -version
Version: v1.9.0-7-gb8e1b188

$ zq 'bjorndal' input.csv
{id:1.,name:"bjorndal"}

$ zq 'bjørndal' input.csv
zq: error parsing Zed at column 3:
bjørndal
= ^ ===

@artemklevtsov commented:

The same with yield:

$ echo '{"тест": "значение"}' | zq -Z 'yield тест' -
# zq: error parsing Zed at column 7:
# yield тест
#   === ^ ===

mattnibs added several commits that referenced this issue on Oct 5, Oct 10, Oct 13, and Oct 16, 2023. One of the Oct 16 commits carries the message:

Support Unicode Identifiers

Part of: #4745

philrz commented Oct 17, 2023

Verified in Zed commit 24b9cc9.

Repeating the original repro steps, I can now search for and find the Unicode text.

$ zq -version
Version: v1.10.0-10-g24b9cc95

$ zq 'bjørndal' input.csv
{id:2.,name:"bjørndal"}

Thanks @mattnibs!

philrz linked a pull request on Oct 17, 2023 that will close this issue.