Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Providing an invalid regex produces a misformatted error message #395

Closed
bdesham opened this issue Mar 7, 2017 · 1 comment
Closed

Providing an invalid regex produces a misformatted error message #395

bdesham opened this issue Mar 7, 2017 · 1 comment
Labels
bug A bug. icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon.

Comments

@bdesham
Copy link

bdesham commented Mar 7, 2017

I tried to search for the regex \s*{, which is invalid—the { needed to be backslash-escaped. The error that ripgrep produced was kind of bizarre though:

Screenshot of my terminal, showing the weird error message from ripgrep

I’m using ripgrep 0.4.0 with the built-in macOS Terminal application. The text encoding is set to UTF-8.

Here is a hex dump of ripgrep’s error message:

$ rg "\s*{" 2>&1 | hexdump -C
00000000  45 72 72 6f 72 20 70 61  72 73 69 6e 67 20 72 65  |Error parsing re|
00000010  67 65 78 20 6e 65 61 72  20 27 5c 73 2a 7b 27 20  |gex near '\s*{' |
00000020  61 74 20 63 68 61 72 61  63 74 65 72 20 6f 66 66  |at character off|
00000030  73 65 74 20 33 3a 20 49  6e 76 61 6c 69 64 20 61  |set 3: Invalid a|
00000040  70 70 6c 69 63 61 74 69  6f 6e 20 6f 66 20 72 65  |pplication of re|
00000050  70 65 74 69 74 69 6f 6e  20 6f 70 65 72 61 74 6f  |petition operato|
00000060  72 20 74 6f 3a 20 27 28  3f 75 3a 5b 09 2d 0d 20  |r to: '(?u:[.-. |
00000070  2d 20 c2 85 2d c2 85 c2  a0 2d c2 a0 e1 9a 80 2d  |- ..-....-.....-|
00000080  e1 9a 80 e2 80 80 2d e2  80 8a e2 80 a8 2d e2 80  |......-......-..|
00000090  a9 e2 80 af 2d e2 80 af  e2 81 9f 2d e2 81 9f e3  |....-......-....|
000000a0  80 80 2d e3 80 80 5d 29  2a 27 2e 0a              |..-...])*'..|

Other queries that end with the bad { produce fine-looking error messages:

$ rg "\s{"                   
Error parsing regex near '\s{' at character offset 3: Missing maximum in counted
repetition operator.
$ rg "a*{"
Error parsing regex near 'a*{' at character offset 2: Invalid application of
repetition operator to: '(?u:a)*'.
@BurntSushi
Copy link
Owner

Oh that's just wonderful. The error message is pretty printing the AST, and since the AST isn't a faithful AST, it ends up emitting an unreadable character class. (The AST stores the expanded form of \s rather than \s itself.)

There are various things that can fix this. One is building our own error message. Another is rewriting the regex parser to itself give better error messages (which is in progress).

@BurntSushi BurntSushi added the bug A bug. label Mar 7, 2017
@BurntSushi BurntSushi added the icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon. label Apr 9, 2017
bors added a commit to rust-lang/regex that referenced this issue Oct 22, 2017
…urntSushi

Remove sub-expressions from parsing error

This PR skips printing sub-expressions in the error message.

This should avoid situations where the error message contains an AST with unreadable characters, as shown [here](BurntSushi/ripgrep#395).

<img width="1030" alt="unreadable-chars" src="https://user-images.githubusercontent.com/12599697/29924193-02147d0c-8e11-11e7-8b36-a97b63f802b4.png">
BurntSushi added a commit that referenced this issue Mar 14, 2018
This update brings with it many bug fixes:

  * Better error messages are printed overall. We also include
    explicit call out for unsupported features like backreferences
    and look-around.
  * Regexes like `\s*{` no longer emit incomprehensible errors.
  * Unicode escape sequences, such as `\u{..}` are now supported.

For the most part, this upgrade was done in a straight-forward way. We
resist the urge to refactor the `grep` crate, in anticipation of it
being rewritten anyway.

Note that we removed the `--fixed-strings` suggestion whenever a regex
syntax error occurs. In practice, I've found that it results in a lot of
false positives, and I believe that its use is not as paramount now that
regex parse errors are much more readable.

Closes #268, Closes #395, Closes #702, Closes #853
BurntSushi added a commit that referenced this issue Mar 14, 2018
This update brings with it many bug fixes:

  * Better error messages are printed overall. We also include
    explicit call out for unsupported features like backreferences
    and look-around.
  * Regexes like `\s*{` no longer emit incomprehensible errors.
  * Unicode escape sequences, such as `\u{..}` are now supported.

For the most part, this upgrade was done in a straight-forward way. We
resist the urge to refactor the `grep` crate, in anticipation of it
being rewritten anyway.

Note that we removed the `--fixed-strings` suggestion whenever a regex
syntax error occurs. In practice, I've found that it results in a lot of
false positives, and I believe that its use is not as paramount now that
regex parse errors are much more readable.

Closes #268, Closes #395, Closes #702, Closes #853
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug. icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon.
Projects
None yet
Development

No branches or pull requests

2 participants