Attempting to use a newline character in pattern with the PCRE2 engine, outside multiline mode fails silently #1261

Closed
dprobinson opened this issue Apr 21, 2019 · 2 comments
Labels
bug A bug.

Comments

@dprobinson

What version of ripgrep are you using?

ripgrep 11.0.1 (rev e7829c0)
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)

How did you install ripgrep?

Compiled from source

What operating system are you using ripgrep on?

Arch Linux

Describe your question, feature request, or bug.

Using a newline character ("\n") in a pattern with the PCRE2 engine (-P) outside multiline mode fails silently.

If this is a bug, what are the steps to reproduce the behavior?

Create the following corpus.txt:

testing
RIP
GREP
again
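
For convenience, one way to create this file from a shell is sketched below. It assumes a POSIX-style `printf` and writes `corpus.txt` into the current directory; nothing else about the setup is implied.

```shell
# Write the four corpus lines, one per line, each terminated by a newline.
printf '%s\n' testing RIP GREP again > corpus.txt
```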

First, try the search with Rust's engine:

rg "RIP\nGREP" corpus.txt

Output:

the literal '"\n"' is not allowed in a regex

Consider enabling multiline mode with the --multiline flag (or -U for short).
When multiline mode is enabled, new line characters can be matched.

As expected (from the documentation), we see an error.

Now try searching using the PCRE2 engine:

rg -P "RIP\nGREP" corpus.txt

Output (none):

As we see, we've tried to use a newline char without multiline mode, but receive no error. The search fails silently.

N.B. Searching using the PCRE2 engine, with multiline mode (-U) works as expected:

rg -PU "RIP\nGREP" corpus.txt

Output:

2:RIP
3:GREP

If this is a bug, what is the actual behavior?

See output above

If this is a bug, what is the expected behavior?

I would expect ripgrep to throw an error, warning the user that newlines cannot be matched outside multiline mode (-U) when using PCRE2. This would yield the same behaviour as when Rust's engine is used (#1055).

@BurntSushi
Owner

BurntSushi commented Apr 21, 2019

This can't be fixed because PCRE doesn't expose anything to parse its syntax to detect the newline characters in the pattern. It might be possible to detect some simple cases without parsing the regex, but I don't know how far down that road I want to go.
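
To illustrate what detecting "some simple cases" without parsing the regex might look like, here is a rough user-side sketch in POSIX shell. It is not ripgrep's code, and the helper name `pattern_mentions_newline` is hypothetical. It only spots a raw line feed or an unescaped `\n` escape; it knows nothing about other ways a PCRE2 pattern can match a newline (for example `\x0a`, `\R`, `\s`, or negated character classes), and it can give a false positive inside `\Q...\E` quoting.

```shell
# Hypothetical helper, not part of ripgrep.
# Returns 0 if the pattern appears to mention a newline, 1 otherwise.
pattern_mentions_newline() {
    pat=$1
    nl='
'
    # Case 1: a raw line feed embedded in the pattern string.
    case $pat in
        *"$nl"*) return 0 ;;
    esac
    # Case 2: an unescaped `\n`. Walk the pattern left to right and consume
    # escape pairs two characters at a time, so that `\\n` (an escaped
    # backslash followed by a plain `n`) is not mistaken for `\n`.
    while [ -n "$pat" ]; do
        case $pat in
            '\n'*) return 0 ;;
            '\'?*) pat=${pat#??} ;;
            *)     pat=${pat#?} ;;
        esac
    done
    return 1
}

# Example: warn before running a PCRE2 search without multiline mode.
if pattern_mentions_newline 'RIP\nGREP'; then
    echo 'warning: pattern mentions a newline; consider -U with -P' >&2
fi
```

A check like this could sit in a wrapper script around `rg -P`, but it stays a heuristic: it looks at the pattern text only and cannot know how PCRE2 will actually interpret it.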

@dprobinson
Author

That's a shame. I thought the reason might be something like that.

Maybe it would be worth tweaking the documentation to make it clear that an error won't be thrown when using PCRE2?

Current text:

For example, when multiline mode is not enabled (the default), then the regex \p{any} will match any Unicode codepoint other than \n. Similarly, the regex \n is explicitly forbidden, and if you try to use it, ripgrep will return an error.

@BurntSushi added the "bug" label on Apr 22, 2019