-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Request: Early Bail Out for Time-Consuming Regex Matches in pcre2 #2786
Comments
FWIW, here's a faster way to match two words in the same line: |
It serves me well, thank you. |
Or also, Anyway, PCRE2 doesn't support a way to say, "if the search is taking longer than X time, stop." I don't know of any regex engine that supports that. If you think for a moment about how something like that would be implemented, it's pretty clear why: imagine trying to check a timeout value when you're in the middle of some vectorized SIMD loop. It would trash performance. And this sort of timeout check would need to be in virtually every loop everywhere. Ain't going to happen. PCRE2, like most regex engines, does expose other types of resource limits. But none of them really approximate "time." Your best bet is to filter out longer lines yourself ( |
I don't mean to be pedantic, but I'll add a few comments since it's an interestic topic 🙂 The .NET regex engine supports timeouts (there's a constructor overload which takes a timeout parameter). This feature is mainly provided for searching with untrusted patterns in order to avoid DoS attacks. I haven't looked at how it's implemented but I believe a check is inserted when backtracking (and maybe in some other cases such as lookarounds). The check is simple enough but it still needs to retrieve the clock when enabled. In any case this article states:
As for PCRE2, it has a |
Describe your feature request
When using
rg
with--engine pcre2
for regex patterns that include complex assertions such as lookaheads, the search can take an excessive amount of time, especially with very large input strings. It would be beneficial to have an option that allowsrg
to skip strings that exceed a certain processing time threshold, thus preventing long-running searches.The
rg
command at the bottom of the code block below took around 10 minutes to complete.An early bail-out option in
rg
would help users prioritize search speed when processing complex regex patterns on large data, avoiding long waits for unlikely matches.Example Usage
In this example,
--max-match-time-pcre2 1000
is a new option that instructsrg
to spend no more than 1000 milliseconds attempting to find a match in a line. If the match attempt exceeds this duration,rg
will skip it.Background Info
To provide some background information on why this feature request was created: After downloading the
Mailing List Archive
forzsh
,rg
was used to search for two words in the same line.However, there is at least one base64 string with approximately 700,000 characters, which caused
rg
to take a considerable amount of time to process.zsh.org/mla/workers/2015/msg03191.html
The text was updated successfully, but these errors were encountered: