simplify and speed up /.*.../ handling
See RT #123743.

A pattern that starts /.*/ has a fake MBOL or SBOL flag added, along with PREGf_IMPLICIT.

The idea is that, with /.*.../s, if the NFA doesn't match when started at pos 0, then it's not going to match if started at any other position either; while /.*.../ won't match at any other start position up until the next \n.

However, the branch in regexec() that implemented this was a bit of a mess (like much in the perl core, it had gradually accreted), and caused intuit-enabled /.*.../ and /.*... patterns to go quadratic.

The branch looked roughly like:

    if (anchored) {
        if (regtry(s)) goto success;
        if (can_intuit) {
            while (s < end) {
                s = intuit(s+1);
                if (!s) goto fail;
                if (regtry(s)) goto success;
            }
        }
        else {
            while (s < end) {
                s = skip_to_next_newline(s);
                if (regtry(s)) goto success;
            }
        }
    }

The problem is that in the presence of a .* at the start of the pattern, intuit() will always return either NULL on failure, or the start position, rather than any later position. So the can_intuit branch above calls regtry() on every character position.

This commit fixes this by changing the structure of the code to be like this, where it only tries things on newline boundaries:

    if (anchored) {
        if (regtry(s)) goto success;
        while (1) {
            s = skip_to_next_newline(s);
            if (can_intuit) {
                s = intuit(s+1);
                if (!s) goto fail;
            }
            if (regtry(s)) goto success;
        }
    }

This makes the code a lot simpler, and mostly avoids quadratic behaviour (you can still get it with a string consisting mainly of newlines).
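To make the difference between the two loop shapes concrete, here is a minimal standalone sketch in C that only counts how many times a never-matching regtry() would be attempted. It is not the code in regexec(): toy_intuit(), toy_skip_past_newline(), count_regtry_old() and count_regtry_new() are hypothetical names invented for this illustration. The sketch assumes that intuit() hands back the position it was given (the behaviour described above for a leading .*), and that skipping to the next newline yields the position just after it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for intuit() on a pattern starting with .*:
 * per the commit message it returns the start position it was given
 * (the real function can also return NULL on failure). */
static const char *toy_intuit(const char *s)
{
    return s;
}

/* Hypothetical helper: position just after the next '\n' in [s, end),
 * or NULL if no newline remains. */
static const char *toy_skip_past_newline(const char *s, const char *end)
{
    const char *nl = memchr(s, '\n', (size_t)(end - s));
    return nl ? nl + 1 : NULL;
}

/* Old can_intuit branch: one attempt at the start, then one attempt
 * at every following character position. */
size_t count_regtry_old(const char *text)
{
    assert(text != NULL);
    const char *s = text;
    const char *end = text + strlen(text);
    size_t calls = 1;               /* initial regtry(s), assumed to fail */

    while (s < end) {
        s = toy_intuit(s + 1);
        if (!s)
            break;                  /* "goto fail" in the original */
        calls++;                    /* regtry(s), assumed to fail */
    }
    return calls;
}

/* New structure: one attempt at the start, then one attempt per
 * newline boundary. */
size_t count_regtry_new(const char *text)
{
    assert(text != NULL);
    const char *s = text;
    const char *end = text + strlen(text);
    size_t calls = 1;               /* initial regtry(s), assumed to fail */

    for (;;) {
        s = toy_skip_past_newline(s, end);
        if (!s)
            break;                  /* no further line starts to try */
        s = toy_intuit(s);
        if (!s)
            break;                  /* "goto fail" in the original */
        calls++;                    /* regtry(s), assumed to fail */
    }
    return calls;
}
```

Under these assumptions, a 14-character string with two newlines such as "aaaa\nbbbb\ncccc" costs 15 attempts with the old shape and 3 with the new one, while a string made only of newlines costs the same in both, which is the residual case mentioned above. Since each failed attempt of a pattern beginning with .* may itself scan much of the remaining line, attempts proportional to the string length are what pushes the old shape towards quadratic time.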
Showing 2 changed files with 58 additions and 80 deletions.