-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor common prefix text out of Regex alternations #2171
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
stephentoub
added
area-System.Text.RegularExpressions
tenet-performance
Performance related issue
labels
Jan 24, 2020
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/tests/RegexReductionTests.cs
Outdated
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Outdated
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/tests/RegexReductionTests.cs
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/tests/RegexReductionTests.cs
Show resolved
Hide resolved
danmoseley
reviewed
Jan 24, 2020
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Show resolved
Hide resolved
Given a regex like "this|that|there", we will now factor out the common prefix into a new node concatenated with the alternation, e.g. "th(?:is|at|ere)". This has a few benefits, including exposing more text to FindFirstChar if this is at the beginning of the sequence, reducing backtracking, and enabling further reduction/optimization opportunities in the alternation.
stephentoub
force-pushed
the
alternaterefactor
branch
from
January 27, 2020 16:19
53ee687
to
fe88ba0
Compare
eerhardt
reviewed
Jan 27, 2020
...braries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexBoyerMoore.cs
Outdated
Show resolved
Hide resolved
eerhardt
approved these changes
Jan 27, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pgovind
approved these changes
Jan 27, 2020
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Given a regex like "this|that|there", we will now factor out the common prefix into a new node concatenated with the alternation, e.g. "th(?:is|at|ere)". This has a few benefits, including exposing more text to FindFirstChar if this is at the beginning of the sequence, reducing backtracking, and enabling further reduction/optimization opportunities in the alternation.
For example, in this trivial microbenchmark,
this|that
will be refactored intoth(?:is|at)
, which will cause FindFirstChar to generate a Boyer-Moore search for "th" instead of just looking for 't', and it'll cause Go to generate a check for "th" before the alternation, such that it won't re-check "th" when it has to backtrack and try the second branch of the alternation.Without whitespace diff: https://github.com/dotnet/runtime/pull/2171/files?w=1
Contributes to #1349
cc: @danmosemsft, @eerhardt, @pgovind, @ViktorHofer