Coalesce adjacent loops in concatenation RegexNodes #1838
Merged
This augments the reduction phase of concatenation nodes to combine adjacent one/notone/set loops. For example, `a*a+a{1,2}b` becomes `a{2,}b` (previously added optimizations will then see that the `a` loop can be made atomic and replace it with the equivalent of `(?>a{2,})b`).

This has several benefits. First, it simplifies the node tree, creating less work for the IR writer and for the interpreter/compiler. Second, it gives the compiler more opportunity to choose how the loop should be represented, when and how to unroll it, etc. Third, it enables the auto-atomicity step to apply to more loops (as in the previous example). And most importantly, it can drastically reduce backtracking (especially with the atomicity optimization, but even without it). An expression like `a*a*a*a*a*a*b` run against an input like `aaaaaaaaaaaaaa` could previously take a very long time, because the engine tries every way of distributing the `a`s among the six loops before concluding that there is no `b`; now it's very fast. (Note that this is lacking the anchoring optimization in #1706; once that's in, this example would be an order of magnitude faster still.)
Contributes to #1349
cc: @danmosemsft, @eerhardt, @pgovind, @ViktorHofer, @lpereira
@danmosemsft, FYI, I thought more about your suggestion and added some (limited) reflection-based tests.