More Regex perf improvements #1348

Merged · 30 commits · Jan 9, 2020

Conversation

stephentoub (Member) commented Jan 7, 2020

Found some time over my vacation to play more with our regex implementation.

Primary changes here:

  • Auto-atomicification. Other regex libraries refer to this as auto-possessification, but .NET doesn't have a possessive quantifier, so I'm referring to it as atomicification instead :-), since possessive quantifiers are just shorthand for atomic groups. Backtracking is a huge source of cost in regex handling, in particular when matching fails, and it's really easy to write regular expressions that backtrack even when they don't have to. For example, consider the regex \d{1,2}\/\d{1,2}\/\d{2,4} as an approach to parsing a date: each of those repeaters will backtrack when a match fails, even though such backtracking is useless; since a / can never match \d, there's no way here that backtracking will help. A developer could instead write this using atomic groups, e.g. (?>\d{1,2})\/(?>\d{1,2})\/(?>\d{2,4}), but few developers think to do that. Instead, we can detect these conditions and automatically change the regex to make these loops atomic (see the sketch after this list).
  • Primitives for atomic one, notone, and set loops. To reduce the number of atomic nodes in the tree and to more easily communicate the atomicity of these commonly found loops to the backends, I added atomic versions of the oneloop, notoneloop, and setloop primitives, which are then recognized in the backends. In addition to avoiding backtracking, we can generate better code even for the non-backtracking case. For example, prior to this PR an atomic node around a loop would result in the loop still outputting backtracking state, with the atomic node just causing that state to be skipped over by anyone backtracking; for these primitives we can instead avoid outputting that state to begin with.
  • Specialized code-gen for non-backtracking regexes. It turns out we generate a lot of boilerplate code to handle the possibility of backtracking, even if we never end up backtracking. And if you look at the code generated, it's very hard to follow / nothing like what you'd write if you were writing a custom implementation by hand. If we instead focus only on non-backtracking constructs, we can generate much better / smaller code, akin to what you'd write if you were matching the regex by hand. I added a check to analyze the RegexNode tree to see whether we can support it, and then added a custom code path to generate the better code for these trees. This is helped significantly by the auto-atomicification, as it makes it more likely that a regex will be supported. This already handles a bunch of regexes, but additional work can be done to make it support more, e.g. the main thing it's missing that we can easily add is support for capture groups. As an example, the previously mentioned date regex \d{1,2}\/\d{1,2}\/\d{2,4}: with the code in master, the JIT ends up generating 2137 bytes of asm for the Go method, and with this PR, that shrinks to 758 bytes. As another example, consider a regex that just matches the string "abcdefghijklmnopqrstuvwxyz". With master, that results in a Go method that's 1204 bytes of asm, and just looking at the first six character checks for the unrolled loop portion, you can see why it's so much larger:
       6642837C410C61       cmp      word  ptr [rcx+2*r8+12], 97
       0F8527030000         jne      G_M56610_IG09
       448D4701             lea      r8d, [rdi+1]
       453BC1               cmp      r8d, r9d
       0F83F4030000         jae      G_M56610_IG17
       4D63C0               movsxd   r8, r8d
       6642837C410C62       cmp      word  ptr [rcx+2*r8+12], 98
       0F850A030000         jne      G_M56610_IG09
       448D4702             lea      r8d, [rdi+2]
       453BC1               cmp      r8d, r9d
       0F83D7030000         jae      G_M56610_IG17
       4D63C0               movsxd   r8, r8d
       6642837C410C63       cmp      word  ptr [rcx+2*r8+12], 99
       0F85ED020000         jne      G_M56610_IG09
       448D4703             lea      r8d, [rdi+3]
       453BC1               cmp      r8d, r9d
       0F83BA030000         jae      G_M56610_IG17
       4D63C0               movsxd   r8, r8d
       6642837C410C64       cmp      word  ptr [rcx+2*r8+12], 100
       0F85D0020000         jne      G_M56610_IG09
       448D4704             lea      r8d, [rdi+4]
       453BC1               cmp      r8d, r9d
       0F839D030000         jae      G_M56610_IG17
       4D63C0               movsxd   r8, r8d
       6642837C410C65       cmp      word  ptr [rcx+2*r8+12], 101
       0F85B3020000         jne      G_M56610_IG09
       448D4705             lea      r8d, [rdi+5]
       453BC1               cmp      r8d, r9d
       0F8380030000         jae      G_M56610_IG17
       4D63C0               movsxd   r8, r8d
       6642837C410C66       cmp      word  ptr [rcx+2*r8+12], 102
       0F8596020000         jne      G_M56610_IG09
       448D4706             lea      r8d, [rdi+6]
       453BC1               cmp      r8d, r9d
       0F8363030000         jae      G_M56610_IG17
       4D63C0               movsxd   r8, r8d

In contrast, with this PR, it's only 384 bytes, and the matching for the first six characters looks instead like this:

       6641833961           cmp      word  ptr [r9], 97
       0F8507010000         jne      G_M56613_IG10
       664183790262         cmp      word  ptr [r9+2], 98
       0F85FB000000         jne      G_M56613_IG10
       664183790463         cmp      word  ptr [r9+4], 99
       0F85EF000000         jne      G_M56613_IG10
       664183790664         cmp      word  ptr [r9+6], 100
       0F85E3000000         jne      G_M56613_IG10
       664183790865         cmp      word  ptr [r9+8], 101
       0F85D7000000         jne      G_M56613_IG10
       664183790A66         cmp      word  ptr [r9+10], 102
       0F85CB000000         jne      G_M56613_IG10
  • Vectorized searching. FindFirstChar often ends up in a loop looking for a character; we can use string.IndexOf instead. It's also common to have small sets that contain just two or three characters; we now use IndexOfAny for those. Such changes are most helpful when the input is long and the target characters are infrequent. We now also use IndexOf in some cases in Go. For example, a subexpression like .* will end up searching for \n, and in the non-backtracking implementation we now use Span.IndexOf for that (a sketch follows after this list).
  • Singleton optimization. This is an optimization that converts a character class containing only a single character into a "one" node, which has optimized handling. I'd erroneously removed this optimization in a previous change, on the bad assumption that you'd only end up with one of these if someone manually created such a set in their regex... but the implementation itself creates these, and in common situations. I then went beyond reverting that and added special code-gen for it to EmitCallCharInClass.
  • Delayed JIT'ing. This addresses a regression I introduced as part of a previous change to reduce allocations. In that change I got delegates from the DynamicMethods as part of constructing the compiled Regex, so as to avoid needing to allocate a new delegate every time we ran the regex and couldn't use a cached instance. But in doing so, I ended up causing these methods to be JIT'd as part of constructing the Regex. This can hurt things like start-up, as we'd end up JIT'ing the code for a Regex created in a static cctor, and it can also reduce the parallelism that's otherwise possible when individual regexes are invoked for the first time concurrently.
  • EnsureStorage call. On every trip through the backtracking jump table, we were calling EnsureStorage, which checks the fields on the runner to see if the various arrays need to grow. But that means that before the call to EnsureStorage we need to spill our locals out to the fields, so that it can examine them, and then after the call we need to reload those fields back into our locals. We can instead generate the checks that EnsureStorage does, but against our locals rather than the fields, and then we only need to do the spilling if we actually need to grow the arrays.
  • Allocation reduction in RegexNode tree and RegexWriter. Most intermediate nodes in the tree only ever have 0 or 1 child, and for the 1-child case we were creating a list presized to contain 4 elements. We now allow a single child to be stored directly, and only create the list when adding more than one child. Also, any operation in RegexWriter that involves strings (strings for multi nodes, strings for sets, etc.) was storing those strings in both a dictionary and a list... it turns out the latter was unnecessary, so I got rid of it.
  • Tests. I re-enabled the netfx test configuration, to serve as an oracle and to help ensure we're not regressing. I also ported hundreds of test cases from mono, which it appears were in turn ported from Perl. This caught a few functional regressions introduced in some of the recent perf improvements, so I fixed those up.
  • Terminology. One thing that was really annoying me was terminology used in the implementation for atomic groups. For whatever reason, the implementation was referring to them as "Greedy" groups, which is then easily confused with the greediness of a loop like .* as opposed to a lazy one like .*?. I've renamed all of the relevant usage in the implementation from greedy to atomic, leaving the use of greedy in cases where it was referring to the greedy vs lazy case. I think we should also update the docs to refer to atomic groups / subexpressions rather than greedy groups / subexpressions, as the former is established terminology.
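
To make the auto-atomicification bullet concrete, here's a minimal sketch (inputs are illustrative) showing that the hand-atomicized form a developer could write today behaves identically to the plain form; the optimization simply performs that rewrite automatically when it can prove backtracking can't change the outcome:

```C#
using System;
using System.Text.RegularExpressions;

class AtomicDateDemo
{
    static void Main()
    {
        // The pattern as typically written; every repeater records backtracking state.
        var plain  = new Regex(@"\d{1,2}\/\d{1,2}\/\d{2,4}", RegexOptions.Compiled);

        // The hand-atomicized equivalent; (?>...) forbids backtracking into each repeater.
        var atomic = new Regex(@"(?>\d{1,2})\/(?>\d{1,2})\/(?>\d{2,4})", RegexOptions.Compiled);

        // Because '/' can never match \d, backtracking into the repeaters can never
        // change the outcome, so the two patterns match and fail identically...
        Console.WriteLine(plain.IsMatch("1/7/2020"));  // True
        Console.WriteLine(atomic.IsMatch("1/7/2020")); // True
        Console.WriteLine(plain.IsMatch("12/345"));    // False
        Console.WriteLine(atomic.IsMatch("12/345"));   // False
        // ...but the atomic form does no useless backtracking on the failing input.
    }
}
```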
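And here's a hedged sketch of the kind of vectorized scanning described in the vectorized-searching bullet; the helper and inputs below are illustrative, not the actual generated FindFirstChar code:

```C#
using System;

class VectorizedScanSketch
{
    // Hypothetical FindFirstChar-style helper: jump to the next position where a match
    // could possibly start, assuming the pattern can only begin with 'a', 'b', or 'c'.
    static bool TryFindStart(ReadOnlySpan<char> text, ref int pos)
    {
        int i = text.Slice(pos).IndexOfAny('a', 'b', 'c');
        if (i < 0)
        {
            pos = text.Length; // no possible start anywhere; give up without rescanning
            return false;
        }
        pos += i;
        return true;
    }

    static void Main()
    {
        ReadOnlySpan<char> input = "....!!..b-rest";
        int pos = 0;
        Console.WriteLine(TryFindStart(input, ref pos)); // True
        Console.WriteLine(pos);                          // 8 (the 'b')

        // A subexpression like .* consumes up to the next '\n'; in the non-backtracking
        // path that's a single vectorized IndexOf over the remaining span.
        int newline = input.Slice(pos).IndexOf('\n');
        int consumed = newline < 0 ? input.Length - pos : newline;
        Console.WriteLine(consumed);                     // 6 (rest of the line)
    }
}
```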

Other minor changes:

  • Codegen tweaks. For example, the JIT generates slightly better code for span[i] = x; i++; than it does for span[i++] = x;. Since we're manually writing out the IL, we can just emit the equivalent of the former instead of the latter.
  • Style fixes. Tried to keep these in separate commits.

Running the perf tests from dotnet/performance#1117 on .NET Core 3.1, master, and this PR:

| Method | Toolchain | Options | Mean | Error | StdDev | Ratio | Allocated |
|--------|-----------|---------|-----:|------:|-------:|------:|----------:|
| Email_IsMatch | netcore31 | None | 623.02 ns | 3.051 ns | 2.704 ns | 1.00 | - |
| Email_IsMatch | master | None | 468.31 ns | 2.477 ns | 2.068 ns | 0.75 | - |
| Email_IsMatch | pr | None | 418.27 ns | 1.983 ns | 1.758 ns | 0.67 | - |
| Email_IsNotMatch | netcore31 | None | 1,568.18 ns | 11.984 ns | 10.008 ns | 1.00 | - |
| Email_IsNotMatch | master | None | 1,290.24 ns | 10.016 ns | 8.364 ns | 0.82 | - |
| Email_IsNotMatch | pr | None | 754.01 ns | 4.181 ns | 3.911 ns | 0.48 | - |
| Date_IsMatch | netcore31 | None | 444.95 ns | 1.403 ns | 1.095 ns | 1.00 | - |
| Date_IsMatch | master | None | 195.94 ns | 0.877 ns | 0.777 ns | 0.44 | - |
| Date_IsMatch | pr | None | 184.96 ns | 0.622 ns | 0.552 ns | 0.42 | - |
| Date_IsNotMatch | netcore31 | None | 1,500.69 ns | 7.065 ns | 6.263 ns | 1.00 | - |
| Date_IsNotMatch | master | None | 743.07 ns | 2.210 ns | 2.067 ns | 0.50 | - |
| Date_IsNotMatch | pr | None | 590.19 ns | 2.947 ns | 2.757 ns | 0.39 | - |
| IP_IsMatch | netcore31 | None | 2,802.70 ns | 4.802 ns | 4.256 ns | 1.00 | - |
| IP_IsMatch | master | None | 1,999.82 ns | 19.084 ns | 16.918 ns | 0.71 | - |
| IP_IsMatch | pr | None | 2,005.45 ns | 7.103 ns | 6.644 ns | 0.72 | - |
| IP_IsNotMatch | netcore31 | None | 2,505.86 ns | 10.129 ns | 9.475 ns | 1.00 | - |
| IP_IsNotMatch | master | None | 1,830.37 ns | 7.060 ns | 6.603 ns | 0.73 | - |
| IP_IsNotMatch | pr | None | 1,858.30 ns | 8.521 ns | 7.971 ns | 0.74 | - |
| Uri_IsMatch | netcore31 | None | 560.87 ns | 2.793 ns | 2.612 ns | 1.00 | - |
| Uri_IsMatch | master | None | 290.85 ns | 1.052 ns | 0.878 ns | 0.52 | - |
| Uri_IsMatch | pr | None | 281.03 ns | 1.381 ns | 1.291 ns | 0.50 | - |
| Uri_IsNotMatch | netcore31 | None | 2,549.70 ns | 14.587 ns | 13.644 ns | 1.00 | - |
| Uri_IsNotMatch | master | None | 1,307.79 ns | 14.294 ns | 11.936 ns | 0.51 | - |
| Uri_IsNotMatch | pr | None | 1,135.40 ns | 4.717 ns | 4.412 ns | 0.45 | - |
| MatchesSet | netcore31 | None | 485,601.08 ns | 2,335.707 ns | 1,950.421 ns | 1.00 | 6704 B |
| MatchesSet | master | None | 177,524.14 ns | 922.807 ns | 863.195 ns | 0.37 | 6704 B |
| MatchesSet | pr | None | 174,293.63 ns | 999.666 ns | 834.766 ns | 0.36 | 6704 B |
| MatchesBoundary | netcore31 | None | 393,404.51 ns | 2,637.810 ns | 2,202.690 ns | 1.00 | 6704 B |
| MatchesBoundary | master | None | 131,526.87 ns | 736.618 ns | 652.993 ns | 0.33 | 6704 B |
| MatchesBoundary | pr | None | 135,595.54 ns | 1,538.993 ns | 1,364.277 ns | 0.34 | 6704 B |
| MatchesWord | netcore31 | None | 3,639.83 ns | 20.716 ns | 18.364 ns | 1.00 | 1488 B |
| MatchesWord | master | None | 3,531.38 ns | 16.601 ns | 15.529 ns | 0.97 | 1488 B |
| MatchesWord | pr | None | 3,501.90 ns | 15.131 ns | 13.413 ns | 0.96 | 1488 B |
| MatchesWords | netcore31 | None | 104,433.48 ns | 525.353 ns | 491.415 ns | 1.00 | 3512 B |
| MatchesWords | master | None | 66,421.70 ns | 253.156 ns | 224.416 ns | 0.64 | 3512 B |
| MatchesWords | pr | None | 65,292.04 ns | 305.344 ns | 285.619 ns | 0.63 | 3512 B |
| MatchWord | netcore31 | None | 2,660.92 ns | 16.474 ns | 15.410 ns | 1.00 | 208 B |
| MatchWord | master | None | 1,732.68 ns | 11.956 ns | 10.599 ns | 0.65 | 208 B |
| MatchWord | pr | None | 1,780.01 ns | 35.347 ns | 31.334 ns | 0.67 | 208 B |
| ReplaceWords | netcore31 | None | 114,003.27 ns | 2,144.486 ns | 2,005.954 ns | 1.00 | 9970 B |
| ReplaceWords | master | None | 67,036.62 ns | 506.997 ns | 474.245 ns | 0.59 | 9968 B |
| ReplaceWords | pr | None | 66,631.49 ns | 501.255 ns | 468.875 ns | 0.58 | 9968 B |
| SplitWords | netcore31 | None | 107,249.50 ns | 699.472 ns | 654.287 ns | 1.00 | 11152 B |
| SplitWords | master | None | 66,851.06 ns | 313.541 ns | 277.946 ns | 0.62 | 11152 B |
| SplitWords | pr | None | 68,735.01 ns | 356.457 ns | 333.430 ns | 0.64 | 11152 B |
| Ctor | netcore31 | None | 6,103.89 ns | 40.322 ns | 37.718 ns | 1.00 | 10712 B |
| Ctor | master | None | 5,475.36 ns | 58.453 ns | 54.677 ns | 0.90 | 9752 B |
| Ctor | pr | None | 5,723.42 ns | 26.869 ns | 23.818 ns | 0.94 | 7560 B |
| CtorInvoke | netcore31 | None | 6,714.64 ns | 38.538 ns | 36.049 ns | 1.00 | 13208 B |
| CtorInvoke | master | None | 6,203.35 ns | 39.082 ns | 34.646 ns | 0.92 | 12304 B |
| CtorInvoke | pr | None | 6,710.59 ns | 33.666 ns | 31.491 ns | 1.00 | 9856 B |
| Email_IsMatch | netcore31 | Compiled | 401.20 ns | 2.684 ns | 2.511 ns | 1.00 | - |
| Email_IsMatch | master | Compiled | 156.48 ns | 0.652 ns | 0.610 ns | 0.39 | - |
| Email_IsMatch | pr | Compiled | 146.44 ns | 0.600 ns | 0.469 ns | 0.37 | - |
| Email_IsNotMatch | netcore31 | Compiled | 760.27 ns | 1.364 ns | 1.139 ns | 1.00 | - |
| Email_IsNotMatch | master | Compiled | 417.07 ns | 1.628 ns | 1.522 ns | 0.55 | - |
| Email_IsNotMatch | pr | Compiled | 212.55 ns | 1.259 ns | 1.116 ns | 0.28 | - |
| Date_IsMatch | netcore31 | Compiled | 353.99 ns | 1.439 ns | 1.346 ns | 1.00 | - |
| Date_IsMatch | master | Compiled | 76.42 ns | 0.438 ns | 0.410 ns | 0.22 | - |
| Date_IsMatch | pr | Compiled | 69.44 ns | 0.164 ns | 0.153 ns | 0.20 | - |
| Date_IsNotMatch | netcore31 | Compiled | 1,102.17 ns | 13.635 ns | 12.754 ns | 1.00 | - |
| Date_IsNotMatch | master | Compiled | 241.03 ns | 2.021 ns | 1.791 ns | 0.22 | - |
| Date_IsNotMatch | pr | Compiled | 160.44 ns | 0.971 ns | 0.908 ns | 0.15 | - |
| IP_IsMatch | netcore31 | Compiled | 1,206.32 ns | 5.444 ns | 4.826 ns | 1.00 | - |
| IP_IsMatch | master | Compiled | 383.81 ns | 1.530 ns | 1.431 ns | 0.32 | - |
| IP_IsMatch | pr | Compiled | 360.25 ns | 2.052 ns | 1.819 ns | 0.30 | - |
| IP_IsNotMatch | netcore31 | Compiled | 1,150.44 ns | 4.192 ns | 3.922 ns | 1.00 | - |
| IP_IsNotMatch | master | Compiled | 362.96 ns | 2.240 ns | 1.986 ns | 0.32 | - |
| IP_IsNotMatch | pr | Compiled | 335.52 ns | 2.375 ns | 2.222 ns | 0.29 | - |
| Uri_IsMatch | netcore31 | Compiled | 387.61 ns | 0.343 ns | 0.286 ns | 1.00 | - |
| Uri_IsMatch | master | Compiled | 103.85 ns | 0.447 ns | 0.397 ns | 0.27 | - |
| Uri_IsMatch | pr | Compiled | 94.90 ns | 0.500 ns | 0.468 ns | 0.24 | - |
| Uri_IsNotMatch | netcore31 | Compiled | 1,752.44 ns | 10.116 ns | 9.462 ns | 1.00 | - |
| Uri_IsNotMatch | master | Compiled | 324.73 ns | 1.771 ns | 1.656 ns | 0.19 | - |
| Uri_IsNotMatch | pr | Compiled | 271.42 ns | 0.939 ns | 0.879 ns | 0.15 | - |
| MatchesSet | netcore31 | Compiled | 384,247.92 ns | 874.748 ns | 775.442 ns | 1.00 | 6705 B |
| MatchesSet | master | Compiled | 62,538.17 ns | 386.138 ns | 361.193 ns | 0.16 | 6704 B |
| MatchesSet | pr | Compiled | 45,438.50 ns | 377.098 ns | 352.738 ns | 0.12 | 6704 B |
| MatchesBoundary | netcore31 | Compiled | 296,852.48 ns | 501.091 ns | 468.721 ns | 1.00 | 6705 B |
| MatchesBoundary | master | Compiled | 67,112.18 ns | 296.617 ns | 277.456 ns | 0.23 | 6704 B |
| MatchesBoundary | pr | Compiled | 54,094.37 ns | 335.814 ns | 297.691 ns | 0.18 | 6704 B |
| MatchesWord | netcore31 | Compiled | 1,548.42 ns | 13.773 ns | 11.501 ns | 1.00 | 1488 B |
| MatchesWord | master | Compiled | 1,539.23 ns | 12.271 ns | 10.878 ns | 1.00 | 1488 B |
| MatchesWord | pr | Compiled | 1,501.77 ns | 13.623 ns | 12.076 ns | 0.97 | 1488 B |
| MatchesWords | netcore31 | Compiled | 58,428.19 ns | 264.911 ns | 247.798 ns | 1.00 | 3512 B |
| MatchesWords | master | Compiled | 20,576.37 ns | 128.042 ns | 119.771 ns | 0.35 | 3512 B |
| MatchesWords | pr | Compiled | 11,337.64 ns | 109.763 ns | 102.672 ns | 0.19 | 3512 B |
| MatchWord | netcore31 | Compiled | 1,317.14 ns | 7.897 ns | 7.386 ns | 1.00 | 208 B |
| MatchWord | master | Compiled | 458.43 ns | 3.379 ns | 3.161 ns | 0.35 | 208 B |
| MatchWord | pr | Compiled | 306.76 ns | 3.609 ns | 3.014 ns | 0.23 | 208 B |
| ReplaceWords | netcore31 | Compiled | 59,539.32 ns | 344.087 ns | 321.859 ns | 1.00 | 9968 B |
| ReplaceWords | master | Compiled | 22,202.34 ns | 100.450 ns | 89.047 ns | 0.37 | 9968 B |
| ReplaceWords | pr | Compiled | 12,029.63 ns | 173.355 ns | 162.156 ns | 0.20 | 9968 B |
| SplitWords | netcore31 | Compiled | 59,326.27 ns | 289.645 ns | 256.762 ns | 1.00 | 11152 B |
| SplitWords | master | Compiled | 22,158.89 ns | 86.230 ns | 80.660 ns | 0.37 | 11152 B |
| SplitWords | pr | Compiled | 11,975.59 ns | 171.859 ns | 160.757 ns | 0.20 | 11152 B |
| Ctor | netcore31 | Compiled | 33,124.78 ns | 249.868 ns | 233.727 ns | 1.00 | 36256 B |
| Ctor | master | Compiled | 45,728.61 ns | 606.260 ns | 567.096 ns | 1.38 | 34415 B |
| Ctor | pr | Compiled | 26,362.50 ns | 175.571 ns | 155.639 ns | 0.80 | 27976 B |
| CtorInvoke | netcore31 | Compiled | 6,478,557.36 ns | 58,365.438 ns | 54,595.066 ns | 1.00 | 43271 B |
| CtorInvoke | master | Compiled | 4,980,893.08 ns | 22,577.260 ns | 20,014.152 ns | 0.77 | 36825 B |
| CtorInvoke | pr | Compiled | 4,887,560.49 ns | 15,232.326 ns | 13,503.060 ns | 0.75 | 33329 B |
| Email_IsMatch | netcore31 | IgnoreCase, Compiled | 524.15 ns | 3.031 ns | 2.835 ns | 1.00 | - |
| Email_IsMatch | master | IgnoreCase, Compiled | 314.26 ns | 0.355 ns | 0.296 ns | 0.60 | - |
| Email_IsMatch | pr | IgnoreCase, Compiled | 292.36 ns | 1.502 ns | 1.405 ns | 0.56 | - |
| Email_IsNotMatch | netcore31 | IgnoreCase, Compiled | 1,036.26 ns | 4.353 ns | 4.072 ns | 1.00 | - |
| Email_IsNotMatch | master | IgnoreCase, Compiled | 694.80 ns | 4.730 ns | 3.950 ns | 0.67 | - |
| Email_IsNotMatch | pr | IgnoreCase, Compiled | 371.25 ns | 2.063 ns | 1.929 ns | 0.36 | - |
| Date_IsMatch | netcore31 | IgnoreCase, Compiled | 445.45 ns | 1.993 ns | 1.865 ns | 1.00 | - |
| Date_IsMatch | master | IgnoreCase, Compiled | 100.64 ns | 0.325 ns | 0.304 ns | 0.23 | - |
| Date_IsMatch | pr | IgnoreCase, Compiled | 90.90 ns | 0.681 ns | 0.637 ns | 0.20 | - |
| Date_IsNotMatch | netcore31 | IgnoreCase, Compiled | 1,290.41 ns | 2.768 ns | 2.453 ns | 1.00 | - |
| Date_IsNotMatch | master | IgnoreCase, Compiled | 361.69 ns | 0.650 ns | 0.543 ns | 0.28 | - |
| Date_IsNotMatch | pr | IgnoreCase, Compiled | 255.74 ns | 1.706 ns | 1.596 ns | 0.20 | - |
| IP_IsMatch | netcore31 | IgnoreCase, Compiled | 1,752.59 ns | 3.323 ns | 2.945 ns | 1.00 | - |
| IP_IsMatch | master | IgnoreCase, Compiled | 1,107.62 ns | 5.786 ns | 5.412 ns | 0.63 | - |
| IP_IsMatch | pr | IgnoreCase, Compiled | 1,044.35 ns | 5.011 ns | 4.687 ns | 0.60 | - |
| IP_IsNotMatch | netcore31 | IgnoreCase, Compiled | 1,693.27 ns | 4.067 ns | 3.605 ns | 1.00 | - |
| IP_IsNotMatch | master | IgnoreCase, Compiled | 1,095.16 ns | 6.315 ns | 5.907 ns | 0.65 | - |
| IP_IsNotMatch | pr | IgnoreCase, Compiled | 1,040.71 ns | 4.134 ns | 3.867 ns | 0.61 | - |
| Uri_IsMatch | netcore31 | IgnoreCase, Compiled | 509.52 ns | 3.097 ns | 2.745 ns | 1.00 | - |
| Uri_IsMatch | master | IgnoreCase, Compiled | 195.40 ns | 0.830 ns | 0.776 ns | 0.38 | - |
| Uri_IsMatch | pr | IgnoreCase, Compiled | 194.83 ns | 2.642 ns | 2.342 ns | 0.38 | - |
| Uri_IsNotMatch | netcore31 | IgnoreCase, Compiled | 2,253.63 ns | 10.972 ns | 9.727 ns | 1.00 | - |
| Uri_IsNotMatch | master | IgnoreCase, Compiled | 936.70 ns | 6.150 ns | 5.452 ns | 0.42 | - |
| Uri_IsNotMatch | pr | IgnoreCase, Compiled | 839.51 ns | 7.670 ns | 6.405 ns | 0.37 | - |
| MatchesSet | netcore31 | IgnoreCase, Compiled | 431,135.91 ns | 2,740.110 ns | 2,563.100 ns | 1.00 | 6705 B |
| MatchesSet | master | IgnoreCase, Compiled | 152,123.57 ns | 1,286.211 ns | 1,203.122 ns | 0.35 | 6704 B |
| MatchesSet | pr | IgnoreCase, Compiled | 130,385.23 ns | 619.477 ns | 579.459 ns | 0.30 | 6704 B |
| MatchesBoundary | netcore31 | IgnoreCase, Compiled | 344,429.58 ns | 1,807.146 ns | 1,690.406 ns | 1.00 | 6705 B |
| MatchesBoundary | master | IgnoreCase, Compiled | 112,308.73 ns | 623.921 ns | 553.089 ns | 0.33 | 6705 B |
| MatchesBoundary | pr | IgnoreCase, Compiled | 110,686.87 ns | 1,684.881 ns | 1,576.039 ns | 0.32 | 6704 B |
| MatchesWord | netcore31 | IgnoreCase, Compiled | 7,405.59 ns | 47.152 ns | 44.106 ns | 1.00 | 1488 B |
| MatchesWord | master | IgnoreCase, Compiled | 7,137.93 ns | 24.059 ns | 20.090 ns | 0.96 | 1488 B |
| MatchesWord | pr | IgnoreCase, Compiled | 7,094.48 ns | 74.635 ns | 69.813 ns | 0.96 | 1488 B |
| MatchesWords | netcore31 | IgnoreCase, Compiled | 86,403.27 ns | 525.438 ns | 491.495 ns | 1.00 | 3512 B |
| MatchesWords | master | IgnoreCase, Compiled | 52,272.24 ns | 357.391 ns | 316.818 ns | 0.61 | 3512 B |
| MatchesWords | pr | IgnoreCase, Compiled | 45,188.72 ns | 205.831 ns | 182.463 ns | 0.52 | 3512 B |
| MatchWord | netcore31 | IgnoreCase, Compiled | 2,144.65 ns | 4.196 ns | 3.925 ns | 1.00 | 208 B |
| MatchWord | master | IgnoreCase, Compiled | 1,359.17 ns | 8.709 ns | 8.146 ns | 0.63 | 208 B |
| MatchWord | pr | IgnoreCase, Compiled | 1,183.68 ns | 7.559 ns | 7.070 ns | 0.55 | 208 B |
| ReplaceWords | netcore31 | IgnoreCase, Compiled | 86,858.44 ns | 189.816 ns | 177.554 ns | 1.00 | 9968 B |
| ReplaceWords | master | IgnoreCase, Compiled | 53,635.09 ns | 275.973 ns | 230.450 ns | 0.62 | 9968 B |
| ReplaceWords | pr | IgnoreCase, Compiled | 47,319.73 ns | 281.218 ns | 263.051 ns | 0.54 | 9968 B |
| SplitWords | netcore31 | IgnoreCase, Compiled | 88,568.31 ns | 466.583 ns | 436.442 ns | 1.00 | 11152 B |
| SplitWords | master | IgnoreCase, Compiled | 53,237.67 ns | 418.119 ns | 370.651 ns | 0.60 | 11152 B |
| SplitWords | pr | IgnoreCase, Compiled | 46,646.75 ns | 868.290 ns | 852.777 ns | 0.53 | 11152 B |
| Ctor | netcore31 | IgnoreCase, Compiled | 35,398.57 ns | 369.582 ns | 327.625 ns | 1.00 | 37072 B |
| Ctor | master | IgnoreCase, Compiled | 53,360.30 ns | 474.193 ns | 443.561 ns | 1.51 | 36919 B |
| Ctor | pr | IgnoreCase, Compiled | 31,597.71 ns | 165.355 ns | 146.583 ns | 0.89 | 30032 B |
| CtorInvoke | netcore31 | IgnoreCase, Compiled | 6,761,415.56 ns | 81,006.588 ns | 75,773.613 ns | 1.00 | 44296 B |
| CtorInvoke | master | IgnoreCase, Compiled | 5,628,468.33 ns | 43,293.714 ns | 40,496.967 ns | 0.83 | 39313 B |
| CtorInvoke | pr | IgnoreCase, Compiled | 5,432,093.33 ns | 49,266.756 ns | 46,084.154 ns | 0.80 | 35696 B |

I also tried out the regex redux code from the perf repo, but on the full input5000000.txt input rather than the much reduced one it uses. Results:

  • netcoreapp31: 4.92s
  • master: 3.04s
  • pr: 2.69s

There is still a lot more that can / should be done. I've opened an issue (#1349) to track that work, and to hopefully enlist help in working through the list, and also finding additional improvement opportunities.

cc: @danmosemsft, @ViktorHofer, @eerhardt, @davidwrighton

stephentoub (Member, Author)

@wfurt, several legs are failing here with a stack overflow in the NetworkInformation tests. Can you please take a look?

wfurt (Member) commented Jan 7, 2020

NetworkInformation seems busted across the board. I will take a look. As for the OSX OpenSSL failures, this may be the version mismatch discussed in #1129.

danmoseley (Member)

@stephentoub there are several independent optimizations here -- would it make sense to merge-commit this, rather than squash? If an issue occurs, it would be helpful to be able to bisect, assuming each commit is functional.

stephentoub (Member, Author)

> would it make sense to merge commit this, rather than squash? If an issue occurs, it would be helpful to be able to bisect, assuming each commit is functional

Up to you. My default would be to squash, but we can merge if you like. I can try to clean up the commits to squash some of them individually and make it so that they're actually bisect-able, but I'm not sure how much work that'll be.

danmoseley (Member)

> Up to you. My default would be to squash, but we can merge if you like. I can try to clean up the commits to squash some of them individually and make it so that they're actually bisect-able, but I'm not sure how much work that'll be.

It's large enough that I think it would be worth not squashing, and I think that still has value even if you don't feel like spending more time making them bisect-able. (It would just mean we would have to make them bisect-able at that point, but we'd have a start on it.)

pgovind (Contributor) left a comment

Just a couple comments.

stephentoub force-pushed the moreregexperf branch 2 times, most recently from 429d659 to ec7f4d3, on January 9, 2020 at 02:59
stephentoub (Member, Author) commented Jan 9, 2020

Just as a final example to highlight the differences in code gen...

Here's one of the regular expressions in the regex redux test: <[^>]*>

With master, the generated Go method decompiles into this:

string runtext = base.runtext;
int runtextstart = runtextstart;
int runtextbeg = runtextbeg;
int runtextend = base.runtextend;
int num = runtextpos;
int[] runtrack = base.runtrack;
int runtrackpos = base.runtrackpos;
int[] runstack = base.runstack;
int runstackpos = base.runstackpos;
runtrack[--runtrackpos] = num;
runtrack[--runtrackpos] = 0;
runstack[--runstackpos] = num;
runtrack[--runtrackpos] = 1;
int num4;
if (num<runtextend && runtext[num++] == '<')
{
	int num3;
	num4 = (num3 = runtextend - num) + 1;
	while (--num4 > 0)
	{
		if (runtext[num++] == '>')
		{
			num--;
			break;
		}
	}
	if (num3 > num4)
	{
		runtrack[--runtrackpos] = num3 - num4 - 1;
		runtrack[--runtrackpos] = num - 1;
		runtrack[--runtrackpos] = 2;
	}
	goto IL_00f2;
}
goto IL_0147;
IL_00f2:
if (num<runtextend && runtext[num++] == '>')
{
	num4 = runstack[runstackpos];
	Capture(0, num4, num);
	runtrack[--runtrackpos] = num4;
	runtrack[runtrackpos - 1] = 3;
	goto IL_013e;
}
goto IL_0147;
IL_0147:
while (true)
{
	base.runtrackpos = runtrackpos;
	base.runstackpos = runstackpos;
	EnsureStorage();
	runtrackpos = base.runtrackpos;
	runstackpos = base.runstackpos;
	runtrack = base.runtrack;
	runstack = base.runstack;
	switch (runtrack[runtrackpos++])
	{
	case 1:
		runstackpos++;
		continue;
	case 2:
		goto IL_01b8;
	case 3:
		runstack[--runstackpos] = runtrack[runtrackpos++];
		Uncapture();
		continue;
	}
	break;
}
num = runtrack[runtrackpos];
goto IL_013e;
IL_013e:
runtextpos = num;
return;
IL_01b8:
num = runtrack[runtrackpos++];
num4 = runtrack[runtrackpos++];
if (num4 > 0)
{
	runtrack[--runtrackpos] = num4 - 1;
	runtrack[--runtrackpos] = num - 1;
	runtrack[--runtrackpos] = 2;
}
goto IL_00f2;

With this PR, the generated Go method now decompiles into this:

string runtext = base.runtext;
int runtextpos;
int num = runtextpos = base.runtextpos;
ReadOnlySpan<char> readOnlySpan = runtext.AsSpan(num, runtextend - num);
if (0u < (uint)readOnlySpan.Length && readOnlySpan[0] == '<')
{
    int num2 = readOnlySpan.Slice(1).IndexOf<char>('>');
    if (num2 == -1)
    {
        num2 = readOnlySpan.Length - 1;
    }
    readOnlySpan = readOnlySpan.Slice(num2);
    num += num2;
    if (1u < (uint)readOnlySpan.Length && readOnlySpan[1] == '>')
    {
        base.runtextpos = num + 2;
        Capture(0, runtextpos, base.runtextpos);
    }
}

danmoseley (Member)

That’s way more readable... like I can actually read it.

I'd previously removed multiple delegate allocations that were being incurred every time we needed to run a compiled regex concurrently.  In doing so, however, I inadvertently forced JIT'ing to occur when the Regex was created rather than on first use; that can in turn lead to more start-up overhead for regexes that are created at start-up (e.g. to initialize a static field) but that aren't used at start-up, as well as reduce the potential parallelism that could otherwise occur for distinct regexes being run for the first time concurrently.  This addresses the issue by still caching the delegates, but in addition to, rather than instead of, the DynamicMethod objects.
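
A minimal sketch of the caching shape this commit describes; the type and member names below are illustrative, not the actual RegexCompiler members:

```C#
using System;
using System.Reflection.Emit;

internal sealed class CompiledRegexMethodsSketch
{
    private readonly DynamicMethod _goMethod;
    private Action _go; // delegate created (and the method JIT'd) only on first use

    public CompiledRegexMethodsSketch(DynamicMethod goMethod) => _goMethod = goMethod;

    // Benign race: two threads may each create a delegate on first use; both work.
    public Action Go => _go ??= (Action)_goMethod.CreateDelegate(typeof(Action));
}

class Program
{
    static void Main()
    {
        // Stand-in for the real compiled Go method: an empty dynamic method.
        var dm = new DynamicMethod("NoOp", typeof(void), Type.EmptyTypes);
        dm.GetILGenerator().Emit(OpCodes.Ret);

        var cached = new CompiledRegexMethodsSketch(dm);
        cached.Go(); // first access creates the delegate; later accesses reuse it
    }
}
```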
I had removed it because it seemed illogical that anyone would write such character classes.  But it turns out the implementation/parser itself generates them for certain forms, e.g. "hello|hithere" will create a character class containing just 'h' in order to find the start of the alternation.
- Only use ilg from "macro" helpers
- Move some blocks of code around
Takes advantage of vectorization in string.IndexOf.
The JIT generates slightly better code for:
```C#
text[pos] = ....;
pos++;
```
instead of:
```C#
text[pos++] = ...;
```
so use the IL equivalent of the former.
Only spill if we have to grow the relevant arrays.
In an expression like A*B, backtracking may be involved even if it's impossible that it'll do any good, namely if there's no overlap between the A and B sets.  This backtracking can be avoided by using a greedy subexpression, e.g. (?>A*)B, but a) few developers think to do that, and b) even when they do, the implementation has more overhead than is desirable.

This change does three things:
1. It introduces primitives for "oneloopgreedy" and "setloopgreedy" that correspond to the existing "oneloop" (a single character in a loop) and "setloop" (a char class in a loop) primitives.
2. It reduces a Greedy node that contains a one/setloop to instead just be a one/setloopgreedy node.
3. It finds cases of one/setloop concatenated with other things, and automatically updates those to be one/setloopgreedy when it can prove that the backtracking would have no benefit.

(2) removes a small amount of overhead, in particular in compiled, because it avoids needing to emit additional tracking state for backtracking.

(3) can likely be expanded in the future to catch more cases, but it handles the common cases now.  This has a very measurable impact on non-matches that would otherwise trigger backtracking into the subexpression.
When written as a* these are already transformed appropriately by the parser, but other reductions can expose them in a way that the parser won't see.  For example, (?:a)* ends up being a much more expensive regex even though it's identical functionally.  This reduces the latter to the former, for cases where a {Lazy}loop is found to directly wrap a Set/One/Notone.
When we can determine from the regex that backtracking isn't needed, we
can generate much smaller / better code in RegexCompiler.GenerateGo.
I initially mistakenly thought this wouldn't be useful, but it is.  For example, the expression ".*\n" can be made non-backtracking.
With all the changes being made in the regex implementation in this release, we want to make sure we don't regress, and a good way to help with that is to add a bunch of tests and ensure they pass on netfx as well, where appropriate.
The Boolean logic was previously wrong, leading to characters being included in the set even if they weren't meant to be, e.g. given the set [^bc], this should include everything other than b and c, but if invariant culture was being used, it incorrectly included B and C in the set.
The greedy naming is really confusing (we should fix it in the framework documentation as well).

In general regular expression nomenclature, "greedy" is used as the opposite of "lazy", indicating how much a loop initially consumes, e.g. does `a*` first try consuming as many 'a's as possible or does it first try consuming as few 'a's as possible... it's "greedy" (denoted as `a*`) if it consumes as many as possible and "lazy" (denoted as `a*?`) if it consumes as few.

How aggressively it consumes, however, is orthogonal to whether it backtracks.  Whereas `a*` is greedy and backtracking and `a*?` is lazy and backtracking, `(?>a*)` is greedy and non-backtracking and `(?>a*?)` is lazy and non-backtracking.

Unfortunately, the nomenclature in the implementation and the documentation describes the `(?> ... )` as being a "greedy subexpression", which then conflates the meaning of "greedy".

The rest of the industry refers to these instead as "atomic", so I've changed it to that in the implementation.
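
To illustrate that distinction, here's a hedged sketch (inputs are arbitrary): greediness controls how much a loop tries to consume first, while atomicity controls whether it will later give anything back.

```C#
using System;
using System.Text.RegularExpressions;

class GreedyLazyAtomicDemo
{
    static void Main()
    {
        // Greediness: how much the loop tries to consume first. All three match "aaab".
        Console.WriteLine(Regex.Match("aaab", @"a*b").Value);     // "aaab" (greedy)
        Console.WriteLine(Regex.Match("aaab", @"a*?b").Value);    // "aaab" (lazy)
        Console.WriteLine(Regex.Match("aaab", @"(?>a*)b").Value); // "aaab" (greedy + atomic)

        // Atomicity: whether the loop will give characters back. The atomic group
        // refuses to backtrack, so it never releases the final 'a' and the match fails.
        Console.WriteLine(Regex.IsMatch("aaa", @"a*a"));     // True  (backtracks once)
        Console.WriteLine(Regex.IsMatch("aaa", @"(?>a*)a")); // False (won't give one back)
    }
}
```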
- In Go, improve code generation for several constructs to reduce code size and increase bounds reduction in the non-backtracking compiler
- In FindFirstChars, use IndexOf instead of Boyer-Moore if the found prefix is only one character in length.
- Hide some debug-only functions from code coverage
stephentoub (Member, Author)

Merging with a merge commit per Dan's request

stephentoub merged commit 0434b87 into dotnet:master on Jan 9, 2020
stephentoub deleted the moreregexperf branch on January 9, 2020 at 10:42
stephentoub added the tenet-performance (Performance related issue) label on Jan 12, 2020
ghost locked as resolved and limited conversation to collaborators on Dec 11, 2020