Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize newline handling for RegexOptions.Multiline #34566

Merged
merged 2 commits into from
Apr 9, 2020

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Apr 5, 2020

We previously didn't do any special handling of beginning-of-line anchors (^ when RegexOptions.Multiline is specified). This PR adds special handling for the anchor so that FindFirstChar will jump to the next newline as part of its processing.

As part of this, I also cleaned up some of the anchor handling code. The RegexPrefixAnalyzer only ever returns a single anchor, but the rest of the code was written such that it was expecting multiple anchors.

Example:
Finding all lines in Romeo and Juliet that contain the word "love":

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.IO;
using System.Text.RegularExpressions;

public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    private readonly string _input = File.ReadAllText(@"C:\romeojuliet.txt");
    private Regex _regex;

    [Params(false, true)]
    public bool Compiled { get; set; }

    [GlobalSetup]
    public void Setup() => _regex = new Regex(@"^.*\blove\b.*$", RegexOptions.Multiline | (Compiled ? RegexOptions.Compiled : RegexOptions.None));

    [Benchmark]
    public int Count() => _regex.Matches(_input).Count;
}
Method Toolchain Compiled Mean Error StdDev Median Ratio RatioSD
Count \netcore31\corerun.exe False 21.184 ms 0.1938 ms 0.1619 ms 21.179 ms 2.15 0.05
Count \master\corerun.exe False 9.868 ms 0.2870 ms 0.2545 ms 9.784 ms 1.00 0.00
Count \pr\corerun.exe False 4.436 ms 0.0482 ms 0.0428 ms 4.445 ms 0.45 0.01
Count \netcore31\corerun.exe True 16.929 ms 0.3705 ms 1.0630 ms 16.122 ms 3.96 0.23
Count \master\corerun.exe True 4.292 ms 0.0415 ms 0.0388 ms 4.272 ms 1.00 0.00
Count \pr\corerun.exe True 2.167 ms 0.0045 ms 0.0040 ms 2.166 ms 0.50 0.00

Contributes to #1349
cc: @pgovind, @eerhardt, @ViktorHofer, @danmosemsft

@stephentoub stephentoub added this to the 5.0 milestone Apr 5, 2020
Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Nice improvement!

Copy link
Contributor

@pgovind pgovind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!

@danmoseley
Copy link
Member

danmoseley commented Apr 6, 2020

Another case to handle if @jzabroski still plans to revive dotnet/corefx#41195

edit: I meant #1449

@jzabroski
Copy link
Contributor

@danmosemsft I do plan to. Just got trapped under mountain of COVID-19 DR planning work. I expect to give it the old college try in about 2 weeks. Thanks for thinking of me and tagging me.

@danmoseley
Copy link
Member

Sure @jzabroski.. we will get there eventually!

We previously didn't do any special handling of beginning-of-line anchors (^ when RegexOptions.Multiline is specified).  This PR adds special handling for the anchor so that FindFirstChar will jump to the next newline as part of its processing.

As part of this, I also cleaned up some of the anchor handling code.  The RegexPrefixAnalyzer only ever returns a single anchor, but the rest of the code was written such that it was expecting multiple anchors.
Also factor out a few lines of duplication.
@stephentoub stephentoub merged commit a83fe5f into dotnet:master Apr 9, 2020
@stephentoub stephentoub deleted the bolffc branch April 9, 2020 16:36
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants