Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) #61786

Closed
performanceautofiler bot opened this issue Nov 18, 2021 · 8 comments · Fixed by #67184
Closed

[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) #61786

performanceautofiler bot opened this issue Nov 18, 2021 · 8 comments · Fixed by #67184

Comments

@performanceautofiler
Copy link

Run Information

Architecture arm64
OS Windows 10.0.19041
Baseline 10e107debae28de4bd1e710cfe448be3c293e841
Compare 9962c10813de695782baa20f1bf0bb2e9810971d
Diff Diff

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
Uri_IsNotMatch - Duration of single invocation 315.14 ns 557.11 ns 1.77 0.20 False

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'

Payloads

Baseline
Compare

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)


Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@DrewScoggins DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Nov 18, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Nov 18, 2021
@DrewScoggins DrewScoggins added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark arch-arm64 os-windows labels Nov 18, 2021
@DrewScoggins
Copy link
Member

Seems related to #61490. This is the only regression we detected, but it is fairly large (80%)

@DrewScoggins DrewScoggins changed the title [Perf] Changes at 11/17/2021 8:43:26 PM [Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) Nov 18, 2021
@DrewScoggins
Copy link
Member

@stephentoub

@stephentoub
Copy link
Member

This is the only regression we detected, but it is fairly large (80%)

I'm actually pleased it's the only one detected.

I expected and accepted this one in exchange for gains elsewhere. Though looking at it again there's probably somethings we can do to improve it.

@stephentoub
Copy link
Member

The issue here actually isn't what I thought it was.

The core of the issue is:

// If the two nodes don't agree on options in any way, don't try to optimize them.
if (node.Options != subsequent.Options)
{
return false;
}

This test has this sequence:

[\w]+://

Previously we'd use this CanBeMadeAtomic to check whether : can match \w, find it can't, and thus turn the \w+ loop into an atomic one. However, the use of IgnoreCase in this test is causing an issue. The cited PR optimizes the "://" to be case-sensitive, even in the presence of IgnoreCase, because none of the characters there participate in case conversion. And as a result, CanBeMadeAtomic bails early when it sees that the case-sensitivity of the two nodes differs. So the regression is because the \w+ stays non-atomic and thus causes us to backtrack more than we did previously.

The good news is the problem will just go away with #61048. There are workarounds we could put in place in the interim, but I'm tempted to say we should just prioritize fully addressing that issue asap rather than trying to patch things like this.

@joperezr, thoughts?

@joperezr
Copy link
Member

joperezr commented Nov 18, 2021

Let me make sure I understand correctly. The reason why fixing issue #61048 would fix this is because now [\w]+ would now also become case-sensitive even in IgnoreCase because at construction time we would have already translated [\w] into it's upper-case lower-case equivalent (like transforming [A] => [Aa]) so in theory CanBeMadeAtomic should now return true and reduce again the number of backtracking we do, is that right? If so, I agree it is better to just wait for that to be fixed than trying to undo, or patch the optimizations made with your PR.

@stephentoub
Copy link
Member

@joperezr, exactly. (For this one specific case, it's further interesting because we recognize \w as being case-sensitive even under IgnoreCase, because the categories it maps to already includes both upper and lower. But because of how the parser works, even though [\w] and \w are 100% identical sets, the case-sensitive \w gets added to the set and [\w] ends up as case-insensitive.)

@ghost
Copy link

ghost commented Nov 21, 2021

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture arm64
OS Windows 10.0.19041
Baseline 10e107debae28de4bd1e710cfe448be3c293e841
Compare 9962c10813de695782baa20f1bf0bb2e9810971d
Diff Diff

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
Uri_IsNotMatch - Duration of single invocation 315.14 ns 557.11 ns 1.77 0.20 False

graph
Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'

Payloads

Baseline
Compare

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)


Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: -
Labels:

arch-arm64, area-System.Text.RegularExpressions, os-windows, tenet-performance, tenet-performance-benchmarks, untriaged

Milestone: -

@joperezr joperezr added this to the 7.0.0 milestone Feb 2, 2022
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 28, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 29, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 6, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants