Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some regex errors are misleading and could be improved #39075

Closed
abelbraaksma opened this issue Jul 10, 2020 · 6 comments · Fixed by #68861
Closed

Some regex errors are misleading and could be improved #39075

abelbraaksma opened this issue Jul 10, 2020 · 6 comments · Fixed by #68861
Labels
area-System.Text.RegularExpressions help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@abelbraaksma
Copy link
Contributor

abelbraaksma commented Jul 10, 2020

Description

Going over the possible exceptions for regexes (#38872) I noticed that a few appear to be raised that could be improved upon.

Better to raise MalformedNameRef

Regex(@"\k<>")
// raises:
// Framework: parsing "\k<>" - Unrecognized escape sequence \k.
// Core: Invalid pattern '\k<>' at offset 2. Unrecognized escape sequence \\k.

Regex(@"(?<xfo)")
// raises:
// Framework: parsing "(?<xfo)" - Invalid group name: Group names must begin with a word character.
// Core: Invalid pattern '(?<xfo)' at offset 6. Invalid group name: Group names must begin with a word character.

Both the above would be clearer if they raised MalformedNameRef. \k<...> is a valid way to reference a group, so MalformedNameRef just seems more logical.

Also, the 2nd case above, the text says "must begin with a word character", which is precisely what already happens in the regex...

Also, the Core version says \\k is not a recognized escape sequence, but there's no \\k in the expression (which would be valid), only \k.

Text is too specific or wrong

Not an {x, y} quantifier here:

Regex(@"((*foo)bar)")
Regex(@"?(x)?true|false)")
// raises:
// Framework: parsing "((*foo)bar)" - Quantifier {x,y} following nothing.
// Core: Invalid pattern '((*foo)bar)' at offset 3. Quantifier {x,y} following nothing.

Not a capture group in this expression:

Regex(@"x{234567899988}")
// raises:
// Framework: parsing "x{234567899988}" - Capture group numbers must be less than or equal to Int32.MaxValue.
// Core: Invalid pattern 'x{234567899988}' at offset 12. Capture group numbers must be less than or equal to Int32.MaxValue.

Confusing error, but not wrong

Regex(@"[--[b] ")
// raises:
// Framework: parsing "[--[b] " - A subtraction must be the last element in a character class.
// Core: Invalid pattern '[--[b] ' at offset 6. A subtraction must be the last element in a character class.

But I would find UnterminatedBracket clearer here, as non-termination of brackets or parens should be high-prio errors before the more special errors.

Unclear text when alternation is malformed

Regex(@"(x)(?(3x|y)")
// raises
// Framework: parsing "(x)(?(3x|y)" - (?(3) ) malformed.
// Core: Invalid pattern '(x)(?(3x|y)' at offset 8. (?(3) ) malformed.

I think this can be improved by making this error "conditional alternation malformed". Also, printing "(?(3) )" doesn't help here, as it is no present like that in the original expression.

But given the expression, I'd rather expect it to raise TooFewParentheses.

More info

Considering that Regex has been given so much attention lately (thanks so much for the perf improvements!!!), I figured why not also give the errors some scrutiny ;).

Edited: added full exception messages for both .NET Core and .NET Framework following up on #39075 (comment). The specific part is, as can be seen, the same for either environment.

Tested with:

  • .NET Framework 4.8 (specifically: ".NET Framework 4.8.4069.0")
  • .NET Core 3.1 (specifically: ".NET Core 3.1.3")
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.RegularExpressions untriaged New issue has not been triaged by the area owner labels Jul 10, 2020
@ghost
Copy link

ghost commented Jul 10, 2020

Tagging subscribers to this area: @eerhardt, @pgovind
Notify danmosemsft if you want to be subscribed.

@stephentoub
Copy link
Member

Thanks for the report, but what version are you running / testing this on?

With the current code, for example,
@"(x)(?(3x|y)"
produces
Invalid pattern '(x)(?(3x|y)' at offset 8. (?(3) ) malformed.

@abelbraaksma
Copy link
Contributor Author

abelbraaksma commented Jul 10, 2020

Oh, I ran them under newest VS inside FSI window. I think that uses 4.8. I should've tested these with Core as well, I guess.

But, @stephentoub, you're showing the same error I did ;), except for the window dressing, which apparently got updated.

@stephentoub
Copy link
Member

stephentoub commented Jul 10, 2020

I should've tested these with Core as well, I guess

Yes :-)

you're showing the same error I did

The core of the text is the same for these examples. But there's additional description around it, such as the position, to help clarify.

@abelbraaksma
Copy link
Contributor Author

@stephentoub, to prevent further confusion, I've updated the text above with the errors from both Framework and Core.

@stephentoub
Copy link
Member

Cool, thanks :-)

@eerhardt eerhardt added this to the Future milestone Jul 30, 2020
@eerhardt eerhardt removed the untriaged New issue has not been triaged by the area owner label Jul 30, 2020
@danmoseley danmoseley added the help wanted [up-for-grabs] Good issue for external contributors label Aug 30, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 4, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 4, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.RegularExpressions help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants