Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: allow duplicate named capture groups #2721

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

bakkot
Copy link
Contributor

@bakkot bakkot commented Apr 3, 2022

This allows you to have a regex like

/(?<year>[0-9]{4})-[0-9]{2}|[0-9]{2}-(?<year>[0-9]{4})/

where a capturing group name is re-used across alternatives. It continues to be illegal to re-use a name within the same alternative.

As currently specified, it also enforces that named backreferences correspond to capturing groups in the same alternative, which would make the following (currently legal) program illegal:

/(?<a>x)|\k<a>/

There is no reason to write this because the \k can never refer to anything, meaning it will always match the empty string. For this reason I think it should have been illegal in the first place. But if we want to preserve that behavior, it's easy enough to specify.

EDIT: updated so that the above remains legal, per plenary.

(I have a proposal repo for this, but figured it might as well be a PR.)

@bakkot bakkot added normative change Affects behavior required to correctly evaluate some ECMAScript source text needs consensus This needs committee consensus before it can be eligible to be merged. labels Apr 3, 2022
@bakkot bakkot force-pushed the duplicate-named-capture-groups branch from 7587c99 to b2bc16e Compare April 3, 2022 03:46
@ljharb ljharb changed the title Normative: allow duplicagte named capture groups Normative: allow duplicate named capture groups Apr 3, 2022
spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
@bakkot bakkot force-pushed the duplicate-named-capture-groups branch 2 times, most recently from 8e770b5 to 2140ce7 Compare May 25, 2022 02:37
@ljharb ljharb added needs test262 tests The proposal should specify how to test an implementation. Ideally via github.com/tc39/test262 pending stage 4 This proposal has not yet achieved stage 4, but may otherwise be ready to merge. proposal This is related to a specific proposal, and will be closed/merged when the proposal reaches stage 4. and removed needs consensus This needs committee consensus before it can be eligible to be merged. labels Jun 6, 2022
@ljharb ljharb marked this pull request as draft June 6, 2022 19:51
Copy link
Contributor

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only materially necessary change I see is _j_<sup>th</sup> capture of _R_ is not *undefined*_j_<sup>th</sup> element of _r_'s _captures_ List is not *undefined*, but I also left some suggestions that you can adopt or ignore at your discretion.

spec.html Outdated
@@ -34758,6 +34758,22 @@ <h1>
</emu-alg>
</emu-clause>

<emu-clause id="sec-patterns-static-semantics-can-both-participate" type="abstract operation">
<h1>
Static Semantics: CanBothParticipate (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opposite polarity might lead to a more intuitive name, e.g. MutuallyExclusive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @gibson042, the sense of the CanBothParticipate AO is confusing. It returns true when x and y are in the same Alternative and therefore we should throw a Syntax Error. Can we rename CanBothParticipate to CannotBothParticipate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more like MightBothParticipate. As in, both could be components of a single match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to MightBothParticipate.

spec.html Outdated Show resolved Hide resolved
spec.html Outdated
Comment on lines 36286 to 36287
1. For each integer _j_ such that _j_ &ne; _i_, _j_ &ge; 1, and _j_ &le; _n_, do
1. If the _j_<sup>th</sup> capture of _R_ was defined with a |GroupName|, then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better expressed more procedurally?

Suggested change
1. For each integer _j_ such that _j_ &ne; _i_, _j_ &ge; 1, and _j_ &le; _n_, do
1. If the _j_<sup>th</sup> capture of _R_ was defined with a |GroupName|, then
1. For each integer _j_ such that _j_ &ge; 1 and _j_ &le; _n_, in ascending order, do
1. If _j_ &ne; _i_ and the _j_<sup>th</sup> capture of _R_ was defined with a |GroupName|, then

spec.html Outdated
1. Let _isMatchedElsewhere_ be *false*.
1. For each integer _j_ such that _j_ &ne; _i_, _j_ &ge; 1, and _j_ &le; _n_, do
1. If the _j_<sup>th</sup> capture of _R_ was defined with a |GroupName|, then
1. Let _sj_ be the CapturingGroupName of that |GroupName|.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not object to refactoring these aliases, e.g. sgroupName and sjotherName.

spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
@bakkot bakkot added has test262 tests and removed needs test262 tests The proposal should specify how to test an implementation. Ideally via github.com/tc39/test262 labels Aug 3, 2022
@bakkot bakkot force-pushed the duplicate-named-capture-groups branch from 61e7436 to 8dd5ab0 Compare March 28, 2024 20:32
@bakkot bakkot marked this pull request as ready for review March 28, 2024 20:32
@syg
Copy link
Contributor

syg commented Apr 3, 2024

Happy do defer the editorial review, I'm told there've been enough eyes on this.

@syg syg added has stage 4 This PR represents a proposal that has achieved stage 4, and is ready to merge. and removed pending stage 4 This proposal has not yet achieved stage 4, but may otherwise be ready to merge. labels Apr 17, 2024
spec.html Outdated
@@ -35715,7 +35715,7 @@ <h1>Static Semantics: Early Errors</h1>
It is a Syntax Error if CountLeftCapturingParensWithin(|Pattern|) ≥ 2<sup>32</sup> - 1.
</li>
<li>
It is a Syntax Error if |Pattern| contains two or more |GroupSpecifier|s for which CapturingGroupName of |GroupSpecifier| is the same.
It is a Syntax Error if |Pattern| contains two distinct |GroupSpecifier|s _x_ and _y_ for which CapturingGroupName(_x_) is the same as CapturingGroupName(_y_) and such that CanBothParticipate(_x_, _y_) is *true*.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. PR various editorial changes for comparisons #2877 says to avoid "is the same as", use "is" for comparing Strings.

  2. It's rare (though not unheard of) to invoke an SDO with parenthesis notation. On the other hand, using the normal notation would put CapturingGroupName of _y_ on the RHS of is, which I don't think we ever do.

  3. ... for which A and such that B is odd. Seems like either the two arms should agree re "for which" vs "such that", or else just use one that covers both arms: ... such that A and B

Suggested change
It is a Syntax Error if |Pattern| contains two distinct |GroupSpecifier|s _x_ and _y_ for which CapturingGroupName(_x_) is the same as CapturingGroupName(_y_) and such that CanBothParticipate(_x_, _y_) is *true*.
It is a Syntax Error if |Pattern| contains two distinct |GroupSpecifier|s _x_ and _y_ such that CapturingGroupName(_x_) is CapturingGroupName(_y_) and CanBothParticipate(_x_, _y_) is *true*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to

It is a Syntax Error if |Pattern| contains two distinct |GroupSpecifier|s _x_ and _y_ such that the CapturingGroupName of _x_ is the CapturingGroupName of _y_ and such that CanBothParticipate(_x_, _y_) is *true*.

spec.html Outdated Show resolved Hide resolved
Copy link
Member

@michaelficarra michaelficarra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than @jmdyck's nits.

@michaelficarra michaelficarra removed the editor call to be discussed in the next editor call label Oct 23, 2024
@bakkot bakkot force-pushed the duplicate-named-capture-groups branch from 8dd5ab0 to 2c813b7 Compare October 30, 2024 02:49
@bakkot
Copy link
Contributor Author

bakkot commented Oct 30, 2024

Comments addressed.

@bakkot bakkot added the ready to merge Editors believe this PR needs no further reviews, and is ready to land. label Nov 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has stage 4 This PR represents a proposal that has achieved stage 4, and is ready to merge. has test262 tests normative change Affects behavior required to correctly evaluate some ECMAScript source text proposal This is related to a specific proposal, and will be closed/merged when the proposal reaches stage 4. ready to merge Editors believe this PR needs no further reviews, and is ready to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants