Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Improve consistency with existing RegExp methods and subclass handling #37

Closed
wants to merge 2 commits into from

Conversation

mathiasbynens
Copy link
Member

@mathiasbynens mathiasbynens commented Aug 8, 2018

@anba outlined two separate concerns about the remaining IsRegExp call in MatchAllIterator. Quoting from #34 (comment):

  1. When MatchAllIterator is called from RegExp.prototype [ @@matchAll ], we should/have to assume the user explicitly decided to treat R as a RegExp object, so having an additional IsRegExp call to change this decision seems questionable. It’s also not consistent with how the other RegExp.prototype methods work.

  2. When MatchAllIterator is called from String.prototype.matchAll, I’d prefer to handle it more like the other String.prototype methods which create RegExp objects (that means String.prototype.match and String.prototype.search), because I want to avoid adding yet another way to handle RegExp sub-classes. There are already two different RegExp sub-classing/extension interfaces: the RegExp.prototype methods all call RegExpExec, which means sub-classes, or any other classes, only need to provide their own exec methods when they want to reuse the other RegExp.prototype methods. And in addition to that, the @@match/replace/search/split interfaces allow sub-classes to provide their own implementations for just these methods. The matchAll proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per the IsRegExp abstract operation). In my opinion we should only support RegExp sub-classing in two ways: 1) Either the RegExp sub-class has %RegExpPrototype% on its prototype chain, or 2) the RegExp sub-class copies the relevant methods from %RegExpPrototype% into its prototype object.

This PR addresses those issues following @anba’s proposal.

Ref. #21, #34.

@anba outlined two separate concerns about the remaining `IsRegExp` call in `MatchAllIterator`. Quoting from tc39#34 (comment):

1. When `MatchAllIterator` is called from `RegExp.prototype [ @@matchall ]`, we should/have to assume the user explicitly decided to treat `R` as a RegExp object, so having an additional `IsRegExp` call to change this decision seems questionable. It’s also not consistent with how the other `RegExp.prototype` methods work.

2. When `MatchAllIterator` is called from `String.prototype.matchAll`, I’d prefer to handle it more like the other `String.prototype` methods which create RegExp objects (that means `String.prototype.match` and `String.prototype.search`), because I want to avoid adding yet another way to handle RegExp sub-classes. There are already two different RegExp sub-classing/extension interfaces: the `RegExp.prototype` methods all call `RegExpExec`, which means sub-classes, or any other classes, only need to provide their own `exec` methods when they want to reuse the other `RegExp.prototype` methods. And in addition to that, the `@@match/replace/search/split` interfaces allow sub-classes to provide their own implementations for just these methods. The `matchAll` proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per the `IsRegExp` abstract operation). In my opinion we should only support RegExp sub-classing in two ways: 1) Either the RegExp sub-class has `%RegExpPrototype%` on its prototype chain, or 2) the RegExp sub-class copies the relevant methods from `%RegExpPrototype%` into its prototype object.

Ref. tc39#21, tc39#34.
@@ -20,7 +20,9 @@ contributors: Jordan Harband
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll).
1. If _matcher_ is not *undefined*, then
1. Return ? Call(_matcher_, _regexp_, « _O_ »).
1. Return ? MatchAllIterator(_regexp_, _O_).
1. Let _string_ be ? ToString(_O_).
1. Let _rx_ be ? RegExpCreate(_regexp_, `"g"`).

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

@@ -20,7 +20,9 @@ contributors: Jordan Harband
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll).
1. If _matcher_ is not *undefined*, then
1. Return ? Call(_matcher_, _regexp_, « _O_ »).
1. Return ? MatchAllIterator(_regexp_, _O_).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separately, I very intentionally made this function continue to work after delete RegExp.prototype[Symbol.matchAll], which this change breaks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more consistent with the existing methods though, as discussed here: #34 (comment)

@anba, @littledan, @tschneidereit all spoke up in favor of consistency over deviating from the status quo in this edge case. I'm inclined to agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of the existing methods fall back when the Symbol method is deleted; those two examples are exceptions, and I think matchAll should follow the more common approach of falling back.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I agree that String.p.matchAll should be specced consistently with String.p.match and String.p.search. These are the only other 'regexp' methods defined on String. Is there a reason to diverge from their behavior?

Likewise in RegExp.p[@@matchAll].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also split and replace - see #38 (comment).

The consistency argument is exactly why the proposal has its current fallback behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the fallback in split and replace is to perform a non-RegExp string matching (i.e. "." only matches the code point U+002E (FULL STOP)), which is different from what matchAll does.

@@ -20,7 +20,9 @@ contributors: Jordan Harband
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll).
1. If _matcher_ is not *undefined*, then
1. Return ? Call(_matcher_, _regexp_, « _O_ »).
1. Return ? MatchAllIterator(_regexp_, _O_).
1. Let _string_ be ? ToString(_O_).
1. Let _rx_ be ? RegExpCreate(_regexp_, `"g"`).

This comment was marked as outdated.

1. Let _fullUnicode_ be *false*.
1. Assert: ! Get(_matcher_, `"lastIndex"`) is *0*.
1. Let _S_ be ? ToString(_string_).
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why it's necessary to reconstruct a RegExp here? @@match doesn't behave like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I don't want mutations to .lastIndex to be observable, so matchAll intentionally clones the regex to prevent that.

matchAll provides an iterator, so it's much more important to avoid observable mutations over time, and also to ensure that mutating the lastIndex of the regex mid-iteration doesn't mess with the algorithm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue explaining this behavior? It seems we could use a internal object to store this state, instead of calling the @@species hook a second time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A second time? it's only called once in this spec - RegExpCreate doesn't call it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I got confused with the changes. Never mind.

ljharb added a commit that referenced this pull request Aug 8, 2018
 - `String.prototype.matchAll`:
   - use `RegExpCreate` when `Symbol.prototype.matchAll` is not found
   - fall back to regex coercion otherwise
 - `RegExp.prototype[Symbol.matchAll]`:
   - receiver is assumed to be a regex implicitly
 - remove `MatchAllIterator` abstract operation

Thus, `IsRegExp` call no longer exists.

Addresses #21. Addresses #34. Closes #37.
@ljharb
Copy link
Member

ljharb commented Aug 8, 2018

Now that I have more clarity on the committee consensus, on the objections to IsRegExp, and on the consistency arguments about the Symbol.matchAll method on RegExp.prototype, I think I understand what changes are needed.

I've put up #38; @mathiasbynens, does that alternate approach address your concerns?

@mathiasbynens
Copy link
Member Author

I don't understand why we need the alternate approach. It looks like you're still trying to support delete RegExp.prototype[Symbol.matchAll] — is that it? That's one of the things #37 is supposed to fix, per the discussion in #34 (comment), where @anba, @littledan, @tschneidereit all spoke up in favor of consistency over deviating from the status quo in this edge case. It seems weird to simply dismiss this feedback.

@ljharb
Copy link
Member

ljharb commented Aug 9, 2018

Indeed I am; commented here: #38 (comment)

@@ -20,7 +20,9 @@ contributors: Jordan Harband
1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll).
1. If _matcher_ is not *undefined*, then
1. Return ? Call(_matcher_, _regexp_, « _O_ »).
1. Return ? MatchAllIterator(_regexp_, _O_).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I agree that String.p.matchAll should be specced consistently with String.p.match and String.p.search. These are the only other 'regexp' methods defined on String. Is there a reason to diverge from their behavior?

Likewise in RegExp.p[@@matchAll].

spec.emu Outdated
1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%).
1. Let _flags_ be ? ToString(? Get(_R_, `"flags"`)).
1. Let _matcher_ be ? Construct(_C_, « _R_, _flags_ »).
1. Let _global_ be ? ToBoolean(? Get(_matcher_, `"global"`)).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@split detects flag values of the resulting regexp by checking the string value of flags. It would probably make sense to do the same here: 1. for consistency, 2. it would let us avoid two ToBoolean(Get) sequences. See https://tc39.github.io/ecma262/#sec-regexp.prototype-@@split:

If flags contains "u", let unicodeMatching be true.

Copy link
Member Author

@mathiasbynens mathiasbynens Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Instead of e.g.:

1. Let _global_ be ? ToBoolean(? Get(_matcher_, `"global"`)).

…we should do:

1. If _flags_ contains `"g"`, let _global_ be *true*.
1. Else, let _global_ be *false*.

…to follow the precedent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that seems reasonable. I'll do that directly in master since that's an observability improvement instead of API design questions.

@littledan
Copy link
Member

I agree with the others here who spoke up in favor of consistency. I don't see the need to be defensive of this particular case. I'm wondering, why were several comments marked as "outdated"? This marking makes it harder for me to read the thread.

@ljharb
Copy link
Member

ljharb commented Aug 9, 2018

@littledan the one outdated comment thread is from a mistaken comment by me that justin corrected; nobody needs to read that thread.

ljharb added a commit that referenced this pull request Aug 9, 2018
 - `String.prototype.matchAll`:
   - use `RegExpCreate` when `Symbol.prototype.matchAll` is not found
   - fall back to regex coercion otherwise
 - `RegExp.prototype[Symbol.matchAll]`:
   - receiver is assumed to be a regex implicitly
 - remove `MatchAllIterator` abstract operation

Thus, `IsRegExp` call no longer exists.

Addresses #21. Addresses #34. Closes #37.
@ljharb ljharb closed this in #38 Aug 25, 2018
@mathiasbynens mathiasbynens deleted the suggestion-from-anba branch September 17, 2018 06:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants