-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
<regex>: Successful negative lookahead assertions do not matche to capture groups #5269
base: main
Are you sure you want to change the base?
<regex>: Successful negative lookahead assertions do not matche to capture groups #5269
Conversation
I'd recommend running tests locally. |
stl/inc/regex
Outdated
@@ -3610,7 +3607,6 @@ bool _Matcher<_BidIt, _Elem, _RxTraits, _It>::_Match_pat(_Node_base* _Nx) { // c | |||
_Node_end_group* _Node = static_cast<_Node_end_group*>(_Nx); | |||
_Node_capture* _Node0 = static_cast<_Node_capture*>(_Node->_Back); | |||
if (_Cap || _Node0->_Idx != 0) { // update capture data | |||
_Tgt_state._Grp_valid[_Node0->_Idx] = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deletion does not just correct the matches for negative assertions. With this change, no capture group at all will ever be matched.
Changes to fix #5245 actually have to be applied here:
Lines 3575 to 3590 in fc15609
case _N_neg_assert: | |
case _N_assert: | |
{ // check assert | |
_It _Ch = _Tgt_state._Cur; | |
bool _Neg = _Nx->_Kind == _N_neg_assert; | |
_Bt_state_t<_It> _St = _Tgt_state; | |
if (_Match_pat(static_cast<_Node_assert*>(_Nx)->_Child) == _Neg) { | |
// restore initial state and indicate failure | |
_Tgt_state = _St; | |
_Failed = true; | |
} else { | |
_Tgt_state._Cur = _Ch; | |
} | |
break; | |
} |
Specifically, _Tgt_state._Cur = _Ch;
insufficiently resets the state after a successful negative assertion. (But it is the appropriate reset for positive assertions.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have restored the initial state after a successful negative assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a test (like the one in #5245) should be added to verify that the behavior is correct after this PR.
Edit: It would also be nice to add test coverage which verifies that captures are recorded correctly for positive assertions as well.
@@ -3583,7 +3580,11 @@ bool _Matcher<_BidIt, _Elem, _RxTraits, _It>::_Match_pat(_Node_base* _Nx) { // c | |||
_Tgt_state = _St; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-existing: This state reset after a match failure is both unnecessary and pointless, so I think it should be removed. No caller of _Match_pat()
can assume that the match state hasn't changed after failure, because the function does not fully reset the state after trying to match two or more NFA nodes anyway.
stl/inc/regex
Outdated
@@ -3583,7 +3580,11 @@ bool _Matcher<_BidIt, _Elem, _RxTraits, _It>::_Match_pat(_Node_base* _Nx) { // c | |||
_Tgt_state = _St; | |||
_Failed = true; | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The opening brace after else
leads to unnecessary nesting. You can just remove the braces and write else if
instead.
stl/inc/regex
Outdated
if (_Neg) { | ||
_Tgt_state = _St; | ||
} else { | ||
_Tgt_state._Cur = _Ch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-existing: We can assign _St._Cur
to _Tgt_state._Cur
here instead of _Ch
and eliminate the local variable _Ch
. This should also decrease the amount of stack this function uses per call a bit, slightly reducing the danger of stack overflow (as in #997).
} else if (!_Neg) { | ||
_Tgt_state._Cur = _St._Cur; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not equivalent to the prior commit. Now the state is no longer being reset for a successful negative lookahead assertion at all.
(My comment about a superfluous _Tgt_state = _St;
statement points a different line, namely the one in the failure case. The one removed by the last commit in the successful case isn't superfluous at all.)
This suggests we really need more test coverage for negative assertions if no test catches this.
Fixes: #5245