`regexp_parser` rejects `/\xA/` but MRI accepts it #75

dgollahon · 2020-12-20T02:18:24Z

Hi,

I am working on re-introducing regexp mutation support in mutant, and I noticed that since the old integration was written, regexp_parser has stopped rejecting a large share of the regexps that Ruby accepts (#63). I did find one additional case that I could not find documented anywhere. I tried brute-forcing millions of regexps to see whether there were any cases where regexp_parser is stricter than MRI, and this is the only class of instances I could find.
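For anyone curious, the brute-force comparison can be sketched roughly as follows. This is a minimal illustration, not the script that was actually run: `toy_parser_accepts?` is a hypothetical stand-in that only mimics the reported strictness (it insists on exactly two hex digits after `\x`), and the tiny `ALPHABET` is an assumption chosen to keep the search small. A real run would call the parser under test and rescue its errors instead.

```ruby
# Returns true when MRI compiles the pattern source without raising.
def mri_accepts?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Hypothetical stand-in for the parser under test. It accepts a source only if
# it is made of plain characters, backslash escapes other than \x, and \x
# escapes with exactly two hex digits.
def toy_parser_accepts?(source)
  source.match?(/\A(?:[^\\]|\\[^x]|\\x\h\h)*\z/)
end

# Small alphabet (an assumption) so the exhaustive search stays cheap.
ALPHABET = ['\\', 'x', 'A', '(', ')'].freeze

# Lists every source up to max_length that MRI accepts but the toy parser
# rejects, i.e. the cases where the parser is stricter than MRI.
def stricter_than_mri(max_length)
  (1..max_length).flat_map do |length|
    ALPHABET.repeated_permutation(length).map(&:join).select do |source|
      mri_accepts?(source) && !toy_parser_accepts?(source)
    end
  end
end

p stricter_than_mri(3)
```

With this alphabet and a maximum length of 3, the only source that should be reported is the three-character pattern `\xA`.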

"\xA" # => "\n"
/\xA/.match?("\n") # => true

Regexp::Parser.parse(/\xA/) # => Regexp::Scanner::PrematureEndError: Premature end of pattern at \x
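The MRI side of this can be checked without regexp_parser installed. The snippet below is a small sketch using only core Ruby; the variable names are just for illustration.

```ruby
# In a double-quoted string, \x followed by a single hex digit is a complete
# escape: "\xA" is the same one-character string as "\x0A", i.e. a newline.
single_digit = "\xA"
two_digits   = "\x0A"
puts single_digit == two_digits    # prints "true"
puts single_digit == "\n"          # prints "true"

# The regexp literal keeps the escape as written in its source and leaves the
# interpretation to the regexp engine.
pattern = /\xA/
puts pattern.source                # prints "\xA"
puts pattern.match?("\n")          # prints "true"

# Building the pattern from a string at runtime behaves the same way. Single
# quotes keep the backslash, so the source is the three characters \xA.
runtime_pattern = Regexp.new('\xA')
puts runtime_pattern.match?("\n")  # prints "true"
```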

Is this a bug or intended behavior? Either is fine for my purposes, since I can add a special check to ignore errors in this case, but I was curious whether the difference is intentional. The coverage matrix in the README suggests that hex escapes work, but I guess this is a special case that was not highlighted. If the behavior is intentional, it would be helpful to document it (unless I missed where that was already done). Alternatively, parity with MRI would also work for me.

Thanks!

jaynetics · 2020-12-20T21:52:57Z

@dgollahon thanks for the report, and for going to such lengths to check all kinds of regexps! ❤️

This one was also clearly a bug. We had code to handle hex escapes with just one xdigit since the start, it's just been unreachable for all these 10 years 😄
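To make "hex escapes with just one xdigit" concrete, here is a minimal sketch of a tokenizer that accepts one or two hex digits after `\x`. It is only an illustration built on Ruby's `StringScanner`, not regexp_parser's actual scanner, and `scan_hex_escapes` and the token names are hypothetical.

```ruby
require 'strscan'

# Splits a pattern source into [type, text] tokens. Only hex escapes get
# special treatment; other backslash escapes are kept as two-character
# :escape tokens and everything else is emitted one character at a time.
def scan_hex_escapes(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if (text = scanner.scan(/\\x\h{1,2}/))
      # One or two hex digits are both complete escapes, as in MRI.
      tokens << [:hex_escape, text]
    elsif scanner.check(/\\x/)
      # \x with no hex digit after it is an error.
      raise ArgumentError, "invalid hex escape at offset #{scanner.pos}"
    elsif (text = scanner.scan(/\\./m))
      tokens << [:escape, text]
    else
      # Plain characters (and a trailing lone backslash) end up here.
      tokens << [:literal, scanner.getch]
    end
  end
  tokens
end

p scan_hex_escapes('\xA')   # one :hex_escape token
p scan_hex_escapes('\xAG')  # :hex_escape for \xA, then a :literal G
```

The `{1,2}` quantifier is the whole point: a scanner that requires exactly two digits would treat `\xA` as a truncated escape.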

The fix is included in v2.0.1.

dgollahon · 2020-12-20T22:47:12Z

Fantastic! Thanks for the excellent response @jaynetics. :D

@mbj

This reverts commit 21d3fef. This was not a clean revert. Note that:

- The version of `regexp_parser` was 1.3.0; it is now 1.8.2 to accommodate our current `rubocop` version and because some relevant bugfixes were implemented between 1.3.x and 1.8.x. We should eventually move to 2.0, but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated, so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed: I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75. This can be removed once we reach `regexp_parser` >= 2.0.1.
- Added logic to skip invalid group options until we are on `regexp_parser` >= 2.0.1. See: ammar/regexp_parser#76
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- I have marked several dispatch methods as `private`.
- I have also removed the old YARD doc comments on private methods at @mbj's request.
- Some other minor conflicts and small spec assertion changes were resolved as well.
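The "can be removed once we reach `regexp_parser` >= 2.0.1" condition could be expressed as a version gate along these lines. This is a sketch only: `hex_escape_workaround_needed?` and `FIXED_IN` are hypothetical names and not part of mutant or regexp_parser; the comparison itself uses RubyGems' `Gem::Version`.

```ruby
require 'rubygems'

# First regexp_parser release that contains the fix, per the thread above.
FIXED_IN = Gem::Version.new('2.0.1')

# Hypothetical helper: true when the given regexp_parser version string
# predates the fix, so the single-digit hex escape workaround still applies.
def hex_escape_workaround_needed?(installed_version)
  Gem::Version.new(installed_version) < FIXED_IN
end

puts hex_escape_workaround_needed?('1.8.2')  # prints "true"
puts hex_escape_workaround_needed?('2.0.1')  # prints "false"
```

In a real project the version string would come from the loaded gem, for example via `Gem.loaded_specs`, rather than a hard-coded literal.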

dgollahon mentioned this issue Dec 20, 2020

Reintroduce Regexp mutations mbj/mutant#1166

Merged

dgollahon mentioned this issue Dec 20, 2020

Multi-byte named capture groups do not parse #76

Closed

jaynetics closed this as completed in b2fa2b3 Dec 20, 2020
