Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp_parser rejects /\xA/ but MRI accepts it #75

Closed
dgollahon opened this issue Dec 20, 2020 · 2 comments
Closed

regexp_parser rejects /\xA/ but MRI accepts it #75

dgollahon opened this issue Dec 20, 2020 · 2 comments

Comments

@dgollahon
Copy link
Contributor

Hi,

I am working on re-introducing regexp mutation support on mutant and I noticed that since the old integration existed regexp_parser seems to have decided to stop rejecting a large % of regexps that ruby would accept (#63) but regexp_parser did not. I did find one additional case that was not documented anywhere I found (I tried brute-forcing millions of regexps to infer if there were any cases where regexp_parser was stricter than MRI and this is the only class of instances I could find).

"\xA" # => "\n"
/\xA/.match?("\n") # => true

 Regexp::Parser.parse(/\xA/) # => Regexp::Scanner::PrematureEndError: Premature end of pattern at \x

Is this a bug or intended behavior? Either is fine for my purposes since I can just add a special check to ignore errors in this case, but I was curious if this was an intended difference or not. The coverage matrix in the README suggests that hex escapes work but I guess this is a special case that was not highlighted. If it is intentional behavior, it would be helpful to document it (unless I missed where this was done already) or alternatively having parity with MRI would work for me.

Thanks!

dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- Some other minor conflicts and small spec assertion changes were resolved as well.
dgollahon added a commit to mbj/mutant that referenced this issue Dec 20, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- I have marked several dispatch methods as `private`.
- I have also removed the old YARD doc comments on private methods at @mbj's request.
- Some other minor conflicts and small spec assertion changes were resolved as well.
@jaynetics
Copy link
Collaborator

@dgollahon thanks for the report, and for going to such lengths to check all kinds of regexps! ❤️

This one was also clearly a bug. We had code to handle hex escapes with just one xdigit since the start, it's just been unreachable for all these 10 years 😄

The fix is included in v2.0.1.

@dgollahon
Copy link
Contributor Author

Fantastic! Thanks for the excellent response @jaynetics. :D

dgollahon added a commit to mbj/mutant that referenced this issue Dec 23, 2020
This reverts commit 21d3fef.

This was not a clean revert. Note that:
- The version of `regexp_parser` was 1.3.0, now it is 1.8.2 to accomodate our current `rubocop` version and because there were some relevant bugfixes implemented between 1.3.x and 1.8.x. We should eventually move to 2.0 but it is currently incompatible with this integration. There are some issues with the frozen Regexp classes getting mutated so we may have to open an issue.
- Since "expected exception" support was removed from the specs, I have had to exclude two files entirely. This seems unfortunate as it reduces our overall coverage.
- Since unsupported nodes are no longer explicitly tracked, I removed the code that used to handle that for regular expressions. See: #1021
- I had to change the example case for where we are more permissive than `regexp_parser` because `regexp_parser` has decided to become more permissive and try to match Ruby's semantics. It was actually very hard to find a case that failed--I brute-forced 50 million regexp strings that had perfect parity of being accepted and then stumbled onto the single hex escape case by accident. See: ammar/regexp_parser#75. This can be removed once we reach `regexp_parser` >= 2.0.1.
- Added logic to skip invalid group options until we are on `regexp_parser` >= 2.0.1. See: ammar/regexp_parser#76
- Changed an access pattern for regexp mutations which became equivalent based on this: https://github.com/ammar/regexp_parser/blame/4ca7cec03b210e3e00473b7b1a7308f963190c1e/lib/regexp_parser/expression/subexpression.rb#L30-L33
- I have marked several dispatch methods as `private`.
- I have also removed the old YARD doc comments on private methods at @mbj's request.
- Some other minor conflicts and small spec assertion changes were resolved as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants