Fix regexp mutation p{Latin} #1234

Merged
merged 3 commits into from
May 3, 2021

Conversation

mbj
Owner

@mbj mbj commented May 3, 2021

Fixes the `\p{Latin}` regexp constructs that used to crash mutant, as reported in #1231.

Also refactors the Mutant::Registry class to allow use-case-specific default behavior, making it easier to address incomplete mappings in the future.

* Previously, the default was to return the generic node mutator on lookup
  misses.
* This is fine for the main mutation registry, but not for the external
  regexp AST transformations.
* This change makes the default instance specific, so the
  transform registry can fail on unknown nodes rather than return the
  generic mutator, which hides the real issue.
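
The idea of an instance-specific default can be sketched roughly as follows. This is a minimal illustration, not the actual Mutant::Registry code: the `Registry` class below, its constructor argument, and every symbol name (`:known_node`, `:generic_mutator`, and so on) are assumptions made up for the example.

```ruby
# Hypothetical sketch of a registry whose lookup-miss behavior is chosen
# per instance. Not mutant's real implementation.
class Registry
  # default: a callable invoked with the missing key on lookup misses
  def initialize(default)
    @default = default
    @entries = {}
  end

  def register(type, handler)
    raise ArgumentError, "Duplicate registration: #{type.inspect}" if @entries.key?(type)

    @entries[type] = handler
    self
  end

  # Returns the registered handler, or whatever the instance default yields.
  def lookup(type)
    @entries.fetch(type) { @default.call(type) }
  end
end

# Main mutation registry: unknown nodes fall back to a generic handler.
mutators = Registry.new(->(_type) { :generic_mutator })
mutators.register(:known_node, :known_node_mutator)

# Transform registry: unknown nodes fail loudly instead of being masked.
transforms = Registry.new(
  ->(type) { raise KeyError, "No transform registered for: #{type.inspect}" }
)
transforms.register(:known_regexp_node, :known_regexp_transform)
```

With this shape, `mutators.lookup(:anything_else)` returns the generic handler, while `transforms.lookup(:anything_else)` raises, so an incomplete mapping surfaces at the point of the miss.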
@mbj mbj changed the title from "Fix regexp mutation" to "Fix regexp mutation p{Latin}" May 3, 2021
@mbj mbj merged commit db7167f into master May 3, 2021
@mbj mbj deleted the fix/regexp-mutation branch May 3, 2021 14:48
dgollahon added a commit that referenced this pull request Nov 7, 2021
- This is up for early review because I'm not sure about the dynamic creation of the table of unicode properties. I tried just creating a list of them, but it was so slow for my editor to process that I couldn't even format the giant lookup table. If we want to "bake" these to avoid the time it takes to compute the table, and perhaps avoid unexpected drift, it might make sense to dump them to YAML or something similar. I'm not sure of the best approach.
- I'm also guessing there's a better option than just dumping all the regexp node types into the other list of supported regexp nodes.
- We should probably do this for other regexp types too; we might be missing some of the POSIX classes, for instance (I have not checked yet).
- Prevents crashes when the source contains an unsupported property type.
- Related to #1234 (which was a very partial fix)
- Note that this turns our `\p{Latin}` formatting into `\p{latin}`. We could fix this with some very simple inflection, but I wanted to take the simplest approach first to demonstrate the problem, since the two forms seem to be semantically equivalent. The Ruby docs use the uppercase form. I have a text file from the upstream regex toolkit that we could use to confirm inflection rules if we want to.
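
The claim that `\p{Latin}` and `\p{latin}` are equivalent can be spot-checked in plain Ruby. The snippet below is only a quick sanity check over a handful of sample characters chosen for illustration; it does not exercise mutant, and agreement on these samples is not proof for every property name.

```ruby
# Compare the capitalized and lowercased property spellings on sample input.
upper = /\p{Latin}/
lower = /\p{latin}/

# Two Latin-script letters, one Greek, one Cyrillic, one digit.
samples = ['a', 'é', 'Ω', 'я', '1']

upper_results = samples.map { |char| upper.match?(char) }
lower_results = samples.map { |char| lower.match?(char) }

puts upper_results == lower_results
```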