Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make regexp/no-dupe-disjunctions account for nested alternatives #404

Merged
merged 17 commits into from
Mar 15, 2022

Conversation

RunDevelopment
Copy link
Collaborator

This fixes #402.


To implement this, I needed to introduce 2 new concepts to the code base:

  1. Nested alternatives: These are alternatives or things that behave like alternatives (e.g. character class elements). They are called nested because they are the (transitive) child nodes of some root alternative.
    E.g. if a(c|b)[de] is the root alternative, then b, c, d, and e are nested alternatives.
  2. Partial NFAs: The partial NFA of a given root alternative and nested alternative is the NFA that contains all paths of the root alternative that go through the nested alternative.
    E.g. for the root alternative (a|b)(c|d|e)f and nested alternative d, the language of the partial NFA will be (a|b)df.

These 2 concepts are useful because they allow us to ask: Is there some part of an alternative that is a subset?

This is actually quite a bit more general than simple de-sugaring of character classes. This PR allows use to detect partial subsets and duplications within alternatives, and that includes character classes.


Side note: I expected this to massively impact performance. We are potentially creating a bunch more NFAs and DFAs than before, but (surprisingly) I couldn't measure any difference when I ran this PR against Prism's 2k regexes. (It did find 5 bugs though.)

@RunDevelopment RunDevelopment added the enhancement New feature or request label Mar 11, 2022
Copy link
Owner

@ota-meshi ota-meshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your wonderful work!!
I made two comments.

lib/rules/no-dupe-disjunctions.ts Outdated Show resolved Hide resolved
lib/rules/no-dupe-disjunctions.ts Outdated Show resolved Hide resolved
@RunDevelopment
Copy link
Collaborator Author

Sorry for the inconvenience. I just found a bug in PartialParser that caused it to construct the partial NFA incorrectly for alternatives. I should have tested this as well.

@RunDevelopment
Copy link
Collaborator Author

Again, sorry for the inconvenience, @ota-meshi.

I think I'm done. Could you please review?

Copy link
Owner

@ota-meshi ota-meshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ota-meshi ota-meshi merged commit 68776bd into master Mar 15, 2022
@ota-meshi ota-meshi deleted the issue402 branch March 15, 2022 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regexp/no-dupe-disjunctions should "de-sugar" character classes
2 participants