Make `regexp/no-dupe-disjunctions` account for nested alternatives #404

RunDevelopment · 2022-03-11T18:51:36Z

This fixes #402.

To implement this, I needed to introduce 2 new concepts to the code base:

- Nested alternatives: These are alternatives or things that behave like alternatives (e.g. character class elements). They are called nested because they are the (transitive) child nodes of some root alternative.
  E.g. if `a(c|b)[de]` is the root alternative, then `b`, `c`, `d`, and `e` are nested alternatives.
- Partial NFAs: The partial NFA of a given root alternative and nested alternative is the NFA that contains all paths of the root alternative that go through the nested alternative.
  E.g. for the root alternative `(a|b)(c|d|e)f` and nested alternative `d`, the language of the partial NFA will be `(a|b)df`.
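
To make the two concepts concrete, here is a minimal sketch that reproduces both examples on a toy, star-free pattern tree. It is not the code from this PR: the `Pat` type and the helpers `nestedAlternatives`, `contains`, and `language` are hypothetical names made up for illustration, and the "partial NFA" is approximated by simply enumerating the finite set of strings it would accept.

```typescript
// Hypothetical mini-AST for star-free patterns (an assumption, not the rule's real AST).
type Pat =
    | { kind: "char"; value: string }
    | { kind: "concat"; items: Pat[] }
    | { kind: "alt"; options: Pat[] } // a group like (c|b) or a class like [de]

const ch = (value: string): Pat => ({ kind: "char", value })

// Collect all nested alternatives: the options of every alt node, transitively.
function nestedAlternatives(node: Pat): Pat[] {
    const out: Pat[] = []
    const children =
        node.kind === "char" ? [] : node.kind === "concat" ? node.items : node.options
    for (const child of children) {
        if (node.kind === "alt") out.push(child)
        out.push(...nestedAlternatives(child))
    }
    return out
}

// Does `node` contain `target` (compared by object identity)?
function contains(node: Pat, target: Pat): boolean {
    if (node === target) return true
    if (node.kind === "concat") return node.items.some((n) => contains(n, target))
    if (node.kind === "alt") return node.options.some((n) => contains(n, target))
    return false
}

// All strings accepted by `node`. If `via` is given, keep only the paths
// that go through `via`, which stands in for the language of the partial NFA.
function language(node: Pat, via?: Pat): string[] {
    switch (node.kind) {
        case "char":
            return [node.value]
        case "concat": {
            let acc = [""]
            for (const item of node.items) {
                const next: string[] = []
                for (const prefix of acc) {
                    for (const suffix of language(item, via)) next.push(prefix + suffix)
                }
                acc = next
            }
            return acc
        }
        case "alt": {
            // If one option leads to `via`, every other option is cut off.
            const forced = via ? node.options.filter((o) => contains(o, via)) : []
            const options = forced.length > 0 ? forced : node.options
            const out: string[] = []
            for (const o of options) out.push(...language(o, via))
            return out
        }
    }
}

// a(c|b)[de]: the nested alternatives are c, b, d, and e.
const first: Pat = {
    kind: "concat",
    items: [
        ch("a"),
        { kind: "alt", options: [ch("c"), ch("b")] },
        { kind: "alt", options: [ch("d"), ch("e")] },
    ],
}
const nested = nestedAlternatives(first).map((n) => (n.kind === "char" ? n.value : "?"))

// (a|b)(c|d|e)f restricted to the nested alternative d gives the language of (a|b)df.
const d = ch("d")
const root: Pat = {
    kind: "concat",
    items: [
        { kind: "alt", options: [ch("a"), ch("b")] },
        { kind: "alt", options: [ch("c"), d, ch("e")] },
        ch("f"),
    ],
}
const partial = language(root, d)
```

Here `nested` holds `c`, `b`, `d`, `e` and `partial` holds `adf` and `bdf`. A real implementation would build automata rather than enumerate strings, since actual regexes can accept infinitely many words.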

These 2 concepts are useful because they allow us to ask: Is there some part of an alternative that is a subset?

This is actually quite a bit more general than simple de-sugaring of character classes. This PR allows us to detect partial subsets and duplications within alternatives, and that includes character classes.
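
As a rough illustration of the "is some part of an alternative a subset?" question, the sketch below checks each nested option of a later alternative against an earlier one. Everything in it is an assumption for the sake of the example: alternatives are simplified to sequences of single-character slots (the hypothetical `Alt` type), the helper names are invented, and languages are compared as finite word sets instead of the NFAs and DFAs the rule actually builds.

```typescript
// Hypothetical simplification: an alternative is a sequence of slots and each
// slot lists its single-character options, so a[bc] becomes [["a"], ["b", "c"]].
type Alt = string[][]

// All words accepted by an alternative.
function words(alt: Alt): string[] {
    let acc = [""]
    for (const slot of alt) {
        const next: string[] = []
        for (const prefix of acc) for (const c of slot) next.push(prefix + c)
        acc = next
    }
    return acc
}

// Words of `alt` whose path goes through option `opt` of slot `slot`
// (the finite stand-in for the language of a partial NFA).
function partialWords(alt: Alt, slot: number, opt: number): string[] {
    const fixed = alt.map((s, i) => (i === slot ? s.slice(opt, opt + 1) : s))
    return words(fixed)
}

// Nested options of `later` whose partial language is already covered by `earlier`.
function coveredOptions(earlier: Alt, later: Alt): string[] {
    const known = new Set(words(earlier))
    const found: string[] = []
    later.forEach((s, slot) => {
        if (s.length < 2) return // a slot with one option has no nested alternatives
        s.forEach((c, opt) => {
            if (partialWords(later, slot, opt).every((w) => known.has(w))) found.push(c)
        })
    })
    return found
}

// ab|a[bc]: the b inside [bc] only yields "ab", which the first alternative already matches.
const covered = coveredOptions([["a"], ["b"]], [["a"], ["b", "c"]])
```

In this toy case `covered` contains only `b`: the second alternative as a whole is not a subset of the first, but the part of it that goes through `b` is.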

Side note: I expected this to massively impact performance. We are potentially creating a bunch more NFAs and DFAs than before, but (surprisingly) I couldn't measure any difference when I ran this PR against Prism's 2k regexes. (It did find 5 bugs though.)

ota-meshi

Thank you for your wonderful work!!
I made two comments.

lib/rules/no-dupe-disjunctions.ts

Co-authored-by: Yosuke Ota <otameshiyo23@gmail.com>

RunDevelopment · 2022-03-15T11:29:17Z

Sorry for the inconvenience. I just found a bug in PartialParser that caused it to construct the partial NFA incorrectly for alternatives. I should have tested this as well.

tests/lib/rules/no-dupe-disjunctions.ts

RunDevelopment · 2022-03-15T17:44:55Z

Again, sorry for the inconvenience, @ota-meshi.

I think I'm done. Could you please review?

ota-meshi

LGTM!

RunDevelopment added 8 commits March 11, 2022 10:39

Removed usage of deprecated isDisjointWith function

3a3d0ea

Added partial parser

c2e8df8

Implemented nested subsets

fb54544

Added tests for nested subset

eebac90

Added nested prefix subsets

6197450

Added tests for nested prefix subsets

5afdc14

Added missing docs

0905c5b

Improved deduping

ccc0275

RunDevelopment added the enhancement New feature or request label Mar 11, 2022

RunDevelopment requested a review from ota-meshi March 11, 2022 18:51

ota-meshi requested changes Mar 14, 2022

lib/rules/no-dupe-disjunctions.ts (outdated)

lib/rules/no-dupe-disjunctions.ts (outdated)

RunDevelopment and others added 3 commits March 15, 2022 11:42

Update lib/rules/no-dupe-disjunctions.ts

9560e3b

Co-authored-by: Yosuke Ota <otameshiyo23@gmail.com>

Fixed linting error

abd5285

Fixed and documented optimization

5f985a1

RunDevelopment requested a review from ota-meshi March 15, 2022 11:08

RunDevelopment added 2 commits March 15, 2022 12:15

Added failing tests for bug

e6113ea

Fixed incorrect partial NFA construction

dc8b4ad

RunDevelopment commented Mar 15, 2022

tests/lib/rules/no-dupe-disjunctions.ts (outdated)

RunDevelopment added 4 commits March 15, 2022 12:42

Fixed capturing group warning for nested alternatives

b423491

Proper message escaping

60860b8

Improved messages

1956fdd

Don't repeat the loc code

cae140e

RunDevelopment mentioned this pull request Mar 15, 2022

Add suggestions for regexp/no-dupe-disjunctions #406

Merged

ota-meshi approved these changes Mar 15, 2022


ota-meshi merged commit 68776bd into master Mar 15, 2022

ota-meshi deleted the issue402 branch March 15, 2022 23:55
