Wrong handling for metacharacters in sets, e.g. `[\w0-9]` #4

Zac-HD · 2019-10-09T23:54:43Z

`[\w0-9]` gets translated to `[[a-zA-Z]0-9]`, which is invalid. See python-jsonschema/jsonschema#612.
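To make the failure concrete, here is a small check of what Python's `re` does with the translated pattern quoted above. As far as I can tell it still compiles (recent Python versions may emit a `FutureWarning` about a possible nested set), but it no longer describes a single character class: the set ends at the first `]`, and `0-9]` is then read as literal text. The test strings are my own illustrations, not taken from the linked report.

```python
import re
import warnings

# The translation quoted in this issue.
BROKEN = r"[[a-zA-Z]0-9]"

with warnings.catch_warnings():
    # Python may warn "Possible nested set" here; silence it for the demo.
    warnings.simplefilter("ignore", FutureWarning)
    broken = re.compile(BROKEN)

# The intended class should accept a single digit, but this pattern does not.
matches_digit = broken.fullmatch("5") is not None

# Instead it is parsed as the set "[", a-z, A-Z followed by the literal "0-9]".
matches_literal_tail = broken.fullmatch("a0-9]") is not None

print(matches_digit, matches_literal_tail)
```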

I think fixing this may require us to actually parse the pattern rather than just doing local replacements, and it certainly indicates that we need to test against a much wider range of regular expressions! (i.e. write a better strategy, and contribute a larger set of patterns upstream)
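As a rough sketch of what "actually parse the pattern" could mean, the scanner below tracks whether it is inside a character set and expands a shorthand differently in each context. The function name `translate` and the `SHORTHAND_BODIES` table are hypothetical, and the expansions are assumptions rather than the classes js-regex really uses. It also ignores edge cases such as a leading `]` inside a set, so treat it as an illustration of the idea only.

```python
import re

# Hypothetical expansions; the real replacement table is not given in this thread.
SHORTHAND_BODIES = {"w": "A-Za-z0-9_", "d": "0-9"}


def translate(pattern: str) -> str:
    """Expand \\w and \\d, wrapping in brackets only outside a character set."""
    out = []
    in_set = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            body = SHORTHAND_BODIES.get(nxt)
            if body is None:
                out.append(ch + nxt)  # leave every other escape untouched
            elif in_set:
                out.append(body)  # already inside [...]: splice in the bare ranges
            else:
                out.append("[" + body + "]")
            i += 2  # consume the backslash and the escaped character together
            continue
        if ch == "[" and not in_set:
            in_set = True
        elif ch == "]" and in_set:
            in_set = False
        out.append(ch)
        i += 1
    return "".join(out)


print(translate(r"[\w0-9]"))  # [A-Za-z0-9_0-9]
print(translate(r"\w+"))      # [A-Za-z0-9_]+
```

Because escapes are consumed two characters at a time, an escaped backslash followed by `w` (`\\w`) is left alone, which is the case that a plain "is there a `\` before it?" check gets wrong.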

Zac-HD · 2019-10-17T12:18:53Z

See tests: https://github.com/Zac-HD/js-regex/compare/fix-sets

jayvdb · 2020-02-09T16:59:48Z

One possible partial workaround, at least for `\w` and similar, is to use the Python `re` engine in bytes mode, which doesn't support Unicode. That would not help if the regex also needs Unicode support, but often that won't be needed, since JS regexps historically haven't been used for Unicode because it was poorly supported. The user could turn this on explicitly, so that they are informed about the side effects.

jayvdb · 2020-02-10T07:36:11Z

Or, more simply, use `re.ASCII`.
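For comparison, a minimal sketch of both suggestions using only the standard `re` module. The sample string is an arbitrary non-ASCII letter chosen for illustration. In the default (Unicode) mode `\w` accepts it, while `re.ASCII` and a bytes pattern both restrict `\w` to `[a-zA-Z0-9_]`, with no rewriting of the pattern needed.

```python
import re

PATTERN = r"[\w0-9]+"
SAMPLE = "\u00e9"  # "é", a non-ASCII letter

# Default str patterns use Unicode semantics, so \w accepts "é".
unicode_match = re.fullmatch(PATTERN, SAMPLE) is not None

# re.ASCII restricts \w, \d, \s and \b to ASCII-only matching.
ascii_match = re.fullmatch(PATTERN, SAMPLE, flags=re.ASCII) is not None

# Bytes patterns are ASCII-only for \w by default; the input must be bytes too.
bytes_match = (
    re.fullmatch(PATTERN.encode("ascii"), SAMPLE.encode("utf-8")) is not None
)

print(unicode_match, ascii_match, bytes_match)
```

The `re.ASCII` route keeps `str` inputs, whereas bytes mode forces callers to encode their data first, which is probably why it is the simpler of the two.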

Zac-HD mentioned this issue Dec 4, 2019

Checking for the absence of \ before a replacement isn't robust #6

Open

jayvdb mentioned this issue Feb 9, 2020

Bad escape \Z #8

Open

anentropic linked a pull request May 10, 2020 that will close this issue

Support metachars in character sets, including 'negated' short-hands like \W #14

Open

nezhar mentioned this issue Jun 18, 2021

Add support for draft 2020-12 python-jsonschema/jsonschema#817

Merged
