This repository has been archived by the owner on Aug 26, 2020. It is now read-only.

Wrong handling for metacharacters in sets, e.g. [\w0-9] #4

Open
Zac-HD opened this issue Oct 9, 2019 · 3 comments · May be fixed by #14

Comments

@Zac-HD
Owner

Zac-HD commented Oct 9, 2019

[\w0-9] gets translated to [[a-zA-Z]0-9], which is invalid. See python-jsonschema/jsonschema#612.

I think fixing this may require us to actually parse the pattern rather than just doing local replacements, and it certainly indicates that we need to test against a much wider range of regular expressions! (i.e. write a better strategy, and contribute a larger set of patterns upstream)
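To make the failure mode concrete, here is a minimal sketch contrasting a purely local replacement with one that tracks whether it is inside a set. The function names and the `W_BODY` expansion are hypothetical and chosen for illustration; they are not taken from this library's code. The walker is deliberately simplistic: it handles only `\w` and ignores edge cases such as a leading `]` inside a set.

```python
import re
import warnings

# Hypothetical expansion for illustration; the library's real table may differ.
W_BODY = "a-zA-Z0-9_"


def naive_translate(pattern: str) -> str:
    """Local replacement that ignores whether \\w sits inside a set."""
    return pattern.replace(r"\w", f"[{W_BODY}]")


def set_aware_translate(pattern: str) -> str:
    """Walk the pattern once, tracking whether we are inside [...]."""
    out = []
    in_set = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Consume the escape as a pair so that \\ and \[ are not misread.
            escape = pattern[i : i + 2]
            if escape == r"\w":
                # Inside a set, splice in the bare ranges; outside, wrap them.
                out.append(W_BODY if in_set else f"[{W_BODY}]")
            else:
                out.append(escape)
            i += 2
            continue
        if ch == "[" and not in_set:
            in_set = True
        elif ch == "]" and in_set:
            in_set = False
        out.append(ch)
        i += 1
    return "".join(out)


broken = naive_translate(r"[\w0-9]")
fixed = set_aware_translate(r"[\w0-9]")

with warnings.catch_warnings():
    # Recent Python versions warn about a "possible nested set" here.
    warnings.simplefilter("ignore")
    broken_matches_digit = re.fullmatch(broken, "5") is not None

fixed_matches_digit = re.fullmatch(fixed, "5") is not None
```

With the naive version, Python reads the result as the set `[[a-zA-Z0-9_]` followed by the literal text `0-9]`, so a single digit no longer matches. The set-aware version keeps everything inside one set.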

@Zac-HD
Owner Author

Zac-HD commented Oct 17, 2019

@jayvdb

jayvdb commented Feb 9, 2020

One possible partial workaround, at least for \w and similar classes, is to use the Python re engine in bytes mode, which doesn't support Unicode. That would not help if the regex also needs Unicode support, but often that won't be needed, since JS regexps historically haven't been used for Unicode because it was poorly supported. The user could turn this on explicitly, so that they are informed about the side effects.
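A rough sketch of the bytes-mode idea, assuming the instance string is encoded as UTF-8 before matching. The helper name `matches_bytes` is hypothetical and only serves the illustration.

```python
import re

# Bytes pattern: with a bytes pattern, \w only matches ASCII word characters.
PATTERN = rb"[\w0-9]+"


def matches_bytes(text: str) -> bool:
    """Encode the instance and match it against the bytes pattern."""
    return re.fullmatch(PATTERN, text.encode("utf-8")) is not None


ascii_ok = matches_bytes("abc_123")
non_ascii_ok = matches_bytes("é9")
```

One side effect worth noting: in bytes mode `.` and quantifiers count bytes rather than characters, so a multi-byte character is seen as several units.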

@jayvdb

jayvdb commented Feb 10, 2020

Or simpler, use re.ASCII.
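For comparison, a small sketch of the `re.ASCII` variant, which keeps `str` patterns and instances and only narrows `\w`, `\d`, `\s` and `\b` to ASCII. The wrapper `search_ascii` is a hypothetical name, not an existing API.

```python
import re


def search_ascii(pattern: str, instance: str):
    """Search with ASCII-only semantics for \\w, \\d, \\s and \\b."""
    return re.search(pattern, instance, flags=re.ASCII)


default_hit = re.search(r"^[\w0-9]+$", "é9")    # Unicode-aware \w by default
ascii_hit = search_ascii(r"^[\w0-9]+$", "é9")   # \w restricted to ASCII
plain_hit = search_ascii(r"^[\w0-9]+$", "abc_123")
```

This sidesteps the translation of `\w` entirely, but it does not reproduce every ECMAScript rule; for instance, JS `\s` also covers some non-ASCII whitespace.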
