-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add regex validation to pattern utils #10376
Conversation
# validate regexes, raise Error when invalid | ||
for pattern in patterns: | ||
try: | ||
re.compile(pattern["pattern"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compiling regex could be slowish and Python recommends reusing regex objects. Would it work to catch the exception during later compilation while training and raise exception? I mean in regex_featurizer is compiling anyway, catch exception there and raise. Not sure if we lose pattern name etc. over there? Otherwise this PR is fine and the function here would validate albeit run a little slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex_featurizer compiles when calling re.finditer
. This method takes a few parameters which messes with the error returned.
I agree that compiling more than once is less efficient, but I feel like it is more logical to validate on the utility side.
|
||
assert "Model training failed." in str(e.value) | ||
assert "not a valid regex." in str(e.value) | ||
assert "Please update your nlu training data configuration" in str(e.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thinking out loud, e is still in scope here so this works
Proposed changes:
addresses the OSS side of #6960
This adds regex validation to the
extract_pattern
method ofrasa/nlu/utils/pattern_utils.py
and a corresponding test.I'm not totally sure about the error message, eg:
Discussion concluded that training should fail and a readable error message displayed, however I'm not sure if this message gives the user enough info about where this config is.
Status (please check what you already did):