Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add regex validation to pattern utils #10376

Merged
merged 2 commits into from
Nov 24, 2021
Merged

add regex validation to pattern utils #10376

merged 2 commits into from
Nov 24, 2021

Conversation

carlad
Copy link
Contributor

@carlad carlad commented Nov 24, 2021

Proposed changes:
addresses the OSS side of #6960

This adds regex validation to the extract_pattern method of rasa/nlu/utils/pattern_utils.py and a corresponding test.
I'm not totally sure about the error message, eg:

 "Model training failed. '*[0-9]{5}' is not a valid regex. Please update your nlu training data configuration at {'name': 'zipcode', 'pattern': '*[0-9]{5}'}."

Discussion concluded that training should fail and a readable error message displayed, however I'm not sure if this message gives the user enough info about where this config is.

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation

@carlad carlad requested review from a team, samsucik, wochinge, indam23, joejuzl and mprazz and removed request for a team November 24, 2021 12:35
@carlad carlad enabled auto-merge (squash) November 24, 2021 12:37
@carlad carlad requested a review from mmayla November 24, 2021 13:14
# validate regexes, raise Error when invalid
for pattern in patterns:
try:
re.compile(pattern["pattern"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling regex could be slowish and Python recommends reusing regex objects. Would it work to catch the exception during later compilation while training and raise exception? I mean in regex_featurizer is compiling anyway, catch exception there and raise. Not sure if we lose pattern name etc. over there? Otherwise this PR is fine and the function here would validate albeit run a little slower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex_featurizer compiles when calling re.finditer. This method takes a few parameters which messes with the error returned.
I agree that compiling more than once is less efficient, but I feel like it is more logical to validate on the utility side.


assert "Model training failed." in str(e.value)
assert "not a valid regex." in str(e.value)
assert "Please update your nlu training data configuration" in str(e.value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thinking out loud, e is still in scope here so this works

@carlad carlad requested a review from tayfun November 24, 2021 15:05
@carlad carlad merged commit 7b833fb into main Nov 24, 2021
@carlad carlad deleted the 6960-regex-fix branch November 24, 2021 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants