-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Another approach to parse tag string with spaces to avoid long running time on regex matching. #7625
Conversation
e45df9e
to
6366d33
Compare
samcli/cli/types.py
Outdated
for group in groups: | ||
key, v = group | ||
self._add_value(result, _unquote_wrapped_quotes(key), _unquote_wrapped_quotes(v)) | ||
# Instead of parsing a full {tag}={tag} pattern, we will try to look for quoted string with spaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refactor this new logic out to a separate method to unit test? There's a lot of moving parts here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky because the block below
groups = re.findall(self._pattern, val)
take longer time to return
samcli/cli/types.py
Outdated
@@ -194,6 +219,7 @@ def __init__(self, multiple_values_per_key=False): | |||
TAG_REGEX = '[A-Za-z0-9\\"_:\\.\\/\\+-\\@=]' | |||
|
|||
_pattern = r"{tag}={tag}".format(tag=_generate_match_regex(match_pattern=TAG_REGEX, delim=" ")) | |||
_quoted_pattern = _generate_quoted_match_regex(match_pattern=TAG_REGEX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse or refactor the old _generate_match_regex
? They share similar return values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have different regex, hence creating a new function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between the two is the extra regex string for the deliminator right? Can we make the deliminator argument optional, then just append the extra deliminator regex it was passed in? Another option is to make constants for the singe and double quote regex
samcli/cli/types.py
Outdated
space_positions = [i for i, char in enumerate(text) if char == " "] | ||
modified = text.replace(" ", replacement) | ||
|
||
return modified, space_positions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on using dataclasses for return objects, instead of returning tuples and performing array indexes to get the modified strings/replacement index array?
I imagine that changing things to dataclasses will make it easier for future readers to understand what is happening quickly.
What does this mean? Does this change how tags works for customers? |
4d19832
to
6238ced
Compare
…unning time on regex matching.
6238ced
to
081e82a
Compare
Which issue(s) does this change fix?
#6657
Why is this change necessary?
Previous fix (fix: improve tag parsing performance for list input #7049) only handle the case when
tags
are provided throughsamconfig.toml
, which resulted into a list toCfnTags
, but it doesn't fix the case when customer specify--tags
parameter. In the later case, the provided value will be in a string, and we will face the issue withre.findall
again, mainly due to the complex regex provided to parse{tag}={tag}
pattern intags
.The solution is instead of parsing a full complex regex with
{tag}={tag}
, we will look for any quote strings with space in the value (e.g. "Test App"), replace the space with some replacement, and restart the parsing process.How does it address the issue?
Run
sam deploy
with--tags tag1="value 1" tag2="value 2"
What side effects does this change have?
Customer might not be able to set tags properly during
sam deploy
.Mandatory Checklist
PRs will only be reviewed after checklist is complete
make pr
passesmake update-reproducible-reqs
if dependencies were changedBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.