fix: Another approach to parse tag string with spaces to avoid long running time on regex matching. #7625

Merged
merged 2 commits into from
Nov 27, 2024

Contributor

@dkphm dkphm commented Oct 29, 2024

Which issue(s) does this change fix?

#6657

Why is this change necessary?

  • The previous fix (fix: improve tag parsing performance for list input #7049) only handles the case where tags are provided through samconfig.toml, which results in a list being passed to CfnTags. It doesn't fix the case where the customer specifies the --tags parameter. In the latter case, the provided value is a string, and we hit the issue with re.findall again, mainly because of the complex regex used to parse the {tag}={tag} pattern in tags.

  • The solution: instead of matching the full, complex {tag}={tag} regex, we look for any quoted strings that contain spaces in the value (e.g. "Test App"), replace each space with a replacement string, and restart the parsing process.
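
The idea in the second bullet can be sketched roughly as follows. This is a simplified illustration, not the code merged in this PR: the names `PLACEHOLDER`, `QUOTED`, and `parse_tags` are hypothetical, the placeholder is assumed never to appear in real tag input, and a plain whitespace split stands in for the "restart the parsing process" step.

```python
import re
from typing import Dict

# Hypothetical placeholder; assumed never to occur in real tag input.
PLACEHOLDER = "\x00"

# A flat pattern with no nested quantifiers: one double-quoted run.
QUOTED = re.compile(r'"[^"]*"')


def parse_tags(raw: str) -> Dict[str, str]:
    """Parse a string like 'k1="v 1" k2=v2' into a dict (simplified sketch)."""
    # Step 1: hide the spaces that sit inside double-quoted strings.
    protected = QUOTED.sub(lambda m: m.group(0).replace(" ", PLACEHOLDER), raw)

    tags: Dict[str, str] = {}
    # Step 2: every remaining space is now a separator, so a plain split works.
    for token in protected.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        # Step 3: restore the hidden spaces and drop the wrapping quotes.
        key = key.replace(PLACEHOLDER, " ").strip('"')
        value = value.replace(PLACEHOLDER, " ").strip('"')
        tags[key] = value
    return tags
```

The point of the sketch is that once quoted spaces are hidden, the remaining work needs no alternation-heavy pattern, so there is nothing for the regex engine to backtrack over.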

How does it address the issue?

Run sam deploy with --tags tag1="value 1" tag2="value 2"

What side effects does this change have?

Customer might not be able to set tags properly during sam deploy.

Mandatory Checklist

PRs will only be reviewed after checklist is complete

  • Add input/output type hints to new functions/methods
  • Write design document if needed (Do I need to write a design document?)
  • Write/update unit tests
  • Write/update integration tests
  • Write/update functional tests if needed
  • make pr passes
  • make update-reproducible-reqs if dependencies were changed
  • Write documentation

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dkphm dkphm requested a review from a team as a code owner October 29, 2024 19:09
@dkphm dkphm requested review from lucashuy and mildaniel October 29, 2024 19:09
@github-actions github-actions bot added pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels Oct 29, 2024
@dkphm dkphm added pr/internal and removed pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels Oct 29, 2024
@dkphm dkphm force-pushed the tags_with_space branch 2 times, most recently from e45df9e to 6366d33 Compare October 29, 2024 20:23
@dkphm dkphm requested review from hawflau and removed request for mildaniel October 29, 2024 21:18
samcli/cli/types.py
for group in groups:
    key, v = group
    self._add_value(result, _unquote_wrapped_quotes(key), _unquote_wrapped_quotes(v))
# Instead of parsing a full {tag}={tag} pattern, we will try to look for quoted string with spaces
Contributor

Can we refactor this new logic out to a separate method to unit test? There's a lot of moving parts here

Contributor Author

This is tricky because the call below

groups = re.findall(self._pattern, val)

takes a long time to return.
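
For readers unfamiliar with why a `re.findall` call can be slow, here is a minimal sketch of regex backtracking cost. The pattern below is a generic textbook example and is NOT the SAM CLI tag regex; the helper name `time_failed_match` is hypothetical. It only illustrates the general failure mode in which a quantified group whose body is itself quantified forces a backtracking engine to try many ways of splitting the input before giving up.

```python
import re
import time

# Illustrative pattern only, not the pattern used in samcli/cli/types.py.
NESTED = re.compile(r"(a+)+b")


def time_failed_match(n: int) -> float:
    """Return the seconds spent failing to match n 'a' characters (no trailing 'b')."""
    text = "a" * n
    start = time.perf_counter()
    result = NESTED.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # there is no 'b', so the match must fail
    return elapsed


if __name__ == "__main__":
    # Keep n small: the running time typically grows very quickly with n.
    for n in (14, 16, 18, 20):
        print(n, f"{time_failed_match(n):.4f}s")
```

Exact timings depend on the machine and Python version, so none are quoted here.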

@@ -194,6 +219,7 @@ def __init__(self, multiple_values_per_key=False):
TAG_REGEX = '[A-Za-z0-9\\"_:\\.\\/\\+-\\@=]'

_pattern = r"{tag}={tag}".format(tag=_generate_match_regex(match_pattern=TAG_REGEX, delim=" "))
_quoted_pattern = _generate_quoted_match_regex(match_pattern=TAG_REGEX)
Contributor

Can we reuse or refactor the old _generate_match_regex? They share similar return values.

Contributor Author

They use different regexes, hence the new function.

Contributor

@lucashuy lucashuy Nov 22, 2024

The difference between the two is the extra regex string for the delimiter, right? Can we make the delimiter argument optional, then just append the extra delimiter regex if it was passed in? Another option is to make constants for the single and double quote regex.
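
A rough sketch of what this suggestion could look like, combining both options (quote constants plus an optional delimiter). Everything here is hypothetical: `DOUBLE_QUOTED`, `SINGLE_QUOTED`, and `build_match_regex` are illustrative names, and the patterns are simplified stand-ins rather than the regexes produced by `_generate_match_regex` or `_generate_quoted_match_regex`.

```python
import re
from typing import Optional

# Hypothetical constants for quoted strings with backslash escapes.
# The two alternatives inside each group are mutually exclusive,
# so the engine has only one way to consume each character.
DOUBLE_QUOTED = r'"(?:\\.|[^"\\])*"'
SINGLE_QUOTED = r"'(?:\\.|[^'\\])*'"


def build_match_regex(delim: Optional[str] = None) -> str:
    """Build a token regex; only add the unquoted branch when a delimiter is given."""
    quoted = f"{DOUBLE_QUOTED}|{SINGLE_QUOTED}"
    if delim is None:
        # Quoted-only variant.
        return f"(?:{quoted})"
    # Delimiter variant: also accept an unquoted run that stops at the
    # delimiter or at a quote character.
    unquoted = "[^" + re.escape(delim) + "\"']+"
    return f"(?:{quoted}|{unquoted})"
```

With a single builder, call sites differ only in whether they pass `delim`, which keeps the shared quote handling in one place.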

samcli/cli/types.py
space_positions = [i for i, char in enumerate(text) if char == " "]
modified = text.replace(" ", replacement)

return modified, space_positions
Contributor

Thoughts on using dataclasses for return objects, instead of returning tuples and indexing into them to get the modified string and the replacement index array?

I imagine that changing things to dataclasses will make it easier for future readers to understand what is happening quickly.
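
As a sketch of this suggestion applied to the snippet above: the class name `SpaceReplacement`, the function name `replace_spaces`, and the default replacement value are hypothetical, since the enclosing function is not shown in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpaceReplacement:
    """Result of hiding spaces in a string (hypothetical return type)."""

    modified: str               # text with every space replaced
    space_positions: List[int]  # indexes where spaces originally were


def replace_spaces(text: str, replacement: str = "_") -> SpaceReplacement:
    """Same logic as the tuple-returning snippet, with named fields."""
    space_positions = [i for i, char in enumerate(text) if char == " "]
    modified = text.replace(" ", replacement)
    return SpaceReplacement(modified=modified, space_positions=space_positions)
```

Callers can then write `result.modified` and `result.space_positions` instead of `result[0]` and `result[1]`.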

Contributor

lucashuy commented Nov 5, 2024

What side effects does this change have?

Customer might not be able to set tags properly during sam deploy.

What does this mean? Does this change how tags work for customers?

@dkphm dkphm force-pushed the tags_with_space branch 3 times, most recently from 4d19832 to 6238ced Compare November 21, 2024 19:36
@dkphm dkphm added this pull request to the merge queue Nov 27, 2024
Merged via the queue into aws:develop with commit 575253a Nov 27, 2024
55 checks passed
@dkphm dkphm deleted the tags_with_space branch November 27, 2024 17:25