Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update element hashing logic - subscribe only. otherwise, any element update would break element hashing #1784

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wintonzheng
Copy link
Contributor

@wintonzheng wintonzheng commented Feb 18, 2025

Important

Update clean_element_before_hashing() in scraper.py to select specific keys for consistent element hashing in subscription scenarios.

  • Behavior:
    • Update clean_element_before_hashing() in scraper.py to select specific keys (frame, tagName, attributes, beforePseudoText, text, afterPseudoText, children) for hashing instead of removing id and rect.
    • Ensures consistent element hashing when elements are updated, particularly for subscription scenarios.

This description was created by Ellipsis for 42c6afc. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 42c6afc in 1 minute and 23 seconds

More details
  • Looked at 20 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. skyvern/webeye/scraper/scraper.py:159
  • Draft comment:
    Consider using a dictionary comprehension to simplify selective deep-copy (e.g., {k: copy.deepcopy(element[k]) for k in keys if k in element}). Also, add a comment explaining why these keys are chosen.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%
    None
2. skyvern/webeye/scraper/scraper.py:159
  • Draft comment:
    Consider using a dict comprehension for building element_copy for clarity. For example: element_copy = {k: copy.deepcopy(element[k]) for k in allowed_keys if k in element}.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The suggested dict comprehension would make the code more concise and arguably more "Pythonic". However, the current for loop version is also quite clear and straightforward. The dict comprehension would do the exact same thing in a slightly more compact way. This is a relatively minor stylistic suggestion rather than a functional improvement.
    The current code is already quite readable and maintainable. The dict comprehension might actually be slightly harder to read for developers less familiar with Python's more advanced features.
    While dict comprehensions are a good Python feature, suggesting their use here is more of a personal style preference rather than a clear improvement.
    This comment should be deleted as it suggests a minor stylistic change that doesn't meaningfully improve the code.
3. skyvern/webeye/scraper/scraper.py:159
  • Draft comment:
    Ensure the whitelist of keys for hashing (frame, tagName, attributes, beforePseudoText, text, afterPseudoText, children) is complete and well documented, to avoid unintentional omissions.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    This comment is asking the PR author to ensure that the whitelist of keys is complete and well documented. This falls under the category of asking the author to double-check things, which is against the rules. Therefore, this comment should be removed.

Workflow ID: wflow_XTONjHfN0hv7GKiW


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

element_copy.pop("rect", None)
element_copy = {}
# pick up these element keys: frame, tagName, attributes, beforePseudoText, text, afterPseudoText, children
for key in ["frame", "tagName", "attributes", "beforePseudoText", "text", "afterPseudoText", "children"]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should we keep frame attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll remove.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants