Feat: Added basic HTML parsing logic #62

Merged: 2 commits merged into main on Jan 27, 2025


@sauravpanda (Member) commented Jan 27, 2025

Comprehensive Browser Agent Updates and HTML Cleaner Utility

  • Purpose:
    Enhance the Browser Agent project with new features and an HTML cleaner utility to extract structured data from web content.
  • Key Changes:
    • Updated GitHub Actions to streamline the build, lint, and test processes for the Browser Agent project.
    • Introduced a new release workflow that automates versioning and tagging on main branch pushes.
    • Added a new React + TypeScript + Vite demo for the Browser Agent, including components for chat and browsing.
    • Improved README documentation with new model entries and setup instructions for the Browser Agent demo.
    • Updated the package version and added Jest, with its own configuration, for testing.
    • Implemented an HTML cleaner utility with methods to remove specific tags and attributes, extract semantic content, identify interactive elements, preserve heading structure, and extract metadata and schema.org information.
  • Impact:
    These changes streamline the development workflow, make the project easier to use, automate testing and releases, and provide a utility for turning raw HTML into structured data.

✨ Generated with love by Kaizen ❤️

Original Description: None
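Judging from the `tempElement.innerHTML` line quoted in the review, the HTML cleaner works on DOM elements. As a DOM-free illustration of two of the steps in the cleaner bullet (removing whole tags and filtering attributes, then preserving the heading structure), here is a minimal sketch over a hypothetical `HtmlNode` tree. The node shape, `cleanTree`, `headingOutline`, and the tag and attribute lists are all assumptions, not the utility's actual API or configuration:

```typescript
// Hypothetical DOM-free node model, used only for this sketch.
interface HtmlNode {
  tag: string; // lowercase tag name, or "#text" for a text node
  attrs?: Record<string, string>;
  text?: string; // set only on "#text" nodes
  children?: HtmlNode[];
}

interface Heading {
  level: number; // 1-6
  text: string;
}

// Assumed lists; the real utility's tag and attribute choices may differ.
const REMOVED_TAGS = new Set(["script", "style", "noscript", "iframe"]);
const KEPT_ATTRS = new Set(["href", "src", "alt", "title", "type", "name", "role"]);

// Returns a cleaned copy: removed tags are dropped together with their subtree,
// and only allowlisted attributes survive on the remaining elements.
function cleanTree(node: HtmlNode): HtmlNode | null {
  if (REMOVED_TAGS.has(node.tag)) return null;
  if (node.tag === "#text") return { tag: "#text", text: node.text ?? "" };
  const attrs: Record<string, string> = {};
  for (const [name, value] of Object.entries(node.attrs ?? {})) {
    if (KEPT_ATTRS.has(name.toLowerCase())) attrs[name] = value;
  }
  const children = (node.children ?? [])
    .map((child) => cleanTree(child))
    .filter((child): child is HtmlNode => child !== null);
  return { tag: node.tag, attrs, children };
}

// Concatenates the text under a node and collapses runs of whitespace.
function textOf(node: HtmlNode): string {
  if (node.tag === "#text") return node.text ?? "";
  return (node.children ?? []).map(textOf).join(" ").replace(/\s+/g, " ").trim();
}

// Collects h1-h6 headings in document order, keeping their levels.
function headingOutline(node: HtmlNode, out: Heading[] = []): Heading[] {
  const match = /^h([1-6])$/.exec(node.tag);
  if (match) {
    out.push({ level: Number(match[1]), text: textOf(node) });
    return out;
  }
  for (const child of node.children ?? []) headingOutline(child, out);
  return out;
}

// Usage on a tiny hand-built tree.
const page: HtmlNode = {
  tag: "body",
  children: [
    { tag: "script", children: [{ tag: "#text", text: "track()" }] },
    { tag: "h1", attrs: { class: "title" }, children: [{ tag: "#text", text: "Docs" }] },
    { tag: "a", attrs: { href: "/start", onclick: "go()" }, children: [{ tag: "#text", text: "Start" }] },
    { tag: "h2", children: [{ tag: "#text", text: "Install" }] },
  ],
};
const cleaned = cleanTree(page)!;
console.log(headingOutline(cleaned)); // [{ level: 1, text: "Docs" }, { level: 2, text: "Install" }]
```

An allowlist of attributes is used rather than a denylist so that anything not explicitly expected (inline handlers, styling classes) is dropped by default.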

@sauravpanda linked an issue on Jan 27, 2025 that may be closed by this pull request
@kaizen-bot (Contributor, bot) left a comment


Consider implementing the following changes to improve the code.

Comment on lines +71 to +72
await this.wait(delay);
return this.executeWithRetry(fn, attempt + 1);

Comment: Potential performance issue with multiple await calls in a loop.

Solution: Consider using Promise.all for concurrent execution of independent promises where applicable.
!! Make sure the following suggestion is correct before committing it !!

Suggested change
- await this.wait(delay);
- return this.executeWithRetry(fn, attempt + 1);
+ await Promise.all([this.wait(delay), this.executeWithRetry(fn, attempt + 1)]);
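The suggested change is not equivalent to the original two lines. `Promise.all` starts the retry at the same moment as the delay, so the backoff no longer spaces out attempts, and because the new line has no `return`, the retried call's result is discarded. The original lines also run once per attempt through recursion rather than in a loop, so they are already the sequential shape a backoff needs. A minimal sketch of that pattern, assuming a hypothetical `RetryOptions` shape and a capped exponential delay (the PR's actual retry parameters are not shown in this thread):

```typescript
// Hypothetical retry settings; the PR's actual values are not shown here.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
}

// Capped exponential backoff: baseDelayMs * 2^(attempt - 1), at most maxDelayMs.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** (attempt - 1), opts.maxDelayMs);
}

const wait = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls fn; on failure, waits out the backoff delay and only then retries.
// The two awaits are sequential on purpose: the next attempt must not start
// until the delay has elapsed, and its result must be returned to the caller.
async function executeWithRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
  attempt = 1,
): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    if (attempt >= opts.maxAttempts) throw err; // out of attempts: surface the last error
    await wait(backoffDelay(attempt, opts));
    return executeWithRetry(fn, opts, attempt + 1);
  }
}

// Usage: an operation that fails twice, then succeeds.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls += 1;
  if (calls < 3) throw new Error(`failure ${calls}`);
  return "ok";
};

executeWithRetry(flaky, { maxAttempts: 5, baseDelayMs: 10, maxDelayMs: 100 })
  .then((result) => console.log(result, calls)); // prints "ok 3"
```

Because each attempt starts only after the previous `wait` resolves, the three calls in the usage example are separated by 10 ms and then 20 ms.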

src/core/agent/html-cleaner.ts: review thread resolved
@kaizen-bot requested a review from @shreyashkgupta on January 27, 2025, 10:12
@kaizen-bot (Contributor, bot) commented Jan 27, 2025

🔍 Code Review Summary

Attention Required: This push has potential issues. 🚨

Overview

  • Total Feedback: 1 (Critical: 1, Refinements: 0)
  • Files Affected: 1
  • Code Quality: [█████████████████░░░] 85% (Good)

🚨 Critical Issues

Security (1 issue)

1. Potential XSS vulnerability due to unescaped HTML content.


📁 File: src/core/agent/html-cleaner.ts
🔍 Reasoning:
The innerHTML property is being used to set HTML content directly, which can lead to cross-site scripting (XSS) vulnerabilities if the input is not sanitized properly. This is particularly concerning if the html parameter can be influenced by user input.

💡 Solution:
Sanitize the HTML with a library or built-in methods before assigning it to innerHTML.

Current Code:

tempElement.innerHTML = html;

Suggested Code:

tempElement.innerHTML = sanitizeHTML(html); // Ensure sanitizeHTML is a function that properly escapes or removes unsafe content.
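`sanitizeHTML` is not defined in this PR, so any helper shown here is hypothetical. One detail that makes the finding concrete: `<script>` elements inserted through `innerHTML` do not run, but event-handler attributes can, because an `<img src=x onerror=...>` assigned to even a detached element's `innerHTML` is fetched and its `onerror` fires. A minimal sketch of the attribute check a removal-style sanitizer might apply to each parsed element, assuming a hypothetical `isUnsafeAttribute` helper and deliberately non-exhaustive lists of URL attributes and schemes:

```typescript
// Assumed, non-exhaustive lists for this sketch.
const URL_ATTRS = new Set(["href", "src", "action", "formaction", "xlink:href"]);
const UNSAFE_SCHEMES = ["javascript:", "vbscript:", "data:text/html"];

// Returns true if an attribute on a parsed element should be dropped
// before the markup is handed to innerHTML.
function isUnsafeAttribute(attrName: string, value: string): boolean {
  const attr = attrName.toLowerCase();
  // Inline event handlers (onclick, onerror, onload, ...) execute script;
  // every on* attribute is treated as unsafe, conservatively.
  if (attr.startsWith("on")) return true;
  // srcdoc embeds a whole HTML document in an iframe.
  if (attr === "srcdoc") return true;
  if (!URL_ATTRS.has(attr)) return false;
  // URL parsing trims leading/trailing control characters and spaces and
  // removes tabs and newlines, so "java\nscript:" still runs. Removing every
  // character up to U+0020 before checking the scheme covers those cases.
  const normalized = value.replace(/[\u0000-\u0020]/g, "").toLowerCase();
  return UNSAFE_SCHEMES.some((scheme) => normalized.startsWith(scheme));
}

// Usage: filter the attributes of one element.
const attrs: Array<[string, string]> = [
  ["href", "/docs"],
  ["onerror", "alert(1)"],
  ["href", " JavaScript:alert(1)"],
  ["src", "java\nscript:alert(1)"],
  ["alt", "logo"],
];
const kept = attrs.filter(([attrName, value]) => !isUnsafeAttribute(attrName, value));
console.log(kept); // [["href", "/docs"], ["alt", "logo"]]
```

Hand-written filters like this are easy to get wrong. A maintained sanitizer such as DOMPurify, or parsing untrusted input into a separate document with `DOMParser.parseFromString(html, "text/html")` (where script elements are marked non-executable) before reading from it, is usually the safer route.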


Useful Commands
  • Feedback: Share feedback on Kaizen's performance with !feedback [your message]
  • Ask PR: Reply with !ask-pr [your question]
  • Review: Reply with !review
  • Update Tests: Reply with !unittest to create a PR with test changes

@kaizen-bot (Contributor, bot) left a comment


Consider implementing the following changes to improve the code.

src/core/agent/html-cleaner.ts: review thread resolved
@sauravpanda merged commit d0bb577 into main on Jan 27, 2025
4 checks passed
Development
Successfully merging this pull request may close these issues:
  • Add Webpage context extraction
1 participant