caching semgrep scan #652

ojokure · 2024-12-20T11:17:56Z

Why are these changes needed?

Cache semgrep scan

Related issue number

Checks

I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

Summary by Sourcery

New Features:

Cache Semgrep scan results to improve performance.

sourcery-ai · 2024-12-20T11:18:01Z

Reviewer's Guide by Sourcery

This pull request implements caching for semgrep scans, storing results in a local SQLite database. It also adds autoescaping to the Jinja2 environment for enhanced security.

Sequence diagram for semgrep scan with caching

sequenceDiagram
    participant Agent
    participant Cache
    participant Semgrep
    participant FileSystem

    Agent->>Semgrep: Run scan with custom args
    Semgrep->>FileSystem: Write findings to JSON file
    alt scan successful
        Agent->>FileSystem: Read findings file
        Agent->>Cache: Cache results
        Note over Agent,Cache: Store in ~/.verinfast_cache/semgrep.db
    end
    Agent->>FileSystem: Process and update findings
    Agent->>FileSystem: Write processed findings

Class diagram showing Agent class modifications

classDiagram
    class Agent {
        +Cache cache
        +Config config
        +String directory
        +__init__()
        +create_template()
        +parseRepo(path: str, repo_name: str, branch: str)
    }
    class Cache {
        +set(key, value)
    }
    Agent --> Cache: uses
    note for Agent "Added caching capability"
    note for Cache "New cache implementation for semgrep results"

File-Level Changes

Change	Details	Files
Implement caching for semgrep scan results	Initialized a cache directory and database path in the user's home directory. Created a `Cache` object to manage the semgrep cache. Stored semgrep scan results in the cache after a successful scan. Added error handling for caching failures. Modified output file handling to use a consistent file name format based on the repository name.	`src/verinfast/agent.py`
Add autoescaping to Jinja2 environment	Added `autoescape=select_autoescape(['html', 'xml'])` to the Jinja2 environment to prevent cross-site scripting (XSS) vulnerabilities.	`src/verinfast/agent.py`
Improve logging and error handling	Improved logging messages for errors during semgrep execution and findings post-processing. Added more specific error messages for dependency checks and AWS CLI profile matching.	`src/verinfast/agent.py`
Refactor file handling and naming	Standardized output file names to use a consistent format of `<repo_name>.<type>.json`. Simplified file handling logic.	`src/verinfast/agent.py`
Update preflight checks	Improved logging messages in preflight checks for repository access and AWS account access. Added more detailed error messages when repository access fails.	`src/verinfast/agent.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time. You can also use
this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey @ojokure - I've reviewed your changes - here's some feedback:

Overall Comments:

The caching implementation needs cache invalidation logic to avoid using stale results. Consider adding checks based on file modification times or content hashes before using cached scan results.
Consider adding cache management features like maximum cache size limits and a way to clear old entries to prevent unbounded cache growth.

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Complexity: all looks good
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

aylusltd · 2024-12-20T13:31:24Z

Please make sure to add unit tests.

SeanTConrad

One change on some deleted lines.

SeanTConrad · 2025-01-08T03:17:20Z

src/verinfast/agent.py

            findings_success = False
-            if not self.config.dry:
-                self.log(msg=repo_name, tag="Scanning repository", display=True)


These lines are important. They make it not run at all if it's "dry".

codecov-commenter · 2025-01-09T21:45:10Z

Codecov Report

Attention: Patch coverage is 63.04348% with 17 lines in your changes missing coverage. Please review.

Project coverage is 77.80%. Comparing base (f7709e1) to head (d5ca268).

Files with missing lines	Patch %	Lines
src/verinfast/agent.py	63.04%	17 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #652      +/-   ##
==========================================
+ Coverage   77.59%   77.80%   +0.20%     
==========================================
  Files          29       29              
  Lines        2178     2194      +16     
==========================================
+ Hits         1690     1707      +17     
+ Misses        488      487       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

caching semgrep scan

57b9d48

ojokure requested review from aylusltd and SeanTConrad December 20, 2024 11:17

sourcery-ai bot reviewed Dec 20, 2024

View reviewed changes

added test

4c49ad7

SeanTConrad requested changes Jan 8, 2025

View reviewed changes

ojokure force-pushed the cachehash-semgrep-scan branch from 335e8d4 to 9cbaaa9 Compare January 9, 2025 21:30

ojokure force-pushed the cachehash-semgrep-scan branch 2 times, most recently from 1b96ad3 to 2b149ab Compare January 9, 2025 21:52

sean's feedback

df75812

ojokure force-pushed the cachehash-semgrep-scan branch from 2b149ab to df75812 Compare January 9, 2025 21:53

SeanTConrad and others added 2 commits January 13, 2025 15:37

Merge branch 'main' into cachehash-semgrep-scan

452a02f

Update agent.py

d5ca268

SeanTConrad approved these changes Jan 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

caching semgrep scan #652

caching semgrep scan #652

ojokure commented Dec 20, 2024 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Dec 20, 2024 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

aylusltd commented Dec 20, 2024

SeanTConrad left a comment

SeanTConrad Jan 8, 2025

codecov-commenter commented Jan 9, 2025 •

edited

Loading

caching semgrep scan #652

Are you sure you want to change the base?

caching semgrep scan #652

Conversation

ojokure commented Dec 20, 2024 • edited by sourcery-ai bot Loading

Why are these changes needed?

Related issue number

Checks

Summary by Sourcery

sourcery-ai bot commented Dec 20, 2024 • edited Loading

Reviewer's Guide by Sourcery

Sequence diagram for semgrep scan with caching

Class diagram showing Agent class modifications

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

Choose a reason for hiding this comment

aylusltd commented Dec 20, 2024

SeanTConrad left a comment

Choose a reason for hiding this comment

SeanTConrad Jan 8, 2025

Choose a reason for hiding this comment

codecov-commenter commented Jan 9, 2025 • edited Loading

Codecov Report

ojokure commented Dec 20, 2024 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Dec 20, 2024 •

edited

Loading

codecov-commenter commented Jan 9, 2025 •

edited

Loading