Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caching semgrep scan #652

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

caching semgrep scan #652

wants to merge 5 commits into from

Conversation

ojokure
Copy link
Collaborator

@ojokure ojokure commented Dec 20, 2024

Why are these changes needed?

Cache semgrep scan

Related issue number

Checks

  • I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • I've made sure all auto checks have passed.

Summary by Sourcery

New Features:

  • Cache Semgrep scan results to improve performance.

Copy link
Contributor

sourcery-ai bot commented Dec 20, 2024

Reviewer's Guide by Sourcery

This pull request implements caching for semgrep scans, storing results in a local SQLite database. It also adds autoescaping to the Jinja2 environment for enhanced security.

Sequence diagram for semgrep scan with caching

sequenceDiagram
    participant Agent
    participant Cache
    participant Semgrep
    participant FileSystem

    Agent->>Semgrep: Run scan with custom args
    Semgrep->>FileSystem: Write findings to JSON file
    alt scan successful
        Agent->>FileSystem: Read findings file
        Agent->>Cache: Cache results
        Note over Agent,Cache: Store in ~/.verinfast_cache/semgrep.db
    end
    Agent->>FileSystem: Process and update findings
    Agent->>FileSystem: Write processed findings
Loading

Class diagram showing Agent class modifications

classDiagram
    class Agent {
        +Cache cache
        +Config config
        +String directory
        +__init__()
        +create_template()
        +parseRepo(path: str, repo_name: str, branch: str)
    }
    class Cache {
        +set(key, value)
    }
    Agent --> Cache: uses
    note for Agent "Added caching capability"
    note for Cache "New cache implementation for semgrep results"
Loading

File-Level Changes

Change Details Files
Implement caching for semgrep scan results
  • Initialized a cache directory and database path in the user's home directory.
  • Created a Cache object to manage the semgrep cache.
  • Stored semgrep scan results in the cache after a successful scan.
  • Added error handling for caching failures.
  • Modified output file handling to use a consistent file name format based on the repository name.
src/verinfast/agent.py
Add autoescaping to Jinja2 environment
  • Added autoescape=select_autoescape(['html', 'xml']) to the Jinja2 environment to prevent cross-site scripting (XSS) vulnerabilities.
src/verinfast/agent.py
Improve logging and error handling
  • Improved logging messages for errors during semgrep execution and findings post-processing.
  • Added more specific error messages for dependency checks and AWS CLI profile matching.
src/verinfast/agent.py
Refactor file handling and naming
  • Standardized output file names to use a consistent format of <repo_name>.<type>.json.
  • Simplified file handling logic.
src/verinfast/agent.py
Update preflight checks
  • Improved logging messages in preflight checks for repository access and AWS account access.
  • Added more detailed error messages when repository access fails.
src/verinfast/agent.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ojokure - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The caching implementation needs cache invalidation logic to avoid using stale results. Consider adding checks based on file modification times or content hashes before using cached scan results.
  • Consider adding cache management features like maximum cache size limits and a way to clear old entries to prevent unbounded cache growth.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@aylusltd
Copy link
Contributor

Please make sure to add unit tests.

Copy link
Collaborator

@SeanTConrad SeanTConrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change on some deleted lines.

findings_success = False
if not self.config.dry:
self.log(msg=repo_name, tag="Scanning repository", display=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are important. They make it not run at all if it's "dry".

@ojokure ojokure force-pushed the cachehash-semgrep-scan branch from 335e8d4 to 9cbaaa9 Compare January 9, 2025 21:30
@codecov-commenter
Copy link

codecov-commenter commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 63.04348% with 17 lines in your changes missing coverage. Please review.

Project coverage is 77.80%. Comparing base (f7709e1) to head (d5ca268).

Files with missing lines Patch % Lines
src/verinfast/agent.py 63.04% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #652      +/-   ##
==========================================
+ Coverage   77.59%   77.80%   +0.20%     
==========================================
  Files          29       29              
  Lines        2178     2194      +16     
==========================================
+ Hits         1690     1707      +17     
+ Misses        488      487       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ojokure ojokure force-pushed the cachehash-semgrep-scan branch 2 times, most recently from 1b96ad3 to 2b149ab Compare January 9, 2025 21:52
@ojokure ojokure force-pushed the cachehash-semgrep-scan branch from 2b149ab to df75812 Compare January 9, 2025 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants