Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(performance): reduce the number of files scanned after merge conflict #946

Conversation

fnareoh
Copy link
Collaborator

@fnareoh fnareoh commented Aug 7, 2024

Context

The goal of this Merge Request is to improve the speed of the ggshield pre-commit scan when commiting after a merge conflict.

What has been done

I created a script gen_merge_commit.py (temporarily added at the repository root) that creates a repo with two branches (master and feature_branch) a merge commit (with or without commit depending on the options) giving the following commit graph:

*   17d3a27 (HEAD -> feature_branch) Merge branch 'master' into feature_branch
|\  
| * db20e84 (master) Commit master n°2
| * 53a4cd0 Commit master n°1
| * a78f68d Commit master n°0
* | 9f5717c Commit feature branch
|/  
* e1db622 Initial commit

I then experimented and with it:

  • Without conflict: our pre-commit does not seem to be called on merge
  • With conflict: I implemented a proof of concept way to limit the number of files scan: when ggshield pre-commit is called
    - if MERGE_HEAD does not exists, we assume it is a regular commit and we should scan everything (current behavior)
    - if MERGE_HEAD exists: we list the files about to be committed (git diff --staged --name-only), then compare the current shas of those files (using the command git ls-files --staged <list_files_diff>) with the shas of HEAD and MERGE_HEAD (obtained via git ls-tree <list_files_diff>), and if it differs from both, then the file has been modified and should be re-scanned...

I'm opening this PR as a draft as it is clearly not ready (no proper automated testing) but the general idea should be reviewed...

@fnareoh fnareoh self-assigned this Aug 7, 2024
Copy link

codecov bot commented Aug 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.54%. Comparing base (40a904b) to head (c3b3b62).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #946      +/-   ##
==========================================
+ Coverage   91.49%   91.54%   +0.05%     
==========================================
  Files         178      178              
  Lines        7321     7372      +51     
==========================================
+ Hits         6698     6749      +51     
  Misses        623      623              
Flag Coverage Δ
unittests 91.54% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch from 1d63a47 to 5fb03d9 Compare August 8, 2024 09:01
@fnareoh fnareoh requested a review from agateau-gg August 8, 2024 09:05
Copy link
Collaborator

@agateau-gg agateau-gg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, your approach makes sense.

I am wondering if we always want this behavior or if it should be opt-in 🤔. What do you think?

One remark: calls to ls-files and ls-tree should use -z to handle files with what git calls "unusual names" (*). An easy way to create an "unusual name" is to create one with a double quote in it.

(*): Other parts of ggshield would need to be fixed with regard to that too...

@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch 2 times, most recently from 67ae7da to e33938f Compare August 9, 2024 09:44
tests/unit/cmd/scan/test_precommit.py Outdated Show resolved Hide resolved
tests/unit/cmd/scan/test_precommit.py Outdated Show resolved Hide resolved
tests/unit/core/scan/test_commit.py Outdated Show resolved Hide resolved
tests/functional/secret/test_merge_commit.py Outdated Show resolved Hide resolved
@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch 2 times, most recently from ae9fca1 to d95d1a6 Compare August 9, 2024 14:44
@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch from d95d1a6 to 3e4b043 Compare August 19, 2024 14:36
@fnareoh
Copy link
Collaborator Author

fnareoh commented Aug 19, 2024

@agateau-gg I started doing something because it looked convenient in my tests and possibly useful but maybe you won't like it: I added an option to ggshield install to install the hook with option.

For example, ggshield install -o --json will install the following hook: ggshield secret scan pre-commit --json "$@".

But as it seems to fail on windows, I would like to check with you before debugging... 😓

@agateau-gg
Copy link
Collaborator

@agateau-gg I started doing something because it looked convenient in my tests and possibly useful but maybe you won't like it: I added an option to ggshield install to install the hook with option.

For example, ggshield install -o --json will install the following hook: ggshield secret scan pre-commit --json "$@".

But as it seems to fail on windows, I would like to check with you before debugging... 😓

The idea sounds good, but it should be in a separate commit or maybe even in a separate PR.

@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch 2 times, most recently from fee9967 to 43c340f Compare August 21, 2024 12:02
@fnareoh fnareoh marked this pull request as ready for review August 21, 2024 12:33
ggshield/cmd/secret/scan/precommit.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit.py Outdated Show resolved Hide resolved
tests/functional/utils_create_merge_repo.py Outdated Show resolved Hide resolved
tests/functional/utils_create_merge_repo.py Outdated Show resolved Hide resolved
tests/functional/utils_create_merge_repo.py Outdated Show resolved Hide resolved
tests/functional/utils_create_merge_repo.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit.py Show resolved Hide resolved
@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch 2 times, most recently from 629756b to 29abeab Compare August 23, 2024 12:15
Copy link
Collaborator

@agateau-gg agateau-gg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor remarks, but this is much easier to read!

ggshield/core/scan/commit.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit_utils.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit_utils.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit.py Show resolved Hide resolved
@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch from 29abeab to 6dcb59a Compare August 26, 2024 13:38
@fnareoh fnareoh force-pushed the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch from 6dcb59a to c3b3b62 Compare September 3, 2024 14:17
@fnareoh fnareoh requested review from agateau-gg and removed request for agateau-gg September 3, 2024 19:13
@agateau-gg agateau-gg merged commit b0b0c85 into main Sep 4, 2024
28 checks passed
@agateau-gg agateau-gg deleted the garancegourdel/scrt-4759-optimize-files-to-scan-on-merge-commits branch September 4, 2024 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants