Add filtering of english words from entropy (and keyword) plugins #241
In get_raw_secret_value()
`os.path.splitext(filename)[1]` includes the '.'
The current heuristic will never return True
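A minimal sketch of why a check like this can never match, assuming the heuristic compared the result of `os.path.splitext` against extensions written without a leading dot. The extension tuple and both helper names are hypothetical and are not taken from the detect-secrets code.

```python
import os

# Hypothetical extension list, written without leading dots.
YAML_EXTENSIONS = ('yaml', 'yml')


def has_yaml_extension_buggy(filename):
    # splitext keeps the dot, so this compares '.yaml' against 'yaml'.
    return os.path.splitext(filename)[1] in YAML_EXTENSIONS


def has_yaml_extension_fixed(filename):
    # Drop the leading dot before comparing.
    return os.path.splitext(filename)[1].lstrip('.') in YAML_EXTENSIONS


print(os.path.splitext('config.yaml'))          # ('config', '.yaml')
print(has_yaml_extension_buggy('config.yaml'))  # False
print(has_yaml_extension_fixed('config.yaml'))  # True
```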
Start capitalized and after 2 spaces when on the same line as code
- Add `pyahocorasick` as an optional dependency
See issue #240 for more information.
Forgot to make it case-insensitive, will do tomorrow.
By .lower()ing when creating and retrieving
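The idea of lowercasing on both sides can be sketched as follows. This is a stand-in that uses a plain `set` and substring checks instead of the `pyahocorasick` automaton, so it only illustrates the case handling, not the PR's actual implementation. The `WordFilter` class and its method names are hypothetical.

```python
class WordFilter:
    """Case-insensitive word filter (illustrative stand-in, not the PR's code)."""

    def __init__(self, words):
        # Lowercase when creating the word list.
        self._words = {word.lower() for word in words if word}

    def contains_word(self, candidate):
        # Lowercase again when retrieving, so both sides agree on case.
        lowered = candidate.lower()
        return any(word in lowered for word in self._words)


word_filter = WordFilter(['Password', 'example'])
print(word_filter.contains_word('my_PASSWORD_1'))  # True
print(word_filter.contains_word('c3VwZXI='))       # False
```

A linear scan over the word set is fine for a handful of words; an Aho-Corasick automaton matters once the list grows to thousands of entries.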
I used Something random that made me think it was missing things was that
- 🐍 Refactor hadouken code
- 🐍 Remove high_entropy_strings.py from the uncovered files list in tox
cee53a8 to de7fbd2
- Change `is_secret` string names to `(true|false)-positives` (Negative meaning false-positive was confusing.)
- Replace list of secrets with {filename: {plaintext: line:}}
- Replace top-level `results` key with `plugins` since we have a results key already
And normalize main_test.py comments
739f3bf to 20d1921
098e956 to 1bbdcce
1bbdcce to b529d8e
Generally seems fine to me. The main contention I have is the new class but if you have a good reason for it I want to know :)
@@ -65,7 +79,7 @@ def initialize(
         files_to_scan,
     )

-    for file in files_to_scan:
+    for file in sorted(files_to_scan):
Why?
No real reason, just looked nicer when I was running it on things.
@@ -570,15 +570,18 @@ def test_determine_audit_results_plugin_config(

     results = audit.determine_audit_results(baseline, '.secrets.baseline')

-    assert results['results']['HexHighEntropyString']['config'].items() \
-        >= plugins_used[0].items()
+    assert (
TIL this formatting 👍
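For readers who have not seen the pattern in the hunk above: on Python 3, `dict.items()` returns a set-like view, so `>=` checks that one dict contains every key/value pair of another, and wrapping the expression in parentheses avoids the backslash continuation. A small sketch with made-up data (the `config_includes` helper and the sample values are hypothetical):

```python
def config_includes(actual, expected):
    # True when every (key, value) pair of `expected` is also in `actual`.
    return actual.items() >= expected.items()


actual_config = {'name': 'HexHighEntropyString', 'hex_limit': 3, 'extra': True}
expected_config = {'name': 'HexHighEntropyString', 'hex_limit': 3}

# Parentheses let the assertion span several lines without a backslash.
assert (
    actual_config.items()
    >= expected_config.items()
)
print(config_includes(actual_config, expected_config))  # True
print(config_includes(expected_config, actual_config))  # False
```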
coverage report --show-missing --include=tests/* --fail-under 100
coverage report --show-missing --include=detect_secrets/* --fail-under 98
# This is so that we do not regress unintentionally
❤️
When following a symlink, we just subtracted the `cwd` from the path. This caused us to scan nonexistent files.
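A rough illustration of how this kind of bug can arise, assuming "subtracted" means slicing the `cwd` prefix off the resolved path by length. Both functions and the example paths are hypothetical and are not the project's code; `posixpath` is used so the sketch behaves the same on any platform.

```python
import posixpath


def naive_relative(path, cwd):
    # Assumes `path` always starts with `cwd`; wrong for symlink targets elsewhere.
    return path[len(cwd) + 1:]


def safer_relative(path, cwd):
    # Only strip the prefix when the path really lives under cwd.
    if posixpath.commonpath([path, cwd]) == cwd:
        return posixpath.relpath(path, cwd)
    return path


cwd = '/repo'
inside = '/repo/src/a.py'
outside = '/other/place/file.py'  # e.g. where a symlink resolves to

print(naive_relative(inside, cwd))   # src/a.py
print(naive_relative(outside, cwd))  # /place/file.py (a path that does not exist)
print(safer_relative(outside, cwd))  # /other/place/file.py
```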
b529d8e to b440623
* use more python env
* keep pypy out
This is a solution for #240.
I will add that I am personally not against hardcoding a list of ~2,298 English words in the future, like from the "How bad can it git" paper, as that can be immediately beneficial to most users with no effort, but for now this is okay.
I tried to keep the commit history pretty clean 😁