Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reporting feature #387

Merged
merged 20 commits into from
Apr 13, 2021
Merged

Reporting feature #387

merged 20 commits into from
Apr 13, 2021

Conversation

pablosnt
Copy link
Contributor

This feature allows to generate reports that include all the detected secrets details. This feature can be executed with the option --report of the audit command. The generated reports include the following fields for each detected secret:

  • class: class associated to the secret. It can be TRUE_POSITIVE, FALSE_POSITIVE or UNKNOWN.
  • filename: file where the secret was detected.
  • lines: list of lines where the secret was detected. It will contain an entry for each occurrence of the secret in these file.
  • secret: secret value in plaintext.
  • types: list of secret types associated to the secret. Note that each secret can be detected by more than one plugin.

Moreover, this feature offer the following options:

  • --only-real: only includes secrets with class UNKNOWN or TRUE_POSITIVE in the report.
  • --only-false: only includes secrets with class FALSE_POSITIVE in the report.

Both options are optional, so if no option was selected, the report will include secrets of all classes.

Next, you can see an example of execution:
report

Thank you very much!
I hope that you like this feature

Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, an interesting concept. Thank you for adding the screenshot -- that helps with understanding the intent of this feature much clearer.

There are a lot of developer improvements that will be released in v1: most of my comments revolve around this fact. You should check them out (and the (much more thorough) docs/) and use them, rather than rewriting a lot of shared concepts.

Please also write tests if you intend to merge this. It also looks like linting may not have happened in this branch, so make sure you setup your development environment, and run the pre-commit hooks configured.

Also, I am curious -- my assumption when writing this tool is that the secret value is rarely used multiple times within the same file. This is written as if that may not be the case. Have you encountered real secrets declared multiple times within the same file?

detect_secrets/audit/report.py Outdated Show resolved Hide resolved
detect_secrets/audit/report.py Outdated Show resolved Hide resolved
detect_secrets/audit/report.py Outdated Show resolved Hide resolved
detect_secrets/audit/report.py Outdated Show resolved Hide resolved
detect_secrets/audit/report.py Outdated Show resolved Hide resolved
detect_secrets/core/usage/audit.py Outdated Show resolved Hide resolved
detect_secrets/audit/report.py Outdated Show resolved Hide resolved
@pablosnt
Copy link
Contributor Author

Have you encountered real secrets declared multiple times within the same file?

Yes, this situation can be possible in large projects. With the information about the occurrences of a secret (line list) we can monitor the utilization of secrets for multiple systems and help the developers to retire all occurrences from the repository.

Thank you for all your comments! I'm working in them

@pablosnt pablosnt requested a review from domanchi December 23, 2020 20:22
@@ -95,6 +95,43 @@ def get_raw_secret_from_file(
raise SecretNotFoundOnSpecifiedLineError(secret.line_number)


def get_all_secrets_from_file(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is very similar to get_raw_secret_from_file with the difference that it search all the ocurrences of one secret in one file with one plugin. It allows to offer a very complete report of all secrets found with detect-secrets. The method get_all_secrets_from_file has been added to the detect_secrets/audit/common.py file, so it can be reused by other features in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing the discussion from #387 (comment), in my side-by-side diff, I identify two distinct changes between this function and get_raw_secret_from_file:

  1. This does not require a line_number to be associated with the secret. Instead, it looks like it enumerates through the lines of the file to try and essentially try to find the secret again
  2. It returns multiple results, rather than just the first one.

Given this, it still makes sense to me to refactor these requirements into one function, for code reuse. For example:

def get_raw_secret_line_from_file(secret, line_getter_factory) -> str:
    if not secret.line_number:
        raise NoLineNumberError

    for item in get_raw_secrets_from_file(secret, line_getter_factory):
        return item.secret_value


def get_raw_secrets_from_file(secret, line_getter_factory) -> Iterator[PotentialSecret]:
    ...
    while True:
        if secret.line_number:
            # if we have a line number, let's use it.
            try:
                lines_to_scan = [line_getter.lines[secret.line_number - 1]]
                line_numbers = [secret.line_number]
            except IndexError:
                raise SecretNotFoundOnSpecifiedLineError(secret.line_number)
        else:
            # otherwise, let's just scan the whole file
            lines_to_scan = line_getter.lines
            line_numbers = range(1, len(lines_to_scan) + 1)

        for line_number, line in zip(line_numbers, lines_to_scan):
            ...

continue
detections = get_all_secrets_from_file(secret)
identifier = hashlib.sha512((secret.secret_hash + filename).encode('utf-8')).hexdigest()
line_getter = line_getter_factory(filename)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method generate_report uses the new LineGetter class to manage all the file lines.

@@ -140,3 +140,126 @@ There are times you want to extract the raw secret values to run further analysi
so with the `--raw` flag.

TODO: Example when this feature is written up.

## Report generation
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The audit documentation has been updated to include the new reporting feature information. It also includes some examples.

@@ -357,29 +357,34 @@ const secret = "hunter2";

```bash
$ detect-secrets audit --help
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The audit help output has been updated to include the new report generation options.

@pablosnt
Copy link
Contributor Author

pablosnt commented Feb 2, 2021

Hi @domanchi, I think that this new implementation of the reporting feature solves your previous comments. With this implementation, the report feature is simpler and supports the code reutilization. I hope that you like it, can you review it?

@pablosnt
Copy link
Contributor Author

Hi @domanchi, do you think this implementation is good enough to be merged? Please, tell me any improvement you want to apply. Thank you very much!

Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking a lot better!

Many of my comments are style-related; I think the core functionality for this report is sound. Tests look sane, with nice edge cases considered. I hope my code examples are illustrative and helpful, if Python isn't your primary development language.

Sorry it's took me so long to get to this! We've been addressing other issues internally, but finally able to get back to this effort and bring this tool to a new shiny future.

detect_secrets/core/usage/audit.py Show resolved Hide resolved
detect_secrets/audit/common.py Outdated Show resolved Hide resolved
detect_secrets/audit/common.py Outdated Show resolved Hide resolved
@@ -95,6 +95,43 @@ def get_raw_secret_from_file(
raise SecretNotFoundOnSpecifiedLineError(secret.line_number)


def get_all_secrets_from_file(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing the discussion from #387 (comment), in my side-by-side diff, I identify two distinct changes between this function and get_raw_secret_from_file:

  1. This does not require a line_number to be associated with the secret. Instead, it looks like it enumerates through the lines of the file to try and essentially try to find the secret again
  2. It returns multiple results, rather than just the first one.

Given this, it still makes sense to me to refactor these requirements into one function, for code reuse. For example:

def get_raw_secret_line_from_file(secret, line_getter_factory) -> str:
    if not secret.line_number:
        raise NoLineNumberError

    for item in get_raw_secrets_from_file(secret, line_getter_factory):
        return item.secret_value


def get_raw_secrets_from_file(secret, line_getter_factory) -> Iterator[PotentialSecret]:
    ...
    while True:
        if secret.line_number:
            # if we have a line number, let's use it.
            try:
                lines_to_scan = [line_getter.lines[secret.line_number - 1]]
                line_numbers = [secret.line_number]
            except IndexError:
                raise SecretNotFoundOnSpecifiedLineError(secret.line_number)
        else:
            # otherwise, let's just scan the whole file
            lines_to_scan = line_getter.lines
            line_numbers = range(1, len(lines_to_scan) + 1)

        for line_number, line in zip(line_numbers, lines_to_scan):
            ...

detect_secrets/audit/common.py Outdated Show resolved Hide resolved
tests/audit/report_test.py Outdated Show resolved Hide resolved
tests/audit/report_test.py Outdated Show resolved Hide resolved
tests/audit/report_test.py Outdated Show resolved Hide resolved
tests/audit/report_test.py Outdated Show resolved Hide resolved
tests/audit/report_test.py Outdated Show resolved Hide resolved
@pablosnt
Copy link
Contributor Author

Hi @domanchi, thank you very much for the comments and suggestions, I think that everything is done!

PD: Contratulations for the new release of detect-secrets 💯

@pablosnt pablosnt changed the base branch from pre-v1-launch to master February 25, 2021 19:10
@pablosnt pablosnt requested a review from domanchi February 25, 2021 19:15
Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but LGTM overall. Let's get this out in our next feature release (waiting for any other patch fixes that may need to be applied to 1.0.X lol)

if secret.line_number:
try:
lines_to_scan = [line_getter.lines[secret.line_number - 1]]
line_numbers = [secret.line_number]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have an off-by-one error here.

detect_secrets/audit/report.py Show resolved Hide resolved
detect_secrets/audit/report.py Outdated Show resolved Hide resolved
plugin.analyze_line,
filename=secret.filename,
line=line,
line_number=line_number + 1,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if secret.line_number:
             try:
                 lines_to_scan = [line_getter.lines[secret.line_number - 1]]
                 line_numbers = [secret.line_number]

I think that this code is correct, note that in the line 98 the line_number value is increased in one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but secret.line_number is already 1-based. As such, when we obtain the target_line, we subtract it by 1 (to obtain from the line_getter.lines array).

So, if we add one to it again, I think we'll get an off-by-one error.

Copy link
Contributor Author

@pablosnt pablosnt Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I get it. I have runned the tests and I think that it's fine. Thank you!

@domanchi
Copy link
Contributor

@pablosantiagolopez, can you run make test and fix the surfaced issues? Looks like there's some mypy issues, and detect-secrets findings too.

@@ -44,7 +45,7 @@ def open_file(filename: str) -> 'LineGetter':
def get_raw_secret_from_file(
secret: PotentialSecret,
line_getter_factory: Callable[[str], 'LineGetter'] = open_file,
) -> str:
) -> Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, is there a reason why this is Any? str looks sound to me.

Copy link
Contributor

@syn-4ck syn-4ck Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @domanchi, we made this change because mypy recommended it. I'm going to check it right now. When I have something else I will reply this comment.

Copy link
Contributor

@syn-4ck syn-4ck Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think that I got it... We should change str to Optional[str]. I will test again, run the pre-commit hook and push this change. What do you think @domanchi?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh, I see the issue. It's because of https://github.com/Yelp/detect-secrets/blob/master/detect_secrets/core/potential_secret.py#L66, so mypy (rightly) expects this to return the same type.

Yeah, Optional[str] works for me.

@syn-4ck
Copy link
Contributor

syn-4ck commented Apr 13, 2021

Hi again @domanchi, with the last change (#387 (comment)) I think the mypy issues are fixed. Thanks for your feedback!

@domanchi domanchi merged commit f4f7247 into Yelp:master Apr 13, 2021
jimmyhlee94 pushed a commit to jimmyhlee94/detect-secrets that referenced this pull request Aug 19, 2021
Prepare release of optional db2 feature
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants