Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reporting feature #387

Merged
merged 20 commits into from
Apr 13, 2021
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 16 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -357,29 +357,34 @@ const secret = "hunter2";

```bash
$ detect-secrets audit --help
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The audit help output has been updated to include the new report generation options.

usage: detect-secrets audit [-h] [--diff] [--stats] [--json]
filename [filename ...]
usage: detect-secrets audit [-h] [--diff] [--stats]
[--report] [--only-real | --only-false]
[--json]
filename [filename ...]

Auditing a baseline allows analysts to label results, and optimize plugins for
the highest signal-to-noise ratio for their environment.

positional arguments:
filename Audit a given baseline file to distinguish the difference
between false and true positives.
filename Audit a given baseline file to distinguish the difference
between false and true positives.

optional arguments:
-h, --help show this help message and exit
--diff Allows the comparison of two baseline files, in order to
effectively distinguish the difference between various plugin
configurations.
--stats Displays the results of an interactive auditing session which
have been saved to a baseline file.
-h, --help show this help message and exit
--diff Allows the comparison of two baseline files, in order to
effectively distinguish the difference between various plugin
configurations.
--stats Displays the results of an interactive auditing session which
have been saved to a baseline file.
--report Displays a report with the secrets detected
--only-real Only includes real secrets in the report
--only-false Only includes false positives in the report

analytics:
Quantify the success of your plugins based on the labelled results in your
baseline. To be used with the statisitcs mode (--stats).

--json Outputs results in a machine-readable format.
--json Outputs results in a machine-readable format.
```

## Configuration
Expand Down
1 change: 1 addition & 0 deletions detect_secrets/audit/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
from . import analytics # noqa: F401
from . import report # noqa: F401
from .audit import audit_baseline # noqa: F401
from .compare import compare_baselines # noqa: F401
37 changes: 37 additions & 0 deletions detect_secrets/audit/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,43 @@ def get_raw_secret_from_file(
raise SecretNotFoundOnSpecifiedLineError(secret.line_number)


def get_all_secrets_from_file(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is very similar to get_raw_secret_from_file with the difference that it search all the ocurrences of one secret in one file with one plugin. It allows to offer a very complete report of all secrets found with detect-secrets. The method get_all_secrets_from_file has been added to the detect_secrets/audit/common.py file, so it can be reused by other features in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing the discussion from #387 (comment), in my side-by-side diff, I identify two distinct changes between this function and get_raw_secret_from_file:

  1. This does not require a line_number to be associated with the secret. Instead, it looks like it enumerates through the lines of the file to try and essentially try to find the secret again
  2. It returns multiple results, rather than just the first one.

Given this, it still makes sense to me to refactor these requirements into one function, for code reuse. For example:

def get_raw_secret_line_from_file(secret, line_getter_factory) -> str:
    if not secret.line_number:
        raise NoLineNumberError

    for item in get_raw_secrets_from_file(secret, line_getter_factory):
        return item.secret_value


def get_raw_secrets_from_file(secret, line_getter_factory) -> Iterator[PotentialSecret]:
    ...
    while True:
        if secret.line_number:
            # if we have a line number, let's use it.
            try:
                lines_to_scan = [line_getter.lines[secret.line_number - 1]]
                line_numbers = [secret.line_number]
            except IndexError:
                raise SecretNotFoundOnSpecifiedLineError(secret.line_number)
        else:
            # otherwise, let's just scan the whole file
            lines_to_scan = line_getter.lines
            line_numbers = range(1, len(lines_to_scan) + 1)

        for line_number, line in zip(line_numbers, lines_to_scan):
            ...

secret: PotentialSecret,
line_getter_factory: Callable[[str], 'LineGetter'] = open_file,
) -> [PotentialSecret]:
"""
We're analyzing the contents straight from the baseline, and therefore, we don't know
the secret value (by design). However, we have secret hashes, filenames, and how we detected
it was a secret in the first place, so we can reverse-engineer it. This method searchs all
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
the ocurrences of one secret in one file using one plugin.
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
"""
plugin = cast(BasePlugin, plugins.initialize.from_secret_type(secret.type))
line_getter = line_getter_factory(secret.filename)
is_first_time_opening_file = not line_getter.has_cached_lines
all_secrets = []
while True:
for line_number, line in enumerate(line_getter.lines):
identified_secrets = call_function_with_arguments(
plugin.analyze_line,
filename=secret.filename,
line=line,
line_number=line_number + 1,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if secret.line_number:
             try:
                 lines_to_scan = [line_getter.lines[secret.line_number - 1]]
                 line_numbers = [secret.line_number]

I think that this code is correct, note that in the line 98 the line_number value is increased in one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but secret.line_number is already 1-based. As such, when we obtain the target_line, we subtract it by 1 (to obtain from the line_getter.lines array).

So, if we add one to it again, I think we'll get an off-by-one error.

Copy link
Contributor Author

@pablosnt pablosnt Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I get it. I have runned the tests and I think that it's fine. Thank you!


# We enable eager search, because we *know* there's a secret here -- the baseline
# flagged it after all.
enable_eager_search=True,
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
)

for identified_secret in (identified_secrets or []):
if identified_secret == secret:
all_secrets.append(identified_secret)

if len(all_secrets) == 0 and is_first_time_opening_file and not line_getter.use_eager_transformers: # noqa: E501
line_getter.use_eager_transformers = True
else:
return all_secrets


class LineGetter:
"""
The problem we try to address with this class is to cache the lines of a transformed file,
Expand Down
83 changes: 83 additions & 0 deletions detect_secrets/audit/report.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
import hashlib
from enum import Enum
from typing import Callable

from ..constants import VerifiedResult
from .common import get_all_secrets_from_file
from .common import get_baseline_from_file
from .common import LineGetter
from .common import open_file


class SecretClassToPrint(Enum):
REAL_SECRET = 1
FALSE_POSITIVE = 2

def from_class(secret_class: VerifiedResult) -> Enum:
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
if secret_class in [VerifiedResult.UNVERIFIED, VerifiedResult.VERIFIED_TRUE]:
return SecretClassToPrint.REAL_SECRET
else:
return SecretClassToPrint.FALSE_POSITIVE


def generate_report(
baseline_file: str,
class_to_print: SecretClassToPrint = None,
line_getter_factory: Callable[[str], 'LineGetter'] = open_file,
) -> None:
secrets = {}
for filename, secret in get_baseline_from_file(baseline_file):
verified_result = get_verified_result_from_boolean(secret.is_secret)
if class_to_print is not None and SecretClassToPrint.from_class(verified_result) != class_to_print: # noqa: E501
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
continue
detections = get_all_secrets_from_file(secret)
identifier = hashlib.sha512((secret.secret_hash + filename).encode('utf-8')).hexdigest()
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
line_getter = line_getter_factory(filename)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method generate_report uses the new LineGetter class to manage all the file lines.

for detection in detections:
if identifier in secrets:
secrets[identifier]['lines'][detection.line_number] = line_getter.lines[detection.line_number - 1] # noqa: E501
if secret.type not in secrets[identifier]['types']:
secrets[identifier]['types'].append(secret.type)
secrets[identifier]['category'] = get_prioritary_verified_result(
verified_result,
VerifiedResult[secrets[identifier]['category']],
).name
else:
secrets[identifier] = {
'secrets': detection.secret_value,
'filename': filename,
'lines': {
detection.line_number: line_getter.lines[detection.line_number - 1],
},
'types': [
secret.type,
],
'category': verified_result.name,
}

output = []
for identifier in secrets:
output.append(secrets[identifier])

return output
pablosnt marked this conversation as resolved.
Show resolved Hide resolved


def get_prioritary_verified_result(
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
result1: VerifiedResult,
result2: VerifiedResult,
) -> VerifiedResult:
if result1.value > result2.value:
return result1
else:
return result2


def get_verified_result_from_boolean(
is_secret: bool,
) -> VerifiedResult:
if is_secret is None:
return VerifiedResult.UNVERIFIED
elif is_secret:
return VerifiedResult.VERIFIED_TRUE
else:
return VerifiedResult.VERIFIED_FALSE
pablosnt marked this conversation as resolved.
Show resolved Hide resolved
28 changes: 28 additions & 0 deletions detect_secrets/core/usage/audit.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ def add_audit_action(parent: argparse._SubParsersAction) -> argparse.ArgumentPar
)

_add_mode_parser(parser)
_add_report_parser(parser)
_add_statistics_module(parser)
return parser

Expand All @@ -46,6 +47,33 @@ def _add_mode_parser(parser: argparse.ArgumentParser) -> None:
)


def _add_report_parser(parser: argparse.ArgumentParser) -> None:
parser.add_argument(
'--report',
action='store_true',
help=(
'Displays a report with the secrets detected'
),
)

report_parser = parser.add_mutually_exclusive_group()
report_parser.add_argument(
'--only-real',
action='store_true',
help=(
'Only includes real secrets in the report'
),
)

report_parser.add_argument(
'--only-false',
action='store_true',
help=(
'Only includes false positives in the report'
),
)
pablosnt marked this conversation as resolved.
Show resolved Hide resolved


def _add_statistics_module(parent: argparse.ArgumentParser) -> None:
parser = parent.add_argument_group(
title='analytics',
Expand Down
13 changes: 13 additions & 0 deletions detect_secrets/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,19 @@ def handle_audit_action(args: argparse.Namespace) -> None:
print(json.dumps(stats.json(), indent=2))
else:
print(str(stats))
elif args.report:
class_to_print = None
if args.only_real:
class_to_print = audit.report.SecretClassToPrint.REAL_SECRET
elif args.only_false:
class_to_print = audit.report.SecretClassToPrint.FALSE_POSITIVE
print(
json.dumps(
audit.report.generate_report(args.filename[0], class_to_print),
indent=4,
sort_keys=True,
),
)
else:
# Starts interactive session.
if args.diff:
Expand Down
123 changes: 123 additions & 0 deletions docs/audit.md
Original file line number Diff line number Diff line change
Expand Up @@ -140,3 +140,126 @@ There are times you want to extract the raw secret values to run further analysi
so with the `--raw` flag.

TODO: Example when this feature is written up.

## Report generation
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The audit documentation has been updated to include the new reporting feature information. It also includes some examples.


Maybe, you need to generate a full report with all the detect-secrets findings. You can generate
one with the `--report` flag:

'''bash
$ detect-secrets audit --report .secret.baseline
[
{
"category": "VERIFIED_TRUE",
"filename": "test.properties",
"lines": {
"1": "secret=value",
"6": "password=value"
},
"secrets": "value",
"types": [
"Secret Keyword"
]
},
{
"category": "UNVERIFIED",
"filename": "test.properties",
"lines": {
"2": "password=changeit",
"5": "password=changeit"
},
"secrets": "changeit",
"types": [
"Secret Keyword"
]
},
{
"category": "VERIFIED_TRUE",
"filename": "test.properties",
"lines": {
"3": "password=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.",
"4": "test=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ."
},
"secrets": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.",
"types": [
"Secret Keyword",
"JSON Web Token"
]
},
{
"category": "VERIFIED_FALSE",
"filename": "test.properties",
"lines": {
"7": "password=faketest"
},
"secrets": "faketest",
"types": [
"Secret Keyword"
]
}
]
'''

You can also select only the real secrets with the option `--only-real`:

'''bash
$ detect-secrets audit --report --only-real .secret.baseline
[
{
"category": "VERIFIED_TRUE",
"filename": "test.properties",
"lines": {
"1": "secret=value",
"6": "password=value"
},
"secrets": "value",
"types": [
"Secret Keyword"
]
},
{
"category": "UNVERIFIED",
"filename": "test.properties",
"lines": {
"2": "password=changeit",
"5": "password=changeit"
},
"secrets": "changeit",
"types": [
"Secret Keyword"
]
},
{
"category": "VERIFIED_TRUE",
"filename": "test.properties",
"lines": {
"3": "password=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.",
"4": "test=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ."
},
"secrets": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.",
"types": [
"JSON Web Token",
"Secret Keyword"
]
}
]
'''

Or include only the false positives with `--only-false`:

'''bash
$ detect-secrets audit --report --only-false .secret.baseline
[
{
"category": "VERIFIED_FALSE",
"filename": "test.properties",
"lines": {
"7": "password=faketest"
},
"secrets": "faketest",
"types": [
"Secret Keyword"
]
}
]
'''
Loading