feat: Ignored file extensions can now be configured in the PII scanner (#559)

secureli-558

I'm working on a Go project and needed to add file extensions to the PII
scanner's ignore list, because the default set doesn't cover them. In the
project, go.mod and go.sum contained entries that looked like phone
numbers. There was no way to configure the PII scanner to ignore these
files, even though their contents would realistically never be PII.

Rather than simply adding them to the growing set of excluded extensions,
I made the set configurable.

Because the default set covers language-agnostic files, I also chose NOT
to add the Go-specific files to it.

A good follow-up would be to define, and selectively activate, additional
extensions to ignore based on the languages configured in the repo.

## Changes
* `pii_scanner` is added to the `.secureli.yaml` file structure
* Within it, `ignored_extensions` is a list of extensions to add to the
default set
* The PII scanner now ignores files whose extension is in its default
excluded set, as well as any extension listed in the `pii_scanner`
config.
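A minimal sketch of the intended semantics, assuming the parsed `.secureli.yaml` arrives as a plain dict. `DEFAULT_IGNORED_EXTENSIONS` is a hypothetical stand-in for `IGNORED_EXTENSIONS` (only `.jpeg` is confirmed by the tests in this commit), and `effective_ignored_extensions` / `is_ignored` are illustrative helpers, not seCureLI functions. Because the scanner matches on `os.path.splitext` output, entries need their leading dot (`.mod`, not `go.mod`).

```python
import os

# Hypothetical stand-in for secureli.modules.shared.consts.pii.IGNORED_EXTENSIONS;
# only ".jpeg" is confirmed by the tests in this commit.
DEFAULT_IGNORED_EXTENSIONS = [".jpeg", ".png", ".pdf"]


def effective_ignored_extensions(config: dict) -> set[str]:
    """Union the defaults with anything under pii_scanner.ignored_extensions."""
    # A missing (or empty) section or key contributes nothing beyond the defaults.
    section = config.get("pii_scanner") or {}
    extra = section.get("ignored_extensions") or []
    return set(DEFAULT_IGNORED_EXTENSIONS) | set(extra)


def is_ignored(filename: str, ignored: set[str]) -> bool:
    # Match on the final extension only, as os.path.splitext reports it (e.g. ".mod").
    _, ext = os.path.splitext(filename)
    return ext in ignored


# Dict equivalent of a .secureli.yaml containing:
#   pii_scanner:
#     ignored_extensions: [".mod", ".sum"]
config = {"pii_scanner": {"ignored_extensions": [".mod", ".sum"]}}
ignored = effective_ignored_extensions(config)

print(is_ignored("go.mod", ignored))      # True
print(is_ignored("photo.jpeg", ignored))  # True
print(is_ignored("main.go", ignored))     # False
```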

## Testing
I tested with a repo containing only Go module files (specifically go.mod
and go.sum), which the PII scanner had been flagging as containing phone
numbers.

## Clean Code Checklist
- [ ] Meets acceptance criteria for issue (n/a)
- [ ] New logic is covered with automated tests
- [x] Appropriate exception handling added
- [x] Thoughtful logging included
- [x] Documentation is updated
- [ ] Follow-up work is documented in TODOs
- [ ] TODOs have a ticket associated with them
- [x] No commented-out code included


tristanl-slalom authored Jun 4, 2024
1 parent b30cfb0 commit 482cd57
Showing 6 changed files with 51 additions and 4 deletions.
`README.md`: 9 changes (8 additions & 1 deletion)

```diff
@@ -133,10 +133,11 @@ seCureLI is configurable via a .secureli.yaml file present in the root of your l
 ### top level
 | Key | Description |
-| ------------------ | ------------------------------------------------------------------------------------------------------------------ |
+|--------------------|--------------------------------------------------------------------------------------------------------------------|
 | `repo_files` | Affects how seCureLI will interpret the repository, both for language analysis and as it executes various linters. |
 | `echo` | Adjusts how seCureLI will print information to the user. |
 | `language_support` | Affects seCureLI's language analysis and support phase. |
+| `pii_scanner` | Includes options for seCureLI's PII scanner |
 | `telemetry` | Includes options for seCureLI telemetry/api logging |
 ### repo_files
@@ -153,6 +154,12 @@ seCureLI is configurable via a .secureli.yaml file present in the root of your l
 | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `level` | The log level to display to the user. Defaults to ERROR, which includes `error` and `print` messages, without including warnings or info messages. |
+### pii_scanner
+| Key | Description |
+|----------------------|----------------------------------------------------------------|
+| `ignored_extensions` | The extensions of files to ignore in addition to the defaults. |
 ### telemetry
 | Key | Description |
```
`secureli/container.py`: 1 change (1 addition & 0 deletions)

```diff
@@ -141,6 +141,7 @@ class Container(containers.DeclarativeContainer):
         PiiScannerService,
         repo_files=repo_files_repository,
         echo=echo,
+        ignored_extensions=config.pii_scanner.ignored_extensions,
     )
 
     updater_service = providers.Factory(
```
`secureli/modules/pii_scanner/pii_scanner.py`: 7 changes (6 additions & 1 deletion)

```diff
@@ -35,9 +35,14 @@ def __init__(
         self,
         repo_files: RepoFilesRepository,
         echo: EchoAbstraction,
+        ignored_extensions: list[str],
     ):
         self.repo_files = repo_files
         self.echo = echo
+        self.ignored_extensions = ignored_extensions
+        if ignored_extensions != IGNORED_EXTENSIONS:
+            # Make sure the original ignored extensions are always present
+            self.ignored_extensions = list(set(ignored_extensions + IGNORED_EXTENSIONS))
 
     def scan_repo(
         self,
@@ -96,7 +101,7 @@ def scan_repo(
 
     def _file_extension_excluded(self, filename) -> bool:
         _, file_extension = os.path.splitext(filename)
-        if file_extension in IGNORED_EXTENSIONS:
+        if file_extension in self.ignored_extensions:
             return True
 
         return False
```
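Because `_file_extension_excluded` compares against the output of `os.path.splitext`, a configured entry only takes effect when it equals that output exactly, leading dot included. `splitext` only ever returns an empty string or a suffix starting with `.`, so an entry written as `go.mod` (as in the test fixture) would be stored in `ignored_extensions` but would never match a file. The sketch below prints what `splitext` returns for a few names, and shows `merge_extensions`, a hypothetical order-preserving alternative to `list(set(...))` built on `dict.fromkeys`, in case a stable ordering matters for logging or tests.

```python
import os

# splitext returns only the final suffix, with its leading dot.
# Leading dots of a dotfile count as part of the root, so ".env" has no extension.
for name in ["go.mod", "go.sum", "archive.tar.gz", ".env", "Makefile"]:
    print(name, "->", repr(os.path.splitext(name)[1]))
# go.mod -> '.mod'
# go.sum -> '.sum'
# archive.tar.gz -> '.gz'
# .env -> ''
# Makefile -> ''


def merge_extensions(configured: list[str], defaults: list[str]) -> list[str]:
    """Hypothetical helper: union of both lists, duplicates dropped, first-seen order kept."""
    # dict keys keep insertion order (Python 3.7+), unlike a set.
    return list(dict.fromkeys(configured + defaults))


print(merge_extensions([".mod", ".jpeg"], [".jpeg", ".png"]))
# ['.mod', '.jpeg', '.png']
```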
`secureli/modules/shared/models/repository.py`: 12 changes (11 additions & 1 deletion)

```diff
@@ -1,11 +1,20 @@
 from enum import Enum
 from typing import Optional
 from pydantic import BaseModel, BaseSettings, Field
 
+from secureli.modules.shared.consts.pii import IGNORED_EXTENSIONS
 from secureli.modules.shared.consts.repository import default_ignored_extensions
 from secureli.modules.shared.models.echo import Level
 from secureli.modules.shared.models.language import LanguageSupportSettings
 
 
+class PiiScannerSettings(BaseSettings):
+    """
+    Settings that adjust how seCureLI evaluates the PII of the consuming repository.
+    """
+
+    ignored_extensions: list[str] = Field(default=IGNORED_EXTENSIONS)
+
+
 class RepoFilesSettings(BaseSettings):
     """
     Settings that adjust how seCureLI evaluates the consuming repository.
@@ -76,3 +85,4 @@ class SecureliFile(BaseModel):
     echo: Optional[EchoSettings] = None
     language_support: Optional[LanguageSupportSettings] = Field(default=None)
     telemetry: Optional[TelemetrySettings] = None
+    pii_scanner: Optional[PiiScannerSettings] = None
```
`secureli/settings.py`: 1 change (1 addition & 0 deletions)

```diff
@@ -39,6 +39,7 @@ class Settings(pydantic.BaseSettings):
     echo: repo_settings.EchoSettings = repo_settings.EchoSettings()
     language_support: LanguageSupportSettings = LanguageSupportSettings()
     telemetry: repo_settings.TelemetrySettings = repo_settings.TelemetrySettings()
+    pii_scanner: repo_settings.PiiScannerSettings = repo_settings.PiiScannerSettings()
 
     class Config:
         env_file_encoding = "utf-8"
```
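The `pii_scanner` field added to `Settings` has a default `PiiScannerSettings()` instance, which appears to be what lets the container read `config.pii_scanner.ignored_extensions` even when `.secureli.yaml` has no `pii_scanner` section. A stdlib analog of that pattern, assuming plain dataclasses instead of pydantic; the class names mirror the diff, but `load_settings` and `DEFAULT_IGNORED_EXTENSIONS` are hypothetical. Plain dataclasses need `default_factory` for the list and nested-instance defaults.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for IGNORED_EXTENSIONS.
DEFAULT_IGNORED_EXTENSIONS = [".jpeg", ".png", ".pdf"]


@dataclass
class PiiScannerSettings:
    # default_factory gives each instance its own copy of the list;
    # a bare list default raises ValueError in a dataclass.
    ignored_extensions: list[str] = field(
        default_factory=lambda: list(DEFAULT_IGNORED_EXTENSIONS)
    )


@dataclass
class Settings:
    # A default instance means settings.pii_scanner is never None.
    pii_scanner: PiiScannerSettings = field(default_factory=PiiScannerSettings)


def load_settings(raw: dict) -> Settings:
    """Build Settings from a parsed YAML-like dict, falling back to defaults."""
    section = raw.get("pii_scanner")
    if section is None:
        return Settings()
    return Settings(pii_scanner=PiiScannerSettings(**section))


print(load_settings({}).pii_scanner.ignored_extensions)
# ['.jpeg', '.png', '.pdf']
print(load_settings({"pii_scanner": {"ignored_extensions": [".mod"]}}).pii_scanner.ignored_extensions)
# ['.mod']
```

Note that in this sketch a user-supplied list replaces the default rather than extending it; if the pydantic model behaves the same way, that would explain why `PiiScannerService.__init__` merges `IGNORED_EXTENSIONS` back in.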
`tests/modules/pii_scanner/test_pii_scanner_service.py`: 25 changes (24 additions & 1 deletion)

```diff
@@ -5,6 +5,7 @@
 import contextlib, io
 from pathlib import Path
 from secureli.modules.pii_scanner.pii_scanner import PiiScannerService
+from secureli.modules.shared.consts.pii import IGNORED_EXTENSIONS
 from secureli.modules.shared.models.scan import ScanMode
 
 
@@ -51,7 +52,18 @@ def mock_re(mocker: MockerFixture) -> MagicMock:
 def pii_scanner_service(
     mock_repo_files_repository: MagicMock, mock_echo: MagicMock
 ) -> PiiScannerService:
-    return PiiScannerService(mock_repo_files_repository, mock_echo)
+    return PiiScannerService(
+        mock_repo_files_repository, mock_echo, ignored_extensions=IGNORED_EXTENSIONS
+    )
+
+
+@pytest.fixture()
+def pii_scanner_service_alternate_extensions(
+    mock_repo_files_repository: MagicMock, mock_echo: MagicMock
+) -> PiiScannerService:
+    return PiiScannerService(
+        mock_repo_files_repository, mock_echo, ignored_extensions=["go.mod", "go.sum"]
+    )
 
 
 def test_that_pii_scanner_service_finds_potential_pii(
@@ -127,3 +139,14 @@ def test_that_pii_scanner_prints_when_exceptions_encountered(
 
     mock_echo.print.assert_called_once()
     assert "Error PII scanning" in mock_echo.print.call_args.args[0]
+
+
+def test_that_pii_scanner_accepts_alternate_ignored_extensions(
+    pii_scanner_service_alternate_extensions: PiiScannerService,
+    mock_open_fn: MagicMock,
+    mock_echo: MagicMock,
+):
+    # Assert that both the custom extensions and the standard defaults
+    # are present in the initialized ignored extensions.
+    assert "go.mod" in pii_scanner_service_alternate_extensions.ignored_extensions
+    assert ".jpeg" in pii_scanner_service_alternate_extensions.ignored_extensions
```