Feature/presidio-structured #1192

Merged
merged 65 commits into from
Jan 14, 2024
Changes from 45 commits
8c6be26
presidio-structured
Jakob-98 Sep 13, 2023
87e4d18
Add unit tests
Jakob-98 Oct 31, 2023
99a5b5d
Merge branch 'main' into feature/presidio-tabular
omri374 Oct 31, 2023
f9ec126
rename engine, add buildfile
Jakob-98 Nov 9, 2023
e4622fd
Update setup.py
Jakob-98 Nov 9, 2023
d4f37de
Merge branch 'main' into feature/presidio-tabular
omri374 Nov 10, 2023
1097a62
Merge branch 'main' into feature/presidio-tabular
SharonHart Nov 16, 2023
1427528
lint-build-test
Jakob-98 Nov 17, 2023
7a971ac
Merge branch 'feature/presidio-tabular' of https://github.com/Jakob-9…
Jakob-98 Nov 17, 2023
463beba
Update lint-build-test.yml
Jakob-98 Nov 17, 2023
5f36b40
Add packages to setup.py
Nov 22, 2023
6693817
Update presidio-structured to alpha version
Nov 22, 2023
25e961e
Update Presidio structured README.md
Nov 22, 2023
c356dd2
Add logging configuration to presidio-structured
Nov 22, 2023
3d9bf2f
Refactor AnalysisBuilder constructor to accept an
Nov 22, 2023
fe0750f
Fix entity mapping in JsonAnalysisBuilder
Nov 22, 2023
48a0cd6
Drop type in docstring in analysis builder classes
Nov 22, 2023
7a6ed72
Refactor TabularAnalysisBuilder to use
Nov 22, 2023
fff9a36
Update data_reader.py with type hints for file
Nov 22, 2023
0915d9f
Update data_reader.py to include additional
Nov 22, 2023
d0db1c3
Update Transformer to Processor term in
Nov 22, 2023
3931558
Add PandasDataProcessor as default to StructuredEngine
Nov 22, 2023
5977230
Move structured sample files to the docs
Nov 22, 2023
1770112
Add Presidio Structured Notebook to samples index
Nov 22, 2023
c202f0c
Remove unnecessary imports in structured sample
Nov 22, 2023
91f9f6b
Update to processors in structured __init__ files
Nov 22, 2023
d71ff88
Add explanation for structured table sample
Nov 22, 2023
15e03c3
Delete unnecessary __init__s in structured test
Nov 22, 2023
354e223
Fix bug in JsonAnalysisBuilder entity mapping
Nov 23, 2023
f637f34
Merge pull request #1 from ebotiab/feature/presidio-tabular
Jakob-98 Nov 24, 2023
db1f3d8
pr comments, nits, minor tests
Jakob-98 Nov 24, 2023
29f7f8a
README
Jakob-98 Nov 27, 2023
33182bb
Add TabularAnalysisBuilder
Jakob-98 Nov 27, 2023
43c39d8
Some basic logging
Jakob-98 Nov 27, 2023
411f1bd
linting
Jakob-98 Nov 27, 2023
e31ff12
Fix typo in logger variable name
Nov 27, 2023
bdd7e20
Refactor analysis builder to include score
Nov 27, 2023
15b756f
Linting, continued
Jakob-98 Nov 27, 2023
d4e317c
Update Pipfile
Jakob-98 Nov 27, 2023
78f0c01
Merge remote-tracking branch 'upstream/feature/presidio-tabular' into…
Nov 27, 2023
6513668
Refactor JsonAnalysisBuilder to support language
Nov 27, 2023
df2a4e0
Fix not camel case in TabularAnalysisBuilder
Nov 27, 2023
75da36a
Add score_threshold parameter to AnalysisBuilder
Nov 27, 2023
54fb99c
Refactor JSON analysis builder to gain consistency
Nov 27, 2023
7fe314a
Remove low score results in JsonAnalysisBuilder
Nov 27, 2023
c25d82f
Add tests to json analysis with score threshold
Nov 27, 2023
0f3364d
Fix bug in JSON analysis to update map with
Nov 27, 2023
0d6ebfc
Fix bug in JSON analysis to take only entity types
Nov 27, 2023
5f60ee5
Fix typos in test anl json names and assert values
Nov 27, 2023
b942513
Update build-structured.yml
Jakob-98 Nov 28, 2023
f042ffe
Create __init__.py
omri374 Nov 29, 2023
22ee87d
Type hint fix python <3.10, loggger typo
Jakob-98 Nov 29, 2023
c60e727
Merge branch 'feature/presidio-tabular' of https://github.com/Jakob-9…
Jakob-98 Nov 29, 2023
575498f
Update setup.py
Jakob-98 Nov 29, 2023
0499de0
Merge branch 'feature/presidio-tabular' into analysis-builder-improve…
Nov 30, 2023
8246d22
Merge branch 'main' into feature/presidio-tabular
omri374 Dec 3, 2023
6977c5d
Merge branch 'main' into feature/presidio-tabular
omri374 Dec 11, 2023
b522b88
Merge branch 'main' into feature/presidio-tabular
SharonHart Dec 24, 2023
38beac7
Merge pull request #3 from ebotiab/analysis-builder-improvements
Jakob-98 Jan 9, 2024
4e2bea4
PR comments variety
Jakob-98 Jan 9, 2024
0388c89
further pr comments
Jakob-98 Jan 9, 2024
cdc8923
readme, refactor score, refactor tabular analysis
Jakob-98 Jan 9, 2024
6985aa7
Update test_analysis_builder.py
Jakob-98 Jan 9, 2024
0a87783
lint
Jakob-98 Jan 10, 2024
1ed6e6c
Merge branch 'main' into feature/presidio-tabular
omri374 Jan 11, 2024
28 changes: 28 additions & 0 deletions .pipelines/templates/build-structured.yml
@@ -0,0 +1,28 @@
steps:
  - task: Bash@3
    displayName: 'Setup pipenv'
    inputs:
      targetType: 'inline'
      script: |
        set -eux  # fail on error
        python -m pip install --upgrade pip
        python -m pip install pipenv
        pipenv --python 3

  - task: Bash@3
    displayName: 'Install deps'
    inputs:
      targetType: 'inline'
      workingDirectory: 'presidio-structured'
      script: |
        set -eux  # fail on error
        export PYTHONPATH=.
        pipenv install --deploy --dev
        pipenv run pip install -e ../presidio-analyzer/.  # Use the existing analyzer and not the one in PyPI
        pipenv run pip install -e ../presidio-anonymizer/.  # Use the existing anonymizer and not the one in PyPI

  - template: ./build-python.yml
    parameters:
      SERVICE: 'Structured'
      WORKING_FOLDER: 'presidio-structured'

23 changes: 23 additions & 0 deletions .pipelines/templates/lint-build-test.yml
@@ -76,6 +76,7 @@ stages:
              versionSpec: '$(python.version)'
            displayName: 'Use Python $(python.version)'
          - template: ./build-image-redactor.yml

      - job: TestCli
        displayName: Test Cli
        pool:
@@ -97,3 +98,25 @@ stages:
              versionSpec: '$(python.version)'
            displayName: 'Use Python $(python.version)'
          - template: ./build-cli.yml

      - job: TestStructured
        displayName: Test Presidio Structured
        pool:
          vmImage: 'ubuntu-latest'
        strategy:
          matrix:
            Python38:
              python.version: '3.8'
            Python39:
              python.version: '3.9'
            Python310:
              python.version: '3.10'
            Python311:
              python.version: '3.11'

        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(python.version)'
            displayName: 'Use Python $(python.version)'
          - template: ./build-structured.yml
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,12 @@

All notable changes to this project will be documented in this file.


## [Unreleased]
### Added
#### Structured
* Added an alpha version of presidio-structured, a library that reuses logic from existing Presidio components to anonymize (semi-)structured data.

## [2.2.351] - Nov. 6th 2023
### Changed
#### Analyzer
@@ -17,6 +23,7 @@ All notable changes to this project will be documented in this file.
#### Analyzer
* Put org in ignore as it has many FPs (#1200)


## [2.2.34] - Oct. 30th 2023

### Added
@@ -66,7 +73,6 @@ All notable changes to this project will be documented in this file.
* Changed the ACR instance (#1089)
* Updated to Cred Scan V3 (#1154)


## [2.2.33] - June 1st 2023
### Added
#### Anonymizer
1 change: 1 addition & 0 deletions docs/samples/index.md
@@ -14,6 +14,7 @@
| Usage | Images | Python Notebook | [Plot custom bounding boxes](https://github.com/microsoft/presidio/blob/main/docs/samples/python/plot_custom_bboxes.ipynb)
| Usage | Text | Python Notebook | [Integrating with external services](https://github.com/microsoft/presidio/blob/main/docs/samples/python/integrating_with_external_services.ipynb) |
| Usage | Text | Python file | [Remote Recognizer](https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_remote_recognizer.py) |
| Usage | Structured | Python Notebook | [Presidio Structured Basic Usage Notebook](https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_structured.ipynb) |
| Usage | Text | Python file | [Azure AI Language as a Remote Recognizer](python/text_analytics/index.md) |
| Usage | CSV | Python file | [Analyze and Anonymize CSV file](https://github.com/microsoft/presidio/blob/main/docs/samples/python/process_csv_file.py) |
| Usage | Text | Python | [Using Flair as an external PII model](https://github.com/microsoft/presidio/blob/main/docs/samples/python/flair_recognizer.py)|
4 changes: 4 additions & 0 deletions docs/samples/python/csv_sample_data/test_structured.csv
@@ -0,0 +1,4 @@
id,name,email,street,city,state,postal_code
1,John Doe,john.doe@example.com,123 Main St,Anytown,CA,12345
2,Jane Smith,jane.smith@example.com,456 Elm St,Somewhere,TX,67890
3,Alice Johnson,alice.johnson@example.com,789 Pine St,Elsewhere,NY,11223
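
To illustrate what column-level anonymization of this sample file might look like, here is a minimal sketch using only the Python standard library. It does not use the presidio-structured API: the `COLUMN_ENTITIES` mapping and the `anonymize_rows` helper are hypothetical names introduced for illustration, and the mapping is hard-coded here, whereas the analysis builders named in the commits above would presumably derive one from the data.

```python
import csv
import io

# Sample rows copied from test_structured.csv above.
SAMPLE_CSV = """id,name,email,street,city,state,postal_code
1,John Doe,john.doe@example.com,123 Main St,Anytown,CA,12345
2,Jane Smith,jane.smith@example.com,456 Elm St,Somewhere,TX,67890
3,Alice Johnson,alice.johnson@example.com,789 Pine St,Elsewhere,NY,11223
"""

# Hypothetical column-to-entity mapping (an assumption for this sketch);
# a real run would derive it from an analysis step instead of hard-coding it.
COLUMN_ENTITIES = {"name": "PERSON", "email": "EMAIL_ADDRESS"}


def anonymize_rows(csv_text, column_entities):
    """Replace every value in a mapped column with an <ENTITY> placeholder."""
    reader = csv.DictReader(io.StringIO(csv_text))
    anonymized = []
    for row in reader:
        for column, entity in column_entities.items():
            if column in row:
                row[column] = f"<{entity}>"
        anonymized.append(row)
    return anonymized


rows = anonymize_rows(SAMPLE_CSV, COLUMN_ENTITIES)
for row in rows:
    print(row["id"], row["name"], row["email"], row["city"])
```

Unmapped columns such as `id` and `city` pass through unchanged, so the table keeps its shape while the mapped columns lose their identifying values.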