Feature/presidio-structured #1192

Merged: 65 commits, Jan 14, 2024

Jakob-98
Collaborator

@Jakob-98 Jakob-98 commented Oct 24, 2023

Change Description

The proposed approach is to build a library (presidio-structured) that reuses logic from existing Presidio components to anonymize (semi-)structured data. A priority is to keep the user experience and interface consistent with the existing library components. This has been a much-requested feature; see for instance:
Supporting structured / semi-structured data with Presidio · microsoft/presidio · Discussion #714 (github.com)

The sample folder contains a notebook showcasing the logic to be supported in V1 of presidio-structured.
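
To illustrate the general idea of anonymizing structured data column by column, here is a minimal sketch in plain Python. It is not the presidio-structured API: `anonymize_rows`, the `column_entities` mapping, and the operator functions are hypothetical names chosen for this example, and the entity labels are only illustrative.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the presidio-structured API.
Row = Dict[str, str]
Operator = Callable[[str], str]


def anonymize_rows(
    rows: List[Row],
    column_entities: Dict[str, str],
    operators: Dict[str, Operator],
) -> List[Row]:
    """Return copies of the rows with each mapped column passed through an operator."""
    # Fallback operator used when no operator is registered for an entity type.
    default = operators.get("DEFAULT", lambda value: "<REDACTED>")
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for column, entity in column_entities.items():
            if column in new_row:
                new_row[column] = operators.get(entity, default)(new_row[column])
        result.append(new_row)
    return result


rows = [
    {"name": "Alice Smith", "email": "alice@example.com", "city": "Oslo"},
    {"name": "Bob Jones", "email": "bob@example.com", "city": "Lima"},
]
# Assumed mapping from column name to entity type.
column_entities = {"name": "PERSON", "email": "EMAIL_ADDRESS"}
# Assumed operators, keyed by entity type.
operators = {
    "PERSON": lambda value: "<PERSON>",
    "EMAIL_ADDRESS": lambda value: "***@" + value.split("@")[-1],
}
anonymized = anonymize_rows(rows, column_entities, operators)
```

The point of the sketch is the split between an analysis step (which column holds which entity type) and an anonymization step (which operator applies to which entity type), so the same operators can be reused across differently shaped data.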

Issue reference

This PR fixes issue #714

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

Commits

  • changelog
  • Static analysis
  • docstrings, types
  • preliminary tests engine
  • static analysis
  • isort
  • Minor refactorings
  • Update README.md
  • Fix late binding issues and example
  • removal of old samples
  • Refactoring, adding example
  • pre-clean-break-commit
  • broken commit, fixing TabularConfigBuilder
  • Rename TabularConfig
  • pre-breaking replace commit
  • removal of some old experimental files
  • rename tabular to structured
  • restructuring presidio tabular - pre del commit
  • Add project TODOs
  • testing dump presidio tabular
@Jakob-98 force-pushed the feature/presidio-tabular branch from ccb469a to 8c6be26 on October 26, 2023 10:54
@omri374
Contributor

omri374 commented Oct 31, 2023

Thanks @Jakob-98!
Could you please add a CI step, similar to the other python modules, so that we could run unit tests and other tests in the CI?

@Jakob-98
Collaborator Author

Jakob-98 commented Nov 1, 2023

> Thanks @Jakob-98! Could you please add a CI step, similar to the other python modules, so that we could run unit tests and other tests in the CI?

@omri374 Yes, working on it

Azure Pipelines successfully started running 1 pipeline(s).

@SharonHart
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@Jakob-98 changed the title from "[Draft] Feature/presidio-structured" to "Feature/presidio-structured" on Nov 16, 2023
@Jakob-98
Collaborator Author

Jakob-98 commented Nov 16, 2023

Opening up for review

Items left from my side:

  • logging
  • add README
  • move sample
  • make the TabularAnalysis builder an ABC; PandasAnalysisBuilder should be the concrete implementation
  • CI
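
The abstract-builder item above can be pictured with a minimal sketch. The class names, the `generate_analysis` method, and the "contains an @" detection rule are all hypothetical stand-ins for this example, not the code in this PR; a list of dicts is used in place of a pandas DataFrame to keep the sketch dependency-free.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class AnalysisBuilderSketch(ABC):
    """Hypothetical abstract builder: maps field names to entity types."""

    @abstractmethod
    def generate_analysis(self, data) -> Dict[str, str]:
        """Return a mapping of field name to detected entity type."""


class RowListAnalysisBuilderSketch(AnalysisBuilderSketch):
    """Hypothetical concrete builder for a list of dict rows."""

    def generate_analysis(self, data: List[Dict[str, str]]) -> Dict[str, str]:
        mapping: Dict[str, str] = {}
        for row in data:
            for column, value in row.items():
                # Toy detection rule (an assumption): any value containing "@"
                # marks its column as holding email addresses.
                if column not in mapping and "@" in value:
                    mapping[column] = "EMAIL_ADDRESS"
        return mapping


builder = RowListAnalysisBuilderSketch()
analysis = builder.generate_analysis(
    [{"name": "Alice", "email": "alice@example.com"}]
)
```

Keeping the base class abstract means other data formats can get their own concrete builder without changing the code that consumes the resulting mapping.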

@Jakob-98 Jakob-98 marked this pull request as ready for review November 16, 2023 08:46
@Jakob-98 Jakob-98 requested a review from a team as a code owner November 16, 2023 08:46
@omri374
Contributor

omri374 commented Nov 16, 2023

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@omri374 omri374 left a comment

Hi, I did an initial review and added some comments.

presidio-structured/README.md (resolved)
presidio-structured/__init__.py (outdated, resolved)
presidio-structured/tests/data/__init__.py (outdated, resolved)
presidio-structured/sample/example.ipynb (outdated, resolved)
presidio-structured/sample/example.ipynb (outdated, resolved)

Azure Pipelines successfully started running 1 pipeline(s).

@SharonHart
Contributor

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@SharonHart SharonHart left a comment

Left some comments, looks great!

.pipelines/templates/build-structured.yml (outdated, resolved)
presidio-structured/README.md (outdated, resolved)
@omri374
Contributor

omri374 commented Dec 24, 2023

Agree with Sharon, this looks great! Added a few comments, all minor.

@Jakob-98
Collaborator Author

Jakob-98 commented Jan 2, 2024

Happy 2024, and thanks for the review! Will address once some dev time frees up on my end :)

Contributor

@omri374 omri374 left a comment

Thanks! Amazing work :)

@omri374 omri374 merged commit 966d17a into microsoft:main Jan 14, 2024
26 checks passed
@omri374 omri374 mentioned this pull request Feb 11, 2024

4 participants