Loading analyzer engine & recognizer registry from configuration file #1367

Merged: 17 commits merged into main from feature/engines_from_conf on May 1, 2024

roeybc (Collaborator) commented Apr 24, 2024

Change Description

This PR adds file-based configuration for presidio-analyzer. It allows users to:

  1. Configure their app (either Python or REST) using a set of three YAML files, one for each main component:
    • Analyzer
    • Recognizer Registry
    • NLP Engine
  2. Pass those files into the Docker image so that they are read when the REST version of Presidio (served with Flask) is loaded
  3. Easily define which recognizers to use, which languages each should support, and specific configuration for each (for example, a list of context words per language)
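
To make point 3 concrete, here is a minimal sketch of how a recognizer registry configuration with per-language context words might be expanded into one recognizer definition per language. It is an illustration only: the dictionary stands in for what a YAML loader would return, and the key names, the `RecognizerSpec` class, and the `expand_recognizers` function are hypothetical, not the schema or API introduced by this PR.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a parsed recognizer-registry YAML file.
# The key names are assumptions made for this sketch only.
REGISTRY_CONF = {
    "supported_languages": ["en", "es"],
    "recognizers": [
        {
            "name": "CreditCardRecognizer",
            "supported_languages": [
                {"language": "en", "context": ["credit", "card", "visa"]},
                {"language": "es", "context": ["tarjeta", "credito"]},
            ],
        },
        {
            "name": "UsSsnRecognizer",
            "supported_languages": [
                {"language": "en", "context": ["ssn"]},
                # "de" is not enabled at the registry level, so it is skipped.
                {"language": "de", "context": []},
            ],
        },
    ],
}


@dataclass
class RecognizerSpec:
    """One recognizer instance bound to a single language (hypothetical)."""

    name: str
    language: str
    context: list = field(default_factory=list)


def expand_recognizers(conf):
    """Return one RecognizerSpec per (recognizer, enabled language) pair."""
    enabled = set(conf["supported_languages"])
    specs = []
    for entry in conf["recognizers"]:
        for lang_conf in entry["supported_languages"]:
            if lang_conf["language"] not in enabled:
                continue  # language not enabled for the whole registry
            specs.append(
                RecognizerSpec(
                    name=entry["name"],
                    language=lang_conf["language"],
                    context=list(lang_conf.get("context", [])),
                )
            )
    return specs


specs = expand_recognizers(REGISTRY_CONF)
for spec in specs:
    print(spec.name, spec.language, spec.context)
```

In this sketch each language gets its own recognizer entry with its own context words, which is one plausible way to read "specific configuration for each"; the actual loader in the PR may structure this differently.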

roeybc and others added 16 commits April 21, 2024 17:50
* initial version of loader

* addressed comments

* linting fixes

* re structured recognizers in yaml

* addressed comments and fixed predefined recognizers loading

* added engine provider to analyzer init

* moved logic to recognizer registry provider

* some name fixes to recognizer provider

* added language support to recognizer registry

* fixed interface issues, added unit tests for providers

* fixed tests, addressed comments

* added yaml configuration to package, fixed linting rules

* move all conf file to a single location

* remove file from previous location

* merged from main, added default conf file for engine provider

* addressed some comments

* setup fixups

* remove redundant line

* fix long line

* fixing linting errors

* Update presidio-analyzer/presidio_analyzer/analyzer_engine_provider.py

Co-authored-by: Sharon Hart <sharonh.dev@gmail.com>

* updates to the existing logic for loading engines through configuration

* updates to Dockerfile

---------

Co-authored-by: roeybc <robencha@microsoft.com>
Co-authored-by: Sharon Hart <sharonh.dev@gmail.com>
…feature/engines_from_conf

# Conflicts:
#	presidio-analyzer/setup.py
@omri374

omri374 (Contributor) commented Apr 25, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

omri374 previously approved these changes Apr 28, 2024
@omri374 omri374 requested a review from SharonHart April 28, 2024 21:01

SharonHart (Contributor) left a comment

Amazing outcome :)

@omri374 omri374 merged commit 2805c86 into main May 1, 2024
31 checks passed
@omri374 omri374 deleted the feature/engines_from_conf branch May 1, 2024 12:01
3 participants