Improve url detector #1398

Merged: 5 commits into microsoft:main, Jun 7, 2024

Conversation

afogel
Contributor

@afogel afogel commented Jun 6, 2024

Change Description

Currently, the Presidio URL detector fails to fully match URLs whose gTLD is less common but still valid. For example, given https://webhook.site/a8eedfd6-9d8a-44e0-b0fc-cc7d517db5dc?q=1&b=2, the recognizer matches only https://webhook.si and misses the rest of the URL, including the query parameters.
This change extends the set of gTLDs the detector knows about so that such URLs are matched in full.

The list of gTLDs was taken from the TLDs supported by Namecheap as of today:
https://www.namecheap.com/domains/full-tld-list/
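
To illustrate the failure mode described above, here is a minimal sketch. It assumes the detector is a regex whose host part must end in one of a fixed set of TLDs. The pattern, the `build_url_pattern` helper, and the shortened TLD lists are hypothetical stand-ins for illustration, not Presidio's actual regex or list.

```python
import re

# Hypothetical, heavily shortened TLD lists for illustration only.
OLD_TLDS = ["com", "org", "net", "si"]
NEW_TLDS = OLD_TLDS + ["site"]


def build_url_pattern(tlds):
    """Compile a simplified URL regex whose host must end in one of `tlds`."""
    # Try longer TLDs first so that "site" wins over its prefix "si".
    alternation = "|".join(
        re.escape(tld) for tld in sorted(tlds, key=len, reverse=True)
    )
    return re.compile(
        r"https?://(?:[a-z0-9-]+\.)+(?:" + alternation + r")(?:[/?#]\S*)?",
        re.IGNORECASE,
    )


TEXT = "Posted to https://webhook.site/a8eedfd6-9d8a-44e0-b0fc-cc7d517db5dc?q=1&b=2 today"

old_match = build_url_pattern(OLD_TLDS).search(TEXT).group(0)
new_match = build_url_pattern(NEW_TLDS).search(TEXT).group(0)

print(old_match)  # https://webhook.si
print(new_match)  # https://webhook.site/a8eedfd6-9d8a-44e0-b0fc-cc7d517db5dc?q=1&b=2
```

In this sketch, the truncation happens because `si` is a valid TLD and a prefix of `site`: once the alternation accepts `si`, the next character is `t` rather than `/`, `?`, or `#`, so the optional path and query part matches nothing. Ordering longer TLDs first is one way to avoid the same problem when both entries are present.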

Issue reference

No related issue was opened; the problem was uncovered while using the Presidio analyzers.

Checklist

  • I have reviewed the contribution guidelines
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

@afogel
Contributor Author

afogel commented Jun 6, 2024

@microsoft-github-policy-service agree [company="Pillar Security"]

@afogel
Contributor Author

afogel commented Jun 6, 2024

@microsoft-github-policy-service agree company="Pillar Security"

@omri374
Contributor

omri374 commented Jun 6, 2024

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@omri374 omri374 left a comment

Thanks!

@omri374 omri374 merged commit a3a609b into microsoft:main Jun 7, 2024
32 checks passed