fix: ipv4 address regex #3808

Merged
merged 7 commits into from
Dec 9, 2024

Conversation

praktiskt
Contributor

I noticed the ipv4 regex is wrong (it only captures one- or two-digit octets, e.g. n.nn.n.nn). Here's a correction and a bumped test for it.
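
To illustrate the kind of bug described above, here is a minimal sketch. Both patterns are hypothetical stand-ins written for this example, not the actual expressions in unstructured/nlp/patterns.py, and the sample header text is made up. The point is only that limiting each octet to two digits silently drops addresses such as 192.168.1.100.

```python
import re

# Hypothetical stand-in for the faulty pattern: each octet allows only 1-2 digits.
NARROW_IPV4 = re.compile(r"\b\d{1,2}(?:\.\d{1,2}){3}\b")

# Hypothetical stand-in for a corrected pattern: each octet allows 1-3 digits.
WIDE_IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

# Made-up sample text containing one short and one three-digit-octet address.
text = "Received: from host (10.1.2.34) by relay (192.168.1.100)"

narrow_matches = NARROW_IPV4.findall(text)  # misses the second address
wide_matches = WIDE_IPV4.findall(text)      # finds both addresses

print(narrow_matches)
print(wide_matches)
```

Note that the wider pattern still accepts out-of-range octets such as 999; whether that matters depends on how strict the extraction is meant to be.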

If you wish, I can break out the ipv4 test into its own case, so we don't interfere with the existing EMAIL_META_DATA_INPUT ipv6 extraction test.

Side note: The comment at unstructured/nlp/patterns.py#95 includes a malformed ipv4 address example (the last octet is incorrectly left-padded with a zero). I left it as is because I'm not sure whether the intention is to include "non-conventional" ipv4 addresses, such as those with octal or hexadecimal octets.
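
For reference, if only conventional dotted-decimal addresses were wanted, a stricter check could look like the sketch below. This is an assumption about one possible design, not what the library does: `OCTET`, `STRICT_IPV4`, and `is_conventional_ipv4` are hypothetical names, and the sample addresses are made up rather than taken from the comment in patterns.py.

```python
import re

# Hypothetical octet pattern: 0-255 with no left-padding zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Four such octets separated by dots.
STRICT_IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_conventional_ipv4(candidate: str) -> bool:
    """Return True only if the whole string is a dotted-decimal ipv4 address."""
    return STRICT_IPV4.fullmatch(candidate) is not None


print(is_conventional_ipv4("192.168.1.100"))  # in-range, no padding
print(is_conventional_ipv4("10.0.0.01"))      # last octet is left-padded
print(is_conventional_ipv4("256.1.1.1"))      # first octet out of range
```

A pattern like this would reject zero-padded octets, so it is only appropriate if "non-conventional" forms are not meant to be extracted.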

@praktiskt
Contributor Author

The last test run failed because the changelog had not been updated. My latest commit, ce5bf74, bumps the changelog. Let me know if this needs to be reverted.

@praktiskt
Contributor Author

@cragwolfe Is there anything else you need from me? (Pinging you since you seem to be an active contributor to this project - apologies if I'm mistaken.)

Contributor

@cragwolfe cragwolfe left a comment

Looks good; however, please update the version to 0.16.11-dev0.
Thanks for the contribution, @praktiskt!

@praktiskt praktiskt force-pushed the praktiskt/fix-ipv4-regex branch from 90d388c to c1d47f8 Compare December 8, 2024 18:24
@praktiskt
Contributor Author

@cragwolfe Done! Thanks.

@cragwolfe cragwolfe merged commit 1e2da6d into Unstructured-IO:main Dec 9, 2024
41 checks passed
@praktiskt praktiskt deleted the praktiskt/fix-ipv4-regex branch December 11, 2024 13:09