fix: ipv4 address regex #3808
The last test run failed because the changelog had not been updated. My last commit, ce5bf74, bumps the changelog. Let me know if this needs to be reverted.
@cragwolfe Is there anything else you need from me? (Pinging you since you seem to be an active contributor to this project. Apologies if I'm mistaken.)
Looks good; however, please update the version to `0.16.11-dev0`.
Thanks for the contribution, @praktiskt!
90d388c to c1d47f8
@cragwolfe Done! Thanks.
I noticed the ipv4 regex is wrong: it only captures one- or two-digit octets, e.g. `n.nn.n.nn`. Here's a correction and a bumped test for it.

If you wish, I can break out the ipv4 test into its own case so we don't interfere with the existing `EMAIL_META_DATA_INPUT` ipv6 extraction test.

Side note: The comment at `unstructured/nlp/patterns.py#95` includes a bad ipv4 address example (the last octet is wrongly left-padded with a zero). I left it as is because I'm not sure whether the intention is to include "non-conventional" ipv4 addresses, such as those with octal or hexadecimal octets.
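
For illustration, here is a minimal sketch of the kind of bug described above. The names `NARROW_IPV4`, `STRICT_IPV4`, `OCTET`, and `find_ipv4` are hypothetical and are not the patterns in `unstructured/nlp/patterns.py` or the ones changed in this PR. The strict variant also assumes that left-padded octets should be rejected, which is exactly the open question raised in the side note.

```python
import re

# Hypothetical "before" pattern: every octet is limited to one or two digits,
# so an address such as 192.168.1.10 is never matched.
NARROW_IPV4 = re.compile(r"\b(?:\d{1,2}\.){3}\d{1,2}\b")

# Hypothetical "after" pattern: every octet is a decimal value from 0 to 255.
# Assumption: leading zeros (e.g. "01") are treated as invalid.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
STRICT_IPV4 = re.compile(rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?![\d.])")


def find_ipv4(text, pattern=STRICT_IPV4):
    """Return every substring of `text` that `pattern` matches."""
    return pattern.findall(text)


sample = "Received: from 192.168.1.10 by 10.0.0.5"
narrow_hits = find_ipv4(sample, NARROW_IPV4)  # misses the three-digit octets
strict_hits = find_ipv4(sample)               # finds both addresses
```

If the intention is to accept octal or hexadecimal octets as well, the `OCTET` alternation would need to be loosened accordingly.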