-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: make ipv4 regex faster #73
Conversation
All credits for this regex to https://stackoverflow.com/a/36760050. The regex is both more correct and faster (https://esbench.com/bench/6532596e7ff73700a4debb6a). Examples: previous regex recognized "01.01.01.01" or "1.1.000.1" as correct ipv4 regex, but those aren't valid as per https://datatracker.ietf.org/doc/html/rfc5954#section-4.1.
Hi Kurt, Func fact: Actually "01.01.01.01" is a valid ip. the 0 as prefix means octal number. But node does not support octals as IPv4 segments. :D BTT: I seems, that we only need the match and not sub groups. So you could use But I find it strange that you use |
Not according to the URI spec, which explicitly prohibits this: https://datatracker.ietf.org/doc/html/rfc3986#section-7.4
I know e.g. Windows will interpret it as octet number though, but I guess "fast-uri" is meant to parse a URI (so the mentioned spec applies). The author on SO mentions this:
... but good catch! I've updated the PR
just much shorter and faster (as it does not involve lookaheads):
Hands down, I'd also have no idea how to come up with that replacement. Not even regexp-tree's optimizer does it. I'd have to dig in deeper why this works. |
Coming from this thread: ota-meshi/eslint-plugin-regexp#659
is now the fastest (at least in my microbenchmark) and most readable one. The linked thread pretty much explains why the negative lookahead and the word-boundry 'hack' (?) are the same and why this regex is now faster while being equivalent. |
Did #74 superseded this PR? |
@kurtextrem should we close this one? |
What about this claim? So in the end is |
I updated the benchmark: https://esbench.com/bench/6532596e7ff73700a4debb6a Depending on what we want, we should pick 3 (regex that matches the spec) or keep 4 (regex that retains the current semantics of matching the spec-invalid IP) |
Seems relevant |
nice find. that one is the fastest now - the question regarding whether we want to become spec compatible or not remains (the zod one also doesn't match the ones I mentioned) |
Okay, so 3 is spec compliant and faster than current? Seems like an easy win
We can investigate colinhacks/zod#3413 later imo, and I suspect others will not agree to not being correct spec wise |
Sorry, I expressed it a bit complicated. Zod's is the fastest one and also does not match I've updated to the one used by zod now. So, if we want to match IPs from the spec, let's drop the failing test that uses a non-spec IP (https://github.com/fastify/fast-uri/blob/a85e8941a3c539dd2d16f9b59cb723a64ed663bb/test/parse.test.js#L178C27-L178C41)? |
Well, yes and no. prefixing with 0 means octal notation. So for example 09 should be invalid. Only decimal notation. |
I think it's good, we should update the tests then |
Signed-off-by: Jacob Groß <kurtextrem@gmail.com>
Can you comment this out and say it's not valid? We now parse it correctly to 10.10.0.10 but it's no longer compatible... fast-uri/test/compatibility.test.js Line 13 in b9f7726
|
done |
Please update the lint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
Thanks @kurtextrem |
Thank you for the collab! |
* Update for v5 (#80) * update * up * workflows: update and pin node LTS versions (#81) * update workflow * start from 16 * perf: make ipv4 regex faster (#73) * perf: make ipv4 regex faster All credits for this regex to https://stackoverflow.com/a/36760050. The regex is both more correct and faster (https://esbench.com/bench/6532596e7ff73700a4debb6a). Examples: previous regex recognized "01.01.01.01" or "1.1.000.1" as correct ipv4 regex, but those aren't valid as per https://datatracker.ietf.org/doc/html/rfc5954#section-4.1. * avoid capture groups * update to fastest regexp variant * comment out 10.10.000.1110 * lint --------- Signed-off-by: Jacob Groß <kurtextrem@gmail.com> Co-authored-by: Gürgün Dayıoğlu <hey@gurgun.day> --------- Signed-off-by: Jacob Groß <kurtextrem@gmail.com> Co-authored-by: Gürgün Dayıoğlu <hey@gurgun.day> Co-authored-by: Jacob Groß <kurtextrem@gmail.com>
All credits for this regex to https://stackoverflow.com/a/36760050.
The regex is both more correct and faster (https://esbench.com/bench/6532596e7ff73700a4debb6a). Examples:
previous regex recognized "01.01.01.01" or "1.1.000.1" as correct ipv4 regex, but those aren't valid as per https://datatracker.ietf.org/doc/html/rfc5954#section-4.1.
I made this PR as draft, because tests are failing due to expected normalization of the ip (https://github.com/fastify/fast-uri/blob/77b659c7853730bb07e5b888b5136c4c5a1aca86/test/parse.test.js#L178C27-L178C41), but should an IP be normalized that is not valid?
Checklist
npm run test
andnpm run benchmark
and the Code of conduct