Bench/typed: calling csv-simple-parser in a more optimized way #17
Conversation
imo, for typed parsing "0" is definitely 0 and not "0", aside from inference mistakes where the rest of the column contains non-numbers, like "John", so a "0" would be a string in that column. i'll re-bench soon, in Bun this time :) i'm waiting for a few memory improvements to land there first.
That makes a lot of sense, but I'm not sure the library should check that kind of thing 🤔 Either way, I've updated the custom infer functions for the benchmark to match the desired behavior.
😱 curious to see the changes there. IIRC
i've been locally benchmarking in Bun for quite a while, a 60% improvement is typical for the faster libs, especially those that minimize memory allocation and string concatenation.
Running my little benchmark for my library in Bun apparently makes it ~25% faster lol, and some tests are closer to ~50% faster. It'd be interesting to see the result in your benchmark.
i tried with the 100k customers csv [1] and saw about a 25% improvement between v1 and v2 + the new approach. interestingly, not much difference between Bun and Node here except better memory usage in Bun, which is almost universally true. i've found that the perf stats are quite different for small files (< 1MB) and large files (5MB+). the next benchmark won't bother with files < 5MB. [1] https://github.com/datablist/sample-csv-files#customers-csv-sample
Oh I had compared Node v18 against Bun latest, maybe Node latest is about on par with Bun here. |
yeah i usually update everything. so this is Node v23.7.0 vs Bun canary (
I think `csv-simple-parser` was being called inefficiently, effectively executing two transform functions on each value when we could execute just one. I've changed how this is implemented in v2 of the library to incentivize always calling the library in the more optimized way.
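The inefficiency described above can be sketched roughly like this. This is illustrative only, not `csv-simple-parser`'s actual API; the point is just that chaining two per-cell transforms visits every value twice, while a single composed transform visits it once:

```typescript
// Illustrative sketch (not csv-simple-parser's real API): two chained
// per-cell transform functions vs. one composed function.
const trim = (v: string): string => v.trim();
const infer = (v: string): string | number => {
  const n = Number(v);
  return v !== "" && !Number.isNaN(n) ? n : v;
};

// Less optimized: two passes over every parsed cell.
const twoPasses = (row: string[]) => row.map(trim).map((v) => infer(v));

// More optimized: one pass with a composed transform.
const onePass = (row: string[]) => row.map((v) => infer(trim(v)));
```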
Also, I've reworked the custom infer function a bit to make use of the `isExplicitlyQuoted` argument. That changes its behavior slightly though: it would break things if we don't actually want `"0"` to be interpreted as the string `"0"` but rather as the number `0`. If we do want it interpreted as `0`, the check for `isExplicitlyQuoted` should basically be deleted.
Lastly, I've changed the built-in infer function, which is now slightly slower, though this benchmark doesn't seem to use it at all anymore, so that should be irrelevant here.
Presumably this change should make `csv-simple-parser` perform a little better in this benchmark, but I haven't rerun the benchmark myself. Fingers crossed, super curious to see the result 🤞