Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bench/typed: calling csv-simple-parser in a more optimized way #17

Closed

Conversation

fabiospampinato
Copy link

@fabiospampinato fabiospampinato commented Jan 27, 2025

I think csv-simple-parser was being called inefficiently, effectively executing 2 transform functions on the value, when we could just execute one.

I've changed how this is implemented in v2 of the library to incentivize always using the library in the more optimized way.

Also I've reworked the custom infer function a bit to make use of the "isExplicitlyQuoted" argument. That changes how it works slightly though, it would break things if we don't actually want "0" to be interpreted as "0" but rather as 0. If we want that to be interpreted as 0 the check for isExplicitlyQuoted should be deleted basically.

Lastly I've changed the built-in infer function, which now is slightly slower, though this benchmark doesn't seem to use that at all now so that should be irrelevant for it.

Presumably this change should make csv-simple-parser perform a little bitter in this benchmark, but I haven't rerun the benchmark myself, fingers crossed, super curious to see the result 🤞

@leeoniya
Copy link
Owner

leeoniya commented Jan 28, 2025

That changes how it works slightly though, it would break things if we don't actually want "0" to be interpreted as "0" but rather as 0. If we want that to be interpreted as 0 the check for isExplicitlyQuoted should be deleted basically.

imo, for typed parsing "0" is definitely 0 and not "0", aside from inference mistakes where the rest of the column contains non-numbers, like "John". so a "0" would be a string in the same column.

i'll re-bench soon, in Bun this time :) i'm waiting for a few memory improvements to land there first.

oven-sh/bun#15557

@fabiospampinato
Copy link
Author

imo, for typed parsing "0" is definitely 0 and not "0", aside from inference mistakes where the rest of the column contains non-numbers, like "John". so a "0" would be a string in the same column.

That makes a lot of sense, but I'm not sure the library should check that kind of thing 🤔 Either way, I've updated the custom infer functions for the benchmark to match the desired behavior.

i'll re-bench soon, in Bun this time :) i'm waiting for a few memory improvements to land there first.

😱 curious to see the changes there. IIRC String.prototype.indexOf is not as fast in JSC, so maybe it would benchmark worse even after this little improvement.

@leeoniya
Copy link
Owner

leeoniya commented Jan 29, 2025

i've been locally benchmarking in Bun for quite a while, 60% improvement is typical for the faster libs, especially those that minimize mem allocation and string concat.

@fabiospampinato
Copy link
Author

Running my little benchmark for my library in Bun apparently makes it ~25% faster lol some tests are closer to ~50% faster. It'd be interesting to see the result in your benchmark.

@leeoniya leeoniya closed this in 19a7de4 Feb 5, 2025
@leeoniya
Copy link
Owner

leeoniya commented Feb 5, 2025

i tried with the 100k customers csv [1] and saw about 25% improvement between v1 and v2 + new approach.

interestingly, not much difference between Bun and Node here except better mem usage in Bun, which is almost universally true.

i've found that the perf stats are quite different for small files (< 1MB) and large files (5MB+). the next benchmark won't bother with files < 5MB.

[1] https://github.com/datablist/sample-csv-files#customers-csv-sample

@fabiospampinato
Copy link
Author

Oh I had compared Node v18 against Bun latest, maybe Node latest is about on par with Bun here.

@leeoniya
Copy link
Owner

leeoniya commented Feb 5, 2025

yeah i usually update everything. so this is Node v23.7.0 vs Bun canary (bun upgrade --canary)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants