Bench/typed: calling csv-simple-parser in a more optimized way #17

fabiospampinato · 2025-01-27T11:59:39Z

I think csv-simple-parser was being called inefficiently, effectively executing 2 transform functions on the value, when we could just execute one.

I've changed how this is implemented in v2 of the library to incentivize always using the library in the more optimized way.

Also I've reworked the custom infer function a bit to make use of the "isExplicitlyQuoted" argument. That changes how it works slightly though, it would break things if we don't actually want "0" to be interpreted as "0" but rather as 0. If we want that to be interpreted as 0 the check for isExplicitlyQuoted should be deleted basically.

Lastly I've changed the built-in infer function, which now is slightly slower, though this benchmark doesn't seem to use that at all now so that should be irrelevant for it.

Presumably this change should make csv-simple-parser perform a little bitter in this benchmark, but I haven't rerun the benchmark myself, fingers crossed, super curious to see the result 🤞

leeoniya · 2025-01-28T19:06:18Z

That changes how it works slightly though, it would break things if we don't actually want "0" to be interpreted as "0" but rather as 0. If we want that to be interpreted as 0 the check for isExplicitlyQuoted should be deleted basically.

imo, for typed parsing "0" is definitely 0 and not "0", aside from inference mistakes where the rest of the column contains non-numbers, like "John". so a "0" would be a string in the same column.

i'll re-bench soon, in Bun this time :) i'm waiting for a few memory improvements to land there first.

oven-sh/bun#15557

fabiospampinato · 2025-01-28T21:50:45Z

imo, for typed parsing "0" is definitely 0 and not "0", aside from inference mistakes where the rest of the column contains non-numbers, like "John". so a "0" would be a string in the same column.

That makes a lot of sense, but I'm not sure the library should check that kind of thing 🤔 Either way, I've updated the custom infer functions for the benchmark to match the desired behavior.

i'll re-bench soon, in Bun this time :) i'm waiting for a few memory improvements to land there first.

😱 curious to see the changes there. IIRC String.prototype.indexOf is not as fast in JSC, so maybe it would benchmark worse even after this little improvement.

leeoniya · 2025-01-29T04:58:11Z

i've been locally benchmarking in Bun for quite a while, 60% improvement is typical for the faster libs, especially those that minimize mem allocation and string concat.

fabiospampinato · 2025-01-29T20:36:38Z

Running my little benchmark for my library in Bun apparently makes it ~25% faster lol some tests are closer to ~50% faster. It'd be interesting to see the result in your benchmark.

leeoniya · 2025-02-05T14:41:32Z

i tried with the 100k customers csv [1] and saw about 25% improvement between v1 and v2 + new approach.

interestingly, not much difference between Bun and Node here except better mem usage in Bun, which is almost universally true.

i've found that the perf stats are quite different for small files (< 1MB) and large files (5MB+). the next benchmark won't bother with files < 5MB.

[1] https://github.com/datablist/sample-csv-files#customers-csv-sample

fabiospampinato · 2025-02-05T14:45:49Z

Oh I had compared Node v18 against Bun latest, maybe Node latest is about on par with Bun here.

leeoniya · 2025-02-05T14:46:55Z

yeah i usually update everything. so this is Node v23.7.0 vs Bun canary (bun upgrade --canary)

fabiospampinato added 2 commits January 27, 2025 11:56

Bench/typed: calling csv-simple-parser in a more optimized way

ffd456e

Upgraded csv-simple-parser to v2

22cd6b0

fabiospampinato added 3 commits January 28, 2025 21:38

Updated infer functions

3e7c05a

Updated infer functions

82613b3

Updated infer functions

21b0db2

leeoniya closed this in 19a7de4 Feb 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bench/typed: calling csv-simple-parser in a more optimized way #17

Bench/typed: calling csv-simple-parser in a more optimized way #17

fabiospampinato commented Jan 27, 2025 •

edited

Loading

leeoniya commented Jan 28, 2025 •

edited

Loading

fabiospampinato commented Jan 28, 2025

leeoniya commented Jan 29, 2025 •

edited

Loading

fabiospampinato commented Jan 29, 2025

leeoniya commented Feb 5, 2025 •

edited

Loading

fabiospampinato commented Feb 5, 2025

leeoniya commented Feb 5, 2025 •

edited

Loading

Bench/typed: calling csv-simple-parser in a more optimized way #17

Bench/typed: calling csv-simple-parser in a more optimized way #17

Conversation

fabiospampinato commented Jan 27, 2025 • edited Loading

leeoniya commented Jan 28, 2025 • edited Loading

fabiospampinato commented Jan 28, 2025

leeoniya commented Jan 29, 2025 • edited Loading

fabiospampinato commented Jan 29, 2025

leeoniya commented Feb 5, 2025 • edited Loading

fabiospampinato commented Feb 5, 2025

leeoniya commented Feb 5, 2025 • edited Loading

fabiospampinato commented Jan 27, 2025 •

edited

Loading

leeoniya commented Jan 28, 2025 •

edited

Loading

leeoniya commented Jan 29, 2025 •

edited

Loading

leeoniya commented Feb 5, 2025 •

edited

Loading

leeoniya commented Feb 5, 2025 •

edited

Loading