Ignore preprocessData option #359

Closed
randdusing opened this issue Feb 22, 2019 · 6 comments

@randdusing

Does preprocessData() do anything if opts.unwind and opts.flatten are not defined? As far as I can tell, it creates a single-item array for each row and then concatenates them back together. This is very inefficient in the browser when working with 100k+ rows.
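
Roughly, what I think is happening internally is something like this (just a sketch of my reading of the code, not the actual library source):

// Sketch: with no unwind/flatten, each row is wrapped in a single-item
// array and the arrays are concatenated back into one flat array,
// which is a lot of extra allocation for large datasets.
const preprocessDataSketch = (data: Array<any>): Array<any> =>
  data.reduce(
    (processed: Array<any>, row: any) => processed.concat([row]),
    [] as Array<any>
  );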

I read that streams are more efficient, but I didn't see any documentation about using them in the browser. I'm not sure streams even make sense in the browser, since everything is in memory anyway.

I was able to "fix" the issue by overriding the preprocessData function in a very hacky way; I was hoping you'd provide an option to skip preprocessing:

// Cast to any so the internal preprocessData method can be overridden.
const parser = new Parser({ fields }) as any;
// Skip preprocessing entirely and feed the rows straight to the parser.
parser.preprocessData = (x: Array<any>) => x;
const csv = parser.parse(data);

Exporting ~90 fields and ~90k rows went from taking 45+ seconds (or freezing) to a more reasonable 9 seconds.

I'm using v4.3.3 in Chromium v70. Let me know if you need any other data.

@juanjoDiaz
Collaborator

It doesn't do much, but you're right that it still loops over the elements anyway.
I'll do some performance testing and try to improve it.

In any case, for such a large dataset I would definitely use the stream API instead of the synchronous API. You'll get much better performance and a lower memory footprint.
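
As a rough sketch (assuming the AsyncParser wrapper around the stream API is available in the json2csv version you're on; `fields` and `data` are the same values you pass to the synchronous Parser):

import { AsyncParser } from 'json2csv';

const asyncParser = new AsyncParser({ fields }, { highWaterMark: 8192 });

let csv = '';
asyncParser.processor
  .on('data', (chunk: any) => (csv += chunk.toString()))
  .on('end', () => console.log(`done, ${csv.length} characters`))
  .on('error', (err: Error) => console.error(err));

// The input side expects JSON text, so stringify the row array before pushing it,
// then push null to signal the end of the input.
asyncParser.input.push(JSON.stringify(data));
asyncParser.input.push(null);

In the browser you'd replace the console logging with whatever triggers your download, but the important part is that rows are emitted incrementally instead of the whole CSV being built in one synchronous pass.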

@randdusing
Author

Do you have any tips on using the stream API in browsers?

@juanjoDiaz
Collaborator

I'm working on a new API that makes the streaming API easy to use in the browser.
Just give me a couple of days.

@juanjoDiaz
Collaborator

This was released a month ago, so I'll close the issue.

Feel free to reopen if there is anything else you think could be done.

@randdusing
Author

Oops, I totally missed the merge comment and didn't realize you had already made changes. I'll try it out and create a new issue if necessary.

@randdusing
Author

Works well, thanks again.
