Ignore preprocessData option #359

Closed
randdusing opened this issue Feb 22, 2019 · 6 comments

@randdusing

Does preprocessData() do anything if opts.unwind and opts.flatten are not defined? As far as I can tell, it creates a single-item array for each row and then concatenates them back together. This is very inefficient in the browser when working with 100k+ rows.
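
Roughly, what I think is happening internally is something like this (just a sketch of my reading of the code, not the actual library source):

// Sketch: with no unwind/flatten, each row is wrapped in a single-item
// array and the arrays are concatenated back into one flat array,
// which is a lot of extra allocation for large datasets.
const preprocessDataSketch = (data: Array<any>): Array<any> =>
  data.reduce(
    (processed: Array<any>, row: any) => processed.concat([row]),
    [] as Array<any>
  );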

I read that streams are more efficient, but I didn't see any documentation about using them in the browser. I'm not sure streams even make sense in the browser, since everything is in memory anyway.

I was able to "fix" the issue by overriding the preprocessData function in a very hacky way; I was hoping you'd provide an option to skip preprocessing:

// Cast to any so the internal preprocessData method can be overridden.
const parser = new Parser({ fields }) as any;
// Skip preprocessing entirely and feed the rows straight to the parser.
parser.preprocessData = (x: Array<any>) => x;
const csv = parser.parse(data);

Exporting ~90 fields and ~90k rows went from taking 45+ seconds (or freezing) to a more reasonable 9 seconds.

I'm using v4.3.3 in Chromium v70. Let me know if you need any other data.

@juanjoDiaz
Collaborator

It doesn't do much, but you're right that it still loops over the elements anyway.
I'll do some performance testing and try to improve it.

In any case, for such a large dataset I would definitely use the stream API instead of the synchronous API. You'll get much better performance and a lower memory footprint.
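
As a rough sketch (assuming the AsyncParser wrapper around the stream API is available in the json2csv version you're on; `fields` and `data` are the same values you pass to the synchronous Parser):

import { AsyncParser } from 'json2csv';

const asyncParser = new AsyncParser({ fields }, { highWaterMark: 8192 });

let csv = '';
asyncParser.processor
  .on('data', (chunk: any) => (csv += chunk.toString()))
  .on('end', () => console.log(`done, ${csv.length} characters`))
  .on('error', (err: Error) => console.error(err));

// The input side expects JSON text, so stringify the row array before pushing it,
// then push null to signal the end of the input.
asyncParser.input.push(JSON.stringify(data));
asyncParser.input.push(null);

In the browser you'd replace the console logging with whatever triggers your download, but the important part is that rows are emitted incrementally instead of the whole CSV being built in one synchronous pass.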

@randdusing
Author

Do you have any tips on using the stream API in browsers?

@juanjoDiaz
Collaborator

I'm working on a new API that makes the streaming API easy to use in the browser.
Just give me a couple of days.

@juanjoDiaz
Collaborator

This was released a month ago, so I'll close the issue.

Feel free to reopen if there is anything else you think could be done.

@randdusing
Author

Oops, I totally missed the merge comment and didn't realize you had already made changes. I'll try it out and create a new issue if necessary.

@randdusing
Author

Works well, thanks again.
