Return iterator of arrow record batches to JS #95

Closed
kylebarron opened this issue Apr 24, 2022 · 2 comments

@kylebarron
Owner

Motivation: Parquet and Arrow are chunked formats (Parquet files are split into row groups, Arrow data into record batches). Therefore we shouldn't need to wait for the entire dataset to load and parse before getting some data back.

However, I'm still not aware of a way to return an iterable or an async iterable from Rust to JS. To get around this, I think we can "drive" the iteration from JS. Essentially this:

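import * as wasm from 'parquet-wasm';

// Parquet file bytes (placeholder; e.g. from fetch() or fs.readFileSync())
const arr = new Uint8Array();
// Proposed API. Name the schema method readSchema to align with the pyarrow API?
const parquetFile = new wasm.ParquetFile(arr);
// Arrow schema, serialized as Arrow IPC bytes
const schemaIPC = parquetFile.schema();
for (let i = 0; i < parquetFile.numRowGroups; i++) {
  // One row group, returned as an Arrow record batch in IPC format.
  // JS can process each batch before the remaining row groups are read.
  const recordBatchIPC = parquetFile.readRowGroup(i);
}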
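Driving the loop from JS also means the JS side can wrap the index-based API in a standard iterable, so callers can use `for...of`. A minimal sketch, assuming the proposed `numRowGroups` property and `readRowGroup(i)` method from the snippet above; `iterRecordBatches` is a hypothetical helper name, not part of parquet-wasm, and the stand-in object only mimics the proposed shape so the example runs without wasm:

```javascript
// Hypothetical helper: wraps the proposed index-based API
// (numRowGroups + readRowGroup(i)) in a JS generator. Each row group
// is read lazily, only when the consumer asks for the next batch.
function* iterRecordBatches(parquetFile) {
  for (let i = 0; i < parquetFile.numRowGroups; i++) {
    // Arrow IPC bytes for one record batch (per the proposed API)
    yield parquetFile.readRowGroup(i);
  }
}

// Stand-in object with the same shape as the proposed ParquetFile,
// used only to demonstrate the generator.
const fakeFile = {
  numRowGroups: 3,
  readRowGroup: (i) => new Uint8Array([i]),
};

for (const batchIPC of iterRecordBatches(fakeFile)) {
  console.log(batchIPC.length);
}
```

Because a generator only runs up to the next `yield`, breaking out of the loop early means the remaining row groups are never read.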

Ideally, we'll have an async version of this too.
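For the async version, one option on the JS side is an async generator, which gives callers `for await...of`. A minimal sketch, assuming a hypothetical `readRowGroupAsync(i)` method that returns a Promise of Arrow IPC bytes; both that method and the `iterRecordBatchesAsync` helper are illustrative assumptions, not existing parquet-wasm APIs:

```javascript
// Hypothetical helper: wraps an assumed async per-row-group reader
// (readRowGroupAsync(i) -> Promise<Uint8Array>) in an async generator.
// Each row group is awaited only when the consumer requests the next batch.
async function* iterRecordBatchesAsync(parquetFile) {
  for (let i = 0; i < parquetFile.numRowGroups; i++) {
    yield await parquetFile.readRowGroupAsync(i);
  }
}

// Stand-in object with the assumed shape, used only for demonstration.
const fakeAsyncFile = {
  numRowGroups: 3,
  readRowGroupAsync: async (i) => new Uint8Array([i]),
};

(async () => {
  for await (const batchIPC of iterRecordBatchesAsync(fakeAsyncFile)) {
    console.log(batchIPC[0]);
  }
})();
```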

@kylebarron
Owner Author

This is also closed with #296

