sink_parquet_cloud doesn't work after updating from 0.40 to 0.41 #17172
Comments
This functionality was community provided. At the moment it is a bit low priority for us. We want to redesign this properly in the new engine. I would accept a fix.
This makes the "ObjectStore" choose between a single-request upload and a multi-request (multipart) upload depending on the size of the chunks. This change is needed because Polars now calls the writer without buffering, which was breaking the upload.
There are two tests failing for different reasons:
- The IPC upload to an unknown bucket fails because the new CloudWriter does not propagate the error. This is probably an easy fix.
- The lazy "parquet to cloud" test fails for the reason described above. This is probably related to this issue: pola-rs/polars#17172
It seems that with the new This is not explicitly documented in the ObjectStore's change log, but is described in the "removed docs" of this commit: apache/arrow-rs@96c4c0b As a solution, we may need to use the Do you have any preference? I can try to submit a PR with a fix for this.
The issue started after `ObjectStore` was bumped to v0.10. Before that, ObjectStore buffered writes internally. The implementation now uses `ObjectStore::BufWriter`, which performs a single "put" request if the size of the data is below the "capacity" and a "put multipart" otherwise. Fixes pola-rs#17172
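The buffering decision described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `BufferedUpload` and `Upload` are hypothetical names, nothing here calls the real `object_store` crate, and the actual writer may split parts differently.

```rust
/// Hypothetical result of finishing a buffered upload (not an object_store type).
#[derive(Debug, PartialEq)]
enum Upload {
    /// All data stayed below the capacity: one "put" request.
    Single(Vec<u8>),
    /// The buffer filled up at least once: a multipart upload with these parts.
    Multipart(Vec<Vec<u8>>),
}

/// Hypothetical buffered writer that collects small, unbuffered writes.
struct BufferedUpload {
    capacity: usize,
    buffer: Vec<u8>,
    parts: Vec<Vec<u8>>,
}

impl BufferedUpload {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, buffer: Vec::new(), parts: Vec::new() }
    }

    /// Append data; every time `capacity` bytes are available, cut one part.
    fn write(&mut self, data: &[u8]) {
        self.buffer.extend_from_slice(data);
        while self.buffer.len() >= self.capacity {
            let rest = self.buffer.split_off(self.capacity);
            let part = std::mem::replace(&mut self.buffer, rest);
            self.parts.push(part);
        }
    }

    /// Decide between a single put and a multipart upload.
    fn finish(mut self) -> Upload {
        if self.parts.is_empty() {
            // Total size never reached the capacity.
            Upload::Single(self.buffer)
        } else {
            // Send whatever is left as the final (possibly shorter) part.
            if !self.buffer.is_empty() {
                self.parts.push(self.buffer);
            }
            Upload::Multipart(self.parts)
        }
    }
}
```

In this model, several small `write` calls whose total stays below the capacity end up as one `Upload::Single`, which mirrors the case of a small aggregation written without caller-side buffering.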
Checks
Reproducible example
Log output
Issue description
On version 0.40 this code was used to write a small aggregation to cloud storage (S3). After updating to 0.41.*, the file is not written and no error is thrown.
Expected behavior
The file is written to S3, or an error is thrown.
Installed versions
polars = { version = "0.41.2", features = [
"lazy",
"aws",
"parquet",
"streaming",
"performant",
"cloud_write",
] }