-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handler csv reader options in cudf-polars #16211
Handler csv reader options in cudf-polars #16211
Conversation
# because polars will already have handled them. | ||
path = Path(p) | ||
with path.open() as f: | ||
while f.readline() == "\n": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that read_csv
should eventually implement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point (maybe soon?), cudf-polars is not going to use cudf.read_csv
, but rather the libcudf layer directly, so we'll probably need to do this here anyway.
@@ -206,24 +208,104 @@ def __post_init__(self) -> None: | |||
if self.file_options.n_rows is not None: | |||
raise NotImplementedError("row limit in scan") | |||
if self.typ not in ("csv", "parquet"): | |||
raise NotImplementedError(f"Unhandled scan type: {self.typ}") | |||
if self.cloud_options is not None and any( | |||
self.cloud_options[k] is not None for k in ("aws", "azure", "gcp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming that these strings are guaranteed to be in the dict at this point? If this option isn't supported at all would it be worthwhile just to only accept None
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
csv reader has non-None cloud_options but the values are None, so I can't punt for cloud_options is not None
.
Convert internal PySeries to public Series in translation.
If one passes an empty list for the names, things should work.
Possibly some pieces are missing, but this is much closer to complete.
a13836a
to
184979c
Compare
/merge |
Description
Previously we were just relying on the default cudf read_csv options which doesn't do the right thing if the user has configured things.
Now that polars passes through the information to us, we can handle things properly, and raise for unsupported cases.
While here, update to new polars release and adapt tests to bug fixes that have been made upstream.
Checklist