Currently `hipscat.io.file_io.read_parquet_dataset()` makes some assumptions that can be suboptimal for certain use cases. For instance, it sets `pyarrow.dataset.dataset(exclude_invalid_files=True)`, which forces a scan of the whole catalog. I propose making this behavior optional.
This change would also require updating `ignore_prefixes`, which currently doesn't include the JSON and FITS metadata files, so `pyarrow.dataset.dataset` treats them as parquet files. We could probably also allow users to provide a custom `ignore_prefixes`.
The likely solution is to add `**kwargs` to the function and forward any extra arguments to `pyarrow.dataset.dataset`.