-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combine PandasLoadOptions into single class #1722
Conversation
Codecov ReportBase: 97.72% // Head: 97.72% // No change to project coverage 👍
Additional details and impacted files@@ Coverage Diff @@
## main #1722 +/- ##
=======================================
Coverage 97.72% 97.72%
=======================================
Files 21 21
Lines 835 835
=======================================
Hits 816 816
Misses 19 19 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following changes would be required:
- I think we need to have a way to pass all other supported parameters for Pandas as
kwargs
here. We should add this change as part of this PR. More at: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html, https://pandas.pydata.org/docs/reference/api/pandas.read_json.html - Documentation has to be changed accordingly in
load_file.rst
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the earlier implementation, I thought a class per pandas API (PandasJsonLoadOptions
read_json) will be cleaner.
currently, you have put a comment like json/csv etc this comment does not look correct to me since dtype
is a valid param for even read_json method and in future, we may like to add this param in PandasJsonLoadOptions
class.
I feel we have different classes then the code editor also helps users to understand what parameters they can use, which would be confusing if we have a single class.
So I would like to know the other thought on this PR
@pankajastro Thanks for pointing this out, I have updated the docstring. Now we are referring to the pandas doc for a superset of all the valid options. Also, I think if we are anyway exposing to end users that we are using pandas in the backend to do stuff, I think we can also assume that they will be aware of which options need to be passed for a particular type of file. We have also included the pandas link to aid for this. |
# Description ## What is the current behavior? We currently have PandasCsvLoadOptions, PandasJsonLoadOptions, PandasNdjsonLoadOptions, and PandasParquetLoadOptions. Which can be confusing to users. closes: #1681 ## What is the new behavior? Instead of multiple classes, we can probably merge this in one single class PandasLoadOptions that can be superset for all the PandasLoadOptions. ## Does this introduce a breaking change? Yes ### Checklist - [ ] Created tests which fail without the change (if possible) - [ ] Extended the README / documentation, if necessary
Description
What is the current behavior?
We currently have PandasCsvLoadOptions, PandasJsonLoadOptions, PandasNdjsonLoadOptions, and PandasParquetLoadOptions. Which can be confusing to users.
closes: #1681
What is the new behavior?
Instead of multiple classes, we can probably merge this in one single class PandasLoadOptions that can be superset for all the PandasLoadOptions.
Does this introduce a breaking change?
Yes
Checklist