-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use load_options param to pass pandasOptions in load_file #1466
Conversation
Codecov ReportBase: 97.59% // Head: 94.01% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #1466 +/- ##
==========================================
- Coverage 97.59% 94.01% -3.58%
==========================================
Files 22 89 +67
Lines 789 4347 +3558
Branches 0 428 +428
==========================================
+ Hits 770 4087 +3317
- Misses 19 178 +159
- Partials 0 82 +82
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
883434a
to
26e9705
Compare
@pankajastro What sort of options do we want to pass using pandas_options? is delimiter? |
yes, currently delimiter, but in future maybe other params as well which pandas read_{csv, json} etc method accept. |
@pankajastro I was a little concerned because if we keep adding the internal libs-specific parameters we will soon need to expose every database-specific or file type-specific parameter. We are exposing internal details to the end user, kinda violating the interface we have provided. Instead, since load files only have to deal with the file(s) and tables can we keep the parameters specific to tables and files? Like in this scenario, the delimiter is specific to CSV files, maybe we can move this parameter to File Object? WDYT? |
I understand your concern and probably we will expose only limited param. We don't want to make File object messy either so I closed #1225 there is some discussion on #482 too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pankajastro, as we discussed, my only concern - which needs a bit of further look into the code - is if we are calling multiple pandas functions as part of load_file (e.g. to create a table if it doesn't exist), and the user would like to specify custom arguments to multiple of these functions.
I feel it is fine for us to focus on the read_csv
, read_json
, etc., but it would be great to look at the code and confirm this is not an issue.
4a8cb3a
to
b649a5f
Compare
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
108a2b0
to
302efaa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! :)
Hi @dimberman Can you please re-review this PR |
f570616
to
04a9779
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, @pankajastro , for addressing all the comments. I'm happy for this PR to be merged after this last comment we agreed on is addressed.
requested changes does not apply since were existing from before. but feel free to comments if I have missed something will take care in separate PR
# Description closes: #1519 ## What is the current behavior? currently, load_file do not have an option to pass the pandas-related param while reading file ## What is the new behavior? use `load_options` and pass the given values while reading files using the pandas path ## Does this introduce a breaking change? No ### Checklist - [ ] Created tests which fail without the change (if possible) - [ ] Extended the README / documentation, if necessary Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
closes: #1519
What is the current behavior?
currently, load_file do not have an option to pass the pandas-related param while reading file
What is the new behavior?
use
load_options
and pass the given values while reading files using the pandas pathDoes this introduce a breaking change?
No
Checklist