Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine PandasLoadOptions into single class #1722

Merged
merged 10 commits into from
Feb 8, 2023
Merged

Conversation

utkarsharma2
Copy link
Collaborator

@utkarsharma2 utkarsharma2 commented Feb 7, 2023

Description

What is the current behavior?

We currently have PandasCsvLoadOptions, PandasJsonLoadOptions, PandasNdjsonLoadOptions, and PandasParquetLoadOptions. Which can be confusing to users.

closes: #1681

What is the new behavior?

Instead of multiple classes, we can probably merge this in one single class PandasLoadOptions that can be superset for all the PandasLoadOptions.

Does this introduce a breaking change?

Yes

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary

@codecov
Copy link

codecov bot commented Feb 7, 2023

Codecov Report

Base: 97.72% // Head: 97.72% // No change to project coverage 👍

Coverage data is based on head (5826d22) compared to base (42a7dd9).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1722   +/-   ##
=======================================
  Coverage   97.72%   97.72%           
=======================================
  Files          21       21           
  Lines         835      835           
=======================================
  Hits          816      816           
  Misses         19       19           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@utkarsharma2 utkarsharma2 marked this pull request as ready for review February 7, 2023 13:11
@utkarsharma2 utkarsharma2 changed the title Combined PandasLoadOptions into single class Combine PandasLoadOptions into single class Feb 7, 2023
Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following changes would be required:

  1. I think we need to have a way to pass all other supported parameters for Pandas as kwargs here. We should add this change as part of this PR. More at: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html, https://pandas.pydata.org/docs/reference/api/pandas.read_json.html
  2. Documentation has to be changed accordingly in load_file.rst for this.

python-sdk/src/astro/dataframes/load_options.py Outdated Show resolved Hide resolved
Copy link
Contributor

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the earlier implementation, I thought a class per pandas API (PandasJsonLoadOptions
read_json) will be cleaner.
currently, you have put a comment like json/csv etc this comment does not look correct to me since dtype is a valid param for even read_json method and in future, we may like to add this param in PandasJsonLoadOptions class.
I feel we have different classes then the code editor also helps users to understand what parameters they can use, which would be confusing if we have a single class.
So I would like to know the other thought on this PR

@utkarsharma2
Copy link
Collaborator Author

@pankajastro Thanks for pointing this out, I have updated the docstring. Now we are referring to the pandas doc for a superset of all the valid options.

Also, I think if we are anyway exposing to end users that we are using pandas in the backend to do stuff, I think we can also assume that they will be aware of which options need to be passed for a particular type of file. We have also included the pandas link to aid for this.

@utkarsharma2 utkarsharma2 merged commit c212f4d into main Feb 8, 2023
@utkarsharma2 utkarsharma2 deleted the FixPandasOptions branch February 8, 2023 13:43
utkarsharma2 added a commit that referenced this pull request Feb 8, 2023
# Description
## What is the current behavior?
We currently have PandasCsvLoadOptions, PandasJsonLoadOptions,
PandasNdjsonLoadOptions, and PandasParquetLoadOptions. Which can be
confusing to users.

closes: #1681 

## What is the new behavior?
Instead of multiple classes, we can probably merge this in one single
class PandasLoadOptions that can be superset for all the
PandasLoadOptions.

## Does this introduce a breaking change?
Yes

### Checklist
- [ ] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Standardize the pandas load options
4 participants