-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support describe for non-numerical type string #973
Conversation
🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use -- conventional-commit-lint bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
bigframes/dataframe.py
Outdated
@@ -1913,6 +1915,11 @@ def _reindex_rows( | |||
def _reindex_columns(self, columns): | |||
block = self._block | |||
new_column_index, indexer = self.columns.reindex(columns) | |||
|
|||
if indexer is None: | |||
# The new index is the same as the old one. Do nothing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how you discovered this. Worth separating out into it's own PR (with tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was discovered when I wrote an ealier version of the "describe()":
I didn't check the emptiness of aggregated dataframes, and always concated them in the end. Then there was an edge case where the original dataframe contains only numerical columns, and as a result I was basically reindexing the columns with their original order, which triggered this bug.
60c25e8
to
2c73313
Compare
This change adds the "include" param in the describe method. It can only be set as None(default) or 'all'
I only enabled "count" and "nunique" for string types. More non-numerical types can be added later.