Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support describe for non-numerical type string #973

Merged
merged 1 commit into from
Sep 16, 2024
Merged

Conversation

sycai
Copy link
Contributor

@sycai sycai commented Sep 10, 2024

This change adds the "include" param in the describe method. It can only be set as None(default) or 'all'

I only enabled "count" and "nunique" for string types. More non-numerical types can be added later.

@sycai sycai requested review from a team as code owners September 10, 2024 20:41
@sycai sycai requested a review from tswast September 10, 2024 20:41
Copy link

conventional-commit-lint-gcf bot commented Sep 10, 2024

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Sep 10, 2024
@sycai sycai added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 12, 2024
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 12, 2024
@sycai sycai requested a review from GarrettWu September 12, 2024 17:03
Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -1913,6 +1915,11 @@ def _reindex_rows(
def _reindex_columns(self, columns):
block = self._block
new_column_index, indexer = self.columns.reindex(columns)

if indexer is None:
# The new index is the same as the old one. Do nothing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how you discovered this. Worth separating out into it's own PR (with tests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was discovered when I wrote an ealier version of the "describe()":

I didn't check the emptiness of aggregated dataframes, and always concated them in the end. Then there was an edge case where the original dataframe contains only numerical columns, and as a result I was basically reindexing the columns with their original order, which triggered this bug.

@sycai sycai force-pushed the b325474421 branch 2 times, most recently from 60c25e8 to 2c73313 Compare September 13, 2024 23:54
@sycai sycai enabled auto-merge (squash) September 16, 2024 17:12
@sycai sycai merged commit deac6d2 into main Sep 16, 2024
21 of 23 checks passed
@sycai sycai deleted the b325474421 branch September 16, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants