
Data Explorer: More gracefully handle summary statistics for all-null arrays for pandas and polars #4329

Merged: 1 commit merged into main from bug/de-pandas-summary-stat-warning on Aug 14, 2024


@wesm (Contributor) commented Aug 13, 2024

As described in #4307, the pandas implementation raised a warning when computing summary stats for arrays that are entirely null / NA. The polars implementation turned out to have a different bug, which is also fixed and tested here.
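The general idea of handling an all-null column gracefully can be sketched as follows: drop missing values first and skip the min/max/mean computation when nothing is left. This is a minimal sketch under that assumption; `summarize_numeric` is a hypothetical helper, not the Data Explorer's actual code, and it does not reproduce the specific warning from #4307.

```python
import numpy as np
import pandas as pd


def summarize_numeric(values: pd.Series) -> dict:
    """Hypothetical summary-stat helper that short-circuits all-null input."""
    null_count = int(values.isna().sum())
    non_null = values.dropna()
    if non_null.empty:
        # Nothing to summarize: report only the null count instead of
        # computing statistics over an empty selection.
        return {"null_count": null_count, "min": None, "max": None, "mean": None}
    return {
        "null_count": null_count,
        "min": float(non_null.min()),
        "max": float(non_null.max()),
        "mean": float(non_null.mean()),
    }


print(summarize_numeric(pd.Series([None] * 5, dtype="float64")))
# {'null_count': 5, 'min': None, 'max': None, 'mean': None}
print(summarize_numeric(pd.Series([1.0, np.nan, 3.0])))
```

Returning `None` for the statistics (rather than NaN) lets a UI decide to hide them entirely, which matches the behavior described later in this thread of not showing summary stats for all-NA columns.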

QA Notes

Follow the example reported in #4307 for both pandas and polars (for polars, use all-None values).

@wesm requested a review from @seeM on August 13, 2024

@petetronic (Collaborator)

Confirmed that this removes the warning for Pandas

@petetronic (Collaborator)

With:

import pandas as pd
import polars as pl
import numpy as np

...in our treatment of Polars, should np.nan be considered an NA value? Right now it does not seem to be (and the summary stats show NaN):

pl_na = pl.DataFrame({"missing": pl.Series([np.nan] * 5, dtype=pl.Float64)})
[Screenshot: summary stats shown as NaN for the all-np.nan Polars column]

Whereas None is treated as an NA value by Polars (and summary stats are not shown):

pl_na = pl.DataFrame({"missing": pl.Series([None] * 5, dtype=pl.Float64)})
[Screenshot: no summary stats shown for the all-None Polars column]

For Pandas, note that we treat both None and np.nan as NA (and summary stats are likewise not shown):

[Screenshot: no summary stats shown for the all-NA Pandas column]

@petetronic (Collaborator) commented Aug 14, 2024

I've logged an issue to track the question of how polars should treat np.nan, and will land this change.

@petetronic merged commit 55429d0 into main on Aug 14, 2024 (22 checks passed)
@petetronic deleted the bug/de-pandas-summary-stat-warning branch on August 14, 2024
@github-actions bot locked and limited conversation to collaborators on Aug 14, 2024