Data explorer throws error on summary calculation for columns with 100% missing data #4307

Closed
jthomasmock opened this issue Aug 9, 2024 · 1 comment
Labels: area: data explorer, bug, lang: python

Comments

@jthomasmock
Contributor

System details:

Positron and OS details:

Positron Version: 2024.08.0 (Universal) build 24
Code - OSS Version: 1.91.0
Commit: d1012cc
Date: 2024-08-09T04:20:57.440Z
Electron: 29.4.0
Chromium: 122.0.6261.156
Node.js: 20.9.0
V8: 12.2.281.27-electron.0
OS: Darwin arm64 23.5.0

Interpreter details:

Describe the issue:

I am seeing this with 100% NA columns + Pandas, not with Polars or R.

The dataframe views and displays fine, but opening the Summary Panel for that column emits this warning:

/Users/thomasmock/arrow-eda/.venv/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
  return np.nanmean(a, axis, out=out, keepdims=keepdims)

Steps to reproduce the issue:

In a Python script:

  1. Create a dataframe with a column that contains 100% missing data:

     import pandas as pd
     import numpy as np

     pd_na = pd.DataFrame({"missing": [np.nan, np.nan, np.nan, np.nan, np.nan]})

The dataframe displays fine in the grid, but opening summary stats emits a warning:

/Users/thomasmock/arrow-eda/.venv/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice

I can reproduce this independently of the Data Explorer:

np.nanmean(pd_na['missing'])
<positron-console-cell-13>:1: RuntimeWarning: Mean of empty slice
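
As a self-contained variant of the reproduction above, the following sketch records the warning instead of letting it print, so the behavior can be checked programmatically. The `.to_numpy()` conversion and the `caught` / `messages` names are choices made for this illustration, not part of the original report.

```python
import warnings

import numpy as np
import pandas as pd

# Same all-NaN column as in the report.
pd_na = pd.DataFrame({"missing": [np.nan] * 5})

with warnings.catch_warnings(record=True) as caught:
    # "always" prevents Python from suppressing a repeated warning.
    warnings.simplefilter("always")
    result = np.nanmean(pd_na["missing"].to_numpy())

# Keep only the RuntimeWarning messages that were emitted.
messages = [
    str(w.message) for w in caught if issubclass(w.category, RuntimeWarning)
]

print(result)  # nan
print(messages)
```

The result is still NaN. The problem is only that NumPy warns once every value has been dropped and nothing is left to average.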

Expected or desired behavior:

Ignore NAs for summary calculations such as mean, sd, median, etc.
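
One way to get this behavior is to drop the missing values first and skip the computation entirely when nothing remains. The sketch below is an illustration only: `summarize_numeric` is a hypothetical helper, not Positron's actual implementation, and returning `None` for undefined statistics is an assumption about how a summary panel might want to represent them.

```python
import warnings

import numpy as np
import pandas as pd


def summarize_numeric(column: pd.Series) -> dict:
    """Hypothetical helper: summary stats that tolerate an all-NA column."""
    null_count = int(column.isna().sum())
    values = column.dropna().to_numpy(dtype=float)

    # Nothing left after dropping NAs: report the stats as undefined
    # instead of calling NumPy on an empty array.
    if values.size == 0:
        return {"null_count": null_count, "mean": None, "median": None, "std": None}

    return {
        "null_count": null_count,
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        # The sample standard deviation needs at least two values.
        "std": float(np.std(values, ddof=1)) if values.size > 1 else None,
    }


pd_na = pd.DataFrame({"missing": [np.nan] * 5})

with warnings.catch_warnings():
    # Turn any RuntimeWarning into an exception to show that none is raised.
    warnings.simplefilter("error", RuntimeWarning)
    stats = summarize_numeric(pd_na["missing"])

print(stats)
```

Checking the size explicitly keeps the empty case visible in the code, rather than relying on suppressing the warning after the fact.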

Were there any error messages in the UI, Output panel, or Developer Tools console?

@jthomasmock added the area: data explorer and lang: python labels Aug 9, 2024
@softwarenerd self-assigned this Aug 11, 2024
@seeM added the bug label Aug 12, 2024
@petetronic added this to the 2024.09.0 Pre-Release milestone Aug 13, 2024
petetronic pushed a commit that referenced this issue Aug 14, 2024
… arrays for pandas and polars (#4329)

As described in #4307, for pandas we were raising a warning for all null
/ NA arrays when computing summary stats. The polars implementation
turned out to have a different bug, so that is fixed and tested here
also.

### QA Notes

Follow the example reported in #4307 for both pandas and polars (for
polars, use None for all values).
@testlabauto
Contributor

Verified Fixed

Positron Version(s) : 2024.08.0-41
OS Version          : OSX

Test scenario(s)

Verified using original filing instructions.

Link(s) to TestRail test cases run or created:
Will add a fully "missing data" row to data frame when writing summary stats automated test.

@github-actions bot locked as resolved and limited conversation to collaborators Aug 30, 2024

5 participants