REGR: pd.to_hdf(dropna=True) not dropping all nan rows #35719

Closed
XiaozhanYang opened this issue Aug 14, 2020 · 1 comment · Fixed by #37564
Labels
Bug IO HDF5 read_hdf, HDFStore Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version

Comments

@XiaozhanYang

Location of the documentation


https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

Documentation problem

On the section:

"HDFStore will by default not drop rows that are all missing. This behavior can be changed by setting dropna=True."

The last example:

In [370]: pd.read_hdf('file.h5', 'df_with_missing')
Out[370]:
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

should output:

   col1  col2
0   0.0   1.0
2   2.0   NaN

which does not have rows that are all missing.
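
For reference, "rows that are all missing" means rows where every column is NaN. A minimal sketch of that check in plain pandas, using a frame assumed to match the one in the docs example (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df_with_missing = pd.DataFrame(
    {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]}
)

# True only where every column in the row is NaN
all_missing = df_with_missing.isna().all(axis=1)
print(all_missing.tolist())  # [False, True, False]
```

Only row 1 qualifies. Row 2 still has a value in col1, so dropna=True should keep it.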

Suggested fix for documentation

This has been provided above.

@XiaozhanYang XiaozhanYang added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 14, 2020
@usneil usneil mentioned this issue Aug 14, 2020
@simonjayhawkins
Member

Thanks @XiaozhanYang for the report.

The docs are built by running the code samples, so this is a bug in the code rather than a documentation issue. See #35720 (comment).

This is a regression between 0.25.3 and 1.0.5, and it persists on master:

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.25.3'
>>>
>>> df_with_missing = pd.DataFrame({"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]})
>>> df_with_missing
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN
>>>
>>> df_with_missing.to_hdf(
...     "file.h5", "df_with_missing", format="table", mode="w", dropna=True
... )
>>> result = pd.read_hdf("file.h5", "df_with_missing")
>>> result
   col1  col2
0   0.0   1.0
2   2.0   NaN
>>>
>>> pd.__version__
'1.0.5'
>>>
>>> df_with_missing = pd.DataFrame({"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]})
>>> df_with_missing
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN
>>>
>>> df_with_missing.to_hdf(
...     "file.h5", "df_with_missing", format="table", mode="w", dropna=True
... )
>>> result = pd.read_hdf("file.h5", "df_with_missing")
>>> result
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN
>>>
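
Until a fix is released, one possible workaround is to drop the all-NaN rows explicitly before writing. This is only a sketch: it assumes `DataFrame.dropna(how="all")` matches the intended `dropna=True` behaviour, and `write_table` is a hypothetical helper, not part of pandas. Writing requires the optional PyTables dependency.

```python
import numpy as np
import pandas as pd


def write_table(df, path, key):
    """Hypothetical helper: write df to an HDF5 file in table format."""
    # Needs PyTables installed; dropna is not passed because the
    # frame is expected to be filtered already.
    df.to_hdf(path, key=key, format="table", mode="w")


df_with_missing = pd.DataFrame(
    {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]}
)

# Remove rows where every column is NaN, independent of to_hdf
cleaned = df_with_missing.dropna(how="all")
print(cleaned)
```

Calling `write_table(cleaned, "file.h5", "df_with_missing")` then stores only rows 0 and 2, regardless of how the installed version handles dropna=True.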

@simonjayhawkins simonjayhawkins added Bug IO HDF5 read_hdf, HDFStore Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version and removed Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 14, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Aug 14, 2020
@simonjayhawkins simonjayhawkins changed the title DOC: Typo of "IO Tools' documentation REGR: pd.read_hdf(dropna=True) not dropping all nan rows Aug 14, 2020
@simonjayhawkins simonjayhawkins changed the title REGR: pd.read_hdf(dropna=True) not dropping all nan rows REGR: pd.to_hdf(dropna=True) not dropping all nan rows Aug 14, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Nov 2, 2020