REGR: pd.to_hdf(dropna=True) not dropping all nan rows #35719

XiaozhanYang · 2020-08-14T13:52:25Z

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

Documentation problem

On the section:

"HDFStore will by default not drop rows that are all missing. This behavior can be changed by setting dropna=True."

The last example:

In [370]: pd.read_hdf('file.h5', 'df_with_missing')
Out[370]:
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN

should output:

   col1  col2
0   0.0   1.0
1   2.0   NaN

which does not have rows that are all missing.
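
As an illustration of the expected result, the rows that `dropna=True` is meant to discard are the same ones removed by `DataFrame.dropna(how="all")`. The sketch below assumes the `df_with_missing` frame from the user guide example and does not touch HDF5 at all. Note that the surviving rows keep their original index labels (0 and 2) rather than being renumbered.

```python
import numpy as np
import pandas as pd

# Same frame as the user guide example: row 1 is missing in every column.
df_with_missing = pd.DataFrame({"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]})

# how="all" drops a row only when every value in it is missing,
# so row 2 (which still holds col1 == 2.0) is kept.
expected = df_with_missing.dropna(how="all")
print(expected)
```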

Suggested fix for documentation

This has been provided above.

simonjayhawkins · 2020-08-14T19:11:48Z

Thanks @XiaozhanYang for the report.

The docs are built by running the code samples, so this is a bug and not directly a documentation issue. See #35720 (comment).

This is a regression between 0.25.3 and 1.0.5 that persists on master.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.25.3'
>>>
>>> df_with_missing = pd.DataFrame({"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]})
>>> df_with_missing
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN
>>>
>>> df_with_missing.to_hdf(
...     "file.h5", "df_with_missing", format="table", mode="w", dropna=True
... )
>>> result = pd.read_hdf("file.h5", "df_with_missing")
>>> result
   col1  col2
0   0.0   1.0
2   2.0   NaN
>>>

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'1.0.5'
>>>
>>> df_with_missing = pd.DataFrame({"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]})
>>> df_with_missing
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN
>>>
>>> df_with_missing.to_hdf(
...     "file.h5", "df_with_missing", format="table", mode="w", dropna=True
... )
>>> result = pd.read_hdf("file.h5", "df_with_missing")
>>> result
   col1  col2
0   0.0   1.0
1   NaN   NaN
2   2.0   NaN
>>>
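
The two sessions above can be folded into a single version-independent check. This is only a sketch, not the test that was eventually added to pandas: the helper names `roundtrip_with_dropna` and `has_all_nan_rows` are made up for illustration, and the round trip assumes PyTables is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def roundtrip_with_dropna(df: pd.DataFrame) -> pd.DataFrame:
    """Write df in table format with dropna=True and read it back (needs PyTables)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.h5")
        df.to_hdf(path, key="df_with_missing", format="table", mode="w", dropna=True)
        return pd.read_hdf(path, "df_with_missing")


def has_all_nan_rows(df: pd.DataFrame) -> bool:
    """Return True if at least one row is missing in every column."""
    return bool(df.isna().all(axis=1).any())


if __name__ == "__main__":
    df_with_missing = pd.DataFrame({"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]})
    try:
        result = roundtrip_with_dropna(df_with_missing)
    except ImportError:
        # pandas raises ImportError when the optional PyTables dependency is absent.
        print("PyTables is not installed; skipping the round trip")
    else:
        # True reproduces the regression; False is the 0.25.3 behaviour.
        print("all-NaN rows survived:", has_all_nan_rows(result))
```

On an affected version the script should report that all-NaN rows survived the round trip; on 0.25.3 it should report that they did not.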

XiaozhanYang added the Docs and Needs Triage labels Aug 14, 2020

usneil mentioned this issue Aug 14, 2020

update io documentation #35720

Closed

4 tasks

simonjayhawkins added the Bug, IO HDF5, Missing-data, and Regression labels and removed the Docs and Needs Triage labels Aug 14, 2020

simonjayhawkins added this to the Contributions Welcome milestone Aug 14, 2020

simonjayhawkins changed the title ~~DOC: Typo of "IO Tools' documentation~~ REGR: pd.read_hdf(dropna=True) not dropping all nan rows Aug 14, 2020

simonjayhawkins changed the title ~~REGR: pd.read_hdf(dropna=True) not dropping all nan rows~~ REGR: pd.to_hdf(dropna=True) not dropping all nan rows Aug 14, 2020

arw2019 mentioned this issue Nov 1, 2020

REGR: pd.to_hdf(..., dropna=True) not dropping missing rows #37564

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.2 Nov 2, 2020

jreback closed this as completed in #37564 Nov 4, 2020
