Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression fix for merging DF with datetime index with empty DF #36897

Merged
merged 17 commits into from
Nov 3, 2020

Conversation

PCerles
Copy link
Contributor

@PCerles PCerles commented Oct 5, 2020

@PCerles PCerles changed the title Phil/possible merge bugfix bugfix for merging datetime with object column on indices Oct 5, 2020
@dsaxton
Copy link
Member

dsaxton commented Oct 5, 2020

Can you also add a test and a v1.1.4 release note?

@jbrockmendel
Copy link
Member

needs test(s)

@dsaxton dsaxton added Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 5, 2020
@PCerles PCerles force-pushed the phil/possible-merge-bugfix branch from fbd9585 to cd96385 Compare October 5, 2020 22:04
@PCerles
Copy link
Contributor Author

PCerles commented Oct 5, 2020

👍

@PCerles PCerles changed the title bugfix for merging datetime with object column on indices regression fix for merging datetime with object column on indices Oct 5, 2020
@PCerles PCerles changed the title regression fix for merging datetime with object column on indices regression fix for merging DF with datetime index with empty DF Oct 5, 2020
@PCerles PCerles force-pushed the phil/possible-merge-bugfix branch from cd96385 to d34461a Compare October 6, 2020 18:27
@PCerles
Copy link
Contributor Author

PCerles commented Oct 6, 2020

^^ These test failures seem unrelated and windows-dependent on a timezone feature

expected[["date", "panel", "data"]] = frame.reset_index()[
["date", "panel", "data"]
]
expected = expected.set_index(["date", "panel"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write these tests without using set_index or reset_index (building data explicitly using constructors)? Helps isolate the testing to only merge and make things easier to understand. Also could make the original data smaller.

pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.1.4.rst Outdated Show resolved Hide resolved
@DannyNemer
Copy link

This looks great! This fix will help my team a great deal.

@PCerles
Copy link
Contributor Author

PCerles commented Oct 7, 2020

^ These test failures look like unrelated timeouts

@DannyNemer
Copy link

Any updates on the availability of this bug-fix? 🙂

@jreback
Copy link
Contributor

jreback commented Oct 8, 2020

well this hasn't even been reviewed yet

doc/source/whatsnew/v1.1.4.rst Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
@jreback jreback added this to the 1.2 milestone Oct 10, 2020
@PCerles PCerles force-pushed the phil/possible-merge-bugfix branch from 63312ce to aa06288 Compare October 27, 2020 17:57
@DannyNemer
Copy link

@jreback Ready for your re-review. Thank you!

doc/source/whatsnew/v1.2.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.2.0.rst Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
},
index=midx3,
)
result = frame.merge(other, how="left", on=["date", "panel"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u test the other cases from the OP that they give the correct results

pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
@PCerles PCerles force-pushed the phil/possible-merge-bugfix branch from 3ed5aea to e5045f3 Compare November 2, 2020 21:34
@@ -467,7 +467,7 @@ MultiIndex

- Bug in :meth:`DataFrame.xs` when used with :class:`IndexSlice` raises ``TypeError`` with message ``"Expected label or tuple of labels"`` (:issue:`35301`)
- Bug in :meth:`DataFrame.reset_index` with ``NaT`` values in index raises ``ValueError`` with message ``"cannot convert float NaN to integer"`` (:issue:`36541`)
-
- Fixed regression in :func:`merge` on merging DatetimeIndex with empty DataFrame (:issue:`36895`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry can you move to the reshaping section

pandas/tests/reshape/merge/test_multi.py Show resolved Hide resolved
index=midx3,
)

result_left_merge = left.merge(right, how="left", on=["date", "panel"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you paramaterize this; it is very hard to understand what is equal to what here. you can put the expected things in the parametr. I would like to hard code the result data frame (rather than trying to do operations to get it). its very hard to read otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I parametrized the merge type in my commit -- did you want me to explicitly construct an expected data frame in the parametrize call here? https://github.com/pandas-dev/pandas/pull/36897/files#diff-513c054eb6b1b4b5767980db8c8dc0bd93f7ea55df9facea3465bfac11785474R509
That feels like it might be difficult to read but I can do it.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok one more round and pls ping on green.

pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_multi.py Outdated Show resolved Hide resolved
@PCerles
Copy link
Contributor Author

PCerles commented Nov 3, 2020

@jreback green!

@jreback jreback merged commit 1c2de61 into pandas-dev:master Nov 3, 2020
@jreback
Copy link
Contributor

jreback commented Nov 3, 2020

thanks @PCerles very nice!

@PCerles
Copy link
Contributor Author

PCerles commented Nov 3, 2020

You're welcome, thanks for your help maintaining Pandas!

jreback added a commit that referenced this pull request Nov 13, 2020
… (#37655)

* Moving the file test_frame.py to a new directory

* Сreated file test_frame_color.py

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP 8 fixes

* Transfer tests

of test_frame.py
to test_frame_groupby.py and test_frame_subplots.py

* Removing unnecessary imports

* PEP 8 fixes

* Fixed class name

* Transfer tests

of test_frame.py
to test_frame_subplots.py

* Transfer tests

of test_frame.py
to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py

* Changed class names

* Removed unnecessary imports

* Removed import

* catch FutureWarnings (#37587)

* TST/REF: collect indexing tests by method (#37590)

* REF: prelims for single-path setitem_with_indexer (#37588)

* ENH: __repr__ for 2D DTA/TDA (#37164)

* CLN: de-duplicate _validate_where_value with _validate_setitem_value (#37595)

* TST/REF: collect tests by method (#37589)

* TST/REF: move remaining setitem tests from test_timeseries

* TST/REF: rehome test_timezones test

* move misplaced arithmetic test

* collect tests by method

* move misplaced file

* REF: Categorical.is_dtype_equal -> categories_match_up_to_permutation (#37545)

* CLN refactor non-core (#37580)

* refactor core/computation (#37585)

* TST/REF: share method tests between DataFrame and Series (#37596)

* BUG: Index.where casting ints to str (#37591)

* REF: IntervalArray comparisons (#37124)

* regression fix for merging DF with datetime index with empty DF (#36897)

* ERR: fix error message in Period for invalid frequency (#37602)

* CLN: remove rebox_native (#37608)

* TST/REF: tests.generic (#37618)

* TST: collect tests by method (#37617)

* TST/REF: collect test_timeseries tests by method

* misplaced DataFrame.values tst

* misplaced dataframe.values test

* collect test by method

* TST/REF: share tests across Series/DataFrame (#37616)

* Gh 36562 typeerror comparison not supported between float and str (#37096)

* docs: fix punctuation (#37612)

* REGR: pd.to_hdf(..., dropna=True) not dropping missing rows (#37564)

* parametrize set_axis tests (#37619)

* CLN: clean color selection in _matplotlib/style (#37203)

* DEPR: DataFrame/Series.slice_shift (#37601)

* REF: re-use validate_setitem_value in Categorical.fillna (#37597)

* PERF: release gil for ewma_time (#37389)

* BUG: Groupy dropped nan groups from result when grouping over single column (#36842)

* ENH: implement timeszones support for read_json(orient='table') and astype() from 'object' (#35973)

* REF/BUG/TYP: read_csv shouldn't close user-provided file handles (#36997)

* BUG/REF: read_csv shouldn't close user-provided file handles

* get_handle: typing, returns is_wrapped, use dataclass, and make sure that all created handlers are returned

* remove unused imports

* added IOHandleArgs.close

* added IOArgs.close

* mostly comments

* move memory_map from TextReader to CParserWrapper

* moved IOArgs and IOHandles

* more comments

Co-authored-by: Jeff Reback <jeff@reback.net>

* more typing checks to pre-commit (#37539)

* TST: 32bit dtype compat test_groupby_dropna (#37623)

* BUG: Metadata propagation for groupby iterator (#37461)

* BUG: read-only values in cython funcs (#37613)

* CLN refactor core/arrays (#37581)

* Fixed Metadata Propogation in DataFrame (#37381)

* TYP: add Shape alias to pandas._typing (#37128)

* DOC: Fix typo (#37630)

* CLN: parametrize test_nat_comparisons (#37195)

* dataframe dataclass docstring updated (#37632)

* refactor core/groupby (#37583)

* BUG: set index of DataFrame.apply(f) when f returns dict (#37544) (#37606)

* BUG: to_dict should return a native datetime object for NumPy backed dataframes (#37571)

* ENH: memory_map for compressed files (#37621)

* DOC: add example & prose of slicing with labels when index has duplicate labels  (#36814)

* DOC: add example & prose of slicing with labels when index has duplicate labels #36251

* DOC: proofread the sentence.

Co-authored-by: Jun Kudo <jun-lab@junnoMacBook-Pro.local>

* DOC: Fix typo (#37636)

"columns(s)" sounded odd, I believe it was supposed to be just "column(s)".

* CI: troubleshoot win py38 builds (#37652)

* TST/REF: collect indexing tests by method (#37638)

* TST/REF: collect tests for get_numeric_data (#37634)

* misplaced loc test

* TST/REF: collect get_numeric_data tests

* REF: de-duplicate _validate_insert_value with _validate_scalar (#37640)

* CI: catch windows py38 OSError (#37659)

* share test (#37679)

* TST: match matplotlib warning message (#37666)

* TST: match matplotlib warning message

* TST: match full message

* pd.Series.loc.__getitem__ promotes to float64 instead of raising KeyError (#37687)

* REF/TST: misplaced Categorical tests (#37678)

* REF/TST: collect indexing tests by method (#37677)

* CLN: only call _wrap_results one place in nanmedian (#37673)

* TYP: Index._concat (#37671)

* BUG: CategoricalIndex.equals casting non-categories to np.nan (#37667)

* CLN: _replace_single (#37683)

* REF: simplify _replace_single by noting regex kwarg is bool

* Annotate

* CLN: remove never-False convert kwarg

* TYP: make more internal funcs keyword-only (#37688)

* REF: make Series._replace_single a regular method (#37691)

* REF: simplify cycling through colors (#37664)

* REF: implement _wrap_reduction_result (#37660)

* BUG: preserve fold in Timestamp.replace (#37644)

* CLN: Clean indexing tests (#37689)

* TST: fix warning for pie chart (#37669)

* PERF: reverted change from commit 7d257c6 to solve issue #37081 (#37426)

* DataFrameGroupby.boxplot fails when subplots=False (#28102)

* ENH: Improve error reporting for wrong merge cols (#37547)

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP8

* Fixes for linter

* Сhange pd.DateFrame to DateFrame

* Move inconsistent namespace check to pre-commit, fixup more files (#37662)

* check for inconsistent namespace usage

* doc

* typos

* verbose regex

* use verbose flag

* use verbose flag

* match both directions

* add test

* don't import annotations from future

* update extra couple of cases

* 🚚 rename

* typing

* don't use subprocess

* don't type tests

* use pathlib

* REF: simplify NDFrame.replace, ObjectBlock.replace (#37704)

* REF: implement Categorical.encode_with_my_categories (#37650)

* REF: implement Categorical.encode_with_my_categories

* privatize

* BUG: unpickling modifies Block.ndim (#37657)

* REF: dont support dt64tz in nanmean (#37658)

* CLN: Simplify groupby head/tail tests (#37702)

* Bug in loc raised for numeric label even when label is in Index (#37675)

* REF: implement replace_regex, remove unreachable branch in ObjectBlock.replace (#37696)

* TYP: Check untyped defs (except vendored) (#37556)

* REF: remove ObjectBlock._replace_single (#37710)

* Transfer tests
of test_frame.py
to test_frame_color.py

* TST/REF: collect indexing tests by method (#37590)

* PEP8

* Сhange DateFrame to pd.DateFrame

* Сhange pd.DateFrame to DateFrame

* Removing imports

* Bug fixes

* Bug fixes

* Fix incorrect merge

* test_frame_color.py edit

* Transfer tests

of test_frame.py
to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py

* Removing unnecessary imports

* PEP8

* # Conflicts:
#	pandas/tests/plotting/frame/test_frame.py
#	pandas/tests/plotting/frame/test_frame_color.py
#	pandas/tests/plotting/frame/test_frame_subplots.py

* Moving the file test_frame.py to a new directory

* Transfer tests

of test_frame.py
to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py

* Removing unnecessary imports

* PEP8

* CLN: clean categorical indexes tests (#37721)

* Fix merge error

* PEP 8 fixes

* Fix merge error

* Removing unnecessary imports

* PEP 8 fixes

* Fixed class name

* Transfer tests

of test_frame.py
to test_frame_subplots.py

* Transfer tests

of test_frame.py
to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py

* Changed class names

* Removed unnecessary imports

* Removed import

* TST/REF: collect indexing tests by method (#37590)

* TST: match matplotlib warning message (#37666)

* TST: match matplotlib warning message

* TST: match full message

* TST: fix warning for pie chart (#37669)

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP8

* Fixes for linter

* Сhange pd.DateFrame to DateFrame

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP8

* Сhange DateFrame to pd.DateFrame

* Сhange pd.DateFrame to DateFrame

* Removing imports

* Bug fixes

* Bug fixes

* Fix incorrect merge

* test_frame_color.py edit

* Fix merge error

* Fix merge error

* Removing unnecessary features

* Resolving Commit Conflicts daf999f 365d843

* black fix

Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
Co-authored-by: Marco Gorelli <m.e.gorelli@gmail.com>
Co-authored-by: Philip Cerles <philip.cerles@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Sven <sven.schellenberg@paradynsystems.com>
Co-authored-by: Micael Jarniac <micael@jarniac.com>
Co-authored-by: Andrew Wieteska <48889395+arw2019@users.noreply.github.com>
Co-authored-by: Maxim Ivanov <41443370+ivanovmg@users.noreply.github.com>
Co-authored-by: Erfan Nariman <34067903+erfannariman@users.noreply.github.com>
Co-authored-by: Fangchen Li <fangchen.li@outlook.com>
Co-authored-by: patrick <61934744+phofl@users.noreply.github.com>
Co-authored-by: attack68 <24256554+attack68@users.noreply.github.com>
Co-authored-by: Torsten Wörtwein <twoertwein@users.noreply.github.com>
Co-authored-by: Jeff Reback <jeff@reback.net>
Co-authored-by: Janus <janus@insignificancegalore.net>
Co-authored-by: Joel Whittier <rootbeerfriend@gmail.com>
Co-authored-by: taytzehao <jtth95@gmail.com>
Co-authored-by: ma3da <34522496+ma3da@users.noreply.github.com>
Co-authored-by: junk <juntrp0207@gmail.com>
Co-authored-by: Jun Kudo <jun-lab@junnoMacBook-Pro.local>
Co-authored-by: Alex Kirko <alexander.kirko@gmail.com>
Co-authored-by: Yassir Karroum <ukarroum17@gmail.com>
Co-authored-by: Kaiqi Dong <kaiqi@kth.se>
Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Merging two data frames on a datetime index where one index is Object data type
5 participants