BUG: combine_first does not retain dtypes with Timestamp DataFrames #38145

Merged 23 commits into pandas-dev:master on Nov 30, 2020

Conversation

@arw2019 (Member) commented Nov 29, 2020

Picking up #35514
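
(For context, a minimal sketch, my own and not taken from the PR or the linked issue, of the kind of reproduction behind this fix: before the patch, combine_first could hand back an object column instead of keeping datetime64[ns].)

    import pandas as pd

    # Hypothetical reproduction (illustrative only, not from the PR/issue):
    # df1 has a missing value that combine_first fills from df2.
    df1 = pd.DataFrame({"a": [pd.Timestamp("2020-01-01"), pd.NaT]})
    df2 = pd.DataFrame({"a": [pd.Timestamp("2020-01-02"), pd.Timestamp("2020-01-03")]})

    result = df1.combine_first(df2)
    # Before this PR the column could come back as object;
    # with the fix it should stay datetime64[ns].
    print(result.dtypes)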

@jbrockmendel (Member) commented:

Why does combine_first use expressions.where instead of mgr.where? Casting should be handled correctly there.

@arw2019 (Member, Author) commented Nov 29, 2020

> Why does combine_first use expressions.where instead of mgr.where? Casting should be handled correctly there.

That line hasn't been touched since #17744, will look

@jbrockmendel (Member) commented:

Now that I have an actual keyboard: this would likely also require fixing Block.where #38073

@arw2019 (Member, Author) commented Nov 29, 2020

> Now that I have an actual keyboard: this would likely also require fixing Block.where #38073

OK, in that case this would sit on top of that.

I'm struggling a little with the pattern. At the moment I have:

    def combine_first(self, other: DataFrame) -> DataFrame:
        def combiner(x, y):
            mask = extract_array(isna(x))

            x_values = extract_array(x, extract_numpy=True)
            y_values = extract_array(y, extract_numpy=True)

            # If the column y in other DataFrame is not in first DataFrame,
            # just return y_values.
            if y.name not in self.columns:
                return y_values

            _where = self._mgr.where(
                y_values, mask, align=True, errors="raise", try_cast=True, axis=1
            )
            # extract array from _where

        return self.combine(other, combiner, overwrite=False)

Is this in the right ballpark?

@jreback added the Bug, Dtype Conversions, and Reshaping labels Nov 29, 2020
@jreback added this to the 1.2 milestone Nov 29, 2020
@jreback (Contributor) left a review comment:


> Now that I have an actual keyboard: this would likely also require fixing Block.where #38073

@jbrockmendel is the current patch not good for 1.2? (agreed it could use some refactoring).

"scalar1, scalar2",
[
(datetime(2020, 1, 1), datetime(2020, 1, 2)),
(pd.Period("2020-01-01", "D"), pd.Period("2020-01-02", "D")),
@jreback (Contributor) commented on this excerpt:

can add an Interval example here as well

@arw2019 (Member, Author):

Done
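
(For illustration, a rough sketch of what the extended parametrization might look like; the Interval entry, the test name, and the test body are my guesses, not the actual code added in this PR.)

    import pytest
    import pandas as pd
    from datetime import datetime

    @pytest.mark.parametrize(
        "scalar1, scalar2",
        [
            (datetime(2020, 1, 1), datetime(2020, 1, 2)),
            (pd.Period("2020-01-01", "D"), pd.Period("2020-01-02", "D")),
            # hypothetical Interval case, per the review suggestion above
            (pd.Interval(left=0, right=1), pd.Interval(left=0, right=2)),
        ],
    )
    def test_combine_first_retains_dtype(scalar1, scalar2):
        # After this PR, combine_first should keep the column's original dtype.
        df1 = pd.DataFrame({"a": [scalar1, scalar2]})
        df2 = pd.DataFrame({"a": [scalar2, scalar1]})
        result = df1.combine_first(df2)
        assert result["a"].dtype == df1["a"].dtype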

@jbrockmendel (Member) commented:

> is the current patch not good for 1.2? (agreed it could use some refactoring).

This looks good for now

@jreback (Contributor) commented Nov 29, 2020

@arw2019 ok small test comment. ping on green.

@arw2019 closed this Nov 30, 2020
@arw2019 reopened this Nov 30, 2020
@jreback merged commit aad85ad into pandas-dev:master Nov 30, 2020
@jreback (Contributor) commented Nov 30, 2020

thanks @arw2019

@jbrockmendel (Member) commented:

@arw2019 do you have time to try to make this go through self._mgr.where? (no hurry)

@arw2019 (Member, Author) commented Dec 1, 2020

> @arw2019 do you have time to try to make this go through self._mgr.where? (no hurry)

Yes, in the works.

Successfully merging this pull request may close these issues.

combine_first returns unexpected results for timestamp dataframes