[REVIEW] Handle various parameter combinations in `replace` API #7207

galipremsagar · 2021-01-26T04:43:00Z

The `replace` API has two overloaded parameters, `to_replace` and `value`. Each accepts several input types, and each combination of types behaves differently. These changes introduce a clear code path for every supported parameter combination. This should make it easier to support additional parameters in the future, such as `regex` and nested dict types, which would change the behaviour of `to_replace` and `value`.

Ensure all combinations are covered for to_replace & value for both DataFrame.replace & Series.replace.
Document changes inline & Update func docs.
Add tests to include coverage for all combinations that are not yet covered.
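To make the idea of "one code path per parameter combination" concrete, here is a minimal pure-Python sketch. It is not the cuDF implementation: `normalize_replace_args` is a hypothetical helper, it models only a subset of the combinations (scalar, list and flat dict inputs), and it deliberately rejects anything it does not model.

```python
def normalize_replace_args(to_replace, value=None):
    """Hypothetical helper: reduce each modelled combination of
    `to_replace` and `value` to two equal-length lists."""
    # dict: keys are the values to find, dict values are the replacements.
    if isinstance(to_replace, dict):
        if value is not None:
            # Assumption of this sketch only: dict + explicit value is not modelled.
            raise TypeError("this sketch expects value=None with a dict to_replace")
        return list(to_replace.keys()), list(to_replace.values())

    # list-like to_replace
    if isinstance(to_replace, (list, tuple)):
        if isinstance(value, (list, tuple)):
            # list / list: pairs are matched by position.
            if len(to_replace) != len(value):
                raise ValueError("to_replace and value must have the same length")
            return list(to_replace), list(value)
        # list / scalar: the scalar is broadcast to every entry.
        return list(to_replace), [value] * len(to_replace)

    # scalar to_replace
    if isinstance(value, (list, tuple)):
        raise TypeError("a scalar to_replace needs a scalar value")
    return [to_replace], [value]
```

Once every combination has been reduced to the same pair of lists, the replacement step itself only has to handle one shape of input.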

codecov · 2021-01-26T08:52:34Z

Codecov Report

Merging #7207 (a68e754) into branch-0.18 (8860baf) will increase coverage by 0.16%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           branch-0.18    #7207      +/-   ##
===============================================
+ Coverage        82.09%   82.25%   +0.16%     
===============================================
  Files               97       99       +2     
  Lines            16474    16890     +416     
===============================================
+ Hits             13524    13893     +369     
- Misses            2950     2997      +47

Impacted Files	Coverage Δ
python/cudf/cudf/__init__.py	`100.00% <ø> (ø)`
python/cudf/cudf/_fuzz_testing/parquet.py	`0.00% <ø> (ø)`
python/cudf/cudf/_lib/__init__.py	`100.00% <ø> (ø)`
python/cudf/cudf/_typing.py	`92.30% <ø> (ø)`
python/cudf/cudf/core/__init__.py	`100.00% <ø> (ø)`
python/cudf/cudf/core/abc.py	`91.48% <ø> (+4.25%)`	⬆️
python/cudf/cudf/core/buffer.py	`80.00% <ø> (+0.95%)`	⬆️
python/cudf/cudf/core/column/__init__.py	`100.00% <ø> (ø)`
python/cudf/cudf/core/column/categorical.py	`92.55% <ø> (-0.80%)`	⬇️
python/cudf/cudf/core/column/column.py	`87.75% <ø> (-0.39%)`	⬇️
... and 71 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b608832...a68e754.

shwina · 2021-01-28T18:58:05Z

python/cudf/cudf/core/column/numerical.py

+        if not isinstance(to_replace_col, NumericalColumn) or not isinstance(
+            replacement_col, NumericalColumn
+        ):
+            return self


If replacing numbers with e.g., strings, should we fail instead?

I think we should allow this operation. The reason is that a DataFrame with a mix of string and numeric columns will often be called with generic `to_replace` and `value` arguments, for example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1, 2, 3, 4], 'strings':['a','b','d','e']})
>>> df
   a strings
0  1       a
1  2       b
2  3       d
3  4       e
>>> df.replace(1, 111)
     a strings
0  111       a
1    2       b
2    3       d
3    4       e
>>> df.replace("a", "aaa")
   a strings
0  1     aaa
1  2       b
2  3       d
3  4       e

In such cases, failing with an error would force the caller to drop the non-matching columns, run `replace`, and then rejoin and reorder the DataFrame columns, which would be painful. Instead, we can simply leave a column unchanged when the dtypes passed in do not match it.
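The per-column behaviour argued for here can be sketched in plain Python. This is only an illustration under stated assumptions, not cuDF code: a "frame" is modelled as a dict of lists, `value_kind` and `replace_in_frame` are hypothetical names, and `to_replace` and `value` are assumed to be scalars of the same kind.

```python
def value_kind(x):
    """Hypothetical coarse dtype check: 'string', 'numeric' or 'other'."""
    if isinstance(x, str):
        return "string"
    if isinstance(x, (int, float)) and not isinstance(x, bool):
        return "numeric"
    return "other"


def replace_in_frame(frame, to_replace, value):
    """Replace `to_replace` with `value` in every column of matching kind.

    Columns whose kind differs from `to_replace` are copied unchanged
    instead of raising, so mixed frames can be passed as-is.
    """
    result = {}
    for name, column in frame.items():
        # Use the first element as a stand-in for the column dtype.
        column_kind = value_kind(column[0]) if column else None
        if column_kind != value_kind(to_replace):
            result[name] = list(column)  # mismatching column: left untouched
        else:
            result[name] = [value if item == to_replace else item for item in column]
    return result


frame = {"a": [1, 2, 3, 4], "strings": ["a", "b", "d", "e"]}
print(replace_in_frame(frame, 1, 111))
print(replace_in_frame(frame, "a", "aaa"))
```

The caller never has to split the frame by dtype, which is the convenience being defended in this comment.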

Got it. What happens when we do the same with a Series? For example:

In [2]: a = cudf.Series([1, 2, 3])
In [3]: a.replace(1, "a")

Do we fail or silently return?

We return silently in the case of a Series as well.

Pandas actually does the replace here, which makes me think we should either raise or clearly document the difference in behaviour with an example.

Added an example and mentioned the difference in behavior with respect to pandas.

FWIW I think we should throw in this situation. If the dtypes of `to_replace` and `value` aren't the same within reason (allowing different integer / float types, for example), we're in a situation we don't want to support in cuDF and we should throw accordingly.

This is different than the case where to_replace and value are the same type but a different type than self, in which case silently returning a copy at least somewhat makes sense.

Got it, so we want to error loudly for:

>>> s = cudf.Series(['a', 'b', 'c'])
>>> s.replace(['a', 'b'], [0, 1])

and return silently for:

>>> s = cudf.Series(['a', 'b', 'c'])
>>> s.replace([1, 2], [3, 4])

Did I get it right?

Yup, those are my thoughts at least. What do you think @galipremsagar and @shwina?

That makes sense to me. I have added the validation, along with tests that cover it.
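The rule agreed on in this thread can be summarised in a small pure-Python sketch. It is an assumption-laden illustration rather than the code merged in this PR: `kind_of` and `replace_values` are hypothetical names, a Series is modelled as a plain list, and the choice of `TypeError` is a guess at a reasonable exception type.

```python
def kind_of(values):
    """Hypothetical coarse dtype of a list: 'string', 'numeric' or 'mixed'."""
    kinds = set()
    for v in values:
        if isinstance(v, str):
            kinds.add("string")
        elif isinstance(v, (int, float)) and not isinstance(v, bool):
            kinds.add("numeric")  # ints and floats count as the same kind
        else:
            kinds.add("other")
    return kinds.pop() if len(kinds) == 1 else "mixed"


def replace_values(data, to_replace, value):
    """Apply the rule from the review discussion to a list-backed 'Series'."""
    # Case 1: to_replace and value disagree with each other -> error loudly.
    if kind_of(to_replace) != kind_of(value):
        raise TypeError("to_replace and value should be of the same kind")
    # Case 2: they agree with each other but not with the data -> silent copy.
    if kind_of(data) != kind_of(to_replace):
        return list(data)
    # Case 3: everything agrees -> perform the replacement.
    mapping = dict(zip(to_replace, value))
    return [mapping.get(item, item) for item in data]


print(replace_values(["a", "b", "c"], [1, 2], [3, 4]))  # unchanged copy
try:
    replace_values(["a", "b", "c"], ["a", "b"], [0, 1])
except TypeError as err:
    print("raised:", err)
```

Checking `to_replace` against `value` first means the loud failure does not depend on what the data happens to contain.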

galipremsagar · 2021-02-01T17:26:59Z

@gpucibot merge

galipremsagar added 2 commits January 25, 2021 19:54

fix replacement logic for replace api

c6cb9a5

remove old logic

c5f480f

galipremsagar added the "2 - In Progress" (Currently a work in progress) and "Python" (Affects Python cuDF API) labels Jan 26, 2021

galipremsagar self-assigned this Jan 26, 2021

galipremsagar added 7 commits January 26, 2021 15:12

Merge remote-tracking branch 'upstream/branch-0.18' into 7206

f20da08

add tests

595aa49

Merge remote-tracking branch 'upstream/branch-0.18' into 7206

cf8d4a6

add tests

f45d92a

Merge remote-tracking branch 'upstream/branch-0.18' into 7206

32a603e

add docs

c3917bd

update all docs

3a0dc6a

galipremsagar added the "non-breaking" (Non-breaking change) label Jan 28, 2021

galipremsagar marked this pull request as ready for review January 28, 2021 18:43

galipremsagar requested a review from a team as a code owner January 28, 2021 18:43

galipremsagar requested review from shwina and brandon-b-miller January 28, 2021 18:43

galipremsagar changed the title from "[WIP] Handle various parameter combinations in replace API" to "[REVIEW] Handle various parameter combinations in replace API" Jan 28, 2021

galipremsagar added the "3 - Ready for Review" (Ready for review by team) and "4 - Needs cuDF (Python) Reviewer" labels and removed the "2 - In Progress" (Currently a work in progress) label Jan 28, 2021

galipremsagar requested a review from kkraus14 January 28, 2021 18:43

galipremsagar added the "bug" (Something isn't working) label Jan 28, 2021

galipremsagar added 2 commits January 28, 2021 10:48

copyright

ca45932

Merge remote-tracking branch 'upstream/branch-0.18' into 7206

dea2a63

shwina reviewed Jan 28, 2021

python/cudf/cudf/core/frame.py (outdated, resolved)

shwina reviewed Jan 28, 2021

python/cudf/cudf/core/frame.py (outdated, resolved)

shwina reviewed Jan 28, 2021

python/cudf/cudf/core/frame.py (outdated, resolved)

Update python/cudf/cudf/core/frame.py

29086d5

Co-authored-by: Ashwin Srinath <3190405+shwina@users.noreply.github.com>

shwina reviewed Jan 28, 2021

python/cudf/cudf/core/frame.py (outdated, resolved)

fix naming

37fd1f6

kkraus14 reviewed Jan 28, 2021

galipremsagar added 3 commits January 28, 2021 19:05

Merge remote-tracking branch 'upstream/branch-0.18' into 7206

abc0637

cast

ba16abc

rename variable

b3d59bb

kkraus14 reviewed Jan 29, 2021

python/cudf/cudf/core/frame.py (outdated, resolved)

brandon-b-miller reviewed Jan 29, 2021

python/cudf/cudf/core/frame.py (resolved)

galipremsagar added 3 commits January 29, 2021 09:34

handle deep copies and shallow copies

e4438c0

handled missed deep copy case

3343492

broadcast to scalar as to_replace can potentially be large

b5818e3

kkraus14 reviewed Jan 29, 2021

python/cudf/cudf/core/frame.py (outdated, resolved)

python/cudf/cudf/core/frame.py (outdated, resolved)

python/cudf/cudf/core/frame.py (outdated, resolved)

galipremsagar added 6 commits January 29, 2021 12:21

replace isinstance cudf.Series checks

9933433

add validation errors

6882cc2

add tests

14ba184

reorg tests

b38faf1

add tests

6f06dba

add more tests

a68e754

kkraus14 approved these changes Jan 29, 2021

kkraus14 removed the "3 - Ready for Review" (Ready for review by team) and "4 - Needs cuDF (Python) Reviewer" labels Jan 29, 2021

brandon-b-miller approved these changes Feb 1, 2021

galipremsagar added the "5 - Ready to Merge" (Testing and reviews complete, ready to merge) label Feb 1, 2021

rapids-bot bot merged commit b8cb8c7 into rapidsai:branch-0.18 Feb 1, 2021
