BUG: make dense ranks results scale to 100 percent #21203

peterpanmj · 2018-05-25T08:12:23Z

closes BUG: groupby.rank results do not scale to 100% for dense #20731
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

WillAyd · 2018-05-25T23:28:12Z

pandas/tests/groupby/test_rank.py

@@ -167,11 +167,11 @@ def test_infs_n_nans(grps, vals, ties_method, ascending, na_option, exp):
    ('dense', True, 'keep', False,
        [1., 1., np.nan, 3., 1., 2., np.nan, np.nan]),
    ('dense', True, 'keep', True,
-        [0.2, 0.2, np.nan, 0.6, 0.2, 0.4, np.nan, np.nan]),
+        [1. / 3., 1. / 3., np.nan, 3. / 3., 1. / 3., 2. / 3., np.nan, np.nan]),


Can you use literals instead of expressions here?

That would be 0.3333333333333333, which is a little hard to read. Still use a literal?

Hmm, yeah, don't do that. If it's minimal effort to replace the data with something more easily divisible, please do so. Otherwise I'm OK with it as is.
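For reference, a small pure-Python sketch of the two options discussed in this thread. It is not the pandas test itself: `dense_pct` and the replacement data are hypothetical, chosen only to show that `1. / 3.` and the long literal are the same double, and that a group with four distinct values would give expected percentages that read cleanly as literals.

```python
import math

# The expression and the 16-digit literal denote the same double.
assert 1.0 / 3.0 == 0.3333333333333333


def dense_pct(values):
    """Dense rank divided by the number of distinct non-NaN values.

    Hypothetical helper for illustration; NaN inputs stay NaN.
    """
    distinct = sorted({v for v in values if not math.isnan(v)})
    rank_of = {v: k for k, v in enumerate(distinct, start=1)}
    return [
        math.nan if math.isnan(v) else rank_of[v] / len(distinct)
        for v in values
    ]


# Hypothetical replacement data with four distinct values: each expected
# percentage is an exact binary fraction (0.25, 0.5, 0.75, 1.0).
quarters = dense_pct([2.0, 2.0, math.nan, 8.0, 4.0, 6.0, math.nan, math.nan])
```

With three distinct values the expected results are thirds, which is why the diff above uses expressions rather than literals.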

WillAyd · 2018-05-25T23:29:52Z

pandas/_libs/groupby_helper.pxi.in

                grp_na_count = 0
                val_start = i + 1
                grp_start = i + 1
+                lab_start = i + 1


Isn't this the same thing as grp_start?

WillAyd · 2018-05-25T23:33:29Z

pandas/_libs/groupby_helper.pxi.in

-                    grp_sizes[_as[j], 0] = i - grp_start + 1 - grp_na_count
-                dups = sum_ranks = 0
+                if pct:
+                    if tiebreak != TIEBREAK_DENSE:


Instead of doing this here, is it not possible to alter how the tie_count variable is incremented above, depending on whether or not we are using pct and TIEBREAK_DENSE together? Seems like it would be simpler to do that than to implement another branch for assigning percents, if that's possible.

Do you mean moving the logic to here?

    elif tiebreak == TIEBREAK_DENSE:
        for j in range(i - dups + 1, i + 1):
            out[_as[j], 0] = grp_vals_seen

Hmm well I suppose that that would alter the non-pct items, but my overall point is I feel like this can be done more succinctly. In either case (TIEBREAK_DENSE or not) it's a matter of dividing by the right denominator to get the correct value.

So I think it would be cleaner to do something along the lines of:

    if TIEBREAK_DENSE:
        denominator = ...
    else:
        denominator = ...
    for j in range(lab_start, i + 1):
        out[_as[j], 0] = out[_as[j], 0] / denominator
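The shape being suggested here (pick one denominator per group, then divide every rank in that group by it) can be sketched in plain Python. This is not the Cython in groupby_helper.pxi.in. The name `scale_group_ranks` and its arguments are hypothetical, and the two denominators are assumptions read off the later diff in this PR: the count of non-NaN rows for non-dense methods, and the count of distinct non-NaN values for dense.

```python
import math


def scale_group_ranks(ranks, values, dense):
    """Divide one group's ranks by a single denominator chosen up front."""
    valid = [v for v in values if not math.isnan(v)]
    if dense:
        denominator = len(set(valid))  # distinct non-NaN values (assumed)
    else:
        denominator = len(valid)       # non-NaN rows (assumed)
    if denominator == 0:
        # Group is all NaN: nothing to scale, and Python would raise on / 0.
        return list(ranks)
    # NaN ranks stay NaN because nan / x is nan.
    return [r / denominator for r in ranks]


# Hypothetical single-group data with precomputed ranks.
values = [2.0, 2.0, math.nan, 8.0, 2.0, 6.0, math.nan, math.nan]
dense_ranks = [1.0, 1.0, math.nan, 3.0, 1.0, 2.0, math.nan, math.nan]
min_ranks = [1.0, 1.0, math.nan, 5.0, 1.0, 4.0, math.nan, math.nan]

dense_pct = scale_group_ranks(dense_ranks, values, dense=True)
min_pct = scale_group_ranks(min_ranks, values, dense=False)
```

Under these assumptions the largest value in the group maps to 1.0 for both methods, which is the behaviour the PR title asks for.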

codecov · 2018-05-27T04:19:34Z

Codecov Report

Merging #21203 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #21203   +/-   ##
=======================================
  Coverage   91.84%   91.84%           
=======================================
  Files         153      153           
  Lines       49543    49543           
=======================================
  Hits        45504    45504           
  Misses       4039     4039

Flag	Coverage Δ
#multiple	`90.24% <ø> (ø)`	⬆️
#single	`41.87% <ø> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5348e06...c5b2756.

jreback

Can you add a whatsnew note? Otherwise LGTM.

jreback · 2018-05-29T00:49:43Z

I guess this should be in 0.24 as it's a numeric change. Any takers for 0.23.1?

WillAyd · 2018-05-29T03:43:47Z

I think it's minor enough to be in 0.23.1, but I don't have that strong a preference either way.

jreback

minor comments, let's move to 0.23.1 @WillAyd merge when ready.

jreback · 2018-05-31T10:29:23Z

pandas/_libs/groupby_helper.pxi.in

+                        grp_sizes[_as[j], 0] = i - grp_start + 1 - grp_na_count
+                else:
+                    for j in range(grp_start, i + 1):
+                        grp_sizes[_as[j], 0] = (grp_tie_count


minor, but can you put the - on the previous line

Break the line after the operator?

    for j in range(grp_start, i + 1):
        grp_sizes[_as[j], 0] = (grp_tie_count -
                                (grp_na_count > 0))
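To make the arithmetic in `grp_tie_count - (grp_na_count > 0)` concrete, here is a rough Python sketch. It assumes that `grp_tie_count` counts runs of equal values in the sorted group and that all NaNs form one such run, which is inferred from the subtraction and is not shown in this excerpt. The function name `dense_denominator` is hypothetical.

```python
import math


def dense_denominator(sorted_vals):
    """Count runs of equal values, then drop the NaN run if there is one."""
    grp_tie_count = 0  # runs of equal values; all NaNs count as one run
    grp_na_count = 0   # number of NaN rows in the group
    previous = None    # None means "no previous value yet"
    for v in sorted_vals:
        is_nan = math.isnan(v)
        grp_na_count += is_nan
        key = "nan" if is_nan else v  # every NaN shares the same key
        if previous is None or key != previous:
            grp_tie_count += 1
        previous = key
    # The comparison is a bool, so this subtracts 1 only when NaNs exist.
    return grp_tie_count - (grp_na_count > 0)


with_nans = dense_denominator([2.0, 2.0, 2.0, 6.0, 8.0, math.nan, math.nan, math.nan])
without_nans = dense_denominator([1.0, 1.0, 2.0])
```

Under that assumption the denominator is the number of distinct non-NaN values, so the highest dense rank divided by it is exactly 1.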

jreback · 2018-05-31T10:29:42Z

doc/source/whatsnew/v0.24.0.txt

@@ -111,7 +111,7 @@ Offsets
 Numeric
 ^^^^^^^

-
+- Bug in :func:`pandas.core.groupby.GroupBy.rank` where results did not scale to 100% when specifying ``method='dense'`` and ``pct=True``


ok let's move to 0.23.1

WillAyd · 2018-05-31T20:45:12Z

Thanks @peterpanmj !

(cherry picked from commit b237b11)

WillAyd added the Groupby label May 25, 2018

WillAyd requested changes May 25, 2018


jreback requested changes May 29, 2018


jreback added the Bug label May 29, 2018

peterpanmj force-pushed the dense_scale branch from 3b770fa to 9921b68 Compare May 30, 2018 12:41

jreback approved these changes May 31, 2018


jreback added this to the 0.23.1 milestone May 31, 2018

jreback added the Needs Backport label May 31, 2018

BUG: make dense ranks results scale to 100 percent (pandas-dev#20731)

6c04e5a

peterpanmj force-pushed the dense_scale branch from 9921b68 to 6c04e5a Compare May 31, 2018 14:20

Merge branch 'master' into dense_scale

c5b2756

WillAyd approved these changes May 31, 2018


WillAyd changed the title ~~BUG: make dense ranks results scale to 100 percent (#20731)~~ BUG: make dense ranks results scale to 100 percent May 31, 2018

WillAyd merged commit b237b11 into pandas-dev:master May 31, 2018

jorisvandenbossche removed the Needs Backport label Jun 8, 2018

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jun 8, 2018

BUG: make dense ranks results scale to 100 percent (pandas-dev#21203)

7b54afc

(cherry picked from commit b237b11)

jorisvandenbossche pushed a commit that referenced this pull request Jun 9, 2018

BUG: make dense ranks results scale to 100 percent (#21203)

110cf95

(cherry picked from commit b237b11)

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

BUG: make dense ranks results scale to 100 percent (pandas-dev#21203)

8f0136b
