BUG: groupby.rank results do not scale to 100% for dense #20731

peterpanmj · 2018-04-18T08:54:27Z

Code Sample

In [61]: df_test = pd.DataFrame({"A":[1,1,2,2],"B":[1,1,1,1]})

In [62]: df_test.groupby("B").rank(method="dense", ascending=True, pct=False, na_option='top')
Out[62]:
     A
0  1.0
1  1.0
2  2.0
3  2.0

In [63]: df_test.groupby("B").rank(method="dense", ascending=True, pct=True, na_option='top')
Out[63]:
      A
0  0.25
1  0.25
2  0.50
3  0.50

Problem description

With method="dense" and pct=True, groupby.rank does not scale its results to 100%: the largest percentage rank in the example above is 0.50 rather than 1.0.

Expected Output

In [65]: df_test['A'].rank(method="dense", ascending=True, pct=True, na_option='top')
Out[65]:
0    0.5
1    0.5
2    1.0
3    1.0
Name: A, dtype: float64
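
For illustration, here is a minimal sketch of one way to get the expected values per group without relying on pct=True. It divides each dense rank by the number of distinct values in its group. The variable names are made up for this example, and it assumes the column has no missing values (nunique skips NaN by default, so the divisor would differ if NaNs were ranked). It is not necessarily how the eventual fix is implemented.

```python
import pandas as pd

df_test = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 1, 1, 1]})

grouped = df_test.groupby("B")["A"]

# Plain dense ranks within each group (pct=False): 1.0, 1.0, 2.0, 2.0
dense_rank = grouped.rank(method="dense", ascending=True, pct=False)

# Number of distinct values in each row's group (assumes no NaNs)
n_distinct = grouped.transform("nunique")

# Scale so the largest dense rank in every group maps to 1.0
expected_pct = dense_rank / n_distinct
print(expected_pct.tolist())  # [0.5, 0.5, 1.0, 1.0]
```

With a single group, this matches the output of df_test['A'].rank(method="dense", pct=True) shown above.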

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: 448124c138dc39001638aacd68f253b1034d7f04
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.23.0.dev0+743.g448124c13
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


peterpanmj · 2018-04-19T07:56:20Z

cc @rouzazari

jreback · 2018-04-22T15:00:37Z

xref #15639

peterpanmj · 2018-05-22T10:55:50Z

I am working on a fix. Some existing tests do not expect the percentage ranks to scale to 1, so we need to agree on the expected results first.

pandas/pandas/tests/groupby/test_rank.py

Lines 200 to 204 in 791de95

    ('dense', True, 'no_na', True,
     [0.125, 0.125, 0.5, 0.375, 0.125, 0.25, 0.5, 0.5]),
    ('dense', False, 'no_na', False, [3., 3., 4., 1., 3., 2., 4., 4.]),
    ('dense', False, 'no_na', True,
     [0.375, 0.375, 0.5, 0.125, 0.375, 0.25, 0.5, 0.5])

#20781 #19481
@jreback @WillAyd

WillAyd · 2018-05-22T16:07:45Z

Hmm didn't realize that's how things worked outside of GroupBy. If I'm not mistaken all of the values here should just be doubled
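
To make the doubling concrete, here is a small check in plain Python using the descending dense ranks and percentage values from the quoted test lines. It assumes the eight values form a single group (an inference from the numbers, not something stated in the thread) and that the fix divides by the largest dense rank instead of the group size.

```python
# Values copied from the quoted test parameters (dense, descending, no_na)
dense_ranks = [3.0, 3.0, 4.0, 1.0, 3.0, 2.0, 4.0, 4.0]
old_pct = [0.375, 0.375, 0.5, 0.125, 0.375, 0.25, 0.5, 0.5]

# Existing expectation: dense rank divided by the group size (8)
by_group_size = [r / len(dense_ranks) for r in dense_ranks]
assert by_group_size == old_pct

# Proposed expectation: dense rank divided by the largest dense rank (4)
by_max_rank = [r / max(dense_ranks) for r in dense_ranks]

# Four distinct values in a group of eight, so every value doubles
assert by_max_rank == [2 * p for p in old_pct]
print(by_max_rank)  # [0.75, 0.75, 1.0, 0.25, 0.75, 0.5, 1.0, 1.0]
```

The factor of two is specific to this data: it is the group size divided by the number of distinct values.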

peterpanmj changed the title ~~BUG: groupby.rank results do not scale to 1 for dense~~ BUG: groupby.rank results do not scale to 100% for dense Apr 18, 2018

peterpanmj mentioned this issue Apr 18, 2018

BUG: Fix problems in group rank when both nans and infinity are present #20561 #20681

Merged

4 tasks

jreback added Groupby API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels Apr 22, 2018

jreback added this to the Next Major Release milestone Apr 22, 2018

jreback added Difficulty Intermediate labels Apr 22, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue May 22, 2018

BUG: Modify rank calculation when for dense rank pandas-dev#20731

adfd619

peterpanmj added a commit to peterpanmj/pandas that referenced this issue May 25, 2018

BUG: make dense ranks results scale to 100 percent (pandas-dev#20731)

128d005

peterpanmj mentioned this issue May 25, 2018

BUG: make dense ranks results scale to 100 percent #21203

Merged

4 tasks

peterpanmj added a commit to peterpanmj/pandas that referenced this issue May 27, 2018

BUG: simplify the logic for rank calculation pandas-dev#20731

aa0043b

jreback modified the milestones: Next Major Release, 0.24.0 May 29, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue May 30, 2018

BUG: make dense ranks results scale to 100 percent (pandas-dev#20731)

9921b68

jreback modified the milestones: 0.24.0, 0.23.1 May 31, 2018

peterpanmj added a commit to peterpanmj/pandas that referenced this issue May 31, 2018

BUG: make dense ranks results scale to 100 percent (pandas-dev#20731)

6c04e5a

WillAyd closed this as completed in #21203 May 31, 2018
