BUG: groupby.rank results do not scale to 100% for dense #20731

Closed
peterpanmj opened this issue Apr 18, 2018 · 4 comments · Fixed by #21203
Labels
API Design Groupby Numeric Operations Arithmetic, Comparison, and Logical operations
Comments

@peterpanmj
Contributor

Code Sample

In [61]: df_test = pd.DataFrame({"A":[1,1,2,2],"B":[1,1,1,1]})

In [62]: df_test.groupby("B").rank(method="dense", ascending=True, pct=False, na_option='top')
Out[62]:
     A
0  1.0
1  1.0
2  2.0
3  2.0

In [63]: df_test.groupby("B").rank(method="dense", ascending=True, pct=True, na_option='top')
Out[63]:
      A
0  0.25
1  0.25
2  0.50
3  0.50

Problem description

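With pct=True, groupby.rank results do not scale to 100% when method is "dense". In the sample above, the highest rank in the group comes out as 0.50 instead of 1.0, which is consistent with the dense ranks being divided by the group size (4) rather than by the number of distinct values (2).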

Expected Output

In [65]: df_test['A'].rank(method="dense", ascending=True, pct=True, na_option='top')
Out[65]:
0    0.5
1    0.5
2    1.0
3    1.0
Name: A, dtype: float64
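
The expected grouped result can be sketched without going through the pct=True path at all: take the dense ranks per group and divide by the largest dense rank in that group. This is only an illustration of the expected scaling, not the fix that went into pandas; the helper name dense_pct_rank is hypothetical.

```python
import pandas as pd

df_test = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 1, 1, 1]})


def dense_pct_rank(df, by, col):
    """Hypothetical helper: dense rank within each group, scaled so the top rank is 1.0."""
    # Plain dense ranks per group (pct=False), e.g. 1.0, 1.0, 2.0, 2.0
    dense = df.groupby(by)[col].rank(method="dense", ascending=True)
    # Divide by the largest dense rank in the same group,
    # i.e. the number of distinct values in that group
    return dense / dense.groupby(df[by]).transform("max")


result = dense_pct_rank(df_test, "B", "A")
print(result.tolist())  # [0.5, 0.5, 1.0, 1.0]
```

For this single-group frame the values match the df_test['A'].rank(method="dense", pct=True) output shown above.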

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 448124c138dc39001638aacd68f253b1034d7f04
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.23.0.dev0+743.g448124c13
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@peterpanmj peterpanmj changed the title BUG: groupby.rank results do not scale to 1 for dense BUG: groupby.rank results do not scale to 100% for dense Apr 18, 2018
@peterpanmj
Contributor Author

cc @rouzazari

@jreback
Contributor

jreback commented Apr 22, 2018

xref #15639

@jreback jreback added Groupby API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels Apr 22, 2018
@jreback jreback added this to the Next Major Release milestone Apr 22, 2018
peterpanmj added a commit to peterpanmj/pandas that referenced this issue May 22, 2018
@peterpanmj
Contributor Author

I am working on a fix. Some existing tests do not expect the percentage ranks to scale to 1, so we need to agree on the expected results first.

('dense', True, 'no_na', True,
[0.125, 0.125, 0.5, 0.375, 0.125, 0.25, 0.5, 0.5]),
('dense', False, 'no_na', False, [3., 3., 4., 1., 3., 2., 4., 4.]),
('dense', False, 'no_na', True,
[0.375, 0.375, 0.5, 0.125, 0.375, 0.25, 0.5, 0.5])

#20781 #19481
@jreback @WillAyd

@WillAyd
Member

WillAyd commented May 22, 2018

Hmm, didn't realize that's how things worked outside of GroupBy. If I'm not mistaken, all of the values here should just be doubled.
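
A quick arithmetic check of the doubling suggestion, using only the test expectations quoted in the earlier comment. It assumes the fixed behaviour divides each dense rank by the largest dense rank in the group (4 here) instead of the group size (8); the variable names are illustrative, and this is not the actual test change.

```python
# Expectations quoted from the existing tests (dense, no_na)
old_pct_asc = [0.125, 0.125, 0.5, 0.375, 0.125, 0.25, 0.5, 0.5]
dense_desc = [3.0, 3.0, 4.0, 1.0, 3.0, 2.0, 4.0, 4.0]
old_pct_desc = [0.375, 0.375, 0.5, 0.125, 0.375, 0.25, 0.5, 0.5]

# Assumed fixed behaviour: scale by the largest dense rank, not the group size
new_pct_desc = [r / max(dense_desc) for r in dense_desc]

# 8 rows but only 4 distinct ranks, so every old value is exactly doubled
assert new_pct_desc == [2 * v for v in old_pct_desc]

new_pct_asc = [2 * v for v in old_pct_asc]
print(new_pct_asc)   # [0.25, 0.25, 1.0, 0.75, 0.25, 0.5, 1.0, 1.0]
print(new_pct_desc)  # [0.75, 0.75, 1.0, 0.25, 0.75, 0.5, 1.0, 1.0]
```

The factor of two is specific to this fixture (group size 8, 4 distinct values); other data would scale by a different ratio.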

peterpanmj added a commit to peterpanmj/pandas that referenced this issue May 27, 2018
@jreback jreback modified the milestones: Next Major Release, 0.24.0 May 29, 2018
@jreback jreback modified the milestones: 0.24.0, 0.23.1 May 31, 2018