Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Expose libcudf column round in Python #1270

Closed
beckernick opened this issue Mar 22, 2019 · 11 comments · Fixed by #7022
Closed

[FEA] Expose libcudf column round in Python #1270

beckernick opened this issue Mar 22, 2019 · 11 comments · Fixed by #7022
Assignees
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@beckernick
Copy link
Member

Is your feature request related to a problem? Please describe.
As a user, I'd like to be able to round a Series to a specified number of decimal places, like in Pandas.

Describe the solution you'd like
I'd like to call series.round(n) and round the values in the series to n decimal places.

Describe alternatives you've considered
The alternative is to define a rounding UDF or go to the CPU.

Additional context
Add any other context, code examples, or references to existing implementations about the feature request here.

@beckernick beckernick added Needs Triage Need team to review and classify feature request New feature or request labels Mar 22, 2019
@kkraus14 kkraus14 added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Apr 8, 2019
@mrocklin
Copy link
Collaborator

It looks like the // operator does the right thing. Presumably the functionality exists to do this today.

In [1]: import cudf

In [2]: df = cudf.DataFrame({'x': [1.2345, 6.7890]})

In [3]: print(df)
        x
0  1.2345
1   6.789

In [4]: print(df // 0.01 * 0.01)
      x
0  1.23
1  6.78

In [5]: df.to_pandas().round(2)
Out[5]:
      x
0  1.23
1  6.79

@beckernick
Copy link
Member Author

beckernick commented Apr 23, 2019

Floor division will work often but doesn't resolve the discrepancies shown when you should be rounding up for that place.

@beckernick
Copy link
Member Author

beckernick commented May 15, 2019

Stopgap Python layer functionality for this feature has been merged with #1745. Explicitly backlogging this and defer to your prioritization on the libcudf side, @harrism

@beckernick beckernick changed the title [FEA] Series level round [FEA][BACKLOG] libcudf column round May 15, 2019
@harrism
Copy link
Member

harrism commented May 16, 2019

@beckernick I think the reason that C/C++ doesn't have a round(x, decimals) to round floating point values to a number of decimal points is because binary floating point numbers don't have decimals, by definition. Rounding can lead to unrepresentable values and thus result in less accuracy than the decimals parameter suggests... In C++, people usually round on printing, rather than changing the stored value.

But if you insist...

@beckernick
Copy link
Member Author

@harrism that makes sense to me. As long as the end user has dataframe/series functionality that they expect (which includes rounding since it's in pandas/numpy), I'm happy. Completely defer to you and @kkraus14 as to whether this functionality should exist (or not exist) in libcuDF.

@mrocklin
Copy link
Collaborator

Yeah, there are lots of technical reasons to not round. It's good to remind ourselves that our users aren't technical, and don't care about those reasons :)

@harrism
Copy link
Member

harrism commented May 17, 2019

I wonder how often users round their data and then complain that the results are wrong. However, I think it's unfair to call users of tools like Python and Pandas non-technical...

@beckernick
Copy link
Member Author

I believe this is now ready, based on discussion with @codereport in #6976

@ChrisJar would you be able to explore plumbing cudf::round up to python and removing the old numba.cuda round code?

@harrism harrism removed the libcudf Affects libcudf (C++/CUDA) code. label Dec 10, 2020
@harrism harrism changed the title [FEA][BACKLOG] libcudf column round [FEA][BACKLOG] Expose libcudf column round in Python Dec 10, 2020
@harrism
Copy link
Member

harrism commented Dec 10, 2020

Thanks for digging up this old issue, @beckernick , I forgot about it! Retitled this and relabeled it to reflect that it is a Python feature request now, since the libcudf feature is implemented.

Note that this allows us to remove some Numba code. See #6976 (comment)

@harrism harrism changed the title [FEA][BACKLOG] Expose libcudf column round in Python [FEA] Expose libcudf column round in Python Dec 10, 2020
@ChrisJar
Copy link
Contributor

Yeah, would love to work on this

@kkraus14
Copy link
Collaborator

Yeah, would love to work on this

Thanks!

rapids-bot bot pushed a commit that referenced this issue Jan 20, 2021
This enables round for DataFrames and Series using the libcudf round implementation and removes the old numba round implementation.

Closes #1270

Authors:
  - @ChrisJar

Approvers:
  - Ashwin Srinath (@shwina)
  - Michael Wang (@isVoid)
  - Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
  - GALI PREM SAGAR (@galipremsagar)

URL: #7022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants