-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve GroupBy JIT error handling #13854
Improve GroupBy JIT error handling #13854
Conversation
"apply.\nPlease file an issue " | ||
"requesting support for this feature at cuDF's GitHub page." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we intend to support all possible inputs that can raise this error? It seems like we're asking for a lot of small issues and some of them might even be "wontfix."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great question and worth discussion. From my point of view, the API is trying to convince the user as best as possible that their function is simply being run in a loop over groups, as in the iterative approach. In an ideal world then, every function we support in JIT groupby apply would support the same set of dtypes that the real Series.api
supports. We have big holes in that right now, for example corr
which is int
only and the rest of the functions which don't support int16
, int8
or any unsigned types yet. Each one of these missing overloads is a missing feature IMO and it will take some time to close the gap.
In raising as such I mainly intend to clarify what is going wrong, however in suggesting the issue I was aiming to create a feedback mechanism that might aid in our prioritization. However maybe as you suggest this is clumsy as we'd be waiting for users to file issues we already know exist.
What would you think of creating an umbrella issue that tracks the gap between supported dtypes and linking to it in the error instead, perhaps suggesting a github reply? We could track everything that's wontfix
there. Or, if pointing to github seems unnecessary in its entirety I can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single issue would be better! Maybe you can start it and populate it with known gaps. But generally it’s unusual for cudf to point to GitHub issues in other circumstances such as when calling unsupported keyword arguments from pandas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shwina What are your thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think perhaps pointing a single issue in the error message so folks can provide input as comments is maybe a good compromise? Also it would be a real clickable link in the error message, which is nice. Perhaps something along:
The
<func>()
function is not supported for the input types<...>
by cuDF's JIT engine. To see what's currently supported or request support for a new function, go tohttps://github.com/rapidsai/cudf/issues/42
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty close, couple of small things left that are worth doing though.
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
/ok to test |
@brandon-b-miller could you fix style here? Then we can finalize. |
/ok to test |
/ok to test |
/ok to test |
/ok to test |
/ok to test |
/ok to test |
I think this is ready to merge - any last thoughts anyone? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested simplifying the error message text. Otherwise I don't have strong opinions on this PR and we should merge it to unblock further work on migration to Numba 0.58.
/ok to test |
/merge |
This PR implements general error handling for features that are missing dtype overloads in JIT GroupBy. Before, if a feature was not supported for a certain dtype, a somewhat confusing numba error is raised. With this PR however, a clear error is raised, for instance in the case of missing
float
dtypes for thecorr
aggregation:Now prints