-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windowed rank functions need NA handling #774
Comments
I can't see how Could you provide a use case for why you'd want to rank NA's higher than every other value? |
For
The other rank functions are amenable to a similar workaround. With respect to use cases, it's a question of how to interpret
We can't reasonably impose our view that the data should be in the form of case 1 since as developers we don't know what the data means nor the correct way to interpret |
Oh, I see - I think that |
I agree completely. However, you might consider keeping the default as it is now to be compatible with the |
@romainfrancois does the hybrid evaluator also have implementations of these functions that need to be updated? |
Yes.
The nice thing is that they all share the same implementation, |
However, for all of the functions handled by the Anyhow, what should the interface be for
It looks as if the R versions of the functions did propagate the NA:
So should I mimic those results, or do we need some ways to control NA, ... For |
Oh oops, I'd say that's a bug in my R implementation. The denominator should be the number of non-NAs, not the length. We don't need options to control behave, just ensure that NAs in input are NA in output |
Did the first set of functions: min_rank, percent_rank, dense_rank and cume_dist. Also I think I fixed the R implementation of them. I guess next is |
Each of the windowed rank functions could really use an
NA
handling parameter, similar to what is done in base R'srank()
function with the parameterna.last
.At present it appears that
NA
is always last (highest rank), which is equivalent tona.last=TRUE
and there is no obvious way to change this.Of particular importance is the
na.last=NA
case, where any missing data is not included in the ranking (it gets a rank ofNA
). For me, at least, exclusion of missing values from rankings is the most common desired case.The text was updated successfully, but these errors were encountered: