-
Notifications
You must be signed in to change notification settings - Fork 978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implicit type promotion in fcoalesce and fifelse #4101
Comments
I think it is easy to do for |
@BenoitLondon I think these error messages could be expected as
Luckily you can force missing value to be of a certain type, by using
|
I think that fixing data from the upstream provider is the most appropriate step, it doesn't have to be anything sophisticated. dt = as.data.table(dbGetQuery("select * from users"))
dt[, `:=`(id=as.integer(id), name=as.character(name), rate=as.numeric(rate))]
myifelse(dt, ...) Or a sketch of handling that more generally by a wrapper function having
This way any logical This is really a bug in data delivery layer. The fact that it does not preserve schema information, but build schema based on data it reads. It is fine for a csv or other schema-less format, but then there is csvy which meant to address that gap by carrying schema in csv header. I would close it for now, we can still observe it will be coming back often or not. |
Ok I may have badly expressed myself... I m aware of these solutions, I m just saying they are not user friendly. Dplyr is implementing exactly this and base ifelse always behaved this way. I don't understand why you go the other way... If really you want to keep exact type checking maybe we could have type promoting versions ? tfcoalesce and tfifelse ? |
@BenoitLondon I see that
It might therefore be worth considering to keep consistency with base R's |
|
Was thinking about that, but again base ifelse behaves differently and dplyr is implementing type promotion by default AFAIK. |
|
Yeah for dplyr I was thinking of coalesce |
OTOH we allow |
agreed, especially since that will force consistency on how this happens |
I think we should resist this feature request.
Well it was never advertised as such -- indeed one of its primary features is that it doesn't do type promotion. From NEWS
A similar rationale was presented when hadley introduced You offer some cases where type promotion seems a sensible choice, but I don't why data.table should change, potentially at the cost of consistency, when it is simple to create your own function, tailored to your expectations. Type promotion is always idiosyncratic. |
Well I really love data.table and wish I could use only this package for data manipulation but it was missing three functions for me coalesce ifelse and case_when which are very handy but the handiness is lost if they behave differently from base/dplyr alternatives. As you say, I might be able to implement my versions but it s preferable to be in data.table. it is also a good case for data.table to be more accessible to new users and get more attention. |
@BenoitLondon , you might want to check |
Change is pretty straightforward utilizing str(fifelse(TRUE, NA, 1.0))
# num NA
str(fifelse(FALSE, NA, 1.0))
# num 1 Docs mentioned by @HughParsonage still holds
The question is should we also introduce class promotion? for example promotion to i64 works fine, unless one of the objects is REAL, because then type match but class don't. i64 = bit64::as.integer64
str(fifelse(TRUE, NA, i64(1)))
#integer64 NA
str(fifelse(FALSE, NA, i64(1)))
#integer64 1
str(fifelse(TRUE, 1.5, i64(1)))
#Error in fifelse(TRUE, 1.5, i64(1)) :
# 'yes' has different class than 'no'. Please make sure that both arguments have the same class.
str(fifelse(FALSE, 1.5, i64(1)))
#Error in fifelse(FALSE, 1.5, i64(1)) :
# 'yes' has different class than 'no'. Please make sure that both arguments have the same class. |
The tricky thing about class promotion, as opposed to type promotion is to handle unrelated classes, like how to promote IDate and factor. |
* nafill rework for more robust coerce * initial change for #4101 * nafill simple fill coerce * nafill const works for locf and nocb as well, closes #3594 * fix tests for #4101 * coverage * placeholder for nafill #3992 * use coerceAs in froll * enable disabled test * nafill retain names, tests for fill list * coerceAs gets copyArg so now can return its input * better control of verbose, and better find class for coerceAs * proper verbose to int * verbose changes coverage * rm unused anymore * more precise verbose arg check * memrecycle escape warnings and simple verbose for numcol==0 * coerceAs does not emit extra warning anymore * coerceAs verbose, more tests * use older nanotime api * update error msg * coverage * codecov of coerceAs * catch all verbose * Revert "initial change for #4101" This reverts commit 1fa2069. * Revert "fix tests for #4101" This reverts commit cc2cc0e. * use coerceAs in fcast.c * restore actual fix * ws * incomplete merge * vestigial bad merge * minimize diff * coerceAs throws warning for string->double, and errors on list * comment * example without warning * bad NEWS merge * warning is still needed --------- Co-authored-by: jangorecki <j.gorecki@wit.edu.pl> Co-authored-by: Michael Chirico <michael.chirico@grabtaxi.com>
Not sure I understand how the fcoalesce function is meant to be used but I found it not very user-friendly.
e.g I have a data.table coming from some computation or database and if one column is full of NA it will be of type Logical and then cant be used directly inside fcoalesce.
The text was updated successfully, but these errors were encountered: