Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nafill, setnafill for character, factor and other types #3992

Open
mik3y64 opened this issue Oct 25, 2019 · 11 comments · May be fixed by #4980
Open

nafill, setnafill for character, factor and other types #3992

mik3y64 opened this issue Oct 25, 2019 · 11 comments · May be fixed by #4980
Assignees
Labels
feature request top request One of our most-requested issues

Comments

@mik3y64
Copy link

mik3y64 commented Oct 25, 2019

nafill and setnafill only supports numeric types.

library(data.table)

x = letters[1:10]
x[c(1:2, 5:6, 9:10)] = NA

nafill(x, "locf")
# Error in nafill(x, "locf") : 
# 'x' argument must be numeric type, or list/data.table of numeric types

setnafill(x, "locf")
# Error in setnafill(x, "locf") : 
#   'x' argument is atomic vector, in-place update is supported only for list/data.table

Both zoo::na.locf and tidyr::fill support all data types.

zoo::na.locf(x, na.rm = FALSE)
# [1] NA  NA  "c" "d" "d" "d" "g" "h" "h" "h"

tidyr::fill(tibble::as_tibble(x), value)
# A tibble: 10 x 1
#    value
#    <chr>
#  1 NA   
#  2 NA   
#  3 c    
#  4 d    
#  5 d    
#  6 d    
#  7 g    
#  8 h    
#  9 h    
# 10 h

Feature request for supporting character, factor and other types. It's useful in many cases in merging and cleaning data.tables/data.frames. Unlike matrix, data.frames/data.tables usually contain columns of arbitrary types, and it's very common for data.tables/data.frames to have NA values filled in columns of arbitrary types, especially characters. Right now I have to import the specific functions from other packages just for that functionalities and those functions are much slower than the ones created by data.table.

Thank you for creating this superior data.table package.

@mik3y64 mik3y64 changed the title Feature request: nafill, setnafill for character types, factor and other types Feature request: nafill, setnafill for character, factor and other types Oct 25, 2019
@chinsoon12
Copy link

chinsoon12 commented Oct 25, 2019

Maybe as a temp workaround:

DT <- data.table(X=c("A",NA,NA,"B",NA,"C"))
DT[, X := X[nafill(replace(.I, is.na(X), NA), "locf")]]

and using mik3y64 example

x = letters[1:10]
x[c(1:2, 5:6, 9:10)] = NA
x[nafill(replace(seq_along(x), is.na(x), NA_integer_), "locf")]

@jangorecki
Copy link
Member

Thanks @mik3y64 for the request. It is on the list to do, it was just not the part of the initial functionality. We have to first finish #3765 then the logic here can re-use changes from that PR.
Thanks @chinsoon12 for interesting workaround.

@mik3y64

This comment has been minimized.

@jangorecki jangorecki changed the title Feature request: nafill, setnafill for character, factor and other types nafill, setnafill for character, factor and other types Oct 26, 2019
@tdhock
Copy link
Member

tdhock commented Mar 3, 2020

I agree that character support would be useful. Right now here is a workaround:

library(data.table)
DT=data.table(x.chr=c("foo", NA, "bar"))
DT[, x.fac := factor(x.chr)]
DT[, x.int := as.integer(x.fac)]
setnafill(DT, type="locf", cols="x.int")
levs <- levels(DT$x.fac)
DT[, x.fac2 := factor(x.int, seq_along(levs), levs)]
DT[, x.chr2 := paste(x.fac2)]

@MichaelChirico
Copy link
Member

Extending @tdhock's approach to many character columns:

char_cols = c(...)
DT[ , (char_cols) := lapply(.SD, factor), .SDcols = char_cols]
lev = sapply(char_cols, function(x) levels(DT[[x]]))
DT[ , (char_cols) := lapply(.SD, as.integer), .SDcols = char_cols]
DT[ , (char_cols) := lapply(.SD, nafill, 'locf'), /*by = ...,*/ .SDcols = char_cols]
for (col in char_cols) set(DT, NULL, col, lev[[col]][DT[[col]]])

@jangorecki we don't need #4491 to do nafill for factor right? Just need copy the attributes?

That would take out a lot of the work of the above...

@jangorecki

This comment has been minimized.

@ethanbsmith

This comment has been minimized.

@m-mburu

This comment has been minimized.

@iago-pssjd

This comment has been minimized.

@jangorecki

This comment has been minimized.

@jangorecki jangorecki self-assigned this May 5, 2021
@jangorecki jangorecki linked a pull request May 7, 2021 that will close this issue
@jangorecki
Copy link
Member

jangorecki commented May 7, 2021

I filled #4980 that implements requested functionality. Still WIP.

MichaelChirico added a commit that referenced this issue Mar 4, 2024
* nafill rework for more robust coerce

* initial change for #4101

* nafill simple fill coerce

* nafill const works for locf and nocb as well, closes #3594

* fix tests for #4101

* coverage

* placeholder for nafill #3992

* use coerceAs in froll

* enable disabled test

* nafill retain names, tests for fill list

* coerceAs gets copyArg so now can return its input

* better control of verbose, and better find class for coerceAs

* proper verbose to int

* verbose changes coverage

* rm unused anymore

* more precise verbose arg check

* memrecycle escape warnings and simple verbose for numcol==0

* coerceAs does not emit extra warning anymore

* coerceAs verbose, more tests

* use older nanotime api

* update error msg

* coverage

* codecov of coerceAs

* catch all verbose

* Revert "initial change for #4101"

This reverts commit 1fa2069.

* Revert "fix tests for #4101"

This reverts commit cc2cc0e.

* use coerceAs in fcast.c

* restore actual fix

* ws

* incomplete merge

* vestigial bad merge

* minimize diff

* coerceAs throws warning for string->double, and errors on list

* comment

* example without warning

* bad NEWS merge

* warning is still needed

---------

Co-authored-by: jangorecki <j.gorecki@wit.edu.pl>
Co-authored-by: Michael Chirico <michael.chirico@grabtaxi.com>
@MichaelChirico MichaelChirico added the top request One of our most-requested issues label Apr 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request top request One of our most-requested issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants