Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

as.IDate error that can be fixed #4676

Closed
arunsrinivasan opened this issue Aug 18, 2020 · 14 comments · Fixed by #4692
Closed

as.IDate error that can be fixed #4676

arunsrinivasan opened this issue Aug 18, 2020 · 14 comments · Fixed by #4692

Comments

@arunsrinivasan
Copy link
Member

arunsrinivasan commented Aug 18, 2020

require(data.table) # 1.13.1 from today
x <- c("2020-01-01", "")
y <- c("", "2020-01-01")
as.IDate(x) # works fine
as.IDate(y)
# Error in charToDate(x) : 
#   character string is not in a standard unambiguous format
@MichaelChirico
Copy link
Member

Thanks @arunsrinivasan , this is inherited from Date, so to get around it, we'd need to build our own parser basically:

require(data.table) # 1.13.1 from today
x <- c("2020-01-01", "")
y <- c("", "2020-01-01")
as.Date(x) # works fine
as.Date(y) # same issue

@arunsrinivasan
Copy link
Member Author

I am aware that the issue is in as.Date. I forgot to mention it in the original post. I was thinking the fix could be as simple as replacing all "" with NA before passing it on to as.Date() method.

require(data.table) # 1.13.1 from today
x <- c("2020-01-01", "")
y <- c("", "2020-01-01")
y[y %chin% ""] <- NA_character_
as.IDate(x)
# [1] "2020-01-01" NA
as.IDate(y)
# [1] NA           "2020-01-01"

@MichaelChirico
Copy link
Member

MichaelChirico commented Aug 21, 2020 via email

@eddelbuettel
Copy link
Contributor

eddelbuettel commented Aug 21, 2020

That was my feeling too when I saw Arun's very clever "fudge".

Drop in %in% instead and you have a fix. Now, will "they" take it for base R? Nobody knows...

@arunsrinivasan
Copy link
Member Author

Why not? But I won't be using the recent version of R anytime soon for work. But I would be using a more recent data.table version sooner. So I'd suggest a/this workaround fix until R fixes this.

@arunsrinivasan
Copy link
Member Author

arunsrinivasan commented Aug 21, 2020

BTW if the fix were to go into base R, they just need to add "" to the charToDate internal function in as.Date.character. Here's the relevant part:

# this logic just needs to account for `""`: is.na(xx) ==> xx %in% c(NA_character_, "")
charToDate <- function(x) {
    xx <- x[1L]
    if (is.na(xx)) {
        j <- 1L
        while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
        if (is.na(xx))
            f <- "%Y-%m-%d"
    }
    ....

@MichaelChirico
Copy link
Member

MichaelChirico commented Aug 21, 2020 via email

@arunsrinivasan
Copy link
Member Author

I disagree. We should issue a fix while filing an issue there. Like I said, it takes time to get things into base R and people don't always use the most recent version of R, if you use in production. I'm all up for filing an issue there, but I'd like the issue taken care of in data.table nonetheless.

@MichaelChirico
Copy link
Member

I agree with you about it being easier/more common to update data.table than R, and agree we should make a fix in data.table no matter what, I am just pushing back on doing so immediately.

What I'm trying to prevent is setting ourselves up for a breaking change in a later release:

T0: we submit an issue to Bugzilla & merge our own solution in data.table
T1: release this version of data.table
T2: R disagrees with our solution, but does fix the issue in a different, incompatible way
T3: Because we inherit from/extend Date, we have to patch our original fix [breaking change]

So I think we should at least file the issue first & see if R core has any opinion. If not we just merge & move on.

@MichaelChirico
Copy link
Member

I have submitted a bug to R:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17909

Note the simpler fix:

is.na(x) <- !nzchar(x)

@arunsrinivasan
Copy link
Member Author

T2: R disagrees with our solution, but does fix the issue in a different, incompatible way

Sorry but I'm having a hard time making sense of this. There are only two possible scenarios I can think of.

  1. as.Date(x) is the same as as.Date(x, format="%Y-%m-%d"), in which case it doesn't matter how you get there, at least temporarily until base R accepts your PR or comes up with another.

  2. They keep the error, in which case, we all agree data.table could and should handle this.

It has to be equal to the output of as.Date(x, format=) in a non-error scenario. What other incompatible ways are there?

@mmaechler
Copy link
Contributor

mmaechler commented Sep 3, 2020

I have submitted a bug to R:

https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17909

I've fixed the bug in R now (svn rev 79119 for 'R-devel' -- to be ported to "R 4.0.2 patched" in a few days ("almost surely").

@ColeMiller1
Copy link
Contributor

R fixed this in 4.0.3 released on October 10. Should this be closed as is or should Michael's PR be merged first?

@MichaelChirico
Copy link
Member

MichaelChirico commented Feb 4, 2021 via email

@mattdowle mattdowle added this to the 1.14.1 milestone May 26, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants