-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Column attribute round-tripping is too strict for datetimes #598
Comments
Thanks. Yes, it's unfortunate. I'd like to focus efforts on supporting times with time zones in duckdb so that we don't need this check. Please open a separate issue for the shuffling you mentioned, would it belong in the duckdb-r repo? |
Sorry my mistake! The shuffling actually comes from dplyr's library(duckplyr)
df <- data.frame(datetime = lubridate::now(tzone = "UTC"))
attributes(df$datetime)
#> $class
#> [1] "POSIXct" "POSIXt"
#>
#> $tzone
#> [1] "UTC"
df_sliced <- df |> slice()
attributes(df_sliced$datetime)
#> $tzone
#> [1] "UTC"
#>
#> $class
#> [1] "POSIXct" "POSIXt"
df_duckdb <- as_duckdb_tibble(df_sliced)
#> Error in `duckdb_rel_from_df()`:
#> ! Attributes are lost during conversion. Affected column:
#> `datetime`. |
Thanks, I see. Should this be |
Sure! And just to be precise, in the snippet of code I linked to above, the local variables for my reprex are: df_attrib
#> $class
#> [1] "POSIXct" "POSIXt"
roundtrip_attrib
#> $class
#> [1] "POSIXct" "POSIXt"
#>
#> $tzone
#> [1] "UTC" So If the check should simply test whether any of the original attributes have been dropped or changed (i.e., ignore any new attributes that may have been created), one implementation could be: !(
all(names(df_attrib) %in% names(roundtrip_attrib)) && # all attribute fields preserved
identical(roundtrip_attrib[names(df_attrib)], df_attrib) # values of fields are identical
) But even with the above there's one thorny case of, e.g., df <- data.frame(datetime = as.POSIXct(character(0)))
attributes(df$datetime)
#> $class
#> [1] "POSIXct" "POSIXt"
#>
#> $tzone
#> [1] ""
as_duckdb_tibble(df)
#> Error in `duckdb_rel_from_df()` at duckplyr/R/duckplyr_df.R:20:5:
#> ! Attributes are lost during conversion. Affected column: `datetime`. So perhaps additional work is required to either treat |
A date time value without a set timezone (ex: value from
Sys.time()
) gains a timezone attribute in the duckdb conversion, thus failing the check for attribute round-tripping:It seems to me that the check
!identical(df_attrib, roundtrip_attrib)
is too strict. I'm actually surprised by theidentical()
requirement here, especially when the error message is about attributes being dropped, which is not the case in the reprex.duckplyr/R/relational-duckdb.R
Lines 190 to 196 in 097102f
Relatedly -- and I can open a separate issue about this if needed -- there are also cases where duckdb can shuffle the order of attributes without changing their contents, and I wonder whether the check should let those pass (or raise a warning vs. error).
The text was updated successfully, but these errors were encountered: