Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide rbind() and cbind() methods #909

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from
Draft

Provide rbind() and cbind() methods #909

wants to merge 15 commits into from

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Jul 21, 2021

for R >= 4.0.0.

Closes #34.

@krlmlr krlmlr requested review from DavisVaughan and lionel- July 21, 2021 04:27
R/class-tbl_df.R Outdated

#' @rawNamespace if (getRversion() >= "4.0.0") S3method(cbind, tbl_df)
if (getRversion() >= "4.0.0") cbind.tbl_df <- function(...) {
vec_cbind(...)
Copy link
Member

@DavisVaughan DavisVaughan Jul 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it is reasonable that if users want to use the optional arguments in vec_c/rbind(), then they should just use vec_c/rbind() instead, but I wanted to check here if that is also what you are thinking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Should we do vec_cbind(!!!list(...)) instead, to avoid passing through optional arguments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would work yes. Or should that be list2()? Or would it be weird to use !!! with base generics? I don't really know what to think about these mixed interfaces.

@krlmlr krlmlr marked this pull request as draft July 31, 2021 05:18
@krlmlr
Copy link
Member Author

krlmlr commented Jul 31, 2021

After #890, do we still want to use vec_rbind() here? Should rbind() (and also dplyr::bind_rows() for that matter) call dplyr_row_slice() to support data frame classes with variable-length attributes?

@krlmlr
Copy link
Member Author

krlmlr commented Jul 31, 2021

Tried that, doesn't work: vec_rbind() does much more, e.g. adjustment of factor levels.

I still wonder what the desired semantics for dplyr_reconstruct() are if nrow(x) != nrow(to) .

@krlmlr
Copy link
Member Author

krlmlr commented Jul 31, 2021

Docs say "you need to carefully check and think about it", I guess we're fine here.

@hadley
Copy link
Member

hadley commented Aug 2, 2021

Just started the revdeps with

library(revdepcheck)

pkgs <- unique(c(cran_revdeps("tidyr"), cran_revdeps("tibble"), cran_revdeps("dplyr")))
cloud_check(revdep_packages = pkgs)

@krlmlr
Copy link
Member Author

krlmlr commented Aug 7, 2021

Thanks @hadley!

This shows that:

  1. The following packages are using rbind() or cbind() in a way that is incompatible with vec_rbind() and vec_cbind():
    - dat
    - designr
    - ExpertChoice
    - ggiraph
    - ggiraphExtra
    - groupr
    - heemod
    - neonstore
    - prettyglm
    - RKorAPClient
    - VarBundle
    - visR
    - vlda
    - wpa
  2. All other failing packages might be incompatible with vec_rbind() and vec_cbind(), but they also have failing tests.
  3. We don't know anything yet about the packages we failed to check.

Do we have an easy way to send a message to all affected maintainers?

@hadley
Copy link
Member

hadley commented Aug 16, 2021

Before we notify the maintainers, I think it would be worth digging into 2-5 of the failures in more detail to see if there's a common underlying cause.

@krlmlr
Copy link
Member Author

krlmlr commented Oct 20, 2021

We can find the underlying cause by replacing rbind() with vctrs::vec_rbind() one by one in the packages, same with cbind().

egor rbind() failure has the following underlying reason:

library(tibble)

rbind(tibble(.b = integer()), tibble(a = 1, .b = 2))
#> # A tibble: 1 × 2
#>       a    .b
#>   <dbl> <dbl>
#> 1     1     2
vctrs::vec_rbind(tibble(.b = integer()), tibble(a = 1, .b = 2))
#> # A tibble: 1 × 2
#>      .b     a
#>   <dbl> <dbl>
#> 1     2     1

Created on 2021-10-20 by the reprex package (v2.0.1)

@krlmlr
Copy link
Member Author

krlmlr commented Oct 20, 2021

ypr checks for row names in its tests:

library(tibble)

row.names(rbind(tibble(a = 1), data.frame(a = c(x = 2))))
#> [1] "1" "x"
row.names(vctrs::vec_rbind(tibble(a = 1), data.frame(a = c(x = 2))))
#> [1] "...1" "x"

Created on 2021-10-20 by the reprex package (v2.0.1)

@krlmlr
Copy link
Member Author

krlmlr commented Oct 20, 2021

twoxtwo binds a vector and a tibble:

library(tibble)

base::rbind.data.frame(c("a", "b"), tibble(a = 1, b = 2))
#> # A tibble: 2 × 2
#>   a     b    
#>   <chr> <chr>
#> 1 a     b    
#> 2 1     2
vctrs::vec_rbind(c("a", "b"), tibble(a = 1, b = 2))
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> # A tibble: 2 × 4
#>   ...1  ...2      a     b
#>   <chr> <chr> <dbl> <dbl>
#> 1 a     b        NA    NA
#> 2 <NA>  <NA>      1     2

Created on 2021-10-20 by the reprex package (v2.0.1)

@krlmlr
Copy link
Member Author

krlmlr commented Oct 20, 2021

optimall calls cbind() with a vector and relies on auto-naming:

library(tibble)

x <- 2

base:::cbind.data.frame(tibble(a = 1), x)
#>   a x
#> 1 1 2
vctrs::vec_cbind(tibble(a = 1), x)
#> New names:
#> * `` -> ...2
#> # A tibble: 1 × 2
#>       a  ...2
#>   <dbl> <dbl>
#> 1     1     2

Created on 2021-10-20 by the reprex package (v2.0.1)

@krlmlr
Copy link
Member Author

krlmlr commented Oct 20, 2021

Especially the last issue looks like a serious incompatibility:

library(tibble)

x <- 2

base:::cbind.data.frame(tibble(a = 1), x)
#>   a x
#> 1 1 2
vctrs::vec_cbind(tibble(a = 1), x)
#> New names:
#> * `` -> ...2
#> # A tibble: 1 × 2
#>       a  ...2
#>   <dbl> <dbl>
#> 1     1     2

Created on 2021-10-20 by the reprex package (v2.0.1)

I suggest the following:

  • separate the changes to rbind() from the changes to cbind()
  • support fine-grained enabling and disabling of the new functionality
    • cbind() calls tbl_cbind() which is provided by us and exported, if and only if getOption("tibble.cbind_base") is unset or FALSE
    • otherwise cbind() forwards to NextMethod()
    • in tbl_cbind() we check another option, "tibble.cbind_trace", which prints diagnostics on the input and output of these methods
    • similarly for rbind()
  • depending on the new revdepcheck results, we might need to take action and tweak tbl_rbind() and tbl_cbind()
  • we schedule the change for tibble 4.0.0 and warn the maintainers well ahead of the release

This allows us and the maintainers to easier locate the source of a problem: we can gradually replace cbind() by tbl_cbind() and set option(tibble.cbind_base = TRUE), and/or set options(Tibble.cbind_trace = TRUE) to understand problems.

tbl_rbind(), tbl_cbind() and the options are implemented in this PR.

Do we still want to pursue this path?

@DavisVaughan
Copy link
Member

The story with rbind() and cbind() is growing so complex, I'd almost prefer to just leave them as they are, and advise people to use vec_rbind() and vec_cbind() directly instead. Two reasons I feel this is reasonable:

  • I imagine most data analysis focused users will already be using the tidyverse and bind_rows/cols() directly, so they wouldn't need this
  • I imagine that package developers who might not want to depend on dplyr would be fine depending on vctrs, since tibble already Imports it.

I think the important point is the second one. We now at least have a good answer for people who want to be able to combine tibbles without relying on dplyr (i.e. use vctrs).

@DavisVaughan
Copy link
Member

Seems like you could link to vec_rbind() in the tibble::add_row() documentation. i.e. "If you would like to row bind multiple data frames together, see..."

Similar for vec_cbind() and tibble::add_column().

@krlmlr
Copy link
Member Author

krlmlr commented Oct 22, 2021

Agreed, it's a lot of work for limited gain. Depends on how badly we want to support rbind() and cbind(), i.e. if we gain other advantages by overriding these methods for tibbles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide rbind method
4 participants