tbl_df[, i] shouldn't coerce to vector #587
Would it be sufficient to just delegate to

Yes, exactly. I think we might already have a method for grouped objects.
Ugh, I give up for now. Here are some test cases:

```r
test_that("tbl_df doesn't drop by default", {
  df <- data_frame(x = 1:4)
  expect_equal(df[], df)
  expect_equal(df[1], df)
  expect_equal(df[, 1], df)
  expect_equal(df[1:2, ], df)
})
```

The obvious stuff like @kevinushey's idea doesn't work because of the special weirdness of
Yuck. This is pretty much because of these being different:

Is there any way to differentiate between those two cases in a sensible way?
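For concreteness, here is a small base-R sketch (my illustration, not code from the thread; `f` is a hypothetical stand-in for a `[` method): both `x[i]` and `x[i, ]` arrive with `j` missing, so `missing()` alone cannot separate the two call shapes, but `nargs()` can, because it also counts empty argument slots.

```r
## Stand-in for a `[` method signature. Both call shapes below leave
## j missing; only the supplied-argument count differs.
f <- function(x, i, j) {
  list(n_args = nargs(), j_missing = missing(j))
}

f(1, 2)    # like x[i]:   n_args = 2, j_missing = TRUE
f(1, 2, )  # like x[i, ]: n_args = 3, j_missing = TRUE
```

This is the same trick base R's own `[.data.frame` relies on.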
Here's something gross:

```r
#' @export
`[.tbl_df` <- function(x, i, j, drop = FALSE) {
  ## Needed to distinguish between x[i] and x[i, ]
  num_args <- nargs()

  ## If we have 3 or more arguments, we can use regular dispatch
  if (num_args >= 3L) {
    return(as.tbl(`[.data.frame`(x, i, j, drop)))
  }

  ## Otherwise, dispatch depending on missingness
  if (missing(j)) {
    if (missing(i)) {
      return(x)
    } else {
      return(as.tbl(`[.data.frame`(x, , i, drop)))
    }
  }

  as.tbl(`[.data.frame`(x, i, j, drop))
}
```

It passes these tests:

```r
library(dplyr)
library(testthat)

tbl <- data_frame(x = 1:4)
df <- data.frame(x = 1:4)

check <- function(x, y) {
  expect_equal(as.data.frame(x), y)
}

check(tbl[], df[])
check(tbl[1], df[1])
check(tbl[, 1], df[, 1, drop = FALSE])
check(tbl[1:2, ], df[1:2, , drop = FALSE])
```

Part of me thinks we should just roll our own subsetting methods, though, rather than delegating to
Does your code work with an explicit drop? Reimplementing subsetting for data frames from scratch would be a lot of work :(
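One wrinkle worth noting (a base-R sketch of mine, not from the thread): a raw `nargs()` also counts an explicit `drop` argument, so `x[i, drop = TRUE]` would masquerade as the three-argument `x[i, j]` shape. Subtracting `!missing(drop)`, as base R's `[.data.frame` does internally, corrects the count:

```r
## nargs() counts every supplied argument, including drop = ...,
## so an explicit drop would push a one-index call into the wrong branch.
f <- function(x, i, j, drop = FALSE) {
  c(raw = nargs(), adjusted = nargs() - !missing(drop))
}

f(1, 2)               # raw = 2, adjusted = 2
f(1, 2, drop = TRUE)  # raw = 3, adjusted = 2 (still the x[i] shape)
```

This is exactly the adjustment that shows up in iteration two below as `nargs() - !missing(drop)`.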
Iteration two (maybe I'll put this in a PR if this seems like the route we want to go):

```r
maybe_tbl_df <- function(object) {
  if (is.data.frame(object)) {
    tbl_df(object)
  } else {
    object
  }
}

#' @export
`[.tbl_df` <- function(x, i, j, drop = FALSE) {
  ## Don't let an explicit drop argument inflate the argument count
  num_args <- nargs() - !missing(drop)

  if (num_args >= 3L) {
    return(maybe_tbl_df(`[.data.frame`(x, i, j, drop)))
  }

  if (missing(j)) {
    if (missing(i)) {
      return(x)
    } else {
      return(maybe_tbl_df(`[.data.frame`(x, , i, drop)))
    }
  }

  maybe_tbl_df(`[.data.frame`(x, i, j, drop))
}
```

It passes:

```r
library(dplyr)
library(testthat)

tbl <- data_frame(x = 1:4)
df <- data.frame(x = 1:4)

check <- function(x, y) {
  expect_equal(as.data.frame(x), y)
}

check(tbl, df)
check(tbl[], df[])
check(tbl[1], df[1])
check(tbl[, 1], df[, 1, drop = FALSE])
check(tbl[1:2, ], df[1:2, , drop = FALSE])
check(tbl[drop = FALSE], df[drop = FALSE])
check(tbl[1, drop = FALSE], df[1, drop = FALSE])
check(tbl[, 1, drop = FALSE], df[, 1, drop = FALSE])
check(tbl[1:2, , drop = FALSE], df[1:2, , drop = FALSE])
check(tbl[drop = TRUE], df[drop = TRUE])
check(tbl[1, drop = TRUE], df[1, drop = TRUE])
expect_equal(tbl[, 1, drop = TRUE], df[, 1, drop = TRUE])     ## because we don't want as.df
expect_equal(tbl[1:2, , drop = TRUE], df[1:2, , drop = TRUE]) ## don't want as.df
```
One more test case:
Works, but with the different default for

```
> tbl[,]
Source: local data frame [4 x 1]

  x
1 1
2 2
3 3
4 4

> df[,]
[1] 1 2 3 4

> df[, drop = FALSE]
  x
1 1
2 2
3 3
4 4
Warning message:
In `[.data.frame`(df, , drop = FALSE) : 'drop' argument will be ignored

> df[,, drop = FALSE]
  x
1 1
2 2
3 3
4 4
```

I was forced to laugh a bit at the warning...

The last one really should be
May I ask why this is a desirable behavior? If

Here's an example that used to be fine with

```r
FUN <- function(x, y) {
  if (is.data.frame(x)) stop("supply vector")
  cor(x, y)
}

x <- tbl_df(mtcars)
FUN(x[, "mpg"], x[, "hp"])
```

Now to get the same effect the user would have to use

It's likely people use a similar check of a formal throughout many R add-on packages. This means

```r
x <- tbl_df(mtcars)
paste(x[, 1], collapse = "|")
## vs.
paste(mtcars[, 1], collapse = "|")
```

Similarly, this means S3-classed functions that are searching for a vector of some sort, if you index as

```r
x <- tbl_df(mtcars)

FUN2 <- function(x, ...) {
  UseMethod("FUN2")
}

FUN2.numeric <- function(x, ...) {
  message("A vector can't do `cor`")
}

FUN2.data.frame <- function(x, ...) {
  message(if (ncol(x) == 1) warning("You need 2 vectors for a correlation"))
  cor(x)
}

FUN2(x[, 1])
## You need 2 vectors for a correlation
##     mpg
## mpg   1
## Warning message:
## In message(if (ncol(x) == 1) warning("You need 2 vectors for a correlation")) :
##   You need 2 vectors for a correlation

FUN2(mtcars[, 1])
## A vector can't do `cor`
```
This feature is very problematic. It breaks existing code of mine that depends on the

With respect, one of the biggest advantages dplyr has over other approaches, such as those of

Unless there is a very critical reason to depart from the default behavior of data frames, I would respectfully encourage you to reconsider this decision.
@gvfarns, regardless of whether or not we roll back this change, you can always use
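Whatever the suggestion cut off above was, one escape hatch that always works is `[[` (or `$`): for a data frame and its subclasses it extracts the column itself as a vector, independent of how the class's `[` method treats `drop`. A small base-R illustration (mine, with made-up data):

```r
## [[ returns the underlying column vector, never a one-column frame,
## while single-bracket df["mpg"] keeps the data frame structure.
df <- data.frame(mpg = c(21, 22.8, 21.4), hp = c(110, 93, 110))

is.numeric(df[["mpg"]])    # TRUE: a bare vector
is.data.frame(df["mpg"])   # TRUE: still a data frame
cor(df[["mpg"]], df[["hp"]])
```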
If enough people are uncomfortable with this, it could be worth rolling back. Honestly, I think

Subset a data.frame by columns:

and the
Put another way:
So is this decision final, or is a rollback still being considered? Will this code always give different results:

And if so, do you advise using
I'm pretty sure this behaviour is going to stay. Use
Closes tidyverse#587. Closes tidyverse#610