setdiff vs. setequal inconsistency when columns have different types #6114

Closed
sigmaeon opened this issue Dec 2, 2021 · 1 comment · Fixed by #6354
Labels
feature a feature request or enhancement tables 🧮 joins and set operations vctrs ↗️

Comments

sigmaeon commented Dec 2, 2021

It seems like setequal compares column types while setdiff doesn't.

library(dplyr, warn.conflicts = FALSE)

df1 <- data.frame(x = 1)
df2 <- df1

# df1$x and df2$x are both of type `num` (double)
setdiff(df1, df2)  # --> returns a 0-row data.frame
setequal(df1, df2) # --> returns TRUE

df2$x <- as.integer(df2$x)

# df1$x is of type `num` (double) while df2$x is of type `int`
setdiff(df1, df2)  # --> returns a 0-row data.frame
setequal(df1, df2) # --> returns FALSE

I can see how this could actually be the intended behavior.
However, in that case a hint in the docs would be great.
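
A possible workaround for the mismatch described above is to cast both data frames to a common type before comparing them. The following is a minimal sketch, not the maintainers' fix: it assumes the vctrs package is installed, and `setequal_common()` is a hypothetical helper name that does not exist in dplyr.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr): cast both data frames to
# their common type under vctrs coercion rules, then compare as sets.
setequal_common <- function(x, y) {
  cast <- vctrs::vec_cast_common(x = x, y = y)
  dplyr::setequal(cast$x, cast$y)
}

df1 <- data.frame(x = 1)   # double column
df2 <- data.frame(x = 1L)  # integer column

setequal_common(df1, df2)
#> [1] TRUE
```

Note that `vec_cast_common()` raises an error when the columns have no common type (for example double and character), so this sketch only smooths over freely coercible differences such as integer vs. double.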

hadley added the labels "feature" (a feature request or enhancement) and "tables 🧮" (joins and set operations) on Apr 16, 2022
hadley changed the title from "setdiff vs. setequal when comparing dataframes/tibbles with different column type" to "setdiff vs. setequal inconsistency when columns have different types" on Jul 22, 2022
hadley commented Jul 22, 2022

library(dplyr, warn.conflicts = FALSE)

df1 <- tibble(x = 1)
df2 <- tibble(x = 1L)

setdiff(df1, df2)
#> # A tibble: 0 × 1
#> # … with 1 variable: x <dbl>
#> # ℹ Use `colnames()` to see all variable names
setequal(df1, df2)
#> [1] FALSE

Created on 2022-07-22 by the reprex package (v2.0.1)

hadley added a commit that referenced this issue Jul 22, 2022
I don't love that there's so much variation in where the casting happens in these functions, but this seems like the simplest fix. `convert = TRUE` here eventually calls `vec_ptype2(x_i, y_i)`, so it's still using vctrs rules.

Fixes #6114.
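
To illustrate the vctrs rules the commit message refers to, here is a small sketch of what `vec_ptype2()` returns for a pair of columns. It only exercises the public vctrs function; the variable names `x_i` and `y_i` mirror the commit message, and nothing here reproduces dplyr's internal code.

```r
library(vctrs)

x_i <- 1   # double, like df1$x
y_i <- 1L  # integer, like df2$x

# The common type of double and integer is double.
ptype <- vec_ptype2(x_i, y_i)
typeof(ptype)
#> [1] "double"

# Types with no common type raise an error rather than being coerced.
res <- tryCatch(
  vec_ptype2(1, "a"),
  error = function(e) "incompatible"
)
res
#> [1] "incompatible"
```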
hadley added a commit that referenced this issue Jul 22, 2022
* `setequal()` now coerces columns to common type. Fixes #6114.
* `setequal()` ignores duplicates. Fixes #6057.
* Reorganised code to emphasise similarities between functions.
* Refactored tests and increased coverage to 100%.
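
The two behavior changes listed above can be checked with a short sketch. It assumes an installed dplyr that includes these commits (believed to be 1.1.0 or later, which is an assumption rather than something stated in this thread); older versions are expected to return FALSE for both calls.

```r
library(dplyr, warn.conflicts = FALSE)

# Columns differ only in type (double vs. integer).
# Expected to be TRUE once setequal() coerces to a common type.
same_after_coercion <- setequal(tibble(x = 1), tibble(x = 1L))

# Same distinct rows, but x has a duplicated row.
# Expected to be TRUE once setequal() ignores duplicates.
same_ignoring_dups <- setequal(tibble(x = c(1, 1, 2)), tibble(x = c(1, 2)))

print(c(coercion = same_after_coercion, duplicates = same_ignoring_dups))
```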