select is not consistent with dplyr::select when used on data.frame with duplicate column names #92

Open
TimTeaFan opened this issue May 23, 2021 · 2 comments

TimTeaFan commented May 23, 2021

I was playing around with data.frames with duplicate column names and stumbled upon this inconsistency with {dplyr}:

library(dplyr)
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE) 

dat %>% poorman::select(a)
#>   a
#> 1 1

dat %>% dplyr::select(a)
#> Error: Names must be unique.
#> x These names are duplicated:
#>   * "a" at locations 1 and 2.

Created on 2021-05-24 by the reprex package (v0.3.0)

The question is: is {poorman} supposed to be 100% consistent with {dplyr}?

If yes, then poorman::select should throw an error as well.

On the other hand, {poorman}, unlike {dplyr}, might not be bound in the same way to the concept of tidy data, and it would be nice to have a go-to package for dealing with untidy data.frames. In that case, both columns named a should be selected.
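
For illustration, here is a minimal base R sketch of what "select every column with that name" could look like. It is not how {poorman} or {dplyr} work internally; the names idx and both_a are only placeholders for this example.

```r
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE)

# Positions of every column whose name is "a"
idx <- which(names(dat) == "a")

# List-style subsetting by position keeps both columns.
# Base R subsetting may make duplicated names unique along the way,
# so the original names are restored explicitly afterwards.
both_a <- dat[idx]
names(both_a) <- names(dat)[idx]

both_a
```

Selecting by position sidesteps the ambiguity of the name itself, which is why the sketch goes through which() rather than passing "a" directly.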

Regarding mutate, the behavior differs as well:

dat %>% poorman::mutate(c = 4)
#>   a b a.1 c
#> 1 1 2   3 4

dat %>% dplyr::mutate(c = 4)
#> Error: Can't transform a data frame with duplicate names.

It seems that poorman::mutate implicitly applies check.names = TRUE and silently renames the duplicated column. In this case an error might be preferable (or, as an alternative, the column names could be left untouched).

Created on 2021-05-24 by the reprex package (v0.3.0)
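
As a rough illustration of the renaming described above, the same a.1 name can be reproduced in base R alone. This is only a sketch of the suspected mechanism (rebuilding the data.frame with the default check.names = TRUE); I have not confirmed that this is what poorman::mutate does internally. The names rebuilt and kept are placeholders.

```r
dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE)

# make.names(unique = TRUE) is what check.names = TRUE applies to the names
make.names(names(dat), unique = TRUE)
#> [1] "a"   "b"   "a.1"

# Rebuilding with the default check.names = TRUE renames the duplicate
rebuilt <- data.frame(dat, c = 4)
names(rebuilt)

# Passing check.names = FALSE leaves the column names untouched
kept <- data.frame(dat, c = 4, check.names = FALSE)
names(kept)
```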

I didn't consider this to be a "bug", so I opened a blank issue.

@nathaneastwood
Owner

Hi @TimTeaFan, thanks for submitting this issue - it's an interesting one. I would say that given {dplyr} fails in these instances, {poorman} should also fail. My first question is where exactly this fails within {dplyr}: is it {dplyr} itself, {tibble}, or maybe {tidyselect}? Once I know that, I will be better placed to understand where {poorman} should capture and handle this type of issue. I will do some digging and get back to you!
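
One possible shape for such a check, purely as a sketch: a small guard that could run at the top of a verb and stop on duplicated names. The function name assert_unique_names is hypothetical and exists in neither {poorman} nor {dplyr}; the message wording only loosely imitates the {dplyr} error quoted above.

```r
# Hypothetical helper, not part of {poorman} or {dplyr}
assert_unique_names <- function(.data) {
  nms <- names(.data)
  dups <- unique(nms[duplicated(nms)])
  if (length(dups) > 0L) {
    # Collect the positions of each duplicated name for the message
    locs <- vapply(
      dups,
      function(nm) paste(which(nms == nm), collapse = ", "),
      character(1)
    )
    stop(
      "Names must be unique. Duplicated: ",
      paste0('"', dups, '" at locations ', locs, collapse = "; "),
      call. = FALSE
    )
  }
  invisible(.data)
}

dat <- data.frame(a = 1, b = 2, a = 3, check.names = FALSE)

# Capture the error message instead of aborting the script
msg <- tryCatch(
  assert_unique_names(dat),
  error = function(e) conditionMessage(e)
)
msg
```

Because the guard returns its input invisibly when the names are fine, it could be dropped into a pipeline or called first thing inside select and mutate without changing their results.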

TimTeaFan commented May 24, 2021

Regarding dplyr::select, the issue is caused by tidyselect::eval_select. I dug into this a little in this SO answer. Regarding dplyr::mutate, I'm not sure whether this is caused by {tidyselect}.

@nathaneastwood nathaneastwood added this to the 0.2.6 milestone Jun 7, 2021