summarise_at does not correctly process select_helpers #2452

Closed
Feakster opened this issue Feb 20, 2017 · 6 comments
Labels
feature a feature request or enhancement

Comments

@Feakster

I'm currently using an NHANES data cut I've made to develop a teaching unit for people to learn to use dplyr for data manipulation. While doing so, I've noticed that the select_helpers do not work correctly with summarise_at(). For instance, the commands below return a data frame containing the mean of all variables, rather than just those selected by the contains() and starts_with() helpers:

summarise_at(nhanes, contains("lbd"), funs(mean(., na.rm = TRUE)))
summarise_at(nhanes, starts_with("lbd"), funs(mean(., na.rm = TRUE)))

Whereas, what I want is akin to this, which does work, but is not as elegant:

summarise_if(nhanes, grepl("lbd", names(nhanes)), funs(mean(., na.rm = TRUE)))

Ben

@hadley
Member

hadley commented Feb 20, 2017

You forgot the vars()

@hadley hadley closed this as completed Feb 20, 2017
@hadley
Member

hadley commented Feb 20, 2017

But needs a better error message

@hadley hadley reopened this Feb 20, 2017
@hadley hadley added the data frame and feature (a feature request or enhancement) labels Feb 20, 2017
@hadley hadley closed this as completed in 4996dd6 Feb 20, 2017
@Feakster
Author

Hi, I've been playing around with the mutate_at() and summarise_at() commands this morning. The help file for summarise_all() states that both summarise_at() and mutate_at() allow you to select variables using the same name-based select_helpers (presumably in place of vars() as an alternative means of specifying the variables you wish to summarise or mutate). However, neither command gives the anticipated behaviour.

Assuming I have a data frame called nhanes, consisting of multiple numeric variables, two of which are called lbd1 and lbd2, and in which no other variable names contain the substring "lbd"...

The following two mutate_at() commands would be expected to return the same result:

mutate_at(nhanes, vars(lbd1, lbd2), funs(mean(., na.rm = TRUE)))
mutate_at(nhanes, contains("lbd"), funs(mean(., na.rm = TRUE)))

While the following two summarise_at() commands would be expected to return the same result:

summarise_at(nhanes, vars(lbd1, lbd2), funs(mean(., na.rm = TRUE)))
summarise_at(nhanes, contains("lbd"), funs(mean(., na.rm = TRUE)))

However, in both cases the select_helper contains() does not limit the operation to the variables specified. I've noticed that this problem only seems to be present when specifying . within funs() in conjunction with select_helpers (but not in conjunction with vars()).

i.e., both of these commands work identically, as anticipated:

summarise_at(nhanes, vars(lbd1, lbd2), funs(mean))
summarise_at(nhanes, vars(lbd1, lbd2), funs(mean(.)))

... while only the first of these commands works as anticipated:

summarise_at(nhanes, contains("lbd"), funs(mean)) # Performs specified function on only variable names containing "lbd"
summarise_at(nhanes, contains("lbd"), funs(mean(.))) # Performs specified function on all variables!

Ben

@hadley
Member

hadley commented Feb 21, 2017

If you want to use select style, you must use vars(). Have a look at the examples.

@Feakster
Author

If this is the case, could someone please change the wording of the help file for summarise_all()? It currently states:

summarise_at() and mutate_at() allow you to select columns using the same name-based select_helpers as with select().

Ben

@Feakster
Author

Ah... just realised that you need to use the select_helpers inside vars(), not instead of it.
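
For anyone landing here later, a minimal sketch of the working form, with the select helper wrapped in vars(). The small nhanes data frame below is a made-up stand-in (the real data cut is not shown in this thread), and the function is passed as `mean, na.rm = TRUE` rather than through funs(), on the assumption that this spelling is accepted by the dplyr version in use:

```r
library(dplyr)

# Hypothetical stand-in for the nhanes data cut discussed above
nhanes <- data.frame(
  age  = c(30, 45, 60),
  lbd1 = c(1, NA, 3),
  lbd2 = c(10, 20, 30)
)

# Select helper goes inside vars(): only the "lbd" columns are summarised
res <- summarise_at(nhanes, vars(contains("lbd")), mean, na.rm = TRUE)

# Same pattern with mutate_at(): "lbd" columns are replaced by their means,
# all other columns are left untouched
mut <- mutate_at(nhanes, vars(starts_with("lbd")), mean, na.rm = TRUE)
```

With this form, res should contain only lbd1 and lbd2, and age in mut should be unchanged.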

@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018