Use case for mutate(clean = TRUE) #302

Closed
ijlyttle opened this issue Mar 6, 2014 · 5 comments
Labels: feature (a feature request or enhancement)

@ijlyttle
Contributor

ijlyttle commented Mar 6, 2014

Per Hadley's suggestion, this was moved from the manipulatr Google group (link).

Let's say I get a csv file with a bunch of variables named "XPF102", "FGR24D", and so on. There may be twenty of these.

From my data dictionary, I see that "XPF102" is pressure in psi, "FGR24D" is a contaminant concentration in parts per million, and so on.

I have used plyr::summarise() to do two things at once (this could be what you are trying to get away from in dplyr):

data_new <- plyr::summarise(
  data_old,
  pressure = XPF102 * 6894.75729,  # convert to Pa from psi (I know a function is more appropriate here)
  concentration_contaminant = FGR24D / 1.e6, # convert to proportion from parts-per-million
  ...
)

Using dplyr, I might have to do this:

data_new <- 
  data_old %.%
  mutate(
    pressure = XPF102 * 6894.75729,
    concentration_contaminant = FGR24D / 1.e6, 
    ...
  ) %.%
  select(
    pressure,
    concentration_contaminant,
    ...
  )

Doing this, I have to write every new variable name twice, once in mutate() and again in select(). That violates DRY and gives me an opportunity to mistype a name.

Hadley suggested:

Maybe an option to mutate like clean = TRUE?

I like the idea. My only (relatively uneducated) concern is if the user wants to name a variable clean, but I'm sure there is a clever way to avoid that.

Thanks,

Ian
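The naming concern above can be made concrete with a small sketch. In dplyr as released, every named argument passed through `...` to mutate() is treated as a new column, so `clean = TRUE` already has a meaning. The data frame and its values below are invented for illustration.

```r
# Toy data frame; the values are made up for illustration.
df <- data.frame(XPF102 = c(14.7, 30))

# `clean = TRUE` is handled like any other name-value pair:
# it adds a logical column called `clean` instead of acting as an option.
out <- dplyr::mutate(df, clean = TRUE)

names(out)
```

An option with that name would therefore collide with a legitimate column name, which is one argument for a separate verb.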

@hadley
Member

hadley commented Jul 28, 2014

I'm now slightly leaning towards a new verb called transmute().
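For readers following along, here is a minimal sketch of how the original use case looks with transmute() as it later shipped in dplyr. The `data_old` data frame and its values are hypothetical stand-ins for the csv file described in the first comment.

```r
library(dplyr)

# Hypothetical stand-in for the csv data; values are invented.
data_old <- data.frame(
  XPF102 = c(14.7, 30.0),  # pressure in psi
  FGR24D = c(120, 450)     # contaminant in parts per million
)

# transmute() computes the new variables and drops all others,
# so each new name is written only once.
data_new <- transmute(
  data_old,
  pressure = XPF102 * 6894.75729,           # convert psi to Pa
  concentration_contaminant = FGR24D / 1e6  # convert ppm to a proportion
)

names(data_new)
```

Only the two new columns remain, so the separate select() step is no longer needed.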

@ijlyttle
Contributor Author

FWIW, I like it.

hadley modified the milestones: 0.3.1, 0.3 Aug 1, 2014
hadley self-assigned this Aug 1, 2014
hadley closed this as completed in 92690cf Aug 1, 2014
@piccolbo

piccolbo commented Aug 1, 2014

When I picked transmute for plyrmr, I thought the name was odd enough that nobody else would want it. I have no problem with transmute popping up in dplyr, quite the opposite. The only problem is that transmute in plyrmr is uber-general and allows you to do things like multi-row summaries (e.g. quantiles) or expansions, like splitting a line of text into words. It evaluates the ... arguments in an expanded environment and binds them together in a data frame (vectors, data.frames and lists are all allowed), applies fractional recycling like cbind does, and returns the result.

The right name for this was probably transform, but that was taken for a much more constrained operation. So I went with transmute, but now you need that name. Fine. So what do you suggest? I am willing to pick anything that suggests absolute freedom in assembling the result, is in the dictionary, and won't appear in dplyr in the next century.

@hadley
Member

hadley commented Aug 1, 2014

It might be ok - what does the signature of plyrmr::transmute look like? We might be able to share the same generic.

@piccolbo

piccolbo commented Aug 1, 2014

function(.data, ..., .cbind = FALSE, .columns = NULL, .envir = parent.frame())

The problem, I think, is more the semantic difference. I know we had this discussion before, but it's not settled. Use case:

mtcars %>% transmute(quantile(mpg), quantile(hp))
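To show the shape of result this use case asks for, here is a rough base R sketch. It is not plyrmr's implementation, and the column names `mpg` and `hp` are chosen here for readability. The semantic gap is that dplyr's transmute(), like mutate(), expects each expression to yield one value per input row (or a single value), whereas this summary yields five.

```r
# Evaluate each expression against the columns of mtcars and bind the
# results column-wise. quantile() returns five values by default
# (0%, 25%, 50%, 75%, 100%), so the result has five rows, not nrow(mtcars).
res <- with(mtcars, data.frame(mpg = quantile(mpg), hp = quantile(hp)))

dim(res)
```

The output has 5 rows and 2 columns, which is neither a per-row transformation nor a one-row summary.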

krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018