Conditionally mutate selected rows #4050
Comments
We can fake it already, but overwriting would be a tad faster:
library(tidyverse)
df <- tibble(a = 1:5)
df
#> # A tibble: 5 x 1
#> a
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
if_flag <- function(quo, name) {
  rlang::quo_set_expr(
    quo,
    expr(if (.flag[1]) !!rlang::quo_get_expr(quo) else !!rlang::sym(name))
  )
}
mutate_if_row <- function(.data, cond, ...) {
  cond <- rlang::enquo(cond)
  quos <- rlang::quos(...)
  quos <- map2(quos, names(quos), if_flag)
  .data %>%
    group_by(.flag = !!cond) %>%
    mutate(!!!quos) %>%
    ungroup() %>%
    select(-.flag)
}
df %>%
  mutate_if_row(a > 3, a = a + 1L)
#> # A tibble: 5 x 1
#> a
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 5
#> 5 6
Created on 2018-12-21 by the reprex package (v0.2.1.9000)
Also not that trivial to implement. We can only realistically do that when R says the object has only one reference. This, to me, looks like modify by reference, à la data.table, and is out of scope for dplyr. This sounds like a use case for case_when():
library(dplyr)
df <- tibble(a = 1:5)
df %>%
  mutate(a = case_when(
    a > 3 ~ a + 1L,
    TRUE ~ a
  ))
#> # A tibble: 5 x 1
#> a
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 5
#> 5 6
Created on 2018-12-21 by the reprex package (v0.2.1.9000)
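For several columns at once, the same two-branch idea can be sketched with if_else() inside across(). This is only an illustration, assuming dplyr >= 1.0.0; the temporary column name `cond` is made up for the example and is not part of any dplyr API.

```r
library(dplyr)

d <- tibble(x = 1:4, y = 1:4)

# Compute the condition once, reuse it for every selected column,
# then drop the temporary column again.
res <- d %>%
  mutate(
    cond = x < 3,                               # hypothetical helper column
    across(c(x, y), ~ if_else(cond, -.x, .x))   # change only the flagged rows
  ) %>%
  select(-cond)

res
```

Because `cond` is stored before `x` is overwritten, both columns are updated against the original values of `x`.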
I see your point, we need to copy anyway, even if R says it has only one copy. Copying via Maybe something to consider for 0.9.0?
I see, I think I've been confused by the
library(rlang)
library(dplyr)
library(purrr)
mutate_when <- function(data, condition, ...) {
  condition <- enquo(condition)
  dots <- exprs(...)
  expressions <- map2(dots, syms(names(dots)), ~ {
    quo(case_when(..condition.. ~ !!.x, TRUE ~ !!.y))
  })
  data %>%
    mutate(..condition.. = !!condition) %>%
    mutate(!!!expressions) %>%
    select(-..condition..)
}
d <- tibble(x = 1:4, y = 1:4)
mutate_when(
  d, x < 3,
  x = -x,
  y = -y
)
#> # A tibble: 4 x 2
#> x y
#> <int> <int>
#> 1 -1 -1
#> 2 -2 -2
#> 3 3 3
#> 4 4 4
Created on 2019-01-29 by the reprex package (v0.2.1.9000)
Here are some approaches using data frame returns:
library(dplyr)
d <- tibble( x = 1:4, y = 1:4)
# using data frame returns
d %>%
  mutate({
    test <- x < 4
    x[test] <- -x[test]
    y[test] <- -y[test]
    data.frame(x = x, y = y)
  })
#> # A tibble: 4 x 2
#> x y
#> <int> <int>
#> 1 -1 -1
#> 2 -2 -2
#> 3 -3 -3
#> 4 4 4
If we want to do the same thing to a selected set of columns, we can use across():
# using across()
d %>%
  mutate({
    test <- x < 4
    across(c(x, y), ~ {.x[test] <- -.x[test]; .x })
  })
#> # A tibble: 4 x 2
#> x y
#> <int> <int>
#> 1 -1 -1
#> 2 -2 -2
#> 3 -3 -3
#> 4 4 4
And we can further abstract, e.g.
negate_if <- function(condition, cols) {
  across({{ cols }}, ~ {
    .x[condition] <- -.x[condition]
    .x
  })
}
d %>%
  mutate(negate_if(x < 4, c(x, y)))
#> # A tibble: 4 x 2
#> x y
#> <int> <int>
#> 1 -1 -1
#> 2 -2 -2
#> 3 -3 -3
#> 4 4 4
Now if we want to do arbitrary mutations, e.g.
mutate_when <- function(.data, when, ...) {
  dots <- enquos(...)
  names <- names(dots)
  mutate(.data, {
    test <- {{ when }}
    changed <- data.frame(!!!dots)
    out <- across(all_of(names))
    # assuming `changed` and `out` have the same data frame type
    out[test, ] <- changed[test, ]
    out
  })
}
mutate_when(d, x < 4, x = -x, y = -y)
#> # A tibble: 4 x 2
#> x y
#> <int> <int>
#> 1 -1 -1
#> 2 -2 -2
#> 3 -3 -3
#> 4 4 4
Created on 2021-04-21 by the reprex package (v0.3.0)
This all feels like things we can do with the tools available, perhaps in some other package?
I just wanted to mention In my particular use case, I am creating code which creates output based upon a flow chart. My intended end users are less familiar with R, and I don't want them to get overwhelmed by the sheer volume of repetitive code. IMO, this Because I am naive and new, I thought something like this might work...
Thanks @romainfrancois for posting this function.
We gave a serious attempt at this in #6313 for dplyr 1.1.0, but ultimately decided not to add it in that release. We aren't convinced that it is an operation that would be heavily used, as the main example usage we could come up with was replacing missing values, i.e.:
mutate(df, x = 0, .when = is.na(x))
We can't think of many examples beyond this one where this would be very useful. Here are a few notes we should consider in the future when thinking about this:
We have to think about how useful this function is in light of the fact that we now have the ability to create type stable
mutate(
  x = case_match(x, NA ~ 0, .ptype = x, .default = x)
)
And that could be wrapped into a
mutate(
  if_else(
    is.na(x) | is.na(y),
    tibble(x = 0, y = 0),
    tibble(x = x, y = y)
  )
)
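To make the last snippet concrete, here is a small runnable sketch of the data-frame form of if_else(). The tibble `df` and its values are made up for illustration, and the sketch assumes dplyr >= 1.1.0, where if_else() is built on vctrs and accepts data frames for `true` and `false`.

```r
library(dplyr)

# Hypothetical data: each of x and y is missing in one row.
df <- tibble(x = c(1, NA, 3), y = c(NA, 5, 6))

# Replace x and y together wherever either one is missing.
# The unnamed data frame returned by if_else() is unpacked
# into the columns x and y by mutate().
res <- df %>%
  mutate(
    if_else(
      is.na(x) | is.na(y),
      tibble(x = 0, y = 0),
      tibble(x = x, y = y)
    )
  )

res
```

The one-row tibble in the `true` branch is recycled to the size of the condition, so no explicit rep() is needed.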
A nice little alternative to
replace_at <- function(x, i, value) {
  size <- vctrs::vec_size(x)
  i <- vctrs::vec_as_location(i = i, n = size, missing = "remove")
  # recycle up to size of x
  value <- vctrs::vec_recycle(value, size, x_arg = "value")
  # slice down to locations selected by i
  value <- vctrs::vec_slice(value, i)
  vctrs::vec_assign(x, i, value)
}
# with a vector the same size as x
mutate(
  flights,
  dep_delay = replace_at(dep_delay, dep_time > 500, -dep_delay)
)
# with a value
mutate(
  flights,
  dep_delay = replace_at(dep_delay, dep_time > 500, NA)
)
# at integer locations in x
mutate(
  flights,
  dep_delay = replace_at(dep_delay, c(5, 3), NA)
)
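For anyone who wants to try this without the flights data (which comes from nycflights13), here is a self-contained sketch on a tiny made-up tibble. The function body is copied from the comment above. The column names `time` and `delay` are hypothetical, and `missing = "remove"` is assumed to need a reasonably recent vctrs (0.6.0 or later, as far as I know).

```r
library(dplyr)

# Copy of replace_at() from the comment above, so this block runs on its own.
replace_at <- function(x, i, value) {
  size <- vctrs::vec_size(x)
  # convert a logical or integer index to integer locations, dropping NA
  i <- vctrs::vec_as_location(i = i, n = size, missing = "remove")
  # recycle up to size of x
  value <- vctrs::vec_recycle(value, size, x_arg = "value")
  # slice down to locations selected by i
  value <- vctrs::vec_slice(value, i)
  vctrs::vec_assign(x, i, value)
}

# Hypothetical data with a missing value in the condition column.
d <- tibble(time = c(400L, 600L, NA, 700L), delay = c(10, -5, 3, 20))

# Flip the sign of delay where time > 500; the NA row is left untouched.
res_vec <- mutate(d, delay = replace_at(delay, time > 500, -delay))

# Replace with a single value at the same rows.
res_na <- mutate(d, delay = replace_at(delay, time > 500, NA))
```

Rows where the condition is NA are skipped rather than turned into NA, which is the difference from a plain if_else() on the same condition.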
How about
mutate(flights, replace_at(dep_time > 500, dep_delay = -dep_delay))
with
That can't be written as a standalone function IIUC. My hope was that we could figure out something that works outside of dplyr too.
I'm thinking about something along the following lines:
options(conflicts.policy = list(warn = FALSE))
library(rlang)
library(vctrs)
library(tibble)
library(dplyr)
library(purrr)
replace_at <- function(where, ..., .envir = parent.frame()) {
  replacement <- tibble(...)
  orig_names <- names(replacement)
  orig_values <- as_tibble(map(set_names(orig_names), get0, .envir))
  vec_assign(orig_values, where, replacement)
}
foo <- 1:3
replace_at(2, foo = 5)
#> # A tibble: 3 × 1
#> foo
#> <int>
#> 1 1
#> 2 5
#> 3 3
tibble(foo) |>
  mutate(replace_at(2, foo = 5))
#> # A tibble: 3 × 1
#> foo
#> <int>
#> 1 1
#> 2 5
#> 3 3
Created on 2023-11-03 with reprex v2.0.2
tidygraph now has a
This would allow supporting an efficient mutate_if_row() verb here or elsewhere (assuming there's also a nice way to set the group data, as implemented in update_group_data() here). I remember a discussion about using the group data for other exciting things such as bootstrapping?
In the example below, the first three rows should remain unchanged.