Feature Request: Allow across() for column selection in complete() #1523

dereksonderegger · 2023-10-04T18:28:56Z

Because complete() can't act on grouping variables, we need a way to work around this when necessary. One solution is to pull the grouping variables via group_vars(), then ungroup() the data and supply those strings to complete(). However, complete() does not accept string inputs, either directly via all_of() or any_of(), or indirectly via across(), as in across(all_of(my_grp_vars)). The other, more obscure tricks that I've had success with in other scenarios (e.g. {{ }}, !!ensym(), or !!enquo()) also don't work.

library(tidyverse)

### A simple toy dataset
df <- data.frame(
  group1=factor(c('A','B'), levels=c('A','B','C')),
  group2=factor(c('W','X'), levels=c('W','X')),
  value = 1:2)

### What we actually want
df |> 
  complete(group1, group2)
#> # A tibble: 6 × 3
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA

### The grouping variables we want.
by = c('group1','group2')

### Unfortunately any_of() can't be used directly (a common limitation), and
### across() can't be used inside complete() (or expand(), for that matter) either
df |>
  complete( across(any_of(by)) )
#> Error in `across()`:
#> ! Must only be used inside data-masking verbs like `mutate()`,
#>   `filter()`, and `group_by()`.
#> Backtrace:
#>      ▆
#>   1. ├─tidyr::complete(df, across(any_of(by)))
#>   2. ├─tidyr:::complete.data.frame(df, across(any_of(by)))
#>   3. │ ├─tidyr::expand(data, ...)
#>   4. │ └─tidyr:::expand.data.frame(data, ...)
#>   5. │   └─tidyr:::grid_dots(..., `_data` = data)
#>   6. │     └─rlang::eval_tidy(dot, data = mask)
#>   7. └─dplyr::across(any_of(by))
#>   8.   └─dplyr:::peek_mask()
#>   9.     └─dplyr:::context_peek(...)
#>  10.       ├─context_peek_bare(name) %||% ...
#>  11.       └─rlang::abort(glue("Must only be used inside {location}."), call = call)
Created on 2023-10-04 with [reprex v2.0.2](https://reprex.tidyverse.org/)

dereksonderegger · 2023-10-04T20:41:41Z

Just for completeness, and to help anyone else who has the same problem, here is my workaround. The difficulty is that I have a vector of grouping variables whose length could be arbitrary, and ensym() and ensyms() don't play nicely with vectors.

library(tidyverse)

### A simple toy dataset
df <- data.frame(
  group1=factor(c('A','B'), levels=c('A','B','C')),
  group2=factor(c('W','X'), levels=c('W','X')),
  value = 1:2)

### What we actually want
df |> 
  complete(group1, group2)
#> # A tibble: 6 × 3
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA


### The grouping variables we want.
by_vars = c('group1','group2')


my_complete <- function(data, vars){
  out <- data
  for(i in seq_along(vars)){
    var <- vars[i]  # because ensym() only accepts simple symbols
    out <- out |>
      complete( !!ensym(var) ) |>
      group_by( !!ensym(var), .add=TRUE )
  }
  out <- out |> drop_na(all_of(vars))
  return(out)
}

df |> my_complete(by_vars)
#> # A tibble: 6 × 3
#> # Groups:   group1, group2 [6]
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA

JakeRuss · 2023-10-05T13:09:45Z

Not to detract from the request, but the following might be a simpler fix in the interim:

library(tidyverse)

df <- data.frame(
  group1=factor(c('A','B'), levels=c('A','B','C')),
  group2=factor(c('W','X'), levels=c('W','X')),
  value = 1:2)

vector <- c('group1','group2')

df |> complete(!!! rlang::parse_exprs(vector))
#> # A tibble: 6 × 3
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA

Created on 2023-10-05 with reprex v2.0.2
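
One caveat with this interim fix, sketched here under an assumption that is not part of the original example: rlang::parse_exprs() parses each string as R code, so a column whose name is not syntactically valid (the hypothetical `my group` below) fails to parse. Building symbols straight from the strings with rlang::syms() avoids the parsing step.

```r
library(rlang)

# Hypothetical column name containing a space (not in the toy dataset above)
nm <- "my group"

# parse_exprs() treats the string as code, so "my group" is a syntax error;
# tryCatch() turns that error into NULL so the script keeps running
parsed <- tryCatch(parse_exprs(nm), error = function(e) NULL)
is.null(parsed)

# syms() converts each string to a symbol without parsing it
as_symbols <- syms(nm)
identical(as_symbols[[1]], as.name("my group"))
```

For ordinary column names such as group1 and group2, both approaches produce the same symbols.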

hadley · 2023-11-01T18:48:38Z

@JakeRuss df |> complete(!!!syms(vector)) is a slightly simpler way of doing the same thing, but it would be nice if complete() did have some way of selecting multiple variables.
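
As a minimal sketch of how this could be combined with the group_vars() and ungroup() steps from the original post: the helper name complete_groups() is hypothetical and not part of tidyr, and regrouping the result afterwards is an assumption about what the caller wants.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not a tidyr function): complete a grouped data frame
# over its own grouping variables, then restore the grouping.
complete_groups <- function(data) {
  grp <- group_vars(data)               # grouping columns as a character vector
  data |>
    ungroup() |>                        # complete() can't act on grouping variables
    complete(!!!rlang::syms(grp)) |>    # splice the names in as symbols
    group_by(!!!rlang::syms(grp))       # assumption: caller wants the groups back
}

# Same toy dataset as in the original post
df <- data.frame(
  group1 = factor(c("A", "B"), levels = c("A", "B", "C")),
  group2 = factor(c("W", "X"), levels = c("W", "X")),
  value  = 1:2
)

res <- df |>
  group_by(group1, group2) |>
  complete_groups()
```

On the toy dataset, res should hold the same six rows as the complete(group1, group2) call at the top of the thread, grouped by group1 and group2.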

hadley · 2023-11-01T20:58:34Z

Duplicate of #1397

hadley marked this as a duplicate of #1397 Nov 1, 2023

hadley closed this as completed Nov 1, 2023
