Feature Request: Allow across() for column selection in complete() #1523

Closed
dereksonderegger opened this issue Oct 4, 2023 · 4 comments
@dereksonderegger

dereksonderegger commented Oct 4, 2023

Because complete() can't act on grouping variables, we need a way to work around this when necessary. One solution is to pull the grouping variables with group_vars(), ungroup() the data, and supply those names to complete() as strings. However, complete() does not accept string input, either directly via all_of() or any_of(), or indirectly via across(), as in across(all_of(my_grp_vars)). The other tricks that have worked for me in similar scenarios (e.g. {{ }}, !!ensym(), or !!enquo()) don't work here either.

library(tidyverse)

### A simple toy dataset
df <- data.frame(
  group1=factor(c('A','B'), levels=c('A','B','C')),
  group2=factor(c('W','X'), levels=c('W','X')),
  value = 1:2)

### What we actually want
df |> 
  complete(group1, group2)
#> # A tibble: 6 × 3
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA

### The grouping variables we want.
by = c('group1','group2')

### As is common, any_of() can't be used directly; unfortunately, the usual
### across() workaround also fails inside complete() (and expand(), for that matter)
df |>
  complete( across(any_of(by)) )
#> Error in `across()`:
#> ! Must only be used inside data-masking verbs like `mutate()`,
#>   `filter()`, and `group_by()`.
#> Backtrace:
#>      ▆
#>   1. ├─tidyr::complete(df, across(any_of(by)))
#>   2. ├─tidyr:::complete.data.frame(df, across(any_of(by)))
#>   3. │ ├─tidyr::expand(data, ...)
#>   4. │ └─tidyr:::expand.data.frame(data, ...)
#>   5. │   └─tidyr:::grid_dots(..., `_data` = data)
#>   6. │     └─rlang::eval_tidy(dot, data = mask)
#>   7. └─dplyr::across(any_of(by))
#>   8.   └─dplyr:::peek_mask()
#>   9.     └─dplyr:::context_peek(...)
#>  10.       ├─context_peek_bare(name) %||% ...
#>  11.       └─rlang::abort(glue("Must only be used inside {location}."), call = call)
Created on 2023-10-04 with [reprex v2.0.2](https://reprex.tidyverse.org/)
@dereksonderegger
Author

Just for completeness, and to help anyone else with the same problem, here is my workaround. The difficulty is that I have a vector of grouping variables whose length can be arbitrary, and ensym() and ensyms() don't play nicely with vectors.

library(tidyverse)

### A simple toy dataset
df <- data.frame(
  group1=factor(c('A','B'), levels=c('A','B','C')),
  group2=factor(c('W','X'), levels=c('W','X')),
  value = 1:2)

### What we actually want
df |> 
  complete(group1, group2)
#> # A tibble: 6 × 3
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA


### The grouping variables we want.
by_vars = c('group1','group2')


my_complete <- function(data, vars){
  out <- data
  for(var in vars){  # iterate over the names directly; also safe when vars is empty
    out <- out |>
      complete( !!sym(var) ) |>            # sym() turns the string into a column symbol
      group_by( !!sym(var), .add=TRUE )
  }
  # drop the rows that still have NA in a grouping column
  out <- out |> drop_na(all_of(vars))
  return(out)
}

df |> my_complete(by_vars)
#> # A tibble: 6 × 3
#> # Groups:   group1, group2 [6]
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA

@JakeRuss
Contributor

JakeRuss commented Oct 5, 2023

Not to detract from the request, but the following might be a simpler fix in the interim:

library(tidyverse)

df <- data.frame(
  group1=factor(c('A','B'), levels=c('A','B','C')),
  group2=factor(c('W','X'), levels=c('W','X')),
  value = 1:2)

vector <- c('group1','group2')

df |> complete(!!! rlang::parse_exprs(vector))
#> # A tibble: 6 × 3
#>   group1 group2 value
#>   <fct>  <fct>  <int>
#> 1 A      W          1
#> 2 A      X         NA
#> 3 B      W         NA
#> 4 B      X          2
#> 5 C      W         NA
#> 6 C      X         NA

Created on 2023-10-05 with reprex v2.0.2
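
One caveat with this approach, sketched below: rlang::parse_exprs() parses each string as R code, so a column name that is not syntactically valid cannot be passed this way, whereas rlang::syms() treats each string purely as a name. The column name "my group" is a made-up example, not something from the original data.

```r
library(rlang)

# Hypothetical column names; the second one contains a space
cols <- c("group1", "my group")

# syms() converts each string to a symbol, whatever characters it contains
s <- syms(cols)

# parse_exprs() parses each string as R code, so "my group" is a syntax error
p <- tryCatch(parse_exprs(cols), error = function(e) "parse error")
```

For plain, syntactic column names such as group1 and group2, both functions produce symbols, which is why either works in the example above.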

@hadley
Member

hadley commented Nov 1, 2023

@JakeRuss df |> complete(!!!syms(vector)) is a slightly simpler way of doing the same thing, but it would be nice if complete() did have some way of selecting multiple variables.
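
Combining this with the group_vars() / ungroup() idea from the original post gives a small wrapper for grouped data. This is only a sketch: complete_groups() is a hypothetical helper name, not a tidyr function, and it assumes the data frame has at least one grouping variable.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: complete all combinations of the grouping variables,
# then restore the original grouping
complete_groups <- function(data) {
  grp <- group_vars(data)              # grouping columns as strings
  data |>
    ungroup() |>                       # complete() can't act on grouping columns
    complete(!!!rlang::syms(grp)) |>   # splice the names in as symbols
    group_by(across(all_of(grp)))      # re-apply the grouping
}

df <- data.frame(
  group1 = factor(c('A','B'), levels = c('A','B','C')),
  group2 = factor(c('W','X'), levels = c('W','X')),
  value = 1:2)

res <- df |>
  group_by(group1, group2) |>
  complete_groups()
```

With the toy data from this thread, res has the same six rows as complete(group1, group2) and is still grouped by group1 and group2.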

@hadley
Member

hadley commented Nov 1, 2023

Duplicate of #1397

@hadley hadley marked this as a duplicate of #1397 Nov 1, 2023
@hadley hadley closed this as completed Nov 1, 2023