Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In some cases using summarize_vars and add_overall_col leads to issue . #292

Closed
2 tasks
shajoezhu opened this issue Dec 22, 2021 · 5 comments · Fixed by #983
Closed
2 tasks

In some cases using summarize_vars and add_overall_col leads to issue . #292

shajoezhu opened this issue Dec 22, 2021 · 5 comments · Fixed by #983
Assignees
Labels
bug Something isn't working sme

Comments

@shajoezhu
Copy link
Contributor

shajoezhu commented Dec 22, 2021

Original message

source("https://raw.github.roche.com/NEST/nest_on_bee/master/bee_nest_utils.R")
bee_use_nest(release = "2021_07_07")
library(rtables)
library(tern)

a = data.frame(
  t = factor(c('a','a','a','a','a','b','b','b','b','')),
  a = factor(c(NA,NA, 'b','b','a', 'a','b', 'b', 'a', NA)),
  b = factor(c(NA,NA,'a','a','b', 'a','a','b','a', NA))
)

basic_table() %>%
  split_cols_by("t") %>%
  #add_overall_col("all pts") %>%
  split_rows_by("a") %>%
  summarize_vars("b") %>%
  build_table(a)

using add_overall_level is fine; using analyze instead of summarize_vars is also fine

Reported by @clarkliming

TODO

when using bee_use_nest(release = "2022_01_28")

This gives error as

Error in `[.data.frame`(spl_context, , names(colexprs), drop = FALSE) : 
  undefined columns selected
  • Investigate what is causing the issue, function calls? data? or rtables
  • If issue with rtables, file an issue
@shajoezhu shajoezhu added the bug Something isn't working label Dec 22, 2021
@yli110-stat697 yli110-stat697 self-assigned this May 6, 2022
@yli110-stat697
Copy link
Contributor

yli110-stat697 commented May 6, 2022

I think it is due to the fact that we didn't handle the missing values correctly. If we do df_explict_na on dataframe a, we can get the table rendered.

a = data.frame(
  t = factor(c('a','a','a','a','a','b','b','b','b','')),
  a = factor(c(NA,NA, 'b','b','a', 'a','b', 'b', 'a', NA)),
  b = factor(c(NA,NA,'a','a','b', 'a','a','b','a', NA))
)

basic_table() %>%
  split_cols_by("t") %>%
  add_overall_col("all pts") %>%
  split_rows_by("a") %>%
  summarize_vars("b") %>%
  build_table(df_explicit_na(a))

@shajoezhu
Copy link
Contributor Author

Hi @yli110-stat697 , you are absolutely right, in the helper of summarize_vars function, it does say

#' @describeIn summarize_variables Analyze Function to add a descriptive analyze
#'   layer to `rtables` pipelines. The analysis is applied to a vector and
#'   return the summary, in `rcells`. The ellipsis (`...`) conveys arguments to
#'   [s_summary()], for instance `na.rm = FALSE` if missing data should be
#'   accounted for. When factor variables contains `NA`, it is expected that `NA`
#'   have been conveyed to `na_level` appropriately beforehand with
#'   [df_explicit_na()].

I wonder we should add some assertion here, so the error message can be more informative. @clarkliming FYI, @yli110-stat697 figured out the source of the error

@yli110-stat697
Copy link
Contributor

yli110-stat697 commented May 6, 2022

Hi @shajoezhu , yes I was thinking that we should make the error message more informative, but I'm stuck how we can do that. It sees that currently we can only capturing NA, but @clarkliming 's example has '' which gave the mysterious error message.

For example

a = data.frame(
  t = factor(c('a','a','a','a','a','b','b','b','b', '')),
  a = factor(c(NA,NA, 'b','b','a', 'a','b', 'b', 'a', NA)),
  b = factor(c(NA,NA,'a','a','b', 'a','a','b','a', NA))
)

basic_table() %>%
  split_rows_by("t") %>%
  summarize_vars("b") %>%
  build_table(a)

would give the error message that

Error: NA in x has not been conveyed to na_level, please use explicit factor levels.

but split_cols_by will give the mysterious error message.

basic_table() %>%
    split_cols_by("t") %>%
    summarize_vars("b") %>%
    build_table(a)

For the workflow of tern functions, we have assertions within s_fun to check on the data passed to us, but this particular example seems coming from the empty string when doing split_cols_by.

@shajoezhu
Copy link
Contributor Author

hi @yli110-stat697 , fyi, an issue is raised at rtables relating to this insightsengineering/rtables#321. I am going to block this issue for now.

@kpagacz
Copy link
Contributor

kpagacz commented May 27, 2022

If the function only works properly when data is transformed why not transform it inside the function?

@shajoezhu shajoezhu removed the blocked label Apr 23, 2023
@edelarua edelarua self-assigned this Jun 14, 2023
@edelarua edelarua linked a pull request Jun 22, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working sme
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants