Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify package #498

Closed
wants to merge 19 commits into from
Closed

Simplify package #498

wants to merge 19 commits into from

Conversation

gogonzo
Copy link
Contributor

@gogonzo gogonzo commented Dec 13, 2023

WIP

fixes

This is a proposition to simplify FilteredData as much as possible (for me today). Some information are written in vignette for developers and in R/methods-documentation.R file.

Changes:

  • removed FilteredDataset and FilterStates
  • FIlteredData is the class to keep and manage all states
  • Instead of making rigid class structures, I propose little dependency injection where different datasets can be handled by set of S3 methods (get_code, get_slice_vector, module_add, module_active, module_overview etc.). Thanks to this change there is a possibility to make arbitrary nested objects structure. For example SummarizedExperiment can be standalone dataset and could be nested inside MultiAssayExperiment object. Same could apply to lists, "spatial" objects and other data classes composed of other smalled structures. This can support any nesting dept
  • Approach also allows to override any S3 method and write own - for example to change how ui looks like or replace dplyr::filter call with data.frame. Theoretically, we should export only xyz.default method so everyone can override methods for specific classes (for example data.frame). If we register S3 method for data.frame it will be not possible to override data.frame method. Confirmed here Change dispatch method to S3 instead of custom (target: @132_filter_panel_ui@main) #501
testing app
options(teal.log_level = "TRACE")
library(shiny)
library(cond)
data(airway)
pkgload::load_all("teal.slice")
pkgload::load_all("teal")

data <- teal_data() |>
  within({
    set.seed(1)
    library(scda)
    library(MultiAssayExperiment)
    data(miniACC, envir = environment())
    data(airway, envir = environment())
    ADSL <- synthetic_cdisc_data("latest")$adsl
    ADSL$categorical <- sample(letters[1:3], size = nrow(ADSL), replace = TRUE, prob = c(.1, .3, .6))
    ADTTE <- synthetic_cdisc_data("latest")$adtte
    ADRS <- synthetic_cdisc_data("latest")$adrs
  })

datanames(data) <- c("ADSL", "ADTTE", "ADRS", "miniACC", "airway")
join_keys(data) <- default_cdisc_join_keys[c("ADSL", "ADTTE", "ADRS")]

default_filters <- teal.slice::teal_slices(
  teal_slice(dataname = "ADSL", varname = "categorical", selected = c("a"), id = "categorical", multiple = FALSE),
  teal_slice(id = "SE", title = "Safety-Evaluable", dataname = "ADSL", expr = "SAFFL == 'Y'"),
  teal_slice(dataname = "miniACC", varname = "years_to_birth", selected = c(30, 50), keep_na = TRUE, keep_inf = FALSE),
  teal_slice(dataname = "miniACC", varname = "gender", selected = "female", keep_na = TRUE),
  teal_slice(dataname = "miniACC", varname = "ARRAY_TYPE", selected = "protein_level", experiment = "RPPAArray", arg = "subset"),
  teal_slice(dataname = "miniACC", varname = "Genes", selected = "G6PD", experiment = "RPPAArray", arg = "subset"),
  count_type = "none"
)

app <- init(
  data = data,
  modules = modules(
    modules(
      label = "tab1",
      example_module("funny"),
      example_module("funny2", datanames = "ADTTE") # will limit datanames to ADTTE and ADSL (parent)
    )
  ),
  filter = default_filters
)

runApp(app)

@gogonzo gogonzo added the core label Dec 13, 2023
@gogonzo gogonzo linked an issue Dec 13, 2023 that may be closed by this pull request
@gogonzo gogonzo changed the title WIP Simplify package Dec 20, 2023
@gogonzo gogonzo mentioned this pull request Dec 20, 2023
34 tasks
@@ -575,15 +562,17 @@ FilterState <- R6::R6Class( # nolint
# @description
# Return variable name prefixed by `dataname` to be evaluated as extracted object,
# for example `data$var`
# @param dataname `character(1)` name of a dataset
# @param extract_type `character(1)` type of extraction, one of `character(0)`, `"list"`, `"matrix"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I kind of dislike mixing character(0) with character(1). Usually there is a support of "none" value instead of character(0).
  • If you do the above you can then very easily validate values with match.arg(). Current implementation is rather: "list", "matrix" and everything else. (don't know if this is checked elsewhere - I'm looking at this method only)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know exactly how to call each way of extraction:

  • character(0) means that variables in a subset call are not prefixed (data |> subset(x == "a"))
  • list means that variables are extracted by $ in a subset call (data |> subsetByColData(data$x == "a"))
  • matrix means that variables are extracted by [[ in a subset call (data |> subset(data[, "x"] == "a"))

Somebody has better terms for above cases?

Suggested change
# @param extract_type `character(1)` type of extraction, one of `character(0)`, `"list"`, `"matrix"`
# @param extract_type `character(1)` type of extraction, one of `"nse"`, `"list"`, `"matrix"`

Copy link
Contributor

@chlebowa chlebowa Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps change the name of the argument to var_prefix or something like that and then c("none", "list", "matrix") will be proper options?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps change the name of the argument to var_prefix or something like that and than c("none", "list", "matrix") will be proper options?

Yes exactly.

I have one more thought but I'm not yet super convinced to this. I'm interested in hearing your thoughts on this.
Instead of a string which is then matched to sprintf pattern, we can allow providing sprintf pattern or the whole sprintf call.
Taking this idea further - we are actually talking about an expression and not message / warning / anything else that requires a character type of data. Hance, instead of a sptrintf that produced a char which is then parsed to expression -> move to substitute that returns expression directly without string based step. I mean - operate on the right data type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Note this is an internal class and the user never provides anything, unless you count senior classes as users.
  2. We have gone through operating on strings and language objects throughout teal.slice back in Spring. Trust me, the code to build this as a string is much, much simpler than that required to build expression. This is currently (I believe) the only departure from building expressions from language objects in the package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one more thought but I'm not yet super convinced to this. I'm interested in hearing your thoughts on this.
Instead of a string which is then matched to sprintf pattern, we can allow providing sprintf pattern or the whole sprintf call.

Theoretically you can just provide get_call(varname = "dataname$varname"), but this doesn't make sense, because dataname and varname is an information kept inside of FilterState class. When gathering calls from all FilterState object, parent (method which gathers) doesn't know anything about varname.

Providing sprintf pattern is not enough neither, because:

  1. sprintf("%s$%s", dataname, varname)
  2. sprintf('%s[, "%s"]', dataname, varname)
  3. `sprintf("%s", varname)

As you can see, dataname in first two is a first argument and varname second. In case of (3) dataname is not an argument for sprintf at all and varname is second.

Pattern is hard do handle because different arguments are taken in all cases.

Copy link
Contributor

@chlebowa chlebowa Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so this thread doesn't get buried and forgotten, I suggest to change extract_type to
var_prefix = c("none", "list", "matrix")
which could then be handled with
var_prefix <- match.arg(var_prefix)
and
ans <- switch(var_prefix, ...)

@@ -9,7 +9,7 @@
#' library(shiny)
#' datasets <- teal.slice::init_filtered_data(list(iris = iris, mtcars = mtcars))
#' @export
init_filtered_data <- function(x, join_keys = teal.data::join_keys(), code, check) { # nolint
init_filtered_data <- function(x = list(), join_keys = teal.data::join_keys(), code, check) { # nolint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I don't think x should have a default value. Usually a default value is a signal that a given argument is not obligatory which is not the case.
  • If we are talking about this line - there are code and check argumnets that are both deprecated and without a default value. I think we can safely add default values. Note that missing() used below only checks if a value was explicitely specified in the function call.
foo <- \(x = NULL) {`if`(missing(x), 0, 1)}
foo()
#> [1] 0
foo(10)
#> [1] 1

Created on 2023-12-20 with reprex v2.0.2

@pawelru
Copy link
Contributor

pawelru commented Dec 20, 2023

Can you please provide some more explanation behind new dispatching? I can't stop having a feeling that it's a step back from what we are having currently.

You raised a data.table case which I think should be addressed by a new S3 method for data.table instead of overwriting existing one for data.frame. That should be true for anything else that extends data.frame. I personally don't follow the requirement for nesting in MAE and SE.

@chlebowa
Copy link
Contributor

  • If we register S3 method for data.frame it will be not possible to override data.frame method.

This is debatable and under investigation.

Can you please provide some more explanation behind new dispatching? I can't stop having a feeling that it's a step back from what we are having currently.

Rather than keeping R6 classes for every supported class of dataset, we have FilteredData calling generic functions on datasets (e.g., ui_active, get_code) and S3 methods to support particular classes. It is much easier to add a set of S3 methods than a set of R6 classes and there will be instructions specifying what is strictly required.

You raised a data.table case which I think should be addressed by a new S3 method for data.table instead of overwriting existing one for data.frame.

I think what @gogonzo was not talking about a dataset being a data.table but rather: we use dplyr::filter to apply filtering to data.frame datasets, which is encoded in the data.frame method here, but if someone wants to use data.table filtering instead, one can define their own method for data.frame to do just that.

I personally don't follow the requirement for nesting in MAE and SE.

MAE is composed of experiments and therefore is a nested structure. Experiments can be data.frame, SummarizedExperiment and other classes. This nesting has always been an issue, the difference here is in how it is handled.

By creating standalone methods (method suites, actually) for any class that we expect to be an experiment in MAE we allow them to be standalone datasets themselves. The methods for MAE simply dispatch to methods relevant for particular experiment classes.

@pawelru
Copy link
Contributor

pawelru commented Dec 20, 2023

Thanks @chlebowa for the context.

I agree with you that extending should be done with a new S3 method which is much easier than a new R6 class. I was more referring to the big if statement (that is conceptually more a switch than if) in foo.default that is calling foo_bar and foo_baz instead of having foo.bar and foo.baz directly. The latter is much easier to extend compared to the former one that would require additional value in that switch statement.
I think it's connected to the overwriting S3 methods investigation so I'm looking forward for the outcome.

I think what @gogonzo was not talking about a dataset being a data.table but rather: we use dplyr::filter to apply filtering to data.frame datasets, which is encoded in the data.frame method here, but if someone wants to use data.table filtering instead, one can define their own method for data.frame to do just that.

I think we are in agreement here - this should be a new method. Please just note that data.table way of filtering works only if a filtered object is a data.table. It does not work for a data.frame - the function used is data.table:::[.data.table. Because of above I think this should be a separate foo.data.table S3 method.

@chlebowa
Copy link
Contributor

chlebowa commented Dec 20, 2023

I think it's connected to the overwriting S3 methods investigation so I'm looking forward for the outcome.

Yup, exactly.

Because of above I think this should be a separate foo.data.table S3 method.

One can wish to handle data.frame x by as.data.frame(data.table::as.data.table(x)[... and that would be a method for dataset of class data.frame. If the input dataset is a data.table, yes, a full suite* of methods for that class should be provided.

(*) Since data.table inherits from data.frame, a limited suite can be sufficient.

@gogonzo
Copy link
Contributor Author

gogonzo commented Dec 21, 2023

I personally don't follow the requirement for nesting in MAE and SE.

Currently (on main) SummarizedExperiment, Matrix are handled only as a subitem of MAE. In this PR it doesn't matter if the this is a part of a larger-nested structure or not. It opens possibility to apply filter-panel for objects with any nested structure.

I was more referring to the big if statement (that is conceptually more a switch than if) in foo.default that is calling foo_bar and foo_baz instead of having foo.bar and foo.baz directly. The latter is much easier to extend compared to the former one that would require additional value in that switch statement.

I've checked that foo.bar registered in a package is prioritized before foo.bar outside of the package. Doesn't seem legit, I agree, this is why @averissimo double checks my conclusion. If we register foo.bar in the teal.slice then generic foo would not use foo.bar registered outside of the package. This is why I made foo.default in a weird way. We will confirm soon if it is true.

You raised a data.table case which I think should be addressed by a new S3 method for data.table instead of overwriting existing one for data.frame.

My mistake. I don't know why i wrote data.table. It should be data.frame. And yes, overriding defaults for data.frame should be done through registering a new S3 method for data.table.

Copy link
Contributor

@chlebowa chlebowa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the idea but I think it needs a lot of refinement.

get_filter_call requires exposing the FilterState class, and the FilteredData$state_list member, both of which were hitherto private, and for good reason.
Extending support to a new class would require deep knowledge of the entire FilterPanel, probably more than I have myself. I find a third party writing their own srv_add and srv_active functions unrealistic.

Some specific suggestions for get_filter_call:

  • should take predicates, not FilterStates
  • should use hard-coded subsetting function to build subsetting expression
  • could take dataname as argument
  • perhaps move varname preixing from FilterState$get_call to this function?
    -- keep_na and keep_inf could be tough to handle
    -- perhaps use something akin to tmc::substitute_names?
  • final stages of filter call construction are esoteric
rhs <- as.call(c(str2lang("dplyr::filter"), c(list(dataname_lang), combined_predicate)))
substitute(
  expr = dataname_lang <- rhs,
  env = list(dataname_lang = dataname_lang, rhs = rhs)
) 

-- they will be looked at by people implementing their own methods
-- perhaps introduce a helper function like get_final_call <- function(dataname, predicates(prefixed), FUN)


We can still accept this as a good direction of change even if all the mechanisms remain internal and the Core Dev team takes on responsibility for adding support for all requested classes.

@@ -22,9 +22,9 @@
#' @keywords internal
calls_combine_by <- function(calls, operator) {
checkmate::assert_list(calls)
if (length(calls) > 0L) checkmate::assert_list(calls, types = c("call", "name"))
checkmate::assert_string(operator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could limit operator to valid operators & and | (possibly xor)?

@@ -575,15 +562,17 @@ FilterState <- R6::R6Class( # nolint
# @description
# Return variable name prefixed by `dataname` to be evaluated as extracted object,
# for example `data$var`
# @param dataname `character(1)` name of a dataset
# @param extract_type `character(1)` type of extraction, one of `character(0)`, `"list"`, `"matrix"`
Copy link
Contributor

@chlebowa chlebowa Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so this thread doesn't get buried and forgotten, I suggest to change extract_type to
var_prefix = c("none", "list", "matrix")
which could then be handled with
var_prefix <- match.arg(var_prefix)
and
ans <- switch(var_prefix, ...)

@@ -124,8 +124,10 @@ FilterStateExpr <- R6::R6Class( # nolint
#' for selected variable type.
#' Method is using internal reactive values which makes it reactive
#' and must be executed in reactive or isolated context.
#' @param extract_type (`character(0)`, `character(1)`)\cr
#' ignored
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' ignored
#' for compatibility with `FilterState$get_call`; ignored

Comment on lines +8 to +13
#' @param id (`character`) Module id
#' @param data (`object`) Object of any class
#' @param data_filtered (`object`) Object of any class but must be the same class as `data`
#' @param dataname (`character`) Name of the dataset
#' @param filtered_data (`FilteredData`) Object of class `FilteredData`.
#' @param states_list (`list`) List of `FilterState` objects
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @param id (`character`) Module id
#' @param data (`object`) Object of any class
#' @param data_filtered (`object`) Object of any class but must be the same class as `data`
#' @param dataname (`character`) Name of the dataset
#' @param filtered_data (`FilteredData`) Object of class `FilteredData`.
#' @param states_list (`list`) List of `FilterState` objects
#' @param id (`character`) `shiny` module id.
#' @param data Data object of any class.
#' @param data_filtered Subset of `data`.
#' @param dataname (`character`) Name of the dataset.
#' @param filtered_data (`FilteredData`) Object of class `FilteredData`.
#' @param states_list (`list`) List of `FilterState` objects.

#'
#' `teal.slice` provides set of methods to manage filters for different data types.
#' # todo: write more about [srv_add()] and consequences of creating `teal_slice` for certain dataset
#' # - how slice defines location of the variable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is covered in ?teal_slice.

Copy link
Contributor

@chlebowa chlebowa Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation says one of the supported class is "matrix" but the functions use "array". This latter should be changed as we support matrices but not (n-dimensional) arrays.

I think we should provide methods for DataFrame because they (DataFrames) are found in SE and MAE slots and they do not inherit from data.frame.

R/FilteredData.R Outdated
Copy link
Contributor

@chlebowa chlebowa Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FilteredData $format and $print methods were lost.

@donyunardi
Copy link
Contributor

Just a reminder not to merge this before our teal release. Pushing this PR to later sprint.

@m7pr
Copy link
Contributor

m7pr commented Feb 19, 2024

@donyunardi is this the proper sprint for this card?

@donyunardi
Copy link
Contributor

@donyunardi is this the proper sprint for this card?

Maybe not as we still have other priorities.
I'll remove it from the board for now.

@gogonzo gogonzo closed this Aug 6, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Aug 6, 2024
@gogonzo gogonzo deleted the 132_filter_panel_ui@main branch August 6, 2024 08:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove access to data from UI modules
6 participants