Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revdeps corporaexplorer/polle fail after forder re-use key #6349

Closed
tdhock opened this issue Aug 3, 2024 · 14 comments
Closed

revdeps corporaexplorer/polle fail after forder re-use key #6349

tdhock opened this issue Aug 3, 2024 · 14 comments
Assignees
Labels
revdep Reverse dependencies
Milestone

Comments

@tdhock
Copy link
Member

tdhock commented Aug 3, 2024

there are two revdeps which are affected by #4386 authored by @jangorecki and @MichaelChirico

https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze/2024-08-02/

library(corporaexplorer) below looks like a test which is checking for computed result equal to some stored/saved result from a previous version of data table:

* checking tests ...
  Running 'testthat.R'
 ERROR
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
  
  == Skipped tests (2) ===========================================================
  * On CRAN (2): 'test_explorer_app.R:8:3', 'test_extractor_app.R:8:3'
  
  == Failed tests ================================================================
  -- Failure ('test_prep_func.R:12:3'): prepare_data() works ---------------------
  `test_obj` not equal to corporaexplorer::test_data.
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Modes: list, NULL > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Lengths: 6, 0 > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < names for target but not for current > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < current is not list-like > > >
  
  [ FAIL 1 | WARN 3 | SKIP 2 | PASS 0 ]
  Error: Test failures

and not sure for library(polle) below

* checking tests ...
  Running 'test-all.R'
 ERROR
Running the tests in 'tests/test-all.R' failed.
Last 13 lines of output:
    5.     \-lava:::`distribution<-.lvm`(`*tmp*`, variable[i], ..., value = value[[i]])
    6.       +-lava::`addvar<-`(`*tmp*`, value = as.formula(paste("~", variable)))
    7.       \-lava:::`addvar<-.lvm`(...)
    8.         +-lava::`regression<-`(`*tmp*`, ..., value = value)
    9.         \-lava:::`regression<-.lvm`(`*tmp*`, ..., value = value)
   10.           \-lava::procformula(object, value, ...)
   11.             +-lava::`exogenous<-`(...)
   12.             \-lava:::`exogenous<-.lvm`(...)
   13.               \-lava::reindex(x)
   14.                 \-lava:::mat.lvm(x, res)
   15.                   \-base::cbind(idxM, pidxM)
  
  [ FAIL 42 | WARN 112 | SKIP 0 | PASS 261 ]
  Error: Test failures
@tdhock tdhock self-assigned this Aug 3, 2024
@tdhock tdhock added the revdep Reverse dependencies label Aug 3, 2024
@tdhock tdhock added this to the 1.16.0 milestone Aug 3, 2024
@jangorecki
Copy link
Member

jangorecki commented Aug 3, 2024

Generally setindex creates more attributes than it used to be, so if tests are checking exactness then there will be breaking change. I do not anticipate any other breaking change from this PR.
Big PRs like this is best to merge shortly after release to CRAN and avoid merging such shortly before release to CRAN.

@MichaelChirico
Copy link
Member

Filed kgjerde/corporaexplorer#33

@MichaelChirico
Copy link
Member

I get a slightly different error log on dev version of {polle}:

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Failure ('test-policy_data.R:260:3'): policy_data melts wide data correctly in a two stage case. ──
  pd$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:303:3'): policy_data melts wide data correctly in a two stage case. ──
  pd$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:359:3'): policy_data melts wide data correctly in a single stage case. ──
  pd$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:381:3'): policy_data melts wide data correctly in a single stage case. ──
  ...$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:387:3'): policy_data melts wide data correctly in a single stage case. ──
  ...$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >

PS @tdhock WDYT about setting _R_CHECK_TESTS_NLINES_=0 to make sure we see a full log? It makes it much easier to trace.

If you're worried about the file size of the logs, maybe _R_CHECK_TESTS_NLINES_=100 instead?

@MichaelChirico
Copy link
Member

Filed AndreasNordland/polle#3

@tdhock
Copy link
Member Author

tdhock commented Aug 5, 2024

great thanks for investigating.

I set that env var on the revdep check system, tdhock/data.table-revdeps@8e20b4f so we should be able to see what that looks like in the next check: https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze/2024-08-06/

@tdhock
Copy link
Member Author

tdhock commented Aug 7, 2024

ok here are the new results from _R_CHECK_TESTS_NLINES_=0,

* checking tests ...
  Running 'testthat.R'
 ERROR
Running the tests in 'tests/testthat.R' failed.
Complete output:
  > library(testthat)
  > library(corporaexplorer)
  > 
  > test_check("corporaexplorer")
  [ FAIL 1 | WARN 3 | SKIP 2 | PASS 0 ]
  
  == Skipped tests (2) ===========================================================
  * On CRAN (2): 'test_explorer_app.R:8:3', 'test_extractor_app.R:8:3'
  
  == Failed tests ================================================================
  -- Failure ('test_prep_func.R:12:3'): prepare_data() works ---------------------
  `test_obj` not equal to corporaexplorer::test_data.
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Modes: list, NULL > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Lengths: 6, 0 > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < names for target but not for current > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < current is not list-like > > >
  
  [ FAIL 1 | WARN 3 | SKIP 2 | PASS 0 ]
  Error: Test failures

below abridged with ... for because too many similar errors.

* checking tests ...
  Running 'test-all.R'
 ERROR
Running the tests in 'tests/test-all.R' failed.
Complete output:
  > suppressPackageStartupMessages(library("testthat"))
  > test_check("polle")
  Loading required package: polle
  Loading required package: SuperLearner
  Loading required package: nnls
  Loading required package: gam
  Loading required package: splines
  Loading required package: foreach
  Loaded gam 1.22-4
  
  Super Learner
  Version: 2.0-29
  Package created on 2024-02-06
  
  [ FAIL 42 | WARN 112 | SKIP 0 | PASS 261 ]
  
  == Failed tests ================================================================
  -- Error ('test-g_empir.R:22:3'): g_empir predictions in a single stage setting --
  Error in `cbind(idxM, pidxM)`: cannot get data pointer of 'NULL' objects
  Backtrace:
       x
    1. \-polle::sim_single_stage(n = n, seed = 1) at test-g_empir.R:22:3
    2.   +-lava::`distribution<-`(...)
    3.   \-lava:::`distribution<-.lvm`(...)
    4.     +-lava::`distribution<-`(`*tmp*`, variable[i], ..., value = value[[i]])
    5.     \-lava:::`distribution<-.lvm`(`*tmp*`, variable[i], ..., value = value[[i]])
    6.       +-lava::`addvar<-`(`*tmp*`, value = as.formula(paste("~", variable)))
    7.       \-lava:::`addvar<-.lvm`(...)
    8.         +-lava::`regression<-`(`*tmp*`, ..., value = value)
    9.         \-lava:::`regression<-.lvm`(`*tmp*`, ..., value = value)
   10.           \-lava::procformula(object, value, ...)
   11.             +-lava::`exogenous<-`(...)
   12.             \-lava:::`exogenous<-.lvm`(...)
   13.               \-lava::reindex(x)
   14.                 \-lava:::mat.lvm(x, res)
   15.                   \-base::cbind(idxM, pidxM)
...
  -- Failure ('test-policy_data.R:387:3'): policy_data melts wide data correctly in a single stage case. --
  ...$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
 ...
  -- Error ('test-q_xgboost.R:24:3'): q_xgboost gives the same result as plain xgboost --
  Error in `cbind(idxM, pidxM)`: cannot get data pointer of 'NULL' objects
  Backtrace:
       x
    1. \-polle::sim_single_stage(200, seed = 1) at test-q_xgboost.R:24:3
    2.   +-lava::`distribution<-`(...)
    3.   \-lava:::`distribution<-.lvm`(...)
    4.     +-lava::`distribution<-`(`*tmp*`, variable[i], ..., value = value[[i]])
    5.     \-lava:::`distribution<-.lvm`(`*tmp*`, variable[i], ..., value = value[[i]])
    6.       +-lava::`addvar<-`(`*tmp*`, value = as.formula(paste("~", variable)))
    7.       \-lava:::`addvar<-.lvm`(...)
    8.         +-lava::`regression<-`(`*tmp*`, ..., value = value)
    9.         \-lava:::`regression<-.lvm`(`*tmp*`, ..., value = value)
   10.           \-lava::procformula(object, value, ...)
   11.             +-lava::`exogenous<-`(...)
   12.             \-lava:::`exogenous<-.lvm`(...)
   13.               \-lava::reindex(x)
   14.                 \-lava:::mat.lvm(x, res)
   15.                   \-base::cbind(idxM, pidxM)
  
  [ FAIL 42 | WARN 112 | SKIP 0 | PASS 261 ]
  Error: Test failures

@jangorecki
Copy link
Member

jangorecki commented Aug 7, 2024

Maybe we just change the default for check.attributes to FALSS in all.equal method?

@tdhock
Copy link
Member Author

tdhock commented Aug 7, 2024

that would be fine with me

@MichaelChirico
Copy link
Member

MichaelChirico commented Aug 7, 2024

Maybe we just change the default for check.attributes to FALSS in all.equal method?

I think that's a bad idea because it will silently break users expecting all.equal.data.table() to also cover checking index/key equality. c.f.:

DT1 = data.table(a = 1:10)
DT2 = copy(DT1)
setkey(DT1, a)
all.equal(DT1, DT2)
# [1] "Datasets have different keys. 'target': [a]. 'current': has no key."
all.equal(DT1, DT2, check.attributes=FALSE)
# [1] TRUE

Three other alternatives:

  1. A new argument, e.g. check.index.attributes=FALSE by default to throw those hidden attributes away, but allow checking them for a really strict check.
  2. Just always throw those attributes away -- two data.tables that are equal in all but the metadata attributes of the indices are considered equal no matter what. I think that should be fine, there's no way that two data.tables that pass all.equal() for everything but those attributes can be at all different, right?
  3. Only compare those attributes if they are present in both target and current

@TysonStanley
Copy link
Member

I like option 3 with a message maybe? Can turn off the message by using an argument (possibly like in option 1). That way it's still clear while being more useful.

@jangorecki
Copy link
Member

jangorecki commented Aug 7, 2024

@MichaelChirico Agree, that could end up to be more serious breaking change.

I don't like 2 as makes us unable compare indexes. 3 sounds smart but may ended up to be very confusing by introducing some inconsistency. So I would say 1. Or 4: asking revdeps for check.attributes=F, which is not terrible idea, as it will teach revdep devs some extra care when writing tests.

@TysonStanley all.equal returns a character vector when not equal, it serves as a message already.

@tdhock
Copy link
Member Author

tdhock commented Aug 7, 2024

Jan is right, the revdeps should not be writing tests for exact equality with previously stored data tables, that is just asking for trouble (false positive test failures when there is nothing wrong).

@MichaelChirico
Copy link
Member

to clarify, for 2, I don't mean we skip checking indices(current) vs indices(target), I only mean checking these new metadata attributes of indices (any_isna, any_nonutf8, etc).

I'm not sure it matters if two tables only differ in whether those have been computed, I guess it can have some performance implication just like for rows matching but indices() not.

@MichaelChirico
Copy link
Member

With both {polle} and {corporaexplorer} now fixed (in their dev versions), I'm OK to close here. I think we can consider a new check.index.attributes= argument for all.equal.data.table() as a future FR if requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
revdep Reverse dependencies
Projects
None yet
Development

No branches or pull requests

4 participants