revdeps corporaexplorer/polle fail after forder re-use key #6349

tdhock · 2024-08-03T14:03:46Z

there are two revdeps which are affected by #4386 authored by @jangorecki and @MichaelChirico

https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze/2024-08-02/

library(corporaexplorer) below looks like a test which is checking for computed result equal to some stored/saved result from a previous version of data table:

* checking tests ...
  Running 'testthat.R'
 ERROR
Running the tests in 'tests/testthat.R' failed.
Last 13 lines of output:
  
  == Skipped tests (2) ===========================================================
  * On CRAN (2): 'test_explorer_app.R:8:3', 'test_extractor_app.R:8:3'
  
  == Failed tests ================================================================
  -- Failure ('test_prep_func.R:12:3'): prepare_data() works ---------------------
  `test_obj` not equal to corporaexplorer::test_data.
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Modes: list, NULL > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Lengths: 6, 0 > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < names for target but not for current > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < current is not list-like > > >
  
  [ FAIL 1 | WARN 3 | SKIP 2 | PASS 0 ]
  Error: Test failures

and not sure for library(polle) below

* checking tests ...
  Running 'test-all.R'
 ERROR
Running the tests in 'tests/test-all.R' failed.
Last 13 lines of output:
    5.     \-lava:::`distribution<-.lvm`(`*tmp*`, variable[i], ..., value = value[[i]])
    6.       +-lava::`addvar<-`(`*tmp*`, value = as.formula(paste("~", variable)))
    7.       \-lava:::`addvar<-.lvm`(...)
    8.         +-lava::`regression<-`(`*tmp*`, ..., value = value)
    9.         \-lava:::`regression<-.lvm`(`*tmp*`, ..., value = value)
   10.           \-lava::procformula(object, value, ...)
   11.             +-lava::`exogenous<-`(...)
   12.             \-lava:::`exogenous<-.lvm`(...)
   13.               \-lava::reindex(x)
   14.                 \-lava:::mat.lvm(x, res)
   15.                   \-base::cbind(idxM, pidxM)
  
  [ FAIL 42 | WARN 112 | SKIP 0 | PASS 261 ]
  Error: Test failures

The text was updated successfully, but these errors were encountered:

jangorecki · 2024-08-03T15:14:31Z

Generally setindex creates more attributes than it used to be, so if tests are checking exactness then there will be breaking change. I do not anticipate any other breaking change from this PR.
Big PRs like this is best to merge shortly after release to CRAN and avoid merging such shortly before release to CRAN.

MichaelChirico · 2024-08-04T17:37:22Z

Filed kgjerde/corporaexplorer#33

MichaelChirico · 2024-08-04T18:01:35Z

I get a slightly different error log on dev version of {polle}:

  ══ Failed tests ════════════════════════════════════════════════════════════════
  ── Failure ('test-policy_data.R:260:3'): policy_data melts wide data correctly in a two stage case. ──
  pd$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:303:3'): policy_data melts wide data correctly in a two stage case. ──
  pd$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:359:3'): policy_data melts wide data correctly in a single stage case. ──
  pd$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:381:3'): policy_data melts wide data correctly in a single stage case. ──
  ...$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
  ── Failure ('test-policy_data.R:387:3'): policy_data melts wide data correctly in a single stage case. ──
  ...$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >

PS @tdhock WDYT about setting _R_CHECK_TESTS_NLINES_=0 to make sure we see a full log? It makes it much easier to trace.

If you're worried about the file size of the logs, maybe _R_CHECK_TESTS_NLINES_=100 instead?

MichaelChirico · 2024-08-04T19:00:12Z

Filed AndreasNordland/polle#3

tdhock · 2024-08-05T10:43:51Z

great thanks for investigating.

I set that env var on the revdep check system, tdhock/data.table-revdeps@8e20b4f so we should be able to see what that looks like in the next check: https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze/2024-08-06/

tdhock · 2024-08-07T10:51:46Z

ok here are the new results from _R_CHECK_TESTS_NLINES_=0,

* checking tests ...
  Running 'testthat.R'
 ERROR
Running the tests in 'tests/testthat.R' failed.
Complete output:
  > library(testthat)
  > library(corporaexplorer)
  > 
  > test_check("corporaexplorer")
  [ FAIL 1 | WARN 3 | SKIP 2 | PASS 0 ]
  
  == Skipped tests (2) ===========================================================
  * On CRAN (2): 'test_explorer_app.R:8:3', 'test_extractor_app.R:8:3'
  
  == Failed tests ================================================================
  -- Failure ('test_prep_func.R:12:3'): prepare_data() works ---------------------
  `test_obj` not equal to corporaexplorer::test_data.
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Modes: list, NULL > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < Lengths: 6, 0 > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < names for target but not for current > > >
  Component "original_matrix": Component "data_dok": Attributes: < Component "index": Attributes: < Component "__i": Attributes: < current is not list-like > > >
  
  [ FAIL 1 | WARN 3 | SKIP 2 | PASS 0 ]
  Error: Test failures

below abridged with ... for because too many similar errors.

* checking tests ...
  Running 'test-all.R'
 ERROR
Running the tests in 'tests/test-all.R' failed.
Complete output:
  > suppressPackageStartupMessages(library("testthat"))
  > test_check("polle")
  Loading required package: polle
  Loading required package: SuperLearner
  Loading required package: nnls
  Loading required package: gam
  Loading required package: splines
  Loading required package: foreach
  Loaded gam 1.22-4
  
  Super Learner
  Version: 2.0-29
  Package created on 2024-02-06
  
  [ FAIL 42 | WARN 112 | SKIP 0 | PASS 261 ]
  
  == Failed tests ================================================================
  -- Error ('test-g_empir.R:22:3'): g_empir predictions in a single stage setting --
  Error in `cbind(idxM, pidxM)`: cannot get data pointer of 'NULL' objects
  Backtrace:
       x
    1. \-polle::sim_single_stage(n = n, seed = 1) at test-g_empir.R:22:3
    2.   +-lava::`distribution<-`(...)
    3.   \-lava:::`distribution<-.lvm`(...)
    4.     +-lava::`distribution<-`(`*tmp*`, variable[i], ..., value = value[[i]])
    5.     \-lava:::`distribution<-.lvm`(`*tmp*`, variable[i], ..., value = value[[i]])
    6.       +-lava::`addvar<-`(`*tmp*`, value = as.formula(paste("~", variable)))
    7.       \-lava:::`addvar<-.lvm`(...)
    8.         +-lava::`regression<-`(`*tmp*`, ..., value = value)
    9.         \-lava:::`regression<-.lvm`(`*tmp*`, ..., value = value)
   10.           \-lava::procformula(object, value, ...)
   11.             +-lava::`exogenous<-`(...)
   12.             \-lava:::`exogenous<-.lvm`(...)
   13.               \-lava::reindex(x)
   14.                 \-lava:::mat.lvm(x, res)
   15.                   \-base::cbind(idxM, pidxM)
...
  -- Failure ('test-policy_data.R:387:3'): policy_data melts wide data correctly in a single stage case. --
  ...$stage_data not equal to `target_stage_data`.
  Attributes: < Component "index": Attributes: < Component "__event": Attributes: < target is NULL, current is list > > >
 ...
  -- Error ('test-q_xgboost.R:24:3'): q_xgboost gives the same result as plain xgboost --
  Error in `cbind(idxM, pidxM)`: cannot get data pointer of 'NULL' objects
  Backtrace:
       x
    1. \-polle::sim_single_stage(200, seed = 1) at test-q_xgboost.R:24:3
    2.   +-lava::`distribution<-`(...)
    3.   \-lava:::`distribution<-.lvm`(...)
    4.     +-lava::`distribution<-`(`*tmp*`, variable[i], ..., value = value[[i]])
    5.     \-lava:::`distribution<-.lvm`(`*tmp*`, variable[i], ..., value = value[[i]])
    6.       +-lava::`addvar<-`(`*tmp*`, value = as.formula(paste("~", variable)))
    7.       \-lava:::`addvar<-.lvm`(...)
    8.         +-lava::`regression<-`(`*tmp*`, ..., value = value)
    9.         \-lava:::`regression<-.lvm`(`*tmp*`, ..., value = value)
   10.           \-lava::procformula(object, value, ...)
   11.             +-lava::`exogenous<-`(...)
   12.             \-lava:::`exogenous<-.lvm`(...)
   13.               \-lava::reindex(x)
   14.                 \-lava:::mat.lvm(x, res)
   15.                   \-base::cbind(idxM, pidxM)
  
  [ FAIL 42 | WARN 112 | SKIP 0 | PASS 261 ]
  Error: Test failures

jangorecki · 2024-08-07T12:39:54Z

Maybe we just change the default for check.attributes to FALSS in all.equal method?

tdhock · 2024-08-07T12:43:06Z

that would be fine with me

MichaelChirico · 2024-08-07T18:34:24Z

Maybe we just change the default for check.attributes to FALSS in all.equal method?

I think that's a bad idea because it will silently break users expecting all.equal.data.table() to also cover checking index/key equality. c.f.:

DT1 = data.table(a = 1:10)
DT2 = copy(DT1)
setkey(DT1, a)
all.equal(DT1, DT2)
# [1] "Datasets have different keys. 'target': [a]. 'current': has no key."
all.equal(DT1, DT2, check.attributes=FALSE)
# [1] TRUE

Three other alternatives:

A new argument, e.g. check.index.attributes=FALSE by default to throw those hidden attributes away, but allow checking them for a really strict check.
Just always throw those attributes away -- two data.tables that are equal in all but the metadata attributes of the indices are considered equal no matter what. I think that should be fine, there's no way that two data.tables that pass all.equal() for everything but those attributes can be at all different, right?
Only compare those attributes if they are present in both target and current

TysonStanley · 2024-08-07T18:39:05Z

I like option 3 with a message maybe? Can turn off the message by using an argument (possibly like in option 1). That way it's still clear while being more useful.

jangorecki · 2024-08-07T18:49:47Z

@MichaelChirico Agree, that could end up to be more serious breaking change.

I don't like 2 as makes us unable compare indexes. 3 sounds smart but may ended up to be very confusing by introducing some inconsistency. So I would say 1. Or 4: asking revdeps for check.attributes=F, which is not terrible idea, as it will teach revdep devs some extra care when writing tests.

@TysonStanley all.equal returns a character vector when not equal, it serves as a message already.

tdhock · 2024-08-07T18:55:30Z

Jan is right, the revdeps should not be writing tests for exact equality with previously stored data tables, that is just asking for trouble (false positive test failures when there is nothing wrong).

MichaelChirico · 2024-08-07T18:57:02Z

to clarify, for 2, I don't mean we skip checking indices(current) vs indices(target), I only mean checking these new metadata attributes of indices (any_isna, any_nonutf8, etc).

I'm not sure it matters if two tables only differ in whether those have been computed, I guess it can have some performance implication just like for rows matching but indices() not.

MichaelChirico · 2024-08-12T21:27:42Z

With both {polle} and {corporaexplorer} now fixed (in their dev versions), I'm OK to close here. I think we can consider a new check.index.attributes= argument for all.equal.data.table() as a future FR if requested.

tdhock self-assigned this Aug 3, 2024

tdhock added the revdep Reverse dependencies label Aug 3, 2024

tdhock added this to the 1.16.0 milestone Aug 3, 2024

MichaelChirico mentioned this issue Aug 4, 2024

Skip checking attributes of data.table indices AndreasNordland/polle#3

Merged

MichaelChirico mentioned this issue Aug 12, 2024

GH-43633: [R] Add tests for packages that might be tricky to roundtrip data to Tables + Parquet files apache/arrow#43634

Merged

MichaelChirico closed this as completed Aug 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revdeps corporaexplorer/polle fail after forder re-use key #6349

revdeps corporaexplorer/polle fail after forder re-use key #6349

tdhock commented Aug 3, 2024

jangorecki commented Aug 3, 2024 •

edited

Loading

MichaelChirico commented Aug 4, 2024

MichaelChirico commented Aug 4, 2024

MichaelChirico commented Aug 4, 2024

tdhock commented Aug 5, 2024

tdhock commented Aug 7, 2024

jangorecki commented Aug 7, 2024 •

edited

Loading

tdhock commented Aug 7, 2024

MichaelChirico commented Aug 7, 2024 •

edited

Loading

TysonStanley commented Aug 7, 2024

jangorecki commented Aug 7, 2024 •

edited

Loading

tdhock commented Aug 7, 2024

MichaelChirico commented Aug 7, 2024

MichaelChirico commented Aug 12, 2024

revdeps corporaexplorer/polle fail after forder re-use key #6349

revdeps corporaexplorer/polle fail after forder re-use key #6349

Comments

tdhock commented Aug 3, 2024

jangorecki commented Aug 3, 2024 • edited Loading

MichaelChirico commented Aug 4, 2024

MichaelChirico commented Aug 4, 2024

MichaelChirico commented Aug 4, 2024

tdhock commented Aug 5, 2024

tdhock commented Aug 7, 2024

jangorecki commented Aug 7, 2024 • edited Loading

tdhock commented Aug 7, 2024

MichaelChirico commented Aug 7, 2024 • edited Loading

TysonStanley commented Aug 7, 2024

jangorecki commented Aug 7, 2024 • edited Loading

tdhock commented Aug 7, 2024

MichaelChirico commented Aug 7, 2024

MichaelChirico commented Aug 12, 2024

jangorecki commented Aug 3, 2024 •

edited

Loading

jangorecki commented Aug 7, 2024 •

edited

Loading

MichaelChirico commented Aug 7, 2024 •

edited

Loading

jangorecki commented Aug 7, 2024 •

edited

Loading