Isolate each unit test in a self-contained file system #74

AlexAxthelm · 2017-08-24T15:19:50Z

I've been trying to chase down why some of the unit tests (consistently) fail on my local machine.

I'm planning a PR with some of the easy, minor fixes, but one that might be trickier is that anything with
r expect_equal(in_progress(), character(0)) fails for me.

My current thought is that dclean() isn't actually scrubbing the old plan, because report.md shows up as in progress, every time (even though it's been unlinked).

My proposal is to wrap the testthat files in something along the lines of

withr::with_dir(tempdir(), {

context("blah, blah blah")
clean()
testthat("the first test", {
#<CODE>
}
testthat("the second test", {
#<CODE>
}
#<so on, and so forth...>
clean()
}

This would allow us to use the actual clean(), and put that through its paces on a real directory, rather than just using dclean(). Additionally, this would get rid of any weird coincidental ordering effects, or unexpected persistence from previous tests.

The three big issues that I can see with this mothod right now are:
1: I'm not sure exactly how you've implemented #58, but I want to avoid conflicts with that.
2: Since tempdir() is the same within a session of R, we would need to make sure to run clean() at the beginning and end of each test. Alternately, we could put a file.remove(tempdir()) at the end of each test file, to completely scrap the tempdir, and force it to recreate for each test. A slight performance hit, but not huge.
3: tempdir() actually will keep the same value for child sessions spawned by parallel which is sort of a mixed bag. Good for testing, since we can still use it with multiple jobs (although I'm not entirely sure about how it would interact with parallelism = Makefile). Mostly the warning would be to not run tests in the same session as an actual make where something important is being made.

The text was updated successfully, but these errors were encountered:

AlexAxthelm · 2017-08-24T15:22:34Z

Alternately, we could use a fictitious but static directory (.testdir/ or something similar), and generate and scrub it each time.

wlandau-lilly · 2017-08-24T16:41:22Z

withr::with_dir(tempdir(), {...}) seems like a much better way of doing things anyway. Thank you for logging this issue. I agree that we will have to iron out the parallel computing concerns, and this will be much easier to work on after I push my fix to #58 so you can test multiple parallelism configurations automatically.

In tests/testthat/cache.R, the working directory changes in order to test cached(search = TRUE), so we may have to be careful there.

With expect_equal(in_progress(), character(0)), I am not sure dclean() is at fault since it explicitly removes the entire .drake/ cache with unlink(..., recursive = TRUE). Would you mind posting your test errors?

No matter how many platforms I test on, platform-dependent test errors still come along and surprise me every once in a while. Several months ago, for example, the tests in a previous CRAN release were failing on my personal Ubuntu box because sort() used a different order. The package itself was not broken, but I had to go back into the tests and explicitly sort everything so all the checks would pass.

wlandau-lilly · 2017-08-24T16:49:39Z

I wonder, do the unit tests for storr all pass on your machine?

AlexAxthelm · 2017-08-24T17:54:52Z

I think I fixed it. I tried a clean(destroy = TRUE) from the base drake/ directory, and all the test ran fine (except for other-features @ L51, detailed below). I must have had a .drake/ hanging around from doing some interactive testing where I didn't run dclean() at the end.

I was able to recreate the issue by running a make() in the base directory that failed, so it had an in_progress object still hanging around. then ending my R session, and starting another, I got:

 ⋯  Documents  drake  R                                                                             
> drake::in_progress()                                                                                 
[1] "small"                                                                                            
> devtools::test()                                                                                     
Loading drake                                                                                          
Loading required package: testthat                                                                     
Testing drake                                                                                          
basic: ..........................                                                                 
cache: ...1.....................................2......................                                
command-changes: ........                                                                              
edge-cases: .....................                                                                      
envir: .....                                                                                           
examples: ....                                                                                         
full-build: .............................................                                              
generate: ...................                                                                          
import-file: ..........                                                                                
import-object: ..........                                                                              
intermediate-file: ......                                                                              
Makefile: .............................                                                                
namespaced: .........                                                                                  
other-features: 3............4..................................                                       
plan: .............                                                                                                                                                           
                                                                                                       
Failed -------------------------------------------------------------------------                       
1. Failure: cache functions work (@test-cache.R#11) ----------------------------                       
in_progress(search = FALSE) not equal to character(0).                                                 
Lengths differ: 1 vs 0                                                                                 
                                                                                                       
                                                                                                       
2. Failure: cache functions work (@test-cache.R#114) ---------------------------                       
in_progress(search = TRUE, path = s) not equal to character(0).                                        
Lengths differ: 1 vs 0                                                                                 
                                                                                                       
                                                                                                       
3. Failure: in_progress() works (@test-other-features.R#6) ---------------------                       
in_progress() not equal to character(0).                                                               
Lengths differ: 1 vs 0                                                                                 
                                                                                                       
                                                                                                       
4. Failure: deps() correctly reports dependencies of functions and commands (@test-other-features.R#51)
deps(my_plan$command[2]) not equal to c("'tracked_input_file.rds'", "readRDS", "x").                   
2/3 mismatches                                                                                         
x[1]: "readRDS"                                                                                        
y[1]: "'tracked_input_file.rds'"                                                                       
                                                                                                       
x[2]: "'tracked_input_file.rds'"                                                                       
y[2]: "readRDS"                                                                                        
>

So while this may no longer be a bug fix issue, I wonder if it would still be worth implementing after the next round of approvals are up? I know that when I'm looking to write tests for #56, it would be easier for me to not worry about the environment.

Also, just tried storr, and those tests run fine.

Additionally, I have run into the sort() issue on my machine too, suggesting that my Ubuntu is doing something different yet than yours? In particular in the "other features" tests, line 51 fails on my Ubuntu setup, but passes on windows. Strangely enough, the next line (52) which also has multiple elements passes just fund on both. I tried wrapping them in a sort(), and everything passed. I can look for more like that one, where order doesn't matter, when I lint the test files.

wlandau-lilly · 2017-08-24T18:26:41Z

Glad you figured it out! And yes, while not technically a bug, I still think this is a problem because drake should pay no attention to .drake/ caches that are not part of the unit tests. I thought I had fixed this problem long ago.

Also, I think I just added enough sorting on my end to test_that("deps() ...", {...}) (test-other-features.R). That's a relatively new test, so I am not surprised that I forgot to add extra sorting.

wlandau-lilly · 2017-08-25T04:37:06Z

I actually decided to create a separate local file system for each test.

test_with_dir <- function(desc, code){
  root <- "workspaces"
  if (!file.exists(root)){
    dir.create(root)
  }
  relative_dir <- base32_encode(desc) %>%
    digest(algo = hash_algorithm)
  dir <- file.path(root, relative_dir)
  dir_empty(dir)
  dir <- normalizePath(dir)
  with_dir(dir, test_that(desc = desc, code = code))
  # Keep dir until the very end. We should check for hash conflicts. If there are, we need to rename 
  # the prose descriptions of the tests.
}

Rationale:

You had mentioned that tempdir() might conflict with a serious make() in an R session.
If there is some way to run all the calls to test_that() (or in our case, test_with_dir()) in parallel R sessions, we are prepared. Wouldn't that be awesome?
I like to run different kinds of testing on different clones of drake on the same machine, and I would rather use separate local file systems instead of all the checks pointing to potentially the same tempdir() (rare but possible).
This approach gets around my strange Linux errors on Red Hat: shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory.

wlandau-lilly · 2017-08-25T05:29:27Z

I guess the downside is that drake might still get confused if we test .drake/ caches in parent directories, but I think that can be fixed on a case-by-case basis as the need arises.

wlandau-lilly · 2017-08-25T05:55:46Z

By the way, I created a nonempty drake/.drake/ cache to try to reproduce the errors you were seeing, but all the tests passed in my local development copy (Red Hat Linux). But in the meantime, the strange shell-init error has been coming back. The tests seem to run fine in spite of it.

wlandau-lilly · 2017-08-25T13:51:54Z

Never mind about the shell-init errors. I managed to avoid them as long as I retain everything in tests/testthat/workspaces until the very end. This is also a good way of checking collisions: currently, the description of each test has to be unique or else multiple tests could share the same file system.

AlexAxthelm · 2017-08-25T14:07:12Z

Do you get issues with conflicting caches if you have an unfinished (failed) test, and then try to run it again? Either same session or after a restart? Mostly I'm wondering if it would be worth it to put a line to unlink(dir, recursive = TRUE) so that the directory is fresh before each test run.

wlandau-lilly · 2017-08-25T14:17:49Z

I did not check, but because of the dir_empty(dir) line, I do not think there will be conflicting caches. I had unlink(dir, recursive = TRUE) before, but removing it seemed to remove the elusive shell-init errors. I am, however, clearing out tests/testthat/workspaces before and after test_check() in testthat.R.

wlandau-lilly · 2017-08-25T15:46:19Z

The approaches above still gave me problems with the file system, so I decided to go back to tempdir(), but with a unique subfolder for each test. I would very much like to avoid dangling files from any failed/interrupted tests. This seems to work so far:

test_with_dir <- function(desc, code){
  root <- tempdir()
  stopifnot(file.exists(root))
  relative_dir <- base32_encode(desc) %>%
    digest(algo = hash_algorithm)
  dir <- file.path(root, relative_dir)
  dir_empty(dir)
  with_dir(dir, test_that(desc = desc, code = code))
}

wlandau-lilly · 2017-08-25T18:20:45Z

Confirmed: everything looks totally clean with the new test_with_dir() (Red Hat and Windows 7, all testing configurations).

wlandau-lilly changed the title ~~Use withr::with_dir for unit tests~~ Isolate unit tests in a self-contained file system Aug 24, 2017

wlandau-lilly added type: bug type: new feature and removed type: bug labels Aug 24, 2017

wlandau-lilly removed the type: new feature label Aug 24, 2017

wlandau-lilly changed the title ~~Isolate unit tests in a self-contained file system~~ Isolate each unit test in a self-contained file system Aug 24, 2017

wlandau-lilly added the wlandau-lilly has a patch and is waiting for permission to release it label Aug 25, 2017

wlandau-lilly modified the milestone: v4.1.0 CRAN release Aug 25, 2017

wlandau-lilly mentioned this issue Aug 25, 2017

58 test configs #59

Closed

wlandau-lilly closed this as completed in 5e68bfb Sep 1, 2017

wlandau-lilly removed the wlandau-lilly has a patch and is waiting for permission to release it label Sep 6, 2017

wlandau-lilly mentioned this issue Sep 7, 2017

Get rid of dclean() #85

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Isolate each unit test in a self-contained file system #74

Isolate each unit test in a self-contained file system #74

AlexAxthelm commented Aug 24, 2017 •

edited

Loading

AlexAxthelm commented Aug 24, 2017

wlandau-lilly commented Aug 24, 2017

wlandau-lilly commented Aug 24, 2017

AlexAxthelm commented Aug 24, 2017

wlandau-lilly commented Aug 24, 2017

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

AlexAxthelm commented Aug 25, 2017

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017

Isolate each unit test in a self-contained file system #74

Isolate each unit test in a self-contained file system #74

Comments

AlexAxthelm commented Aug 24, 2017 • edited Loading

AlexAxthelm commented Aug 24, 2017

wlandau-lilly commented Aug 24, 2017

wlandau-lilly commented Aug 24, 2017

AlexAxthelm commented Aug 24, 2017

wlandau-lilly commented Aug 24, 2017

wlandau-lilly commented Aug 25, 2017 • edited Loading

wlandau-lilly commented Aug 25, 2017

wlandau-lilly commented Aug 25, 2017 • edited Loading

wlandau-lilly commented Aug 25, 2017 • edited Loading

AlexAxthelm commented Aug 25, 2017

wlandau-lilly commented Aug 25, 2017 • edited Loading

wlandau-lilly commented Aug 25, 2017 • edited Loading

wlandau-lilly commented Aug 25, 2017

AlexAxthelm commented Aug 24, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017 •

edited

Loading

wlandau-lilly commented Aug 25, 2017 •

edited

Loading