Reproducibility with random numbers #313

Closed
bart1 opened this issue Mar 10, 2018 · 7 comments

Comments

@bart1

bart1 commented Mar 10, 2018

I think this is more a documentation issue than a real problem, but I'm trying to figure out how drake deals with random numbers. I was implementing my own seed-setting system for simulations. Reading through the GitHub pages, this all seems to be solved, among others by #56, and quick testing seems to indicate it also works in version 5.0.0. But searching the manual for both "seed" and "random" does not produce any results (besides a random tip). I think it would be great to have some improved documentation (either a vignette or man pages) so that people who do think about this can inform themselves. Right now I'm unsure to what extent I do or don't need to worry when I'm working with random numbers.

@wlandau
Member

wlandau commented Mar 10, 2018

You bring up a good point. I'm not sure it would fit into the main vignettes, but an explanation should be added. I think I will elaborate here on this thread and then reference it from the FAQ.

@krlmlr
Collaborator

krlmlr commented Mar 10, 2018

This FAQ could become its own vignette at some point, because this problem feels important enough, and if we cover it in all detail, it might become too large for a FAQ.

@bart1
Author

bart1 commented Mar 10, 2018

Thanks, keep up the good work. Not having to deal with seeds greatly simplifies my code.

@wlandau
Member

wlandau commented Mar 11, 2018

@krlmlr The FAQ is its own vignette already, but it is an automatically-generated stub that links to all the issues tagged "frequently asked question". I believe you mentioned that we might expand it at some point. To avoid redundant work, that might involve scraping specifically-marked comments from the thread.

@wlandau
Member

wlandau commented Mar 11, 2018

@bart1 #218 is also relevant here because it helps explain how drake should handle pseudo-randomness. It describes an unexpected problem that is fixed in the development version and will be included in the next CRAN release.

@wlandau
Member

wlandau commented Mar 11, 2018

Reproducible pseudo-randomness with drake

The global seed

On your first make(), you can set a global RNG seed for your project. The seed is 0 unless you provide a different one. To ensure reproducibility under pseudo-randomness, subsequent make()s reuse this same global seed. Drake is too opinionated about reproducibility to let you pick another seed unless you destroy the cache and start from scratch. Use read_drake_seed() or read_drake_config()$seed to get the global seed of your project.
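To make the rule concrete, here is a toy model in base R of a cache that remembers the first seed it sees. It is only a sketch of the behavior described above, not drake's code: `new_cache()` and `resolve_seed()` are hypothetical names, and raising an error on a conflicting seed is a modeling choice for this illustration.

```r
# Toy model of a project cache: an environment that remembers the first seed.
# `new_cache()` and `resolve_seed()` are hypothetical names, not drake functions.
new_cache <- function() new.env()

resolve_seed <- function(cache, seed = NULL) {
  if (is.null(cache$seed)) {
    # First run: record the requested seed, or 0 when none is given.
    cache$seed <- if (is.null(seed)) 0 else seed
  } else if (!is.null(seed) && !identical(seed, cache$seed)) {
    # Later runs: a conflicting seed is refused (an assumption of this sketch).
    stop("seed differs from the one stored in the cache")
  }
  cache$seed
}

cache <- new_cache()
resolve_seed(cache)            # first run, no seed given: stores the default
resolve_seed(cache)            # later run: reuses the stored seed
cache <- new_cache()           # starting from a fresh cache allows a new seed
resolve_seed(cache, seed = 1)
```

In a real project the equivalent steps would be make(plan), then clean(destroy = TRUE) followed by make(plan, seed = 1).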

Target-level seeds

Drake uses the global seed to generate a separate seed for every target. Each target gets a different seed, and the seed is always the same given the same global seed and target name. Drake builds each target with its seed using withr::with_seed(). To retrieve the seed used to build a given target, call diagnose(your_target)$seed.
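The idea of a seed that depends only on the global seed and the target name can be sketched in base R. This is an illustration under stated assumptions, not drake's actual algorithm: `target_seed()` and `with_fixed_seed()` are hypothetical helpers, and the arithmetic used to mix the name into the seed is arbitrary. `with_fixed_seed()` is similar in spirit to withr::with_seed(), which is what drake itself uses.

```r
# Hypothetical helper: derive a reproducible per-target seed from a global
# seed and a target name. The mixing formula is arbitrary, not drake's.
target_seed <- function(global_seed, target) {
  codes <- utf8ToInt(target)
  # Position-weighted sum of character codes, kept within integer range.
  (global_seed * 31 + sum(codes * seq_along(codes))) %% 2147483647
}

# Hypothetical helper: evaluate `code` under a fixed seed, then restore the
# caller's RNG state on exit.
with_fixed_seed <- function(seed, code) {
  old <- if (exists(".Random.seed", envir = globalenv())) {
    get(".Random.seed", envir = globalenv())
  } else {
    NULL
  }
  on.exit({
    if (is.null(old)) {
      rm(".Random.seed", envir = globalenv())
    } else {
      assign(".Random.seed", old, envir = globalenv())
    }
  })
  set.seed(seed)
  code  # lazily evaluated here, after the seed has been set
}

# Same global seed and same target name give the same draw every time.
x1 <- with_fixed_seed(target_seed(0, "random2"), runif(1))
x2 <- with_fixed_seed(target_seed(0, "random2"), runif(1))
identical(x1, x2)
```

Because the seed is a pure function of the global seed and the target name, rebuilding a single target does not depend on which other targets were built before it.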

Example

Modified from @krlmlr's online example.

library(drake)
clean(destroy = TRUE)
random <- function(...) {
  list(...)
  runif(1)
}
plan <- drake_plan(random1 = random(), random2 = random(random1), random3 = random(random2), 
  random4 = random(random2, random3))
make(plan)
#> target random1
#> target random2
#> target random3
#> target random4
readd(random2)
#> [1] 0.2083636
readd(random4)
#> [1] 0.08183787
# Remove random2 from the cache.
clean(random2)
# Now, we will need to make random2 all over again.  If the value of random2
# changes, we will also need to re-make random3 and random4.  But random2
# will not change because we preserved the seed.
make(plan)
#> Unloading targets from environment:
#>   random4
#>   random3
#>   random2
#> target random2
readd(random2)
#> [1] 0.2083636
readd(random4)
#> [1] 0.08183787
# Start over from scratch with a new seed.
clean(destroy = TRUE)
make(plan, seed = 1)
#> Unloading targets from environment:
#>   random2
#>   random1
#> target random1
#> target random2
#> target random3
#> target random4
readd(random2)
#> [1] 0.07983418
readd(random4)
#> [1] 0.6617457

Caveat

So far, everything I have said only applies to the development version of drake. The CRAN release is a little behind right now, and it will catch up when version 5.1.0 rolls out.

Thanks

@wlandau wlandau closed this as completed Mar 11, 2018
@wlandau wlandau changed the title from "random numbers" to "Reproducibility with random numbers" Mar 11, 2018
@wlandau
Member

wlandau commented Mar 11, 2018

FYI: I just updated the FAQ vignette and the accompanying page on the pkgdown site.


3 participants