-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pseudo-randomness is not totally reproducible #218
Comments
To expand a bit: It appears that the random seed isn't fixed automatically. The simple example runs a plan with random output, then reruns parts of it. The output looks different for the two cases. I think we should be fixing the random seed for each target, for reproducibility. Issue in remake: richfitz/remake#106. |
@aedobbyn the basic example takes random bootstraps from mtcars. Are you looking for more randomness than that? Maybe something that explains how drake tries to handle reproducible random numbers? @krlmlr the way drake approaches reproducible random seeds is in R/random.R and tests/testthat/test-random.R. drake preserves the random seed using withr::with_seed() and salts that seed for individual targets. Sorry I did not provide links. The internet connection crashed here and I am using my phone. |
Also, the seed is stored in the config object, which you can recover with read_drake_config (). So drake does try to be reproducible under randomness. What I worry about is that the seed is salted deterministically for different targets based on the target name (in build_target()). Should the salting be random? |
Thanks for the pointers! The example appears to do something different than the tests: I'm not clearing everything, rather I'm deleting an intermediate target. |
Hmm... It should still work then. I wonder what the problem is. |
Kirill, in that example, did you have anything leftover in your cache from a previous make ()? I am starting to think the problem is that drake does not try to load the root seed if one is already cached. See where get_valid_seed() is called in drake_config(). |
We should also probably add a |
And a new test in |
In that example I cleaned the Would it work if the seed was only a function of the target name, and would be set unconditionally? Users shouldn't be worried about the random seed at all, things should just work. |
Things would run, yes. But I think the result should at least depend on the random seed of the R session that first created the cache. That way, |
I think I fixed this issue as of 307bbe1. The help file of the new read_drake_seed() function shows what Before, I was concerned about preserving the mathematical properties of pseudo-randomness when it came to deriving target-specific seeds from the root seed, but after talking with you both today, I am less worried. |
Agreed to the part about running the same project with different random seeds. How would you now embed the random seed in the workflow, so that different seeds are used with the same cache? In the following plan, I see different but stable results for the same rule: library(drake)
unlink(".drake", recursive = TRUE)
plan <-
drake_plan(
data1 = 1,
data2 = 1,
data3 = 1,
f_data1 = runif(data1),
f_data2 = runif(data2),
f_data3 = runif(data3)
)
set.seed(123)
make(plan)
#> cache /tmp/RtmpavUNa7/.drake
#> connect 1 import: plan
#> connect 6 targets: data1, data2, data3, f_data1, f_data2, f_data3
#> check 1 item: runif
#> check 3 items: data1, data2, data3
#> target data1
#> target data2
#> target data3
#> check 3 items: f_data1, f_data2, f_data3
#> target f_data1
#> target f_data2
#> target f_data3
f_data1
#> [1] 0.414735
f_data2
#> [1] 0.9176192
f_data3
#> [1] 0.4733941
Is there a more elegant way to write this? Does |
The user can set the seed before Does that answer your questions? Do you think any of this behavior should be changed? |
Right now, there is no warning. My current opinion is that The R session's |
Then again, target names could change without the project's seed changing. I think we'll have to use a |
Like you said, New cache,
|
Update: we should only throw an error if the non-NULL user-defined seed is not the same as the one in the cache. That piece makes it work without friction. |
@krlmlr did all the work for me here: http://rpubs.com/krlmlr/drake-random-seed
The text was updated successfully, but these errors were encountered: