-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seeding #18
Comments
For the case of initializing multiple instances of a PRNG, using a seed sequence generator to produce multiple seeds is a generic approach to seeding. However, this has the risk of producing overlapping, correlated, or even identical random number sequences. There are several ways to reduce this risk (some of which are PRNG-specific):
For PRNGs that support jump-ahead or "one sequence per seed", there should be a way to allow these PRNGs to use an alternative strategy to initialize multiple PRNGs based on a common seed. The use of a seed sequence generator can remain the default behavior, especially because the risk of overlap is often negligible (S. Vigna, "On the probability of overlap of random subsequences of pseudorandom number generators", 2020). |
@curiousleo Thanks for the excellent summary! Another consideration is saving and restoring of generator state. It could be important for long-running simulations for creation of snapshot which allows to continue from latest snapshot instead of replaying hour and possibly days of compute. Single AFAIR crypto RNGs allow to add additional entropy midcalculation but I'm not sure about this. |
I think here is an example API that seems to allow such reseeding: https://hackage.haskell.org/package/crypto-api-0.13.3/docs/Crypto-Random.html#v:reseed I don't think we need to create the most general API though. We need to be careful with feature creep. I'd hate it to mess up our efforts. Please guys, correct me if I am wrong, but in my mind the goal for |
Totally agree. In fact I'm somewhat wary of CPRNGs. Requirements are just so different. So we should probably leave these to crypto libraries. They will hopefully know what they're doing I just more comfortable with "we decided to leave this thing out" than "No one thought about this" |
Rather, the Thus, it would not be within the scope of this package to "reseed" an existing PRNG (that is, initialize it in the same way as initializing a new PRNG) or to add "entropy" to an RNG to increase its "randomness", and it may or may not be within its scope to save and restore a PRNG's state, which is a very complicated matter because restoring the state this way requires knowing the exact PRNG algorithm and keeping it unchanged. To be clear, for the use case of initializing a new (noncryptographic) PRNG "without specifying a seed at all" (use case 1 in the opening post), it's appropriate to generate a seed using a cryptographic RNG (such as one provided by the operating system) or with an entropy source. |
@Shimuuar wrote:
We already have Lines 208 to 209 in 656ef17
Did you mean that there should be It's good you mention this though:
This is exactly what O'Neill's |
We've stepped away from the idea of a default split implementation for now. Closing. |
Initialising PRNGs correctly is hard. For casual users in particular, it should be easy to Do The Right Thing.
Typical things users should be able to do and the common solutions are:
Solution: draw non-deterministic randomness from a system source to create a good seed
Solution: mix user-provided seed deterministically to turn low-entropy seed (e.g. with lots of zero bytes) into high-entropy seed
Solution: provide a seed sequence with a
spawn
function, see belowPrior art
Some RNG APIs have a separate "seeding" layer that takes care of this, see #13.
C++
randutils
/seed_seq
seed_seq
idea from the C++ standard library; this is a kind of low level PRNG that can only fill arrays of 32-bit numbers with random values, nothing else - its only purpose is to seed other (more convenient and faster) PRNGs.numpy
SeedSequence
SeedSequence
without parameters uses a system randomness source to seed itSeedSequence.spawn
initialises a number of newSeedSequence
instances. The unique spawn key is hashed into the sequence's entropySeedSequence.generate_state
returns an array of random numbers for seedingSeedSequence
from which they draw their seedmwc-random
create
andinitialize
combine a user-provided seed with a fixed default seed - this is a form of deterministic mixingwithSystemRandom
andcreateSystemRandom
seed the PRNG with system randomnesssplitmix
seedSMGen
uses the given seeds without mixingmkSMGen
mixes the givenWord64
before initialising theSMGen
initSMGen
uses system timenewSMGen
splits an new generator off from the global generatorrandom
right nowmkStdGen
(public) throws away all but the lowest 32 bits of the givenInt
and then creates theStdGen
viamkStdGen32
.createStdGen
(private) does the same, but takes anInteger
.split
is used to create new RNG instancesDesign considerations
Mixing user seeds
random
as it is designed right now provides no method that deterministically mixes the user seed. This is easily fixed e.g. by calling out to the relevantsplitmix
function.Spawning and splitting
In the
random
API,split
is part of theRandomGen
class. This means that there is an assumption that each RNG must be split in its own way.This is in contrast to the C++ and numpy APIs, where there is a source of seeds separate from any particular RNG implementation (
seed_seq
andSeedSequence
, respectively) which is used to seed all RNGs.For concreteness' sake, what I have in mind is something along those lines:
This is a simplified translation of the numpy / C++ approach. In the background, it would use something like this
initialize
function inRandomGen
to create RNGs.Pros of such an API with a separate, opaque source of seeds:
systemSeed
andfixedSeed
above)Seed
is meant to be used for seeding, so its design constraints are different: it must create reasonable seeds for any RNG, but it can be slower than other RNGs because in the expected use case, only a small number of seeds are required compared to the overall number of random values generated by RNGs over the course of a programCons:
Implementation notes
entropy
could be used to get entropy from the system.The PRNG behind
Seed
could berandutils
, which is also used by numpy: it passes statistical tests and is now widely usedThoughts?
Edit: renamed
Seed
toSeedSeq
to avoid clashes with the existing definition ofSeed
insideMonadRandom
.The text was updated successfully, but these errors were encountered: