Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable switching the default RNG (fix #23199) #23205

Closed
wants to merge 1 commit into from

Conversation

rfourquet
Copy link
Member

@rfourquet rfourquet commented Aug 10, 2017

Thanks to #265 fixed, we can use globalRNG() instead of GLOBAL_RNG, and a user can redefine this function to return a custom RNG. Even if the multi-threading story is not clarified for the global RNG, at least it doesn't seem to me to make things worst in this respect. Moreover, it's a good way for library author to not make assumptions about the default RNG.

EDIT: the first version of this PR was simply defining globalRNG() = GLOBAL_RNG, and changing GLOBAL_RNG to globalRNG() everywhere. Following the discussion, which pointed at the bad API of redefining a Base function (via Base.Random.globalRNG() = myrng), it is updated as follows: it's possible to change the type of the default RNG, which can be got via defaultRNG(), using the srand function: srand(RNG_Type, [seed]) is the same as srand(seed) if defaultRNG() is of type RNG_Type; otherwise initialize with seed a new RNG of the specified type and make it the new default RNG.

@rfourquet rfourquet added needs news A NEWS entry is required for this change randomness Random number generation and the Random stdlib labels Aug 10, 2017
@yuyichao
Copy link
Contributor

While this might be a good attempt, the fix of #265 doesn't mean you can override base (or other packages') functions and expect everything (especially precompilation) to work correctly or efficiently.

@rfourquet
Copy link
Member Author

Yes right, I forgot to mention my uncertainty about the validity of this PR. Also, I meant to check the perf: @nanosoldier runbenchmarks("random", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan

@rfourquet
Copy link
Member Author

So I don't understand if this solution makes unreasonable assumptions that everything will work. What wrong happens if a user overwrites with Base.Random.globalRNG() = MyRNG()? Merge or close?

@StefanKarpinski
Copy link
Member

I'm also unclear on what's wrong with this.

base/random.jl Outdated
@@ -17,7 +17,7 @@ export srand,
randperm, randperm!,
randcycle, randcycle!,
AbstractRNG, MersenneTwister, RandomDevice,
GLOBAL_RNG, randjump
GLOBAL_RNG, globalRNG, randjump
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably call it globalrng, otherwise the capital RNG makes it look like camel case.

@tkelman
Copy link
Contributor

tkelman commented Aug 19, 2017

Is the concern that precompiled package files could get used without having references to the global RNG invalidated if you overwrite this?

@rfourquet
Copy link
Member Author

Bump; are there some complications which make it difficult to decide whether this is correct or not?

@StefanKarpinski
Copy link
Member

A response from someone saying that there are problems with this would be helpful – @vtjnash, @yuyichao

@yuyichao
Copy link
Contributor

  1. It'll at least not be efficient since you'll recompile everything that needs random numbers.
  2. It's a pretty bad API since it's basically type piracy.

@StefanKarpinski
Copy link
Member

  1. It'll at least not be efficient since you'll recompile everything that needs random numbers.

Recompiling everything that depends on the global RNG is unavoidable if we want to be able to change the global RNG and have random number generation be fast. This is likely to only happen once in a process, so I think that's perfectly acceptable.

  1. It's a pretty bad API since it's basically type piracy.

As I understand the term, Base cannot commit type piracy by definition, so I'm not sure what you mean by this.

@ViralBShah
Copy link
Member

ViralBShah commented Aug 24, 2017

Not sure why this is type piracy. But, I do feel that we do not want to encourage APIs to change global state. Maybe it is best to pass an extra first argument if someone wants to use a different RNG?

This also ends up changing the RNG (when the user supplies their own) for any other package that has only been tested with the default one, and things could also break in unexpected ways.

@yuyichao
Copy link
Contributor

Recompiling everything that depends on the global RNG is unavoidable if we want to be able to change the global RNG and have random number generation be fast.

Not really. https://github.com/yuyichao/FunctionWrappers.jl #13984 Even with the cfunction overhead the mean of rand() only increases from 4ns to 7ns for rand and it'll be even faster with a more integrated implementation.

As I understand the term, Base cannot commit type piracy by definition, so I'm not sure what you mean by this.

Base cannot, what I said is that the "API" is. The API is redefining a base method, that's by definition the worst kind of type piracy. (It not even a specilization without new types but a overwrite of the old method.)

But, I do feel that we do not want to encourage APIs to change global state.

I agree in general though that discussion should go to #23199. The rand() API already requires global states (and I certainly hope we are not removing that since it's so useful) so at least it's not a whole new global state. That's why my comments are focused on this implementation.

@rfourquet
Copy link
Member Author

rfourquet commented Aug 25, 2017

Changing the global RNG is not something that would be particularly encouraged; but I feel it's good that libraries using implicitly the global RNG don't rely on special features/particularities of MersenneTwister. So if switching the global RNG is enabled, it could uncover unstated assumptions and encourage library writers to make an API where an RNG can be specified.

As for recompiling everything, again it would not be encouraged, but would give a means to people needing this feature, if they are ready to pay this recompile price.
I don't see this kind of "type piracy" a problem here, it can be an exception: Julia makes the rules, and Julia can choose to make "redefining one specific base method" its API. If it's valid technically, I would favor this approach. In any case, this PR in the current state doesn't document that it's possible to change the global RNG, so in itself it's not incorrect.

Edit: I tried the alternative of having const GLOBAL_RNG = Ref{AbstractRNG}(MersenneTwtister()), but it's about one order of magnitude slower for scalar rand.

@StefanKarpinski
Copy link
Member

@yuyichao, you've got to be a bit more constructive and explicit here. Links to some random wrapper function things doesn't really explain what you're thinking or how to solve this problem better. Scalar rand() going from 4 ns to 7 ns is not really acceptable performance – that's nearly a 2x slowdown, and unlike this solution, it's not a one-time compilation cost, it's a cost that every user of random numbers pays all the time, no matter if they change the default RNG or not. That's not an acceptable cost. Recompiling a ton of code once once at the start of a processes that wants to change the global RNG is an acceptable cost.

You've also shifted the goal posts / objections here. You started out claiming that this wouldn't be efficient or correct, but you refuse to provide details to back that claim up. Instead you've shifted to an argument that you: don't like the compilation this causes, and think it's a bad API. I agree with the API (see below for an alternative proposal), but this implementation does seem to work and have the desired performance characteristics, so for any claim to the contrary, the burden of proof is on you. If you think this is broken, why is it broken? If you think there's a better way to do this with equally good performance, what is it? Provide details, not just some vague links and claims.

Regarding API, @yuyichao is quite right that having people redefine the Base.Random.globalRNG() method is a bad interface and dangerously liberal. Something more constrained would be better, possibly like this:

# in Base.Random
global_rng_type() = MersenneTwister

function set_global_rng_type(T::Type{<:AbstractRNG})
    if T !== global_rng_type()
        global global_rng_type() = T
    end
    return T
end

get_global_rng_type() = global_rng_type()::Type{<:AbstractRNG}

This hides the method redefinition as the mechanism for accomplishing this, which leaves us free to implement this in a better way in the future if we come up with one.

We should consider requiring an RNG type when the seed is set so that programs that set a seed will continue to produce repeatable random sequences even if Base changes the default RNG type. If we don't require providing an RNG type when seeding, we should at least document that seeding without providing an RNG type will not be future proof.

@vtjnash
Copy link
Member

vtjnash commented Aug 25, 2017

    global global_rng_type() = T

This should be a syntax error, since we don't currently generate correct code for it. Jarrett's new validation pass detects it, but currently not lowering.

This is a bad API, since it's really confusing. You if you call set_global_rng(RNG); rand(), you don't get the answer that you want. It should use global variables for global data:

global RNGType = MersenneTwister
function set_global_rng_type(T)
    (T <: AbstractRNG) || throw(ArgumentError("invalid RNG type"))
    global RNGType = T
    nothing
end

@yuyichao
Copy link
Contributor

Links to some random wrapper function things doesn't really explain what you're thinking or how to solve this problem better.

The wrapper function things is not random and I even linked the issue it is a prototype of in case anyone didn't read the README. The frequent exposure on discourse made me think it doesn't need too much explaination for what it is doing but I guess that could be wrong. In anycase though, the issue explicitly says that this is for callback registration, which is exactly what this is about.

Scalar rand() going from 4 ns to 7 ns is not really acceptable performance – that's nearly a 2x slowdown

Well, as I said, the cost is basically from the current implementation (cfunction), a more integrated one would be much better. And the point is that it's already much faster compare to a type unstable version proposed otherwise.

You've also shifted the goal posts / objections here. You started out claiming that this wouldn't be efficient or correct, but you refuse to provide details to back that claim up.
this implementation does seem to work and have the desired performance characteristics, so for any claim to the contrary, the burden of proof is on you.

I did not shift away from that though I did focus more on the efficiency.
The efficiency is just about recompilation which I hope doesn't need extra proof. The fact that this is a bad API because it encourage multiple modules to redefine a function in a conflicting way is why I was worrying about correctness. Though I think that is just a little bit hard to define what is the correct behavior but the behavior could still be consistent in some way.

If you think this is broken, why is it broken?

An API that trashes precompilation is pretty broken.

If you think there's a better way to do this with equally good performance, what is it?
Provide details, not just some vague links and claims.

So I guess the point I was missing is that I think a function registering API with enough integrated support will be the way to provide this kind of API. And I don't think we are in a hurry to provide this API especially when the need for it is still questionable.

function set_global_rng_type(T::Type{<:AbstractRNG})
    if T !== global_rng_type()
        global global_rng_type() = T
    end
    return T
end

Note that this won't work, you'll need eval and this is fundamentally a toplevel operation so hiding that is arguably a even worse API.

@StefanKarpinski
Copy link
Member

An API that trashes precompilation is pretty broken.

This is what I'm talking about not being explicit – you mentioned precompilation once in a parenthetical. I can't read your mind – I don't know what you mean when you mention something like this in passing. You have to explain it or no one knows what your objection is, even if it's a perfectly valid problem.

@yuyichao
Copy link
Contributor

This is what I'm talking about not being explicit – you mentioned precompilation once in a parenthetical.

Well, but "precompilation" isn't actually the only important part. It's just one example of It'll at least not be efficient since you'll recompile everything that needs random numbers. I never thought that precompilation is more important than "everything". Until I saw this comment I actually don't know that precompilation is viewed to be more important than recompiling everything, including sysimg/precompiled modules/everything else compiled before doing the redefinition.

@StefanKarpinski
Copy link
Member

So I guess the point I was missing is that I think a function registering API with enough integrated support will be the way to provide this kind of API. And I don't think we are in a hurry to provide this API especially when the need for it is still questionable.

Ok, now we're getting somewhere. We do need to hurry, however, because, as I said, the current API for srand doesn't require explicitly passing in the RNG type. That means that we either can't change the default global RNG type ever or if we do so, it will break any code that relies on a particular repeatable sequence of pseudo-random numbers.

If we're really confident that we will be able to support changing the global RNG type but we just don't have the technology in place yet, then we could require passing in the RNG type when setting the seed now but just not support that argument being anything other than MersenneTwister. That would be a silly API if we were never able to change the global RNG, so we kind of need know now for real if we're going to be able to change the global RNG. Even an implementation of that functionality that is horrifically broken and comes with huge warnings but is still usable in some cases is better than nothing.

Note that this won't work, you'll need eval and this is fundamentally a toplevel operation so hiding that is arguably a even worse API.

The point of hiding it behind an API is that we can then change the implementation to something better. If the API is "use eval" then code that does that will continue to do that and be broken forever. How is that better than a good API with a bad implementation that we can improve?

@StefanKarpinski
Copy link
Member

Recompiling everything is not great but it's not the end of the world – if it's documented that changing the global RNG type forces recompilation of a ton of code, then people know they can do it but there's a cost. For some users, that's worth it. If we can eliminate that cost without sacrificing performance of rand() in the future, great. But in the meantime, we need this API badly, even with an imperfect implementation.

@yuyichao
Copy link
Contributor

If we're really confident that we will be able to support changing the global RNG type but we just don't have the technology in place yet, then we could require passing in the RNG type when setting the seed now but just not support that argument being anything other than MersenneTwister.

I'm confident that we can have really low cost callback registration but I don't think that'll have to come in an api like passing a RNG type to srand. In any case though, it will not be a breaking change any more than suddently allowing srand (and whatever global state it sets) to take a different type other than MersenneTwister so I don't see why we need to be in a hurry. Even if it will be passing a different type, we can just add that signature, the addition of the signature will still be non-breaking (other than code that use it can break the assumption of other code, which is basically shared between all approaches).

@yuyichao
Copy link
Contributor

The point of hiding it behind an API is that we can then change the implementation to something better. If the API is "use eval" then code that does that will continue to do that and be broken forever. How is that better than a good API with a bad implementation that we can improve?

Seems that you have implicitly assumed that this is an API that can stay (I'm trying to guess here), in that case, yes, having it here as a place holder can be better. However, I don't think that is the case and even if it is the API suggest that it changes the global state immediately so the behavior (that it does not change the global state immediately) will be pretty surprising to the users.

But in the meantime, we need this API badly, even with an imperfect implementation.

Still, why do we need a bad implementation for an API that can be added later?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Aug 25, 2017

Let me be crystal clear about why we need this API now. Here's the scenario without an srand that takes an RNG type:

  1. We release Julia 1.0 with srand(123) and no ability to set the RNG type.
  2. People write code that uses srand(123) expecting to get the same sequence of pseudorandom numbers every time they run their code.
  3. We release Julia 1.3 or Julia 2.0 with a different default RNG type.
  4. People run that code they wrote for Julia 1.0 and get a different random number sequence.
  5. People are angry because we made a change that broke their code.

If the RNG change is in Julia 2.0, then we could just deprecate srand and force people to be explicit about what RNG they want. But if we change default RNGs (let's say we discover that there's actually something terribly broken about our MersenneTwister setup) in 1.x we cannot deprecate srand(123) so we have no choice but to just keep having a completely broken RNG by default. This is not hypothetical, this has happened over and over again to languages:

I'm trying not to get stuck in the same trap. Instead, I want the process to look like this:

  1. We release Julia 1.0 with srand(MersenneTwister, 123) as a random seeding API.
  2. People are slightly puzzled that they have to pass a seemingly-useless RNG type argument srand but they do it since that's the API.
  3. We release Julia 1.3 with PCG as a default RNG – it's faster, uses less memory and has better statistical properties.
  4. People who don't need to seed get faster, better random numbers.
  5. People who wrote code for Julia 1.0 never notice because we still provide the MersenneTwister RNG type and their code explicitly passes it to srand.

@StefanKarpinski
Copy link
Member

In any case though, it will not be a breaking change any more than suddently allowing srand (and whatever global state it sets) to take a different type other than MersenneTwister so I don't see why we need to be in a hurry.

That is not a breaking change. A breaking change is a change to an API that causes someone's code to stop working altogether or to silently work differently. Allowing something that was previously an error is not a breaking change.

@yuyichao
Copy link
Contributor

yuyichao commented Aug 25, 2017

We release Julia 1.3 with PCG as a default RNG

OK, now that (i.e. changing default RNG in Base instead of providing an API to change the default RNG) is never mentioned anywhere in either this thread or the linked issue.
And still, I don't see how that's less breaking. People that want to access the global RNG (i.e. whatever global state it sets) will still notice that its type changed. As I mentioned earlier it seems that this is also assuming that the srand API will be final, which I think needs proving. The uncertainty about what an user level switchable RNG API should look like is the whole reason that I don't think we should rush this (and in case of confusing, this is a somewhat different concern for why I think the implementation in THIS PR is bad).

@rfourquet
Copy link
Member Author

Building on Stefan's API suggestion, but updated (as the actual RNG object has to be initialized somewhere), here is a suggestion which doesn't seem to suffer any performance penalty, and is in line with Stefan's desire to pass the RNG type to srand:

defaultRNG() = GLOBAL_RNG

function srand(::Type{T}, seed=nothing) where T <: AbstractRNG
    if defaultRNG() isa T
        srand(defaultRNG(), seed)
    else
        @eval let rng = $T($seed)
            global defaultRNG() = rng::$T
        end
    end
end

assuming MyRNG(::Void) = MyRNG(), similarly for srand. Again, I don't really understand whether this is correct technically, but I quite like it API-wise. With a warning that using srand without specifying the type is not future-proof.

As for recompilation: I think that using this feature to change the default RNG type is not to be used in library code, only in user applications or in the REPL. So I don't see different libs fighting against each-other to set the default RNG. Moreover, it should be considered bad style in library code to use the global RNG. The recommanded API would be libfunction(..., rng=defaultRNG()). So what will be recompiled is only the trivial wrapper libfunction(...) = libfunction(..., defaultRNG()), so it should not be expensive.

@rfourquet
Copy link
Member Author

providing an API to change the default RNG) is never mentioned

This is the title of the PR.

the implementation in THIS PR is bad

It would be helpful to distinguish "bad" in terms of technical correctness (i.e. it can crash, or misbehave) vs "bad" in terms of "i don't like that it recompiles everything", the second one being a personal preference.

@rfourquet
Copy link
Member Author

I have no say in statistics, but would it be crazy to do the following instead of freezing the random sequence in tests: don't seed the RNG, but run the test a second time if it fails the first time, and report failure only if it fails the second time?

The many uses of srand in Base's tests, when they apply to the implicit RNG, are broken IMHO. I would be willing to fix them before 1.0. As said by Stefan, if you need reproducibility, you get to specify the seed and the RNG. I don't see the problem of library code offering the convenience of using the default RNG, but having tests only with a specified RNG, at least for the tests which can fail "easily" if the sequence is truly random. And with the possibility to switch the default RNG like in this PR, it's even possible to have reproducible tests with the implicit RNG.

For my taste, the status quo is not good enough for 1.0: I'm now more convinced that when no explicit RNG object is passed, at least we should either disable srand(seed), or enable speciying the RNG type (srand(MersenneTwister, seed)). Tangentially, I think we may call the future-proof RNG MersenneTwister_1 (1 like in version 1.0), so that we can still improve MersenneTwister (I will try to do all the improvements I have in mind before 1.0, but still), but this may belong to another issue.

Regarding seeding without needing reproducible sequences: FWIW, I think it's safer to use the randjump method than initializing independant RNGs with totally random seeds (to not have overlapping sequences).

@yuyichao
Copy link
Contributor

yuyichao commented Aug 28, 2017

don't seed the RNG, but run the test a second time if it fails the first time, and report failure only if it fails the second time?

Well, that won't make it pass 100% of time either.

I don't see the problem of library code offering the convenience of using the default RNG, but having tests only with a specified RNG, at least for the tests which can fail "easily" if the sequence is truly random.

Note that this was mentioned only about a blackbox Base.rand that can't be seeded and I agree this is what library should do and I'm only asking how it can be done (edit: in that context).

And with the possibility to switch the default RNG like in this PR, it's even possible to have reproducible tests with the implicit RNG.
For my taste, the status quo is not good enough for 1.0

I'll not repeat the technical reasons why the implementation here is bad (or the definition of bad).
I will repeat though that I've already said that adding such an API in 1.x can be non-breaking and only if we DON'T do this now.

Tangentially, I think we may call the future-proof RNG MersenneTwister_1 (1 like in version 1.0), so that we can still improve MersenneTwister (I will try to do all the improvements I have in mind before 1.0, but still), but this may belong to another issue.

That renaming sounds fine. Though just name the improved one MersenneTwister_2 before switching the name for 2.0 can also be good enough. After all, we don't want to rename everything *_1 in 1.0 just in case they can have an improvement that subtly breaking.

@rfourquet
Copy link
Member Author

Well, that won't make it pass 100% of time either.

No, but could make the failure rate sufficiently small that it's acceptable (but maybe it's no faster that increasing the running time of one test to have better success rate?). It's no such a good solution as it does not allow to check that if succeeds most of the time, but freezing the sequence makes the test succeed for only, well, one sequence...

I will repeat though that I've already said that adding such an API in 1.x can be non-breaking and only if we DON'T do this now.

I know that you already said it, it doesn't mean I can not have another opinion. The definition of "breaking" is still blury, but if we change the default RNG type, won't people using srand(seed) have their code/test "break" if they rely on a specific sequence? I don't insist on this PR's implementation, but was also saying that I believed disabling srand(seed), or enabling srand(MersenneTwister, seed) (even if no other RNG type can be specified) would help make things less breaking (and "MersenneTwister_n" for a frozen variant).

As for the name, I would prefer to keep "MersenneTwister" for the most up-to-date/efficient implementation.

@yuyichao
Copy link
Contributor

I know that you already said it, it doesn't mean I can not have another opinion.

Well I just mean that you said your reasoning for what was following that part was For my taste, the status quo is not good enough for 1.0. (So ref #23205 (comment))

The definition of "breaking" is still blury,

It's pretty well defined. If you have a whole program and it's behavior changes in a way that you can observe that's breaking. Also, changes in one package breaking another is not breaking even if the change in the first package uses new Base features.

but if we change the default RNG type, won't people using srand(seed) have their code/test "break" if they rely on a specific sequence?

Exactly, that's why I said only the limited version of a unseedable Base.rand can have their RNG swiched without being breaking. (And again, the discussion about that is only about what packages and RNG users should do, not about how to make it non-breaking).

I don't insist on this PR's implementation, but was also saying that I believed disabling srand(seed), or enabling srand(MersenneTwister, seed) (even if no other RNG type can be specified) would help make things less breaking (and "MersenneTwister_n" for a frozen variant).

As long as the Base.rand RNG is still accessible, changing default RNG will be breaking.

It's unclear to me whether you are talking about changes preparing for a 1.x change of default RNG or a user API to change the RNG. This comment is AFAICT the first time you explicitly talk about "we change the default RNG type" and I was assuming all your previous comments are about user API to change the RNG.
I'll repeat that for change of RNG, the limited version is certainly implementable and the discussion is not about what we should do now, rather to finalize the design and talk about how it should be used to satisfy the non-breaking requirement. And for user API to change the RNG we should only have non-method-overwrite based method and it can be added at any point without being breaking when pieces to implement it are ready, possibly post-1.0 .

As for the name, I would prefer to keep "MersenneTwister" for the most up-to-date/efficient implementation.

Changing what that name refer to is breaking. Adding new names isn't.

@rfourquet
Copy link
Member Author

OK, sorry if I was not clear enough: I certainly don't think allowing the user to switch the default RNG is urgent, but that doing something about srand(seed) is necessary before 1.0. It was helpful that you give your definition of "breaking" (I think you already implied it when you referred to "document approach" but I didn't fully grasp it).
So I infer from what you said: to be non-breaking, the default RNG must be non-seedable and, according to

As long as the Base.rand RNG is still accessible, changing default RNG will be breaking.

it must be non-accessible? (I think I misunderstand) Does that mean hiding GLOBAL_RNG, e.g. like it was when we were using the global RNG from the dSFMT library, or using MersenneModule.rand instead?
For me, it's good enough that GLOBAL_RNG is not exported or that it's documented to not have a fixed type. For the "seedable" part: either making the default RNG non-seedable, or being able to seed it but having to specify it's type seems fine for me. I have a more liberal definition of "breaking": documenting that MersenneTwister_1 is future-proof and MersenneTwister will break/improve is enough. It can break code of people insisting on ignoring the warning (but they have another way: use "MersenneTwister_1"), but it makes the code of people wanting the latest version future-proof (with your naming scheme, they have no way to specify that AFAICS).

@yuyichao
Copy link
Contributor

it must be non-accessible?

Correct.

(I think I misunderstand) Does that mean hiding GLOBAL_RNG, e.g. like it was when we were using the global RNG from the dSFMT library, or using MersenneModule.rand instead?

AFAICT from Stefan's comment, there will be no Base.GLOBAL_RNG but there will be a MersenneTwister.GLOBAL_RNG. Base.rand will not use MersenneTwister.GLOBAL_RNG.

For me, it's good enough that GLOBAL_RNG is not exported or that it's documented to not have a fixed type.

It's not good enough if people can't avoid using it. From what I see in base and package code and from the necessity to provide a default RNG for both base and package functions, I don't think that's doable.
OTOH, we should be able to completely hide Base.GLOBAL_RNG for a unseedable Base.rand (closure) and even if we don't, packages can use MersenneTwister.GLOBAL_RNG.

documenting that MersenneTwister_1 is future-proof and MersenneTwister will break/improve is enough. It can break code of people insisting on ignoring the warning (but they have another way: use "MersenneTwister_1"), but it makes the code of people wanting the latest version future-proof (with your naming scheme, they have no way to specify that AFAICS).

As long as you give alternatives for all the usage, that'll be fine. The only important part of unseedable Base.rand is that you still have an accessible global RNG that people can use. Documenting that a function/variable (Base.srand() or Base.GLOBAL_RNG) should not be accessed is fine (that's what internal detail is), documenting that a feature that people actually use (a seedable and accessible default RNG) cannot be accesed is not fine (that's what missing feature is).

@StefanKarpinski
Copy link
Member

Please stop putting nonsense in my mouth that I didn't say – it's confrontational and not constructive. Try to understand what I'm saying instead and why I'm saying it.

But you didn't say explicitly what you think package should do. So I have to guess.

You habitually make the least charitable possible interpretation of what I’ve said. This is infuriating, rude and unconstructive. If you don’t know what I mean, you can ask, politely, assuming that I am not an idiot and have something reasonable in mind. I may have missed some serious problem, but assuming the worst does not help us get anywhere useful – it just makes discussing things with you unpleasant and unproductive.

I'll repeat that I think packages that uses RNG should by default use a default RNG and not requiring the user to pass it in. That can either be implicitly done by

  1. Using Base.rand, in which case it can't have reproducible tests.
  2. Using MersenneTwister.rand, in which case it can't see the switch.
  3. Using a global RNG as default, in which case it needs a global RNG, which cannot be the one used by Base.rand so they won't see the switch.

I've also given a example in #23205 (comment) just for Base and I don't see any explaination of which way we should take.

As @rfourquet and I have both explained, packages should be written to accept an explicit RNG. If the user doesn't provide one, they can use the default RNG. When a package tests something about itself that requires reproducible sequences, the test should seed a specific RNG and use it.

You’ve stated that you don't feel that packages should work like this, which is just, like, your opinion, man. Since you haven’t explained any alternative that is future-proof, I have a hard time seeing your point of view here. Yes, you keep linking to some issue about function wrappers but NO ONE ELSE UNDERSTANDS WHAT YOU MEAN BY THAT OR HOW IT RELATES TO CHANGING THE DEFAULT RNG. Please, spell it out since we do not know what you are thinking. You keep linking to #13984. That is a single-comment issue that you wrote in 2015 talking about C++11 and cfunction, boxing and non-leaf types. Nowhere does it mention RNGs or how to change them, and you have never explained here or anywhere else that I'm aware of how it relates to chaning the default RNG – you just keep linking to it as if that was a sufficient explanation.

It would also be helpful if, instead of shooting everything down and not providing any alternatives, you tried to be constructive and describe a scheme which actually works and is future proof.

I did. #23205 (comment) and #23205 (comment)

No, that is not an explanation or a plan. It is a link to the same seemingly unrelated issue with no indication of how it relates or how it would help address this issue. I’m sure you see a connection. We can’t read your mind. Use your words – spell it out.

So far, all you say is "no, that can't be done" or "no, that's broken".

So no I did give what I think is the alternative for what this PR is supposed to do. For what you want, which is not what this PR is about, i.e. switching Base RNG, I did say that a very limited version as you described is possible and I don't have objection for doing it. Every questions since have been about whether that's worth doing.

None of those are answers. They are just comments that I’ve already read (which don’t answer the question) and links to seemingly unrelated issues. If you have a way to change the global RNG, please tell us what it is. Again, you keep linking to the same issue with no explanation of how it relates. NO ONE KNOWS HOW #13984 helps address this issue.

I've been asking about what packages and base functions need to do in different ways trying to be as explicitly as possible and including more cases as I notice them. So far I've not got an answer yet.

I have no idea what this pair of sentences means.

Am I to take it that you believe that how things are now is fine and it's perfectly acceptable and we don't need to change anything?

There are multiple (at least two) different issues being discussed here.

  1. (What this PR is about.) Give the user an API to change global RNG.

I said I think such an API is not as bad as all other API's that changes global states and would like to discuss if we want such an API in the actual issue.. In additional to saying we shouldn't use a method overwrite approach given here, I gave what I think can be the right implementation (FunctionWrapper related comment linked above) and I also said that not doing anything about it for now will make it a feature addition instead of a breaking change. I'll also be happy to discuss about how #13984 can be improved for this but that can also go to #13984 and no one seems to be interested in that in this thread (which is ok since we don't have to finalize non-breaking feature additions right now).

Again, linking to #13984 is not an explanation or a plan. I have no idea what you mean by linking to that issue. I agree that not having an API for changing the global RNG is the more conservative approach if this was the choice we had to make in isolation. However, it’s not...

  1. Switching default Base RNG.

As mentioned above, since you've already started to talk about it in this thread, I think it makes sense to talk about what packages should actually do and if that can actually be seen by the majority of users (and especially the majority of packages).

We need to figure out if we will be able to switch the default RNG or not. You seem to have something in mind involving #13984, but we don’t know what because you won’t spell it out, you just keep linking back to comments where you didn’t explain it, which link back to a seemingly irrelevant issue.

Moreover, at this point, the better approach seems to be (which you’ve agreed to) to have separate srand and rand functions for different RNGs and not allow seeding the default RNG. If we’re going to make that work and solve the same problem as changing the default RNG would solve, then we need to make breaking changes to how Base exposes its RNG before 1.0. That is why I am not ok with just doing nothing at this point.

@StefanKarpinski
Copy link
Member

So are you saying that most packages that uses RNG shouldn't need reproducible testing?

And the logic behind this is that the following are true from what you've said AFAICT.

  1. Base.rand is the only way to get a random number and see the default switch (AFAICT from enable switching the default RNG (fix #23199) #23205 (comment) and comments before that)
  2. Base.rand can't be seeded enable switching the default RNG (fix #23199) #23205 (comment).
  3. Most packages will see the default switch. enable switching the default RNG (fix #23199) #23205 (comment)

1 + 2 means that any package that sees the default switch can't be seeded so they won't have reproducible tests. Add 3 to that, most packages won't have reproducible tests.

No, this is the wrong conclusion altogether, which you could learn by asking me instead of assuming the worst. Suppose a package has a function frizzle which uses random numbers. Then it should have a frizzle(rng::AbstractRNG, args...) method which takes an explicit RNG argument. It uses that RNG argument to do whatever it does (presumably frizzle the heck out of its arguments). It can also have a frizzle(args...) = frizzle(Base.DEFAULT_RNG, args...) method which uses Base's default RNG. For tests where the behavior of frizzle depends on a particular sequence of values, it should be tested something like this:

let rng = MersenneTwister1.RNG(seed)
    @test frizzle(rng, args...) == value
end

This will keep working no matter what the default RNG is. And yes, the TESTS will not see a change to the default RNG, but that's fine – they're tests and need to be updated if the default RNG changes anyway. Actual usage of frizzle in situations where reproducible sequences aren't required will not supply an RNG and will therefore end up using Base.DEFAULT_RNG, which will change to the new RNG type – but also doesn't accept a seed so cannot be expected to be reproducible.

@yuyichao
Copy link
Contributor

You habitually make the least charitable possible interpretation of what I’ve said. This is infuriating, rude and unconstructive. If you don’t know what I mean, you can ask, politely, assuming that I am not an idiot and have something reasonable in mind. I may have missed some serious problem, but assuming the worst does not help us get anywhere useful – it just makes discussing things with you unpleasant and unproductive.

I gave the more regress logic in #23205 (comment) so that's not assuming the worst but the only thing I can infer. I also don't think that's unreasonable or the thought of an idiot. I'm happy to be proven if the tests using srand I've seen can be done in a different way. From @staticfloat 's explaination later, it doesn't seem possible, but that was certainly a mistake/wrong feeling that anyone can have.

As @rfourquet and I have both explained, packages should be written to accept an explicit RNG. If the user doesn't provide one, they can use the default RNG. When a package tests something about itself that requires reproducible sequences, the test should seed a specific RNG and use it.

Yes, and I've clearly stated in the comment you quote, my only question is how do you use a inaccessible RNG as default.

You’ve stated that you don't feel that packages should work like this, which is just, like, your opinion, man.

I didn't.

Since you haven’t explained any alternative that is future-proof, I have a hard time seeing your point of view here.

There are two issues and I've clearly stated my view on both.

Yes, you keep linking to some issue about function wrappers but NO ONE ELSE UNDERSTANDS WHAT YOU MEAN BY THAT OR HOW IT RELATES TO CHANGING THE DEFAULT RNG. Please, spell it out since we do not know what you are thinking. You keep linking to #13984.
Nowhere does it mention RNGs or how to change them, and you have never explained here or anywhere else that I'm aware of how it relates to chaning the default RNG – you just keep linking to it as if that was a sufficient explanation.

I didn't keep linking to it. I linked to it exactly once before the reminder in the last comment. I also did explained why it's related: the issue explicitly says that this is for callback registration, which is exactly what this is about. where the "this" means this PR, not switching default Base RNG ourselves. No one followed up on that so I assume it was just ignored because either the link is clear or that we don't want to discuss about post-1.0 feature addition. If that's not the case and if you don’t know what I mean, you can ask, politely.
I'm actually not sure what else more I can explain about it though. Is it that it's not clear what this PR want to do is a callback registration? Or is it not clear that my issue is about callback registration? Or maybe some other logic I'm missing in that comment.

Moreover, at this point, the better approach seems to be (which you’ve agreed to) to have separate srand and rand functions for different RNGs and not allow seeding the default RNG. If we’re going to make that work and solve the same problem as changing the default RNG would solve, then we need to make breaking changes to how Base exposes its RNG before 1.0. That is why I am not ok with just doing nothing at this point.

I agree, and that's why I've repeated said in my comment that the only problem is what you want packages to do. i.e. "how do you use a inaccessible RNG as default". I've given all what I can think of what a package can write their code but none of them covers this.

@yuyichao
Copy link
Contributor

This will keep working no matter what the default RNG is. And yes, the TESTS will not see a change to the default RNG, but that's fine – they're tests and need to be updated if the default RNG changes anyway. Actual usage of frizzle in situations where reproducible sequences aren't required will not supply an RNG and will therefore end up using Base.DEFAULT_RNG, which will change to the new RNG type – but also doesn't accept a seed so cannot be expected to be reproducible.

This means exposing the Base.rand RNG, which you have clearly clearly stated that shouldn't be used #23205 (comment) . I don't think we can call a change of type of a variable that's allowed to be publically accessed non-breaking.

@StefanKarpinski
Copy link
Member

Yes, and I've clearly stated in the comment you quote, my only question is how do you use a inaccessible RNG as default.

Ah, this is a good point to ask for clarification on. The Base.DEFAULT_RNG value would be accessible, but it would be an opaque RNG wrapper around some seeded RNG type such as MersenneTwister (or even an unseeded RNG type like RandomDevice), but without any srand method. So Base.DEFAULT_RNG is accessible but you cannot tell through any public API what default RNG it wraps. This is completely orthogonal to being able to change the default RNG, of course, so should be discussed further on a new issue. When I've said "the default RNG should not be visible" I mean that the underlying implementation type must not be visible. Obviously the default RNG has to be callable and usable some way or it may as well not exist.

@yuyichao
Copy link
Contributor

The Base.DEFAULT_RNG value would be accessible, but it would be an opaque RNG wrapper around some seeded RNG type such as MersenneTwister (or even an unseeded RNG type like RandomDevice), but without any srand method.

OK, I see. This is very non-obvious. In that case I think we can even just call it nothing and have the actual RNG be truely inaccessible with closures. And as I said, since my question is finally answered, I do think this is a good aproach to be able to switch default RNG from within base in the future and I have no objection about it.

This is completely orthogonal to being able to change the default RNG, of course, so should be discussed further on a new issue.

Agree, and I'm still open for discussion about the final API/implementation for what THIS PR want to do either here or in #13984. It shouldn't be necessary at this point since it can be 1.x but if we do want to discuss, I'll want to know where the confusion is, i.e.

I'm actually not sure what else more I can explain about it though. Is it that it's not clear what this PR want to do is a callback registration? Or is it not clear that my issue is about callback registration? Or maybe some other logic I'm missing in that comment.

@StefanKarpinski
Copy link
Member

I also did explained why it's related: "the issue explicitly says that this is for callback registration, which is exactly what this is about." where the "this" means this PR, not switching default Base RNG ourselves. No one followed up on that so I assume it was just ignored because either the link is clear or that we don't want to discuss about post-1.0 feature addition. If that's not the case and if you don’t know what I mean, you can ask, politely.
I'm actually not sure what else more I can explain about it though. Is it that it's not clear what this PR want to do is a callback registration? Or is it not clear that my issue is about callback registration? Or maybe some other logic I'm missing in that comment.

Thank you, this is the clearest explanation of what you're thinking so far. So let me see if I can accurately summarize your position:

  1. This issue (changing the default RNG) is primarily about callback registration. [RFC] Type stable wrapper for arbitrary callables #13984 will be addressed with very efficient callback registration which you believe will allow changing the default RNG via a callback mechanism in a way that will allow the default RNG to be provided as a callback with essentially zero per-call cost.

  2. The rest of the way users interact with RNGs in Base Julia is fine, so we should just leave it alone.

I'm somewhat skeptical that (1) can actually be done without large amounts of recompilation which are essentially equivalent to what this issue does. But you're brilliant at these technical performance issues and I've seen equally amazing results in the past, so I do hope that you're right and we can have such incredibly low-cost callbacks – that would be wonderful.

Regarding (2), I no longer believe that's the case and think that @rfourquet is on the same page as me. We can come up with a design here and open a different issue about it since, as you've said, this issue is about changing the default global RNG, not changing the way people interact with RNGs.

@vtjnash
Copy link
Member

vtjnash commented Aug 28, 2017

but if we change the default RNG type, won't people using srand(seed) have their code/test "break" if they rely on a specific sequence?

Exactly, that's why I said only the limited version of a unseedable Base.rand can have their RNG swiched without being breaking. (And again, the discussion about that is only about what packages and RNG users should do, not about how to make it non-breaking).

The upshot of staticfloat's analysis is that you must seed the value predictably, but that you can change the RNG and/or the meaning of the seed at any time without affecting a valid test for statistical properties. The purpose of the call is to make failures reproducible. Preventing failures a priori (by testing for the specific random number sequence) would generally be poor test design – or a least unnecessary effort from the test author.

For stabilizing a 1.0 API, it sounds like the goal is to make it possible for us to change the default, while making it easy for packages to declare a dependency on a particular implementation. For that, I propose making the following refinements to the current API:

  • Move MersenneTwister-specific logic from Base.Random to Base.dSFMT (including GLOBAL_RNG, and unexport it)
  • Define the AbstractRNG API in Base.Random to include:

export srand,
rand, rand!,
randn, randn!,
randexp, randexp!,
bitrand,
randstring,
randsubseq, randsubseq!,
shuffle, shuffle!,
randperm, randperm!,
randcycle, randcycle!,
AbstractRNG, MersenneTwister, RandomDevice,
GLOBAL_RNG, randjump

  • Make the API of Base.dSFMT the same as Base.Random, such that dSFMT.srand !== Random.srand
  • Define const DEFAULT_RNG = Base.dSFMT::Module
  • Define default methods such as Random.srand(seed::Union{Integer, Vector{UInt32}} = make_seed()) = DEFAULT_RNG.srand(seed), and similarly for rand and other functions, which dispatch to a particular module instead of an object
  • OPTIONAL: Define global_rng() = DEFAULT_RNG.global_rng(), and sed apply s/GLOBAL_RNG/global_rng()/g

@yuyichao
Copy link
Contributor

Thank you, this is the clearest explanation of what you're thinking so far. So let me see if I can accurately summarize your position:

Yes, I think it's basically the last part of #23205 (comment), which is roughly my first comment that replies to both issues at the same time.

The rest of the way users interact with RNGs in Base Julia is fine, so we should just leave it alone.

Fine in that we can add new features without breaking it. With a completely separated Base.rand and it's RNG without a corresponding Base.srand I do think that Base.srand(::Type{...}, ...) could be a good API for this PR. However, that'll be even more strictly a feature addition so we don't have to add it now. Also, having it, afaict, will mean that we need to support Base.srand(MersenneTwister, ...) to seed the default RNG after switching Base default which require the API in this PR to be ready before that. I'd prefer to have the development of the two be more decoupled if at all necessary.....

@yuyichao
Copy link
Contributor

Slightly different proposal,

  • Move MersenneTwister-specific logic from Base.Random to Base.dSFMT (including GLOBAL_RNG, and make it an accessible global MersenneTwister RNG)
  • Define the AbstractRNG API in Base.Random to include:

export srand,
rand, rand!,
randn, randn!,
randexp, randexp!,
bitrand,
randstring,
randsubseq, randsubseq!,
shuffle, shuffle!,
randperm, randperm!,
randcycle, randcycle!,
AbstractRNG, MersenneTwister, RandomDevice,
GLOBAL_RNG, randjump

  • Make the API of Base.dSFMT the same as Base.Random, such that dSFMT.srand !== Random.srand
  • Create an inaccessible global RNG GLOBAL_RNG in Base.Random with closure.
  • Define default methods such as Random.srand(rng::Void, seed::Union{Integer, Vector{UInt32}} = make_seed()) = srand(GLOBAL_RNG, seed), and similarly for rand and other functions, which dispatch to a particular module instead of an object
  • sed apply s/GLOBAL_RNG/nothing/g

@yuyichao
Copy link
Contributor

Note that the main difference is Define const DEFAULT_RNG = Base.dSFMT::Module since that'll make the Base.rand seedable with MersenneTwister.srand().

@rfourquet
Copy link
Member Author

I'm sorry to not have yet fully understood the module approach: would one of you mind clarifying what it brings compared to the following approach:

  • freeze MersenneTwister into MersenneTwister_1 like I suggested above (not a type alias, a real new type)
  • delete Base.Random.GLOBAL_RNG in favor of defaultRNG() to be hopefully more future-proof w.r.t enabling users to change the default RNG (like this PR or FunctionWrapper):
let DEFAULT_RNG = MersenneTwister_1() # will change to PCG() or MersenneTwister_2 in Julia 2.0 (or even 1.x, i'm not clear on that)
   global defaultRNG() = DEFAULT_RNG
end
  • maybe disable srand(seed=make_seed());
  • maybe enable srand(::Type{MersenneTwister_1}, seed=make_seed()) (or srand(::Void, seed=make_seed())) to re-seed DEFAULT_RNG
  • maybe prevent a direct call like srand(defaultRNG()) for example by checking the object_id of the arg in srand

The "maybe's" above is because I'm not clear what are all the requirements implied by the module approach(es).

make it possible for us to change the default, while making it easy for packages to declare a dependency on a particular implementation.

With the module approach, most users will be able to still use rand with a default RNG (which can change in version 2), but library author can depend on a specific future-proof implementation of a default RNG by using e.g. dSFMT.rand(Int) instead of rand(Int); but why not use rand(::MersenneTwister_1, ::Type{Int}) then? (for example by using a const PACKAGE_DEFAULT_RNG = MersenneTwister_1()).

My questions are sincere, probably I'm missing something possibly obvious.

Define default methods such as Random.srand(rng::Void, seed)

Does that rely on the recent work which make Union{AbstractRNG, Void} efficient? Why not using instead of nothing a dummy type struct DefaultRNG <: AbstractRNG end and passing DefaultRNG() around; could make it easier for writing signatures to just have rand(rng::AbstractRNG=DefaultRNG(), ...).

@yuyichao
Copy link
Contributor

yuyichao commented Aug 28, 2017

Note that the splitting module thing is only about switch default RNG in base, not about letting user do it.

freeze MersenneTwister into MersenneTwister_1 like I suggested above (not a type alias, a real new type)

This is fine, as long as you don't change what you call MersenneTwister in 1.x. It's also unrelated.

delete Base.Random.GLOBAL_RNG in favor of defaultRNG() to be hopefully more future-proof w.r.t enabling users to change the default RNG (like this PR or FunctionWrapper):

That'll not affect whether changing the default RNG is breaking or not. As long as it is still explicitly accessible, changing it's type is breaking. The important part, and the part I've been asking about, is that the global RNG will only be implicitly accessible via basically a token/placeholder/wrapper. Letting defaultRNG() return the real/bare RNG breaks it.
Incidently, having only implicity access to the default RNG is also more likely to work with FunctionWrappers since that makes the only way to access it functions with well defined signatures and can be wrapped much more easily so adding such an API (edit: i.e. defaultRNG()) will likely not help (and hurt) future/final API/implementation.

With the module approach, most users will be able to still use rand with a default RNG (which can change in version 2), but library author can depend on a specific future-proof implementation of a default RNG by using e.g. dSFMT.rand(Int) instead of rand(Int); but why not use rand(::MersenneTwister_1, ::Type{Int}) then? (for example by using a const PACKAGE_DEFAULT_RNG = MersenneTwister_1()).

That's the whole point I was trying to ask, if this is the case, then I don't think packages will be able to see the switch at all. Now that I've finally got the answer for the missing piece #23205 (comment) what you state here is not what package should do. Packages that want to use any good random source in general but want reliable testing are recommended to use Base.rand with Base.DEFAULT_RNG (or nothing from what I propose) as the default RNG argument. Using const PACKAGE_DEFAULT_RNG = MersenneTwister_1() (or MersenneTwister.GLOBAL_RNG) will mean that the package won't see the default switch, which is perfectly fine for non-breaking but will mean that the default switch would be next to useless.

So the key idea is that we'll have an global RNG that you cannot directly access. You can only use a number of functions on them. You can either call these functions without an RNG arguments, in which case you just get the result with a opaque random source, or you can pass in an opaque object as the RNG to achieve the same result. Since they take this opaque object as RNG they'll also take a real RNG as RNG and so this enables writing library code that can be switch between the default global source or a custom fixed-typed source without explicit access to the real global RNG (since it's wrapped or represented as a token)

@yuyichao
Copy link
Contributor

So the key idea is that we'll have an global RNG that you cannot directly access. You can only use a number of functions on them. You can either call these functions without an RNG arguments, in which case you just get the result with a opaque random source, or you can pass in an opaque object as the RNG to achieve the same result. Since they take this opaque object as RNG they'll also take a real RNG as RNG and so this enables writing library code that can be switch between the default global source or a custom fixed-typed source without explicit access to the real global RNG (since it's wrapped or represented as a token)

Also note that we don't have to have a module separation for these to work. The module separation only seems to be necessary for having a seedable global RNG of specific type (i.e. MersenneTwister.GLOBAL_RNG). With the missing piece about what package should do sorted out, I am ok with not having that at all so basically we will only have srand that takes an RNG object (not type) and only rand with an real (not the token or wrapper for the default RNG) RNG object can be seeded.

@rfourquet
Copy link
Member Author

rfourquet commented Aug 28, 2017

Thanks! Yes I found difficult to reconcile the module approach together with the hidden/wrapped/inaccessible global RNG.
So the separate modules approach is only necessary if we want to be able to seed the global RNG (do you confirm Jameson?) But we seem to agree that if a package wants to test with reproducible sequences, then this package uses an explicit RNG object. So no need to use separate modules.

If it facilitates future implementations, it seems fine to use a token approach; still I would call this token defaultRNG() (with the dummy type from the last line of my previous post) which seems to be an API easier to evolve (e.g. it also allows this PR's implementation in the future). I think I would still favor a wrapped RNG which doesn't support seeding over the closure implementation: it's an implementation detail, so I don't see the need to be protect to that extent the user from shooting in the foot. I guess being able to access the real RNG by unwrapping the default RNG can be potentially helpful in debugging (even for base developpers), or for writing tests in "test/random.jl". It could even have underscores in its name to act as a warning (alternatively, having Base.Random.unsafe_get_global_RNG() defined within the closure would achieve the same).

but want reliable testing are recommended to use Base.rand with Base.DEFAULT_RNG

just to be certain, by "reliable" you don't mean reproducible but rather statistically solid sequences, right?

we will only have srand that takes an RNG object (not type) and only rand with an real (not the token or wrapper for the default RNG) RNG object can be seeded.

The 2 part of this sentence seem to mean the same thing no? (the second part is not clear as rand can not be seeded as a function, but I guess you mean "for getting a reproducible sequence using rand you get to pass it a real RNG object that is seeded).

I'm not yet 100% sure that we don't want a way at all to seed the global RNG; in development, I very often use srand(seed) at the REPL, as a shortcut, but I would not write that in a package. So it could be good to think of a way to achieve that which is future-proof API-wise (edit: e.g. what was proposed: srand(::Type{MersenneTwister}, seed)). (In the meantime, I could have srand_global(seed) = srand(unsafe_get_global_RNG(), seed) in my "juliarc.jl".

@yuyichao
Copy link
Contributor

So the separate modules approach is only necessary if we want to be able to seed the global RNG (do you confirm Jameson?)

if we want to have a seedable global RNG, i.e. there can be multiple global RNG.

So no need to use separate modules.

Yes, separate modules makes it easy to add (another) seedable global RNG. Unclear if it's necessary. This is basically the original proposal for getting specific RNG by importing certain symbols, rather than providing a specific RNG as input.
Of course it's also a way to organize things but I don't really thing that's much better.

still I would call this token defaultRNG() (with the dummy type from the last line of my previous post) which seems to be an API easier to evolve (e.g. it also allows this PR's implementation in the future).

I don't think we want to allow a method overwrite based approach. It is a technically much worse approach.

I think I would still favor a wrapped RNG which doesn't support seeding over the closure implementation: it's an implementation detail, so I don't see the need to be protect to that extent the user from shooting in the foot. I guess being able to access the real RNG by unwrapping the default RNG can be potentially helpful in debugging (even for base developpers), or for writing tests in "test/random.jl". It could even have underscores in its name to act as a warning (alternatively, having Base.Random.unsafe_get_global_RNG() defined within the closure would achieve the same).

We can also use a token without closure too.

just to be certain, by "reliable" you don't mean reproducible but rather statistically solid sequences, right?

I mean reproducible sequence.

we will only have srand that takes an RNG object (not type) and only rand with an real (not the token or wrapper for the default RNG) RNG object can be seeded.

The 2 part of this sentence seem to mean the same thing no? (the second part is not clear as rand can not be seeded as a function, but I guess you mean "for getting a reproducible sequence using rand you get to pass it a real RNG object that is seeded).

No. By you can seed a rand. Our current rand is seedable with srand. Maybe more explicitly would be "only the RNG used by rand with an real (not the token or wrapper for the default RNG) RNG object as input can be seeded."

I'm not yet 100% sure that we don't want a way at all to seed the global RNG; in development, I very often use srand(seed) at the REPL, as a shortcut, but I would not write that in a package.
(In the meantime, I could have srand_global(seed) = srand(unsafe_get_global_RNG(), seed) in my "juliarc.jl".

Exposing the actual base default RNG for this use sounds fine. As long as alternatives are provided (possibly an additional seedable RNG to cover the user that really just want to keep using the same API, i.e. srand and rand that affects each other without any explicit rng arguments)

So it could be good to think of a way to achieve that which is future-proof API-wise
srand(::Type{MersenneTwister}, seed)

As mentioned above, the only problem I see with having this now is that it couples the implementation of this PR to switching default Base RNG from Base. I'd like to avoid such coupling but I do think it's fine to add this API whenever a correct implementation of what this PR want to do is merged.

@yuyichao
Copy link
Contributor

And about token vs wrapper, a token has fewer indirection which is why I like it slightly more. Not that much a difference though, of course.

@rfourquet
Copy link
Member Author

Packages that want to use any good random source in general but want reliable testing are recommended to use Base.rand with Base.DEFAULT_RNG (or nothing from what I propose) as the default RNG argument.

Then I don't understand: how can they get reproducible/reliable sequences by using Base.DEFAULT_RNG if it's not seedable?

No. By you can seed a rand.

If you want to call it like this, why not, but a way to see it closer to the implementation is that rand-related functions are pure functions of their input, except when using the default RNG, and as such can not be "seeded"; only RNGs are seeded, global/default/implicit or not.
I took "we will only have srand that takes an RNG object (not type)" to mean no "srand" with an implicit RNG, meaning in turn no way to seed the default RNG, hence only explicit RNGs used be rand can be seeded, which is close to "only rand with an real RNG object can be seeded."

@yuyichao
Copy link
Contributor

yuyichao commented Aug 28, 2017

Then I don't understand: how can they get reproducible/reliable sequences by using Base.DEFAULT_RNG if it's not seedable?

You use Base.DEFAULT_RNG "as the default RNG argument." Ref #23205 (comment) for code example.

only RNGs are seeded, global/default/implicit or not.

Sure, so I just mean that implicit/default one (in Base, not MersenneTwister) won't be seedable.

I took "we will only have srand that takes an RNG object (not type)" to mean no "srand" with an implicit RNG, meaning in turn no way to seed the default RNG, hence only explicit RNGs used be rand can be seeded, which is close to "only rand with an real RNG object can be seeded."

In that sense, yes. I'm just describing what the two functions need to do and the two part of the sentence do kind of both means that no srand function can affect Base.rand when it takes no RNG or Base.DEFAULT_RNG as RNG.

@rfourquet
Copy link
Member Author

Packages that want to use any good random source in general but want reliable testing are recommended to use Base.rand with Base.DEFAULT_RNG as the default RNG argument

It's sometimes hard to decipher what you say, so let me rephrase it: "packages have to use an explicit RNG for reproducible sequences, and can use Base.DEFAULT_RNG as an implicit one if reproducibility is not a requirement" (i.e. it's not recommended to use an implicit seedable RNG). In other words, mimic what Base.Random will do, with the pattern foo([rng=Base.DEFAULT_RNG], args...).

@yuyichao
Copy link
Contributor

"packages have to use an explicit RNG for reproducible sequences, and can use Base.DEFAULT_RNG as an implicit one if reproducibility is not a requirement" (i.e. it's not recommended to use an implicit seedable RNG)

User has to pass in an explicit RNG (not Base.DEFAULT_RNG) to the packages for reproducible sequences, and the package can use Base.DEFAULT_RNG as an default argument, which will be used when reproducibility is not a requirement. (It's an alternative to use an implicit seedable RNG. I won't say not recommend since there are still minor cases where the approach above doesn't work)

@rfourquet rfourquet mentioned this pull request Jan 16, 2018
5 tasks
@rfourquet rfourquet closed this May 1, 2020
@rfourquet rfourquet deleted the rf/switch-GLOBAL_RNG branch May 1, 2020 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs news A NEWS entry is required for this change randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants