-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable switching the default RNG (fix #23199) #23205
Conversation
While this might be a good attempt, the fix of #265 doesn't mean you can override base (or other packages') functions and expect everything (especially precompilation) to work correctly or efficiently. |
Yes right, I forgot to mention my uncertainty about the validity of this PR. Also, I meant to check the perf: @nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan |
So I don't understand if this solution makes unreasonable assumptions that everything will work. What wrong happens if a user overwrites with |
I'm also unclear on what's wrong with this. |
base/random.jl
Outdated
@@ -17,7 +17,7 @@ export srand, | |||
randperm, randperm!, | |||
randcycle, randcycle!, | |||
AbstractRNG, MersenneTwister, RandomDevice, | |||
GLOBAL_RNG, randjump | |||
GLOBAL_RNG, globalRNG, randjump |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably call it globalrng
, otherwise the capital RNG makes it look like camel case.
Is the concern that precompiled package files could get used without having references to the global RNG invalidated if you overwrite this? |
Bump; are there some complications which make it difficult to decide whether this is correct or not? |
|
Recompiling everything that depends on the global RNG is unavoidable if we want to be able to change the global RNG and have random number generation be fast. This is likely to only happen once in a process, so I think that's perfectly acceptable.
As I understand the term, Base cannot commit type piracy by definition, so I'm not sure what you mean by this. |
Not sure why this is type piracy. But, I do feel that we do not want to encourage APIs to change global state. Maybe it is best to pass an extra first argument if someone wants to use a different RNG? This also ends up changing the RNG (when the user supplies their own) for any other package that has only been tested with the default one, and things could also break in unexpected ways. |
Not really. https://github.com/yuyichao/FunctionWrappers.jl #13984 Even with the cfunction overhead the mean of
Base cannot, what I said is that the "API" is. The API is redefining a base method, that's by definition the worst kind of type piracy. (It not even a specilization without new types but a overwrite of the old method.)
I agree in general though that discussion should go to #23199. The |
Changing the global RNG is not something that would be particularly encouraged; but I feel it's good that libraries using implicitly the global RNG don't rely on special features/particularities of As for recompiling everything, again it would not be encouraged, but would give a means to people needing this feature, if they are ready to pay this recompile price. Edit: I tried the alternative of having |
@yuyichao, you've got to be a bit more constructive and explicit here. Links to some random wrapper function things doesn't really explain what you're thinking or how to solve this problem better. Scalar You've also shifted the goal posts / objections here. You started out claiming that this wouldn't be efficient or correct, but you refuse to provide details to back that claim up. Instead you've shifted to an argument that you: don't like the compilation this causes, and think it's a bad API. I agree with the API (see below for an alternative proposal), but this implementation does seem to work and have the desired performance characteristics, so for any claim to the contrary, the burden of proof is on you. If you think this is broken, why is it broken? If you think there's a better way to do this with equally good performance, what is it? Provide details, not just some vague links and claims. Regarding API, @yuyichao is quite right that having people redefine the # in Base.Random
global_rng_type() = MersenneTwister
function set_global_rng_type(T::Type{<:AbstractRNG})
if T !== global_rng_type()
global global_rng_type() = T
end
return T
end
get_global_rng_type() = global_rng_type()::Type{<:AbstractRNG} This hides the method redefinition as the mechanism for accomplishing this, which leaves us free to implement this in a better way in the future if we come up with one. We should consider requiring an RNG type when the seed is set so that programs that set a seed will continue to produce repeatable random sequences even if Base changes the default RNG type. If we don't require providing an RNG type when seeding, we should at least document that seeding without providing an RNG type will not be future proof. |
This should be a syntax error, since we don't currently generate correct code for it. Jarrett's new validation pass detects it, but currently not lowering. This is a bad API, since it's really confusing. You if you call global RNGType = MersenneTwister
function set_global_rng_type(T)
(T <: AbstractRNG) || throw(ArgumentError("invalid RNG type"))
global RNGType = T
nothing
end |
The wrapper function things is not random and I even linked the issue it is a prototype of in case anyone didn't read the README. The frequent exposure on discourse made me think it doesn't need too much explaination for what it is doing but I guess that could be wrong. In anycase though, the issue explicitly says that this is for callback registration, which is exactly what this is about.
Well, as I said, the cost is basically from the current implementation (
I did not shift away from that though I did focus more on the efficiency.
An API that trashes precompilation is pretty broken.
So I guess the point I was missing is that I think a function registering API with enough integrated support will be the way to provide this kind of API. And I don't think we are in a hurry to provide this API especially when the need for it is still questionable.
Note that this won't work, you'll need |
This is what I'm talking about not being explicit – you mentioned precompilation once in a parenthetical. I can't read your mind – I don't know what you mean when you mention something like this in passing. You have to explain it or no one knows what your objection is, even if it's a perfectly valid problem. |
Well, but "precompilation" isn't actually the only important part. It's just one example of |
Ok, now we're getting somewhere. We do need to hurry, however, because, as I said, the current API for If we're really confident that we will be able to support changing the global RNG type but we just don't have the technology in place yet, then we could require passing in the RNG type when setting the seed now but just not support that argument being anything other than
The point of hiding it behind an API is that we can then change the implementation to something better. If the API is "use |
Recompiling everything is not great but it's not the end of the world – if it's documented that changing the global RNG type forces recompilation of a ton of code, then people know they can do it but there's a cost. For some users, that's worth it. If we can eliminate that cost without sacrificing performance of |
I'm confident that we can have really low cost callback registration but I don't think that'll have to come in an api like passing a RNG type to |
Seems that you have implicitly assumed that this is an API that can stay (I'm trying to guess here), in that case, yes, having it here as a place holder can be better. However, I don't think that is the case and even if it is the API suggest that it changes the global state immediately so the behavior (that it does not change the global state immediately) will be pretty surprising to the users.
Still, why do we need a bad implementation for an API that can be added later? |
Let me be crystal clear about why we need this API now. Here's the scenario without an
If the RNG change is in Julia 2.0, then we could just deprecate
I'm trying not to get stuck in the same trap. Instead, I want the process to look like this:
|
That is not a breaking change. A breaking change is a change to an API that causes someone's code to stop working altogether or to silently work differently. Allowing something that was previously an error is not a breaking change. |
OK, now that (i.e. changing default RNG in Base instead of providing an API to change the default RNG) is never mentioned anywhere in either this thread or the linked issue. |
Building on Stefan's API suggestion, but updated (as the actual RNG object has to be initialized somewhere), here is a suggestion which doesn't seem to suffer any performance penalty, and is in line with Stefan's desire to pass the RNG type to
assuming As for recompilation: I think that using this feature to change the default RNG type is not to be used in library code, only in user applications or in the REPL. So I don't see different libs fighting against each-other to set the default RNG. Moreover, it should be considered bad style in library code to use the global RNG. The recommanded API would be |
This is the title of the PR.
It would be helpful to distinguish "bad" in terms of technical correctness (i.e. it can crash, or misbehave) vs "bad" in terms of "i don't like that it recompiles everything", the second one being a personal preference. |
I have no say in statistics, but would it be crazy to do the following instead of freezing the random sequence in tests: don't seed the RNG, but run the test a second time if it fails the first time, and report failure only if it fails the second time? The many uses of For my taste, the status quo is not good enough for 1.0: I'm now more convinced that when no explicit RNG object is passed, at least we should either disable Regarding seeding without needing reproducible sequences: FWIW, I think it's safer to use the |
Well, that won't make it pass 100% of time either.
Note that this was mentioned only about a blackbox
I'll not repeat the technical reasons why the implementation here is bad (or the definition of bad).
That renaming sounds fine. Though just name the improved one |
No, but could make the failure rate sufficiently small that it's acceptable (but maybe it's no faster that increasing the running time of one test to have better success rate?). It's no such a good solution as it does not allow to check that if succeeds most of the time, but freezing the sequence makes the test succeed for only, well, one sequence...
I know that you already said it, it doesn't mean I can not have another opinion. The definition of "breaking" is still blury, but if we change the default RNG type, won't people using As for the name, I would prefer to keep "MersenneTwister" for the most up-to-date/efficient implementation. |
Well I just mean that you said your reasoning for what was following that part was
It's pretty well defined. If you have a whole program and it's behavior changes in a way that you can observe that's breaking. Also, changes in one package breaking another is not breaking even if the change in the first package uses new Base features.
Exactly, that's why I said only the limited version of a unseedable
As long as the It's unclear to me whether you are talking about changes preparing for a 1.x change of default RNG or a user API to change the RNG. This comment is AFAICT the first time you explicitly talk about "we change the default RNG type" and I was assuming all your previous comments are about user API to change the RNG.
Changing what that name refer to is breaking. Adding new names isn't. |
OK, sorry if I was not clear enough: I certainly don't think allowing the user to switch the default RNG is urgent, but that doing something about
it must be non-accessible? (I think I misunderstand) Does that mean hiding |
Correct.
AFAICT from Stefan's comment, there will be no
It's not good enough if people can't avoid using it. From what I see in base and package code and from the necessity to provide a default RNG for both base and package functions, I don't think that's doable.
As long as you give alternatives for all the usage, that'll be fine. The only important part of unseedable |
You habitually make the least charitable possible interpretation of what I’ve said. This is infuriating, rude and unconstructive. If you don’t know what I mean, you can ask, politely, assuming that I am not an idiot and have something reasonable in mind. I may have missed some serious problem, but assuming the worst does not help us get anywhere useful – it just makes discussing things with you unpleasant and unproductive.
As @rfourquet and I have both explained, packages should be written to accept an explicit RNG. If the user doesn't provide one, they can use the default RNG. When a package tests something about itself that requires reproducible sequences, the test should seed a specific RNG and use it. You’ve stated that you don't feel that packages should work like this, which is just, like, your opinion, man. Since you haven’t explained any alternative that is future-proof, I have a hard time seeing your point of view here. Yes, you keep linking to some issue about function wrappers but NO ONE ELSE UNDERSTANDS WHAT YOU MEAN BY THAT OR HOW IT RELATES TO CHANGING THE DEFAULT RNG. Please, spell it out since we do not know what you are thinking. You keep linking to #13984. That is a single-comment issue that you wrote in 2015 talking about C++11 and
No, that is not an explanation or a plan. It is a link to the same seemingly unrelated issue with no indication of how it relates or how it would help address this issue. I’m sure you see a connection. We can’t read your mind. Use your words – spell it out.
None of those are answers. They are just comments that I’ve already read (which don’t answer the question) and links to seemingly unrelated issues. If you have a way to change the global RNG, please tell us what it is. Again, you keep linking to the same issue with no explanation of how it relates. NO ONE KNOWS HOW #13984 helps address this issue.
I have no idea what this pair of sentences means.
Again, linking to #13984 is not an explanation or a plan. I have no idea what you mean by linking to that issue. I agree that not having an API for changing the global RNG is the more conservative approach if this was the choice we had to make in isolation. However, it’s not...
We need to figure out if we will be able to switch the default RNG or not. You seem to have something in mind involving #13984, but we don’t know what because you won’t spell it out, you just keep linking back to comments where you didn’t explain it, which link back to a seemingly irrelevant issue. Moreover, at this point, the better approach seems to be (which you’ve agreed to) to have separate |
No, this is the wrong conclusion altogether, which you could learn by asking me instead of assuming the worst. Suppose a package has a function let rng = MersenneTwister1.RNG(seed)
@test frizzle(rng, args...) == value
end This will keep working no matter what the default RNG is. And yes, the TESTS will not see a change to the default RNG, but that's fine – they're tests and need to be updated if the default RNG changes anyway. Actual usage of |
I gave the more regress logic in #23205 (comment) so that's not
Yes, and I've clearly stated in the comment you quote, my only question is how do you use a inaccessible RNG as default.
I didn't.
There are two issues and I've clearly stated my view on both.
I didn't keep linking to it. I linked to it exactly once before the reminder in the last comment. I also did explained why it's related:
I agree, and that's why I've repeated said in my comment that the only problem is what you want packages to do. i.e. "how do you use a inaccessible RNG as default". I've given all what I can think of what a package can write their code but none of them covers this. |
This means exposing the |
Ah, this is a good point to ask for clarification on. The |
OK, I see. This is very non-obvious. In that case I think we can even just call it
Agree, and I'm still open for discussion about the final API/implementation for what THIS PR want to do either here or in #13984. It shouldn't be necessary at this point since it can be 1.x but if we do want to discuss, I'll want to know where the confusion is, i.e.
|
Thank you, this is the clearest explanation of what you're thinking so far. So let me see if I can accurately summarize your position:
I'm somewhat skeptical that (1) can actually be done without large amounts of recompilation which are essentially equivalent to what this issue does. But you're brilliant at these technical performance issues and I've seen equally amazing results in the past, so I do hope that you're right and we can have such incredibly low-cost callbacks – that would be wonderful. Regarding (2), I no longer believe that's the case and think that @rfourquet is on the same page as me. We can come up with a design here and open a different issue about it since, as you've said, this issue is about changing the default global RNG, not changing the way people interact with RNGs. |
The upshot of staticfloat's analysis is that you must seed the value predictably, but that you can change the RNG and/or the meaning of the seed at any time without affecting a valid test for statistical properties. The purpose of the call is to make failures reproducible. Preventing failures a priori (by testing for the specific random number sequence) would generally be poor test design – or a least unnecessary effort from the test author. For stabilizing a 1.0 API, it sounds like the goal is to make it possible for us to change the default, while making it easy for packages to declare a dependency on a particular implementation. For that, I propose making the following refinements to the current API:
|
Yes, I think it's basically the last part of #23205 (comment), which is roughly my first comment that replies to both issues at the same time.
Fine in that we can add new features without breaking it. With a completely separated Base.rand and it's RNG without a corresponding |
Slightly different proposal,
|
Note that the main difference is |
I'm sorry to not have yet fully understood the module approach: would one of you mind clarifying what it brings compared to the following approach:
The "maybe's" above is because I'm not clear what are all the requirements implied by the module approach(es).
With the module approach, most users will be able to still use My questions are sincere, probably I'm missing something possibly obvious.
Does that rely on the recent work which make |
Note that the splitting module thing is only about switch default RNG in base, not about letting user do it.
This is fine, as long as you don't change what you call
That'll not affect whether changing the default RNG is breaking or not. As long as it is still explicitly accessible, changing it's type is breaking. The important part, and the part I've been asking about, is that the global RNG will only be implicitly accessible via basically a token/placeholder/wrapper. Letting
That's the whole point I was trying to ask, if this is the case, then I don't think packages will be able to see the switch at all. Now that I've finally got the answer for the missing piece #23205 (comment) what you state here is not what package should do. Packages that want to use any good random source in general but want reliable testing are recommended to use So the key idea is that we'll have an global RNG that you cannot directly access. You can only use a number of functions on them. You can either call these functions without an RNG arguments, in which case you just get the result with a opaque random source, or you can pass in an opaque object as the RNG to achieve the same result. Since they take this opaque object as RNG they'll also take a real RNG as RNG and so this enables writing library code that can be switch between the default global source or a custom fixed-typed source without explicit access to the real global RNG (since it's wrapped or represented as a token) |
Also note that we don't have to have a module separation for these to work. The module separation only seems to be necessary for having a seedable global RNG of specific type (i.e. |
Thanks! Yes I found difficult to reconcile the module approach together with the hidden/wrapped/inaccessible global RNG. If it facilitates future implementations, it seems fine to use a token approach; still I would call this token
just to be certain, by "reliable" you don't mean reproducible but rather statistically solid sequences, right?
The 2 part of this sentence seem to mean the same thing no? (the second part is not clear as I'm not yet 100% sure that we don't want a way at all to seed the global RNG; in development, I very often use |
if we want to have a seedable global RNG, i.e. there can be multiple global RNG.
Yes, separate modules makes it easy to add (another) seedable global RNG. Unclear if it's necessary. This is basically the original proposal for getting specific RNG by importing certain symbols, rather than providing a specific RNG as input.
I don't think we want to allow a method overwrite based approach. It is a technically much worse approach.
We can also use a token without closure too.
I mean reproducible sequence.
No. By you can seed a rand. Our current
Exposing the actual base default RNG for this use sounds fine. As long as alternatives are provided (possibly an additional seedable RNG to cover the user that really just want to keep using the same API, i.e. srand and rand that affects each other without any explicit rng arguments)
As mentioned above, the only problem I see with having this now is that it couples the implementation of this PR to switching default Base RNG from Base. I'd like to avoid such coupling but I do think it's fine to add this API whenever a correct implementation of what this PR want to do is merged. |
And about token vs wrapper, a token has fewer indirection which is why I like it slightly more. Not that much a difference though, of course. |
Then I don't understand: how can they get reproducible/reliable sequences by using
If you want to call it like this, why not, but a way to see it closer to the implementation is that |
You use
Sure, so I just mean that implicit/default one (in
In that sense, yes. I'm just describing what the two functions need to do and the two part of the sentence do kind of both means that no |
It's sometimes hard to decipher what you say, so let me rephrase it: "packages have to use an explicit RNG for reproducible sequences, and can use Base.DEFAULT_RNG as an implicit one if reproducibility is not a requirement" (i.e. it's not recommended to use an implicit seedable RNG). In other words, mimic what Base.Random will do, with the pattern |
User has to pass in an explicit RNG (not |
Thanks to #265 fixed, we can use
globalRNG()
instead ofGLOBAL_RNG
, and a user can redefine this function to return a custom RNG. Even if the multi-threading story is not clarified for the global RNG, at least it doesn't seem to me to make things worst in this respect. Moreover, it's a good way for library author to not make assumptions about the default RNG.EDIT: the first version of this PR was simply defining
globalRNG() = GLOBAL_RNG
, and changingGLOBAL_RNG
toglobalRNG()
everywhere. Following the discussion, which pointed at the bad API of redefining a Base function (viaBase.Random.globalRNG() = myrng
), it is updated as follows: it's possible to change the type of the default RNG, which can be got viadefaultRNG()
, using thesrand
function:srand(RNG_Type, [seed])
is the same assrand(seed)
ifdefaultRNG()
is of typeRNG_Type
; otherwise initialize withseed
a new RNG of the specified type and make it the new default RNG.