Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate ranges and split #9

Closed
wants to merge 4 commits into from

Conversation

lehins
Copy link
Collaborator

@lehins lehins commented Feb 22, 2020

This PR implements ideas from #8 and #7

@lehins
Copy link
Collaborator Author

lehins commented Feb 22, 2020

CC @curiousleo @idontgetoutmuch and @cartazio
If you can guys, please weigh in on those tickets and possible implementation.

@lehins
Copy link
Collaborator Author

lehins commented Mar 3, 2020

I think we agreed so far that splitting Random into two classes is not a good idea and instead we'll go with Uniform and UniformRange classes as proposed in #8 (comment) and implemented in #14

Distinguishing splittable from non-splittable was decided to be solved by supplying a really good default implementation for splitting any pure RNG. So far proof-of-concept is implemented in #12

@lehins lehins closed this Mar 3, 2020
@cartazio
Copy link

cartazio commented Mar 5, 2020 via email

@lehins
Copy link
Collaborator Author

lehins commented Mar 5, 2020

@cartazio You are speaking too abstract. Could you provide some concrete examples?

@cartazio
Copy link

cartazio commented Mar 5, 2020 via email

@curiousleo
Copy link
Collaborator

@cartazio wrote:

In a splitting monad instance, every monadic bind does the split operation.

Sorry if I'm missing something obvious here. Can you point to the code in https://github.com/idontgetoutmuch/random/blob/interface-to-performance/System/Random.hs that calls split whenever a monadic bind takes place?

@cartazio
Copy link

cartazio commented Mar 6, 2020 via email

@curiousleo
Copy link
Collaborator

@lehins
Copy link
Collaborator Author

lehins commented Mar 6, 2020

@cartazio So you are talking about something like this, right?

There’s a different state
transformer monad instance for split vs not splitting

splitState :: (MonadState g m, RandomGen g) => (g -> (r, g)) -> m r
splitState f = do
  g <- get
  case split g of
    (h1, h2) -> fst (f h2) <$ put h1

I don't see how anything like this can be useful for laziness, but I definitely see how it is useful in uses cases like QuickCheck for example. In fact, I remember wondering about how does QuickCheck handles reproducibility when some subset of tests is being selected to run, so thank you for bringing it up.

Back to your original comment "Splittable rngs need to be a subclass. " If we all agree on splitting the RandomGen class into two, then I totally agree that unsplittable should be the superclass of splittable, like it was implemented in this PR

@cartazio
Copy link

cartazio commented Mar 6, 2020 via email

@lehins
Copy link
Collaborator Author

lehins commented Mar 6, 2020

@cartazio I had to implement another monad in order to see what you mean :)

So, here is a monad that is similar to the one in QuickCheck:

newtype GenM a = GenM
  { runGenM :: forall g. RandomGen g => g -> a
  }

instance Functor GenM where
  fmap f (GenM h) = GenM (f . h)

instance Applicative GenM where
  pure a = GenM (const a)
  (<*>) (GenM f) (GenM h) =
    GenM $ \g ->
      case split g of
        (r1, r2) -> f r1 (h r2)

instance Monad GenM where
  return = pure
  (>>=) (GenM f) gh =
    GenM $ \h ->
      case split h of
        (r1, r2) ->
          case gh (f r1) of
            GenM f' -> f' r2

And here is the more strict one that doesn't rely on splitting:

newtype GenS a = GenS
  { runGenS :: forall g. RandomGen g => g -> (a, g)
  }

instance Functor GenS where
  fmap f (GenS h) = GenS (first f . h)

instance Applicative GenS where
  pure a = GenS ((,) a)
  (<*>) (GenS f) (GenS h) =
    GenS $ \g ->
      case f g of
        (fa, g') ->
          case h g' of
            (b, h') -> (fa b, h')

instance Monad GenS where
  return = pure
  (>>=) (GenS f) gh =
    GenS $ \g ->
      case f g of
        (a, g') ->
          case gh a of
            GenS f' -> f' g'

If we create two actions where one of values can be very expensive to generate (namely ByteArray in the example):

fooM :: Int -> GenM (ByteArray, Word16)
fooM n = do
  ba <- GenM (fst . genByteArray n)
  w16 <- GenM (fst . genWord16)
  pure (ba, w16)

fooS :: Int -> GenS (ByteArray, Word16)
fooS n = do
  ba <- GenS (genByteArray n)
  w16 <- GenS genWord16
  pure (ba, w16)

Now it is pretty clear that fooS will have to compute ba, before w16:

λ> (ba, w) = runGenM (fooM 100000000) (mkStdGen 217)
λ> w
35990
λ> (ba, w) = fst $ runGenS (fooS 100000000) (mkStdGen 217)
λ> w
Interrupted.

The reason why I was a bit confused is that it is not that GenS is strict in the value it returns, but the fact that there is sequential dependency on the generator when splitting is not used:

fooS' :: GenS (ByteArray, Word16)
fooS' = do
  ba <- GenS (\g -> (undefined, g))
  w16 <- GenS genWord16
  pure (ba, w16)
λ> (ba, w) = fst $ runGenS fooS' (mkStdGen 217)
λ> w
63926

That was a fun exercise :)

Side note - above GenS and GenM can be made an instance of MonadRandom

@cartazio
Copy link

cartazio commented Mar 6, 2020 via email

@cartazio
Copy link

cartazio commented Mar 6, 2020 via email

@idontgetoutmuch
Copy link
Owner

So are we now proposing something like:

class RandomGen g => SplittableGen g where
  split    :: g -> (g, g)

instance SplittableGen StdGen where
  split = SM.splitSMGen

@curiousleo
Copy link
Collaborator

@lehins thank you so much for working through this in #9 (comment).

I still had to draw this (on a literal napkin in this case) to really get it.

In GenM f >>= gh, if gh ignores its first argument, then the overall result of the computation does not depend on f r1, so it never needs to be evaluted.

20200309_100421-01

In GenS f >>= gh, if the result of gh a uses the random number generator (that is, f' is not constant), then f g must be evaluated. This is what triggers the evaluation of genByteArray.

20200309_100428-01

@lehins
Copy link
Collaborator Author

lehins commented Mar 9, 2020

@idontgetoutmuch I propose if we do split RandomGen class into two "splitabble" and "not-splittable" concept we do it in a way like it was implemented in this PR:

So are we now proposing something like:

In particular it was number 3 from [this comment]:(#7 (comment))

class NoSplitGen g where
  ...
  genWord64 :: g -> (Word64, g)
class NoSplitGen g => RandomGen g where
  split  :: g -> (g, g)

because it results in less breakage.

But, if we do decide not to split RandomGen class, I would be fine with it as well and I wouldn't oppose an implementation of a helper function (eg. defaultSplit or basicSplit) that can be used as a split implementation, but only if it is really acceptable in the industry to use such split function for common non-splittable generators. People seem to suggest it is possible to implement such function, I just don't know, since it is not my area of expertise.

@cartazio
Copy link

cartazio commented Mar 9, 2020 via email

@idontgetoutmuch
Copy link
Owner

Since this PR is closed, can we carry on the discussion on this issue: #7?

@lehins
Copy link
Collaborator Author

lehins commented Mar 10, 2020

@cartazio That is exactly what is in this example:

Splitting must be a child subclass.

class NoSplitGen g where
  ...
  genWord64 :: g -> (Word64, g)
class NoSplitGen g => RandomGen g where
  split  :: g -> (g, g)

Or am I missing something you are trying to say here?

@cartazio
Copy link

cartazio commented Mar 10, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants