Separate ranges and split #9

lehins · 2020-02-22T23:20:21Z

This PR implements ideas from #8 and #7

lehins · 2020-02-22T23:22:28Z

CC @curiousleo @idontgetoutmuch and @cartazio
If you can, please weigh in on those tickets and on a possible implementation.

lehins · 2020-03-03T20:05:54Z

I think we have agreed so far that splitting Random into two classes is not a good idea; instead, we'll go with the Uniform and UniformRange classes as proposed in #8 (comment) and implemented in #14.

We decided to address the splittable vs. non-splittable distinction by supplying a really good default implementation for splitting any pure RNG. So far, a proof of concept is implemented in #12.

cartazio · 2020-03-05T17:32:40Z

Splittable rngs need to be a subclass. There’s a different state transformer monad instance for splitting vs. not splitting, and they have different performance characteristics.

lehins · 2020-03-05T17:47:38Z

@cartazio You are speaking too abstractly. Could you provide some concrete examples?

cartazio · 2020-03-05T18:22:33Z

In a splitting monad instance, every monadic bind does the split operation. There are valid reasons to do this: laziness and independent sampling. One motivation in QuickCheck-like settings is that you can replace a monadic step with return () and the rest of the computation will still give the same results for a fixed choice of seed. The cost is that bind isn’t free and can’t be optimized away.

I think of the difference this way: the splitting-bind monad is good for macro-level system composition (assuming enough entropy, and that the split step in bind doesn’t dominate computation time), while the State-style monad is best for inner loops, because those are going to be something like a rejection loop.
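
The reproducibility property described above can be sketched with a small self-contained example. Everything in it is an assumption made for illustration: Toy, nextToy and splitToy are a hypothetical toy generator (not statistically sound, and not the API of the random package), and Gen is only shaped like the QuickCheck monad, not copied from it.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- Hypothetical toy generator, for illustration only (poor statistical quality).
newtype Toy = Toy Word64

nextToy :: Toy -> (Word64, Toy)
nextToy (Toy s) = (s' `xor` (s' `shiftR` 33), Toy s')
  where
    s' = s * 6364136223846793005 + 1442695040888963407

-- Hypothetical split: derive two new states from two consecutive outputs.
splitToy :: Toy -> (Toy, Toy)
splitToy g = (Toy a, Toy (b `xor` 0x9E3779B97F4A7C15))
  where
    (a, g') = nextToy g
    (b, _) = nextToy g'

-- A generator monad in which every bind splits the generator.
newtype Gen a = Gen { runGen :: Toy -> a }

instance Functor Gen where
  fmap f (Gen h) = Gen (f . h)

instance Applicative Gen where
  pure = Gen . const
  Gen f <*> Gen h = Gen $ \g ->
    case splitToy g of
      (g1, g2) -> f g1 (h g2)

instance Monad Gen where
  Gen f >>= k = Gen $ \g ->
    case splitToy g of
      (g1, g2) -> runGen (k (f g1)) g2

word :: Gen Word64
word = Gen (fst . nextToy)

-- The first step draws a value that the rest of the computation ignores.
original :: Gen (Word64, Word64)
original = do
  _ <- word
  x <- word
  y <- word
  pure (x, y)

-- The same computation with the first step replaced by return ().
edited :: Gen (Word64, Word64)
edited = do
  _ <- return ()
  x <- word
  y <- word
  pure (x, y)
```

For any fixed seed, runGen original and runGen edited produce the same pair, because the continuation always receives the second half of the split regardless of what the first step did with its half. With a State-style monad the first step would advance the generator, so removing it would shift every later value.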

curiousleo · 2020-03-06T07:52:48Z

@cartazio wrote:

In a splitting monad instance, every monadic bind does the split operation.

Sorry if I'm missing something obvious here. Can you point to the code in https://github.com/idontgetoutmuch/random/blob/interface-to-performance/System/Random.hs that calls split whenever a monadic bind takes place?

cartazio · 2020-03-06T15:21:05Z

This is about how libraries such as QuickCheck use the random API. See the Gen monad in QuickCheck: https://hackage.haskell.org/package/QuickCheck-2.13.2/docs/Test-QuickCheck-Gen.html

curiousleo · 2020-03-06T15:48:48Z

Ah, thanks. Code link: https://github.com/nick8325/quickcheck/blob/53d564bacb2cc4386fe328e8f01a60efee036781/Test/QuickCheck/Gen.hs#L70-L76

lehins · 2020-03-06T20:31:19Z

@cartazio wrote:

There’s a different state transformer monad instance for split vs not splitting

So you are talking about something like this, right?

splitState :: (MonadState g m, RandomGen g) => (g -> (r, g)) -> m r
splitState f = do
  g <- get
  case split g of
    (h1, h2) -> fst (f h2) <$ put h1

I don't see how anything like this can be useful for laziness, but I definitely see how it is useful in use cases like QuickCheck, for example. In fact, I remember wondering how QuickCheck handles reproducibility when only a subset of tests is selected to run, so thank you for bringing it up.

Back to your original comment, "Splittable rngs need to be a subclass": if we all agree on splitting the RandomGen class into two, then I totally agree that unsplittable should be the superclass of splittable, as it was implemented in this PR.

cartazio · 2020-03-06T21:02:31Z

Look more closely: it’s in the definition of bind in the Gen monad instance in the QuickCheck library. It’s not just a lifting into MonadState.

lehins · 2020-03-06T23:04:57Z

@cartazio I had to implement another monad in order to see what you mean :)

So, here is a monad that is similar to the one in QuickCheck:

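{-# LANGUAGE RankNTypes #-}  -- needed for the forall inside both newtypes
import Data.Bifunctor (first)            -- used by the GenS Functor instance below
import System.Random (RandomGen, split)

newtype GenM a = GenM
  { runGenM :: forall g. RandomGen g => g -> a
  }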

instance Functor GenM where
  fmap f (GenM h) = GenM (f . h)

instance Applicative GenM where
  pure a = GenM (const a)
  (<*>) (GenM f) (GenM h) =
    GenM $ \g ->
      case split g of
        (r1, r2) -> f r1 (h r2)

instance Monad GenM where
  return = pure
  (>>=) (GenM f) gh =
    GenM $ \h ->
      case split h of
        (r1, r2) ->
          case gh (f r1) of
            GenM f' -> f' r2

And here is the more strict one that doesn't rely on splitting:

newtype GenS a = GenS
  { runGenS :: forall g. RandomGen g => g -> (a, g)
  }

instance Functor GenS where
  fmap f (GenS h) = GenS (first f . h)

instance Applicative GenS where
  pure a = GenS ((,) a)
  (<*>) (GenS f) (GenS h) =
    GenS $ \g ->
      case f g of
        (fa, g') ->
          case h g' of
            (b, h') -> (fa b, h')

instance Monad GenS where
  return = pure
  (>>=) (GenS f) gh =
    GenS $ \g ->
      case f g of
        (a, g') ->
          case gh a of
            GenS f' -> f' g'

Suppose we create two actions where one of the values is very expensive to generate (namely the ByteArray in this example):

fooM :: Int -> GenM (ByteArray, Word16)
fooM n = do
  ba <- GenM (fst . genByteArray n)
  w16 <- GenM (fst . genWord16)
  pure (ba, w16)

fooS :: Int -> GenS (ByteArray, Word16)
fooS n = do
  ba <- GenS (genByteArray n)
  w16 <- GenS genWord16
  pure (ba, w16)

Now it is pretty clear that fooS has to compute ba before w16, while fooM does not:

λ> (ba, w) = runGenM (fooM 100000000) (mkStdGen 217)
λ> w
35990
λ> (ba, w) = fst $ runGenS (fooS 100000000) (mkStdGen 217)
λ> w
Interrupted.

The reason I was a bit confused is that GenS is not strict in the value it returns; rather, there is a sequential dependency on the generator when splitting is not used:

fooS' :: GenS (ByteArray, Word16)
fooS' = do
  ba <- GenS (\g -> (undefined, g))
  w16 <- GenS genWord16
  pure (ba, w16)

λ> (ba, w) = fst $ runGenS fooS' (mkStdGen 217)
λ> w
63926

That was a fun exercise :)

Side note: both GenS and GenM above can be made instances of MonadRandom.

cartazio · 2020-03-06T23:16:31Z

Yup.

cartazio · 2020-03-06T23:19:55Z

An important footnote is that the sequential/strict one has a trivial bind that can be optimized away, but the splitting one does real computation on every bind. This is why having a default split could potentially backfire, depending on the end user. But yes, an important application of the splitting monad is constructing lazy data structures.

idontgetoutmuch · 2020-03-07T10:06:50Z

So are we now proposing something like:

class RandomGen g => SplittableGen g where
  split    :: g -> (g, g)

instance SplittableGen StdGen where
  split = SM.splitSMGen

curiousleo · 2020-03-09T09:19:32Z

@lehins thank you so much for working through this in #9 (comment).

I still had to draw this (on a literal napkin in this case) to really get it.

In GenM f >>= gh, if gh ignores its argument, then the overall result of the computation does not depend on f r1, so it never needs to be evaluated.

In GenS f >>= gh, if the result of gh a uses the random number generator (that is, f' is not constant), then f g must be evaluated. This is what triggers the evaluation of genByteArray.
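
The two cases above can be checked with a small self-contained sketch. It is only an illustration under stated assumptions: Toy, nextToy and splitToy form a hypothetical counter-based generator, bindM and bindS are standalone copies of the two bind shapes rather than real Monad instances, and a call to error stands in for an expensive computation so that forcing it is observable.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Word (Word64)

-- Hypothetical toy generator: a bare counter, only good for showing evaluation order.
newtype Toy = Toy Word64

nextToy :: Toy -> (Word64, Toy)
nextToy (Toy s) = (s, Toy (s + 1))

splitToy :: Toy -> (Toy, Toy)
splitToy (Toy s) = (Toy (2 * s + 1), Toy (2 * s + 2))

-- Splitting bind, shaped like GenM.
newtype GenM a = GenM { runGenM :: Toy -> a }

bindM :: GenM a -> (a -> GenM b) -> GenM b
bindM (GenM f) gh = GenM $ \g ->
  case splitToy g of
    (r1, r2) -> runGenM (gh (f r1)) r2

-- State-threading bind, shaped like GenS.
newtype GenS a = GenS { runGenS :: Toy -> (a, Toy) }

bindS :: GenS a -> (a -> GenS b) -> GenS b
bindS (GenS f) gh = GenS $ \g ->
  case f g of
    (a, g') -> runGenS (gh a) g'

-- First steps whose work (modelled by error) must never be forced.
expensiveM :: GenM Word64
expensiveM = GenM (\_ -> error "value of the first step was forced")

expensiveS :: GenS Word64
expensiveS = GenS (\_ -> (0, error "generator after the first step was forced"))

-- gh ignores its argument: the splitting bind never touches the first step.
resultM :: Word64
resultM = runGenM (expensiveM `bindM` \_ -> GenM (fst . nextToy)) (Toy 0)

-- gh a uses the generator, so the generator left by the first step is forced.
resultS :: Word64
resultS = fst (runGenS (expensiveS `bindS` \_ -> GenS nextToy) (Toy 0))

-- gh a is constant in the generator, so nothing expensive is forced.
resultSConst :: Word64
resultSConst = fst (runGenS (expensiveS `bindS` \_ -> GenS (\g -> (7, g))) (Toy 0))

-- True when forcing the value to weak head normal form throws.
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = do
  r <- try (evaluate x)
  pure $ case r of
    Left e -> const True (e :: SomeException)
    Right _ -> False
```

Here resultM and resultSConst evaluate normally, while forcing resultS throws, which mirrors the interrupted fooS session: the second step needs the generator that only the first step can produce.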

lehins · 2020-03-09T10:02:46Z

@idontgetoutmuch wrote:

So are we now proposing something like:

If we do split the RandomGen class into "splittable" and "non-splittable" concepts, I propose we do it the way it was implemented in this PR. In particular, it was option number 3 from this comment (#7 (comment)):

class NoSplitGen g where
  ...
  genWord64 :: g -> (Word64, g)
class NoSplitGen g => RandomGen g where
  split  :: g -> (g, g)

because it results in less breakage.

But if we decide not to split the RandomGen class, I would be fine with that as well, and I wouldn't oppose a helper function (e.g. defaultSplit or basicSplit) that can be used as a split implementation, but only if it is really acceptable in the industry to use such a split function for common non-splittable generators. People seem to suggest that it is possible to implement such a function; I just don't know, since it is not my area of expertise.

cartazio · 2020-03-09T17:11:35Z

Splitting must be a child subclass.

idontgetoutmuch · 2020-03-10T07:50:16Z

Since this PR is closed, can we carry on the discussion on this issue: #7?

lehins · 2020-03-10T11:55:51Z

@cartazio wrote:

Splitting must be a child subclass.

That is exactly what is in this example:

class NoSplitGen g where
  ...
  genWord64 :: g -> (Word64, g)
class NoSplitGen g => RandomGen g where
  split  :: g -> (g, g)

Or am I missing something you are trying to say here?

cartazio · 2020-03-10T14:01:15Z

Misread

lehins added 3 commits February 23, 2020 01:08

A few inline pragmas. Check performance

bb10f39

Introduce NoSplitGen

e9d36e7

Separate Random and RandomR. Discussed in #8

a2c6a59

remove unused pragma

d3d7976

lehins closed this Mar 3, 2020

idontgetoutmuch mentioned this pull request Mar 10, 2020

Hierarchical classes Splittable/Unsplittable Gen #7

Closed

curiousleo mentioned this pull request Mar 10, 2020

Generic split function #12

Closed

idontgetoutmuch mentioned this pull request May 20, 2020

V1.2 proposal haskell/random#61

Merged

lehins mentioned this pull request Jun 4, 2020

V1.2 proposal haskell/random#62

Merged
