GenIO #15

lehins · 2020-03-01T17:41:20Z

Current interface has a way of generating values in IO with randomIO. @Shimuuar pointed out that it could be useful to provide interface for such generator.

There are a couple of choices we can make here. We could just as before wrap pure StdGen in an IORef, but a better choice would be to use PrimMonad

GenPrim

Creating a more general GenPrim s by wrapping StdGen in a MutVar that can be used in ST and IO as well as all of the transformers

Atomicity

Splitmix uses two Word64s for its state, so if we want atomicity the only right way is the MutVar, on the other hand if atomicity is not a requirement then we could use a small MutableByteArray for it's state therefore speeding up value generation in the stateful environment by avoiding one pointer indirection.

I personally propose having both of those implemented, since both of the implementations are pretty easy and straightforward. The latter faster-nonatmoic version is curently blocked by an actual switch to splitmix generator, however an even more general interface could be prepared for all pure nonatomic generators that have instance of Prim and/or Storable

The text was updated successfully, but these errors were encountered:

lehins · 2020-03-02T23:11:45Z

Two separate PRs implement both concepts:

Implement PrimGen #16 - PrimGen wrap a pure generator in a MutVar, which is essentially the same thing as an IORef
Implement MutGen. #17 - MutGen which keeps the state of pure generator in a MutableByteArray

Here are the benchmarks with splitmix package used as RNG:

Let's look at pros and cons for both of these:

Both have the same pro that they can be used in ST and in IO in exactly the same fashion. In fact, both of their APIs are identical

1 - `PrimGen`

Cons:

As I suspected, the performance with MutVar is terrible, a factor of a ~x25 slowdown due to extra pointer indirection
Parallel number generation is horrendous because of contention for the same resource. A factor of ~x5000 slower than pure parallel generator!

Pros:

A unique ability of concurrent access to the same generator in a thread safe manner. This can be a pretty useful property for applications that need random numbers sporadically in a highly concurrent environment (a webserver comes to mind as an example).
Works just as well for any pure RNG

2 - `MutGen`

Cons:

Pure generator must have a Prim class instance, which isn't terribly hard to implement if RNG state is small (eg. splitmix has two Word64s)
Not thread safe. In order to safely use it with multiple threads, each of them will need a separate generator, which can conveniently be acquired with splitMutGen prior to a forkIO

Pros:

Practically no overhead when compared with pure generator

Conclusion

It seems like both of them can be useful in their own realms and therefore I think we should add them to random with good documentation in which situations each of them should be used.

idontgetoutmuch · 2020-03-13T14:32:02Z

I still don't understand why anyone would want to do

runPrimGenIO (StdGen 17 29) (\g -> replicateM 3 (uniformWord32 g))
([2865024807,3006832440,3335237757],1808919043 88936459)

rather than

runGenState (StdGen 17 29) (replicateM 3 (uniformWord32 PureGen))
([2865024807,3006832440,3335237757],1808919043 88936459)

Looking at your performance chart above, there seems to be no performance advantage so what am I missing?

idontgetoutmuch · 2020-03-13T15:03:06Z

This is necesarry for example for

ioGen gen = do
  withBinaryFile “foo.txt” WriteMode $ \h -> randomM gen >>= hPutStr h

Shimuuar · 2020-03-14T15:34:40Z

I still don't understand why anyone would want to do

I think there's case for adding type class along the lines:

class MonadRandom m where
  type PRNG m
  genUniformWord32 :: m Word32
  ...

People will want to use such API and random-fu will need to define such type class anyway. It makes sense to push such type class on top of package hierarchy

lehins · 2020-03-14T16:10:37Z

@Shimuuar I disagree. What you are suggesting is an alternative approach to what we already have.

I think it is a very limiting design. What you are suggesting is essentially the same thing as what we have now, but with fun deps:

class Monad m => MonadRandom g m | m -> g where

This reduces number of generators it can be used with, and that is exactly why random-source also requires another class RandomSource. All in all what I've implemented reduces the need for two classes.

I suggest we adjust random-fu to account for the new design, instead of adding inferior api to random.

To sum it up, we do not need such type class.

will need to define such type class anyway.

Shimuuar · 2020-03-14T17:18:08Z

Both designs are equal. Any instance for g → m W style could be converted to m W style by wrapping in ReaderT (and possible additional newtypes). One thing is lost. We couldn't use generator as parameter.

So I don't think that one design is strictly better than another. They make different tradeoffs. If you writing some Monte-Carlo code carrying same g everywhere becomes very annoying. If you write mtl-style code then exponential 3 is much better than something like: exponential 3 =<< asks prng. On other hand it forces to create newtype wrappers. I still hope that there's way to resolve this tensions without duplicating everything

Also what @idontgetoutmuch think about this?

lehins · 2020-03-14T17:45:29Z

@Shimuuar They are not equal, because this class cannot be used with stateful generators:

class MonadRandom m where
  type PRNG m
  genUniformWord32 :: m Word32

You cannot have these two instances at the same time:

import System.Random.MWC as MWC
instance MonadRandom IO where
  type PRNG IO = MWC.Gen RealWorld

import System.Random.PCG as PCG
instance MonadRandom IO where
  type PRNG IO = PCG.Gen RealWorld

Shimuuar · 2020-03-14T18:01:15Z

They absolutely can. But one have to add newtype wrappers:

instance MonadRandom (ReaderT (MWC.Gen RealWorld) IO) where
  type PRNG (ReaderT (MWC.Gen RealWorld) IO) = MWC.Gen RealWorld

This solution is of course far satisfactory but it illustrates idea. This is why I said this style forces use of newtypes.

lehins · 2020-03-14T18:09:58Z

I am trying to tell you that this idea is flawed! It doesn't solve the problem I am talking about!

What would be the instance for PCG?

instance MonadRandom (ReaderT (PCG.Gen RealWorld) IO) where
  type PRNG (ReaderT (PCG.Gen RealWorld) IO) = PCG.Gen RealWorld

But that will conflict with MWC instance you provided

What about all other monads? With above approach you are restricting it to ReaderT, which doesn't scale.

@Shimuuar I guess, instead of throwing ideas left and right the best way for you to demonstrate the concept is to implement it and submit a PR.

Shimuuar · 2020-03-14T18:52:14Z

instance MonadRandom (ReaderT (PCG.Gen RealWorld) IO) where
instance MonadRandom (ReaderT (MWC.Gen RealWorld) IO) where

Instance heads are different so there's no overlap

What about all other monads? With above approach you are restricting it to ReaderT, which doesn't scale.

In more realistic example I'd create newtype wrapper around ReaderT and would define instance around it. ReaderT and other transformer's monads would get passthrough instance (mtl curse)

@Shimuuar I guess, instead of throwing ideas left and right the best way for you to demonstrate the concept is to implement it and submit a PR.

I do argue a lot but here I agree with you :)

lehins · 2020-03-14T20:46:36Z

@Shimuuar I think I have an elegant solution for your concerns. I am using microlens here, but it can be rewritten without it. The suggested approach is in this PR: #38 (it also includes my suggestion from here #11 (comment))

The gist of it is:

Definition of a HasGen class that provides the access to stateful generator, with some trivial instances for included generators:

class HasGen env g | env -> g where
  genL :: Lens' env g

instance RandomGen g => HasGen (PureGen g) (PureGen g) where
  genL = lens (const PureGenI) const

instance RandomGen g => HasGen (PrimGen s g) (PrimGen s g) where
  genL = id

Definition of helper functions that are reader and random aware:

uniformEnv :: (MonadReader env m, MonadRandom g m, HasGen env g, Uniform a) => m a
uniformEnv = do
  env <- ask
  uniform (env ^. genL)

Usage of such functions becomes very easy and doesn't require generator to be passed around manually:

foo :: (MonadReader env m, MonadRandom g m, HasGen env g) => m Int
foo = do
  x :: Int <- uniformEnv
  y :: Int <- uniformEnv
  pure (x + y)

Then you get exactly what you where asking for:

runPureGenState :: RandomGen g => g -> ReaderT (PureGen g) (State g) a -> (a, g)
runPureGenState g = runGenState g . flip runReaderT PureGenI

bazState :: StdGen -> (Int, StdGen)
bazState g = runPureGenState g foo

But most importantly it allows you to keep mutable generators (such as in mwc-random for example) in the reader environment:

data MyEnvPrimGen = MyEnvPrimGen {
  myEnvPrimGen :: PrimGen RealWorld StdGen
  }

instance HasGen MyEnvPrimGen (PrimGen RealWorld StdGen) where
  genL = lens myEnvPrimGen (\e g -> e { myEnvPrimGen = g })

bazPrim :: StdGen -> IO (Int, StdGen)
bazPrim g = do
  (res, PrimGen g') <-
    withGenM (PrimGen g) $ \mg -> flip runReaderT (MyEnvPrimGen mg) foo
  pure (res, g')

@Shimuuar Let me know what you think or if you have questions about this approach. We use lens+reader extensively in rio and it works really well.

idontgetoutmuch · 2020-03-15T08:59:49Z

I still don't understand why anyone would want to do

I think there's case for adding type class along the lines:
class MonadRandom m where
  type PRNG m
  genUniformWord32 :: m Word32
  ...

@Shimuuar doesn't this already exist in random-source: https://hackage.haskell.org/package/random-source-0.3.0.8/docs/Data-Random-Source.html#t:MonadRandom?

class Monad m => MonadRandom m where
    -- |Generate a random value corresponding to the specified primitive.
    -- 
    -- This is an internal interface; use at your own risk.  It may change or
    -- disappear at any time.
    getRandomPrim :: Prim t -> m t
    getRandomPrim PrimWord8             = getRandomWord8
    getRandomPrim PrimWord16            = getRandomWord16
    getRandomPrim PrimWord32            = getRandomWord32
    getRandomPrim PrimWord64            = getRandomWord64
    getRandomPrim PrimDouble            = getRandomDouble
    getRandomPrim (PrimNByteInteger n)  = getRandomNByteInteger n

    -- |Generate a uniformly distributed random 'Word8'
    getRandomWord8 :: m Word8
    getRandomWord8 = getRandomPrim PrimWord8

    -- |Generate a uniformly distributed random 'Word16'
    getRandomWord16 :: m Word16
    getRandomWord16 = getRandomPrim PrimWord16

    -- |Generate a uniformly distributed random 'Word32'
    getRandomWord32 :: m Word32
    getRandomWord32 = getRandomPrim PrimWord32

    -- |Generate a uniformly distributed random 'Word64'
    getRandomWord64 :: m Word64
    getRandomWord64 = getRandomPrim PrimWord64

    -- |Generate a uniformly distributed random 'Double' in the range 0 <= U < 1
    getRandomDouble :: m Double
    getRandomDouble = getRandomPrim PrimDouble

    -- |Generate a uniformly distributed random 'Integer' in the range 0 <= U < 256^n
    getRandomNByteInteger :: MonadRandom m => Int -> m Integer
    getRandomNByteInteger n = getRandomPrim (PrimNByteInteger n)

idontgetoutmuch · 2020-03-15T10:52:05Z

@Shimuuar I disagree. What you are suggesting is an alternative approach to what we already have.

I think it is a very limiting design. What you are suggesting is essentially the same thing as what we have now, but with fun deps:
class Monad m => MonadRandom g m | m -> g where
This reduces number of generators it can be used with, and that is exactly why random-source also requires another class RandomSource. All in all what I've implemented reduces the need for two classes.

I suggest we adjust random-fu to account for the new design, instead of adding inferior api to random.

To sum it up, we do not need such type class.

will need to define such type class anyway.

I agree we should try and simplify random-fu and its dependencies. Maybe we can even get rid of random-source but this should be out of scope for what we are trying to do now.

idontgetoutmuch · 2020-03-15T12:25:40Z

Both designs are equal. Any instance for g → m W style could be converted to m W style by wrapping in ReaderT (and possible additional newtypes). One thing is lost. We couldn't use generator as parameter.

So I don't think that one design is strictly better than another. They make different tradeoffs. If you writing some Monte-Carlo code carrying same g everywhere becomes very annoying. If you write mtl-style code then exponential 3 is much better than something like: exponential 3 =<< asks prng. On other hand it forces to create newtype wrappers. I still hope that there's way to resolve this tensions without duplicating everything

Also what @idontgetoutmuch think about this?

My usual pattern for random numbers is to use random-fu e.g.

samples :: (Foldable f, MonadRandom m) =>
                    (Int -> RVar Double -> RVar (f Double)) ->
                    Int ->
                    m (f Double)
samples repM n = sample $ repM n $ stdNormal

biggerThan5 :: Int
biggerThan5 = length (evalState xs (pureMT 42))
  where
    xs :: MonadRandom m => m [Double]
    xs = liftM (filter (>= 5.0)) $ samples replicateM 100000

I certainly wouldn't want to thread a generator through all my code but I don't have to with random-fu. I'll see if I can find a better example.

EDIT: Maybe this is a better example of usage.

gibbsSampler ::
  Double ->
  RVarT (W.Writer [(Double,Double)]) Double

gibbsSampler oldTheta2 = do
  newTheta1 <- rvarT (Normal (y1 + rho * (oldTheta2 - y2)) var)

  lift $ W.tell [(newTheta1, oldTheta2)]

  newTheta2 <- rvarT (Normal (y2 + rho * (newTheta1 - y1)) var)

  lift $ W.tell [(newTheta1, newTheta2)]

  return $ newTheta2

runMCMC :: [(Double, Double)]
runMCMC =
  drop burnIn $
  snd $
  runWriter $
  evalStateT (sample (iterateM_ gibbsSampler 2.5))
             (pureMT 2)

@Shimuuar: does that help or were you asking me a different question?

Shimuuar · 2020-03-15T12:46:46Z

I think found another design which avoids duplication of API in RandomGen and MonadRangom and hard requirement of threading generator/proxy for selecting generator type. I'll submit a PR one I get it in a working state

lehins · 2020-03-15T13:15:55Z

@Shimuuar did you get a chance to look at what I wrote? Using Reader solves the problem of need to carry the generator around, but it doesn't force that pattern on the user.

CC @idontgetoutmuch

Shimuuar · 2020-03-15T13:36:05Z

I did. I however won't comment on it until I prepate my PR so it's possible to compare both approaches

idontgetoutmuch · 2020-03-30T12:03:27Z

@Shimuuar did you ever prepare a PR? I think we have made quite a lot of progress now and would be very interested in your comments / proposal.

Shimuuar · 2020-03-30T12:07:53Z

Sadly no. I had no time to work at random at all for last two weeks

curiousleo · 2020-04-14T11:50:11Z

@Shimuuar are you still looking into this? If not, I'd vote to close this issue.

Shimuuar · 2020-04-14T12:12:25Z

Yes. I think this could be closed

idontgetoutmuch · 2020-04-14T12:14:38Z

👍

This was referenced Mar 1, 2020

Implement Uniform and UniformRange #14

Merged

Implement PrimGen #16

Merged

Implement MutGen. #17

Closed

idontgetoutmuch closed this as completed Apr 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GenIO #15

GenIO #15

lehins commented Mar 1, 2020

lehins commented Mar 2, 2020 •

edited

Loading

idontgetoutmuch commented Mar 13, 2020

idontgetoutmuch commented Mar 13, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

idontgetoutmuch commented Mar 15, 2020

idontgetoutmuch commented Mar 15, 2020

idontgetoutmuch commented Mar 15, 2020 •

edited

Loading

Shimuuar commented Mar 15, 2020

lehins commented Mar 15, 2020

Shimuuar commented Mar 15, 2020 via email

idontgetoutmuch commented Mar 30, 2020

Shimuuar commented Mar 30, 2020

curiousleo commented Apr 14, 2020

Shimuuar commented Apr 14, 2020

idontgetoutmuch commented Apr 14, 2020

GenIO #15

GenIO #15

Comments

lehins commented Mar 1, 2020

GenPrim

Atomicity

lehins commented Mar 2, 2020 • edited Loading

1 - PrimGen

2 - MutGen

Conclusion

idontgetoutmuch commented Mar 13, 2020

idontgetoutmuch commented Mar 13, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

Shimuuar commented Mar 14, 2020

lehins commented Mar 14, 2020

idontgetoutmuch commented Mar 15, 2020

idontgetoutmuch commented Mar 15, 2020

idontgetoutmuch commented Mar 15, 2020 • edited Loading

Shimuuar commented Mar 15, 2020

lehins commented Mar 15, 2020

Shimuuar commented Mar 15, 2020 via email

idontgetoutmuch commented Mar 30, 2020

Shimuuar commented Mar 30, 2020

curiousleo commented Apr 14, 2020

Shimuuar commented Apr 14, 2020

idontgetoutmuch commented Apr 14, 2020

lehins commented Mar 2, 2020 •

edited

Loading

1 - `PrimGen`

2 - `MutGen`

idontgetoutmuch commented Mar 15, 2020 •

edited

Loading