Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generators for floating point numbers #6

Closed
Shimuuar opened this issue Feb 18, 2020 · 17 comments
Closed

Generators for floating point numbers #6

Shimuuar opened this issue Feb 18, 2020 · 17 comments

Comments

@Shimuuar
Copy link

This question surfaced in discussion on PR #1 but it wasn't discussed in any depth. Here is quote and links from @cartazio:

There’s several gotchas with those algorithms for float and double.

The usual unit interval used in extent Haskell Libs is wrong.

See the commented out bit linked herein
https://github.com/haskell/random/blob/master/src/Data/Distribution/FloatingInterval.hs

@idontgetoutmuch

Here's another implementation for getting random floating point numbers

https://github.com/mokus0/random-fu/blob/69a563a7b0cf444748e4b38a8bda7ada0b9acf14/random-source/src/Data/Random/Internal/Words.hs#L103

wordToFloat :: Word64 -> Float
wordToFloat x = (encodeFloat $! toInteger (x .&. 0x007fffff {- 2^23-1 -} )) $ (-23)

I think it would be convenient to put such a function in random rather than lots of packages implementing it themselves but I don't feel strongly about it.

It seems there's general consensus that we should include generators for floating point numbers but very little discussion on implementation

@cartazio
Copy link

cartazio commented Feb 18, 2020 via email

@peteroupc
Copy link

peteroupc commented Feb 19, 2020

There are several algorithms available for generating random floating-point numbers. For example:

I am aware of the Rademacher Floating-Point Library (by @christoph-conrads), which uses a sophisticated algorithm to generate random binary floating-point numbers in a range. However, the algorithm is far from trivial, and the author has yet to write up how the algorithm works.

I am also aware of Allen Downey's investigation on generating random floating-point numbers, at least those bounded by 0 and 1.

See also my section on Uniform Random Real Numbers.

@curiousleo
Copy link
Collaborator

curiousleo commented Feb 19, 2020

I'm digging into this a little. Again, thanks for the great pointers @peteroupc! The Rademacher Floating-Point Library comes with a test script. I'm going to try to hook it up to our random-quality repo and run some tests.

Edit: see tweag/random-quality#18

@idontgetoutmuch
Copy link
Owner

I think from the discussion on #5 we are now saying this is out of scope and a generator for a floating point number should be obtained from another package.

Does everyone agree?

@peteroupc
Copy link

peteroupc commented Feb 21, 2020

Random floating-point numbers should be generated using random integers, not the other way around. For example, I define the method RNDINT(max) from which all other random number generation methods can be derived. Thus, it's reasonable to treat random floating-point number algorithms as a uniform distribution similar to C++'s uniform_real_distribution.

However, precisely because generating random floating-point numbers in a range (any range, not just a range bounded by 0 and 1) is so tricky (besides resorting to dividing a random integer by a constant), declaring random floating-point generators as "out of scope" in this library should not foreclose any efforts in specifying a "complete" and correct algorithm for generating such numbers (where "complete" is used with the meaning in FloatingInterval). Once such an algorithm is specified, FloatingInterval and other applications can use it.

@curiousleo
Copy link
Collaborator

Random floating-point numbers should be generated using random integers, not the other way around.

Yep, that's understood. I think that's what @idontgetoutmuch meant too: the algorithm that takes a random number generator (as defined in the new random) and generates from it a "complete" floating point number can be defined in a separate package.

However, precisely because generating random floating-point numbers in a range (any range, not just a range bounded by 0 and 1) is so tricky (besides resorting to dividing a random integer by a constant), declaring random floating-point generators as "out of scope" in this library should not foreclose any efforts in specifying a "complete" and correct algorithm for generating such numbers (where "complete" is used with the meaning in FloatingInterval). Once such an algorithm is specified, FloatingInterval and other applications can use it.

I think that's the idea: it would be good to have a "complete" floating point number generator in this sense. But we would also like to make progress on fixing other issues with random. So the idea is to separate the issue of "complete" floating point number generation out, while, as you said, not foreclosing any effort in building such an algorithm in the future.

@Shimuuar
Copy link
Author

I think from the discussion on #5 we are now saying this is out of scope and a generator for a floating point number should be obtained from another package.

Frankly I think this is ridiculous! So yes, you have generic API but to do anything useful with it you need another package. I think that sort of standard library for working with random numbers should be at least usable. I think it's fail to discuss whether random should include things like normal distribution. But to removing basic functionality that every would expect, like generating floating point values in (0,1) range is, and I will repeat this ridiculous!

Discussion #5 is very limited. It's only about primitives and not about API at large. How should we reconcile two variants: pure PRNG and stateful PRNG. How should we deal with PRNG initialization, handle seeds etc. I have proposal and I'll post it over the weekend

@idontgetoutmuch
Copy link
Owner

lol - @Shimuuar I thought you did not agree with including a way of generating floating point random numbers (from a uniform distribution) - my misunderstanding.

@idontgetoutmuch
Copy link
Owner

Discussion #5 is very limited. It's only about primitives and not about API at large. How should we reconcile two variants: pure PRNG and stateful PRNG. How should we deal with PRNG initialization, handle seeds etc. I have proposal and I'll post it over the weekend

Great :)

@Shimuuar
Copy link
Author

I guess I didn't make myself clear. Good library for PRNG is many layered cake and floating point don't belong to lowest level but absolutely must be included on the higher levels

@cartazio
Copy link

cartazio commented Feb 21, 2020 via email

@curiousleo
Copy link
Collaborator

Frankly I think this is ridiculous!

Let's stay friendly here.

I guess I didn't make myself clear. Good library for PRNG is many layered cake and floating point don't belong to lowest level but absolutely must be included on the higher levels

Yes, I think at this point it's clear that we need a concrete proposal that can then be criticised in a constructive manner. I fully agree with the "many layered cake" picture - that's the idea I tried to get across in #5 (comment). What is not clear to me at this point is which of those layers should be provided by the new random. There are advantages and disadvantages to implementing the upper layers in separate packages.

One piece of prior art here is Rust's rand_core, which implements only traits and helpers for RNGs, but does not provide an RNG, which are all implemented in their own packages. And then there is rand, which most users will likely pick, and which includes a standard RNG as well as distributions to sample from.

So in that sense, random could be more like rand_core or more like rand. @Shimuuar, you seem to be in favour of the latter.

I think it's fail to discuss whether random should include things like normal distribution.

Could you explain what you mean by this?

Discussion #5 is very limited. It's only about primitives and not about API at large. How should we reconcile two variants: pure PRNG and stateful PRNG. How should we deal with PRNG initialization, handle seeds etc. I have proposal and I'll post it over the weekend

That's great, thanks for taking the lead here. I look forward to reading the proposal.

@curiousleo
Copy link
Collaborator

@cartazio wrote:

Fact. Eg in my Linked example

Which linked example?

@Boarders
Copy link

I believe random should provide both the low-level api for generating primitive types and on top of that should provide a higher level state-monad-like interface for how to use those primitives. This should be in conjunction with something like mwc-probability which provides a wide variety of different distributions one might need and can include various methods for generating floating point numbers etc.

I strongly believe this should be part of a single package random and not split into a variety of packages. Much the same as mono-repos there is a lot to be said for single batteries-included packages. I think the most important in the case of Haskell is providing something which must all be kept in sync and up-to-date. HAving to coordinate across multiple packages seems to be a real pain point.

@lehins
Copy link
Collaborator

lehins commented Mar 3, 2020

@Boarders I disagree with you about providing functionality for other distributions:

This should be in conjunction with something like mwc-probability which provides a wide variety of different distributions

random fills a gap between Haskell types and various RNGs. The default distribution here is naturally the uniform distribution, so that's the one we should provide out of the box. Other distributions are derived from uniform mathematically.

Here are a few arguments against including many distributions in random

  1. First of all majority of users of random will not care about those other distributions simply because in those use cases random is not being used for math, but for general purpose programming.

  2. There are already good packages that provide wrappers, like a really good one that comes to mind is random-fu, which would integrate with the new interface very nicely. In fact @idontgetoutmuch is the maintainer, so he will ensure that it will play very good with the new random ;)

  3. Coordination across multiple packages is not that terrible, moreover there is not much syncing required between an interface that produces numbers in uniform distribution and all other distributions. If you look at System.Random.MWC.Distributions module all that it uses is the uniform function.

@Shimuuar
Copy link
Author

Shimuuar commented Mar 3, 2020

I agree with @lehins here and want to add that distribution generators require quite a bit of supporting infrastructure: vector, special functions. Those are really complicated beast which in fact doesn't care about details of PRNG implementation and only need way to generate uniform numbers.

Also lack of generic API for random really hamstrung development of such libraries. Why System.Random.MWC.Distributions even exists? Why it isn't part of some more generic library? Because no suitable API of PRNGs

@idontgetoutmuch
Copy link
Owner

Closing as we now have: #49

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants