# Floating point numbers #105
Relevant: #113 (comment), which suggests thinking about FP numbers as triples (sign, exponent, mantissa).
Benchmarks for the different methods, as implemented here: https://gist.github.com/curiousleo/d8222f63be859f8207e805e71d13095a
It appears that of these "simple" methods (non-Downey-inspired), …
I think this says we should go with the method used in MWC, i.e. select 32 bits, convert to double, multiply by 2^-23. This is a compromise between performance and coverage, and much better than what is in 1.1. It also simplifies the code base, as we can delete the code it replaces.
What is the advantage of mwc-random's method?
Oh sorry, I thought they were the same, but let's use the one with better coverage.
Based on @christoph-conrads' comments on #113, I've started looking into shortcomings of the translation approach, i.e. "given `x` in the unit interval, translate it into the target interval `[lo, hi]` via `lo + x * (hi - lo)`".
My comments are misleading in that they obscure the fundamental reason why the affine transformation `lo + x * (hi - lo)` is not enough on its own.

When computing random integers, it is well known that it may be necessary to reject certain random number generator (RNG) variates. For example, given an RNG computing values 0, 1, 2 with equal probability, the only way to select between an apple and a pear with 50% probability is by rejection: draw a variate, pick the apple on 0, the pear on 1, and draw again on 2.
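A minimal sketch of that rejection loop in Python (the three-valued RNG is simulated here with `random.randrange`):

```python
import random

def apple_or_pear():
    # RNG producing 0, 1 or 2 with equal probability.
    while True:
        v = random.randrange(3)
        if v == 0:
            return "apple"
        if v == 1:
            return "pear"
        # v == 2: reject the variate and draw again; rejection is what
        # makes a fair 50/50 choice possible with a three-valued RNG.
```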
Floating point numbers are a finite set and so we must reject values, too. As an example, you are given an RNG whose outputs are drawn from a fixed, finite set of representable numbers, each with a fixed probability. Here is the key question: how do you map the available set of numbers onto the target interval so that every representable number in it is hit with the correct probability?
There is no way to sum up a finite number of values of the form 2^-k to reach a probability like 1/3: every finite sum of powers of two is a dyadic rational, and 1/3 is not. This leads me to a nice conjecture: any method that produces correctly distributed floating point numbers over an arbitrary interval must use rejection.
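The dyadic-rational observation is easy to check mechanically; a small illustrative sketch (the helper name is made up):

```python
from fractions import Fraction

def is_dyadic(q):
    # A probability reachable with finitely many fair coin flips is
    # dyadic: in lowest terms, its denominator is a power of two.
    d = q.denominator
    return d & (d - 1) == 0

assert is_dyadic(Fraction(3, 8))        # reachable without rejection
assert not is_dyadic(Fraction(1, 3))    # needs rejection
```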
Algorithm:

```
def uniform(a, b):
    assert a <= b

    if b <= 0:
        return -uniform(-b, -a)

    if a < 0 and b > 0:
        x = next_larger_power_of_two(abs(a))
        y = next_larger_power_of_two(abs(b))
        u = uniform(0, x) with probability x/(x+y), uniform(0, y) otherwise
        check if u is in [a,b], try again if not
        return u

    while True:
        x = random01()
        d = b-a with upward rounding
        e = next_larger_power_of_two(d)
        u = x * e + a with rounding toward zero
        if a <= u and u <= b:
            return u
```
@christoph-conrads thank you so much for this clear explanation! When I was trying to come up with an algorithm, I was thinking along similar lines but with much less clarity :) I believe that the code has 1.5 bugs; comparing with the corrected version below:

1. In the mixed-sign branch, `u = uniform(0, x)` should be `u = -uniform(0, x)`; as written, no negative values are ever produced.
2. The half bug: `next_larger_power_of_two` should be `next_larger_or_equal_power_of_two`; otherwise an interval whose width is exactly a power of two gets scaled by twice as much as necessary.
Yes, this should read as `u = -uniform(0, x)` with probability x/(x+y), uniform(0, y) otherwise.
Yes. Corrected:

```
def uniform(a, b):
    assert a <= b

    # signbit(b) == 1 covers b < 0 and also b == -0.0
    if signbit(b) == 1:
        return -uniform(-b, -a)

    # mixed-sign interval: pick a side with probability proportional to
    # a power-of-two bound on its width, then reject
    if a < 0 and b > 0:
        x = next_larger_or_equal_power_of_two(abs(a))
        y = next_larger_or_equal_power_of_two(abs(b))
        u = -uniform(0, x) with probability x/(x+y), uniform(0, y) otherwise
        check if u is in [a,b], try again if not
        return u

    # non-negative interval: scale a unit-interval variate, then reject
    while True:
        x = random01()
        d = b-a with upward rounding
        e = next_larger_or_equal_power_of_two(d)
        u = x * e + a with rounding toward zero
        if a <= u and u <= b:
            return u
```
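For experimenting, here is a minimal runnable Python translation of the pseudocode above. It is a toy under stated assumptions: it ignores the directed-rounding annotations (upward rounding for `b - a`, rounding toward zero for `x * e + a`), which matter for correctness at interval edges, and it uses `random.random()` for `random01()`:

```python
import math
import random

def next_larger_or_equal_power_of_two(d):
    # Smallest power of two >= d, for d > 0.
    return 2.0 ** math.ceil(math.log2(d))

def uniform(a, b):
    assert a <= b

    # signbit(b) == 1, i.e. b < 0 or b == -0.0
    if math.copysign(1.0, b) < 0:
        return -uniform(-b, -a)

    if a < 0 and b > 0:
        x = next_larger_or_equal_power_of_two(abs(a))
        y = next_larger_or_equal_power_of_two(abs(b))
        while True:
            # Pick a side with probability proportional to its width bound ...
            if random.random() < x / (x + y):
                u = -uniform(0.0, x)
            else:
                u = uniform(0.0, y)
            # ... and reject values that fall outside [a, b].
            if a <= u <= b:
                return u

    while True:
        x = random.random()
        d = b - a  # pseudocode wants upward rounding here
        e = next_larger_or_equal_power_of_two(d) if d > 0 else 0.0
        u = x * e + a  # pseudocode wants rounding toward zero here
        if a <= u <= b:
            return u
```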
This issue motivates and gives context for #102, where I have implemented a method that lets us generate every representable number in the inclusive unit interval with the correct probability based on Downey's method [1].
### Float representation

There are roughly two ways to create a random floating point number in the unit interval: non-injective ones and injective ones.
As a quick refresher, an IEEE binary32 floating point number ("float") is laid out as follows (the `|` symbols are just for illustration):

```
b31 | b30 ... b23 | b22 ... b0
```

which is interpreted (for normal numbers) as the number

```
(-1)^b31 * 2^([b30 ... b23] - 127) * 1.[b22 ... b0]
```

where `[bx .. by]` is the number we get if we interpret the bits `bx` to `by` written consecutively as a binary number.

- `b31` is the sign
- `[b30 ... b23]` is the exponent, which is biased by -127
- `[b22 ... b0]` is the mantissa

`Float` is the set of all representable IEEE binary32 floating point numbers. Let's refer to the subset of `Float` that represents the range [0,1] as `UnitFloat`.

What form do the numbers in `UnitFloat` take? They are:

- `b31 = 0` and exponent < 0, that is, `[b30 ... b23] < 127`; this includes, as a special case, the float interpreted as zero
- `b31 = 0`, `[b30 ... b23] = 127` and `[b22 ... b0] = 0`, i.e. the number 1.0 itself

### Non-injective (`Word32 -> UnitFloat`)

#### Constant exponent
The method used right now is this:
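In Python terms, and as an illustration only (the snippet below is reconstructed from the description that follows, not copied from the library):

```python
import struct

def word32_to_unit_float(w32):
    # Put the top 23 of the 32 random bits into the mantissa and force
    # the upper 9 bits to 0|01111111 (sign 0, biased exponent 127).
    bits = 0x3f800000 | (w32 >> 9)
    # Reinterpret the 32-bit pattern as an IEEE binary32 float in [1, 2).
    (f,) = struct.unpack("<f", struct.pack("<I", bits))
    return f - 1.0  # shift [1, 2) down to [0, 1)
```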
where `w32` is a random `Word32`.

What is happening here? This sets the lowest 32-9=23 bits, i.e. the mantissa, to random bits from the `Word32`. It also sets the upper bits to `0x3f8`, which is `0|01111111`:

- `b31 = 0`, so the sign is positive
- `[b30 ... b23] = 01111111 = 127`, so the exponent is `127 - 127 = 0`

In other words, this method generates numbers of the form `1.[b22 ... b0]` with a random mantissa. These numbers all lie in [1, 2). By subtracting 1.0 we get numbers in the range [0, 1).
AFAICT the v1.1 method does something very similar.
#### Multiplication

`mwc-random` uses multiplication and addition to scale a `Word32` into the (0,1] range.
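Schematically, the idea is something like the following (the exact constants are an assumption for illustration, not mwc-random's actual code):

```python
def word32_to_unit_float_mult(w32):
    # w32 + 1 lies in [1, 2**32], so the result lies in (0, 1].
    return (w32 + 1) * 2.0 ** -32
```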
#### Comparison

Fortunately it is possible to simply enumerate all `Word32`. This is what I've done in the following.

- `full` is the cardinality of `UnitFloat`
- `count` is the cardinality of the image of the respective conversion functions
- `percentage hit` is `count / full`, expressed as a percentage

Generated by https://gist.github.com/curiousleo/d8222f63be859f8207e805e71d13095a.
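As a cross-check, `full` can be derived directly from the `UnitFloat` characterisation above (a sketch):

```python
# Floats in [0, 1): biased exponents 0..126, each with 2**23 mantissas
# (biased exponent 0 covers zero and the subnormals), plus the single
# value 1.0 (biased exponent 127, mantissa 0).
full = 127 * 2 ** 23 + 1
print(full)  # 1065353217
```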
### Injective (`Word150 -> UnitFloat`)

What would we need to do to actually hit every possible `UnitFloat` with the correct probability? Downey [1] shows one method.

Roughly, it amounts to the following:
- draw the mantissa `[b22 ... b0]` at random
- set the exponent to `126 - n`, where `n` is the number of failed Bernoulli trials up to the first success, up to 126. This can be done by drawing a random 126-bit bitstring and counting the number of leading zeros

This maps to every possible float in the range `[0, 2^-1 * 1.[11 ... 11]]`. To de-bias the method and include 1, the following correction is made: if the mantissa is all zeros, increase the exponent by one with probability 1/2 (a coin toss).

I won't go into the probabilistic justification for this here, but it should be clear that this makes it possible to hit 1: exactly when the mantissa was all zeros, the unbiased exponent before the correction was -1 and the coin toss results in increasing the exponent by one. Then we get 2^0 = 1.
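For concreteness, a minimal Python sketch of this procedure as just described (binary32; illustrative, not the implementation in #102):

```python
import random
import struct

def downey_unit_float():
    # Count failed Bernoulli(1/2) trials, capped at 126; this is the
    # number of leading zeros of a conceptual random 126-bit string.
    n = 0
    while n < 126 and random.getrandbits(1) == 0:
        n += 1
    biased_exponent = 126 - n  # unbiased exponent -1 - n

    # Draw the 23 mantissa bits at random.
    mantissa = random.getrandbits(23)

    # De-biasing correction: if the mantissa is all zeros, increase the
    # exponent by one with probability 1/2; this makes 1.0 reachable.
    if mantissa == 0 and random.getrandbits(1) == 1:
        biased_exponent += 1

    # Assemble the bit pattern (sign bit 0) and reinterpret as binary32.
    bits = (biased_exponent << 23) | mantissa
    (f,) = struct.unpack("<f", struct.pack("<I", bits))
    return f
```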
This process consumes 150 bits of randomness overall (in most cases, a lot less: we can stop the Bernoulli trials once we've had a single success). It is injective: for every `UnitFloat`, we can find at least one conceptual `Word150` that maps to it.

[1] http://allendowney.com/research/rand/
### Double

The same considerations apply to generating uniformly distributed `Double` values. Here, Downey's method consumes up to 1022 bits for Bernoulli trials + 52 bits for the mantissa + 1 bit for the exponent correction = 1075 bits.

### Unit interval vs. general range
In order to generate a floating point number in an arbitrary interval `[lo, hi]` with `lo < hi`, we can generate `x` in the unit interval and then calculate `lo + x * (hi - lo)`.

Alternatively, it is conceivable to use a variant of Downey's method to directly generate a floating point number uniformly distributed over an arbitrary interval. That is what `rademacher-fpl` [2] does. Pro: it's "perfect", hitting every representable number in the range with the right probability. Con: it's pretty involved. Generating a floating point number in the unit interval is a very simple special case of the more general procedure.

In order to weigh these options, I wonder how we can analyse the quality of `lo + x * (hi - lo)`: how many representable numbers in the resulting range can we hit in this way? Also, is there a better way to do this calculation, perhaps `x * hi - lo * (x - 1)`, that improves the "hit rate"?

[2] https://gitlab.com/christoph-conrads/rademacher-fpl/-/blob/master/include/rademacher-fpl/impl/uniform-real-distribution.hpp
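One way to start analysing this empirically, in the spirit of the enumeration above (a sketch; a 16-bit word keeps the loop fast, whereas the gist linked earlier enumerates all of `Word32`):

```python
import struct

def f32(x):
    # Round a Python double to the nearest binary32 value.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def image_size(lo, hi, bits=16):
    # Map every `bits`-wide word to the unit interval, apply the affine
    # transformation in binary32 arithmetic, and count distinct results.
    scale = 2.0 ** -bits
    width = f32(hi - lo)
    return len({f32(lo + f32(f32(w * scale) * width)) for w in range(2 ** bits)})
```

Comparing `image_size(lo, hi)` against the number of representable binary32 values in `[lo, hi]` gives the "hit rate" for that interval.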