This repository has been archived by the owner on Dec 11, 2020. It is now read-only.

Biased integer #332

Merged
merged 5 commits into from
Jan 2, 2015


TimWolla
Contributor

biasedInteger allows generating an integer that is biased by a given function.

See #304

See this example data:
graph

| Number | unbiased | linearLow | linearHigh | sqrt | square |
|---|---:|---:|---:|---:|---:|
| 1 | 9857 | 19016 | 1007 | 3222 | 84 |
| 2 | 10032 | 17000 | 2931 | 5715 | 717 |
| 3 | 9966 | 14977 | 4928 | 7538 | 1846 |
| 4 | 9914 | 13015 | 7033 | 8872 | 3645 |
| 5 | 10099 | 11080 | 9145 | 9915 | 6095 |
| 6 | 9800 | 9012 | 11001 | 11234 | 9138 |
| 7 | 9987 | 6860 | 12936 | 11906 | 12800 |
| 8 | 10096 | 5053 | 15047 | 13160 | 17019 |
| 9 | 10026 | 2945 | 16919 | 13775 | 21512 |
| 10 | 10223 | 1042 | 19053 | 14663 | 27144 |
| Loop iterations | 100000 | 200217 | 199822 | 150115 | 299345 |
| Loop iterations (percent) | 100.00% | 200.22% | 199.82% | 150.12% | 299.34% |

Loop iterations is the number of times the body of the loop was executed; 100000 is perfect, meaning no values were discarded.
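
The loop-iteration percentages can be cross-checked with a short calculation. In an accept/reject loop of this kind, a candidate survives with probability equal to the area under the weighting function on [0, 1], so the expected number of iterations per generated value is 1 divided by that area. The sketch below assumes that linearLow is 1 - x, linearHigh is x and square is x * x (the thread does not spell these out); areaUnderCurve is a hypothetical helper.

```php
<?php
// Midpoint-rule estimate of the area under $f on [0, 1].
function areaUnderCurve(callable $f, int $steps = 10000): float
{
    $sum = 0.0;
    for ($i = 0; $i < $steps; $i++) {
        $sum += $f(($i + 0.5) / $steps);
    }
    return $sum / $steps;
}

// Assumed weighting functions; the names follow the table columns.
$weightings = [
    'unbiased'   => function ($x) { return 1.0; },
    'linearLow'  => function ($x) { return 1.0 - $x; },
    'linearHigh' => function ($x) { return $x; },
    'sqrt'       => 'sqrt',
    'square'     => function ($x) { return $x * $x; },
];

// Expected loop iterations per generated value, as a percentage.
$expectedPercent = [];
foreach ($weightings as $name => $f) {
    $expectedPercent[$name] = 100.0 / areaUnderCurve($f);
    printf("%-10s %.1f%%\n", $name, $expectedPercent[$name]);
}
```

Under those assumptions the estimates come out close to 100%, 200%, 200%, 150% and 300%, in line with the measured row in the table.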

For even more fascination see this sqrt distribution of a higher number range 😄
graph2

* Returns a biased integer between $min and $max (both inclusive).
* The distribution depends on $function.
*
* The algorithm creates two doubles, x, y ∈ [0, 1) and checks whether the
Owner

I don't understand why you do this. Why isn't the following sufficient?

public function biasedInteger($min, $max, $function)
{
    $randMax = mt_getrandmax();
    return floor($function(mt_rand(0,$randMax) / $randMax) * ($max - $min + 1) + $min);
}

Contributor Author

It would make creation of the “weighting” functions less intuitive. My method directly maps the distribution onto a graph and one can easily see: “Oh, that value is higher, the number is more likely to pop out”.

I cannot think of any function that would generate a linearLow distribution using your method off the top of my head.
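
As a rough illustration of the accept/reject idea described here, the following sketch draws a point (x, y) in the unit square and keeps x only when the point lies on or below the weighting curve. biasedIntegerSketch and the 1 - x form of linearLow are assumptions made for the example, not the code from this pull request.

```php
<?php
// Hypothetical stand-alone version of an accept/reject loop.
function biasedIntegerSketch(int $min, int $max, callable $weight): int
{
    do {
        // x picks the candidate position, y decides whether to keep it.
        // Dividing by mt_getrandmax() + 1 keeps both values in [0, 1).
        $x = mt_rand() / (mt_getrandmax() + 1);
        $y = mt_rand() / (mt_getrandmax() + 1);
    } while ($weight($x) < $y); // reject points that lie above the curve

    return (int) floor($x * ($max - $min + 1) + $min);
}

// Assumed linearLow weighting: highest at x = 0, falling to 0 at x = 1.
$linearLow = function ($x) { return 1 - $x; };

mt_srand(42); // fixed seed so repeated runs give the same histogram
$counts = array_fill(1, 10, 0);
for ($i = 0; $i < 20000; $i++) {
    $counts[biasedIntegerSketch(1, 10, $linearLow)]++;
}
print_r($counts);
```

The higher the curve at a given x, the more candidates near that x survive, which is why the shape of the weighting function can be read directly as the shape of the resulting histogram.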

Owner

how about:

function linearLow(x)
{
   return x / 2;
}

Your approach is really not fast enough because of the loop.

Contributor Author

return x / 2;

This would produce every value in the lower half with equal probability and none from the upper half (so it is essentially the same as numberBetween($min, $max / 2)).
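
This can be checked without any randomness by feeding evenly spaced inputs through the transform-style formula. In the sketch below, transformSketch is a hypothetical helper that takes the uniform value u as an explicit argument instead of drawing it from mt_rand.

```php
<?php
// Transform-style generator with the uniform draw $u in [0, 1) passed in explicitly.
function transformSketch(float $u, int $min, int $max, callable $f): int
{
    return (int) floor($f($u) * ($max - $min + 1) + $min);
}

$half = function ($x) { return $x / 2; };

// Feed 1000 evenly spaced values of u through the formula and count the results.
$counts = array_fill(1, 10, 0);
for ($i = 0; $i < 1000; $i++) {
    $counts[transformSketch($i / 1000, 1, 10, $half)]++;
}
print_r($counts);
```

Because x / 2 never reaches 0.5, the results 6 to 10 are never produced, while 1 to 5 each receive an equal share.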

Owner

Sorry, I shouldn't do maths after midnight :)

But anyway, I think power and root functions already deal with 90% of the required biases.

Contributor Author

Well, even powers are completely unintuitive with your suggestion. What distribution would you expect when using pow($x, 4);? Are the low, the medium or the high numbers favored? Think a moment about it.

I would say: The high ones, but after thinking about it one will see that in fact the lower numbers are favored!
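
The claim about pow($x, 4) can be verified in closed form. With the transform-style formula, a uniform u in [0, 1) yields result k out of n when u^4 falls in [(k - 1) / n, k / n), which happens with probability (k / n)^(1/4) - ((k - 1) / n)^(1/4). The helper name transformProbability is hypothetical.

```php
<?php
// Probability that floor(u^power * n) + 1 equals k, for u uniform on [0, 1).
function transformProbability(int $k, int $n, float $power): float
{
    return pow($k / $n, 1 / $power) - pow(($k - 1) / $n, 1 / $power);
}

$probabilities = [];
for ($k = 1; $k <= 10; $k++) {
    $probabilities[$k] = transformProbability($k, 10, 4.0);
    printf("%2d %.3f\n", $k, $probabilities[$k]);
}
```

For a range of 1 to 10, more than half of all results are 1 and fewer than 5% are 10, so the transform does favor the low numbers. Used as a weighting function in the accept/reject approach, the same curve would favor the high numbers instead.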

Therefore I will not change the actual algorithm, unless the distribution stays (roughly) the same. The numbers show that, depending on the chosen function, the biased generator runs at between 33% and 16% of the speed of numberBetween (see the loop iterations percentages). There are worse generators (e.g. numerify, which generates each digit independently).

If I could convince you I will happily fix the test failures and rename the function and whatnot, but if you still think that the algorithm is too slow then you may close the pull request.

@fzaninotto
Owner

After thinking about it again, I agree with your approach. Do you have time to make the requested changes to make it mergeable?

@TimWolla
Contributor Author

TimWolla commented Nov 4, 2014

> After thinking about it again, I agree with your approach. Do you have time to make the requested changes to make it mergeable?

I'll try to take a look at it before end of the week.

@TimWolla
Contributor Author

@fzaninotto The remaining issues should be fixed.

@TimWolla
Contributor Author

TimWolla commented Dec 9, 2014

As a note: the failing tests are not my fault.

fzaninotto added a commit that referenced this pull request Jan 2, 2015
fzaninotto merged commit ec4cae9 into fzaninotto:master on Jan 2, 2015
@fzaninotto
Owner

Thanks!

TimWolla deleted the biasedInteger branch on January 2, 2015, 23:14
@jayenne

jayenne commented May 28, 2020

Hi, would this function be able to return a number biased towards a third number using a function?

E.g. I want to generate users matching a demographic with a min, max and mean age, using a function like sqrt, gauss etc.

min = 0, max = 100, mean = 33, function = ? Which function returns a "falloff curve" around the mean rather than towards one end or the other?
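
One possible direction, offered only as a sketch: since the weighting function is evaluated on x in [0, 1], a bell-shaped curve peaked at the desired position could serve as the third argument. bellWeight and its width parameter are hypothetical, and the curve fixes where values are most likely (the peak), which is not necessarily the exact mean of the generated values.

```php
<?php
// Hypothetical helper: builds a bell-shaped weighting peaked at $peak, a position in [0, 1].
function bellWeight(float $peak, float $width): callable
{
    return function ($x) use ($peak, $width) {
        $d = ($x - $peak) / $width;
        return exp(-0.5 * $d * $d); // always within (0, 1], equal to 1 at the peak
    };
}

// For min = 0 and max = 100, a target of 33 sits at roughly x = 0.33.
$weight = bellWeight(0.33, 0.15);

printf("%.3f %.3f %.3f\n", $weight(0.0), $weight(0.33), $weight(1.0));

// Assumed usage, not verified against the library:
// $age = $faker->biasedNumberBetween(0, 100, $weight);
```

A smaller width gives a narrower falloff around the peak, at the cost of more rejected candidates per generated value.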

@christhofer

christhofer commented Aug 28, 2020

Hi, what other values can we use for the third parameter?
Please update the documentation with more examples, e.g. how to bias it towards lower values or middle values, or how to use our own function.

I tried $faker->biasedNumberBetween(1, 50, 'linearLow')
and got this error:

call_user_func() expects parameter 1 to be a valid callback, function 'linearLow' not found or invalid function name
at D:\Projects\PHP\api.tss\vendor\fzaninotto\faker\src\Faker\Provider\Biased.php:27
    23|     {
    24|         do {
    25|             $x = mt_rand() / mt_getrandmax();
    26|             $y = mt_rand() / (mt_getrandmax() + 1);
  > 27|         } while (call_user_func($function, $x) < $y);
    28| 
    29|         return (int) floor($x * ($max - $min + 1) + $min);
    30|     }
    31| 
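
The trace shows the third argument being handed to call_user_func, so it has to be something PHP can resolve as a callable. A bare string only works when a global function of that name exists, which is the case for 'sqrt' but not for 'linearLow'. The sketch below illustrates the difference; the 1 - x closure is an assumed stand-in for a linearLow weighting.

```php
<?php
// A bare string is only callable if a global function with that name exists.
var_dump(is_callable('sqrt'));      // bool(true)
var_dump(is_callable('linearLow')); // bool(false)

// A closure is always a valid callable (assumed shape of a linearLow weighting).
$linearLow = function ($x) { return 1 - $x; };
var_dump(is_callable($linearLow));  // bool(true)

echo call_user_func($linearLow, 0.25), "\n"; // 0.75

// Assumed usage, not verified against the library:
// $faker->biasedNumberBetween(1, 50, $linearLow);
```

If the provider class ships its own weighting functions as static methods, they would presumably have to be referenced together with the class name rather than as a bare function name; that is an assumption worth checking against the library source.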
