-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug related to Boost RNG #297
Conversation
The distributions are sharing the same random seed |
Possible explanations and fixes: Hellekalek, P.: Good random number generators are (not so) easy to find. Mathematics and Computers in Simulation, 46: 485--505, 1998. http://stackoverflow.com/questions/12779235/how-to-properly-choose-rng-seed-for-parallel-processes |
Unless I'm missing your point, I don't think this is what's happening -- I only do The two samples are not generated from two separate parallel processes; it is a single deterministic process first generating the Gaussian samples and then generating the Bernoulli samples. |
^ The above PR just re-seeds the RNG in the test case before sampling from the Bernoulli. This is not a bug fix, it just makes the test pass -- you should be able to sample from a Gaussian followed by a Bernoulli without having the samples be correlated. |
I was wrong though, the behavior is the same as re-seeding with 1701 before generating the Bernoulli samples. Anyway, I'm too tired to think about this now. |
The correlation can be introduced by the combination of two uncorrelated sequences. http://www.sitmo.com/article/generating-correlated-random-numbers/ |
True random number generators (TRNG) may not be achievable without physical simulation[1]. Boost.Random as a pseudo random number generator (PRNG) inevitably introduces serial correlation even between different distributions. If the bug has no significant practical influence in training or testing the network, then it can be temporarily ignored. [1] Helmut G. Katzgraber. Random Numbers in Scientific Computing: An Introduction. arXiv:1005.4117 [physics.comp-ph] |
I fixed this in 1dfdce8. The problem was with Boost's |
Thanks for the fix Jeff!
I thought about this since we talked this morning and I don't have a better solution, so let's just live with it. |
I'll add such a comment. Also there's at least one test_math_functions failure now -- looking into that. |
@shelhamer added comment in 51b2694 Re failing test_math_functions: the only failure was TestGPUHammingDistance and it would seem that the only reason it previously passed was because the RNG wasn't working, so it would GaussianFill in blob A, and then GaussianFill in blob B, but of course these had the same result so they had Hamming distance = 0. Non-trivial Hamming distances don't seem to be computed correctly. I spent a while trying to debug this but failed, and since this function isn't used anywhere in the codebase (I'm not actually sure why it was added -- why do we need a Hamming distance function?) I removed it (and its test case) in 8e2285b. Happy to see it brought back if someone (e.g. @kloudkl who added it) wants to fix it. |
Thanks for commenting and investigating the failing math test case. Let's The fancy Hamming distances were introduced by @kloudkl for their proposed On Mon, Apr 7, 2014 at 7:59 PM, Jeff Donahue notifications@github.comwrote:
|
I'd like to clean up the added RNG unit tests a bit (they're barely readable to me, and I wrote them). Will probably not happen until tomorrow, but should be good to go after that. |
In my latest two commits I cleaned up the RNG unit tests (85974f5) and was less aggressive in my removal of |
OK, done for real now...since last comment I've: -changed the RNG function names from, e.g. -changed the RNG function argument ordering so the pointer to be filled by the RNG is last (per Google C++ style guidelines and the other math functions). -added analogous GPU RNG functions for Gaussian and Uniform (thought about also adding Bernoulli but curand doesn't have a native function for it); changed all current uses of curandGenerate in the code to call these functions instead. -added unit tests for the GPU RNG functions. -confirmed that adding -confirmed that all tests pass. |
gaussian followed by a bernoulli
…EMENTED with TODO to fix and disable its unit test
actually check for uncorrelated RNG results
Nice to have this figured out and the rng interface polished. Thanks Jeff. |
Fix persistence of random state with Boost RNG
@jeffdonahue I'm getting random core dumps during fillers initialization after updating dev in durian2 and using atlas and blas. |
@sguada I haven't seen that behavior - can you give any more information? Does it stop happening when you checkout the commit before this PR? |
@sguada Did you do Note the Makefile.config format changed slightly in ed38827 to support more BLAS choices. |
Fix RNG segfault related to #297
Fix persistence of random state with Boost RNG
Fix RNG segfault related to BVLC#297
I've observed some weird behavior with the new Boost RNG that I can't explain or fix, and am posting a PR here with test cases so others can hopefully reproduce the behavior. An explanation of the steps taken in the test
TestRngGaussianTimesBernoulli
to demonstrate the weird behavior:caffe_vRngGaussian
-- check that the samples are roughly half positive and half negative (with no zeros) as expected. These checks pass.caffe_vRngBernoulli
-- check that roughly N_p are ones and N_(1-p) are zeros (with no other values) as expected. These checks pass.I also added the equivalent test but replacing the Gaussian with a uniform distribution on the interval [-1, 1] using
caffe_vRngUniform
. In this case the result indicates an exact inverse correlation between the Bernoulli andGaussianUniform results -- the product consists of ONLY negative values.If anyone has a suggestion for what the problem might be and/or how to fix it, please comment here (and/or send a PR with the fix if you like). Thanks!