Conversation
… error reproducibility.
I prefer that the original author of the CPU implementation correct the non-robustness of the float16 test. In the meantime, we could ignore the entire test, put it on a fixed seed, or comment out just the float16 CPU part and file an issue. Which approach would you recommend? |
Hello @DickJC123, interesting idea! Do you also support the possibility of setting a global seed that is applied to each test before it is executed? Right now, the only way I see is adding @with_seed(1234) to every single test. My goal is completely deterministic and reproducible behaviour during test execution, using the exact same seed for every single test, without having to alter the code. |
If you execute 'export MXNET_TEST_SEED=42' before running your tests, then every test (except those that override the seed, like the 1234 case below) will have its seed set to 42. If you instead execute 'export MXNET_MODULE_SEED=42', then a random number generator seeded by 42 will provide the sequence of RNG seeds for the tests of each module (i.e. file). If your tests pass in this mode, they will do so repeatedly, since every test's seed is deterministic.
I'm hoping that most people will want to run with no environment variables set. Yes, that means the tests will get different seeds each time, but those seeds are known and printed out for any test that throws an exception. Even a core-dump can be reproduced in isolation by rerunning the module first with the same seed in a debug mode. So far, this methodology has made it easy for me to track down and fix many of our tests' random failures. Most often the culprit has been too-small tolerances or issues with the use of the finite difference method, but not always! I'm hoping this PR will be incorporated and will lead to a stronger framework.
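For concreteness, the two modes might be exercised like this (seed value 42 from the example above; the test-file path is illustrative):

```
# Every test gets the identical seed 42 (fully deterministic runs):
export MXNET_TEST_SEED=42
nosetests --verbose tests/python/unittest/test_module.py

# Each test gets its own seed, derived deterministically from module seed 42:
export MXNET_MODULE_SEED=42
nosetests --verbose tests/python/unittest/test_module.py
```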
|
I like this approach, but I would prefer fixed seeds by default instead of requiring an environment variable, mostly to reduce the cognitive load on test runners. We already have enough issues running tests on different platforms; if, on top of that, not defining this variable causes flaky tests or random failures, I think it's better to have it the other way around. |
@DickJC123 Ping. Is this a duplicate? I'm going to close the older one |
Yes, the 'randomness2' is an update to my thinking, so the old one can be closed. I can rebase this PR if you are interested in accepting it. The feedback was generally favorable, I just got busy so I didn't lobby heavily for its acceptance. I believe it adds new tools for diagnosing CI failures, helping it to pass reliably, yet operate with random data from run to run. -Dick
|
So this PR is complete? I'll see if I have time to look into it more closely. It would be great if you could rebase it so everything gets tested on the new CI. We're having a lot of issues with flaky tests, so this could come in handy.
|
@DickJC123 ping |
@DickJC123 any update on this? |
I think this can be closed. @DickJC123 opened another PR for this and it was merged. |
Description
This PR introduces a simple new level of control over unittest random seeds while providing inter-test random number generator (RNG) isolation. This PR is an improved replacement for the pending PR #8313. The improvements over that PR are:
To provide this functionality, a custom decorator "@with_seed()" was created. This was considered more powerful than the nosetests "@with_setup()" facility, and less disruptive than changing all tests to become methods of a nosetests test class. The proposed new approach is demonstrated on a simple "test_module.py" file with three tests. Assuming that the second test needs a set seed for robustness, the file might currently appear as:
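A minimal sketch of such a file (test bodies and data shapes are illustrative):

```python
# test_module.py (current style)
import mxnet as mx
import numpy as np

def test_op1():
    # Consumes whatever RNG state earlier tests left behind.
    data = mx.nd.array(np.random.uniform(size=(2, 2)))
    # ... exercise op1 on `data` ...

def test_op2():
    # Not yet robust to arbitrary data, so it pins the global seeds,
    # silently affecting every test that runs after it.
    mx.random.seed(1234)
    np.random.seed(1234)
    data = mx.nd.array(np.random.uniform(size=(2, 2)))
    # ... exercise op2 on `data` ...

def test_op3():
    # Meant to run on random data, but always sees the same dataset
    # because test_op2 has just fixed the seed.
    data = mx.nd.array(np.random.uniform(size=(2, 2)))
    # ... exercise op3 on `data` ...
```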
Even though test_op3() is OK with nondeterministic data, it will see only a single dataset, because it runs after test_op2, which pins the seed. Also, if test_op1() were to fail, there would be no way to reproduce the failure, short of rerunning the test individually in the hope of producing a new, similar failure.
With the proposed approach, the test file becomes:
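Roughly like the following sketch; only the decorator usage is the point:

```python
# test_module.py (proposed style)
import mxnet as mx
import numpy as np
from common import with_seed  # tests/python/unittest/common.py

@with_seed()
def test_op1():
    data = mx.nd.array(np.random.uniform(size=(2, 2)))  # fresh seed, logged on failure
    # ... exercise op1 on `data` ...

@with_seed(1234)
def test_op2():
    data = mx.nd.array(np.random.uniform(size=(2, 2)))  # pinned seed until made robust
    # ... exercise op2 on `data` ...

@with_seed()
def test_op3():
    data = mx.nd.array(np.random.uniform(size=(2, 2)))  # isolated from test_op2's seed
    # ... exercise op3 on `data` ...
```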
By importing unittests/common.py, the seeds of the numpy and mxnet RNGs are set initially to a random "module seed" that is output to the log file. The initial RNGs can be thought of as module-level RNGs that are isolated from the individual tests and that provide the string of "test seeds" that determine the behavior of each test's RNGs. The "@with_seed()" test function decorator requests a test seed from the module RNG, sets the numpy and mxnet seeds appropriately and runs the test. Should the test fail, the seed for that test is output to the log file. Pass or fail, the decorator reinstates the module RNG state before the next test's decorator is executed, effectively isolating the tests. Debugging a failing test_op3 in the example would proceed as follows:
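The log excerpt itself is not reproduced here; based on the description above, it would contain lines to the effect of (wording illustrative; the seed values are the ones used elsewhere in this example):

```
test_op2: using fixed seed 1234 (test is not yet robust to random data)
test_op3: FAILED, use MXNET_TEST_SEED=2096230603 to reproduce
```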
Because test_op3 failed, its seed appeared in the log file. The test_op2 seed was also displayed, as a reminder that that test needs more work before it is robust enough for random data. The command to reproduce the problem can be assembled simply by cutting and pasting from the log:
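Assuming the nose runner used by the MXNet unittests, that cut-and-paste command would look like:

```
MXNET_TEST_SEED=2096230603 nosetests --verbose -s test_module.py:test_op3
```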
If test_op3 instead dumped core, the test seed would not be initially apparent. Assuming the core-dump is repeatable based on the data, the module would first be re-run with the command:
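Presumably something along these lines, with the module seed copied from the failing run's log (placeholder shown):

```
MXNET_MODULE_SEED=<module seed from the log> nosetests --verbose -s test_module.py
```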
The log would now include the test seeds for all tests before they are run, so the test could then be run in isolation as before with MXNET_TEST_SEED=2096230603.
Let's assume that test_op3 was altered by increasing a tolerance. How robust is the test now? This can be explored by repeating the test many times as in:
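One simple way to do this, absent dedicated repeat support in the runner, is a shell loop; each iteration draws a fresh random seed, and the first failing seed lands in the log:

```
for i in $(seq 1 1000); do
  nosetests -s test_module.py:test_op3 || break  # stop at the first failure
done
```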
Finally, this PR adds the @with_seed() decorator to all tests in modules that use random numbers. It also includes many specific test robustness fixes that were exposed once this new methodology was adopted internally by the author.
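For readers who want the mechanics, here is a minimal sketch of such a decorator. This is not the PR's actual implementation (which also honors MXNET_TEST_SEED/MXNET_MODULE_SEED and restores the module RNG state afterwards), just the core idea:

```python
import functools
import logging
import random

import mxnet as mx
import numpy as np

def with_seed(seed=None):
    """Seed the numpy/mxnet RNGs per test; log a reproduction seed on failure."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # A fixed argument pins the test; otherwise draw a fresh seed so
            # data varies run to run but stays reproducible.
            test_seed = seed if seed is not None else random.randint(0, 2**31 - 1)
            np.random.seed(test_seed)
            mx.random.seed(test_seed)
            if seed is not None:
                # Remind readers of the log that this test is pinned.
                logging.warning('%s: using fixed seed %d', test_fn.__name__, test_seed)
            try:
                test_fn(*args, **kwargs)
            except BaseException:
                logging.warning('%s: failed, use MXNET_TEST_SEED=%d to reproduce',
                                test_fn.__name__, test_seed)
                raise
        return wrapper
    return decorator
```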
Checklist
Essentials
Passed code style checking (make lint)
Changes
Comments