Get rid of GAP stones in tests #1060
Comments
I agree with this proposal.
I agree. The `stonesLimit` option is mainly for Travis, to run the things in `teststandard` which are "fast enough". It would be better to just split `teststandard` into two directories.
(I am sure this has been discussed before.) Should there be a way to time tests just in wall-clock time? As you say, this is only relevant if one is running some test on well-defined hardware, but it seems useful to be recording runtimes to see regressions, maybe?
@markuspf Something like this certainly would be desirable, but one needs to put some thought into it, or else it'll not be useful but only distracting.

One approach would be to set up our own version of http://speed.pypy.org/ (the code for the site is available) and simply plot the timing results produced by some fixed hardware, e.g. perhaps our Jenkins. (The Travis CI timings are not useful for this, as they can vary a lot.)

A second (not quite alternative, but complementary) approach would be to compute a "baseline speed" at the start of the tests (i.e. time some artificial mini benchmark), and then time all tests, but "normalize" the results against the timings of the mini benchmark (i.e. compute the quotient). Then check those quotients to determine whether a speed regression occurred. This approach might make it possible to do the tests on Travis, though it's still not perfect, as the Travis VMs can slow down or speed up in the middle of a test run, as load changes...

I'd love to have option 1, as the graphical representation also makes it easier to spot and understand regressions, and it also shows improvements, which can be quite motivating. Alas, somebody would have to invest the time and effort required to set it up...
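To make the second approach concrete, here is a minimal sketch in Python; all function names and the tolerance value are made up for illustration and are not part of GAP's test framework:

```python
# Sketch of the "baseline normalization" idea described above.
# Names and thresholds are illustrative, not part of GAP.
import time


def baseline_speed(reps=100_000):
    """Time a tiny artificial benchmark to estimate machine speed."""
    start = time.perf_counter()
    total = 0
    for i in range(reps):
        total += i * i
    return time.perf_counter() - start


def normalized_runtime(test_runtime, baseline):
    """Express a test's runtime as a multiple of the baseline."""
    return test_runtime / baseline


def regression_detected(current_quotient, stored_quotient, tolerance=0.25):
    """Flag a slowdown if the quotient grew by more than `tolerance`."""
    return current_quotient > stored_quotient * (1 + tolerance)
```

The point of dividing by the baseline is that a uniformly slower machine scales both numbers, so the quotient stays roughly stable; only a genuine per-test slowdown moves it.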
I agree, so let's first get rid of the GAPStones to scratch that itch, then move on to do the timing page, and let's then come up with a clear spec of what we're expecting of the timing page (and find someone to set it up).
I never found the GAPStones particularly useful, so I agree to get rid of them. Concerning timing and comparing for regression tests: are you aware of the optional argument for the
@frankluebeck I saw those before, but must admit I wasn't actively aware of them. Thank you for the reminder! They do look potentially useful. Not sure if we can use them with Travis (due to the fluctuation of performance I mentioned, and also because we'd have to see how we store results between builds -- might be possible via their caching feature). But perhaps this could be set up on our Jenkins? Or perhaps it already is? I vaguely recall that @alex-konovalov showed me something like that...

That said, I could never really "get" our Jenkins. I find it far too complicated to navigate and to understand (resp. remember) what is what and where and how to use it... The user interface is just bad. It might be great if you have the time to learn it and then constantly use it, but I don't have that time and patience. In contrast, I managed to pick up Travis (and also Codecov, after they improved it a bit) fairly quickly.
My opinion quite differs from @fingolfin's, but I have used Jenkins regularly since 2009, so I am very biased. The UI could be better, but a particular feature I like is matrix jobs, where you can easily see patterns if tests are failing in only some configurations.

Anyhow, this issue is about GAP4Stones, and I agree to get rid of them. Regarding walltime, we can use

Regarding storing timings in Jenkins: some are extracted to build plots, but that is not perfect, so I would agree with improving it as suggested above, and with making it more easily accessible. I think the main reason for the high entry threshold for Jenkins was not the UI itself, but the lack of a publicly accessible interface, unlike those of Travis, AppVeyor, CodeCov etc...
Well, the fact that our Jenkins still requires a username and password, and is also not integrated with GitHub (and hence not easily discoverable), are also pretty big barriers. I always have to search until I figure out its URL, too. You may discount all of these as minor issues, but to me, having to spend time figuring out how to even find and then access the system each time, as opposed to a single click on the relevant PR, is a pretty big drawback of our Jenkins installation. If it were under a memorable URL (jenkins.gap-system.org?) and open to the public, that would already be quite different...
GAP4stones were removed in #1072. |
We discussed this on some other issue at some point, but here it is on its own:
Let's get rid of the GAP stones in tests.
Rationale: Most people will add new tests without updating the stones, or add completely new .tst files with random stones values. And that's reasonable, too, for it's very time-consuming to add "correct" stones -- it is not even clear what exactly that means, as there is no central "reference" server available on which one can recompute stones values; hence if I update stones, it will change the stones values in all .tst files. Indeed, due to the randomness inherent in measuring the times, if I immediately recompute stones right afterwards, it will again change (almost) all stones values.
They also seem to serve little purpose. From what I can see, `TestDirectory` has a `stonesLimit` option, which can be used to run only tests which are faster than the given limit. That might be somewhat useful, but

As for doing this: of course `STOP_TEST` should retain its "stones" argument for backwards compatibility, but it should be made optional, and would do nothing. All the code dealing with predicting the runtime of tests could go, as could everything else related to stones and their recalibration.
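For context, a GAP test file brackets its content between `START_TEST` and `STOP_TEST`; the second argument of `STOP_TEST` is the stones value this issue proposes to make optional. The file name and number below are made up for illustration:

```gap
gap> START_TEST("example.tst");
gap> Factorial(5);
120
gap> STOP_TEST("example.tst", 10000);
```

Making the second argument optional means existing .tst files keep working unchanged, while new files can simply omit it.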