Get rid of GAP stones in tests #1060
Comments
I agree with this proposal.
I agree. The `stonesLimit` option is mainly for Travis, to run the things in `teststandard` which are "fast enough". It would be better to just split `teststandard` into two directories.
(I am sure this has been discussed before.) Should there be a way to time tests just in wall-clock time? As you say, this is only relevant if one is running some test on well-defined hardware, but it seems useful to be recording runtimes to see regressions, maybe?
@markuspf Something like this certainly would be desirable, but one needs to put some thought into it, or else it'll not be useful but only distracting.

One approach would be to set up our own version of http://speed.pypy.org/ (the code for the site is available) and simply plot the timing results produced by some fixed hardware, e.g. perhaps our Jenkins. (The Travis CI timings are not useful for this, as they can vary a lot.)

A second (not quite alternative, but complementary) approach would be to compute a "baseline speed" at the start of the tests (i.e. time some artificial mini benchmark), and then time all tests, but "normalize" the results against the timings of the mini benchmark (i.e. compute the quotient). Then check those quotients to determine whether a speed regression occurred. This approach might make it possible to do the tests on Travis, though it's still not perfect, as the Travis VMs can slow down or speed up in the middle of a test run, as load changes...

I'd love to have option 1, as the graphical representation also makes it easier to spot and understand regressions, and it also shows improvements, which can be quite motivating. Alas, somebody would have to invest the time and effort required to set it up...
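To make the second approach concrete, here is a minimal sketch in Python; all function names and the tolerance value are made up for illustration and are not part of GAP's test framework:

```python
# Sketch of the "baseline normalization" idea described above.
# Names and thresholds are illustrative, not part of GAP.
import time


def baseline_speed(reps=100_000):
    """Time a tiny artificial benchmark to estimate machine speed."""
    start = time.perf_counter()
    total = 0
    for i in range(reps):
        total += i * i
    return time.perf_counter() - start


def normalized_runtime(test_runtime, baseline):
    """Express a test's runtime as a multiple of the baseline."""
    return test_runtime / baseline


def regression_detected(current_quotient, stored_quotient, tolerance=0.25):
    """Flag a slowdown if the quotient grew by more than `tolerance`."""
    return current_quotient > stored_quotient * (1 + tolerance)
```

The point of dividing by the baseline is that a uniformly slower machine scales both numbers, so the quotient stays roughly stable; only a genuine per-test slowdown moves it.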
I agree, so let's first get rid of the GAPStones to scratch that itch, then move on to do the timing page, and let's then come up with a clear spec of what we're expecting of the timing page (and find someone to set it up).
I never found the GAPStones particularly useful, so I agree to get rid of them. Concerning timing and comparing for regression tests: are you aware of the optional argument for the
@frankluebeck I saw those before, but must admit I wasn't actively aware of them. Thank you for the reminder! They do look potentially useful. Not sure if we can use them with Travis (due to the fluctuation of performance I mentioned, and also because we'd have to see how we store results between builds -- might be possible via their caching feature). But perhaps this could be set up on our Jenkins? Or perhaps it already is? I vaguely recall that @alex-konovalov showed me something like that...

That said, I could never really "get" our Jenkins. I find it far too complicated to navigate and to understand (resp. remember) what is what and where and how to use it... The user interface is just bad. It might be great if you have the time to learn it and then constantly use it, but I don't have that time and patience. In contrast, I managed to pick up Travis (and also Codecov, after they improved it a bit) fairly quickly.
My opinion quite differs from @fingolfin's, but I have used Jenkins regularly since 2009, so I am very biased. The UI could be better, but a particular feature I like is matrix jobs, where you can easily see patterns if tests are failing in only some configurations.

Anyhow, this issue is about GAP4Stones, and I agree to get rid of them. Regarding walltime, we can use

Regarding storing timings in Jenkins: some are extracted to build plots, but that is not perfect, so I would agree with improving it as suggested above, and with making it more easily accessible. I think the main reason for the high entry threshold for Jenkins was not the UI itself, but the lack of a publicly accessible interface, unlike those of Travis, AppVeyor, CodeCov etc...
Well, the fact that our Jenkins still requires a username and password, and is also not integrated with GitHub (and hence not easily discoverable), are also pretty big barriers. I always have to search until I figure out its URL, too. You may discount all of these as minor issues, but to me, having to spend time figuring out how to even find and then access the system each time, as opposed to a single click on the relevant PR, is a pretty big drawback of our Jenkins installation. If it were under a memorable URL (jenkins.gap-system.org?) and open to the public, that would already be quite different...
GAP4stones were removed in #1072. |
We discussed this on some other issue at some point, but here it is on its own:
Let's get rid of the GAP stones in tests.
Rationale: Most people will add new tests without updating the stones, or add completely new .tst files with random stones values. And that's reasonable, too, for it's very time-consuming to add "correct" stones -- it is not even clear what exactly that means, as there is no central "reference" server available on which one can recompute stones values; hence if I update stones, it will change the stones values in all .tst files. Indeed, due to the randomness inherent in measuring the times, if I immediately recompute stones right afterwards, it will again change (almost) all stones values.
They also seem to serve little purpose. From what I can see, `TestDirectory` has a `stonesLimit` option, which can be used to run only tests which are faster than the given limit. That might be somewhat useful, but

As for doing this: of course `STOP_TEST` should retain its "stones" argument for backwards compatibility, but it should be made optional, and would do nothing. All the code dealing with predicting the runtime of tests could go, as could everything else related to stones and their recalibration.
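For context, a GAP test file brackets its content between `START_TEST` and `STOP_TEST`; the second argument of `STOP_TEST` is the stones value this issue proposes to make optional. The file name and number below are made up for illustration:

```gap
gap> START_TEST("example.tst");
gap> Factorial(5);
120
gap> STOP_TEST("example.tst", 10000);
```

Making the second argument optional means existing .tst files keep working unchanged, while new files can simply omit it.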