
Get rid of GAP stones in tests #1060

Closed
fingolfin opened this issue Jan 9, 2017 · 10 comments
@fingolfin
Member

We discussed this on some other issue at some point, but here it is on its own:

Let's get rid of the GAP stones in tests.

Rationale: Most people will add new tests without updating the stones, or add completely new .tst files with made-up stones values. And that is reasonable, too, because it is very time consuming to add "correct" stones -- it is not even clear what exactly that means, as there is no central "reference" server on which one could recompute stones values. Hence if I recalibrate the stones, the stones values in all .tst files change; and due to the randomness inherent in timing measurements, if I immediately recompute them again right after, (almost) all stones values change yet again.

They also seem to serve little purpose. From what I can see:

  1. They are used to estimate the duration of the next test, based on the duration of the preceding tests. That's neat, but hardly important.
  2. They are used to "sort" tests, from quickest to slowest. Again, neat, but hardly important.
  3. Apparently TestDirectory has a stonesLimit option, which can be used to run only the tests that are faster than the given limit. That might be somewhat useful, but
    1. it is less useful given that stones values often go uncalibrated for some time,
    2. running an essentially random subset of the tests does not seem like a good idea,
    3. I am not sure anybody actually uses this feature anyway.

As for doing this: of course STOP_TEST should retain its "stones" argument for backwards compatibility, but it should be made optional and do nothing. All the code dealing with predicting the runtime of tests could go, as could everything else related to stones and their recalibration.
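
For concreteness, a minimal sketch of what a .tst file footer could look like before and after such a change (file name and stones value made up):

```
# before: a stones value must be supplied and kept "calibrated"
gap> STOP_TEST("example.tst", 10000);

# after: the argument is optional and ignored, kept only for compatibility
gap> STOP_TEST("example.tst");
```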

@james-d-mitchell
Contributor

I agree with this proposal.

@ChrisJefferson
Contributor

I agree. The 'stonesLimit' option is mainly for Travis, to run the things in teststandard which are "fast enough". It would be better to just split teststandard into two directories.

@markuspf
Member

markuspf commented Jan 9, 2017

(I am sure this has been discussed before.) Should there be a way to time tests in plain wall-clock time (using NanosecondsSinceEpoch, for instance) to detect regressions?

As you say, this is only relevant if one runs the tests on well-defined hardware, but it seems useful to record runtimes in order to spot regressions, maybe?
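
A minimal sketch of that idea, assuming one wraps Test by hand (the helper name TimeTest is invented for illustration):

```
# hypothetical helper: run one test file and report its wall-clock time
TimeTest := function(fname)
  local start, elapsed;
  start := NanosecondsSinceEpoch();
  Test(fname);
  elapsed := NanosecondsSinceEpoch() - start;
  Print(fname, ": ", QuoInt(elapsed, 10^6), " ms\n");
  return elapsed;
end;;
```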

@fingolfin
Member Author

@markuspf Something like this certainly would be desirable, but one needs to put some thought into it, otherwise it will not be useful, only distracting.

One approach would be to set up our own version of http://speed.pypy.org/ (the code for the site is available) and simply plot the timing results produced by some fixed hardware, e.g. perhaps our Jenkins. (The Travis CI timings are not useful for this, as they can vary a lot.)

A second (not quite alternative, but complementary) approach would be to compute a "baseline speed" at the start of the tests (i.e. time some artificial mini benchmark), then time all tests and "normalize" the results against the timing of the mini benchmark (i.e. compute the quotient). One would then check those quotients to determine whether a speed regression occurred. This approach might make it possible to do the tests on Travis, though it is still not perfect, as the Travis VMs can slow down or speed up in the middle of a test run, as load changes...
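
A rough sketch of this second approach (the benchmark workload and helper names are invented for illustration):

```
# invented mini benchmark: time a fixed artificial workload once, at the start
BaselineNanos := function()
  local start, dummy;
  start := NanosecondsSinceEpoch();
  dummy := Sum([1..10^6], i -> i mod 7);  # arbitrary fixed workload
  return NanosecondsSinceEpoch() - start;
end;;

# normalized test time: the quotient of a test's wall time and the baseline,
# which should be roughly comparable across machines of different speeds
NormalizedTime := function(fname, baseline)
  local start;
  start := NanosecondsSinceEpoch();
  Test(fname);
  return Float((NanosecondsSinceEpoch() - start) / baseline);
end;;
```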

I'd love to have option 1, as the great graphical representation also makes it easier to spot and understand regressions, and it also shows improvements, which can be quite motivating. Alas, somebody would have to invest the time and effort required to set it up...

@markuspf
Member

markuspf commented Jan 9, 2017

I agree; so let's first get rid of the GAPStones to scratch that itch, then move on to the timing page: let's come up with a clear spec of what we expect of it (and find someone to set it up).

@frankluebeck
Member

I never found the GAPStones particularly useful, so I agree we should get rid of them.

Concerning timing and comparing for regression tests: are you aware of the optional record argument for the Test function, which allows one to specify writeTimings, compareTimings and reportTimeDiff?
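
For reference, a sketch of how those options might be combined (file names are illustrative; see the documentation of Test for the exact semantics):

```
# first run: record timings for the test file
gap> Test("tst/example.tst", rec(writeTimings := "timings/example"));

# later run: compare against the recorded timings and report any differences
gap> Test("tst/example.tst", rec(compareTimings := "timings/example"));
```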

@fingolfin
Member Author

@frankluebeck I saw those before, but must admit I wasn't actively aware of them. Thank you for the reminder! They do look potentially useful. I am not sure we can use them with Travis (due to the performance fluctuations I mentioned, and also because we would have to work out how to store results between builds -- might be possible via their caching feature). But perhaps this could be set up on our Jenkins? Or perhaps it already is? I vaguely recall that @alex-konovalov showed me something like that...

That said, I could never really "get" our Jenkins. I find it far too complicated to navigate and understand (resp. remember) what is what, where things are, and how to use it... The user interface is just bad. (It might be great if you have the time to learn it and then use it constantly, but I don't have that time and patience.) In contrast, I managed to pick up Travis (and also codecov, after they improved it a bit) fairly quickly.

@olexandr-konovalov
Member

olexandr-konovalov commented Jan 11, 2017

My opinion differs quite a bit from @fingolfin's, but I have used Jenkins regularly since 2009, so I am very biased. The UI could be better, but one particular feature I like is matrix jobs, where you can easily spot patterns when tests fail only in some configurations:
[screenshot: Jenkins matrix job overview showing test results across configurations]

Anyhow, this issue is about GAP4Stones, and I agree we should get rid of them.

Regarding walltime: we can use IO_gettimeofday, which returns a record with components tv_sec and tv_usec giving the time since the epoch. We could backport this into the kernel (I did this for HPC-GAP).
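
A small sketch of that (requires the IO package; the conversion to microseconds is just for illustration):

```
gap> LoadPackage("io");;
gap> t := IO_gettimeofday();;                 # rec( tv_sec := ..., tv_usec := ... )
gap> micros := t.tv_sec * 10^6 + t.tv_usec;;  # microseconds since the epoch
```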

Regarding storing timings in Jenkins: some timings are extracted to build plots, but that is not perfect, so I would agree to improve it as suggested above and make it more easily accessible. I think the main reason for the high entry threshold of Jenkins was not the UI itself, but the lack of a publicly accessible interface, unlike those of Travis, AppVeyor, CodeCov etc...

@fingolfin
Member Author

Well, the fact that our Jenkins still requires a username and password, and that it is not integrated with GitHub (and hence not easily discoverable), are also pretty big barriers. I always have to search until I figure out its URL, too. You may discount all of these as minor issues, but to me, having to spend time each time just to find and access the system, as opposed to a single click on the relevant PR, is a pretty big drawback of our Jenkins installation.

If it were under a memorable URL (jenkins.gap-system.org?) and open to the public, that would already be quite different...

@olexandr-konovalov
Member

olexandr-konovalov commented Feb 21, 2017

GAP4stones were removed in #1072.
