
What to do with warning: slow doctest … Test ran for [very long time]? #39569

Open
user202729 opened this issue Feb 22, 2025 · 6 comments · May be fixed by #39746

Comments

@user202729
Contributor

user202729 commented Feb 22, 2025

Currently there are 46 instances of this warning in every pull request, which makes the GitHub annotation highlights annoying to look at.

One example long test run, sorted by CPU time:
2025-02-22T13:29:16.0611644Z Test ran for 30.22s cpu, 30.58s wall
2025-02-22T13:56:25.4496259Z Test ran for 31.11s cpu, 31.48s wall
2025-02-22T13:28:23.1625804Z Test ran for 31.73s cpu, 32.16s wall
2025-02-22T13:57:32.9606147Z Test ran for 32.38s cpu, 32.99s wall
2025-02-22T13:53:15.8347722Z Test ran for 32.69s cpu, 32.72s wall
2025-02-22T13:29:36.9565765Z Test ran for 33.03s cpu, 33.57s wall
2025-02-22T13:35:25.1981149Z Test ran for 33.78s cpu, 33.79s wall
2025-02-22T13:27:25.8378732Z Test ran for 34.25s cpu, 36.98s wall
2025-02-22T13:57:58.6533147Z Test ran for 34.36s cpu, 34.74s wall
2025-02-22T13:54:03.6386977Z Test ran for 34.72s cpu, 35.63s wall
2025-02-22T13:41:49.1500562Z Test ran for 34.78s cpu, 33.49s wall
2025-02-22T13:47:03.5916907Z Test ran for 34.78s cpu, 34.79s wall
2025-02-22T13:57:11.1852268Z Test ran for 34.92s cpu, 34.93s wall
2025-02-22T13:52:42.6317100Z Test ran for 35.26s cpu, 35.28s wall
2025-02-22T13:57:07.2761191Z Test ran for 35.43s cpu, 35.48s wall
2025-02-22T13:47:03.6276858Z Test ran for 35.97s cpu, 35.99s wall
2025-02-22T13:33:59.8155323Z Test ran for 36.08s cpu, 36.53s wall
2025-02-22T13:33:59.8160794Z Test ran for 36.36s cpu, 36.80s wall
2025-02-22T13:58:24.2374105Z Test ran for 37.01s cpu, 37.01s wall
2025-02-22T13:38:20.1084890Z Test ran for 39.34s cpu, 32.77s wall
2025-02-22T13:32:30.2429610Z Test ran for 40.09s cpu, 40.54s wall
2025-02-22T13:30:13.0849221Z Test ran for 41.46s cpu, 41.95s wall
2025-02-22T13:56:24.1576362Z Test ran for 42.12s cpu, 36.86s wall
2025-02-22T13:33:00.8968907Z Test ran for 45.19s cpu, 45.95s wall
2025-02-22T13:47:03.4548029Z Test ran for 47.25s cpu, 47.26s wall
2025-02-22T13:37:55.8668785Z Test ran for 48.04s cpu, 48.94s wall
2025-02-22T13:30:12.6184455Z Test ran for 50.20s cpu, 49.13s wall
2025-02-22T13:34:54.3308336Z Test ran for 52.42s cpu, 53.20s wall
2025-02-22T14:01:16.4019104Z Test ran for 58.09s cpu, 58.15s wall
2025-02-22T13:32:14.9510569Z Test ran for 59.29s cpu, 59.71s wall
2025-02-22T13:33:40.1948935Z Test ran for 59.87s cpu, 60.52s wall
2025-02-22T13:31:46.0065634Z Test ran for 60.58s cpu, 61.24s wall
2025-02-22T13:44:58.3631500Z Test ran for 66.36s cpu, 66.39s wall
2025-02-22T13:44:06.1227057Z Test ran for 66.68s cpu, 67.55s wall
2025-02-22T13:37:06.6633748Z Test ran for 67.79s cpu, 67.80s wall
2025-02-22T13:34:25.2372204Z Test ran for 69.46s cpu, 70.60s wall
2025-02-22T13:36:15.2720051Z Test ran for 81.34s cpu, 82.21s wall
2025-02-22T13:34:50.9208405Z Test ran for 89.42s cpu, 90.39s wall
2025-02-22T13:28:56.5552520Z Test ran for 98.55s cpu, 100.16s wall
2025-02-22T13:37:05.8736350Z Test ran for 100.18s cpu, 100.22s wall
2025-02-22T13:45:23.2359017Z Test ran for 135.93s cpu, 136.09s wall
2025-02-22T13:39:47.2658831Z Test ran for 151.55s cpu, 151.65s wall
2025-02-22T13:39:09.2373108Z Test ran for 170.78s cpu, 170.91s wall
2025-02-22T13:36:21.6029463Z Test ran for 258.19s cpu, 260.96s wall
2025-02-22T13:54:24.6318133Z Test ran for 341.58s cpu, 341.77s wall
2025-02-22T13:46:55.1348524Z Test ran for 427.29s cpu, 427.51s wall
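For reference, a list like the one above can be extracted from a raw CI log with a short script along these lines (the helper name `slow_tests` and the sample log lines are illustrative, not part of Sage's tooling):

```python
import re

def slow_tests(log_lines, limit=30.0):
    """Collect (cpu, wall) timings from 'Test ran for ...' log lines
    that exceed the given CPU-time limit, sorted by CPU time."""
    pat = re.compile(r"Test ran for ([0-9.]+)s cpu, ([0-9.]+)s wall")
    found = []
    for line in log_lines:
        m = pat.search(line)
        if m:
            cpu, wall = float(m.group(1)), float(m.group(2))
            if cpu > limit:
                found.append((cpu, wall))
    return sorted(found)

# Illustrative sample, mimicking the log format above
log = [
    "2025-02-22T13:29:16Z Test ran for 30.22s cpu, 30.58s wall",
    "2025-02-22T13:46:55Z Test ran for 427.29s cpu, 427.51s wall",
    "2025-02-22T13:00:00Z Test ran for 1.05s cpu, 1.10s wall",
]
print(slow_tests(log))  # the 1.05s entry is below the limit and dropped
```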

Some of these might be performance regressions (note that https://doc.sagemath.org/html/en/developer/coding_basics.html#special-markup-to-influence-doctests requires any doctest to take less than 30 seconds); some are intentional, like

sage/groups/perm_gps/permgroup_named.py
3517:        ....:     for n in [1..64] for k in [1..numgps(n)])  # long time (180s)

I can see a few ways around it.

  • just bump the current limit from 30s to 500s (easiest to implement; if nobody complains, I'd open a pull request with this). (disadvantage: performance regressions get swept under the rug)
  • implement a feature where a test marked # long time (x s) has a limit of x seconds instead of 30. (disadvantage: as above, sort of)
  • temporarily replace all tests that take too long with # known bug (see :issue:`39569`). (disadvantage: their correctness is then no longer tested)
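The second option would amount to parsing the marker for its advertised duration. A minimal sketch of that parsing, assuming the marker always has the form # long time (Ns) (`warning_limit` is a hypothetical helper, not Sage's actual doctest parser):

```python
import re

DEFAULT_LIMIT = 30.0  # seconds, the documented ceiling for ordinary doctests

def warning_limit(source_line, default=DEFAULT_LIMIT):
    """Return the warning threshold for one doctest line: if the line
    carries a '# long time (Ns)' marker, use N seconds; otherwise use
    the default limit."""
    m = re.search(r"#\s*long time\s*\(\s*([0-9.]+)\s*s\s*\)", source_line)
    return float(m.group(1)) if m else default

print(warning_limit("....:     for n in [1..64])  # long time (180s)"))  # 180.0
print(warning_limit("sage: factor(2^64 + 1)"))                           # 30.0
```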

See also #38827 #35479

@DaveWitteMorris
Member

I think the second one (implement a new feature) is a good idea if it's feasible, but I would oppose both of the others, because they each have a serious downside.

@user202729 user202729 linked a pull request Mar 20, 2025 that will close this issue
@mantepse
Contributor

Would it be possible, and would it make sense, to specify the timeout value as a relative value? That is, normalize with respect to some specific doctest, one which is unlikely to change in performance.

@user202729
Contributor Author

Would it be possible, and would it make sense, to specify the timeout value as a relative value? That is, normalize with respect to some specific doctest, one which is unlikely to change in performance.

It might be possible, but it seems unnecessarily complicated. After all, the goal is just to have a mechanism to avoid too many false positives, and machines are probably only going to get faster, not slower.
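To illustrate what the relative-value idea from the previous comment might involve, here is a rough sketch that calibrates against a fixed workload rather than a specific doctest (the workload, the reference_seconds baseline, and the helper names are all assumptions, not an existing Sage facility):

```python
import time

def machine_speed_factor(reference_seconds=1.0):
    """Run a fixed, deterministic calibration workload and return how long
    it took relative to an assumed reference time on a baseline machine."""
    start = time.process_time()
    total = 0
    for i in range(10**6):  # fixed workload; the size is arbitrary
        total += i * i
    elapsed = time.process_time() - start
    return elapsed / reference_seconds

def scaled_limit(nominal_limit, factor):
    """Scale a nominal per-test limit by the machine speed factor,
    so slower machines get proportionally more headroom."""
    return nominal_limit * factor

factor = machine_speed_factor()
print(scaled_limit(30.0, factor))
```

The hard part, as noted below, is that good benchmarking is difficult: the calibration workload may not stress the same subsystems as the doctest being timed.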

On the other hand, something that does matter is what we are going to do in the long run. This is not exactly sweeping the problem under the rug, but it is true that at any point in time there aren't many people interested in the feature either. If someone is interested in something that is too slow, they can implement a speedup, and it is probably not too difficult to identify the relevant tests and remove their # long time markers.

@mantepse
Contributor

The reason I'm asking is that we currently don't have any performance regression tests, yet we do have performance regressions from time to time. Also, with some changes in the architecture (for example, using categories instead of inheritance), it is next to impossible (for me) to check what the change does to performance.

I would find it comforting to know that the time relative to some benchmark does not increase too much. But I have already been told that good benchmarking is very hard.

@user202729
Contributor Author

we currently don't have any performance regression tests, and we do have performance regressions from time to time

That's unfortunately true, but… out of scope…?

On the other hand, this change won't make the situation much worse; in fact, if we're not overwhelmed with piles of "slow doctest" false positives, it is probably easier to detect actual regressions.

@user202729
Contributor Author

Actually, there are some existing efforts that try to deal with that issue:

#32981

#33022
