
What to do with warning: slow doctest … Test ran for [very long time]? #39569

Open
user202729 opened this issue Feb 22, 2025 · 6 comments · May be fixed by #39746

Comments

@user202729
Contributor

user202729 commented Feb 22, 2025

Currently there are 46 instances of this warning in every pull request, which makes the GitHub annotation highlights annoying to look at.

One example long test run, sorted by CPU time:
2025-02-22T13:29:16.0611644Z Test ran for 30.22s cpu, 30.58s wall
2025-02-22T13:56:25.4496259Z Test ran for 31.11s cpu, 31.48s wall
2025-02-22T13:28:23.1625804Z Test ran for 31.73s cpu, 32.16s wall
2025-02-22T13:57:32.9606147Z Test ran for 32.38s cpu, 32.99s wall
2025-02-22T13:53:15.8347722Z Test ran for 32.69s cpu, 32.72s wall
2025-02-22T13:29:36.9565765Z Test ran for 33.03s cpu, 33.57s wall
2025-02-22T13:35:25.1981149Z Test ran for 33.78s cpu, 33.79s wall
2025-02-22T13:27:25.8378732Z Test ran for 34.25s cpu, 36.98s wall
2025-02-22T13:57:58.6533147Z Test ran for 34.36s cpu, 34.74s wall
2025-02-22T13:54:03.6386977Z Test ran for 34.72s cpu, 35.63s wall
2025-02-22T13:41:49.1500562Z Test ran for 34.78s cpu, 33.49s wall
2025-02-22T13:47:03.5916907Z Test ran for 34.78s cpu, 34.79s wall
2025-02-22T13:57:11.1852268Z Test ran for 34.92s cpu, 34.93s wall
2025-02-22T13:52:42.6317100Z Test ran for 35.26s cpu, 35.28s wall
2025-02-22T13:57:07.2761191Z Test ran for 35.43s cpu, 35.48s wall
2025-02-22T13:47:03.6276858Z Test ran for 35.97s cpu, 35.99s wall
2025-02-22T13:33:59.8155323Z Test ran for 36.08s cpu, 36.53s wall
2025-02-22T13:33:59.8160794Z Test ran for 36.36s cpu, 36.80s wall
2025-02-22T13:58:24.2374105Z Test ran for 37.01s cpu, 37.01s wall
2025-02-22T13:38:20.1084890Z Test ran for 39.34s cpu, 32.77s wall
2025-02-22T13:32:30.2429610Z Test ran for 40.09s cpu, 40.54s wall
2025-02-22T13:30:13.0849221Z Test ran for 41.46s cpu, 41.95s wall
2025-02-22T13:56:24.1576362Z Test ran for 42.12s cpu, 36.86s wall
2025-02-22T13:33:00.8968907Z Test ran for 45.19s cpu, 45.95s wall
2025-02-22T13:47:03.4548029Z Test ran for 47.25s cpu, 47.26s wall
2025-02-22T13:37:55.8668785Z Test ran for 48.04s cpu, 48.94s wall
2025-02-22T13:30:12.6184455Z Test ran for 50.20s cpu, 49.13s wall
2025-02-22T13:34:54.3308336Z Test ran for 52.42s cpu, 53.20s wall
2025-02-22T14:01:16.4019104Z Test ran for 58.09s cpu, 58.15s wall
2025-02-22T13:32:14.9510569Z Test ran for 59.29s cpu, 59.71s wall
2025-02-22T13:33:40.1948935Z Test ran for 59.87s cpu, 60.52s wall
2025-02-22T13:31:46.0065634Z Test ran for 60.58s cpu, 61.24s wall
2025-02-22T13:44:58.3631500Z Test ran for 66.36s cpu, 66.39s wall
2025-02-22T13:44:06.1227057Z Test ran for 66.68s cpu, 67.55s wall
2025-02-22T13:37:06.6633748Z Test ran for 67.79s cpu, 67.80s wall
2025-02-22T13:34:25.2372204Z Test ran for 69.46s cpu, 70.60s wall
2025-02-22T13:36:15.2720051Z Test ran for 81.34s cpu, 82.21s wall
2025-02-22T13:34:50.9208405Z Test ran for 89.42s cpu, 90.39s wall
2025-02-22T13:28:56.5552520Z Test ran for 98.55s cpu, 100.16s wall
2025-02-22T13:37:05.8736350Z Test ran for 100.18s cpu, 100.22s wall
2025-02-22T13:45:23.2359017Z Test ran for 135.93s cpu, 136.09s wall
2025-02-22T13:39:47.2658831Z Test ran for 151.55s cpu, 151.65s wall
2025-02-22T13:39:09.2373108Z Test ran for 170.78s cpu, 170.91s wall
2025-02-22T13:36:21.6029463Z Test ran for 258.19s cpu, 260.96s wall
2025-02-22T13:54:24.6318133Z Test ran for 341.58s cpu, 341.77s wall
2025-02-22T13:46:55.1348524Z Test ran for 427.29s cpu, 427.51s wall
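For reference, a list like the one above can be extracted from a raw CI log with a short script along these lines (the helper name `slow_tests` and the sample log lines are illustrative, not part of Sage's tooling):

```python
import re

def slow_tests(log_lines, limit=30.0):
    """Collect (cpu, wall) timings from 'Test ran for ...' log lines
    that exceed the given CPU-time limit, sorted by CPU time."""
    pat = re.compile(r"Test ran for ([0-9.]+)s cpu, ([0-9.]+)s wall")
    found = []
    for line in log_lines:
        m = pat.search(line)
        if m:
            cpu, wall = float(m.group(1)), float(m.group(2))
            if cpu > limit:
                found.append((cpu, wall))
    return sorted(found)

# Illustrative sample, mimicking the log format above
log = [
    "2025-02-22T13:29:16Z Test ran for 30.22s cpu, 30.58s wall",
    "2025-02-22T13:46:55Z Test ran for 427.29s cpu, 427.51s wall",
    "2025-02-22T13:00:00Z Test ran for 1.05s cpu, 1.10s wall",
]
print(slow_tests(log))  # the 1.05s entry is below the limit and dropped
```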

Some of these might be performance regressions (note that https://doc.sagemath.org/html/en/developer/coding_basics.html#special-markup-to-influence-doctests requires any doctest to take less than 30 seconds); some are intentional, like

sage/groups/perm_gps/permgroup_named.py
3517:        ....:     for n in [1..64] for k in [1..numgps(n)])  # long time (180s)

I can see a few ways around it.

  • just bump the current limit from 30s to 500s (easiest to implement; if nobody complains, I'd open a pull request with this). (disadvantage: performance regressions get swept under the rug)
  • implement a feature where a test marked # long time (x s) has a limit of x seconds instead of 30. (disadvantage: as above, sort of)
  • temporarily replace all tests that take too long with # known bug (see :issue:`39569`). (disadvantage: their correctness is then no longer tested)
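The second option would amount to parsing the marker for its advertised duration. A minimal sketch of that parsing, assuming the marker always has the form # long time (Ns) (`warning_limit` is a hypothetical helper, not Sage's actual doctest parser):

```python
import re

DEFAULT_LIMIT = 30.0  # seconds, the documented ceiling for ordinary doctests

def warning_limit(source_line, default=DEFAULT_LIMIT):
    """Return the warning threshold for one doctest line: if the line
    carries a '# long time (Ns)' marker, use N seconds; otherwise use
    the default limit."""
    m = re.search(r"#\s*long time\s*\(\s*([0-9.]+)\s*s\s*\)", source_line)
    return float(m.group(1)) if m else default

print(warning_limit("....:     for n in [1..64])  # long time (180s)"))  # 180.0
print(warning_limit("sage: factor(2^64 + 1)"))                           # 30.0
```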

See also #38827 #35479

@DaveWitteMorris
Member

I think the second one (implement a new feature) is a good idea if it's feasible, but I would oppose both of the others, because they each have a serious downside.

@user202729 user202729 linked a pull request Mar 20, 2025 that will close this issue
@mantepse
Contributor

Would it be possible, and would it make sense, to specify the timeout value as a relative value? That is, normalize with respect to some specific doctest, one which is unlikely to change in performance.

@user202729
Contributor Author

Would it be possible, and would it make sense, to specify the timeout value as a relative value? That is, normalize with respect to some specific doctest, one which is unlikely to change in performance.

It might be possible, but it seems unnecessarily complicated. After all, the goal is just to have a mechanism to avoid too many false positives, and machines are probably only going to get faster, not slower.
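To illustrate what the relative-value idea from the previous comment might involve, here is a rough sketch that calibrates against a fixed workload rather than a specific doctest (the workload, the reference_seconds baseline, and the helper names are all assumptions, not an existing Sage facility):

```python
import time

def machine_speed_factor(reference_seconds=1.0):
    """Run a fixed, deterministic calibration workload and return how long
    it took relative to an assumed reference time on a baseline machine."""
    start = time.process_time()
    total = 0
    for i in range(10**6):  # fixed workload; the size is arbitrary
        total += i * i
    elapsed = time.process_time() - start
    return elapsed / reference_seconds

def scaled_limit(nominal_limit, factor):
    """Scale a nominal per-test limit by the machine speed factor,
    so slower machines get proportionally more headroom."""
    return nominal_limit * factor

factor = machine_speed_factor()
print(scaled_limit(30.0, factor))
```

The hard part, as noted below, is that good benchmarking is difficult: the calibration workload may not stress the same subsystems as the doctest being timed.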

On the other hand, something that does matter is what we are going to do in the long run. This is not exactly sweeping the problem under the rug, but it is true that at any point in time there aren't many people interested in the feature either. If someone is interested in something that is too slow, they can implement a speedup, and it is probably not too difficult to identify the relevant tests and remove their # long time markers.

@mantepse
Contributor

The reason I'm asking is that we currently don't have any performance regression tests, yet we do have performance regressions from time to time. Also, with some changes in the architecture (for example, using categories instead of inheritance), it is next to impossible (for me) to check what the change does to performance.

I would find it comforting to know that the time relative to some benchmark does not increase too much. But I have already been told that good benchmarking is very hard.

@user202729
Contributor Author

we currently don't have any performance regression tests, and we do have performance regressions from time to time

That's unfortunately true, but… out of scope…?

On the other hand, this change won't make the situation much worse; in fact, if we're not overwhelmed with piles of "slow doctest" false positives, it is probably easier to detect actual regressions.

@user202729
Contributor Author

Actually, there are some existing efforts that try to deal with that issue:

#32981

#33022
