-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make CI tests faster (2.4X faster on local OSX), and make gcleak2 leak detection more accurate #13480
Conversation
a44da13
to
a7b294c
Compare
074c549
to
753974d
Compare
Ok, I'll be @krux02 this time: can you give us a straight-forward non-clickbait non-convoluted realistic numbers for our CIs? (We don't care about your local machine numbers) |
in absolute terms, on azure CI machines, it saves 120s per machine; it's the slowest test
unless other people running latest OSX (10.15) have same issue of |
But now the test is much harder to understand. Why not simply change the number 1_000_000 to 10_000 instead and call it day? It used to be green with one million for months if not for years, no need to spend so much effort on it. |
First of all, thank you for tackling this problem. This is really an important problem to solve. Slow CI is something that affects everybody and faster CI feedback is oil in the Nim development machinery. So for me a PR tacking this problem is high priority. But what I don't like is, might be subjective but I like minimalism when it comes to the introduction of An idea that would be this: var tester: (when isDeterministic: DeterministicTester else: HeuristicTester)
type
HeuristicTester = object
numCycle
memPrevious: int
memMin: int
numCollections: int
[...]
proc measure(tester: HeuristicTester; mem: int) = [...]
proc measure(tester: DeterministicTester; mem: int) = [...]
[...]
tester.measure(mem) # instead of when isDeterministinc
|
I don't think this extra complexity is worth it, just to save 2 minutes. |
not sure about the complexity, but 2 minutes seems as a good win? |
...which might be achieved by changing one number in one line, as shown above, without the extra complexity. |
753974d
to
b709020
Compare
PTAL
noteyes the original changes were more complex, but they were also more precise, checking for loop/cycle invariants as opposed to current situation where we're just checking for a very coarse 1-size-fits-all upperbound on used memory (across OS/CPU/optimization level; and especially across GC algorithms despite them being widely different)
Which should find even the smallest leak should that occur in that code.
the problem isnt' that it's green; the problem is that it can be too loose to detect some leaks. Leaks do happen eg: these are 3 different leak issues
|
before PR
there's a single test that takes a whopping 335 seconds (on my local OSX; on other machines may be different), representing > 57% of CI tests time (585 s) (ie, all of
testament/testament all
; this excludes other things in./koch runCI
).on local OSX:
after PR
these:
./bin/nim c -r -d:release --gc:boehm tests/gc/gcleak2.nim
./bin/nim c -r --gc:boehm tests/gc/gcleak2.nim
now both run in 1 sec, by simply reducing number of iterations in this case; the test is in fact more useful than before despite smaller numIter, as we check that the occupied memory doesn't vary after few iterations; so running more iterations would make no difference.
other deterministic gc tested (eg: --gc:orc, --gc:arc) also are tested like --gc:boehm (ie fixed point at a few iterations)
for the other gc tested (ie
./bin/nim c -r tests/gc/gcleak2.nim
+ other gc) , we also modify the test to make it more useful, by checking that occupied memory follows a set of cycles, each with identical min/max values after a few cycles. The test before this PR was more rough, and would not have detected a small leak accumulating since it just had a coarse upperbound on memory use.and overall,
testament/testament all
now runs in 242s, a 2.4X speedup (on my local OSX)Note
on azure CI machines, the overall speedup is less significant compared to the speedup observed on my local OSX machine (OSX catalina 10.15, 16GB RAM 2.3 GHz 8-Core Intel Core i9); for some reason
./bin/nim c -r --gc:boehm tests/gc/gcleak2.nim
was 165s (before this PR) on my local machine but only 65s on azure OSX machines (note: azure uses 10.14 not 10.15, could be a factor) eg:https://dev.azure.com/nim-lang/255dfe86-e590-40bb-a8a2-3c0295ebdeb1/_apis/build/builds/2835/logs/60
still, this PR saves 120s on each such azure machine, and it saves 335s (a 2.4X overall speedup) on a local OSX machine, and makes the tests more accurate.