-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the test runner #3646
Refactor the test runner #3646
Conversation
I'm puzzled by the NetBSD test failure log. It looks like |
There was an extended period a few years back in which We really should move all of the I have to look at the PR's code still, but I really like the ideas you've put forth. Will try it out on Windows momentarily and go over the code. One thing to be mindful of is that |
Kicked it off again and it's gotten past that point. |
|
||
header^ = { | ||
prev_offset = block.offset, | ||
prev_ptr = max(0, cast(int)(cast(uintptr)block.last_alloc - cast(uintptr)raw_data(block.buffer))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the uintptr
change above, this could be written as:
prev_ptr = uintptr(max(0,
int(uintptr(block.last_alloc)) - int(uintptr(raw_data(block.buffer))),
)),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should also be noted that just casting int
does not propagate to everything, so it might not produce the correct answer, especially on platforms where int
is larger than uintptr
.
I won't merge any of this until it is heavily tested on Windows too. |
This is the BEST PR I have ever seen. The amount of time you must have spent on the PR, let alone the code itself, is impressive and honourable! Thank you so much for all your amazing work! |
Super impressive work. Maybe a better place for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
I think you have the idea that the heap allocator/default context.allocator isn't thread safe, but it actually is. So I am not sure if the custom allocator was needed here, and there is some code that wraps the default heap allocator in a mutex allocator that isn't needed.
For the cbor tests, I don't really see the noodly global state usage, could you elaborate why it doesn't work with the new system? Is it referring to the globals buf
, stream
and encoder
? Those can be put anywhere, they were made global so they don't get allocated/initialised for every test, but that is not a big deal.
And a general note, the core:sync/chan
package is kinda experimental and has some obvious race conditions, so it might not be all that reliable until @gingerBill
proceeds with the package, I am once again advocating for an exp:
collection or just core:exp/
where we put all these experimental things, because there is no way of knowing it is experimental currently.
Ah. That explains it. I might be able to handle that after the dust settles from this PR.
If there's a public link where I can review the rewrite, I'd be glad to look into how I can accommodate it.
I don't expect any less.
Thank you for making and maintaining Odin. :)
I initially thought of
I should've delved deeper into the code for the default context. Oh well, better safe than sorry. It's easy enough to remove
Yes, it was those 3 variables that were being written to by all the threads simultaneously. I think I'd leave any further work on it
I was aware that it's experimental back when Bill let me know when I fixed a memory leak in it, but it's a truly useful concept to be able to send thread-safe data this way. The new test runner is putting it to the test, so to speak. I haven't investigated the package for race conditions, but anecdotally, I haven't experienced any issues. |
@Feoramund, I'm rewiring existing tests to use the new test runner and to remove my pseudo-runner One thing that surprised me is that the memory track option shows no memory usage for the image suite:
|
About the memory tracker, I feel like it should be both enabled by default, and silent unless there's a leak or bad free. (And as mentioned above, the options should probably be |
Curious thing. 3 tests, 2 mem track reports.
|
For |
I also have a proposal for an additional option. There are a few tests I've written that do random testing, and we print the seed to replay the test with the same data iff a failure occurs. My suggestion is to do what I did for @(private)
_RANDOM_SEED :: #config(RANDOM_SEED, u64(0))
// Exported to tests that want to use it, in case they want to reset the `Rand` instance to first process a bunch of random data, and then replay the same sequence without having to store it for verification later.
random_seed := u64(intrinsics.read_cycle_counter()) when _RANDOM_SEED == 0 else u64(_RANDOM_SEED) Only of course the test runner would grab 1 value and then supply the same random seed to each thread, not re-read the cycle counter in each thread, so maybe an |
Many thanks.
I'll have to look into this. Off the top of my head, the only legitimate way that can happen is if the test is using temp_allocator.
That sounds fine.
I'll also have to look into this.
I think that's a good idea.
That sounds pretty cool. I can work on that as well. |
Found it. For some reason I don't remember, each test created a
|
(Sorry, I accidentally edited your reply instead of replying to it. I've restored the original content.) |
I found the issue with the disappearing logs, regarding the memory tracker. The ANSI redraw string was eating an extra line. I got it fixed. |
I'm preparing to convert the options to |
Thanks! And glad to hear you found the disappearing mem log. I'm almost done with |
I'll circle back later and refactor As for the new runner's stability on Windows, I've yet to notice it stumble. :-) |
Very good to hear, and excellent work. |
Provides a helpful info message about the option to change how many threads are used per run.
This should tidy up the CI output logs a bit.
@Kelimion All tidy now. :) I merged the AES test and benchmark in during the respective conflict commits that came up and also took care of converting the usage of I'm puzzled by what happened previously. I encountered only the two merge conflicts, just as I did with my test run of a rebase. For documentation's purposes, all I did was Well, it seems resolved now, either way. |
Thanks! I've never seen anything like this before either. I'll be disabling the NetBSD test runner (not the CI, just anything under |
'k, I've reapplied the changes I made to |
I was encountering bounds-check error messages being overwritten during a test, if the test failed for another reason and sent a log message. The original intent of having this check inside of the above `if` block was that if a test sent an error message, then it was assumed an overwrite would be safe, but it's completely possible for a test to fail for a legitimate reason, then do an unrelated bounds check somewhere else that would be buried under the animation. This change will make sure that, no matter what, the progress display will not trigger a clear if a signal was raised. There's still no guarantee that bounds-check messages will be printed properly, and it's best to redirect STDERR. The only way that can be fixed is if they get a similar hook to `context.assertion_failure_proc`.
Tested on FreeBSD 14.0 and NetBSD 10.0 OpenBSD is untested, but link names were sourced from: https://github.com/openbsd/src/blob/master/include/stdio.h According to this, OpenBSD shares the same layout as NetBSD. FreeBSD has the same as Darwin in this regard.
… solved." This reverts commit 21a1ddf.
@Kelimion All tests are solid now. I downloaded NetBSD 10.0 and setup an AMD64 VM for it. The issue was simple to see once I sat down and had a go at it. The |
The test runner was deadlocking when a test raised a signal on FreeBSD. This is untested on OpenBSD, but I have referenced this file: https://github.com/openbsd/src/blob/master/include/pthread.h
Add `pthread_testcancel` to `core:sys/unix`
Fixed the signal handler implementation on FreeBSD, NetBSD, and possibly OpenBSD (untested). They were halting any time a task raised a signal. This required fixing All UNIX-likes will now check |
Works well with `-define:ODIN_TEST_LOG_LEVEL=warning`.
Merged. Much thanks. |
Refactor the Test Runner (Multi-Threaded Edition)
Alternate PR Title: The Joy of Running Tests
"A picture's worth a thousand words," so the saying goes.
Pictures
You're all familiar with what our tests look like.
What if they could be like this?
Do you enjoy logs?
If you like logs, I bet you like memory tracking too.
Did you notice that part about selecting tests? What if they went directly to your clipboard?
Now bring it all together.
Feature Summary
-define:test_threads=N
.testing.set_fail_timeout
now works on all platforms that supportcore:thread
.testing.log
. Messages are sent toSTDERR
for ease of redirection.-define:test_track_memory=true
.-define:test_thread_memory=B
.-define:test_fancy=false
if you really want to.-define:test_select=package.name_of_test,package.name_of_another_test,...
-define:test_clipboard=true
.CTRL-C
gracefully cancels the test runner early on systems withlibc.signal
andlibc.SIGINT
.Notes
I realized after a couple messages with @laytan in #3538 that the test runner should be multi-threaded, so here we are.
This is the culmination of a couple weeks worth of work and some general notions brewing for months about how the test runner could be (and look) better.
The overall goal was to make things as pleasant as possible.
Windows Support
I'm putting the bad news here.
I had to disable all of the Windows-specific functionality in
runner_windows.odin
, because I do not have a Windows machine to test any of that on.I know part of the Windows build handles
set_fail_timeout
in a Windows-specific way, but I see that some of it is dealing with Windows exceptions. My hope is someone proficient in the Win32 API can suggest how I can accommodate that, or if it's not too difficult, submit a patch afterwards to restore it in congruence with the new thread pool-based runner.Regardless,
set_fail_timeout
should still work on Windows, since it's usingcore:thread
now. So there is that.Technical / Implementation Details
Getting this working right required some extra steps outlined below.
Additions to
core:thread
The
testing.set_fail_timeout
API required additions to the thread pool in order to stop individual tasks, as well as shutdown an entire pool.Modifications to
core:log
I moved a couple things around so custom loggers can do their own timestamping.
Fixed one small bug, too. Details are in the commit messages.
Modifications to
Tracking_Allocator
I added
mem.tracking_allocator_reset
to zero out all statistics.Layout of
T
T
no longer has a writer assigned to it. Any communication with the main thread is done through async.Channel
now.T
s are also now thread-specific, instead of being global to the test runner.ANSI Escape Codes (new package)
I've included a package full of ANSI escape codes to help support this new test runner. It's all constant references, no procs, since I found it easy enough to string together sequences like
ansi.CSI + ansi.FG_CYAN + ansi.SGR
.I've taken steps throughout this refactor to batch writes and streams of ANSI color codes where possible in order to keep the animation smooth and the flicker low.
There's no checking if
STDERR
is or is not a TTY though, so if you redirectSTDERR
to a file, it'll have the color codes in it. This already happens if you redirect a console-basedFile_Console_Logger
, but hopefully it's not a major issue.Even though it's not technically an encoding in the sense of ciphering or marshalling, I placed it in
core:encoding/ansi
, because I couldn't think of a better place. If you can, let me know, and I can move it.Logging
The test runner and its subordinate threads all have a proper logger context setup now. The new way to do things with this runner is to call the
core:log
suite of log procedures, instead of relying ontesting.log*
, and I've marked those procs as deprecated.Every log message is timestamped, formatted, and sent through a thread-specific
sync.Channel
to the main thread. The main thread then sorts the messages by the timestamp sent along with it to ensure precise order. These messages are then batch written toSTDERR
to support log redirection, whileSTDOUT
displays the fancy widgets.If a test - or any code it calls - raises a log error message, the entire test fails. This might be contentious or unexpected, so I'm noting it here.
Cleanups
Cleanup callbacks now record the context they were called in, to keep things sane as far as needing to call them from the main thread in the event of a timeout failure.
They're not called if the user sends an interrupt signal though.
I haven't looked into the internals of
defer
, but my experience with switching contexts around deferred procs tells me that this is similar to how they work - with recorded contexts.Note that the test thread's allocator is the one to allocate on the cleanups array, so that will impact memory usage statistics if you happen to poke around there.
Allocation Strategy
Most calls that allocate memory in the test runner will check for errors where possible and assert if there was an issue. This might be pedantic and more than what most people are used to reading, but I figured it suited a test runner to not let anything silently slip by if possible.
As many runner-specific allocations are done upfront if possible before the main loop starts.
There is also a new custom memory allocator for the subordinate threads.
New Memory Allocator
Why a new memory allocator? What's wrong with the ones we have?
The new test runner has these requirements:
I surveyed the allocators we currently have to see how they meet the requirements.
If memory tracking and BSD wasn't a concern, we could use a Growing arena from
core:mem/virtual
with no issue.However, I wanted to keep this refactor as system-independent as possible, while being able to check for memory leaks (not necessarily in test code itself, since we have a good grasp of that lifetime and can just
free_all
when a test is done, but in the code that is being tested).I read through the memory allocation strategies series and decided to design a stack-based allocator with these goals:
Because I wanted the allocator to respect the API behind the memory tracker and not just return successful on any pointer that was in bounds, I came up with the idea to have a stack that can support out-of-order frees.
It does so by marking headers as free and collapsing when a run of allocations has been found to be free, backwards from the last allocation. This was the simplest implementation I could make that fit these goals. I've left some notes in the
rollback_stack_allocator.odin
file about time complexity and implementation details.It's still possible to fragment memory by freeing everything but the last allocation, but such is the tradeoff of the design. It's assumed things are allocated in order of long-lived to short-lived. This design just gives the user a bit of wiggle room in when they deallocate.
Why not just use the heap wrapped in a mutexed proxy?
It's a lot simpler, but to my current understanding, it's faster to have isolated blocks of memory for each thread that don't have to be put behind a mutex (false sharing notwithstanding), and I aimed to keep the test runner from being designed in a way that hampered the whole process, in so much as I know how.
Of course, the custom allocators still use a mutexed proxy to the default allocator internally for their initial and expanding allocations, but they allocate blocks of 4 megabytes at a time.
Big Allocations
Note that these allocators can allocate any block size that is requested in order to fulfill the first requirement of expandable memory. Anything over the default size of 4 megabytes will be allocated in full and freed as one block.
Conclusion
All that said, I'm not emotionally attached to this allocator, so if this wasn't the right way to do it, I'm open to suggestions. I had fun making it.
Terminal Escape Code Support
This has been tested on Linux only so far. Virtually every terminal emulator I tried supported colors, disabling line wrap, and setting the window title. Half of them do not support setting the clipboard though, so keep that in mind if you try to use
test_clipboard
that it may not be supported.Here's a list of terminals tested.
Final Thoughts
I hope this makes running tests more enjoyable for everyone. I'm looking forward to feedback.
I've reviewed everything from top to bottom three times so far, but this is a large change, so understandably I may have missed something. As always, open to suggestions and reviews.
I saw that the Odin core tests and et cetera all have their own pseudo-implementaton of
core:testing
. I decided for this PR to do the bare minimum to get tests to pass again through the CI, which didn't require much more than removing references toT.w
.I'm assuming there's a good reason why this is done the way it is, but my hope is maybe the CI can take advantage of the new multi-threaded test runner one day. Hopefully it isn't too complicated.
The most noodly thing I saw as far as that goes was all the global state usage in the
test_core_cbor
package.(Yes, I initially only wanted
set_fail_timeout
to work on Linux, and it snowballed into this.)