Local dispatcher improvements (fixes #17)
Commits on Nov 28, 2023
GafferDispatchUI : Add LocalJobs editor
This is basically the window that the "Execute/View Local Jobs" menu item used to show, but refactored into a dockable editor panel. It still leaves a lot to be desired.
Commit 4468a44

Commit 150b445

Commit 15ff13c
LocalDispatcher : Share foreground/background dispatch code
This removes a fair bit of duplicate code. There were quite a few weird little differences in the reporting logic in particular, which I could see no reason for, so they are gone. The two distinctions between foreground and background dispatches that remain are :

- Background execution runs tasks in a subprocess, and polls them while checking for cancellation.
- Background execution suppresses exceptions (while still reporting them via the job log), while foreground execution propagates them.
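The error-handling distinction above can be sketched as follows. This is a hedged, minimal illustration with invented names (`executeBatches` and its parameters are not the Gaffer API), showing the one branch that differs between the two modes:

```python
# Sketch only : background dispatch suppresses exceptions while still
# recording them in the job log, foreground dispatch propagates them.
def executeBatches( batches, log, background ) :

	for batch in batches :
		try :
			batch()
		except Exception as e :
			log.append( repr( e ) )  # always reported via the job log
			if not background :
				raise                # foreground execution propagates
			return False             # background execution suppresses

	return True
```

Everything else (iteration, logging) is shared, which is the point of the refactor.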
Commit b8f4083
LocalDispatcher : Remove `JobPool.jobFailedSignal()`
- This appears to be completely unused.
- A `jobStatusChangedSignal()` would be more generally useful.
- But signals about status changing would make more sense on the Jobs themselves. We'll deal with this in a later commit.
Commit 620a884
LocalDispatcher : Simplify status tracking
We were storing a status per batch, but in fact the only status we ever reported to the outside world (via `failed()` and `killed()`) was the status of the root batch. And to get the root batch into a failed state we were going via a convoluted route whereby `__reportFailed()` would mark the current batch as failed, and call `JobPool._failed()`, which would in turn call back to `Job._fail()` to mark the root batch as failed.

It is much easier to manage a single `self.__status` directly, and limit the per-batch data to a simple flag that tells us whether we've executed it yet. At the same time, we explicitly store the current batch instead of having to search for it every time.

Note that we're assuming that attribute assignment (for `self.__currentBatch`) is atomic (it is for CPython) rather than using a mutex to synchronize access between the background and UI threads.
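A minimal sketch of that bookkeeping, with illustrative names rather than the actual Gaffer implementation: one status on the job, per-batch data reduced to an "executed" flag, and the current batch stored directly. CPython guarantees that a plain attribute assignment is atomic, so a UI thread can read the current batch while the worker writes it, without a lock:

```python
import enum

class Status( enum.Enum ) :
	Running = 0
	Complete = 1
	Failed = 2
	Killed = 3

class Job :

	def __init__( self ) :
		# Single job-level status, rather than a status per batch.
		self.__status = Status.Running
		# Written by the worker thread, read by the UI thread. Attribute
		# assignment is atomic in CPython, so no mutex is needed.
		self.__currentBatch = None

	def status( self ) :
		return self.__status

	def _execute( self, batches ) :
		for batch in batches :
			# Per-batch data is just a "have we executed this?" flag.
			if getattr( batch, "executed", False ) :
				continue
			self.__currentBatch = batch
			try :
				batch()
				batch.executed = True
			except Exception :
				self.__status = Status.Failed
				return
		self.__status = Status.Complete
```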
Commit d335129
LocalDispatcher : Remove `JobPool.failedJobs()`
Instead, just keep failed jobs in the main jobs list. This prevents them from jumping around in the LocalJobs UI, and simplifies the API too.
Commit 02b77b7

Commit 8933c0c
LocalDispatcher : Don't derive JobPool from RunTimeTyped
There was no benefit in that.
Commit 61f02d0
Graphics : Add completed status icon for dispatcher jobs
Also move icons into their own section, and spruce up the other icons a little bit.
Commit c0490b3
LocalDispatcher : Keep all jobs in pool until explicitly removed
Otherwise you get no opportunity to see the job logs in the LocalJobs panel, and for quick jobs it's not even clear that the job happened.

Also add a `Job.status()` method, since there was no public way of determining running/complete status before (it was assumed that anything in the list that wasn't killed or failed was running). Remove `Job.killed()` and `Job.failed()` since they are superseded by `Job.status()`.
Commit 3a1c2f4

Commit fb77f12
LocalDispatcher : Replace `threading.Thread()` with `BackgroundTask`
This gives us a more natural cancellation mechanism, and means we're using a standard TBB worker thread and therefore not creating unnecessary thread locals.

Since cancellation operates via exceptions, it's more natural to also allow error exceptions to propagate out of `__executeWalk()` rather than using a `False` return value. This also means that all management of `self.__status` can be consolidated into one place in `__executeInternal()`.
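The exception-based structure can be sketched like this. All names here are hypothetical stand-ins (a plain `Cancelled` exception and `Canceller` flag instead of Gaffer's actual cancellation types), but the shape matches the commit's description: the walk raises on error or cancellation, and all status management lives in one try/except in the caller:

```python
class Cancelled( Exception ) :
	pass

class Canceller :
	def __init__( self ) :
		self.cancelled = False

def executeWalk( batches, canceller ) :
	for batch in batches :
		# Cancellation surfaces as an exception, not a False return value.
		if canceller.cancelled :
			raise Cancelled()
		batch()  # errors also propagate as exceptions

def executeInternal( batches, canceller, setStatus ) :
	# The single place where status is managed.
	try :
		executeWalk( batches, canceller )
		setStatus( "Complete" )
	except Cancelled :
		setStatus( "Killed" )
	except Exception :
		setStatus( "Failed" )
```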
Commit 1403fa9
LocalDispatcher : Protect `Job.execute()` method

This should not have been part of the public API - it only exists so that `doDispatch()` can initiate execution _after_ adding the job to the pool. Nobody needs to call it again after that, and definitely not client code.
Commit dfec9a5
LocalDispatcher : Test JobPool signals
And make `removeJob()` public, since it is called by the UI, and add a public `addJob()` method for symmetry.
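A hedged sketch of the symmetric pool API described above, with a plain-Python callable list standing in for Gaffer's signal type (none of these names are guaranteed to match the real implementation):

```python
class Signal :
	"""Minimal stand-in for a signal : slots are plain callables."""

	def __init__( self ) :
		self.__listeners = []

	def connect( self, slot ) :
		self.__listeners.append( slot )

	def __call__( self, *args ) :
		for slot in self.__listeners :
			slot( *args )

class JobPool :

	def __init__( self ) :
		self.__jobs = []
		self.jobAddedSignal = Signal()
		self.jobRemovedSignal = Signal()

	def jobs( self ) :
		return list( self.__jobs )

	def addJob( self, job ) :
		self.__jobs.append( job )
		self.jobAddedSignal( job )

	def removeJob( self, job ) :
		self.__jobs.remove( job )
		self.jobRemovedSignal( job )
```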
Commit 185da44

Commit 69bac0c

Commit b3d7557

Commit 087b59d
LocalJobs : Swap Log and Details tabs round
The Log tab is more useful.
Commit c717e6f
LocalJobs : Remove redundant update calls
We'll be updating using our signal handlers anyway.
Commit 6e2beb2
LocalJobs : Update immediately on job status changes
And refactor the other updates to make it clearer what is being updated and why.
Commit 2fbf877

Commit d5d77f2
LocalJobs : Update messages widget when messages change
Before, they were only updated when selecting a job, so you got no streaming feedback on what a job was doing.
Commit 02f854d
LocalDispatcher : Improve messaging
- Prefix batch messages with the node name, not the job name.
- Include execution time in the batch completion message.
- Output stack traces line-by-line to work better with `MessagesWidget.Role.Log`.
- Add a debug message containing subprocess command lines.
- Omit batch messages for the root batch.
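The line-by-line stack trace idea can be sketched like this. The `emit` parameter is a hypothetical stand-in for whatever message function the dispatcher actually uses; only the splitting behaviour is the point:

```python
import traceback

def reportException( exc, emit ) :
	# format_exception() returns chunks that may themselves contain
	# newlines, so split again to guarantee one message per line.
	for chunk in traceback.format_exception( type( exc ), exc, exc.__traceback__ ) :
		for line in chunk.rstrip().split( "\n" ) :
			emit( line )
```

Emitting one message per line means each line becomes its own row in the log widget, instead of one multi-line blob.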
Commit d0afbcf
LocalDispatcher : Rename `__batch` to `__rootBatch`
That's what it is.
Commit 2c828b4

Commit 696e0e4
LocalDispatcher/LocalJobs : Add more properties, remove current batch
The LocalJobs Properties tab now contains only static properties of the job, so it is no longer a bug that we're not updating it dynamically when the current batch changes.
Commit 5269a1c
- Move just below the job listing, since that's what they operate on.
- Remove the draggable splitter handle separating the buttons from the widget above.
- Reduce size.
- Update appropriately when job status changes.
Commit 5cc6903
LocalDispatcher : Kill outstanding background jobs at Gaffer shutdown
Without this, the BackgroundTask and `outputHandler` threads would continue running while Python was torn down around them on the main thread. This led to crashes when trying to release the GIL from the BackgroundTask.

The Changes.md entry doesn't refer to crashes, because the behaviour in 1.3 was different, due to us using a Python thread instead of a BackgroundTask. In 1.3 Python would wait patiently for the Job thread to finish, even if that meant waiting a very long time. _Unless_ the user hit `Ctrl+C` again, in which case it would forcibly quit, leaving behind zombie task processes.

I was in two minds as to whether to apply the exit handler to all JobPools or just the default one. Since we don't have any non-default pools outside of the unit tests, I don't have any information to base a decision on, so went with only the default pool for now.
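A minimal sketch of this shutdown behaviour using Python's standard `atexit` module (the actual hook Gaffer uses may differ, and all names here are illustrative): when the interpreter exits, any job still running in the default pool is killed so its worker threads can't outlive the Python runtime:

```python
import atexit

class JobPool :

	def __init__( self ) :
		self.__jobs = []

	def addJob( self, job ) :
		self.__jobs.append( job )

	def _killOutstanding( self ) :
		# Kill anything still running so worker threads don't race
		# interpreter teardown.
		for job in self.__jobs :
			if job.running :
				job.kill()

_defaultPool = JobPool()
# Per the commit message, only the default pool gets the exit handler.
atexit.register( _defaultPool._killOutstanding )
```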
Commit 4c17093
BackgroundTask : Fix crash caused by Views with no `in` plug

Strictly speaking, no such View should ever exist, because the `in` plug is added in the View constructor. But `ImageView::insertConverter()` currently replaces it with another plug, meaning that it is missing when the replacement is added, which is when `cancelAffectedTasks()` is called. This was causing a crash in `GafferImageUITest.ImageViewTest.testDeriving`.

The problem had gone unnoticed until now because previously there were no BackgroundTasks to cancel, so we never got as far as the bad check. But now that the LocalDispatcher uses a BackgroundTask, some completed tasks lie dormant waiting for garbage collection after LocalDispatcherTest has run, and we hit the null dereference.
Commit 0c1412c
LocalDispatcher : Use WeakMethod with `callOnUIThread()`

This is probably neither here nor there in actual usage in the UI, but it is important for running the unit tests when the UI module has been imported. That's because `GafferUI.EventLoop` unconditionally installs a UIThreadCallHandler that just buffers the calls until an event loop is started. That prolonged Job lifetimes way beyond their associated tests, eventually causing this spooky action at a distance :

```
Traceback (most recent call last):
  File "/__w/gaffer/gaffer/build/python/GafferTest/BoxTest.py", line 1138, in testComputeNodeCastDoesntRequirePython
    node = CastChecker()
  File "/__w/gaffer/gaffer/build/python/GafferTest/BoxTest.py", line 1129, in __init__
    self["out"] = Gaffer.IntPlug( direction = Gaffer.Plug.Direction.Out )
IECore.Exception: Traceback (most recent call last):
  File "/__w/gaffer/gaffer/build/python/GafferTest/BoxTest.py", line 1133, in isInstanceOf
    raise Exception( "Cast to ComputeNode should not require Python" )
Exception: Cast to ComputeNode should not require Python
```

How did that happen? Well, a Job contains a BackgroundTask, and `self["out"]` is a graph edit, and graph edits cancel background tasks, and the cancellation code does a `runTimeCast()` on the subject of the edit. So if a BackgroundTask exists thanks to EventLoop, then the cancellation code runs and does a `runTimeCast()` on `CastChecker`, which throws.

It really would be better if EventLoop scoped its UIThreadCallHandler between `start()` and `stop()`.
Commit 26f3d7f
TestCase : Stop wrapped `RefCounted` classes outliving their tests

We were already doing this in a few specific tests, but are now doing it for all. Our intention is still that our Python code should never make circular references between Python objects (and we have checks for that in `GafferUITest.TestCase.tearDown()`, among other places), so we're not calling `gc.collect()`. But we can't do anything about the unfortunate circular references created by RefCountedBinding itself, and it's preferable to have those broken predictably at the end of the test rather than unpredictably when a later test runs.

The immediate motivation for doing this is to destroy the `LocalDispatcher` instances and their associated `JobPool` in `LocalDispatcherTest` before we run any other tests. If we don't, then the completed BackgroundTasks in the jobs cause a failure in `BoxTest.testComputeNodeCastDoesntRequirePython`. See the previous commit for more details.
Commit 094f416
ArnoldTextureBake : Don't use `LocalDispatcher.defaultJobPool()`
If we use the default pool, then the internal jobs show up confusingly in the LocalJobs editor. This wasn't a problem before as completed jobs were being removed immediately.
Commit e3cf82f

Commit 394abe0

Commit 5adb3c1
Tests : Don't use `LocalDispatcher.defaultJobPool()`
This pollutes other tests with old Job objects hanging around in the pool. The only exception is `LocalDispatcherTest.testShutdownDuringBackgroundDispatch()` which is testing functionality of the default pool, and does so in a subprocess.
Commit d6d5366
LocalDispatcher : Fix `Job.statistics()` errors on Windows

The `ps` subprocess was erroring about unknown arguments, which then caused us to return an empty dict. This was causing `LocalDispatcherTest.testShutdownDuringBackgroundDispatch()` to hang forever, waiting to receive the PID.
Commit 98a702d

Commit 3486005