Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local dispatcher improvements fixes #17

Closed
wants to merge 38 commits into from

Commits on Nov 28, 2023

  1. GafferDispatchUI : Add LocalJobs editor

    This is basically the window that the "Execute/View Local Jobs" menu item used to show, but refactored into a dockable editor panel. It still leaves a lot to be desired.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    4468a44 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    150b445 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    15ff13c View commit details
    Browse the repository at this point in the history
  4. LocalDispatcher : Share foreground/background dispatch code

    This removes a fair bit of duplicate code. There were quite a few weird little differences in the reporting logic in particular, which I could see no reason for, so they are gone. The two distinctions between foreground and background dispatches that remain are :
    
    - Background execution runs tasks in a subprocess, and polls them while checking for cancellation.
    - Background execution suppresses exceptions (while still reporting them via the job log), while foreground execution propagates them.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    b8f4083 View commit details
    Browse the repository at this point in the history
  5. LocalDispatcher : Remove JobPool.jobFailedSignal()

    - This appears to be completely unused.
    - A `jobStatusChangedSignal()` would be more generally useful.
    - But signals about status changing would make more sense on the Jobs themselves. We'll deal with this in a later commit.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    620a884 View commit details
    Browse the repository at this point in the history
  6. LocalDispatcher : Simplify status tracking

    We were storing a status per batch, but in fact the only status we ever
    reported to the outside world (via `failed()` and `killed()`) was the
    status of the root batch. And to get the root batch into a failed state
    we were going via a convoluted route whereby `__reportFailed()` would
    mark the current batch as failed, and call `JobPool._failed()` which would
    in turn call back to `Job._fail()` to mark the root batch as failed. Much
    easier to manage a single `self.__status` directly, and limit the per-batch
    data to a simple flag that tells us if we've executed it yet or not.
    
    At the same time, we explicitly store the current batch instead of having
    to do a search for it every time. Note that we're assuming that attribute assignment (for `self.__currentBatch`) is atomic (it is for CPython) rather than using a mutex to synchronize access between the background and UI threads.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    d335129 View commit details
    Browse the repository at this point in the history
  7. LocalDispatcher : Remove JobPool.failedJobs()

    Instead, just keep failed jobs in the main jobs list. This prevents them
    from jumping around in the LocalJobs UI, and simplifies the API too.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    02b77b7 View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    8933c0c View commit details
    Browse the repository at this point in the history
  9. LocalDispatcher : Don't derive JobPool from RunTimeTyped

    There was no benefit in that.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    61f02d0 View commit details
    Browse the repository at this point in the history
  10. Graphics : Add completed status icon for dispatcher jobs

    Also move icons into their own section, and spruce up the other icons a little bit.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    c0490b3 View commit details
    Browse the repository at this point in the history
  11. LocalDispatcher : Keep all jobs in pool until explicitly removed

    Otherwise you get no opportunity to see the job logs in the LocalJobs
    panel, and for quick jobs its not even clear if the job happened. Also
    add a `Job.status()` method since there was no public way of determining
    running/complete status before (it was assumed that anything in the list
    that wasn't killed or failed was running).
    
    Remove `Job.killed()` and `Job.failed()` since they are superceded by
    `Job.status()`.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    3a1c2f4 View commit details
    Browse the repository at this point in the history
  12. Configuration menu
    Copy the full SHA
    fb77f12 View commit details
    Browse the repository at this point in the history
  13. LocalDispatcher : Replace threading.Thread() with BackgroundTask

    This gives us a more natural cancellation mechanism, and means we're
    using a standard TBB worker thread and therefore are not creating unnecessary thread locals.
    
    Since cancellation operates via exceptions, it's more natural to also allow error exceptions to propagate out of `__executeWalk()` rather than using a `False` return value. This also means that all management of `self.__status` can be consolidated into one place in `__executeInternal()`.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    1403fa9 View commit details
    Browse the repository at this point in the history
  14. LocalDispatcher : Protect Job.execute() method

    This should not have been part of the public API - it only exists so that `doDispatch()` can initiate execution _after_ adding the job to the pool. Nobody needs to call it again after that, and definitely not client code.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    dfec9a5 View commit details
    Browse the repository at this point in the history
  15. LocalDispatcher : Test JobPool signals

    And make `removeJob()` public since it is called by the UI, and add public `addJob()` method for symmetry.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    185da44 View commit details
    Browse the repository at this point in the history
  16. Configuration menu
    Copy the full SHA
    69bac0c View commit details
    Browse the repository at this point in the history
  17. Configuration menu
    Copy the full SHA
    b3d7557 View commit details
    Browse the repository at this point in the history
  18. Configuration menu
    Copy the full SHA
    087b59d View commit details
    Browse the repository at this point in the history
  19. LocalJobs : Swap Log and Details tabs round

    The Log tab is more useful.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    c717e6f View commit details
    Browse the repository at this point in the history
  20. LocalJobs : Remove redundant update calls

    We'll be updating using our signal handlers anyway.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    6e2beb2 View commit details
    Browse the repository at this point in the history
  21. LocalJobs : Update immediately on job status changes

    And refactor the other updates to make it clearer what is being updated and why.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    2fbf877 View commit details
    Browse the repository at this point in the history
  22. Configuration menu
    Copy the full SHA
    d5d77f2 View commit details
    Browse the repository at this point in the history
  23. LocalJobs : Update messages widget when messages change

    Before they were only updated when selecting a job, so you didn't get any streaming feedback as to what a job was doing.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    02f854d View commit details
    Browse the repository at this point in the history
  24. LocalDispatcher : Improve messaging

    - Prefix batch messages with the node name, not the job name
    - Include execution time in batch completion message
    - Output stack traces line-by-line to work better with MessagesWidget.Role.Log.
    - Add debug message containing subprocess command lines.
    - Omit batch messages for root batch
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    d0afbcf View commit details
    Browse the repository at this point in the history
  25. LocalDispatcher : Rename __batch to __rootBatch

    That's what it is.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    2c828b4 View commit details
    Browse the repository at this point in the history
  26. Configuration menu
    Copy the full SHA
    696e0e4 View commit details
    Browse the repository at this point in the history
  27. LocalDispatcher/LocalJobs : Add more properties, remove current batch

    The LocalJobs Properties tab now contains only static properties of the job, so it is no longer a bug that we're not updating it dynamically when the current batch changes.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    5269a1c View commit details
    Browse the repository at this point in the history
  28. LocalJobs : Improve buttons

    - Move just below job listing, since that's what they operate on.
    - Remove draggable splittable handle separating buttons from widget above.
    - Reduce size.
    - Update appropriately when job status changes.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    5cc6903 View commit details
    Browse the repository at this point in the history
  29. LocalDispatcher : Kill outstanding background jobs at Gaffer shutdown

    Without this, the BackgroundTask and `outputHandler` threads would continue running while Python was torn down around them on the main thread. This led to crashes when trying to release the GIL from the BackgroundTask.
    
    The Changes.md entry doesn't refer to crashes, because the behaviour in 1.3 was different due to us using a Python thread instead of a BackgroundTask. In 1.3 Python would wait patiently for the Job thread to finish, even if that meant waiting a very very very long time. _Unless_ the user hit `Ctrl+C` again, in which case it would forcibly quit, leaving behind zombie task processes.
    
    I was in two minds as to whether or not to apply the exit handler to all JobPools or just the default one. Since we don't have any non-default pools outside of the unit tests I don't really have any information to base a decision on, so went with only the default pool for now.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    4c17093 View commit details
    Browse the repository at this point in the history
  30. BackgroundTask : Fix crash caused by Views with no in plug

    Strictly speaking, no such View should ever exist, because the `in` plug is added in the View constructor. But `ImageView::insertConverter()` currently replaces it with another plug, meaning that it is missing when the replacement is added, which is when `cancelAffectedTasks()` is called. This was causing a crash in `GafferImageUITest.ImageViewTest.testDeriving`. The problem had gone unnoticed until now because previously there were no BackgroundTasks to cancel so we never got as far as the bad check. But now that the LocalDispatcher uses a BackgroundTask, some completed tasks lay dormant waiting for garbage collection after LocalDispatcherTest has run, and we hit the null dereference.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    0c1412c View commit details
    Browse the repository at this point in the history
  31. LocalDispatcher : Use WeakMethod with callOnUIThread()

    This is probably neither here nor there in actual usage in the UI, but it is
    important for running the unit tests when the UI module has been imported. That's
    because `GafferUI.EventLoop` unconditionally installs a UIThreadCallHandler that
    just buffers the calls until an event loop is started. That prolonged Job lifetimes
    way outside of their associated tests, eventually causing this spooky action at a
    distance :
    
    ```
    Traceback (most recent call last):
    File "/__w/gaffer/gaffer/build/python/GafferTest/BoxTest.py", line 1138, in testComputeNodeCastDoesntRequirePython
        node = CastChecker()
      File "/__w/gaffer/gaffer/build/python/GafferTest/BoxTest.py", line 1129, in __init__
        self["out"] = Gaffer.IntPlug( direction = Gaffer.Plug.Direction.Out )
    IECore.Exception: Traceback (most recent call last):
      File "/__w/gaffer/gaffer/build/python/GafferTest/BoxTest.py", line 1133, in isInstanceOf
        raise Exception( "Cast to ComputeNode should not require Python" )
    Exception: Cast to ComputeNode should not require Python
    ```
    
    How did that happen? Well, a Job contains a BackgroundTask, and `self["out"]` is
    a graph edit, and graph edits cancel background tasks, and the cancellation code does a `runTimeCast()`
    on the subject of the edit. So if a BackgroundTask exists thanks to EventLoop, then
    the cancellation code runs and does a `runTimeCast()` on `CastChecker`, which throws.
    
    It really would be better if EventLoop scoped its UIThreadCallHandler between `start()` and `stop()`.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    26f3d7f View commit details
    Browse the repository at this point in the history
  32. TestCase : Stop wrapped RefCounted classes outliving their tests

    We were already doing this in a few specific tests but are now doing it for all. Our intention is still that our Python code should never make circular references between Python objects (and we have checks for that in `GafferUITest.TestCase.tearDown()` among other places), so we're not calling `gc.collect()`. But we can't do anything about the unfortunate circular references created by RefCountedBinding itself, and it's preferable to have those broken predictably at the end of the test rather than unpredictably when a later test runs.
    
    The immediate motivation for doing this is to destroy the `LocalDispatcher` instances and their associated `JobPool` in `LocalDispatcherTest`, before we run any other tests. If we don't, then the completed BackgroundTasks in the jobs cause a failure in `BoxTest.testComputeNodeCastDoesntRequirePython`. See previous commit for more details.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    094f416 View commit details
    Browse the repository at this point in the history
  33. ArnoldTextureBake : Don't use LocalDispatcher.defaultJobPool()

    If we use the default pool, then the internal jobs show up confusingly in the LocalJobs editor. This wasn't a problem before as completed jobs were being removed immediately.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    e3cf82f View commit details
    Browse the repository at this point in the history
  34. Configuration menu
    Copy the full SHA
    394abe0 View commit details
    Browse the repository at this point in the history
  35. pathFilterEnumFix

    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    5adb3c1 View commit details
    Browse the repository at this point in the history
  36. Tests : Don't use LocalDispatcher.defaultJobPool()

    This pollutes other tests with old Job objects hanging around in the pool.
    The only exception is `LocalDispatcherTest.testShutdownDuringBackgroundDispatch()`
    which is testing functionality of the default pool, and does so in a subprocess.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    d6d5366 View commit details
    Browse the repository at this point in the history
  37. LocalDispatcher : Fix Job.statistics() errors on Windows

    The `ps` subprocess is erroring about unknown arguments, which then causes us to return an empty dict.
    This was causing `LocalDispatcherTest.testShutdownDuringBackgroundDispatch()`
    to hang forever, waiting to receive the PID.
    johnhaddon committed Nov 28, 2023
    Configuration menu
    Copy the full SHA
    98a702d View commit details
    Browse the repository at this point in the history
  38. Configuration menu
    Copy the full SHA
    3486005 View commit details
    Browse the repository at this point in the history