Dirty propagation bug hunt #2

Closed
wants to merge 27 commits into from

Conversation

johnhaddon
Owner

I'm just making this pull request to trigger a Travis build.

johnhaddon and others added 27 commits April 2, 2015 10:16
The main motivation for this is that an upcoming commit will require propagateDirtiness() to have access to a protected method of Plug. Rather than declare two-way friendship between DependencyNode and Plug, it seems more appropriate to take a route requiring no friendship declarations at all.

This reorganisation also makes for nice parallels between the DependencyNode/Plug and ComputeNode/ValuePlug relationships. In both cases now, the node provides methods that define only internal relationships (affects(), hash() and compute()) between plugs, and then plugs themselves are responsible for the external graph relationships defined by connections, and for triggering the methods provided by the nodes.
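
As a rough illustration of that division of responsibility, here is a minimal standalone Python sketch. It does not use the Gaffer API: `Plug`, `AddNode`, `connect_to()` and `propagate_dirtiness()` are hypothetical stand-ins. The node reports only internal relationships via `affects()`, while the plug follows connections and drives the traversal.

```python
class Plug:
    """Hypothetical stand-in for a plug; not the Gaffer API."""

    def __init__(self, name, node=None):
        self.name = name
        self.node = node
        self.outputs = []    # plugs driven by this one via connections
        self.dirty_count = 0

    def connect_to(self, destination):
        self.outputs.append(destination)

    def dirty(self):
        # Called once for each plug reached by a propagation.
        self.dirty_count += 1

    def propagate_dirtiness(self):
        # The plug owns the traversal: it follows external connections
        # itself, and asks its node only for internal relationships.
        visited = set()
        stack = [self]
        while stack:
            plug = stack.pop()
            if id(plug) in visited:
                continue
            visited.add(id(plug))
            plug.dirty()
            stack.extend(plug.outputs)
            if plug.node is not None:
                stack.extend(plug.node.affects(plug))


class AddNode:
    """Hypothetical node whose `sum` output depends on `a` and `b`."""

    def __init__(self):
        self.a = Plug("a", self)
        self.b = Plug("b", self)
        self.sum = Plug("sum", self)

    def affects(self, plug):
        # Internal relationships only: which plugs on this node depend on `plug`.
        return [self.sum] if plug in (self.a, self.b) else []
```

Connecting `n1.sum` to `n2.a` and calling `n1.a.propagate_dirtiness()` dirties `n1.sum`, `n2.a` and `n2.sum`, without the node ever needing to know about connections.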

This was a private method, so API compatibility is not affected in any way.
This is used to inform a Plug that it has been dirtied by Plug::propagateDirtiness().

Binary incompatibility :

- Added virtual function to Plug
It now allows cache entries to persist from one computation to the next, using the new Plug::dirty() mechanism to remove them when appropriate. This has exposed some bugs in the dirty propagation for some nodes, which will need to be addressed before we can roll this out.
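
A minimal sketch of the idea, assuming a hypothetical `CachedPlug` class rather than the real ValuePlug: the cached hash survives between computations and is dropped only when `dirty()` is called. It also shows why missing dirty propagation becomes visible once the cache persists.

```python
_UNSET = object()  # sentinel meaning "nothing cached"


class CachedPlug:
    """Hypothetical plug that caches its hash until it is dirtied."""

    def __init__(self, compute_hash):
        self._compute_hash = compute_hash  # callable standing in for hash()
        self._cached_hash = _UNSET
        self.hash_calls = 0

    def hash(self):
        if self._cached_hash is _UNSET:
            self.hash_calls += 1
            self._cached_hash = self._compute_hash()
        return self._cached_hash

    def dirty(self):
        # Invoked by dirty propagation; drops the stale entry.
        self._cached_hash = _UNSET


state = {"value": 1}
plug = CachedPlug(lambda: ("value", state["value"]))
first = plug.hash()

# If an upstream change is not propagated as dirtiness,
# the cache keeps returning the stale hash.
state["value"] = 2
stale = plug.hash()

plug.dirty()
fresh = plug.hash()
```

Here `stale` equals `first` even though the upstream state changed, which is the class of bug this commit exposes in nodes with incorrect dirty propagation.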
We were using reverse iterators everywhere to work around the fact that topological_sort() returned the results in reverse order. It is more straightforward to simply build the graph with edges in the other direction. This also happens to be essential to get the correct ordering for the batched dirty propagation to be implemented shortly.
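
The effect of edge direction can be shown with a small standalone sketch. The `dfs_postorder()` helper below is a hypothetical stand-in for a sort that emits vertices in finishing order (so reversed relative to the edges), and the graph is assumed to be acyclic.

```python
def dfs_postorder(edges):
    """Return vertices in depth-first finishing order.

    For an edge u -> v this places v before u, so the result is the
    reverse of a topological order of `edges`.
    """
    order = []
    visited = set()

    def visit(u):
        if u in visited:
            return
        visited.add(u)
        for v in edges.get(u, ()):
            visit(v)
        order.append(u)

    for u in edges:
        visit(u)
    return order


# "a" dirties "b", which dirties "c".
affects = {"a": ["b"], "b": ["c"], "c": []}
# The same relationships with every edge stored in the other direction.
affected_by = {"a": [], "b": ["a"], "c": ["b"]}

needs_reversing = dfs_postorder(affects)      # ['c', 'b', 'a']
already_ordered = dfs_postorder(affected_by)  # ['a', 'b', 'c']
```

With the edges flipped, the upstream plug comes first without any reverse iteration.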
Even when making several changes to the graph, dirtiness will now only be signalled when all changes are complete. This prevents observers of the graph from reacting too soon, and performing a compute before edits are complete.
When setting the input to a plug with children, we were signalling dirtiness before the child plugs were in their final consistent state. We now defer signalling until all state changes are complete.
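
One way to picture the deferred signalling described in the last two messages is a nestable scope that only emits when the outermost edit finishes. This is a sketch under assumptions: `DirtySignaller` and its `scope()` method are hypothetical names, not the mechanism used in Gaffer itself.

```python
from contextlib import contextmanager


class DirtySignaller:
    """Hypothetical batcher: collects dirtied plugs and signals once."""

    def __init__(self, emit):
        self._emit = emit    # callback standing in for the dirtied signal
        self._depth = 0      # number of scopes currently open
        self._pending = []   # dirtied plugs, in order, without duplicates

    @contextmanager
    def scope(self):
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            if self._depth == 0:
                # Outermost scope closed: all edits are complete.
                pending, self._pending = self._pending, []
                for plug in pending:
                    self._emit(plug)

    def dirty(self, plug):
        # A lone call opens its own scope, so it still signals promptly.
        with self.scope():
            if plug not in self._pending:
                self._pending.append(plug)
```

Usage: wrap a multi-step edit in `with signaller.scope():` and observers hear nothing until the block exits, at which point each dirtied plug is signalled exactly once.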
This demonstrates that dirtiness is now correctly propagated when adding/removing options.

Fixes GafferHQ#1039.
Two things seem to be failing here. To start with, it looks like tweaking a value on a ContextVariables node doesn't signal dirtiness properly, because ContextVariables<BaseType>::affects() isn't implemented correctly.
The other problem seems wider ranging: adding a member to/removing it from the "variables" plug on a ContextVariables node (and similar nodes) should trigger a propagateDirtiness and dirty its output, seeing as it affects its value. This seems problematic though - I tried calling propagateDirtiness( this ) in CompoundPlug::childAddedOrRemoved(), but compound plugs don't trigger calls to affects(), so that was the end of that. This is probably affecting other nodes with compound plugs like GafferScene::CustomAttributes too.
Adding, removing or changing a variable now correctly dirties the output plugs.
These now demonstrate that adding, editing and removing a display all correctly propagate dirtiness. Since this is achieved using the new Plug functionality, also removed the old workaround from Outputs::addOutput().
It now increments the __updateCount correctly, to emulate what the UI will do when data arrives.
We might see even fewer calls to hash() now, but these tests are only about checking that getValue( precomputedHash ) doesn't trigger another one.
It's a fundamental rule of Gaffer's computation engine that hashes and computation results must not depend on anything other than the state of the graph (plug values and connections) and the contents of the Context. Allowing environment variable substitutions in Context::substitute() and then using them in StringPlug::hash() and StringPlug::getValue() violates this rule, and perhaps was unwise. Now that ValuePlug caches hashes, it has no way of knowing when to invalidate the cache based on changes to environment variables, and changing an environment variable can poison the hash.

We work around this by taking a copy of the environment at startup, and using this to perform the substitutions. This also has the added benefit of improved performance - testManyEnvironmentSubstitutions() shows 12% improvement with a standard CentOS bash shell environment, and 62% improvement when 1000 additional environment variables are added.
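
The workaround can be sketched in standalone Python as follows. The `substitute()` function, the `${name}` syntax handling, and the rule that context entries win over environment variables are all assumptions made for illustration. They are not taken from Context::substitute().

```python
import os
import re

# Snapshot taken once, standing in for "at startup".
_ENV_SNAPSHOT = dict(os.environ)
_VARIABLE = re.compile(r"\$\{(\w+)\}")


def substitute(text, context, environment=_ENV_SNAPSHOT):
    """Replace ${name} from `context`, falling back to the snapshot.

    Later changes to os.environ are deliberately ignored, so the result
    depends only on `text`, `context` and the fixed snapshot.
    """

    def replace(match):
        name = match.group(1)
        if name in context:
            return str(context[name])
        return environment.get(name, "")

    return _VARIABLE.sub(replace, text)
```

Because the snapshot never changes, a cached hash derived from the substituted string cannot be invalidated behind the cache's back by a later change to the environment.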
When being repeated, the refresh count needs to be set for the first use of AlembicSource, as well as the second - without this the repeat test will pick up the wrong file - the one opened second in the first test.
We must never ever let a Python exception leak into pure C++ code, because that code cannot handle it properly, and it leaves the Python exception status set, meaning it will be reported again on reentry to Python in a most confusing way.
Dirty propagation is a secondary process triggered by primary actions such as setInput(), setValue() and addChild(). We don't want errors that occur in DependencyNode::affects() to prevent the success of the primary process, so we handle the exceptions during dirty propagation, reporting them as error messages.
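
A minimal sketch of that policy, assuming hypothetical `affects` and `outputs` callables and a hypothetical logger name: an exception raised while querying `affects` is reported and the traversal carries on, so the primary operation that triggered propagation still succeeds.

```python
import logging

logger = logging.getLogger("dirtyPropagation")  # hypothetical logger name


def propagate_dirtiness(plug, affects, outputs):
    """Return the plugs dirtied when starting from `plug`.

    `affects` and `outputs` are hypothetical callables returning the
    plugs affected internally and via connections respectively.
    """
    dirtied = []
    stack = [plug]
    while stack:
        current = stack.pop()
        if current in dirtied:
            continue
        dirtied.append(current)
        stack.extend(outputs(current))
        try:
            stack.extend(affects(current))
        except Exception as error:
            # Report the failure, but never let it escape into the
            # primary operation that triggered propagation.
            logger.error("affects( %s ) failed : %s", current, error)
    return dirtied
```

In this sketch a failing `affects` call only loses the internal relationships for that one plug; plugs reached through connections are still dirtied.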
When the input is changed, we propagate dirtiness and the plugs end up in the DirtyPlugs graph. Then the nodes get deleted and the DirtyPlugs graph becomes the sole owner of the plugs. Then it clears the graph, the plugs are deleted, and they trigger a new dirty propagation as their children are unparented. Boom. To work around this we use a flag to avoid this reentrant clear/emit behaviour. A better long term solution might be to add support for weak pointers to RefCounted, and use those within DirtyPlugs.

This also required a fix for an egregious bug in the ScopedAssignment class.
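
The flag-based guard can be sketched like this. Everything here is a hypothetical stand-in written in Python: `scoped_assignment()` only mimics the intent of a ScopedAssignment helper (set a value, restore the previous one on exit), and this `DirtyPlugs` class is a toy, not the real one.

```python
from contextlib import contextmanager


@contextmanager
def scoped_assignment(obj, attribute, value):
    """Assign obj.attribute = value, restoring the previous value on exit."""
    previous = getattr(obj, attribute)
    setattr(obj, attribute, value)
    try:
        yield
    finally:
        setattr(obj, attribute, previous)


class DirtyPlugs:
    """Toy container that emits and clears its pending plugs."""

    def __init__(self):
        self.plugs = []
        self.emitting = False
        self.emitted = []

    def insert(self, plug):
        self.plugs.append(plug)

    def emit(self, on_release=None):
        if self.emitting:
            return  # reentrant call made while clearing: ignore it
        with scoped_assignment(self, "emitting", True):
            plugs, self.plugs = self.plugs, []
            for plug in plugs:
                self.emitted.append(plug)
                if on_release is not None:
                    # Stands in for destruction side effects, which may
                    # call back into insert() and emit().
                    on_release(plug)
```

Restoring the previous value, rather than unconditionally resetting to `False`, is what keeps such a helper correct if scopes are ever nested.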
Direct access to `environ` is not possible, we must use `_NSGetEnviron()` instead.
@johnhaddon johnhaddon closed this Apr 13, 2015
@johnhaddon johnhaddon deleted the dirtyPropagationBugHunt branch April 13, 2015 09:30
johnhaddon added a commit that referenced this pull request Dec 19, 2019
This obviously isn't quite right, because the InteractiveRender tests crash with the following stack trace :

```
#0  0x0000000000000000 in ?? ()
#1  0x00007fffe18153a6 in foundation::auto_release_ptr<renderer::ITileCallback>::reset (this=0x7fff8c012148, ptr=0x0)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/foundation/utility/autoreleaseptr.h:187
#2  0x00007fffe181137f in renderer::(anonymous namespace)::ProgressiveFrameRenderer::~ProgressiveFrameRenderer (this=0x7fff8c0120a0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/progressive/progressiveframerenderer.cpp:178
#3  0x00007fffe1811592 in renderer::(anonymous namespace)::ProgressiveFrameRenderer::~ProgressiveFrameRenderer (this=0x7fff8c0120a0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/progressive/progressiveframerenderer.cpp:187
#4  0x00007fffe18115ca in renderer::(anonymous namespace)::ProgressiveFrameRenderer::release (this=0x7fff8c0120a0)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/progressive/progressiveframerenderer.cpp:191
#5  0x00007fffe1712fb7 in foundation::auto_release_ptr<renderer::IFrameRenderer>::~auto_release_ptr (this=0x7fff8c01f928, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/foundation/utility/autoreleaseptr.h:124
#6  0x00007fffe17121be in renderer::RendererComponents::~RendererComponents (this=0x7fff8c01f890, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/renderercomponents.h:73
#7  0x00007fffe1712288 in std::default_delete<renderer::RendererComponents>::operator() (this=0x7fff8c000ae0, __ptr=0x7fff8c01f890)
   at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:76
#8  0x00007fffe1711e0d in std::unique_ptr<renderer::RendererComponents, std::default_delete<renderer::RendererComponents> >::reset (this=0x7fff8c000ae0,
   __p=0x7fff8c01f890) at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:347
#9  0x00007fffe1710632 in renderer::CPURenderDevice::~CPURenderDevice (this=0x7fff8c0009b0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/device/cpu/cpurenderdevice.cpp:112
#10 0x00007fffe1710988 in renderer::CPURenderDevice::~CPURenderDevice (this=0x7fff8c0009b0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/device/cpu/cpurenderdevice.cpp:129
#11 0x00007fffe181f21c in std::default_delete<renderer::IRenderDevice>::operator() (this=0x3093868, __ptr=0x7fff8c0009b0)
   at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:76
#12 0x00007fffe181eafd in std::unique_ptr<renderer::IRenderDevice, std::default_delete<renderer::IRenderDevice> >::~unique_ptr (this=0x3093868,
   __in_chrg=<optimized out>) at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:239
#13 0x00007fffe181d545 in renderer::MasterRenderer::Impl::~Impl (this=0x3093820, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/masterrenderer.cpp:168
#14 0x00007fffe181cfdb in renderer::MasterRenderer::~MasterRenderer (this=0x3088d70, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/masterrenderer.cpp:584
#15 0x00007fffc67d0ae0 in (anonymous namespace)::AppleseedRenderer::~AppleseedRenderer() () from /disk1/john/dev/build/gaffer/lib/libGafferAppleseed.so
#16 0x00007fffd8a734b6 in GafferScene::InteractiveRender::stop() () from /disk1/john/dev/build/gaffer/lib/libGafferScene.so
#17 0x00007fffd8a735bf in GafferScene::InteractiveRender::update() () from /disk1/john/dev/build/gaffer/lib/libGafferScene.so
```
johnhaddon added a commit that referenced this pull request Oct 8, 2020
This problem could be triggered as follows :

	1. `gaffer op sequenceLs -gui`
	2. Set `dir` to a directory that doesn't exist.
	3. Hit OK to execute the Op.

It yielded an assertion failure as follows :

	```ASSERT failure in QCoreApplication::sendEvent: "Cannot send events to objects owned by a different thread. Current thread 0x0x7fff90001050. Receiver '' (of type 'QThread') was created in thread 0x0xa902b0", file kernel/qcoreapplication.cpp, line 578```

And a stack trace like so :

	```
	#2  0x00007fffe229d615 in qt_message_fatal (context=..., message=...) at global/qlogging.cpp:1907
	#3  0x00007fffe229e122 in QMessageLogger::fatal (this=this@entry=0x7fffa77fcd90, msg=msg@entry=0x7fffe25b0c60 "ASSERT failure in %s: \"%s\", file %s, line %d") at global/qlogging.cpp:888
	#4  0x00007fffe22985dc in qt_assert_x (where=where@entry=0x7fffe25c0144 "QCoreApplication::sendEvent", what=<optimized out>, file=file@entry=0x7fffe25c00cf "kernel/qcoreapplication.cpp", line=line@entry=578)
	    at global/qglobal.cpp:3220
	#5  0x00007fffe247aaa2 in QCoreApplicationPrivate::checkReceiverThread (receiver=receiver@entry=0xa902b0) at kernel/qcoreapplication.cpp:572
	#6  0x00007fffe16052b2 in QApplication::notify (this=0x1dc4170, receiver=0xa902b0, e=0x7fffa77fd120) at kernel/qapplication.cpp:2902
	#7  0x00007fffdc443cf6 in QApplicationWrapper::notify(QObject*, QEvent*) () from /disk1/john/dev/build/gafferPython2/lib/python2.7/site-packages/PySide2/QtWidgets.so
	#8  0x00007fffe247b0ec in QCoreApplication::notifyInternal2 (receiver=0xa902b0, event=0x7fffa77fd120) at kernel/qcoreapplication.cpp:1088
	#9  0x00007fffe247b2ea in QCoreApplication::sendEvent (receiver=receiver@entry=0xa902b0, event=event@entry=0x7fffa77fd120) at kernel/qcoreapplication.cpp:1476
	#10 0x00007fffe24b426b in QObject::setProperty (this=0xa902b0, name=0x7fffdce01d50 <PySide::invalidatePropertyName> "_PySideInvalidatePtr", value=...) at kernel/qobject.cpp:3945
	#11 0x00007fffdcdffa42 in PySide::getWrapperForQObject(QObject*, SbkObjectType*) () from /disk1/john/dev/build/gafferPython2/lib/python2.7/site-packages/PySide2/libpyside2-python2.7.so.5.12
	#12 0x00007fffdb768443 in Sbk_QObjectFunc_thread () from /disk1/john/dev/build/gafferPython2/lib/python2.7/site-packages/PySide2/QtCore.so
	```

The problem is that when PySide first makes a wrapper for a QObject, it adds a property to that object. Adding a property can only be performed on the thread that owns the object, so the first access to `__qtApplication.thread()` must be performed on the main thread. We ensure that this is the case by accessing it immediately after constructing the QApplication.

The test case for this has to be run externally via `gaffer python` because when running the full test suite we hit a code path that happens to first access `__qtApplication.thread()` from the main thread. The external script mimics what happens in `gaffer op -gui`, where we end up first accessing `__qtApplication.thread()` from a background thread.
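
The shape of the fix can be illustrated without Qt. In this sketch `OwnedObject` is a hypothetical class that, by assumption, may only create its wrapper on the thread that constructed it, and `make_application()` performs the eager first access on that thread.

```python
import threading


class OwnedObject:
    """Hypothetical object whose wrapper must first be made on its owner thread."""

    def __init__(self):
        self._owner = threading.get_ident()
        self._wrapper = None

    def wrapper(self):
        if self._wrapper is None:
            if threading.get_ident() != self._owner:
                raise RuntimeError("first access from a non-owning thread")
            self._wrapper = object()
        return self._wrapper


def make_application():
    app = OwnedObject()
    app.wrapper()  # eager first access, on the constructing thread
    return app


def access_in_thread(obj):
    """Call obj.wrapper() from a background thread and report the outcome."""
    outcome = []

    def run():
        try:
            obj.wrapper()
            outcome.append("ok")
        except RuntimeError:
            outcome.append("error")

    thread = threading.Thread(target=run)
    thread.start()
    thread.join()
    return outcome[0]
```

`access_in_thread(make_application())` succeeds, whereas `access_in_thread(OwnedObject())` fails because the first access happens on the background thread.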
johnhaddon added a commit that referenced this pull request Feb 20, 2024
This could be reproduced in Gaffer as follows :

1. Create an Arnold spotlight.
2. Connect a ShaderAssignment below it.
3. Focus the ShaderAssignment.
4. Enable the LightTool.
5. Delete the ShaderAssignment.

The stacktrace looked like this :

<details>

```
#0  0x00007fff7e2dba50 in GafferSceneUI::Private::Inspector::Result::value() const () from /home/john/dev/build/gaffer-1.4/lib/libGafferSceneUI.so
#1  0x00007fff7e308a3c in float const GafferSceneUI::Private::Inspector::Result::typedValue<float>(float const&) const () from /home/john/dev/build/gaffer-1.4/lib/libGafferSceneUI.so
#2  0x00007fff7e2f4ec7 in (anonymous namespace)::SpotLightHandle::handleAngles() const () from /home/john/dev/build/gaffer-1.4/lib/libGafferSceneUI.so
#3  0x00007fff7e2fe8d2 in (anonymous namespace)::SpotLightHandle::addHandleVisualisation(IECoreGL::Group*, bool, bool) const () from /home/john/dev/build/gaffer-1.4/lib/libGafferSceneUI.so
#4  0x00007fff7e2fcacd in (anonymous namespace)::LightToolHandle::renderHandle(GafferUI::Style const*, GafferUI::Style::State) const () from /home/john/dev/build/gaffer-1.4/lib/libGafferSceneUI.so
#5  0x00007fffe2b8d7c3 in GafferUI::Handle::renderLayer(GafferUI::Gadget::Layer, GafferUI::Style const*, GafferUI::Gadget::RenderReason) const () from /home/john/dev/build/gaffer-1.4/lib/libGafferUI.so
#6  0x00007fffe2c0e790 in GafferUI::ViewportGadget::renderLayerInternal(GafferUI::Gadget::RenderReason, GafferUI::Gadget::Layer, Imath_3_1::Matrix44<float> const&, Imath_3_1::Box<Imath_3_1::Vec3<float> > const&, IECoreGL::Selector*) const () from /home/john/dev/build/gaffer-1.4/lib/libGafferUI.so
#7  0x00007fffe2c1543c in GafferUI::ViewportGadget::renderInternal(GafferUI::Gadget::RenderReason, GafferUI::Gadget::Layer) const () from /home/john/dev/build/gaffer-1.4/lib/libGafferUI.so
#8  0x00007fffe2c16728 in GafferUI::ViewportGadget::render() const () from /home/john/dev/build/gaffer-1.4/lib/libGafferUI.so
```

</details>

The main cause was that `LightTool::updateHandleInspections()` was skipping the updates when there was no input scene, which meant the handles were left in a visible state when they should have been hidden. This in turn led to them querying inspectors that returned `null` results unexpectedly on the next render, because they were still connected to the now-deleted scene. This is fixed by removing the early out in `updateHandleInspections()` so that we update the inspectors (in this case, clearing them).

But we can also go further. There is no need to pass a separate scene and context to `LightToolHandle::updateHandlePath()`, because those are both accessible already via the `SceneView` it was passed on construction. So we tidy that up so we always get them from the SceneView, make the `scene()` and `handlePath()` accessors protected as per the existing todo, and remove the `context()` accessor because it is barely used.

The general principle here is that it's easier if UI components don't worry about whether or not something is connected to the ScenePlug or ImagePlug inputs. There's no real difference between the empty scene from an unconnected input or the empty scene from a connected input that happens to output an empty scene. We're not yet consistent about this everywhere, but that's the direction we're moving in.
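
A toy sketch of that principle, with hypothetical names throughout (`Handle`, `update_handle_inspections()`, and a scene modelled as a plain dict of path to value): the update has no early out, so an unconnected input (`None`) and a connected but empty scene clear the handles in exactly the same way.

```python
class Handle:
    """Hypothetical UI handle tied to a scene location."""

    def __init__(self, path):
        self.path = path
        self.inspection = None
        self.visible = False


def update_handle_inspections(handles, scene):
    # No early out when there is no scene: an unconnected input is
    # treated as an empty scene, so stale inspections are cleared.
    locations = scene or {}
    for handle in handles:
        handle.inspection = locations.get(handle.path)
        handle.visible = handle.inspection is not None


handle = Handle("/spot")
update_handle_inspections([handle], {"/spot": 30.0})
shown = handle.visible

update_handle_inspections([handle], None)
hidden_when_unconnected = not handle.visible
```

With an early out in place of `scene or {}`, the second call would have left the handle visible and still holding its old inspection.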

# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Date:      Tue Feb 20 11:59:56 2024 +0000
#
# On branch lightToolCrashFix
# Changes to be committed:
#	modified:   python/GafferSceneUITest/LightToolTest.py
#	modified:   src/GafferSceneUI/LightTool.cpp
#
# Changes not staged for commit:
#	modified:   Changes.md
#
# Untracked files:
#	bits/
#	t.py
#
