-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Liveness bug in update? #3244
Comments
Happened in nightlies last night (not really, I forgot to click "Comment"): |
But really, it did happen again last night: |
Some notes from Slack about this bug: A (lightly edited) excerpt from the nugget:
The exception is from the update listener implementing the transform in op.apply(proxy). Several partitioned tables and proxies are being “leaked”, but the important pair are the ones output by that transform. Only the transformed proxy’s target’s table is directly an ancestor of the result. If the transformed proxy or its target are GCed before the enclosing liveness scope is released, the result won’t be cleaned up until later, when those references are handles by the liveness cleanup reference processor. I can “fix” the problem the same way we fixed it in updateBy, with a computeEnclosed that keeps the enclosed referents strongly reachable until it releases the scope and returns the result. That said, a user could manage to do something like this, and the end result would be an unpredictable liveness failure from out of nowhere. One solution would be to make the transform-result-PT keep a hard ref to the proxy it was created for (feels backwards, though), and for the merge to keep a hard ref to the PT it was merged from. We’d be keeping those extra objects around indefinitely, however. But keeping them around longer, perversely, helps make it easier to ensure proactive cleanup. Alternatively, we could tryRetainReference the listener at the start of the cycle, short-circuit if that fails, and dropReference it upon completion. Meaning, prevent it from being in an ambiguous state during actual transform update processing. I suppose proxies could be in a memoization cache of sorts on the PT, which would address that part. They’re immutable, and of more or less constant direct size. (edited) And it’s not totally terrible to keep the PT around in the UnionSourceManager, for similar reasons. I think the listener behavior change is also good, as a “belt and suspenders” measure. |
I've made the change to |
I encountered an identical liveness complaint near-simultaneously in the same nightly run from
io.deephaven.engine.table.impl.PartitionedTableTest.testExecutionContext
:Gradle build scans:
CI run:
https://github.com/rcaudy/deephaven-core/actions/runs/3785209777
Other failing tests in those runs are cascading failures due to the UGP shared lock, which is due to be addressed under #3137
The text was updated successfully, but these errors were encountered: