Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apparent double-notify or notify-on-construction-step downstream of partitioned table transform #3127

Closed
jjbrosnan opened this issue Nov 30, 2022 · 7 comments · Fixed by #3449
Assignees
Labels
bug Something isn't working partitioned_tables query engine release blocker A bug/behavior that puts is below the "good enough" threshold to release.
Milestone

Comments

@jjbrosnan
Copy link
Contributor

Description

ERROR EFAULT.refreshThread | .AsyncClientErrorNotifier | Error in table update: io.deephaven.base.verify.AssertionFailure: Assertion failed: asserted lastNotificationStep < LogicalClock.DEFAULT.currentStep(), instead lastNotificationStep == 14241, LogicalClock.DEFAULT.currentStep() == 14241.

The above error has been raised (part of a bigger stack trace) in two cases:

  • Partitioning, updating, and merging a table.
  • Replaying table data (TableReplayer)

This could be a double-notification bug, or something else.

Steps to reproduce

This was encountered by two end-users. I don't have the exact code they ran to make it happen.

Expected results

Tables merging, and data being played back.

Actual results

The stack trace.

Additional details and attachments

I don't have any at the moment. I will paste them in comments below

Versions

  • Deephaven: ...
  • OS: ...
  • Browser: ...
  • Docker: ...
@jjbrosnan jjbrosnan added bug Something isn't working triage labels Nov 30, 2022
@rcaudy rcaudy self-assigned this Nov 30, 2022
@rcaudy
Copy link
Member

rcaudy commented Nov 30, 2022

Full stack trace from one user:

10:46:07.857 ERROR EFAULT.refreshThread | .AsyncClientErrorNotifier | Error in table update: io.deephaven.base.verify.AssertionFailure: Assertion failed: asserted lastNotificationStep < LogicalClock.DEFAULT.currentStep(), instead lastNotificationStep == 8261, LogicalClock.DEFAULT.currentStep() == 8261.
        at io.deephaven.base.verify.Assert.fail(Assert.java:99)
        at [io.deephaven.base.verify.Assert.lt](http://io.deephaven.base.verify.assert.lt/)(Assert.java:673)
        at io.deephaven.engine.table.impl.BaseTable.notifyListeners(BaseTable.java:789)
        at io.deephaven.engine.table.impl.SelectOrUpdateListener.completionRoutine(SelectOrUpdateListener.java:114)
        at io.deephaven.engine.table.impl.SelectOrUpdateListener$1.onAllRequiredColumnsCompleted(SelectOrUpdateListener.java:94)
        at io.deephaven.engine.table.impl.select.analyzers.SelectAndViewAnalyzer$SelectLayerCompletionHandler.onLayerCompleted(SelectAndViewAnalyzer.java:485)
        at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.doSerialApplyUpdate(SelectColumnLayer.java:237)
        at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.lambda$onAllRequiredColumnsCompleted$1(SelectColumnLayer.java:188)
        at io.deephaven.engine.table.impl.select.analyzers.SelectAndViewAnalyzer$ImmediateJobScheduler.submit(SelectAndViewAnalyzer.java:644)
        at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer$1.onAllRequiredColumnsCompleted(SelectColumnLayer.java:186)
        at io.deephaven.engine.table.impl.select.analyzers.SelectAndViewAnalyzer$SelectLayerCompletionHandler.onLayerCompleted(SelectAndViewAnalyzer.java:485)
        at io.deephaven.engine.table.impl.select.analyzers.BaseLayer.applyUpdate(BaseLayer.java:75)
        at io.deephaven.engine.table.impl.select.analyzers.SelectColumnLayer.applyUpdate(SelectColumnLayer.java:132)
        at io.deephaven.engine.table.impl.SelectOrUpdateListener.onUpdate(SelectOrUpdateListener.java:90)
        at io.deephaven.engine.table.impl.InstrumentedTableUpdateListener$Notification.lambda$run$0(InstrumentedTableUpdateListener.java:37)
        at io.deephaven.engine.table.impl.InstrumentedTableListenerBase$NotificationBase.doRunInternal(InstrumentedTableListenerBase.java:294)
        at io.deephaven.engine.table.impl.InstrumentedTableListenerBase$NotificationBase.doRun(InstrumentedTableListenerBase.java:272)
        at io.deephaven.engine.table.impl.InstrumentedTableUpdateListener$Notification.run(InstrumentedTableUpdateListener.java:37)
        at io.deephaven.engine.updategraph.NotificationQueue$Notification.runInContext(NotificationQueue.java:60)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor.runNotification(UpdateGraphProcessor.java:1298)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor$QueueNotificationProcessor.doWork(UpdateGraphProcessor.java:1459)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor.flushNormalNotificationsAndCompleteCycle(UpdateGraphProcessor.java:1177)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor.flushNotificationsAndCompleteCycle(UpdateGraphProcessor.java:1122)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor.lambda$doRefresh$10(UpdateGraphProcessor.java:1736)
        at io.deephaven.util.locks.FunctionalLock.doLocked(FunctionalLock.java:31)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor.doRefresh(UpdateGraphProcessor.java:1726)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor.refreshAllTables(UpdateGraphProcessor.java:1713)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor.refreshTablesAndFlushNotifications(UpdateGraphProcessor.java:1567)
        at io.deephaven.engine.updategraph.UpdateGraphProcessor$1.run(UpdateGraphProcessor.java:276)

@rcaudy
Copy link
Member

rcaudy commented Nov 30, 2022

Current theory involves partitioned table transformations.

@rcaudy rcaudy added query engine release blocker A bug/behavior that puts is below the "good enough" threshold to release. partitioned_tables and removed triage labels Nov 30, 2022
@rcaudy rcaudy added this to the Nov 2022 milestone Nov 30, 2022
@jjbrosnan
Copy link
Contributor Author

From the TableReplayer occurence (partial stack trace):

r-Scheduler-Serial-1 | i.d.s.s.SessionState      | Internal Error '8a5d5fb7-cb49-49da-896c-e54523dbb5f4' java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to add a historical table. : RuntimeError: io.deephaven.base.verify.AssertionFailure: Assertion failed: asserted lastNotificationStep < LogicalClock.DEFAULT.currentStep(), instead lastNotificationStep == 49824, LogicalClock.DEFAULT.currentStep() == 49824.
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/replay.py", line 60, in add_table
    replay_table = Table(j_table=self._j_replayer.replay(table.j_table, col))
RuntimeError: io.deephaven.base.verify.AssertionFailure: Assertion failed: asserted lastNotificationStep < LogicalClock.DEFAULT.currentStep(), instead lastNotificationStep == 49824, LogicalClock.DEFAULT.currentStep() == 49824.

I'll see if I can get any more information.

@rcaudy
Copy link
Member

rcaudy commented Nov 30, 2022

From the TableReplayer occurence (partial stack trace):

r-Scheduler-Serial-1 | i.d.s.s.SessionState      | Internal Error '8a5d5fb7-cb49-49da-896c-e54523dbb5f4' java.lang.RuntimeException: Error in Python interpreter:
Type: <class 'deephaven.dherror.DHError'>
Value: failed to add a historical table. : RuntimeError: io.deephaven.base.verify.AssertionFailure: Assertion failed: asserted lastNotificationStep < LogicalClock.DEFAULT.currentStep(), instead lastNotificationStep == 49824, LogicalClock.DEFAULT.currentStep() == 49824.
Traceback (most recent call last):
  File "/opt/deephaven/venv/lib/python3.10/site-packages/deephaven/replay.py", line 60, in add_table
    replay_table = Table(j_table=self._j_replayer.replay(table.j_table, col))
RuntimeError: io.deephaven.base.verify.AssertionFailure: Assertion failed: asserted lastNotificationStep < LogicalClock.DEFAULT.currentStep(), instead lastNotificationStep == 49824, LogicalClock.DEFAULT.currentStep() == 49824.

I'll see if I can get any more information.

This is completely distinct. Please file a separate issue for the bug in ReplayTable construction.

@rcaudy rcaudy changed the title Notification step assertion failures Apparent double-notify or notify-on-construction-step downstream of partitioned table transform Nov 30, 2022
@rcaudy
Copy link
Member

rcaudy commented Dec 4, 2022

I don't have anything in the user's logs that can help narrow this down, but after substantial testing and code evaluation I'm pretty sure partitioned table transform manages dependencies correctly. I've done some optimization and some tightening for error handling, but no reason to think I've isolated the issue.

@rcaudy
Copy link
Member

rcaudy commented Dec 6, 2022

The change that merged includes the aforementioned optimization and error checking, but also changes how swap listeners work to avoid redundant, hard-to-break links in the notification-delivery DAG. I'm optimistic that this will drop the noise level if there is in fact an issue of the postulated type.

@rcaudy
Copy link
Member

rcaudy commented Feb 28, 2023

I don't have enough history to know if this report was discovered with parallelism enabled, or not. If yes, I think the issue was a double-dispatch due to an error in the way SelectAndViewAnalyzer kicks off dependent layers, which will be fixed in #3449 . I'm going to close this ticket with that PR unless/until we see more problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working partitioned_tables query engine release blocker A bug/behavior that puts is below the "good enough" threshold to release.
Projects
None yet
2 participants