Daemon worker started through the started_daemon_client fixture can except, leaving it incapable of running other processes #5687

Closed
sphuber opened this issue Oct 5, 2022 · 0 comments · Fixed by #5689

sphuber commented Oct 5, 2022

The recently added fixture started_daemon_client makes it easier to run unit tests that require a running daemon. Since it was added, tests that rely on it seem to fail at random. The cause appears to be that, in some cases, the daemon worker hits an exception when trying to finalize a process it was running. Specifically, at the end of a process's lifetime the on_terminated method is called, which for a Process deletes the process checkpoint, which is stored as an attribute of the node. This triggers an exception:

Error: Failed to delete checkpoint: UPDATE statement on table 'db_dbnode' expected to update 1 row(s); 0 were matched.
Traceback (most recent call last):
 File "/home/runner/work/aiida-core/aiida-core/aiida/engine/processes/process.py", line 413, in on_terminated
   self.runner.persister.delete_checkpoint(self.pid)
 File "/home/runner/work/aiida-core/aiida-core/aiida/engine/persistence.py", line 152, in delete_checkpoint
   calc.delete_checkpoint()
 File "/home/runner/work/aiida-core/aiida-core/aiida/orm/nodes/process/process.py", line 476, in delete_checkpoint
   self.base.attributes.delete(self.CHECKPOINT_KEY)
 File "/home/runner/work/aiida-core/aiida-core/aiida/orm/nodes/attributes.py", line 153, in delete
   self._backend_node.delete_attribute(key)
 File "/home/runner/work/aiida-core/aiida-core/aiida/storage/psql_dos/orm/nodes.py", line 283, in delete_attribute
   self._flush_if_stored({'attributes'})
 File "/home/runner/work/aiida-core/aiida-core/aiida/storage/psql_dos/orm/entities.py", line 96, in _flush_if_stored
   self.model._flush(fields)  # pylint: disable=protected-access
 File "/home/runner/work/aiida-core/aiida-core/aiida/storage/psql_dos/orm/utils.py", line 155, in _flush
   self.save()
 File "/home/runner/work/aiida-core/aiida-core/aiida/storage/psql_dos/orm/utils.py", line 122, in save
   self.session.commit()
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1428, in commit
   self._transaction.commit(_to_root=self.future)
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 829, in commit
   self._prepare_impl()
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 808, in _prepare_impl
   self.session.flush()
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 3345, in flush
   self._flush(objects)
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 3485, in _flush
   transaction.rollback(_capture_exception=True)
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
   compat.raise_(
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
   raise exception
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 3445, in _flush
   flush_context.execute()
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
   rec.execute(self)
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
   util.preloaded.orm_persistence.save_obj(
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 236, in save_obj
   _emit_update_statements(
 File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/sqlalchemy/orm/persistence.py", line 1034, in _emit_update_statements
   raise orm_exc.StaleDataError(
sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 'db_dbnode' expected to update 1 row(s); 0 were matched.

It seems that the node whose checkpoint should be deleted no longer exists, or perhaps only the attribute itself is gone. It is not yet clear why this happens, but one possibility is that other test fixtures clean the database (and so remove the node in question) before the daemon worker gets the chance to delete the checkpoint.
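
The suspected sequence can be reproduced in miniature. The sketch below is not AiiDA or SQLAlchemy code: it uses the standard-library sqlite3 module as a stand-in for the PostgreSQL backend, a single connection where the real case involves separate processes, and a hypothetical StaleRowError in place of sqlalchemy.orm.exc.StaleDataError. The table layout, the CHECKPOINT_KEY value and the function names are assumptions made for illustration only. It shows how an UPDATE that expects to match one row ends up matching none once the row has been deleted by a cleanup step.

```python
import json
import sqlite3

# Placeholder value; the real key is whatever ProcessNode.CHECKPOINT_KEY holds.
CHECKPOINT_KEY = "checkpoints"


class StaleRowError(Exception):
    """Hypothetical stand-in for sqlalchemy.orm.exc.StaleDataError."""


def delete_checkpoint(conn, pk, attributes):
    """Drop the checkpoint key and write the attributes back, expecting one row."""
    remaining = {k: v for k, v in attributes.items() if k != CHECKPOINT_KEY}
    cursor = conn.execute(
        "UPDATE db_dbnode SET attributes = ? WHERE id = ?",
        (json.dumps(remaining), pk),
    )
    # Mimic the ORM check: the in-memory object claims the row exists,
    # so exactly one row should have been matched.
    if cursor.rowcount != 1:
        raise StaleRowError(
            "UPDATE statement on table 'db_dbnode' expected to update 1 row(s); "
            f"{cursor.rowcount} were matched."
        )
    conn.commit()


def simulate_race():
    """Return the error message produced when cleanup runs before finalization."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE db_dbnode (id INTEGER PRIMARY KEY, attributes TEXT)")

    # The "daemon worker" stores a process node and keeps its state in memory.
    attributes = {CHECKPOINT_KEY: "serialized-state", "process_state": "finished"}
    cursor = conn.execute(
        "INSERT INTO db_dbnode (attributes) VALUES (?)", (json.dumps(attributes),)
    )
    pk = cursor.lastrowid
    conn.commit()

    # A "test fixture" wipes the table before the worker has finalized the process.
    conn.execute("DELETE FROM db_dbnode")
    conn.commit()

    # The worker now tries to delete the checkpoint of a node that is gone.
    try:
        delete_checkpoint(conn, pk, attributes)
    except StaleRowError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(simulate_race())
```

If this is indeed what happens in the tests, the ordering between database cleanup and daemon finalization is the thing to look at, rather than the checkpoint deletion itself.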
