Acknowledge process task if corresponding node cannot be loaded (#2936)
Before this commit, the `ProcessLauncher._continue` method raised a
`plumpy.TaskRejected` exception if the node of a process task could not be
loaded, either because it does not exist or because it cannot be uniquely
resolved. However, rejecting the task causes RabbitMQ to requeue it and
send it again, so the task ping-pongs between RabbitMQ and the daemon
workers without end. This situation typically occurs when a user deletes
a process node before the process has properly terminated, and therefore
before its task has been acknowledged. The situation is unrecoverable,
so the task should simply be acknowledged.
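
The requeue loop described above can be illustrated with a minimal, self-contained sketch. It does not use RabbitMQ, plumpy, or AiiDA: `TaskRejected`, `NodeNotFound`, `load_node`, `deliver`, and both handlers are hypothetical stand-ins, and the toy broker assumes only what the message states, namely that a rejected task is requeued while a handler that returns normally acknowledges it. A delivery cap is added so that the demonstration terminates.

```python
from collections import deque


class TaskRejected(Exception):
    """Hypothetical stand-in for `plumpy.TaskRejected`."""


class NodeNotFound(Exception):
    """Hypothetical stand-in for the exception raised when a node is missing."""


def load_node(pk, database):
    """Toy lookup that raises if the node has been deleted."""
    try:
        return database[pk]
    except KeyError:
        raise NodeNotFound('no node with pk={}'.format(pk))


def continue_rejecting(pk, database):
    """Old behaviour: reject the task when the node cannot be loaded."""
    try:
        load_node(pk, database)
    except NodeNotFound as exception:
        raise TaskRejected('Cannot continue process: {}'.format(exception))
    return True


def continue_acknowledging(pk, database):
    """New behaviour: return normally so that the task counts as acknowledged."""
    try:
        load_node(pk, database)
    except NodeNotFound:
        return False
    return True


def deliver(handler, pk, database, max_deliveries=5):
    """Toy broker loop: a rejected task is requeued, anything else acknowledges it.

    Returns the number of deliveries performed, capped at `max_deliveries`.
    """
    queue = deque([pk])
    deliveries = 0
    while queue and deliveries < max_deliveries:
        task = queue.popleft()
        deliveries += 1
        try:
            handler(task, database)
        except TaskRejected:
            queue.append(task)  # requeue: the task will be delivered again
    return deliveries


database = {}  # the node with pk=42 has been deleted
rejecting_deliveries = deliver(continue_rejecting, 42, database)
acknowledging_deliveries = deliver(continue_acknowledging, 42, database)
print(rejecting_deliveries, acknowledging_deliveries)
```

In this sketch the rejecting handler uses up every allowed delivery, whereas the acknowledging handler is called exactly once and the queue then stays empty.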
sphuber authored May 28, 2019
1 parent 9b5c015 commit 95ced04
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions aiida/manage/external/rmq.py
@@ -136,7 +136,6 @@ def _continue(self, communicator, pid, nowait, tag=None):
         :param nowait: if True don't wait for the process to finish, just return the pid, otherwise wait and
             return the results
         :param tag: the tag of the checkpoint to continue from
-        :raises plumpy.TaskRejected: if the node corresponding to the task cannot be loaded
         """
         from aiida.common import exceptions
         from aiida.engine.exceptions import PastException
@@ -145,9 +144,14 @@ def _continue(self, communicator, pid, nowait, tag=None):

         try:
             node = load_node(pk=pid)
-        except (exceptions.MultipleObjectsError, exceptions.NotExistent) as exception:
+        except (exceptions.MultipleObjectsError, exceptions.NotExistent):
+            # In this case, the process node corresponding to the process id, cannot be resolved uniquely or does not
+            # exist. The latter being the most common case, where someone deleted the node, before the process was
+            # properly terminated. Since the node is never coming back and so the process will never be able to continue
+            # we raise `Return` instead of `TaskRejected` because the latter would cause the task to be resent and start
+            # to ping-pong between RabbitMQ and the daemon workers.
             LOGGER.exception('Cannot continue process<%d>', pid)
-            raise plumpy.TaskRejected('Cannot continue process: {}'.format(exception))
+            raise gen.Return(False)

         if node.is_terminated:
