Skip to content

Commit

Permalink
Unbind interrupt signal handlers as soon as process function terminates
Browse files Browse the repository at this point in the history
The `run_get_node` method of process functions attaches a `kill_process`
handler to interrupt signals, such that when the funtion process is
interrupted during execution, it is properly cleaned up, for example by
setting the process state on the node to `Killed`. These handlers were,
however, not properly unbound after the process was terminated and so
would be triggered even after the process instance was no longer alive.
This would cause the `kill_process` handler to hang. In the case of a
daemon runner, this would prevent the python process from shutting down.
This manifested itself in `verdi daemon stop` timing out if one of the
daemon runners had run just a single process function. Note that the
previous commit already fixed this problem in a different way, by not
even attaching the interrupt handlers for process function in the first
place. But the reasoning for that is different.
  • Loading branch information
sphuber committed Jun 5, 2019
1 parent f646828 commit 3771945
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 7 deletions.
22 changes: 16 additions & 6 deletions aiida/engine/processes/functions.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,8 +127,8 @@ def run_get_node(*args, **kwargs):
:return: tuple of the outputs of the process and the process node pk
:rtype: (dict, int)
"""
current_runner = get_manager().get_runner()
runner = get_manager().create_runner(with_persistence=False)
manager = get_manager()
runner = manager.create_runner(with_persistence=False)
inputs = process_class.create_inputs(*args, **kwargs)

# Remove all the known inputs from the kwargs
Expand All @@ -141,21 +141,31 @@ def run_get_node(*args, **kwargs):

process = process_class(inputs=inputs, runner=runner)

# Only add handlers for interrupt signal to kill the current process if we are not in a daemon runner
# Only add handlers for interrupt signal to kill the process if we are in a local and not a daemon runner.
# Without this check, running process functions in a daemon worker would be killed if the daemon is shutdown
current_runner = manager.get_runner()
original_handler = None
kill_signal = signal.SIGINT

if not current_runner.is_daemon_runner:

def kill_process(_num, _frame):
"""Send the kill signal to the process in the current scope."""
from tornado import gen
LOGGER.critical('runner received interrupt, killing process %s', process.pid)
process.kill(msg='Process was killed because the runner received an interrupt')
result = process.kill(msg='Process was killed because the runner received an interrupt')
raise gen.Return(result)

signal.signal(signal.SIGINT, kill_process)
signal.signal(signal.SIGTERM, kill_process)
# Store the current handler on the signal such that it can be restored after process has terminated
original_handler = signal.getsignal(kill_signal)
signal.signal(kill_signal, kill_process)

try:
result = process.execute()
finally:
# If the `original_handler` is set, that means the `kill_process` was bound, which needs to be reset
if original_handler:
signal.signal(signal.SIGINT, original_handler)
runner.close()

store_provenance = inputs.get('metadata', {}).get('store_provenance', True)
Expand Down
2 changes: 1 addition & 1 deletion aiida/engine/processes/process.py
Original file line number Diff line number Diff line change
Expand Up @@ -293,7 +293,7 @@ def kill(self, msg=None):

if killing:
# We are waiting for things to be killed, so return the 'gathered' future
result = plumpy.gather(result)
result = plumpy.gather(killing)

return result

Expand Down

0 comments on commit 3771945

Please sign in to comment.