Fix bug in process function interrupt handlers causing `verdi daemon stop` to timeout #2966

sphuber · 2019-06-05T12:30:29Z

The issue is addressed in two separate commits, because the source of the issue was two-fold. Although this solves this particular issue and makes the handling of interrupts in local runners more robust, a more general problem was laid bare as documented in issue #2965 .
Unfortunately, I haven't found a way to add unit tests for the issue that this PR solves.

Recently, signal handlers were added to catch interrupt signal to kill active processes that were `run` in a local interpreter. Since process functions create their own runner, separate handlers had to be added there. However, these handlers were than also attached when the functions are run in a daemon runner. Shutting the daemon down would then trigger these signal handlers and the process functions would be killed.

coveralls · 2019-06-05T12:54:04Z

Coverage decreased (-0.04%) to 72.913% when pulling 705d0fb on sphuber:fix_2963_runner_kill_process_signal into 5ff132e on aiidateam:develop.

ltalirz

thanks @sphuber !
just a few requests for clarification

ltalirz · 2019-06-05T13:19:33Z

aiida/engine/processes/functions.py

            the same runner and it would be blocking the event loop from continuing.

            :param args: input arguments to construct the FunctionProcess
            :param kwargs: input keyword arguments to construct the FunctionProcess
            :return: tuple of the outputs of the process and the process node pk
            :rtype: (dict, int)
            """
+            current_runner = get_manager().get_runner()


maybe move this line directly before the if not ...

ltalirz · 2019-06-05T13:24:22Z

aiida/engine/processes/functions.py

-                """Send the kill signal to the process in the current scope."""
-                LOGGER.critical('runner received interrupt, killing process %s', process.pid)
-                process.kill(msg='Process was killed because the runner received an interrupt')
+            # Only add handlers for interrupt signal to kill the current process if we are not in a daemon runner


maybe add what the alternative is (i.e. if you are not in a daemon runner, what type of runner are you in then?).

ltalirz · 2019-06-05T13:25:25Z

aiida/engine/processes/functions.py

+            # Only add handlers for interrupt signal to kill the current process if we are not in a daemon runner
+            # Without this check, running process functions in a daemon worker would be killed if the daemon is shutdown
+            original_handler = None
+            original_signal = signal.SIGINT


original_signal somehow suggests there will be a new signal as well, but there isn't right?
I guess you could call it kill_signal...

same for the handler

Good point for the signal, I will change it. However, I am not sure if the change for the handler would be good. This stems from the somewhat weird way of unbinding a signal in python. You cannot actually unbind it as such, you simply need to call signal.signal(signal, handler) again and here then handler needs to be what it was before. That is why I store that handle in the original_handler before overriding it, and then later you restore it. I will add a comment where I save the handler

The `run_get_node` method of process functions attaches a `kill_process` handler to interrupt signals, such that when the funtion process is interrupted during execution, it is properly cleaned up, for example by setting the process state on the node to `Killed`. These handlers were, however, not properly unbound after the process was terminated and so would be triggered even after the process instance was no longer alive. This would cause the `kill_process` handler to hang. In the case of a daemon runner, this would prevent the python process from shutting down. This manifested itself in `verdi daemon stop` timing out if one of the daemon runners had run just a single process function. Note that the previous commit already fixed this problem in a different way, by not even attaching the interrupt handlers for process function in the first place. But the reasoning for that is different.

ltalirz

Thanks @sphuber
this is as far as I can review this ;-) not saying I follow everything here in detail...

sphuber · 2019-06-05T19:18:50Z

Thanks for this and all the other thoughtful reviews Leo

sphuber requested a review from ltalirz June 5, 2019 12:30

ltalirz suggested changes Jun 5, 2019

View reviewed changes

sphuber force-pushed the fix_2963_runner_kill_process_signal branch from f291df6 to 705d0fb Compare June 5, 2019 14:59

ltalirz approved these changes Jun 5, 2019

View reviewed changes

sphuber merged commit 3771945 into aiidateam:develop Jun 5, 2019

sphuber deleted the fix_2963_runner_kill_process_signal branch June 5, 2019 19:18

sphuber mentioned this pull request Jun 5, 2019

Stopping the daemon always seems to hit a timeout #2963

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix bug in process function interrupt handlers causing `verdi daemon stop` to timeout #2966

Fix bug in process function interrupt handlers causing `verdi daemon stop` to timeout #2966

sphuber commented Jun 5, 2019 •

edited

Loading

coveralls commented Jun 5, 2019 •

edited

Loading

ltalirz left a comment

ltalirz Jun 5, 2019

sphuber Jun 5, 2019

ltalirz Jun 5, 2019

ltalirz Jun 5, 2019

ltalirz Jun 5, 2019

sphuber Jun 5, 2019 •

edited

Loading

ltalirz left a comment

sphuber commented Jun 5, 2019

Fix bug in process function interrupt handlers causing verdi daemon stop to timeout #2966

Fix bug in process function interrupt handlers causing verdi daemon stop to timeout #2966

Conversation

sphuber commented Jun 5, 2019 • edited Loading

coveralls commented Jun 5, 2019 • edited Loading

ltalirz left a comment

Choose a reason for hiding this comment

ltalirz Jun 5, 2019

Choose a reason for hiding this comment

sphuber Jun 5, 2019

Choose a reason for hiding this comment

ltalirz Jun 5, 2019

Choose a reason for hiding this comment

ltalirz Jun 5, 2019

Choose a reason for hiding this comment

ltalirz Jun 5, 2019

Choose a reason for hiding this comment

sphuber Jun 5, 2019 • edited Loading

Choose a reason for hiding this comment

ltalirz left a comment

Choose a reason for hiding this comment

sphuber commented Jun 5, 2019

Fix bug in process function interrupt handlers causing `verdi daemon stop` to timeout #2966

Fix bug in process function interrupt handlers causing `verdi daemon stop` to timeout #2966

sphuber commented Jun 5, 2019 •

edited

Loading

coveralls commented Jun 5, 2019 •

edited

Loading

sphuber Jun 5, 2019 •

edited

Loading