Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Handle SIGUSR1 even under +B i / +B d #4553

Merged
merged 2 commits into from
Mar 16, 2021
Merged

Conversation

jktomer
Copy link
Contributor

@jktomer jktomer commented Feb 25, 2021

The OTP documentation states that SIGUSR1 causes a crash dump unconditionally.
However, under +B i or +B d, the call to install the SIGUSR1 handler was not
made before this change, causing SIGUSR1 to kill process immediately instead.

Fixes #4552.

@CLAassistant
Copy link

CLAassistant commented Feb 25, 2021

CLA assistant check
All committers have signed the CLA.

@ferd
Copy link
Contributor

ferd commented Feb 26, 2021

This might be the fix behind the shell bugs we see with escripts and rebar3 when exiting the instance.

@jktomer jktomer changed the base branch from master to maint February 26, 2021 02:38
@garazdawi
Copy link
Contributor

This might be the fix behind the shell bugs we see with escripts and rebar3 when exiting the instance.

Shell bug?

@ferd
Copy link
Contributor

ferd commented Feb 26, 2021

This might be the fix behind the shell bugs we see with escripts and rebar3 when exiting the instance.

Shell bug?

I opened it as https://bugs.erlang.org/browse/ERL-1162 and it was closed as a duplicate.

When starting erl normally, once can exit the shell without bad consequences on the output when exiting with CTRL+C:

$ erl
Erlang/OTP 22 [erts-10.5.3] [source] [64-bit] [smp:8:4] [ds:8:4:10] [async-threads:1] [hipe]
Eshell V10.5.3 (abort with ^G)
1>
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
^C% λ ~ → ls
<regular output>

However, when the +B option is passed, the same CTRL+C exit causes the resulting listing to be broken, and generally the terminal looks like it has no concept of its own size anymore:

$ erl +B
Erlang/OTP 22 [erts-10.5.3] [source] [64-bit] [smp:8:4] [ds:8:4:10] [async-threads:1] [hipe]
Eshell V10.5.3 (abort with ^G)
1>
$ ls
<broken output>

I've attached the broken output as an image:
image

This problem is made a bit worse because escripts mandate the +B argument to work, so any interactive shell (i.e. rebar3 shell) gets broken output automatically

@garazdawi
Copy link
Contributor

I opened it as https://bugs.erlang.org/browse/ERL-1162 and it was closed as a duplicate.

Ah, that good oldie. I've linked in all the info in the github issue tracking this.

@jktomer
Copy link
Contributor Author

jktomer commented Feb 26, 2021

Not restoring terminal state on Ctrl-C with +B d is related, but not identical, to this problem and the one described in #3150. This commit handles SIGUSR1 only; for +B d, we would want to handle SIGINT with a handler to run (at least) the tty cleanup code in sys_tty_reset before exiting. This shouldn't present a compatibility problem with existing Erlang code using os:set_signal because any call to os:set_signal would replace whatever signal handler is set up at init time.

@max-au
Copy link
Contributor

max-au commented Feb 27, 2021

I can confirm that this change does not fix garbled terminal after exiting rebar3 shell with Ctrl+C.

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Mar 1, 2021
@rickard-green rickard-green self-assigned this Mar 1, 2021
@jktomer
Copy link
Contributor Author

jktomer commented Mar 9, 2021

Gentle ping: any objection to merging this? I'm having a hard time envisioning a system that depends on SIGUSR1 hard-killing BEAM (note that any action by user code to change handling of SIGUSR1, in particular any call to os:set_signal/2, or any NIF or port driver that installs a signal handler after early BEAM initialization, would render this change moot, so only programs that did nothing with respect to SIGUSR1 would find themselves affected by this change, and the effect will always be that SIGUSR1 triggers a crashdump and orderly emulator exit rather than brutally killing the process as SIGKILL would).

@rickard-green
Copy link
Contributor

Currently a bit short on time. Will have a look at it at the end of the week.

@rickard-green
Copy link
Contributor

Short version: I'm for this, but into the master branch instead of into the maint branch. That is, not until OTP 24.

SIGUSR1 has always (at least since the 90ies) been lumped together with SIGINT and SIGQUIT in the "break handling" and restored to their default actions if +B (or +Bd) was passed as argument at startup. I don't see what SIGUSR1 has to do with "break handlng", though. I guess that someone wanted a way to restore all signal actions to their defaults that could be restored to their defaults without breaking the system.

+Bi was introduced in order to prevent key combinations (Ctrl-C, Ctrl-\, Ctrl-Z causing signals being sent to the emulator) resulting in the emulator being suspended, halted, etc. The SIGUSR1 probably by mistake got set to default action, when +Bi is passed, just because it had always been lumped together with SIGINT and SIGQUIT in the initialization of "break handling". I don't see any reason why the behavior of SIGUSR1 should change when +Bi is passed.

I don't think we should introduce this change in OTP 23 since it changes a behavior that has been there for a very long time, but I think we should change it in OTP 24 since current behavior makes no sense.

@rickard-green rickard-green changed the base branch from maint to master March 11, 2021 14:56
@rickard-green rickard-green added the testing currently being tested, tag is used by OTP internal CI label Mar 11, 2021
jktomer added 2 commits March 13, 2021 10:58
The OTP documentation states that SIGUSR1 causes a crash dump unconditionally.
However, under +B i or +B d, the call to install the SIGUSR1 handler was not
made before this change, causing SIGUSR1 to kill process immediately instead.
With this change to the test but not the fix in the previous commit, running
os_signal_SUITE with emulator arg +Bd or +Bi will fail (BEAM will die with a
signal, which is not the best way to fail a test, but we don't have a shell
test suite to catch it).

To confirm, you can use ts, with
`ts:run(emulator, os_signal_SUITE, [{vars, [{erl_start_args, "+Bd"}]}]).`.
@jktomer
Copy link
Contributor Author

jktomer commented Mar 13, 2021

Thanks for the review! Rebased onto master.

@rickard-green rickard-green merged commit 98f6d7e into erlang:master Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SIGUSR1 kills beam immediately under +B i/d
6 participants