Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERL-282: escript fails when redirecting stdout to less and less terminates #3150

Closed
OTP-Maintainer opened this issue Oct 13, 2016 · 18 comments
Closed
Assignees
Labels
bug Issue is reported as a bug priority:low team:VM Assigned to OTP team VM
Milestone

Comments

@OTP-Maintainer
Copy link

Original reporter: jj1bdx
Affected version: OTP-19.1
Component: erts
Migrated from: https://bugs.erlang.org/browse/ERL-282


When executing the attached script {{lessbug.escript}} as
{{escript lessbug.escript | less}} and terminates less before displaying all the results,
BEAM crashes with a dump.
The system is macOS 10.12.0, Xcode 8.0, with OTP 19.1.3, compiled with the following options:
{{--enable-m64-build --enable-darwin-64bit --disable-hipe --enable-vm-probes --with-dynamic-trace=dtrace --disable-native-libs --enable-kernel-poll --without-odbc --enable-threads --enable-smp-support --with-wx --with-ssl=/usr/local/opt/openssl}}

This crash dump is not always reproducible: the output may not respond at all sometimes.

@OTP-Maintainer
Copy link
Author

lukas said:

Hello!

I've been unable to reproduce the error. Could you run it again and pass the flag "-init_debug", into the escript (next to the -pa) and post the printouts here.

Also if possible it would be great if sasl could be started to get some more logging, but that may cause the race to not happen any more.

@OTP-Maintainer
Copy link
Author

essen said:

I can reproduce it reliably. This is also an issue in the Erlang.mk test suite because I use `grep -q` to make sure Eunit runs. It's not just escripts, as I don't use those there, I use `erl -eval` instead, and still get the same error.

It's easier to trigger with grep, so that's what I did. If I run this:

{code}
./lessbug.escript | tee out.txt | grep -q Random
{code}

Then the out.txt file starts with this:

{code:erlang}
{progress,preloaded}
{progress,kernel_load_completed}
{progress,modules_loaded}
{start,heart}
{start,error_logger}
{start,application_controller}
{progress,init_kernel_started}
{apply,{application,load,[{application,stdlib,[{description,"ERTS  CXC 138 10"},{vsn,"3.1"},{id,[]},{modules,[array,base64,beam_lib,binary,c,calendar,dets,dets_server,dets_sup,dets_utils,dets_v8,dets_v9,dict,digraph,digraph_utils,edlin,edlin_expand,epp,eval_bits,erl_anno,erl_bits,erl_compile,erl_eval,erl_expand_records,erl_internal,erl_lint,erl_parse,erl_posix_msg,erl_pp,erl_scan,erl_tar,error_logger_file_h,error_logger_tty_h,escript,ets,file_sorter,filelib,filename,gb_trees,gb_sets,gen,gen_event,gen_fsm,gen_server,gen_statem,io,io_lib,io_lib_format,io_lib_fread,io_lib_pretty,lib,lists,log_mf_h,maps,math,ms_transform,orddict,ordsets,otp_internal,pool,proc_lib,proplists,qlc,qlc_pt,queue,rand,random,re,sets,shell,shell_default,slave,sofs,string,supervisor,supervisor_bridge,sys,timer,unicode,win32reg,zip]},{registered,[timer_server,rsh_starter,take_over_monitor,pool_master,dets]},{applications,[kernel]},{included_applications,[]},{env,[]},{maxT,infinity},{maxP,infinity}]}]}}
{progress,applications_loaded}
{apply,{application,start_boot,[kernel,permanent]}}
{apply,{application,start_boot,[stdlib,permanent]}}
{apply,{c,erlangrc,[]}}
{progress,started}
... normal output ...
{code}

And the error is basically the same as Kenji's.

When trying different Erlang versions from 17.1 to 19.1, I run into the same issue. I have it on Arch Linux, OSX and Windows. Strangely FreeBSD apparently has no issues, but that might just be a fluke. When the tests were initially written I also did not have this problem; now it's every time. This doesn't make my tests fail, it's just that the VM shuts down miserably instead of normally.

I can provide you with access to one machine where the error happens, just send me your public key.

@OTP-Maintainer
Copy link
Author

jj1bdx said:

Tried essen's script 
{code:erlang}
./lessbug.escript | tee out.txt | grep -q Random
{code}
on macOS 10.12.1, Xcode 8.1, and OTP 19.1.6, with the -init_debug option, resulted in the same out.txt as:
{code:erlang}
{progress,preloaded}
{progress,kernel_load_completed}
{progress,modules_loaded}
{start,heart}
{start,error_logger}
{start,application_controller}
{progress,init_kernel_started}
{apply,{application,load,[{application,stdlib,[{description,"ERTS  CXC 138 10"},{vsn,"3.1"},{id,[]},{modules,[array,base64,beam_lib,binary,c,calendar,dets,dets_server,dets_sup,dets_utils,dets_v8,dets_v9,dict,digraph,digraph_utils,edlin,edlin_expand,epp,eval_bits,erl_anno,erl_bits,erl_compile,erl_eval,erl_expand_records,erl_internal,erl_lint,erl_parse,erl_posix_msg,erl_pp,erl_scan,erl_tar,error_logger_file_h,error_logger_tty_h,escript,ets,file_sorter,filelib,filename,gb_trees,gb_sets,gen,gen_event,gen_fsm,gen_server,gen_statem,io,io_lib,io_lib_format,io_lib_fread,io_lib_pretty,lib,lists,log_mf_h,maps,math,ms_transform,orddict,ordsets,otp_internal,pool,proc_lib,proplists,qlc,qlc_pt,queue,rand,random,re,sets,shell,shell_default,slave,sofs,string,supervisor,supervisor_bridge,sys,timer,unicode,win32reg,zip]},{registered,[timer_server,rsh_starter,take_over_monitor,pool_master,dets]},{applications,[kernel]},{included_applications,[]},{env,[]},{maxT,infinity},{maxP,infinity}]}]}}
{progress,applications_loaded}
{apply,{application,start_boot,[kernel,permanent]}}
{apply,{application,start_boot,[stdlib,permanent]}}
{apply,{c,erlangrc,[]}}
{progress,started}
State1: {#{max => 288230376151711743,
... and the output continues ...
{code}
And the error was
{code:erlang}
2016-11-10 19:51:29 Error in process ~p with exit value:~n~p~n
		2016-11-10 19:51:29 supervisor_report
    supervisor:     errorContext:     reason:     offender: 2016-11-10 19:51:29 ** Generic server ~p terminating
** Last message in was ~p~n** When Server state == ~p~n** Reason for termination == ~n** ~p~n
				init terminating in do_boot ()

Crash dump is being written to: erl_crash.dump...done
{code}

So far that's what I've got.

@OTP-Maintainer
Copy link
Author

jj1bdx said:

Probably related article from erlang-bugs mailing list in May 2015: [Crash when piping stdout to 'head'?|http://erlang.org/pipermail/erlang-bugs/2015-May/004950.html]
Mikael Petersson writes in [the previous mailing list article here|http://erlang.org/pipermail/erlang-bugs/2015-May/004948.html]:

{quote}
I'm not so sure this is a bug.

head -20 will close the pipe after consuming 20 lines from it.  Standard Unix
behaviour is to send a SIGPIPE signal to a process that writes to a closed pipe,
and unless caught or ignored that kills the process.  It looks like the Erlang
VM ignores SIGPIPE, so the write will instead fail with EPIPE.  Presumably
something in the call chain considers that a fatal error and terminates the VM.

Unless there is documentation to the effect that the VM is supposed to handle
this use case, I don't think this is a bug.
{quote}

My thoughts on this are:

* This crashing behavior about SIGPIPE should be described somewhere in the documentation.
* I think many people are using escript as some like shell scripts, so handling the SIGPIPE in a more graceful way is nice.
* OTOH termination with generating such an error message (including ~p or ~n) is abnormal and a more legible error message is needed.

@OTP-Maintainer
Copy link
Author

cdegroot said:

ERL-396 is a bit more related to what I've seen, but as that one has been closed I'm adding this here: I had the issue described in ERL-396, and on the elixir-lang slack user "tsu" pointed me in the right direction of a bash issue (http://lists.gnu.org/archive/html/bug-bash/2017-01/msg00039.html) and a potential Erlang-side cause (not calling ttysl_stop - https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/ttsl_drv.c#L432-L448). 

To verify, I applied the patch to bash and the behaviour went away. But this does seem to be an OTP issue as well (both, I'd argue, because even if Ctrl-C etcetera is handled, Erlang can get a kill -9 and then bash should make sure stuff gets reset to a sane configuration). 

Hth.

@OTP-Maintainer
Copy link
Author

roadrunnr71 said:

Got hit by the same/similar problem. It seems that Redhat has applied a fix to their bash package (https://bugzilla.redhat.com/show_bug.cgi?id=1068697), other shell (e.g. zsh) might also reset STDIN to sane setting by default.

The position of the bash developers seems that if a programs mucks with STDIN, it is the responsibility of that program to reset it to default behavior. Who that is supposed to work with a SIGKILL or SIGSEGV i don't know.

It would be best if Erlang would not tinker with the blocking setting of STDIN

@OTP-Maintainer
Copy link
Author

ferd said:

Related issue I might have found that only triggers on escript: Unicode output gets mangled in a way a regular shell won't. I.e. in an escript, `io:format("~ts~n", [Str])`  will yield "don\x\{2019}t" instead of "don’t" as you will with just `erl` running things.

@OTP-Maintainer OTP-Maintainer added bug Issue is reported as a bug team:VM Assigned to OTP team VM priority:low labels Feb 10, 2021
@garazdawi
Copy link
Contributor

garazdawi commented Feb 26, 2021

A bit more info here: #4553

Duplicates: #4161, #3305, #3796, #3550

@garazdawi
Copy link
Contributor

I want to put down some of the reasons why I think that the issues linked above are duplicates of this.

In OTP-18 I introduced this commit that sets the stdin and stdout fds to NONBLOCKING on unix. The reason why this change was done is described in the commit message:

Without this change an stdout write could block an entire scheduler if for some reason the pseudo tty on the other side does not consume the output of the Erlang shell.

As it turns out, this change comes with a host of issues for programs using the stdin/stdout fds of Erlang.

  1. Firstly, programs that read the stdout from Erlang may get an FD as its stdin that already is in NONBLOCKING mode. Some versions of programs such as less and cat are not prepared for this and behave badly. This is what happens when you do erl | cat.

  2. Secondly, when Erlang exits without setting its stdout/stdin fds to BLOCKING, some versions of bash/sh will not reset it and all of the sudden bash starts working with a NONBLOCKING fd, which it is not prepared for. This is when you get a garbled terminal.

A partial solution to this problem could be to call sys_tty_reset (as @jktomer points out in #4553) when SIGINT is encounter. However, this is not a foolproof method as a process can be killed by SIGKILL and then the same problem would remain.

So the solution for this in my mind is to remove the NONBLOCKING behaviour from the ttsl_driver and the fd_driver. This brings us back to the original problem of what to do when we try to write to stdout and it blocks.

The best solution that I can come up with is to put all writes on dirty I/O schedulers, however, that would mean that we need to rewrite the ttsl driver to use nifs and that is a lot of work. Also the fd_driver cannot be rewritten to a nif, so we would have to create another way to handle "oldshell".

So right now I'm leaning towards creating two driver threads inside each ttysl/fd port that deals with the problems of reading/writing to blocking fds. This seems to be the solution with the least amount of work that we know will work. Unless anyone else has any better ideas?

@jktomer
Copy link
Contributor

jktomer commented Mar 1, 2021

I don't think making the file descriptor nonblocking is the issue for "tty output after interrupting with +Bd/killing any shell is hosed," rather it's the tty modes set in ttsl_sdrv.c's tty_init. (I was wrong that sys_tty_reset is the fix; I hadn't read it carefully enough).

I don't actually think setting a TTY FD to nonblocking mode changes behavior for other processes that have the same TTY open. (It will, of course, affect processes we fork off, but that doesn't seem to be a major concern for anyone.) Edit: I'm wrong about that, demonstration in https://gist.github.com/jktomer/8361b042236a5abbbafc1ce9836f22e1

@max-au
Copy link
Contributor

max-au commented Mar 9, 2021

Do I understand correctly that the discussion is about this line: https://github.com/erlang/otp/blob/master/erts/emulator/drivers/unix/ttsl_drv.c#L334

I tried removing it, and it seems to make no difference, erl +B still garbles the console after Ctrl + C (at least for Mac OS).

@jktomer
Copy link
Contributor

jktomer commented Mar 9, 2021

Setting the tty file descriptor to nonblocking is not the cause of garbled terminal output. The issue is the tcsetattr call in tty_set, specifically the fact that it is normally called with the OPOST bit cleared as set up in tty_init. The comment above helpfully notes that setting raw mode on the terminal is OK "assuming the caller will call the tty_reset() function (also in this file) when it's done with raw mode."

Setting the fd to nonblocking may cause other issues with pipelines.

@garazdawi
Copy link
Contributor

What shell and OS versions are you using to trigger the error with a garbled terminal? Also what exactly are you doing to trigger it?

@max-au
Copy link
Contributor

max-au commented Mar 9, 2021

I think this issue is actually two issues merged. Might be useful to split it.

Garbled terminal can be consistently reproduced with Mac OS X 11.2.2 and zsh (interestingly, bash seems unaffected). Running erl +B and then pressing Ctrl+C leads to undesired effect (see @ferd's screenshot). As @jktomer mentioned, it is caused with clearing OPOST bit (and not resetting it). Running stty sane command resets OPOST and recovers terminal.

SET_NONBLOCKING I think causes a different issue (crash dumps when piping BEAM output to less/head/...)

@garazdawi
Copy link
Contributor

zsh (interestingly, bash seems unaffected)

So that is why I haven't seen it. I'm still using bash on OS X and I very rarely run using +B.

@max-au
Copy link
Contributor

max-au commented Mar 10, 2021

All escripts (e.g. rebar3) are running with +B, so rebar3 shell leaves garbled TTY almost every time. I realised that it might be an easy fix though (unlike piping issue with nonblocking fd).

@paulo-ferraz-oliveira
Copy link
Contributor

If it helps, I also have this issue. I'm on:

  • Mac OS 10.12.6, running
  • OTP 24-rc1, with
  • rebar3 master, compiled with OTP 24-rc1.

I do rebar3 shell then Ctrl-C Ctrl-C > get the garbled output. I'm using zsh 5.2.

By garbled I mean stuff like

===> Verifying dependencies...
                              ===> Analyzing applications...
                                                            ===> Compiling hacienda
                                                                                   Erlang/OTP 24 [RELEASE CANDIDATE 1] [erts-12.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [systemtap]

@jktomer
Copy link
Contributor

jktomer commented Mar 10, 2021 via email

@garazdawi garazdawi added this to the OTP-26.0 milestone Apr 7, 2022
garazdawi added a commit to garazdawi/otp that referenced this issue Aug 28, 2022
This commit re-implements the entire tty driver for both Unix
and Windows to use a common nif instead of two seperate drivers.

The Unix implementation works pretty much as it did before only that
a lot more of the terminal logic has been moved from Erlang to C.

The windows implementation now uses Windows Terminal Sequences, i.e.
the same sequences as most Unixes to control the terminal. This means
that werl.exe is no longer needed and erl.exe will have the "newshell"
with all the features normally only found on Unix.

The new implementation also uses dirty I/O threads for all I/O which
means that it can leave the FDs in blocking mode. This fixes problems
when the Erlang tty is interacting with other systems such as bash.

Closes erlang#3150
Closes erlang#3390
Closes erlang#4343
garazdawi added a commit to garazdawi/otp that referenced this issue Aug 28, 2022
If the cleanup code in sys_tty_reset is never run it can leave
the terminal in a broken state. The cleanup code is not executed
if erts receives a SIGKILL or if Ctrl-C is pressed when +B is started.

Closes erlang#3150
garazdawi added a commit to garazdawi/otp that referenced this issue Aug 29, 2022
This commit re-implements the entire tty driver for both Unix
and Windows to use a common nif instead of two seperate drivers.

The Unix implementation works pretty much as it did before only that
a lot more of the terminal logic has been moved from Erlang to C.

The windows implementation now uses Windows Terminal Sequences, i.e.
the same sequences as most Unixes to control the terminal. This means
that werl.exe is no longer needed and erl.exe will have the "newshell"
with all the features normally only found on Unix.

The new implementation also uses dirty I/O threads for all I/O which
means that it can leave the FDs in blocking mode. This fixes problems
when the Erlang tty is interacting with other systems such as bash.

Closes erlang#3150
Closes erlang#3390
Closes erlang#4343
garazdawi added a commit to garazdawi/otp that referenced this issue Aug 29, 2022
If the cleanup code in sys_tty_reset is never run it can leave
the terminal in a broken state. The cleanup code is not executed
if erts receives a SIGKILL or if Ctrl-C is pressed when +B is started.

Closes erlang#3150
garazdawi added a commit to garazdawi/otp that referenced this issue Aug 29, 2022
This commit re-implements the entire tty driver for both Unix
and Windows to use a common nif instead of two seperate drivers.

The Unix implementation works pretty much as it did before only that
a lot more of the terminal logic has been moved from Erlang to C.

The windows implementation now uses Windows Terminal Sequences, i.e.
the same sequences as most Unixes to control the terminal. This means
that werl.exe is no longer needed and erl.exe will have the "newshell"
with all the features normally only found on Unix.

The new implementation also uses dirty I/O threads for all I/O which
means that it can leave the FDs in blocking mode. This fixes problems
when the Erlang tty is interacting with other systems such as bash.

Closes erlang#3150
Closes erlang#3390
Closes erlang#4343
garazdawi added a commit to garazdawi/otp that referenced this issue Aug 29, 2022
If the cleanup code in sys_tty_reset is never run it can leave
the terminal in a broken state. The cleanup code is not executed
if erts receives a SIGKILL or if Ctrl-C is pressed when +B is started.

Closes erlang#3150
garazdawi added a commit to garazdawi/otp that referenced this issue Aug 29, 2022
If the cleanup code in sys_tty_reset is never run it can leave
the terminal in a broken state. The cleanup code is not executed
if erts receives a SIGKILL or if Ctrl-C is pressed when +B is started.

Closes erlang#3150
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug priority:low team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

5 participants