NIF panics when producing too much input in too short of a time #72

hansihe · 2017-01-19T09:36:43Z

Not sure if there is any way to actually fix this. This is probably because Erlang has stdout set to non-blocking mode, and a buffer gets filled up somewhere.

scrogson · 2017-01-19T13:01:10Z

Can you paste some example code that produces this scenario?

hansihe · 2017-01-19T13:10:45Z

@scrogson

pub fn print_panic<'a>(env: NifEnv<'a>, _args: &Vec<NifTerm<'a>>) -> NifResult<NifTerm<'a>> {
    for _ in 0..1000 {
        println!("testtesttesttesttesttesttesttesttesttesttesttest");
    }

    Ok(true.encode(env))
}

scrogson · 2017-01-19T13:12:36Z

hmm...but this isn't real world use case, is it? Seems only useful for debugging purposes.

hansihe · 2017-01-19T13:16:55Z

Which is a very real world use case. I have hit this issue numerous times when writing different kinds of NIFs.

jorendorff · 2017-01-26T23:30:39Z

Yeah, this has hit me more than once, just trying to debug misbehaving tests.

Rustler can provide its own print! and println! macros that work around this.

zhaohansprt · 2017-03-31T02:30:49Z

erlnif have some api can provide async thread spawn which can aviod such situations that going to block the beam

jorendorff · 2017-03-31T02:44:33Z

Rustler has APIs like that too, but they don't do anything to solve this problem.

I still think just replacing print! and println! is the way to go here.

zhaohansprt · 2017-03-31T02:48:26Z

cool!

evnu · 2019-01-11T10:36:33Z

Using println! repeatedly is a costly operation, as a lock has to be acquired on each invocation. By acquiring the lock only once, this can possibly be worked around. The following works on my machine without a panic:

use rustler::{Encoder, NifResult};
use std::io::Write;
fn crash<'a>(env: Env<'a>, _: &[Term]) -> NifResult<Term<'a>> {
    let stdout = std::io::stdout();
    let mut lock = stdout.lock();
    for _ in 0..10000 {
        writeln!(&mut lock, "testtesttesttesttesttesttesttesttesttesttesttest");
    }

    Ok(true.encode(env))
}

Of course, this does not solve the problem, as simply increasing the iterations will still lead to panics. 100_000 iterations of the loop above result in a panic here.

shribe · 2020-01-20T17:17:22Z

FWIW, I just hit this problem with only a few KB, (one terminal screen full or so, no loops, just individual logging statements), and it cost me a day chasing down where my panic was coming from, as I assumed I had a logic error somewhere. As we may not have good source-level debugging tools for Rust code running inside the BEAM, early development can be highly-dependent on debug messages.

That said, there is a workaround to at least avoid the crash:

use gag::Redirect;
…
  let log = OpenOptions::new()
    .truncate(true)
    .read(true)
    .create(true)
    .write(true)
    .open("/var/log/…")
    .unwrap();

  let print_redirect = Redirect::stdout(log).unwrap();
  let pd_heap = Box::new(print_redirect);
  let _ = Box::into_raw(pd_heap);

This has the awkward effect of segregating output of IO.put/inspect from the NIF output, but it's better than mysterious panics.

evnu · 2020-01-21T21:01:52Z

I can trigger it reliably with evnu@6d1ce19 if I run it through rust-gdb and an ERTS Debug VM. When running the example directly with rust 1.40 and alacritty, the panic happens very rarely here.

The panic happens here, after the output should have been written to stdout:

    if let Err(e) = result {
        panic!("failed printing to {}: {}", label, e);
    }

Alas, I haven't been able to get the values of label or e. A stacktrace can be reproduced as follows. Note that I manually add the rust source as a directory in gdb.

$ rust-gdb --args bash $ERL_TOP/bin/cerl -kernel shell_history enabled -pa $(find /usr/lib/elixir/lib/*/ebin) -elixir ansi_enabled true -noshell -s elixir start_cli -- -extra /usr/bin/mix test test/panic_on_output_test.exs
(gdb) directory /home/evnu/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust
(gdb) b rust_panic
(gdb) run
...
line = 353428
line = 353429[Switching to Thread 0x7fffb4279700 (LWP 28195)]

Thread 7 "3_scheduler" hit Breakpoint 1, rust_panic () at src/libstd/panicking.rs:513
513	        __rust_start_panic(obj as usize)
(gdb) bt
#0  rust_panic () at src/libstd/panicking.rs:513
....
#4  0x00007fff37d1a812 in std::io::stdio::print_to () at src/libstd/io/stdio.rs:792
#5  std::io::stdio::_print () at src/libstd/io/stdio.rs:802
#6  0x00007fff37cb8a9c in <rustler_test::test_panic::too_much_output as rustler::nif::Nif>::RAW_FUNC::nif_func::wrapper::{{closure}}::too_much_output (n=10000000) at rustler_tests/native/rustler_test/src/test_panic.rs:4
...
#10 0x00007fff37ca0ae7 in std::panicking::try (f=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/libstd/panicking.rs:265
#11 0x00007fff37c417a4 in std::panic::catch_unwind (f=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/libstd/panic.rs:396
#12 0x00007fff37cb891e in <rustler_test::test_panic::too_much_output as rustler::nif::Nif>::RAW_FUNC::nif_func::wrapper (env=..., args=...) at rustler_tests/native/rustler_test/src/test_panic.rs:1
#13 0x00007fff37cb8879 in <rustler_test::test_panic::too_much_output as
....

scrogson · 2020-01-21T21:49:08Z

@evnu can you swap the use of println! with env_logger and log crates and see if you can trigger a panic? So far I've had good luck with that combination in https://github.com/scrogson/franz

evnu · 2020-01-21T21:58:45Z

I added env_logger to the test with evnu@ec6427d. I can no longer reproduce the panic with gdb:

RUST_LOG=info rust-gdb --args bash /home/evnu/tools/otp/bin/cerl -kernel shell_history enabled -pa $(find /usr/lib/elixir/lib/*/ebin) -elixir ansi_enabled true -noshell -s elixir start_cli -- -extra /usr/bin/mix test test/panic_on_output_test.exs

evnu · 2020-01-22T22:43:40Z

I tried to get some more info using lldb. While I wasn't able to break at rust_panic, I got some more info on the specific error. Maybe this can help to track down the cause.

thread '<unnamed>' panicked at 'failed printing to stdout: Resource temporarily unavailable (os error 11)', src/libstd/io/stdio.rs:792:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

I ran lldb like this:

$ echo '/home/evnu/tools/otp/bin/cerl -kernel shell_history enabled -pa $(find /usr/lib/elixir/lib/*/ebin) -elixir ansi_enabled true -noshell -s elixir start_cli -- -extra /usr/bin/mix test test/panic_on_output_test.exs' > run.sh
$ rust-lldb -- bash run.sh

Error 11 = EAGAIN:

~/tools/rustler(72-panic-output ✗) errno 11
EAGAIN 11 Resource temporarily unavailable

evnu · 2022-11-11T11:00:23Z

Closing, I think we cannot do something here.

ishitatsuyuki · 2022-12-03T07:19:13Z

It looks like setting stdio as nonblocking is generally considered bad practice as it not only propagates to external libraries but also affects every program sharing the tty.

Turns out it surely had caused other complaints upstream for the reason I mentioned: erlang/otp#3150. It looks like a new tty driver implementation is in the pipeline and will hopefully fix this issue in a future release: erlang/otp@bd0865f.

hansihe mentioned this issue Nov 3, 2020

println! with iex -S mix produces poorly formatted output #335

Open

evnu closed this as completed Nov 11, 2022

ishitatsuyuki mentioned this issue Dec 5, 2022

nif_panicked does not show the panic message #504

Closed

hansihe mentioned this issue Apr 13, 2023

libexplorer-v0.5.6-nif-2.16-aarch64-apple-darwin.so nif panic elixir-explorer/explorer#573

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NIF panics when producing too much input in too short of a time #72

NIF panics when producing too much input in too short of a time #72

hansihe commented Jan 19, 2017

scrogson commented Jan 19, 2017

hansihe commented Jan 19, 2017

scrogson commented Jan 19, 2017

hansihe commented Jan 19, 2017

jorendorff commented Jan 26, 2017

zhaohansprt commented Mar 31, 2017

jorendorff commented Mar 31, 2017 •

edited

Loading

zhaohansprt commented Mar 31, 2017

evnu commented Jan 11, 2019

shribe commented Jan 20, 2020

evnu commented Jan 21, 2020

scrogson commented Jan 21, 2020

evnu commented Jan 21, 2020 •

edited

Loading

evnu commented Jan 22, 2020 •

edited

Loading

evnu commented Nov 11, 2022

ishitatsuyuki commented Dec 3, 2022

NIF panics when producing too much input in too short of a time #72

NIF panics when producing too much input in too short of a time #72

Comments

hansihe commented Jan 19, 2017

scrogson commented Jan 19, 2017

hansihe commented Jan 19, 2017

scrogson commented Jan 19, 2017

hansihe commented Jan 19, 2017

jorendorff commented Jan 26, 2017

zhaohansprt commented Mar 31, 2017

jorendorff commented Mar 31, 2017 • edited Loading

zhaohansprt commented Mar 31, 2017

evnu commented Jan 11, 2019

shribe commented Jan 20, 2020

evnu commented Jan 21, 2020

scrogson commented Jan 21, 2020

evnu commented Jan 21, 2020 • edited Loading

evnu commented Jan 22, 2020 • edited Loading

evnu commented Nov 11, 2022

ishitatsuyuki commented Dec 3, 2022

jorendorff commented Mar 31, 2017 •

edited

Loading

evnu commented Jan 21, 2020 •

edited

Loading

evnu commented Jan 22, 2020 •

edited

Loading