Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIF panics when producing too much input in too short of a time #72

Closed
hansihe opened this issue Jan 19, 2017 · 16 comments
Closed

NIF panics when producing too much input in too short of a time #72

hansihe opened this issue Jan 19, 2017 · 16 comments

Comments

@hansihe
Copy link
Member

hansihe commented Jan 19, 2017

Not sure if there is any way to actually fix this. This is probably because Erlang has stdout set to non-blocking mode, and a buffer gets filled up somewhere.

@scrogson
Copy link
Member

Can you paste some example code that produces this scenario?

@hansihe
Copy link
Member Author

hansihe commented Jan 19, 2017

@scrogson

pub fn print_panic<'a>(env: NifEnv<'a>, _args: &Vec<NifTerm<'a>>) -> NifResult<NifTerm<'a>> {
    for _ in 0..1000 {
        println!("testtesttesttesttesttesttesttesttesttesttesttest");
    }

    Ok(true.encode(env))
}

@scrogson
Copy link
Member

hmm...but this isn't real world use case, is it? Seems only useful for debugging purposes.

@hansihe
Copy link
Member Author

hansihe commented Jan 19, 2017

Which is a very real world use case. I have hit this issue numerous times when writing different kinds of NIFs.

@jorendorff
Copy link
Collaborator

Yeah, this has hit me more than once, just trying to debug misbehaving tests.

Rustler can provide its own print! and println! macros that work around this.

@zhaohansprt
Copy link

erlnif have some api can provide async thread spawn which can aviod such situations that going to block the beam

@jorendorff
Copy link
Collaborator

jorendorff commented Mar 31, 2017

Rustler has APIs like that too, but they don't do anything to solve this problem.

I still think just replacing print! and println! is the way to go here.

@zhaohansprt
Copy link

cool!

@evnu
Copy link
Member

evnu commented Jan 11, 2019

Using println! repeatedly is a costly operation, as a lock has to be acquired on each invocation. By acquiring the lock only once, this can possibly be worked around. The following works on my machine without a panic:

use rustler::{Encoder, NifResult};
use std::io::Write;
fn crash<'a>(env: Env<'a>, _: &[Term]) -> NifResult<Term<'a>> {
    let stdout = std::io::stdout();
    let mut lock = stdout.lock();
    for _ in 0..10000 {
        writeln!(&mut lock, "testtesttesttesttesttesttesttesttesttesttesttest");
    }

    Ok(true.encode(env))
}

Of course, this does not solve the problem, as simply increasing the iterations will still lead to panics. 100_000 iterations of the loop above result in a panic here.

@shribe
Copy link

shribe commented Jan 20, 2020

FWIW, I just hit this problem with only a few KB, (one terminal screen full or so, no loops, just individual logging statements), and it cost me a day chasing down where my panic was coming from, as I assumed I had a logic error somewhere. As we may not have good source-level debugging tools for Rust code running inside the BEAM, early development can be highly-dependent on debug messages.

That said, there is a workaround to at least avoid the crash:

use gag::Redirect;let log = OpenOptions::new()
    .truncate(true)
    .read(true)
    .create(true)
    .write(true)
    .open("/var/log/…")
    .unwrap();

  let print_redirect = Redirect::stdout(log).unwrap();
  let pd_heap = Box::new(print_redirect);
  let _ = Box::into_raw(pd_heap);

This has the awkward effect of segregating output of IO.put/inspect from the NIF output, but it's better than mysterious panics.

@evnu
Copy link
Member

evnu commented Jan 21, 2020

I can trigger it reliably with evnu@6d1ce19 if I run it through rust-gdb and an ERTS Debug VM. When running the example directly with rust 1.40 and alacritty, the panic happens very rarely here.

The panic happens here, after the output should have been written to stdout:

    if let Err(e) = result {
        panic!("failed printing to {}: {}", label, e);
    }

Alas, I haven't been able to get the values of label or e. A stacktrace can be reproduced as follows. Note that I manually add the rust source as a directory in gdb.

$ rust-gdb --args bash $ERL_TOP/bin/cerl -kernel shell_history enabled -pa $(find /usr/lib/elixir/lib/*/ebin) -elixir ansi_enabled true -noshell -s elixir start_cli -- -extra /usr/bin/mix test test/panic_on_output_test.exs
(gdb) directory /home/evnu/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust
(gdb) b rust_panic
(gdb) run
...
line = 353428
line = 353429[Switching to Thread 0x7fffb4279700 (LWP 28195)]

Thread 7 "3_scheduler" hit Breakpoint 1, rust_panic () at src/libstd/panicking.rs:513
513	        __rust_start_panic(obj as usize)
(gdb) bt
#0  rust_panic () at src/libstd/panicking.rs:513
....
#4  0x00007fff37d1a812 in std::io::stdio::print_to () at src/libstd/io/stdio.rs:792
#5  std::io::stdio::_print () at src/libstd/io/stdio.rs:802
#6  0x00007fff37cb8a9c in <rustler_test::test_panic::too_much_output as rustler::nif::Nif>::RAW_FUNC::nif_func::wrapper::{{closure}}::too_much_output (n=10000000) at rustler_tests/native/rustler_test/src/test_panic.rs:4
...
#10 0x00007fff37ca0ae7 in std::panicking::try (f=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/libstd/panicking.rs:265
#11 0x00007fff37c417a4 in std::panic::catch_unwind (f=...) at /rustc/73528e339aae0f17a15ffa49a8ac608f50c6cf14/src/libstd/panic.rs:396
#12 0x00007fff37cb891e in <rustler_test::test_panic::too_much_output as rustler::nif::Nif>::RAW_FUNC::nif_func::wrapper (env=..., args=...) at rustler_tests/native/rustler_test/src/test_panic.rs:1
#13 0x00007fff37cb8879 in <rustler_test::test_panic::too_much_output as
....

@scrogson
Copy link
Member

@evnu can you swap the use of println! with env_logger and log crates and see if you can trigger a panic? So far I've had good luck with that combination in https://github.com/scrogson/franz

@evnu
Copy link
Member

evnu commented Jan 21, 2020

I added env_logger to the test with evnu@ec6427d. I can no longer reproduce the panic with gdb:

RUST_LOG=info rust-gdb --args bash /home/evnu/tools/otp/bin/cerl -kernel shell_history enabled -pa $(find /usr/lib/elixir/lib/*/ebin) -elixir ansi_enabled true -noshell -s elixir start_cli -- -extra /usr/bin/mix test test/panic_on_output_test.exs

@evnu
Copy link
Member

evnu commented Jan 22, 2020

I tried to get some more info using lldb. While I wasn't able to break at rust_panic, I got some more info on the specific error. Maybe this can help to track down the cause.

thread '<unnamed>' panicked at 'failed printing to stdout: Resource temporarily unavailable (os error 11)', src/libstd/io/stdio.rs:792:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

I ran lldb like this:

$ echo '/home/evnu/tools/otp/bin/cerl -kernel shell_history enabled -pa $(find /usr/lib/elixir/lib/*/ebin) -elixir ansi_enabled true -noshell -s elixir start_cli -- -extra /usr/bin/mix test test/panic_on_output_test.exs' > run.sh
$ rust-lldb -- bash run.sh

Error 11 = EAGAIN:

~/tools/rustler(72-panic-output ✗) errno 11
EAGAIN 11 Resource temporarily unavailable

@evnu
Copy link
Member

evnu commented Nov 11, 2022

Closing, I think we cannot do something here.

@evnu evnu closed this as completed Nov 11, 2022
@ishitatsuyuki
Copy link
Contributor

It looks like setting stdio as nonblocking is generally considered bad practice as it not only propagates to external libraries but also affects every program sharing the tty.

Turns out it surely had caused other complaints upstream for the reason I mentioned: erlang/otp#3150. It looks like a new tty driver implementation is in the pipeline and will hopefully fix this issue in a future release: erlang/otp@bd0865f.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants