Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merging stdout/stderr cleanly #43

Closed
dch opened this issue Jun 19, 2024 · 5 comments · Fixed by #46
Closed

merging stdout/stderr cleanly #43

dch opened this issue Jun 19, 2024 · 5 comments · Fixed by #46

Comments

@dch
Copy link
Contributor

dch commented Jun 19, 2024

Note the difference between these 2 shell scripts:

  • this is what somebody familiar with pipes and sockets would expect:
script = "for s in $(seq 1 10); do echo stdout $s; echo stderr $s >&2; done"
cmd = ["sh", "-c", script]
Exile.stream!(cmd, stderr: :consume, ignore_epipe: true) |> Enum.to_list()

[
  stdout: "stdout 1\nstdout 2\nstdout 3\nstdout 4\nstdout 5\nstdout 6\n",
  stdout: "stdout 7\nstdout 8\nstdout 9\nstdout 10\n",
  stderr: "stderr 1\nstderr 2\nstderr 3\nstderr 4\nstderr 5\nstderr 6\nstderr 7\nstderr 8\nstderr 9\nstderr 10\n"
]
  • this is what a naive user would expect in output
script = "for s in $(seq 1 10); do echo stdout $s; echo stderr $s >&2; done 2>&1"
cmd = ["sh", "-c", script]
Exile.stream!(cmd, ignore_epipe: true) |> Enum.to_list() |> IO.puts()

stdout 1
stderr 1
stdout 2
stderr 2
stdout 3
stderr 3
stdout 4
stderr 4
stdout 5
stderr 5
stdout 6
stderr 6
stdout 7
stderr 7
stdout 8
stderr 8
stdout 9
stderr 9
stdout 10
stderr 10

Evidently this is all related to buffering inside the kernel, depending on how fast
the NIF handles data before the next read, and there may not be a lot to work
around this.

Is it possible to add a further stderr mode that handles the redirection already? stderr: :merge or similar, such that its not necessary to wrap a shell invocation around?

Or are there other options to handle this?

@akash-akya
Copy link
Owner

akash-akya commented Jun 19, 2024

@dch you mean just getting stdout & stderr data in a single stream? Something like this?

script = "for s in $(seq 1 10); do echo stdout $s; echo stderr $s >&2; done"
cmd = ["sh", "-c", script]

Exile.stream!(cmd, stderr: :consume, ignore_epipe: true) 
|> Stream.map(fn  {_io_stream, data} -> data end)
|> Enum.to_list()
|> IO.puts()

Producing output

stdout 1
stdout 2
stdout 3
stdout 4
stdout 5
stdout 6
stderr 1
stderr 2
stderr 3
stderr 4
stderr 5
stderr 6
stdout 7
stdout 8
stdout 9
stdout 10
stderr 7
stderr 8
stderr 9
stderr 10

Or like preserving the natural write ordering as in the second example?

Preserving the order is interesting, I initially skipped because merging stdio streams can cause lot of problems when binary data is involved. But I guess it makes sense to support it via explicit flag for those who know what they are doing.

@dch
Copy link
Contributor Author

dch commented Jun 20, 2024

I will try to come up with a (short) explanation of the risks involved here, but yes if it is possible to use the same file descriptor/handle inside the NIF for this case, then at least text-based tools that alternate output between stderr/stdout can work as expected, without needing to wrap the shell.

For threaded, forked processes, or raw binary data this will obviously produce garbage, but this is also likely to be relatively obvious during even basic tests.

@akash-akya
Copy link
Owner

akash-akya commented Jun 21, 2024

Yeah, that makes sense.
I added a basic support in dev branch. Just need pass stderr: :redirect_to_stdout option. I have to test few more things before merge. But If possible please try it out and share your feedback.
Thanks

@dch
Copy link
Contributor Author

dch commented Jul 8, 2024

this works very nicely, thanks. How is this for a sample explanation?

Using redirect_to_stdout

On many systems, stdout and stderr are separated, and between
the source programme, via the kernel, to Exile, are several places
that may buffer data, even temporarily, before Exile is ready
to read them. There is no enforced ordering of the readiness of
these independent buffers, for Exile to make use of.

This can result in unexpected behaviour, including:

  • mangled data, for example, UTF-8 characters may be incomplete
    until an additional buffered segment is released on the same
    source
  • raw data, where binary data sent on one source, is incompatible
    with data sent on the other source
  • interleaved data, where what appears to be synchronous, is not

Most well-behaved command-line programmes are unlikely to exhibit
this, but you need to be aware of the risk.

A good example is streaming JSON from an external tool to Exile,
where normal JSON output is expected on stdout, and errors
or warnings via stderr. In the case of an unexpected error,
the stdout stream could be incomplete, or the stderr message
might arrive before the closing data on the stdout stream.

@akash-akya
Copy link
Owner

Hey @dch, that's much better than I could have come up with!
Thanks for checking the changes and explanation 👍🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants