This repository has been archived by the owner on Jun 11, 2022. It is now read-only.

Hangs when reading from child process on Windows #159

Closed
jake-at-work opened this issue Apr 9, 2018 · 8 comments

Comments

@jake-at-work
Contributor

The code below works well on macOS, Linux, and Solaris but hangs consistently on Windows under unusual conditions. On Windows 2012 R2 it works when the compiled program is run directly from a command console but hangs when it is run as a child process of CTest (CMake). On Windows 10 it hangs in both scenarios. In every hang, the threads are stuck in std::getline() and the child processes they are reading from have already exited.

Code:

    ipstream outStream;
    child gfsh(GFSH_EXECUTABLE, args = command, std_out > outStream);

    std::string line;
    while (gfsh.running() && std::getline(outStream, line) && !line.empty())
      BOOST_LOG_TRIVIAL(debug) << "Gfsh::execute: " << line;

    gfsh.wait();

Stack:

integration-test-2.exe!boost::winapi::ReadFile(void * hFile, void * lpBuffer, unsigned long nNumberOfBytesToWrite, unsigned long * lpNumberOfBytesWritten, boost::winapi::_OVERLAPPED * lpOverlapped) Line 552
integration-test-2.exe!boost::process::detail::windows::basic_pipe<char,std::char_traits<char> >::read(char * data, int count) Line 87
integration-test-2.exe!boost::process::basic_pipebuf<char,std::char_traits<char> >::underflow() Line 192

Having read the warnings about synchronous I/O hanging if the process has exited, I switched to asynchronous I/O using the std::future<std::string> approach. Unfortunately, it exhibits a similar issue: it hangs in ios.run():

Async:

  io_service ios;
  std::future<std::string> output;

  child gfsh(GFSH_EXECUTABLE, args = commands, std_out > output, ios);

  ios.run();
  gfsh.wait();

  BOOST_LOG_TRIVIAL(debug) << "Gfsh::execute: " << output.get();

Stack:

integration-test-2.exe!boost::asio::detail::win_iocp_io_context::do_one(unsigned long msec, boost::system::error_code & ec) Line 381
integration-test-2.exe!boost::asio::detail::win_iocp_io_context::run(boost::system::error_code & ec) Line 163
integration-test-2.exe!boost::asio::io_context::run() Line 62

One interesting thing to note is that the command we are executing is a Java process that itself spawns a child Java process before exiting. The grandchild process is long-lived. In both cases, sync and async pipes, if I kill the grandchildren, our executable continues. So in some way the pipe is blocked waiting for the grandchildren to exit, that is, the children of the child we are executing and have the pipe attached to.
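
The observation above matches how pipes generally behave: a reader sees end-of-file only once every process holding the write end has closed it. The following is a minimal POSIX-only sketch of that mechanism, not the Windows code path from this report. The function name `wait_for_eof_ms` and the timing parameter are hypothetical and exist only for illustration.

```cpp
// POSIX-only illustration (assumption: Linux or macOS); an analogue of the
// behaviour described above, not the Boost.Process Windows implementation.
#include <chrono>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns how long the parent waited for end-of-file on the pipe, in
// milliseconds, or -1 on setup failure. The direct child exits at once;
// the grandchild keeps the inherited write end open for `grandchild_ms`.
long wait_for_eof_ms(int grandchild_ms) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    const auto start = std::chrono::steady_clock::now();
    const pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        close(fds[0]);
        if (fork() == 0) {
            // Grandchild: inherited the write end and holds it while it "runs".
            std::this_thread::sleep_for(std::chrono::milliseconds(grandchild_ms));
            _exit(0);
        }
        _exit(0);  // The direct child exits immediately.
    }

    close(fds[1]);               // The parent keeps only the read end.
    waitpid(child, nullptr, 0);  // The direct child has exited at this point.

    char buf[64];
    while (read(fds[0], buf, sizeof buf) > 0) {
        // Nothing is written in this sketch; read() blocks until every
        // holder of the write end has closed it, then returns 0.
    }
    close(fds[0]);

    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

Calling `wait_for_eof_ms(300)` should take at least as long as the grandchild lives, even though the direct child has already been reaped, which is the same shape as the hang described here.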

@jake-at-work
Contributor Author

OK, a piece of information I didn't think to include may actually hold the key to the root cause. This code is called in parallel via std::async to start multiple sub-processes simultaneously. As a result, I believe we are hitting the issue documented here:
https://blogs.msdn.microsoft.com/oldnewthing/20131018-00/?p=2893

Working off that assumption, I added a mutex around the child constructor so that construction of a new sub-process is synchronized. It works!

Hack:

  child gfsh;
#if defined(_WINDOWS)
  {
    std::lock_guard<std::mutex> guard(g_child_mutex);
#endif
    gfsh = child(GFSH_EXECUTABLE, args = commands, env, std_out > outStream,
                 std_err > errStream, std_in < null);
#if defined(_WINDOWS)
  }
#endif
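
The same idea can be factored into a small helper so that every call site goes through one lock. This is a sketch under assumptions: `launch_mutex`, `launch_serialized`, and `max_overlapping_launches` are hypothetical names, and the lambda passed in stands in for the `child` constructor call, which is not reproduced here.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// One process-wide mutex. It lives in a plain function rather than inside the
// template below, so every caller shares the same lock.
std::mutex& launch_mutex() {
    static std::mutex m;
    return m;
}

// Hypothetical helper: runs `launch` while holding the process-wide mutex,
// so at most one launch is in progress at any time.
template <typename Launch>
auto launch_serialized(Launch&& launch) -> decltype(launch()) {
    std::lock_guard<std::mutex> guard(launch_mutex());
    return launch();
}

// Demonstration with a stand-in for process creation: starts `tasks`
// std::async workers and reports the highest number of launches that were
// ever in progress at the same moment.
int max_overlapping_launches(int tasks) {
    std::atomic<int> active{0};
    std::atomic<int> peak{0};
    std::vector<std::future<void>> workers;
    for (int i = 0; i < tasks; ++i) {
        workers.push_back(std::async(std::launch::async, [&] {
            launch_serialized([&] {
                const int now = ++active;
                int seen = peak.load();
                // Record a new peak if this launch overlapped more than before.
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
                // Stand-in for the time spent creating the process.
                std::this_thread::sleep_for(std::chrono::milliseconds(5));
                --active;
            });
        }));
    }
    for (auto& w : workers) w.get();
    return peak.load();
}
```

With the helper in place, a call site would wrap only the construction, for example `launch_serialized([&] { /* construct the child here */ });`, and the rest of each std::async task can still run in parallel.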

Any chance that the fixes mentioned in the attached link can be applied to the Windows implementation of boost::process to avoid this issue?

@jake-at-work
Contributor Author

Another blog article with better details of the issue. https://blogs.msdn.microsoft.com/oldnewthing/20111216-00/?p=8873

@klemens-morgenstern
Owner

klemens-morgenstern commented Apr 10, 2018

We already had a discussion about the inheritance of handles, and it's just too much work for me to implement at the moment, because it should be an optional feature. Instead of using std::async, you might want to go with a boost::asio::io_context::strand, which would solve the issue and "feels" like async.

@jake-at-work
Copy link
Contributor Author

Thanks! Could you recommend a good source of documentation on strands? The Boost docs are pretty sparse.

Are you suggesting that the creation of the child take place in a strand inside the io_context::run() loop?

@klemens-morgenstern
Owner

klemens-morgenstern commented Apr 10, 2018

Yeah, I hope the following works:

namespace asio = boost::asio;

asio::io_context ioc;
asio::io_context::strand s(ioc);

child c1, c2, c3;
// Every launch is posted to the same strand, so the launches never overlap.
s.post([&]{c1 = child(...);});
s.post([&]{c2 = child(...);});
s.post([&]{c3 = child(...);});

ioc.run();

// Here the children should be started.

I'm mentioning that because you might be running an io_context anyhow if you use the async-io. So you can just use one instance to cover all those applications and it feels somewhat like async without spawning a whole lot of threads.
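
To make the "one instance, no extra threads" point concrete without depending on Boost, here is a minimal standard-library stand-in for the pattern. It is not Boost.Asio: `SerialQueue` and `run_three_launches` are hypothetical names, the class is single-threaded only, and the posted lambdas merely record their order where a real program would construct a `child`.

```cpp
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the io_context + strand pair: handlers are queued
// by post() and executed one at a time, in posting order, by the thread that
// calls run(). Unlike a real strand, it is not safe to use from several threads.
class SerialQueue {
public:
    void post(std::function<void()> handler) {
        handlers_.push_back(std::move(handler));
    }

    // Runs queued handlers until none are left, including any handlers that
    // were posted by a running handler.
    void run() {
        while (!handlers_.empty()) {
            auto handler = std::move(handlers_.front());
            handlers_.pop_front();
            handler();
        }
    }

private:
    std::deque<std::function<void()>> handlers_;
};

// Stand-in for three child launches: each handler records its id, so the
// returned vector shows the order in which the "launches" ran.
std::vector<int> run_three_launches() {
    SerialQueue queue;
    std::vector<int> order;
    for (int id = 1; id <= 3; ++id) {
        queue.post([&order, id] { order.push_back(id); });
    }
    queue.run();  // Nothing runs until here, and never two handlers at once.
    return order;
}
```

Because every handler runs inside `run()` on the calling thread, two launches cannot be in progress at the same time, which is the property the mutex workaround enforces by hand.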

@klemens-morgenstern
Owner

I have some applications, but they are in proprietary code, so I can't really share them. If you need some consulting or development, feel free to send me an email.

@klemens-morgenstern
Owner

Were you able to resolve the issue?

@klemens-morgenstern
Owner

Closing due to inactivity; feel free to reopen or create another issue if you run into problems.
