"Invalid argument" error in osrm-routed causes hang #4506

Closed
xnyhps opened this issue Sep 14, 2017 · 5 comments

@xnyhps

xnyhps commented Sep 14, 2017

Hello,

I'm using osrm-routed on macOS and in the last few days I've run into a problem with it hanging, accepting new connections but never sending any HTTP response.

At first I thought it was overloaded doing some intensive queries but Activity Monitor on macOS showed the process completely idle. Interestingly, it was also showing up as having only 1 thread, while it was initially running with the default number of threads (8).

After adding some debugging code to the server code I noticed the following:

HandleAccept (https://github.com/Project-OSRM/osrm-backend/blob/master/include/server/server.hpp#L94) is called with an error (error code 22), which causes the acceptor to not accept any new connections. I'd expect the program to shut itself down after that (due to the thread->join()), but this appears to not happen.

I think HandleAccept should either be modified to ensure that the server shuts down when an accept call fails, or to log the error and continue. I'm not sure which is the best in this situation, as I can't figure out what causes this error. The message is "Invalid argument", but I have no idea to what function call an argument is invalid. If you want me to get more logging to figure out why exactly the error happens, let me know.
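The second option (log the error and keep accepting) could look roughly like the sketch below. It uses only the standard library instead of Boost.Asio so that it stays self-contained. `AcceptLoop`, `queue_accept`, `handle_accept`, and the counters are hypothetical names for illustration, not OSRM's actual code.

```cpp
#include <cassert>
#include <iostream>
#include <system_error>

// Hypothetical stand-in for the server's accept handler; not OSRM code.
struct AcceptLoop
{
    int started = 0; // connections handed off for processing
    int failed = 0;  // accept attempts that reported an error
    int pending = 0; // accept operations currently queued

    // Stands in for queuing another asynchronous accept on the acceptor.
    void queue_accept() { ++pending; }

    void handle_accept(const std::error_code &e)
    {
        --pending; // the accept that just completed
        if (!e)
        {
            ++started;
        }
        else
        {
            ++failed;
            // Log instead of silently dropping the error.
            std::cerr << "accept failed: " << e.message() << " (" << e.value() << ")\n";
        }
        // Re-arm on both paths so a single failure cannot stop the listener.
        queue_accept();
    }
};
```

The point of the design is that `queue_accept()` runs whether or not the previous accept succeeded, so there is always one accept outstanding and the worker threads always have work to wait on.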

@oxidase
Contributor

oxidase commented Sep 15, 2017

@xnyhps thanks for flagging the issue! OSRM server should at least log errors. Please could you check OSRM with dtrace (strace) to find failed syscalls?

@xnyhps
Author

xnyhps commented Sep 15, 2017

Ah, good idea. I've followed it with dtruss and I've pasted a log here: https://gist.github.com/xnyhps/c68cc55ddd8d42e662dc8242c9c5975e

Line 29 appears to be the line where the error 22 happens, in setsockopt. After that, the worker threads are shutting down.

@xnyhps
Author

xnyhps commented Sep 15, 2017

And here's one with a backtrace for the failing call:

https://gist.github.com/xnyhps/26a2bb1368e223a46cac849dccefa440

Socket option 0x1022 on macOS is SO_NOSIGPIPE. Probably called from here: https://github.com/boostorg/asio/blob/master/include/boost/asio/detail/impl/socket_ops.ipp#L121.
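For reference, setting that option by hand looks roughly like the sketch below. This is an assumption about the kind of call the linked Asio code makes, not a copy of it, and `disable_sigpipe` is a hypothetical helper name. On platforms without `SO_NOSIGPIPE` (for example Linux) the function does nothing.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: ask the kernel not to raise SIGPIPE for writes on fd.
// Returns a default (success) error_code, or the errno from setsockopt.
inline std::error_code disable_sigpipe(int fd)
{
#if defined(SO_NOSIGPIPE)
    int on = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on)) != 0)
    {
        // This is where an "Invalid argument" (EINVAL, 22) would surface.
        return std::error_code(errno, std::generic_category());
    }
#else
    (void)fd; // SO_NOSIGPIPE does not exist on this platform
#endif
    return std::error_code{};
}
```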

Searching specifically for SO_NOSIGPIPE and "Invalid argument" I found zeromq/libzmq#1442. So apparently it's possible that the connection is closed by the peer between the accept and setsockopt calls, causing this error. If that's the case, it should be perfectly fine to ignore the error and create a new socket and async_accept on that.
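A minimal sketch of the "ignore the error and accept again" idea, assuming the only errors worth ignoring are ones that mean the peer went away before the socket was fully set up. `is_transient_accept_error` is a hypothetical helper. Treating `EINVAL` as transient on macOS/FreeBSD follows the explanation above; also treating `ECONNABORTED` as transient is an extra assumption of this sketch.

```cpp
#include <cassert>
#include <system_error>

// Hypothetical helper: true if a failed accept should simply be retried.
inline bool is_transient_accept_error(const std::error_code &e)
{
    // Assumption: a peer aborting before accept() completes is harmless.
    if (e == std::errc::connection_aborted)
        return true;
#if (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__)
    // The peer may close the connection between accept and setsockopt,
    // in which case setsockopt(SO_NOSIGPIPE) can fail with EINVAL.
    if (e == std::errc::invalid_argument)
        return true;
#endif
    return false;
}
```

An accept handler could call this helper and, when it returns true, create a fresh socket and issue the next accept instead of stopping. Anything else (for example running out of file descriptors) would still be reported as a real failure.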

@oxidase
Contributor

oxidase commented Sep 18, 2017

@xnyhps thanks for the analysis! I have no OSX machine on my side to reproduce and fix the issue.
Could you please open a PR with an error code check at https://github.com/Project-OSRM/osrm-backend/blob/master/include/server/server.hpp#L96

!e
#if (defined(__MACH__) && defined(__APPLE__)) || defined(__FreeBSD__)
     // the peer can close the connection between the accept and setsockopt calls, and setsockopt can then fail with EINVAL
 ||  e == boost::asio::error::invalid_argument
#endif

@SiarheiFedartsou
Member

Stale.
