-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HAproxy health-check causes node shutdown #54
Conversation
It's worth noting that this was not an issue on 1.4.x because we didn't have the security features that require checking the peername, so the process would simply shutdown normally at a time after the |
This should address #54, where the socket is invalid even before the server receives control of it.
^^ @cipy |
I tested two haproxy configs and I only see this issue in pre11 when Saying that, having tested the configuration @cipy is using (including nolinger) the pre11 crashes as reported, and this branch stopped the crashing behaviour. So +1 [0] http://cbonte.github.io/haproxy-dconv/configuration-1.4.html#option%20nolinger |
Thanks @roooms. Can I get an Eng +1? |
%% The call to set_socket should reply ok, but shutdown the | ||
%% server, not crash and propagate back to the listener process. | ||
?assertEqual(ok, set_socket(Server, ServerSocket)), | ||
?assertNot(erlang:is_process_alive(Server)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks racy. How about setting a monitor on the server, and expect a 'DOWN' message within a given timeout instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
Quick note: if you were to run the eunit test on the old code, it is skipped because of way the crash happens, not marked as failed. |
|
||
%% Pretend that we're a listener, listen on any port | ||
|
||
{ok, Server} = start_link(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So probably a good idea to start without linking to the test process here to really verify what happens to it.
The test looks much better, and now fails with the old code, passes with the fixes. I did not test the HAProxy configuration myself, as Dan has already and know nothing about HAProxy. I hope between my 👍 and his it's enough. 💃 ⛵ |
Thanks @engelsanchez @roooms @cipy |
HAproxy health-check causes node shutdown
HAproxy uses a naïve health-check mechanism that simply opens a TCP socket and then closes it immediately after it is accepted by the server. This causes the call to
peername/1
inriak_api_pb_server
to return{error, enotconn}
instead of the expected{ok, Peername}
. The resulting badmatch propagates to the listener process (it was waiting on a reply viagen_fsm:sync_send_event
), which then crashes the application supervisor after too many restarts.Using this example config: