improve subscription reliability #90

Open
rixmann opened this issue Mar 21, 2016 · 5 comments
@rixmann
Member

rixmann commented Mar 21, 2016

When replying asynchronously from a hello server, the server context can disappear (e.g. due to a network failure).
For subscriptions to work reliably, this mechanism has to become more robust.

Currently, on the server side, the following error is observed when trying to push a message to a client whose server context was removed by hello:

Mar 09 17:53:02 vlx129-tpb pcs[186]: ** (MatchError) no match of right hand side value: :invalid_identity
Mar 09 17:53:02 vlx129-tpb pcs[186]: 17:53:02.749 [error] GenServer #PID<0.1693.0> terminating
Mar 09 17:53:02 vlx129-tpb pcs[186]: Supervisor hello_listener_supervisor had child {hello_zmq_listener,{ex_uri,"zmq-tcp",
                                                                 {ex_uri_authority,undefined,"172.30.3.10",27001},
                                                                 [],undefined,undefined}} started with hello_zmq_listener:start_link({ex_uri,"zmq-tcp",{ex_uri_authority,undefined,"172.30.3.10",27001},[],undefined,undefined}) at <0.1693.0> exit with reason no match
Mar 09 17:53:02 vlx129-tpb pcs[186]: CRASH REPORT Process <0.1693.0> with 0 neighbours exited with reason: no match of right hand value invalid_identity in hello_zmq_listener:handle_info/2 line 99 in gen_server:terminate/7 line 804
Mar 09 17:53:02 vlx129-tpb pcs[186]: gen_server <0.1693.0> terminated with reason: no match of right hand value invalid_identity in hello_zmq_listener:handle_info/2 line 99

In case of a network failure, it would be ideal if hello could cache the messages to be sent and only discard the context once the connection has been dead for a long time (to be determined by heartbeats?).
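
A minimal sketch of that idea, in Python for illustration only (hello itself is Erlang, and none of these names exist in hello). It assumes a hypothetical per-client `BufferedContext` that queues pushes while the link is down and is only considered dead once no heartbeat has arrived for `dead_after` seconds; the buffer size and timeout values are placeholders.

```python
import time
from collections import deque


class BufferedContext:
    """Hypothetical per-client context that buffers pushes while the link is down."""

    def __init__(self, dead_after=30.0, max_pending=1000, clock=time.monotonic):
        self.dead_after = dead_after              # seconds without a heartbeat before giving up
        self.pending = deque(maxlen=max_pending)  # oldest messages are dropped when full
        self.clock = clock                        # injectable clock, handy for testing
        self.last_heartbeat = clock()
        self.connected = True

    def heartbeat(self):
        """Record a heartbeat from the client; the link counts as up again."""
        self.last_heartbeat = self.clock()
        self.connected = True

    def mark_disconnected(self):
        """Called when a send fails or the transport reports a drop."""
        self.connected = False

    def is_dead(self):
        """True once the context may be discarded for good."""
        return self.clock() - self.last_heartbeat > self.dead_after

    def push(self, message, send):
        """Send immediately if connected, otherwise keep the message for later."""
        if self.connected:
            send(message)
        else:
            self.pending.append(message)

    def flush(self, send):
        """Deliver buffered messages in order after a reconnect."""
        while self.pending:
            send(self.pending.popleft())
```

The server would call `flush` after a heartbeat arrives on a previously disconnected context and drop the context only when `is_dead()` returns true.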

The client must also detect the dead connection after a similar timeout and then crash the hello client, which is then restarted by a supervisor.
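
The client side could be sketched roughly like this, again in Python purely for illustration. `HeartbeatWatchdog` and `ConnectionDead` are hypothetical names, not part of hello; the exception stands in for "crash and let the supervisor restart the client".

```python
import time


class ConnectionDead(Exception):
    """Raised so that a supervising caller can restart the client."""


class HeartbeatWatchdog:
    """Hypothetical client-side check for a silent server."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence tolerated
        self.clock = clock          # injectable clock, handy for testing
        self.last_seen = clock()

    def beat(self):
        """Call whenever a heartbeat or any other message arrives."""
        self.last_seen = self.clock()

    def check(self):
        """Call periodically; raises once the server has been silent too long."""
        silent_for = self.clock() - self.last_seen
        if silent_for > self.timeout:
            raise ConnectionDead(f"no heartbeat for {silent_for:.1f}s")
```

Using the same timeout on both sides keeps client and server in agreement about when a connection counts as dead.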

@surik
Contributor

surik commented Mar 21, 2016

Here is a fix for this crash: bb7943c. Please update hello.

I will look further into reconnecting the client after the context has disappeared.

@surik surik self-assigned this Mar 21, 2016
@thz
Contributor

thz commented Mar 21, 2016

But that just turns the crash into an error log entry without touching the underlying cause.

@surik
Contributor

surik commented Mar 21, 2016

@thz yes, I just noticed that the hello used here isn't the latest version.

@hwinkel
Member

hwinkel commented Mar 21, 2016

Hmm. Wasn't such robust behavior one of the reasons to have something like hello? The underlying tools like ZeroMQ, Erlang distribution, or lately HTTP/2 already give you the building blocks. E.g. ZeroMQ would reconnect sockets after a TCP failure, and Erlang node communication can also be configured with more aggressive connectivity checks. And as far as I know, someone has implemented hello echoes above the transport for exactly this reason.

@surik
Contributor

surik commented Mar 22, 2016

Hello doesn't have any subscription mechanism. There is just a notify method, which allows sending a message from the server to a client. In this case the subscription mechanism is part of the application layer.
As an improvement for implementing good pubsub over hello with a reused socket, I would suggest adding feedback to the notify method. If something goes wrong, you will know about it and be able to remove the subscription in your application.
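
To illustrate the suggestion, here is a rough application-layer sketch in Python (not hello's actual API). It assumes a hypothetical `notify(client_id, message)` callable that returns `True` on success and `False` when delivery fails, for example because the client's context is gone; `SubscriptionRegistry` is likewise a made-up name.

```python
class SubscriptionRegistry:
    """Hypothetical application-layer pubsub built on a notify call with feedback."""

    def __init__(self, notify):
        # notify(client_id, message) -> bool is an assumed interface:
        # True if the message was handed to the transport, False otherwise.
        self.notify = notify
        self.subscribers = set()

    def subscribe(self, client_id):
        self.subscribers.add(client_id)

    def publish(self, message):
        """Notify every subscriber and drop those whose delivery failed."""
        failed = {c for c in self.subscribers if not self.notify(c, message)}
        self.subscribers -= failed
        return failed
```

With feedback of this kind, a stale subscriber is removed on the first failed push instead of crashing the listener.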
