Add logic to clear cancelled connection requests #598
Conversation
This branch adds logic to clear cancelled connections from the riak_core_connection_manager `Pending` state. I have also added a few helper functions to give a better view into request states (`get_request_states`) and connection errors (`get_connection_errrors`). Connections are cancelled only when a user runs `riak-repl disconnect NAME/[IP,PORT]`. I've added logic to this path to call `remove_cancelled_connections` after waiting `cm_cancellation_interval`. This interval must be either an integer or the atom `never`. This logic makes only one attempt to remove the cancelled connection. It might be nice to have a helper function that we can call to manually clear the connections, but currently this can be done by re-running `riak-repl disconnect`. The riak_test showing this issue is `replication2_connections.erl`, and the issue can be demonstrated with this code by setting `cm_cancellation_interval` to `never`.
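The delayed, single-attempt cleanup described above could be scheduled roughly like this. This is a sketch, not the PR's actual code: the function name `maybe_schedule_cleanup` and the message shape are hypothetical, and the assumption is that the interval lives in the `riak_core` application environment.

```erlang
%% Sketch: schedule one cleanup attempt after cm_cancellation_interval.
%% `never` disables the cleanup entirely. Names here are hypothetical.
maybe_schedule_cleanup(Ref) ->
    case app_helper:get_env(riak_core, cm_cancellation_interval, never) of
        never ->
            %% Cleanup disabled; the cancelled request stays in Pending.
            ok;
        Interval when is_integer(Interval) ->
            %% One shot only: if the removal fails, re-running
            %% `riak-repl disconnect` is the manual fallback.
            erlang:send_after(Interval, self(),
                              {remove_cancelled_connection, Ref}),
            ok
    end.
```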
Dialyzer is returning the following problems:

…

Oh god, the … With that said, the call at that line is … I don't see the issue here.
Test for this behavior: basho/riak_test#626
@bsparrow435 How are we looking here?
```erlang
        %% The helper process will discover the cancellation when it asks if it
        %% should connect to an endpoint.
        State#state{pending = lists:keystore(Req#req.ref, #req.ref, Pending,
                                             Req#req{state = cancelled})}
    end.

%% remove pending connection from state for supplied Ref.
```
Missing @doc
Looking at the rest of the module, most functions are missing `@doc` and type specifications. I'll add these to my new functions, but would it be OK to go ahead and add the rest in this PR?
Changes the atom that disables clearing of cancelled connections from `never` to `undefined`. Cleaned up logic and added `@doc` to one new function.
Just pushed a commit addressing comments. Working on the associated …
```erlang
        %% The helper process will discover the cancellation when it asks if it
        %% should connect to an endpoint.
        State#state{pending = lists:keystore(Req#req.ref, #req.ref, Pending,
                                             Req#req{state = cancelled})}
    end.

%% @doc remove pending connection from state for supplied Ref.
remove_cancelled_connection(Ref, State = #state{pending = Pending}) ->
    case lists:keytake(Ref, #req.ref, Pending) of
```
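The diff is truncated at the `case` expression. A plausible completion, sketched under the same record-index convention discussed below (the branch bodies are an assumption, not the PR's actual code):

```erlang
%% Sketch of how the truncated case above could conclude: drop the
%% request from Pending if it is found and marked cancelled.
remove_cancelled_connection(Ref, State = #state{pending = Pending}) ->
    case lists:keytake(Ref, #req.ref, Pending) of
        {value, #req{state = cancelled}, Rest} ->
            %% Found and cancelled: remove it from the pending list.
            State#state{pending = Rest};
        _ ->
            %% Ref not present, or request no longer cancelled: no change.
            State
    end.
```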
I still don't see how this will ever match, given that the second parameter is supposed to be an integer. Does this grab the default value from the record?
I did it this way because a few lines above we did the same thing in `lists:keystore`. My understanding is that `#req.ref` will return an integer representing its place in the record. I'll try testing that now to confirm.

Looking at the rest of the module and other modules in riak_repl, this is fairly common in all `lists:key*` operations.
I just confirmed this behavior with the following immediately below line 431 of this file:

```erlang
lager:notice("BRIAN RETURN: ~p", [#req.ref]),
```

Result:

```
2014-06-04 18:50:57.294 [notice] <0.2467.0>@riak_core_connection_mgr:disconnect_from_target:431 BRIAN RETURN: 2
```
Ah, very interesting. Thanks for taking the time to explain why it works.
No big deal now, but that is definitely something I've never seen before. It's clever, but I think too much so.
I went to the docs; from the bottom of the Records page: http://www.erlang.org/doc/reference_manual/records.html

> In addition, #Record.Name returns the index in the tuple representation of Name of the record Record. Name must be an atom.

So this is explicit, by-design behavior.
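The index expression is easy to see in isolation. The `#req{}` record below is a stand-in with the same leading field as the one in riak_core_connection_mgr; the other fields are hypothetical:

```erlang
-module(req_index_demo).
-export([demo/0]).

%% Stand-in record: its tuple representation is {req, Ref, Spec, State},
%% so the `ref` field sits at element 2.
-record(req, {ref, spec, state}).

demo() ->
    %% #req.ref is a compile-time integer: the field's tuple index.
    %% This matches the "BRIAN RETURN: 2" log line above.
    2 = #req.ref,
    %% lists:key* functions take that index as their position argument.
    Pending = [#req{ref = a}, #req{ref = b}],
    #req{ref = b} = lists:keyfind(b, #req.ref, Pending),
    ok.
```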
Seems the reason this works is because people were tired of defining indexes for elements in a record.
Yes, I know it's by design. I was arguing it was obtuse: two Erlang programmers who have been at it for 2+ years didn't know what it was doing, so we were making an argument to just put the integer value in (or a macro).
Yep, and I know this is code you inherited, @bsparrow435, so not asking for it to be changed now. Might as well be consistent within the module until we fix the whole mess.
Just leave it as is and we'll address all of it later.
I'm still concerned that this code introduced a new Dialyzer error which is not on master; that should be fixed prior to merge. Otherwise, once the Dialyzer warning is fixed and the test refactored, we're good to go here.
I just found an issue where, if connected/disconnected in the correct order within 5 minutes, the endpoint (e.g. "127.0.0.1":"10046") will be blacklisted forever. I need to add some logic to reset the retry_backoff. Will work on it tonight.
I'll also be adding another riak_test to show the issue.
False alarm, I wasn't waiting long enough for the endpoint backoff to finish. Still, I'm going to add one more step to the riak_test to prove reconnection is possible, as well as adding logic to reset the backoff timer when the cancelled connection is removed.
Dialyzer issue is the last open item; @cmeiklejohn and I are looking into it now.
@reiddraper found it; the warning is in the ignore file, but the line number changed. Just pushed a commit to update the line number.
Nice find, thanks.

Pushed d61ff7e with updated line numbers.

👍 d61ff7e
…ions: Add logic to clear cancelled connection requests. Reviewed-by: cmeiklejohn
@borshop merge |