Made many rpc:call/4,5 calls safer if rex is down. #586
Conversation
Created a couple of utility functions to handle the wrapping. Most rpc:call/4,5 calls that were not wrapped in a try/catch have been changed to use the utility. Those that weren't changed appeared to want a crash due to context (such as explicitly matching only the success value).
I'm working on fixing the failing eunit tests, but you'll need to fix the dialyzer bits.
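For readers without the diff handy, the wrapper described above amounts to something like the following. This is a sketch, not the exact patch: the name safe_rpc is illustrative, and the catch clause mirrors the fragment quoted later in this review.

```erlang
%% Sketch of the wrapper utility (the name safe_rpc is illustrative).
%% rpc:call/4 is ultimately a gen_server call to the local rex process;
%% if rex is not running, that call exits with a noproc reason. The
%% wrapper converts that exit into a {badrpc, _} tuple, the shape
%% callers already check for.
safe_rpc(Node, Module, Function, Args) ->
    try rpc:call(Node, Module, Function, Args) of
        Result ->
            Result
    catch
        exit:{noproc, _NoProcDetails} ->
            {badrpc, rpc_process_down}
    end.
```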
So, this is a drive-by review, but are we sure we want to change all rpc calls? Idiomatic Erlang is against defensive programming, and maybe not all these places need to be defended from a crash. My main concern is overhead, and whether any of these changed calls sit in the hot path. Are we adding a few percentage points of overhead for something that is a rare event?
For the most part, I only added the safer version where there was already a check for {badrpc, _Error} and no try/catch in place, so I'm putting these in places where people didn't expect the call to fail. Personally, I do think it will be a rare event that the rex process isn't running. It's more likely that a gen process the function was passed to via rpc will be down (and even then, it seems very unlikely). The changes here simply make what happens in the first case the same as what happens in the second case.

I tried to find how expensive a try/catch is compared to a plain case statement, and the closest I could find was http://erlang.org/pipermail/erlang-questions/2013-November/075928.html. The author implies that try/catch is as performant as case (or close enough that it didn't much matter), and that the real killer was getting a stack trace.

So, to loop around to your question, @rzezeski: I don't think it's going to be a few percentage points, but fractions of a percent. Still, are those fractions of a percent a reasonable trade-off for code to work as expected in the circumstances where someone thought to check for {badrpc, _} but didn't expect different behavior if rex itself is down? Dunno at the moment.
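Concretely, the call sites being changed already looked something like the following, and the {badrpc, _} check itself stays the same; the only behavioral change is that a down rex now produces a badrpc tuple instead of an exit. Mod/Fun/Args and safe_rpc are placeholder names for illustration.

```erlang
%% A typical call site after the change (placeholder names).
case safe_rpc(Node, Mod, Fun, Args) of
    {badrpc, Reason} ->
        %% Previously reached only for remote-side failures; now
        %% also reached when the local rex process is down.
        {error, Reason};
    Result ->
        Result
end
```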
@lordnull I just wanted to make sure we put some thought into this, and you certainly have. With that I will stop my back-seat code reviewing.
Grabbing review.
How is this looking for merging? Replication has a dependency on this PR.
        Result
    catch
        'EXIT':{noproc, _NoProcDetails} ->
            {badrpc, rpc_process_down}
Is it worth considering a lager warning here? This is a really weird situation (rex being down), and maybe one we should be alerted to.
Sorry, backseat reviewer again. There should be a crash report if rex goes down. Given that the point of this function is to convert an exception into an error tuple, shouldn't it be up to the caller to decide whether a log message is needed? The caller may not care that rex is down and may choose to ignore this situation.
Pretty much my thought as well; if the caller cares whether rex is down, the function returns enough information to act on it.
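A hypothetical caller that does want to be alerted can match the specific reason and log it itself, rather than having the wrapper log unconditionally. This sketch assumes the lager logging library and uses the same placeholder names as above.

```erlang
%% Hypothetical caller that wants a log when rex itself is down.
case safe_rpc(Node, Mod, Fun, Args) of
    {badrpc, rpc_process_down} ->
        %% rex is not running on this node; worth shouting about here.
        lager:warning("rex not running on ~p", [Node]),
        {error, rpc_process_down};
    Other ->
        Other
end
```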
OK, fair enough. That's fine, but I'll just note here for posterity that the one time we've seen rex crash, we never saw a log, because the FS on that node went read-only. But maybe enough other things will go wrong to alert the operator to that fact.
Just a few minor comments, but this is looking good to me otherwise.
@lordnull Once you address the comments, we should be good to merge.
Sounds like the consensus is to skip the lager messages. I think we're good to go here.
Of course eunit failed. Let me see if I can recreate...
Forced a rebuild and eunit fails on a different test. Passes on my machine (both raw branch and rebased). Merge anyway?
@lordnull Yeah, we'll likely merge anyway. Can you point me at the failure, though, so I can make sure an issue is filed about it?
Previous failure: http://buildbot.bos1/builders/test-riak_core/builds/990 After a forced rebuild: http://buildbot.bos1/builders/test-riak_core/builds/1010
Made many rpc:call/4,5 calls safer if rex is down.
Created a couple of utility functions to handle the wrapping. Most
rpc:call/4,5 calls that were not wrapped in a try/catch have been changed
to use the utility. Those that weren't changed appeared to want a crash
due to context (such as explicitly matching only the success value).
There are some dialyzer issues that were introduced; however, I'm not able to see how they relate to the changes.