
Made many rpc:call/4,5 calls safer if rex is down. #586

Merged
merged 2 commits into develop on May 28, 2014

Conversation

lordnull
Contributor

Created a couple of utility functions to handle the wrapping. Most
rpc:call/4,5 calls that were not wrapped in a try/catch have been changed to
use the utility. Those that weren't appeared to want a crash due to
context (such as explicitly matching for only the success value).

There are some dialyzer issues that were introduced, however I'm not able to see how they are related to the changes.
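The wrapper itself isn't quoted in the description. A minimal sketch of such a utility — module and function names here are illustrative assumptions, not necessarily what the PR introduces — could look like:

```erlang
%% Illustrative sketch only. If the local rex server is not running,
%% rpc:call/4 exits with {noproc, _}; catching that exit lets callers
%% handle a dead rex the same way as an ordinary {badrpc, _} return.
-module(safe_rpc_example).
-export([safe_rpc/4]).

-spec safe_rpc(node(), module(), atom(), [term()]) -> term() | {badrpc, term()}.
safe_rpc(Node, Module, Function, Args) ->
    try rpc:call(Node, Module, Function, Args) of
        Result ->
            Result
    catch
        exit:{noproc, _NoProcDetails} ->
            {badrpc, rpc_process_down}
    end.
```

Callers that already match on `{badrpc, _Error}` then handle a missing rex with no extra clause. Note that in a try expression the exception class raised by a call to a dead local process is `exit`.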

@reiddraper
Contributor

I'm working on fixing the failing eunit tests, but you'll need to fix the dialyzer bits.

@rzezeski
Contributor

So, this is a drive-by review, but are we sure we want to change all rpc calls? Idiomatic Erlang is against defensive programming, and maybe not all of these places need to be defended from a crash. My main concern is overhead, and whether any of these changed calls sit in the hot path. Are we adding a few percentage points of overhead for something that is a rare event?

@lordnull
Contributor Author

For the most part, I only added the safer version where there was both already a check for {badrpc, _Error} and no try/catch in place. So I'm trying to put these in places where people didn't expect this call to fail.

Personally, I do think it's going to be a rare event that the rex process won't be running. It's more likely that any gen process the function passed to rpc will be down (and even then, it seems very unlikely). The changes made here simply make what happens in the first case the same as what happens in the second case.

I tried to find how expensive a try/catch is compared to just a case statement, and the closest I could find was http://erlang.org/pipermail/erlang-questions/2013-November/075928.html. The author implies that try/catch is as performant as case (or at least close enough that it did not much matter), and that the real cost was getting a stack trace.

So, to loop around to your question, @rzezeski, I don't think it's going to be a few percentage points, but fractions of a percent. Still, are those fractions of a percent a reasonable trade-off for code to work as expected in the circumstances where someone thought to check for {badrpc, _} but didn't expect different behavior if rex itself is down? Dunno at the moment.
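The overhead claim above can be checked directly. A rough microbenchmark sketch — numbers are machine-dependent, and the module, names, and workload here are assumptions for illustration, not part of the PR — comparing a bare case statement against the same loop wrapped in try/catch:

```erlang
%% Rough sketch: measures N iterations of a trivial call with a plain
%% case versus the same call inside try/catch. timer:tc/1 returns
%% {Microseconds, Result}.
-module(rpc_overhead_check).
-export([run/1]).

run(N) ->
    {PlainUs, ok} = timer:tc(fun() -> loop_plain(N) end),
    {WrappedUs, ok} = timer:tc(fun() -> loop_wrapped(N) end),
    {PlainUs, WrappedUs}.

loop_plain(0) -> ok;
loop_plain(N) ->
    case do_work(N) of
        {ok, _} -> loop_plain(N - 1)
    end.

loop_wrapped(0) -> ok;
loop_wrapped(N) ->
    try do_work(N) of
        {ok, _} -> loop_wrapped(N - 1)
    catch
        exit:{noproc, _} -> {badrpc, rpc_process_down}
    end.

do_work(N) -> {ok, N}.
```

Since no exception is raised on the happy path, any difference between the two timings is the cost of merely entering the try scope.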

@rzezeski
Contributor

@lordnull I just wanted to make sure we put some thought into this, and you certainly have. With that I will stop my backseat code reviewing.

@kellymclaughlin kellymclaughlin added this to the 2.0-RC milestone May 23, 2014
@reiddraper reiddraper self-assigned this May 23, 2014
@reiddraper
Contributor

Grabbing review.

@cmeiklejohn
Contributor

How is this looking for merging? Replication has a dependency on this PR.

```erlang
        Result
    catch
        'EXIT':{noproc, _NoProcDetails} ->
            {badrpc, rpc_process_down}
```
Contributor


Is it worth considering a lager warning here? This is a really weird situation, and maybe one we should be alerted to (rex being down).

Contributor


Sorry, backseat reviewer again. There should be a crash report if rex goes down. Given that the point of this function is to convert an exception into an error tuple, shouldn't it be up to the caller to decide if a log message is needed? The caller may not care if rex is down and may choose to ignore this situation.

Contributor Author


Pretty much my thought as well; if the caller cares if rex is down, then the function returns enough information to act on it.

Contributor


OK, fair enough. That's fine, but I'll just put here for posterity that the one time we've seen rex crash, we never saw a log, because the FS on that node went read-only. But maybe enough other things will go wrong to alert the operator to that fact.

@reiddraper
Contributor

Just a few minor comments, but this is looking good to me otherwise.

@cmeiklejohn
Contributor

@lordnull Once you address the comments, we should be good to merge.

@reiddraper
Contributor

Sounds like the consensus is to skip the lager messages. I think we're good to go here.

@lordnull
Contributor Author

Of course eunit failed. Let me see if I can recreate...

@lordnull
Contributor Author

Forced a rebuild and eunit fails on a different test. Passes on my machine (both raw branch and rebased). Merge anyway?

@reiddraper
Contributor

@lordnull yeah, we'll likely merge anyway. Can you point me at the failure, though, so I can make sure an issue is filed about it?

lordnull added a commit that referenced this pull request May 28, 2014
Made many rpc:call/4,5 calls safer if rex is down.
@lordnull lordnull merged commit 310af4d into develop May 28, 2014
@lordnull lordnull deleted the bugfix/mw/safer_rpc branch May 28, 2014 23:30
@reiddraper
Contributor

For the record. The first failure is fixed in #589. The second is fixed in #587.
