-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More robustness additions for the info service #904
Conversation
macintux
commented
Mar 3, 2017
- Do not let riak_core crash on startup if the eleveldb callbacks are not present
- Verify the callbacks (3 of them, anyway) are available at registration time
- Add context to the registration and shutdown callbacks for future eleveldb work
* PoC for information service solution. riak_core_info_service*.erl files contains the solution. * Make the info service process do the riak_core call Changed the code placement of where the result to give back to the dependent process happens. Now the Result is calculated in the riak_core_info_service_process callback_router function and then the handler is invoked. * Cleanup service. Hard code eleveldb process for right now. * Export `callback_router/5` because otherwise `spawn` doesn't work. * More cleanup... - Add `state` record to `riak_core_info_service_process` to resolve issues with order of parameters. - Make children of `riak_core_info_service_sup` `permanent` rather than `temporary` * Add "TODO" in riak_core_app to move some code to `riak_kv` later. * Fix docs in `riak_core_info_service` to match final implementation details. - Also, rename `get` handler to `invoke` * remove logging statement * Dialyzer fixes and additional sys:handle_debug call to also call `handle_debug` on outbound messages. * Dialyzer fixes. * Make `riak_core_info_service_process` a `gen_server`. - Invoke the `shutdown` callback in `Mod:terminate` * Create `state` record in `start_link` so we can just pass it to `init` rather than passing all the parameters down to `init` * Add standard license headers Also picked up a few spurious whitespace fixes. * Add edoc cross-references Since `riak_core_info_service` provides such a detailed doc string, refer to it from the `_process` and `_sup` modules. * Extensive edoc updates * Add paragraph breaks * Add section headers * Add cross-references * Add code markup * Add type documentation for `callback()` * Clarify usage docs * Rename all occurences of source The name "source" was a bit confusing, so it has been renamed "provider" (as a binding name) or "service_provider" (as a record field). Also corrected the 3rd element of the invocation tuple: it's not a list of terms, it's just a term (which, of course, can be a list). * Edoc bugfix Unrelated to this branch but required to get `make docs` to work: move a < sign into a code span so that it doesn't throw off the XHTML parsing. * Add unit tests * Message handling updates per code review * Use proper OTP shutdown mechanism when `callback_shutdown` message is received. * Log and shutdown gracefully when an unknown message is received. * Add error handling Impose more structure: * Callbacks can no longer be `undefined` * The primary consumer callbacks (registration, response handler) must return `ok` or the service process will terminate * Catch exceptions * Log exceptions and other unexpected return values * Add "Error handling" section to module doc * Fix code_change As Ted noted, must return an `ok` tuple. * Address type problems * Explicitly disregard the return value from `apply_callback/2` * Fix the error tuple typespec for `apply_callback/2` * Register a shutdown handler The eleveldb callbacks in this file did not previously include a shutdown registration. It seems self-evident that if the service process dies prematurely we want to spin up a new one, so register the `start_eleveldb_info_service/0` function from `riak_core_app` as its own shutdown handler. Presumably in the case of an application/node shutdown the service supervisor would terminate preventing this from turning into an infinite loop, but I do wonder whether there should be some differentiation for application shutdown vs process failure. * Add end-to-end unit test Spin up the supervisor and ask `riak_core_ring` for a fresh ring. Also a minor doc update to clarify the provider parameters passed in the result. * Add unhappy path test Add a test which throws an exception when receiving a response from riak_core and make sure we see a shutdown message. * Clean up tests These tests were failing when invoked as part of a larger suite due to unrelated messages arriving from earlier tests. * Flush the mailbox during setup * Be more selective about which messages we process * Add more comments * Refactor away some boilerplate redundancy * Fix eleveldb info service shutdown * The shutdown callback always takes the info service process pid being shut down as an argument, so create a new function to take that pid * Tell eleveldb the pid is no longer in service (corresponding function is already in the appropriate branch for review, MvM is working on the NIF) * The supervisor will restart the process with the callbacks, including registration, so don't try to re-register after shutdown * Remove unnecessary layer of indirection at shutdown
Make certain that the callbacks (at least the ones we know arity for) are available.
riak_core should absolutely not crash if the registration failed.
This way eleveldb knows what the service process is for. We anticipate additional such registrations.
Thanks @macintux! Settings---
minimum_reviewers: 2
merge: false
build_steps:
- make clean
- make deps
- make compile
- make test
- make xref
- make dialyzer
org_mode: true
timeout: 1800 |
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
Replaces #903 because git weirdness |
basho/riak_core#904 will result in new `bucket_props` argument being delivered to the registration and shutdown callbacks per MvM's request.
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
src/riak_core_info_service.erl
Outdated
@@ -227,6 +248,10 @@ exception_test() -> | |||
teardown(Sup), | |||
ok. | |||
|
|||
no_callback_test() -> | |||
Key = 'no_callback_test', | |||
?assertMatch({error, _}, riak_core_info_service:start_service(?BOGUS_REG, ?SHUT(Key), ?GET_RING, ?HANDLER(Key))). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be ?assertMatch({error, {not_callable, _Fun}}, ...
?
{error, {not_callable, Fun}} | ||
end. | ||
|
||
verify_callable([]) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Channeling Nick to point out that our style guidelines prefer not doing our own recursion.
This could be neatly handled with lists:all/2
or, for reporting, lists:filtermap/2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happier with custom recursion in cases where we fail early, although admittedly there are only 3 functions to check.
Do not approve this; |
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
Should be safe to review this PR now. It fortunately incorporates the earlier approved #898 and Doug's Thumbs fix to prevent an auto-merge. |
We may need to rethink the supervisor restart intensity and period values and/or |
Manual testing points to a couple of issues:
Will keep digging. |
I was wrong. The log messages were present, just further back in the log than I expected.
Adding |
`verify_callable` attempts to make sure the callbacks are valid via `erlang:function_exported/3`, but that returns false if the module is not loaded.
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
As I feared, an ill-behaved registration can force a restart of the supervisor (and thus lose any other registered consumers). Ruminating. |
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
The Erlang supervision model does not allow for a child to be removed if it restarts more often than the intensity/period values permit. Instead, the supervisor itself and all children are terminated. This makes the "subscribers" to `riak_core` data susceptible to completely unrelated errors. To reduce the probability of this occurring, log non-`ok` return values from the response handler callback instead of terminating the service process.
If an earlier test fails, `teardown` may not be invoked properly and later tests will fail during `setup`. Terminate lingering supervisors.
There seems to be an issue with build step **make_test** ! ☁️✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
⛔ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
Now that a non-`ok` return from the response handler should not crash the service process, change the exception test to see whether an `exit` call results in a service process failure.
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
- Assert the the response handler function expected to crash is actually called by synchronizing with it. This also removes the need for the `timer:sleep(...)` - Assert the the response handler functions actually crashes.
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 1 of 2 Code reviews from organization basho
|
+1 |
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 0 of 2 Code reviews from organization basho |
The backend bits (eleveldb, leveldb*) have been merged to develop, so this PR should be fully functional without further branch-wrangling. |
Looks good! 👍✅ MERGE
✅ MAKE_CLEAN
✅ MAKE_DEPS
✅ MAKE_COMPILE
✅ MAKE_TEST
✅ MAKE_XREF
✅ MAKE_DIALYZER
⬜ 1 of 2 Code reviews from organization basho
|
+1 |