
Add a Lock and Token manager for background jobs in riak #475

Closed
wants to merge 4 commits

Conversation

buddhisthead
Contributor

Background Manager = Locks + Tokens + visibility

  • The goal is to allow riak sub-systems to coordinate use of shared resources, e.g. protect from concurrent vnode folds on the same partition.
  • Locks and tokens have a settable maximum concurrency limit.
  • "Taken" locks and tokens are tracked in an ETS table.
  • max_concurrency is returned when the set limits are reached.
  • Processes that take locks are monitored.
  • Locks are released when the taking processes terminate.
  • Tokens are refreshed at a specified periodic rate.
  • Token processes are not monitored because tokens never "release".
  • A table manager is introduced to add persistence across process crashes, and to allow proper table transfer to occur without losing the table.
  • An EQC test exercises the majority of the API; see test/bg_manager_eqc.erl.
  • See the original PR for background manager here: Background Manager (Locks + Tokens) #364

The background manager has many commits and comments in its original PR (#412), but it is difficult to test against the latest develop branch. This PR was created to rebase our work on the develop branch and to squash the commit history into something more useful.
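
For a rough sense of the intended caller-side usage, here is a minimal sketch based on the calls that appear later in this thread (set_token_rate/2, get_token/1, get_lock/1, set_concurrency_limit); the resource names, the rate and limit values, the set_concurrency_limit arity, and run_fold/0 are illustrative assumptions, not code from this PR:

-module(bg_mgr_usage_sketch).
-export([setup/0, do_fold/0]).

%% Illustrative only: resource names, limits, and run_fold/0 are made up.
setup() ->
    %% Register a refilling token and a lock with a concurrency limit.
    riak_core_bg_manager:set_token_rate(my_token, {1000, 5}),
    riak_core_bg_manager:set_concurrency_limit(my_lock, 2).

do_fold() ->
    case riak_core_bg_manager:get_lock(my_lock) of
        {ok, _Ref} ->
            %% The lock is released automatically when this process exits,
            %% because the background manager monitors lock holders.
            run_fold();
        max_concurrency ->
            %% Concurrency limit reached; back off and retry later.
            {error, busy}
    end.

run_fold() ->
    ok.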

@ghost ghost assigned jrwest Dec 10, 2013
%% This could go into a queue to be played when the transfer happens.
{reply, undefined, State};
{unregistered, Token} ->
{reply, undefined, update_limit(Token, Rate, ?DEFAULT_TOKEN_INFO, State)};
Contributor

Because the exception can be (and I believe is) thrown before calling schedule_refill_tokens, we never actually schedule the first refill:

1> riak_core_bg_manager:set_token_rate(new, {1000, 1}).
undefined
2> riak_core_bg_manager:get_token(new).
ok
3> timer:sleep(2000), riak_core_bg_manager:get_token(new).
max_concurrency
4> timer:sleep(2000), riak_core_bg_manager:get_token(new).
max_concurrency

This is not caught by the EQC test because EQC needs to do the token refilling itself in order to properly model things. I wonder if we could use eqc_component callouts to more tightly couple the test to the scheduling of the refill, but that isn't something I see us extending the test to do quickly. For now, unfortunately, this is easier to catch by hand.

From a stylistic perspective, part of the issue here is the use of try/catch for control flow, imo. It is unclear where the exceptions may be thrown and which steps we skip in the various cases, but alas, it's probably too late for those sorts of changes.
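
For what it's worth, here is a rough sketch of the shape I have in mind; lookup_token_info/2 is a hypothetical helper, and update_limit, schedule_refill_tokens, and ?DEFAULT_TOKEN_INFO are only borrowed from the diff above, so treat this as a sketch rather than the PR's actual code. The point is just that the "unregistered" case becomes an explicit branch that still schedules the first refill:

%% Sketch only: lookup_token_info/2 is hypothetical; the other names are
%% borrowed from the surrounding diff and their exact signatures are assumed.
handle_call({set_token_rate, Token, Rate}, _From, State) ->
    case lookup_token_info(Token, State) of
        {unregistered, Token} ->
            %% First registration: record the limit and schedule the first
            %% refill so the bucket actually starts filling.
            State1 = update_limit(Token, Rate, ?DEFAULT_TOKEN_INFO, State),
            State2 = schedule_refill_tokens(Token, State1),
            {reply, undefined, State2};
        {ok, OldRate} ->
            %% Already registered: just update the rate.
            {reply, OldRate, update_limit(Token, Rate, ?DEFAULT_TOKEN_INFO, State)}
    end.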

Contributor Author

Good catch. I will fix the bug and add comments regarding where the exception can be thrown. We could have a debate about style, but I did try other ways to structure the flow; I think this is a good alternative, imho.

@jrwest
Contributor

jrwest commented Dec 12, 2013

tl;dr I think we should maybe hold off on the historical stats interface or try to get a group together to discuss it soon and get some changes in quickly.

The historical stats and query interface needs further discussion before being merged. The current implementation's use of the process state is problematic, at least in part because it can cause the process heap to grow quite large. With the default 10k sample limit I was able to get the total_heap_size of riak_core_background_manager to fluctuate between 10 and 15MB. This state is regularly copied around as the background manager does various things, so I saw it get as high as 21MB before garbage collection had a chance to clean things up. Once the limit is reached, the state never shrinks. In addition, lowering the limit is insufficient: the state still grows unbounded with the number of registered tokens. With 100 registered tokens at the limit, the process heap was 141MB, and Erlang gives us no way to cap this.
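
For reference, one way to reproduce that kind of measurement is erlang:process_info/2; a small sketch follows, assuming the process is registered as riak_core_bg_manager:

%% Measurement sketch; the registered process name is an assumption.
%% total_heap_size is reported in words, so convert to megabytes.
heap_mb(Name) ->
    Pid = whereis(Name),
    {total_heap_size, Words} = erlang:process_info(Pid, total_heap_size),
    Words * erlang:system_info(wordsize) / (1024 * 1024).

%% e.g. heap_mb(riak_core_bg_manager).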

In addition, the stats windows and limits most likely need to be configurable at runtime, and there should be a way to turn them off entirely; right now neither is possible.

I wonder if the easiest thing to do for now is to remove the code and revisit it later (perhaps leaving some of the "update stat" functions in but making them no-ops with an appropriate comment)? Although maybe that is just as much work as moving things over to folsom now.
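
Something like this is the kind of no-op I mean (function names and arities here are hypothetical, not the module's actual ones):

%% Hypothetical names. Historical stats collection removed pending redesign;
%% kept as no-ops so call sites do not have to change.
update_stat_given(_Resource, _Type, State) ->
    State.

update_stat_refill(_Token, State) ->
    State.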

Also, although maybe not necessarily part of this PR, we should also discuss whether this information gets exposed in the stats blob (I know there was some intention of doing this via riak-admin, but if we have the historical information people should be able to get at it via an API and plot it). Actually, as I type that, I wonder if we really need histograms. IIRC most stats are just time-boxed counters that we expect users to plot over time. I guess this information is meant more for us for debugging, but at the same time not every user is a Basho customer.

limit :: non_neg_integer(), %% maximum available, defined by token rate during interval
refills :: non_neg_integer(), %% number of times a token was refilled during interval. 0 if lock
given :: non_neg_integer(), %% number of times this resource was handed out within interval
blocked :: non_neg_integer() %% number of blocked processes waiting for a token
Contributor

This entry no longer makes sense, does it? If we were going to track something like this, shouldn't it be the number of times we returned max_concurrency in the window?

Contributor Author

blocked got left in by accident when I removed the blocking API. Measuring max_concurrency would be a good addition, but given that we'll be re-designing the query API, I'm not sure it's worth adding now.

@buddhisthead
Contributor Author

With regard to the historical API, I agree that the current mechanism is not what we want. I was already on the path of designing something different, something that uses far less memory. We may want some windowed stats, but it's hard because the locks have unique names, and if we try to keep a stat per lock, there will be a lot of them; e.g. for vnode locks, there is one per partition. I don't think we need to track 1020 stats on larger ring sizes.

From my own experience with using the vnode locks on the three subsystems (repo, aae, handoff), what I want to see is which locks are currently held (we have that), what recently hit max_concurrency, and probably a timestamp of when the lock was acquired so that I can see if it's "stuck". That covers the ps command.

As for head and tail, I'd like to just log the get_lock, get_token, and set_concurrency_limit calls to a separate log file and provide a tool for examining the file, separate from the bg_manager API. Dirt simple. I have some cool ideas for visualizing this data, but the main point is just to log it. I'll pause now for comments.

@jrwest
Contributor

jrwest commented Dec 13, 2013

That all sounds fine. For the logging to a file stuff I would take a look at https://github.com/basho/lager#tracing.

@buddhisthead
Contributor Author

I just finished an experiment using trace, but unfortunately it logs to the console.log file in addition to my custom logfile. I really want to keep my traces OUT of the console.log. @Vagabond Any suggestions?

Here is what I tried. I did get entries in my custom file, but I also got all the same entries in console.log:

lager:trace_file("log/bg_manager.log", [{bg_mgr_op, get_resource}], info)

give_resource(Resource, Type, Pid, Ref, Meta, State) ->
    lager:info([{bg_mgr_op, get_resource}], "get_~p:~p:~p:~p:ok",
               [Type, Resource, Pid, Meta]),

@Vagabond
Contributor

So, currently this isn't ideal. For riak 2.1 I plan to add the ability to have multiple log 'streams' in lager, so you could have a general logging stream, an access logging stream, a security audit log stream, or whatever. This is mainly driven by the need to add audit logging to Riak, but it will be a generic lager feature.

Right now, the best thing you can do is to log at the debug level with some specific logging attribute, like you're doing with bg_mgr_op, and then set up a trace to pull those messages into a custom backend. This does mean that if all debug logs are turned on, you'll see your messages in the other logfiles; it is also a performance penalty to leave tracing on all the time.
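
Concretely, something like the following: the same bg_mgr_op attribute as above, only the level changes, and the file path is just an example:

%% Install a trace that routes debug-level messages tagged with bg_mgr_op
%% into a dedicated file:
lager:trace_file("log/bg_manager.log", [{bg_mgr_op, get_resource}], debug)

%% ...and at the call site, log at debug level with the same attribute:
lager:debug([{bg_mgr_op, get_resource}], "get_~p:~p:~p:~p:ok",
            [Type, Resource, Pid, Meta])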

@buddhisthead
Contributor Author

Ok, thanks for the update on lager status. I think the long-term plan is excellent, and I can adapt the background manager to use the separate streams when they become available. Until then, I will use the debug-level approach because it gets us the functionality, and performance is not the concern here. The access rate to the bg-mgr API is low, so the overhead is not a problem.

If someone turns on debug for all of riak, then they'll get these messages in the console logfile too. But oh well, they'll be getting a crap-ton of messages from every subsystem as well. More likely, they would turn on debug for a specific module and not have to suffer our output.

I'll move forward with this approach. Thanks again, @Vagabond

@jrwest
Contributor

jrwest commented Dec 13, 2013

@buddhisthead just to clarify... the tracing you are adding will be something that can be turned on/off? And what state will it be in by default?

@jrwest
Contributor

jrwest commented Dec 13, 2013

There seems to be a good amount of duplication in the "live" query interface. Some I've found:

  • all_tokens/0, ps(token), tokens_given/0 and tokens_given(all) return the same information
  • all_locks/0 and ps(lock) are the same
  • all_resources/0 and all_given/0 are the same (even in code)

In addition, it seems ps/0 has some overlap with query_resource/3; it's another shorthand.

I think well-named functions are the better approach over a single one w/ multiple arguments, at least in part because the ps/0 approach places the unneeded restriction that resources cannot be named lock or token:

> riak_core_bg_manager:get_lock(lock).
{ok,#Ref<0.0.7.129424>}

> riak_core_bg_manager:get_token(a), riak_core_bg_manager:get_lock(new2).
{ok,#Ref<0.0.7.129606>}

> riak_core_bg_manager:ps().
[{bg_stat_live,a,token,<0.1926.0>,[],given},
 {bg_stat_live,lock,lock,<0.1926.0>,[],given},
 {bg_stat_live,new2,lock,<0.1926.0>,[],given}]

> riak_core_bg_manager:ps(lock).
[{bg_stat_live,lock,lock,<0.1926.0>,[],given},
 {bg_stat_live,new2,lock,<0.1926.0>,[],given}]

In addition, I would really like to see us move these queries outside of the riak_core_background_manager process, which should mostly be a matter of changing the tables to be protected, choosing the appropriate read/write_concurrency settings, and changing some functions to not need the process state. The main advantage is that if, for some crazy reason, the background manager does get flooded we can still query; and on the other hand, a flood of queries (say, run accidentally on a too-frequent timer) cannot overwhelm the process either.

Really, now that I see us moving towards protected tables, I wonder about the choice of the table heir/handoff approach vs. just using a public table owned by riak_core_sup or some child supervisor that is a parent of riak_core_bg_manager. It is probably way too late for such a change, but I'm starting to wonder about the complexity it added to this code, in addition to the fact that, for the most part, a protected ETS table and a public one are implementation-wise the same when there is one writer and multiple readers.
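
To sketch what that could look like: the owning gen_server creates a protected, named table, and query functions read it directly without a gen_server call (the table and function names below are illustrative, not the PR's actual ones):

%% Illustrative sketch: table and function names are made up for this example.
%% Only the owning process can write to a protected table, but any process
%% can read it.
create_table() ->
    ets:new(bg_mgr_entries, [protected, named_table, set,
                             {read_concurrency, true}]).

%% Query functions read the table directly, without a gen_server call, so
%% queries keep working even if the background manager process is busy.
all_given() ->
    ets:tab2list(bg_mgr_entries).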

@Vagabond
Contributor

Having a lager trace installed impacts the performance of ALL of the logging subsystem; it is not related to the volume of messages that match a specific trace.

@buddhisthead
Contributor Author

Given the current performance impact of using the trace mechanism (as described above), we'll postpone the addition of tracing/logging to a separate PR and will not use lager:trace_file() at this time.

@buddhisthead
Contributor Author

With regard to the tables, @jrwest and I discussed the proposed changes and decided the following. We'll use protected named tables in order to allow the query functions to read the tables without going through the gen_server process. This will allow queries to succeed even if the gen_server is somehow overloaded (which is not expected, but you never know). Writes to the table are still guarded and can only be done by the gen_server process.

We'll continue to use the table manager to hand tables over to the background manager, which will still use the table id state variables to know when the table is valid.

@buddhisthead
Contributor Author

I believe commit 57af397 should address the issue of query API consistency. Thanks for the comments.

@buddhisthead
Contributor Author

Commit 754f34a should address splitting the query API out to read the ETS table directly, without using the gen_server at all.

@buddhisthead
Contributor Author

@jrwest This PR has been rebased off develop and squashed. Please review the changes for the tables and query API. Thanks, man!

@jrwest
Contributor

jrwest commented Dec 17, 2013

@buddhisthead it looks like something went wrong w/ that rebase/force-push.

…le is unavailable * Allows callers to try again later
* Remove history API, use unregistered to signal unavailability of bg-mgr, fix set_token_rate bug.
* Prune the query API down to a consistent set of commands and adjust tests.
* Change query API to use the named ETS table directly; it is not threaded through the gen_server.
@buddhisthead
Contributor Author

@jrwest Ok, I was a dumb ass to try and rebase on develop before we were done here. I've force pushed the branch and rebased out the develop rebase. Ha. Git-fu boomerang in the back of my head. Thanks for looking.

@jrwest
Contributor

jrwest commented Dec 20, 2013

Closing because the review moved to #484 by accident. I did a diff and verified they contain the same code (besides the new handoff changes in 484, of course). @buddhisthead, would you mind double-checking?

@jrwest jrwest closed this Dec 20, 2013
@jrwest jrwest added this to the 2.0-beta milestone Mar 24, 2014
@jrwest jrwest removed this from the 2.0 milestone Mar 24, 2014
@seancribbs seancribbs deleted the cet-bg-mgr-refactor-rebased branch April 1, 2015 23:04