-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Mutators to support changing object on read/write #661
Conversation
%% <code><b>mutate_put(MetaData, Value, ExposedMeta, | ||
%% FullObject, BucketProperties) -> Result</b></code> | ||
%% | ||
%% Types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think edoc can generate all this from dialyzer -type and -callback specs now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into this (and trying it as well), edoc lists the functions but doesn't give type information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you did like -type metadata() :: dict(). and then used metadata() it'd show them better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to define better. I'd love to be able to have it automatically use the correct types, but that's a ways out. Adding links to types is a next step, but would it be better inline (in the code element) or in the expanded area?
The tests fail for me, presumably because of changes to cluster metadata:
|
I'm mostly OK with this change, although I think @jtuple was also going to look at it. |
To achieve reduced replications, a mutation of the object on put and get needs to be applied, and needs to be loosely coupled to the systems that will be doing the mutating (repl). This module provides a place to register and run mutators on puts and gets.
Rather than try to manage the list itself, it leans heavily on the metadata system to manage diferences and storage of the data.
This means making mutators is safer as they are expected to be symetrical. A get mutator is only called if the object in question was put through the corresponding put mutator.
The get in kv_vnode does not have the bucket properties available, so to avoid additional slow-down, the get's no longer require bucket properties.
Reliable tests remain reliable.
Assuming the mutator module is correct*, mutations on get are applied in the reverse order from put. A mutator module is "correct" if the get is the reverse function of a put. All the data a mutator needs to reverse a given mutation must be put into the object to be able to do this.
Currently defualt is 0. There's no restriction on what a priority can be since all erlang terms can be compared in < and >.
Callback modules can now be set to only worry about a given meta/value pair at a time, but also gets full object. Put can expose some meta data to the putting client that will be different from the stored data.
It become needed for a mutator to know which specific meta/value pair it was working on. Rather than extend the function, I've opted to just give the callback the object directly.
Hold over from when the eunit tests were at the bottom of the module. As they are not, this should have been removed.
More than what, but some why too!
Tests should pass without warnings now.
Eunit tests should pass now. Also, there is now a riak_test pull request that enhances the tests further; it attempts to stress the interaction between aae and the replication mutator: basho/riak_test#392 |
%% data storage backend is not meant to be the object given. An | ||
%% example would be storing only meta data for an object on a remote | ||
%% cluster. This module is an interface to register mutators that will | ||
%% can be run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can haz this sentence fixed next time you're in the file?
% {Meta, Value, Exposed} values, the get callback gets away with just | ||
% giving the object? This is to avoid complicated interaction with | ||
% notfound return. | ||
case Mutator:mutate_get(Object) of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that this makes it possible to invalidate all the data in the cluster if a mutator stored in object metadata is removed from the code. Users of this feature will have to be careful about some downgrade scenarios where data is written with a mutator in the list than then disappears on downgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we just leave this to documentation, or should there be some insurance put in the code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note in the documentation would be enough for now. It's good to flag the worse case scenarios.
Added a similar warning in 4 places concerning the same thing. If an object is stored using a given mutator, that same mutator is used when the object is retreived. If that code isn't available or has changed in a backwards incompatible way, one can expect to have a bad time.
end, | ||
ModulesAndPriors = riak_core_metadata:get({riak_kv, mutators}, list, [{default, []}, {resolver, Resolver}]), | ||
Flipped = [{P, M} || {M, P} <- ModulesAndPriors], | ||
Sorted = lists:sort(Flipped), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A sort in the critical path could become an issue with multiple mutators. I know we are not planning to add those soon, but it seems that the resolution and merge functions could take care of this directly and store values in sorted order, no?
While writing my own test mutators I was surprised by how they can be called twice on the same object (once in the local put, and the result of that is passed to the other vnodes in the preflist). Unless the application of the mutator chain is completely idempotent, it will be very hard to reason about. I had tried to do a simple "make value a list of two values" and a "make a list a tuple" mutator and put them in a chain, but the results end up just crazy. The first one went from a value of 1 to {1, 1}, whereas the replicas ended up with {{1,1}, {1,1}}. I'm leaving this here as a note of the headache that it could be to use this feature in a truly generic fashion. |
I'm going to +1 this PR with the following caveats:
The code as is should be enough to support the Repl use case. |
Instead of storing a list of mutators as {mutator, priority}, then doing a sort on every retrieval of the list of mutators, the mutators get stored {Priority, Module}, so the list just gets the priority stripped off (faster then a sort). This means there's a sort done in the following cases: - registering or unregistering a module. There's a check if the module existed, and it's overwrittne if it did. - there are sibling values in metadata. A functionaly equal merge is done on those values. Hopefully that is a short-lived situation.
{P1, _} when P1 < Priority -> | ||
Acc; | ||
Else -> | ||
ordsets:all_element(Mutator, ordsets:del_element(Else)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't even run it, but you surely meant add_element here, not all_element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh!
-spec get() -> [atom()]. | ||
get() -> | ||
Resolver = fun | ||
('$deleted', '$deleted') -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note for @jrwest: it seems that we should have the metadata tombstone in a public header or is there a way for resolvers not to use this atom directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm thats a good point. we certainly want to expose tombstones to resolvers (or at least have the option to) but leaking this is a bit nasty. a macro would probably be quickest, but i wonder if there is a better way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this code specifically, does it even call riak_core_metadata:delete
? If not, don't really need to have these here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well no, but I think it's worth handling anyway for our convenience at the very least. I used delete while testing it when the format was changed, for example.
I've looked over the sort change and tested manually. After the one issue I found was resolved, it looks good. Re +1 on this. |
Added Mutators to support changing object on read/write
%% a crash. | ||
-spec mutate_get(Object :: riak_object:riak_object()) -> riak_object:riak_object(). | ||
mutate_get(Object) -> | ||
[Meta | _] = riak_object:get_metadatas(Object), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right? What about siblings with no mutators in their metadata being higher up the list?
This is to support basho/riak_repl#223 .
This implements a 'mutator' interface. A mutator callback module can alter an object to be stored and after it has been retrieved from a storage backend.
Note this has been recently rebased and force-pushed (sept 26) to account for changes from develop.