Added Mutators to support changing object on read/write #661

lordnull · 2013-09-16T18:33:29Z

This is to support basho/riak_repl#223.

This implements a 'mutator' interface. A mutator callback module can alter an object before it is stored and after it has been retrieved from a storage backend.

Note that this was recently rebased and force-pushed (Sept 26) to account for changes from develop.

Vagabond · 2013-09-23T20:11:01Z

src/riak_kv_mutator.erl

+%% <code><b>mutate_put(MetaData, Value, ExposedMeta,
+%% FullObject, BucketProperties) -> Result</b></code>
+%%
+%% Types:


I think edoc can generate all this from dialyzer -type and -callback specs now.

Looking into this (and trying it as well), edoc lists the functions but doesn't give type information.

I think if you defined something like -type metadata() :: dict(). and then used metadata() in the specs, it'd show them better?

I think we need to define the types better. I'd love to be able to have it automatically use the correct types, but that's a ways out. Adding links to types is a next step, but would it be better inline (in the code element) or in the expanded area?
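As a hedged illustration of the suggestion in this thread, the sketch below declares named types and a -callback spec so that documentation tools have something to link to. The module name, the type aliases, and the result tuple are assumptions made for the example; only the five argument names come from the doc comment quoted above.

```erlang
-module(example_mutator_behaviour).

%% Hypothetical type aliases; the real module may define them differently.
%% dict:dict() needs OTP 17 or later; older releases used the built-in dict().
-type metadata() :: dict:dict().
-type value() :: term().
-type bucket_props() :: [{atom(), term()}].

-export_type([metadata/0, value/0, bucket_props/0]).

%% Argument names follow the doc comment in the diff.
%% The result shape is an assumption, not taken from the PR.
-callback mutate_put(MetaData :: metadata(),
                     Value :: value(),
                     ExposedMeta :: metadata(),
                     FullObject :: term(),
                     BucketProperties :: bucket_props()) ->
    {metadata(), value(), metadata()}.
```

A module implementing this sketch would declare -behaviour(example_mutator_behaviour). and export mutate_put/5.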

Vagabond · 2013-09-24T19:39:09Z

The tests fail for me, presumably because of changes to cluster metadata:

=ERROR REPORT==== 24-Sep-2013::15:38:43 ===
** Generic server riak_core_metadata_manager terminating
** Last message in was {put,{{riak_kv,mutators},list},
                            [],#Fun<riak_kv_mutator.0.123063518>}
** When Server state == {state,nonode@nohost,"test_data",876565}
** Reason for termination ==
** {noproc,{gen_server,call,
                       [riak_core_metadata_hashtree,
                        {insert,{{riak_kv,mutators},list},
                                <<0,170,139,16,5,28,164,71,218,221,64,20,238,
                                  83,39,191,48,160,195,137>>,
                                false},
                        infinity]}}

Vagabond · 2013-09-24T20:03:04Z

I'm mostly OK with this change, although I think @jtuple was also going to look at it.

To achieve reduced replications, a mutation of the object on put and get needs to be applied, and needs to be loosely coupled to the systems that will be doing the mutating (repl). This module provides a place to register and run mutators on puts and gets.

Rather than try to manage the list itself, it leans heavily on the metadata system to manage differences and storage of the data.

This means making mutators is safer as they are expected to be symmetrical. A get mutator is only called if the object in question was put through the corresponding put mutator.

The get in kv_vnode does not have the bucket properties available, so to avoid additional slow-down, gets no longer require bucket properties.

Reliable tests remain reliable.

Assuming the mutator module is correct, mutations on get are applied in the reverse order from put. A mutator module is "correct" if its get is the inverse function of its put. All the data a mutator needs to reverse a given mutation must be stored in the object itself.
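The ordering rule can be sketched in a few lines. This is an illustration only: modelling each mutator as a {PutFun, GetFun} pair is an assumption made for the example and is not the callback interface in riak_kv_mutator.

```erlang
-module(mutator_order_sketch).
-export([put_chain/2, get_chain/2, demo/0]).

%% Each mutator is modelled as a hypothetical {PutFun, GetFun} pair,
%% where GetFun is the inverse of PutFun.

%% Apply the put side of every mutator, first to last.
put_chain(Mutators, Value) ->
    lists:foldl(fun({Put, _Get}, Acc) -> Put(Acc) end, Value, Mutators).

%% Apply the get side of every mutator, last to first.
get_chain(Mutators, Stored) ->
    lists:foldr(fun({_Put, Get}, Acc) -> Get(Acc) end, Stored, Mutators).

demo() ->
    Wrap = {fun(V) -> {wrapped, V} end, fun({wrapped, V}) -> V end},
    Double = {fun(V) -> [V, V] end, fun([V, V]) -> V end},
    Mutators = [Wrap, Double],
    Stored = put_chain(Mutators, 1),
    Original = get_chain(Mutators, Stored),
    {Stored, Original}.
```

If get_chain/2 ran the get functions in put order instead, Wrap's get function would be handed a list and crash with a function_clause error, which is why the reverse order matters.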

Currently the default is 0. There's no restriction on what a priority can be, since all Erlang terms can be compared with < and >.
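A small sketch of what mixed-type priorities imply, assuming entries are kept as {Priority, Module} tuples and ordered by the standard Erlang term order. The module names are made up for illustration.

```erlang
-module(priority_order_sketch).
-export([demo/0]).

%% Priorities of mixed types; mod_a through mod_d are hypothetical modules.
demo() ->
    Registered = [{high, mod_a}, {0, mod_b}, {-1, mod_c}, {1.5, mod_d}],
    %% The standard term order places all numbers before all atoms,
    %% so the atom priority sorts last.
    lists:sort(Registered).
```

Because any term is accepted, an atom such as high always sorts after every numeric priority, which may or may not be what the person registering the mutator intended.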

Callback modules can now be set to only worry about a given meta/value pair at a time, but also get the full object. Put can expose some metadata to the putting client that will be different from the stored data.

It became necessary for a mutator to know which specific meta/value pair it was working on. Rather than extend the function, I've opted to just give the callback the object directly.

Holdover from when the eunit tests were at the bottom of the module. As they no longer are, this should have been removed.

More than what, but some why too!

Tests should pass without warnings now.

lordnull · 2013-09-26T20:06:49Z

Eunit tests should pass now.

Also, there is now a riak_test pull request that enhances the tests further; it attempts to stress the interaction between aae and the replication mutator: basho/riak_test#392

engelsanchez · 2013-10-03T14:00:31Z

src/riak_kv_mutator.erl

+%% data storage backend is not meant to be the object given. An
+%% example would be storing only meta data for an object on a remote
+%% cluster. This module is an interface to register mutators that will
+%% can be run.


I can haz this sentence fixed next time you're in the file?

engelsanchez · 2013-10-03T15:35:34Z

src/riak_kv_mutator.erl

+    % {Meta, Value, Exposed} values, the get callback gets away with just
+    % giving the object? This is to avoid complicated interaction with
+    % notfound return.
+    case Mutator:mutate_get(Object) of


Notice that this makes it possible to invalidate all the data in the cluster if a mutator stored in object metadata is removed from the code. Users of this feature will have to be careful about downgrade scenarios where data is written with a mutator in the list that then disappears on downgrade.

Do we just leave this to documentation, or should there be some insurance put in the code here?

A note in the documentation would be enough for now. It's good to flag the worst-case scenarios.

Added a similar warning in 4 places concerning the same thing. If an object is stored using a given mutator, that same mutator is used when the object is retrieved. If that code isn't available or has changed in a backwards-incompatible way, one can expect to have a bad time.

engelsanchez · 2013-10-03T17:57:18Z

src/riak_kv_mutator.erl

+    end,
+    ModulesAndPriors = riak_core_metadata:get({riak_kv, mutators}, list, [{default, []}, {resolver, Resolver}]),
+    Flipped = [{P, M} || {M, P} <- ModulesAndPriors],
+    Sorted = lists:sort(Flipped),


A sort in the critical path could become an issue with multiple mutators. I know we are not planning to add those soon, but it seems that the resolution and merge functions could take care of this directly and store values in sorted order, no?

engelsanchez · 2013-10-03T19:30:45Z

While writing my own test mutators I was surprised by how they can be called twice on the same object (once in the local put, and the result of that is passed to the other vnodes in the preflist). Unless the application of the mutator chain is completely idempotent, it will be very hard to reason about. I had tried to do a simple "make value a list of two values" and a "make a list a tuple" mutator and put them in a chain, but the results end up just crazy. The first one went from a value of 1 to {1, 1}, whereas the replicas ended up with {{1,1}, {1,1}}. I'm leaving this here as a note of the headache that it could be to use this feature in a truly generic fashion.
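A minimal sketch of that experiment, assuming the chain is simply run once on the coordinating vnode and again on each replica. The function names are hypothetical and the mutators are reduced to plain functions on the value.

```erlang
-module(double_apply_sketch).
-export([chain/1, demo/0]).

%% Toy put mutator 1: make the value a list of two values.
to_pair_list(V) -> [V, V].

%% Toy put mutator 2: make a list a tuple.
to_tuple(L) when is_list(L) -> list_to_tuple(L).

%% The put chain: mutator 1, then mutator 2.
chain(V) -> to_tuple(to_pair_list(V)).

demo() ->
    %% First application, on the local put.
    Local = chain(1),
    %% Second application, when the already-mutated value reaches a replica.
    Replica = chain(Local),
    {Local, Replica}.
```

The two results differ because chain/1 is not idempotent: running it on its own output keeps nesting the value.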

engelsanchez · 2013-10-03T19:35:25Z

I'm going to +1 this PR with the following caveats:

- We need to make sure we revisit and review performance before release. Part of my concerns will probably be addressed in the cluster metadata code by removing gen_server calls. The sort on get should be targeted for removal if resolution can create sorted lists without issue.
- I have my doubts that the code will survive as is when a second use case arrives. Of course, it's impossible to foresee what that may be.

The code as is should be enough to support the Repl use case.

Instead of storing the list of mutators as {mutator, priority} and sorting it on every retrieval, the mutators are now stored as {Priority, Module}, so a retrieval only has to strip the priority off each entry (faster than a sort). This means a sort is done only in the following cases:

- Registering or unregistering a module. There's a check for whether the module already exists, and it's overwritten if it does.
- There are sibling values in metadata. A functionally equal merge is done on those values. Hopefully that is a short-lived situation.
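The storage change described above can be sketched with ordsets. This is an assumption-laden illustration, not the code in the PR: the function names are hypothetical, and the cluster metadata calls and sibling merge are left out.

```erlang
-module(sorted_mutators_sketch).
-export([add_mutator/3, remove_mutator/2, modules/1, demo/0]).

%% Entries are {Priority, Module} tuples kept in an ordset, so the list
%% is already sorted by priority whenever it is read.

%% Register a module, replacing any existing entry for it.
add_mutator(Module, Priority, Mutators) ->
    ordsets:add_element({Priority, Module}, remove_mutator(Module, Mutators)).

%% Drop a module whatever its priority; filtering preserves the order.
remove_mutator(Module, Mutators) ->
    [Entry || {_P, M} = Entry <- Mutators, M =/= Module].

%% The read path: strip the priorities, no sort needed.
modules(Mutators) ->
    [M || {_P, M} <- Mutators].

demo() ->
    L0 = ordsets:new(),
    L1 = add_mutator(mod_b, 5, L0),
    L2 = add_mutator(mod_a, 0, L1),
    %% Re-registering mod_b with a lower priority moves it to the front.
    L3 = add_mutator(mod_b, -1, L2),
    modules(L3).
```

The cost of ordering moves to registration time, which should be rare compared with gets and puts.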

engelsanchez · 2013-10-03T21:16:34Z

src/riak_kv_mutator.erl

+        {P1, _} when P1 < Priority ->
+            Acc;
+        Else ->
+            ordsets:all_element(Mutator, ordsets:del_element(Else))


I haven't even run it, but you surely meant add_element here, not all_element.

engelsanchez · 2013-10-03T22:19:57Z

src/riak_kv_mutator.erl

+-spec get() -> [atom()].
+get() ->
+    Resolver = fun
+        ('$deleted', '$deleted') ->


A note for @jrwest: it seems that we should have the metadata tombstone in a public header or is there a way for resolvers not to use this atom directly?

hrm thats a good point. we certainly want to expose tombstones to resolvers (or at least have the option to) but leaking this is a bit nasty. a macro would probably be quickest, but i wonder if there is a better way...

for this code specifically, does it even call riak_core_metadata:delete? If not, don't really need to have these here.

Well no, but I think it's worth handling anyway for our convenience at the very least. I used delete while testing it when the format was changed, for example.

engelsanchez · 2013-10-03T22:24:00Z

I've looked over the sort change and tested manually. After the one issue I found was resolved, it looks good. Re +1 on this.

russelldb · 2014-01-03T15:59:52Z

src/riak_kv_mutator.erl

+%% a crash.
+-spec mutate_get(Object :: riak_object:riak_object()) -> riak_object:riak_object().
+mutate_get(Object) ->
+    [Meta | _] = riak_object:get_metadatas(Object),


Is this right? What about siblings with no mutators in their metadata being higher up the list?

buddhisthead mentioned this pull request Sep 20, 2013

Riak CS bucket replication controls basho/riak_repl#223

Closed

4 tasks

Vagabond reviewed Sep 23, 2013

lordnull added 22 commits September 25, 2013 15:25

Refactored to use the new metadata system.

a62a352

Rather than try to manage the list itself, it leans heavily on the metadata system to manage differences and storage of the data.

Added test for when no mutators ever registered.

e3832e5

Made get mutation dependant on a put mutation.

6369d64

This means making mutators is safer as they are expected to be symmetrical. A get mutator is only called if the object in question was put through the corresponding put mutator.

Gets mutators no longer get bucket properties.

4a23881

The get in kv_vnode does not have the bucket properties available, so to avoid additional slow-down, gets no longer require bucket properties.

Hooked meta data into get/put path and tests.

087bd33

Reliable tests remain reliable.

added priorities to mutators.

1580dfa

Currently the default is 0. There's no restriction on what a priority can be, since all Erlang terms can be compared with < and >.

Fixed typo

9fb7fa5

Refactored / changed mutator to be more resilient.

fc4ca3c

Callback modules can now be set to only worry about a given meta/value pair at a time, but also get the full object. Put can expose some metadata to the putting client that will be different from the stored data.

Fixed vnode call to mutator system.

22b9ed8

refactored get mutation again.

548974b

It became necessary for a mutator to know which specific meta/value pair it was working on. Rather than extend the function, I've opted to just give the callback the object directly.

Get mutators can now (legitimately) return notfound

b9c9a7f

Allowed mutators to use notfound as a response on get.

3355c87

Much documentation for mutation.

727a8b2

Fixed resolver for mutators.

f89adf0

Removed eunit test hrl inclusion

f93f53d

Holdover from when the eunit tests were at the bottom of the module. As they no longer are, this should have been removed.

Added comment about conflict resultion on mutators

4255be2

Fixed typo.

d072919

Added comments and documentation

7595fa9

More than what, but some why too!

Added capability declaration

ac0795a

Fixed setup/teardown to reflect metadata requirements

95c67c4

Tests should pass without warnings now.

engelsanchez reviewed Oct 3, 2013

Fixed grammatical errors in docs.

6f26412

engelsanchez reviewed Oct 3, 2013

Fixed typo

4a45469

engelsanchez reviewed Oct 3, 2013

lordnull added a commit that referenced this pull request Oct 4, 2013

Merge pull request #661 from basho/mw-reduced-repl

ffb48c8

Added Mutators to support changing object on read/write

lordnull merged commit ffb48c8 into develop Oct 4, 2013

russelldb reviewed Jan 3, 2014

seancribbs deleted the mw-reduced-repl branch April 1, 2015 23:34
