Yokozuna loses entries when YZ AAE trees expire, dealing w/ Default Bucket Types [JIRA: RIAK-1674] #481

Closed
kesslerm opened this issue Apr 8, 2015 · 6 comments

@kesslerm

kesslerm commented Apr 8, 2015

When the YZ AAE trees expire, Yokozuna starts to lose entries. The counts are initially identical, but once the YZ AAE trees have been expired, the total number of entries reported by "http://$RIAK_HOST/solr/$INDEX_NAME/select?q=*:*" is lower than the total number of keys reported by key listing.

Steps to reproduce:

  • Set up a 5-node devrel cluster of Riak 2.0.5. Add search = on and
    anti_entropy.tree.build_limit.per_timespan = 5m to riak.conf on each node, then start the nodes and join them into a cluster.
  • Create an index with the default YZ schema and associate it with the test bucket
#!/bin/bash
RIAK_HOST="http://127.0.0.1:10018"

test_results_bucket_props=`curl -s "$RIAK_HOST/buckets/Test_Results/props"`
if [[ $test_results_bucket_props  =~ "index" ]]
then
  echo "Index already exists";
  exit
fi

echo "Creating index"
curl -XPUT "$RIAK_HOST/search/index/Test_Results"
sleep 10

echo "Adding index to bucket"
curl -XPUT -H "Content-Type: application/json" "$RIAK_HOST/buckets/Test_Results/props" -d '{"props":{"search_index":"Test_Results"}}'

test_results_bucket_props=`curl -s "$RIAK_HOST/buckets/Test_Results/props"`
if [[ $test_results_bucket_props  =~ "index" ]]
then
  echo "Index added";
fi
  • Add some test data
#!/bin/bash

for i in {0..4999}
do
    uuid=$(uuidgen)
    echo "${uuid}"

    # Store a small JSON document keyed by the generated UUID.
    curl -XPUT "http://127.0.0.1:10018/buckets/Test_Results/keys/${uuid}" \
        -H "Content-Type: application/json" \
        -d "{\"uuid\": \"${uuid}\", \"date\": \"$(date "+%FT%T.00000:Z")\"}"
done
  • Monitor the number of hits reported by Riak and Yokozuna
#!/bin/bash
echo "solr:"
curl -XGET "http://127.0.0.1:10018/solr/Test_Results/select?wt=json&q=*:*" 2>/dev/null | json_pp | grep numFound
echo "keys in bucket:"
echo $((`curl -XGET "http://127.0.0.1:10018/buckets/Test_Results/keys?keys=true" 2>/dev/null | json_pp | wc -l` - 4))
  • Expire the YZ AAE trees. Run riak attach on one of the nodes and enter the following at the Erlang prompt
rpc:multicall([node() | nodes()], yz_entropy_mgr, expire_trees, []).
  • Check both counts again, repeatedly, until all keys have been repaired (about 1 hour with the settings used)
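The manual check above can be turned into a polling loop that logs both counts over time, which makes the drift during the repair window easier to see. This is a minimal sketch and not part of the original report: the function names, the INDEX and BUCKET variables, and the 60-second default interval are assumptions, and the key count is taken by matching quoted strings in the key-listing response (which only works for keys without embedded quotes, such as the UUIDs used here) rather than with json_pp and wc -l.

```shell
#!/bin/sh
# Sketch: log the Solr hit count and the key-listing count side by side.
# All names below are illustrative assumptions, not taken from the report.

RIAK_HOST="${RIAK_HOST:-http://127.0.0.1:10018}"
INDEX="${INDEX:-Test_Results}"
BUCKET="${BUCKET:-Test_Results}"

# Read a Solr JSON response on stdin and print the value of numFound.
extract_num_found() {
    tr -d ' \n' | sed -n 's/.*"numFound":\([0-9][0-9]*\).*/\1/p'
}

# Read a key-listing response such as {"keys":["a","b"]} on stdin and
# print the number of keys. Every quoted string is counted, then one is
# subtracted for the "keys" field name itself.
count_keys() {
    total=$(grep -o '"[^"]*"' | wc -l | tr -d ' ')
    if [ "$total" -gt 0 ]; then
        total=$((total - 1))
    fi
    echo "$total"
}

# Print one timestamped line per poll until interrupted.
# $1 is the polling interval in seconds (default 60).
monitor_counts() {
    interval="${1:-60}"
    while :; do
        solr=$(curl -s "$RIAK_HOST/solr/$INDEX/select?wt=json&q=*:*" | extract_num_found)
        keys=$(curl -s "$RIAK_HOST/buckets/$BUCKET/keys?keys=true" | count_keys)
        echo "$(date '+%FT%T') solr=$solr keys=$keys"
        sleep "$interval"
    done
}
```

After sourcing the file, running monitor_counts 30 would print one line every 30 seconds. A solr value that drops below keys after the trees are expired corresponds to the behaviour described in this report.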
@Basho-JIRA changed the title from "Yokozuna looses entries when YZ AAE trees expire" to "Yokozuna looses entries when YZ AAE trees expire [JIRA: RIAK-1674]" on Apr 8, 2015
@zeeshanlakhani
Contributor

Thanks for this, @kesslerm. I will start looking into it. Please also post your notes from attempting this with clear_trees (i.e. https://github.com/basho/yokozuna/blob/develop/src/yz_entropy_mgr.erl#L125) instead of expire, as per our chat. Thanks.

@kesslerm
Author

kesslerm commented Apr 9, 2015

@zeeshanlakhani, the behaviour with clear_trees instead of expire is exactly the same, both with the standard AAE settings and with the accelerated anti_entropy.tree.build_limit.per_timespan = 5m.

The number of missing YZ entries rises steadily over the repair period. After the first repair operation, at least one node reports a lower number while at least one other node still reports the original number of entries. Later, all nodes report lower numbers in YZ.

@zeeshanlakhani
Contributor

ok, thanks @kesslerm. And, you've had the same issues w/ clear/expire w/o anti_entropy.tree.build_limit.per_timespan = 5m right? Just want to be sure on that, thanks.

@kesslerm
Author

kesslerm commented Apr 9, 2015

@zeeshanlakhani, yes, absolutely the same behaviour with the default settings and with anti_entropy.tree.build_limit.per_timespan = 5m. Both clear and expire show the issue.

@zeeshanlakhani changed the title from "Yokozuna looses entries when YZ AAE trees expire [JIRA: RIAK-1674]" to "Yokozuna loses entries when YZ AAE trees expire [JIRA: RIAK-1674]" on Apr 14, 2015
@kesslerm
Author

We tracked this down to an incompatibility between the default bucket type and Yokozuna's AAE feature. So far, the issue has not been seen with non-default bucket types. The default bucket type has allow_mult=false and dvv_enabled=false when Riak is started with a default riak.conf file (the legacy settings for these values are enforced via cuttlefish). Manually setting those values on a single bucket under the default bucket type, rather than on the entire bucket type, does not rectify the problem.

At this point it is safest to suggest that Yokozuna with AAE enabled should not be used on the default bucket type unless the properties mentioned are set to the values that all non-legacy bucket types would have. We still need to investigate whether the problem occurs on non-default bucket types when one or both of the properties are changed from their default values.
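Since the problem has not been observed with non-default bucket types, one way to repeat the reproduction outside the default type is to create a dedicated bucket type that carries the search index and to write through the typed URL. The following is only a sketch under stated assumptions: the type name yz_test used in the usage note, the RIAK_ADMIN variable, and both function names are hypothetical, the index is assumed to exist already, and nothing in this thread establishes that this avoids the issue in every configuration.

```shell
#!/bin/sh
# Sketch: use a non-default bucket type for the indexed bucket.
# Names and defaults are illustrative assumptions.

RIAK_HOST="${RIAK_HOST:-http://127.0.0.1:10018}"
# Path to riak-admin; in a devrel this would be a per-node script
# (location assumed, adjust to the local layout).
RIAK_ADMIN="${RIAK_ADMIN:-riak-admin}"

# Build the HTTP URL for a key in a bucket under a named bucket type.
# $1 = bucket type, $2 = bucket, $3 = key
typed_key_url() {
    printf '%s/types/%s/buckets/%s/keys/%s\n' "$RIAK_HOST" "$1" "$2" "$3"
}

# Create and activate a bucket type whose buckets use the given index.
# $1 = bucket type, $2 = existing search index name
create_search_type() {
    "$RIAK_ADMIN" bucket-type create "$1" "{\"props\":{\"search_index\":\"$2\"}}" &&
    "$RIAK_ADMIN" bucket-type activate "$1"
}
```

For example, after create_search_type yz_test Test_Results, the test data loop could PUT to "$(typed_key_url yz_test Test_Results "$uuid")" instead of the default-type /buckets/Test_Results/keys/ URL.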

@zeeshanlakhani changed the title from "Yokozuna loses entries when YZ AAE trees expire [JIRA: RIAK-1674]" to "Yokozuna loses entries when YZ AAE trees expire, dealing w/ Default Bucket Types [JIRA: RIAK-1674]" on Apr 20, 2015
@shino

shino commented Jul 8, 2015

For cross reference: the fix was #486 (if wrong, please correct me)
