CL.ONE (read from followers) serving reads before node is fully caught up #456

rameshpoti · 2018-08-31T00:03:37Z

With the CassandraKeyValue sample app running a read-only load, restarting a node produces the following error when the node rejoins the cluster:

2018-08-30 23:48:40,389 [FATAL|com.yugabyte.sample.apps.CassandraKeyValue|CassandraKeyValue] Read key: key:33191 expected 1 row in result, got 0

Initial assumption: until a follower has caught up by replaying its local Raft log, it shouldn't serve reads. Until then, it should redirect reads to one of the other two peers.
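To make the assumption above concrete, here is a minimal sketch of the routing decision it describes. The `Replica` type, the `caught_up` flag, and `pick_replica_for_read` are all hypothetical names made up for illustration; this is not YugabyteDB code, only a model of "serve locally if caught up, otherwise redirect to a peer".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    # Hypothetical model of a tablet peer; not a YugabyteDB type.
    name: str
    is_leader: bool
    caught_up: bool  # assumed flag: True once the local Raft log has been replayed


def pick_replica_for_read(local: Replica, peers: List[Replica]) -> Replica:
    """Return the replica that should serve a CL.ONE read."""
    # Serve locally only if this replica is the leader or has caught up.
    if local.is_leader or local.caught_up:
        return local
    # Otherwise redirect to the first peer that is able to serve the read.
    for peer in peers:
        if peer.is_leader or peer.caught_up:
            return peer
    raise RuntimeError("no replica is currently able to serve the read")


# A freshly restarted follower redirects; once caught up, it serves locally.
restarted = Replica("node-1", is_leader=False, caught_up=False)
others = [Replica("node-2", is_leader=True, caught_up=True),
          Replica("node-3", is_leader=False, caught_up=True)]
chosen_before = pick_replica_for_read(restarted, others)
restarted.caught_up = True
chosen_after = pick_replica_for_read(restarted, others)
```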

kmuthukk · 2018-08-31T00:11:02Z

Thanks @rameshpoti for reporting this issue. Will keep you posted once the fix is available.

amitanandaiyer · 2018-09-12T18:07:20Z

This seems to be caused by the follower not resetting safe_time_to_read correctly.

We seem to have been setting safe_time_to_read based only on the flushed SST files, without accounting for the committed entries in RocksDB that haven't been flushed.

However, those committed entries are present in the write-ahead log (WAL) and are replayed upon restart. The fix is to update the initial safe_time_to_read based on the applied entries from the WAL in addition to the flushed files.

Summary: Currently we update the local safe time based on the written SST files, but fail to do so for WAL records. This may cause a follower that has restarted to miss those entries until it reconnects with the leader. The fix is to update the safe time based on entries in the log that are known to be committed. For entries in the log that cannot be determined to have committed (because no later log entries tell us that they have), the follower has to wait to hear from the leader before serving them.

Test Plan: added unit test

Reviewers: mikhail, kannan, sergei

Reviewed By: sergei

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5442
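As a rough illustration of the rule in the summary above, the sketch below computes an initial safe time from both the flushed files and the WAL entries known to be committed. Everything here is an assumption made for the example: `WalEntry`, its `committed_index` field, and `initial_safe_time` are hypothetical names, and timestamps are plain integers. It is a simplified model of the idea, not the actual change in D5442.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WalEntry:
    # Hypothetical WAL record; field names are illustrative only.
    index: int            # position of the entry in the Raft log
    hybrid_time: int      # timestamp of the write carried by this entry
    committed_index: int  # highest log index known committed when this entry was written


def initial_safe_time(max_flushed_time: int, wal: Iterable[WalEntry]) -> int:
    """Safe time to read right after a restart, before contacting the leader."""
    entries = list(wal)
    # The log itself tells us which entries committed: later entries carry
    # the commit index that the leader had advertised at that point.
    known_committed = max((e.committed_index for e in entries), default=0)
    safe = max_flushed_time  # old behavior: flushed SST files only
    for e in entries:
        # Only entries known to be committed may advance the safe time;
        # the rest must wait until the follower hears from the leader.
        if e.index <= known_committed:
            safe = max(safe, e.hybrid_time)
    return safe


# Entry 3 is not covered by any later commit index, so it is excluded.
wal = [WalEntry(1, 110, 0), WalEntry(2, 120, 1), WalEntry(3, 130, 2)]
safe_after_restart = initial_safe_time(100, wal)
```

In this model, considering only the flushed files would leave the safe time at 100, while also considering the committed WAL entries advances it to the timestamp of entry 2.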

kmuthukk added the kind/bug label Aug 31, 2018

amitanandaiyer self-assigned this Aug 31, 2018

amitanandaiyer closed this as completed Sep 18, 2018
