Initial assumption: until a follower has caught up replaying its local Raft log, it shouldn't serve reads. Until then it should redirect them to one of the other two peers.
This seems to be caused by the follower not resetting safe_time_to_read correctly.
We seem to have been setting safe_time_to_read based only on the flushed SST files, without accounting for committed entries in RocksDB that had not yet been flushed.
However, those committed entries are present in the write-ahead log (WAL) and are replayed on restart. The fix is to set the initial safe_time_to_read based on the applied entries from the WAL in addition to the flushed files.
Summary:
Currently we update the local safe time based on the written SST files, but fail to do so for WAL records. This may cause a follower that has restarted to miss those entries until it reconnects with the leader.
The fix is to update the safe time based on entries in the log that are known to be committed.
Entries in the log that cannot be determined to have committed (because no later log entries tell us that they did) are not counted; the follower has to wait to hear from the leader before serving them.
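The rule in the summary above can be illustrated with a small sketch. This is not the actual YugabyteDB code: `WalEntry`, `initial_safe_time`, and the `committed_index` field are hypothetical names, and the sketch assumes each WAL entry records the highest log index the leader had committed when that entry was written.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WalEntry:
    """Hypothetical WAL record; field names are assumptions for illustration."""
    index: int            # Raft log index of this entry
    hybrid_time: int      # timestamp assigned to the write
    committed_index: int  # highest index known committed when this entry was written


def initial_safe_time(max_flushed_time: int, wal: Iterable[WalEntry]) -> int:
    """Compute the safe time to read after a restart.

    Start from the newest timestamp found in flushed SST files, then advance
    it over every WAL entry that a later entry proves to be committed.
    """
    entries = list(wal)
    # The largest committed index mentioned anywhere in the local log.
    known_committed = max((e.committed_index for e in entries), default=0)

    safe_time = max_flushed_time
    for entry in entries:
        if entry.index <= known_committed:
            safe_time = max(safe_time, entry.hybrid_time)
        # Entries past known_committed are skipped: the follower must hear
        # from the leader before it can serve them.
    return safe_time


wal = [
    WalEntry(index=1, hybrid_time=110, committed_index=0),
    WalEntry(index=2, hybrid_time=120, committed_index=1),
    WalEntry(index=3, hybrid_time=130, committed_index=2),
]
# Entries 1 and 2 are known committed; entry 3 is not yet.
print(initial_safe_time(100, wal))
```

With only the flushed files considered, the safe time in this example would stay at 100, and reads of rows written at times 110 and 120 could come back empty on the restarted follower.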
Test Plan: added unit test
Reviewers: mikhail, kannan, sergei
Reviewed By: sergei
Subscribers: ybase
Differential Revision: https://phabricator.dev.yugabyte.com/D5442
Running the CassandraKeyValue sample app with a read-only load, we see the following error after a node is restarted and rejoins the cluster:
2018-08-30 23:48:40,389 [FATAL|com.yugabyte.sample.apps.CassandraKeyValue|CassandraKeyValue] Read key: key:33191 expected 1 row in result, got 0