-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storagenode: unable to sync from trimmed source replica to a new replica #478
Labels
bug
Something isn't working
Comments
ijsong
added a commit
that referenced
this issue
Jun 13, 2023
…tination Storage nodes can be trimmed and synchronized. However, there is some bug in that a new destination replica joined into the log stream rejects SyncInit RPC sent from the trimmed source replica. Those replicas are all empty and have no log entries; however, the source replica has a commit context indicating the last committed LLSN. In this situation, the destination replica must accept SyncInit to receive the commit context from the source replica, but it does not. This PR fixes the above issue. To solve the problem, it changes the condition that the destination replica decides whether they are already synchronized. ```go // Previous code: https://github.com/kakao/varlog/blob/5269481c0e80c2eebf8214116a2d1544a26cb443/internal/storagenode/logstream/sync.go#L297-L302 // // NOTE: When the replica has all log entries, it returns its range of logs and non-error results. // In this case, this replica remains executorStateSealing. // Breaking change: previously it returns ErrExist when the replica has all log entries to replicate. if dstLastCommittedLLSN == srcRange.LastLLSN && !invalid { return snpb.SyncRange{}, status.Errorf(codes.AlreadyExists, "already synchronized") } ``` Since both replicas have no log entries, the condition `dstLastCommittedLLSN == srcRange.LastLLSN` is not enough. This PR changed the condition to be `dstLastCommittedLLSN == srcLastCommittedLLSN && dstLastCommittedLLSN == srcRange.LastLLSN`. Since the `srcLastCommittedLLSN` is valid regardless of log entries in the source replica, the destination replica will accept the SyncInit. Resolve #478
ijsong
added a commit
that referenced
this issue
Jun 17, 2023
…tination Storage nodes can be trimmed and synchronized. However, there is some bug in that a new destination replica joined into the log stream rejects SyncInit RPC sent from the trimmed source replica. Those replicas are all empty and have no log entries; however, the source replica has a commit context indicating the last committed LLSN. In this situation, the destination replica must accept SyncInit to receive the commit context from the source replica, but it does not. This PR fixes the above issue. To solve the problem, it changes the condition that the destination replica decides whether they are already synchronized. ```go // Previous code: https://github.com/kakao/varlog/blob/5269481c0e80c2eebf8214116a2d1544a26cb443/internal/storagenode/logstream/sync.go#L297-L302 // // NOTE: When the replica has all log entries, it returns its range of logs and non-error results. // In this case, this replica remains executorStateSealing. // Breaking change: previously it returns ErrExist when the replica has all log entries to replicate. if dstLastCommittedLLSN == srcRange.LastLLSN && !invalid { return snpb.SyncRange{}, status.Errorf(codes.AlreadyExists, "already synchronized") } ``` Since both replicas have no log entries, the condition `dstLastCommittedLLSN == srcRange.LastLLSN` is not enough. This PR changed the condition to be `dstLastCommittedLLSN == srcLastCommittedLLSN && dstLastCommittedLLSN == srcRange.LastLLSN`. Since the `srcLastCommittedLLSN` is valid regardless of log entries in the source replica, the destination replica will accept the SyncInit. Resolve #478
ijsong
added a commit
that referenced
this issue
Jun 19, 2023
…tination Storage nodes can be trimmed and synchronized. However, there is some bug in that a new destination replica joined into the log stream rejects SyncInit RPC sent from the trimmed source replica. Those replicas are all empty and have no log entries; however, the source replica has a commit context indicating the last committed LLSN. In this situation, the destination replica must accept SyncInit to receive the commit context from the source replica, but it does not. This PR fixes the above issue. To solve the problem, it changes the condition that the destination replica decides whether they are already synchronized. ```go // Previous code: https://github.com/kakao/varlog/blob/5269481c0e80c2eebf8214116a2d1544a26cb443/internal/storagenode/logstream/sync.go#L297-L302 // // NOTE: When the replica has all log entries, it returns its range of logs and non-error results. // In this case, this replica remains executorStateSealing. // Breaking change: previously it returns ErrExist when the replica has all log entries to replicate. if dstLastCommittedLLSN == srcRange.LastLLSN && !invalid { return snpb.SyncRange{}, status.Errorf(codes.AlreadyExists, "already synchronized") } ``` Since both replicas have no log entries, the condition `dstLastCommittedLLSN == srcRange.LastLLSN` is not enough. This PR changed the condition to be `dstLastCommittedLLSN == srcLastCommittedLLSN && dstLastCommittedLLSN == srcRange.LastLLSN`. Since the `srcLastCommittedLLSN` is valid regardless of log entries in the source replica, the destination replica will accept the SyncInit. Resolve #478
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Current Behavior
Log stream replica whose log entries are trimmed cannot call SyncInit to the new replica joined into the log stream.
Expected Behavior
The new log stream replica should accept SyncInit to receive the commit context from the destination replica. It will make the log stream's state SEALED, which can transform writable.
Steps To Reproduce
The log stream could not unseal because it failed to synchronize from the source replica to the destination one joined into the log stream. Although there is no log entry to copy because of the Trim operation, the CommitContext must be copied. However, the destination replica rejects SyncInit since it makes a wrong decision due to no log entry to copy.
Environment
The text was updated successfully, but these errors were encountered: