fix(storagenode): accept SyncInit sent from trimmed source to new destination #470

Merged
ijsong merged 2 commits into main from fix_syncinit_from_trimmed_source
Jun 19, 2023

Conversation

ijsong
Member

@ijsong ijsong commented Jun 8, 2023

What this PR does

Storage nodes can be trimmed and synchronized. However, there is a bug: a new destination replica
that has just joined the log stream rejects the SyncInit RPC sent from a trimmed source replica.
Both replicas are empty and have no log entries, but the source replica still has a commit context
indicating the last committed LLSN. In this situation, the destination replica must accept SyncInit
so that it can receive the commit context from the source replica, but it does not.

This PR fixes that issue by changing the condition the destination replica uses to decide
whether the two replicas are already synchronized.

    // Previous code: https://github.com/kakao/varlog/blob/5269481c0e80c2eebf8214116a2d1544a26cb443/internal/storagenode/logstream/sync.go#L297-L302
    //
    // NOTE: When the replica has all log entries, it returns its range of logs and non-error results.
    // In this case, this replica remains executorStateSealing.
    // Breaking change: previously it returns ErrExist when the replica has all log entries to replicate.
    if dstLastCommittedLLSN == srcRange.LastLLSN && !invalid {
        return snpb.SyncRange{}, status.Errorf(codes.AlreadyExists, "already synchronized")
    }

Since both replicas have no log entries, the condition `dstLastCommittedLLSN == srcRange.LastLLSN`
alone is not enough: it can hold for two empty replicas even though the destination still lacks the
commit context. This PR changes the condition to
`dstLastCommittedLLSN == srcLastCommittedLLSN && dstLastCommittedLLSN == srcRange.LastLLSN`.
Because `srcLastCommittedLLSN` is valid regardless of whether the source replica holds any log
entries, the destination replica now accepts the SyncInit.
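
The difference between the two conditions can be illustrated with a small standalone sketch. This is not the code in `sync.go`: the `LLSN` type, the two helper functions, and the concrete values (an empty range reported as `LastLLSN == 0`, a commit context at LLSN 10) are assumptions made for illustration, and the `!invalid` guard is assumed to carry over unchanged from the previous code.

```go
package main

import "fmt"

// LLSN is a hypothetical stand-in for varlog's local log sequence number type.
type LLSN uint64

// oldAlreadySynchronized mirrors the previous condition quoted above:
// only the last LLSN of the source's log range is compared.
func oldAlreadySynchronized(dstLastCommittedLLSN, srcRangeLastLLSN LLSN, invalid bool) bool {
	return dstLastCommittedLLSN == srcRangeLastLLSN && !invalid
}

// newAlreadySynchronized sketches the revised condition described in this PR:
// the source's last committed LLSN must also match.
// Assumption: the !invalid guard is kept as in the previous code.
func newAlreadySynchronized(dstLastCommittedLLSN, srcLastCommittedLLSN, srcRangeLastLLSN LLSN, invalid bool) bool {
	return dstLastCommittedLLSN == srcLastCommittedLLSN &&
		dstLastCommittedLLSN == srcRangeLastLLSN &&
		!invalid
}

func main() {
	// Assumed scenario: the source was trimmed, so its log range is empty
	// (LastLLSN 0), but its commit context still records LLSN 10.
	// The destination is brand new: no entries, last committed LLSN 0.
	var (
		dstLastCommitted LLSN = 0
		srcLastCommitted LLSN = 10
		srcRangeLast     LLSN = 0
	)

	// Old check: reports "already synchronized", so SyncInit is rejected.
	fmt.Println("old:", oldAlreadySynchronized(dstLastCommitted, srcRangeLast, false))

	// New check: not synchronized, so SyncInit is accepted and the
	// destination can receive the commit context.
	fmt.Println("new:", newAlreadySynchronized(dstLastCommitted, srcLastCommitted, srcRangeLast, false))
}
```

Under these assumed values the old check returns `true` and the new check returns `false`, which matches the behavior change described above.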

Which issue(s) this PR resolves

Resolves #478

@codecov-commenter

codecov-commenter commented Jun 8, 2023

Codecov Report

Patch coverage: 48.38% and project coverage change: +0.04 🎉

Comparison is base (6db401a) 62.30% compared to head (110ef19) 62.35%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #470      +/-   ##
==========================================
+ Coverage   62.30%   62.35%   +0.04%     
==========================================
  Files         133      133              
  Lines       18415    18434      +19     
==========================================
+ Hits        11473    11494      +21     
- Misses       6363     6369       +6     
+ Partials      579      571       -8     
Impacted Files Coverage Δ
internal/storagenode/logstream/testing.go 54.38% <0.00%> (-6.40%) ⬇️
internal/storagenode/replication_server.go 73.94% <0.00%> (-4.87%) ⬇️
internal/storagenode/logstream/sync.go 56.98% <55.55%> (-0.42%) ⬇️

... and 3 files with indirect coverage changes


@ijsong ijsong self-assigned this Jun 9, 2023
@ijsong ijsong force-pushed the fix_syncinit_from_trimmed_source branch from 580fa72 to ecb0a2c June 13, 2023 01:30
@ijsong ijsong marked this pull request as ready for review June 13, 2023 01:30
@ijsong ijsong requested a review from hungryjang as a code owner June 13, 2023 01:30
@ijsong ijsong force-pushed the fix_syncinit_from_trimmed_source branch from ecb0a2c to 5b24f99 June 17, 2023 03:35
@ijsong
Member Author

ijsong commented Jun 19, 2023

@hungryjang, I added a follow-up commit, 110ef19. It makes the check in SyncInit for whether synchronization is needed more explicit.

@ijsong ijsong merged commit 66664a6 into main Jun 19, 2023
@ijsong ijsong deleted the fix_syncinit_from_trimmed_source branch June 19, 2023 06:23
Development

Successfully merging this pull request may close these issues.

storagenode: unable to sync from trimmed source replica to a new replica
3 participants