-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update upload_confirmed_blocks() return value when no blocks to upload #33861
Conversation
upload_confirmed_blocks() states that it will return the passed in ending_slot when there are no blocks to upload. This is enforced in one early return but not the other. The result is that BigTableUploadService could potentially get stuck in a loop of trying to upload the same slot. While this case seems to be caused when an operator restarts their node with a fetched snapshot that causes a gap, we can still be friendly and allow them to break out of this loop.
Codecov Report
@@ Coverage Diff @@
## master #33861 +/- ##
=========================================
- Coverage 81.9% 81.9% -0.1%
=========================================
Files 809 809
Lines 217717 217717
=========================================
- Hits 178349 178317 -32
- Misses 39368 39400 +32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the right fix to advance past the first slot at the beginning of a gap. I confirmed using a modified solana-ledger-tool bigtable upload
with a local ledger and BigTable emulator.
The modification I had to make to ledger tool points to another change we might want to make. I was thinking of asking for it here, but I think I'll just open my own PR and explain there. :)
@@ -138,7 +138,7 @@ pub async fn upload_confirmed_blocks( | |||
"No blocks between {} and {} need to be uploaded to bigtable", | |||
starting_slot, ending_slot | |||
); | |||
return Ok(last_blockstore_slot); | |||
return Ok(ending_slot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh kind of obvious, given the vars in the message immediately above. Thanks for tracking this down!
What do you think about backports for this one @CriesofCarrots ? It is a bug and fairly self-contained, but also, nodes running |
Yeah, I'm agreed with just v1.17 |
#33861) upload_confirmed_blocks() states that it will return the passed in ending_slot when there are no blocks to upload. This is enforced in one early return but not the other. The result is that BigTableUploadService could potentially get stuck in a loop of trying to upload the same slot. While this case seems to be caused when an operator restarts their node without --no-snapshot-fetch (which can cause a gap in blockstore), we can still be friendly and allow them to break out of this loop. (cherry picked from commit a799a90)
…o upload (backport of solana-labs#33861) (solana-labs#33882)
…o upload (backport of solana-labs#33861) (solana-labs#33882)
Problem
See #33831 (comment) for a detailed walk-through.
Summary of Changes
While this problem of looping on the same slot seems to be caused by an operator restarting their warehouse node without
--no-snapshot-fetch
(which can cause a gap in the blockstore), we can still be friendly and break out of this loopFixes #33831