This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

ledger-tool: Get shreds from BigTable blocks #35090

Closed
wants to merge 3 commits

Conversation

steviez
Contributor

@steviez steviez commented Feb 5, 2024

Problem

There is often a desire to examine/replay/etc blocks. If the blocks are
very recent, they can often be pulled from an actively running node.
Otherwise, the blocks must be pulled down from the warehouse node
archives. These archives are uploaded on a per-epoch basis, so they are
quite large (hundreds of GB). Even with a good download speed and a
capable machine, it can take several hours to gain access to a block.
And hundreds of GB must be downloaded and expanded even if access to
only a single block is desired.

Summary of Changes

With the addition of Entry data to BigTable, blocks, in the form that
solana-validator and solana-ledger-tool operate with, can be recreated
from BigTable. This change adds a new BigTable command that does just
that: fetch BigTable block data, parse it, and then insert the block
as shreds into a local Blockstore.

Several important callouts:

  • Shreds for some slot S will not have valid shred signatures; instead,
    shreds will be signed with a dummy keypair. This does not prevent other
    solana-ledger-tool commands from making use of these shreds.
  • Entry PoH data does not go back to genesis in BigTable. While this
    data could be extracted and uploaded from existing rocksdb archives,
    I'm not sure if that work is planned. So, this change adds a flag that
    generates mock PoH entries for such blocks. These blocks are replayable
    by passing --skip-poh-verify to solana-ledger-tool commands.
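The mock PoH generation can be pictured with a small sketch. This is not the PR's implementation: real PoH chains SHA-256 hashes and entries carry 32-byte hashes, while here std's DefaultHasher and a u64 stand in to keep the sketch dependency-free; MockEntry, mock_ticks, and the 64/12,500 counts are illustrative placeholders. Ticks built this way will not pass PoH verification, hence the --skip-poh-verify requirement noted above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Mock PoH entry: real entries carry a 32-byte SHA-256 hash; a u64 stands in.
struct MockEntry {
    num_hashes: u64,
    hash: u64,
}

/// Hypothetical sketch of generating mock tick entries for one slot.
/// Real PoH chains SHA-256; DefaultHasher is only a stand-in here.
fn mock_ticks(mut poh: u64, num_ticks: u64, hashes_per_tick: u64) -> Vec<MockEntry> {
    (0..num_ticks)
        .map(|_| {
            // Advance the mock PoH state `hashes_per_tick` times per tick.
            for _ in 0..hashes_per_tick {
                let mut hasher = DefaultHasher::new();
                poh.hash(&mut hasher);
                poh = hasher.finish();
            }
            MockEntry { num_hashes: hashes_per_tick, hash: poh }
        })
        .collect()
}

fn main() {
    // Placeholder values: 64 ticks per slot, 12,500 hashes per tick.
    let ticks = mock_ticks(0, 64, 12_500);
    assert_eq!(ticks.len(), 64);
    assert!(ticks.iter().all(|t| t.num_hashes == 12_500));
    println!("generated {} mock ticks", ticks.len());
}
```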

@steviez steviez changed the title ledger-tool: Get shred from BigTable data ledger-tool: Get shreds from BigTable data Feb 5, 2024
@steviez steviez changed the title ledger-tool: Get shreds from BigTable data ledger-tool: Get shreds from BigTable blocks Feb 5, 2024
Comment on lines +203 to +208
// TODO: parse this from CLI ?
let shred_version = 0;
// TODO: parse from CLI OR extract from genesis
let num_ticks_per_slot = 64;
// TODO: parse from CLI OR extract from Bank; tick rate changed recently
let num_hashes_per_tick = 12500;
Contributor Author

@steviez steviez Feb 6, 2024


Hi @CriesofCarrots - wanted to get some initial thoughts from you on these TODOs (and I guess the PR in general for context ha). Here is my current thinking:

  • shred_version: Make this an optional CLI flag; this isn't super critical IMO so fine to leave as optional
    • Also no place that we can fetch this from (don't think we want to introduce RPC calls into ledger-tool)
  • num_ticks_per_slot: This is currently a fixed value that is available from genesis or a Bank
  • num_hashes_per_tick: Until very recently, this was a fixed value that could be read from genesis. However, this value can now vary with slot so it must be determined from Bank

For num_ticks_per_slot and num_hashes_per_tick, I see two options: 1) require them on the CLI, or 2) read them from a Bank. 1) would be quicker, but more error-prone. 2) will be more correct if the bank is in the same epoch as the desired slot range, but will take more time to execute since the snapshot will have to be unpacked.

I'm leaning towards doing 2) so as not to introduce a foot-gun, but curious for a quick sanity check from you as well.

My thinking is that we'd extract a Bank from snapshot and use Bank helpers to confirm that the desired slot range to create shreds for is the same epoch as the Bank's slot. It could be nice to do this check before unpacking the snapshot to avoid wasted time, but we would need to re-impl some logic to determine epoch from slot ... maybe this wouldn't be so bad
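The pre-unpack check described here is simple arithmetic once warmup is out of the picture (mainnet-beta is long past its warmup period and uses 432,000 slots per epoch). A minimal sketch; epoch_for_slot is a hypothetical helper, not an existing Bank method:

```rust
/// Hypothetical helper: epoch containing `slot`, assuming a fixed
/// `slots_per_epoch` and no warmup period (mainnet-beta uses 432,000).
fn epoch_for_slot(slot: u64, slots_per_epoch: u64) -> u64 {
    slot / slots_per_epoch
}

fn main() {
    let slots_per_epoch = 432_000;
    let bank_slot = 250_000_123; // illustrative slot numbers
    let desired_slot = 250_100_000;
    // Confirm the desired slot shares the Bank's epoch before spending
    // time unpacking a snapshot.
    assert_eq!(
        epoch_for_slot(bank_slot, slots_per_epoch),
        epoch_for_slot(desired_slot, slots_per_epoch),
    );
    println!("epoch {}", epoch_for_slot(desired_slot, slots_per_epoch));
}
```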

Contributor


I lean toward extracting those two fields from a Bank as well. The unfortunate bit is, I guess we would have to error if the starting_slot..ending_slot range extends outside that one epoch, or only shred part of the range.

shred_version... Also no place that we can fetch this from (don't think we want to introduce RPC calls into ledger-tool)

Definitely don't want to have to depend on a running node for anything. If we put more requirements on the snapshot being used (or make greater assumptions), I guess we could actually compute the shred_version from the hard_forks in the Bank, right? Not sure it's worth it, though.
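For reference, the final hash-to-version step can be sketched as below. This is modeled from memory on solana-sdk's shred_version::version_from_hash and may differ in detail from the current implementation; the real compute_shred_version also mixes the Bank's hard_forks into the genesis hash before folding.

```rust
/// Fold a 32-byte hash into a non-zero u16 shred version. Modeled on
/// solana-sdk's `shred_version::version_from_hash`; treat the details
/// as a sketch rather than the canonical implementation.
fn version_from_hash(hash: &[u8; 32]) -> u16 {
    let mut accum = [0u8; 2];
    // XOR all sixteen 2-byte chunks of the hash together.
    for chunk in hash.chunks(2) {
        for (a, b) in accum.iter_mut().zip(chunk) {
            *a ^= *b;
        }
    }
    // A version of 0 reads as "uninitialized" on the wire, so bump it.
    u16::from_be_bytes(accum).saturating_add(1)
}

fn main() {
    // An all-zero hash folds to 0, which is bumped to 1.
    assert_eq!(version_from_hash(&[0u8; 32]), 1);
    // A single set high byte survives the fold: 0x0100 + 1 = 257.
    let mut h = [0u8; 32];
    h[0] = 1;
    assert_eq!(version_from_hash(&h), 257);
}
```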

Contributor Author


lean toward extracting those two fields from a Bank as well

Cool, this seems like the correct answer, but I was feeling a little lazy so wanted a heat check. Let's get it from the bank.

I guess we could actually compute the shred_version from the hard_forks in the Bank, right? Not sure it's worth it, though

Ohh, you might be right! If we can get it from Bank easily, then I'm good with doing it that way. I bet we may have a helper for that somewhere already

@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Feb 22, 2024
@steviez steviez removed the stale [bot only] Added to stale content; results in auto-close after a week. label Feb 22, 2024
@willhickey
Contributor

This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave


3 participants