This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

ledger-tool: Get shreds from BigTable blocks #35090

Closed
wants to merge 3 commits

Conversation

steviez
Contributor

@steviez steviez commented Feb 5, 2024

Problem

There is often a desire to examine/replay/etc blocks. If the blocks are
very recent, they can often be pulled from an actively running node.
Otherwise, the blocks must be pulled down from the warehouse node
archives. These archives are uploaded on a per-epoch basis, so they are
quite large (hundreds of GB). Even with a good download speed and a
capable machine, it can take several hours to gain access to a block.
And hundreds of GB must be downloaded and expanded even if access to
only a single block is desired.

Summary of Changes

With the addition of Entry data to BigTable, blocks, in the form that
solana-validator and solana-ledger-tool operate with, can be recreated
from BigTable. This change adds a new BigTable command that does just
that: fetch BigTable block data, parse it, and then insert the block
as shreds into a local Blockstore.

Several important callouts:

  • Shreds for some slot S will not have valid shred signatures; instead,
    shreds will be signed with a dummy keypair. This does not prevent other
    solana-ledger-tool commands from making use of these shreds.
  • Entry PoH data does not go back to genesis in BigTable. While this
    data could be extracted and uploaded from existing rocksdb archives,
    I'm not sure if that work is planned. So, this change adds a flag that
    generates mock PoH entries for such blocks. These blocks are replayable
    by passing --skip-poh-verify to solana-ledger-tool commands.
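The mock PoH generation can be pictured with a small sketch. This is not the PR's implementation: real PoH chains SHA-256 hashes and entries carry 32-byte hashes, while here std's DefaultHasher and a u64 stand in to keep the sketch dependency-free; MockEntry, mock_ticks, and the 64/12,500 counts are illustrative placeholders. Ticks built this way will not pass PoH verification, hence the --skip-poh-verify requirement noted above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Mock PoH entry: real entries carry a 32-byte SHA-256 hash; a u64 stands in.
struct MockEntry {
    num_hashes: u64,
    hash: u64,
}

/// Hypothetical sketch of generating mock tick entries for one slot.
/// Real PoH chains SHA-256; DefaultHasher is only a stand-in here.
fn mock_ticks(mut poh: u64, num_ticks: u64, hashes_per_tick: u64) -> Vec<MockEntry> {
    (0..num_ticks)
        .map(|_| {
            // Advance the mock PoH state `hashes_per_tick` times per tick.
            for _ in 0..hashes_per_tick {
                let mut hasher = DefaultHasher::new();
                poh.hash(&mut hasher);
                poh = hasher.finish();
            }
            MockEntry { num_hashes: hashes_per_tick, hash: poh }
        })
        .collect()
}

fn main() {
    // Placeholder values: 64 ticks per slot, 12,500 hashes per tick.
    let ticks = mock_ticks(0, 64, 12_500);
    assert_eq!(ticks.len(), 64);
    assert!(ticks.iter().all(|t| t.num_hashes == 12_500));
    println!("generated {} mock ticks", ticks.len());
}
```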

@steviez steviez changed the title ledger-tool: Get shred from BigTable data ledger-tool: Get shreds from BigTable data Feb 5, 2024
@steviez steviez changed the title ledger-tool: Get shreds from BigTable data ledger-tool: Get shreds from BigTable blocks Feb 5, 2024
Comment on lines +203 to +208
// TODO: parse this from CLI ?
let shred_version = 0;
// TODO: parse from CLI OR extract from genesis
let num_ticks_per_slot = 64;
// TODO: parse from CLI OR extract from Bank; tick rate changed recently
let num_hashes_per_tick = 12500;
Contributor Author

@steviez steviez Feb 6, 2024


Hi @CriesofCarrots - wanted to get some initial thoughts from you on these TODOs (and I guess the PR in general for context ha). Here is my current thinking:

  • shred_version: Make this an optional CLI flag; this isn't super critical IMO so fine to leave as optional
    • Also no place that we can fetch this from (don't think we want to introduce RPC calls into ledger-tool)
  • num_ticks_per_slot: This is currently a fixed value that is available from genesis or a Bank
  • num_hashes_per_tick: Until very recently, this was a fixed value that could be read from genesis. However, this value can now vary with slot so it must be determined from Bank

For num_ticks_per_slot and num_hashes_per_tick, I see two options: 1) require them on the CLI, or 2) read them from a Bank. 1) would be quicker, but more error-prone. 2) will be more correct if the bank is in the same epoch as the desired slot range, but will take more time to execute since the snapshot will have to be unpacked.

I'm leaning towards doing 2) so as not to introduce a foot-gun, but curious for a quick sanity check from you as well.

My thinking is that we'd extract a Bank from snapshot and use Bank helpers to confirm that the desired slot range to create shreds for is the same epoch as the Bank's slot. It could be nice to do this check before unpacking the snapshot to avoid wasted time, but we would need to re-impl some logic to determine epoch from slot ... maybe this wouldn't be so bad
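The pre-unpack check described here is simple arithmetic once warmup is out of the picture (mainnet-beta is long past its warmup period and uses 432,000 slots per epoch). A minimal sketch; epoch_for_slot is a hypothetical helper, not an existing Bank method:

```rust
/// Hypothetical helper: epoch containing `slot`, assuming a fixed
/// `slots_per_epoch` and no warmup period (mainnet-beta uses 432,000).
fn epoch_for_slot(slot: u64, slots_per_epoch: u64) -> u64 {
    slot / slots_per_epoch
}

fn main() {
    let slots_per_epoch = 432_000;
    let bank_slot = 250_000_123; // illustrative slot numbers
    let desired_slot = 250_100_000;
    // Confirm the desired slot shares the Bank's epoch before spending
    // time unpacking a snapshot.
    assert_eq!(
        epoch_for_slot(bank_slot, slots_per_epoch),
        epoch_for_slot(desired_slot, slots_per_epoch),
    );
    println!("epoch {}", epoch_for_slot(desired_slot, slots_per_epoch));
}
```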

Contributor


I lean toward extracting those two fields from a Bank as well. The unfortunate bit is, I guess we would have to error if the starting_slot..ending_slot range extends outside that one epoch, or only shred part of the range.

shred_version... Also no place that we can fetch this from (don't think we want to introduce RPC calls into ledger-tool)

Definitely don't want to have to depend on a running node for anything. If we put more requirements on the snapshot being used (or make greater assumptions), I guess we could actually compute the shred_version from the hard_forks in the Bank, right? Not sure it's worth it, though.
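For reference, the final hash-to-version step can be sketched as below. This is modeled from memory on solana-sdk's shred_version::version_from_hash and may differ in detail from the current implementation; the real compute_shred_version also mixes the Bank's hard_forks into the genesis hash before folding.

```rust
/// Fold a 32-byte hash into a non-zero u16 shred version. Modeled on
/// solana-sdk's `shred_version::version_from_hash`; treat the details
/// as a sketch rather than the canonical implementation.
fn version_from_hash(hash: &[u8; 32]) -> u16 {
    let mut accum = [0u8; 2];
    // XOR all sixteen 2-byte chunks of the hash together.
    for chunk in hash.chunks(2) {
        for (a, b) in accum.iter_mut().zip(chunk) {
            *a ^= *b;
        }
    }
    // A version of 0 reads as "uninitialized" on the wire, so bump it.
    u16::from_be_bytes(accum).saturating_add(1)
}

fn main() {
    // An all-zero hash folds to 0, which is bumped to 1.
    assert_eq!(version_from_hash(&[0u8; 32]), 1);
    // A single set high byte survives the fold: 0x0100 + 1 = 257.
    let mut h = [0u8; 32];
    h[0] = 1;
    assert_eq!(version_from_hash(&h), 257);
}
```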

Contributor Author


lean toward extracting those two fields from a Bank as well

Cool, this seems like the correct answer, but I was feeling a little lazy so wanted a heat check. Let's get it from the bank.

I guess we could actually compute the shred_version from the hard_forks in the Bank, right? Not sure it's worth it, though

Ohh, you might be right! If we can get it from Bank easily, then I'm good with doing it that way. I bet we may have a helper for that somewhere already

@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Feb 22, 2024
@steviez steviez removed the stale [bot only] Added to stale content; results in auto-close after a week. label Feb 22, 2024
@willhickey
Contributor

This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave


3 participants