Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backfill sync from an anchor checkpoint state #3384

Merged
merged 32 commits into from
Jan 6, 2022

Conversation

g11tech
Copy link
Contributor

@g11tech g11tech commented Oct 21, 2021

Motivation
This PR is to backfill in a scenario when wss checkpoint or any other anchor state is used to start the beacon node.

Description
Right now, this PR rebases #2637 on current master, making it workable base to build on top of the mentioned previous PR

Fixes:

  • Don't reject the entire syncrange sequence if a block in mid is missing making it a non linear
  • Fixe the scenario where the syncrange keeps on resulting in a non anchored block at the end, i.e. end block is orphaned, and the previous head could be multi blocks behind, likely scenario in reorgs. Resolution, load the previous head in a solo request to set it as a new anchor
  • before rangeSync or sync from outside, check the DB as DB could have had blocks saved from previous run, no need to verify their signatures and blocks in db should already be verified.
  • Do a checkpoint check if a wss checkpoint has been provided which was in the past from the initialized beacon state (from DB because of a previous run). In case not provided compare against genesis when at genesis and panic if not same.
    Created a backfilledSequences repository module, which tracks backfilled sequences like 384:key=> 211:value i.e. backfilled from slot 384 to 211 both including, in normal operation will typically store two sequences, one used by the backfill process and one used by the forward moving chain (which will upon finalizing will write a new sequence finalizedSlot=>chainStarted/anchorSlot). Any gaps in the db because of restarting with checkpoint sync will result into a new sequence entry, but the previous entry will be eliminated when the backfill syncs up the gap
  • Strategy to skip already filled sequence (needs a strategy to track gaps or filled sequences)
  • InvalidWSCheckpoint error (investigate: weak subjectivity checkpoint improper use #3385)
  • Schedule the processing/verification when the chain is supposedly free throttling chain by BATCH_SIZE in a single trigger cycle (in db reads or syncRange fetches)
  • grafana dashboard

skipBackFillSync option?
backfillTill: options i) till wsCheckpoint to be validated, ii)till GENESIS_SLOT (default) iii)till provided Slot

Tested Scenarios:

  • sync test case: sync/wss.test.ts
    image
  • prater backfill sync to genesis (despite multiple restarts interrupting the sync, recovering using backfilled Sequences)
    image
  • stopping and starting multiple times creating gaps in the back filled sequences, backfiled sync filled the gaps and combined them with backfilled sequences to avoid redoing previous backfill sync.
    example of backfillSync recovering on a restart using previous backfilled sequence (jumped back from 1633696 to 1437564, and then continuing backfilling from there on:
    image
  • recovering from a persistent non anchored sequence in range sync (because of the orphaned block being served at the edge of range sync)
    image
  • recovering the backfilled to genesis state on a fully backfilled node restart
    image
  • stopped a fully backfilled node and restarted after a gap with new recent checkpoint - backfilled from the new recent checkpoint to previous finalized checkpoint when the node was stopped, and then correlated with already backfilled sequence till the GENESIS to switch to backfilled complete state
    image

Closes #2556

@codeclimate
Copy link

codeclimate bot commented Oct 21, 2021

Code Climate has analyzed commit f9c977a and detected 12 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 12

View more on Code Climate.

@g11tech g11tech changed the title Rebased mpetrunic's PR #2637 with fixes on current master backfill sync from an anchored state Oct 21, 2021
@codecov
Copy link

codecov bot commented Oct 21, 2021

Codecov Report

Merging #3384 (268f6cc) into master (f49aa8e) will decrease coverage by 0.07%.
The diff coverage is n/a.

❗ Current head 268f6cc differs from pull request most recent head 42a0a9b. Consider uploading reports for the commit 42a0a9b to get more accurate results

@@            Coverage Diff             @@
##           master    #3384      +/-   ##
==========================================
- Coverage   37.47%   37.40%   -0.08%     
==========================================
  Files         311      311              
  Lines        8291     8318      +27     
  Branches     1282     1288       +6     
==========================================
+ Hits         3107     3111       +4     
- Misses       5036     5059      +23     
  Partials      148      148              

@g11tech g11tech changed the title backfill sync from an anchored state backfill sync from an anchor checkpoint state Oct 21, 2021
@g11tech g11tech force-pushed the mbackfill branch 3 times, most recently from f30c7e5 to 1afa813 Compare November 5, 2021 11:38
@g11tech
Copy link
Contributor Author

g11tech commented Nov 5, 2021

example of backfillSync recovering on a restart using previous backfilled sequence (jumped back from 1633696 to 1437564, and then continuing backfilling from there on (updated the scenario in the Tested Scenario section)

@dapplion dapplion mentioned this pull request Nov 9, 2021
packages/lodestar/src/node/notifier.ts Outdated Show resolved Hide resolved
packages/lodestar/src/node/notifier.ts Outdated Show resolved Hide resolved
packages/lodestar/src/sync/backfill/backfill.ts Outdated Show resolved Hide resolved
packages/lodestar/src/sync/backfill/backfill.ts Outdated Show resolved Hide resolved
packages/lodestar/src/sync/backfill/errors.ts Outdated Show resolved Hide resolved
packages/lodestar/src/sync/backfill/verify.ts Outdated Show resolved Hide resolved
packages/lodestar/src/sync/backfill/verify.ts Outdated Show resolved Hide resolved
packages/lodestar/src/sync/sync.ts Outdated Show resolved Hide resolved
@dapplion
Copy link
Contributor

@g11tech Remember to mark as "Ready for review" when appropiate! Draft PRs should be a transient state for unfinished work

@g11tech
Copy link
Contributor Author

g11tech commented Nov 30, 2021

Graphana dashboard:
image
analysis:

  1. on starting beacon node with weakSubjectivitySync latest, the backfill sync started (1st graph),
  2. backfillTillSlot (2nd graph) started decreasing via syncing blocks from the network using rangesync (blue line in 3rd graph),
  3. till around 17:50 time mark, where backfill sync encountered a previous backfilled saved sequence (from the previous run few days ago) all the way to genesis (yellow spike in 3rd graph).
  4. then backfill sync status turns to completed, backfillTillSlot goes to 0 and backfill sync exits.

4rth graph is the error rate.

@g11tech g11tech marked this pull request as ready for review November 30, 2021 11:10
@dapplion
Copy link
Contributor

This looks super good! Thank you so much for the detailed charts and progression ❤️ Will do another review in depth asap and should be good to go.

@g11tech
Copy link
Contributor Author

g11tech commented Nov 30, 2021

This looks super good! Thank you so much for the detailed charts and progression heart Will do another review in depth asap and should be good to go.

thank you! 👍

packages/lodestar/src/chain/interface.ts Show resolved Hide resolved
packages/lodestar/src/db/beacon.ts Outdated Show resolved Hide resolved
@g11tech g11tech force-pushed the mbackfill branch 2 times, most recently from ec9b0cf to 5e82cb6 Compare December 7, 2021 08:59
packages/cli/src/cmds/beacon/initBeaconState.ts Outdated Show resolved Hide resolved
@g11tech
Copy link
Contributor Author

g11tech commented Dec 25, 2021

synopsis of a test run with the latest PR code:

image

  1. Started backfill sync around 16:30
  2. Restarted coouple of times between ~17:00-18:30, sync recovered the previous progress using backfillRanges
  3. Synced to genesis at 20:00 but sync aborted with failiure as code was checking signature of genesis block as well
  4. Fixed it around 22:00, and the sync recovered previous progress to genesis which it complete successfully

@dapplion : Kindly review and let know if anything further needs to be fine tuned.

@g11tech
Copy link
Contributor Author

g11tech commented Jan 3, 2022

@dapplion updated the PR with the discussed changes (and rebased with current master). test run where backfill sync recovered previous progress and completed sync to genesis. 🙂

image

Copy link
Contributor

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Looking forward to many more like this in 2022 🎉

@g11tech
Copy link
Contributor Author

g11tech commented Jan 6, 2022

Excellent! Looking forward to many more like this in 2022 tada

Thank you! lodestar rocks! 🚀
latest run on prater
image

@dapplion dapplion dismissed wemeetagain’s stale review January 6, 2022 17:32

Changes addressed

@dapplion dapplion merged commit 0e47cd3 into ChainSafe:master Jan 6, 2022
@g11tech g11tech deleted the mbackfill branch January 6, 2022 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BlocksByRange under WS
3 participants