Asynchronous Block Download and Verification #4365

Merged
merged 9 commits into from
Aug 2, 2024

eval-exec
Collaborator

@eval-exec eval-exec commented Feb 26, 2024

What problem does this PR solve?

In this PR, I have made changes to reduce the synchronization time during CKB's initial block download (IBD) phase. Previously, in the develop branch, the Synchronizer would continuously request blocks within a sliding window from the Tip to Tip+BLOCK_DOWNLOAD_WINDOW. After receiving a block, the Synchronizer would send it to the ChainService and wait for it to be fully processed before proceeding to the next block. Since the Tip would only update after ChainService completely verified a block, the Synchronizer refrained from requesting blocks from other peers during this period. This sequential processing limited the sliding window's progression speed, reducing the Synchronizer's efficiency in block requests and causing delays in feeding blocks to the ChainService, ultimately prolonging the synchronization time.

To address this, this PR decouples the sliding window movement in the Synchronizer from the block verification process in the ChainService, making the two operations asynchronous. Since ChainService can insert blocks much faster than it can verify them, I've introduced an additional UnverifiedTip value in ChainService to represent the head of the continuous stream of stored blocks that are still waiting for verification. The sliding window in the Synchronizer is now based on the UnverifiedTip, allowing it to keep requesting blocks from other peers independently of verification. ChainService continuously follows the UnverifiedTip to verify blocks, so block download is no longer held back by the time-consuming block verification process.
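The effect of moving the window base can be sketched in a few lines of Rust. This is only an illustration of the idea described above, not code from this PR: the `Tips` struct, the helper names, and the window size used here are assumptions made for the example.

```rust
use std::ops::RangeInclusive;

/// Illustrative window size (an assumption); the real constant is defined in the sync crate.
pub const BLOCK_DOWNLOAD_WINDOW: u64 = 1024;

/// Hypothetical snapshot of the two tips tracked during IBD.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tips {
    /// Number of the highest fully verified block.
    pub verified_tip: u64,
    /// Number of the highest stored block that may still await verification.
    pub unverified_tip: u64,
}

/// Block numbers that may be requested from peers for a given window base.
fn download_window(base: u64) -> RangeInclusive<u64> {
    base.saturating_add(1)..=base.saturating_add(BLOCK_DOWNLOAD_WINDOW)
}

/// Old behaviour: the window only moves when a block has been fully verified.
pub fn window_before(tips: Tips) -> RangeInclusive<u64> {
    download_window(tips.verified_tip)
}

/// New behaviour: the window follows the unverified tip, so downloads can run ahead.
pub fn window_after(tips: Tips) -> RangeInclusive<u64> {
    download_window(tips.unverified_tip)
}
```

With a verified tip of 100 and an unverified tip of 500, the old window starts at block 101 while the new one starts at block 501, so the Synchronizer keeps requesting fresh blocks while verification catches up.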

Sequence diagrams describing the IBD phase

The IBD phase on develop branch (before v0.111.0)

sequenceDiagram
  autonumber

  participant S as Synchronizer
  participant BP as BlockProcess
  participant C as ChainService


  box crate:ckb_sync
    participant S
    participant BP
  end


  box crate:ckb_chain
    participant C
  end
  
  Note left of S: synchronizer received <br>Block(122) from remote peer
  
  Note over S: try_process SyncMessageUnionReader::SendBlock


  S->>+BP: BlockProcess::execute(Block(122))
  BP->>+C: process_block(Block(122))
  Note over BP: waiting ChainService to return<br>the result of process_block(Block(122))
  Note over C: non_contextual_verify(Block(122))
  Note over C: contextual_verify(Block(122))  
  Note over C: insert_block(Block(122))
  C->>-BP: return result of process_block(Block(122))
  BP->>-S: return result of BlockProcess::execute(Block(122))
  
  alt block is Valid
    Note over S: going on
  else block is Invalid
    Note over S: punish the malicious peer
  end

  Note left of S: synchronizer received <br>Block(123) from remote peer
  Note over S: try_process SyncMessageUnionReader::SendBlock
  S->>+BP: BlockProcess::execute(Block(123))
  BP->>+C: process_block(Block(123))
  Note over BP: waiting ChainService to return<br>the result of process_block(Block(123))
  Note over C: non_contextual_verify(Block(123))
  Note over C: contextual_verify(Block(123))  
  Note over C: insert_block(Block(123))
  C->>-BP: return result of process_block(Block(123))
  BP->>-S: return result of BlockProcess::execute(Block(123))
  
  alt block is Valid
    Note over S: going on
  else block is Invalid
    Note over S: punish the malicious peer
  end


The IBD phase by this PR:

sequenceDiagram
  autonumber
  participant Sr as Synchronizer::received
  participant BP as BlockProcess
  participant Sp as Synchronizer::poll
  participant C as main thread
  participant PU as PreloadUnverified thread
  participant CV as ConsumeUnverifiedBlocks thread

  box crate:ckb-sync
    participant Sr
    participant Sp
    participant BP
  end

  box crate:ckb-chain
    participant C
    participant PU
    participant CV
  end

  Note left of Sr: synchronizer received <br>Block(122) from remote peer
  Note over Sr: try_process SyncMessageUnionReader::SendBlock
  Sr ->>+ BP: BlockProcess::execute(Block(122))
  BP ->>+ C: asynchronous_process_block(Block(122))
  Note over C: non_contextual_verify(Block(122))
  Note over C: insert_block(Block(122))
  Note over C: OrphanBroker.process_lonely_block(Block(122))

  alt parent is BLOCK_STORED or parent is_pending_verify
    Note over C: OrphanBroker.process_lonely_block(Block(122))
    Note over C: increase unverified_tip to Block(122)
    C ->>+ PU: send Block(122) to PreloadUnverified via channel
  else parent not found
    Note over C: OrphanBroker.process_lonely_block(Block(122))
    Note over C: insert Block(122) to OrphanBroker
  end
  C ->>+ PU: send Block(123) to PreloadUnverified via channel
  C ->>- BP: return
  BP ->>- Sr: return
  Note left of Sr: synchronizer received <br>Block(123) from remote peer
  Note over Sr: try_process SyncMessageUnionReader::SendBlock
  Sr ->>+ BP: BlockProcess::execute(Block(123))
  BP ->>+ C: asynchronous_process_block(Block(123))
  Note over C: non_contextual_verify(Block(123))
  Note over C: insert_block(Block(123))
  Note over C: OrphanBroker.process_lonely_block(Block(123))
  alt parent is BLOCK_STORED or parent is_pending_verify
    Note over C: OrphanBroker.process_lonely_block(Block(123))
    Note over C: increase unverified_tip to Block(123)
    C ->>+ PU: send Block(123) to PreloadUnverified via channel
  else parent not found
    Note over C: OrphanBroker.process_lonely_block(Block(123))
    Note over C: insert Block(123) to OrphanBroker
  end
  C ->>- BP: return
  BP ->>- Sr: return

  loop load unverified
    Note over PU: receive LonelyBlockHash
    Note over PU: load UnverifiedBlock from db
    PU ->>+ CV: send UnverifiedBlock to ConsumeUnverifiedBlocks
  end

  loop Consume Unverified Blocks
    Note over CV: start verify UnverifiedBlock if the channel is not empty
    Note over CV: Verify Block in CKB VM

    alt Block is Valid
      Note over CV: remove Block from block_status and HeaderMap
    else Block is Invalid
      Note over CV: mark block as BLOCK_INVALID in block_status_map
      Note over CV: Decrease Unverified TIP
    end

    opt Execute Callback
      Note over CV: execute callback to punish the malicious peer if block is invalid
      Note over CV: callback: Box<dyn FnOnce(Result<bool, Error>) + Send + Sync>

    end
  end

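The two background loops in the diagram above can be approximated with standard library threads and channels. The following is a minimal sketch under stated assumptions, not the PR's implementation: `UnverifiedBlock`, `run_pipeline`, the `scripts_ok` flag standing in for real CKB VM verification, and the use of `String` as the error type are all invented for the example. Only the callback shape follows the signature shown in the diagram.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Callback shape modelled on the diagram; `String` replaces the real error type.
type VerifyCallback = Box<dyn FnOnce(Result<bool, String>) + Send + Sync>;

/// Hypothetical block handed to the verification thread.
struct UnverifiedBlock {
    number: u64,
    /// Stand-in for the outcome of script verification (an assumption of this sketch).
    scripts_ok: bool,
    callback: VerifyCallback,
}

/// Feeds `(number, scripts_ok)` pairs through a preload stage and a verify stage.
/// Returns the verified tip and the block numbers whose callback saw an error.
pub fn run_pipeline(blocks: Vec<(u64, bool)>) -> (u64, Vec<u64>) {
    let (preload_tx, preload_rx) = mpsc::channel::<(u64, bool)>();
    let (verify_tx, verify_rx) = mpsc::channel::<UnverifiedBlock>();
    let punished: Arc<Mutex<Vec<u64>>> = Arc::new(Mutex::new(Vec::new()));

    // Stage 1: "load" each block and attach the callback that reports bad blocks.
    let punished_for_preload = Arc::clone(&punished);
    let preload = thread::spawn(move || {
        for (number, scripts_ok) in preload_rx {
            let punished = Arc::clone(&punished_for_preload);
            let callback: VerifyCallback = Box::new(move |result: Result<bool, String>| {
                if result.is_err() {
                    punished.lock().unwrap().push(number);
                }
            });
            let block = UnverifiedBlock { number, scripts_ok, callback };
            if verify_tx.send(block).is_err() {
                break; // the verify stage has gone away
            }
        }
    });

    // Stage 2: verify blocks in order; descendants of an invalid block are rejected too.
    let consume = thread::spawn(move || {
        let mut verified_tip = 0u64;
        let mut chain_broken = false;
        for block in verify_rx {
            let result = if chain_broken {
                Err(format!("block {} descends from an invalid block", block.number))
            } else if block.scripts_ok {
                verified_tip = block.number;
                Ok(true)
            } else {
                chain_broken = true;
                Err(format!("block {} failed script verification", block.number))
            };
            (block.callback)(result);
        }
        verified_tip
    });

    // The caller returns immediately after handing blocks over; nothing here waits on verification.
    for block in blocks {
        preload_tx.send(block).expect("preload thread is alive");
    }
    drop(preload_tx); // closing the channel lets both stages drain and exit

    preload.join().expect("preload thread panicked");
    let verified_tip = consume.join().expect("consume thread panicked");
    let punished = punished.lock().unwrap().clone();
    (verified_tip, punished)
}
```

Passing the callback along with the block is what lets the sender return right away and still have the peer punished later, once verification reports a failure.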

What's Changed:

Related changes

  • When the Synchronizer receives a block from a remote peer, it transfers the block to ChainService.
  • ChainService performs all validation tasks except for ScriptVerify, optimistically assumes that the block is valid, and stores it in RocksDB first. UnverifiedTip represents the tip of the stored chain that has not yet been verified.
  • Perform the ScriptVerify task in an asynchronous thread and update TipHeader afterwards.
  • Move OrphanBlock to ckb_chain.
  • Move BlockStatus and HeaderMap to ckb_shared, because both Synchronizer and ChainService need them.
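The orphan handling that moves into ckb_chain can be pictured with a small in-memory model. This is a simplified sketch, not the real OrphanBroker: the types, the `u64` stand-in for a block hash, and the rule that the unverified tip is simply the highest stored block number are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Stand-in for a 32-byte block hash (an assumption of this sketch).
type Hash = u64;

/// Hypothetical minimal description of a block that has just been stored.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LonelyBlock {
    pub hash: Hash,
    pub parent_hash: Hash,
    pub number: u64,
}

/// Simplified model: tracks stored blocks, parked orphans, and the unverified tip.
pub struct OrphanBroker {
    stored: HashSet<Hash>,
    /// Orphans keyed by the parent hash they are waiting for.
    orphans: HashMap<Hash, Vec<LonelyBlock>>,
    unverified_tip: u64,
}

impl OrphanBroker {
    pub fn new(genesis_hash: Hash) -> Self {
        let mut stored = HashSet::new();
        stored.insert(genesis_hash);
        Self { stored, orphans: HashMap::new(), unverified_tip: 0 }
    }

    /// Returns the hashes forwarded to verification, in the order they are released.
    pub fn process_lonely_block(&mut self, block: LonelyBlock) -> Vec<Hash> {
        if !self.stored.contains(&block.parent_hash) {
            // Parent not found: park the block until its parent arrives.
            self.orphans.entry(block.parent_hash).or_default().push(block);
            return Vec::new();
        }
        let mut forwarded = Vec::new();
        let mut queue = VecDeque::from([block]);
        while let Some(current) = queue.pop_front() {
            self.stored.insert(current.hash);
            self.unverified_tip = self.unverified_tip.max(current.number);
            forwarded.push(current.hash);
            // Any orphans waiting on this block now have a stored parent.
            if let Some(children) = self.orphans.remove(&current.hash) {
                queue.extend(children);
            }
        }
        forwarded
    }

    pub fn unverified_tip(&self) -> u64 {
        self.unverified_tip
    }

    pub fn orphan_count(&self) -> usize {
        self.orphans.values().map(Vec::len).sum()
    }
}
```

If block 2 arrives before block 1, it is parked as an orphan. Once block 1 arrives, both are released in order and the unverified tip advances to 2.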

Check List

Tests

Side effects

  • Performance improvement

Release note

Note: Add a note under the PR title in the release note.
This PR introduces several enhancements to the CKB Synchronizer to reduce synchronization time 
during the initial block download (IBD) phase. Key changes include:

1. **Asynchronous Operations**: The Synchronizer sliding window movement is now decoupled from the block verification process in the ChainService, allowing asynchronous handling. This improves the efficiency of block requests and verification.
2. **Changes to sync_state RPC**:
   - Added `tip_hash` and `tip_number` to represent the current chain tip.
   - Added `unverified_tip_hash` and `unverified_tip_number` to track the latest received but not yet verified blocks.
   - Removed the `orphan_blocks_size` field.
3. **Log Format Update**: The format of CKB logs has been updated, specifically changing the prefix and content of log entries to provide clearer and more structured information.

These updates lead to a more efficient synchronization process, reducing the overall time 
required for IBD. Note that removing the `orphan_blocks_size` field constitutes a BREAKING CHANGE 
in the `sync_state` RPC.
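As a rough illustration of how a client might use the new `sync_state` fields, the sketch below computes how far verification lags behind download. It assumes the two number fields arrive as 0x-prefixed hexadecimal strings, as CKB JSON-RPC integers usually do. The helper names are hypothetical and not part of any CKB SDK.

```rust
/// Parses a 0x-prefixed hexadecimal string into a u64; returns None on malformed input.
pub fn parse_hex_u64(value: &str) -> Option<u64> {
    let digits = value.strip_prefix("0x")?;
    u64::from_str_radix(digits, 16).ok()
}

/// Number of blocks that are stored but not yet verified, derived from the
/// `tip_number` and `unverified_tip_number` fields of a `sync_state` response.
pub fn verification_backlog(tip_number: &str, unverified_tip_number: &str) -> Option<u64> {
    let tip = parse_hex_u64(tip_number)?;
    let unverified_tip = parse_hex_u64(unverified_tip_number)?;
    Some(unverified_tip.saturating_sub(tip))
}
```

A steadily growing backlog during IBD would suggest that download is outpacing verification, which is the situation this PR is designed to allow.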

@eval-exec
Collaborator Author

eval-exec commented Mar 27, 2024

Synchronization time cost

image
From the genesis block to block 12,500,000:
v0.115.0-rc2: 30h40min
this PR: 23h

This PR reduces the synchronization time by approximately 25%.

@eval-exec
Collaborator Author

RSS memory usage

RSS memory usage over a full synchronization, v0.115.0-rc2 vs this PR:
image

Comparing only the huge Omiga blocks:
image

For blocks 11,980,000 to 11,995,000:
v0.115.0-rc2: RSS memory usage fluctuates between 1.9G and 2.5G.
this PR: RSS memory usage consistently stays at 2.0G.

@eval-exec eval-exec added the t:enhancement Type: Feature, refactoring. label Mar 27, 2024
@eval-exec eval-exec self-assigned this Mar 27, 2024
@eval-exec eval-exec force-pushed the ckb-async-download branch 2 times, most recently from da0db29 to 426adc0 Compare March 27, 2024 06:26
@eval-exec eval-exec added the m:sync module: ckb-sync label Mar 27, 2024
}

// `self.non_contextual_verify` is very fast.
fn asynchronous_process_block(&self, lonely_block: LonelyBlock) {
Collaborator

Seems asynchronous_ is not the best name for this function; it makes me think of async fn in tokio :)
It's more like preprocess_block, since we only do non_contextual_verify and insert into the db.

chain/src/utils/orphan_block_pool.rs (outdated, resolved)
chenyukang
chenyukang previously approved these changes Aug 2, 2024
@chenyukang chenyukang added this pull request to the merge queue Aug 2, 2024
driftluo
driftluo previously approved these changes Aug 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 2, 2024
@zhangsoledad zhangsoledad added this pull request to the merge queue Aug 2, 2024
zhangsoledad
zhangsoledad previously approved these changes Aug 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 2, 2024
@driftluo driftluo added this pull request to the merge queue Aug 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 2, 2024
@eval-exec
Collaborator Author

@chenyukang I have pushed a commit for test_accept_not_a_better_block. Now it waits to complete its task before exiting.

Please re-add this PR to the merge queue.

…fore exit

Signed-off-by: Eval EXEC <execvy@gmail.com>
@chenyukang chenyukang added this pull request to the merge queue Aug 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 2, 2024
@driftluo driftluo added this pull request to the merge queue Aug 2, 2024
Merged via the queue into develop with commit 5233b91 Aug 2, 2024
33 checks passed
@doitian doitian mentioned this pull request Aug 19, 2024
3 tasks
Labels
m:sync module: ckb-sync t:enhancement Type: Feature, refactoring.
Projects
Status: Done
4 participants