Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(l1): process blocks in batches when syncing #2174

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

MarcosNicolau
Copy link
Contributor

@MarcosNicolau MarcosNicolau commented Mar 7, 2025

Motivation
Accelerate syncing!

Description
This PR introduces block batching during full sync:

  1. Instead of storing and computing the state root for each block individually, we now maintain a single state tree for the entire batch, committing it only at the end. This results in one state trie per n blocks instead of one per block (we'll need less storage also).
  2. The new full sync process:
    • Request 128 headers
    • Request 128 block bodies and collect them
    • Once all blocks are received, process them in batches using a single state trie, which is attached to the last block.
  3. Blocks are now stored in a single transaction.
  4. State root, receipts root, and request root validation are only required for the last block in the batch.

Considerations:

  1. Optimize account updates: Instead of inserting updates into the state trie after each block execution, batch them at the end, merging repeated accounts to reduce insertions and improve performance.
  2. Improve transaction handling: Avoid committing storage tries to the database separately. Instead, create a single transaction for storing receipts, storage tries, and blocks. This would require additional abstractions for transaction management.
  3. Performance metrics are pending. I am working on a performance integration test to measure the improvements introduced by this approach.
  4. This isn't working for levm backend we need it to cache the executions state and persist it between them, as we don't store anything until the final of the batch.

Closes None

@MarcosNicolau MarcosNicolau requested a review from a team as a code owner March 7, 2025 12:13
Copy link

github-actions bot commented Mar 7, 2025

Lines of code report

Total lines added: 189
Total lines removed: 0
Total lines changed: 189

Detailed view
+---------------------------------------------+-------+------+
| File                                        | Lines | Diff |
+---------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs      | 408   | +37  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/api.rs                | 190   | +5   |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs              | 1160  | +51  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/in_memory.rs | 499   | +31  |
+---------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/libmdbx.rs   | 1144  | +54  |
+---------------------------------------------+-------+------+
| ethrex/crates/vm/backends/mod.rs            | 246   | +11  |
+---------------------------------------------+-------+------+

@@ -110,6 +110,63 @@ impl Blockchain {
Ok(())
}

pub fn add_blocks_in_batch(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we change the implementation of add_block to just call this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I planned to refactor it after having some benches.

@@ -110,6 +110,63 @@ impl Blockchain {
Ok(())
}

pub fn add_blocks_in_batch(
&self,
parent_hash: BlockHash,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to explicitly pass parent_hash? Couldn't you just get it from blocks[0].parent_hash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, it was a param inherited from add_block.

for (i, block) in blocks.iter().enumerate() {
let block_hash = block.header.compute_block_hash();
// Validate if it can be the new head and find the parent
let Ok(parent_header) = find_parent_header(&block.header, &self.storage) else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, shouldn't it be a precondition of the function that block n + 1 has block n as parent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we could add that pre-condition to return earlier. Thou, I don't know if its actually really necessary as in the next iteration you are going to verify that block n-1 is indeed the parent of n.

return Err(ChainError::ParentNotFound);
};

let chain_config: ChainConfig = self.storage.get_chain_config()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is inmutable so it could be passed to Blockchain on new, or called in the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we would avoid a db read as well.

// todo only execute transactions
// batch account updates to merge the repeated accounts
self.storage
.apply_account_updates_to_trie(&account_updates, &mut state_trie)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I understand now. I was hoping we could just call: https://github.com/lambdaclass/lambda_ethereum_rust/blob/0acc5e28b861f88c30cebb6cbfe0230970df25ed/crates/vm/backends/revm.rs#L96 get_state_transitions only once.

We would need to add a execute_blocks inside vm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can discuss this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this approach but it won't work without making larger modifications to the vm backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants