Rework the handling of the ChainConfig.initial_state during re-genesis #1209

Closed
xgreenx opened this issue Jun 13, 2023 · 2 comments · Fixed by #1693
Assignees
Labels: regenesis, SDK team (The issue is ready to be addressed by SDK team)

Comments

@xgreenx
Collaborator

xgreenx commented Jun 13, 2023

We must process a huge state if we need to run re-genesis on mainnet; it can be so huge that it cannot even fit into RAM. So we need to rework the initial state handling.

The obvious solution is to stream the state from a file and process it in chunks. Because this process can be long, we also need to support interruption and resuming from the point where we stopped. In addition, the initial_state field in the ChainConfig should be replaced with the final result, such as a genesis hash or genesis metadata, instead of keeping the whole state in RAM.
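As an illustration of the chunked, resumable processing described above, here is a minimal std-only Rust sketch. Every name in it (`process_in_chunks`, the checkpoint convention, the line-per-entry format) is hypothetical, not fuel-core's actual API:

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Process serialized state entries in fixed-size chunks, advancing a
/// checkpoint after each chunk so an interrupted run can resume.
/// Returns the final checkpoint and the entries processed in this run.
fn process_in_chunks(data: &str, chunk_size: usize, start_at: usize) -> (usize, Vec<String>) {
    let reader = BufReader::new(Cursor::new(data));
    let entries: Vec<String> = reader.lines().map(|l| l.unwrap()).collect();
    let mut processed = Vec::new();
    let mut checkpoint = start_at;
    for chunk in entries[start_at..].chunks(chunk_size) {
        // A real implementation would apply the chunk to the database here
        // and persist the checkpoint in the same transaction, so a crash
        // between chunks loses no progress.
        processed.extend(chunk.iter().cloned());
        checkpoint += chunk.len();
    }
    (checkpoint, processed)
}

fn main() {
    let state = "coin_a\ncoin_b\ncoin_c\ncoin_d\ncoin_e";
    // Suppose a first run was interrupted after persisting checkpoint = 4;
    // a resumed run starts from that checkpoint instead of from scratch.
    let (ckpt, rest) = process_in_chunks(state, 2, 4);
    assert_eq!(rest, vec!["coin_e".to_string()]);
    assert_eq!(ckpt, 5);
    println!("resumed from 4, processed {} remaining entries", rest.len());
}
```

The sketch still collects all lines up front for brevity; the real point is the checkpoint-per-chunk contract, which a genuinely streaming reader would keep.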

@xgreenx xgreenx added the tech-debt The issue is to improve the current code and make it more clear/generic/reusable/pretty/avoidable. label Jun 13, 2023
@xgreenx xgreenx removed the tech-debt The issue is to improve the current code and make it more clear/generic/reusable/pretty/avoidable. label Jun 13, 2023
@xgreenx xgreenx added the SDK team The issue is ready to be addressed by SDK team label Aug 22, 2023
@xgreenx
Collaborator Author

xgreenx commented Oct 4, 2023

Based on the internal conversation, we agreed that the work can be parallelized in several ways:

  1. Extract the initial_state from the ChainConfig #1450 - We still need the state when starting the network, so it should be provided along with the ChainConfig, but it can be a separate file or an optional field. It would be nice to integrate it with the snapshot-loading functionality.
  2. Determine file encoding for initial state #1396 - The current serde JSON format is hard to stream. It can be replaced with a custom format or an existing one; whoever works on this part should compare existing solutions and choose based on integration complexity and performance.
  3. Make pushing/loading of the state streamable.
  4. The streaming itself should be pausable/resumable. That means we need to store a temporary, not-yet-finalized database somewhere. When snapshot loading is finished, we must promote the temporary database to be our main database. The temporary database may also store information about where we stopped; this information should be cleaned up at the end.
  5. We currently store all contracts and balances in memory during snapshot creation. We need to make creation streamable to a file too; we should be able to reuse the mechanism created for streamable snapshot loading above.
  6. The code requires adaptation to work with streamed data. It may require introducing some abstractions (mostly via traits) to support streaming. We may also support different formats in the future, so it would be nice if this abstraction were format agnostic.
  7. During the initial benchmark, it became clear that the BMT consumes a lot of memory for huge snapshots. The solution is to create an optimized version of the BMT that doesn't store all leaves and nodes; we can store only the nodes required to build the next nodes, which should drastically reduce memory consumption. The alternative is to use storage for all nodes, but it is better to do that after the optimized BMT. - Done: Improved Memory Efficiency in Merkle Root Calculation fuel-vm#603
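The format-agnostic abstraction of point 6 could look something like the following std-only Rust sketch. The trait name `GroupSource` and the in-memory backing are hypothetical illustrations, not fuel-core's actual API:

```rust
/// A format-agnostic source of state groups: the consumer only asks for the
/// next group of up to `max` items and never sees the underlying encoding.
trait GroupSource {
    type Item;
    fn next_group(&mut self, max: usize) -> Option<Vec<Self::Item>>;
}

/// One possible backing: an in-memory iterator standing in for a decoded
/// snapshot file. A JSON or binary decoder would implement the same trait.
struct VecSource<T> {
    items: std::vec::IntoIter<T>,
}

impl<T> GroupSource for VecSource<T> {
    type Item = T;
    fn next_group(&mut self, max: usize) -> Option<Vec<T>> {
        let group: Vec<T> = self.items.by_ref().take(max).collect();
        if group.is_empty() { None } else { Some(group) }
    }
}

fn main() {
    let mut src = VecSource { items: vec![1, 2, 3, 4, 5].into_iter() };
    let mut groups = Vec::new();
    while let Some(g) = src.next_group(2) {
        groups.push(g);
    }
    assert_eq!(groups, vec![vec![1, 2], vec![3, 4], vec![5]]);
}
```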

@xgreenx
Collaborator Author

xgreenx commented Oct 4, 2023

Also, maybe we need to think about migration. Do we want to make it part of the regenesis, or do we want a separate flow for it?

xgreenx added a commit that referenced this issue Mar 5, 2024
Closes #1209



The change separates the initial chain state from the chain config and
stores them in separate files when generating a snapshot. The state
snapshot can be generated in a new format where parquet is used for
compression and indexing while postcard is used for encoding. This
enables importing in a stream-like fashion, which reduces memory
requirements. JSON encoding is still supported to enable easy manual
setup; however, parquet is preferred for large state files.
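To illustrate why a binary record format streams more easily than a single JSON document, here is a hand-rolled length-prefixed framing in std-only Rust. The actual snapshot format uses parquet and postcard, which are elided here; this is only a stand-in for the "read one record at a time" property:

```rust
use std::io::{Cursor, Read, Write};

/// Write each record with a little-endian u32 length prefix, so a reader can
/// pull records one at a time without parsing the whole file up front.
fn write_records(records: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for r in records {
        out.write_all(&(r.len() as u32).to_le_bytes()).unwrap();
        out.write_all(r).unwrap();
    }
    out
}

/// Read the next framed record, or None at end of stream.
fn read_next(cursor: &mut Cursor<Vec<u8>>) -> Option<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    if cursor.read_exact(&mut len_buf).is_err() {
        return None; // clean end of stream
    }
    let mut rec = vec![0u8; u32::from_le_bytes(len_buf) as usize];
    cursor.read_exact(&mut rec).ok()?;
    Some(rec)
}

fn main() {
    let bytes = write_records(&[b"coin_a", b"coin_b"]);
    let mut cursor = Cursor::new(bytes);
    assert_eq!(read_next(&mut cursor).as_deref(), Some(&b"coin_a"[..]));
    assert_eq!(read_next(&mut cursor).as_deref(), Some(&b"coin_b"[..]));
    assert!(read_next(&mut cursor).is_none());
}
```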

### Snapshot command

The CLI was expanded to allow customizing the used encoding. Snapshots
are now generated along with a metadata file describing the encoding
used. The metadata file contains encoding details as well as the
location of additional files inside the snapshot directory containing
the actual data. The chain config is always generated in the JSON
format.

The snapshot command now accepts the '--output-directory' option for
specifying where to save the snapshot.

### Run command

The run command now includes the 'db_prune' flag which, when provided,
prunes the existing database and starts genesis from the provided
snapshot metadata file or the local testnet configuration.

The snapshot metadata file contains paths to the chain config file and
files containing chain state items (coins, messages, contracts, contract
states, and balances), which are loaded via streaming.

Each item group in the genesis process is handled by a separate worker,
allowing for parallel loading. Workers stream file contents in batches.
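A minimal sketch of the worker-per-table layout, using plain std threads; the table names and the worker body are illustrative stand-ins for the real streaming workers:

```rust
use std::thread;

/// Spawn one worker per item-group type (coins, messages, contract state, ...).
/// Each worker would stream its own snapshot file in batches; here the body
/// is a stand-in that just reports completion.
fn run_workers(tables: &[&str]) -> Vec<String> {
    let handles: Vec<_> = tables
        .iter()
        .map(|t| {
            let table = t.to_string();
            thread::spawn(move || format!("{table}: loaded"))
        })
        .collect();
    // Joining in spawn order keeps the result order deterministic.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let results = run_workers(&["coins", "messages", "contract_state"]);
    assert_eq!(results.len(), 3);
    assert_eq!(results[0], "coins: loaded");
}
```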

A database transaction is committed every time an item group is
successfully loaded. Resumability is achieved by recording the last
loaded group index within the same db tx. If loading is aborted, the
remaining workers are shut down. Upon restart, workers resume from the
last processed group.
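The resumability scheme (commit each group together with its index, then resume after the last recorded index) can be sketched with an in-memory stand-in for the database; every name here is illustrative, not fuel-core's actual storage API:

```rust
use std::collections::HashMap;

/// Toy "database" standing in for the node's KV store.
#[derive(Default)]
struct Db {
    items: HashMap<String, u64>,
    last_group: Option<usize>,
}

impl Db {
    /// Commit one group of items and record its index in the same
    /// "transaction", modelling the atomicity described above.
    fn commit_group(&mut self, idx: usize, group: &[(String, u64)]) {
        for (k, v) in group {
            self.items.insert(k.clone(), *v);
        }
        self.last_group = Some(idx); // persisted together with the data
    }
}

/// Load all groups, skipping any that a previous run already committed.
fn load(db: &mut Db, groups: &[Vec<(String, u64)>]) {
    let start = db.last_group.map_or(0, |i| i + 1);
    for (idx, group) in groups.iter().enumerate().skip(start) {
        db.commit_group(idx, group);
    }
}

fn main() {
    let groups: Vec<Vec<(String, u64)>> = vec![
        vec![("coin_a".to_string(), 10)],
        vec![("coin_b".to_string(), 20)],
    ];
    let mut db = Db::default();
    db.commit_group(0, &groups[0]); // simulate a run aborted after group 0
    load(&mut db, &groups);         // restart resumes at group 1
    assert_eq!(db.last_group, Some(1));
    assert_eq!(db.items.len(), 2);
}
```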

### Contract States and Balances

Using uniform-sized batches may result in batches containing items from
multiple contracts. Optimal performance can presumably be achieved by
selecting a batch size that typically encompasses an entire contract's
state or balance, allowing for immediate initialization of relevant
Merkle trees.
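A sketch of batching that cuts at contract boundaries instead of at a uniform size, assuming items arrive sorted by contract; all names are illustrative:

```rust
/// Group (contract_id, storage_slot) items so that each batch holds items
/// from exactly one contract. With such batches, a contract's Merkle tree
/// can be finalized as soon as its batch is loaded, rather than waiting for
/// items that spilled into a later uniform-sized batch.
fn batch_by_contract(items: &[(&str, u32)]) -> Vec<Vec<(String, u32)>> {
    let mut batches: Vec<Vec<(String, u32)>> = Vec::new();
    for (contract, slot) in items {
        match batches.last_mut() {
            // Same contract as the current batch: keep appending.
            Some(b) if b.last().map(|(c, _)| c.as_str()) == Some(*contract) => {
                b.push((contract.to_string(), *slot))
            }
            // New contract: start a fresh batch.
            _ => batches.push(vec![(contract.to_string(), *slot)]),
        }
    }
    batches
}

fn main() {
    let items = [("contract_a", 1), ("contract_a", 2), ("contract_b", 1)];
    let batches = batch_by_contract(&items);
    assert_eq!(batches.len(), 2); // one batch per contract
    assert_eq!(batches[0].len(), 2);
}
```

In practice a cap on batch size would still be needed for contracts with huge state; this sketch only shows the boundary-cutting idea.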

---------

Co-authored-by: Salka1988 <emirsalkicart@gmail.com>
Co-authored-by: Brandon Vrooman <brandon.vrooman@fuel.sh>
Co-authored-by: segfault_magnet <ahmed.sagdati.ets@gmail.com>
Co-authored-by: Ahmed Sagdati <37515857+segfault-magnet@users.noreply.github.com>
Co-authored-by: Brandon Kite <brandonkite92@gmail.com>
Co-authored-by: Mitchell Turner <james.mitchell.turner@gmail.com>
Co-authored-by: Green Baneling <XgreenX9999@gmail.com>
crypto523 added a commit to crypto523/fuel-core that referenced this issue Oct 7, 2024