-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store the domain genesis storage on the consensus chain #1824
Conversation
a6591cf
to
9797d2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure the exact reason we need to hold entire genesis state 🤔
If the build_storage
relies on std, then other substrate chains might not sync at all but that is not the case.
Can you add more detail as to what is actually causing the different state root? If there is an actual requirement for std, then the fix should be sent upstream instead of trying to patch it here IMO
9797d2a
to
7cb396a
Compare
Please take a look when you have time @nazar-pc |
Will take a look tomorrow, this seems to be pretty involved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure why we need the host function 🤔. We already have the raw chainspec stored, when the domain instance is created, we can make necessary domain specific changes and generate the state root which is what the host function is doing now. I was hoping the host function is completely removed in this change
@@ -134,19 +94,26 @@ pub(crate) fn register_runtime_at_genesis<T: Config>( | |||
/// Schedules a runtime upgrade after `DomainRuntimeUpgradeDelay` from current block number. | |||
pub(crate) fn do_schedule_runtime_upgrade<T: Config>( | |||
runtime_id: RuntimeId, | |||
code: Vec<u8>, | |||
raw_genesis_storage: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have the raw_genesis when we register a runtime. Why do we need this again ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the genesis state will be changed as runtime version changes as observed previously, so every different runtime needs a corresponding genesis storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. If we upgrade existing domain, we shouldn't wipe the state and replace it with one from genesis. Domain should do the actual upgrade instead with provided updated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Domain should do the actual upgrade instead with provided updated code.
Yeah, you are right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure why we need to pass the entire geneis storage here. We just new runtime and the rest should stay as is. What am I missing here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the genesis state will be changed as runtime version changes as observed previously, so every different runtime needs a corresponding genesis storage.
^
@nazar-pc 's comment is wrong, plz take a look at #1824 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing genesis storage means you can instantitate domain with correct storage right away, otherwise you'll need to do multiple runtime upgrades right after instantiation, which makes little sense.
The reason we still need host function is these operator is not supported in runtime currently:
|
serde-json-core supports WASM, so we can already decode those key values and replace the necessary keys. As for the state root, we just need a trie builder to construct the state root. trie-db as well supports WASM. These should completely eliminate the Host functions on the domains. Update: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will not pretend to understand what is going on, but I have questions and concerns. The goal was to remove host calls, but we seem to still have them, which is not good.
@@ -134,19 +94,26 @@ pub(crate) fn register_runtime_at_genesis<T: Config>( | |||
/// Schedules a runtime upgrade after `DomainRuntimeUpgradeDelay` from current block number. | |||
pub(crate) fn do_schedule_runtime_upgrade<T: Config>( | |||
runtime_id: RuntimeId, | |||
code: Vec<u8>, | |||
raw_genesis_storage: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right. If we upgrade existing domain, we shouldn't wipe the state and replace it with one from genesis. Domain should do the actual upgrade instead with provided updated code.
This storage key will be used in upcoming commit to reset the SelfDomainId storage item to the correct value in the domain genesis storage Signed-off-by: linning <linningde25@gmail.com>
…Info supported RawGenesis is mostly port from upstream with slight changes to support our usecase Signed-off-by: linning <linningde25@gmail.com>
…ime code In this way the domain instance can reuse the genesis storage, which including both the wasm runtime code and genesis config of other pallet, and avoid to build the genesis storage again Signed-off-by: linning <linningde25@gmail.com>
…e root for domain instance This commit remove the usage of host function to derive genesis receipt and also get rid of the ugly workaround of constructing chain spec with the RuntimeGenesisConfig and global variable Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
Previously we need to delay this to block one due to the host function is unavailable but now the host function is removed. Also, because the block hash of the genesis block is unavailable thus the genesis receipt's consensus_block_hash is set to the default value Signed-off-by: linning <linningde25@gmail.com>
Since we use runtime_object.raw_genesis to derive the genesis state for domain instance and which already contains the sudo account, raw_genesis_config is needless now Signed-off-by: linning <linningde25@gmail.com>
…-storage Since now we don't need the raw_genesis_config when instantiate domain and we need the raw_genesis_storage when register runtime Signed-off-by: linning <linningde25@gmail.com>
This is used to simulate the situation where the domain runtime is upgraded after domain instantiation, thus the domain tests will ensure the domain genesis state will always be the same even it is generated in different place with different native runtime. This commit also change the domain_id of the RuntimeGenesisConfig to an arbitrary value to ensure the domain_id will be reset to the correct value in the genesis storage Signed-off-by: linning <linningde25@gmail.com>
7cb396a
to
eb8036d
Compare
PR is rebased, and now all the domain-related host functions are removed, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Some nits and questions.
crates/pallet-domains/src/lib.rs
Outdated
) | ||
.expect("Genesis domain instantiation must always succeed"); | ||
let domain_id = | ||
do_instantiate_domain::<T>(domain_config, domain_owner.clone(), One::one(), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not need the raw_genesis_config
to instantiate domain anymore. We should also remove the sudo related things from the domain_registry as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is removed in a later commit but I'm not sure what you means by remove the sudo related things from the domain_registry as well
/// A temporary storage to hold any previous epoch details for a given domain | ||
/// if the epoch transitioned in this block so that all the submitted bundles | ||
/// within this block are verified. | ||
/// The storage is cleared on block finalization. | ||
/// TODO: The storage is cleared on block finalization that means this storage is already cleared when | ||
/// verifying the `submit_bundle` extrinsic and not used at all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is never meant to be used outside the runtime. This storage purely exists to ensure POE is verified if there was a Epoch transition in this block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but the storage is inserted during block execution and cleared on block finalization which means it only exists during block execution, and we need to verify POE whenever a node receives a submit_bundles
extrinsic.
.get(operator_id) | ||
.ok_or(BundleError::BadOperator)?; | ||
Ok((*operator_stake, pending_election_params.total_domain_stake)) | ||
if let Some(pending_election_params) = LastEpochStakingDistribution::<T>::get(domain_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactor seems unrelated 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is related to the change of registering the genesis domain at block #0
instead of block #1
.
Because when in block #1
, the LastEpochStakingDistribution
is cleared when verifying the bundle that is to be included at #2
so we always use DomainStakingSummary
.
But when in block #0
namely the genesis block there is no block finalization thus the LastEpochStakingDistribution
is used when verifying the bundle that is to be included at #1
, and it will fail with BadOperator
due to the operator id not present in LastEpochStakingDistribution::operators
ExecutionReceipt { | ||
domain_block_number: Zero::zero(), | ||
domain_block_hash: Default::default(), | ||
domain_block_extrinsic_root: Default::default(), | ||
parent_domain_block_receipt_hash: Default::default(), | ||
consensus_block_hash: consensus_genesis_hash, | ||
consensus_block_hash: Default::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why? Ideally, this should be the block hash in which the domain was instantiated. Is that not so ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, there should be a TODO here, but the change is due to the genesis domain being registered at the genesis block now, and we can't know the hash of the genesis block at that time.
@@ -89,12 +88,12 @@ pub fn testnet_evm_genesis() -> RuntimeGenesisConfig { | |||
.collect(), | |||
..Default::default() | |||
}, | |||
ethereum: Default::default(), | |||
base_fee: Default::default(), | |||
self_domain_id: evm_domain_test_runtime::SelfDomainIdConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just default instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the domain_id
is set to Some(DomainId::new(123))
for testing purpose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow everything through, but I certainly like this more than previous iteration
/// NOTE: this function is port from `sp_state_machine::in_memory_backend::new_in_mem` which is | ||
/// only export for `std` but we need to use it in `no_std` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about submitting PR upstream that exposes it in other environments? I'd pull it earlier today with Substrate upgrade already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to get some feedback for this direction first, if it is the right direction I will surely contribute it upstream.
TrieBackendBuilder::new(db, empty_trie_root::<LayoutV0<H>>()).build() | ||
} | ||
|
||
// NOTE: this is port from `sp_core::storage::StorageKey` with `TypeInfo` supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, looks like something worth contributing upstream.
let mut raw_genesis: RawGenesis = Decode::decode(&mut raw_genesis_storage.as_slice()) | ||
.map_err(|_| Error::FailedToDecodeRawGenesis)?; | ||
|
||
// Remove the runtime code from `raw_genesis` before storing it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removing runtime code from genesis state? It would make sense to me to leave it in there so you don't need to put it back in later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because runtime code is updatable, it is weird to extract the code, update it, and store the updated code back to the genesis state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because runtime code is updatable, it is weird to extract the code, update it, and store the updated code back to the genesis state.
This should never be necessary, runtime code in genesis state doesn't need to be updatable. It needs to be canonical instead and come as a bundle, just like it is before you remove runtime code out of it. For updated runtime genesis state will be different too. State and code should come in pairs when it comes to instantiation.
You don't update genesis state without updating runtime code and you don't update runtime code without updating genesis state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I may be misled by your reply #1824 (comment), that is what I thought at first! the genesis state and the runtime code should come in pairs. This should be true for both the first registration and the later runtime upgrade. The new domain instance should use both the upgraded runtime code and genesis state to initialize its genesis block, this is the only place the genesis state is involved. The existing domain instance should just upgrade its runtime code don't care about the updated genesis state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last part of above comment is correct. They come together, but during upgrades we only upgrade code and let it change the state accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still means we need to update state one step at a time when we deploy new domain. Maybe this is acceptable, but feels inefficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with storing the runtime code along with the genesis state, but what we need to discuss is whether the genesis state should be updated as the domain runtime upgraded in the consensus chain. My opinion is yes, the domain genesis state and runtime code should come in pairs and be updated at the same time.
What you're suggesting seems to be that domain code is updated "in place", which is confusing for other domains that use the old code
I say, we store all the new runtime seperately as they are submitted but leave the first one in place. This also avoids the manually refilling them when we need the full genesis
Suppose the domain is instantiated, it commits to a runtime object with the specified RuntimeId
, and uses both the runtime object's runtime code and genesis state to initialize the domain chain.
Later, the runtime object is updated (both the code and genesis state), and the existing domain instance will upgrade its runtime code unconditionally and automatically as the upgrade is derived from the consensus block deterministically. However, the existing domain doesn't care about the updated genesis state because it only needs this when building its genesis block.
Then there is a new domain instance instantiated and committed to the same runtime object with the same RuntimeId
, now the new domain instance should use the newest version of both the runtime code and genesis state to initialize its genesis block. This is the reason why I think we should update the runtime object's runtime code and genesis state at the same time.
And I can't see why we need to store all the version of the runtime code in the consensus chain state to support instantiating new domain instances with an older version of the runtime code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is yes, the domain genesis state and runtime code should come in pairs and be updated at the same time.
That is my expectation and preference as well.
And I can't see why we need to store all the version of the runtime code in the consensus chain state to support instantiating new domain instances with an older version of the runtime code.
Then whatever the latest is will have to support runtime upgrades from whatever the oldest registered domain is (even if it is not running right now) because otherwise domain will not resume. I feel like this is an issue and storing all used versions is very desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then whatever the latest is will have to support runtime upgrades from whatever the oldest registered domain is (even if it is not running right now) because otherwise domain will not resume
No, as said above the domain runtime upgrade is not controlled by the domain owner but by the runtime object, whenever the runtime object is upgraded the domain instance will upgrade its runtime unconditionally and automatically.
If the domain instance node is not running for a while, then it should catch up to the domain chain tip by deriving the domain block from the consensus block, it can't miss any consensus block since it is instantiated unless we support syncing the domain chain from other nodes, but still all the consensus block should be handled by some (but not all) domain nodes in case any potential bundle/runtime upgrade is missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is updated with runtime code and genesis state stored together and updated together, PTAL.
…ain_runtime call Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
Signed-off-by: linning <linningde25@gmail.com>
All comments are addressed/replied, please take a look again, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General Direction make sense but left some coments
@@ -1379,7 +1361,7 @@ impl<T: Config> Pallet<T> { | |||
|
|||
pub fn domain_runtime_code(domain_id: DomainId) -> Option<Vec<u8>> { | |||
RuntimeRegistry::<T>::get(Self::runtime_id(domain_id)?) | |||
.map(|runtime_object| runtime_object.code) | |||
.and_then(|mut runtime_object| runtime_object.raw_genesis.take_runtime_code()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doe not consider runtime upgrades and instead return the first runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't maintain historical domain runtime code, they are updated in place and the returned value is determined by the consensus_hash
used to call the runtime API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that should not be case. We should never touch the raw chainspec and treat it just like that for domain raw spec as well. Updating in place means, raw chainspec is always updating whenever there is a runtime upgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the purposes of future instantiation such that we don't have to go through potentially many runtime upgrades and keep the whole upgrade path support in the runtime forever (though maybe we do need that for other reasons).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we will essential break the exisiting domains for the operators who want to sync from beginning, IIUC. If we do want to support runtime upgrades such that new domains do not need to go through the upgrade path. Then each domain should ideally carry their current rutime hash and fetch the right runtime based on that hash. This way, exisiting operator domains can sync from the beginning, while new domains and their operators can start from the latest upgrade as their genesis runtime.
Update: Okay I see where my confusion is coming from. Fetching this from chain_spec and the fact that we are updating this in place leads to this confusion. We already update the chainspec with domain_id
, why not have a Runtime storage against runtime hash and then return that instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get your point. raw_genesis
is not fetched from the chain spec but built from chain spec, different runtime code can lead to different raw_genesis
even if the rest of the chain spec is the same.
We already update the chainspec with domain_id, why not have a Runtime storage against runtime hash and then return that instead ?
Do you mean storing the runtime code separately? the runtime code and raw_genesis
are come together and updated together, domain_runtime_code
is the only place that runtime code is used alone so I don't see much benefit to doing so, also see #1824 (comment)
@@ -134,19 +94,26 @@ pub(crate) fn register_runtime_at_genesis<T: Config>( | |||
/// Schedules a runtime upgrade after `DomainRuntimeUpgradeDelay` from current block number. | |||
pub(crate) fn do_schedule_runtime_upgrade<T: Config>( | |||
runtime_id: RuntimeId, | |||
code: Vec<u8>, | |||
raw_genesis_storage: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure why we need to pass the entire geneis storage here. We just new runtime and the rest should stay as is. What am I missing here ?
balances: BalancesConfig { | ||
// TODO: remove `endowed_accounts` once XDM is ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont understand this TODO. Why do we want to remove endowed accounts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the genesis storage is registered with the runtime code and is reused by every domain instance, so it basically means we mint SSC whenever a domain is instantiated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, for the production runtime these will be empty anyway. For the tests, I would prefer having these endowed accounts rather than using the XDM to moving funds from the Consensus. This is a bad ux when testing or developing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will update the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Lets move on.
Thank you for being patient
close #1798
In domain v2, the domain's genesis state root is driven and stored on the consensus chain when the domain is instantiated, and we need to ensure when the operator node is started it will construct the same genesis state root.
Currently, this is done by storing the domain runtime code and optionally the domain
GenesisRuntimeConfig
on the consensus chain, and the operator node will fetch these data from consensus chain (useGenesisRuntimeConfig::default
in case it is not stored) and them construct the genesis state by<GenesisRuntimeConfig as BuildStorage>::build_storage
.Unfortunately, both
GenesisRuntimeConfig::default
and<GenesisRuntimeConfig as BuildStorage>::build_storage
are called on the client side and the return value is determined by the native runtime that compiled with the consensus chain, thus even if we store the same WASM runtime code onchain the operator may still get a different genesis state root if the operator is running with a newer release ofsubspace-node
, see #1798 for more detail.This PR fixes this issue by directly storing the domain genesis storage onchain when domain is instantiated instead of building the genesis storage when the operator is started. This won't bring much cost compared with the main branch because:We remove the runtime code from the genesis storage to avoid duplicating withRuntimeRegistry
We just changed to store the genesis storage instead ofGenesisRuntimeConfig
onchainThis PR fixes the issue by storing the domain genesis storage in the runtime registry, so when registering a runtime instead of providing just the runtime code we need to provide the full genesis storage (which includes the runtime code), and every domain instance will reuse the same genesis storage with a slight adjustment of the
SelfDomainId
storage, without the need of rebuilding the genesis storage again.This PR also comes with a domain subcommand
build-genesis-storage
that we can use to get the raw genesis storage for registering runtime.This PR also gets rid of the ugly workaround of constructing chain spec with the
GenesisRuntimeConfig
which is introduced in #1668Code contributor checklist: