-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Stale] On-chain light client for BEEFY+MMR #1367
Conversation
Error::<T, I>::InvalidParentNumberAndHash, | ||
); | ||
|
||
// TODO: is it the right condition? can id is increased by say +3? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual question. My understanding of BEEFY is that:
- commitment is signed by the current validator set, which must have id equal to
commitment.commitment.validator_set_id
; - if
mmr_leaf.beefy_next_authority_set.id == commitment.commitment.validator_set_id + 1
, then it is the regular header and its direct descendant will be signed by the same setcommitment.commitment.validator_set_id
; - when we see
mmr_leaf.beefy_next_authority_set.id == commitment.commitment.validator_set_id + 2
, then it is the 'handoff' header, which must be signed by current validator set. But its direct descendant will be signed by the next setcommitment.commitment.validator_set_id + 1
.
Is that correct? I'll be able to find it later, during relay development, but people who have more BEEFY knowledge may answer earlier.
Follow-up question is whether we may see e.g. mmr_leaf.beefy_next_authority_set.id == commitment.commitment.validator_set_id + 3
here? Is it valid to skip set id in BEEFY? Like jumping from set 10
to 15
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify point 2:
if mmr_leaf.beefy_next_authority_set.id == commitment.commitment.validator_set_id + 1, then it is the regular header and its direct descendant will be signed by the same set commitment.commitment.validator_set_id;
you are referring to mmr_leaf
(which represents/identifies a particular block basically) and commitment
- is the commitment here a vote for block number matching mmr_leaf, or the full commitment (super-majority set of votes finalizing the block)?
Follow-up question is whether we may see e.g. mmr_leaf.beefy_next_authority_set.id == commitment.commitment.validator_set_id + 3 here? Is it valid to skip set id in BEEFY? Like jumping from set 10 to 15?
not valid to skip set ids, validator set changes each session (even if validators and keys stay the same) and set id increments monotonically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are referring to mmr_leaf (which represents/identifies a particular block basically) and commitment - is the commitment here a vote for block number matching mmr_leaf, or the full commitment (super-majority set of votes finalizing the block)?
That's the full commitment (SignedCommitment
). I mean - we don't have (and don't need) an access to individual votes here, especially in votes that haven't made it in the full commitment. So this light client only works with SignedCommitment
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not valid to skip set ids, validator set changes each session (even if validators and keys stay the same) and set id increments monotonically.
Good to know, thank you!
#[pallet::weight(0)] // TODO: compute weights | ||
pub fn submit_commitment( | ||
origin: OriginFor<T>, | ||
// TODO: implement `TypeInfo` for `BridgedBeefySignedCommitment<T, I>`, `BeefyMmrProof` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this PR will be approved, I'll file an issue for this and for weights
origin: OriginFor<T>, | ||
// TODO: implement `TypeInfo` for `BridgedBeefySignedCommitment<T, I>`, `BeefyMmrProof` | ||
// and `BridgedBeefyMmrLeafUnpacked::<T, I>` | ||
encoded_commitment: Vec<u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While working on relay, I've found having mandatory encoded_mmr_proof
and mmr_leaf
arguments here breaks our basic assumptions: relay is not putting any restrictions on the node it is connected to. But in this case (unless I'm totally wrong), we need to be connected to node that has offchain indexing enabled and to generate a proof we need to have unpruned state (because it is generated during runtime call, iiuc).
The first requirement seems adequate, because it is how BEEFY is designed. The second one is redundant imo. So probably we shall:
- make these two arguments optional, or even better - extract to a separate call;
- get rid, or make optional this "unpacked" validator set structure. We shall be using merkle root of validator keys instead and optionally - save proved validator keys to the storage (to avoid verifying same keys next time when the same set is signing something).
Whether this will be fixed in this, or in follow up PRs, depends on how things will be going (review vs relay development).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in this case, we need to be connected to node that has offchain indexing enabled and to generate a proof we need to have unpruned state.
Correct, given the relayer stays completely stateless. Alternatively the relayer could be fetching and storing this data as it is produced (even without finality) and use that instead of relying on the node to provide it.
Another option to get rid of the archive node requirement is to make the MMR proof creation completely off-chain - I believe the data in off-chain storage should always be sufficient (i.e. it's never pruned), but for simplicity we do that as a runtime call. We could move some of that logic to the relayer itself (sure, "hardcoding" trade-offs again).
The validator set is a bit easier, since it does not change that frequently. Also if the BEEFY digest item stays, we could use validators from there (i.e. not rely on the state) and re-merkelize them (again, "hardcoding" that logic in the relay).
IMHO these proposals are critical to consider long-term, but if delivery time is important it might be better to simply add the extra offchain-indexing + archive requirement to the upstream node, with an intention to optimize this later.
save proved validator keys to the storage (to avoid verifying same keys next time when the same set is signing something).
Depending on the acceptable bridge latency (aka cost), we might end up importing just one SignedCommitment
per session so this cache might not be very useful. However it obviously is a valid solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! So there are regular (GRANDPA-like) authorities-related digest items in BEEFY - I've missed it somehow (only seen this MMR root digest). So probably it would be better to revert to the same schema we've been using with GRANDPA && to track authorities from digest items. In this case leaf and its proof are definitely optional - they may only be required when we need to verify some secondary proof (like storage proof of some parachain -> parachain heads root -> mmr leaf -> mmr leaf proof).
I've seen @acatangiu was working on removing MMR root digest item. But I've found no issues about authorities-related digests removal (there's one related to BABE validators within parachains, but it seems unrealted), so I'll probably take this option. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our sub2sub bridge I think we can rely on the BEEFY authorities-related digest items in headers and avoid relying on the "more complicated" leaf data and its proof. There are no plans to remove the auth-related digest from substrate chains headers; the digest happens just once per session so its overhead is negligible.
Other non-substrate-based-chains bridges will use the BEEFY payload, the leafs and their proofs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad - it is impossible to build BEEFY light client using only headers + commitments. There's nothing about header in the commitment (apart from block number, which is useless in this case), so we don't have a link between commitment and header => can't rely on header digests. So MMR leafs + proofs are definitely essential for the client. I'll try to experiment with generating MMR proofs outside of runtime - it should solve the archive-node problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So MMR leafs + proofs are definitely essential for the client.
Yes, anything that you can prove about the latest state will require some sort of MMR proof. We don't necessarily need headers (since we might have some data directly in the MMR leaf), but yeah, generating a MMR proof is necessary for anything built "on top" of BEEFY.
We could bring some extra data to the commitment itself if necessary (maybe block hash along the block number?) - I guess this could simplify some use cases (like getting somerthing from the latest state would require just header + state proof
instead of mmr proof + header + state proof
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, adding block hash to the commitment imo would make client development easier. But perhaps it will just be better (at least in terms of maintaining BEEFY) for all of us to use the same schema for all BEEFY clients on all chains. I can imagine that submitting header with public keys of 1024
Kusama BEEFY validators won't work for eth client (meaning tx size will be larger than 33kb).
So I'm leaning towards using MMR proofs, but I'd like to explore the option of generating these proofs outside of runtime to avoid relying on archive nodes. Of course if you or @acatangiu are insist on header-only light client, then we can alter commitment instead,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I'd like to explore the option of generating these proofs outside of runtime to avoid relying on archive nodes.
This was easier than I've expected. Actually we don't need to do it outside of runtime - we only need to avoid touching any MMR-relevant storage items. I've implemented it here: https://github.com/paritytech/substrate/tree/sv-beefy-generate-historical-proof. Going to open PR on monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upd: paritytech/substrate#11230 and paritytech/polkadot#5334 - don't have a time to work on that now, so I'm closing
I'm "marking" this as stale for now, since I'm not going to actively work on it in the near future. The outline of what needs to be done here (to future me):
|
We can probably also add another merkle root to the MMR leaf to support this more easily (i.e. merkle tree from pub keys instead of addresses). Your other solution to drop |
I am picking-up this PR and will continue the work here. For the moment I am ramping-up on the topic and for the beginning I merged the |
3f5f6b5
to
2dfee85
Compare
- impl OwnedBridgeModule - define LOG_TARGET - fix clippy warnings
2dfee85
to
f97c2c1
Compare
@serban300 you can start here with getting paritytech/substrate#11230 across the finish line. I can help you with that. |
@acatangiu thanks, sounds good ! What can I do for #11230 ? The PR seems closed Later edit: Sorry, I read the comments. I think I got it. |
I think we can close this PR since #1606 was merged. |
* Companion for substrate#11618 * update lockfile for {"polkadot", "substrate"} Co-authored-by: parity-processbot <>
starts #2480
This PR adds
pallet-bridge-beefy
, a pallet which is able to track finality of the bridged chain. There's a lot of things to do in this pallet and to think of before even starting implementation. The idea is that this PR will only addsubmit_commitment
function and the rest will be added in follow up PRs. Some things that need to be decided/implemented as a part of this PR:SignedCommitmentWitness
in Substrate code. Shall we use it? Is it really better for us to use 2 transactions instead of one? We may defer experiments for later and keep verifyingSignedCommitment
for now, but probably there's something important that I'm missing? UPD: let's leave it for future - we may actually support both options.BeefyNextAuthoritySet
holds merkle root of all validators public keys. My understanding is that by designsubmit_commitment
shall be accepting public keys of all signed validators (btw - can we recover public from signature, ECDSA allows this iirc?) and then ensure that the public belongs to current set by verifying merkle proof of membership (another additional argument). Imo this is fine for the case when we're verifying single commitment for every session (i.e. we're only verifying commitment of header that enacts new validator set). But what if we need to verify intermediate commitments? then we'll be doing many membership-proof verifications. We may store proved keys in the runtime storage, but it'll be additional write(s) anyways. Right now I've implemented version where all public keys are submitted during handoff, but probably I'm missing something? UPD: I suggest to leave it as is, unless there are strong arguments against that.10
using only MMR root at block100
=> we need to keep several MMR roots similar to how we store headers in GRANDPA pallet. That's because relay (or "other" clients) will be generating proofs using "best known root" of the pallet. And if we'll be rewriting this best root and not keeping it in the storage, some malicious actor may be just submitting commitments and all honest relay transactions will be rejected. "Malicious" is not necessary here - we can achieve the same if there are multiple relayers running at the same time;beefy-merkle-tree
,beefy-primitives
,pallet-mmr
andpallet-mmr-primitives
. Is it ok to directly referencepallet-mmr
? Do we need to use underlying mmr libraries directly?