This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Implementer's Guide: Incorporate HRMP to TransientValidationData #1588

Merged
17 commits merged into master from ser-hrmp-after-validation-data
Aug 31, 2020

Conversation

pepyakin
Contributor

@pepyakin pepyakin commented Aug 14, 2020

Updates TransientValidationData according to my comment here #1539 (comment)

Related to #1576
Follow-up #1629

What to look for

It is important to ensure that TransientValidationData contains all the information needed to perform checks equivalent to those in the Router's acceptance criteria.

@pepyakin pepyakin added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Aug 14, 2020
@rphmeier
Contributor

Will there be a follow-up for PersistedValidationData? For instance, to store the incoming (downward) messages?

hrmp_digest: Vec<BlockNumber>,
/// The watermark of the HRMP. That is, the block number up to which (inclusive) all HRMP messages
/// sent to the parachain are processed.
hrmp_watermark: BlockNumber,
Contributor

@rphmeier rphmeier Aug 18, 2020

@bkchr One thing to watch out for here is that because this is in the TransientValidationData, it is not a direct parameter to the validation function and instead will be provided via inherent. That means that a malicious collator can provide something different (e.g. 100 instead of 90) and the relay-chain will still accept the block. So it is possible for this value to move backwards from the parachain's perspective, and Cumulus needs to account for that. The same goes for the other values in this struct.

Contributor

Actually, after conversation in DM with Sergei, our conclusion was that this watermark should not be provided via inherent and instead Cumulus should be tracking the watermark internally.
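
For illustration, here is a minimal sketch of that idea with hypothetical names (not actual Cumulus code): the parachain keeps the watermark in its own state and only ever advances it, so nothing supplied by a collator can rewind it.

// Hypothetical sketch, not actual Cumulus code: the parachain tracks the
// watermark itself instead of taking it from the inherent.
type BlockNumber = u32;

#[derive(Default)]
struct ParachainHrmpState {
    /// Tracked internally and persisted across parachain blocks.
    hrmp_watermark: BlockNumber,
}

impl ParachainHrmpState {
    /// Advance the watermark after all HRMP messages sent up to (and including)
    /// `processed_up_to` have been processed. It never moves backwards, so a
    /// collator echoing a stale value cannot rewind it.
    fn advance_hrmp_watermark(&mut self, processed_up_to: BlockNumber) {
        if processed_up_to > self.hrmp_watermark {
            self.hrmp_watermark = processed_up_to;
        }
    }
}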

Member

Not sure I understand this here, why isn't the relay chain checking that the watermark is >= old_watermark?

Contributor Author

@pepyakin pepyakin Aug 24, 2020

The relay-chain actually does check that the watermark is > old_watermark.

See this excerpt from router.md:

check_hrmp_watermark(P: ParaId, new_hrmp_watermark):
- new_hrmp_watermark should be strictly greater than the value of HrmpWatermarks for P (if any).
- new_hrmp_watermark must not be greater than the context's block number.
- in HrmpChannelDigests for P an entry with the block number equal to new_hrmp_watermark should exist.
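
A minimal Rust sketch of those three checks, with hypothetical in-memory maps standing in for the Router module's HrmpWatermarks and HrmpChannelDigests storage (illustrative only, not the runtime implementation):

use std::collections::HashMap;

type ParaId = u32;
type BlockNumber = u32;

/// Stand-ins for the Router module's storage, simplified for the sketch.
struct Router {
    hrmp_watermarks: HashMap<ParaId, BlockNumber>,
    hrmp_channel_digests: HashMap<ParaId, Vec<BlockNumber>>,
}

impl Router {
    fn check_hrmp_watermark(
        &self,
        para: ParaId,
        context_block_number: BlockNumber,
        new_hrmp_watermark: BlockNumber,
    ) -> bool {
        // Strictly greater than the previous watermark, if there is one.
        if let Some(&old) = self.hrmp_watermarks.get(&para) {
            if new_hrmp_watermark <= old {
                return false;
            }
        }
        // Not greater than the context's block number.
        if new_hrmp_watermark > context_block_number {
            return false;
        }
        // The digest for this para must have an entry at exactly this block number.
        self.hrmp_channel_digests
            .get(&para)
            .map_or(false, |digest| digest.contains(&new_hrmp_watermark))
    }
}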

Member

What happens if the parachain cannot process all downward messages of a given relay chain block and thus cannot increase the value?

Contributor Author

FWIW, in this particular place we are discussing the HRMP watermark. For XCMP, and by extension HRMP, we have agreed on a simplifying assumption: we assume that a parachain can receive and process all the incoming messages within a block. I think that might be fine for HRMP: I don't think we will reach usage levels where this could be a problem, and we are going to put aggressive limits on HRMP anyway (I see that happening within the upcoming #1632).

For XCMP I think we might want to revisit this assumption. There are circumstances where the parachain is bombarded by a stream of messages over all its channels and processing all of them just doesn't fit into the PVF time or PoV space budget. Although that doesn't seem likely, I also feel that this simplifying assumption doesn't give us a lot - maybe it would not be that hard to handle that case from the start?

Then, regarding downward messages. In the current version that we are discussing here, downward messages are implemented the same way you (@bkchr) introduced them, i.e. instead of a watermark we just use a simple processed_downward_messages counter. I liked this approach because, well, I didn't see a lot of reasons to use watermarks, but then today I finally broke down and switched to watermarks in #1629 (see the PR description to find out why).

But now I think maybe it would be a good idea to revert it back to processed_downward_messages. I'll think about that.

Member

Thank you for the explanation :)

Although that doesn't seem likely, I also feel that this simplifying assumption doesn't give us a lot - maybe it would not be that hard to handle that case from the start?

I'm still not sure that this is true; IMHO we can reach this limit relatively easily. In the end the parachain wants to process some of its own transactions as well. Just think about a parathread that collects messages over quite some time - the parachain will probably not be able to process them all in one block.

Contributor Author

@pepyakin pepyakin Aug 25, 2020

I think it largely depends on the parametrization, and there are two parameters in play: weight and PoV size. I don't think it is productive to speak about weight here, because it is highly dependent on the exact code of the PVF at hand, and primarily it is the PVF that will be designed not to violate any constraints put onto it by Polkadot (although we need to come up with a less constraining system overall - more on this later). Therefore, let's assume for now that received messages will always fit into the weight constraints.

Regarding the PoV size, this could be enforced simply.

Since:

  1. there can be only so many paras, N, executed at the same time (I think currently we aim for N = 100 "cores"), and
  2. every parachain can target the same single parachain, and
  3. every parachain can only send one XCMP/HRMP message per block per recipient

we arrive at a worst case of N messages received at the same time. We can then say that we want to devote at most half of the PoV to messages. Dividing the two gives us the maximum size of a message, which makes it impossible to overwhelm the parachain as long as its PVF doesn't devote more than half of its PoV to messages (and, once again, it should follow the constraints unless it accepts the possibility of getting into the aforementioned unfortunate situation).
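
To make the arithmetic concrete, here is a sketch with illustrative numbers; the PoV budget below is an assumption, while N = 100 and the one-message-per-recipient-per-block rule come from the points above:

// Illustrative figures only: the PoV budget is assumed here, and the real
// limits are configuration parameters on the relay chain.
const N_CORES: u64 = 100; // worst case: one message from every core's para
const ASSUMED_POV_BUDGET: u64 = 5 * 1024 * 1024; // assumed 5 MiB PoV budget
const MESSAGE_SHARE: u64 = 2; // devote at most 1/2 of the PoV to messages

/// With one message per sender per block, N_CORES messages in the worst case
/// must fit into the message share of the PoV.
fn max_message_size() -> u64 {
    (ASSUMED_POV_BUDGET / MESSAGE_SHARE) / N_CORES
}

fn main() {
    // 5 MiB / 2 / 100 = 26214 bytes (~25.6 KiB) per message under these assumptions.
    println!("max XCMP/HRMP message size: {} bytes", max_message_size());
}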

Also, to clarify, I say it is an unlikely situation because I don't think getting 100 slots and spamming another chain is an easy task. I feel that just going to that chain directly and spamming it might be cheaper. (That is not to say we shouldn't keep this vector in mind!)

Regarding parathreads, I don't think they are too special here. The alternative, as far as I understand your concern, is to allow processing messages not just a block's worth at a time but message-by-message. In that case, it would be approximately the same.

As an aside: there is also a point related to one of our recent discussions in the parachains team, i.e. parathreads should be able to process all their messages before those messages slide beyond the availability pruning window, which should be less than a day.

Member

3. every parachain can only send one XCMP/HRMP message per block

Okay, I never heard of this before. However, this clearly lowers the number of messages :)

Contributor Author

I beg your pardon! It is actually

every parachain can only send one XCMP/HRMP message per block per recipient

@pepyakin pepyakin force-pushed the ser-hrmp-after-validation-data branch from 4bbb9a9 to af3ce69 on August 18, 2020 14:38
@pepyakin
Contributor Author

Will there be a follow-up for PersistedValidationData? For instance, to store the incoming (downward) messages?

FWIW, a follow-up is here: #1629

roadmap/implementers-guide/src/types/candidate.md (outdated, resolved)

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
@pepyakin pepyakin added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Aug 24, 2020
@pepyakin pepyakin added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Aug 24, 2020
@pepyakin
Contributor Author

pepyakin commented Aug 24, 2020

I grouped all data related to HRMP and split it out into a new struct, HrmpTransientValidationData. I also added some data that was missing to check the acceptance criteria for a candidate, making this PR good to go.

I used quite a bit of git-fu (between this and a not-yet-published PR), so there might be some rebase artifacts.

Note that some of the data there is superfluous. If we had #1594, it wouldn't have needed to live in TransientValidationData, because this data does not inform the collator.

Take open_requests/close_requests for example. The decision to open or close a channel is made by the PVF, not by a collator. If the PVF decides to open another channel beyond the allowed number of channels anyway, then all bets are off and the collator won't be able to do anything about it.

One piece that is still missing is how the messages get inside; the follow-up #1629 doesn't directly address this yet. Another is the maximum number of channels, but I will file that as a follow-up with other limiting and tightening. (UPD: follow-up: #1632)
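
For orientation, here is a rough sketch of that struct assembled from the fields quoted in the review threads below; the placeholder type definitions are assumptions and the field set is abridged, so this is not necessarily the exact shape that landed in the guide:

// Placeholder types, assumed for illustration only.
type BlockNumber = u32;
type ParaId = u32;
/// A channel is identified by its (sender, recipient) pair.
type HrmpChannelId = (ParaId, ParaId);
struct HrmpAbridgedOpenChannelRequest {
    confirmed: bool,
    // ...other fields elided
}

/// HRMP-related transient data, all collected before the candidate execution.
struct HrmpTransientValidationData {
    /// HRMP digest entries relevant to this para, ordered by ascending block
    /// number, without duplicates.
    digest: Vec<BlockNumber>,
    /// The block number up to which (inclusive) all HRMP messages sent to this
    /// parachain were processed, i.e. the watermark as of the context block.
    watermark: BlockNumber,
    /// Open requests in which the para participates either as sender or
    /// recipient, ordered ascending by `HrmpChannelId`, without duplicate ids.
    open_requests: Vec<(HrmpChannelId, HrmpAbridgedOpenChannelRequest)>,
    /// Close requests in which the para participates either as sender or
    /// recipient, without duplicate `HrmpChannelId`s.
    close_requests: Vec<HrmpChannelId>,
}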

/// elements are ordered by ascending the block number. The vector doesn't contain duplicates.
digest: Vec<BlockNumber>,
/// The watermark of the HRMP. That is, the block number up to which (inclusive) all HRMP messages
/// sent to the parachain are processed.
Contributor

The docs are a bit unclear. Is this the previous watermark of the para or the new watermark?

Contributor Author

Yeah, it is the previous one. I didn't want to repeat for each field that all of those are collected before the execution of the PVF, so I left a note above:

It's worth noting that all the data is collected before the candidate execution.

Do you think it's better to clarify explicitly that they are from the current state?

open_requests: Vec<(HrmpChannelId, HrmpAbridgedOpenChannelRequest)>,
/// A vector of close requests in which the para participates either as sender or recipient.
/// The vector doesn't contain two entries with the same `HrmpChannelId`.
close_requests: Vec<HrmpChannelId>,
Contributor

Does a channel appear in this vector only during one parablock, or does it appear in this vector as long as the channel has not yet been closed?

Contributor Author

Those are channel close requests, and these two fields contain all the requests registered at the context block before the execution of the PVF. You can think of them basically as a copy of the storage lists found in the router, but filtered down to what is relevant to the parachain at hand.
Like the open/close requests found in the router module storage, they live up until they are handled. For close requests, that is the first session boundary.

/// A vector of open requests in which the para participates either as sender or recipient. The
/// items are ordered ascending by `HrmpChannelId`. The vector doesn't contain two entries
/// with the same `HrmpChannelId`.
open_requests: Vec<(HrmpChannelId, HrmpAbridgedOpenChannelRequest)>,
Contributor

I wonder if this and close requests need to be part of PersistedValidationData. We don't want the collator to be able to forge this data.

Or maybe the hash of that data should be in the PersistedValidationData.

Contributor Author

Right, the PVF needs a way to authenticate the following events:

  1. there is a new inbound open request
  2. an outbound open request was confirmed by the recipient
  3. an outbound open request timed out. This one could theoretically be tracked by the PVF, but only assuming that the configuration doesn't change, which can happen, even if rarely.

(I believe other cases can be tracked by the PVF itself. E.g. it should know that it shouldn't send two open requests to the same recipient, confirm the same inbound open request twice, etc)

A collator should not be able to conceal any of these events.

I see the following solutions:

  1. put raw requests into PersistedValidationData. That inflates the persisted data, but not too badly: the numbers of open and close requests are bounded (the former by the sum of max ingress and egress channels, the latter by the number of already opened channels).
    • Then we can further compress the data by leveraging the fact that either the sender or the recipient is this para.
    • Then, we can remove confirmed: bool for all inbound open requests.
    • This data will be duplicated for successive blocks in the same session though.
  2. put a hash into PersistedValidationData (see the sketch after this list).
    • Essentially, allows us to move this data into the PVF's state and not duplicate it.
  3. send a DM for each of these events.
    • The same as previous but less complex since it relies on an existing mechanism
    • A caveat: the para can be a bit slow to process DMQ and potentially miss a notification or receive it when it's too late.
  4. punt off on the relay chain storage root proofs.
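
A minimal sketch of option 2 (the hash commitment), under assumptions: the types are placeholders, and a real implementation would hash the SCALE encoding with a cryptographic hash such as BLAKE2 rather than the DefaultHasher used here for brevity:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder types, assumed for illustration only.
type ParaId = u32;
type HrmpChannelId = (ParaId, ParaId);
#[derive(Hash)]
struct HrmpAbridgedOpenChannelRequest {
    confirmed: bool,
    // ...other fields elided
}

/// The commitment that would be stored in `PersistedValidationData`.
fn hrmp_requests_hash(
    open_requests: &[(HrmpChannelId, HrmpAbridgedOpenChannelRequest)],
    close_requests: &[HrmpChannelId],
) -> u64 {
    let mut hasher = DefaultHasher::new();
    open_requests.len().hash(&mut hasher);
    for (id, req) in open_requests {
        id.hash(&mut hasher);
        req.hash(&mut hasher);
    }
    close_requests.hash(&mut hasher);
    hasher.finish()
}

/// The PVF recomputes the hash over the requests supplied by the collator and
/// refuses to proceed if it doesn't match the committed value.
fn verify_requests(
    committed: u64,
    open_requests: &[(HrmpChannelId, HrmpAbridgedOpenChannelRequest)],
    close_requests: &[HrmpChannelId],
) -> bool {
    hrmp_requests_hash(open_requests, close_requests) == committed
}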

Contributor Author

If we assume that with #1642 we limit the total number of channels to a two-digit number and then retire HRMP completely, maybe it makes sense to actually go with option 1 for the time being.

Contributor Author

I created #1663 to address this

pepyakin and others added 2 commits August 31, 2020 12:25
@pepyakin
Contributor Author

I think all points were either addressed or have follow-up issues filed, so I am going ahead and merging this to make my life less painful with the rebases of the next PRs.

That said, feel free to leave any comments and I will address them in the follow-up PRs.

@pepyakin
Contributor Author

bot merge

@ghost

ghost commented Aug 31, 2020

Trying merge.

@ghost ghost merged commit f125cbe into master Aug 31, 2020
@ghost ghost deleted the ser-hrmp-after-validation-data branch August 31, 2020 12:01