This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Dispute distribution guide #3158

Merged
merged 12 commits into from
Jun 22, 2021
226 changes: 197 additions & 29 deletions roadmap/implementers-guide/src/node/disputes/dispute-distribution.md
@@ -1,5 +1,15 @@
# Dispute Distribution

## Design Goals

This design should result in a protocol that:

- is resilient to nodes being temporarily unavailable
- makes sure nodes become aware of a dispute quickly
- is relatively efficient and does not put too much stress on the network
- is resilient to spam
- is simple and boring: we want disputes to work when they happen

## Protocol

### Input
@@ -10,48 +20,211 @@

- [`DisputeCoordinatorMessage::ActiveDisputes`][DisputeParticipationMessage]
- [`DisputeCoordinatorMessage::ImportStatements`][DisputeParticipationMessage]
- [`DisputeCoordinatorMessage::QueryCandidateVotes`][DisputeParticipationMessage]
- [`RuntimeApiMessage`][RuntimeApiMessage]

## Functionality
### Wire format

#### Disputes

Protocol: "/polkadot/dispute/1"

Request:

```rust
struct DisputeRequest {
    // Either initiating invalid vote or our own (if we voted invalid).
    invalid_vote: SignedV2<InvalidVote>,
    // Some invalid vote (can be from backing/approval) or our own if we voted
    // valid.
    valid_vote: SignedV2<ValidVote>,
}

struct InvalidVote {
    /// The candidate being disputed.
    candidate_hash: CandidateHash,
    /// The voting validator.
    validator_index: ValidatorIndex,
    /// The session the candidate appears in.
    candidate_session: SessionIndex,
}

struct ValidVote {
    candidate_hash: CandidateHash,
    validator_index: ValidatorIndex,
    candidate_session: SessionIndex,
    kind: ValidDisputeStatementKind,
}
```

Response:

### Distribution
```rust
enum DisputeResponse {
    Confirmed
}
```

#### Vote Recovery

Protocol: "/polkadot/vote-recovery/1"

```rust
struct IHaveVotesRequest {
    candidate_hash: CandidateHash,
    session: SessionIndex,
    votes: VotesBitfield,
}

struct VotesBitfield(pub BitVec<bitvec::order::Lsb0, u8>);
```

Response:

```rust
struct VotesResponse {
    /// All votes we have, but the requester was missing.
    missing: Vec<(DisputeStatement, ValidatorIndex, ValidatorSignature)>,
    /// Any additional equivocating votes, we transmit those even if the sender
    /// claims to have votes for that validator (as it might only have one).
    equivocating: Vec<(DisputeStatement, ValidatorIndex, ValidatorSignature)>,
}
```

## Functionality

Distributing disputes needs to be a reliable protocol. We would like to make as
sure as possible that our vote got properly delivered to all concerned
validators. For this to work, this subsystem won't be gossip based, but instead
will use a request/response protocol for application level confirmations. The
request will be the payload (the `ExplicitDisputeStatement`), the response will
be the confirmation. On reception of `DistributeStatement` a node will send and
keep retrying delivering that statement in case of failures as long as the
dispute is active to all concerned validators. The list of concerned validators
will be updated on every block and will change at session boundaries.
request will be the payload (the actual votes/statements), the response will
be the confirmation. See [above](#wire-format).

### Starting a Dispute

A dispute is initiated once a node sends the first `Dispute` wire message,
which must contain an "invalid" vote and some "valid" vote.

The dispute distribution subsystem can be instructed to send that message out to
all concerned validators by means of a `DisputeDistributionMessage::SendDispute`
message. That message must contain an invalid vote from the local node and some
valid one, e.g. a backing statement.

As can be determined from the protocol section, this subsystem is only concerned
with delivering `ExplicitDisputeStatement`s; for all other votes
(backing/approval) the dispute coordinator is responsible for keeping track of
those statements.
We include a valid vote as well, so that any node can see that there are
conflicting votes and hence a valid dispute, regardless of whether it is synced
with the chain or has seen any backing/approval votes. Nodes will still need to
check that the disputing votes are reasonably current and not stale.
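
The conflict check described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the types are unsigned stand-ins for the wire types (plain integers instead of real hashes, no `SignedV2` wrapper), and `votes_conflict` is a hypothetical helper, not part of the subsystem's API. Checking whether the votes are current is left out.

```rust
// Simplified stand-ins for the wire types; real votes are signed.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct CandidateHash(u64);

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Vote {
    candidate_hash: CandidateHash,
    validator_index: u32,
    candidate_session: u32,
}

struct DisputeRequest {
    invalid_vote: Vote,
    valid_vote: Vote,
}

/// Hypothetical helper: returns true if both votes refer to the same
/// candidate in the same session, so that they actually contradict each other.
fn votes_conflict(req: &DisputeRequest) -> bool {
    req.invalid_vote.candidate_hash == req.valid_vote.candidate_hash
        && req.invalid_vote.candidate_session == req.valid_vote.candidate_session
}
```

A receiving node would presumably run such a check, together with signature verification, before importing the votes.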

### Participating in a Dispute

Upon receiving a `Dispute` message, dispute distribution will trigger the
import of the received votes via the dispute coordinator
(`DisputeCoordinatorMessage::ImportStatements`). The dispute coordinator will
take care of participating in that dispute if necessary. Once it is done, the
coordinator will send a `DisputeDistributionMessage::SendDispute` message to dispute
distribution. From here, everything is the same as for starting a dispute,
except that if the local node deemed the candidate valid, the `SendDispute`
message will contain a valid vote signed by our node together with the
initially received `Invalid` vote.

### Sending of messages

Starting and participating in a dispute are pretty similar from the perspective
of dispute distribution. Once we receive a `SendDispute` message we try to make
sure to get the data out. We keep track of all the parachain validators that
should see the message, which are all the parachain validators of the session
where the dispute happened, as they will want to participate in the dispute. In
addition we also need to get the votes out to all authorities of the current
session (which might be the same or not). Those authorities will not
participate in the dispute, but need to see the statements so they can include
them in blocks.
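
As a rough sketch of the recipient set described above: it is the union of the parachain validators of the dispute's session and the authorities of the current session. The `AuthorityId` alias and the `recipients` function are hypothetical names used only for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical peer identifier, a stand-in for a real authority id.
type AuthorityId = &'static str;

/// Union of both sets; a node that is in both is only contacted once.
fn recipients(
    dispute_session_validators: &[AuthorityId],
    current_session_authorities: &[AuthorityId],
) -> BTreeSet<AuthorityId> {
    dispute_session_validators
        .iter()
        .chain(current_session_authorities.iter())
        .copied()
        .collect()
}
```

Using a set makes the "which might be the same or not" case a non-issue: if both sessions have identical validators, the union simply collapses to that one set.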

We keep track of connected parachain validators and authorities and will issue
warnings in the logs if fewer than two thirds of the corresponding sets are
connected. We also only consider a message transmitted once we have
received a confirmation message. If not, we will keep retrying to get that
message out as long as the dispute is deemed alive. To determine whether a
dispute is still alive, we will issue a
`DisputeCoordinatorMessage::ActiveDisputes` message before each retry run. Once
a dispute is no longer live, we will clean up the state accordingly.

Review comment (Contributor), on the connection warning:

What are validator operators meant to do in this situation?

Reply (Member Author):

On a live network: check their internet connection. On a test network: find the bug. We have already had the case a couple of times that we were not connected properly and only realized it because of other issues this caused. Dispute distribution already tries its best: it will try to send requests even if the receiver is not connected to us, and it will keep trying.

I was just thinking about disputes, and because they are so critical, I wanted to do whatever I can to ensure our messages get out, but general connection warnings should go to gossip support already, I guess. And warnings when a dispute is already happening are a bit late, though still useful as an additional safety guard. If validators become aware of a dispute that did not work out for some weird reasons/bugs/whatever, we still have governance, which is better than nobody noticing.
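
The retry and connectivity rules for sending can be sketched as below. This is a minimal illustration, not the actual implementation: `SendTask`, `retry_round` and `should_warn` are hypothetical names, peers are plain `u32` ids, and the `ActiveDisputes` liveness check that would precede each retry run is assumed to happen outside of this code.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-dispute sending state.
struct SendTask {
    /// Recipients that have not yet confirmed reception.
    pending: BTreeSet<u32>,
}

impl SendTask {
    /// One retry run. `send` returns true if the peer answered with a
    /// confirmation. Returns true once every recipient has confirmed.
    fn retry_round(&mut self, mut send: impl FnMut(u32) -> bool) -> bool {
        // Keep only the peers whose delivery was not confirmed.
        self.pending.retain(|peer| !send(*peer));
        self.pending.is_empty()
    }
}

/// True if fewer than two thirds of the expected peers are connected.
fn should_warn(connected: usize, expected: usize) -> bool {
    connected * 3 < expected * 2
}
```

A message only leaves `pending` on an explicit confirmation, so a failed or unanswered request is retried in the next run without any extra bookkeeping.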

To cater for spam issues, we will in a first implementation only consider
disputes of already included data, that is, only disputes for candidates that are
already available. These are the only disputes representing an actual threat to
the system and are also the easiest to implement with regards to spam.

Votes can still be old or no longer relevant. In this case we will drop those messages
and we might want to decrease the reputation of peers sending old data.

### Reception

Apart from making sure that local statements are sent out to all relevant
validators, this subsystem is also responsible for receiving votes from other
nodes. Because we are not forwarding foreign statements, spam is not so much of
an issue. We should just make sure to punish a node if it issues a statement for
a candidate that was not found available.
Because we are not forwarding foreign statements, spam is not so much of
an issue. Rate limiting should be implemented at the substrate level, see
[#7750](https://github.com/paritytech/substrate/issues/7750).

### Node Startup

On startup we need to check with the dispute coordinator for any ongoing
disputes and assume we have not yet sent our statement for those. In case we
find an explicit statement from ourselves via
`DisputeCoordinatorMessage::QueryCandidateVotes` we will pretend to just have
received a `SendDispute` message for that candidate.

## Backing and Approval Votes

Backing and approval votes get imported via the dispute coordinator by the
corresponding subsystems when they arrive or are created.

We assume that under normal operation each node will be aware of backing and
approval votes and optimize for that case. Nevertheless we want disputes to
conclude quickly and reliably, therefore if a node is not aware of backing/approval
votes it can request the missing votes from the node that informed it about the
dispute.

For each received vote, the subsystem will send an
`DisputeCoordinatorMessage::ImportStatements` message to the dispute
coordinator. We rely on the coordinator to trigger validation and availability
recovery of the candidate, if there was no local vote for it yet and to report
back to us via `DisputeDistributionMessage::ReportCandidateUnavailable` if a
candidate was not found available.
## Resiliency

### Considerations
The above protocol should be sufficient for most cases, but there are certain
cases we also want to have covered:

- Non-validator nodes might be interested in ongoing voting, even before it is
  recorded on chain.
- Nodes might have missed votes, especially backing or approval votes.
  Recovering them from chain is difficult and expensive, due to runtime upgrades
  and untyped extrinsics.

To cover those cases, we introduce a second request/response protocol, which can
be handled at a lower priority than the one above. It consists of the
request/response messages described in the [protocol
section](#vote-recovery).

Nodes may send those requests to validators if they feel they are missing
votes, e.g. after some timeout, if no majority has been reached yet from their
point of view, or if they are not aware of any backing/approval votes for a
received disputed candidate.

The receiver of an `IHaveVotesRequest` message will do the following:

1. See if the sender is missing votes we are aware of. If so, respond with
   those votes. Also send votes of equivocating validators, no matter the
   bitfield.
2. Check whether the sender knows about any votes we don't know about, and if so
   send an `IHaveVotes` request back, with our knowledge.
3. Record the peer's knowledge.
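
Steps 1 and 2 boil down to comparing two bitfields in both directions. The sketch below shows that comparison under simplifying assumptions: a plain `&[bool]` slice stands in for `VotesBitfield` (one entry per validator index), `missing_in` is a hypothetical helper, and equivocation handling is omitted.

```rust
/// Indices of votes that `ours` contains but `theirs` lacks.
/// Positions beyond the end of `theirs` count as "not known".
fn missing_in(theirs: &[bool], ours: &[bool]) -> Vec<usize> {
    ours.iter()
        .enumerate()
        .filter(|(i, have)| **have && !theirs.get(*i).copied().unwrap_or(false))
        .map(|(i, _)| i)
        .collect()
}
```

With the sender's bitfield as `theirs` and the local knowledge as `ours`, the result lists the votes to put into the response (step 1). Swapping the arguments tells us whether the sender knows votes we lack, in which case we would send a request back (step 2).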

When to send `IHaveVotes` messages:

1. Whenever we are asked to do so via
`DisputeDistributionMessage::FetchMissingVotes`.
2. Approximately once per block to some random validator as long as the dispute
is active.

Spam considerations: Nodes want to accept those messages once per validator and
per slot. They are free to drop more frequent requests or requests for stale
data. Requests coming from non-validator nodes can be handled on a best-effort
basis.
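
One possible shape for the "once per validator and per slot" rule is sketched here. `RecoveryRateLimit` is a hypothetical type, validators and slots are plain integers, and treating a request tagged with an older slot as stale is an assumption made for this example only.

```rust
use std::collections::HashSet;

/// Hypothetical limiter: remembers which validators already sent a vote
/// recovery request in the current slot.
#[derive(Default)]
struct RecoveryRateLimit {
    slot: u64,
    seen: HashSet<u32>,
}

impl RecoveryRateLimit {
    /// Returns true if the request should be served.
    fn accept(&mut self, validator: u32, slot: u64) -> bool {
        if slot < self.slot {
            // Assumption: requests for an older slot are considered stale.
            return false;
        }
        if slot > self.slot {
            // New slot: forget who asked during the previous one.
            self.slot = slot;
            self.seen.clear();
        }
        // `insert` returns false if the validator was already recorded.
        self.seen.insert(validator)
    }
}
```

Only one slot's worth of state is kept, so memory use is bounded by the number of validators.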

## Considerations

Dispute distribution is critical. We should keep track of available validator
connections and issue warnings if we are not connected to a majority of
@@ -77,14 +250,9 @@ providers of the data, hence distributing load and making prevention of the
dispute from concluding harder and harder over time. Assuming an attacker can
not DoS a node forever, the dispute will succeed eventually, which is all that
matters. And again, even if an attacker managed to prevent such a dispute from
happening somehow, there is no real harm done, there was no serious attack to
happening somehow, there is no real harm done: There was no serious attack to
begin with.

Third: As candidates can be made up at will, we are susceptible to spam. Two
validators can continuously contradict each other. This will result in a slash
of one of them. Still it would be good to consider that possibility and make
sure the network will work properly in such an event.

[DistputeDistributionMessage]: ../../types/overseer-protocol.md#dispute-distribution-message
[RuntimeApiMessage]: ../../types/overseer-protocol.md#runtime-api-message
[DisputeParticipationMessage]: ../../types/overseer-protocol.md#dispute-participation-message
16 changes: 14 additions & 2 deletions roadmap/implementers-guide/src/types/overseer-protocol.md
@@ -432,10 +432,22 @@ responsible of distributing explicit dispute statements.

```rust
enum DisputeDistributionMessage {

    /// Tell dispute distribution to distribute an explicit dispute statement to
    /// validators.
    DistributeStatement(ExplicitDisputeStatement),
    /// Tell the subsystem that a candidate is not available. Dispute distribution
    SendDispute((ValidVote, InvalidVote)),

    /// Ask DisputeDistribution to get votes we don't know about.
    /// Fetched votes will be reported via `DisputeCoordinatorMessage::ImportStatements`
    FetchMissingVotes {
        candidate_hash: CandidateHash,
        session: SessionIndex,
        known_votes: Bitfield,
        /// Optional validator to query from. `ValidatorIndex` as in the above
        /// referenced session.
        from_validator: Option<ValidatorIndex>,
    },
    /// Tell the subsystem that a candidate is not available. Dispute distribution
    /// can punish peers distributing votes on unavailable hashes for example.
    ReportCandidateUnavailable(CandidateHash),
}