Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(adr): ADR-007 pause unbonding period during equivocation proposal #964

Merged
merged 7 commits into from
Jun 12, 2023
93 changes: 93 additions & 0 deletions docs/docs/adrs/adr-007-pause-unbonding-on-eqv-prop.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
---
sidebar_position: 2
title: ADR Template
---
# ADR 007: Pause validator unbonding during equivocation proposal

## Changelog
* 2023-05-16: Initial Draft

## Status

Proposed

## Context

Currently, if an equivocation slashing proposal is created after more than one
mpoke marked this conversation as resolved.
Show resolved Hide resolved
week has passed since the equivocation, it is possible that the validator in
question could unbond and get away without being slashed, since the unbonding
period is 3 weeks, and the voting period is 2 weeks. For this reason, it might
be good to pause unbondings for validators named in an equivocation slashing
proposal until the proposal's voting period is over.

## Decision

### How

Pausing the unbonding period is already possible thanks to the changes in the
`staking` module of the cosmos-sdk:
- `stakingKeeper.PutUnbondingOnHold` pauses an unbonding period
- `stakingKeeper.UnbondingCanComplete` unpauses an unbonding period

These methods use a reference counter under the hood, that gets incremented
every time `PutUnbondingOnHold` is called, and decreased when
`UnbondingCanComplete` is called instead. A specific unbonding is considered
fully unpaused when its underlying reference counter reaches 0. Therefore, as
long as we safeguard consistency - i.e. we make sure we eventually decrement
the reference counter for each time we have incremented it - we can safely use
this existing mechanism without conflicts with the *Completion of Unbonding
Operations* system.

### When pause

The unbonding period (if there is any unbonding) should be paused once an
equivocation proposal enters the voting period. For that, the `gov` module's
hook `AfterProposalDeposit` can be used.

If the hook is triggered with a an equivocation proposal in voting period, then
for each equivocation of the proposal, the unbonding operations of the related
validator that were initiated after the equivocation block time must be paused
- i.e. the underlying reference counter has to be increased.

Note that even after the voting period has started, a proposal can receive
additional deposits. The hook is triggered however at arrival of a deposit, so
a check to verify that the proposal is not already in voting period is
required.

### When unpause

We can use a `gov` module's hook also here and it is
`AfterProposalVotingPeriodEnded`.

If the hook is triggered with an equivocation proposal, then for each
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may lead to unpausing unbonding operations that occurred after the AfterProposalDeposit hook was called. As a result, those unbondings would no longer wait for the unbonding period on the consumer to elapse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, we'll fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed dd353c2

associated equivocation, the unbonding operations of the related validator that
were initiated between the equivocation block time and the start of the
proposal voting period must be unpaused - i.e. decrease the underlying
reference counter - regardless of the proposal outcome.

## Consequences

### Positive

- Validators subject to an equivocation proposal cannot finish unbonding
their tokens before the end of the voting period.

### Negative

- A malicious consumer chain could forge slash packets enabling submission of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this is the main argument against this proposal. The main reason for pausing unbonding operations is to enable the consumer chains to have shorter unbonding periods and avoid unbonding operations to be delayed. However, reducing the unbonding period on a consumer chain would impact the trusting period of all light clients of that consumer chain (even clients on other chains). This may lead to light clients expiring, which would be a major disruption for the consumer chain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative to this proposal is to enable the provider to verify the evidence of equivocation on consumer chains, i.e., #732.

Copy link
Contributor

@jtremback jtremback May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpoke I'm not sure I follow. Currently (without this PR, or cryptographic verification), we are in a situation where the trusting period should be (consumer unbonding - provider voting period) * ~0.75. With a 20 day consumer unbonding period, that means the trusting period should be about 4-5 days. This is already pretty tough for consumer chains to deal with. If we took the consumer unbonding period down to 14 days like we want to, we'd need a trusting period of 0.

With this PR or cryptographic equivocation, we can go back to the trusting period being consumer unbonding * 0.75, or 10-11 days. So at least in this respect, it seems that this PR or cryptographic verification both mitigate this issue equivalently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtremback Yeah, that’s indeed confusing. Sorry for that.

There are actually two problems we are trying to address here. First, ensuring the correctness of the slashing mechanism of PoS. Second, avoiding delays in the completion of unbonding operations on the provider (which would affect the user experience). Note that the first is a safety concern, while the second is a liveness concern.

For the first, we MUST make sure that the stake of a validator that double signed on a consumer chain is still locked (on the provider) after an equivocation proposal passes. Thus, the following condition MUST hold consumerTrustingPeriod < consumerUnbondingPeriod - providerVotingPeriod. In addition, we MUST leave time for the double signing to be detected, a SlashPacket to be relayed and an equivocation proposal to be submitted. Let’s denote this extra time period with delta1. Then, consumerTrustingPeriod < consumerUnbondingPeriod - delta1 - providerVotingPeriod.

For the second, the following condition SHOULD hold consumerUnbondingPeriod < providerUnbondingPeriod. In addition, we SHOULD leave time to relay both VSCPackets and VSCMaturedPackets, and to account for potential downtimes of the consumer chain (as it happened on Neutron’s launch). Let’s denote this extra time period with delta2. Then, consumerUnbondingPeriod + delta2 < providerUnbondingPeriod.

From these two inequalities, we get consumerTrustingPeriod < providerUnbondingPeriod - delta2 - delta1 - providerVotingPeriod. For the Hub, providerUnbondingPeriod = 21 days, providerVotingPeriod = 14 days, which means consumerTrustingPeriod < 7 - delta2 - delta1. Best case scenario, delta1 = 1 day and delta2 = 1day, which means consumerTrustingPeriod < 5 days. This trusting period must be used for all light clients of the consumer chain, not only the ones on the provider. This means that the chances of a consumer light client expiring are quite high (which would clearly be a very disruptive event). Also, note that these values for the deltas are very low. For example, the network has ~7 days to detect a light client attack on a Hub client (i.e., delta1 ~ 7 days). Also, as we seen with Neutron, having delta2=1day may lead to delays of unbonding operations.

The only way to deal with this problem, is to get rid of the providerVotingPeriod, i.e., consumerTrustingPeriod < providerUnbondingPeriod - delta2 - delta1. This ADR is one way of doing that. The downside is that the suggested approach effectively increases the consumerUnbondingPeriod (when an equivocation proposal is submitted), which means that a malicious consumer can break liveness (i.e., delay unbondings). Given that such an event would result in that consumer being removed (via a ConsumerRemovalProposal), I think the risks are minimal. So, we could go ahead with this proposal for now until #732 is done.

an equivocation proposal on the provider chain, resulting in the freezing of
validator's unbondings for an undeterminated amount of time.
- Misbehavior on a consumer chain can potentially go unpunished, if no one
submits an equivocation proposal in time, or if the proposal doesn't pass.

### Neutral

- This feature can't be used for social slashing, because an equivocation
proposal is only accepted if there's a slash log for the related
validator(s), meaning the consumer chain has reported the equivocation to
the provider chain.

## References

* https://github.com/cosmos/interchain-security/issues/747
* https://github.com/cosmos/interchain-security/pull/791