total_deltas becoming negative #7

angbrav · 2022-03-30T10:56:12Z

Potential problem with total_deltas

Following there is an execution that makes total_deltas of a validator val become negative. The VP code seems to disallow it:
https://github.com/anoma/anoma/blob/master/proof_of_stake/src/validation.rs#L694-L698

Let validators[val].total_deltas[e+1-unbonding_length] = a1
Assume that val decides to unbond a1 at e+1-unbonding_length
We have that validators[val].total_deltas[e+1] = a1 - a1 = 0
Assume now that some evidence is found for an action val did at epoch e
We compute slashed_amount = validators[val].total_deltas[e] * slash_rate = a1 * slash_rate
Then update total_deltas at the offset: validators[val].total_deltas[e+pipeline_length] = 0 - a1*slash_rate < 0

The text was updated successfully, but these errors were encountered:

tzemanovic · 2022-03-31T08:53:30Z

I think this is a real concern we'll need to resolve. Nice catch!

tzemanovic · 2022-04-19T11:33:10Z

I think the issue might be even worse than that, because we’re subtracting absolute slashed amounts from the validator total deltas. The validator total deltas are summed in each epoch, so we cannot tell what amount has been bonded and unbonded just from this field alone, so the absolute slashed amounts are effectively slashing the bonds (positive deltas), but not the unbonds (negative deltas), which doesn’t just lead to underflow but also wrong values when slashed bond is partially unbonded and when new bonds are added to be materialized at the same epoch as unbonds (submit a new bond unbonding_len - pipeline_len epochs after unbonding) With the example from the issue, say the slash rate is 50%, but instead of unbonding the full a1 , the validator unbonds just a half. Then what we end up with in the validator total deltas would amount to sum(a1, -a1/2, -a1/2) = 0 , but in fact, the validator should be left with a1/4 (half of the bond that wasn’t unbonded, slashed by 50%).

We're gonna revisit this with new slashing design.

tzemanovic · 2022-06-29T11:44:54Z

I tried to reason through some ideas on how to fix this issue, but something always ends up being off :/ I think the problem is that when we calculate the slashed amount from validator total deltas for epoch e, we cannot simply apply it at e+pipeline, because some of the bonds that constitute the total deltas might have been unbonded by then. In the total_deltas field, the unbonded amounts are subtracted without any slashes applied, they only get slashed when withdrawing, because slashable events might not be discovered yet. This then causes the problem where we calculate slash from the total_deltas and we try to apply it in future total_deltas without knowing how much of it is to be unbonded. We want to avoid iterating bonds when slashing, but this seems to force us to do so, so I’m not sure if it can be solved without a change in the design.

cwgoes · 2022-07-26T16:47:31Z

If we consider the stake of the validator at the epoch when the infraction was committed, and the stake of the validator at the current epoch, the difference of these two values will be the cumulative effect of any new delegations to the validator (adding stake) and any new unbondings from the validator (subtracting stake). The former should be unaffected by the slash; the latter is being double-counted for total_deltas in this case. I think we could track per-epoch the total unbonding amount from each validator in the epoch (when executing an unbonding tx, this amount is updated), and subtract those from the stake which we consider for the total_deltas reduction in this case (iterating over a constant number of epochs), something like this. I can write a more detailed spec proposal later this week.

angbrav · 2022-07-29T17:24:31Z

I can write a more detailed spec proposal later this week.

It would be great. Then I can add it to the model and eventually verify that works :)

tzemanovic · 2022-08-01T12:31:59Z

This sounds good! I think tracking total unbonded amounts per validator per epoch should work, because an evidence is only valid before the epoch infraction_epoch + unbonding_len, so any unbonding that is not yet withdrawable must have originated from a slash-able bond

angbrav · 2022-08-19T10:00:21Z

This issue is still relevant in the context of cubic slashing.

@tzemanovic fix suggestion:

in Validator struct we add total_unbonded map<Epoch, amount:int> similar to total_deltas map<Epoch, amount:int> (we have to track unbonded amounts in addition to the total deltas, because total deltas may also include new bonds in future epochs which shouldn’t be slashed as they would have been added after the evidence epoch).
in unbond(), we update this field with validators[validator_address].total_unbonded[unbonding_length] += amount
update slashing calculations to:

end_of_epoch()
{
  var set_validators = {val | val = slash.validator && slash in enqueued_slashes[cur_epoch]}
  forall (validator_address in set_validators) do
    //iterate over all slashes for infractions within (-1,+1) epochs range (Step 2.1 of cubic slashing)
    var set_slashes = {s | s in enqueued_slashes[epoch] && cur_epoch-1 <= epoch <= cur_epoch+1 && s.validator == validator_address}
    //calculate the slash rate (Step 2.2 of cubic slashing)
    var rate = compute_final_rate(set_slashes)
    forall (slash in {s | s in enqueued_slashes[cur_epoch] && s.validator == validator_address}) do
      //set the slash on the now "finalised" slash amount in storage (Step 2.3 of cubic slashing)
      slash.rate = rate
      append(slashes[validator_address], slash)
      var total_staked = read_epoched_field(validators[validator_address].total_deltas, slash.epoch, 0)

      // CHANGES start from here      

      var offsets = {offset | 1 <= offset <= unbonding_length}
      var total_unbonded = 0

      //find the total unbonded from the slash epoch up to the current epoch first
      forall (epoch in epochs | slash.epoch <= epoch <= cur_epoch) do
        total_unbonded += read_epoched_field(validators[validator_address].total_unbonded, epoch, 0)

      var last_slash = 0
      forall (offset in offsets) do
        total_unbonded += read_epoched_field(validators[validator_address].total_unbonded, cur_epoch + offset, 0)
        var this_slash = (total_staked - total_unbonded) * slash.rate
        var slashed_amount = this_slash - last_slash
        last_slash = this_slash
        update_total_deltas(validator_address, offset, -1*slashed_amount)
        update_voting_power(validator_address, offset)

      // CHANGES end here  

    //unfreeze the validator (Step 2.5 of cubic slashing)
    validators[validator_address].frozen = false
  cur_epoch = cur_epoch + 1
}

angbrav · 2022-08-19T10:03:33Z

@tzemanovic fix suggestion:

I am a bit confused about your proposal. Why isn't the following enough?

forall (slash in {s | s in enqueued_slashes[cur_epoch] && s.validator == validator_address}) do
      //set the slash on the now "finalised" slash amount in storage (Step 2.3 of cubic slashing)
      slash.rate = rate
      append(slashes[validator_address], slash)
      var total_staked = read_epoched_field(validators[validator_address].total_deltas, slash.epoch, 0)
      var total_unbonded = 0
      forall (epoch in slash.epoch+1..cur_epoch) do
            total_unbonded += read_epoched_field(validators[validator_address].total_unbonded, epoch, 0)
      var slashed_amount = (total_staked - total_unbonded) * slash.rate
      //update voting power (Step 2.4 of cubic slashing)
      update_total_deltas(validator_address, 1, -1*slashed_amount)
      update_voting_power(validator_address, 1)

Maybe the indexes are a bit off (I need to think about it), but I think it should work. First we compute how much stake has been unbonded since the validator misbehaved (from slash.epoch+1 up to cur_epoch). Then we compute the slashed amount based on the difference between the stake at the misbehaving epoch (slash.epoch) minus total_unbonded (previously computed). Maybe I am missing something...

tzemanovic · 2022-08-19T16:40:15Z

@angbrav this was my reasoning, consider:

slash.epoch = n
in worst case scenario, it gets discovered last minute in epoch n + unbonding_len - 1 (at n + unbonding_len, we're not able to apply slash anymore, so it would be discarded)
- the validator will be frozen only once discovered, so between (n .. n + unbonding_len - 1], any bonds can still be unbonded
  - those will be withdrawable by (n + unbonding_len .. n + 2*unbonding_len - 1]
the slash will be processed before change to epoch n + unbonding_len, so when cur_epoch = n + unbonding_len, we still need to iterate up to cur_epoch + unbonding_len - 1 to adjust slashed amounts for the unbonding all the way up to the withdrawable epoch ceiling

angbrav · 2022-08-22T14:25:49Z

Thanks for the clarification!

cwgoes · 2022-08-23T13:10:19Z

@tzemanovic does the change to when we apply the voting power changes for unbonding (now at the pipeline offset) that we discussed last week change any of the logic which is necessary here?

tzemanovic · 2022-08-24T08:39:40Z

we'll need to check everything carefully, but it seems to me that the change (anoma/namada#366) shouldn't affect the logic for slashing in any way

angbrav · 2022-09-27T10:46:55Z

I've discovered the following. I think there is a missing piece in the fix proposed in #7 (comment).

Consider the following execution:

Let validators[val].total_deltas[e] = a1 and assume that belongs to a single bond issued by user.
Assume now that some evidence is found at e for an action val did at epoch e. Then val gets frozen at e and the evidence is processed at the end of epoch e+unbonding_length.
Assume that the rate for the evidence in 1. Then, after processing the evidence, validators[val].total_deltas[e+unbonding_length+1] = 0 and val is not frozen anymore.
Assume that user unbonds a1 by e+unbonding_length+1. Then validators[val].total_deltas[e+unbonding_length+1+pipeline_length] = 0 - a1 = -a1.

Conclusion: the same way we carefully update total_deltas considering total_unbonded when processing evidence at the end of an epoch, we should carefully update total_deltas considering slashes when unbonding.

I've already fixed the tla+ model, I'll soon comment the solution for the pseudocode model.

tzemanovic · 2022-10-03T15:42:31Z

@angbrav Thank you, the described issue makes sense and the proposed solution looks good to me! I think there’s however most likely another issue that is related to #27:

at the end of epoch when we’re applying enqueued slashes, we read total stake only once at the infraction epoch and then we're using total_unbonded to find the stake at the epoch offsets that we're updating

I think there's a risk of overriding/discarding previous slashes that may have already been applied at the same offsets, because in total_staked - total_unbonded, total_staked is the value from the infraction epoch (I think this is the case at least in the pseudo-code).

If I understand the issue in #27 correctly (in short - the slashes applied on total_deltas are not additive, so the rate calculation on withdrawing is off), I think we could fix both of these in the following way:

Instead of accumulating total_unbonded for the epoch offsets as we iterate them, re-read total_staked in every epoch and only subtract the total_unbonded from that epoch - this will make sure that we have the stake with any previous slashes applied in any given epoch.
When we're withdrawing tokens, for each epoch in increasing order in which there are some applicable slashes, use the slashed value as the base for stake in the following slash calculation (e.g. total_rate = this_slash + next_slash - this_slash * next_slash).

Additionally, I think it would be better if the slash rates for slashes processed in the same epoch are just their sums. To do this, we could group them by the validator addresses and sum the rates within the same epoch for each validator (again, this has to be consistent in both the total_unbonded and withdrawal slashing).

angbrav · 2022-10-14T10:19:02Z

I think there's a risk of overriding/discarding previous slashes that may have already been applied at the same offsets, because in total_staked - total_unbonded, total_staked is the value from the infraction epoch (I think this is the case at least in the pseudo-code).

I am not sure I am following. The function update_total_deltas(validator_address, offset, diff_slashed_amount) does not set total deltas to diff_slashed_amount but adds it. diff_slashed_amount will be negative (or 0 if all tokens were unbonded before a slash is processed) at cur_epoch+1 and 0 or positive afterwards. If there are two slashes to apply at the same offset, the model does the following:

Assume that both slashes are processed at e. This means that both infractions happened at the same epoch (e-unbonding_length). This is a simplification.
Assume that the total stake at the infraction epoch was total_stake, and the slash final ratio is 0.2.
Let total_deltas[e+1] = x before processing any slash at the end of e, and assume that no tokens were unbonded since the infraction.
After processing the first, we will have total_deltas[e+1] = x - total_stake*0.2.
After processing the second, we will have total_deltas[e+1] = x - total_stake*0.4, which I think it is correct.

Unless I misunderstood your concern or the simplification of assuming that both infractions happened at the same epoch precludes the problem, I do not see why what we have is not correct.

I think #27 is a different issue: in the execution there, the second infraction happens when the first one has been processed, so the stake cached when the evidence is processed already accounts for the amount slashed by the first infraction. The problem there is that when withdrawing, we apply the slash to the unbonded amount, which is set from the bond. The bonds do not get updated when slashes are processed, so there is a mismatch there.

tzemanovic · 2022-10-18T11:30:45Z

@angbrav ah, that makes sense, thanks for clearing it up!

In relation to #27, is it so that this implies the following?

Within the same epoch, validators with multiple slashes are slashed the sum of the rates
Across different epochs (e.g. x and y where y > x) validators with multiple slashes are slashed with additive rates in the order of the epochs (rate = rate(x) + rate(y) - rate(x) * rate(y))

If this holds, we can then match that logic to apply slashes when withdrawing, right?

* alternative fix to 24 * delete commentted text and unused function * fix typo * change name variable * inf -> bottom * fix to latest issue of #25 * fix typo

angbrav · 2022-10-19T11:23:30Z

@tzemanovic it does not hold exactly like that, but quite similar. I've opened a PR (#29) that includes a fix. Please check :)

…ing windows (#28) * VotingpowerDelagations invariant * add comment * add a bunch of PoS invariants * fix new issue total_deltas * proper handling of misbehaving windows * fix typos * fix typos

angbrav · 2022-10-27T12:31:31Z

It seems that this issue is fixed! Closing in.

tzemanovic mentioned this issue Jul 7, 2022

PoS - update integration spec for new rewards and slashing spec anoma/namada#125

Closed

angbrav mentioned this issue Jul 26, 2022

Add cubic slashing to pseudocode model #16

Merged

angbrav mentioned this issue Aug 17, 2022

Reworked TLA+ model with cubic slashing #19

Merged

angbrav mentioned this issue Aug 19, 2022

Problem with processing evidences too late #21

Closed

This was referenced Sep 27, 2022

Alternative to issue #24 and missing piece for #7 #26

Merged

PoS invariants, new fix to issue #7 and parameter to control misbehaving windows #28

Merged

angbrav added a commit that referenced this issue Oct 19, 2022

Alternative to issue #24 and missing piece for #7 (#26)

c946b41

* alternative fix to 24 * delete commentted text and unused function * fix typo * change name variable * inf -> bottom * fix to latest issue of #25 * fix typo

angbrav closed this as completed Oct 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

total_deltas becoming negative #7

total_deltas becoming negative #7

angbrav commented Mar 30, 2022

tzemanovic commented Mar 31, 2022

tzemanovic commented Apr 19, 2022

tzemanovic commented Jun 29, 2022

cwgoes commented Jul 26, 2022 •

edited

Loading

angbrav commented Jul 29, 2022

tzemanovic commented Aug 1, 2022

angbrav commented Aug 19, 2022

angbrav commented Aug 19, 2022

tzemanovic commented Aug 19, 2022

angbrav commented Aug 22, 2022

cwgoes commented Aug 23, 2022

tzemanovic commented Aug 24, 2022

angbrav commented Sep 27, 2022 •

edited

Loading

tzemanovic commented Oct 3, 2022

angbrav commented Oct 14, 2022

tzemanovic commented Oct 18, 2022

angbrav commented Oct 19, 2022

angbrav commented Oct 27, 2022

total_deltas becoming negative #7

total_deltas becoming negative #7

Comments

angbrav commented Mar 30, 2022

Potential problem with total_deltas

tzemanovic commented Mar 31, 2022

tzemanovic commented Apr 19, 2022

tzemanovic commented Jun 29, 2022

cwgoes commented Jul 26, 2022 • edited Loading

angbrav commented Jul 29, 2022

tzemanovic commented Aug 1, 2022

angbrav commented Aug 19, 2022

angbrav commented Aug 19, 2022

tzemanovic commented Aug 19, 2022

angbrav commented Aug 22, 2022

cwgoes commented Aug 23, 2022

tzemanovic commented Aug 24, 2022

angbrav commented Sep 27, 2022 • edited Loading

tzemanovic commented Oct 3, 2022

angbrav commented Oct 14, 2022

tzemanovic commented Oct 18, 2022

angbrav commented Oct 19, 2022

angbrav commented Oct 27, 2022

cwgoes commented Jul 26, 2022 •

edited

Loading

angbrav commented Sep 27, 2022 •

edited

Loading