Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

total_deltas becoming negative #7

Closed
angbrav opened this issue Mar 30, 2022 · 18 comments
Closed

total_deltas becoming negative #7

angbrav opened this issue Mar 30, 2022 · 18 comments

Comments

@angbrav
Copy link
Collaborator

angbrav commented Mar 30, 2022

Potential problem with total_deltas

Following there is an execution that makes total_deltas of a validator val become negative. The VP code seems to disallow it:
https://github.com/anoma/anoma/blob/master/proof_of_stake/src/validation.rs#L694-L698

  • Let validators[val].total_deltas[e+1-unbonding_length] = a1
  • Assume that val decides to unbond a1 at e+1-unbonding_length
  • We have that validators[val].total_deltas[e+1] = a1 - a1 = 0
  • Assume now that some evidence is found for an action val did at epoch e
  • We compute slashed_amount = validators[val].total_deltas[e] * slash_rate = a1 * slash_rate
  • Then update total_deltas at the offset: validators[val].total_deltas[e+pipeline_length] = 0 - a1*slash_rate < 0
@tzemanovic
Copy link
Collaborator

I think this is a real concern we'll need to resolve. Nice catch!

@tzemanovic
Copy link
Collaborator

I think the issue might be even worse than that, because we’re subtracting absolute slashed amounts from the validator total deltas. The validator total deltas are summed in each epoch, so we cannot tell what amount has been bonded and unbonded just from this field alone, so the absolute slashed amounts are effectively slashing the bonds (positive deltas), but not the unbonds (negative deltas), which doesn’t just lead to underflow but also wrong values when slashed bond is partially unbonded and when new bonds are added to be materialized at the same epoch as unbonds (submit a new bond unbonding_len - pipeline_len epochs after unbonding) With the example from the issue, say the slash rate is 50%, but instead of unbonding the full a1 , the validator unbonds just a half. Then what we end up with in the validator total deltas would amount to sum(a1, -a1/2, -a1/2) = 0 , but in fact, the validator should be left with a1/4 (half of the bond that wasn’t unbonded, slashed by 50%).

We're gonna revisit this with new slashing design.

@tzemanovic
Copy link
Collaborator

I tried to reason through some ideas on how to fix this issue, but something always ends up being off :/ I think the problem is that when we calculate the slashed amount from validator total deltas for epoch e, we cannot simply apply it at e+pipeline, because some of the bonds that constitute the total deltas might have been unbonded by then. In the total_deltas field, the unbonded amounts are subtracted without any slashes applied, they only get slashed when withdrawing, because slashable events might not be discovered yet. This then causes the problem where we calculate slash from the total_deltas and we try to apply it in future total_deltas without knowing how much of it is to be unbonded. We want to avoid iterating bonds when slashing, but this seems to force us to do so, so I’m not sure if it can be solved without a change in the design.

@cwgoes
Copy link
Collaborator

cwgoes commented Jul 26, 2022

If we consider the stake of the validator at the epoch when the infraction was committed, and the stake of the validator at the current epoch, the difference of these two values will be the cumulative effect of any new delegations to the validator (adding stake) and any new unbondings from the validator (subtracting stake). The former should be unaffected by the slash; the latter is being double-counted for total_deltas in this case. I think we could track per-epoch the total unbonding amount from each validator in the epoch (when executing an unbonding tx, this amount is updated), and subtract those from the stake which we consider for the total_deltas reduction in this case (iterating over a constant number of epochs), something like this. I can write a more detailed spec proposal later this week.

@angbrav
Copy link
Collaborator Author

angbrav commented Jul 29, 2022

I can write a more detailed spec proposal later this week.

It would be great. Then I can add it to the model and eventually verify that works :)

@tzemanovic
Copy link
Collaborator

This sounds good! I think tracking total unbonded amounts per validator per epoch should work, because an evidence is only valid before the epoch infraction_epoch + unbonding_len, so any unbonding that is not yet withdrawable must have originated from a slash-able bond

@angbrav
Copy link
Collaborator Author

angbrav commented Aug 19, 2022

This issue is still relevant in the context of cubic slashing.

@tzemanovic fix suggestion:

  • in Validator struct we add total_unbonded map<Epoch, amount:int> similar to total_deltas map<Epoch, amount:int> (we have to track unbonded amounts in addition to the total deltas, because total deltas may also include new bonds in future epochs which shouldn’t be slashed as they would have been added after the evidence epoch).

  • in unbond(), we update this field with validators[validator_address].total_unbonded[unbonding_length] += amount

  • update slashing calculations to:

end_of_epoch()
{
  var set_validators = {val | val = slash.validator && slash in enqueued_slashes[cur_epoch]}
  forall (validator_address in set_validators) do
    //iterate over all slashes for infractions within (-1,+1) epochs range (Step 2.1 of cubic slashing)
    var set_slashes = {s | s in enqueued_slashes[epoch] && cur_epoch-1 <= epoch <= cur_epoch+1 && s.validator == validator_address}
    //calculate the slash rate (Step 2.2 of cubic slashing)
    var rate = compute_final_rate(set_slashes)
    forall (slash in {s | s in enqueued_slashes[cur_epoch] && s.validator == validator_address}) do
      //set the slash on the now "finalised" slash amount in storage (Step 2.3 of cubic slashing)
      slash.rate = rate
      append(slashes[validator_address], slash)
      var total_staked = read_epoched_field(validators[validator_address].total_deltas, slash.epoch, 0)

      // CHANGES start from here      

      var offsets = {offset | 1 <= offset <= unbonding_length}
      var total_unbonded = 0

      //find the total unbonded from the slash epoch up to the current epoch first
      forall (epoch in epochs | slash.epoch <= epoch <= cur_epoch) do
        total_unbonded += read_epoched_field(validators[validator_address].total_unbonded, epoch, 0)

      var last_slash = 0
      forall (offset in offsets) do
        total_unbonded += read_epoched_field(validators[validator_address].total_unbonded, cur_epoch + offset, 0)
        var this_slash = (total_staked - total_unbonded) * slash.rate
        var slashed_amount = this_slash - last_slash
        last_slash = this_slash
        update_total_deltas(validator_address, offset, -1*slashed_amount)
        update_voting_power(validator_address, offset)

      // CHANGES end here  

    //unfreeze the validator (Step 2.5 of cubic slashing)
    validators[validator_address].frozen = false
  cur_epoch = cur_epoch + 1
}

@angbrav
Copy link
Collaborator Author

angbrav commented Aug 19, 2022

@tzemanovic fix suggestion:

I am a bit confused about your proposal. Why isn't the following enough?

forall (slash in {s | s in enqueued_slashes[cur_epoch] && s.validator == validator_address}) do
      //set the slash on the now "finalised" slash amount in storage (Step 2.3 of cubic slashing)
      slash.rate = rate
      append(slashes[validator_address], slash)
      var total_staked = read_epoched_field(validators[validator_address].total_deltas, slash.epoch, 0)
      var total_unbonded = 0
      forall (epoch in slash.epoch+1..cur_epoch) do
            total_unbonded += read_epoched_field(validators[validator_address].total_unbonded, epoch, 0)
      var slashed_amount = (total_staked - total_unbonded) * slash.rate
      //update voting power (Step 2.4 of cubic slashing)
      update_total_deltas(validator_address, 1, -1*slashed_amount)
      update_voting_power(validator_address, 1)

Maybe the indexes are a bit off (I need to think about it), but I think it should work. First we compute how much stake has been unbonded since the validator misbehaved (from slash.epoch+1 up to cur_epoch). Then we compute the slashed amount based on the difference between the stake at the misbehaving epoch (slash.epoch) minus total_unbonded (previously computed). Maybe I am missing something...

@tzemanovic
Copy link
Collaborator

@angbrav this was my reasoning, consider:

  • slash.epoch = n
  • in worst case scenario, it gets discovered last minute in epoch n + unbonding_len - 1 (at n + unbonding_len, we're not able to apply slash anymore, so it would be discarded)
    • the validator will be frozen only once discovered, so between (n .. n + unbonding_len - 1], any bonds can still be unbonded
      • those will be withdrawable by (n + unbonding_len .. n + 2*unbonding_len - 1]
  • the slash will be processed before change to epoch n + unbonding_len, so when cur_epoch = n + unbonding_len, we still need to iterate up to cur_epoch + unbonding_len - 1 to adjust slashed amounts for the unbonding all the way up to the withdrawable epoch ceiling

@angbrav
Copy link
Collaborator Author

angbrav commented Aug 22, 2022

Thanks for the clarification!

@cwgoes
Copy link
Collaborator

cwgoes commented Aug 23, 2022

@tzemanovic does the change to when we apply the voting power changes for unbonding (now at the pipeline offset) that we discussed last week change any of the logic which is necessary here?

@tzemanovic
Copy link
Collaborator

we'll need to check everything carefully, but it seems to me that the change (anoma/namada#366) shouldn't affect the logic for slashing in any way

@angbrav
Copy link
Collaborator Author

angbrav commented Sep 27, 2022

I've discovered the following. I think there is a missing piece in the fix proposed in #7 (comment).

Consider the following execution:

  • Let validators[val].total_deltas[e] = a1 and assume that belongs to a single bond issued by user.
  • Assume now that some evidence is found at e for an action val did at epoch e. Then val gets frozen at e and the evidence is processed at the end of epoch e+unbonding_length.
  • Assume that the rate for the evidence in 1. Then, after processing the evidence, validators[val].total_deltas[e+unbonding_length+1] = 0 and val is not frozen anymore.
  • Assume that user unbonds a1 by e+unbonding_length+1. Then validators[val].total_deltas[e+unbonding_length+1+pipeline_length] = 0 - a1 = -a1.

Conclusion: the same way we carefully update total_deltas considering total_unbonded when processing evidence at the end of an epoch, we should carefully update total_deltas considering slashes when unbonding.

I've already fixed the tla+ model, I'll soon comment the solution for the pseudocode model.

@tzemanovic
Copy link
Collaborator

@angbrav Thank you, the described issue makes sense and the proposed solution looks good to me! I think there’s however most likely another issue that is related to #27:

  • at the end of epoch when we’re applying enqueued slashes, we read total stake only once at the infraction epoch and then we're using total_unbonded to find the stake at the epoch offsets that we're updating

I think there's a risk of overriding/discarding previous slashes that may have already been applied at the same offsets, because in total_staked - total_unbonded, total_staked is the value from the infraction epoch (I think this is the case at least in the pseudo-code).

If I understand the issue in #27 correctly (in short - the slashes applied on total_deltas are not additive, so the rate calculation on withdrawing is off), I think we could fix both of these in the following way:

  • Instead of accumulating total_unbonded for the epoch offsets as we iterate them, re-read total_staked in every epoch and only subtract the total_unbonded from that epoch - this will make sure that we have the stake with any previous slashes applied in any given epoch.
  • When we're withdrawing tokens, for each epoch in increasing order in which there are some applicable slashes, use the slashed value as the base for stake in the following slash calculation (e.g. total_rate = this_slash + next_slash - this_slash * next_slash).

Additionally, I think it would be better if the slash rates for slashes processed in the same epoch are just their sums. To do this, we could group them by the validator addresses and sum the rates within the same epoch for each validator (again, this has to be consistent in both the total_unbonded and withdrawal slashing).

@angbrav
Copy link
Collaborator Author

angbrav commented Oct 14, 2022

I think there's a risk of overriding/discarding previous slashes that may have already been applied at the same offsets, because in total_staked - total_unbonded, total_staked is the value from the infraction epoch (I think this is the case at least in the pseudo-code).

I am not sure I am following. The function update_total_deltas(validator_address, offset, diff_slashed_amount) does not set total deltas to diff_slashed_amount but adds it. diff_slashed_amount will be negative (or 0 if all tokens were unbonded before a slash is processed) at cur_epoch+1 and 0 or positive afterwards. If there are two slashes to apply at the same offset, the model does the following:

  • Assume that both slashes are processed at e. This means that both infractions happened at the same epoch (e-unbonding_length). This is a simplification.
  • Assume that the total stake at the infraction epoch was total_stake, and the slash final ratio is 0.2.
  • Let total_deltas[e+1] = x before processing any slash at the end of e, and assume that no tokens were unbonded since the infraction.
  • After processing the first, we will have total_deltas[e+1] = x - total_stake*0.2.
  • After processing the second, we will have total_deltas[e+1] = x - total_stake*0.4, which I think it is correct.

Unless I misunderstood your concern or the simplification of assuming that both infractions happened at the same epoch precludes the problem, I do not see why what we have is not correct.

I think #27 is a different issue: in the execution there, the second infraction happens when the first one has been processed, so the stake cached when the evidence is processed already accounts for the amount slashed by the first infraction. The problem there is that when withdrawing, we apply the slash to the unbonded amount, which is set from the bond. The bonds do not get updated when slashes are processed, so there is a mismatch there.

@tzemanovic
Copy link
Collaborator

@angbrav ah, that makes sense, thanks for clearing it up!

In relation to #27, is it so that this implies the following?

  • Within the same epoch, validators with multiple slashes are slashed the sum of the rates
  • Across different epochs (e.g. x and y where y > x) validators with multiple slashes are slashed with additive rates in the order of the epochs (rate = rate(x) + rate(y) - rate(x) * rate(y))

If this holds, we can then match that logic to apply slashes when withdrawing, right?

angbrav added a commit that referenced this issue Oct 19, 2022
* alternative fix to 24

* delete commentted text and unused function

* fix typo

* change name variable

* inf -> bottom

* fix to latest issue of #25

* fix typo
@angbrav
Copy link
Collaborator Author

angbrav commented Oct 19, 2022

@tzemanovic it does not hold exactly like that, but quite similar. I've opened a PR (#29) that includes a fix. Please check :)

angbrav added a commit that referenced this issue Oct 27, 2022
…ing windows (#28)

* VotingpowerDelagations invariant

* add comment

* add a bunch of PoS invariants

* fix new issue total_deltas

* proper handling of misbehaving windows

* fix typos

* fix typos
@angbrav
Copy link
Collaborator Author

angbrav commented Oct 27, 2022

It seems that this issue is fixed! Closing in.

@angbrav angbrav closed this as completed Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants