Jailing circuit breaker #404

Closed
2 tasks done
jtremback opened this issue Oct 19, 2022 · 6 comments · Fixed by #462
Assignees
Labels
type: feature-request New feature or request improvement

Comments

@jtremback
Contributor

jtremback commented Oct 19, 2022

Problem

I'd like to find a way to mitigate the worst-case scenario possible in ICS. This scenario is one where an attacker sneaks code into a consumer chain which is able to send many downtime or double signing packets at once. The attacker then creates 175 validators just below the Hub's active set, and slashes every real validator at once. These validators are then jailed, and control of the chain passes over to the attacker's 175 validators, enabling them to steal all tokens bridged to the Hub over IBC.

To mitigate this scenario, it would be good to put a circuit breaker into the slashing packet receiving code on the provider. This circuit breaker would make it impossible to jail more than x% (probably between 1-5% would be good) of the power on the provider per hour. This would make the provider takeover attack take around a day, allowing the remaining validators to be alerted and halt the chain.

The design of this feature is not too difficult but will require some thought. Here's a naive design:

  • When a slashing packet comes in, add it to a queue
  • Every hour, an endblocker process checks the queue
  • If the first slash packet in the queue is for a validator with more than x%, take it off the queue and slash/jail the validator
  • If the first slash packet in the queue is less than x%, take another one off the queue and slash the validator. Repeat until x% is reached.

I'm not sure if this is correct/optimal tbh
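
The naive design above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the provider implementation: `SlashPacket`, `NaiveSlashThrottle`, and `process_period` are hypothetical names, voting power is expressed in basis points to avoid floating-point drift, and "repeat until x% is reached" is read as "stop before the per-period budget would be exceeded". The first packet in a period is always handled so that a validator holding more than x% can still be jailed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlashPacket:
    validator: str
    power_bps: int  # validator's share of provider power, in basis points (100 = 1%)


@dataclass
class NaiveSlashThrottle:
    max_bps_per_period: int  # the "x%" budget per period, in basis points
    queue: deque = field(default_factory=deque)

    def receive(self, packet: SlashPacket) -> None:
        """Queue an incoming slash packet instead of acting on it immediately."""
        self.queue.append(packet)

    def process_period(self) -> list:
        """Run once per period (e.g. hourly); return the validators jailed."""
        jailed = []
        used_bps = 0
        while self.queue:
            head = self.queue[0]
            over_budget = used_bps + head.power_bps > self.max_bps_per_period
            # Always handle the first packet of the period, even if it alone
            # exceeds the budget; otherwise a large validator blocks the queue.
            if jailed and over_budget:
                break
            self.queue.popleft()
            jailed.append(head.validator)
            used_bps += head.power_bps
        return jailed
```

With a 5% budget, three 2% validators followed by a 7% validator would be jailed over three periods: two in the first, one in the second, and the large one alone in the third.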

Closing criteria

When this feature is implemented.

TODOs

  • Labels have been added for issue
  • Issue has been added to the ICS project
@shaspitz
Contributor

shaspitz commented Oct 21, 2022

Nice writeup, I'll throw out some ideas, but your throttling mechanism or some variation of it seems like a reasonable way forward.

Throttling

Limit the time it takes for a high volume of slash packets to actually affect validator voting power. This is what you've described above.

  • Pros: less prone to error, seems complete.
  • Cons: Requires manual intervention from validators. Potentially more complex to implement, particularly w.r.t removing a portion of a validator's voting power (ex: coinbase has 7%, do we remove all voting power in a single endblock in case of slash? Or in multiple x% steps over multiple endblocks? I think the sdk staking module only supports the former with jailing).

Panic Threshold

Alternatively, set a threshold for which a certain # of slash packets, or a certain amount of slashed voting % (within some time window), triggers the provider chain to panic and therefore halt. Validators could then evaluate the situation, and take steps from there.

  • Pros: simpler to implement, more autonomous. If that's a good thing?
  • Cons: Potential for the provider chain to incorrectly enter the emergency/panic state when the desired behavior was actually to jail a bunch of validators in a short period of time. We could set parameters s.t this would be rare tho
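
A rough sketch of the panic-threshold idea, for comparison with throttling. Everything here is hypothetical: the `PanicThreshold` class, its parameters, and the choice of a sliding time window over slashed voting power (in basis points) are assumptions made for illustration only. The caller would decide what "halt" means when `record_slash` returns `True`.

```python
from collections import deque


class PanicThreshold:
    """Signals a halt when slashed power within a time window exceeds a limit."""

    def __init__(self, window_seconds: int, max_bps_in_window: int):
        self.window_seconds = window_seconds
        self.max_bps_in_window = max_bps_in_window
        self.events = deque()  # (timestamp, power_bps) pairs, oldest first
        self.total_bps = 0

    def record_slash(self, now: int, power_bps: int) -> bool:
        """Record a slash at time `now`; return True if the chain should halt."""
        # Evict slashes that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_bps = self.events.popleft()
            self.total_bps -= old_bps
        self.events.append((now, power_bps))
        self.total_bps += power_bps
        return self.total_bps > self.max_bps_in_window
```

The downside noted in the cons shows up directly: any sender able to push `total_bps` over the limit can trigger the halt.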

TLDR

Do we want the provider to be halted autonomously under certain conditions? Or do we want throttling to occur, where provider validators would have enough time to react to the attack manually

@jtremback
Contributor Author

I'd like to avoid the possibility for malicious consumer code to halt the provider

@mpoke
Contributor

mpoke commented Oct 25, 2022

Cons: Requires manual intervention from validators. Potentially more complex to implement, particularly w.r.t removing a portion of a validator's voting power (ex: coinbase has 7%, do we remove all voting power in a single endblock in case of slash? Or in multiple x% steps over multiple endblocks? I think the sdk staking module only supports the former with jailing).

@smarshall-spitzbart Not sure that I follow. A slash event will result in the validator getting jailed, which means removing its entire stake from the total stake of the validator set. Why would we do it in multiple smaller steps over multiple endblocks?

@shaspitz
Contributor

That was just an idea; it was not feasible.

cosmos/ibc#869 is implemented in such a way that the jailing always happens atomically, but a validator with a large % voting power would cause the slash meter to go negative, meaning no more slash packets will be handled until the meter is replenished to a positive value
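
A minimal sketch of the slash meter behavior described here, not the actual implementation. The `SlashMeter` class, its field names, the basis-point units, and the cap on replenishment are all assumptions for illustration. What it takes from the description: jailing is atomic, a large validator can push the meter negative, and no further slash packets are handled until the meter is positive again.

```python
from dataclasses import dataclass


@dataclass
class SlashMeter:
    replenish_bps: int  # amount added back each replenish period (assumed)
    max_bps: int        # upper bound on the meter (assumed)
    meter_bps: int      # current allowance; may go negative

    def replenish(self) -> None:
        """Called once per replenish period; never exceeds the cap."""
        self.meter_bps = min(self.max_bps, self.meter_bps + self.replenish_bps)

    def try_jail(self, validator_power_bps: int) -> bool:
        """Handle one slash packet if the meter is positive.

        The validator's full power is deducted at once, so the meter can
        drop below zero after a large validator is jailed.
        """
        if self.meter_bps <= 0:
            return False  # packet stays queued until the meter recovers
        self.meter_bps -= validator_power_bps
        return True
```

For example, with a 5% meter, jailing a 7% validator succeeds and leaves the meter at -2%; the next packet waits until a replenish brings the meter back above zero.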

@shaspitz
Contributor

shaspitz commented Oct 25, 2022

@danwt just brought up a great point that if we're talking about the scenario where many slash packets are being received by the provider in a short period of time, we should consider the potential size of the packet queue. Is there a param we could set s.t we start to drop slash packets when an unreasonable number are being received by a certain chain? Unreasonable being defined as: it would be infeasible to store all those packets on chain.

Edit: Just talked through some solutions with Jehan. The way we're going to alleviate the issue is by adjusting the protocol to drop (not queue) slash packets which are relevant to a validator that is already jailed/tombstoned. This way, there is a limit (per consumer chain) to the amount of slash packets that can actually clog up the queue. We'll have to think about this issue deeper when we start talking about a large number of consumers. See https://github.com/smarshall-spitzbart/ibc/blob/main/spec/app/ics-028-cross-chain-validation/methods.md?plain=1#L1660
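
The drop rule from the edit above could look roughly like this. It is a sketch only: `enqueue_slash_packet` and the set of jailed or tombstoned validators are hypothetical stand-ins for whatever state the provider actually keeps.

```python
from collections import deque


def enqueue_slash_packet(queue: deque, jailed_or_tombstoned: set, validator: str) -> bool:
    """Queue a slash packet unless its validator is already jailed/tombstoned.

    Returns True if the packet was queued, False if it was dropped.
    """
    if validator in jailed_or_tombstoned:
        return False  # dropped, not queued: it cannot change anything
    queue.append(validator)
    return True
```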

@alexanderbez
Contributor

alexanderbez commented Oct 26, 2022

Just posting this here for awareness, there is a plan and intent to have a more general circuit breaker built-in natively in the SDK: cosmos/cosmos-sdk#926

@mpoke mpoke moved this from Todo to In Progress in Replicated Security Oct 27, 2022
Repository owner moved this from In Progress to Done in Replicated Security Dec 16, 2022