Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Runtime Task Executor + Example for staking slashing spans #8197

Closed
wants to merge 17 commits into from

Conversation

kianenigma
Copy link
Contributor

@kianenigma kianenigma commented Feb 24, 2021

Could fix https://github.com/paritytech/srlabs_findings/issues/57
Fixes #6835
Fixes #5188
Will also lay the foundation of #7911 and paritytech/polkadot-sdk#465 via the scheduler.

The gist of it is that: From now on each vote has an enabled flag, and we flip that upon slash. This means for each slash event we need to iterate all nominators, which is why I made the scheduler.

@shawntabrizi pointed out an interesting idea that we could outsource this operation: If we track the total voters of each validator, we can have someone submit the list of nominators that need to be iterated, potentially saving a lot of time.

I am okay with that as well, in which case we need to decouple this PR. But note that even then, iterating a few hundred keys might be something that we want to split among blocks. Overall, the scheduler is a personal interest of mine, and I want to experiemnt with it anyhow.

Also, I belive perhaps the work of @athei for contracts can be rewritten using a similar approach as what we do here.

frame/staking/src/scheduler.rs Outdated Show resolved Hide resolved
@athei
Copy link
Member

athei commented Feb 25, 2021

Also, I belive perhaps the work of @athei for contracts can be rewritten using a similar approach as what we do here.

You mean the lazy storage removal? From a glance over this code I think I am doing the same thing as you are doing here. Just in a less generalized way (queuing tasks in storage and working on them in on_initialize). If we could extract this RuntimeTaskScheduler to its own crate I could make use of it to implement lazy storage removal. Is that what you mean?

This would allow us to have a global queue of tasks. It would solve the problem of coming up with proper weight limits for each pallet's on_initialize workqueue.

@kianenigma
Copy link
Contributor Author

Also, I belive perhaps the work of @athei for contracts can be rewritten using a similar approach as what we do here.

You mean the lazy storage removal? From a glance over this code I think I am doing the same thing as you are doing here. Just in a less generalized way (queuing tasks in storage and working on them in on_initialize). If we could extract this RuntimeTaskScheduler to its own crate I could make use of it to implement lazy storage removal. Is that what you mean?

This would allow us to have a global queue of tasks. It would solve the problem of coming up with proper weight limits for each pallet's on_initialize workqueue.

yeah exactly, I thought maybe we can try this in the contracts pallet as well to see if it is any good or not.

As you said, I think they should fit nicely as they are doing the same thing, this one is just trying to be reusable.

Copy link
Member

@athei athei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if my comments are useful because it is still WIP.


/// Add a new task to the internal queue.
pub fn add_task(&mut self, task: Task) {
self.tasks.push(task)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to cap the size of the queue because the weight to decode the Vec is linear to the size of the queue. Also we should make use of append here to have O(1) costs when pusing items to storage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so in the new API, ideally we need both append and decode_len in StoredExecutor.

Although, this situation exists because I want to abstract the task management from the pallet. As you see now, as a pallet you just accept a type E: StoredExecutor, put it in storage and mutate it whenever you want to .execute. This is a okay-ish abstraction (compared to you manually being in charge of putting a Vec<Task> in storage), but we kinda miss the benefit of decode_len and append.

but I am sure we can build these abstractions, if the whole idea of having StoredExecutor seems like a good way to go.

frame/staking/src/lib.rs Outdated Show resolved Hide resolved
@@ -1721,16 +1824,16 @@ decl_module! {
ensure!(!targets.is_empty(), Error::<T>::EmptyTargets);
ensure!(targets.len() <= MAX_NOMINATIONS, Error::<T>::TooManyTargets);

let old = Nominators::<T>::get(stash).map_or_else(Vec::new, |x| x.targets);
let old = Nominators::<T>::get(stash).map_or_else(Vec::new, |x| x.targets.into_iter().map(|(x, _)| x).collect::<Vec<_>>());

let targets = targets.into_iter()
.map(|t| T::Lookup::lookup(t).map_err(DispatchError::from))
.map(|n| n.and_then(|n| if old.contains(&n) || !Validators::<T>::get(&n).blocked {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same note about allocations here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how it can be avoided here?

@kianenigma kianenigma requested review from athei and apopiak March 12, 2021 10:15
@kianenigma kianenigma marked this pull request as ready for review March 12, 2021 10:16
@github-actions github-actions bot added the A0-please_review Pull request needs code review. label Mar 12, 2021
@kianenigma
Copy link
Contributor Author

This is now ready for review. I'd be happy to either try myself, or if anyone's interested, explore both #7911 paritytech/polkadot-sdk#465 using this abstraction. You'll see that it is a bit over-engineered just for the purpose of slashing tasks, and the reason is that I envision building other multi-block stuff with it.

@kianenigma
Copy link
Contributor Author

Actually, I can pretty much see the possibility of creating a new pallet that has just a storage task executor, and creating trait interfaces so that for example staking would only delegate its own tasks to the pallet-executor.

@kianenigma kianenigma requested a review from andresilva as a code owner March 12, 2021 10:42
Copy link
Member

@athei athei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried by the fact that the StoredExecutor declares its own storage related weights out of scope. This means that accounting for the weight that results from encoding and decoding the potentially unbounded task queue is solely upon the user of this feature. I am not even sure if it would be possible for the user to do so.

The dependent introduced in this PR (staking) already disregards those weights (see my other comments). I can't tell if this is a problem but I wager that it would profit if the executor would offer an abstraction to deal with it.

I am not sure if this feature should be changed to take coding weight into account or a higher level abstraction can be build onto it that handles it. This would also need to include storage append support.

I won't oppose the merge as you surely have a purpose for it. However, I cannot use it in contracts for the reasons stated.

Comment on lines 19 to 22
use crate::{weights::Weight, traits::Get};
use codec::{Encode, Decode};
use sp_runtime::traits::Zero;
use crate::{RuntimeDebugNoBound, PartialEqNoBound, EqNoBound, CloneNoBound};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to merge those crate:: imports.

frame/support/src/executor.rs Outdated Show resolved Hide resolved
frame/support/src/executor.rs Outdated Show resolved Hide resolved
frame/staking/src/lib.rs Show resolved Hide resolved
frame/staking/src/lib.rs Show resolved Hide resolved
// if this is a non-zero slash, schedule tasks to chill their nominations.
if !slash_fraction.is_zero() {
let task = SlashTask::new(stash.clone());
<SlashTaskExecutor<T>>::mutate(|e| e.add_task(task));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The add_task complexity is linear to the amount of already existing tasks in storage because of decoding and encoding of all tasks by mutate. Are the number of queued slashes capped and benchmarked for the worst case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.


let task_weight = <SlashTaskExecutor<T>>::mutate(|e| e.execute());
// The additional weight of reading the tasks, and writing back the result.
add_weight(1, 1, task_weight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not only add the storage accesses but also decoding and encoding weight proportional to the amount of queued tasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this is one of the things that need to be fixed. Not sure where, but there's a TODO for it.

Comment on lines 36 to 37
pub trait RuntimeTask:
Sized + Clone + Default + Encode + Decode + PartialEq + Eq + sp_std::fmt::Debug
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need Default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe everything going to storage needs Default.

Copy link
Member

@athei athei Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the RuntimeTask does not go into storage. The StoredExecutor does which does not need its task to implement Default in order to implement it.

/// Return a vector of all tasks.
#[cfg(any(test, feature = "std"))]
fn tasks(&self) -> Vec<Self::Task>;
// TODO: providing an iter() might also be good.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to have those as a github issue rather than a TODO comment. But this is a personal preference. Feel free to ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A PR shall never be merged with TODOs, these are notes that I want my reviewers to also see and be aware that I maybe intend to do them.

frame/support/src/executor.rs Show resolved Hide resolved
kianenigma and others added 2 commits March 12, 2021 13:31
Co-authored-by: Alexander Theißen <alex.theissen@me.com>
Co-authored-by: Alexander Theißen <alex.theissen@me.com>
@kianenigma
Copy link
Contributor Author

I am worried by the fact that the StoredExecutor declares its own storage related weights out of scope. This means that accounting for the weight that results from encoding and decoding the potentially unbounded task queue is solely upon the user of this feature. I am not even sure if it would be possible for the user to do so.

The dependent introduced in this PR (staking) already disregards those weights (see my other comments). I can't tell if this is a problem but I wager that it would profit if the executor would offer an abstraction to deal with it.

I am not sure if this feature should be changed to take coding weight into account or a higher level abstraction can be build onto it that handles it. This would also need to include storage append support.

I totally agree with this, as you already mentioned it in your previous round. Honestly, I haven't addressed it yet because I am sure it is solvable and it is not the core of the work. An executor that has a bounded queue, IMO, can initially be deployed even with this code as-is, since the amount of work that the executor is doing under the hood + linear decoding is simply negligible.

So we have two issues to fix here:

  1. Each implementation of StoredExecutor needs to be benchmarked only for its internal operations. This is not the decoding stuff that you said, only the in-memory operations (such as iterating over the queue etc). I honestly think this is so minuscule that in the initial versions we can ignore it.

  2. Then, there's the weight of the executor itself being stored in storage. Of course, StoredExecutor itself cannot know this because it doesn't know where and how it is stored. This is the responsibility of the user of StoredExecutor, namelt staking or contracts. I currently do take it account by adding 1 read and 1 write, and you are totally right that the decode/encode weight is missing.

I see the second one being more important to fix.

I won't oppose the merge as you surely have a purpose for it. However, I cannot use it in contracts for the reasons stated.

I think what contracts is doing is a good use case and rather bend to your requirements, as long as they are reasonable. If your only grumbles are the two points above, I think they are easily fixable.

@kianenigma kianenigma changed the title Runtime Task Scheduler + Example for staking slashing spans Runtime Task Executor + Example for staking slashing spans Mar 26, 2021
@gnunicorn gnunicorn added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label May 19, 2021
@paritytech paritytech deleted a comment from cla-bot-2021 bot Jun 3, 2021
@stale stale bot closed this Jul 7, 2021
@kianenigma
Copy link
Contributor Author

shush bot, this is important work, and needs to stay around :D

@kianenigma kianenigma reopened this Jul 7, 2021
@stale stale bot removed the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@stale
Copy link

stale bot commented Aug 6, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Aug 6, 2021
@stale stale bot closed this Aug 20, 2021
@kianenigma kianenigma deleted the kiz-full-slash-reversal branch November 4, 2021 00:34
@kianenigma kianenigma restored the kiz-full-slash-reversal branch November 4, 2021 00:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-in_progress Pull request is in progress. No review needed at this stage. A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it.
Projects
Status: 👀 Might Revisit 👀
Development

Successfully merging this pull request may close these issues.

Full Slash Reversals Properly prevent slashing during offchain phragmén election window.
6 participants