Refactor election solution trimming for efficiency (paritytech#8614)

* Refactor election solution trimming for efficiency The previous version always trimmed the `CompactOf<T>` instance, which was intrinsically inefficient: that's a packed data structure, which is naturally expensive to edit. It's much easier to edit the unpacked data structures: the `voters` and `assignments` lists. * rework length-trim tests to work with the new interface Test suite now compiles. Tests still don't pass because the macro generating the compact structure still generates `unimplemented!()` for the actual `compact_length_of` implementation. * simplify * add a fuzzer which can validate `Compact::encoded_size_for` The `Compact` solution type is generated distinctly for each runtime, and has both three type parameters and a built-in limit to the number of candidates that each voter can vote for. Finally, they have an optional `#[compact]` attribute which changes the encoding behavior. The assignment truncation algorithm we're using depends on the ability to efficiently and accurately determine how much space a `Compact` solution will take once encoded. Together, these two facts imply that simple unit tests are not sufficient to validate the behavior of `Compact::encoded_size_for`. This commit adds such a fuzzer. It is designed such that it is possible to add a new fuzzer to the family by simply adjusting the `generate_solution_type` macro invocation as desired, and making a few minor documentation edits. Of course, the fuzzer still fails for now: the generated implementation for `encoded_size_for` is still `unimplemented!()`. However, once the macro is updated appropriately, this fuzzer family should allow us to gain confidence in the correctness of the generated code. * Revert "add a fuzzer which can validate `Compact::encoded_size_for`" This reverts commit 9160387. The design of `Compact::encoded_size_for` is flawed. When `#[compact]` mode is enabled, every integer in the dataset is encoded using run- length encoding. This means that it is impossible to compute the final length faster than actually encoding the data structure, because the encoded length of every field varies with the actual value stored. Given that we won't be adding that method to the trait, we won't be needing a fuzzer to validate its performance. * revert changes to `trait CompactSolution` If `CompactSolution::encoded_size_for` can't be implemented in the way that we wanted, there's no point in adding it. * WIP: restructure trim_assignments_length by actually encoding This is not as efficient as what we'd hoped for, but it should still be better than what it's replacing. Overall efficiency of `fn trim_assignments_length` is now `O(edges * lg assignments.len())`. * fix compiler errors * don't sort voters, just assignments Sorting the `voters` list causes lots of problems; an invariant that we need to maintain is that an index into the voters list has a stable meaning. Luckily, it turns out that there is no need for the assignments list to correspond to the voters list. That isn't an invariant, though previously I'd thought that it was. This simplifies things; we can just leave the voters list alone, and sort the assignments list the way that is convenient. * WIP: add `IndexAssignment` type to speed up repeatedly creating `Compact` Next up: `impl<'a, T> From<&'a [IndexAssignmentOf<T>]> for Compact`, in the proc-macro which makes `Compact`. Should be a pretty straightforward adaptation of `from_assignment`. * Add IndexAssignment and conversion method to CompactSolution This involves a bit of duplication of types from `election-provider-multi-phase`; we'll clean those up shortly. I'm not entirely happy that we had to add a `from_index_assignments` method to `CompactSolution`, but we couldn't define `trait CompactSolution: TryFrom<&'a [Self::IndexAssignment]` because that made trait lookup recursive, and I didn't want to propagate `CompactSolutionOf<T> + TryFrom<&[IndexAssignmentOf<T>]>` everywhere that compact solutions are specified. * use `CompactSolution::from_index_assignment` and clean up dead code * get rid of `from_index_assignments` in favor of `TryFrom` * cause `pallet-election-provider-multi-phase` tests to compile successfully Mostly that's just updating the various test functions to keep track of refactorings elsewhere, though in a few places we needed to refactor some test-only helpers as well. * fix infinite binary search loop Turns out that moving `low` and `high` into an averager function is a bad idea, because the averager gets copies of those values, which of course are never updated. Can't use mutable references, because we want to read them elsewhere in the code. Just compute the average directly; life is better that way. * fix a test failure * fix the rest of test failures * remove unguarded subtraction * fix npos-elections tests compilation * ensure we use sp_std::vec::Vec in assignments * add IndexAssignmentOf to sp_npos_elections * move miner types to `unsigned` * use stable sort * rewrap some long comments * use existing cache instead of building a dedicated stake map * generalize the TryFrom bound on CompactSolution * undo adding sp-core dependency * consume assignments to produce index_assignments * Add a test of Assignment -> IndexAssignment -> Compact * fix `IndexAssignmentOf` doc * move compact test from sp-npos-elections-compact to sp-npos-elections This means that we can put the mocking parts of that into a proper mock package, put the test into a test package among other tests. Having the mocking parts in a mock package enables us to create a benchmark (which is treated as a separate crate) import them. * rename assignments -> sorted_assignments * sort after reducing to avoid potential re-sort issues * add runtime benchmark, fix critical binary search error "Why don't you add a benchmark?", he said. "It'll be good practice, and can help demonstrate that this isn't blowing up the runtime." He was absolutely right. The biggest discovery is that adding a parametric benchmark means that you get a bunch of new test cases, for free. This is excellent, because those test cases uncovered a binary search bug. Fixing that simplified that part of the code nicely. The other nice thing you get from a parametric benchmark is data about what each parameter does. In this case, `f` is the size factor: what percent of the votes (by size) should be removed. 0 means that we should keep everything, 95 means that we should trim down to 5% of original size or less. ``` Median Slopes Analysis ======== -- Extrinsic Time -- Model: Time ~= 3846 + v 0.015 + t 0 + a 0.192 + d 0 + f 0 µs Min Squares Analysis ======== -- Extrinsic Time -- Data points distribution: v t a d f mean µs sigma µs % <snip> 6000 1600 3000 800 0 4385 75.87 1.7% 6000 1600 3000 800 9 4089 46.28 1.1% 6000 1600 3000 800 18 3793 36.45 0.9% 6000 1600 3000 800 27 3365 41.13 1.2% 6000 1600 3000 800 36 3096 7.498 0.2% 6000 1600 3000 800 45 2774 17.96 0.6% 6000 1600 3000 800 54 2057 37.94 1.8% 6000 1600 3000 800 63 1885 2.515 0.1% 6000 1600 3000 800 72 1591 3.203 0.2% 6000 1600 3000 800 81 1219 25.72 2.1% 6000 1600 3000 800 90 859 5.295 0.6% 6000 1600 3000 800 95 684.6 2.969 0.4% Quality and confidence: param error v 0.008 t 0.029 a 0.008 d 0.044 f 0.185 Model: Time ~= 3957 + v 0.009 + t 0 + a 0.185 + d 0 + f 0 µs ``` What's nice about this is the clear negative correlation between amount removed and total time. The more we remove, the less total time things take.
autonomys · Aug 8, 2021 · 46d8efe · 46d8efe
1 parent 2ff601c
commit 46d8efe
Show file tree

Hide file tree

Showing 13 changed files with 871 additions and 353 deletions.
diff --git a/frame/election-provider-multi-phase/Cargo.toml b/frame/election-provider-multi-phase/Cargo.toml
@@ -20,7 +20,7 @@ log = { version = "0.4.14", default-features = false }
 frame-support = { version = "3.0.0", default-features = false, path = "../support" }
 frame-system = { version = "3.0.0", default-features = false, path = "../system" }
 
-sp-io ={ version = "3.0.0", default-features = false, path = "../../primitives/io" }
+sp-io = { version = "3.0.0", default-features = false, path = "../../primitives/io" }
 sp-std = { version = "3.0.0", default-features = false, path = "../../primitives/std" }
 sp-runtime = { version = "3.0.0", default-features = false, path = "../../primitives/runtime" }
 sp-npos-elections = { version = "3.0.0", default-features = false, path = "../../primitives/npos-elections" }
@@ -37,8 +37,9 @@ parking_lot = "0.11.0"
 rand = { version = "0.7.3" }
 hex-literal = "0.3.1"
 substrate-test-utils = { version = "3.0.0", path = "../../test-utils" }
+sp-core = { version = "3.0.0", default-features = false, path = "../../primitives/core" }
 sp-io = { version = "3.0.0", path = "../../primitives/io" }
-sp-core = { version = "3.0.0", path = "../../primitives/core" }
+sp-npos-elections = { version = "3.0.0", default-features = false, features = [ "mocks" ], path = "../../primitives/npos-elections" }
 sp-tracing = { version = "3.0.0", path = "../../primitives/tracing" }
 frame-election-provider-support = { version = "3.0.0", features = ["runtime-benchmarks"], path = "../election-provider-support" }
 pallet-balances = { version = "3.0.0", path = "../balances" }
@@ -64,5 +65,6 @@ std = [
 runtime-benchmarks = [
 	"frame-benchmarking",
 	"rand",
+	"sp-npos-elections/mocks",
 ]
 try-runtime = ["frame-support/try-runtime"]
diff --git a/frame/election-provider-multi-phase/src/benchmarking.rs b/frame/election-provider-multi-phase/src/benchmarking.rs
@@ -18,15 +18,16 @@
 //! Two phase election pallet benchmarking.
 
 use super::*;
-use crate::Pallet as MultiPhase;
+use crate::{Pallet as MultiPhase, unsigned::IndexAssignmentOf};
 use frame_benchmarking::impl_benchmark_test_suite;
 use frame_support::{assert_ok, traits::OnInitialize};
 use frame_system::RawOrigin;
 use rand::{prelude::SliceRandom, rngs::SmallRng, SeedableRng};
 use frame_election_provider_support::Assignment;
-use sp_arithmetic::traits::One;
+use sp_arithmetic::{per_things::Percent, traits::One};
+use sp_npos_elections::IndexAssignment;
 use sp_runtime::InnerOf;
-use sp_std::convert::TryInto;
+use sp_std::convert::{TryFrom, TryInto};
 
 const SEED: u32 = 999;
 
@@ -135,7 +136,7 @@ fn solution_with_size<T: Config>(
 		.collect::<Vec<_>>();
 
 	let compact =
-		<CompactOf<T>>::from_assignment(assignments, &voter_index, &target_index).unwrap();
+		<CompactOf<T>>::from_assignment(&assignments, &voter_index, &target_index).unwrap();
 	let score = compact.clone().score(&winners, stake_of, voter_at, target_at).unwrap();
 	let round = <MultiPhase<T>>::round();
 
@@ -254,6 +255,69 @@ frame_benchmarking::benchmarks! {
 		assert!(<MultiPhase<T>>::queued_solution().is_some());
 	}
 
+	#[extra]
+	trim_assignments_length {
+		// number of votes in snapshot.
+		let v in (T::BenchmarkingConfig::VOTERS[0]) .. T::BenchmarkingConfig::VOTERS[1];
+		// number of targets in snapshot.
+		let t in (T::BenchmarkingConfig::TARGETS[0]) .. T::BenchmarkingConfig::TARGETS[1];
+		// number of assignments, i.e. compact.len(). This means the active nominators, thus must be
+		// a subset of `v` component.
+		let a in (T::BenchmarkingConfig::ACTIVE_VOTERS[0]) .. T::BenchmarkingConfig::ACTIVE_VOTERS[1];
+		// number of desired targets. Must be a subset of `t` component.
+		let d in (T::BenchmarkingConfig::DESIRED_TARGETS[0]) .. T::BenchmarkingConfig::DESIRED_TARGETS[1];
+		// Subtract this percentage from the actual encoded size
+		let f in 0 .. 95;
+
+		// Compute a random solution, then work backwards to get the lists of voters, targets, and assignments
+		let witness = SolutionOrSnapshotSize { voters: v, targets: t };
+		let RawSolution { compact, .. } = solution_with_size::<T>(witness, a, d);
+		let RoundSnapshot { voters, targets } = MultiPhase::<T>::snapshot().unwrap();
+		let voter_at = helpers::voter_at_fn::<T>(&voters);
+		let target_at = helpers::target_at_fn::<T>(&targets);
+		let mut assignments = compact.into_assignment(voter_at, target_at).unwrap();
+
+		// make a voter cache and some helper functions for access
+		let cache = helpers::generate_voter_cache::<T>(&voters);
+		let voter_index = helpers::voter_index_fn::<T>(&cache);
+		let target_index = helpers::target_index_fn::<T>(&targets);
+
+		// sort assignments by decreasing voter stake
+		assignments.sort_by_key(|crate::unsigned::Assignment::<T> { who, .. }| {
+			let stake = cache.get(&who).map(|idx| {
+				let (_, stake, _) = voters[*idx];
+				stake
+			}).unwrap_or_default();
+			sp_std::cmp::Reverse(stake)
+		});
+
+		let mut index_assignments = assignments
+			.into_iter()
+			.map(|assignment| IndexAssignment::new(&assignment, &voter_index, &target_index))
+			.collect::<Result<Vec<_>, _>>()
+			.unwrap();
+
+		let encoded_size_of = |assignments: &[IndexAssignmentOf<T>]| {
+			CompactOf::<T>::try_from(assignments).map(|compact| compact.encoded_size())
+		};
+
+		let desired_size = Percent::from_percent(100 - f.saturated_into::<u8>())
+			.mul_ceil(encoded_size_of(index_assignments.as_slice()).unwrap());
+		log!(trace, "desired_size = {}", desired_size);
+	}: {
+		MultiPhase::<T>::trim_assignments_length(
+			desired_size.saturated_into(),
+			&mut index_assignments,
+			&encoded_size_of,
+		).unwrap();
+	} verify {
+		let compact = CompactOf::<T>::try_from(index_assignments.as_slice()).unwrap();
+		let encoding = compact.encode();
+		log!(trace, "encoded size prediction = {}", encoded_size_of(index_assignments.as_slice()).unwrap());
+		log!(trace, "actual encoded size = {}", encoding.len());
+		assert!(encoding.len() <= desired_size);
+	}
+
 	// This is checking a valid solution. The worse case is indeed a valid solution.
 	feasibility_check {
 		// number of votes in snapshot.

diff --git a/frame/election-provider-multi-phase/src/helpers.rs b/frame/election-provider-multi-phase/src/helpers.rs
@@ -62,6 +62,18 @@ pub fn voter_index_fn<T: Config>(
 	}
 }
 
+/// Create a function that returns the index of a voter in the snapshot.
+///
+/// Same as [`voter_index_fn`] but the returned function owns all its necessary data; nothing is
+/// borrowed.
+pub fn voter_index_fn_owned<T: Config>(
+	cache: BTreeMap<T::AccountId, usize>,
+) -> impl Fn(&T::AccountId) -> Option<CompactVoterIndexOf<T>> {
+	move |who| {
+		cache.get(who).and_then(|i| <usize as TryInto<CompactVoterIndexOf<T>>>::try_into(*i).ok())
+	}
+}
+
 /// Same as [`voter_index_fn`], but the returning index is converted into usize, if possible.
 ///
 /// ## Warning

diff --git a/frame/election-provider-multi-phase/src/mock.rs b/frame/election-provider-multi-phase/src/mock.rs
@@ -17,6 +17,7 @@
 
 use super::*;
 use crate as multi_phase;
+use multi_phase::unsigned::{IndexAssignmentOf, Voter};
 pub use frame_support::{assert_noop, assert_ok};
 use frame_support::{
 	parameter_types,
@@ -41,7 +42,7 @@ use sp_runtime::{
 	traits::{BlakeTwo256, IdentityLookup},
 	PerU16,
 };
-use std::sync::Arc;
+use std::{convert::TryFrom, sync::Arc};
 
 pub type Block = sp_runtime::generic::Block<Header, UncheckedExtrinsic>;
 pub type UncheckedExtrinsic = sp_runtime::generic::UncheckedExtrinsic<AccountId, Call, (), ()>;
@@ -95,19 +96,70 @@ pub fn roll_to_with_ocw(n: u64) {
 	}
 }
 
+pub struct TrimHelpers {
+	pub voters: Vec<Voter<Runtime>>,
+	pub assignments: Vec<IndexAssignmentOf<Runtime>>,
+	pub encoded_size_of:
+		Box<dyn Fn(&[IndexAssignmentOf<Runtime>]) -> Result<usize, sp_npos_elections::Error>>,
+	pub voter_index: Box<
+		dyn Fn(
+			&<Runtime as frame_system::Config>::AccountId,
+		) -> Option<CompactVoterIndexOf<Runtime>>,
+	>,
+}
+
+/// Helpers for setting up trimming tests.
+///
+/// Assignments are pre-sorted in reverse order of stake.
+pub fn trim_helpers() -> TrimHelpers {
+	let RoundSnapshot { voters, targets } = MultiPhase::snapshot().unwrap();
+	let stakes: std::collections::HashMap<_, _> =
+		voters.iter().map(|(id, stake, _)| (*id, *stake)).collect();
+
+	// Compute the size of a compact solution comprised of the selected arguments.
+	//
+	// This function completes in `O(edges)`; it's expensive, but linear.
+	let encoded_size_of = Box::new(|assignments: &[IndexAssignmentOf<Runtime>]| {
+		CompactOf::<Runtime>::try_from(assignments).map(|compact| compact.encoded_size())
+	});
+	let cache = helpers::generate_voter_cache::<Runtime>(&voters);
+	let voter_index = helpers::voter_index_fn_owned::<Runtime>(cache);
+	let target_index = helpers::target_index_fn::<Runtime>(&targets);
+
+	let desired_targets = MultiPhase::desired_targets().unwrap();
+
+	let ElectionResult { mut assignments, .. } = seq_phragmen::<_, CompactAccuracyOf<Runtime>>(
+		desired_targets as usize,
+		targets.clone(),
+		voters.clone(),
+		None,
+	)
+	.unwrap();
+
+	// sort by decreasing order of stake
+	assignments.sort_unstable_by_key(|assignment| {
+		std::cmp::Reverse(stakes.get(&assignment.who).cloned().unwrap_or_default())
+	});
+
+	// convert to IndexAssignment
+	let assignments = assignments
+		.iter()
+		.map(|assignment| {
+			IndexAssignmentOf::<Runtime>::new(assignment, &voter_index, &target_index)
+		})
+		.collect::<Result<Vec<_>, _>>()
+		.expect("test assignments don't contain any voters with too many votes");
+
+	TrimHelpers { voters, assignments, encoded_size_of, voter_index: Box::new(voter_index) }
+}
+
 /// Spit out a verifiable raw solution.
 ///
 /// This is a good example of what an offchain miner would do.
 pub fn raw_solution() -> RawSolution<CompactOf<Runtime>> {
 	let RoundSnapshot { voters, targets } = MultiPhase::snapshot().unwrap();
 	let desired_targets = MultiPhase::desired_targets().unwrap();
 
-	// closures
-	let cache = helpers::generate_voter_cache::<Runtime>(&voters);
-	let voter_index = helpers::voter_index_fn_linear::<Runtime>(&voters);
-	let target_index = helpers::target_index_fn_linear::<Runtime>(&targets);
-	let stake_of = helpers::stake_of_fn::<Runtime>(&voters, &cache);
-
 	let ElectionResult { winners, assignments } = seq_phragmen::<_, CompactAccuracyOf<Runtime>>(
 		desired_targets as usize,
 		targets.clone(),
@@ -116,14 +168,20 @@ pub fn raw_solution() -> RawSolution<CompactOf<Runtime>> {
 	)
 	.unwrap();
 
+	// closures
+	let cache = helpers::generate_voter_cache::<Runtime>(&voters);
+	let voter_index = helpers::voter_index_fn_linear::<Runtime>(&voters);
+	let target_index = helpers::target_index_fn_linear::<Runtime>(&targets);
+	let stake_of = helpers::stake_of_fn::<Runtime>(&voters, &cache);
+
 	let winners = to_without_backing(winners);
 
 	let score = {
 		let staked = assignment_ratio_to_staked_normalized(assignments.clone(), &stake_of).unwrap();
 		to_supports(&winners, &staked).unwrap().evaluate()
 	};
 	let compact =
-		<CompactOf<Runtime>>::from_assignment(assignments, &voter_index, &target_index).unwrap();
+		<CompactOf<Runtime>>::from_assignment(&assignments, &voter_index, &target_index).unwrap();
 
 	let round = MultiPhase::round();
 	RawSolution { compact, score, round }