This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Refactor election solution trimming for efficiency #8614

Merged
35 commits merged into master from prgn-multi-phase-efficient-trim
May 3, 2021

Conversation

@coriolinus (Contributor) commented Apr 13, 2021

The previous version always trimmed the CompactOf<T> instance,
which was intrinsically inefficient: that's a packed data structure,
which is naturally expensive to edit. It's much easier to edit
the unpacked data structures: the voters and assignments lists.

Closes https://github.com/paritytech/srlabs_findings/issues/80.

Notes:

  • Introduce an `IndexAssignment` intermediate type. We can convert a list of `Assignment`s into a list of `IndexAssignment`s once, at a cost of O(edges * lg(Voters + Targets)), then convert the `IndexAssignment` list into `CompactOf` for a cost of only O(edges). As we need to repeatedly create `CompactOf` in order to determine the encoded length, we end up performing the O(lg(Voters + Targets)) lookups only once instead of once per encoding (see the sketch after this list).
  • Update `CompactOf::from_assignment` to take a reference to the assignments. While we still end up copying nearly all the data, this eliminates the need to clone when using the method repeatedly, and makes it easier to use while varying the length.
  • Modify the assignments list instead of the `CompactOf` instance. This is huge: it brings the cost of truncation down from O(Voters^2) to O(1).
  • Perform a binary search instead of a linear search to find the maximum number of voters whose encoding fits into the appropriate size. This brings the number of encodings down from O(Voters) to O(lg Voters).
  • Unfortunately, we determined that it is not actually possible to statically determine the size of an arbitrary compacted encoding, despite previous belief otherwise. This is because the `#[compact]` encoding style for the `CompactOf` struct uses run-length encoding to compress the data, and in that circumstance there just isn't a way to determine the actual encoded size without encoding it.
    It would be possible to statically determine an upper bound on the size, but I didn't bother implementing that because of the risk that we'd end up eliminating a voter whom we didn't actually need to.
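Taken together, these notes describe one conversion into an index-based form, followed by a binary search over how many assignments can be kept, with one re-encode per probe and a single cheap truncation at the end. A minimal, self-contained sketch of that shape is below; the types, field layout, and byte "encoding" are simplified stand-ins, not the macro-generated `CompactOf<T>` / `IndexAssignmentOf<T>`.

```rust
/// Stand-in for an assignment expressed in index space (illustrative only).
struct IndexAssignment {
    voter: u32,
    /// (target index, weight) pairs.
    distribution: Vec<(u16, u64)>,
}

/// Stand-in for "encode the packed solution and measure it"; only the length
/// matters for trimming, and it grows monotonically with the prefix length.
fn encoded_len(assignments: &[IndexAssignment]) -> usize {
    assignments
        .iter()
        .map(|a| 4 + a.distribution.len() * (2 + 8))
        .sum()
}

/// Largest prefix of `assignments` whose encoding fits into `max_len` bytes.
/// Costs O(lg n) probe encodings instead of a linear O(n) scan.
fn max_fitting(assignments: &[IndexAssignment], max_len: usize) -> usize {
    let (mut low, mut high) = (0usize, assignments.len());
    while low < high {
        // Round up so the loop always makes progress.
        let mid = low + (high - low + 1) / 2;
        if encoded_len(&assignments[..mid]) <= max_len {
            low = mid;
        } else {
            high = mid - 1;
        }
    }
    low
}

fn main() {
    let mut assignments: Vec<IndexAssignment> = (0..100u32)
        .map(|v| IndexAssignment { voter: v, distribution: vec![(0u16, 1u64), (1, 1)] })
        .collect();
    let keep = max_fitting(&assignments, 600);
    // Editing the unpacked list is the cheap part: just truncate it, then
    // convert to the packed form exactly once at the end.
    assignments.truncate(keep);
    assert!(encoded_len(&assignments) <= 600);
}
```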

@coriolinus coriolinus added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D9-needsaudit 👮 PR contains changes to fund-managing logic that should be properly reviewed and externally audited labels Apr 13, 2021
@coriolinus coriolinus self-assigned this Apr 13, 2021
Test suite now compiles. Tests still don't pass because the macro
generating the compact structure still generates `unimplemented!()`
for the actual `compact_length_of` implementation.
The `Compact` solution type is generated distinctly for each runtime,
and has both three type parameters and a built-in limit on the number
of candidates that each voter can vote for. Finally, it has an
optional `#[compact]` attribute which changes the encoding behavior.

The assignment truncation algorithm we're using depends on the ability
to efficiently and accurately determine how much space a `Compact`
solution will take once encoded.

Together, these two facts imply that simple unit tests are not
sufficient to validate the behavior of `Compact::encoded_size_for`.
This commit adds a fuzzer for it. The fuzzer is designed so that it is possible
to add a new fuzzer to the family by simply adjusting the
`generate_solution_type` macro invocation as desired, and making a
few minor documentation edits.

Of course, the fuzzer still fails for now: the generated implementation
for `encoded_size_for` is still `unimplemented!()`. However, once
the macro is updated appropriately, this fuzzer family should allow
us to gain confidence in the correctness of the generated code.
This reverts commit 9160387.

The design of `Compact::encoded_size_for` is flawed. When `#[compact]`
mode is enabled, every integer in the dataset is encoded using run-
length encoding. This means that it is impossible to compute the final
length faster than actually encoding the data structure, because the
encoded length of every field varies with the actual value stored.

Given that we won't be adding that method to the trait, we won't be
needing a fuzzer to validate its performance.
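For a concrete sense of why the length cannot be precomputed: with variable-width integer encoding of the kind SCALE's `Compact` wrapper uses, the number of bytes depends on the value stored, so the only exact way to learn a solution's encoded length is to encode it. A small illustration using the `parity-scale-codec` crate (this is just the underlying integer behaviour, not the macro-generated solution type):

```rust
use parity_scale_codec::{Compact, Encode};

fn main() {
    // Encoded width varies with the stored value.
    assert_eq!(Compact(3u32).encode().len(), 1);
    assert_eq!(Compact(300u32).encode().len(), 2);
    assert_eq!(Compact(70_000u32).encode().len(), 4);
    assert_eq!(Compact(2_000_000_000u32).encode().len(), 5);
}
```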
If `CompactSolution::encoded_size_for` can't be implemented in the
way that we wanted, there's no point in adding it.
This is not as efficient as what we'd hoped for, but it should still
be better than what it's replacing. Overall efficiency of
`fn trim_assignments_length` is now `O(edges * lg assignments.len())`.
Sorting the `voters` list causes lots of problems; an invariant that
we need to maintain is that an index into the voters list has a stable
meaning.

Luckily, it turns out that there is no need for the assignments list
to correspond to the voters list. That isn't an invariant, though previously
I'd thought that it was.

This simplifies things; we can just leave the voters list alone,
and sort the assignments list the way that is convenient.
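One convenient ordering is by the assigned voter's stake, so that truncating the tail of the assignments list removes the least-staked voters while the `voters` list, and therefore every voter index, stays untouched. A hedged sketch of that sorting step, with illustrative names (`Assignment`, `stake_of`) rather than the pallet's actual items:

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

type AccountId = u64;
type VoteWeight = u64;

struct Assignment {
    who: AccountId,
    // distribution elided for brevity
}

/// Sort assignments by the voter's stake, heaviest first, using an existing
/// stake cache; the `voters` list is never reordered, so voter indices stay
/// stable. A stable sort keeps equal-stake voters in a deterministic order.
fn sort_assignments_by_stake(
    assignments: &mut [Assignment],
    stake_of: &HashMap<AccountId, VoteWeight>,
) {
    assignments.sort_by_key(|a| Reverse(stake_of.get(&a.who).copied().unwrap_or(0)));
}

fn main() {
    let mut assignments = vec![Assignment { who: 1 }, Assignment { who: 2 }];
    let stake_of = HashMap::from([(1u64, 10u64), (2, 25)]);
    sort_assignments_by_stake(&mut assignments, &stake_of);
    assert_eq!(assignments[0].who, 2); // heaviest-staked first
    // Trimming now just truncates the tail (the lightest-staked voters).
}
```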
WIP: add `IndexAssignment` type to speed up repeatedly creating `Compact`

Next up: `impl<'a, T> From<&'a [IndexAssignmentOf<T>]> for Compact`,
in the proc-macro which makes `Compact`. Should be a pretty straightforward
adaptation of `from_assignment`.
This involves a bit of duplication of types from
`election-provider-multi-phase`; we'll clean those up shortly.

I'm not entirely happy that we had to add a `from_index_assignments`
method to `CompactSolution`, but we couldn't define
`trait CompactSolution: TryFrom<&'a [Self::IndexAssignment]>` because
that made trait lookup recursive, and I didn't want to propagate
`CompactSolutionOf<T> + TryFrom<&[IndexAssignmentOf<T>]>` everywhere
that compact solutions are specified.
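For illustration, here is the rough shape of the alternative that avoids the recursive supertrait: require the `TryFrom` bound only at the point where the conversion actually happens, rather than on `CompactSolution` itself. The type and function below are simplified stand-ins, not the actual `sp-npos-elections` items.

```rust
#![allow(dead_code)]

/// Illustrative stand-in for the index-based assignment type.
struct IndexAssignment {
    voter: u32,
    distribution: Vec<(u16, u64)>,
}

/// Rather than baking `TryFrom<&[Self::IndexAssignment]>` into the solution
/// trait itself (which made trait lookup recursive), state the conversion
/// bound where the conversion is performed:
fn compact_from_index_assignments<'a, S>(
    index_assignments: &'a [IndexAssignment],
) -> Result<S, S::Error>
where
    S: TryFrom<&'a [IndexAssignment]>,
{
    S::try_from(index_assignments)
}

fn main() {}
```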
cause `pallet-election-provider-multi-phase` tests to compile successfully

Mostly that's just updating the various test functions to keep track of
refactorings elsewhere, though in a few places we needed to refactor some
test-only helpers as well.
Turns out that moving `low` and `high` into an averager function is a
bad idea, because the averager gets copies of those values, which
of course are never updated. Can't use mutable references, because
we want to read them elsewhere in the code. Just compute the average
directly; life is better that way.
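A tiny reconstruction of that failure mode (illustrative only, not the pallet's actual code):

```rust
fn main() {
    let (mut low, mut high) = (0u32, 100u32);

    // Bug shape: this helper captures copies of `low` and `high` once, so it
    // returns the same midpoint forever, even after the caller updates them.
    let frozen_mid = { let (l, h) = (low, high); move || l + (h - l) / 2 };

    low = 60;
    high = 80;
    assert_eq!(frozen_mid(), 50);            // still averaging the stale values
    assert_eq!(low + (high - low) / 2, 70);  // computing it inline is correct
}
```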
@coriolinus coriolinus requested a review from kianenigma April 16, 2021 15:10
@coriolinus coriolinus marked this pull request as ready for review April 16, 2021 15:10
@github-actions github-actions bot added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Apr 16, 2021
This means that we can put the mocking parts of that into a proper
mock package, and put the test into a test package among other tests.

Having the mocking parts in a mock package enables us to create a
benchmark (which is treated as a separate crate) that imports them.
```rust
	max_weight,
);
let removing: usize = assignments.len().saturating_sub(maximum_allowed_voters.saturated_into());
log!(
```
Member:

we don't always need to log here right? only if something is actually changing?

@coriolinus (Contributor Author):

Even if we don't remove anything, I think it's useful to be able to see the maximum number of voters that would have been allowed for each phase of trimming.

```rust
// after this point, we never error.
// check before edit.

log!(
```
Member:

same here. seems we should only log if we actually truncate

Comment on lines 39 to 43
```rust
fn generate_random_votes(
    candidate_count: usize,
    voter_count: usize,
    mut rng: impl Rng,
) -> (Vec<Voter>, Vec<Assignment>, Vec<CandidateId>) {
```
Member:

a function like this already exists somewhere else in the codebase no?

@kianenigma

@coriolinus (Contributor Author):

The closest I've seen are the functions in common.rs, and those don't quite do what I need.

Contributor:

yeah unfortunately they are different.

Note that we also had similar functions in staking and we removed them.

A good course of action would be to pull common.rs outside of the fuzzer and put this also next to them, if possible. If too much of a PITA for whatever reason, fine for now.

@coriolinus (Contributor Author):

> pull common.rs outside of the fuzzer

Yup: eed1c21

@shawntabrizi (Member) left a comment:

a high level review looks okay to me.

@kianenigma (Contributor) left a comment:

Looks all good other than my note about sorting; that might be an issue that we overlooked.

primitives/npos-elections/compact/tests/compact.rs (outdated, resolved)

"Why don't you add a benchmark?", he said. "It'll be good practice,
and can help demonstrate that this isn't blowing up the runtime."

He was absolutely right.

The biggest discovery is that adding a parametric benchmark means that
you get a bunch of new test cases, for free. This is excellent, because
those test cases uncovered a binary search bug. Fixing that simplified
that part of the code nicely.

The other nice thing you get from a parametric benchmark is data about
what each parameter does. In this case, `f` is the size factor: what
percent of the votes (by size) should be removed. 0 means that we should
keep everything, 95 means that we should trim down to 5% of original size
or less.

```
Median Slopes Analysis
========
-- Extrinsic Time --

Model:
Time ~=     3846
    + v    0.015
    + t        0
    + a    0.192
    + d        0
    + f        0
              µs

Min Squares Analysis
========
-- Extrinsic Time --

Data points distribution:
    v     t     a     d     f   mean µs  sigma µs       %
<snip>
 6000  1600  3000   800     0      4385     75.87    1.7%
 6000  1600  3000   800     9      4089     46.28    1.1%
 6000  1600  3000   800    18      3793     36.45    0.9%
 6000  1600  3000   800    27      3365     41.13    1.2%
 6000  1600  3000   800    36      3096     7.498    0.2%
 6000  1600  3000   800    45      2774     17.96    0.6%
 6000  1600  3000   800    54      2057     37.94    1.8%
 6000  1600  3000   800    63      1885     2.515    0.1%
 6000  1600  3000   800    72      1591     3.203    0.2%
 6000  1600  3000   800    81      1219     25.72    2.1%
 6000  1600  3000   800    90       859     5.295    0.6%
 6000  1600  3000   800    95     684.6     2.969    0.4%

Quality and confidence:
param     error
v         0.008
t         0.029
a         0.008
d         0.044
f         0.185

Model:
Time ~=     3957
    + v    0.009
    + t        0
    + a    0.185
    + d        0
    + f        0
              µs
```

What's nice about this is the clear negative correlation between
amount removed and total time. The more we remove, the less total
time things take.
@kianenigma (Contributor) left a comment:

Looks good to me! 👍 please let us know what rough numbers the benchmarks indicate and if we should still be worried about this or not.

@coriolinus (Contributor Author):

> please let us know what rough numbers the benchmarks indicate and if we should still be worried about this or not.

Roughly, for the default (small-ish) candidate/target sizes, the benchmarks actually speed up as we remove a greater fraction of the size. That's all in the commit message for c9f2517. I haven't put a lot of investigation into why precisely that happens, but if final encoding/storage dominates the overall benchmark time, it makes sense; removing more means that there's less to encode.

For large benchmarks, I ran this on my local machine. It just multiplies all the numbers by 10:

```
$ time target/release/substrate benchmark --execution native --pallet pallet_election_provider_multi_phase --extrinsic trim_assignments_length --low 60000,16000,30000,8000,0 --high 60000,16000,30000,8000,95 --steps 0,0,0,0,19 --no-median-slopes
Fri 23 Apr 2021 11:01:23 AM CEST
Pallet: "pallet_election_provider_multi_phase", Extrinsic: "trim_assignments_length", Lowest values: [60000, 16000, 30000, 8000, 0], Highest values: [60000, 16000, 30000, 8000, 95], Steps: [0, 0, 0, 0, 19], Repeat: 1
Min Squares Analysis
========
-- Extrinsic Time --

Data points distribution:
    v     t     a     d     f   mean µs  sigma µs       %
60000 16000 30000  8000     0     58200         0    0.0%
60000 16000 30000  8000     5     55660         0    0.0%
60000 16000 30000  8000    10     56060         0    0.0%
60000 16000 30000  8000    15     53290         0    0.0%
60000 16000 30000  8000    20     50710         0    0.0%
60000 16000 30000  8000    25     44090         0    0.0%
60000 16000 30000  8000    30     44450         0    0.0%
60000 16000 30000  8000    35     40380         0    0.0%
60000 16000 30000  8000    40     38390         0    0.0%
60000 16000 30000  8000    45     35670         0    0.0%
60000 16000 30000  8000    50     29120         0    0.0%
60000 16000 30000  8000    55     27730         0    0.0%
60000 16000 30000  8000    60     25100         0    0.0%
60000 16000 30000  8000    65     23120         0    0.0%
60000 16000 30000  8000    70     20250         0    0.0%
60000 16000 30000  8000    75     16970         0    0.0%
60000 16000 30000  8000    80     15490         0    0.0%
60000 16000 30000  8000    85     13500         0    0.0%
60000 16000 30000  8000    90     10360         0    0.0%
60000 16000 30000  8000    95      8372         0    0.0%

Quality and confidence:
param     error
v      179595530423356520000000000000000
t      340282366920938500000000000000000000
a      359190219606011100000000000000000
d      340282366920938500000000000000000000
f      10734712128996303000000

Model:
Time ~=        0
    + v        0
    + t 428822927171117800000000000
    + a 150676229848948540000
    + d        0
    + f        0
              µs

Reads = 0 + (0 * v) + (0 * t) + (0 * a) + (0 * d) + (0 * f)
Writes = 0 + (0 * v) + (0 * t) + (0 * a) + (0 * d) + (0 * f)

real	1m40.621s
user	1m38.463s
sys	0m2.159s
```

Note that I use native execution instead of wasm; for numbers this large, the wasm implementation panics almost immediately with an OOM while generating a test solution. I didn't think it was worth spending a ton of time debugging what's ultimately a test artifact.

I wanted to run it with larger numbers as well, but we run into a wall. I suspect that the benchmarking module is using `TargetId = u16` somewhere, because we end up with `CompactInvalidIndex` during test case generation when we scale up substantially from these values. However, it wasn't clear where the benchmarking module gets its `T` from; without that, we can't tell where `T::CompactSolution` is defined.

Still, even with only 10x the sizes, we see a reduction in overall time as we trim away larger portions of the initial solution set. I'd expect to see that pattern continue with still larger solution sizes.

@kianenigma (Contributor):

> Note that I use native execution instead of wasm; for numbers this large, the wasm implementation panics almost immediately with an OOM while generating a test solution. I didn't think it was worth spending a ton of time debugging what's ultimately a test artifact.

(unrelated to this PR) hmmm but we should keep in mind that the OCW also runs in wasm and that can also run OOM. All of this really points to the fact that we should brace ourselves more and more for #8348.

@jakoblell jakoblell added D1-audited 👍 PR contains changes to fund-managing logic that has been properly reviewed and externally audited and removed D9-needsaudit 👮 PR contains changes to fund-managing logic that should be properly reviewed and externally audited labels Apr 27, 2021
@shawntabrizi (Member):

@coriolinus time to merge?

@coriolinus (Contributor Author):

bot merge

@ghost commented May 3, 2021

Trying merge.

@ghost ghost merged commit 8f0cfda into master May 3, 2021
@ghost ghost deleted the prgn-multi-phase-efficient-trim branch May 3, 2021 07:26
nazar-pc pushed a commit to autonomys/substrate that referenced this pull request Aug 8, 2021
* Refactor election solution trimming for efficiency
* rework length-trim tests to work with the new interface
* simplify
* add a fuzzer which can validate `Compact::encoded_size_for`
* Revert "add a fuzzer which can validate `Compact::encoded_size_for`"
* revert changes to `trait CompactSolution`
* WIP: restructure trim_assignments_length by actually encoding
* fix compiler errors
* don't sort voters, just assignments
* WIP: add `IndexAssignment` type to speed up repeatedly creating `Compact`
* Add IndexAssignment and conversion method to CompactSolution
* use `CompactSolution::from_index_assignment` and clean up dead code
* get rid of `from_index_assignments` in favor of `TryFrom`
* cause `pallet-election-provider-multi-phase` tests to compile successfully
* fix infinite binary search loop
* fix a test failure
* fix the rest of test failures
* remove unguarded subtraction
* fix npos-elections tests compilation
* ensure we use sp_std::vec::Vec in assignments
* add IndexAssignmentOf to sp_npos_elections
* move miner types to `unsigned`
* use stable sort
* rewrap some long comments
* use existing cache instead of building a dedicated stake map
* generalize the TryFrom bound on CompactSolution
* undo adding sp-core dependency
* consume assignments to produce index_assignments
* Add a test of Assignment -> IndexAssignment -> Compact
* fix `IndexAssignmentOf` doc
* move compact test from sp-npos-elections-compact to sp-npos-elections
* rename assignments -> sorted_assignments
* sort after reducing to avoid potential re-sort issues
* add runtime benchmark, fix critical binary search error
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D1-audited 👍 PR contains changes to fund-managing logic that has been properly reviewed and externally audited
Projects
Status: Done

5 participants