Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UI extraction performance improvements #9212

Closed
wants to merge 40 commits into from

Conversation

ickshonpe
Copy link
Contributor

@ickshonpe ickshonpe commented Jul 19, 2023

Objective

UI extraction is really inefficient:

  • Each extraction function has to walk the entire UiStack and call Query::get for each entity. So if there are 10000 nodes it will do 10000 lookups, even if there are no entities that match the query.
  • prepare_uinodes does an in-place stable sort of the ExtractedUiNodes::uinodes vector. It's extremely expensive as each ExtractedUiNode is quite large (192 bytes) and there can be a lot of them.

Solution

Add a new component UiStackIndex to hold each UI node's stack index generated by ui_stack_system. Then during extraction we can query for each node's stack index and we no longer have to iterate the UiStack each time.

ExtractedUiNodes now has a second vec of

struct ExtractedRange {
    stack_index: u32,
    range: Range<u32>,
}

These ranges partition the ExtractedUinodes::uinodes vector.

Now instead of sorting uinodes we can sort the ExtractedRanges vector by stack index instead. This is faster because ExtractedRanges are smaller and there are fewer of them (a page of text using thousands of ExtractedUiNodes only requires one ExtractedRange).

Benchmarks

cargo run --profile stress-test --features trace_tracy --example many_buttons
prepare text Capture atlas

Notes

  • The PR is quite small. The large number of lines changed is due to the elimination of some if-let blocks reducing indentation, most of the logic is unaltered.

Changelog

  • ExtractedUiNodes::uinodes is no longer sorted in prepare_uinodes, instead it has a second vec of ranges that partition uinodes. This partition is sorted by stack index instead.
  • New component UiStackIndex: holds each UI node's stack index as generated by ui_stack_system.
  • UI extraction functions no longer walk UiStack, instead the stack index for each node is retrieved from their UiStackIndex component.
  • The uinodes field of ExtractedUiNodes is now private. Use the new push and push_nodes to add ExtractedUiNodes to be rendered.

Migration Guide

The uinodes field of ExtractedUiNodes is now private. Use the new push and push_nodes to add ExtractedUiNodes to be rendered.

Instead of a single list of `ExtractedUiNodes`, we have a separate list for each extraction function.
Then we don't need to sort `ExtractedUiNodes` in `prepare_uinodes` as each sublist is already ordered.
@ickshonpe ickshonpe changed the title No sort prepare_uinodes Remove the sort of ExtractedUiNodes from prepare_uinodes Jul 19, 2023
@ickshonpe ickshonpe changed the title Remove the sort of ExtractedUiNodes from prepare_uinodes Don't sort ExtractedUiNodes in prepare_uinodes Jul 19, 2023
Copy link
Contributor

@nicopap nicopap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is sound. Really nice to take advantage of the already-sorted nature of the UiStack though I think the code quality could be improved.

Comment on lines 658 to 692
let mut drains: Vec<_> = extracted_uinodes
.uinodes
.iter_mut()
.map(|uinodes| {
let mut d = uinodes.drain(..);
let first = d.next();
(first, d)
})
.collect();

let mut next_extracted_node = move || {
let mut min_stack_index = usize::MAX;
let mut n = usize::MAX;

for (i, (node, _)) in drains.iter_mut().enumerate() {
if let Some(node) = node {
if node.stack_index < min_stack_index {
n = i;
min_stack_index = node.stack_index;
}
}
}

if n == usize::MAX {
None
} else {
let (current, drain) = &mut drains[n];
let next = drain.next();
let out = current.take();
*current = next;
out
}
};

while let Some(extracted_uinode) = next_extracted_node() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mentioned in the PR description, this needs to be a method on ExtractedUiNode. Ideally an impl Iterator next method.

crates/bevy_ui/src/render/mod.rs Outdated Show resolved Hide resolved
crates/bevy_ui/src/render/mod.rs Outdated Show resolved Hide resolved
Comment on lines 672 to 689
for (i, (node, _)) in drains.iter_mut().enumerate() {
if let Some(node) = node {
if node.stack_index < min_stack_index {
n = i;
min_stack_index = node.stack_index;
}
}
}

if n == usize::MAX {
None
} else {
let (current, drain) = &mut drains[n];
let next = drain.next();
let out = current.take();
*current = next;
out
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was at first extremely confused by how this works. So here is a summary for other reviewers:

  1. find the index (named n) in drains where node.stack_index is smallest (the for loop)
  2. Pick the (current, drain): (Option<T>, Drain<T>) from drains
  3. Read the current
  4. replace it with the drain.next()
  5. return current

I might suggest some improvement in a couple hours.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's weird I just typed this in and it worked immediately, and then all the better-thought-out refactors I tried ruined the performance.

@nicopap
Copy link
Contributor

nicopap commented Jul 20, 2023

I opened two PRs to your branch, please check them out:

  1. Using unsafe Interator-based ExtractUiNodes change **making use of unsafe** ickshonpe/bevy#2
  2. No unsafe, but more allocation Interator-based ExtractUiNodes change ickshonpe/bevy#1

@nicopap
Copy link
Contributor

nicopap commented Jul 20, 2023

An alternative implementation is to create a Vec<ExtractedUiNode>, (maybe using a SparseSet?) pass it as a resource to all the extract functions, and insert the relevant node at the index in UiStack, now we have an already-sorted Vec!

The downside is that it might be fairly wasteful if most ui nodes are invisible (supposing we simply "hide" menus instead of despawning them) The current impl might consume at most twice as much memory as there are visible extracted UI nodes though the memory for extracted UI nodes is never reclaimed.

@nicopap nicopap added C-Performance A change motivated by improving speed, memory usage or compile times A-UI Graphical user interfaces, styles, layouts, and widgets C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide labels Jul 20, 2023
@nicopap
Copy link
Contributor

nicopap commented Jul 20, 2023

I traced my PRs and they are a bit slower than your code.

@ickshonpe
Copy link
Contributor Author

I traced my PRs and they are a bit slower than your code.

Yeah I can't figure it out. I came back a couple of times yesterday to try a few new ideas but either they made no difference or were significantly worse (those were ones I thought were really promising).

Reimplemented the original version as an iterator and made a few refactorings so it's not so ugly anyway. No performance difference:

Captureprep

@nicopap
Copy link
Contributor

nicopap commented Jul 21, 2023

I did some more experimenting and I found we should also check the time on ExtractSchedule, since inserting into ExtractedUiNodes takes time as well.

@nicopap
Copy link
Contributor

nicopap commented Jul 21, 2023

Using a sparse set (Vec<Option<ExtractedUiNode>>) is much faster than a Illife vector (Vec<Vec<ExtractedUiNode>>) I opened a PR on your branch, but with caveats:

Instead of sorting `ExtractedUiNodes`, add a lookup index. Then in prepare_uinodes sort the indices by `stack_index` then use the sorted indices to retrieve each `ExtractedUiNode` in the correct order.

* Added a field `indices` to `ExtractedUiNodes`.
* Added `ExtractedIndex` type.
* New component, holds the uinode entity's stack index. Updated in `ui_stack_system after the `UiStack` is generated.

`bevy_ui::node_bundles`
* Added the `UiStackIndex` to all UI node bundles.

`bevy_ui::render`
* Extract stack indices from the `UiNode` component.
@ickshonpe
Copy link
Contributor Author

Ready for review. Just going to run some final benchmarks.

@@ -119,6 +119,8 @@ pub struct ImageBundle {
pub z_index: ZIndex,
/// The node's position in the UiStack. Nodes with lower stack indices are drawn first.
/// Nodes with a higher stack index are drawn over nodes with a lower stack index.
///
/// This field is automatically managed by `ui_stack_system`.
Copy link
Contributor

@nicopap nicopap Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a component, not a field. (well yes it's a field, but "component" is more in line with how to use bevy)

Copy link
Contributor Author

@ickshonpe ickshonpe Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied it from the doc comments for the other fields without thinking. Yeah, it doesn't make sense. The bundle's field isn't the thing being managed by the system. Better change them all.

Copy link
Contributor

@nicopap nicopap Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you prefer, you could open an issue and keep this for a different PR. But I doubt anyone would object this change with this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed as well for GlobalTransform:

/// This component is automatically managed by the UI layout system.

I think I'll fix the field thing here as it's within the focus of this PR as it affects the new comments I added, but fix the GlobalTransform comment in another PR.

…nent`. The bundle's field isn't the thing being managed by the system.
@mockersf
Copy link
Member

I relaunched CI, tests are still failing, could you take a look?

@alice-i-cecile alice-i-cecile added this to the 0.12 milestone Aug 20, 2023
/// The node's position in the UiStack. Nodes with lower stack indices are drawn first.
/// Nodes with a higher stack index are drawn over nodes with a lower stack index.
///
/// This component is automatically managed by `ui_stack_system`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link would be nice here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto replicated doc comments before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, can't link to ui_stack_system because it's private, that's why the CI was failing.

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code and docs quality is good, motivation is sound and benchmark results are excellent.

@alice-i-cecile alice-i-cecile added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Aug 20, 2023
@ickshonpe
Copy link
Contributor Author

I relaunched CI, tests are still failing, could you take a look?

should be good now

@alice-i-cecile
Copy link
Member

stack::tests::test_ui_stack_system

Test still appears to be failing 🤔

@ickshonpe
Copy link
Contributor Author

stack::tests::test_ui_stack_system

Test still appears to be failing 🤔

I'll have another look

@ickshonpe ickshonpe mentioned this pull request Aug 22, 2023
@alice-i-cecile
Copy link
Member

@ickshonpe I can merge this once the merge conflict is resolved :) Ping me if I'm slow!

@ickshonpe
Copy link
Contributor Author

@ickshonpe I can merge this once the merge conflict is resolved :) Ping me if I'm slow!

I might open a new PR and leave this one. I've had to rework everything to accommodate the recent rendering changes.

This was referenced Sep 2, 2023
@alice-i-cecile
Copy link
Member

Closing in favor of #9668 due to merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-UI Graphical user interfaces, styles, layouts, and widgets C-Breaking-Change A breaking change to Bevy's public API that needs to be noted in a migration guide C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants