Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NonMaxUsize for SparseArrays #11843

Closed
wants to merge 1 commit into from

Conversation

tygyh
Copy link
Contributor

@tygyh tygyh commented Feb 13, 2024

Objective

Solution

  • Rebase and solve change conflicts.

@tygyh
Copy link
Contributor Author

tygyh commented Feb 13, 2024

There is a conflict between this and #2227 which I do not know how to resolve. I am very open to suggestions on how to resolve their changes in 'insert' in 'sparse_set'.

@james7132 james7132 added S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Feb 13, 2024
@@ -0,0 +1,85 @@
macro_rules! impl_non_max_fmt {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a nonmax crate that already implements this. We should try using that first.

}
} else {
self.sparse.insert(index.clone(), self.dense.len());
self.sparse
.insert(index.clone(), NonMaxUsize::new(self.dense.len()).unwrap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't an uncommon branch to take, and this introduces a potential panic, even if we never panic normally, this will have a negative impact on performance, and it's not something we can use debug_checked_unwrap on either since we know for a fact that it's something that we can hit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean you want me to revert that line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly am not sure how to deal with this. This branch is unavoidable if we want to avoid unsoundness. If we can find a way to address this without impacting performance, this seems feasible, but otherwise, this is potentially on the hotpath for a lot of operations within the engine, so any tangible performance regression is probably unacceptable. Definitely need to benchmark this in some way to validate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the potential performance issue here? Is it xor operation or unwrap()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best way to deal with this is to improve Rust language (there're a lot of discussions like this). But I'm afraid we're not getting it in the next ten years. :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the potential performance issue here? Is it xor operation or unwrap()?

The unwrap, the XOR has persistently shown to be easily pipelinable in any current CPUs without adding too much here, but the extra branch and codegen from panicking has shown to have a signficant impact when working with code that is this hot.

@james7132
Copy link
Member

@tygyh I want to experiment with this a bit more. Do you mind if I resolve the merge conflicts on this branch?

@tygyh
Copy link
Contributor Author

tygyh commented Feb 23, 2024

@tygyh I want to experiment with this a bit more. Do you mind if I resolve the merge conflicts on this branch?
Go ahead

@james7132
Copy link
Member

Actually, this might be easier to rewrite this from scratch given how much has changed. Do you mind if I adopt this? I'm seeing some results on a local re-implementation that at least should be neutral in CPU time performance, while also saving the memory.

@tygyh
Copy link
Contributor Author

tygyh commented Feb 24, 2024

@james7132 Re-adopt this if you want to. I am finished with it.

@james7132
Copy link
Member

Closing in favor of #12083. Will credit you as a part of that PR.

@james7132 james7132 closed this Feb 24, 2024
@tygyh tygyh deleted the nward/sparse-array-option branch February 24, 2024 11:36
github-merge-queue bot pushed a commit that referenced this pull request Mar 3, 2024
# Objective
Adoption of #2104 and #11843. The `Option<usize>` wastes 3-7 bytes of
memory per potential entry, and represents a scaling memory overhead as
the ID space grows.

The goal of this PR is to reduce memory usage without significantly
impacting common use cases.

Co-Authored By: @NathanSWard 
Co-Authored By: @tygyh 

## Solution
Replace `usize` in `SparseSet`'s sparse array with
`nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a
bitwise NOT to the value when accessing it. This allows the compiler to
niche the value and eliminate the extra padding used for the `Option`
inside the sparse array, while moving the niche value from 0 to
usize::MAX instead.

Checking the [diff in x86 generated
assembly](james7132/bevy_asm_tests@6e4da65),
this change actually results in fewer instructions generated. One
potential downside is that it seems to have moved a load before a
branch, which means we may be incurring a cache miss even if the element
is not there.

Note: unlike #2104 and #11843, this PR only targets the metadata stores
for the ECS and not the component storage itself. Due to #9907 targeting
`Entity::generation` instead of `Entity::index`, `ComponentSparseSet`
storing only up to `u32::MAX` elements would become a correctness issue.

This will come with a cost when inserting items into the SparseSet, as
now there is a potential for a panic. These cost are really only
incurred when constructing a new Table, Archetype, or Resource that has
never been seen before by the World. All operations that are fairly cold
and not on any particular hotpath, even for command application.

---

## Changelog
Changed: `SparseSet` now can only store up to `usize::MAX - 1` elements
instead of `usize::MAX`.
Changed: `SparseSet` now uses 33-50% less memory overhead per stored
item.
spectria-limina pushed a commit to spectria-limina/bevy that referenced this pull request Mar 9, 2024
# Objective
Adoption of bevyengine#2104 and bevyengine#11843. The `Option<usize>` wastes 3-7 bytes of
memory per potential entry, and represents a scaling memory overhead as
the ID space grows.

The goal of this PR is to reduce memory usage without significantly
impacting common use cases.

Co-Authored By: @NathanSWard 
Co-Authored By: @tygyh 

## Solution
Replace `usize` in `SparseSet`'s sparse array with
`nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a
bitwise NOT to the value when accessing it. This allows the compiler to
niche the value and eliminate the extra padding used for the `Option`
inside the sparse array, while moving the niche value from 0 to
usize::MAX instead.

Checking the [diff in x86 generated
assembly](james7132/bevy_asm_tests@6e4da65),
this change actually results in fewer instructions generated. One
potential downside is that it seems to have moved a load before a
branch, which means we may be incurring a cache miss even if the element
is not there.

Note: unlike bevyengine#2104 and bevyengine#11843, this PR only targets the metadata stores
for the ECS and not the component storage itself. Due to bevyengine#9907 targeting
`Entity::generation` instead of `Entity::index`, `ComponentSparseSet`
storing only up to `u32::MAX` elements would become a correctness issue.

This will come with a cost when inserting items into the SparseSet, as
now there is a potential for a panic. These cost are really only
incurred when constructing a new Table, Archetype, or Resource that has
never been seen before by the World. All operations that are fairly cold
and not on any particular hotpath, even for command application.

---

## Changelog
Changed: `SparseSet` now can only store up to `usize::MAX - 1` elements
instead of `usize::MAX`.
Changed: `SparseSet` now uses 33-50% less memory overhead per stored
item.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Benchmarking This set of changes needs performance benchmarking to double-check that they help
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants