
[bevy_<util/ecs>] Better SparseArray performance #2104

Closed

Conversation

NathanSWard
Contributor

@NathanSWard NathanSWard commented May 5, 2021

Problem:

  • bevy_ecs::SparseArray uses an Option<T> internally to represent whether the
    value is valid or not.
  • This causes extra branching as well as extra memory overhead due to
    Option's discriminant.

Solution:

  • Implement a NonMaxUsize struct which guarantees that Option<NonMaxUsize> is the same size as NonMaxUsize (see the sketch below).

Fixes #1558
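
For illustration only (a hypothetical sketch, not code from this PR), this is the niche optimization the solution relies on: wrapping a NonZeroUsize lets the compiler store the None case in the forbidden zero value, so the Option costs no extra space.

    // Option<usize> needs a separate discriminant word, while
    // Option<NonMaxUsize> reuses NonZeroUsize's zero niche for None.
    use std::mem::size_of;
    use std::num::NonZeroUsize;

    #[allow(dead_code)]
    pub struct NonMaxUsize(NonZeroUsize);

    fn main() {
        assert_eq!(size_of::<Option<usize>>(), 2 * size_of::<usize>());
        assert_eq!(size_of::<Option<NonMaxUsize>>(), size_of::<usize>());
    }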

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels May 5, 2021
@alice-i-cecile
Member

I'd love to see benchmarks on this if you have them <3 Not doubting that it's faster, but I'm curious about the impact (and it's nice for release notes).

@NathanSWard
Contributor Author

Note: I'm totally up for bike-shedding on the NoneValue trait name and its functions' names.

@BoxyUwU
Member

BoxyUwU commented May 5, 2021

Is there a reason we can't use types with niches for all of our sparse set indexes? I haven't had a chance to look through all our keys and see if this is possible, but it would be nice to know before doing something like this.

Edit: after looking through, it seems like everything we use here is either a usize or contains a usize, so could we just use NonZeroUsize instead?

@NathanSWard
Contributor Author

NathanSWard commented May 5, 2021

Edit: after looking through, it seems like everything we use here is either a usize or contains a usize, so could we just use NonZeroUsize instead?

What about ArchetypeId?

    #[inline]
    pub const fn empty() -> ArchetypeId {
        ArchetypeId(0)
    }

Also, I'm pretty sure we need to allow the values to be 0, since a lot of them are indices into arrays and the first valid index is 0 (see the SparseSetIndex trait).
So if there were a NonMaxUsize, that would be quite convenient....

@cart
Member

cart commented May 5, 2021

Yeah the primary purpose of these sparse sets is to be used in the context of arrays, so NonZeroUsize isn't really viable. I don't want to push "dummy data" into the zero index of each array or need to remember to add 1 in the context of sparse sets.
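
For illustration (a hypothetical sketch, not something proposed in this PR), the "+1 offset" workaround being rejected here would look like this:

    // NonZeroUsize cannot hold 0, so representing array index 0 forces an
    // offset that every write and every read has to remember.
    use std::num::NonZeroUsize;

    fn store(index: usize) -> NonZeroUsize {
        NonZeroUsize::new(index + 1).expect("index must be < usize::MAX")
    }

    fn load(stored: NonZeroUsize) -> usize {
        stored.get() - 1
    }

    fn main() {
        assert_eq!(load(store(0)), 0);
    }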

@cart
Member

cart commented May 5, 2021

I'd also like to see benches, as the primary motivator here is perf.

@NathanSWard
Contributor Author

NathanSWard commented May 7, 2021

Another interesting find:
I used a different implementation that had a NonMaxUsize struct.

use std::num::NonZeroUsize;

#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub struct NonMaxUsize(NonZeroUsize);

impl NonMaxUsize {
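    // XOR with usize::MAX maps usize::MAX to 0, the one value NonZeroUsize
    // cannot hold, so `new` returns None exactly when n == usize::MAX.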
    #[inline]
    pub const fn new(n: usize) -> Option<Self> {
        Some(Self(NonZeroUsize::new(n ^ usize::MAX)?))
    }

    #[inline]
    pub const unsafe fn new_unchecked(n: usize) -> Self {
        Self(NonZeroUsize::new_unchecked(n ^ usize::MAX))
    }

    #[inline(always)]
    pub const fn get(self) -> usize {
        self.0.get() ^ usize::MAX
    }
}

Granted, the big downside is that whenever you want to access the value you have to do an XOR with usize::MAX. But with this implementation, I try to limit the number of get() calls.

This only optimizes the path where we have a SparseArray<T, usize>, with usize replaced by NonMaxUsize.
These performance results seemed to be the best outcome thus far.

BEVY-NONMAXUSIZE.txt

You can see the branch/commit I used for testing here.

@BoxyUwU
Member

BoxyUwU commented May 8, 2021

n XOR usize::MAX is just bitwise negation, right? Could we just do !self.0.get() and !n instead?

@NathanSWard
Contributor Author

n XOR usize::MAX is just bitwise negation, right? Could we just do !self.0.get() and !n instead?

Yep, they're the same 😄
I would personally prefer the xor since I think it's clearer what the implementation is doing, but I don't really have a strong opinion.

Also, in case performance is a concern, they both compile down to the same assembly.
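
As a quick sanity check (a hypothetical snippet, not from the PR), the two forms agree for every value:

    fn main() {
        // `!n` and `n ^ usize::MAX` both flip every bit of n.
        for n in [0usize, 1, 42, usize::MAX - 1, usize::MAX] {
            assert_eq!(n ^ usize::MAX, !n);
        }
    }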

@alice-i-cecile
Member

I find the "!" form clearer, but I don't feel strongly about it :p

@NathanSWard
Contributor Author

I just pushed changes switching the implementation to the NonMaxUsize impl.
From my benches (as seen in the comments above), this had the best performance.

However, I would like someone to run the benches as well just to confirm :)

@Frizi
Contributor

Frizi commented May 13, 2021

I've run the benchmarks against the current main branch. Note that the insertion benchmarks are fairly noisy, but the iteration ones are fairly stable.

Table sorted roughly by relative timing difference, from biggest regression first to biggest improvement last. Note that the variance is quite high in some cases, with error bars bigger than the absolute difference, so take these numbers with a grain of salt.

group                                       main                     sparse-array-option
-----                                       ----                     -------------------
add_remove_component/bevy_sparse_set        1.00  1392.1±142.59µs    1.04  1449.9±358.26µs
add_remove_component_big/bevy_sparse_set    1.00  1529.3±299.08µs    1.02  1553.3±291.35µs
fragmented_iter/bevy                        1.00   460.0±41.16ns     1.01   464.1±33.63ns
fragmented_iter/bevy_foreach                1.00   309.5±24.98ns     1.01   312.4±19.65ns
heavy_compute/bevy                          1.00   1054.9±7.81µs     1.00  1057.9±12.34µs
add_remove_component/bevy_table             1.00  1732.8±149.43µs    1.00  1735.5±156.73µs
simple_iter/bevy_foreach                    1.00     12.6±0.94µs     1.00     12.5±0.57µs
sparse_fragmented_iter/bevy                 1.01     15.1±1.12ns     1.00     15.0±1.27ns
get_component/bevy_system                   1.01  1217.9±52.25µs     1.00  1209.3±105.16µs
add_remove_component_big/bevy_table         1.02      4.2±0.16ms     1.00      4.1±0.10ms
schedule/bevy                               1.03     59.9±2.78µs     1.00     58.4±2.96µs
get_component/bevy                          1.04  1509.7±257.13µs    1.00  1453.1±142.34µs
simple_iter/bevy_sparse_foreach             1.06    62.2±10.27µs     1.00    58.6±11.26µs
simple_iter/bevy                            1.09     23.5±2.21µs     1.00     21.5±1.68µs
simple_insert/bevy                          1.09   678.2±51.43µs     1.00   621.3±30.79µs
sparse_fragmented_iter/bevy_foreach         1.10     13.2±0.87ns     1.00     12.0±1.30ns
simple_iter/bevy_sparse                     1.11     63.1±4.82µs     1.00     56.9±2.38µs
simple_insert/bevy_unbatched                1.20  1963.5±364.22µs    1.00  1638.5±49.03µs

@NathanSWard
Contributor Author

ping @cart for opinions/views on the perf tests.

@bjorn3
Contributor

bjorn3 commented May 14, 2021

@Frizi If you are on Linux, could you try using taskset to limit the benchmark process to a specific core, and echo performance | sudo -k tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to make the frequency scaling more predictable?

@Frizi
Contributor

Frizi commented May 14, 2021

@Frizi If you are on Linux, could you try using taskset to limit the benchmark process to a specific core, and echo performance | sudo -k tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to make the frequency scaling more predictable?

I'm a Windows desktop user :) Also with Hyper-V and WSL enabled, thus making the Windows kernel a guest OS. So there is probably not much I can do about the high variance without drastically changing my setup.

@cart cart added the S-Pre-Relicense This PR was made before Bevy added the Apache license. Cannot be merged or used for other work label Jul 23, 2021
@mockersf mockersf removed the S-Pre-Relicense This PR was made before Bevy added the Apache license. Cannot be merged or used for other work label Jul 24, 2021
@alice-i-cecile
Member

This LGTM, but I prefer the approach taken in #3678.

@alice-i-cecile alice-i-cecile added the S-Adopt-Me The original PR author has no intent to complete this work. Pick me up! label Jul 14, 2022
github-merge-queue bot pushed a commit that referenced this pull request Mar 3, 2024
# Objective
Adoption of #2104 and #11843. The `Option<usize>` wastes 3-7 bytes of
memory per potential entry, and represents a scaling memory overhead as
the ID space grows.

The goal of this PR is to reduce memory usage without significantly
impacting common use cases.

Co-Authored By: @NathanSWard 
Co-Authored By: @tygyh 

## Solution
Replace `usize` in `SparseSet`'s sparse array with
`nonmax::NonMaxUsize`. NonMaxUsize wraps a NonZeroUsize, and applies a
bitwise NOT to the value when accessing it. This allows the compiler to
niche the value and eliminate the extra padding used for the `Option`
inside the sparse array, while moving the niche value from 0 to
usize::MAX instead.

Checking the [diff in x86 generated
assembly](james7132/bevy_asm_tests@6e4da65),
this change actually results in fewer instructions generated. One
potential downside is that it seems to have moved a load before a
branch, which means we may be incurring a cache miss even if the element
is not there.

Note: unlike #2104 and #11843, this PR only targets the metadata stores
for the ECS and not the component storage itself. Due to #9907 targeting
`Entity::generation` instead of `Entity::index`, `ComponentSparseSet`
storing only up to `u32::MAX` elements would become a correctness issue.

This will come with a cost when inserting items into the SparseSet, as
there is now a potential for a panic. These costs are really only
incurred when constructing a new Table, Archetype, or Resource that has
never been seen before by the World. All of these operations are fairly cold
and not on any particular hot path, even for command application.

---

## Changelog
Changed: `SparseSet` now can only store up to `usize::MAX - 1` elements
instead of `usize::MAX`.
Changed: `SparseSet` now uses 33-50% less memory overhead per stored
item.
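
A minimal sketch of the sparse-array slot described in the commit message above (assuming the nonmax crate exposes NonMaxUsize::new and get as used here; this is not the actual bevy_ecs code):

    // Each slot is exactly one usize wide: the None case lives in the
    // usize::MAX niche instead of a separate Option discriminant.
    use nonmax::NonMaxUsize;

    #[derive(Default)]
    pub struct SparseArray {
        values: Vec<Option<NonMaxUsize>>,
    }

    impl SparseArray {
        pub fn insert(&mut self, index: usize, dense_index: usize) {
            if index >= self.values.len() {
                self.values.resize(index + 1, None);
            }
            // The only new failure mode: a dense index of usize::MAX panics here.
            self.values[index] = Some(
                NonMaxUsize::new(dense_index).expect("dense index must not be usize::MAX"),
            );
        }

        pub fn get(&self, index: usize) -> Option<usize> {
            self.values.get(index).copied().flatten().map(|v| v.get())
        }
    }
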
@james7132
Member

This has been implemented for non-component sparse sets with #12083, a PR that was (twice) adopted from this one, and it's likely infeasible to implement for component sparse sets without it being a correctness issue for the rest of the ECS. Closing this out.

@james7132 james7132 closed this Mar 4, 2024
spectria-limina pushed a commit to spectria-limina/bevy that referenced this pull request Mar 9, 2024
Successfully merging this pull request may close these issues.

Replace SparseArray Option<T> with T::MAX to cut down on branching