
column/table level change detection #11120

Draft
wants to merge 2 commits into main

Conversation

@re0312 (Contributor) commented Dec 28, 2023

Objective

Accelerate change detection by redundantly storing change ticks at the table and column level, so queries can skip whole tables or columns that contain no changed data (see the linked issue at the bottom of this page).

Solution

  • Introduce an AtomicU32 to track the last mutable access tick for tables and columns (a sketch follows this list).
  • Assign the responsibility of updating this information to the set_table and set_archetype methods.
  • Implement the archetype_filter_fetch method on QueryFilter to filter out tables or columns that have not been mutably accessed since the system last ran.
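
A minimal sketch of the idea, with illustrative names rather than this PR's exact code (Bevy's real Tick uses wrapping comparison; a plain integer compare is used here for simplicity):

use std::sync::atomic::{AtomicU32, Ordering};

// Each table/column carries the tick of its last mutable access.
struct Column {
    // ... component data elided ...
    last_mutable_access_tick: AtomicU32,
}

impl Column {
    // Called from set_table / set_archetype when a query with mutable
    // access binds to this column.
    fn update_last_mutable_access_tick(&self, this_run: u32) {
        self.last_mutable_access_tick
            .fetch_max(this_run, Ordering::AcqRel);
    }

    // Backs archetype_filter_fetch: the whole column can be skipped when
    // nothing has mutated it since the system last ran.
    fn possibly_changed_since(&self, last_run: u32) -> bool {
        self.last_mutable_access_tick.load(Ordering::Relaxed) > last_run
    }
}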

Notes

I have reservations about the current implementation: it is entangled with multiple data flows, which makes the code more fragile and potentially expensive. I would highly appreciate it if someone could propose an alternative implementation with fewer complexities and improved safety.

Performance

Intel 13400KF, NV 4070 Ti

[benchmark results image]

To comprehensively evaluate the change detection mechanism, additional benchmarking across various scenarios is still necessary.

Regressions:

[regression benchmark image]

The regression in spawn/insert is attributed to the introduction of an atomic operation per table per component.
The iter_fragmented_* benchmarks iterate over only a few entities per table, which makes maintaining the tick comparatively costly.
query.get_mut/get_component/get_unchecked(entity) is expensive because it sets the archetype (and updates its tick) for each entity retrieved.

It is worth mentioning that many_foxes frame time regressed from 3.06 ms to 4.21 ms. Yellow is main, red is this PR:

[frame time comparison image]

The hotspots are animation_player and propagate_transforms; their parallel use of get_unchecked for each entity appears to cause heavy cache-line invalidation across CPU cores.

[profiler screenshots]
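
To illustrate why that pattern is costly (a standalone sketch, not code from this PR): several threads performing an atomic read-modify-write per "entity" on one shared tick force the owning cache line to ping-pong between cores.

use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // One shared tick, as in the per-entity get_unchecked path.
    let tick = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let tick = Arc::clone(&tick);
            thread::spawn(move || {
                // Every iteration is an atomic RMW on the same cache line.
                for t in 0..1_000_000u32 {
                    tick.fetch_max(t, Ordering::AcqRel);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("final tick: {}", tick.load(Ordering::Relaxed));
}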


@alice-i-cecile alice-i-cecile added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times D-Complex Quite challenging from either a design or technical perspective. Ask for help! labels Dec 28, 2023
@alice-i-cecile (Member)

Okay, really neat stuff. Thanks for trying this out and presenting those benchmark results! Initial thoughts:

  1. I think / hope we should be able to do this without the introduction of atomics. As you've seen, they introduce a meaningful performance overhead on all operations. The scheduler should already guarantee that only one system ever has mutable access to each archetype-component (I think this is what you called a table) at once. However, we will still need to be cautious in how we implement parallel iteration over queries directly. (A sketch of this non-atomic alternative follows the list.)
  2. Firmly agree on the need for more benchmarks for change detection before merging something in this space. I want to have a clear sense of the performance tradeoffs involved, and the
  3. We should consider making change detection configurable (Change Detection as Components / Opt-in or opt-out of Change Detection #4882) before merging this work: I suspect the correct trade-offs will vary substantially by component type.
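
A sketch of the non-atomic alternative from point 1, assuming the scheduler's exclusivity guarantee holds (hypothetical code; the safety invariant must be upheld externally):

use std::cell::UnsafeCell;

// A plain u32 tick; `Sync` is only sound because the scheduler promises
// that at most one system mutably accesses this archetype-component at a time.
struct ColumnTick(UnsafeCell<u32>);

// SAFETY: relies entirely on the scheduler-enforced exclusivity above.
unsafe impl Sync for ColumnTick {}

impl ColumnTick {
    // SAFETY: caller must hold scheduler-granted mutable access.
    unsafe fn bump(&self, tick: u32) {
        let slot = self.0.get();
        *slot = (*slot).max(tick);
    }

    fn get(&self) -> u32 {
        // SAFETY: in this sketch we assume no write is concurrent with
        // this read; the scheduler orders conflicting systems.
        unsafe { *self.0.get() }
    }
}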

@JMS55 (Contributor) commented Dec 28, 2023

Definitely excited to have this, as it opens up an opportunity for my reactive UI crate to skip regenerating the UI when a query has no changed data at the archetype/column level.

@re0312 (Contributor, Author) commented Dec 29, 2023

  1. I think / hope we should be able to do this without the introduction of atomics. As you've seen, they introduce a meaningful performance overhead on all operations. The scheduler should already guarantee that only one system ever has mutable access to each archetype-component (I think this is what you called a table) at once. However, we will still need to be cautious in how we implement parallel iteration over queries directly.

The current impl of this PR tracks both tables (built from their owned table-storage columns) and every column (both table storage and sparse-set storage). The scheduler cannot guarantee that only one system has mutable access to a table and a sparse-set column simultaneously. If we avoid atomics, the viable option would be to implement column-level change detection only for table-storage columns. (Of course, I think avoiding atomics is the right way to go, as it would greatly reduce overhead.)

  1. Firmly agree on the need for more benchmarks for change detection before merging something in this space. I want to have a clear sense of the performance tradeoffs involved, and the

Yeah, I think I can add more benchmarks related to change detection in another PR.

Comment on lines +547 to +558
pub fn read_last_mutable_access_tick(&self) -> Tick {
    Tick::new(
        self.last_mutable_access_tick
            .load(std::sync::atomic::Ordering::Relaxed),
    )
}

pub fn update_last_mutable_access_tick(&self, tick: Tick) {
    // self.last_mutable_access_tick
    //     .store(tick.get(), std::sync::atomic::Ordering::Relaxed);
    self.last_mutable_access_tick
        .fetch_max(tick.get(), std::sync::atomic::Ordering::AcqRel);
}
A Contributor commented:

Could you explain the choice for memory ordering? Naively, I would have expected the fetch_max to be Release and the load to be Acquire, so that we cannot miss changes. However I'm not confident in my understanding of atomic ordering.
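
For reference, the pairing the question describes would look like this (illustrative only; whether it is required depends on whether any non-atomic data is synchronized through this tick):

use std::sync::atomic::{AtomicU32, Ordering};

// Writer: Release makes the writer's prior writes visible to any reader
// that observes this value with Acquire.
fn update_last_mutable_access_tick(tick: &AtomicU32, this_run: u32) {
    tick.fetch_max(this_run, Ordering::Release);
}

// Reader: pairs with the Release store, so observing the tick also
// observes the writes that happened before it.
fn read_last_mutable_access_tick(tick: &AtomicU32) -> u32 {
    tick.load(Ordering::Acquire)
}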

@re0312 (Contributor, Author) replied:

I haven't deeply considered this aspect yet (so please correct me if I am wrong). My thought was that there might not be any other data flow that depends on the access tick, so a relaxed load would be acceptable. Bevy's scheduler may run systems that access the same table in parallel, but we cannot guarantee when each system will load or store the corresponding tick. When two systems write to the same tick concurrently, both store operations must be observed (hence fetch_max); for the load, even if a change is missed it is not a big deal, since our focus is on comparing the loaded value to the system's last_change_tick.

@hymm (Contributor) commented Jan 3, 2024

The only time we need to update change ticks in parallel is when we're using par_iter. Could we special case that scenario? Maybe use something like https://docs.rs/thread_local/1.1.7/thread_local/ and take the max of all the ticks after the par_iter is done and use that to update the archetype ticks.

Also adding branching inside the iterator is very bad for vectorization. You probably want to somehow update the matched archetypes. Probably needs benchmarking to be sure.
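
A sketch of that thread-local scheme using the thread_local crate (std threads stand in for par_iter; names are illustrative):

use std::cell::Cell;
use std::sync::Arc;
use thread_local::ThreadLocal;

fn main() {
    let local_max: Arc<ThreadLocal<Cell<u32>>> = Arc::new(ThreadLocal::new());

    // Stand-in for par_iter: each worker bumps only its own thread-local
    // cell, so the hot path has no cross-core atomic traffic.
    let handles: Vec<_> = (0..4u32)
        .map(|this_run| {
            let local_max = Arc::clone(&local_max);
            std::thread::spawn(move || {
                let cell = local_max.get_or(|| Cell::new(0));
                cell.set(cell.get().max(this_run));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // After the parallel section, fold the per-thread maxima and apply
    // the result to the archetype tick in a single write.
    let max_tick = Arc::try_unwrap(local_max)
        .ok()
        .unwrap()
        .into_iter()
        .map(|cell| cell.get())
        .max()
        .unwrap_or(0);
    println!("archetype tick would be bumped to {max_tick}");
}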

@re0312 (Contributor, Author) commented Jan 3, 2024

The only time we need to update change ticks in parallel is when we're using par_iter. Could we special case that scenario? Maybe use something like https://docs.rs/thread_local/1.1.7/thread_local/ and take the max of all the ticks after the par_iter is done and use that to update the archetype ticks.

Also adding branching inside the iterator is very bad for vectorization. You probably want to somehow update the matched archetypes. Probably needs benchmarking to be sure.

Yeah, this impl is just an attempt to identify a method that balances ergonomics and performance, and it still needs more benchmarks. My current thoughts are:

  1. Archetype/table level change detection is not easy to implement in Bevy: an archetype is updated in parallel not only when using par_iter but also across system runs (multiple systems can run in parallel on the same archetype).
  2. It is not worth updating the archetype/table change tick with the same precision as Mut, i.e. on every deref_mut of each column/archetype.
  3. Similar to flecs, we could update the archetype/column level tick on its first mutable access in a system run; for example, Query<&mut T>.iter_mut().for_each would update the corresponding archetype tick even if the loop body does nothing (see the sketch below).
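
A sketch of option 3, with illustrative names rather than Bevy's actual types:

use std::sync::atomic::{AtomicU32, Ordering};

struct Column {
    last_mutable_access_tick: AtomicU32,
}

struct IterMut<'w> {
    _column: &'w Column,
    // ... iteration cursor elided ...
}

impl<'w> IterMut<'w> {
    // Constructing the mutable iterator pessimistically marks the whole
    // column as changed for this run, flecs-style: the tick advances even
    // if the caller's loop body never writes anything.
    fn new(column: &'w Column, this_run: u32) -> Self {
        column
            .last_mutable_access_tick
            .fetch_max(this_run, Ordering::AcqRel);
        IterMut { _column: column }
    }
}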

Successfully merging this pull request may close these issues.

Accelerate change detection by redundantly storing change ticks