column/table level change detection #11120
base: main
Conversation
Okay, really neat stuff. Thanks for trying this out and presenting those benchmark results! Initial thoughts:
Definitely excited to have this, as it opens up an opportunity for my reactive UI crate to skip regenerating the UI when a query has no changed data at the archetype/column level.
The current implementation of this PR covers both tables (built with their owned table-storage columns) and every column (both table-storage and sparse-set-storage). The schedule cannot guarantee that only one mutable access to a table and a sparse-set column happens at a time. If we avoid using an atomic type, the viable option would be to implement column-level change detection only for table-storage columns. (Of course, I think avoiding atomics is the right way to go, as it would greatly reduce overhead.)
Yeah, I think I can add more benchmarks related to change detection in another PR.
```rust
pub fn read_last_mutable_access_tick(&self) -> Tick {
    Tick::new(
        self.last_mutable_access_tick
            .load(std::sync::atomic::Ordering::Relaxed),
    )
}

pub fn update_last_mutable_access_tick(&self, tick: Tick) {
    // self.last_mutable_access_tick
    //     .store(tick.get(), std::sync::atomic::Ordering::Relaxed);
    self.last_mutable_access_tick
        .fetch_max(tick.get(), std::sync::atomic::Ordering::AcqRel);
}
```
Could you explain the choice of memory ordering? Naively, I would have expected the `fetch_max` to be `Release` and the load to be `Acquire`, so that we cannot miss changes. However, I'm not confident in my understanding of atomic ordering.
I haven't deeply considered this aspect yet (so please correct me if I'm wrong). My thought was that there might not be any other data flow that depends on the access tick, so a relaxed load would be acceptable. Bevy's scheduler may run systems with the same table access in parallel, but we cannot control when each system loads or stores the corresponding tick when two systems write to the same tick concurrently. Store operations must be observed, but for the load, even if a change is missed it is not a big deal: our focus is on comparing the loaded value against the system's `last_change_tick`.
The only time we need to update change ticks in parallel is when we're using [...]. Also, adding branching inside the iterator is very bad for vectorization. You probably want to somehow update the matched archetypes instead. Probably needs benchmarking to be sure.
Yeah, this implementation is just an attempt to identify a method that balances ergonomics and performance. It still needs more benchmarking. My current thoughts are:
Objective

Solution

- Add an `AtomicU32` type to track the last mutable access for tables and columns.
- Update that tick in the `set_table` and `set_archetype` methods.
- Add an `archetype_filter_fetch` method on `QueryFilter` to filter out tables or columns that have not been subject to mutable access since the system last ran.

Notes
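The filtering step above amounts to one tick comparison per table. The sketch below illustrates the idea; the function names and the wrap-safe window are hypothetical, modeled on Bevy's wrapping tick arithmetic rather than taken from this PR:

```rust
// Half-range window so comparisons stay correct when the u32 tick wraps;
// Bevy clamps tick ages to a similar window (the exact constant is an
// assumption here).
const MAX_CHANGE_AGE: u32 = u32::MAX / 2;

// Wrap-safe "is this tick newer than the system's last run?" check.
fn is_newer_than(tick: u32, last_run: u32, this_run: u32) -> bool {
    let ticks_since_change = this_run.wrapping_sub(tick).min(MAX_CHANGE_AGE);
    let ticks_since_system = this_run.wrapping_sub(last_run).min(MAX_CHANGE_AGE);
    ticks_since_system > ticks_since_change
}

// Hypothetical archetype-level filter: skip the whole table if nothing
// in it was mutably accessed since the system last ran.
fn archetype_filter_fetch(table_last_mutated: u32, last_run: u32, this_run: u32) -> bool {
    is_newer_than(table_last_mutated, last_run, this_run)
}

fn main() {
    // Table mutated at tick 50; system last ran at tick 40; now tick 60:
    // the table must be visited.
    assert!(archetype_filter_fetch(50, 40, 60));
    // Table mutated at tick 30, before the system's last run: skip it.
    assert!(!archetype_filter_fetch(30, 40, 60));
    println!("ok");
}
```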
I must express reservations about the current implementation: it is involved in multiple data flows, which potentially makes the code more fragile and more expensive. I would highly appreciate it if someone could propose an alternative implementation with fewer complexities and better-considered soundness.
Performance

Intel 13400kf, NV 4070 Ti

To comprehensively evaluate the change detection mechanism, additional benchmarking across various scenarios is necessary.
Regressions:

- The regression in spawn/insert is attributed to the introduction of an atomic operation per table per component.
- The `iter_fragmented_*` benchmarks appear to iterate over only a few entities, making the tick maintenance costly.
- `query.get_mut_component/unchecked(entity)` is expensive because it sets the archetype for each entity retrieved.

It is worth mentioning that the `many_foxes` frame time regressed from 3.06 ms to 4.21 ms.
Yellow is main, red is this PR.
The hotspots are `animation_player` and `propagate_transform`, whose parallel use of `get_unchecked` for each entity seems to lead to numerous cache invalidations across CPU cores.

Migration Guide