JIT: Investigate what to do about store forwarding stalls due to block copies #100769
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
When struct copies are transformed into block copies, they can cause store forwarding stalls if the block copy reads padding that was never written. #96524 and #100750 (comment) are examples that show some of the potential cost of these stalls. #99835 (comment) has some discussion as well.
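To make "padding that was never written" concrete, here is a small sketch using Python's `ctypes` to lay out a `Span<T>`-like struct on a 64-bit target. `SpanLike` and `padding_range` are hypothetical names, and the layout (an 8-byte reference followed by a 4-byte length) is an assumption modeled on `Span<T>`, not something read out of the JIT. The helper only looks for trailing padding.

```python
import ctypes

# Hypothetical stand-in for Span<T> on a 64-bit target:
# an 8-byte reference followed by a 4-byte int length.
class SpanLike(ctypes.Structure):
    _fields_ = [("reference", ctypes.c_void_p),
                ("length", ctypes.c_int32)]

def padding_range(struct_type):
    """Return (start, end) of the trailing padding, i.e. bytes no field store writes."""
    last_name, last_type = struct_type._fields_[-1]
    end_of_fields = getattr(struct_type, last_name).offset + ctypes.sizeof(last_type)
    return end_of_fields, ctypes.sizeof(struct_type)

start, end = padding_range(SpanLike)
print(f"size={ctypes.sizeof(SpanLike)} padding bytes {start}..{end - 1}")
# size=16 padding bytes 12..15 (on a 64-bit host)
```

If the value was produced by field stores (8 bytes at offset 0, 4 bytes at offset 8), a single 16-byte block copy still reads bytes 12..15, which none of those stores covered.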
It's possible to generate these struct copies without accessing any padding, but only at the expense of larger code, which is probably slower if the source was also written as a block op. So it is not clear what the right trade-off is.
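The trade-off can be sketched by listing the loads each copy strategy performs. This is a source-level model in Python `ctypes`, not actual JIT codegen; the struct layout and all function names are hypothetical, and the 16-byte `width` stands in for a single SIMD load.

```python
import ctypes

class SpanLike(ctypes.Structure):
    # Hypothetical Span<T>-like layout on a 64-bit target:
    # 8-byte reference at offset 0, 4-byte length at offset 8, 4 bytes of padding.
    _fields_ = [("reference", ctypes.c_void_p),
                ("length", ctypes.c_int32)]

def block_copy_accesses(struct_type, width=16):
    """(offset, size) of each load a block copy in `width`-byte chunks performs."""
    size = ctypes.sizeof(struct_type)
    return [(off, min(width, size - off)) for off in range(0, size, width)]

def fieldwise_copy_accesses(struct_type):
    """(offset, size) of each load a field-by-field copy performs; padding is skipped."""
    return [(getattr(struct_type, name).offset, ctypes.sizeof(ftype))
            for name, ftype in struct_type._fields_]

def block_copy(dst, src):
    # Copies every byte of the struct, padding included.
    ctypes.memmove(ctypes.addressof(dst), ctypes.addressof(src), ctypes.sizeof(src))

def fieldwise_copy(dst, src):
    # Copies only the declared fields.
    dst.reference = src.reference
    dst.length = src.length

print(block_copy_accesses(SpanLike))      # [(0, 16)]
print(fieldwise_copy_accesses(SpanLike))  # [(0, 8), (8, 4)]
```

Both strategies produce the same field values; the field-wise copy simply needs two loads and two stores instead of one of each, which is the "larger code" side of the trade-off.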
I am also not sure how good CPUs are at reconstructing the source from several stores. For example, if we wrote both `Span<T>._reference` and `Span<T>._length` as 8 bytes (on 64-bit, `_length` is a 4-byte `int` followed by 4 bytes of padding), would a 16-byte SIMD read still stall? If it doesn't, then perhaps we could alleviate some issues by cheaply extending some stores to cover padding as well.

cc @dotnet/jit-contrib @stephentoub
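The open question can be phrased as two competing hypotheses about when a 16-byte load that follows narrower stores gets forwarded. This is a deliberately simplified model, not a description of any particular microarchitecture: it ignores store ordering, alignment, and timing, and the function names are hypothetical. Widening the `_length` store only helps if real CPUs behave like `covered_by_all_stores`.

```python
def covered_by_single_store(stores, off, size):
    """Hypothesis A: forwarding needs one pending store that contains the whole load."""
    return any(s_off <= off and off + size <= s_off + s_size for s_off, s_size in stores)

def covered_by_all_stores(stores, off, size):
    """Hypothesis B: forwarding works if every loaded byte was written by some pending store."""
    written = set()
    for s_off, s_size in stores:
        written.update(range(s_off, s_off + s_size))
    return all(b in written for b in range(off, off + size))

# Pending stores as (offset, size) pairs for a Span-like value on a 64-bit target.
narrow = [(0, 8), (8, 4)]   # _reference (8 bytes), _length (4 bytes); bytes 12..15 unwritten
widened = [(0, 8), (8, 8)]  # _length widened to 8 bytes so the padding is written too

for name, stores in (("narrow", narrow), ("widened", widened)):
    print(name, covered_by_single_store(stores, 0, 16), covered_by_all_stores(stores, 0, 16))
# narrow False False
# widened False True
```

The two hypotheses disagree only for the widened case, so measuring that case on real hardware would tell us whether extending stores over padding is worth pursuing.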