Avoid cloning Arcs unnecessarily when iterating trackers #6721

Merged
merged 2 commits into from
Dec 13, 2024


nical
Contributor

@nical nical commented Dec 13, 2024

I noticed that in validate_command_buffer we spend a fair amount of CPU time cloning and dropping Arcs while iterating trackers, just to peek into the contents of the buffers or textures (so we don't really need to hold on to the Arcs and touch the reference counts).

This PR avoids that by iterating over references to the arcs instead of clones of the arcs.
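As a rough illustration of the difference, here is a minimal sketch that is not taken from the wgpu source. `Buffer`, `Tracker`, `iter_cloned`, and `iter_refs` are hypothetical stand-ins invented for this example. The cloning iterator pays one atomic increment and one atomic decrement per item, while the borrowing iterator leaves the reference counts alone.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a tracked resource; not wgpu's actual type.
struct Buffer {
    size: u64,
}

// Hypothetical stand-in for a tracker that owns Arcs to its resources.
struct Tracker {
    resources: Vec<Arc<Buffer>>,
}

impl Tracker {
    // Yields a new Arc per item: an atomic increment on clone and an
    // atomic decrement when the yielded Arc is dropped.
    fn iter_cloned(&self) -> impl Iterator<Item = Arc<Buffer>> + '_ {
        self.resources.iter().cloned()
    }

    // Yields borrowed Arcs: the reference counts are never touched.
    fn iter_refs(&self) -> impl Iterator<Item = &Arc<Buffer>> + '_ {
        self.resources.iter()
    }
}

// Reads a field through a cloned Arc, then drops the clone right away.
fn total_size_cloned(tracker: &Tracker) -> u64 {
    tracker.iter_cloned().map(|buffer| buffer.size).sum()
}

// Reads the same field through a borrow, with no refcount traffic.
fn total_size_refs(tracker: &Tracker) -> u64 {
    tracker.iter_refs().map(|buffer| buffer.size).sum()
}
```

Both helpers return the same total. The borrowing version only works when the caller does not need to keep the resource alive past the iteration, which is the situation described above.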

I'm on a train and running low on battery, so I can't properly record numbers from the benchmarks right now, but when I ran them earlier I was seeing non-trivial improvements on the order of 10% for some benchmarks.

Checklist

  • Run cargo fmt.
  • Run cargo clippy.
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

@nical nical requested a review from a team as a code owner December 13, 2024 17:01
Member

@cwfitzgerald cwfitzgerald left a comment


Oh that's very cool

@cwfitzgerald cwfitzgerald merged commit 3bcfe84 into gfx-rs:trunk Dec 13, 2024
27 checks passed
kentslaney pushed a commit to kentslaney/wgpu that referenced this pull request Dec 16, 2024
* Avoid cloning Arcs unnecessarily when iterating trackers

* Changelog entry
Contributor Author

nical commented Dec 16, 2024

I ran the benchmark on a rather stable desktop CPU:

Renderpass: Single Threaded/1 renderpasses x 10000 draws (Renderpass Time)
                        Change within noise threshold.
Renderpass: Single Threaded/1 renderpasses x 10000 draws (Submit Time)
                        time:   [-15.302% -15.069% -14.839%] (p = 0.00 < 0.05)
                        thrpt:  [+17.425% +17.743% +18.067%]
                        Performance has improved.
Renderpass: Single Threaded/2 renderpasses x 5000 draws (Submit Time)
                        time:   [-6.2971% -5.9860% -5.6869%] (p = 0.00 < 0.05)
                        thrpt:  [+6.0298% +6.3672% +6.7203%]
                        Performance has improved.
Renderpass: Single Threaded/4 renderpasses x 2500 draws (Submit Time)
                        time:   [-4.5177% -4.1493% -3.7251%] (p = 0.00 < 0.05)
                        thrpt:  [+3.8692% +4.3289% +4.7315%]
                        Performance has improved.
Renderpass: Single Threaded/8 renderpasses x 1250 draws (Submit Time)
                        time:   [-2.6462% -2.6014% -2.5516%] (p = 0.00 < 0.05)
                        thrpt:  [+2.6184% +2.6709% +2.7182%]
                        Performance has improved.
Renderpass: Multi Threaded/2 threads x 5000 draws
                        time:   [-8.5528% -8.2123% -7.8804%] (p = 0.00 < 0.05)
                        thrpt:  [+8.5545% +8.9471% +9.3527%]
                        Performance has improved.
Renderpass: Multi Threaded/4 threads x 2500 draws
                        time:   [+5.2833% +5.5238% +5.7755%] (p = 0.00 < 0.05)
                        thrpt:  [-5.4601% -5.2346% -5.0182%]
                        Performance has regressed.
Renderpass: Multi Threaded/8 threads x 1250 draws
                        time:   [+1.7695% +2.2015% +2.6507%] (p = 0.00 < 0.05)
                        thrpt:  [-2.5823% -2.1541% -1.7388%]
                        Performance has regressed.
Renderpass: Bindless/10000 draws
                        No change in performance detected.
Renderpass: Empty Submit with 90000 Resources
                        change: [-9.6432% -9.5297% -9.4192%] (p = 0.00 < 0.05)
                        Performance has improved.
Computepass: Single Threaded/1 computepasses x 10000 dispatches (Computepass Time)
                        No change in performance detected.
Computepass: Single Threaded/2 computepasses x 5000 dispatches (Computepass Time)
                        Change within noise threshold.
Computepass: Single Threaded/4 computepasses x 2500 dispatches (Computepass Time)
                        Change within noise threshold.
Computepass: Single Threaded/8 computepasses x 1250 dispatches (Computepass Time)
                        time:   [-1.8904% -1.5646% -1.2260%] (p = 0.00 < 0.05)
                        thrpt:  [+1.2412% +1.5894% +1.9268%]
                        Performance has improved.
Computepass: Single Threaded/1 computepasses x 10000 dispatches (Submit Time)
                        time:   [-2.4299% -2.0110% -1.6030%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6291% +2.0522% +2.4905%]
                        Performance has improved.
Computepass: Single Threaded/2 computepasses x 5000 dispatches (Submit Time)
                        No change in performance detected.
Computepass: Single Threaded/4 computepasses x 2500 dispatches (Submit Time)
                        Change within noise threshold.
Computepass: Single Threaded/8 computepasses x 1250 dispatches (Submit Time)
                        Change within noise threshold.
Computepass: Multi Threaded/2 threads x 5000 dispatch
                        time:   [-3.8511% -3.5067% -3.1018%] (p = 0.00 < 0.05)
                        thrpt:  [+3.2011% +3.6341% +4.0053%]
                        Performance has improved.
Computepass: Multi Threaded/4 threads x 2500 dispatch
                        time:   [+1.8356% +2.5082% +3.1603%] (p = 0.00 < 0.05)
                        thrpt:  [-3.0635% -2.4468% -1.8025%]
                        Performance has regressed.
Computepass: Multi Threaded/8 threads x 1250 dispatch
                        No change in performance detected.
Computepass: Bindless/1000 dispatch
                        Change within noise threshold.
Computepass: Empty Submit with 60000 Resources
                        change: [-8.7931% -8.7012% -8.6076%] (p = 0.00 < 0.05)
                        Performance has improved.

Overall, a pretty good improvement on most benchmarks, with up to -15% CPU time, and a few regressions (up to about +5.5% in the multi-threaded cases).
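For reading the output above: the time and thrpt lines describe the same measurement from two directions. A small sketch of the conversion, assuming throughput is simply the inverse of time (this matches the reported figures; `throughput_change` is a hypothetical helper, not part of the benchmark harness):

```rust
// Converts a relative change in time (for example -0.15069 for -15.069%)
// into the corresponding relative change in throughput, assuming
// throughput is proportional to 1 / time.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

With the middle estimates from the first Submit Time entry, `throughput_change(-0.15069)` gives roughly +0.17743, in line with the reported +17.743%. A time regression of +5.5238% likewise maps to about -5.2346% throughput.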
