Background recording in Metal #2259

Closed
kvark opened this issue Jul 20, 2018 · 0 comments

kvark commented Jul 20, 2018

We finally figured out all the unknowns about Dota performance, and the simplified gist is that we need to get through recording faster, much faster than we do now. Deferred recording lets us create the illusion of recording very fast, only to delay the actual work until submission time. The trick here, as indicated in #2232, is that the driver and HW are able to do their work at the same time as we chew through the recording.

I was wondering whether there is a way for us to both get to the submission point earlier AND benefit from online command buffer recording. What if we create deferred recording lists but, instead of having them wait for submission time, we start actually recording them into Metal command buffers somewhere on a hidden thread? The submission would then just make sure to wait for that thread to finish working on each submitted command buffer before calling commit on it.
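To make the idea concrete, here is a minimal sketch in plain Rust. It assumes a hypothetical `Command` enum standing in for the deferred list entries and a `NativeBuffer` struct standing in for a Metal command buffer; neither name comes from the backend, and a real implementation would more likely use a dispatch queue than one `std::thread` per buffer.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical software command, standing in for a deferred-list entry.
#[derive(Clone, Debug, PartialEq)]
pub enum Command {
    BindPipeline(u32),
    Draw { vertices: u32 },
}

/// Hypothetical stand-in for a native (Metal) command buffer:
/// here it is only a log of the calls that were encoded into it.
#[derive(Debug, Default)]
pub struct NativeBuffer {
    pub encoded: Vec<String>,
    pub committed: bool,
}

/// A command buffer whose native encoding runs on a hidden thread.
pub struct BackgroundRecording {
    worker: JoinHandle<NativeBuffer>,
}

impl BackgroundRecording {
    /// Called when the application finishes recording. The deferred list is
    /// handed to a worker thread and the caller returns immediately.
    pub fn finish(commands: Vec<Command>) -> Self {
        let worker = thread::spawn(move || {
            let mut native = NativeBuffer::default();
            for cmd in &commands {
                // The "real" encoding work happens here, off the caller's thread.
                native.encoded.push(match cmd {
                    Command::BindPipeline(id) => format!("bind_pipeline({})", id),
                    Command::Draw { vertices } => format!("draw({})", vertices),
                });
            }
            native
        });
        BackgroundRecording { worker }
    }

    /// Submission: wait for the hidden thread to finish this buffer,
    /// and only then commit it.
    pub fn submit(self) -> NativeBuffer {
        let mut native = self.worker.join().expect("recording thread panicked");
        native.committed = true;
        native
    }
}
```

The application thread only pays for building the `Vec<Command>`. The join in `submit` is the "wait for that thread" step, and it should be close to free whenever the worker has had enough time to catch up.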

What would that give us? It looks like a hacky, in-place solution that introduces implicit threading just to squeeze more performance out of Dota. It is indeed, but it would be interesting to see if it gives us a solid advantage ;)

@kvark kvark self-assigned this Jul 20, 2018
bors bot added a commit that referenced this issue Jul 24, 2018
2260: Remote command sink in Metal r=grovesNL a=kvark

Fixes #2259
The results so far are not super promising: highly unstable (presumably because of the dispatch), with performance around the `Immediate` mark. We are still missing the most important follow-up here, which is to avoid any heap allocations when recording commands. Currently, it just goes with `Vec::new()` and grows it for each pass, which shows up in the profile quite a bit.
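One possible shape for that follow-up, as a hedged sketch only: keep a single scratch `Vec` alive across passes and `clear()` it, so its capacity is reused instead of being regrown from `Vec::new()` every time. The `PassRecorder` and `PassCommand` names below are hypothetical and are not taken from the PR.

```rust
/// Hypothetical per-pass command, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PassCommand {
    SetScissor(u32, u32),
    Draw(u32),
}

/// Records commands for one pass at a time into a reusable buffer.
pub struct PassRecorder {
    scratch: Vec<PassCommand>,
}

impl PassRecorder {
    /// Allocate once, up front, with room for `capacity` commands.
    pub fn with_capacity(capacity: usize) -> Self {
        PassRecorder { scratch: Vec::with_capacity(capacity) }
    }

    /// Start a new pass. `Vec::clear` drops the elements but keeps the
    /// allocation, so later pushes do not touch the heap until the
    /// previous high-water mark is exceeded.
    pub fn begin_pass(&mut self) {
        self.scratch.clear();
    }

    pub fn push(&mut self, command: PassCommand) {
        self.scratch.push(command);
    }

    /// Commands recorded for the current pass.
    pub fn commands(&self) -> &[PassCommand] {
        &self.scratch
    }

    pub fn capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

This only removes the steady-state allocations. The first pass that exceeds the reserved capacity still reallocates once.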

The PR also has a bunch of general optimizations:
  - HAL change in the descriptor allocation API to avoid the heap
  - lighten up the Metal descriptor binding path (a bit) by making sure there are enough state slots in advance
  - simplification and refactoring of `CommandSink` implementations

PR checklist:
- [ ] `make` succeeds (on *nix)
- [ ] `make reftests` succeeds
- [x] tested examples with the following backends: Metal
- [ ] `rustfmt` run on changed code


Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>
@bors bors bot closed this as completed in #2260 Jul 24, 2018