-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reuse equivalent samplers in dx12 to help stay under the limit of 2048 #3499
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
I have a couple of concerns with the behavior introduced.
I think the caching strategy is too specific ("reusing samplers only across bind groups if 2 or more bind groups have the same number of samplers with the same descriptors").
I think it would be better to instead cache the underlying dx12 samplers at creation time.
Besides anisotropy_clamp
and lod_clamp
(both of which are f32
s) the other properties of a sampler are enums which should allow us to stay below the 2048 limit since I can't imagine people creating samplers with widely different values for anisotropy_clamp
and lod_clamp
.
wgpu-hal/src/lib.rs
Outdated
@@ -939,6 +940,25 @@ pub struct SamplerDescriptor<'a> { | |||
pub border_color: Option<wgt::SamplerBorderColor>, | |||
} | |||
|
|||
impl Hash for SamplerDescriptor<'_> { | |||
fn hash<H: Hasher>(&self, state: &mut H) { | |||
self.label.hash(state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dx12 backend doesn't seem to use this, not including it in the hash might get us more cache hits.
If we remove it, we should make this a normal fn (hash_for_dx12
) and move it in the backend instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean there's no possibility that we break the use case of the user creating multiple samplers with different labels expecting to get their unique labels back in the wgpu errors? If you're not 100% sure, I'd be happy to double check it, but it also means that dx12 isn't behaving as expected, no? Feels like it might change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh, I'm not sure if we have to give the labels to the backends in general, and we only do because it makes it easier to debug in case we missed something in wgpu-core and the backend itself is erroring.
For the dx12 backend, I've seen buffers and textures use .SetName
the assign the label but I'm not sure if there is a way to do it for samplers.
@cwfitzgerald is it true that we only give labels to the underlying backends just in case our validation didn't catch an error? And theoretically we could only do it in debug builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're there for two reasons: validation and external tools. Things like renderdoc, PIX, and profilers all use them.
We've long only set names in debug mode, but I think that's a mistake, that's not a decision we get to make.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially as debug labels are quite useful in profilers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@teoxoy I still think it could make sense to move the hasher into the dx12 backend, makes it more clear that it's only being used there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've long only set names in debug mode, but I think that's a mistake, that's not a decision we get to make.
I see, ok.
I feel like it's best to include the label in the hash then and add a comment to the hash function telling the user not to add too many different labels
For sure, if we pass them to dx12, but as of right now I couldn't find where/if we do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok when I pass over this again I'll try to get a definitive answer to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it's best to include the label in the hash then and add a comment to the hash function telling the user not to add too many different labels
Here I actually disagree - with samplers being such a scarce resource, we should try to de-duplicate as much as possible. Especially when the cause could be completely unintuitive.
In an ideal world, we could make the label assigned to the d3d resource the union of the names of all of the actual wgpu samplers using it. That means when you get a cache hit, you append the new label onto the existing sampler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me
wgpu-hal/src/dx12/device.rs
Outdated
@@ -1099,6 +1114,7 @@ impl crate::Device<super::Api> for super::Device { | |||
if let Some(ref mut inner) = cpu_samplers { | |||
inner.stage.clear(); | |||
} | |||
let mut sampler_hashes = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can create a new hash instead and accumulate all the sampler hashes in it.
wgpu-hal/src/dx12/mod.rs
Outdated
@@ -262,6 +262,7 @@ pub struct Device { | |||
null_rtv_handle: descriptor::Handle, | |||
mem_allocator: Option<Mutex<suballocation::GpuAllocatorWrapper>>, | |||
dxc_container: Option<shader_compilation::DxcContainer>, | |||
uploaded_sampler_handles: Mutex<HashMap<Vec<u64>, (descriptor::DualHandle, usize)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uploaded_sampler_handles: Mutex<HashMap<Vec<u64>, (descriptor::DualHandle, usize)>>, | |
uploaded_sampler_bundle: Mutex<HashMap<SamplerBundleHash, Rc<(SamplerBundleHash, descriptor::DualHandle)>>>, |
where
type SamplerBundleHash = u64;
wgpu-hal/src/dx12/mod.rs
Outdated
@@ -512,7 +514,7 @@ enum BufferViewKind { | |||
#[derive(Debug)] | |||
pub struct BindGroup { | |||
handle_views: Option<descriptor::DualHandle>, | |||
handle_samplers: Option<descriptor::DualHandle>, | |||
handle_samplers: Option<(descriptor::DualHandle, Vec<u64>)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle_samplers: Option<(descriptor::DualHandle, Vec<u64>)>, | |
handle_samplers: Option<Rc<(SamplerBundleHash, descriptor::DualHandle)>>, |
@@ -98,6 +98,7 @@ By @teoxoy in [#3436](https://github.com/gfx-rs/wgpu/pull/3436) | |||
#### DX12 | |||
|
|||
- Fix DXC validation issues when using a custom `dxil_path`. By @Elabajaba in [#3434](https://github.com/gfx-rs/wgpu/pull/3434) | |||
- Reuse equivalent samplers in dx12 to help stay under the limit of 2048 per heap. By @Davidster in [#3499](https://github.com/gfx-rs/wgpu/pull/3499) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is accurate.
With this PR, samplers are only reused in the following situation:
Across bind groups if 2 or more bind groups have the same number of samplers with the same descriptors.
Thanks for the comments! And your concern is fair, I'll see if I can find a way to make the samplers be reused regardless of the way they're used in the bind groups |
Going to close as we now have a new plan |
Checklist
cargo clippy
.RUSTFLAGS=--cfg=web_sys_unstable_apis cargo clippy --target wasm32-unknown-unknown
if applicable.Connections
#3350
Description
As mentioned in the linked issue, the sampler count is limited to 2048 in dx12 and will throw an out of memory error if exceeded. I encountered this in my project when loading a lot of textures. My current approach to solving it is to hash the samplers as they're created, and keep a hash map of which sets of samplers have been uploaded to the gpu (including a ref count) so they're only uploaded once and reused across different bind groups if the hashes match.
Testing
I tested it by wiring it up to my project and verifying that the out of memory error no longer occurred. I also double-checked that
uploaded_sampler_handles
gets completely emptied when the user of wgpu drops all of their bind groups.