reuse equivalent samplers in dx12 to help stay under the limit of 2048 #3499

Davidster · 2023-02-19T05:28:30Z

Checklist

Run cargo clippy.
Run RUSTFLAGS=--cfg=web_sys_unstable_apis cargo clippy --target wasm32-unknown-unknown if applicable.
Add change to CHANGELOG.md. See simple instructions inside file.

Connections
#3350

Description
As mentioned in the linked issue, the sampler count is limited to 2048 in dx12 and will throw an out of memory error if exceeded. I encountered this in my project when loading a lot of textures. My current approach to solving it is to hash the samplers as they're created, and keep a hash map of which sets of samplers have been uploaded to the gpu (including a ref count) so they're only uploaded once and reused across different bind groups if the hashes match.

Testing
I tested it by wiring it up to my project and verifying that the out of memory error no longer occurred. I also double-checked that uploaded_sampler_handles gets completely emptied when the user of wgpu drops all of their bind groups.

…8 active samplers

teoxoy

Thank you for the PR!

I have a couple of concerns with the behavior introduced.

I think the caching strategy is too specific ("reusing samplers only across bind groups if 2 or more bind groups have the same number of samplers with the same descriptors").

I think it would be better to instead cache the underlying dx12 samplers at creation time.
Besides anisotropy_clamp and lod_clamp (both of which are f32s) the other properties of a sampler are enums which should allow us to stay below the 2048 limit since I can't imagine people creating samplers with widely different values for anisotropy_clamp and lod_clamp.

teoxoy · 2023-02-20T17:16:27Z

wgpu-hal/src/lib.rs

@@ -939,6 +940,25 @@ pub struct SamplerDescriptor<'a> {
    pub border_color: Option<wgt::SamplerBorderColor>,
 }

+impl Hash for SamplerDescriptor<'_> {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        self.label.hash(state);


The dx12 backend doesn't seem to use this, not including it in the hash might get us more cache hits.
If we remove it, we should make this a normal fn (hash_for_dx12) and move it in the backend instead.

Does that mean there's no possibility that we break the use case of the user creating multiple samplers with different labels expecting to get their unique labels back in the wgpu errors? If you're not 100% sure, I'd be happy to double check it, but it also means that dx12 isn't behaving as expected, no? Feels like it might change in the future.

Tbh, I'm not sure if we have to give the labels to the backends in general, and we only do because it makes it easier to debug in case we missed something in wgpu-core and the backend itself is erroring.

For the dx12 backend, I've seen buffers and textures use .SetName the assign the label but I'm not sure if there is a way to do it for samplers.

@cwfitzgerald is it true that we only give labels to the underlying backends just in case our validation didn't catch an error? And theoretically we could only do it in debug builds?

They're there for two reasons: validation and external tools. Things like renderdoc, PIX, and profilers all use them.

We've long only set names in debug mode, but I think that's a mistake, that's not a decision we get to make.

Especially as debug labels are quite useful in profilers

@teoxoy I still think it could make sense to move the hasher into the dx12 backend, makes it more clear that it's only being used there.

We've long only set names in debug mode, but I think that's a mistake, that's not a decision we get to make.

I see, ok.

I feel like it's best to include the label in the hash then and add a comment to the hash function telling the user not to add too many different labels

For sure, if we pass them to dx12, but as of right now I couldn't find where/if we do that.

Ok when I pass over this again I'll try to get a definitive answer to that

I feel like it's best to include the label in the hash then and add a comment to the hash function telling the user not to add too many different labels

Here I actually disagree - with samplers being such a scarce resource, we should try to de-duplicate as much as possible. Especially when the cause could be completely unintuitive.

In an ideal world, we could make the label assigned to the d3d resource the union of the names of all of the actual wgpu samplers using it. That means when you get a cache hit, you append the new label onto the existing sampler.

Sounds fine to me

teoxoy · 2023-02-20T17:41:35Z

wgpu-hal/src/dx12/device.rs

@@ -1099,6 +1114,7 @@ impl crate::Device<super::Api> for super::Device {
        if let Some(ref mut inner) = cpu_samplers {
            inner.stage.clear();
        }
+        let mut sampler_hashes = Vec::new();


We can create a new hash instead and accumulate all the sampler hashes in it.

teoxoy · 2023-02-20T17:53:22Z

wgpu-hal/src/dx12/mod.rs

@@ -262,6 +262,7 @@ pub struct Device {
    null_rtv_handle: descriptor::Handle,
    mem_allocator: Option<Mutex<suballocation::GpuAllocatorWrapper>>,
    dxc_container: Option<shader_compilation::DxcContainer>,
+    uploaded_sampler_handles: Mutex<HashMap<Vec<u64>, (descriptor::DualHandle, usize)>>,


Suggested change

uploaded_sampler_handles: Mutex<HashMap<Vec<u64>, (descriptor::DualHandle, usize)>>,

uploaded_sampler_bundle: Mutex<HashMap<SamplerBundleHash, Rc<(SamplerBundleHash, descriptor::DualHandle)>>>,

where

type SamplerBundleHash = u64;

teoxoy · 2023-02-20T17:53:44Z

wgpu-hal/src/dx12/mod.rs

@@ -512,7 +514,7 @@ enum BufferViewKind {
 #[derive(Debug)]
 pub struct BindGroup {
    handle_views: Option<descriptor::DualHandle>,
-    handle_samplers: Option<descriptor::DualHandle>,
+    handle_samplers: Option<(descriptor::DualHandle, Vec<u64>)>,


Suggested change

handle_samplers: Option<(descriptor::DualHandle, Vec<u64>)>,

handle_samplers: Option<Rc<(SamplerBundleHash, descriptor::DualHandle)>>,

teoxoy · 2023-02-20T17:59:55Z

CHANGELOG.md

@@ -98,6 +98,7 @@ By @teoxoy in [#3436](https://github.com/gfx-rs/wgpu/pull/3436)
 #### DX12

 - Fix DXC validation issues when using a custom `dxil_path`. By @Elabajaba in [#3434](https://github.com/gfx-rs/wgpu/pull/3434)
+- Reuse equivalent samplers in dx12 to help stay under the limit of 2048 per heap. By @Davidster in [#3499](https://github.com/gfx-rs/wgpu/pull/3499)


I don't think this is accurate.
With this PR, samplers are only reused in the following situation:
Across bind groups if 2 or more bind groups have the same number of samplers with the same descriptors.

Davidster · 2023-02-20T18:39:08Z

Thanks for the comments! And your concern is fair, I'll see if I can find a way to make the samplers be reused regardless of the way they're used in the bind groups

…less of how theyre used in bind groups

cwfitzgerald · 2023-04-05T22:47:06Z

Going to close as we now have a new plan

Davidster added 2 commits February 19, 2023 00:16

reuse equivalent samplers in dx12 to help stay under the limit of 204…

fbecae8

…8 active samplers

update changelog

a9e2293

teoxoy requested changes Feb 20, 2023

View reviewed changes

Davidster added 3 commits March 1, 2023 22:07

don't hash label for sampler descriptor hashes; cache samplers regard…

f20694c

…less of how theyre used in bind groups

move sampler descriptor hasher into dx12 module

237c997

attempt to use arc instead of manual ref count thing

b0f3550

Davidster mentioned this pull request Mar 9, 2023

DX12 Sampler Heap Requirements Overly Strict #3350

Closed

cwfitzgerald closed this Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reuse equivalent samplers in dx12 to help stay under the limit of 2048 #3499

reuse equivalent samplers in dx12 to help stay under the limit of 2048 #3499

Davidster commented Feb 19, 2023 •

edited

Loading

teoxoy left a comment •

edited

Loading

teoxoy Feb 20, 2023

Davidster Feb 20, 2023

teoxoy Feb 20, 2023

cwfitzgerald Feb 20, 2023

cwfitzgerald Feb 20, 2023

Davidster Feb 20, 2023

teoxoy Feb 20, 2023

Davidster Feb 20, 2023

cwfitzgerald Feb 21, 2023

Davidster Feb 21, 2023

teoxoy Feb 20, 2023

teoxoy Feb 20, 2023

teoxoy Feb 20, 2023

teoxoy Feb 20, 2023

Davidster commented Feb 20, 2023

cwfitzgerald commented Apr 5, 2023

	uploaded_sampler_handles: Mutex<HashMap<Vec<u64>, (descriptor::DualHandle, usize)>>,
	uploaded_sampler_bundle: Mutex<HashMap<SamplerBundleHash, Rc<(SamplerBundleHash, descriptor::DualHandle)>>>,

	handle_samplers: Option<(descriptor::DualHandle, Vec<u64>)>,
	handle_samplers: Option<Rc<(SamplerBundleHash, descriptor::DualHandle)>>,

reuse equivalent samplers in dx12 to help stay under the limit of 2048 #3499

reuse equivalent samplers in dx12 to help stay under the limit of 2048 #3499

Conversation

Davidster commented Feb 19, 2023 • edited Loading

teoxoy left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Davidster commented Feb 20, 2023

cwfitzgerald commented Apr 5, 2023

Davidster commented Feb 19, 2023 •

edited

Loading

teoxoy left a comment •

edited

Loading