Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reuse equivalent samplers in dx12 to help stay under the limit of 2048 #3499

Closed
wants to merge 5 commits into from

Conversation

Davidster
Copy link
Contributor

@Davidster Davidster commented Feb 19, 2023

Checklist

  • Run cargo clippy.
  • Run RUSTFLAGS=--cfg=web_sys_unstable_apis cargo clippy --target wasm32-unknown-unknown if applicable.
  • Add change to CHANGELOG.md. See simple instructions inside file.

Connections
#3350

Description
As mentioned in the linked issue, the sampler count is limited to 2048 in dx12 and will throw an out of memory error if exceeded. I encountered this in my project when loading a lot of textures. My current approach to solving it is to hash the samplers as they're created, and keep a hash map of which sets of samplers have been uploaded to the gpu (including a ref count) so they're only uploaded once and reused across different bind groups if the hashes match.

Testing
I tested it by wiring it up to my project and verifying that the out of memory error no longer occurred. I also double-checked that uploaded_sampler_handles gets completely emptied when the user of wgpu drops all of their bind groups.

Copy link
Member

@teoxoy teoxoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

I have a couple of concerns with the behavior introduced.

I think the caching strategy is too specific ("reusing samplers only across bind groups if 2 or more bind groups have the same number of samplers with the same descriptors").

I think it would be better to instead cache the underlying dx12 samplers at creation time.
Besides anisotropy_clamp and lod_clamp (both of which are f32s) the other properties of a sampler are enums which should allow us to stay below the 2048 limit since I can't imagine people creating samplers with widely different values for anisotropy_clamp and lod_clamp.

@@ -939,6 +940,25 @@ pub struct SamplerDescriptor<'a> {
pub border_color: Option<wgt::SamplerBorderColor>,
}

impl Hash for SamplerDescriptor<'_> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.label.hash(state);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dx12 backend doesn't seem to use this, not including it in the hash might get us more cache hits.
If we remove it, we should make this a normal fn (hash_for_dx12) and move it in the backend instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean there's no possibility that we break the use case of the user creating multiple samplers with different labels expecting to get their unique labels back in the wgpu errors? If you're not 100% sure, I'd be happy to double check it, but it also means that dx12 isn't behaving as expected, no? Feels like it might change in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, I'm not sure if we have to give the labels to the backends in general, and we only do because it makes it easier to debug in case we missed something in wgpu-core and the backend itself is erroring.

For the dx12 backend, I've seen buffers and textures use .SetName the assign the label but I'm not sure if there is a way to do it for samplers.

@cwfitzgerald is it true that we only give labels to the underlying backends just in case our validation didn't catch an error? And theoretically we could only do it in debug builds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're there for two reasons: validation and external tools. Things like renderdoc, PIX, and profilers all use them.

We've long only set names in debug mode, but I think that's a mistake, that's not a decision we get to make.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially as debug labels are quite useful in profilers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@teoxoy I still think it could make sense to move the hasher into the dx12 backend, makes it more clear that it's only being used there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've long only set names in debug mode, but I think that's a mistake, that's not a decision we get to make.

I see, ok.

I feel like it's best to include the label in the hash then and add a comment to the hash function telling the user not to add too many different labels

For sure, if we pass them to dx12, but as of right now I couldn't find where/if we do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok when I pass over this again I'll try to get a definitive answer to that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it's best to include the label in the hash then and add a comment to the hash function telling the user not to add too many different labels

Here I actually disagree - with samplers being such a scarce resource, we should try to de-duplicate as much as possible. Especially when the cause could be completely unintuitive.

In an ideal world, we could make the label assigned to the d3d resource the union of the names of all of the actual wgpu samplers using it. That means when you get a cache hit, you append the new label onto the existing sampler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me

@@ -1099,6 +1114,7 @@ impl crate::Device<super::Api> for super::Device {
if let Some(ref mut inner) = cpu_samplers {
inner.stage.clear();
}
let mut sampler_hashes = Vec::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create a new hash instead and accumulate all the sampler hashes in it.

@@ -262,6 +262,7 @@ pub struct Device {
null_rtv_handle: descriptor::Handle,
mem_allocator: Option<Mutex<suballocation::GpuAllocatorWrapper>>,
dxc_container: Option<shader_compilation::DxcContainer>,
uploaded_sampler_handles: Mutex<HashMap<Vec<u64>, (descriptor::DualHandle, usize)>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uploaded_sampler_handles: Mutex<HashMap<Vec<u64>, (descriptor::DualHandle, usize)>>,
uploaded_sampler_bundle: Mutex<HashMap<SamplerBundleHash, Rc<(SamplerBundleHash, descriptor::DualHandle)>>>,

where

type SamplerBundleHash = u64;

@@ -512,7 +514,7 @@ enum BufferViewKind {
#[derive(Debug)]
pub struct BindGroup {
handle_views: Option<descriptor::DualHandle>,
handle_samplers: Option<descriptor::DualHandle>,
handle_samplers: Option<(descriptor::DualHandle, Vec<u64>)>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
handle_samplers: Option<(descriptor::DualHandle, Vec<u64>)>,
handle_samplers: Option<Rc<(SamplerBundleHash, descriptor::DualHandle)>>,

@@ -98,6 +98,7 @@ By @teoxoy in [#3436](https://github.com/gfx-rs/wgpu/pull/3436)
#### DX12

- Fix DXC validation issues when using a custom `dxil_path`. By @Elabajaba in [#3434](https://github.com/gfx-rs/wgpu/pull/3434)
- Reuse equivalent samplers in dx12 to help stay under the limit of 2048 per heap. By @Davidster in [#3499](https://github.com/gfx-rs/wgpu/pull/3499)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is accurate.
With this PR, samplers are only reused in the following situation:
Across bind groups if 2 or more bind groups have the same number of samplers with the same descriptors.

@Davidster
Copy link
Contributor Author

Thanks for the comments! And your concern is fair, I'll see if I can find a way to make the samplers be reused regardless of the way they're used in the bind groups

@cwfitzgerald
Copy link
Member

Going to close as we now have a new plan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants