-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SSBOs for bindless masking #5952
Conversation
@smoogipoo can you explain the test grid in the OP? I'm not sure what |
I'll adjust it a bit later, but I intend to test more systems until this is merged. |
If i was to test, are there any specific test scenes i should test against? |
You could try against |
This needs some non-minor conflict resolution.. |
It locally automerged for me actually? Think it was just github getting confused again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial read-through. There are some other parts I'm not 100% on, but that's due to my own lack of knowledge (the ToScissorSpace
thing is flying over my head a little bit; will either need to read up more, or organoleptically observe that nothing is broken and let it go).
@@ -149,6 +149,7 @@ protected virtual void Dispose(bool disposing) | |||
public static class VertexShaderDescriptor | |||
{ | |||
public const string TEXTURE_2 = "Texture2D"; | |||
public const string TEXTURE_2_NO_MASKING = "Texture2D_NoMasking"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my main question in regard to this would be - is anyone else going to need using this, and if so, when? It's not terribly clear to me - if I were to guess, I'd say that this is special-casing BufferedContainer
and nothing else should need to use this. Is that right?
Among other reasons, I'm asking to gauge the performance tradeoff here of presumably incurring more batch flushes due to shader rebinds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is mostly just special-casing BufferedContainer
. Didn't realise that until now.
As for performance, BufferedContainer
has to break the batch at some point anyway (for as long as its drawing to the FBO), so I don't think that's an issue.
If there ever comes up a case where a user is making a custom shader, they'd run into a shader bind anyway, so I can't imagine they'd lose performance from this either.
I'd really like to remove this at some point in the future though, just need to figure out a better strategy for handling the case where one of the two stages don't use the resource.
Source looks ready. Pending platform testing on my side (android being the core one). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been through this multiple times, so will just leave my approval here.
Tested on macOS (intel / apple) and Windows.
Will try to figure it out myself right now but not sure where that'll get me... |
Unfortunately I'm basically stuck. Details / what I've tried:
Dunno where to take this further. It doesn't help that renderdoc barely works (I can get it to capture, but I can't actually get the command lists to show their contents after the capture is taken). I'm leaving this be for now in hopes that @smoogipoo can throw an idea or two my way. The "consolation" is that android works... but performance gains are marginal if any. |
Concerning that I can't reproduce this :/ Can you debug into the first call to
And check what uniform/texture layouts you have. This is the layout I get, which shows the RTV (if I'm understanding the one it's referring to) is bound to slot 2 for me: The above may not lead to anything because the cross compiler should be platform agnostic (you should get the same results as me), but it's still interesting nonetheless - if it's different it may indicate potentially something going wrong with the cache? To double check you could also try renaming your It could be that after D3D11 compiles the shader it notices |
Looks the exact same here.
I've nuked my shader cache multiple times when debugging this to no avail, since that was basically the best way I know of to extract the cross-compiled hlsl source to look at it. No dice. I'll try the "optimised out" angle I guess. |
If it's getting optimised out it's not at the cross compile step I don't think. Below attached is the output of hlsl cross-compilation. cbuffer g_GlobalUniforms : register(b0)
{
uint _139_g_BackbufferDraw : packoffset(c0);
uint _139_g_IsDepthRangeZeroToOne : packoffset(c0.y);
uint _139_g_IsClipSpaceYInverted : packoffset(c0.z);
uint _139_g_IsUvOriginTopLeft : packoffset(c0.w);
row_major float4x4 _139_g_ProjMatrix : packoffset(c1);
int _139_g_WrapModeS : packoffset(c5);
int _139_g_WrapModeT : packoffset(c5.y);
};
cbuffer m_BlurParameters : register(b1)
{
float2 _163_g_TexSize : packoffset(c0);
int _163_g_Radius : packoffset(c0.z);
float _163_g_Sigma : packoffset(c0.w);
float2 _163_g_BlurDirection : packoffset(c1);
};
Texture2D<float4> m_Texture : register(t0);
SamplerState m_Sampler : register(s0);
static float4 o_Colour;
static float2 v_TexCoord;
static float2 _unused_output_1;
static float2 _unused_input_0;
static float4 _unused_output_2;
static float4 _unused_input_1;
static float4 _unused_output_3;
static float4 _unused_input_3;
static float2 _unused_output_4;
static float2 _unused_input_4;
struct SPIRV_Cross_Input
{
float2 _unused_input_0 : TEXCOORD0;
float4 _unused_input_1 : TEXCOORD1;
float2 v_TexCoord : TEXCOORD2;
float4 _unused_input_3 : TEXCOORD3;
float2 _unused_input_4 : TEXCOORD4;
};
struct SPIRV_Cross_Output
{
float4 o_Colour : SV_Target0;
float2 _unused_output_1 : SV_Target1;
float4 _unused_output_2 : SV_Target2;
float4 _unused_output_3 : SV_Target3;
float2 _unused_output_4 : SV_Target4;
};
float computeGauss(float x, float sigma)
{
return (0.3989399969577789306640625f * exp((((-0.5f) * x) * x) / (sigma * sigma))) / sigma;
}
float4 blur(int radius, float2 direction, float2 texCoord, float2 texSize, float sigma)
{
float param = 0.0f;
float param_1 = sigma;
float factor = computeGauss(param, param_1);
float4 sum = m_Texture.Sample(m_Sampler, texCoord) * factor;
float totalFactor = factor;
for (int i = 2; i <= 200; i += 2)
{
float x = float(i) - 0.5f;
float param_2 = x;
float param_3 = sigma;
factor = computeGauss(param_2, param_3) * 2.0f;
totalFactor += (2.0f * factor);
sum += (m_Texture.Sample(m_Sampler, texCoord + ((direction * x) / texSize)) * factor);
sum += (m_Texture.Sample(m_Sampler, texCoord - ((direction * x) / texSize)) * factor);
if (i >= radius)
{
break;
}
}
float one = float((_139_g_BackbufferDraw != 0u) ? 0 : 1);
float4 _155 = (sum / totalFactor.xxxx) * one;
return _155;
}
void _internal_real_main()
{
int param = _163_g_Radius;
float2 param_1 = _163_g_BlurDirection;
float2 param_2 = v_TexCoord;
float2 param_3 = _163_g_TexSize;
float param_4 = _163_g_Sigma;
o_Colour = blur(param, param_1, param_2, param_3, param_4);
}
void frag_main()
{
_internal_real_main();
_unused_output_1 = _unused_input_0;
_unused_output_2 = _unused_input_1;
_unused_output_3 = _unused_input_3;
_unused_output_4 = _unused_input_4;
}
SPIRV_Cross_Output main(SPIRV_Cross_Input stage_input)
{
v_TexCoord = stage_input.v_TexCoord;
_unused_input_0 = stage_input._unused_input_0;
_unused_input_1 = stage_input._unused_input_1;
_unused_input_3 = stage_input._unused_input_3;
_unused_input_4 = stage_input._unused_input_4;
frag_main();
SPIRV_Cross_Output stage_output;
stage_output.o_Colour = o_Colour;
stage_output._unused_output_1 = _unused_output_1;
stage_output._unused_output_2 = _unused_output_2;
stage_output._unused_output_3 = _unused_output_3;
stage_output._unused_output_4 = _unused_output_4;
return stage_output;
} The |
Yeah, if anything it would be at a D3D11 compiler level, not cross-compile. But even then I struggle to imagine how it could infer that. Debugging this is probably going to require going into Veldrid itself... |
There is circumstantial crude evidence that this could be the case. Namely that:
|
After extensive debugging it appears that this PR is not the cause of the aforementioned crash as it also happens on In light of this, unblocking and proceeding with merge. |
Prereqs:
DepthWrappingVertex<T>
#5943Can be tested with the following:
Currently, the UBO fallback is set at a measly 60 elements to keep the total struct size under 16KiB (see #5950 for reasoning). This can be improved in the future if this buffer is split up further - not all maskings have a corner radius, not all have a border, not all have edge effects, etc.
I'm not sure how to better do the vertices at this point - I'm all ears if anyone has any genius ideas for how to handle them amicably - consider they'll need to store more indices if the above is done in the future.
This PR does not address single-pixel scissoring issues. That will be done in a followup PR to not complicate things further.
OSU_GRAPHICS_NO_SSBO=1
OSU_GRAPHICS_NO_SSBO=1
OSU_GRAPHICS_NO_SSBO=1
OSU_GRAPHICS_NO_SSBO=1
OSU_GRAPHICS_NO_SSBO=1
OSU_GRAPHICS_NO_SSBO=1
vNext
Maskable vertex input layout has changed
Vertices which want to use the built-in
VertexShaderDescriptor.TEXTURE
fragment shader, or thesh_Masking.h
particle, will need to adjust the vertex input slightly:Likewise, the C# definition must also be updated to write the
m_MaskingIndex
input, and be constructed with anIRenderer
:Non-masking fragment shaders must use non-masking vertex shaders
Any usages of
VertexShaderDescriptor.TEXTURE_2
as a vertex shader in-combination with a non-masking fragment shader (i.e. one that does not use the built-insh_Masking.h
particle), should be updated to useVertexShaderDescriptor.TEXTURE_2_NO_MASKING
instead.IRenderer.PushScissorOffset()
/IRenderer.PopScissorOffset()
is has been removedThe way masking applies scissor has changed such that this no longer has any use.
MaskingInfo
layout has changedScreenSpaceAABB
renamed toScreenSpaceScissorArea
.MaskingRect
renamed toMaskingArea
ToScissorSpace
added. IfScreenSpaceScissorArea
truly represents a screen-space area, set this toMatrix3.Identity
, otherwise set it to convert vertex coordinate inputs to the appropriate coordinate space ofScreenSpaceScissorArea
.