Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SSBOs for bindless masking #5952

Merged
merged 18 commits into from
Aug 21, 2023
Merged

Use SSBOs for bindless masking #5952

merged 18 commits into from
Aug 21, 2023

Conversation

smoogipoo
Copy link
Contributor

@smoogipoo smoogipoo commented Jul 28, 2023

Prereqs:

Can be tested with the following:

Currently, the UBO fallback is set at a measly 60 elements to keep the total struct size under 16KiB (see #5950 for reasoning). This can be improved in the future if this buffer is split up further - not all maskings have a corner radius, not all have a border, not all have edge effects, etc.

I'm not sure how to better do the vertices at this point - I'm all ears if anyone has any genius ideas for how to handle them amicably - consider they'll need to store more indices if the above is done in the future.

This PR does not address single-pixel scissoring issues. That will be done in a followup PR to not complicate things further.

Tests passed Platform Renderer Surface Environment
1/1 x86 gl gl
1/1 x86 gl gl OSU_GRAPHICS_NO_SSBO=1
1/1 x86 veldrid d3d11
1/1 x86 veldrid d3d11 OSU_GRAPHICS_NO_SSBO=1
0/0 x86 veldrid metal
0/0 x86 veldrid metal OSU_GRAPHICS_NO_SSBO=1
0/0 M1/2 veldrid metal
0/0 M1/2 veldrid metal OSU_GRAPHICS_NO_SSBO=1
0/0 iOS veldrid metal
0/0 iOS veldrid metal OSU_GRAPHICS_NO_SSBO=1
0/0 Android gl gl
0/0 Android gl gl OSU_GRAPHICS_NO_SSBO=1

vNext

Maskable vertex input layout has changed

Vertices which want to use the built-in VertexShaderDescriptor.TEXTURE fragment shader, or the sh_Masking.h particle, will need to adjust the vertex input slightly:

+ #include "Internal/sh_MaskingInfo.h"

layout(location = 0) in vec2 m_Position;
+ layout(location = 1) in int m_MaskingIndex;

+ layout(location = 5) flat out int v_MaskingIndex;
+ layout(location = 6) out highp vec2 v_ScissorPosition;

void main(void)
{
+   InitMasking(m_MaskingIndex);

    // Transform from screen space to masking space.
-   highp vec3 maskingPos = g_ToMaskingSpace * vec3(m_Position, 1.0);
+   highp vec4 maskingPos = g_MaskingInfo.ToMaskingSpace * vec4(m_Position, 1.0, 0.0);
    v_MaskingPosition = maskingPos.xy / maskingPos.z;

+   // Transform from screen space to scissor space.
+   highp vec4 scissorPos = g_MaskingInfo.ToScissorSpace * vec4(m_Position, 1.0, 0.0);
+   v_ScissorPosition = scissorPos.xy / scissorPos.z;

+   v_MaskingIndex = m_MaskingIndex;

    gl_Position = g_ProjMatrix * vec4(m_Position, 1.0, 1.0);
}

Likewise, the C# definition must also be updated to write the m_MaskingIndex input, and be constructed with an IRenderer:

[StructLayout(LayoutKind.Sequential)]
public struct MyCustomVertex : IEquatable<MyCustomVertex>, IVertex
{
    [VertexMember(2, VertexAttribPointerType.Float)]
    public Vector2 Position;

+   [VertexMember(1, VertexAttribPointerType.Int)]
+   private readonly int maskingIndex;

+   public MyCustomVertex(IRenderer renderer)
+   {
+       this = default;
+       maskingIndex = renderer.CurrentMaskingIndex;
+   }

    public readonly bool Equals(MyCustomVertex other) =>
        Position.Equals(other.Position)
+       && maskingIndex == other.maskingIndex;
}

Non-masking fragment shaders must use non-masking vertex shaders

Any usages of VertexShaderDescriptor.TEXTURE_2 as a vertex shader in-combination with a non-masking fragment shader (i.e. one that does not use the built-in sh_Masking.h particle), should be updated to use VertexShaderDescriptor.TEXTURE_2_NO_MASKING instead.

IRenderer.PushScissorOffset()/IRenderer.PopScissorOffset() is has been removed

The way masking applies scissor has changed such that this no longer has any use.

MaskingInfo layout has changed

  • ScreenSpaceAABB renamed to ScreenSpaceScissorArea.
  • MaskingRect renamed to MaskingArea
  • ToScissorSpace added. If ScreenSpaceScissorArea truly represents a screen-space area, set this to Matrix3.Identity, otherwise set it to convert vertex coordinate inputs to the appropriate coordinate space of ScreenSpaceScissorArea.

@peppy
Copy link
Member

peppy commented Jul 29, 2023

@smoogipoo can you explain the test grid in the OP? I'm not sure what 0/0 means for instance.

@smoogipoo
Copy link
Contributor Author

Number of tests passed / total number of tested systems

I'll adjust it a bit later, but I intend to test more systems until this is merged.

@peppy
Copy link
Member

peppy commented Jul 29, 2023

If i was to test, are there any specific test scenes i should test against?

@smoogipoo
Copy link
Contributor Author

smoogipoo commented Jul 29, 2023

You could try against TestSceneScissor, TestSceneEffects, and TestSceneMasking. Though I'm not sure whether they're an extensive test of everything that can also be seen in osu!.

@peppy
Copy link
Member

peppy commented Aug 15, 2023

This needs some non-minor conflict resolution..

@bdach
Copy link
Collaborator

bdach commented Aug 15, 2023

This needs some non-minor conflict resolution..

It locally automerged for me actually? Think it was just github getting confused again.

Copy link
Collaborator

@bdach bdach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial read-through. There are some other parts I'm not 100% on, but that's due to my own lack of knowledge (the ToScissorSpace thing is flying over my head a little bit; will either need to read up more, or organoleptically observe that nothing is broken and let it go).

osu.Framework/Graphics/BufferedDrawNode.cs Outdated Show resolved Hide resolved
@@ -149,6 +149,7 @@ protected virtual void Dispose(bool disposing)
public static class VertexShaderDescriptor
{
public const string TEXTURE_2 = "Texture2D";
public const string TEXTURE_2_NO_MASKING = "Texture2D_NoMasking";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my main question in regard to this would be - is anyone else going to need using this, and if so, when? It's not terribly clear to me - if I were to guess, I'd say that this is special-casing BufferedContainer and nothing else should need to use this. Is that right?

Among other reasons, I'm asking to gauge the performance tradeoff here of presumably incurring more batch flushes due to shader rebinds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is mostly just special-casing BufferedContainer. Didn't realise that until now.

As for performance, BufferedContainer has to break the batch at some point anyway (for as long as its drawing to the FBO), so I don't think that's an issue.

If there ever comes up a case where a user is making a custom shader, they'd run into a shader bind anyway, so I can't imagine they'd lose performance from this either.

I'd really like to remove this at some point in the future though, just need to figure out a better strategy for handling the case where one of the two stages don't use the resource.

@bdach
Copy link
Collaborator

bdach commented Aug 19, 2023

Source looks ready. Pending platform testing on my side (android being the core one).

Copy link
Member

@peppy peppy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been through this multiple times, so will just leave my approval here.

Tested on macOS (intel / apple) and Windows.

@bdach
Copy link
Collaborator

bdach commented Aug 20, 2023

TestSceneEffects hard crashes on D3D11 here. ETW trace says:

ID3D11DeviceContext::DrawIndexed: The Pixel Shader expects a Render Target View bound to slot 1,
but none is bound. This is OK, as writes of an unbound Render Target View are discarded. It is also
possible the developer knows the data will not be used anyway. This is only a problem if the developer
actually intended to bind a Render Target View here.

Will try to figure it out myself right now but not sure where that'll get me...

@bdach bdach added the blocked label Aug 20, 2023
@bdach
Copy link
Collaborator

bdach commented Aug 20, 2023

Unfortunately I'm basically stuck.

Details / what I've tried:

  • Any usage of the blur effect triggers the above crash.
  • Stepping through the uniform binding code it looks like everything is in order.
  • HLSL output of the cross compilation looks correct and matches the uniform bindings.
  • Toggling OSU_GRAPHICS_NO_SSBO=1 has no effect.
  • Trying to switch blur to be using the masking vertex shader has no effect.
  • Explicitly declaring the blur param uniform structure in the non-masking vertex shader has no effect.

Dunno where to take this further. It doesn't help that renderdoc barely works (I can get it to capture, but I can't actually get the command lists to show their contents after the capture is taken).

I'm leaving this be for now in hopes that @smoogipoo can throw an idea or two my way.

The "consolation" is that android works... but performance gains are marginal if any.

@smoogipoo
Copy link
Contributor Author

Concerning that I can't reproduce this :/ Can you debug into the first call to VeldridRenderer.DrawVertices after breaking on

renderer.DrawFrameBuffer(current, new RectangleF(0, 0, current.Texture.Width, current.Texture.Height), ColourInfo.SingleColour(Color4.White));

And check what uniform/texture layouts you have. This is the layout I get, which shows the RTV (if I'm understanding the one it's referring to) is bound to slot 2 for me:

image

The above may not lead to anything because the cross compiler should be platform agnostic (you should get the same results as me), but it's still interesting nonetheless - if it's different it may indicate potentially something going wrong with the cache? To double check you could also try renaming your cache dir.

It could be that after D3D11 compiles the shader it notices g_GlobalUniforms is not used in the fragment shader, despite our hack that is intended for Metal...

@bdach
Copy link
Collaborator

bdach commented Aug 21, 2023

Can you debug into the first call to VeldridRenderer.DrawVertices after breaking on

renderer.DrawFrameBuffer(current, new RectangleF(0, 0, current.Texture.Width, current.Texture.Height), ColourInfo.SingleColour(Color4.White));

Looks the exact same here.

image

To double check you could also try renaming your cache dir.

I've nuked my shader cache multiple times when debugging this to no avail, since that was basically the best way I know of to extract the cross-compiled hlsl source to look at it. No dice.

I'll try the "optimised out" angle I guess.

@bdach
Copy link
Collaborator

bdach commented Aug 21, 2023

If it's getting optimised out it's not at the cross compile step I don't think. Below attached is the output of hlsl cross-compilation.

cbuffer g_GlobalUniforms : register(b0)
{
    uint _139_g_BackbufferDraw : packoffset(c0);
    uint _139_g_IsDepthRangeZeroToOne : packoffset(c0.y);
    uint _139_g_IsClipSpaceYInverted : packoffset(c0.z);
    uint _139_g_IsUvOriginTopLeft : packoffset(c0.w);
    row_major float4x4 _139_g_ProjMatrix : packoffset(c1);
    int _139_g_WrapModeS : packoffset(c5);
    int _139_g_WrapModeT : packoffset(c5.y);
};

cbuffer m_BlurParameters : register(b1)
{
    float2 _163_g_TexSize : packoffset(c0);
    int _163_g_Radius : packoffset(c0.z);
    float _163_g_Sigma : packoffset(c0.w);
    float2 _163_g_BlurDirection : packoffset(c1);
};

Texture2D<float4> m_Texture : register(t0);
SamplerState m_Sampler : register(s0);

static float4 o_Colour;
static float2 v_TexCoord;
static float2 _unused_output_1;
static float2 _unused_input_0;
static float4 _unused_output_2;
static float4 _unused_input_1;
static float4 _unused_output_3;
static float4 _unused_input_3;
static float2 _unused_output_4;
static float2 _unused_input_4;

struct SPIRV_Cross_Input
{
    float2 _unused_input_0 : TEXCOORD0;
    float4 _unused_input_1 : TEXCOORD1;
    float2 v_TexCoord : TEXCOORD2;
    float4 _unused_input_3 : TEXCOORD3;
    float2 _unused_input_4 : TEXCOORD4;
};

struct SPIRV_Cross_Output
{
    float4 o_Colour : SV_Target0;
    float2 _unused_output_1 : SV_Target1;
    float4 _unused_output_2 : SV_Target2;
    float4 _unused_output_3 : SV_Target3;
    float2 _unused_output_4 : SV_Target4;
};

float computeGauss(float x, float sigma)
{
    return (0.3989399969577789306640625f * exp((((-0.5f) * x) * x) / (sigma * sigma))) / sigma;
}

float4 blur(int radius, float2 direction, float2 texCoord, float2 texSize, float sigma)
{
    float param = 0.0f;
    float param_1 = sigma;
    float factor = computeGauss(param, param_1);
    float4 sum = m_Texture.Sample(m_Sampler, texCoord) * factor;
    float totalFactor = factor;
    for (int i = 2; i <= 200; i += 2)
    {
        float x = float(i) - 0.5f;
        float param_2 = x;
        float param_3 = sigma;
        factor = computeGauss(param_2, param_3) * 2.0f;
        totalFactor += (2.0f * factor);
        sum += (m_Texture.Sample(m_Sampler, texCoord + ((direction * x) / texSize)) * factor);
        sum += (m_Texture.Sample(m_Sampler, texCoord - ((direction * x) / texSize)) * factor);
        if (i >= radius)
        {
            break;
        }
    }
    float one = float((_139_g_BackbufferDraw != 0u) ? 0 : 1);
    float4 _155 = (sum / totalFactor.xxxx) * one;
    return _155;
}

void _internal_real_main()
{
    int param = _163_g_Radius;
    float2 param_1 = _163_g_BlurDirection;
    float2 param_2 = v_TexCoord;
    float2 param_3 = _163_g_TexSize;
    float param_4 = _163_g_Sigma;
    o_Colour = blur(param, param_1, param_2, param_3, param_4);
}

void frag_main()
{
    _internal_real_main();
    _unused_output_1 = _unused_input_0;
    _unused_output_2 = _unused_input_1;
    _unused_output_3 = _unused_input_3;
    _unused_output_4 = _unused_input_4;
}

SPIRV_Cross_Output main(SPIRV_Cross_Input stage_input)
{
    v_TexCoord = stage_input.v_TexCoord;
    _unused_input_0 = stage_input._unused_input_0;
    _unused_input_1 = stage_input._unused_input_1;
    _unused_input_3 = stage_input._unused_input_3;
    _unused_input_4 = stage_input._unused_input_4;
    frag_main();
    SPIRV_Cross_Output stage_output;
    stage_output.o_Colour = o_Colour;
    stage_output._unused_output_1 = _unused_output_1;
    stage_output._unused_output_2 = _unused_output_2;
    stage_output._unused_output_3 = _unused_output_3;
    stage_output._unused_output_4 = _unused_output_4;
    return stage_output;
}

The _139_g_BackbufferDraw branch seems to be still alive and well.

@smoogipoo
Copy link
Contributor Author

Yeah, if anything it would be at a D3D11 compiler level, not cross-compile. But even then I struggle to imagine how it could infer that.

Debugging this is probably going to require going into Veldrid itself...

@bdach
Copy link
Collaborator

bdach commented Aug 21, 2023

Yeah, if anything it would be at a D3D11 compiler level, not cross-compile.

There is circumstantial crude evidence that this could be the case. Namely that:

  • blurShader = shaders.Load(VertexShaderDescriptor.TEXTURE_2_NO_MASKING, FragmentShaderDescriptor.BLUR); crashes
  • blurShader = shaders.Load(VertexShaderDescriptor.TEXTURE_2, FragmentShaderDescriptor.BLUR); (with no further changes anywhere else) crashes
  • blurShader = shaders.Load(VertexShaderDescriptor.TEXTURE_2, FragmentShaderDescriptor.TEXTURE); (with no further changes anywhere else) works

@bdach
Copy link
Collaborator

bdach commented Aug 21, 2023

After extensive debugging it appears that this PR is not the cause of the aforementioned crash as it also happens on master on my PC. Looks to be caused by #5687 specifically.

In light of this, unblocking and proceeding with merge.

@bdach bdach enabled auto-merge August 21, 2023 12:06
@bdach bdach removed the blocked label Aug 21, 2023
@bdach bdach merged commit 844a9ee into ppy:master Aug 21, 2023
@smoogipoo smoogipoo deleted the masking-ssbo-2 branch September 11, 2023 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants