Use SSBOs for bindless masking #5952

smoogipoo · 2023-07-28T23:34:28Z

Prereqs:

Can be tested with the following:

Replace veldrid-spirv reference with https://www.nuget.org/packages/smoogipoo.Veldrid.SPIRV/1.0.15-g026aad43a3
https://github.com/smoogipoo/osu/tree/masking-ssbo
https://github.com/smoogipoo/osu-resources/tree/masking-ssbo

Currently, the UBO fallback is set at a measly 60 elements to keep the total struct size under 16KiB (see #5950 for reasoning). This can be improved in the future if this buffer is split up further - not all maskings have a corner radius, not all have a border, not all have edge effects, etc.

I'm not sure how to better do the vertices at this point - I'm all ears if anyone has any genius ideas for how to handle them amicably - consider they'll need to store more indices if the above is done in the future.

This PR does not address single-pixel scissoring issues. That will be done in a followup PR to not complicate things further.

Tests passed	Platform	Renderer	Surface	Environment
1/1	x86	gl	gl
1/1	x86	gl	gl	`OSU_GRAPHICS_NO_SSBO=1`
1/1	x86	veldrid	d3d11
1/1	x86	veldrid	d3d11	`OSU_GRAPHICS_NO_SSBO=1`
0/0	x86	veldrid	metal
0/0	x86	veldrid	metal	`OSU_GRAPHICS_NO_SSBO=1`
0/0	M1/2	veldrid	metal
0/0	M1/2	veldrid	metal	`OSU_GRAPHICS_NO_SSBO=1`
0/0	iOS	veldrid	metal
0/0	iOS	veldrid	metal	`OSU_GRAPHICS_NO_SSBO=1`
0/0	Android	gl	gl
0/0	Android	gl	gl	`OSU_GRAPHICS_NO_SSBO=1`

vNext

Maskable vertex input layout has changed

Vertices which want to use the built-in VertexShaderDescriptor.TEXTURE fragment shader, or the sh_Masking.h particle, will need to adjust the vertex input slightly:

+ #include "Internal/sh_MaskingInfo.h"

layout(location = 0) in vec2 m_Position;
+ layout(location = 1) in int m_MaskingIndex;

+ layout(location = 5) flat out int v_MaskingIndex;
+ layout(location = 6) out highp vec2 v_ScissorPosition;

void main(void)
{
+   InitMasking(m_MaskingIndex);

    // Transform from screen space to masking space.
-   highp vec3 maskingPos = g_ToMaskingSpace * vec3(m_Position, 1.0);
+   highp vec4 maskingPos = g_MaskingInfo.ToMaskingSpace * vec4(m_Position, 1.0, 0.0);
    v_MaskingPosition = maskingPos.xy / maskingPos.z;

+   // Transform from screen space to scissor space.
+   highp vec4 scissorPos = g_MaskingInfo.ToScissorSpace * vec4(m_Position, 1.0, 0.0);
+   v_ScissorPosition = scissorPos.xy / scissorPos.z;

+   v_MaskingIndex = m_MaskingIndex;

    gl_Position = g_ProjMatrix * vec4(m_Position, 1.0, 1.0);
}

Likewise, the C# definition must also be updated to write the m_MaskingIndex input, and be constructed with an IRenderer:

[StructLayout(LayoutKind.Sequential)]
public struct MyCustomVertex : IEquatable<MyCustomVertex>, IVertex
{
    [VertexMember(2, VertexAttribPointerType.Float)]
    public Vector2 Position;

+   [VertexMember(1, VertexAttribPointerType.Int)]
+   private readonly int maskingIndex;

+   public MyCustomVertex(IRenderer renderer)
+   {
+       this = default;
+       maskingIndex = renderer.CurrentMaskingIndex;
+   }

    public readonly bool Equals(MyCustomVertex other) =>
        Position.Equals(other.Position)
+       && maskingIndex == other.maskingIndex;
}

Non-masking fragment shaders must use non-masking vertex shaders

Any usages of VertexShaderDescriptor.TEXTURE_2 as a vertex shader in-combination with a non-masking fragment shader (i.e. one that does not use the built-in sh_Masking.h particle), should be updated to use VertexShaderDescriptor.TEXTURE_2_NO_MASKING instead.

`IRenderer.PushScissorOffset()`/`IRenderer.PopScissorOffset()` is has been removed

The way masking applies scissor has changed such that this no longer has any use.

`MaskingInfo` layout has changed

ScreenSpaceAABB renamed to ScreenSpaceScissorArea.
MaskingRect renamed to MaskingArea
ToScissorSpace added. If ScreenSpaceScissorArea truly represents a screen-space area, set this to Matrix3.Identity, otherwise set it to convert vertex coordinate inputs to the appropriate coordinate space of ScreenSpaceScissorArea.

peppy · 2023-07-29T10:15:38Z

@smoogipoo can you explain the test grid in the OP? I'm not sure what 0/0 means for instance.

smoogipoo · 2023-07-29T10:37:52Z

Number of tests passed / total number of tested systems

I'll adjust it a bit later, but I intend to test more systems until this is merged.

peppy · 2023-07-29T10:57:37Z

If i was to test, are there any specific test scenes i should test against?

smoogipoo · 2023-07-29T16:49:32Z

You could try against TestSceneScissor, TestSceneEffects, and TestSceneMasking. Though I'm not sure whether they're an extensive test of everything that can also be seen in osu!.

peppy · 2023-08-15T14:51:37Z

This needs some non-minor conflict resolution..

bdach · 2023-08-15T16:44:57Z

This needs some non-minor conflict resolution..

It locally automerged for me actually? Think it was just github getting confused again.

bdach

Initial read-through. There are some other parts I'm not 100% on, but that's due to my own lack of knowledge (the ToScissorSpace thing is flying over my head a little bit; will either need to read up more, or organoleptically observe that nothing is broken and let it go).

osu.Framework/Graphics/Veldrid/Shaders/VeldridShaderPart.cs

osu.Framework/Graphics/BufferedDrawNode.cs

bdach · 2023-08-15T17:56:01Z

osu.Framework/Graphics/Shaders/ShaderManager.cs

@@ -149,6 +149,7 @@ protected virtual void Dispose(bool disposing)
    public static class VertexShaderDescriptor
    {
        public const string TEXTURE_2 = "Texture2D";
+        public const string TEXTURE_2_NO_MASKING = "Texture2D_NoMasking";


So my main question in regard to this would be - is anyone else going to need using this, and if so, when? It's not terribly clear to me - if I were to guess, I'd say that this is special-casing BufferedContainer and nothing else should need to use this. Is that right?

Among other reasons, I'm asking to gauge the performance tradeoff here of presumably incurring more batch flushes due to shader rebinds.

Yeah, this is mostly just special-casing BufferedContainer. Didn't realise that until now.

As for performance, BufferedContainer has to break the batch at some point anyway (for as long as its drawing to the FBO), so I don't think that's an issue.

If there ever comes up a case where a user is making a custom shader, they'd run into a shader bind anyway, so I can't imagine they'd lose performance from this either.

I'd really like to remove this at some point in the future though, just need to figure out a better strategy for handling the case where one of the two stages don't use the resource.

bdach · 2023-08-19T12:26:26Z

Source looks ready. Pending platform testing on my side (android being the core one).

peppy

I've been through this multiple times, so will just leave my approval here.

Tested on macOS (intel / apple) and Windows.

bdach · 2023-08-20T18:30:52Z

TestSceneEffects hard crashes on D3D11 here. ETW trace says:

ID3D11DeviceContext::DrawIndexed: The Pixel Shader expects a Render Target View bound to slot 1,
but none is bound. This is OK, as writes of an unbound Render Target View are discarded. It is also
possible the developer knows the data will not be used anyway. This is only a problem if the developer
actually intended to bind a Render Target View here.

Will try to figure it out myself right now but not sure where that'll get me...

bdach · 2023-08-20T20:02:28Z

Unfortunately I'm basically stuck.

Details / what I've tried:

Any usage of the blur effect triggers the above crash.
Stepping through the uniform binding code it looks like everything is in order.
HLSL output of the cross compilation looks correct and matches the uniform bindings.
Toggling OSU_GRAPHICS_NO_SSBO=1 has no effect.
Trying to switch blur to be using the masking vertex shader has no effect.
Explicitly declaring the blur param uniform structure in the non-masking vertex shader has no effect.

Dunno where to take this further. It doesn't help that renderdoc barely works (I can get it to capture, but I can't actually get the command lists to show their contents after the capture is taken).

I'm leaving this be for now in hopes that @smoogipoo can throw an idea or two my way.

The "consolation" is that android works... but performance gains are marginal if any.

smoogipoo · 2023-08-21T07:32:16Z

Concerning that I can't reproduce this :/ Can you debug into the first call to VeldridRenderer.DrawVertices after breaking on

osu-framework/osu.Framework/Graphics/Containers/BufferedContainer_DrawNode.cs

Line 122 in a8a1f13

    
           renderer.DrawFrameBuffer(current, new RectangleF(0, 0, current.Texture.Width, current.Texture.Height), ColourInfo.SingleColour(Color4.White));

And check what uniform/texture layouts you have. This is the layout I get, which shows the RTV (if I'm understanding the one it's referring to) is bound to slot 2 for me:

The above may not lead to anything because the cross compiler should be platform agnostic (you should get the same results as me), but it's still interesting nonetheless - if it's different it may indicate potentially something going wrong with the cache? To double check you could also try renaming your cache dir.

It could be that after D3D11 compiles the shader it notices g_GlobalUniforms is not used in the fragment shader, despite our hack that is intended for Metal...

bdach · 2023-08-21T07:52:10Z

Can you debug into the first call to VeldridRenderer.DrawVertices after breaking on

osu-framework/osu.Framework/Graphics/Containers/BufferedContainer_DrawNode.cs

Line 122 in a8a1f13

renderer.DrawFrameBuffer(current, new RectangleF(0, 0, current.Texture.Width, current.Texture.Height), ColourInfo.SingleColour(Color4.White));

Looks the exact same here.

To double check you could also try renaming your cache dir.

I've nuked my shader cache multiple times when debugging this to no avail, since that was basically the best way I know of to extract the cross-compiled hlsl source to look at it. No dice.

I'll try the "optimised out" angle I guess.

bdach · 2023-08-21T07:59:51Z

If it's getting optimised out it's not at the cross compile step I don't think. Below attached is the output of hlsl cross-compilation.

cbuffer g_GlobalUniforms : register(b0)
{
    uint _139_g_BackbufferDraw : packoffset(c0);
    uint _139_g_IsDepthRangeZeroToOne : packoffset(c0.y);
    uint _139_g_IsClipSpaceYInverted : packoffset(c0.z);
    uint _139_g_IsUvOriginTopLeft : packoffset(c0.w);
    row_major float4x4 _139_g_ProjMatrix : packoffset(c1);
    int _139_g_WrapModeS : packoffset(c5);
    int _139_g_WrapModeT : packoffset(c5.y);
};

cbuffer m_BlurParameters : register(b1)
{
    float2 _163_g_TexSize : packoffset(c0);
    int _163_g_Radius : packoffset(c0.z);
    float _163_g_Sigma : packoffset(c0.w);
    float2 _163_g_BlurDirection : packoffset(c1);
};

Texture2D<float4> m_Texture : register(t0);
SamplerState m_Sampler : register(s0);

static float4 o_Colour;
static float2 v_TexCoord;
static float2 _unused_output_1;
static float2 _unused_input_0;
static float4 _unused_output_2;
static float4 _unused_input_1;
static float4 _unused_output_3;
static float4 _unused_input_3;
static float2 _unused_output_4;
static float2 _unused_input_4;

struct SPIRV_Cross_Input
{
    float2 _unused_input_0 : TEXCOORD0;
    float4 _unused_input_1 : TEXCOORD1;
    float2 v_TexCoord : TEXCOORD2;
    float4 _unused_input_3 : TEXCOORD3;
    float2 _unused_input_4 : TEXCOORD4;
};

struct SPIRV_Cross_Output
{
    float4 o_Colour : SV_Target0;
    float2 _unused_output_1 : SV_Target1;
    float4 _unused_output_2 : SV_Target2;
    float4 _unused_output_3 : SV_Target3;
    float2 _unused_output_4 : SV_Target4;
};

float computeGauss(float x, float sigma)
{
    return (0.3989399969577789306640625f * exp((((-0.5f) * x) * x) / (sigma * sigma))) / sigma;
}

float4 blur(int radius, float2 direction, float2 texCoord, float2 texSize, float sigma)
{
    float param = 0.0f;
    float param_1 = sigma;
    float factor = computeGauss(param, param_1);
    float4 sum = m_Texture.Sample(m_Sampler, texCoord) * factor;
    float totalFactor = factor;
    for (int i = 2; i <= 200; i += 2)
    {
        float x = float(i) - 0.5f;
        float param_2 = x;
        float param_3 = sigma;
        factor = computeGauss(param_2, param_3) * 2.0f;
        totalFactor += (2.0f * factor);
        sum += (m_Texture.Sample(m_Sampler, texCoord + ((direction * x) / texSize)) * factor);
        sum += (m_Texture.Sample(m_Sampler, texCoord - ((direction * x) / texSize)) * factor);
        if (i >= radius)
        {
            break;
        }
    }
    float one = float((_139_g_BackbufferDraw != 0u) ? 0 : 1);
    float4 _155 = (sum / totalFactor.xxxx) * one;
    return _155;
}

void _internal_real_main()
{
    int param = _163_g_Radius;
    float2 param_1 = _163_g_BlurDirection;
    float2 param_2 = v_TexCoord;
    float2 param_3 = _163_g_TexSize;
    float param_4 = _163_g_Sigma;
    o_Colour = blur(param, param_1, param_2, param_3, param_4);
}

void frag_main()
{
    _internal_real_main();
    _unused_output_1 = _unused_input_0;
    _unused_output_2 = _unused_input_1;
    _unused_output_3 = _unused_input_3;
    _unused_output_4 = _unused_input_4;
}

SPIRV_Cross_Output main(SPIRV_Cross_Input stage_input)
{
    v_TexCoord = stage_input.v_TexCoord;
    _unused_input_0 = stage_input._unused_input_0;
    _unused_input_1 = stage_input._unused_input_1;
    _unused_input_3 = stage_input._unused_input_3;
    _unused_input_4 = stage_input._unused_input_4;
    frag_main();
    SPIRV_Cross_Output stage_output;
    stage_output.o_Colour = o_Colour;
    stage_output._unused_output_1 = _unused_output_1;
    stage_output._unused_output_2 = _unused_output_2;
    stage_output._unused_output_3 = _unused_output_3;
    stage_output._unused_output_4 = _unused_output_4;
    return stage_output;
}

The _139_g_BackbufferDraw branch seems to be still alive and well.

smoogipoo · 2023-08-21T08:04:28Z

Yeah, if anything it would be at a D3D11 compiler level, not cross-compile. But even then I struggle to imagine how it could infer that.

Debugging this is probably going to require going into Veldrid itself...

bdach · 2023-08-21T08:18:30Z

Yeah, if anything it would be at a D3D11 compiler level, not cross-compile.

There is circumstantial crude evidence that this could be the case. Namely that:

blurShader = shaders.Load(VertexShaderDescriptor.TEXTURE_2_NO_MASKING, FragmentShaderDescriptor.BLUR); crashes
blurShader = shaders.Load(VertexShaderDescriptor.TEXTURE_2, FragmentShaderDescriptor.BLUR); (with no further changes anywhere else) crashes
blurShader = shaders.Load(VertexShaderDescriptor.TEXTURE_2, FragmentShaderDescriptor.TEXTURE); (with no further changes anywhere else) works

bdach · 2023-08-21T12:05:41Z

After extensive debugging it appears that this PR is not the cause of the aforementioned crash as it also happens on master on my PC. Looks to be caused by #5687 specifically.

In light of this, unblocking and proceeding with merge.

smoogipoo added 10 commits July 29, 2023 04:45

Merge branch 'remove-depthwrappingvertex' into masking-ssbo-2

408affa

Allow shaders to have more than 1 "global" buffer

9e6d912

Add masking index to vertices

e57b8fc

Move masking to an SSBO

b78b2b1

Rename/better document MaskingInfo members

fe5ce91

Add scissor support

34cd65d

Add scissor test scene

6fe3610

Expose CurrentMaskingIndex

ba47ad6

Fix masking/scissoring inside BufferedDrawNode

2076762

Adjust size to keep UBO under 16KiB

9e9c202

smoogipoo added type:performance area:graphics area:renderer-veldrid area:renderer-gl labels Jul 28, 2023

pull-request-size bot added the size/XXL label Jul 28, 2023

smoogipoo mentioned this pull request Jul 28, 2023

Send masking parameters through via texture buffer #5526

Closed

smoogipoo added 2 commits July 29, 2023 09:01

Merge branch 'shader-storage-buffer-objects' into masking-ssbo-2

11f58bc

Remove unused SetMasking flush source

1a12533

Update packages

d71f9c7

smoogipoo mentioned this pull request Aug 3, 2023

Use RectangleF for masking scissor #5960

Merged

1 task

bdach added 2 commits August 15, 2023 18:38

Merge branch 'master' into masking-ssbo-2

cdd8696

Fix invalid member reference in xmldoc

342b095

pull-request-size bot added size/XL and removed size/XXL labels Aug 15, 2023

bdach reviewed Aug 15, 2023

View reviewed changes

smoogipoo added 2 commits August 18, 2023 17:08

Update comments

09114b5

Add DrawRectangle to IDrawable

2a7fb2f

peppy approved these changes Aug 19, 2023

View reviewed changes

bdach added the blocked label Aug 20, 2023

peppy mentioned this pull request Aug 20, 2023

Ensure the majority of users get good performance ppy/osu#18892

Closed

bdach approved these changes Aug 21, 2023

View reviewed changes

Merge branch 'master' into masking-ssbo-2

be8894e

bdach enabled auto-merge August 21, 2023 12:06

bdach removed the blocked label Aug 21, 2023

bdach merged commit 844a9ee into ppy:master Aug 21, 2023

LumpBloom7 mentioned this pull request Sep 8, 2023

Bump ppy.osu.Game from 2023.717.0 to 2023.908.0 taulazer/tau#399

Merged

smoogipoo deleted the masking-ssbo-2 branch September 11, 2023 02:25

smoogipoo mentioned this pull request Sep 13, 2023

Revert masking SSBO changes #5990

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use SSBOs for bindless masking #5952

Use SSBOs for bindless masking #5952

smoogipoo commented Jul 28, 2023 •

edited by bdach

Loading

peppy commented Jul 29, 2023

smoogipoo commented Jul 29, 2023

peppy commented Jul 29, 2023

smoogipoo commented Jul 29, 2023 •

edited

Loading

peppy commented Aug 15, 2023

bdach commented Aug 15, 2023

bdach left a comment

bdach Aug 15, 2023

smoogipoo Aug 18, 2023

bdach commented Aug 19, 2023

peppy left a comment

bdach commented Aug 20, 2023 •

edited

Loading

bdach commented Aug 20, 2023 •

edited

Loading

smoogipoo commented Aug 21, 2023

bdach commented Aug 21, 2023

bdach commented Aug 21, 2023

smoogipoo commented Aug 21, 2023

bdach commented Aug 21, 2023

bdach commented Aug 21, 2023

Use SSBOs for bindless masking #5952

Use SSBOs for bindless masking #5952

Conversation

smoogipoo commented Jul 28, 2023 • edited by bdach Loading

vNext

Maskable vertex input layout has changed

Non-masking fragment shaders must use non-masking vertex shaders

IRenderer.PushScissorOffset()/IRenderer.PopScissorOffset() is has been removed

MaskingInfo layout has changed

peppy commented Jul 29, 2023

smoogipoo commented Jul 29, 2023

peppy commented Jul 29, 2023

smoogipoo commented Jul 29, 2023 • edited Loading

peppy commented Aug 15, 2023

bdach commented Aug 15, 2023

bdach left a comment

Choose a reason for hiding this comment

bdach Aug 15, 2023

Choose a reason for hiding this comment

smoogipoo Aug 18, 2023

Choose a reason for hiding this comment

bdach commented Aug 19, 2023

peppy left a comment

Choose a reason for hiding this comment

bdach commented Aug 20, 2023 • edited Loading

bdach commented Aug 20, 2023 • edited Loading

smoogipoo commented Aug 21, 2023

bdach commented Aug 21, 2023

bdach commented Aug 21, 2023

smoogipoo commented Aug 21, 2023

bdach commented Aug 21, 2023

bdach commented Aug 21, 2023

smoogipoo commented Jul 28, 2023 •

edited by bdach

Loading

`IRenderer.PushScissorOffset()`/`IRenderer.PopScissorOffset()` is has been removed

`MaskingInfo` layout has changed

smoogipoo commented Jul 29, 2023 •

edited

Loading

bdach commented Aug 20, 2023 •

edited

Loading

bdach commented Aug 20, 2023 •

edited

Loading