Use fullscreen tri instead of quad #80311
Merged
As outlined in godotengine/godot-proposals#7366, a fullscreen triangle is around 5-10% faster than a fullscreen quad.
The main reason is that the diagonal induces inefficiencies in both how rasterized pixels are assigned to threads and how caches are missed (in the case where there are texture/memory accesses) since the borders are processed at different times.
This trick works optimally as long as the render target resolution is < 16384x16384 due to something called guardband clipping. Most modern GPUs implement a guardband of [-32768; 32767] (note: the paper talks about a guardband of [-2048; 2047] because it's from the year 2000). When we rasterize the fullscreen tri we use the range [0; width * 2] x [0; height * 2], so at 16384 or above the triangle exceeds the guardband and the GPU must internally split the triangle into 2 quads (which is relatively expensive).
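To make the numbers above concrete, here is a minimal C++ sketch of the usual way a fullscreen triangle is derived from the vertex index, plus the guardband check implied by the [0; width * 2] x [0; height * 2] range. `Vec2`, `fullscreen_tri_position` and `fits_guardband` are hypothetical names for illustration only. They are not the identifiers used in this PR, and the default guardband limit is the [-32768; 32767] figure quoted above, taken as an assumption rather than queried from any driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not a Godot type.
struct Vec2 {
    float x;
    float y;
};

// Clip-space position of vertex `index` (0, 1 or 2) of a fullscreen triangle.
// Mirrors the common shader idiom of deriving the position from the vertex
// index, so no vertex buffer is needed.
Vec2 fullscreen_tri_position(uint32_t index) {
    assert(index < 3);
    float u = static_cast<float>((index << 1) & 2); // 0, 2, 0
    float v = static_cast<float>(index & 2);        // 0, 0, 2
    // Maps (0,0), (2,0), (0,2) to (-1,-1), (3,-1), (-1,3).
    // The visible [-1; 1] square is fully covered by this one triangle.
    return Vec2{ u * 2.0f - 1.0f, v * 2.0f - 1.0f };
}

// The triangle spans [0; width * 2] x [0; height * 2] in pixels, so it stays
// inside the guardband only while both doubled extents fit.
// Assumption: guardband_max defaults to 32767, the value quoted in the text.
bool fits_guardband(uint32_t width, uint32_t height, int64_t guardband_max = 32767) {
    return static_cast<int64_t>(width) * 2 <= guardband_max &&
           static_cast<int64_t>(height) * 2 <= guardband_max;
}
```

With these assumptions, `fits_guardband(16383, 16383)` holds while `fits_guardband(16384, 16384)` does not, which is where the < 16384x16384 limit comes from.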
The Mobile renderer is the most affected by this, since the Clustered one uses Compute Shaders instead of Pixel Shaders.
However, Clustered Forward still uses some fullscreen quads for things like Tonemapping and Sky.
I've only touched the Vulkan renderer. The same optimization could be applied to the GLES3 driver.