CMAA2 integration #655

KirillAlekseeenko · 2024-07-17T18:23:58Z

- Integration implemented with different quality modes (a higher mode means a lower edge detection threshold).
- Tonemapping switches to compute when CMAA2 is on, to avoid an additional copy from the RT resource to the UAV.
- Created an additional HDR luminance resource, but it is currently unused: in some locations the HDR range is so small that almost no edges are detected, even on the max quality preset.

Try

I've only had a quick look so far. Some overall notes:

Do we really need so many quality presets? Can we simplify the code a bit by selecting only one or two that fit well?
We definitely need to remove the MSAA cases: MSAA is anti-aliasing by itself, and it is also not supported in the engine.
Need to update submodules.

Also, good find in shader/copy_img.comp!

Try · 2024-07-17T22:16:36Z

shader/lighting/tonemapping.frag

@@ -10,6 +10,32 @@

 #include "upscale/lanczos.glsl"

+#if defined(COMPUTE)
+const uint THREADGROUP_SIZE = 8;


Any reason to have a constant for THREADGROUP_SIZE? gl_WorkGroupSize should be just fine.

Try · 2024-07-17T22:29:07Z

shader/antialiasing/cmaa2/compute_dispatch_args.comp

+#include "cmaa2_common.glsl"
+
+layout(binding = 8) buffer UboWorkingExecuteIndirectBuffer {
+  uint g_workingExecuteIndirectBuffer[];


Need to add an explicit memory layout: std430.
Also, why an array? Wouldn't it be nicer to have:

uvec3 shapeCandidateIndirectArg;

?
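
For illustration, a host-side sketch of what a three-component indirect argument looks like. The names `DispatchIndirectArgs`, `groupCount` and `makeDispatchArgs` are hypothetical and do not come from the PR; the only assumption carried over from the comment above is that the argument is three consecutive 32-bit unsigned group counts, which is also what a single `uvec3` at the start of a std430 block occupies.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side mirror of a GPU indirect-dispatch argument:
// three tightly packed 32-bit group counts (x, y, z).
struct DispatchIndirectArgs {
  uint32_t x;
  uint32_t y;
  uint32_t z;
};

// Compile-time checks that the C++ layout matches three consecutive uints.
static_assert(sizeof(DispatchIndirectArgs) == 3 * sizeof(uint32_t), "args must be tightly packed");
static_assert(offsetof(DispatchIndirectArgs, y) == 4, "y follows x");
static_assert(offsetof(DispatchIndirectArgs, z) == 8, "z follows y");

// Number of workgroups needed to cover `items` work items with `groupSize`
// threads per group (rounds up; assumes items + groupSize does not overflow).
constexpr uint32_t groupCount(uint32_t items, uint32_t groupSize) {
  return (items + groupSize - 1) / groupSize;
}

// Builds 1D dispatch arguments for a list of `items` candidates.
constexpr DispatchIndirectArgs makeDispatchArgs(uint32_t items, uint32_t groupSize) {
  return DispatchIndirectArgs{groupCount(items, groupSize), 1u, 1u};
}
```

With named fields, the shader side can write `x`, `y` and `z` directly instead of indexing into an untyped `uint` array.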

Try · 2024-07-17T22:30:38Z

shader/antialiasing/cmaa2/cmaa2_common.glsl

+
+#if CMAA2_EXTRA_SHARPNESS
+  const float c_dampeningEffect = 0.11;
+  #define g_CMAA2_LocalContrastAdaptationAmount       0.15f


Why sometimes use const and sometimes #define?

Note: const is generally preferred in opengothic, when possible.

Try · 2024-07-17T22:37:05Z

shader/antialiasing/cmaa2/compute_dispatch_args.comp

+void main() {
+  uvec3 groupID = gl_WorkGroupID;
+
+  if(groupID.x == 1) {


Here gl_WorkGroupID is essentially used as a flag. Better to just use a push constant.

Try · 2024-07-17T22:39:29Z

shader/CMakeLists.txt

@@ -196,6 +196,9 @@ add_shader(tonemapping.vert          copy.vert -DHAS_UV)
 add_shader(tonemapping.frag          lighting/tonemapping.frag)
 add_shader(tonemapping_up.frag       lighting/tonemapping.frag -DUPSCALE)

+add_shader(tonemapping.comp          lighting/tonemapping.frag -S comp -DCOMPUTE)
+add_shader(tonemapping_up.comp       lighting/tonemapping.frag -S comp -DCOMPUTE -DUPSCALE)


Does it make sense to have anti-aliasing + upscale together?

In the FSR1 documentation there's a note that it's better to feed FSR1 antialiased input, and we use the same Lanczos filter for upscaling. But since an MLAA algorithm, unlike TAA or MSAA, doesn't provide additional samples, it seems that edge reconstruction is just the reverse process.

Let's remove tonemapping_up then

KirillAlekseeenko · 2024-07-18T18:24:55Z

Four presets is too many; I just repeated Intel's demo here. MIDDLE and ULTRA should be enough.
I was thinking about a potential future integration with MSAA, where MSAA handles geometry aliasing and CMAA2 handles everything else.

Try

About copy_img.comp: would you mind if I commit the fix for local_size_y separately?
I'm asking because the review of this PR might take a bit of time.

Try · 2024-07-21T22:11:45Z

game/graphics/renderer.cpp

+  static bool isFirstRun = true;
+  // initialization that is needed only on the first run. Make it run only in the first run
+  if (isFirstRun) {
+    cmd.setUniforms(*cmaa2.prepareDispatchIndirectArguments, cmaa2.prepareDispatchIndirectArgumentsUbo, &processCandidatesSetupFlag, sizeof(uint32_t));


This line feels a bit wordy; maybe prepareDispatchIndirectArguments -> indirectArgs?

Try · 2024-07-21T22:40:47Z

shader/antialiasing/cmaa2/cmaa2_common.glsl

+  }
+
+uint PackFloat32AsFloat16AndConvertToUint(float arg) {
+  return packHalf2x16(vec2(arg, 0.));


Why introduce a proxy function instead of calling packHalf2x16 directly?

To avoid repeating packHalf2x16(vec2(arg, 0.)) for each channel, and to make it behave like f32tof16 in HLSL.

The name should probably be something like packHalf1x16, to be consistent with the existing GLSL functions.
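
To make the intent of such a helper concrete, here is a CPU-side sketch of what `packHalf2x16(vec2(arg, 0.))` produces: the half-float bits of `arg` in the low 16 bits and zero in the high 16 bits. The function names are hypothetical, and round-to-nearest-even is an assumption; a GPU may round differently for values that are not exactly representable as a half.

```cpp
#include <cstdint>
#include <cstring>

// Converts a 32-bit float to IEEE 754 binary16 bits.
// Assumption: round-to-nearest-even.
inline uint16_t floatToHalfBits(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));

  const uint32_t sign = (bits >> 16) & 0x8000u;
  const uint32_t exp  = (bits >> 23) & 0xFFu;
  uint32_t       mant = bits & 0x007FFFFFu;

  if(exp == 0xFFu) // Inf or NaN
    return uint16_t(sign | 0x7C00u | (mant ? 0x0200u : 0u));

  const int32_t e = int32_t(exp) - 127 + 15; // re-bias the exponent
  if(e >= 31) // too large for a half: becomes infinity
    return uint16_t(sign | 0x7C00u);

  if(e <= 0) {
    if(e < -10) // too small even for a subnormal half: signed zero
      return uint16_t(sign);
    mant |= 0x00800000u; // make the implicit leading bit explicit
    const uint32_t shift   = uint32_t(14 - e);
    uint32_t       half    = mant >> shift;
    const uint32_t rem     = mant & ((1u << shift) - 1u);
    const uint32_t halfway = 1u << (shift - 1u);
    if(rem > halfway || (rem == halfway && (half & 1u)))
      ++half;
    return uint16_t(sign | half);
  }

  uint32_t       half = (uint32_t(e) << 10) | (mant >> 13);
  const uint32_t rem  = mant & 0x1FFFu;
  if(rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
    ++half; // a carry into the exponent is the correct result here
  return uint16_t(sign | half);
}

// Equivalent of packHalf2x16(vec2(arg, 0.)): low 16 bits hold the half,
// high 16 bits hold the half representation of 0.0, which is zero.
inline uint32_t packHalf1x16(float arg) {
  return uint32_t(floatToHalfBits(arg));
}
```

For example, `packHalf1x16(1.0f)` yields `0x3C00`, the half-float encoding of 1.0.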

Try · 2024-07-21T23:34:24Z

shader/lighting/tonemapping.frag

+#if defined(COMPUTE)
+layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
+layout(binding = 2) uniform writeonly image2D tonemappedOutput;
+layout(r32f, binding = 3) uniform writeonly image2D hdrLumaOutput;


writeonly doesn't require an explicit pixel-format specifier.

Also, the compute variant can be removed if you separate sceneTonemapped (pre-CMAA) from the anti-aliased sceneTonemapped that is written by deferred_color_apply_2x2.comp.

No, it's the same sceneTonemapped as used in cmaa2.

In cmaa2_common.glsl:
layout(binding = 0) uniform sampler2D inputColorReadonly; is not a storage image, but a simple texture. It would also be good to rename it to sceneTonemapped, to match the C++ side.

In deferred_color_apply_2x2.comp, outputColor is indeed a writable image, but you don't have to write into the same one.
Maybe deferred_color_apply_2x2 can later be converted into a fragment shader? It doesn't rely heavily on compute features, other than the unordered write.

KirillAlekseeenko · 2024-07-23T19:44:17Z

About copy_img.comp, yeah, it's better to commit separately

Try · 2024-07-23T22:43:32Z

shader/antialiasing/cmaa2/cmaa2_common.glsl

+layout(r32ui, binding = 6) uniform uimage2D workingDeferredBlendItemListHeads;
+
+layout(binding = 7) buffer UboWorkingControlBuffer {
+  uint workingControlBuffer[];


This seems to be a leftover from porting the HLSL code. Based on the use cases:

workingControlBuffer[3];  // numCandidates, workingControlBuffer
workingControlBuffer[4];  // shapeCandidateCount
workingControlBuffer[8];  // blendLocationCount, edgeListCounter
workingControlBuffer[12]; // counterIndex

I haven't found any other array elements in use. Maybe due to the removal of extra variants?

No, the extra variants didn't require additional counters. I suppose the counters are aligned for faster access. 3 and 4 are probably adjacent because they are used together in the indirect setup pass.

There is a publication about the performance profile of atomics: https://www.youtube.com/watch?v=VaE_uKPfjv0
I don't remember the details now, but my takeaway was that only the single-variable case is a fast path. Anything else is unpredictable across GPU vendors.

In any case, there is no point in using an array. If you really need padding, use a struct:

buffer {
  uint numCandidates;
  uint shapeCandidateCount;
  uint padding0[3];
  uint blendLocationCount;
  ...
}
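
As a sketch of the struct-with-padding idea, the C++ mirror below keeps each counter at the array index quoted earlier in this thread (3, 4, 8, 12) and verifies the offsets at compile time. The struct name `ControlBuffer` and the padding field names are hypothetical; unlike the shorter suggestion above, this variant keeps the leading padding so that the original indices stay unchanged. It assumes a std430-style layout in which `uint` arrays have a 4-byte stride.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side mirror of the control buffer. Each counter sits at
// the same 32-bit index it had in the original uint array.
struct ControlBuffer {
  uint32_t padding0[3];
  uint32_t numCandidates;       // was workingControlBuffer[3]
  uint32_t shapeCandidateCount; // was workingControlBuffer[4]
  uint32_t padding1[3];
  uint32_t blendLocationCount;  // was workingControlBuffer[8]
  uint32_t padding2[3];
  uint32_t counterIndex;        // was workingControlBuffer[12]
};

// Offsets are checked in units of 32-bit words to match the array indices.
static_assert(offsetof(ControlBuffer, numCandidates)       ==  3 * sizeof(uint32_t), "index 3");
static_assert(offsetof(ControlBuffer, shapeCandidateCount) ==  4 * sizeof(uint32_t), "index 4");
static_assert(offsetof(ControlBuffer, blendLocationCount)  ==  8 * sizeof(uint32_t), "index 8");
static_assert(offsetof(ControlBuffer, counterIndex)        == 12 * sizeof(uint32_t), "index 12");
static_assert(sizeof(ControlBuffer) == 13 * sizeof(uint32_t), "13 words in total");
```

If the padding turns out to be unnecessary, dropping the padding fields only requires updating the static_asserts, and every use site keeps its readable name.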

Try · 2024-07-23T22:47:19Z

game/graphics/renderer.cpp

@@ -187,7 +214,9 @@ void Renderer::initSettings() {

  auto prevVidResIndex = settings.vidResIndex;
  settings.vidResIndex = Gothic::inst().settingsGetF("INTERNAL","vidResIndex");
-  settings.fxaaEnabled = (Gothic::options().fxaaPreset > 0) && (settings.vidResIndex==0);
+  settings.cmaa2Enabled = (Gothic::options().cmaa2Preset>0) && (settings.vidResIndex==0);
+  settings.fxaaEnabled = (Gothic::options().fxaaPreset>0) && (settings.vidResIndex==0) && !settings.cmaa2Enabled;


We can probably just remove FXAA and also drop the explicit technique name from the command line: just -aa 1 instead of -cmaa2 1.

Try · 2024-07-23T23:29:21Z

shader/antialiasing/cmaa2/deferred_color_apply_2x2.comp

+#define CMAA2_UAV_STORE_TYPED
+#define CMAA2_UAV_STORE_TYPED_UNORM_FLOAT
+
+#ifdef CMAA2_UAV_STORE_TYPED


We can probably keep only the code relevant to CMAA2_UAV_STORE_TYPED_UNORM_FLOAT:
rgba32f is overkill, and we probably won't use it for RT anytime soon;
r32ui (untyped) implies VK_KHR_image_format_list (or a similar extension), which we do not support.

Try · 2024-07-24T21:44:21Z

shader/CMakeLists.txt

Let's remove tonemapping_up then

Try · 2024-07-24T22:00:09Z

shader/antialiasing/cmaa2/cmaa2_common.glsl

The name should probably be something like packHalf1x16, to be consistent with the existing GLSL functions.

Try · 2024-07-24T22:08:24Z

shader/lighting/tonemapping.frag

In cmaa2_common.glsl:
layout(binding = 0) uniform sampler2D inputColorReadonly; is not a storage image, but a simple texture. It would also be good to rename it to sceneTonemapped, to match the C++ side.

In deferred_color_apply_2x2.comp, outputColor is indeed a writable image, but you don't have to write into the same one.
Maybe deferred_color_apply_2x2 can later be converted into a fragment shader? It doesn't rely heavily on compute features, other than the unordered write.

Try · 2024-07-30T18:19:55Z

@KirillAlekseeenko The PR generally looks good. I can do the last few changes myself.

Let me know when you are happy with the current state of the PR, so I can start testing/finalizing.

KirillAlekseeenko · 2024-07-31T17:04:22Z

Seems that it's ready for integration

Try · 2024-08-05T17:26:10Z

Experimenting with edge detection in HDR space:

[Screenshots: Final image (HDR) | Footprint]

[Screenshots: Final image (LDR) | Footprint]

In the LDR path, I've changed the encoding from packR11G11B10E4F to packUnorm4x8, as the image is already tonemapped and in gamma space.
In the HDR path I'm using packR11G11B10F, to match the frame buffer format.
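
For readers unfamiliar with the LDR encoding, here is a CPU-side sketch of what GLSL's `packUnorm4x8` does: each component is clamped to [0, 1], scaled to 0..255, rounded, and stored as one byte, with the first component in the least significant byte. This is an illustration only, not code from the PR; `unpackChannel` is a hypothetical helper, and `std::lround` (ties away from zero) is an assumption, since exact tie behavior may differ on a GPU.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Emulates GLSL packUnorm4x8 on the CPU: 8 bits per channel, x in the lowest byte.
inline uint32_t packUnorm4x8(float x, float y, float z, float w) {
  const float c[4] = {x, y, z, w};
  uint32_t packed = 0;
  for(int i = 0; i < 4; ++i) {
    // Values outside [0, 1] are clamped, which is why this suits tonemapped LDR data.
    const float    clamped = std::min(std::max(c[i], 0.0f), 1.0f);
    const uint32_t byte    = uint32_t(std::lround(clamped * 255.0f));
    packed |= byte << (8 * i);
  }
  return packed;
}

// Hypothetical helper: reads one channel (0 = x ... 3 = w) back as a float in [0, 1].
inline float unpackChannel(uint32_t packed, int channel) {
  return float((packed >> (8 * channel)) & 0xFFu) / 255.0f;
}
```

Because everything above 1.0 is clamped, this encoding only makes sense after tonemapping; HDR values need a floating-point format such as the R11G11B10F packing mentioned above.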

> HDR luminance resource, but currently it's not used, since there is a problem

Computing luminance in flight via CMAA2_SUPPORT_HDR_COLOR_RANGE==1 seems to work.

At this point I'm thinking of making HDR the default and only path, since it fits better into the existing pipeline (0.06 ms of performance saved on RTX compared to LDR!) and saves some VRAM bandwidth.

@KirillAlekseeenko let me know what you think about it.

KirillAlekseeenko · 2024-08-05T21:57:21Z

I wanted to use HDR luminance, but encountered cases where HDR is much darker than LDR, as in the screenshots below. So I finally decided to stay with LDR. The game was run without command-line arguments except for -aa; colors in the screenshots are in the [0, 1] range.
HDR: [screenshot]

LDR: [screenshot]

Try · 2024-08-05T23:04:58Z

> I wanted to use HDR luminance but encountered cases where HDR is much darker

It seems that tonemapping has been applied twice, or not applied at all, on your image.

Here is the same spot on my working branch (9:00am): [screenshot]

KirillAlekseeenko · 2024-08-06T08:01:42Z

Here is the input to the tonemapping pass and the sky pixel history. I didn't look deeply into this, but for some reason the color values are around 0.1 after the sky and fog passes. Could I have messed up the game settings or something else (maybe some hidden setting)? The capture was done on the PR's branch without local changes.

Try · 2024-08-06T21:01:44Z

> I didn't look deeply into this but for some reason after sky and fog passes color values are around 0.1.

Here is how sceneLinear (pre-tonemapping) looks on my end: [screenshot]

If we divide it by the exposure value of the frame (0.0000088533), we get a sky brightness of 18 072 lum, which is fine for daytime with a clear sky.

Can you check whether rendering works fine after my changes (HDR path)?

KirillAlekseeenko · 2024-08-07T21:59:03Z

It works fine, but there are two to three times fewer candidate pixels than with the LDR pass. Maybe it's worth trying an adaptive edge detection threshold based on the exposure value.

Try · 2024-08-07T22:52:17Z

> maybe it's worth trying adaptive edge detection threshold based on exposure value

Probably not; at least I don't see how it would be better. Right now the input is pre-exposed, close to the [0..1] range. Maybe raise cmaa2EdgeThreshold, but I'm not sure how to reason about quality then, as it's gonna be subjective.

KirillAlekseeenko · 2024-08-08T19:03:22Z

It seems there's no need to raise the threshold. I just took a look at the Nsight trace and it's even faster than FXAA.

Try · 2024-08-08T21:35:12Z

Merged, thanks!

KirillAlekseeenko added 8 commits July 11, 2024 00:33

initial commit

0afef90

initial implementation with following compromises: -redundant copies, since the algorithm is working with UAV in-place -edge detection happens in LDR; instead HDR luminance diff could be created during tonemapping

tonemapping changed to compute when CMAA2 is on

c180693

tonemapping changed to compute when CMAA2 is on which allows to save perf by not copying tonemap RT into CMAA2 uav (since CMAA2 has effect only on some pixels, where complex or simple shapes are)

added HDR luminance edge detection

845afdd

There's a problem with too dark HDR linear, so that edges are not detected with current threshold values

code cleaning

b89a9a2

Merge remote-tracking branch 'upstream/master' into CMAA2-integration

cfcd702

merge with upstream

5f977fc

Revert "merge with upstream"

2b4c077

This reverts commit 5f977fc.

revert excess updates

ced78dc

Try reviewed Jul 17, 2024

KirillAlekseeenko added 4 commits July 21, 2024 17:38

work on comments in review (except for removing MSAA cases)

495f3d4

removed msaa cases

393a760

Merge remote-tracking branch 'upstream/master' into CMAA2-integration

92f9f20

macos build fix

2846bd8

Try reviewed Jul 21, 2024

game/graphics/renderer.cpp

Try reviewed Jul 21, 2024

shader/antialiasing/cmaa2/cmaa2_common.glsl

work on review (except for bindings, remapping and 16x16 threadgroup)

aefc65e

Try added a commit that referenced this pull request Jul 23, 2024

fix workgroup size in copy-image

6c7e9c7

#655

KirillAlekseeenko added 3 commits July 23, 2024 23:59

Merge remote-tracking branch 'upstream/master' into CMAA2-integration

877e417

threadgroup 16x8, removed remapping

3c39820

additional fixes

4939f35

Try reviewed Jul 24, 2024

KirillAlekseeenko added 5 commits July 26, 2024 01:25

additional fixes

fc8b660

-indirect buffer size -naming -removed extra output formats

Merge branch 'Try:master' into CMAA2-integration

75e80cf

workingControlBuffer refactored

82d1ef0

removed fxaa

c06bc76

Merge remote-tracking branch 'upstream/master' into CMAA2-integration

f74a3ce

KirillAlekseeenko added 2 commits July 30, 2024 00:40

renamed Cmaa2Preset -> AaPreset

dad786e

code style

1b8bee1

Try reviewed Aug 3, 2024

shader/antialiasing/cmaa2/compute_dispatch_args.comp

Try added 12 commits August 3, 2024 23:11

implement apply as draw-indirect

92e045f

Delete deferred_color_apply_2x2.comp

ae03fad

cleanup tonemapping

9ad5276

fixup

c022e18

move pack function to common; some naming stuff

26b7af9

fix layout transition for swapchain

14c63a0

add indirect commands structs

cee13d0

merge image-processing shaders with settingup indirect arguments for …

d362db4

…the next one

fixup

878acce

remove more shader options

79579b4

Merge branch 'master' into pr/655

d9efb9c

compact ubo bindings

3b39bc3

HDR path

c34d6a4

fixup barriers

b889da9

final cleanups

69eb8c6

Try merged commit ff2adde into Try:master Aug 8, 2024
1 check was pending
