Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMAA2 integration #655

Merged
merged 44 commits into from
Aug 8, 2024
Merged

CMAA2 integration #655

merged 44 commits into from
Aug 8, 2024

Conversation

KirillAlekseeenko
Copy link
Contributor

-Integration implemented with different quality modes (higher mode means lower edge detection threshold)
-Tonemaping switches to compute when CMAA2 is on in order to avoid additional copy from RT resource to UAV.
-Created additional HDR luminance resource, but currently it's not used, since there is a problem, that in some locations HDR range is so small, that almost no edges are detected even on max quality preset.

initial implemetation with following compromises:
-redundant copies, since the algorithm is working with UAV in-place
-edge detection happens in LDR; instead HDR luminance diff could be created during tonemapping
tonemapping changed to compute when CMAA2 is on which allows to save perf by not copying tonemap RT into CMAA2 uav (since CMAA2 has effect only on some pixels, where complex or simple shapes are)
There's a problem with too dark HDR linear, so that edges are not detected with current threshold values
Copy link
Owner

@Try Try left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have had only quick look yet. Some overall notes:

  • Do we really need so many quality presets? Can we simplify code a bit by selection only one or two that do fit well?
  • Definitely need to remove MSAA cases, MSAA is anti-aliasing by itself and also not supported in engine.
  • Need to update submodules..

Also good finding in shader/copy_img.comp!

game/graphics/renderer.cpp Outdated Show resolved Hide resolved
game/graphics/renderer.cpp Outdated Show resolved Hide resolved
@@ -10,6 +10,32 @@

#include "upscale/lanczos.glsl"

#if defined(COMPUTE)
const uint THREADGROUP_SIZE = 8;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to have a constant for THREADGROUP_SIZE? gl_WorkGroupSize should be just fine.

#include "cmaa2_common.glsl"

layout(binding = 8) buffer UboWorkingExecuteIndirectBuffer {
uint g_workingExecuteIndirectBuffer[];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add explicit memory layout: std430.
Also why an array? Would it be nicer to have:

uvec3 shapeCandidateIndirectArg;

?


#if CMAA2_EXTRA_SHARPNESS
const float c_dampeningEffect = 0.11;
#define g_CMAA2_LocalContrastAdaptationAmount 0.15f
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why sometimes use const and sometimes #define ?

Note: generally const is preferred in opengothic, when possible.

void main() {
uvec3 groupID = gl_WorkGroupID;

if(groupID.x == 1) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here gl_WorkGroupID is uses as essentially a flag. Better to just use push-constant.

@@ -196,6 +196,9 @@ add_shader(tonemapping.vert copy.vert -DHAS_UV)
add_shader(tonemapping.frag lighting/tonemapping.frag)
add_shader(tonemapping_up.frag lighting/tonemapping.frag -DUPSCALE)

add_shader(tonemapping.comp lighting/tonemapping.frag -S comp -DCOMPUTE)
add_shader(tonemapping_up.comp lighting/tonemapping.frag -S comp -DCOMPUTE -DUPSCALE)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have need anti-aliasing + upscale together?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In FSR1 documentation there's a note that it's better to use antialiased input to FSR1 and we have the same Lanczos filter for upscaling. But since any MLAA algorithm doesn't provide additional samples unlike TAA or MSAA, it seems that edge reconstruction is just a reverse process.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove tonemapping_up then

@KirillAlekseeenko
Copy link
Contributor Author

  1. 4 presets is too much, I just repeated Intel's demo here; MIDDLE and ULTRA should be enough.
  2. I thought about potential integration with MSAA in a future where MSAA is for geometry aliasing and CMAA2 is for anything else.

Copy link
Owner

@Try Try left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About compy_img.comp: would you mind if I'll commit fix for local_size_y separately?
Asking as review of PR might take a bit of time.

game/graphics/renderer.cpp Outdated Show resolved Hide resolved
game/graphics/renderer.cpp Outdated Show resolved Hide resolved
game/graphics/renderer.cpp Outdated Show resolved Hide resolved
static bool isFirstRun = true;
// initialization that is needed only on the first run. Make it run only in the first run
if (isFirstRun) {
cmd.setUniforms(*cmaa2.prepareDispatchIndirectArguments, cmaa2.prepareDispatchIndirectArgumentsUbo, &processCandidatesSetupFlag, sizeof(uint32_t));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line feel a bit wordy, maybe prepareDispatchIndirectArguments -> indirectArgs ?

game/graphics/renderer.cpp Outdated Show resolved Hide resolved
shader/antialiasing/cmaa2/cmaa2_common.glsl Outdated Show resolved Hide resolved
shader/antialiasing/cmaa2/cmaa2_common.glsl Outdated Show resolved Hide resolved
shader/antialiasing/cmaa2/cmaa2_common.glsl Outdated Show resolved Hide resolved
}

uint PackFloat32AsFloat16AndConvertToUint(float arg) {
return packHalf2x16(vec2(arg, 0.));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why introduce proxy function, instead of calling packHalf2x16 directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not repeat packHalf2x16(vec2(arg, 0.)) for each channel and make it like f32tof16 in HLSL.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably name should be something ala packHalf1x16, to be consistent with existing GLSL functions?

shader/antialiasing/cmaa2/process_candidates.comp Outdated Show resolved Hide resolved
#if defined(COMPUTE)
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
layout(binding = 2) uniform writeonly image2D tonemappedOutput;
layout(r32f, binding = 3) uniform writeonly image2D hdrLumaOutput;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeonly doesn't require explicit pixel-format specifier

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also compute variant can be removed, if you segregate sceneTonemapped (pre CMAA) and anti-aliased sceneTonemapped, that is written by deferred_color_apply_2x2.comp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's the same sceneTonemapped as used in cmaa2

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cmaa2_common.glsl;
layout(binding = 0) uniform sampler2D inputColorReadonly; - not a storage-image, but a simple texture. Also good to rename it to sceneTonemapped, in order to math with c++ side.

In deferred_color_apply_2x2.comp outputColor is indeed a writable image, but you dont have to write into a same one.
Maybe deferred_color_apply_2x2 can later be converted into fragment-shader? It's not heavy one compute features, other then a unordered-write.

game/graphics/renderer.cpp Outdated Show resolved Hide resolved
@KirillAlekseeenko
Copy link
Contributor Author

KirillAlekseeenko commented Jul 23, 2024

About copy_img.comp, yeah, it's better to commit separately

Try added a commit that referenced this pull request Jul 23, 2024
game/graphics/renderer.cpp Outdated Show resolved Hide resolved
layout(r32ui, binding = 6) uniform uimage2D workingDeferredBlendItemListHeads;

layout(binding = 7) buffer UboWorkingControlBuffer {
uint workingControlBuffer[];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem to be part of porting HLSL code. Based on use-cases:

workingControlBuffer[3]; // numCandidates, workingControlBuffer 
workingControlBuffer[4]; // shapeCandidateCount  
workingControlBuffer[8]; // blendLocationCount, edgeListCounter
workingControlBuffer[12]; // counterIndex 

Haven't found other array elements to be in use. Maybe due to removal of extra variants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, extra variants didnt require additional counters. I suppose counters are aligned for faster access. 3 and 4 are close probably because they are used together in the indirect setup pass

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a publication about atomics performance profile: https://www.youtube.com/watch?v=VaE_uKPfjv0
I don't remember details now, yet my takeaway was that only single variable case is fast-path. Anything else - unpredictable across multiple gpu vendors.

In any case no point of using array, if you really need padding - use struct:

buffer {
  uint numCandidates;
  uint shapeCandidateCount;
  uint padding0[3];
  uint blendLocationCount;
  ...
  }

@@ -187,7 +214,9 @@ void Renderer::initSettings() {

auto prevVidResIndex = settings.vidResIndex;
settings.vidResIndex = Gothic::inst().settingsGetF("INTERNAL","vidResIndex");
settings.fxaaEnabled = (Gothic::options().fxaaPreset > 0) && (settings.vidResIndex==0);
settings.cmaa2Enabled = (Gothic::options().cmaa2Preset>0) && (settings.vidResIndex==0);
settings.fxaaEnabled = (Gothic::options().fxaaPreset>0) && (settings.vidResIndex==0) && !settings.cmaa2Enabled;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can just remove FXAA and also remove explicit name of the technique from command-line. Just -aa 1, instead of -cmaa2 1.

#define CMAA2_UAV_STORE_TYPED
#define CMAA2_UAV_STORE_TYPED_UNORM_FLOAT

#ifdef CMAA2_UAV_STORE_TYPED
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can keep only code relevant to CMAA2_UAV_STORE_TYPED_UNORM_FLOAT:
rgba32f is overkill, probably wont use it for RT anytime soon;
r32ui (untyped) implies VK_KHR_image_format_list (or similar extension), that we do not support

@@ -196,6 +196,9 @@ add_shader(tonemapping.vert copy.vert -DHAS_UV)
add_shader(tonemapping.frag lighting/tonemapping.frag)
add_shader(tonemapping_up.frag lighting/tonemapping.frag -DUPSCALE)

add_shader(tonemapping.comp lighting/tonemapping.frag -S comp -DCOMPUTE)
add_shader(tonemapping_up.comp lighting/tonemapping.frag -S comp -DCOMPUTE -DUPSCALE)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove tonemapping_up then

shader/antialiasing/cmaa2/cmaa2_common.glsl Outdated Show resolved Hide resolved
}

uint PackFloat32AsFloat16AndConvertToUint(float arg) {
return packHalf2x16(vec2(arg, 0.));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably name should be something ala packHalf1x16, to be consistent with existing GLSL functions?

#if defined(COMPUTE)
layout(local_size_x = 8, local_size_y = 8, local_size_z = 1) in;
layout(binding = 2) uniform writeonly image2D tonemappedOutput;
layout(r32f, binding = 3) uniform writeonly image2D hdrLumaOutput;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cmaa2_common.glsl;
layout(binding = 0) uniform sampler2D inputColorReadonly; - not a storage-image, but a simple texture. Also good to rename it to sceneTonemapped, in order to math with c++ side.

In deferred_color_apply_2x2.comp outputColor is indeed a writable image, but you dont have to write into a same one.
Maybe deferred_color_apply_2x2 can later be converted into fragment-shader? It's not heavy one compute features, other then a unordered-write.

@Try
Copy link
Owner

Try commented Jul 30, 2024

@KirillAlekseeenko Pr generally looks good. Last few changes can do by myself.

Let me know, when you happy with current state of PR, so I can start testing/finalizing

@KirillAlekseeenko
Copy link
Contributor Author

Seems that it's ready for integration

@Try
Copy link
Owner

Try commented Aug 5, 2024

Experimenting with edge detection in HDR space:

Final image(HDR) Footprint
изображение изображение
Final image(LDR) Footprint
изображение изображение

In LDR path, I've changed encoding from packR11G11B10E4F to packUnorm4x8, as image is already tonemapped and in gamma-space.
In HDR path I'm using packR11G11B10F, to math frame buffer format.

HDR luminance resource, but currently it's not used, since there is a problem

Computing luminance in flight via CMAA2_SUPPORT_HDR_COLOR_RANGE==1 seem to work.

At this point I'm thing to make HDR as default and only path, since it fits better into existing pipeline (0.06ms performance saved on RTX, compare to LDR!) and some VRAM bandwidth savings.

@KirillAlekseeenko let me know what you think about it.

@KirillAlekseeenko
Copy link
Contributor Author

KirillAlekseeenko commented Aug 5, 2024

I wanted to use HDR luminance but encountered cases where HDR is much darker than LDR like on the shots below. So finally decided to stay with LDR. The game was run without cmd args except for -aa, colors on screens are in [0, 1] range
HDR:
HDR
LDR:
LDR

@Try
Copy link
Owner

Try commented Aug 5, 2024

I wanted to use HDR luminance but encountered cases where HDR is much darker

It seems that tonemapping been applied twice or not applied at all on you image.

Here same spot on my working branch (9:00am):
изображение

@KirillAlekseeenko
Copy link
Contributor Author

KirillAlekseeenko commented Aug 6, 2024

There's an input to tonemapping pass and sky pixel history. I didnt look deeply into this but for some reason after sky and fog passes color values are around 0.1. Could I mess up with game settings or something else (maybe some hidden setting)? Capture is done on PR's branch without local changes.
image

@Try
Copy link
Owner

Try commented Aug 6, 2024

I didnt look deeply into this but for some reason after sky and fog passes color values are around 0.1.

Here how sceneLinear (pre tonemapping) look on my end:
изображение

if we divide it by exposure value in a frame = 0.0000088533, we will get sky brightness = 18 072lum, what is fine for day-time, with clear sky.

Can you check if after my changes (HDR path) rendering works fine?

@KirillAlekseeenko
Copy link
Contributor Author

It works fine but there's two-three times less candidate pixels than with LDR pass, maybe it's worth trying adaptive edge detection threshold based on exposure value

@Try
Copy link
Owner

Try commented Aug 7, 2024

maybe it's worth trying adaptive edge detection threshold based on exposure value

Probably no, at least I don't see how it can be better. Right now input is pre-exposed, close to [0..1] range. Maybe raise cmaa2EdgeThreshold; but I'm not sure how to reason about quality then - as it's gonna be subjective.

@KirillAlekseeenko
Copy link
Contributor Author

Seems that there's no need to raise threshold, I just took a look at nsight trace and it's even faster than fxaa

@Try Try merged commit ff2adde into Try:master Aug 8, 2024
1 check was pending
@Try
Copy link
Owner

Try commented Aug 8, 2024

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants