Rolling Scanline Simulation #16282
Conversation
…re. This is implemented with a scrolling scissor rect rather than in the shader itself, as this is more efficient, although it may not work for every shader pass - we may need an option to exclude certain passes. The implementation simply divides the screen up by the number of sub frames and then moves the scissor rect down the screen over the course of those sub frames.
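For readers following along, here is a minimal sketch of the kind of per-subframe scissor maths described above (illustrative plain C with made-up names, not the actual RetroArch driver code):

```c
/* Sketch: divide the viewport vertically by the number of sub frames and
 * return the scissor rect to keep lit for the current one. Names here are
 * illustrative, not RetroArch's actual driver API. */
#include <stdio.h>

typedef struct { int x, y, width, height; } scissor_rect_t;

static scissor_rect_t rolling_scan_rect(int vp_width, int vp_height,
                                        unsigned current_subframe, /* 0-based */
                                        unsigned total_subframes)
{
   scissor_rect_t rect;
   int band     = vp_height / (int)total_subframes;

   rect.x       = 0;
   rect.width   = vp_width;
   rect.y       = (int)current_subframe * band;
   /* Let the last band absorb any leftover rows so the full height is covered. */
   rect.height  = (current_subframe == total_subframes - 1)
                ? vp_height - rect.y : band;
   return rect;
}

int main(void)
{
   for (unsigned i = 0; i < 4; i++)
   {
      scissor_rect_t r = rolling_scan_rect(1920, 1080, i, 4);
      printf("subframe %u: y=%d height=%d\n", i, r.y, r.height);
   }
   return 0;
}
```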
Hello, First, as the author of the BFI code (past the initial simple 120hz implementation), and the sub-frames, I want to say I obviously, more than most anyone, appreciate BFI and what it can do in general and am always happy to see further improvements. But, I was sort of hoping the sub-frames feature would allow things to go the other way in the RA configuration side, and to be able to remove some of the hacked in functionality in the driver code, instead of further expanding on it. Perhaps even eventually allowing for the hard-coded BFI implementation to be removed at some point, and just allow it all to be handled more elegantly (if less efficiently) via shaders. Also, regarding rolling-scan in particular, for testing the sub-frame shaders I did build a rolling scan BFI shader that I posted in the programming-shaders a little while back around when it was merged, that functions in nearly this same way (though, again, less cycle efficient of course). How much that cycle efficiency matters should probably be measured. Also, at least for my shader implementation, it was just a bit too crude for me to ever consider using it over the full-frame 'standard' BFI. I was mainly using it just to test that subframes were working in general. The lines where the subframe dividing were just very apparent, it was almost like vsync off tearing, even though there was no vertical shift. It's possible this could be improved with some overlap (with brightness adjustment etc), but that's certainly, in my opinion, a job for shaders to try to fine tune and not on the driver side. Anyway to the RA team in general, just giving my thoughts, I'm not going to be angry if this is merged or anything. Oh one more specific request I have if it is merged, for my use for the shaders, current_subframe starts at 1, not 0, for when subframes aren't in use. It looked like the video_info default you were using was starting at 0, and while I didn't see any direct conflicts with the existing sub-frame code.. having it start from a different count could get confusing. |
@Ophidon oh wow, I totally missed your rolling scan shader. I'll have to check the scrollback. I've been working on a rudimentary one myself using sin() to fade the edges and avoid the tearline effect. It works okay at 120 Hz (which is as high as my monitor goes) but it leaves a big space in the middle that's always non-zero (and a steady 0.5 in the middle), which means half the brightness but no benefit to motionblur. Doing the same thing at 180 Hz should be much nicer since it'll end up with a black bar in the middle on every third subframe. But I digress. As for the current PR, I think having a driver-based option is going to open the effect up to more people, as a shockingly large minority of users apparently eschews shaders altogether for a variety of reasons. I do agree that the subframe numbers should be in concert between the driver method and the shaders, though I planned to discuss whether starting at 0 or 1 is the best idea. I also expected that it would start at 1. I also wondered if we might be best-served by combining the 2 separate subframe uniforms into a single vec2 (or higher) using the swizzle to differentiate, which would also allow for future expansion if we come up with another useful value (e.g., a pre-calculated current subframe divided by total subframes, which I find myself calculating a lot when messing with them). Sorry, I'm veering off-topic again. |
Hi, thanks very much for this implementation. I understand there are some things @Ophidon takes issue with. Are there any specific things @Ophidon can think of that could be done to convince you that this PR can be merged?
Hi @Ophidon, hope you're well, and thanks for all the various versions of BFI - it helps with motion clarity immensely. So I'd like to take a step back here: what technically were you hoping to achieve with shader subframes?

The issue of interference (I'd avoid the word tearing, as that, to me at least, is the display of part of one frame and the rest of another) at the sub-frame edges looks to me to be a display-side issue. At least on my display it looks like I get noise at the joint, and it is probably dependent on the panel being used. I have to investigate this further, but I'm not sure you can resolve that particular issue GPU-side, as it might work for one panel but make matters worse for another.

As for implementation, I'm not sure the argument really holds that shader subframes clean up the code, as you're having to fill out the constant buffers, which is just another piece of GPU state that is passed over and really is no different to scissor rect state being set. I do disagree that handling this in the shaders is a more elegant solution, though. It just adds more implementation variations for something that really can only be implemented in a very limited number of ways, and brute forcing it via shaders just seems wrong to me.

EDIT: [STRIKE THROUGH: This statement, on further thought, is probably not true - I leave it here for full transparency, and some points still stand] Ultimately any one pixel has to be displayed fully on for one subframe: any deviation from that either ends up with overly bright or dark areas and/or adds more motion blur over standard BFI. That essentially leaves you with what I've done, or you do an interlacing-type scheme where you divide the screen into more areas vertically and display a number of them per subframe, but that just gets you further away from how a CRT works and makes the interference problem worse. [END STRIKE THROUGH]

I do see that scissor rects add complication to the driver, but I clearly marked the code with #defines, which a lot of additions don't. I think setting GPU state to do something is just part of what a RetroArch driver does, and in this case simulating a CRT is part of that.

What I will say after all that is that at least with my shader, the Sony Megatron, the split falls on a black part of the scanline at 120Hz and so is hidden and is not a problem. However, as you say, I'm not sure any of this is a benefit over standard BFI - it's just technically more accurate to how a CRT works. I see this feature as more of a stepping stone: really it should be done by the display, but we're not in control of that, and so we have to do all these BFI 'hacks'.
Ok so after further thought I can see another way that does involve shaders but, to be efficient, should also use scissor rects. Before I write this, @Ophidon, I haven't seen your rolling scanline shader implementation as I can't find the shader - is it a slang shader? So apologies if this is just repeating what your implementation does.

So above I believe I am probably wrong in saying any one pixel needs to be fully on for one sub frame and fully off in the rest of the sub frames. I can certainly see that as long as the luminance over the frame adds up to 100% you can have any distribution of luminance per pixel over a set of sub frames.

I'm going to implement the following scheme in the Sony Megatron: again I'm going to divide the screen up vertically by the number of sub frames, but instead of doing what is in this pull request I am going to take two consecutive subframes and, for a subframe's particular box, blend linearly from full opacity at the top to full transparency (i.e. black) at the bottom. Then for the previous subframe's box I am going to blend linearly from full transparency down to full opacity. Thus a full 100% luminance is written for every pixel on the screen, with every pixel being lit across the two consecutive subframes.

This scheme has a major issue though, in that we need to retain the previous back buffer to finish off the luminance for the bottom of the previous frame - so more C code for all our drivers (I'm not too concerned about this as long as things are clearly delineated and labelled in the code, BUT I'm not the one having to maintain this project, so I can be like that!). There are a few other downsides compared to this pull request's implementation: a) Ideally we'd still use scissor rects, as at least we can keep the cost down to executing the shader twice at high refresh rates, which is a high concern for a lot of users. b) Shader authors would probably need to add support to their particular shader - it's not a universal solution like this pull request. This could be implemented as a post pass that can be added to any shader preset, though, so maybe that's not too bad, but presets would need to be added/updated and shader parameters added, so there's little getting away from this not being truly universal, again unlike this pull request.

The benefit is that it potentially gets rid of the interference noise and theoretically gets us closer to the rolling scanline that a CRT has.
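A rough sketch of the blend-weight maths behind that scheme (illustrative C only; the real thing would live in a slang pass, and the wrap-around case is exactly the "previous back buffer" complication mentioned above):

```c
/* Sketch of the two-subframe linear blend described above: each vertical
 * band of the screen is lit across two consecutive subframes with weights
 * that sum to 1, so every pixel receives 100% luminance per real frame. */
#include <stdio.h>

/* y: normalised screen position (0 = top, 1 = bottom).
 * Returns the luminance weight a pixel gets during 0-based subframe `sub`. */
static float rolling_blend_weight(float y, unsigned sub, unsigned total)
{
   float band_h  = 1.0f / (float)total;
   unsigned band = (unsigned)(y / band_h);   /* which band owns this pixel   */
   if (band >= total)
      band = total - 1;
   float f = (y - band * band_h) / band_h;   /* 0 at band top, 1 at bottom   */

   if (sub == band)
      return 1.0f - f;              /* the band's own subframe: fade down    */
   if (sub == (band + 1) % total)   /* the following subframe: fade up; the  */
      return f;                     /* wrap to subframe 0 is the "previous   */
                                    /* back buffer" complication             */
   return 0.0f;                     /* black in every other subframe         */
}

int main(void)
{
   /* Pixel 3/4 of the way down the first band, 2 subframes: 0.25 then 0.75. */
   printf("%.2f %.2f\n", rolling_blend_weight(0.375f, 0, 2),
                         rolling_blend_weight(0.375f, 1, 2));
   return 0;
}
```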
Actually we can make this proposal universal! By universal I mean that it works with all shaders without having to modify them and allows users to simply switch it on/off in the main menu. We can make it universal by automatically adding a post pass, like I did for my HDR implementation, i.e. as a built-in shader. Regardless of it being universal, we would (as I said above) have to add support for keeping the previous frame's back buffer around for a frame, so there's driver work and added complication regardless, unless someone has a better idea? You could have this less intrusive pull request as the first implementation and then add this more advanced version later on.
Good Morning, Very thorough analysis. Enough to put me to shame really, hah. I do have some random thoughts at this point though. First when I was thinking about it being 'elegant' trying to just do everything from the shader side, it is mere conceptual stuff that it's a nice flow if: driver takes input image, hands off to shaders for whatever transforms are desired, gets the transformed image back and finishes output. Adding various image transforms, but not others, on the driver side just makes it 'feel' more like spaghetti code to me. This wasn't possible with BFI (or anything else truly temporal) before the sub-frame shaders implementation, so it made sense previously to me there. But, I fully admit the shader side can be significantly less efficient and that I am more immune to the downsides of this with a modern high-end desktop gpu than most. Next, regarding the utility of rolling scan BFI vs the existing implementation. I believe there are actually 2 'real' benefits beyond just the theoretical 'correctness' of doing it the way a CRT does. One possibly is (though this is subjective experience and not truly tested) a significant reduction in eye strain even at the same Hz. The average amount of photons being sent towards your eyes remains relatively level with rolling scan bfi, instead of full image flashes. From my own personal reference, I have a CRT, and IPS screen I use at 180hz, and a new oled at 360hz. In order of the amount of time it takes me to feel any level of eyestrain, the CRT is the longest, the IPS the second, and the oled the shortest, with 'normal' full-frame bfi in use on both of the latter screens. My thoughts on the difference between the IPS and the OLED is that software bfi automatically does a natural level of rolling scan just because that's still how the pixel scanout on the screen is occurring, it's just going at 180/360hz instead of the preferred 60hz that would always keep a section of the screen lit. Also for the IPS, there is markedly longer rise/fall time on the pixels perhaps getting closer to some part of the screen being lit continually, and further the backlight is remaining constant regardless (this is probably the bigger factor). So oled is the screen type that could benefit the very most from a good rolling scan implementation, in my current opinion. The second possible 'real' benefit for rolling scan over full-frame bfi is also mainly for OLED with its lower peak brightness. Rolling scan only lighting up a smaller portion of the screen at a time as Hz gets higher and higher, should allow it to take advantage of peak brightness window limits to compensate for the brightness loss that would otherwise reach unusable levels even in a dark room. I don't know, however, how well the screen algorithms work for such quickly changing lit and completely dark areas, if they lag even a single sub-frame behind, well.... On the last subject of your current implementation change ideas.. needing access to previous frames is actually another point in favor of the shader-side implementation isn't it? Via these that already exist: OriginalHistory#: This accesses the input # frames back in time. There is no limit on #, except larger numbers will consume more VRAM. OriginalHistory0 is an alias for Original, OriginalHistory1 is the previous frame and so on. PassFeedback#: This accesses PassOutput# from the previous frame. Any pass can read the feedback of any feedback, since it is causal. PassFeedback# will typically be aliased to a more readable value. 
In general, yet another point in favor of the shader side in my opinion is that, especially if we want to try to hide the rolling scan interference lines in a black area or apply HDR specifically for BFI brightness loss compensation, it is good for the shaders to be able to know whether these are enabled or not. They easily can if it's just part of the preset, but I don't think they can otherwise, unless we send yet more uniforms to flag it. One more neat thing about shaders having control of BFI is that it can be applied to only the 'real' part of the screen when bezels or other borders are in use.

Annnnyway, I'm fine with wherever we go from here. Motion clarity getting any coding attention at all, after all the dismal 60Hz sample-and-hold LCD years that it drove me crazy so many people were blind to, is welcome to me. My one real change request remains just making current_subframe start from 1 instead of 0, if you still use that, to match how it's being sent to the shaders. :)
Great points on the additional benefits of a rolling scan - I never thought about the fact it could help with peak brightness because the window size is much smaller. This might be a real win and, come to think of it, is already a benefit that can be had with this pull request - it might be why it feels brighter on my screen with the Megatron - I don't know, I'd need a colorimeter to tell.

So with regards to the shader-side implementation you're talking about: the driver has changes in it to support the original history, pass feedback etc., so it's not really true that it's 'shader only' as such - much like passing in values via the constant buffer, you're still having to change the driver and add support. All we're doing here is adding more scissor rect functionality, but maybe we can clean this up a little by setting a single variable that all sites can test for subframes being on, plus all the other various edge cases that you've caught.

I've still got to prove out my theory for the 'next gen' proposal, as I need to convince myself that this will work without areas being dim, but it's even better that I might not have to write too much more to get this working because of the original history (I'm not sure pass feedback would be relevant for this situation - I think! not 100% sure).

As for current_subframe, sure, I can change it to start from 1, but I'm going to have to change the math elsewhere to turn it into a proper index by subtracting 1 off of it - it's not too bad admittedly, just a bit unintuitive to most coders. What was the reason for starting it at 1 rather than the much more common 0? I do agree both values should match though.
Admittedly starting at 1 over 0 isn't something I put a -ton- of thought into, but I did some. It started with knowing that, as hunterk mentioned above, the most common thing that would be done with these values on the shader side would be dividing CurrentSubFrame by TotalSubFrames for their ratio. And knowing that, I didn't want TotalSubFrames to ever be 0. I don't know how shaders handle div by 0, so I just did it to be safe. I'm not actually a slang shader expert, I just knew my existing bfi code could be retrofitted to allow shaders to do whatever they wanted with the extra frames. Further, I wanted the default ratio to come to 1, as in 'I need to handle this whole real frame interval right now, not a subsection of it' when either the subframe setting is disabled OR the menu is up/core is in ff, or paused, etc. Thus I made CurrentSubFrame start at 1 as well, so that would be true. |
heh, those are actually really good justifications for starting at 1. I hadn't considered them.
Yes, so when writing a shader I'd normally want to start at the top of the screen on the first sub frame: with 2 subframes, 0 / 2 = 0, and then on the second subframe I'd like to start half way down, 1 / 2 = 0.5; you can then easily add 1 to the current subframe in both cases to get the bottom of the area. If you instead start at 1, I'm now essentially getting the bottom of the area. This is fine, as we can just subtract 1 from it to get the top - it's just something most shader writers would find unintuitive, starting from the bottom of an area/screen and working upwards. It's a bit odd is all and will catch out people who aren't expecting it.

So why would TotalSubFrames ever be 0? It should never be set to 0, as that would be no frames at all, right?

But when the sub frame setting is disabled, surely the total number of frames is 1 and the current frame is 0, and so we're ok in our shader, as we're now just working over the total area of the screen from top to bottom with whatever code we've written? Same goes for the menu etc. I'm not sure I'm following the reason here, I'm afraid - maybe I need to think about it.
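For anyone wanting to see the two conventions side by side, here is a tiny illustrative calculation (plain C, not code from this PR) using the 1-based counting being proposed:

```c
/* Tiny illustration of the band maths with 1-based CurrentSubFrame /
 * TotalSubFrames: current/total gives the *bottom* of the band, so a shader
 * subtracts one to get the top. */
#include <stdio.h>

int main(void)
{
   unsigned total = 2;
   for (unsigned current = 1; current <= total; current++)   /* 1-based counting */
   {
      float top    = (float)(current - 1) / (float)total;
      float bottom = (float)current       / (float)total;
      printf("subframe %u of %u covers %.2f .. %.2f of the screen height\n",
             current, total, top, bottom);
   }
   /* With subframes off (current = total = 1) the ratio is 1.0, i.e.
    * "handle the whole frame", which is the default the thread settles on. */
   return 0;
}
```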
So I just tried a test version of my second proposal, with a whole-screen gradient from full brightness down to black that is then reversed in the next subframe, and the result is not good. Basically in the center of the screen (when using 2 subframes), where we're in the middle of the gradients for both sub frames (in my test version), you get the darkness of BFI but then suffer the motion blur of a standard screen, because there are no totally black pixel frames.

Possibly this technique might be good for 180Hz screens (ideally 240Hz) and above, as there would be a totally black pixel for at least one of the sub frames, i.e. in the middle of the gradient (the middle of a subframe area) it would be 50%, 50%, 0% over the three subframes. This has a darkness issue though, as essentially out of three subframes the pixel is only on for one of them on average. Same goes for 240Hz: the pixel is only on for one frame out of 4 - essentially you keep needing the screen brightness to be searingly brighter as you go up in 60Hz steps.

As such, as it stands this pull request looks about as good as it gets for a rolling scan at 120Hz - it offers a possibly slightly brighter screen over standard BFI, because only half the screen is lit in any one subframe, but has the downside of the interference at the intersection, which is luckily covered by my shader, but we're not so lucky in other situations.
TotalSubFrames could have been 0 if you don't conceptualize the first and therefore 'real' frame to be a member of the following 'sub-frames'. But as doing it the other way, counting that original frame as a sub-frame, led to not needing to worry about div by 0, the choice was easy of course. The same is true, about it just being a matter of conceptualization, for why currentsubframe started at 1. To me, having the ratio of current/total be the 'up to' point made the most sense. So 1/1 = 1 = the whole frame, for the default scenario, made the most sense. And as to the first in series starting at 0 or 1, meh, neither bothers me, so I went with what got me a ratio of 1 instead of 0 as the default for that reason. If it was an actual array index, or something similar, that would have been different and I definitely would have started at 0. If you want me to test things for the rolling scan, I can at 120/180/240/360hz. I dont think I can at 300, as the 360hz 1440p oled uses DSC which apparently disallows creation of custom resolutions, a fact that would have been useful to know before I purchased it. -_- |
heh, yes, this is exactly what I was describing with my stab at it. The darkness of BFI with the motionblur of no BFI 😅 I used sin() to get the gradient so it would hopefully scale gracefully to other refresh rates, but at 120 it's identical to multiplying by texcoord.y (or 1.0 - texcoord.y).
Regarding my quote above here, keep in mind that directly relating this current/total subframe ratio to a vertical slice is true for rolling scan bfi, but isn't for a lot of other possible uses of sub-frames, so it has to be more generic in meaning. Which was part of the consideration of making the default 'off' values give a ratio of 1 with the meaning of 'handle it all' not just specifically, as in the case of rolling scan bfi, 'to the top of the screen'. I'm hoping there's plenty of sub-frame uses no-one has thought of yet. One thing I'm currently interested in seeing is if, at least for 2d, the sprite and background layer(s) from some cores could be sent separately to the shaders to be able to make 'smarter' motion interpolation/frame generation with considerably less artifacting. I think overall that's quite enough talk about a previous PR instead of this one and a fairly inconsequential 0 vs 1 now though, lol. As for this feature, I am quite willing to test how it looks at higher hz and on various screen types, but if issues with the line(s) remain, I think it's just regrettably an undercooked feature at the moment. And keep in mind I say this as a fan of the concept because of the benefits of it working (well) I mentioned before. I definitely don't think we can rely on being 'lucky' that any given applied shader covers it, especially considering that done this way the shaders currently have no way to know the feature is on, and thus be able to try to consciously adjust for it. And as for the higher hz pixel cycles like 50% 50% 0%, I believe hunterk is correct that you're getting the full brightness reduction downside of BFI without the corresponding clarity, a terrible tradeoff. :/ At 180hz, for instance, you get ~66% brightness reduction and ~66% blur reduction with a 100-0-0 cycle, whereas the 50-50-0 cycle still gets you ~66% brightness reduction, but only ~33% blur reduction. :/ And keep in mind I have updated the 'standard' full-screen bfi fairly recently to have brightness/clarity tradeoff choices now at higher hz as well, so it has an advantage there still too. 180hz can do 100-0-0 or 100-100-0. 240hz can do 100-0-0-0, 100-100-0-0, 100-100-100-0, and so on. |
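To sanity-check those numbers, here is a tiny illustrative calculation (plain C, not RetroArch code); it assumes, per the reasoning above, that blur reduction simply tracks the fraction of the frame a pixel is lit at all:

```c
/* Illustrative check of the 180 Hz cadences discussed above: average
 * brightness is the mean of the per-subframe levels, while motion blur
 * tracks how long the pixel is lit at all (its persistence). */
#include <stdio.h>
#include <stddef.h>

static void report(const char *name, const float *levels, size_t count)
{
   float brightness = 0.0f, lit = 0.0f;
   for (size_t i = 0; i < count; i++)
   {
      brightness += levels[i];
      if (levels[i] > 0.0f)
         lit += 1.0f;
   }
   printf("%s: average brightness %.0f%%, lit for %.0f%% of the frame\n",
          name, 100.0f * brightness / count, 100.0f * lit / count);
}

int main(void)
{
   const float cadence_100_0_0[] = { 1.0f, 0.0f, 0.0f };
   const float cadence_50_50_0[] = { 0.5f, 0.5f, 0.0f };
   report("100-0-0", cadence_100_0_0, 3);   /* ~33% brightness, ~33% persistence */
   report("50-50-0", cadence_50_50_0, 3);   /* ~33% brightness, ~67% persistence */
   return 0;
}
```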
Ok so I'll change it to start at 1, but you're going to get complaints from others. It's generic and applicable to every situation that 'total' is a count and 'current' is an index - conceptually it is an array of sub frames contained inside a frame.

As for my test results: yes, that's what I said above, it's not good. I'm not sure there are any other options for shaders to take on this, as the interference is a display-side problem and can't be fixed by a GPU that has no knowledge of how the display works. It's probably a temporal algorithm on the display side causing it as well. We really are limited in options, as my initial statement that I struck through above does in fact appear to be true, i.e. you have to have pixels fully on or fully off.

One last thing to try from my point of view is to see what happens in the worst case scenario for an interlaced scheme, i.e. instead of one rect per sub frame as we have here, we use many. The worst case is then just alternating lines on and off for each sub frame, like interlacing of old but within a frame (and I suppose one interlaced frame is black). I'd like to see what happens to the interference in this case to fully see what we're up against.

One last point is that it feels as if you see this pull request as limiting solutions provided by shader writers. Just to clarify, it doesn't - shader authors can still try to fix the interference issue themselves with this on or off. All we're doing here is limiting the area the shader draws to, but it's still executed, and authors can still tell where the scissor rect is and where they are drawing to - it's just an additional help to them.
Just to also add: I will add support for 180Hz doing 100, 100, 0 and 240Hz doing 100, 100, 100, 0, etc. I'll just add an option for when 3 subframes and above are used and this rolling simulation is turned on. The other thing to add eventually would be to drive the scissor rect from the .slangp - at first whether it's enabled/disabled for a given pass, then to determine where it is, and then to determine where it is over subframes. Quite a bit of work though for that...
Yeah, I think we're quite limited on options here - it does feel like BFI is basically the only way from a GPU perspective (or relying on shaders to hide the interference as the Megatron does). If someone could give us a programmable display then we'd be cooking!
I know it isn't limiting anything; I'd have a MUCH bigger issue with merging it if it did. ;) Even in my very first post I said I was ok with it being merged, I just wasn't fully convinced it was necessary to do it driver side. But whatever, no harm no foul since, as you say, it isn't removing any ability to do it the other way.

If I've slipped a bit farther from feeling it's mergeable, it's not because I feel you've done any sort of poor job coding it, or even any disagreement over driver/shader side; it's that I'm (regrettably) starting to agree there is indeed perhaps some display-side problem making this a nearly unsolvable problem on our end. Full-frame BFI has a well-known display-side issue with voltage retention on a lot of LCDs at 120Hz. But thankfully there is also a rapidly growing segment of displays that don't have that problem at 120Hz (OLEDs), and any higher Hz multiple than 120Hz can also be made to work without issue on LCDs. For this rolling scan issue... if there is a display that is immune to it, I haven't found it yet, and I've also now tried it on my VA television at 120Hz and pulled a TN panel out of the closet to test that too, so that's literally one of every current non-CRT display type in use that I've tried it on.

So, what I wouldn't want, more than any of the rest, is for the setting to look so poor for users (either via the interference lines, or a brightness loss that doesn't get a commensurate clarity gain in trade) that they just immediately say 'woah, BFI is awful, nobody should ever use that!' I wouldn't be a fan of a nearly pure 'trap' setting being available, so to speak. And don't forget, even with the Megatron shader you're testing with that might be luckily hiding it when you only have one line at 120Hz, how about at 360Hz like I can run it... that's a lot more joints to keep track of, in different locations.
Try this for your implementation, maybe without any shader on that is hiding the joint. Keep your screen still and your eyes as still as you can.. for me in that scenario none of the 'interference' is really apparent. Move your eyes up and down rapidly though, and it becomes super apparent. Which maybe means it's not a display thing, so much as an optical stroboscopic effect issue?! CRT doesn't have apparent joints of course with rolling scan, nor have I heard of it with the lg oled tvs that have built in 60hz bfi (which also is rolling scan). So, if that's really what's going on, I hope it just doesn't require literally being down to a line or 2 being 'rolled' at a time to avoid the effect, which would require utterly ridiculous Hz to emulate. |
I was just reading a thread on blurbusters' forum about checkerboard/interlacing/noise BFI and he/they said all of those strategies look worse. I was playing around with monitor-pixel-scale checkerboarding and it indeed looked weird in a different way lol. I couldn't tell whether the motion was any better, and to my eyes it looked choppier (possibly an optical illusion). Similarly, he suggested there's no free lunch when it comes to brightness vs motion blur reduction: 50% brightness reduction (i.e., half on, half off) = 50% blur improvement; 25% brightness reduction (i.e., on, on, on, off) = 25% blur improvement. He mentioned rolling scan as a partial outlier, but said that you need blended edges to avoid artifacts (as we've seen/experienced). However, it also seems clear that any area that is not completely black at least part of the time is not really going to get any motion blur improvement.
Yeah a lot of my existing BFI implementation comes directly from his theory as it is now. I talked directly with him a good bit when doing the initial changes for allowing any multiple of 60hz instead of just 120hz, back in 2020 or whenever it was. And yeah, there is absolutely a direct relationship between full black period and motion clarity, not just reduced brightness. Aka 100-0-0 being heavily superior to 50-50-0 for clarity but equal for avg brightness as I mentioned a bit ago which is a horrendous deal. If avg brightness itself was a factor for clarity, you could just turn down regular sample and hold brightness and we wouldn't need to bother with this annoying flicker at all. :) One thing I question though is how he means to avoid artifacts for rolling scan bfi with blending edges. But that would depend on his definition of 'blending' I suppose. Best case scenario, unless we're missing something, I just see the 'interference' lines getting replaced with small strips of the screen that have lower motion clarity than the surrounding areas via some blending or subframe overlap method. And that lower motion clarity strip would be a reasonably apparent 'artifact' in itself. Slightly more tolerable than the interference lines perhaps... but I dont see who would ever pick it over the existing implementation that has no visual artifacts at all. |
Ok so I've got a lot to say here, so I'll chunk it up a bit. Firstly, BFI and shader subframes: let's not merge these two techniques/features in the future. From a user experience perspective they are two separate features in the menu currently. From a technical perspective they should also be separate: the original implementation of BFI is perfect - on the first subframe we execute the shader and on the subsequent sub frames we simply clear the screen (you could argue we need an option to clear to non-zero black, but I digress).

Clearing the screen is a highly optimised operation and will be much, much faster than anything a shader does, as it has hardware support (most GPUs will not write to the surface but instead set metadata signalling it's been cleared to a value held in a register). This is really important for a number of reasons, not least that it's more power efficient. That means the battery on your phone doesn't run out as fast and less heat needs to be dissipated. Less heat dissipated means screens on a mobile device can run brighter and CPUs and GPUs can run faster and more efficiently - the latter being critical for low-end devices such as the Raspberry Pi. The Pi 5 is a key milestone for RetroArch shaders, HDR and BFI, but this applies to laptops, phones, consoles etc. that RetroArch is used on. Efficiency is king, and for those same reasons the new lit BFI feature should really copy the first frame rather than execute the entire chain of passes again. That is of less concern atm, given fewer low-end devices support refresh rates above 120Hz - maybe the Pi 5 is the exception - but future proofing is good.

Personally I use RetroArch on my mobile more than any other device, so this kind of stuff is important to me - I'm looking into adding 120Hz support to RetroArch on Android, which my phone does provide, and I'd like to use BFI with it. So shader subframes is a different feature to me, as it's a different implementation as has been done, so we're good - let's just kill the idea of potentially merging it and BFI. I'll cover the meat of the above later on when I get a moment.
Is that Android device using an OLED screen? Otherwise you are fairly likely (but not guaranteed, according to some reports - it's all down to how the screen's algorithm works) to run into the 120Hz voltage retention issue. If you are talking about a 120Hz LCD, there is a known workaround to combat that voltage retention as well. I haven't officially implemented it, because it DOES cause visual artifacting of its own at a level I find too annoying for it to be a 'true' solution, but that can be subjective. What you have to do is, at some user-defined rate that falls before the image retention becomes noticeable (which for my screen was around every 20s), hold either an 'on' or 'off' frame for a double beat (which will cause a quite noticeable intentional flicker). So the injected stutter pattern would go like on-off-on-off-off-on-off-on-off.

I played around with it, but I also just felt it was close to a 'trap' setting, like rolling scan would be without better solutions to its issues. Anyone who truly cares about BFI would imho be better served getting a screen where that issue just doesn't exist than dealing with that hacky half-solution. Which should be possible in the mobile space too, thanks to OLED. If you want to implement it yourself though, feel free - only as an -option- at 120Hz of course, not forced, since not all screens are affected. Technically 240Hz doing on-off-off-off is (somewhat less) susceptible too, but that can be handled just by using 240Hz at on-on-off-off instead (which it will default to now with the dark frames settings). All odd multiples like 180Hz, 300Hz are completely immune at any BFI setting.
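A minimal sketch of that injected double-beat cadence (illustrative C, not the actual BFI code; the stutter interval is shortened here purely so the printed pattern is visible, in practice it would be on the order of 20 seconds' worth of refreshes):

```c
/* Sketch of the 120 Hz "phase swap" workaround described above: alternate
 * visible/black frames as usual, but periodically hold the black frame for a
 * double beat so visible frames flip to the opposite LCD voltage phase. */
#include <stdio.h>
#include <stdbool.h>

int main(void)
{
   const unsigned double_beat_every = 4;  /* demo value; ~refresh_hz * 20 s in practice */
   bool visible = true;
   unsigned since_swap = 0;

   for (unsigned refresh = 0; refresh < 12; refresh++)
   {
      printf(visible ? "on " : "off ");
      since_swap++;

      if (!visible && since_swap >= double_beat_every)
      {
         since_swap = 0;   /* hold black one extra refresh: the phase flips */
         continue;
      }
      visible = !visible;
   }
   printf("\n");   /* prints: on off on off off on off on off off on off */
   return 0;
}
```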
Yes, this is my hypothesis as well. Unless all pixels are full-black for at least some amount of time, the ones that just get turned down will have the same reduced brightness but no blur benefit, and if you blend the edges beyond that full-black area, those blended areas will be that much darker over time. So yeah, I'm not really sure how it's supposed to be superior, unless the reduced-brightness-with-no-blur-benefit strips are just considered the cost of doing business.
Ok so I've been doing some more experimenting and I think I have some kind of a solution, but I've still got to fully crack the problem - I can get it to work in specific scenarios but need to do more tests for a general solution. I'm a bit hampered by my display not liking 120Hz at all. Anyway, bear with me.
Sorry, I've been away and haven't had time to experiment further over the weekend, but my experiment was essentially about how much of a strip you need to hide the 'interference', and in my limited case of using the Megatron I only needed a single pixel either side of the dividing line, so not much. We can easily combine this with aligning the split to the dark lines between scanlines - we always know it's upscaling and we always know where the points between scanlines are. We don't know, however, whether the shader will have a dip in luminance between scanlines. But a drop in motion clarity for two pixels might not be that bad.

One big issue I'm finding is that shader subframes for some reason look to be causing a split, in that the top half of the screen lags behind the bottom half. This is really noticeable in my favourite motion blur scenario: the 1st level of Dynamite Headdy on the Mega Drive. Can someone else see if they can reproduce this effect? It's really strange and might be a bug on my end.
The problem with hiding it behind shader scanlines is, one, you need to do it for all screen resolution, frequency and shader combos, and two, not everybody uses or prefers CRT-style shaders anyway. And even for those that do (which is me), I'm actually somewhat reluctant to use it on the OLED screen for uneven-wear precaution reasons.

As for the top and bottom split you refer to... that's why I originally said my rolling scan BFI implementation looked kinda like vsync off, even though I certainly didn't have it off (nor would I recommend that at all for subframe use). I tested it by displaying -only- any given subframe, and it displayed the correct portion of the screen. So at least for my implementation, at up to 360Hz, it was correctly starting from the top and ending at the bottom for any given real frame, and if it wasn't displaying the correct number of subframes for each 'real' frame, the emulation speed would be way off, like if you set subframes for a Hz that doesn't match your actual display Hz.

If you just mean subframes being on in general, though, outside of trying to implement rolling scan BFI: no, my normal full-frame BFI implementation works with zero visual artifacts through subframes as far as I can see. And without any subframe-aware shader active, subframes being active just function essentially as a higher swap interval with more overhead (not in Vulkan though; Vulkan already used a very close equivalent of subframes to emulate swap interval, as apparently Vulkan doesn't support a real swap interval), but with no artifacts either, for me. Make certain that you do have vsync on in RA and at least not force-disabled in your OS settings, that RA is configured to the correct refresh rate your screen is actually running at, and that the subframe setting level you chose is correct for that Hz.
Hi, ok so after a trip away and a bout of COVID I managed to get time to take a look at this again. I think the whole rolling scanline thing, implemented GPU side, is a no-go. The reason: there is an optical illusion where we get a distinct tear through the middle of the screen when scrolling. This isn't the 'interference' around the split line, although it could be related. A good example of what I'm talking about is the first stage of Dynamite Headdy on the Mega Drive - look at the trees/bushes in the foreground (but it happens in all scrolling games, or with any movement if it's fast enough). If you either implement a shader that simply displays the top half of the screen on one sub frame and the bottom half on the other, OR just turn on this PR (and turn off shaders), then you can see the obvious tear, as if the top half is ahead of the bottom half.

If I go into PIX for Windows and capture multiple frames I can see that there is no tear and the frames are rendered how I would imagine them to be, but I definitely perceive a tear. I think this is an optical illusion and there isn't any getting around it, as I can't see it in the characters - Dynamite Headdy or the big red robot - that aren't moving relative to the screen and so don't have motion blur. I think what is happening is that the eye is seeing the blur split in two, but I'm not sure. You can mitigate this issue with a big thick transition bar of about 20 pixels in size, as the blur covers the tears, but you can still kind of see it transitioning (like a gradient).

I currently think the only way to implement a rolling scanline is to do it display side and have it much more like how a CRT does it, with a continuous scan down the screen within a frame. I don't think any attainable Hz is going to do it - 1000Hz etc. Possibly 240x60Hz (14400Hz) might, i.e. a single scanline for every frame at 60Hz. It'd be good to see if anybody else can repro this or has seen it themselves, and this isn't some quirk of my setup - you can just use this PR or write a very simple shader (I can post one here if needs be).
Overall I think the whole shader subframe feature is currently a feature without any utility. If you can't do scanline simulation of any sort without major artifacts, then what else would this feature be used for? The only other thing in the vicinity of this is interlacing, but that doesn't require sub frame functionality, as the console outputs it via whole frames. It might be an idea to at the very least hide the option behind developer mode or something, as it's just another option for end users to get confused about unless there is some utility to it.
I HIGHLY disagree that subframes are 'without utility'. Frame motion interpolation would be a gargantuan use case when smartly implemented. I actually have an idea of how it could be almost as good for 2D games as DLSS 3 is for modern games that hand off motion vectors, by sending the sprite and background layer(s) separately to the shaders where possible. Accomplishing that, though, will probably require someone other than me. For comparison, Lossless Scaling, as available through Steam, can apply motion interpolation up to 120Hz for 60Hz content right now for a fairly minimal input lag increase, and it's more usable without the advantages of actual DLSS 3 than I ever would have thought. However, it actually works better, with less artifacting, in modern games than it does for older, simpler games via RA. An algorithm designed specifically for older games that could be implemented straight into RA, and also with the ability to split layers? That could produce some pretty incredible results.

There are other uses even with just regular full-frame BFI, where it can be applied to just the 'real' part of the frame when using borders or bezels, which can't be done with the built-in BFI implementation. Doing a 120Hz voltage-retention-safe version of BFI with intentional occasional stutter, which I discussed earlier, is also a good use case, as having that feature more 'hidden' behind needing to activate the appropriate shader IS probably better than having a front-end option that causes intentional, annoying artifacting when activated. Increasing the framerate of the bezel reflections in corresponding shaders, for people with the horsepower to spare, would be a smaller use. And, also very importantly, it is entirely possible someone else will think of some other interesting use we haven't even considered yet. But if it's hidden behind dev options to the point that even most shader authors won't remember it exists, let alone end users, the odds of any of those being explored become minimal.
Yeah, I think there are other uses. I haven't tried it yet, but I suspect having noise that ticks at a higher than 60 Hz rate could be good for a lot of stuff.
@Ophidon Can we go ahead and merge this at this point? I'm fine with any further discussions on this subject, but I'm not sure they should hold back the merging of this PR as it currently stands.
I am fine whether it is or isn't. I believe - and @MajorPainTheCactus can correct me if I'm wrong - that as it currently stands we're all in agreement there are too-significant visual artifacting flaws with emulated rolling-scan BFI, whether implemented via this PR or via a shader, for it to be an actually useful feature. The current consensus is that it's also very likely a hardware limitation that can't be overcome without reaching utterly insane Hz (aka not even a possible 1kHz we might reach within the decade would do it). Maybe the BlurBusters guy could assist with something we've overlooked. It's aggravating, because if the issues could be solved I think we also all agree it would have some nice benefits over the 'standard' full-frame BFI as it works now. But I'll leave any decision to merge or not, with the perhaps unsolvable flaws, up to you and @MajorPainTheCactus.
Well I think the idea is that we can always build upon this in future PRs. It doesn't have to be perfect on the first go. After merging this PR, let's start a new GitHub issue where we bring in all the stakeholders (BlurBusters, you, MajorPainTheCactus) so we can discuss how to further improve it.
Created a new GitHub issue so discussion can be continued as to where we go from here. I think it is important to keep improving this in steps so that we can prevent a stalemate.
What voltage retention issue are you talking about? Strobing is generally unsafe on LCD displays and it can cause semi-permanent damage (strobing that persists across cold boots etc.). I thought before that we should add a big fat warning and/or only allow it for OLED/CRT.
Yes, you're talking about the same issue. It is caused by voltage accumulation from all the real frames being on a + voltage phase of the refresh rate, while all the black frames are done on the - phase (or vice versa). I've never experienced or heard of a case where it didn't go away after a relatively short while of returning to normal use, but yes it rightly freaks people out. Blurbusters definitely gets all the credit for understanding and explaining that was what was causing the image retention on a display type that by all rights should be immune to such things. It is not, however, 'generally' unsafe for LCDs since we know the cause and also how to avoid it. I have used 180hz+ strobing for thousands of hours on multiple LCD screens without the slightest hint of any image retention or any other issues. I was also a little curious at one point if turning pixels on and off rapidly non-stop like that for such a long timeframe could lead to a higher incidence of stuck pixels, but I have seen zero such evidence in again, thousands of hours. All odd hz multiples are immune to the image retention by default because the displayed frame keeps switching which phase it is using. On(+)/Off(-)/(Off+)/On(-)/Off(+)/Off(-) for 180hz, etc. On the other hand, 240hz and other even multiples are a little more susceptible in theory. But one, not nearly as much as 120hz in the worst case, as every real frame might be on + but there is also a black frame on + between them. And two, after the last BFI updates that added the ability to adjust the number of on/off frames ostensibly for clarity/brightness tradeoff selection, there are adjustable settings (that it defaults to) that will also be fully immune at those hz rates. Ie On(+)/On(-)/Off(+)/Off(-) for 240hz. What I was mentioning in what you quoted is a known solution for the problem even at 120hz. But it requires intentionally stuttering the output occasionally (in my experience for my screens every 20s or so) so that the on and off frames switch phases. But since doing that with bfi causes a very noticeable flicker... it's a suboptimal solution. I'd really just recommend using a different screen that doesn't need that workaround if you really care about motion clarity. But adding a shader that can do a phase shift for those who don't mind the flicker, sure, why not. |
Only seeing this recently. BTW, in fast horizontal scrolling there can be tearing artifacts with rolling scan. You need motion sufficiently fast (about 8 retro-pixels/frame or faster, which produces 2-pixel offsets for a 4-segment sharp-boundary rolling scan). This is fixed by using alphablend overlaps. However, gamma-correcting the overlaps so that all pixels emit the same number of photons is challenging, as is adding fadebehind effects (so that a short-shutter photo of the rolling scan looks more similar to a short-shutter photo of a CRT). And even LCD GtG distorts the alphablend overlaps. So alphablend overlaps work best on OLEDs of a known gamma (with gamma correction applied and ABL disabled). For LCD, sharp-boundary rolling scan is better (while tolerating the tearing artifacts during fast platformers).

Now that said, we know many HDR OLEDs can become brighter if only a few pixels are illuminated per refresh cycle at a time. Then again, we have to keep ABL enabled for HDR brightness boosting, because you can convert SDR into HDR, brightness-boost the pixels to HDR luminances, and use the 25% window size to make the rolling-scan strobe much brighter. This improves a lot if you use 8-segment rolling scan (60fps at 480Hz OLED) to reduce the HDR window size per refresh cycle, allowing HDR OLED to have a much brighter picture during rolling BFI! Also, I have a TestUFO version of rolling scan BFI under development that actually simulates the behavior of a CRT beam more accurately (including the phosphor fadebehind effect). Related: #10757
I spent years trying to convince display manufacturers of this. The beauty of this is that we DIY our own open source version. "Just bring refresh rate" removes display-tech prerequisites; it's display-technology agnostic (generally, within reason). This is the future of software BFI. You have no choice. Therefore, it's an excellent choice, and not a poor choice. Also, the engineering time for an open source universal CRT beam simulator (while high) is far less than the engineering hours at 10 display manufacturers creating 10 independent rolling implementations. And if we had an open source implementation, more display manufacturers might try to incorporate it into their firmwares. Some of us know more about BFI than engineers working on generic Chinese outsourced panels, as an example. It is our humble responsibility. Some algorithms I release are MIT/Apache because I want to incubate both GPL projects and commercial projects.
The nice thing is we know about workarounds nowadays, some of which you already have seen.
For odd-divisor: You can even do it at VRR refresh rates too, e.g. 3 refresh cycle cadence for 60fps at 360Hz, by using two VRR-black-frames between refresh cycles. Due to the laws of physics of VRR displays, you must have a minimum refresh rate of 3x the frame rate. But after that, the MaxHz can be arbitrary and allow you a continuous analog adjustment of variable persistence (that is based on your VRR refresh rate range). Basically, it's still a fixed framerate, except VRR is simply used as a mechanism to have the black-frametimes at an arbitrary number of milliseconds versus visible-frametimes. (The total frametimes of all 3 refreshtime frametimes would still be 1/60sec for 60Hz emulators). That avoids image retention for all 60fps content for ALL 180Hz+ displays (continuum: 181, 182, 183, [...], 239, 240, 241, [...], 299, 300, 301Hz, etc). VRR-BFI means you can do 60fps BFI immune to image retention on ALL 180Hz+ VRR displays (Even if they're 240Hz evenly divisible by 60fps) So the odd-count of refresh cycles solves that, despite the MaxHz being evenly divisible by framerate. The problem is the need for microsecond-accurate VRR frametimes, since a 1% aberration = 1% brightness flicker. So software VRR-BFI is very prone to software-based slight flickers, so put that specific Present() of a prerendered black frame (prerendered! no GPU commands) in a REALTIME thread, for minimum BFI flicker during software VRR-BFI. You can even adjust the amount of display motion blur during VRR-BFI by adjusting the "Visible:black" frametime ratios, much like TestUFO Variable-Persistence BFI, except it's not a digital adjustment -- VRR frametime = analog persistence adjustment! Now that being said, I would skip Blur Busters VRR-BFI trick (unless it's easy to test), and focus on rolling BFI more (because you already have a commit here), since brute refresh rates really are nice. Just make sure you have configurable rolling segmentcount, for adjustable persistence:brightness tradeoff. Much like a slow-persistence CRT that ghosts more versus fast-persistence CRT that ghosts less. |
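A small illustrative calculation of the VRR-BFI frametime split described above (assumed example values, not code from any project): one visible frame plus two black VRR frames per 60 fps emulator frame, with the visible frametime setting the persistence and the black frametimes absorbing the rest so the total stays exactly 1/60 s.

```c
/* Sketch of the VRR-BFI cadence: visible frametime = persistence control,
 * two black frames soak up the remaining time of the 1/60 s cadence. */
#include <stdio.h>

int main(void)
{
   const double frame_s         = 1.0 / 60.0;   /* total cadence per emulator frame */
   const double min_frametime_s = 1.0 / 360.0;  /* display's maximum refresh rate   */
   const double persistence     = 0.40;         /* visible fraction: analog control */

   double visible_s = frame_s * persistence;
   if (visible_s < min_frametime_s)
      visible_s = min_frametime_s;              /* can't present faster than MaxHz  */

   /* Split the remaining dark time across the two black VRR frames. */
   double black_s = (frame_s - visible_s) / 2.0;

   printf("visible %.3f ms, black %.3f ms x2, total %.3f ms\n",
          visible_s * 1000.0, black_s * 1000.0,
          (visible_s + 2.0 * black_s) * 1000.0);
   return 0;
}
```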
Description
Added rolling scanline simulation based on the shader subframe feature. This is implemented with a scrolling scissor rect rather than in the shader itself, as this is more efficient, although it may not work for every shader pass - we may need an option to exclude certain passes. The implementation simply divides the screen up by the number of sub frames and then moves the scissor rect down the screen over the course of those sub frames. The higher the refresh rate, the more accurate the scanline simulation. Implementing a rolling scanline on the GPU is a really poor implementation choice and should instead be done on the display itself, as an entire image needs to be passed over the cable for every subframe, BUT we have no control over displays, so this is the next best thing.