Add Beam Racing/Scanline Sync to RetroArch (aka Lagless VSYNC) #6984
Additional timesaver notes: General Best Practices

Debugging raster problems can be frustrating, so here's knowledge from myself/Calamity/Toni Wilen/Unwinder/etc. These are big timesaver tips:
Hopefully these best practices reduce the amount of hairpulling during frameslice beamracing.

Special Notes
|
$120 Funds Now Added to BountySource

Added $120 to BountySource -- permanent funds, no expiry. https://www.bountysource.com/issues/60853960-lagless-vsync-support-add-beam-racing-to-retroarch

Trigger for BountySource completion:
Minimum refresh rate required: Native refresh rate.

Emulator Support Preferences: Preferably including either NES and/or VICE Commodore 64, but you can choose any two emulators that are easiest to add beam racing to.

Notes: GSYNC/FreeSync-compatible beam racing is nice (works in WinUAE) but not required for this BountySource award; it can be a stretch goal later. Must support at least native refresh rate (e.g. 50Hz, 60Hz), but it would be a bonus to also support multiples thereof (e.g. 100Hz or 120Hz) -- as explained, this is done by automatically cherrypicking which refresh cycles to beamrace (WinUAE-style algorithm or another mutually agreed algorithm).

Effort Assessment

Assessment is that Item 1 will probably require about a thousand-ish lines of code, while Item 3 (modification to individual emulator modules) can be as little as 10 lines or thereabouts. 99% of the beam racing is already implemented by most 8-bit and 16-bit emulators and emulator modules; it's simply the missing 1% (sync between emuraster and realraster) that is somewhat "complicated" to grasp. The goal is to centralize as much of the beam racing complexity as possible and minimize emulator-module work -- and achieve original-machine latencies (e.g. a software emulator with virtually identical latencies to an original machine), which has already been successfully achieved with this technique. Most of the complexity is probably testing/debugging the multiple platforms.

It's Easier Than Expected. Learning/Debugging Is The Hard Part

Toni of WinUAE said it was easier than expected. It's simply the learning that's hard: 90% of your work will be learning how to realtime-beamrace a modern GPU, 10% of your time coding.

BountySource Donation Dollar Match Thru the $360 Level

DOLLAR MATCH CHALLENGE -- Until End of September 2018, I will match dollar-for-dollar all additional donations by other users, up to another $120. Growing my original donation to $240, in addition to $120 of other people's donations = $360 BountySource! EDIT: Dollar match maxed out 2018/07/17 -- I've donated $360 |
How could this possibly be done reliably on desktop OSes (non-hard-realtime) where scheduling latency is random? |
See above. It's already in an emulator. It's already successfully achieved. That's not a problem, thanks to the jittermargin technique.

Look closely at the labels in Frame 3. As long as the Present() occurs with a tearline inside that region, there is NO TEARING, because it's a duplicate frameslice at the screen region that's currently scanning-out onto the video cable. (As people already know, a static image never has a tearline -- tearlines only occur with images in motion.) The jitter margin technique works, is proven, is already implemented, and is already in a downloadable emulator, if you wish to see for yourself. In addition, I also have video proof below.

Remember, I am the inventor of TestUFO.com and founder of BlurBusters.com

If you've seen the UFO or similar tests on any website (RTings, TFTCentral, PCMonitors, etc.), they are likely using one of my display-testing inventions; I've got a peer-reviewed conference paper with NIST.gov, NOKIA, Keltek. So my reputation precedes me, and with that out of the way: I know what I am talking about.

You can adjust the jittermargin to give as much as 16.7ms of error margin (Item 9 of Best Practices above). Error margin with zero artifacts is achieved via jittermargin (duplicate frameslice = no tearline).

Some videos I've created of my Tearline Jedi Demo -- here's YouTube video proof of stable rasters on GeForces/Radeons. And the world's first real-raster cross-platform Kefrens Bars demo (8000 frameslices per second -- 8000 tearlines per second -- way overkill for an emulator -- 100 tearlines per refresh cycle, with 1-pixel-row framebuffers stretched vertically between tearlines). I also intentionally glitch it at the end by moving around a window, demonstrating GPU-processing scheduling interference.

Now, it's much more forgiving for emulators, because the tearlines (that you see in this) are all hidden by the jittermargin technique. Duplicate refresh cycles (and duplicate frameslices / scanline areas) have no tearlines. You just make sure that the emulator raster stays ahead of the real raster, and frameslice new slices onto the screen in between the emuraster & realraster. As long as you keep adding frameslices ahead of the realraster -- no artifacts or tearing show up.

Common beam racing margins with WinUAE are approximately 20-25 scanlines during 10-frameslice operation. So the margin can safely jitter (computer performance problems) without artifacts. If you use 10 frameslices (1/10th screen height) at 60Hz for 240p, that's approximately a 1.67ms jitter margin -- most newer computers can handle that just fine. You can easily increase the jitter margin to almost a full refresh cycle by adding distance between realraster & emuraster -- to give you more time to add new frameslices in between. And even if there was a 1-frame mis-performance (e.g. computer freeze), the only artifact is a brief, sudden reappearance of tearing before it disappears.

Also, check the 360-degree jittermargin technique in Steps 9 and 14 of Best Practices; it can massively expand the jitter margin to a full wraparound refresh cycle's worth:
AND
And single-refresh-cycle beam racing mis-sync artifacts are not really objectionable (an instantaneous one-refresh-cycle reappearance of a tearline, which disappears when the beam racing "catches up" and returns within the jitter margin tolerances).

240p scaled onto 1080p is roughly 4.5 real scanlines per 1 emulator scanline. Obviously, the real raster "register" will increment its scan line number roughly 4.5 times faster. But as you have seen, Tearline Jedi successfully beam-races a Radeon/GeForce on both PC/Mac without a raster register, simply by using existing precision counter offsets. Sure, there's 1-scanline jittering as seen in the YouTube video. But tearing never shows in emulators, because it's 100% fully hidden by the jittermargin technique, making it 100% artifactless even if it is 1ms ahead or 1ms behind (if you've configured those beam racing tolerances, for example -- it can be made an adjustable slider: tighter for superfast more-realtime systems, looser for slower/older systems). But we're only worried about screen-height distance between the two. We merely need to make sure the emuraster is at least 1 frameslice (or more) below the realraster, relative-screen-height-wise -- and we can continue adding VSYNC OFF frameslices in between emu raster and real raster -- creating a tearingless VSYNC OFF mode, because the framebuffer swap (Present() or glutSwapBuffers()) is a duplicate screen area, no pixels changed, so no tearline is visible.

It's conceptually easy to understand once you have the "Eureka" moment. There's already high speed video proof of sub-frame latencies (same-frame response) achieved with this technique, e.g. mid-screen input reads for bottom-of-screen reactions are possible, replicating the original machine's latency (to an error margin of one frameslice). As you can see, the (intentionally visible) rasters in the earlier videos are so stable that they fall within common jittermargin sizes (for intentionally invisible tearlines).

With this, you create a (16.7ms - 1.67ms = 15ms) jitter margin; a quick back-of-envelope sketch of this arithmetic follows below. That means with 10 frameslices and the refresh-cycle-wraparound jitter margin technique, your beamracing can go too fast or too slow within a much wider and much safer 15ms range. Today, Windows scheduling is sub-1ms and Pi scheduling is sub-4ms, so it's not a problem. The necessary accuracy to do realworld beamracing was reached 8-to-10 years ago already. Nobody really did it for emulators because it took someone to apply all the techniques together: (1) understanding how to beamrace a GPU, (2) initially understanding the low-level black box of Present()-to-photons, at least down to the video output port signal level, (3) understanding the techniques to make it very forgiving, and (4) experience with 8-bit era raster interrupts.

In tests, WinUAE beam racing actually worked on a year-2010 desktop with an older GPU, at lower frameslice granularities -- someone also posted screenshots from an older Intel 4000-series GPU laptop in the WinUAE beamracing thread. Zero artifacts; it looked perfectly like VSYNC ON, but virtually lagless (well -- one frameslice's worth of lag). Your question is understandable, but the fantastic new knowledge we all now have compensates totally for it -- a desktop with a GeForce GTX Titan has about ~100x the accuracy margin needed for sub-refresh-latency frameslice beam racing.
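A quick back-of-envelope sketch of the margins mentioned above (basic one-slice margin versus the wraparound margin); the numbers match the 10-frameslice 60Hz example:

```c
/* Sketch: jitter margins for frameslice beam racing. With N frameslices
 * per refresh cycle, the basic safety margin is one frameslice's worth
 * of scanout time; the wraparound technique extends it to nearly a whole
 * refresh cycle minus one frameslice. */
#include <stdio.h>

int main(void) {
    double hz = 60.0;
    int frameslices = 10;
    double refresh_ms = 1000.0 / hz;              /* 16.7 ms at 60 Hz  */
    double slice_ms   = refresh_ms / frameslices; /* 1.67 ms per slice */
    double wrap_ms    = refresh_ms - slice_ms;    /* ~15 ms wraparound */
    printf("basic margin: %.2f ms, wraparound margin: %.2f ms\n",
           slice_ms, wrap_ms);
    return 0;
}
```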
So as a reminder, the accuracy requirements necessary to pull off this technical feat were already met 8-to-10 years ago, and the WinUAE emulator is successfully beamracing on an 8-year-old computer today in tests. I implore you to reread our research (especially the 18-point Best Practices), watch the videos, and view the links, to understand that it is actually quite forgiving thanks to the jittermargin technique. (Bet you are surprised to learn that we are already so far past the rubicon necessary for this reliable accuracy, as long as the Best Practices are followed.) |
BountySource now $140

Someone added $10, so I also added $10. NOTE: I am currently dollar-matching donations (thru the $360 level) until end of September. Contribute to the pot: https://www.bountysource.com/issues/60853960-lagless-vsync-support-add-beam-racing-to-retroarch |
BountySource now $200

Twinphalex added $30, so I also added $30. |
$850 on BountySource

Wow! bparker06 just generously donated $650 to turn this into an $850 bounty (bparker06, if you're reading this, reach out to me, will you? -- mark@blurbusters.com -- and to reconfirm, you were previously aware that I'm currently dollar-matching only up to the BountySource $360 commitment -- thanks!) |
Now $1050 BountySource

I've topped up, and have donated $360 in total. This is now the 32nd-biggest pot on BountySource.com at the moment! |
So..... since this is getting to be serious territory, I might as well post multiple references that may be of interest, to help jumpstart any developers who may want to begin working on this:

Useful Links

Videos of GroovyMAME lagless VSYNC experiment by Calamity:
Screenshots of WinUAE lagless VSYNC running on a laptop with Intel GPU:
Corresponding (older) Blur Busters Forums thread:
Corresponding LibRetro lag investigation thread (beginning at post #628 onwards):

The color-filtered frameslice debug mode (found in WinUAE, plus the GroovyMAME patch) is a good validation method of realtimeness -- visually seeing how close your realraster is to the emuraster. I recommend adding this debugging technique to the RetroArch beam racing module to assist in debugging beam racing.

Minimum Pre-Requisites for Cross-Platform Beam Racing

As a reminder, our research has successfully simplified the minimum system requirements for cross-platform beam racing to just the following three items:
If you can meet (1) and (2) and (3), then no raster register is required. VSYNC OFF tearlines are just rasters, and can be "reliably-enough" controlled (when following the 18-point Best Practices list above) simply as precision-timed Present() or glutSwapBuffers() calls, at time offsets from a VSYNC timestamp corresponding to the predicted scanout position.

Quick Reference Of Available VSYNC Timestamping APIs

While mentioned earlier, I'll resummarize compactly: these "VSYNC timestamp" APIs have suitable accuracies for the "raster-register-less" cross-platform beam racing technique. Make sure to filter any timestamp errors and freezes (missed vsyncs) -- see Best Practices above.
If you 100% strictly focus on the VSYNC timestamp technique, these may be among the only #ifdefs that you need.

Other workarounds for getting VSYNC timestamps in VSYNC OFF mode

As tearlines are just rasters, it's good to know all the relevant APIs if need be. These are optional, but may serve as useful fallbacks (be sure to read Best Practices, e.g. the expensiveness of certain API calls that we've discovered, and some mitigation techniques). Note that it is necessary to use VSYNC OFF to use beam-raced frameslicing. All known platforms (PC, Mac, Linux, Android) have methods of accessing VSYNC OFF. On some platforms, this may interfere with your ability to get VSYNC timestamps. As a workaround, you may have to instead poll the "In VBlank" flag (or busyloop in a separate thread for the bit-state change, and timestamp immediately after) in order to get VSYNC timestamps while in VSYNC OFF mode -- see the sketch below. Here are alternative APIs that help you work around this, if absolutely necessary.
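A minimal sketch of that VBlank-flag polling workaround, assuming a POSIX monotonic clock and a dedicated thread; platform_in_vblank() is a hypothetical wrapper for whatever vblank-status query the OS provides:

```c
/* Sketch: derive VSYNC timestamps while running VSYNC OFF, by polling an
 * "in VBlank" flag from a helper thread and timestamping the rising edge.
 * platform_in_vblank() is hypothetical -- wrap your OS's vblank-status or
 * scanout-position query here. */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

extern bool platform_in_vblank(void);      /* hypothetical per-OS query */
static _Atomic double g_last_vsync_time;   /* seconds, monotonic clock  */

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

void *vsync_listener_thread(void *arg) {
    (void)arg;
    bool was_in_vblank = false;
    for (;;) {
        bool in_vblank = platform_in_vblank();
        if (in_vblank && !was_in_vblank)   /* rising edge = new VBI */
            atomic_store(&g_last_vsync_time, now_seconds());
        was_in_vblank = in_vblank;
        /* Sleep roughly a scanline between polls; hammering the flag
         * can be an expensive driver call (see Best Practices). */
        struct timespec nap = {0, 10000};  /* ~10 microseconds */
        nanosleep(&nap, NULL);
    }
    return NULL;
}
```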
Currently, it seems implementations of get_vblank_timestamp() tend to call drm_calc_vbltimestamp_from_scanoutpos() so you may not need to do this. However, this additional information is provided to help speed up your research when developing for this bounty. |
As you remember, retro_set_raster_poll is supposed to be called every time an emulator module plots a scanline to its internal framebuffer.

retro_set_raster_poll API proposal

As written earlier, retro_set_raster_poll (if added) simply allows the central RetroArch screen rendering code to optionally take an "early peek" at the incompletely-rendered offscreen emulator buffer, every time the emulator module plots a new scanline. That allows the central code to beam-race scanlines (whether tightly or loosely, coarsely or with ultra-zero-latency realtimeness, etc.) onto the screen. It is not limited to frameslice beamracing. By centralizing it into a generic API, the central code (future implementations) can decide how it wants to realtime-stream scanlines onto the screen (bypassing pre-framebuffering). This maximizes future flexibility. A sketch of one possible callback shape follows below.
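To make the proposal concrete, here is one hypothetical shape the callback could take -- the names and parameters are illustrative only, not a finalized libretro API:

```c
/* Hypothetical sketch of the proposed raster poll. The core invokes it
 * after plotting each scanline into its own offscreen framebuffer; the
 * frontend centrally decides what (if anything) to do with it. */
typedef void (*retro_raster_poll_t)(
    const void *framebuffer,  /* incomplete frame, valid up to 'scanline' */
    unsigned    pitch,        /* bytes per framebuffer row                */
    unsigned    scanline,     /* last emulator scanline plotted (0-based) */
    unsigned    total_lines); /* emulator active lines per refresh cycle  */

/* Inside an emulator module's per-scanline renderer:
 *
 *     render_scanline(fb, line);
 *     if (raster_poll)
 *         raster_poll(fb, fb_pitch, line, visible_lines);
 *
 * On platforms without beamracing support, the frontend installs a poll
 * that returns immediately, so the per-line cost is one test and a call. */
```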
The bounty doesn't even ask you to implement all of this -- just 1 technique per 3 platforms (one for PC, one for Mac, one for Linux). The API simply provides flexibility to add other beamracing workflows later. VSYNC OFF frameslicing (essentially tearingless VSYNC OFF / lagless VSYNC ON) is the easiest way to achieve this. Each approach has its pros/cons: some are very forgiving, some are very platform-specific, some are ultra-low-lag, and some work on really old machines. I simply suggest VSYNC OFF frameslice beamracing because it can be implemented in exactly the same way on Windows+Mac+Linux, so it is the easiest. But one realizes there's a lot of flexibility.

The proposed retro_set_raster_poll API call would be called at roughly the horizontal scanrate (excluding VBI scanlines). That means for 480p, the API would be called almost ~31,500 times per second; for 240p, almost ~15,000 times per second. While high, this isn't a problem, because most API calls would be an immediate return for coarse frameslicing. For example, WinUAE defaults to 10 frameslices per refresh cycle -- 600 frameslices per second. So retro_set_raster_poll would simply do nothing (return immediately) until 1/10th of a screen height's worth of emulator scanlines has built up, and then execute. So out of all those tens of thousands of retro_set_raster_poll calls, only 600 per second would be "expensive" if RetroArch is globally configured for 10-frameslice-per-refresh beam racing (1/10th screen of lag due to the beam chase distance between emuraster and realraster). The rest of the calls would simply be immediate returns (e.g. not a framesliceful built up yet).

Some emulator modules only need roughly 10 lines of modification

The complexity is centralized. The emulator module is simply modified (hopefully as little as a 10-line modification for the easiest emulator modules, such as NES) to call retro_set_raster_poll on all platforms. The beam racing complexity is all hidden centrally. Nearly all 8-bit and 16-bit emulator modules already beamrace into their own internal framebuffers; those are the "easy" ones to add the retro_set_raster_poll API to. The bounty only needs 2 emulators to be implemented.

The central code would decide how to beam race (frameslice beam racing would be the most cross-platform method, but it doesn't have to be the only method). Platform doesn't support it yet? Automatically disable beamracing (return immediately from retro_set_raster_poll). Screen rotation doesn't match emulator scan direction? Ditto, return immediately too. Whatever code a platform has implemented for beam racing synchronization (emuraster to realraster) can be hidden centrally. That's part of what the bounty also pays for: add the generic cross-platform API call so the rest of us can have fun adding various kinds of beam-racing possibilities appropriate for specific platforms. Obviously, the initial 3 platforms need to be supported (one for Windows, one for Mac, one for Linux), but the fact that an API gets added means additional platforms can be supported later. The emulators aren't responsible for handling that complexity at all -- from a quick glance, it is only a ~10 line change to NES, for example. No #ifdefs needed in emulator modules! Instead, most of the beam racing sync complexity is centralized. |
Would the behavior need to be adjusted for emulators that momentarily output interlaced content? The SNES can switch from interlaced output to progressive during a vblank. Both NTSC and PAL are actually interlaced signals, and the console is usually just rendering even lines (or is it odd lines? I don't recall now) using a technique commonly referred to as double-strike. |
I don't see why that would matter, the only requirement here is that the core can be run on a per scanline basis, and that the vertical refresh rate is constant and close to the monitor rate. |
I'm still wrapping my head around it, but yeah, now I see it. Interlaced content would be handled internally by the emulator as it already does. |
About Interlaced Modes

No, behaviour doesn't need to be adjusted for interlaced. Interlaced is still 60 temporal images per second -- basically half-fields spaced 1/60 sec apart. Conceptually, it's like a frame that contains only odd scanlines, then a frame containing only even scanlines. You can think of interlaced 480i as the following: T+0/60sec = the 240 odd scanlines, T+1/60sec = the 240 even scanlines, etc.

Since interlacing was designed in the analog era, where scanlines can be arbitrarily vertically positioned anywhere on a CRT tube, 8-bit-era computer/console makers found a creative way to simply overlap the even/odd scanlines instead of offsetting them (between each other) -- via a minor TV signal timing modification -- creating a 240p mode out of 480i. But 240p and 480i still contain exactly 60 temporal images of 240 scanlines apiece, regardless. Note: with VBI, it is sometimes called "525i" instead of "480i".

Terminologically, 480i was often called "30 frames per second", but NTSC/PAL temporal resolution was always 60 fullscreens' worth of scanouts per second, regardless of interlaced or progressive. "Frame" terminology applies when one cycle of full (static-image) resolution is built up. However, motion resolution was always 60, since you can display a completely different image in the second field of 480i -- and sports/soap operas always did that (60 temporal images per second since the ~1930s).

Deinterlacers may use historical information (the past few fields) to "enhance" the current field (i.e. converting 480i into 480p). Often, "bob" deinterlacing is beam racing friendly. For advanced deinterlacing algorithms, what is displayed may be an input-lagged result (e.g. a lookforward deinterlacer that displays the intermediate middle combined result of a 3-frame or 5-frame history -- adding 1 or 2 frames of lag). Beam racing this will still have a lagged result, like any good deinterlacer may have, albeit with slightly less lag (up to 1 frame less). If there's no deinterlacing done (e.g. original interlacing preserved to output), then deinterlacing lag (for lookforward+lookbackward deinterlacers) isn't applicable here.

Emulators typically handle 480i as 60 framebuffers per second. That's the proper way to do it anyway -- whether you do simple bob deinterlacing or any advanced deinterlacing algorithm. I used to work in the home theater industry, being the moderator of the AVSFORUM Home Theater Computers forums, and have worked with vendors (including working for RUNCO as a consultant) on their video processor & scaler products. So I understand my "i" and "p" stuff...

If all these concepts are too complicated, just add it as an additional condition to automatically disable beam racing ("if in interlaced mode instead of progressive mode, disable the laggy deinterlacer or disable beam racing"). Most retro consoles used 240p instead of 480i. Even NTSC 480i (real interlacing) is often handled as 60 framebuffers per second in an emulator, even if some sources used to call it "480i/30" (two temporal fields per frame, offset 1/60sec apart).

Note: One can visually seamlessly enter/exit beamracing on the fly (in real time) -- there might be one tiny microstutter during the enter/exit (1/60sec lag increase/decrease), but that's an acceptable penalty during, say, a screen rotation or a video mode change (most screens take time to catch up on mode changes anyway).
This is accomplished by using one VBI-synchronized full-buffer Present() per refresh (software-based VBI synchronization) instead of mid-frame Present()s (true beam racing) -- e.g. during screen rotation when scanout directions diverge (realworld vs emu scanout), but it could include entering/exiting interlaced mode in the SNES module, if the SNES module is chosen as one of the first two modules to support beam racing as part of the bounty requirements. Remember, you only need to support two emulator modules to claim the bounty. If you choose an SNES module as part of the bounty, then the SNES module would still count towards the bounty even if beamracing was automatically disabled during interlaced mode (if too complex to wrap your head around). For simplicity, supporting beam racing during interlaced modes is not a mandatory requirement for claiming this bounty -- however, it is easy to support or to add later (by a programmer who understands interlacing & deinterlacing). |
Previously, someone (Burnsedia) started working on this BountySource issue until they realized this was a C/C++ project. I'm updating the original post to be clear that this is a C/C++-skilled project. |
@Burnsedia Your past track record on bountysource came to my attention, you marked 5 bounties as "solving", yet all of them are still open. |
Has anyone tried this on NVIDIA 700 or 900 series cards? I have had major issues with these cards and inconsistent timing of the framebuffer. The time at which the framebuffer is actually sampled can vary by as much as half a frame, making racing the beam completely impossible. The problem stems from an oversized video output buffer and also memory compression of some kind. As soon as the active scan starts, the output buffer is filled at an unlimited rate (really fast); this causes the read position in the framebuffer to pull way ahead of the real beam position. The output buffer seems to store compressed pixels: for a screen of mostly solid color, about half a frame can fit in the output buffer; for a screen of incompressible noise, only a small number of lines fit, and the timing is therefore much more normal. This issue has plagued my mind for several years (I gave my 960 away because it bothered me so much), but I have yet to see any other mentions of it. I only post this here now because it's relevant. |
Bountysource increased to $1142. |
Someone should close this issue and apologize to backers. |
Hey, I'd love to contribute some money to the bounty! But I see that it hasn't had anything added since 2018 and I'm feeling hesitant. Is it worth doing? Also, it would be cool to promote it in some way. I'm surprised I don't hear more people talking about it! |
It's still a valid bounty. Most of the funds are mine -- and this bounty will be honored. There was a bit of talk about it in 2018, but it's currently quiet on that front. The buzz can be restarted at pretty much any time, if a small group of us with similar interests starts a buzz campaign about it. Some of us have jobs though, or got affected by the pandemic affecting work, and have to work harder to compensate, etc. But I'm pretty much 100% behind seeing this happen. BTW, the new "240 Hz IPS" monitors are spectacular for RetroArch (even for 60Hz operation). |
I find it so weird that there aren't dozens of devs jumping at the opportunity to implement this... More than 4 years have passed since this ticket was created and still no working implementation?! Huh?! Input lag is one of THE most pressing issues that needs addressing in emulators, and WinUAE has proven that this technique works extremely well in practice. With the "lagless vsync" feature enabled in WinUAE with a frameslice of 4, I really see zero reason to bother with real hardware anymore. Best of all — it works flawlessly with complex shaders! It's a huge game-changer, and I'm quite disappointed that developers of other emulators are so incredibly slow at adopting this brilliant technique. For the record, I don't care about RetroArch at all, otherwise I'd be doing this. But I started nagging the VICE devs about it; their C64 emulator badly needs it (some C64 games are virtually unplayable with the current 60-100ms lag). Might follow my own advice and implement it myself, eventually... |
This bounty is solely for a RetroArch implementation. We also regret that nobody has picked this up yet. We have tried funding it with money, clearly that is not enough. It has to come from the heart from someone passionate enough and capable to do it. |
Yes. WinUAE has led the way, having already implemented this. Someone needs to add retro_set_raster_poll placeholders (see #10758). As a reminder to all -- this technique is truly the only way to organically get original-machine latency in an emulator universally (universal native-machine / FPGA-league latency originality). VSYNC OFF frameslice beam racing is the closest you can get to raster-plotting directly to the front buffer, one row at a time, in real time, in sync with the real-world raster. Same latency as the original machine, to an error margin of 1-2 frameslices (subrefresh segments). Some of the faster GPUs can now exceed 10,000 frameslices per second. We are rapidly approaching an era where we may be able to do full fine-granularity NTSC scanrate too! (1-pixel-tall VSYNC OFF frameslices -- e.g. each pixel row is its own separate tearline.) |
Talked to the VICE people today about it. They're considering it, but some large scale refactorings will come first, which might take years. |
I'd like to at least start implementing some of the auxiliary things which would be needed to get the whole thing going. Thankfully blurbusters provided a lot of documentation and I feel like it should be possible to maybe break up all that has to be done into chunks. If we get some of these chunks done, even without a working implementation the entire thing might not seem so daunting to do. |
As I've mentioned elsewhere, I believe one of the major hurdles for RetroArch/libretro in implementing this is that we typically work in full-frame chunks. That is, the core runs long enough to generate a frame's worth of audio and video, then passes it to the frontend. For this, we'll need to pass along much smaller chunks and sync them more often. I suspect the cores that already use libco to hop between the core and libretro threads are probably going to be the lowest-hanging fruit. IIRC someone (maybe RealNC?) tinkered with this awhile back unsuccessfully, but I don't recall what exactly fell short. |
That's exactly what the retro_set_raster_poll is designed to do. Please look at #10758. I've already addressed this. Several emulators (e.g. NES) already render line-based. We simply need to add callbacks there, and it will be highly configurable for the future, with any one or more of the following:
I actually already spent dozens of hours researching RetroArch's source code. It's simpler than you think. The first step is adding the raster scan line callback to the existing RetroArch callback APIs -- add it to the headers to template it in, even if no module is "activated" yet. Then it is a simple matter of activating one module at a time (on modules that already render line-based). The flow is:
Step 1 is easier than you think, if you have ANY raster interrupt experience at all. Step 2 simply needs to gain some |
☝🏻 I'm 99% sure the answer is similarly simple with VICE. The problem there is more the infrastructure side of things; now it's tightly coupled with GTK and it uses some vsync mechanism provided by GTK (well, the GTK3 version, at least; the SDL one would be easier to hack I assume). Raster interrupts are common on the C64, so it's either already rendering by rasterline internally, or it would be trivial to add (haven't read the source yet). People are vastly overestimating the difficulty of implementing this technique, I think... Okay, maybe in a very generic framework like RA it could be a little bit trickier. |
Here's a TL;DR resummarization of the technique and why it's easier than expected to implement in RetroArch, if done in a staged manner. Any graphics API capable of VSYNC OFF tearlines should technically be beamraceable. The key important discovery is:

VSYNC OFF frameslice beam racing discovered to be relatively platform-independent

Since framebuffer tearing (a raster artifact) is a platform-independent artifact, we piggyback on tearing as our beamracing method, and make tearing invisible with a trick, while achieving subrefresh lag. A Present() followed by a Flush() (or the GL equivalent, glFlush()) returns at the raster-exact time a VSYNC OFF tearline occurs. My tests have shown that frameslice beam racing is pretty much graphics-API independent, provided you meet:

The Four Pre-Requisites

(A) Can get tearing to appear (aka VSYNC OFF)
(B) Can flush all queued graphics calls (aka Flush() in DX or glFlush() in OpenGL)
(C) Has a high-precision tick API such as the RDTSC instruction, micros(), std::chrono::high_resolution_clock(), or an equivalent such as QueryPerformanceCounter()
(D) Has any crude method of estimating the raster as an approximate offset between blanking intervals (e.g. as a time offset between blocking VSYNC ON calls)

Exact Scan Line Number Not Necessary (Zero Artifacts!)

You don't even need to know the exact scanline number -- just estimate it as an offset between refresh cycles. You use (C) to create timestamps from (D), and estimate a raster scanline number from that. The exact moment of the return from Flush() is the exact raster-moment the tearline appears in realtime in the display's scanout (approximately, as a relative time offset between VSYNCs), as per the diagram in the first post.

Precisely Raster-Timing a VSYNC OFF Tearline

(D) can be achieved via several methods. Some platforms have an .InVBlank flag; others use the release of a blocking VSYNC ON call, or monitor the Windows compositor. One method is to run two concurrent framebuffers -- one visible VSYNC OFF framebuffer (for frameslice beam racing), and one invisible VSYNC ON framebuffer (for monitoring blanking intervals to estimate raster offsets from).

Item (C) allows you to precisely time your Present() + Flush() call (or OpenGL equivalent); one can use an ultra-precise busywait to flush at an exact sub-millisecond moment (sometimes achieving accuracy of less than 10 microseconds). You can use a timer right up to ~0.5ms-1ms prior, then busywait the rest of the way, to save a bit of resources -- see the timing sketch below. Timers aren't accurate enough, but busywaits are super-accurate even on a Raspberry Pi.

Exact-Scanline-Number APIs are Optional

Some platforms even let you query an exact current-scanline number (e.g. D3DKMTGetScanLine() ...), which you can use, but you should design cross-platform genericness as a fallback for platforms that don't have a raster scan line number poll. For computing the estimated raster scan line -- Windows, Mac, Linux, and Android already have some indirect methods of doing (D), so that's the only platform-specific part; tearlines are platform-independent, making cross-platform frameslice beam racing possible via OpenGL. Estimating methods are perfectly fine, thanks to the jitter-hiding technique described.

Jitter Safety Margin That Keeps Artifacts 100% Invisible

Obviously, computer performance will jitter. This is solved by frameslicing ahead of the real-world raster, within the error margin, to hide raster-jitter in the VSYNC OFF tearline.
Artifacts stay invisible as long as the performance jitter stays within the temporal duration of one frameslice (e.g. 4 frameslices per 60Hz refresh cycle = ~4ms safety margin before artifacts appear).
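A minimal sketch of that hybrid sleep-then-busywait timing, assuming a POSIX clock; present_and_flush() is a hypothetical wrapper around Present()+Flush() (or glutSwapBuffers()+glFlush()):

```c
/* Sketch: schedule a Present()+Flush() so the tearline lands at a
 * predicted raster offset. Coarse-sleep until ~1ms before the target,
 * then busywait the rest of the way (timers alone aren't accurate
 * enough, but busywaits are, even on a Raspberry Pi). */
#include <time.h>

extern double now_seconds(void);       /* monotonic clock, in seconds  */
extern void present_and_flush(void);   /* hypothetical swap+flush call */

void present_at(double target_time) {
    double remaining = target_time - now_seconds();
    if (remaining > 0.001) {
        /* Coarse phase: sleep until ~1ms before the deadline. */
        double coarse = remaining - 0.001;
        struct timespec nap;
        nap.tv_sec  = (time_t)coarse;
        nap.tv_nsec = (long)((coarse - (double)nap.tv_sec) * 1e9);
        nanosleep(&nap, NULL);
    }
    /* Fine phase: busywait the last stretch for microsecond accuracy. */
    while (now_seconds() < target_time)
        ;  /* spin */
    present_and_flush();  /* tearline appears ~here in the scanout */
}
```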
Essentially, at 4 frameslices, you're simply doing Present()+Flush() (or the OpenGL equivalent, glFlush() ...) four times per 60Hz refresh cycle.

Display Scaling Independent

Scaling-wise, a 60Hz digital refresh cycle (even 4K HDMI/DisplayPort) scans top to bottom at the GPU output level (left-to-right, top-to-bottom digital pixel delivery) at roughly 1:1 physical surface-area sync relative to a VGA output (one 60Hz refresh) / NTSC signal (one 60Hz field) -- ever since 1940s NTSC through 2020s DisplayPort, within less than a ~1ms error margin. Same for 50Hz PAL, with the monitor set to 50Hz. So this is scaling-independent for a non-QFT ordinary 50Hz or 60Hz HDMI / DisplayPort signal. You can output 1080p and pretend the 540th scanline (middle of screen) roughly maps to the raster of the 120th scanline of a 240p signal, or the 240th scanline of a 480i signal. So you can still do emuraster-realraster sync within the error margin quite easily.

Different-Hz Compensation / QFT / VRR Signal Compensation

Yes, it works: WinUAE successfully does this already (e.g. beam racing every other refresh cycle of a 120Hz output signal). It becomes slightly trickier with QFT or VRR signals, or different-Hz signals, but the key is that you can fast-beamrace random refresh cycles, idle the emulator module until the targeted refresh cycle, and then beamrace that specific refresh cycle. So you can do 120Hz, 180Hz, 240Hz. You simply fast-beamrace every 2nd refresh cycle (surge-execute 1/60sec of emulation in 1/120sec bursts, by knowing the refresh rate, knowing the time interval between refresh cycles is 1/120sec, and fast-syncing emuraster with realraster in 2x realtime). To keep the emulator running in "realtime", you only emuraster-realraster beamrace specific output refresh cycles.

Many Specific RetroArch Modules Already Render One Scanline At A Time

Several RetroArch modules already render line-based (most of the pre-GPU-era modules do -- including many MAME modules, the NES module, the SNES module, etc.). For GPU-era modules (e.g. Nintendo 64), just intentionally keep their retro_set_raster_poll blank. You only need to worry about implementing it in the modules that already (repeat: I use the word "ALREADY") render line-based. Those are the easiest to beam race to a real raster.

Questions?

I'm happy to share hundreds of hours of due diligence from helping the WinUAE author (as well as Calamity's GroovyMAME prototype, and Tom Harte's CLK, which also implements variants of this algorithm). The WinUAE implementation is the most mature, being almost any-Hz, any-VRR, any-QFT compatible, as long as the display scanout direction is the same as the emulator scanout direction. |
Even Shorter TL;DR Education For Anybody Remotely Familiar With Raster Interrupts

VSYNC OFF tearlines are just simply rasters, no matter what platform. This is the KEY to cross-platform beam racing. The key "TL;DR" educational image is that VSYNC OFF tearlines are almost raster-exact at the exit of the flush API call (RDTSC timestamped). You're simply presenting your framebuffer repeatedly (at precise subrefresh time intervals) while the EXISTING emulator module is already rasterplotting lines to its own framebuffer. As long as you Present()+Flush() AHEAD of the guesstimated real-world raster of the GPU output jack, the tearline stays INVISIBLE! No artifacts, even with performance jitter. And don't worry about the display type (LCD/CRT/OLED); that's irrelevant to implementing this algorithm. You want 1/180sec input lag relative to the original machine? You want 1/1000sec input lag relative to the original machine? Etc.
(Tested in a C# loop -- a slower language than C++.)

What is the Input Lag?

Input lag is always subrefresh (less than 1/60sec of lag relative to the original machine!!) as long as you have at least 2 frameslices per refresh cycle (120 Present()+Flush() per second). Lag was measured to be between 1 and 2 frameslices relative to the original machine, when tested on a VGA output jack (e.g. GTX 760). Measurements with digital outputs are similar, though the digital transceivers impart a slight tapedelay-style lag (of a few scanlines, easily hidden in the raster-jitter safety margin). Remember, a 2020s DisplayPort still has a horizontal scan rate. It raster-outputs one pixel row at a time, at metered intervals, just like a 1920s Baird/Farnsworth TV signal! Raster workflow has been unchanged for a century, as a 2D serialization to a 1D signal. That's why tearlines still exist even on DisplayPort -- tearlines are simply raster interruptions at the GPU port.

Most platforms of the last 10 years have been found to beam race accurately if not in battery-saver mode (turn that off, btw). Frameslice count can be either dynamic or preset (e.g. a configurable option, like WinUAE). Note: performance is indeed higher (more frameslices per second) without Flush(), but raster jitter increases due to asynchronous GPU pipelining behaviours. You can use Flush() by default, but add a hidden toggle to enable/disable the Flush(), or a multiple-choice selection: "High Accuracy Lagless VSYNC" versus "Fast Lagless VSYNC".

Now you're an Einstein.

Scroll back up and re-read my bigger walls of text with this newfound knowledge. |
💯 🥇 👍🏻 🚀 Thanks for this @mdrejhon! Yeah, I always thought it's not such a difficult technique to grasp; kind of baffled why people think it's some voodoo black magic or hard to implement... I'd like to add that in WinUAE there's a shortcut key (forgot which) that lets you visualise the jitter/error; pretty cool and illustrative. I use a frameslice of 4 in general, which is pretty reliable, and the jitter varies a lot, but it doesn't matter, as explained above!
For the record, I'm using a frameslice of 4 with quite heavy CRT shaders oversampled to 4x vertical resolution at 1080p; works like a charm! (vertical oversampling helps a lot with scanline uniformity & getting rid of vertical moire-patterns) |
Btw, at this point might be easier to implement it yourself and show people how it's done 😎 |
Yes, it's a rather neat feature. One method RetroArch can use for debugging is changing the color-tint of the previously-rendered portion of the emulator framebuffer, but not the new framesliceful of scanlines being rasterplotted by the emulator module (even separate individual color tints for each frameslice). This makes tearing visible again (as a tint-difference tearing artifact), to watch the realtime raster jitter of VSYNC OFF tearlines for debugging purposes. It's fun to uncloak the raster jitter (vibrating VSYNC OFF tearlines) during debugging to see how the safety margin is performing. A quick sketch of the idea is below.
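A minimal sketch of that debug tint, assuming a 32-bit XRGB framebuffer; the function name is illustrative:

```c
/* Sketch: WinUAE-style frameslice debug tinting. Tint everything above
 * the newest frameslice so the normally-invisible tearlines show up as
 * tint boundaries, letting you watch the raster jitter live. */
#include <stdint.h>

void debug_tint_old_frameslices(uint32_t *fb, unsigned width_px,
                                unsigned first_new_line)
{
    for (unsigned y = 0; y < first_new_line; y++)
        for (unsigned x = 0; x < width_px; x++)
            fb[y * width_px + x] |= 0x003F0000;  /* push red channel up */
}
```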
In theory I could do it, but it might not be until 2025 that I could be convinced to do so. I have too many paid projects to focus on to put food on the table -- and some are even superficially related to beam racing techniques (but for far more specialized purposes than emulators). Monitor processing electronics are often beamraced internally -- using a rolling window between the incoming pixel rows from the video cable, before scanning them out rapidly (in a subrefresh-latency manner). LCDs and OLEDs already scan like a CRT, as seen in high speed videos (www.blurbusters.com/scanout), and high-Hz esports LCDs already do internally beamraced processing in the display scaler nowadays. Raster has always been a very convenient 2D serialization into 1D for frame delivery purposes, and it persists to this date even in a high-Hz VRR 10-bit HDR DSC digital signal. That still uses raster scanout too!

The thing that would push me (if nobody else) is a 480Hz OLED capable of accurate #10757 -- and this is not yet milked by anyone. With 480Hz+, the CRT electron beam can be much more accurately simulated in software as a shader (adding a temporal equivalent of a spatial CRT filter). Instead of VSYNC OFF frameslice beamracing Hz=Hz, you could instead literally use 8 digital refresh cycles during VSYNC ON to simulate 1 CRT Hz via rolling-scan (software-based rolling BFI with a phosphor fadebehind). Then beam race that by relaying raster data from retro_set_raster_poll to the electron beam simulator in real time (whether a line at a time, or chunks at a time). There'd be only 1/480sec latency between the emulator raster and the real-world raster (aka a full frame refresh cycle containing a rolling-bar segment of 1/480sec worth of CRT electron beam simulation), the emulator continuing to run in real time for original pixel-for-pixel latency (same scanout latency, same scanout velocity). BTW, I'd guesstimate 2025-ish for a 480Hz OLED. Hoping!

Anyway, first things first, keep it simple. If I suddenly have free time, I might do #10758 next year because it's dirt-easy, without even touching my BountySource donation. #10758 is just boring internal prep work -- header-file templating with no visible feature change to the software. But I beg someone else to beat me to the punch, so I can keep doing more industry-impacting work first. I'd rather someone else trailblaze much sooner, or at least pave the groundwork. |
New Apache 2.0 Source Code for a VSYNC estimator
We just released an open source cross-platform VSYNC estimator accurate enough for cross-platform synchronization between emulator Hz and real display Hz in high-level languages (even JavaScript). Accurate enough for beam racing applications! Or for simply slowly flywheeling a CPU-calculated emulator refresh rate (via RDTSC) into more accurate alignment with the real-world refresh cycles, to prevent latency. Useful for input delay algorithms too (locking to a VSYNC phase offset). https://github.com/blurbusters/RefreshRateCalculator It's the refresh rate estimator engine used by both www.vsynctester.com and www.testufo.com/refreshrate Here's the README.md:
|
BountySource Requirements Reduction Announcement

December 2023

Did you know the Retrotink 4K is a Blur Busters Approved product? I worked with them to add fully adjustable 240Hz BFI to a composite/S-Video/component signal, for output to any 240Hz LCD or OLED. I recommend the new 240Hz OLEDs, since you can reduce 60Hz motion blur by 75% with the 240:60 ratio combined with nigh-zero GtG. Perfect Sonic the Hedgehog with BFI and CRT filters simultaneously... The Retrotink 4K can do everything TestUFO can, including the TestUFO Variable Persistence Demo For 240Hz Monitors, and can even brighten using an HDR nits booster -- brighter than LG firmware TV BFI! So I'm way ahead of RetroArch, in an already-released BFI product. So RetroArch, please catch up! I want to see open source versions. Also, crosspost:
|
Uh, you may want to check this out: bountysource/core#1586 |
Oh wow. Appreciate it. I missed that memo; I haven't been paying attention to that. My bounty was long before the PayPal refund window, so I'll just have to swallow the loss. OK, I hereby offer a $500 code bounty directly from Blur Busters, staked on my reputation.

$500 Bounty -- directly from Blur Busters / TestUFO
(IMPORTANT: Ignore the BountySource link at the top. BountySource went bankrupt and took my money with it, so I'll put up $500 staked on the reputation of Blur Busters instead.) |
@blurbusters or some maintainer might want to edit your original post to remove the bountysource link 👀 |
I can't -- This was posted back when @blurbusters was a user instead of an organization. Github has grandfathered my comment under my old personal (now organization) username. For the damn longest time it showed up as a "Ghost" user, but github has since corrected it to show the organization's username (formerly personal username). There seems to be no workaround to be able to edit the post to remove the link. If the admin can edit it, the admin should edit it (if github allows) Now that being said, check out the comments I wrote at 11390 in response to @Ophidon's completion of #16142. |
Feature Request Description
A new lagless VSYNC technique has been developed that is already implemented in some emulators. This should be added to RetroArch too.
Bounty available
There is currently a BountySource of about $500 to add the beam racing API to RetroArch plus support at least 2 emulator modules (scroll below for bounty trigger conditions). RetroArch is a C / C++ project.
Synchronize emu raster with real world raster to reduce input lag
It is achieved via synchronizing the emulator's raster to the real world's raster. It is successfully implemented in some emulators, and uses less processing power than RunAhead, and is more forgiving than expected thanks to a "jitter margin" technique that has been invented by a group of us (myself and a few emulator authors).
For lurkers/readers: Don't know what a "raster" or "beam racing" is? Read WIRED Magazine's Racing the beam article. Many 8-bit and 16-bit computers, consoles and arcade machines utilized similar techniques for many tricks, and emulators typically implement them
Already Proven, Already Working
GroovyMAME -- Dropbox .7z file: Successful experiment via unsubmitted patch by Calamity (and thread)
There is currently discussion between other willing emulator authors behind the scenes for adding lagless VSYNC (real-world beam racing support).
Preservationist Friendly. Preserves original input lag accurately.
Beam racing preserves all original latencies including mid-screen input reads.
Less horsepower needed than RunAhead.
RunAhead is amazing! That said, there are other lag-reducing tools that we should also make available too.
Android and Pi GPUs (too slow for RunAhead in many emulators) even work with this lag-reducing technique.
Beam racing works on Pi/Android, and allows slower cycle-exact emulators to have dramatic lag reductions.
We have found it scales in both directions, including Android and Pi. Powerful computers can gain ultra-tight beam racing margins (sync between emuraster and realraster can be sub-millisecond on a GTX 1080 Ti). Slower computers can gain very forgiving beam racing margins. The beam racing margin is adjustable -- it can be up to 1 refresh cycle in size.
In other words, graphics are essentially raster-streamed to the display practically real-time (through a creative tearingless VSYNC OFF trick that works with standard Direct3D/OpenGL/Metal/etc), while the emulator is merrily executing at 1:1 original speed.
Diagrammatic Concept
Just like duplicate refresh cycles never have tearlines even in VSYNC OFF, duplicate frameslices never have tearlines either. We're simply subdividing frames into subframes, and then using VSYNC OFF instead.
We don't even need a raster register (it can help, but we've come up with a different method), since rasters can be a time-based offset from VSYNC, and that can still be accurate enough for flawless sub-millisecond latency difference between emulator and original machine.
Emulators can merrily run at original machine speed, essentially streaming pixels darn-near-raster-realtime (submillisecond difference). What many people don't realize is that 1080p and 4K signals still scan top-to-bottom like an old 60Hz CRT in default monitor orientation -- we're simply synchronizing to cable scanout; the scanout method of serializing 2D images to a 1D cable is fundamentally unchanged. Achieving real raster sync between the emulator raster and real raster!
Many emulators already render 1 scanline at a time to an offscreen framebuffer. So 99% of the beam racing work is already done.
Simple Pre-Requisites
Distilling down to minimum requirements makes rasters cross-platform:
Such as RDTSC or QueryPerformanceCounter or std::chrono::high_resolution_clock
We use beam racing to hide tearlines in the jitter margin, creating a tearingless VSYNC OFF (lagless VSYNC ON) with a very tight (but forgiving) synchronization between emulator raster and real raster.
The simplified retro_set_raster_poll API Proposal
Proposing to add an API -- retro_set_raster_poll -- to allow this data to be relayed to an optional centralized beamracing module for RetroArch to implement realworld sync between emuraster and realraster via whatever means possible (including frameslice beam racing & front buffer beam racing, and/or other future beam racing sync techniques).
The goal of this API is simply to allow the centralized beamracing module to take an early peek at the incomplete emulator refresh cycle framebuffer, every time a new emulator scan line has been plotted to it.
This minimizes modifications to emulators, allowing centralization of beam racing code.
The central code handles its own refresh cycle scanout synchronization (busylooping to pace correctly against the real world's raster scan line number, which can be extrapolated in a cross-platform manner as seen below!) without the emulator worrying about any other beam racing specifics.
Further Detail
Basically it's a beam-raced VSYNC OFF mode that looks exactly like VSYNC ON (perfect tearingless VSYNC OFF). The emulator can merrily render at 1:1 speed while realtime streaming graphics to the display, without surge-execution needed. This requires far less horsepower on the CPU, works with "cycle-exact" emulators (unlike RunAhead) and allows ultra low lag on Raspberry PI and Android processors. Frame-slice beam racing is already used for Android Virtual Reality too, but works successfully for emulators.
Which emulators does this benefit?
This lag reduction technique will benefit any emulator that already does internal beam racing (e.g. to support original raster interrupts). Nearly all retro platforms -- most 8-bit and 16-bit platforms -- can benefit.
This lag-reduction technique does not benefit high level emulation.
Related Raster Work on GPUs
Doing actual "raster interrupts" style work on Radeon/GeForces/Intels is actually surprisingly easy: tearlines are just rasters -- see YouTube video.
This provides the groundwork for lagless VSYNC operation: synchronization of realraster and emuraster. With the emulator method, the tearlines are hidden via the jittermargin approach.
Common Developer Misconceptions
First, to clear up common developer misconceptions of assumed "showstoppers"...
Proposal
Recommended Hook
The emulator calls the raster poll for every emulator scan line plotted. The incomplete contents of the emulator framebuffer (complete up to the most recently plotted emulator scanline) are provided. This allows centralization of frameslice beamracing in the quickest and simplest way.
Cross-Platform Method: Getting VSYNC timestamps
You don't need a raster register if you can do this! You can extrapolate approximate scan line numbers simply as a time offset from a VSYNC timestamp. You don't need line-exact accuracy for flawless emulator frameslice beamracing.
For the cross-platform route -- the register-less method -- you need to listen for VSYNC timestamps while in VSYNC OFF mode.
These ideally should become your only #ifdefs -- everything else about GPU beam racing is cross platform.
PC Version
Mac Version
Other platforms have various methods of getting a VSYNC event hook (e.g. Mac's CVDisplayLinkOutputCallback), which roughly corresponds to the Mac's blanking interval. If you are using the registerless method and generic precision clocks (e.g. RDTSC wrappers), these can potentially be your only #ifdefs in your cross-platform beam racing -- just the various methods of getting VSYNC timestamps. The rest has no platform-specificness.
Linux Version
See GPU Driver Documentation. There is a get_vblank_timestamp() available, and sometimes a get_scanout_position() (raster register equivalent). Personally, I'd only focus on obtaining VSYNC timestamps -- much simpler and more guaranteed on all platforms.
Getting the current raster scan line number
For raster calculation you can do one of the two (both sketched in code after this list):
(A) Raster-register-less method: Use RDTSC or QueryPerformanceCounter or std::chrono::high_resolution_clock to profile the times between refresh cycles. On Windows, you can use the known fractional refresh rate (from QueryDisplayConfig) to bootstrap this best-estimate refresh rate calculation, and refine it in realtime. Calculating raster position is simply a relative time between two VSYNC timestamps, allowing 5% for VBI (meaning 95% of 1/60sec for 60Hz would be the display scanning out). NOTE: Optionally, to improve accuracy, you can dejitter: use a trailing 1-second interval average to dejitter any inaccuracies (they calm to 1-scanline-or-less raster jitter), and ignore all outliers (e.g. missed VSYNC timestamps caused by computer freezes). Alternatively, just use the jittermargin technique to hide VSYNC timestamp inaccuracies.
(B) Raster-register method: Use D3DKMTGetScanLine to get your GPU's current scanline on the graphics output. Wait at least 1 scanline between polls (e.g. sleep 10 microseconds between polls), since this is an expensive API call that can stress a GPU if you busyloop on this register.
NOTE: If you need to retrieve the "hAdaptor" parameter for D3DKMTGetScanLine -- get your adapter device name, such as \\.\DISPLAY1, via EnumDisplayDevices() ... Then call D3DKMTOpenAdapterFromHdc() with this adapter name to open the hAdapter handle, which you can then finally pass to D3DKMTGetScanLine. It works with Vulkan/OpenGL/D3D 9/10/11/12+ .... D3DKMT is simply a hook into the hAdapter that is being used for your Windows desktop, which exists as a D3D surface regardless of what API your game is using, and all you need is the scanline number. So who gives a hoot about the "D3DKMT" prefix -- it works fine with beamracing with OpenGL or Vulkan API calls. (KMT stands for Kernel Mode Thunk, but you don't need Admin privileges to do this specific API call from userspace.)
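Here is a minimal sketch of method (A), assuming a hypothetical now_seconds() monotonic-clock helper and a dejittered refresh_period fed in from a trailing average of VSYNC timestamps:

```c
/* Sketch of method (A): extrapolate the current raster line as a time
 * offset from the last VSYNC timestamp, reserving ~5% of the refresh
 * period for the VBI. refresh_period should come from a dejittered
 * trailing average with outliers (missed vsyncs) discarded. */
extern double now_seconds(void);   /* hypothetical monotonic clock */

int estimate_scanline(double last_vsync_time, double refresh_period,
                      int total_scanlines)
{
    double elapsed = now_seconds() - last_vsync_time;
    double active  = refresh_period * 0.95;  /* ~5% reserved for VBI */
    if (elapsed >= active)
        return total_scanlines - 1;          /* currently inside VBI */
    return (int)(elapsed / active * total_scanlines);
}
```

And a Windows-only sketch of method (B); the struct and field names follow the documented D3DKMT API (WDK header d3dkmthk.h), but verify against your SDK before relying on this:

```c
/* Sketch of method (B): open the adapter behind \\.\DISPLAY1 (device
 * name obtainable via EnumDisplayDevices) and poll the raster register. */
#include <windows.h>
#include <d3dkmthk.h>   /* from the WDK; link gdi32 */

int main(void)
{
    D3DKMT_OPENADAPTERFROMHDC open = {0};
    open.hDc = CreateDCW(NULL, L"\\\\.\\DISPLAY1", NULL, NULL);
    if (D3DKMTOpenAdapterFromHdc(&open) != 0)
        return 1;

    D3DKMT_GETSCANLINE scan = {0};
    scan.hAdapter      = open.hAdapter;
    scan.VidPnSourceId = open.VidPnSourceId;
    if (D3DKMTGetScanLine(&scan) == 0) {
        /* scan.ScanLine        = current raster line on the output
           scan.InVerticalBlank = TRUE while inside the VBI */
    }
    /* Wait >= 1 scanline (~10 microseconds) between polls. */
    return 0;
}
```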
Improved VBI size monitoring
You don't need raster-exact precision for basic frameslice beamracing, but knowing the VBI size makes frameslice beamracing more accurate, since VBI size varies so much from platform to platform and resolution to resolution. Often it varies only a few percent, and most sub-millisecond inaccuracies are easily hidden within the jittermargin technique.
But if you've programmed with retro platforms, you are probably familiar with the VBI (blanking interval) -- essentially the overscan space between refresh cycles. This can vary from 1% to 5% of a refresh cycle, though extreme timing tweaks can make the VBI more than 300% the size of the active image (e.g. Quick Frame Transport tricks -- fast-scan refresh cycles with long VBIs in between). For cross-platform frameslice beamracing, it's OK to assume ~5% being the VBI, but there are many tricks to learn the VBI size.
Turning The Above Data into Real Frameslice Beamracing
For simplicity, begin with emu Hz = real Hz (e.g. 60Hz)
(5a) Return immediately to the emulator module if a full new framesliceful has not yet been appended to the existing offscreen emulator framebuffer (don't do anything to the partially completed framebuffer). Update a counter, do nothing else, return immediately.
(5b) However, once you've got a full frameslice's worth built up since the last frameslice presented, it's time to present the next frameslice. Don't return right away. Instead, immediately do an intentional CPU busyloop until the realraster reaches roughly 2 frameslice-heights above your emulator raster (relative screen-height-wise). So if your emulator framebuffer is filled up to the bottom edge of where frameslice 4 is, then busyloop until the realraster hits the top edge of frameslice 3. Then immediately Present() or glutSwapBuffers() upon completing the busyloop. Then Flush() right away. (A sketch of this loop follows after the notes below.)
NOTE: The tearline (invisible if the graphics at the raster area are unchanged) will sometimes be a few pixels below the scan line number (the amount of time for a memory blit -- memory-bandwidth dependent). You can compensate for it, or just hide any inaccuracy in the jittermargin.
NOTE2: This is simply the recommended beamrace margin to begin experimenting with: A 2 frameslice beamracing margin is very jitter-margin friendly.
Note: The 120Hz scanout diagram is from a different post of mine. Replace with emu refresh rate matching real refresh rate, i.e. monitor set to 60Hz instead. This diagram is simply to help raster veterans conceptualize how modern-day tearlines relate to raster position as a time-based offset from the VBI.
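Putting steps (5a)/(5b) together, a minimal sketch of the central raster poll, assuming 10 frameslices, a 1080p output mode, and the hypothetical helpers sketched earlier in the thread (estimate_scanline(), present_and_flush(), and the shared VSYNC timestamp):

```c
/* Sketch: frameslice beamracing inside the central raster poll. Most
 * calls take the (5a) early-out; at 10 frameslices and 60Hz, only ~600
 * calls per second do real work. */
#define FRAMESLICES 10

extern int  estimate_scanline(double, double, int);  /* earlier sketch */
extern void present_and_flush(void);                 /* earlier sketch */
extern double g_last_vsync_time;    /* from the vsync-listener sketch  */
extern double g_refresh_period;     /* dejittered trailing average     */

void central_raster_poll(unsigned emu_line, unsigned emu_total_lines)
{
    static unsigned lines_since_present = 0;
    unsigned slice_height = emu_total_lines / FRAMESLICES;

    /* (5a) Not a full framesliceful built up yet: count, return at once. */
    if (++lines_since_present < slice_height)
        return;
    lines_since_present = 0;

    /* (5b) Busyloop until the realraster is ~2 frameslice-heights above
       the emulator raster (relative screen height), then present+flush.
       For the first slices 'chase' may be negative, so the loop exits
       immediately; the jittermargin hides any imprecision. */
    const int real_total = 1080;  /* output-mode scanlines (assumed) */
    double chase = (double)emu_line / emu_total_lines - 2.0 / FRAMESLICES;
    while (estimate_scanline(g_last_vsync_time, g_refresh_period,
                             real_total) < (int)(chase * real_total))
        ;  /* spin */

    present_and_flush();
}
```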
Bottom line: As long as you keep repeatedly Present()-ing your incompletely-rasterplotted (but progressively more complete) emulator framebuffer ahead of the realraster, the incompleteness of the emulator framebuffer never shows glitches or tearlines. The display never has a chance to display the incompleteness of your emulator framebuffer, because the display's realraster is showing only the latest completed portions of your emulator's framebuffer. You're simply appending new emulator scanlines to the existing emulator framebuffer, and presenting that incomplete emulator framebuffer always ahead of real raster. No tearlines show up because the already-refreshed-part is duplicate (unchanged) where the realraster is. It thusly looks identical to VSYNC ON.
Precision Assumptions:
Special Note On HLSL-Style Filters: You can use HLSL/fuzzyline style shaders with frameslices. WinUAE just does a full-screen redo on the incomplete emu framebuffer, but one could do it selectively (from just above the realraster all the way to just below the emuraster) as a GPU performance-efficiency optimization.
Adverse Conditions To Detect To Automatically Disable Beamracing
Optional, but for user-friendly ease of use, you can automatically enter/exit beamracing on the fly if desired. You can verify common conditions, such as making sure all of these are met:
Exiting beamracing can be as simple as switching to "racing the VBI" (doing a Present() between refresh cycles), so you're just simulating traditional VSYNC ON via VSYNC OFF with manual VSYNC'ing. This is like 1-frameslice beamracing (next-frame response), and it provides a quick way to enter/exit beamracing on the fly when conditions change dynamically: a Surface tablet gets rotated, a module gets switched, the refresh rate gets changed mid-game, etc. (A tiny sketch of this fallback is below.)
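A tiny sketch of that fallback, reusing the hypothetical helpers from the earlier sketches:

```c
/* Sketch: "racing the VBI" fallback -- one whole-frame Present() per
 * refresh, timed into the ~5% blanking interval, so VSYNC OFF behaves
 * like VSYNC ON whenever full beamracing is disabled. */
extern void present_at(double target_time);          /* earlier sketch */
extern double g_last_vsync_time, g_refresh_period;   /* vsync listener */

void present_whole_frame_in_vbi(void)
{
    /* Aim ~97% through the refresh period: inside the blanking interval. */
    present_at(g_last_vsync_time + g_refresh_period * 0.97);
}
```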
Questions?
I'd be happy to answer questions.