-
Notifications
You must be signed in to change notification settings - Fork 892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NV low latency support #3690
base: master
Are you sure you want to change the base?
Add NV low latency support #3690
Conversation
m_frameId += 1; | ||
if (!Repeat) { | ||
m_frameId = m_presenter->lowLatencyEnabled() ? | ||
m_device->getLatencyMarkers().present : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work with games that are not using latency markers, but only the sleep API? (e.g. Apex Legends)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was fixed in a previous update to this PR to only use the present latency marker if it is valid.
Support for the low-latency extensions just got released in the 545.23.06 NVIDIA driver so this is usable/testable now. (Propagating this note to all three project PRs, apologies for any duplication! 😸 ) |
src/dxvk/dxvk_device.cpp
Outdated
@@ -274,6 +275,7 @@ namespace dxvk { | |||
DxvkSubmitStatus* status) { | |||
DxvkSubmitInfo submitInfo = { }; | |||
submitInfo.cmdList = commandList; | |||
submitInfo.frameId = m_latencyMarkers.render; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that resource initialization (zeroing) submissions can happen independently from the immediate context or CS thread. submitCommandList
is used by every submission including init submissions, and this can cause slightly awkward consequences such as submission tied with a particular frame ID after present.
It might be better to exclude InitContext from low latency tracking completely to avoid edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to disable passing frame ids at vkQueueSubmit time for work submitted from the initializer context.
74c1ba4
to
2518e1c
Compare
src/d3d11/d3d11_device.h
Outdated
} | ||
|
||
D3D11SwapChain* GetLowLatencySwapChain() { | ||
return (m_swapchains.size()) == 1 ? m_swapchains[0] : nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an inherent race condition here if the app destroys the swap chain while also calling any of the LowLatencyDevice methods that access it, in which case we're likely to just crash. Is this something we should be robust against?
src/d3d11/d3d11_device.h
Outdated
std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain); | ||
} | ||
|
||
UINT GetSwapchainCount() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Requires at least Nvidia 545.xx drivers. The patches are from the following pull requests: [wine](ValveSoftware/wine#200) [dxvk-nvapi](jp7677/dxvk-nvapi#147) [vkd3d-proton](HansKristian-Work/vkd3d-proton#1739) [dxvk](doitsujin/dxvk#3690) Thanks also goes to @ptr1337 for initially building and testing the patchset.
Requires at least Nvidia 545.xx drivers. The patches are from the following pull requests: [wine](ValveSoftware/wine#200) [dxvk-nvapi](jp7677/dxvk-nvapi#147) [vkd3d-proton](HansKristian-Work/vkd3d-proton#1739) [dxvk](doitsujin/dxvk#3690) Thanks also goes to @ptr1337 for initially building and testing the patchset.
src/dxvk/dxvk_presenter.cpp
Outdated
@@ -48,6 +57,7 @@ namespace dxvk { | |||
|
|||
|
|||
VkResult Presenter::acquireNextImage(PresenterSync& sync, uint32_t& index) { | |||
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AcquireNextImage can block. I think holding a mutex for the entire duration of acquire can cause potential trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the same concern was also raised in vkd3d-proton review, so I suppose this can be addressed in a similar way (only hold the lock in case of recreation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the locking in acquire and present removed my layer-based LL2 implementation works with significantly less stutters — so this is probably that has to be fixed. I think the locks aren't really necessary here anyway since access to swapchain is externally synchronized (which implies it can't be destroyed for the duration of call).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock should only be taken when the swapchain is destroyed now. So I think this issue can be considered resolved.
src/dxvk/dxvk_presenter.cpp
Outdated
@@ -68,11 +78,13 @@ namespace dxvk { | |||
VkResult Presenter::presentImage( | |||
VkPresentModeKHR mode, | |||
uint64_t frameId) { | |||
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QueuePresent is also allowed to block, although this might be less of a concern if the driver is implemented in a way that doesn't block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this to only take the lock when destroying the swapchain.
src/dxvk/dxvk_presenter.cpp
Outdated
@@ -151,6 +163,8 @@ namespace dxvk { | |||
|
|||
|
|||
VkResult Presenter::recreateSwapChain(const PresenterDesc& desc) { | |||
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recreateSurface
can also result in the swapchain getting destroyed. I think we currently lack locking there (and this might be a tricky one since recreateSurface
will keep m_swapchain
NULL until the next call to recreateSwapchain
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this to only take the lock when destroying the swapchain. Whenever dxvk calls one of the other LL2 entry points it holds the low latency lock, and confirms the swapchain handle is valid. This approach should prevent any usage of the destroyed swapchain handle.
src/dxvk/dxvk_presenter.cpp
Outdated
uint32_t timingCount = 0; | ||
|
||
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex); | ||
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo); | ||
|
||
if (timingCount != 0) { | ||
frameReports.resize(timingCount, { VK_STRUCTURE_TYPE_GET_LATENCY_MARKER_INFO_NV }); | ||
markerInfo.pTimings = frameReports.data(); | ||
|
||
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think VkStructureType
passed to resize
is wrong here (VK_STRUCTURE_TYPE_GET_LATENCY_MARKER_INFO_NV
→ VK_STRUCTURE_TYPE_LATENCY_TIMINGS_FRAME_REPORT_NV
)? So together with updated Vulkan headers for spec 2 version of NV_ll2 this should look more like:
uint32_t timingCount = 0; | |
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex); | |
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo); | |
if (timingCount != 0) { | |
frameReports.resize(timingCount, { VK_STRUCTURE_TYPE_GET_LATENCY_MARKER_INFO_NV }); | |
markerInfo.pTimings = frameReports.data(); | |
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &timingCount, &markerInfo); | |
std::lock_guard<dxvk::mutex> lock(m_lowLatencyMutex); | |
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &markerInfo); | |
if (markerInfo.timingCount != 0) { | |
frameReports.resize(markerInfo.timingCount, { VK_STRUCTURE_TYPE_LATENCY_TIMINGS_FRAME_REPORT_NV }); | |
markerInfo.pTimings = frameReports.data(); | |
m_vkd->vkGetLatencyTimingsNV(m_vkd->device(), m_swapchain, &markerInfo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this should be fixed now.
src/d3d11/d3d11_device.h
Outdated
} | ||
|
||
void RemoveSwapchain(D3D11SwapChain* swapchain) { | ||
std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain); | |
m_swapchains.erase(std::remove(m_swapchains.begin(), m_swapchains.end(), swapchain)); |
Found by clang-cl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks for running this change through clang-cl!
|
||
|
||
if (m_device->GetDXVKDevice()->features().nvLowLatency2) { | ||
m_device->AddSwapchain(presenter.ref()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to miss a matching RemoveSwapchain
call.
Requires at least Nvidia 545.xx drivers. The patches are from the following pull requests: [wine](ValveSoftware/wine#200) [dxvk-nvapi](jp7677/dxvk-nvapi#147) [vkd3d-proton](HansKristian-Work/vkd3d-proton#1739) [dxvk](doitsujin/dxvk#3690) Thanks also goes to @ptr1337 for initially building and testing the patchset.
a7cdb7b
to
88b323b
Compare
This commit add support for the VK_NV_low_latency2 extension, and implements the ID3DLowLatencyDevice interface.
88b323b
to
d0e232c
Compare
Just like I mentioned in the vkd3d-proton PR, thank you for your patience in regards to the slow updates. Feel free to provide any additional feedback, or concerns and I will address them as quickly as possible. I am aware there are a couple of outstanding issues. I will get to those tomorrow. I think most of the major concerns have been resolved though. |
if (!Repeat) | ||
m_frameId += 1; | ||
if (!Repeat) { | ||
m_frameId = (m_presenter->lowLatencyEnabled() && m_device->getLatencyMarkers().present) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things mandates that m_frameId is monotonically increasing:
- VK_KHR_present_id: A non-zero presentId must be greater than any non-zero presentId passed previously by the application for the same swapchain.
- CallbackFence: The helper works similarly to a timeline semaphore so the timeline value must be increasing.
I think we need to ensure that the value increase here. For present IDs, recreating the swapchain will do the job when we see a smaller ID. CallbackFence will need some more modification: I'll describe my idea below.
- Make D3D9Swapchain use CallbackFence and remove Fence; the latter seems to be a superset of the former.
- Move CallbackFence into Presenter, so that it can be easily recreated by the Presenter. Change accesses to e.g.
D3D11Swapchain::m_frameLatencySignal
to something likem_presenter->frameLatencySignal()
. - Make
Presenter::destroySwapchain
destroy the signal too, then also recreate the signal inPresenter::recreateSwapchain
. (destroySwapchain
waits for the current frame to be signaled.) Add a new parameter toPresenter::recreateSwapchain
for the frame ID the signal should be initialized with. Also resetm_lastFrameId
to the specified value.
cc @doitsujin in case they have feedback for the proposed refactor.
Hi, We have updated on proton-cachyos the nvidia-reflex patches. One user is following a issue in apex legends. I have attached a Screenshot. Reflex without boost is working without problems. 545 drivers. Changes compared to last version: If you need any further informations or testing, feel free to hit me or @A1RM4X Edit: No issue. |
Toggling Reflex causing game logic crash feels pretty unlikely so it might be good to test this multiple times to confirm that it's not a fluke. If it's reproducible, I think attaching logs with |
curious on the state of merging this PR.. as equivalent VKD3D and DXVK-NVAPI Reflex patches got merged already.. |
Please don't make "state of merging" type of comments. The PR will be merged once all the concerns are addressed. Currently, the frame ID interaction with swapchain and CallbackFence needs to be fixed. |
We have also a user, which reports problems with this MR, if VRR is used together with Reflex in Overwatch. i can provide later the day some logs from him. |
Any updates on this? |
FWIW one motivation for #4609 is to eventually support this. I haven't looked into low_latency_2 on top of the new code specifically yet, and some problems remain that need to be solved (we'll need to virtualize our internal frame IDs so we can pass LL2 frame IDs to the vulkan swapchain etc; threading is also still a thing but should be much more logical now), but an implementation should be much more viable in the future. |
This pull request implements a new device interface called ID3DLowLatencyDevice using the VK_NV_low_latency2 extension. The purpose of this interface, is to give dxvk-nvapi a way to implement the nvapi Reflex interface.