feat(nvenc): implement async encode #3629

Open
wants to merge 2 commits into
base: master

cgutman
Collaborator

@cgutman cgutman commented Feb 3, 2025

Description

This PR implements NVENC's asynchronous mode on Windows using an event handle. It is unlikely to change performance in either direction because encoding is still a 1:1 single-threaded loop.

However, it seems to resolve the encoder hangs I've seen when navigating through the NVIDIA app while streaming. If anyone else regularly experiences those hangs, I'd appreciate help testing whether this resolves them for you too.

While debugging the Sunshine hang that occurs when the NVIDIA app is running, I saw our encoder thread stuck deep inside the GPU driver, waiting forever in NtGdiDdDDIWaitForSynchronizationObjectFromCpu(). Based on my analysis of NvEncodeAPI64.dll in Ghidra and debugging Sunshine under WinDbg, the codepath that hangs appears to be exercised only when NV_ENC_LOCK_BITSTREAM::doNotWait == 0, so async mode sidesteps the driver bug.
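As a rough illustration of the idea (not the code in this PR), the sketch below shows a 1:1 encode loop iteration that waits on a completion signal with a timeout instead of blocking indefinitely inside the driver. It is a portable stand-in: `completion_event` and `encode_one_frame` are hypothetical names, and the condition variable plays the role that a Win32 event handle and a timed wait would play on Windows. No NVENC calls are made.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical, portable stand-in for an auto-reset event handle.
class completion_event {
public:
  // Called by whoever produces the output (the driver, in the real case).
  void signal() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_ = true;
    }
    cv_.notify_one();
  }

  // Returns true if the event was signaled within `timeout`, false otherwise.
  // A successful wait consumes the signal (auto-reset behaviour).
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!cv_.wait_for(lock, timeout, [this] { return signaled_; })) {
      return false;
    }
    signaled_ = false;
    return true;
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

enum class encode_result { frame_ready, timed_out };

// Hypothetical helper: one iteration of a single-threaded encode loop.
// `submit` stands in for handing the frame to the encoder.
template <class Submit>
encode_result encode_one_frame(completion_event &event, Submit submit,
                               std::chrono::milliseconds timeout) {
  submit();
  if (!event.wait_for(timeout)) {
    // The caller regains control here instead of hanging forever.
    return encode_result::timed_out;
  }
  // Assumption: at this point the real code would fetch the output
  // without a blocking wait (the doNotWait != 0 path described above).
  return encode_result::frame_ready;
}
```

The point of the design is that a timeout becomes an ordinary return value, so the encoder thread can decide to reinitialize or tear down rather than being stuck in a wait it cannot cancel.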

Screenshot

Issues Fixed or Closed

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Dependency update (updates to dependencies)
  • Documentation update (changes to documentation)
  • Repository update (changes to repository files, e.g. .github/...)

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated the in code docstring/documentation-blocks for new or existing methods/components


codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.

Project coverage is 11.61%. Comparing base (9970939) to head (7982f52).

Files with missing lines     Patch %   Lines
src/video.cpp                0.00%     4 Missing and 5 partials ⚠️
src/nvenc/nvenc_d3d11.cpp    0.00%     7 Missing ⚠️
src/nvenc/nvenc_base.cpp     0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3629      +/-   ##
==========================================
- Coverage   11.62%   11.61%   -0.01%     
==========================================
  Files          93       92       -1     
  Lines       17319    17333      +14     
  Branches     8085     8096      +11     
==========================================
  Hits         2014     2014              
+ Misses      14710    12832    -1878     
- Partials      595     2487    +1892     
Flag       Coverage Δ
Linux      11.29% <0.00%> (-0.01%) ⬇️
Windows    13.02% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines     Coverage Δ
src/nvenc/nvenc_base.cpp     0.00% <0.00%> (ø)
src/nvenc/nvenc_d3d11.cpp    0.00% <0.00%> (ø)
src/video.cpp                21.55% <0.00%> (-0.18%) ⬇️

... and 29 files with indirect coverage changes

@ReenigneArcher
Member

Might be related to this? #3411


sonarqubecloud bot commented Feb 3, 2025

Quality Gate failed

Failed conditions
2 New issues
2 New Code Smells (required ≤ 0)


@cgutman
Collaborator Author

cgutman commented Feb 4, 2025

I encountered the NVENC hang again (this time during a resolution change). With the fix here, we didn't hang inside nvEncEncodePicture() or nvEncLockBitstream() this time (the async wait timed out as expected), but we then hung during destruction in nvEncDestroyBitstreamBuffer().

Full stack
00 0000004e`efb4ddc8 00007ff8`2add74d9     win32u!NtGdiDdDDIWaitForSynchronizationObjectFromCpu+0x14
01 0000004e`efb4ddd0 00007ff8`2add7495     d3d11!D3D11CoreCreateLayeredDevice+0x15b59
02 0000004e`efb4de30 00007fff`967cf377     d3d11!D3D11CoreCreateLayeredDevice+0x15b15
03 0000004e`efb4de90 00007fff`9679cede     nvwgf2umx!SetDependencyInfo+0x5afc7
04 0000004e`efb4def0 00007fff`9679cca4     nvwgf2umx!SetDependencyInfo+0x28b2e
05 0000004e`efb4e110 00007fff`967a4f28     nvwgf2umx!SetDependencyInfo+0x288f4
06 0000004e`efb4e180 00007fff`967a4db4     nvwgf2umx!SetDependencyInfo+0x30b78
07 0000004e`efb4e2e0 00007fff`967a404b     nvwgf2umx!SetDependencyInfo+0x30a04
08 0000004e`efb4e350 00007fff`96784f4a     nvwgf2umx!SetDependencyInfo+0x2fc9b
09 0000004e`efb4e380 00007fff`968202cb     nvwgf2umx!SetDependencyInfo+0x10b9a
0a 0000004e`efb4e3b0 00007fff`96932970     nvwgf2umx!SetDependencyInfo+0xabf1b
0b 0000004e`efb4e3e0 00007fff`96863247     nvwgf2umx!SetDependencyInfo+0x1be5c0
0c 0000004e`efb4e440 00007fff`96862df7     nvwgf2umx!SetDependencyInfo+0xeee97
0d 0000004e`efb4e4a0 00007fff`96862583     nvwgf2umx!SetDependencyInfo+0xeea47
0e 0000004e`efb4e4e0 00007fff`9678be53     nvwgf2umx!SetDependencyInfo+0xee1d3
0f 0000004e`efb4e5a0 00007fff`95e3f34b     nvwgf2umx!SetDependencyInfo+0x17aa3
10 0000004e`efb4e6f0 00007fff`9627290a     nvwgf2umx!NVAPI_DirectMethods+0x119fcb
11 0000004e`efb4e7f0 00007ff8`2ad9262b     nvwgf2umx!OpenAdapter10+0x2794a
12 0000004e`efb4e830 00007ff8`2ad9249d     d3d11!CreateDirect3D11DeviceFromDXGIDevice+0x1805b
13 0000004e`efb4e920 00007ff8`0d33216a     d3d11!CreateDirect3D11DeviceFromDXGIDevice+0x17ecd
14 0000004e`efb4e960 00007ff8`0d349505     nvEncodeAPI64+0x216a
15 0000004e`efb4ea00 00007ff8`0d35585f     nvEncodeAPI64+0x19505
16 0000004e`efb4eb00 00007ff7`3c209fc2     nvEncodeAPI64+0x2585f
17 0000004e`efb4eb30 00007ff7`3c20cbef     sunshine!destroy_encoder+0x64 [C:\Users\camer\moonlight-src\LB_Sunshine\src\nvenc\nvenc_base.cpp @ 458] 
18 0000004e`efb4f350 00007ff7`3c20cc3e     sunshine!~nvenc_d3d11_native+0x33 [C:\Users\camer\moonlight-src\LB_Sunshine\src\nvenc\nvenc_d3d11_native.cpp @ 24] 
19 0000004e`efb4f380 00007ff7`3d965f18     sunshine!~nvenc_d3d11_native+0x18 [C:\Users\camer\moonlight-src\LB_Sunshine\src\nvenc\nvenc_d3d11_native.cpp @ 24] 
1a 0000004e`efb4f3b0 00007ff7`3da10a94     sunshine!operator()+0x28 [C:\msys64\ucrt64\include\c++\14.2.0\bits\unique_ptr.h @ 94] 
1b 0000004e`efb4f3e0 00007ff7`3d8cae5c     sunshine!~unique_ptr+0x54 [C:\msys64\ucrt64\include\c++\14.2.0\bits\unique_ptr.h @ 399] 
1c 0000004e`efb4f430 00007ff7`3d8cae18     sunshine!~d3d_nvenc_encode_device_t+0x2c [C:\Users\camer\moonlight-src\LB_Sunshine\src\platform\windows\display_vram.cpp @ 1053] 
1d 0000004e`efb4f460 00007ff7`3d966068     sunshine!~d3d_nvenc_encode_device_t+0x18 [C:\Users\camer\moonlight-src\LB_Sunshine\src\platform\windows\display_vram.cpp @ 1053] 
1e 0000004e`efb4f490 00007ff7`3da11494     sunshine!operator()+0x28 [C:\msys64\ucrt64\include\c++\14.2.0\bits\unique_ptr.h @ 94] 
1f 0000004e`efb4f4c0 00007ff7`3d8d404a     sunshine!~unique_ptr+0x54 [C:\msys64\ucrt64\include\c++\14.2.0\bits\unique_ptr.h @ 399] 
20 0000004e`efb4f510 00007ff7`3d8d4008     sunshine!~nvenc_encode_session_t+0x2a [C:\Users\camer\moonlight-src\LB_Sunshine\src\video.cpp @ 375] 
21 0000004e`efb4f540 00007ff7`3d966308     sunshine!~nvenc_encode_session_t+0x18 [C:\Users\camer\moonlight-src\LB_Sunshine\src\video.cpp @ 375] 
22 0000004e`efb4f570 00007ff7`3da12954     sunshine!operator()+0x28 [C:\msys64\ucrt64\include\c++\14.2.0\bits\unique_ptr.h @ 94] 
23 0000004e`efb4f5a0 00007ff7`3c1a2ff3     sunshine!~unique_ptr+0x54 [C:\msys64\ucrt64\include\c++\14.2.0\bits\unique_ptr.h @ 399] 
24 0000004e`efb4f5f0 00007ff7`3c1a56fa     sunshine!encode_run+0x942 [C:\Users\camer\moonlight-src\LB_Sunshine\src\video.cpp @ 1930] 
25 0000004e`efb4f800 00007ff7`3c1a5a7d     sunshine!capture_async+0x5ca [C:\Users\camer\moonlight-src\LB_Sunshine\src\video.cpp @ 2302] 
26 0000004e`efb4fa50 00007ff7`3c1966a4     sunshine!capture+0xbb [C:\Users\camer\moonlight-src\LB_Sunshine\src\video.cpp @ 2325] 
27 0000004e`efb4fbb0 00007ff7`3db724f6     sunshine!videoThread+0x2ea [C:\Users\camer\moonlight-src\LB_Sunshine\src\stream.cpp @ 1816] 
28 0000004e`efb4fd50 00007ff7`3db9ff0c     sunshine!__invoke_impl<void, void (*)(stream::session_t*), stream::session_t*>+0x36 [C:\msys64\ucrt64\include\c++\14.2.0\bits\invoke.h @ 61] 
29 0000004e`efb4fd90 00007ff7`3dabcda5     sunshine!__invoke<void (*)(stream::session_t*), stream::session_t*>+0x3c [C:\msys64\ucrt64\include\c++\14.2.0\bits\invoke.h @ 97] 
2a 0000004e`efb4fdd0 00007ff7`3dabce18     sunshine!_M_invoke<0, 1>+0x45 [C:\msys64\ucrt64\include\c++\14.2.0\bits\std_thread.h @ 301] 
2b 0000004e`efb4fe10 00007ff7`3dabb8cc     sunshine!operator()+0x18 [C:\msys64\ucrt64\include\c++\14.2.0\bits\std_thread.h @ 308] 
2c 0000004e`efb4fe40 00007ff7`3dbcc07f     sunshine!_M_run+0x1c [C:\msys64\ucrt64\include\c++\14.2.0\bits\std_thread.h @ 253] 
2d 0000004e`efb4fe70 00007ff7`3d786ceb     sunshine!boost_asio_detail_posix_thread_function+0x4f
2e 0000004e`efb4feb0 00007ff8`2efd37b0     sunshine!WspiapiQueryDNS+0xb812b
2f 0000004e`efb4fef0 00007ff8`2ff4e8d7     ucrtbase!wcsrchr+0x150
30 0000004e`efb4ff20 00007ff8`3163bf2c     kernel32!BaseThreadInitThunk+0x17
31 0000004e`efb4ff50 00000000`00000000     ntdll!RtlUserThreadStart+0x2c

To let the encoder thread keep making forward progress in that case, I've added a simple async teardown codepath for NVENC that moves destruction to a separate thread. Reducing the amount of non-trivial work done on the encoder thread is generally a good idea, but I made it opt-in because some encoders may assume that they will be constructed and destructed on the same thread (or that only one instance of an encoder will ever exist).
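For readers unfamiliar with the pattern, here is a minimal sketch of moving destruction off the calling thread. It is an illustration under assumptions, not the implementation in this PR: `destroy_async` is a hypothetical helper, and a detached `std::thread` is just one possible way to host the teardown.

```cpp
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: takes ownership of `object` and runs its destructor
// on a separate, detached thread. The returned future becomes ready once
// the destructor has finished; the caller is free to ignore it.
template <class T>
std::future<void> destroy_async(std::unique_ptr<T> object) {
  std::promise<void> done;
  std::future<void> finished = done.get_future();

  std::thread([obj = std::move(object), done = std::move(done)]() mutable {
    obj.reset();       // a destructor that blocks only stalls this thread
    done.set_value();  // never reached if the destructor hangs
  }).detach();

  return finished;
}
```

A future obtained from a `std::promise` does not block in its own destructor, so dropping the return value keeps the caller non-blocking. The trade-off matches the caveat above: the object is constructed on one thread and destroyed on another, which is only safe for types that do not depend on thread affinity.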

I haven't yet managed to reproduce the issue again, so I can't confirm that no other hang is still hiding.

2 participants