Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MoltenVK: Use an external project instead of a pre-compiled dylib #9981

Merged
merged 1 commit into from
Nov 13, 2021

Conversation

OatmealDome
Copy link
Member

@OatmealDome OatmealDome commented Jul 31, 2021

Changes the build system to build MoltenVK instead of having a dylib in the repository. This also opens up the opportunity to modify MoltenVK to our needs if needed.

Also, the MoltenVK version has been updated to v1.1.5 (Vulkan SDK 1.2.189).

EDIT: This PR is being blocked by the presence of the Intel macOS buildbot (pr-osx-x64). The M1 Mac Mini can build MoltenVK perfectly fine for both Intel and ARM, so the Intel buildbot needs to be shut down before can be merged.

@OatmealDome OatmealDome marked this pull request as draft July 31, 2021 23:32
@kemenaran
Copy link
Contributor

(This is largely outside of the scope of this PR - but for what is worth, gfx-portability as a drop-in replacement for MoltenVK performs way better for me (in terms of FPS and frame pacing). Is there anything preventing MoltenVK to be replaced by gfx-portability, or is it just some work to be done?)

@OatmealDome
Copy link
Member Author

@kemenaran As far as I'm aware, there are no plans to replace MoltenVK with gfx-portability.

Copy link
Contributor

@Dentomologist Dentomologist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builds and runs without issue.

Minor nitpick: The blank lines you added in CMakeLists.txt files have a couple extraneous spaces at the beginning; I'm surprised the linter didn't complain about them.

@OatmealDome
Copy link
Member Author

Didn't even notice, thanks. Xcode likes to add those.

I don't think the linter runs on CMakeLists?

Copy link
Contributor

@Dentomologist Dentomologist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Approval conditional on upgrading or removing the macOS Intel builder)

@leoetlino
Copy link
Member

Related: dolphin-emu/www#132 (we need to migrate users to universal builds before we can shut down the macOS Intel VM)

@nickp-ca
Copy link

Any reason the intel MacOS buildbot pr-osx-x64 is still needed considering pr-osx-universal will run on Intel?

@OatmealDome
Copy link
Member Author

@nickp-ca It isn't needed and is planned to be shut down (see the comment above yours).

@OatmealDome OatmealDome marked this pull request as ready for review September 17, 2021 03:41
@leoetlino
Copy link
Member

Could you rebase this PR please?

@OatmealDome
Copy link
Member Author

Rebased. Also, updated to the latest MoltenVK release.

The build time is a bit slow, even on incremental builds. I have some ideas for decreasing the build time, but I'll open a separate PR for that considering this one has already been reviewed and approved...

Also, update MoltenVK to match Vulkan SDK 1.2.189.
@leoetlino leoetlino merged commit 0d4c827 into dolphin-emu:master Nov 13, 2021
@iljatanger
Copy link

Hello dear developer and thank you very much for your labour and such great emulator.

Started from this release, the performance in many games (SMG, SMG2, Metroid Prime, Super Mario bros.) etc got significantly lower (2-3 times!).

this is 100% tested on many releases (about 20 times).

Initially I tried to update Dolphin on MacBook Air M1 from the beta version 15445 to the newest one 16101, because I had artifacts (black and transparent squares in the center of screen in Super Smash Bros. Brawl). And I was shocked what the performance became so poor. If at 15445 3X in Mario Galaxy and Metroid Prime was playable, now it became much worse, so only 1X and 2x (hardly) playable. So after that I installed many releases and found, that such visible, significant performance drop had started from this release. The size of the release also was changed started from this version 15474 (38.8mb instead of 34.9 before, so I suggest it was some global recompiling of the code).

so now I stopped on version 15472 as the latest playable one. But unfortunately, I still have the same visual artifacts in Super Smash Bros...

Could you ever be so kind to check whether there is some mistake in this release starting from 15474?

@bluesalt
Copy link

All the version started from this release have kept crash with vulkan mode in my macbook pro. Like @iljatanger , the workable version stopped at 15472 in my macbook.

Process: Dolphin
Path: /Applications/Dolphin.app/Contents/MacOS/Dolphin
Identifier: org.dolphin-emu.dolphin
Version: 5.0-15474 (5.0)
Code Type: X86-64 (Native)
Parent Process: ??? [1]
Responsible: Dolphin [3953]

OS Version: Mac OS X 10.15.7 (19H2)
Report Version: 12
Bridge OS Version: 4.6 (17P6610)

System Integrity Protection: enabled

Crashed Thread: 28 Emuthread - Starting

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler

@OatmealDome
Copy link
Member Author

This is a known issue with MoltenVK and will be fixed as part of the next MoltenVK release. (https://bugs.dolphin-emu.org/issues/12840)

@pizuz
Copy link

pizuz commented Jul 9, 2022

Hello dear developer and thank you very much for your labour and such great emulator.

Started from this release, the performance in many games (SMG, SMG2, Metroid Prime, Super Mario bros.) etc got significantly lower (2-3 times!).

I can second that. At the moment, even the deprecated OpenGL seems to be faster than Vulkan in the games I tested. This performance regression affects all releases including the most recent one (at the time of writing 5.0-16831).

Late 2013 iMac with 3,1 GHz Quad-Core Intel Core i7 and NVIDIA GeForce GT 750M 1 GB running macOS 10.15.7.

@OatmealDome
Copy link
Member Author

@pizuz Here's an experiment to try, if you want.

Try running LIBVULKAN_PATH=/path/to/libMoltenVK.dylib /path/to/DolphinBeforePerformanceRegression.app/Contents/MacOS/Dolphin in the Terminal with a path to the latest MoltenVK dylib from LunarG.

@pizuz
Copy link

pizuz commented Jul 14, 2022

I did it the other way round and copied the last known good version 1.1.2 (which was the last update introduced in #9441) into latest master - which gave it some substantial speedup. The performance regression must have been introduced somewhere between 1.1.3 and 1.1.5 because even the latest SDK release binary didn't change anything.

I'll try to bisect next.

EDIT: for some weird reason the MoltenVK version 1.1.2 bundled with Dolphin is different from the SDK version 1.1.2 - as in the SDK version running as slow as all following versions. On the other hand, the one bundled with Dolphin since #9441 is a lot faster. I have absolutely no clue whywhy that is.

EDIT: The regression was apparently introduced in either MVK 1.1.3 or 1.1.4

@OatmealDome
Copy link
Member Author

@pizuz If I remove the code signatures from those files, the file hashes end up being exactly the same. Perhaps you made a mistake somewhere?

$ shasum dolphin.dylib 
6d6af070954de2a5049550c3f23ffcad43e9e506  dolphin.dylib
$ shasum lunarg.dylib
cc12d4c897ea01370c51c2dcd94bd62e6ddb2f03  lunarg.dylib
$ codesign --remove-signature dolphin.dylib 
$ codesign --remove-signature lunarg.dylib 
$ shasum dolphin.dylib 
3be651368e7d9c9940a907eb894728194a413106  dolphin.dylib
$ shasum lunarg.dylib 
3be651368e7d9c9940a907eb894728194a413106  lunarg.dylib

@pizuz
Copy link

pizuz commented Jul 14, 2022

Yes, I did apparently. I have no idea what happened. Anyway, the results are in:

The regression was introduced in either 1.1.3 or 1.1.4. I can't say for sure, because 1.1.3 doesn't load in Dolphin. I will have to bisect MVK itself. Luckily they have a CI in place similar to Dolphin, but this might still take a while. Scratch that, they delete artifacts older than a few weeks, apparently. I'll probably file a bug with MVK once I find the culprit commit.

@OatmealDome
Copy link
Member Author

OatmealDome commented Jul 18, 2022

@pizuz Could you try running the following command in the terminal with the latest dev build and see if performance gets any better vs when running it normally? It might cause some graphical glitches, though.

MVK_ALLOW_METAL_FENCES=1 /path/to/Dolphin.app/Contents/MacOS/Dolphin

@pizuz
Copy link

pizuz commented Jul 19, 2022

The command makes a difference in performance-light scenes such as splash screens which render twice as fast, now. In other instances, such as the attract screen in Super Mario Galaxy, there is no improvement, unfortunately. I'm still trying to find time to bisect those old PRs in MoltenVK to pinpoint the change that caused this. MVK 1.1.2 definitely runs a lot faster.

@TellowKrinkle
Copy link
Contributor

Bisected to KhronosGroup/MoltenVK@4371ef4, which matches up with the pool reset issues I reported. If it is just this, reducing the size of our descriptor pool is a fine temporary solution for testing while we wait for MVK to properly fix this.

If you could check that to confirm that it fixes performance (or just confirm that reducing the descriptor pool size fixes performance), that would be ideal.

@pizuz
Copy link

pizuz commented Aug 7, 2022

I‘ll try building it later today. Once I figure out how to get CMake to find zlib and sdl again.

@MayImilae
Copy link
Contributor

@pizuz I'd love to try that build myself.

@pizuz
Copy link

pizuz commented Aug 9, 2022

@pizuz I'd love to try that build myself.

I'd love to share, but CMake still doesn't seem to let me build and I have no idea why that is. I'm getting no reasonable error message. It did work before.

cmake error.txt

@OatmealDome
Copy link
Member Author

@pizuz Building using static SDL from the submodule on macOS does not work at this time. Install SDL as a shared library onto your system (for example, with brew) and it should work.

@pizuz
Copy link

pizuz commented Aug 9, 2022

Thanks for the hint. Now the Metal backend fails to build... Is there a CMake option to disable it? For the test, I don‘t need it.

Building CXX object Source...es/videometal.dir/MTLStateTracker.mm.o
FAILED: Source/Core/VideoBackends/Metal/CMakeFiles/videometal.dir/MTLStateTracker.mm.o 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DAUTOUPDATE=1 -DDATA_DIR=\"/usr/local/share/dolphin-emu/\" -DFMT_EXCEPTIONS=0 -DFMT_LOCALE -DFMT_SHARED -DHAS_LIBMGBA -DHAS_OPENGL -DHAS_VULKAN -DHAVE_CRC32 -DHAVE_SDL2=1 -DLZMA_API_STATIC -DSFML_STATIC -DSPIRV_CROSS_EXCEPTIONS_TO_ASSERTIONS -DUSE_ANALYTICS=1 -DUSE_MEMORYWATCHER=1 -DUSE_PIPES=1 -DUSE_UPNP -D_ARCH_64=1 -D_DEFAULT_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_M_X86=1 -D_M_X86_64=1 -D__LIBUSB__ -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/SFML/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Source/Core -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/enet/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/External/minizip -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/ed25519 -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/soundtouch -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/mbedtls/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/discord-rpc/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/picojson -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/build/Source/Core -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/minizip/. -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/build/Externals/zlib-ng/zlib-ng -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/zlib-ng/zlib-ng -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/cubeb/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/build/exports -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/liblzma/api -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/zstd/lib -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/libusb/libusb/libusb -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/pugixml/. -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/mGBA/mgba/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/build/Externals/mGBA/mgba/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/hidapi/hidapi -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/spirv_cross/SPIRV-Cross/include -I/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/spirv_cross/SPIRV-Cross -isystem /usr/local/include -isystem /Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/glslang/glslang/Public -isystem /Users/pizuz/Downloads/Dolphin-dev/Dolphin/Source/Core/VideoCommon/SYSTEM -isystem /Users/pizuz/Downloads/Dolphin-dev/Dolphin/Externals/rangeset/include -O3 -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk -mmacosx-version-min=10.14.0 -msse2 -fdiagnostics-color -fno-strict-aliasing -fno-exceptions -fvisibility-inlines-hidden -fvisibility=hidden -fomit-frame-pointer -mssse3 -march=core2 -Wall -Wtype-limits -Wsign-compare -Wignored-qualifiers -Wuninitialized -Wshadow -Winit-self -Wmissing-declarations -Wmissing-variable-declarations -Werror=format -std=c++2a -MD -MT Source/Core/VideoBackends/Metal/CMakeFiles/videometal.dir/MTLStateTracker.mm.o -MF Source/Core/VideoBackends/Metal/CMakeFiles/videometal.dir/MTLStateTracker.mm.o.d -o Source/Core/VideoBackends/Metal/CMakeFiles/videometal.dir/MTLStateTracker.mm.o -c /Users/pizuz/Downloads/Dolphin-dev/Dolphin/Source/Core/VideoBackends/Metal/MTLStateTracker.mm
/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Source/Core/VideoBackends/Metal/MTLStateTracker.mm:631:30: error: use of overloaded operator '!=' is ambiguous (with operand types 'Metal::DepthStencilSelector' and 'Metal::DepthStencilSelector')
    if (pipe->DepthStencil() != m_current.depth_stencil)
        ~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~
/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Source/Core/VideoBackends/Metal/MTLObjectCache.h:41:8: note: candidate function
  bool operator!=(const DepthStencilSelector& other) { return !(*this == other); }
       ^
/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Source/Core/VideoBackends/Metal/MTLObjectCache.h:40:8: note: candidate function
  bool operator==(const DepthStencilSelector& other) { return value == other.value; }
       ^
/Users/pizuz/Downloads/Dolphin-dev/Dolphin/Source/Core/VideoBackends/Metal/MTLObjectCache.h:40:8: note: candidate function (with reversed parameter order)
1 error generated.
[1153/3570] Performing configure step for 'MoltenVK'
ninja: build stopped: subcommand failed.

@OatmealDome
Copy link
Member Author

@MayImilae @pizuz A potential fix landed in MoltenVK upstream - could you test this build and see if it improves anything? https://dl.dolphin-emu.org/prs/65/fe/pr-10968-dolphin-latest-universal.dmg

@Bryceshaw06
Copy link

@OatmealDome Yes, this build improved performance for me drastically! M1 Macbook Air Mario Galaxy 2 @ 1080p ~40 fps -> 60 fps, no dips. I was wondering why my iPhone 11 was outpreforming my M1 Mac in Dolphin, that explains it.

@MayImilae
Copy link
Contributor

MayImilae commented Aug 10, 2022

That is a massive performance improvement! However it is slightly less than anticipated. So in the Elysia test I described previously before the perf drop I would get 60fps in MoltenVK (metal in latest master gets 61fps). With pr10968 I am getting 58fps. I'll hella take that, that's way better than 37fps, but it's slightly less than it was before. Possibly from other changes in Dolphin or in MoltenVK, no way to know.

Either way, very nice. What change was it that fixed this?

@OatmealDome
Copy link
Member Author

Thanks for testing! MoltenVK will be releasing a new version on August 15 to coincide with a new Vulkan SDK release. Not sure if we should be waiting for that, or if I should just make a PR updating to the latest upstream ASAP for the report.

@MayImilae Tellow found that vkResetDescriptorSets was implemented in such a way where it will reset every descriptor set in a pool regardless if it was actually used. Dolphin's Vulkan backend allocates 100000 descriptor sets, but doesn't use all of them, so every call to vkResetDescriptorSets ends up taking a significant amount of CPU time. A PR was just merged (KhronosGroup/MoltenVK#1676) which improves the implementation to only reset what was used, which is very likely where the performance improvement comes from.

Hopefully I got that right. Tellow can chime in if I didn't :P

@pizuz
Copy link

pizuz commented Aug 10, 2022

It seems to improve the speed quite a bit (looks like it's on par with OpenGL again on the SMG splash screen), but the performance once achieved with MVK 1.1.2 was a lot better. I'm still seeing a substantial gap between Vulkan and Metal.

@Bryceshaw06
Copy link

@pizuz Do you know if it's possible to use MVK 1.1.2 with the latest version of dolphin (or can you direct me to the dolphin version that used 1.1.2)? I would like to test to see the difference in performance between the two versions.

@pizuz
Copy link

pizuz commented Aug 10, 2022

@pizuz Do you know if it's possible to use MVK 1.1.2 with the latest version of dolphin (or can you direct me to the dolphin version that used 1.1.2)? I would like to test to see the difference in performance between the two versions.

Just copy the dylib over (either from the last Dolphin version before this PR or from the official Vulkan SDK) and rename it to libvulkan.dylib. I don't know if the FBFetch patch is required for M1 macs, but since my nVidia card doesn't support that feature anyway, I never encountered any issues.

@OatmealDome
Copy link
Member Author

OatmealDome commented Aug 10, 2022

@pizuz Could you try the fixed build with this command that I brought up a while ago?

MVK_ALLOW_METAL_FENCES=1 /path/to/Dolphin.app/Contents/MacOS/Dolphin

I ask because a change was made to MoltenVK to better comply with the Vulkan specification which loses some performance, and they also had to workaround a bug in NVIDIA's GPU drivers and in doing so lost even more performance. Those two things combined might be the reason why you're still seeing a drop. Turning that flag on will restore the old behaviour.

@pizuz
Copy link

pizuz commented Aug 10, 2022

@pizuz Could you try the fixed build with this command that I brought up a while ago?

MVK_ALLOW_METAL_FENCES=1 /path/to/Dolphin.app/Contents/MacOS/Dolphin

I ask because a change was made to MoltenVK to better comply with the Vulkan specification which loses some performance, and they also had to workaround a bug in NVIDIA's GPU drivers and in doing so lost even more performance. Those two things combined might be the reason why you're still seeing a drop. Turning that flag on will restore the old behaviour.

Again, no real difference. The “Please mind your surroundings and don't hit other people with your Wiimote” splash screen renders faster, the SMG attract screen that follows still has the same slowdowns as without the argument. I don't want to downplay the MVK fix, because it definitely brings some very good improvement.

@TellowKrinkle
Copy link
Contributor

@pizuz could you use Instruments to collect System Traces and Metal System Traces of MVK 1.1.2 vs latest?

How to collect traces in Instruments
  1. Open Instruments, either by searching for it in spotlight or through Xcode → Open Developer Tool in Xcode's main menu
  2. Select the trace type you want
    image
  3. Click "All Processes" on the top, select "Choose Target...", and select Dolphin. You can also add LIBVULKAN_PATH to the environment variables to use a custom MoltenVK dylib. Then select "Choose"
    image
  4. Click and hold the red button in the upper left corner, then select "Recording Options..."
    image
  5. In global options, you can optionally change the recording mode to "Last 2 seconds" to keep the trace short. Traces get big fast. I've had issues with the "Last X seconds" mode losing data in Metal System Traces though, so you might want to stick to Deferred for those.
    image
  6. Press record, do whatever you want to record a trace of in Dolphin, and then press the stop button in Instruments without quitting Dolphin. Stopping can take a few seconds, and you don't want the last few seconds of trace to be dolphin quitting.
  7. Cmd-S to save, and upload to some file transfer site because the trace will probably be too big to upload to GitHub or Discord.

@pizuz
Copy link

pizuz commented Aug 12, 2022

I‘ll try over the weekend.

@pizuz
Copy link

pizuz commented Aug 14, 2022

Alright, these are the Metal traces and the system traces.

I uploaded four different scenarios: MVK 1.1.2, MVK 1.1.11pre, OpenGL and Metal. I used the beginning of the Super Mario Galaxy attract screen, loaded from a save state and without a frame rate cap. Interestingly, the difference between MVK 1.1.11pre and 1.1.2 is less than I initially remembered.

@TellowKrinkle
Copy link
Contributor

It looks like MVK is stalling for about 6ms every other frame when we attempt to present a surface, stuck waiting for some other task from a previous frame to finish. I've encountered this on my 750M laptop in PCSX2 as well, it can get really bad there. I haven't encountered it on any non-Nvidia GPU.

Screenshots of MVK 1.1.11 doing bad things image image

I'll see if 1.1.2 runs PCSX2 faster on that laptop, and if it does, I'll bisect it there.

@billhollings
Copy link

Screenshots of MVK 1.1.11 doing bad things

  1. I see only two surfaces in the trace. What is the value of VkSwapchainCreateInfoKHR::minImageCount? If it is in fact set to 2, does any improvement happen if you set it to 3?

  2. I see a reference to MVKSemaphoreEmulated. I assume this is is because the default for MVKConfiguration::semaphoreUseMTLEvent (env var MVK_ALLOW_METAL_EVENTS) is disabled when using NVIDIA GPUs. Unfortunately, this ties Vulkan semaphores to CPU callbacks from command buffer completion, rather than leaving them in the GPU. What happens if you enable the above config flag or equivalent env var? And if that breaks things, the second choice would be to enable MVKConfiguration::semaphoreUseMTLFence (env var MVK_ALLOW_METAL_FENCES).

@TellowKrinkle
Copy link
Contributor

TellowKrinkle commented Aug 19, 2022

MVK_ALLOW_METAL_FENCES=1 clears it up for me. @pizuz?

@billhollings MVK_ALLOW_METAL_EVENTS seems to be disabled regardless of the env variable on Nvidia GPUs, so it doesn't do anything. Is that intended?

Also, are there any drawbacks to fences outside of the inability to sync across queues? AFAIK both Dolphin and PCSX2 use a single queue for everything, so if that's the only drawback, we could always enable fences from within the emulator to avoid having to set environment variables.

For image count, we use surface_capabilities.minImageCount + 1, which should be 3 for MVK (and is 3 on my computer when I pause in a debugger)

@billhollings
Copy link

MVK_ALLOW_METAL_EVENTS seems to be disabled regardless of the env variable on Nvidia GPUs, so it doesn't do anything. Is that intended?

Uggh. You're correct. 🤦🏻‍♂️

Are you in a position to override that line of code to force it on NVIDIA, and test whether it works properly? If it does work in some cases on NVIDIA, we could redesign the logic of the env vars to allow a kind of "force MTLEvents in all cases" option.

@TellowKrinkle
Copy link
Contributor

I removed the Nvidia check and ran it for ~10min in PCSX2 and ~5min in Dolphin and didn't get any crashes or other weird things.

@billhollings
Copy link

I removed the Nvidia check and ran it for ~10min in PCSX2 and ~5min in Dolphin and didn't get any crashes or other weird things.

Thanks for the testing and feedback. I've created a MoltenVK issue to track changing MoltenVK to accommodate this situation.

@TellowKrinkle
Copy link
Contributor

@pizuz Can you test #11179 and see if it resolves your performance issues on Nvidia?

@pizuz
Copy link

pizuz commented Oct 23, 2022

@TellowKrinkle, sorry, I didn't find the time to test it until now. During the last days, you guys merged a ton of performance optimisations and I didn't have the time to bisect, yet, to see if updating to MVK 1.2.0 brought any benefit in itself. Anyway, latest master provides a huge performance boost to the point that MVK actually outperforms your Metal backend on my machine.

@TellowKrinkle
Copy link
Contributor

TellowKrinkle commented Oct 23, 2022

Nice. (If you're wondering, MoltenVK 1.2.0 got a new alternative to MTLEvents that works on Nvidia GPUs, which is now enabled by default)

And with the now-working multithreaded submission, you can expect MVK performance to be similar to Metal in performance, depending on how fast your GPU's Metal driver is compared to MoltenVK (MVK does all the Metal work on the submission thread, so the Metal driver being slow doesn't slow down the emulator). In my testing, AMD's Metal driver is fast, while Intel's isn't so much. I guess Nvidia's is on the slower side of things. (Metal will still be faster in games with a lot of synchronization, like Paper Mario)

@pizuz
Copy link

pizuz commented Oct 23, 2022

In my limited testing capabilities, my nVidia Kepler card (iMac 14,3 - now on Monterey thanks to OCLP) is faster on MVK than using the Metal backend. Oh, and it‘s faster than my M2 MacBook Air.

Good work :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.