-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MoltenVK: Use an external project instead of a pre-compiled dylib #9981
MoltenVK: Use an external project instead of a pre-compiled dylib #9981
Conversation
18d71b5
to
0bd4eb6
Compare
(This is largely outside of the scope of this PR - but for what is worth, gfx-portability as a drop-in replacement for MoltenVK performs way better for me (in terms of FPS and frame pacing). Is there anything preventing MoltenVK to be replaced by gfx-portability, or is it just some work to be done?) |
@kemenaran As far as I'm aware, there are no plans to replace MoltenVK with gfx-portability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Builds and runs without issue.
Minor nitpick: The blank lines you added in CMakeLists.txt files have a couple extraneous spaces at the beginning; I'm surprised the linter didn't complain about them.
0bd4eb6
to
1d50999
Compare
Didn't even notice, thanks. Xcode likes to add those. I don't think the linter runs on CMakeLists? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Approval conditional on upgrading or removing the macOS Intel builder)
Related: dolphin-emu/www#132 (we need to migrate users to universal builds before we can shut down the macOS Intel VM) |
Any reason the intel MacOS buildbot pr-osx-x64 is still needed considering pr-osx-universal will run on Intel? |
@nickp-ca It isn't needed and is planned to be shut down (see the comment above yours). |
Could you rebase this PR please? |
1d50999
to
7304ef0
Compare
Rebased. Also, updated to the latest MoltenVK release. The build time is a bit slow, even on incremental builds. I have some ideas for decreasing the build time, but I'll open a separate PR for that considering this one has already been reviewed and approved... |
Also, update MoltenVK to match Vulkan SDK 1.2.189.
7304ef0
to
2209dc0
Compare
Hello dear developer and thank you very much for your labour and such great emulator. Started from this release, the performance in many games (SMG, SMG2, Metroid Prime, Super Mario bros.) etc got significantly lower (2-3 times!). this is 100% tested on many releases (about 20 times). Initially I tried to update Dolphin on MacBook Air M1 from the beta version 15445 to the newest one 16101, because I had artifacts (black and transparent squares in the center of screen in Super Smash Bros. Brawl). And I was shocked what the performance became so poor. If at 15445 3X in Mario Galaxy and Metroid Prime was playable, now it became much worse, so only 1X and 2x (hardly) playable. So after that I installed many releases and found, that such visible, significant performance drop had started from this release. The size of the release also was changed started from this version 15474 (38.8mb instead of 34.9 before, so I suggest it was some global recompiling of the code). so now I stopped on version 15472 as the latest playable one. But unfortunately, I still have the same visual artifacts in Super Smash Bros... Could you ever be so kind to check whether there is some mistake in this release starting from 15474? |
All the version started from this release have kept crash with vulkan mode in my macbook pro. Like @iljatanger , the workable version stopped at 15472 in my macbook. Process: Dolphin OS Version: Mac OS X 10.15.7 (19H2) System Integrity Protection: enabled Crashed Thread: 28 Emuthread - Starting Exception Type: EXC_BAD_ACCESS (SIGSEGV) Termination Signal: Segmentation fault: 11 |
This is a known issue with MoltenVK and will be fixed as part of the next MoltenVK release. (https://bugs.dolphin-emu.org/issues/12840) |
I can second that. At the moment, even the deprecated OpenGL seems to be faster than Vulkan in the games I tested. This performance regression affects all releases including the most recent one (at the time of writing 5.0-16831). Late 2013 iMac with 3,1 GHz Quad-Core Intel Core i7 and NVIDIA GeForce GT 750M 1 GB running macOS 10.15.7. |
@pizuz Here's an experiment to try, if you want. Try running |
I did it the other way round and copied the last known good version 1.1.2 (which was the last update introduced in #9441) into latest master - which gave it some substantial speedup. The performance regression must have been introduced somewhere between 1.1.3 and 1.1.5 because even the latest SDK release binary didn't change anything. I'll try to bisect next.
EDIT: The regression was apparently introduced in either MVK 1.1.3 or 1.1.4 |
@pizuz If I remove the code signatures from those files, the file hashes end up being exactly the same. Perhaps you made a mistake somewhere?
|
Yes, I did apparently. I have no idea what happened. Anyway, the results are in: The regression was introduced in either 1.1.3 or 1.1.4. I can't say for sure, because 1.1.3 doesn't load in Dolphin. I will have to bisect MVK itself. |
@pizuz Could you try running the following command in the terminal with the latest dev build and see if performance gets any better vs when running it normally? It might cause some graphical glitches, though.
|
The command makes a difference in performance-light scenes such as splash screens which render twice as fast, now. In other instances, such as the attract screen in Super Mario Galaxy, there is no improvement, unfortunately. I'm still trying to find time to bisect those old PRs in MoltenVK to pinpoint the change that caused this. MVK 1.1.2 definitely runs a lot faster. |
Bisected to KhronosGroup/MoltenVK@4371ef4, which matches up with the pool reset issues I reported. If it is just this, reducing the size of our descriptor pool is a fine temporary solution for testing while we wait for MVK to properly fix this. If you could check that to confirm that it fixes performance (or just confirm that reducing the descriptor pool size fixes performance), that would be ideal. |
I‘ll try building it later today. Once I figure out how to get CMake to find zlib and sdl again. |
@pizuz I'd love to try that build myself. |
I'd love to share, but CMake still doesn't seem to let me build and I have no idea why that is. I'm getting no reasonable error message. It did work before. |
@pizuz Building using static SDL from the submodule on macOS does not work at this time. Install SDL as a shared library onto your system (for example, with brew) and it should work. |
Thanks for the hint. Now the Metal backend fails to build... Is there a CMake option to disable it? For the test, I don‘t need it.
|
@MayImilae @pizuz A potential fix landed in MoltenVK upstream - could you test this build and see if it improves anything? https://dl.dolphin-emu.org/prs/65/fe/pr-10968-dolphin-latest-universal.dmg |
@OatmealDome Yes, this build improved performance for me drastically! M1 Macbook Air Mario Galaxy 2 @ 1080p ~40 fps -> 60 fps, no dips. I was wondering why my iPhone 11 was outpreforming my M1 Mac in Dolphin, that explains it. |
That is a massive performance improvement! However it is slightly less than anticipated. So in the Elysia test I described previously before the perf drop I would get 60fps in MoltenVK (metal in latest master gets 61fps). With pr10968 I am getting 58fps. I'll hella take that, that's way better than 37fps, but it's slightly less than it was before. Possibly from other changes in Dolphin or in MoltenVK, no way to know. Either way, very nice. What change was it that fixed this? |
Thanks for testing! MoltenVK will be releasing a new version on August 15 to coincide with a new Vulkan SDK release. Not sure if we should be waiting for that, or if I should just make a PR updating to the latest upstream ASAP for the report. @MayImilae Tellow found that Hopefully I got that right. Tellow can chime in if I didn't :P |
It seems to improve the speed quite a bit (looks like it's on par with OpenGL again on the SMG splash screen), but the performance once achieved with MVK 1.1.2 was a lot better. I'm still seeing a substantial gap between Vulkan and Metal. |
@pizuz Do you know if it's possible to use MVK 1.1.2 with the latest version of dolphin (or can you direct me to the dolphin version that used 1.1.2)? I would like to test to see the difference in performance between the two versions. |
Just copy the dylib over (either from the last Dolphin version before this PR or from the official Vulkan SDK) and rename it to libvulkan.dylib. I don't know if the FBFetch patch is required for M1 macs, but since my nVidia card doesn't support that feature anyway, I never encountered any issues. |
@pizuz Could you try the fixed build with this command that I brought up a while ago?
I ask because a change was made to MoltenVK to better comply with the Vulkan specification which loses some performance, and they also had to workaround a bug in NVIDIA's GPU drivers and in doing so lost even more performance. Those two things combined might be the reason why you're still seeing a drop. Turning that flag on will restore the old behaviour. |
Again, no real difference. The “Please mind your surroundings and don't hit other people with your Wiimote” splash screen renders faster, the SMG attract screen that follows still has the same slowdowns as without the argument. I don't want to downplay the MVK fix, because it definitely brings some very good improvement. |
@pizuz could you use Instruments to collect System Traces and Metal System Traces of MVK 1.1.2 vs latest? How to collect traces in Instruments
|
I‘ll try over the weekend. |
Alright, these are the Metal traces and the system traces. I uploaded four different scenarios: MVK 1.1.2, MVK 1.1.11pre, OpenGL and Metal. I used the beginning of the Super Mario Galaxy attract screen, loaded from a save state and without a frame rate cap. Interestingly, the difference between MVK 1.1.11pre and 1.1.2 is less than I initially remembered. |
It looks like MVK is stalling for about 6ms every other frame when we attempt to present a surface, stuck waiting for some other task from a previous frame to finish. I've encountered this on my 750M laptop in PCSX2 as well, it can get really bad there. I haven't encountered it on any non-Nvidia GPU. I'll see if 1.1.2 runs PCSX2 faster on that laptop, and if it does, I'll bisect it there. |
|
@billhollings Also, are there any drawbacks to fences outside of the inability to sync across queues? AFAIK both Dolphin and PCSX2 use a single queue for everything, so if that's the only drawback, we could always enable fences from within the emulator to avoid having to set environment variables. For image count, we use |
Uggh. You're correct. 🤦🏻♂️ Are you in a position to override that line of code to force it on NVIDIA, and test whether it works properly? If it does work in some cases on NVIDIA, we could redesign the logic of the env vars to allow a kind of "force |
I removed the Nvidia check and ran it for ~10min in PCSX2 and ~5min in Dolphin and didn't get any crashes or other weird things. |
Thanks for the testing and feedback. I've created a MoltenVK issue to track changing MoltenVK to accommodate this situation. |
@TellowKrinkle, sorry, I didn't find the time to test it until now. During the last days, you guys merged a ton of performance optimisations and I didn't have the time to bisect, yet, to see if updating to MVK 1.2.0 brought any benefit in itself. Anyway, latest master provides a huge performance boost to the point that MVK actually outperforms your Metal backend on my machine. |
Nice. (If you're wondering, MoltenVK 1.2.0 got a new alternative to MTLEvents that works on Nvidia GPUs, which is now enabled by default) And with the now-working multithreaded submission, you can expect MVK performance to be similar to Metal in performance, depending on how fast your GPU's Metal driver is compared to MoltenVK (MVK does all the Metal work on the submission thread, so the Metal driver being slow doesn't slow down the emulator). In my testing, AMD's Metal driver is fast, while Intel's isn't so much. I guess Nvidia's is on the slower side of things. (Metal will still be faster in games with a lot of synchronization, like Paper Mario) |
In my limited testing capabilities, my nVidia Kepler card (iMac 14,3 - now on Monterey thanks to OCLP) is faster on MVK than using the Metal backend. Oh, and it‘s faster than my M2 MacBook Air. Good work :) |
Changes the build system to build MoltenVK instead of having a dylib in the repository. This also opens up the opportunity to modify MoltenVK to our needs if needed.
Also, the MoltenVK version has been updated to v1.1.5 (Vulkan SDK 1.2.189).
EDIT: This PR is being blocked by the presence of the Intel macOS buildbot (pr-osx-x64
). The M1 Mac Mini can build MoltenVK perfectly fine for both Intel and ARM, so the Intel buildbot needs to be shut down before can be merged.