fence_wait failed in rendering_device_driver_vulkan.cpp #94177

Closed
MartinFretigne opened this issue Jul 10, 2024 · 19 comments

@MartinFretigne

MartinFretigne commented Jul 10, 2024

Tested versions

  • reproducible with Godot v4.3.beta2.official.b75f0485b

System information

Android 14 - Godot Engine v4.3.beta2.official.b75f0485b - Forward Mobile

Issue description

With my project on a Google Pixel 6a phone (Vulkan 1.3.269 - Forward Mobile - Using Device #0: ARM - Mali-G78), I get the error USER ERROR: Unable to acquire framebuffer continuously after a while. The timing is random: sometimes it happens after 1 minute, sometimes after 10. It works fine with the Compatibility renderer.

07-11 02:18:37.644  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.644  8470  8542 E godot   :    at: fence_wait (drivers/vulkan/rendering_device_driver_vulkan.cpp:2066)
07-11 02:18:37.644  8470  8542 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:211600000001,api:1,p:8470,c:8470) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-11 02:18:37.644  8470  8542 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Unable to acquire framebuffer.
07-11 02:18:37.645  8470  8542 E godot   :    at: screen_prepare_for_drawing (servers/rendering/rendering_device.cpp:3503)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.645  8470  8542 E godot   :    at: command_queue_execute_and_present (drivers/vulkan/rendering_device_driver_vulkan.cpp:2266)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.645  8470  8542 E godot   :    at: command_queue_execute_and_present (drivers/vulkan/rendering_device_driver_vulkan.cpp:2266)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.645  8470  8542 E godot   :    at: fence_wait (drivers/vulkan/rendering_device_driver_vulkan.cpp:2066)
07-11 02:18:37.645  8470  8542 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:211600000001,api:1,p:8470,c:8470) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-11 02:18:37.645  8470  8542 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)
07-11 02:18:37.645  8470  8542 E godot   : USER ERROR: Unable to acquire framebuffer.
07-11 02:18:37.645  8470  8542 E godot   :    at: screen_prepare_for_drawing (servers/rendering/rendering_device.cpp:3503)

It may be related to UI elements, since the game doesn't freeze when I delete some of them (Panels and RichTextLabels). It may also be a coincidence, since I was unable to reproduce it with a basic scene plus my UI. Sorry for the lack of info; I know it would be a miracle if someone understood the problem from the small amount of info I'm able to provide.

Anyway, I saw this bug about "screen_prepare_for_drawing Unable to acquire framebuffer" too, maybe it's related? #94104 Even though my issue happens on mobile + Mali and not on desktop + NVIDIA.

Steps to reproduce

N/A

Minimal reproduction project (MRP)

Run the project for a while. At some point the fire will stop moving and you should see the above errors in adb logcat.
game1-mrp.zip

@huwpascoe
Contributor

If possible, please share the app's Android logcat events around the time of the crash. Might show why it's suddenly happening.

@akien-mga
Member

Anyway, I saw this bug about "screen_prepare_for_drawing Unable to acquire framebuffer" too, maybe it's related? #94104 Even though my issue happens on mobile + Mali and not on desktop + NVIDIA.

It does sound related to that issue, CC @DarioSamo.

@MartinFretigne Aside from the error spam, does the game/app work fine? On desktop + nvidia it was found to be a benign issue and we've just silenced the error, which might have fixed the issue for mobile + mali too.

@DarioSamo
Contributor

DarioSamo commented Jul 10, 2024

Anyway, I saw this bug about "screen_prepare_for_drawing Unable to acquire framebuffer" too, maybe it's related? #94104 Even though my issue happens on mobile + Mali and not on desktop + NVIDIA.

It does sound related to that issue, CC @DarioSamo.

That sounds unlikely if it keeps happening. It's benign in the other case because it only happens during resizing, when the swap chain is out of date. From the log, this one pretty much never recovers, and the internal error doesn't look promising.

07-10 14:44:00.003 13570 13622 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:350200000001,api:1,p:13570,c:13570) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-10 14:44:00.003 13570 13622 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)

That sounds like an internal driver error. I'd imagine this happens before the PR that added the message and it's just showing the error message instead of silently freezing.

@MartinFretigne
Author

MartinFretigne commented Jul 11, 2024

l2.txt
I attached the adb logcat logs. @huwpascoe

@akien-mga No, from the moment where the errors are spammed, the app is unusable, as if it is frozen.

@DarioSamo
Contributor

DarioSamo commented Jul 11, 2024

The very first error you get is:

07-11 02:18:37.644  8470  8542 E godot   : USER ERROR: Condition "err != VK_SUCCESS" is true. Returning: FAILED
07-11 02:18:37.644  8470  8542 E godot   :    at: fence_wait (drivers/vulkan/rendering_device_driver_vulkan.cpp:2066)

This shows that screen_prepare_for_drawing is clearly unrelated. You likely got a VK_ERROR_DEVICE_LOST on that very first fence wait and are therefore getting a similar error every frame after it. The GPU froze at some point, for whatever reason, and is not recovering.

I'd clarify this in both the issue title and the initial post, as it's very important: the error that keeps repeating only appears because another error triggered first.
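
To make the "first error wins" point concrete, here is a minimal sketch of pulling the first USER ERROR and its "at:" line out of a logcat excerpt. The helper name `find_first_user_error` and the `FirstError` struct are hypothetical and are not part of Godot or adb; the marker strings are taken from the log lines quoted in this thread.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper type (not part of Godot): the first reported error
// and the source location printed on the line that follows it.
struct FirstError {
    std::string message;
    std::string location;
};

// Scans log lines in order and returns the first "USER ERROR" entry.
// Later errors are ignored on purpose: they are usually consequences.
std::optional<FirstError> find_first_user_error(const std::vector<std::string> &lines) {
    const std::string error_marker = "USER ERROR: ";
    const std::string at_marker = "at: ";
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::size_t pos = lines[i].find(error_marker);
        if (pos == std::string::npos) {
            continue;
        }
        FirstError result;
        result.message = lines[i].substr(pos + error_marker.size());
        // The location, when present, is on the next line after "at: ".
        if (i + 1 < lines.size()) {
            const std::size_t at_pos = lines[i + 1].find(at_marker);
            if (at_pos != std::string::npos) {
                result.location = lines[i + 1].substr(at_pos + at_marker.size());
            }
        }
        return result;
    }
    return std::nullopt;
}
```

Fed the log from the initial post, this reports the fence_wait location rather than screen_prepare_for_drawing, which matches the reading above.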

@MartinFretigne
Author

MartinFretigne commented Jul 11, 2024

If the GPU was frozen, would I still be able to use other apps? Only this app freezes when the problem happens, while Android and other apps remain usable. By the way, this problem occurs on both Pixel 6a phones that I have. I will clarify, @DarioSamo.

@MartinFretigne changed the title from "screen_prepare_for_drawing Unable to acquire framebuffer" to "fence_wait failed in rendering_device_driver_vulkan.cpp" Jul 11, 2024

@DarioSamo
Contributor

DarioSamo commented Jul 11, 2024

If the GPU was frozen, would I still be able to use other apps?

Yes, by frozen I mean it's frozen for the Godot application. There are a few states from which the application will never recover if the driver fails. If you inspect the error codes returned by the functions that failed, most likely vkWaitForFences, you'll probably find VK_ERROR_DEVICE_LOST.
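
As a rough sketch of what inspecting the error code could look like, the helper below turns a raw result value into its Vulkan name for logging. The function names `vk_result_name` and `describe_result` are hypothetical. The numeric values are copied from the Vulkan `VkResult` enum so the snippet compiles without the Vulkan headers; real code would include `<vulkan/vulkan.h>` and use `VkResult` directly. It is not how Godot reports these errors.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps a handful of VkResult values to their names.
// Values mirror the Vulkan headers; only a small subset is listed here.
const char *vk_result_name(std::int32_t result) {
    switch (result) {
        case 0: return "VK_SUCCESS";
        case 1: return "VK_NOT_READY";
        case 2: return "VK_TIMEOUT";
        case -1: return "VK_ERROR_OUT_OF_HOST_MEMORY";
        case -2: return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
        case -3: return "VK_ERROR_INITIALIZATION_FAILED";
        case -4: return "VK_ERROR_DEVICE_LOST";
        case -1000000000: return "VK_ERROR_SURFACE_LOST_KHR";
        case 1000001003: return "VK_SUBOPTIMAL_KHR";
        case -1000001004: return "VK_ERROR_OUT_OF_DATE_KHR";
        default: return "unlisted VkResult";
    }
}

// Hypothetical helper: builds a log line naming the call and its result.
std::string describe_result(const std::string &call, std::int32_t result) {
    return call + " returned " + std::to_string(result) + " (" + vk_result_name(result) + ")";
}
```

Logging something like `describe_result("vkWaitForFences", err)` at the failing call would show whether the first failure is a device loss or something else, such as a timeout.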

I think you'll have to provide whatever project you're having trouble with here, as you can even get this error from content like custom shaders and such if they're not correct.

@MartinFretigne
Author

game1-mrp.zip
I managed to reduce the size of my app by 99% and still got the same error (after about one hour). I added the zip of my app.

The main scene (mrp4.tscn) should be left running to reproduce the problem (freeze and errors).

I could try to reduce the size of my app even more, but I feel like the more I delete things, the less often the problem occurs.

@huwpascoe
Contributor

(There is something unusual: identical copies of a font are embedded in theme.tres and mrp4.tscn, making both very big. It would be better for loading time and organization if the font were saved as a separate resource.)

One hour... what states did the app go through in that hour? Did it ever go into the background? Screen turn off? etc.

@MartinFretigne
Author

MartinFretigne commented Jul 11, 2024

The app stayed in the foreground and the screen stayed on the whole time. I just started the app again, this time the error occurred in 4 minutes. Logs attached
logcat2.txt

(I will look into the font embedded in the theme and scene, that's not deliberate, I don't care about the font at all at this point. Thank you. edit: I removed the font then tested again -> same issue. It was worth a try.)

@huwpascoe
Contributor

What we know

  • the application freezes, not crashes
  • there's no trace of a device loss occurring.
  • it's entirely random, and worse the busier the app is

I think it's a race condition, not a driver thing.

@huwpascoe
Contributor

Fix Queue Synchronization

Looks like the work to resolve this might already be done.

@darksylinc
Contributor

Fix Queue Synchronization
Looks like the work to resolve this might already be done.

I doubt that. That code seems to be a performance optimization.

The true problem is that the device appears to be lost. Even if it avoids the error there, it's going to fail 2 lines later at the call to vkResetFences.

Btw is it possible that you keep rendering more and more vertices?

ARM Mali has an upper bound of 180 MB of vertex data rendered per VkRenderPass. If this threshold is exceeded, the driver will emit VK_ERROR_DEVICE_LOST.

AFAIK newer Malis don't have this limit but I'm not 90% sure, and I also don't know if your device has the limit.
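
For a back-of-the-envelope check against the 180 MB figure above, a sketch like the following multiplies vertex count by bytes per vertex. Everything here is an assumption for illustration: the names are hypothetical, 1 MB is taken as 1024 * 1024 bytes, and the thread does not say exactly which data counts toward the limit, so treat the result as an order-of-magnitude estimate only.

```cpp
#include <cstdint>

// Budget quoted in the thread. Treating 1 MB as 1024 * 1024 bytes is an
// assumption; the exact unit is not stated.
constexpr std::uint64_t kVertexBudgetBytes = 180ull * 1024ull * 1024ull;

// Estimated bytes of vertex data for one render pass:
// number of vertices times bytes per vertex.
constexpr std::uint64_t vertex_bytes(std::uint64_t vertex_count, std::uint64_t stride_bytes) {
    return vertex_count * stride_bytes;
}

// True when the estimate is strictly above the quoted budget.
constexpr bool exceeds_budget(std::uint64_t vertex_count, std::uint64_t stride_bytes) {
    return vertex_bytes(vertex_count, stride_bytes) > kVertexBudgetBytes;
}
```

With a hypothetical 32-byte vertex, the budget works out to roughly 5.9 million vertices per render pass, which a static UI scene is unlikely to approach.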

@darksylinc
Contributor

07-10 14:44:00.003 13570 13622 E BufferQueueProducer: [SurfaceView[com.ggg.Game1/com.godot.game.GodotApp]#1(BLAST Consumer)1](id:350200000001,api:1,p:13570,c:13570) dequeueBuffer: attempting to exceed the max dequeued buffer count (2)
07-10 14:44:00.003 13570 13622 W vulkan  : dequeueBuffer timed out: Function not implemented (-38)

That sounds like an internal driver error. I'd imagine this happens before the PR that added the message and it's just showing the error message instead of silently freezing.

Yes and no. The error is saying that Godot has queued up the maximum number of swapchain images (or anything else, like command submissions) and the GPU is not consuming them.

Whether this is happening because of a deadlock or a GPU fault is anyone's guess, and it could be Godot's fault, the driver's, or even the GPU's!
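
The "max dequeued buffer count (2)" message can be pictured with a toy model of a bounded pool. This is only an illustration and is not Android's BufferQueue or Godot's swap chain code; the class name and methods are hypothetical. The point is that once every slot is held and nothing is ever handed back, each further request fails, every frame.

```cpp
#include <cstddef>

// Toy model of a bounded buffer pool (NOT Android's BufferQueue).
// At most max_held buffers may be held at once.
class ToyBufferPool {
public:
    explicit ToyBufferPool(std::size_t max_held) : max_held_(max_held) {}

    // Ask for a buffer to render into. Fails once the limit is reached.
    bool acquire() {
        if (held_ >= max_held_) {
            return false;
        }
        ++held_;
        return true;
    }

    // Hand one buffer back, which frees a slot. If the other side has
    // stalled, this is never called and acquire() keeps failing.
    bool hand_back() {
        if (held_ == 0) {
            return false;
        }
        --held_;
        return true;
    }

    std::size_t held() const { return held_; }

private:
    std::size_t max_held_;
    std::size_t held_ = 0;
};
```

With a limit of 2, as in the log, the third `acquire()` fails until a `hand_back()` happens. The model says nothing about why the hand-back stops, which is exactly the open question here.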

@MartinFretigne
Author

MartinFretigne commented Jul 12, 2024

Btw is it possible that you keep rendering more and more vertices?

No. The scene in the zip is static, it does not instantiate objects during runtime after the _ready function.
I guess I could add a printf in fence_wait to make sure the device is indeed 'lost'. I don't believe I will have time to do that before I leave (this weekend, for about 4 weeks), but who knows. At worst I will do it when I'm back (along with any other suggestions I read here).

@clayjohn clayjohn modified the milestones: 4.3, 4.4 Jul 24, 2024
@MartinFretigne
Author

Using the latest code on master, I printed the error in fence_wait: it is -4 (VK_ERROR_DEVICE_LOST). I don't know what to do from here.

@vvvvvvitor

I just got this error on my game

E 0:00:02:0437   fence_wait: Condition "err != VK_SUCCESS" is true. Returning: FAILED
  <C++ Source>   drivers/vulkan/rendering_device_driver_vulkan.cpp:2066 @ fence_wait()

@MartinFretigne
Author

Just so you know: I switched to Legacy/OpenGL for one month, continued development and refactored my code a lot, then tried the Vulkan renderer again, and I don't have the problem anymore. I cannot provide any explanation, sorry. All I can say is that it probably came from something in my code and not in Godot, since I have not changed versions (I'm using 4.3). I'm closing this ticket.

@vvvvvvitor

The issue still persists, you shouldn't have closed the ticket, as other people are still affected by it.
